WO2019144608A1 - Image processing method, processing apparatus and processing device - Google Patents

Image processing method, processing apparatus and processing device

Info

Publication number
WO2019144608A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
neural network
training
output
processing
Prior art date
Application number
PCT/CN2018/101369
Other languages
English (en)
French (fr)
Inventor
刘瀚文
那彦波
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to US16/329,893 (US11281938B2)
Priority to EP18849416.5A (EP3745347A4)
Publication of WO2019144608A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Definitions

  • Embodiments of the present disclosure relate to the field of image processing, and in particular to an image processing method, a processing apparatus and a processing device.
  • An embodiment of the present disclosure provides an image processing method, including: acquiring an input image; and performing image conversion processing on the input image by using a generative neural network to output a converted output image. The generative neural network includes a plurality of processing levels, wherein the output result of the i-th processing level is input to both the (i+1)-th processing level and the j-th processing level, the j-th processing level further receives the output result of the (j-1)-th processing level, and the output result of the (j-1)-th processing level has the same size as the output result of the i-th processing level, where i is less than j-1, and i and j are positive integers.
  • For example, each of the plurality of processing levels of the generative neural network includes a convolutional network, and at least some of the processing levels further include at least one of a downsampling layer, an upsampling layer and a normalization layer.
  • For example, in the generative neural network, the number of downsampling layers is equal to the number of upsampling layers.
  • For example, the input image serves as a first training image and the output image serves as a first training output image, and the image processing method further includes: training the generative neural network based on the first training image and the first training output image.
  • For example, training the generative neural network includes: inputting the first training output image to a discriminative neural network, which outputs a discrimination label indicating whether the first training output image has a conversion feature; and using a first loss calculation unit to calculate a loss value of the generative neural network according to the first training image, the first training output image and the discrimination label, and to optimize parameters of the generative neural network.
  • For example, the first loss calculation unit includes an analysis network, a first loss calculator and an optimizer.
  • Optimizing the parameters of the generative neural network by using the first loss calculation unit includes: using the analysis network to output content features of the first training image and the first training output image; using the first loss calculator to calculate the loss value of the generative neural network according to a first loss function, based on the content features output by the analysis network and the discrimination label of the first training output image; and using the optimizer to optimize the parameters of the generative neural network according to the loss value, wherein the parameters include the convolution kernels and biases of the convolutional networks in the generative neural network.
  • For example, the first loss function includes at least one of a content loss function, a generative neural network loss function and a normalization loss function.
  • For example, the input image serves as a second training image and the output image serves as a first sample image, and the image processing method further includes: acquiring a second sample image from a training database; using the discriminative neural network to output discrimination labels indicating whether the first sample image and the second sample image have the conversion feature; and using a second loss calculation unit to train the discriminative neural network according to the discrimination label of the first sample image and the discrimination label of the second sample image.
  • For example, the second loss calculation unit includes a second loss calculator and an optimizer.
  • Training the discriminative neural network with the second loss calculation unit includes: using the second loss calculator to calculate a loss value of the discriminative neural network according to a second loss function, based on the discrimination label of the first sample image and the discrimination label of the second sample image, wherein the second loss function includes a discriminative neural network loss function; and using the optimizer to optimize the parameters of the discriminative neural network according to the loss value, wherein the parameters include the convolution kernels and biases of the convolutional networks in the discriminative neural network.
  • For example, the training database includes sample images having the conversion feature.
  • An embodiment of the present disclosure further provides an image processing apparatus, including: a generative neural network module configured to perform image conversion processing on the input image to output a converted output image, wherein the generative neural network module includes a plurality of processing levels, the output result of the i-th processing level is input to both the (i+1)-th processing level and the j-th processing level, the j-th processing level further receives the output result of the (j-1)-th processing level, and the output result of the (j-1)-th processing level has the same size as the output result of the i-th processing level, where i is less than j-1, and i and j are positive integers.
  • For example, each processing level in the generative neural network module includes a convolutional network, and at least some of the processing levels further include at least one of a downsampling layer, an upsampling layer and a normalization layer, wherein the number of downsampling layers is equal to the number of upsampling layers.
  • For example, the input image serves as a first training image and the output image serves as a first training output image, and the image processing apparatus further includes a training neural network module configured to train the generative neural network module according to the first training image and the first training output image.
  • The training neural network module includes: a discriminative neural network module configured to output a discrimination label indicating whether the first training output image has a conversion feature; and a first loss calculation unit configured to calculate a loss value of the generative neural network module according to the first training image, the first training output image and the discrimination label, and to optimize parameters of the generative neural network module.
  • For example, the first loss calculation unit includes: an analysis network configured to output content features of the first training image and the first training output image; a first loss calculator configured to calculate the loss value of the generative neural network module according to a first loss function, based on the content features output by the analysis network and the discrimination label of the first training output image, wherein the first loss function includes at least one of a content loss function, a generative neural network loss function and a normalization loss function; and an optimizer configured to optimize the parameters of the generative neural network module according to the loss value, wherein the parameters include the convolution kernels and biases of the convolutional networks in the generative neural network module.
  • For example, the training neural network module is further configured to train the discriminative neural network module according to its discrimination labels, wherein the input image serves as a second training image, the output image serves as a first sample image, and an image obtained from a training database serves as a second sample image; the discriminative neural network module outputs discrimination labels according to the first sample image and the second sample image, and the training neural network module further includes a second loss calculation unit configured to train the discriminative neural network module according to the discrimination label of the first sample image and the discrimination label of the second sample image.
  • For example, the second loss calculation unit includes: a second loss calculator configured to calculate a loss value of the discriminative neural network module according to a second loss function, based on the discrimination label of the first sample image and the discrimination label of the second sample image, wherein the second loss function includes a discriminative neural network module loss function; and an optimizer configured to optimize parameters of the discriminative neural network module according to the loss value, wherein the parameters include the convolution kernels and biases of the convolutional networks in the discriminative neural network module.
  • For example, the training database includes sample images having the conversion feature.
  • Embodiments of the present disclosure also provide an image processing device, including: one or more processors; and one or more memories, wherein the memories store computer-readable code which, when executed by the one or more processors, performs the image processing method described above or implements the image processing apparatus described above.
  • FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure
  • FIG. 2A is a structural block diagram of a generative neural network for implementing the image processing method of FIG. 1;
  • FIG. 2B shows a specific structure of a generative neural network for implementing the image processing method of FIG. 1;
  • Figure 3 shows a schematic diagram of the MUX layer;
  • Figure 4 shows a flowchart for training the generative neural network;
  • Figure 5 shows a block diagram for training the generative neural network;
  • Figure 6 shows a specific structural diagram of the analysis network;
  • Figure 7 shows a specific structural diagram of the discriminative neural network;
  • Figure 8 shows a flowchart for training the discriminative neural network;
  • Figure 9 shows a block diagram for training the discriminative neural network;
  • FIG. 10 is a schematic block diagram of an image processing apparatus according to an embodiment of the present disclosure;
  • FIG. 11 is a schematic block diagram of an image processing device provided by an embodiment of the present disclosure.
  • FIG. 1 shows a flowchart of an image processing method provided by an embodiment of the present disclosure.
  • In step S110, an input image to be subjected to image conversion processing is acquired; the input image is usually a color image, for example an RGB image, but may also be a grayscale image.
  • In step S120, the input image is subjected to image conversion processing using a generative neural network, where the generative neural network has been obtained by training.
  • The generative neural network can implement image feature conversion processing, which may be, but is not limited to, image style conversion (for example, giving an input photo the characteristics of an oil painting) or seasonal feature conversion (for example, giving the input image the characteristics of winter).
  • In the process of image conversion using a generative neural network, downsampling operations in the network (for example, pooling layers) can cause original image information to be lost and redundant conversion features to be generated in the output, resulting in a poor conversion effect.
  • By establishing cross-level connections between networks at different processing levels in the generative neural network, the network maintains the original information of the input image during the image conversion process, so that the output converted image both includes the conversion features and retains original image information, ensuring the image conversion effect.
  • The cross-level connection inputs the output result of the i-th processing level to both the (i+1)-th processing level and the j-th processing level, and the j-th processing level further receives the output result of the (j-1)-th processing level.
  • The output result of the (j-1)-th processing level has the same size as the output result of the i-th processing level.
  • Here, i is less than j-1, and i and j are positive integers.
  • The j-th processing level performs image processing based on both the output result of the i-th processing level and the output result of the (j-1)-th processing level.
  • In other words, the generative neural network connects the output result of the i-th processing level to the input of the j-th processing level; since the output result of the i-th processing level has not been processed by the processing levels between the i-th and j-th levels, it contains more original information of the input image than the output result of the (j-1)-th processing level, thereby ensuring the consistency of the output image with the input image.
  • The generative neural network outputs an output image that has undergone image conversion processing and has the image conversion features.
  • The generative neural network can implement different image conversion processes through different training processes, for example, conversion of image style, scene, season, effect, or other image features.
  • FIG. 2A shows a structural block diagram of a generative neural network for implementing the above image processing method,
  • and FIG. 2B shows a specific structure of such a generative neural network.
  • The image processing method will be described in detail with reference to FIGS. 2A and 2B.
  • The generative neural network shown in FIGS. 2A and 2B comprises five processing levels, each of which includes a convolutional network. Depending on the needs of the image processing, at least some of the five processing levels may further include at least one of a downsampling layer, an upsampling layer and a normalization layer, and in the generative neural network the numbers of downsampling layers and upsampling layers are equal. It should be noted that the generative neural network in FIGS. 2A and 2B is merely exemplary and does not limit the present disclosure: a generative neural network for implementing image conversion processing may have a different number of processing levels, and its specific structure may be adjusted according to the needs of the image conversion.
  • The convolutional network includes at least a convolutional layer, and may further include other processing layers such as a pooling layer, an activation layer, and the like.
  • Each convolutional layer can contain tens or hundreds of convolution kernels; the more layers there are, the more complex the structure of the convolutional network becomes.
  • Each of the plurality of processing levels in the generative neural network includes a convolutional network for implementing the image conversion processing.
  • At least some of the plurality of processing levels further include at least one of a downsampling layer, an upsampling layer and a normalization layer.
  • Cross-level connections are also included in the generative neural network; as shown in FIGS. 2A and 2B, a cross-level connection links networks at two different processing levels.
  • First, the input image is processed by the first processing level to extract image features; for convenience of description, the result of the first processing level is denoted as result A.
  • The result A is processed by the second processing level in the generative neural network to obtain the result B.
  • The result B is processed by the third processing level to obtain the result C.
  • The result C is processed by the fourth processing level to obtain the result D.
  • The result D is processed by the fifth processing level to obtain the output image.
  • The result A obtained from the first processing level is also connected to the fifth processing level, where it is processed together with the result D to generate the output image.
  • That is, through the cross-level connection, the result A of the first processing level skips the processing of the second, third and fourth processing levels.
  • The image in the result A has the same size as the image in the result D.
  • Since the result A has not been processed by the second, third and fourth processing levels, it contains more original information of the input image than the result D, so that the output image can retain more information of the input image on the basis of the image feature conversion, maintaining consistency with the input image.
  • Similarly, the result B obtained through the first and second processing levels is connected to the fourth processing level, where it is processed together with the result C to generate the result D. That is, the result B of the second processing level is directly input to the fourth processing level, skipping the processing of the third processing level. The image in the result B has the same size as the image in the result C.
  • The fourth processing level generates the result D based on the result B and the result C; the result C is obtained from the result B through the processing of the third processing level, and since the result B has not been processed by the third processing level, it contains more original information of the input image than the result C.
  • The generative neural network may also include a greater number of processing levels, with processing carried out sequentially according to the order of the processing levels in the network; details are not repeated here.
  • For example, the first processing level in the generative neural network may include a convolutional network for extracting image features from the input image to obtain the result A.
  • The second processing level may include, in sequence, a downsampling layer, a normalization layer and a convolutional network for obtaining the result B.
  • The third processing level may include, in sequence, a downsampling layer, a normalization layer, a convolutional network and an upsampling layer for obtaining the result C.
  • The fourth processing level may include, in sequence, a convolutional network, a normalization layer and an upsampling layer for obtaining the result D.
  • The fifth processing level may include, in sequence, a convolutional network and a normalization layer for generating the output image.
  • The generative neural network also establishes cross-level connections between different processing levels. Specifically, a cross-level connection is established between the output result of the first processing level and the input of the fifth processing level; that is, the result A is input to both the second processing level and the fifth processing level, so that the fifth processing level receives both the result D and the result A. Furthermore, a cross-level connection is also established between the output of the second processing level and the input of the fourth processing level; that is, the result B is input to both the third processing level and the fourth processing level, so that the fourth processing level receives both the result C and the result B. A minimal code sketch of this architecture is given below.
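  • The following is a minimal PyTorch sketch of this five-level structure with its two cross-level connections (level 1 to level 5, level 2 to level 4). The channel widths, kernel sizes, pooling choice and the use of nn.Upsample in place of the MUX layer are illustrative assumptions, not details fixed by the disclosure.

```python
import torch
import torch.nn as nn

class GenerativeNetwork(nn.Module):
    def __init__(self, channels=3, features=32):
        super().__init__()
        # Level 1: convolutional network extracting features -> result A
        self.level1 = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU())
        # Level 2: downsampling, normalization, convolution -> result B
        self.level2 = nn.Sequential(
            nn.AvgPool2d(2), nn.InstanceNorm2d(features),
            nn.Conv2d(features, 2 * features, 3, padding=1), nn.ReLU())
        # Level 3: downsampling, normalization, convolution, upsampling -> result C
        self.level3 = nn.Sequential(
            nn.AvgPool2d(2), nn.InstanceNorm2d(2 * features),
            nn.Conv2d(2 * features, 2 * features, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2))
        # Level 4: convolution, normalization, upsampling -> result D;
        # its input is the concatenation of result B and result C.
        self.level4 = nn.Sequential(
            nn.Conv2d(4 * features, features, 3, padding=1), nn.ReLU(),
            nn.InstanceNorm2d(features), nn.Upsample(scale_factor=2))
        # Level 5: convolution and normalization, fed with results A and D.
        self.level5 = nn.Sequential(
            nn.Conv2d(2 * features, channels, 3, padding=1),
            nn.InstanceNorm2d(channels))

    def forward(self, x):
        a = self.level1(x)                       # result A (full size)
        b = self.level2(a)                       # result B (1/2 size)
        c = self.level3(b)                       # result C (back to 1/2 size)
        d = self.level4(torch.cat([b, c], 1))    # cross-level: B skips level 3
        return self.level5(torch.cat([a, d], 1)) # cross-level: A skips levels 2-4

out = GenerativeNetwork()(torch.rand(1, 3, 64, 64))  # H and W divisible by 4
print(out.shape)  # torch.Size([1, 3, 64, 64]) -- same size as the input
```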
  • a convolutional network for image conversion processing includes a number of convolutional layers.
  • In a convolutional layer, a neuron is connected only to neurons of a part of the adjacent layer, and the convolutional layer can apply a number of convolution kernels to the input image to extract multiple types of features.
  • Each convolution kernel can extract one type of feature.
  • The convolution kernels obtain reasonable weights through learning.
  • The result obtained by applying a convolution kernel to the input image is referred to as a feature image; the number of feature images is the same as the number of convolution kernels.
  • The downsampling layer (for example, a pooling layer) downsamples the image: it reduces the size of the feature images without changing their number, compresses the features, and extracts the main features.
  • the downsampling layer can reduce the size of the feature image to simplify the computational complexity and reduce the overfitting phenomenon to some extent.
  • The normalization layer normalizes the feature images output by the previous level.
  • In an embodiment of the present disclosure, the normalization layer normalizes the mean and variance of each feature image (instance normalization).
  • Let the feature images output by a convolutional layer form a set denoted (T, C, H, W), where T is the number of feature patches selected, C is the number of feature images output by the convolutional layer, and each feature image is a matrix of H rows and W columns.
  • The normalization formula is as follows:

    $y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^{2} + \varepsilon}}, \qquad \mu_{ti} = \frac{1}{HW}\sum_{j=1}^{W}\sum_{k=1}^{H} x_{tijk}, \qquad \sigma_{ti}^{2} = \frac{1}{HW}\sum_{j=1}^{W}\sum_{k=1}^{H} \left(x_{tijk} - \mu_{ti}\right)^{2}$

  • Here x_tijk is the value in the k-th row of the j-th column of the i-th feature image of the t-th feature patch in the feature image set output by the convolutional layer, y_tijk denotes the result of x_tijk processed by the instance normalization layer, and ε is a small positive number that prevents the denominator from being zero.
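  • As a sanity check, the instance-normalization formula above can be implemented directly. The following NumPy sketch (the shapes and the value of ε are illustrative assumptions) normalizes each feature image of a (T, C, H, W) set independently.

```python
import numpy as np

def instance_normalize(x, eps=1e-5):
    """y_tijk = (x_tijk - mu_ti) / sqrt(sigma_ti^2 + eps), with the mean and
    variance taken over the H and W axes of each feature image separately."""
    mu = x.mean(axis=(2, 3), keepdims=True)   # mu_ti
    var = x.var(axis=(2, 3), keepdims=True)   # sigma_ti^2
    return (x - mu) / np.sqrt(var + eps)

x = np.random.rand(2, 4, 8, 8)       # T=2 feature patches, C=4 feature images
y = instance_normalize(x)
print(np.allclose(y.mean(axis=(2, 3)), 0, atol=1e-6))  # True: zero mean each
```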
  • The upsampling layer may be a MUX layer that performs pixel interleaving and rearrangement on the input images, so that the size of each image is increased while the number of images remains unchanged.
  • The MUX layer thus increases the number of pixels per image by arranging and combining pixels from different images.
  • Figure 3 shows a schematic diagram of upsampling using a 2*2 MUX layer.
  • In an embodiment of the present disclosure, the number of upsampling layers is the same as the number of downsampling layers, so that the output image has the same size as the input image, and the two processing results joined by a cross-level connection are guaranteed to have the same image size. A small demonstration of 2*2 pixel interleaving follows.
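  • The rearrangement performed by a 2*2 MUX layer is closely related to the standard pixel-shuffle operation, which the sketch below uses for illustration: pixels from a group of four feature images are interleaved into one feature image of twice the width and height. The exact wiring of the MUX layer in Figure 3 may differ; this is an analogy, not the patented layer itself.

```python
import torch
import torch.nn as nn

shuffle = nn.PixelShuffle(upscale_factor=2)
x = torch.arange(16.0).reshape(1, 4, 2, 2)  # four 2x2 feature images
y = shuffle(x)                              # one 4x4 feature image
print(y.shape)    # torch.Size([1, 1, 4, 4])
print(y[0, 0])    # pixels of the four inputs interleaved in a 2x2 pattern
```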
  • The conversion features of the output image are determined by the parameters of the generative neural network.
  • These parameters are optimized by training the generative neural network, so that the output image acquires the same features as the training images, achieving the conversion goal.
  • The parameters include the convolution kernels and biases of the convolutional networks in the generative neural network.
  • The convolution kernels and biases determine the enabling conditions for the results used in the cross-level connections (e.g., result A and result B); that is, during the processing of the fifth processing level, they determine the contributions of the result A and the result D to the generated output image, thereby controlling the cross-level connections.
  • For example, by adjusting the convolution kernels and biases, the result A used in the cross-level connection can be given a larger weight than the result D in generating the output image, so that the output image retains more original information.
  • Conversely, the result A can also be given a smaller weight than the result D in generating the output image.
  • The cross-level connections therefore give the generative neural network more flexibility in the image conversion processing.
  • The image conversion may be a conversion of image style, season, effect, scene, and so on: for example, converting a landscape photo into an image with the features of Van Gogh's works, converting an image with summer features into one with winter features, converting an image of a brown horse into one with zebra stripes, or even converting a cat into a dog.
  • Figure 4 shows a flowchart for training the generative neural network,
  • and Figure 5 shows a block diagram for training the generative neural network.
  • The process of training the generative neural network will be described in detail with reference to FIGS. 4 and 5.
  • In step S410, a first training image is acquired.
  • The first training image may be the same as or different from the input image shown in FIG. 1; it is used to train the generative neural network and does not have the expected image conversion features.
  • In step S420, the first training image is subjected to image conversion processing using the generative neural network to generate a first training output image.
  • This process is the same as the step of generating an output image using the generative neural network in FIG. 1, and details are not repeated here.
  • In step S430, the generative neural network is trained based on the first training image and the first training output image. The training optimizes the parameters in the network according to the processing results of the generative neural network, so that the output image has the expected image conversion features.
  • The expected image conversion feature is the conversion that the generative neural network is intended to perform, such that an output image having the conversion feature is obtained from an input image that does not have it.
  • For example, the expected image conversion feature may be the style of Van Gogh's paintings, and the first training image is a photo that does not have that style; the generative neural network generates a first training output image, and the parameters in the network are trained by determining whether the first training output image has the features of Van Gogh's paintings.
  • The specific process of training the generative neural network in step S430 includes: inputting the first training output image to a discriminative neural network, which outputs a discrimination label indicating whether the first training output image has the conversion feature; and using the first loss calculation unit to calculate the loss value of the generative neural network according to the first training image, the first training output image and the discrimination label, and to optimize the parameters of the generative neural network.
  • the first loss calculation unit includes an analysis network, a first loss calculator, and an optimizer.
  • Calculating the loss value of the generative neural network with the first loss calculation unit includes: using the analysis network to output the content features of the first training image and the first training output image; using the first loss calculator to calculate the loss value of the generative neural network according to a first loss function, based on the content features output by the analysis network and the discrimination label of the first training output image; and using the optimizer to optimize the parameters of the generative neural network according to the loss value, wherein the parameters include the convolution kernels and biases of the convolutional networks in the generative neural network.
  • The specific structure of the analysis network is shown in FIG. 6; it is composed of several convolutional networks and pooling layers and is used to extract the content features of its input image.
  • The output of each convolutional network is a set of features extracted from the input image, and the pooling layers reduce the resolution of the feature images and pass them on to the next convolutional network.
  • The feature images after each convolutional network characterize the features of the input image at different levels (such as textures, edges, objects, and so on); a sketch of such a feature extractor is given below.
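  • The disclosure does not name a specific analysis network; a common choice for extracting multi-level content features, assumed here purely for illustration, is a pre-trained VGG-style network with its pooling layers, tapped after selected convolutional stages.

```python
import torch
import torchvision.models as models

# Pre-trained VGG19 feature stack as the analysis network (an assumption).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
TAPS = {3, 8, 17}  # layer indices whose outputs serve as level features (assumed)

def content_features(image):
    """Return feature images of `image` at several processing levels.
    `image` is an ImageNet-normalized (N, 3, H, W) tensor."""
    feats, x = [], image
    with torch.no_grad():
        for idx, layer in enumerate(vgg):
            x = layer(x)
            if idx in TAPS:
                feats.append(x)
    return feats
```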
  • The first training image I1 and the first training output image R1 are processed by the analysis network to extract their content features, and the extracted content features are input to the first loss calculator.
  • The first loss calculator calculates the loss value of the generative neural network according to the first loss function, based on the content features of the first training image I1 and the first training output image R1 output by the analysis network and on the discrimination label of R1.
  • The first loss calculator inputs the calculated total loss value of the generative neural network to the optimizer, and the optimizer optimizes the convolution kernels and biases of the convolutional networks in the generative neural network according to the loss value, so that the image conversion effect comes closer to the expected one.
  • The convolution kernels and biases determine the enabling conditions for the results used in the cross-level connections (e.g., result A and result B).
  • This increases the flexibility of the system during training: the trained generative neural network with cross-level connections can preserve the original information of the input image while giving the output image the expected conversion features, avoiding inconsistency between the converted image and the input image.
  • In an embodiment of the present disclosure, the first loss function includes at least one of a content loss function, a generative neural network loss function and a normalization loss function.
  • The content loss function represents the content loss between the first training image I1 and the first training output image R1. Let P_l and F_l be their respective feature images output by layer l of the analysis network.
  • The content loss function is defined as follows:

    $L_{content} = \frac{1}{2C1}\sum_{i,j}\left(F^{l}_{ij} - P^{l}_{ij}\right)^{2}$

  • where C1 is a constant used to normalize the result, and F^l_ij and P^l_ij are the values at position (i, j) of the feature images of R1 and I1 at layer l.
  • In this way, the content loss value L_content_1 between the first training output image R1 produced by the generative neural network and the first training image I1 can be calculated.
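  • A minimal sketch of the content-loss computation, with F_l and P_l being the layer-l feature images of R1 and I1 from the analysis network. Taking C1 as the number of feature-map elements is a normalization choice assumed here, since the disclosure only describes C1 as a constant.

```python
import torch

def content_loss(F_l, P_l, C1=None):
    """L_content = 1/(2*C1) * sum_ij (F_ij - P_ij)^2."""
    if C1 is None:
        C1 = F_l.numel()   # assumed normalization constant
    return ((F_l - P_l) ** 2).sum() / (2.0 * C1)
```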
  • Training the generative neural network with the content loss function ensures that the converted output image is consistent with the input image, keeping the system simple and easy to train.
  • The generative neural network loss function is expressed as:

    $L_{G} = \mathbb{E}_{x \sim Pdata(x)}[\log D(x)] + \mathbb{E}_{z \sim Pz(z)}[\log(1 - D(G(z)))]$

  • where E denotes taking the expectation (average);
  • Pdata is the set of images for which the discriminative neural network outputs 1, that is, the training images having the target conversion feature;
  • x is an image belonging to the set Pdata, for example the second sample image;
  • Pz is the set of input images of the generative neural network;
  • z is an image belonging to the set Pz, for example the first training image;
  • D is the discriminative neural network;
  • G is the generative neural network;
  • D(x) denotes passing the image x through the discriminative neural network, which outputs the discrimination label of the image x;
  • G(z) denotes generating an output image from the image z with the generative neural network;
  • and D(G(z)) denotes passing the output image produced by the generative neural network through the discriminative neural network, which outputs a discrimination label indicating whether that output image has the conversion feature.
  • The first loss calculator calculates the loss value of the generative neural network according to L_G, as sketched below.
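  • A sketch of the generative-network loss L_G as written above, where d_real = D(x) and d_fake = D(G(z)) are discrimination labels in (0, 1). The small eps guarding the logarithms is an implementation detail added here.

```python
import torch

def generator_loss(d_real, d_fake, eps=1e-8):
    """L_G = E[log D(x)] + E[log(1 - D(G(z)))]; the generative network is
    trained to minimize this, i.e., to push D(G(z)) toward 1."""
    return torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean()
```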
  • In an embodiment of the present disclosure, the normalization loss function adopts a parameter regularization loss function L_L1; other types of normalization losses may also be adopted.
  • In the neural network, convolution kernels and biases are the parameters obtained by training.
  • The convolution kernels determine how the input image is processed, and the biases determine whether the outputs of the convolution kernels are passed to the next layer.
  • The bias can thus be visually compared to a "switch" that decides whether a convolution kernel is "on" or "off".
  • For different input images, the network turns different convolution kernels on or off to achieve different processing effects.
  • The mean of the absolute values of all convolution kernels in the network is:

    $W = \frac{\sum_{w} |w|}{C_{w}}$

  • where Σ_w|w| is the sum of the absolute values of all convolution kernels in the network and C_w is the number of convolution kernels in the network.
  • The mean of the absolute values of all biases in the network is:

    $B = \frac{\sum_{b} |b|}{C_{b}}$

  • where Σ_b|b| is the sum of the absolute values of all biases in the network and C_b is the number of biases in the network.
  • The parameter regularization loss function is then:

    $L_{L1} = \frac{W}{B + \varepsilon}$

  • where ε is a very small positive number used to ensure that the denominator is not zero.
  • It is desirable for the biases in the convolutional layers to have greater absolute values than the convolution kernels, so that the bias "switches" function more effectively.
  • During training, the first loss calculator calculates the parameter regularization loss value of the generative neural network according to L_L1; a sketch follows.
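  • A sketch of the parameter-regularization loss for a PyTorch module: W is the mean absolute value of all convolution kernels, B the mean absolute value of all biases, and minimizing W / (B + ε) favors biases that are large relative to the kernels, supporting their "switch" role.

```python
import torch
import torch.nn as nn

def parameter_regularization_loss(model: nn.Module, eps=1e-8):
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    W = torch.cat([m.weight.abs().flatten() for m in convs]).mean()
    B = torch.cat([m.bias.abs().flatten() for m in convs
                   if m.bias is not None]).mean()
    return W / (B + eps)
```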
  • The total loss of the generative neural network is:

    $L_{total} = \alpha L_{content} + \beta L_{G} + \chi R$

  • where R is the normalization loss value of the generative neural network, and α, β and χ are the weights of the content loss value, the generative-network loss value and the normalization loss value in the total loss, respectively.
  • In the embodiment of the present disclosure, the parameter regularization loss value described above represents the normalization loss value; other types of normalization loss values may also be used.
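  • Combining the three terms gives the generator's training objective; the weight values below are placeholders, since the disclosure leaves α, β and χ unspecified.

```python
def total_generator_loss(l_content, l_g, r, alpha=1.0, beta=1.0, chi=1e-3):
    """L_total = alpha * L_content + beta * L_G + chi * R."""
    return alpha * l_content + beta * l_g + chi * r
```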
  • The discriminative neural network used in training, together with the generative neural network, constitutes a set of adversarial networks.
  • The discriminative neural network extracts the content features of the input image using a plurality of convolutional layers and pooling layers, reducing the size of the feature images for further feature extraction by the next convolutional layer.
  • The image features are then processed using a fully connected layer and an activation layer, and finally a discrimination label indicating whether the input image has the conversion feature is output.
  • The fully connected layer has the same structure as the convolutional neural network, except that the convolution kernels are replaced with scalar values.
  • The activation layer is typically a ReLU or sigmoid function. In the embodiment of the present disclosure, the specific structure of the discriminative neural network is shown in FIG. 7, where the activation layer is a sigmoid function and the discrimination label is output at the end.
  • In the adversarial networks, the generative neural network converts an input image having effect M into an output image having effect N, and the discriminative neural network determines whether the output image has the features of effect N and outputs a discrimination label: for example, if the output image is judged to have the features of effect N, the output is close to "1", and if not, the output is close to "0".
  • Through training, the generative neural network gradually generates output images that cause the discriminative neural network to output "1", and the discriminative neural network gradually determines more accurately whether an output image has the conversion features. The two are trained synchronously and compete against each other to obtain better parameters; the alternating update is sketched below.
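  • A schematic of this alternating adversarial training: the discriminative network D and the generative network G are updated in turn within each step. The optimizer choice and one-update-each schedule are assumptions for illustration, and generator_loss / discriminator_loss are the loss sketches given in this text.

```python
def train_step(G, D, opt_G, opt_D, training_image, real_sample,
               generator_loss, discriminator_loss):
    # 1) Update D: push D(real) toward 1 and D(G(z)) toward 0.
    opt_D.zero_grad()
    fake = G(training_image).detach()          # detach: do not update G here
    loss_D = discriminator_loss(D(real_sample), D(fake))
    loss_D.backward()
    opt_D.step()
    # 2) Update G: push D(G(z)) toward 1 so generated images pass as real.
    opt_G.zero_grad()
    loss_G = generator_loss(D(real_sample).detach(), D(G(training_image)))
    loss_G.backward()
    opt_G.step()
    return loss_G.item(), loss_D.item()

# Typical usage (assumed): opt_G = torch.optim.Adam(G.parameters(), lr=1e-4), etc.
```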
  • Figure 8 shows a flowchart for training the discriminative neural network,
  • and Figure 9 shows a block diagram for training the discriminative neural network. Next, the process of training the discriminative neural network will be described in detail with reference to FIGS. 8 and 9.
  • In step S810, the generative neural network generates a first sample image R2 from a second training image I2. The second training image I2 may be the same as or different from the input image shown in FIG. 1; it is used to train the discriminative neural network and does not have the expected image conversion features.
  • This process is the same as the step of generating an output image from the input image using the generative neural network in FIG. 1, and details are not repeated here.
  • In step S820, a second sample image R3 is obtained from a training database; the second sample image contains the expected image conversion features.
  • The sample images in the training database all contain the expected conversion features: for example, the database may be a set of Van Gogh's paintings, which share similar features in brushwork, color, composition and so on, so that the trained generative neural network can convert input images into output images having the same features.
  • In step S830, the discriminative neural network described above determines whether the first sample image R2 and the second sample image R3 have the conversion feature, and outputs discrimination labels.
  • Since the second sample image R3 comes from the training database, it naturally carries a "true" label and serves as a "true sample";
  • since the first sample image R2 is generated by the generative neural network, it naturally carries a "false" label and serves as a "false sample".
  • In step S840, the discriminative neural network is trained based on the discrimination labels by the second loss calculation unit.
  • The second loss calculation unit includes a second loss calculator and an optimizer.
  • The second loss calculator is used to calculate the loss value of the discriminative neural network according to a second loss function, based on the discrimination label of the first sample image R2 and the discrimination label of the second sample image R3, wherein the second loss function includes the discriminative neural network loss function;
  • the optimizer is used to optimize the parameters of the discriminative neural network according to the loss value, wherein the parameters include the convolution kernels and biases of the convolutional networks in the discriminative neural network.
  • The first sample image R2 is an output image obtained by converting effect M to effect N with the generative neural network, and is equivalent to a "false" sample;
  • the second sample image R3 obtained from the training database is a "true" sample having effect N.
  • The discriminative neural network is used to determine whether R2 and R3 have effect N, and outputs the discrimination labels.
  • The second loss function includes the discriminative neural network loss function:
    $L_{D} = -\mathbb{E}_{x \sim Pdata(x)}[\log D(x)] - \mathbb{E}_{z \sim Pz(z)}[\log(1 - D(G(z)))]$
  • where E denotes taking the expectation (average);
  • Pdata is the set of images for which the discriminative neural network outputs 1, that is, the training images having the target conversion feature;
  • x is an image belonging to the set Pdata, for example the second sample image;
  • Pz is the set of input images of the generative neural network;
  • z is an image belonging to the set Pz, for example the first training image;
  • D is the discriminative neural network;
  • G is the generative neural network, and D(x) denotes passing the image x through the discriminative neural network, which outputs the discrimination label of the image x;
  • G(z) denotes generating an output image from the image z with the generative neural network;
  • and D(G(z)) denotes passing the output image produced by the generative neural network through the discriminative neural network, which outputs a discrimination label indicating whether that output image has the conversion feature.
  • The second loss calculator calculates the loss value of the discriminative neural network according to L_D, and the optimizer optimizes the parameters of the discriminative neural network according to this loss value; the parameters include the convolution kernels and biases of the convolutional networks in the discriminative neural network. A sketch of L_D follows.
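  • A sketch of L_D matching the formula above; minimizing it pushes the label of the "true" sample R3 toward 1 and that of the "false" sample R2 toward 0. The eps guard is an added implementation detail.

```python
import torch

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """L_D = -E[log D(x)] - E[log(1 - D(G(z)))]."""
    return -(torch.log(d_real + eps).mean()
             + torch.log(1.0 - d_fake + eps).mean())
```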
  • The generative neural network obtained by the above training has optimized parameters that can be used for image conversion processing, generating from an input image an output image having the expected conversion features.
  • The discriminative neural network obtained by the above training has optimized parameters that can be used to determine whether an input image has the expected conversion features.
  • In the image processing method, the generative neural network and the discriminative neural network are trained with loss calculation units according to the loss functions, so the system is simple and easier to train. Moreover, by establishing cross-level connections between different processing levels in the generative neural network, the output converted image is kept consistent with the input image: the converted image both has the conversion features and retains sufficient original image information, avoiding the loss of large amounts of original image information during image processing.
  • Embodiments of the present disclosure also provide an image processing apparatus that can implement the above image processing method.
  • A schematic block diagram of the image processing apparatus is shown in FIG. 10; it includes a generative neural network module. It should be noted that the structure of the image processing apparatus shown in FIG. 10 is merely exemplary and not limiting; the image processing apparatus may have other components depending on actual application requirements.
  • The generative neural network module can include the generative neural network described above.
  • The image processing apparatus provided by the embodiment of the present disclosure uses the generative neural network module to perform image conversion processing on the input image and output the converted output image.
  • The image processing apparatus may further include a training neural network module configured to train the generative neural network module according to the input image and the output image of the generative neural network module, so that the output image has the expected image features.
  • By establishing cross-level connections between different processing levels in the generative neural network, the generative neural network module retains the original information of the input image during the image conversion process, so that the output converted image both includes the conversion features and retains the original image information, ensuring the image conversion effect.
  • Each processing level in the generative neural network module includes a convolutional network and, according to the needs of the image processing, at least some of the processing levels may further include a downsampling layer, an upsampling layer and/or a normalization layer.
  • The numbers of downsampling layers and upsampling layers are equal.
  • The generative neural network module performs image conversion processing on the input image to output an output image with converted image features.
  • The training neural network module is configured to train the generative neural network module according to the first training image and the first training output image.
  • The generative neural network module outputs the first training output image R1 obtained by image conversion of the first training image I1, and the training neural network module trains the generative neural network module based on the first training image I1 and the first training output image R1.
  • The training aims to optimize the parameters in the network according to the processing results of the generative neural network module, so that it can perform the expected image conversion processing.
  • The training neural network module includes a discriminative neural network module and a first loss calculation unit, the discriminative neural network module including the discriminative neural network described above.
  • The discriminative neural network module is used to output a discrimination label indicating whether the first training output image R1 has the conversion feature.
  • The first loss calculation unit is configured to calculate the loss value of the generative neural network module according to the first training image I1, the first training output image R1 and the discrimination label, and to optimize the parameters of the generative neural network module.
  • The first loss calculation unit includes an analysis network, a first loss calculator and an optimizer.
  • The analysis network is configured to output the content features of the first training image I1 and the first training output image R1.
  • The first loss calculator is configured to calculate the loss value of the generative neural network module according to the first loss function, based on the content features output by the analysis network and the discrimination label of the first training output image R1.
  • The first loss function includes at least one of the content loss function, the generative neural network loss function and the normalization loss function.
  • The optimizer is configured to optimize the parameters of the generative neural network module according to the loss value of the generative neural network module, wherein the parameters include the convolution kernels and biases of the convolutional networks in the generative neural network module; the convolution kernels and biases determine the enabling of the cross-level connections in the network.
  • The training neural network module in the image processing apparatus is further configured to train the discriminative neural network module according to the discrimination labels of the discriminative neural network module.
  • Here, the input image described above serves as the second training image I2,
  • the output image serves as the first sample image R2,
  • and a training image from the training database serves as the second sample image R3.
  • The discriminative neural network module outputs the discrimination labels of the first sample image R2 and the second sample image R3, respectively.
  • The training neural network module further includes a second loss calculation unit for training the discriminative neural network module according to the discrimination label of the first sample image R2 and the discrimination label of the second sample image R3.
  • The second loss calculation unit includes a second loss calculator and an optimizer.
  • The second loss calculator is configured to calculate the loss value of the discriminative neural network module according to the second loss function, based on the discrimination label of the first sample image R2 and the discrimination label of the second sample image R3, wherein the second loss function includes the loss function of the discriminative neural network in the discriminative neural network module.
  • The optimizer is configured to optimize the parameters of the discriminative neural network module according to the loss value of the discriminative neural network module, wherein the parameters include the convolution kernels and biases of the convolutional networks in the discriminative neural network module.
  • Through training, the generative neural network module gradually generates output images that cause the discriminative neural network module to output "1", that is, output images that the discriminative neural network module judges to have the conversion features.
  • Through training, the discriminative neural network module can determine more and more accurately whether an output image produced by the generative neural network module has the conversion features.
  • The image processing apparatus includes a generative neural network module that establishes cross-level connections between different processing levels.
  • The generative neural network module is trained according to the images in the training database and the loss functions; by optimizing the parameters in the network, the trained generative neural network module can output images with the expected conversion features while retaining the original information of the input image, ensuring that the output image remains consistent with the input image. The system is simple, easy to train, and has greater flexibility.
  • An embodiment of the present disclosure further provides an image processing device, shown in FIG. 11, which includes a processor 1102 and a memory 1104. It should be noted that the structure of the image processing device shown in FIG. 11 is merely exemplary and not restrictive; the image processing device may have other components depending on actual application needs.
  • The processor 1102 and the memory 1104 can communicate with each other directly or indirectly, for example through a network connection.
  • The network may include a wireless network, a wired network, and/or any combination of the two.
  • The network may include a local area network, the Internet, a telecommunications network, an Internet of Things based on the Internet and/or a telecommunications network, and/or any combination of the above networks, and the like.
  • The wired network may communicate by means of twisted pair, coaxial cable or optical fiber transmission, for example, and the wireless network may use, for example, a 3G/4G/5G mobile communication network, Bluetooth, Zigbee or WiFi.
  • The present disclosure does not limit the type and function of the network.
  • The processor 1102 can control other components in the image processing device to perform the desired functions.
  • The processor 1102 can be a central processing unit (CPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another device having data processing capabilities and/or program execution capabilities.
  • The central processing unit (CPU) can be of X86 or ARM architecture, for example.
  • The GPU can be integrated directly on the motherboard or built into the Northbridge chip of the motherboard.
  • The GPU can also be built into the central processing unit (CPU), because the GPU has powerful image processing capabilities.
  • Memory 1104 can include any combination of one or more computer program products, which can include various forms of computer readable storage media, such as volatile memory and/or nonvolatile memory.
  • Volatile memory can include, for example, random access memory (RAM) and/or caches and the like.
  • the non-volatile memory may include, for example, a read only memory (ROM), a hard disk, an erasable programmable read only memory (EPROM), a portable compact disk read only memory (CD-ROM), a USB memory, a flash memory, and the like.
  • One or more computer-readable codes or instructions may be stored in the memory 1104, and the processor 1102 may execute these instructions to perform the image processing method described above or to implement the image processing apparatus described above.
  • For a detailed description of the image processing method and the image processing apparatus, reference may be made to the related descriptions of the image processing method and the processing apparatus in this specification, which are not repeated here.
  • Various applications and various data may also be stored in the computer-readable storage medium, such as image data sets and various data used and/or generated by the applications (such as training data).
  • Embodiments of the present disclosure provide an image processing method, a processing apparatus and a processing device for implementing image conversion processing.
  • The image processing method, processing apparatus and processing device use a generative neural network to generate an output image with the conversion features, and train the generative neural network using the sample images in the training database and the loss functions, so that the system is simple and easy to train.
  • By establishing cross-level connections between different processing levels, the generative neural network enables the output image not only to have the image conversion features but also to retain the original information of the input image, ensuring consistency between the output image and the input image.

Abstract

An image processing method, a processing apparatus, and a processing device, which use a generative neural network combined with image content features to implement image conversion, so that the converted output image both carries the conversion features and remains consistent with the input image. The image processing method includes: acquiring an input image; and performing image conversion processing on the input image by using a generative neural network to output a converted output image, wherein the generative neural network includes a plurality of processing levels, an output result of an i-th processing level is input to an (i+1)-th processing level and a j-th processing level, the j-th processing level further receives an output result of a (j-1)-th processing level, and the output result of the (j-1)-th processing level has the same size as the output result of the i-th processing level, where i is less than j-1, and i and j are positive integers.

Description

Image processing method, processing apparatus and processing device
This application claims priority to Chinese Patent Application No. 201810079435.8, filed on January 26, 2018, the entire disclosure of which is incorporated herein by reference as part of this application.
Technical Field
Embodiments of the present disclosure relate to the field of image processing, and in particular to an image processing method, a processing apparatus, and a processing device.
Background
Image processing and conversion with deep neural networks is a technology that has emerged with the development of deep learning. However, image processing and conversion systems in the prior art are structurally complex and difficult to train. Therefore, there is a need for an image processing method, processing apparatus, and processing device for image conversion that can both apply conversion processing to an input image and retain the original information of the input image, ensuring consistency between the output image and the input image.
Summary
Embodiments of the present disclosure provide an image processing method, including: acquiring an input image; and performing image conversion processing on the input image by using a generative neural network to output a converted output image, wherein the generative neural network includes a plurality of processing levels, an output result of an i-th processing level is input to an (i+1)-th processing level and a j-th processing level, the j-th processing level further receives an output result of a (j-1)-th processing level, and the output result of the (j-1)-th processing level has the same size as the output result of the i-th processing level, where i is less than j-1, and i and j are positive integers.
For example, each of the plurality of processing levels of the generative neural network includes a convolutional network, and at least some of the plurality of processing levels further include at least one of a down-sampling layer, an up-sampling layer, and a normalization layer.
For example, in the generative neural network, the number of down-sampling layers is equal to the number of up-sampling layers.
For example, with the input image serving as a first training image and the output image serving as a first training output image, the image processing method further includes: training the generative neural network based on the first training image and the first training output image.
For example, training the generative neural network includes: inputting the first training output image to a discriminative neural network, which outputs a discrimination label indicating whether the first training output image has conversion features; and calculating, by a first loss calculation unit, a loss value of the generative neural network from the first training image, the first training output image, and the discrimination label, and optimizing parameters of the generative neural network.
For example, the first loss calculation unit includes an analysis network, a first loss calculator, and an optimizer, and optimizing the parameters of the generative neural network by the first loss calculation unit includes: outputting content features of the first training image and the first training output image by using the analysis network; calculating, by the first loss calculator, the loss value of the generative neural network in accordance with a first loss function from the content features output by the analysis network and the discrimination label of the first training output image; and optimizing, by the optimizer, the parameters of the generative neural network according to the loss value of the generative neural network, wherein the parameters include convolution kernels and biases of the convolutional networks in the generative neural network.
For example, the first loss function includes at least one of a content loss function, a generative neural network loss function, and a normalization loss function.
For example, with the input image serving as a second training image and the output image serving as a first sample image, the image processing method further includes: acquiring a second sample image from a training database; outputting, by the discriminative neural network, discrimination labels indicating whether the first sample image and the second sample image have conversion features; and training the discriminative neural network by a second loss calculation unit according to the discrimination label of the first sample image and the discrimination label of the second sample image.
For example, the second loss calculation unit includes a second loss calculator and an optimizer, and training the discriminative neural network by the second loss calculation unit includes: calculating, by the second loss calculator, a loss value of the discriminative neural network in accordance with a second loss function from the discrimination label of the first sample image and the discrimination label of the second sample image, wherein the second loss function includes a discriminative neural network loss function; and optimizing, by the optimizer, parameters of the discriminative neural network according to the loss value of the discriminative neural network, wherein the parameters include convolution kernels and biases of the convolutional networks in the discriminative neural network.
For example, the training database includes sample images having the conversion features.
Embodiments of the present disclosure further provide an image processing apparatus, including: a generative neural network module configured to perform image conversion processing on the input image to output a converted output image, wherein the generative neural network module includes a plurality of processing levels, an output result of an i-th processing level is input to an (i+1)-th processing level and a j-th processing level, the j-th processing level further receives an output result of a (j-1)-th processing level, and the output result of the (j-1)-th processing level has the same size as the output result of the i-th processing level, where i is less than j-1, and i and j are positive integers.
For example, each processing level in the generative neural network module includes a convolutional network, and at least some of the plurality of processing levels further include at least one of a down-sampling layer, an up-sampling layer, and a normalization layer, wherein in the generative neural network module the number of down-sampling layers is equal to the number of up-sampling layers.
For example, with the input image serving as a first training image and the output image serving as a first training output image, the image processing apparatus further includes: a training neural network module configured to train the generative neural network module according to the first training image and the first training output image, wherein the training neural network module includes: a discriminative neural network module configured to output a discrimination label indicating whether the first training output image has conversion features; and a first loss calculation unit configured to calculate a loss value of the generative neural network module from the first training image, the first training output image, and the discrimination label, and to optimize parameters of the generative neural network module.
For example, the first loss calculation unit includes: an analysis network configured to output content features of the first training image and the first training output image; a first loss calculator configured to calculate the loss value of the generative neural network module in accordance with a first loss function from the content features output by the analysis network and the discrimination label of the first training output image, wherein the first loss function includes at least one of a content loss function, a generative neural network loss function, and a normalization loss function; and an optimizer configured to optimize the parameters of the generative neural network module according to the loss value of the generative neural network module, wherein the parameters include convolution kernels and biases of the convolutional networks in the generative neural network module.
For example, the training neural network module is further configured to train the discriminative neural network module according to the discrimination labels of the discriminative neural network module, wherein the input image serves as a second training image, the output image serves as a first sample image, an image acquired from the training database serves as a second sample image, and the discriminative neural network module outputs discrimination labels from the first sample image and the second sample image, and wherein the training neural network module further includes: a second loss calculation unit configured to train the discriminative neural network module according to the discrimination label of the first sample image and the discrimination label of the second sample image.
For example, the second loss calculation unit includes: a second loss calculator configured to calculate a loss value of the discriminative neural network module in accordance with a second loss function from the discrimination label of the first sample image and the discrimination label of the second sample image, wherein the second loss function includes a discriminative neural network module loss function; and an optimizer configured to optimize parameters of the discriminative neural network module according to the loss value of the discriminative neural network module, wherein the parameters include convolution kernels and biases of the convolutional networks in the discriminative neural network module.
For example, the training database includes sample images having the conversion features.
Embodiments of the present disclosure further provide an image processing device, including: one or more processors; and one or more memories, wherein the memories store computer readable codes which, when run by the one or more processors, perform the image processing method described above or implement the image processing apparatus described above.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure, rather than limiting the present disclosure.
FIG. 1 shows a flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 2A shows a structural block diagram of a generative neural network for implementing the image processing method of FIG. 1;
FIG. 2B shows a specific structure of the generative neural network for implementing the image processing method of FIG. 1;
FIG. 3 shows a schematic diagram of a MUX layer;
FIG. 4 shows a flowchart of training the generative neural network;
FIG. 5 shows a block diagram of training the generative neural network;
FIG. 6 shows a specific structural diagram of an analysis network;
FIG. 7 shows a specific structural diagram of a discriminative neural network;
FIG. 8 shows a flowchart of training the discriminative neural network;
FIG. 9 shows a block diagram of training the discriminative neural network;
FIG. 10 shows a schematic block diagram of an image processing apparatus provided by an embodiment of the present disclosure;
FIG. 11 shows a schematic block diagram of an image processing device provided by an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
A flowchart of the image processing method provided by an embodiment of the present disclosure is shown in FIG. 1.
In step S110, an input image to be subjected to image conversion processing is acquired. The input image is usually a color image, for example an RGB image, and may also be a grayscale image.
Next, in step S120, the input image is subjected to image conversion processing by using a generative neural network, wherein the generative neural network is obtained through training. Through training, the generative neural network can implement image feature conversion processing. The feature conversion processing may be, but is not limited to, a conversion of image style, for example giving an input photo the features of an oil painting, or a conversion of season features, for example giving an input image the characteristics of winter.
In the process of performing image conversion processing with a generative neural network, the down-sampling within the network (implemented, for example, by pooling layers) loses original-image information in the output processing result and produces redundant conversion features, leading to poor image conversion results. In the present disclosure, cross-level connections are established between networks at different processing levels in the generative neural network, so that the generative neural network preserves the original information of the input image during the image conversion processing. The output converted image thus both includes the conversion features and retains the original-image information, ensuring the image conversion effect.
In the cross-level connection, the output result of the i-th processing level is input both to the (i+1)-th processing level and to the j-th processing level, and the j-th processing level further receives the output result of the (j-1)-th processing level, wherein the output result of the (j-1)-th processing level has the same size as the output result of the i-th processing level, i is less than j-1, and i and j are positive integers. A cross-level connection is thereby established between the i-th and j-th processing levels: the output result of the i-th processing level skips the processing steps of its subsequent processing levels and is input directly to the j-th processing level, and the j-th processing level performs image processing based on the output results of both the i-th and the (j-1)-th processing levels. Since the output result of the i-th processing level has not passed through the processing levels between it and the j-th processing level, it contains more original information of the input image than the output result of the (j-1)-th processing level, thereby ensuring consistency between the output image and the input image.
Next, in step S130, the generative neural network outputs an output image that has undergone the image conversion processing, the output image bearing the image conversion features. Through different training processes, the generative neural network can implement different image conversion processing, for example conversions of image style, scene, season, or effect, or conversions based on other features.
FIG. 2A shows a structural block diagram of a generative neural network for implementing the above image processing method, and FIG. 2B shows a specific structure of that generative neural network. The image processing method is described in detail below with reference to FIGS. 2A and 2B.
The generative neural network shown in FIGS. 2A and 2B is a neural network containing five processing levels, each of which includes a convolutional network. According to the needs of the image processing, at least some of the five processing levels may further include at least one of a down-sampling layer, an up-sampling layer, and a normalization layer, wherein in the generative neural network the number of down-sampling layers is equal to the number of up-sampling layers. It should be noted that the generative neural network in FIGS. 2A and 2B is merely exemplary and does not limit the present disclosure. The generative neural network for implementing the image conversion processing may have other numbers of processing levels, and its specific structure may be adjusted appropriately according to the requirements of the image conversion.
The convolutional network includes at least convolutional layers and may further include other processing layers such as pooling layers and activation layers. Typically each convolutional layer may contain tens or hundreds of convolution kernels; the more layers there are, the more complex the structure of the convolutional network.
In the embodiments of the present disclosure, each of the plurality of processing levels in the generative neural network includes a convolutional network for implementing the image conversion processing, and at least some of the plurality of processing levels further include at least one of a down-sampling layer, an up-sampling layer, and a normalization layer. The generative neural network further includes cross-level connections, which connect two parts of the network at different processing levels, as shown in FIGS. 2A and 2B.
As shown in FIG. 2A, after the input image is input to the generative neural network, it first passes through the processing of the first processing level, which extracts image features. For convenience of description, the result processed by the first processing level is denoted as result A.
Next, result A is processed by the second processing level of the generative neural network to obtain result B. Result B is then processed by the third processing level to obtain result C. Result C is then processed by the fourth processing level to obtain result D. Finally, result D is processed by the fifth processing level of the generative neural network to obtain the output image.
On this basis, result A obtained from the first processing level is connected across levels to the fifth processing level, and is processed by the fifth processing level together with result D to generate the output image. In this cross-level connection, result A of the first processing level skips the processing of the second, third, and fourth processing levels. The images in result A have the same size as the images in result D. Moreover, since result A has not passed through the second, third, and fourth processing levels, it contains more original information of the input image than result D, so that the output image retains more information of the input image on top of the image feature conversion and remains consistent with the input image.
Similarly, result B obtained from the first and second processing levels is connected across levels to the fourth processing level, and is processed by the fourth processing level together with result C to generate result D. That is, result B of the second processing level skips the processing of the third processing level and is input directly to the fourth processing level. The images in result B have the same size as the images in result C. The fourth processing level generates result D based on results B and C, where result C is obtained from result B through the third processing level; since result B has not passed through the third processing level, it contains more original information of the input image than result C.
In other embodiments according to the present disclosure, the generative neural network may also be a network including a larger number of processing levels, and the cross-level connections may be implemented in turn according to the order of the processing levels in the neural network, which is not repeated here.
As shown in FIG. 2B, in the embodiments of the present disclosure, the first processing level in the generative neural network may include a convolutional network, which extracts image features from the input image to obtain result A. The second processing level may include, in order, a down-sampling layer, a normalization layer, and a convolutional network to obtain result B. The third processing level may include, in order, a down-sampling layer, a normalization layer, a convolutional network, and an up-sampling layer to obtain result C. The fourth processing level may include, in order, a convolutional network, a normalization layer, and an up-sampling layer to obtain result D. The fifth processing level may include, in order, a convolutional network and a normalization layer to generate the output image. Cross-level connections are further established between different processing levels of the generative neural network. Specifically, a cross-level connection is established between the output result of the first processing level and the input of the fifth processing level, that is, result A is input both to the second processing level and to the fifth processing level, so that the fifth processing level receives both result D and result A. In addition, a cross-level connection is established between the output result of the second processing level and the input of the fourth processing level, that is, result B is input both to the third processing level and to the fourth processing level, so that the fourth processing level receives both result C and result B.
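By way of illustration only, a minimal PyTorch sketch of this five-level structure. The channel widths, the use of average pooling for down-sampling and pixel shuffle for up-sampling, and the merging of cross-level results by channel concatenation are all assumptions not fixed by the disclosure (the text only requires that the two merged results have the same size):

```python
import torch
import torch.nn as nn

class CrossLevelGenerator(nn.Module):
    """Five processing levels with cross-level connections A -> level 5
    and B -> level 4, mirroring FIG. 2B."""
    def __init__(self, ch=32):
        super().__init__()
        conv = lambda cin, cout: nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())
        self.level1 = conv(3, ch)                          # -> result A
        self.level2 = nn.Sequential(                       # -> result B
            nn.AvgPool2d(2), nn.InstanceNorm2d(ch), conv(ch, ch))
        self.level3 = nn.Sequential(                       # -> result C
            nn.AvgPool2d(2), nn.InstanceNorm2d(ch),
            conv(ch, 4 * ch), nn.PixelShuffle(2))          # back to ch maps
        self.level4 = nn.Sequential(                       # -> result D
            conv(2 * ch, 4 * ch), nn.InstanceNorm2d(4 * ch),
            nn.PixelShuffle(2))
        self.level5 = nn.Sequential(                       # -> output image
            conv(2 * ch, 3), nn.InstanceNorm2d(3))

    def forward(self, x):                 # x: (N, 3, H, W), H, W % 4 == 0
        a = self.level1(x)
        b = self.level2(a)
        c = self.level3(b)                              # same size as B
        d = self.level4(torch.cat([c, b], dim=1))       # same size as A
        return self.level5(torch.cat([d, a], dim=1))
```

Two down-sampling layers are matched by two up-sampling layers, so the output image has the same spatial size as the input image, and the pairs (B, C) and (A, D) joined by the cross-level connections match in size as required.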
Specifically, in the generative neural network, the convolutional network used for image conversion processing contains a number of convolutional layers. In a convolutional layer, a neuron is connected only to neurons of some adjacent layers, and the convolutional layer can apply several convolution kernels to the input image to extract multiple types of features. Each convolution kernel can extract one type of feature, and in training the generative neural network the kernels reach reasonable weights through learning. The result obtained by applying one convolution kernel to the input image is called a feature image, and the number of feature images equals the number of convolution kernels.
The down-sampling layer can down-sample the image (it may, for example, be a pooling layer), reducing the size of the feature images without changing their number, performing feature compression, and extracting the main features. In addition, the down-sampling layer can reduce the scale of the feature images, simplifying computational complexity and reducing overfitting to a certain extent.
The normalization layer normalizes the feature images output by the previous level; in the embodiments of the present disclosure, the normalization layer normalizes the mean and variance of each feature image. Assume the number of selected feature images (a mini-batch) is T, the number of features output by a convolutional layer is C, and each feature image is a matrix of H rows and W columns, so that the feature images are denoted (T, C, W, H). The normalization formula is then:

$$y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^{2} + \varepsilon}}, \qquad \mu_{ti} = \frac{1}{HW} \sum_{j=1}^{W} \sum_{k=1}^{H} x_{tijk}, \qquad \sigma_{ti}^{2} = \frac{1}{HW} \sum_{j=1}^{W} \sum_{k=1}^{H} \left( x_{tijk} - \mu_{ti} \right)^{2}$$

where x_tijk is the value in the j-th column and k-th row of the i-th feature image of the t-th feature block (patch) in the set of feature images output by a convolutional layer, y_tijk denotes the result of processing x_tijk by the instance normalization layer, and ε is a small positive number that keeps the denominator from being zero.
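As a concreteness check, a minimal NumPy sketch of this per-feature-image (instance) normalization; the original formula was lost in extraction and is assumed here to be the standard instance normalization consistent with the surrounding definitions:

```python
import numpy as np

def instance_normalize(x, eps=1e-5):
    """Normalize a (T, C, H, W) array of feature images so that each
    feature image (fixed t and i) has zero mean and unit variance.
    Only the two spatial axes matter, so the document's (T, C, W, H)
    labeling maps onto the same computation."""
    mean = x.mean(axis=(2, 3), keepdims=True)   # mu_ti
    var = x.var(axis=(2, 3), keepdims=True)     # sigma_ti^2
    return (x - mean) / np.sqrt(var + eps)

x = np.random.rand(2, 3, 4, 4).astype(np.float32)  # T=2, C=3, H=W=4
y = instance_normalize(x)
print(np.allclose(y.mean(axis=(2, 3)), 0.0, atol=1e-4))  # True
```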
The up-sampling layer may, for example, be a MUX layer, which performs pixel-interleaved rearrangement on several input images, so that the size of each image is increased while the number of images remains unchanged. The MUX layer thus increases the number of pixels of each image by permuting and combining pixels across the different images. FIG. 3 shows a schematic diagram of up-sampling with a 2×2 MUX layer. For four input images INPUT 4n, INPUT 4n+1, INPUT 4n+2, and INPUT 4n+3, each with a×b pixels, the pixel rearrangement of the 2×2 MUX layer outputs four images OUTPUT 4n, OUTPUT 4n+1, OUTPUT 4n+2, and OUTPUT 4n+3, each with 2a×2b pixels, increasing the pixel information of each image. In the embodiments of the present disclosure, the number of up-sampling layers should equal the number of down-sampling layers, so that the output image has the same size as the input image and the two processing results joined by a cross-level connection have the same image size.
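For comparison with a standard operation, a short PyTorch sketch of pixel-rearrangement up-sampling. This is an analogy rather than the MUX layer itself: nn.PixelShuffle interleaves the pixels of four a×b feature maps into one 2a×2b map, whereas the MUX layer described above keeps the number of images unchanged (four in, four out):

```python
import torch
import torch.nn as nn

up = nn.PixelShuffle(2)  # rearranges (N, 4c, a, b) -> (N, c, 2a, 2b)
x = torch.arange(4 * 3 * 3, dtype=torch.float32).reshape(1, 4, 3, 3)
y = up(x)
print(x.shape, "->", y.shape)  # [1, 4, 3, 3] -> [1, 1, 6, 6]
```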
The conversion features of the output image are determined by the parameters of the generative neural network. According to the image conversion application, the generative neural network is trained to optimize these parameters for the conversion purpose, so that the output image has the same image features as the training images. The parameters may include the convolution kernels and biases of the convolutional networks in the generative neural network; the kernels and biases can determine how the results used for the cross-level connections (for example, results A and B) are enabled. For example, the kernels and biases determine the relative contributions of result A and result D to generating the output image during the processing of the fifth processing level, thereby controlling the cross-level connection.
For example, in the generative neural network of FIG. 2B, the kernels and biases can be adjusted so that result A, used for the cross-level connection, has a larger weight than result D in generating the output image, giving the output image more original-image information. Likewise, the kernels and biases can be adjusted so that result A has a smaller weight than result D in generating the output image. The cross-level connections therefore give the generative neural network more flexibility in the image conversion processing. The image conversion may be a conversion of image style, season, effect, scene, and so on, for example converting a landscape image into an image with the features of a Van Gogh painting, converting an image with summer features into one with winter features, converting an image of a brown horse to have the features of a zebra, or even converting a cat into a dog.
FIG. 4 shows a flowchart of training the generative neural network, and FIG. 5 shows a block diagram of training the generative neural network. The process of training the generative neural network is described in detail below with reference to FIGS. 4 and 5.
In the image processing method according to the embodiments of the present disclosure, as shown in FIG. 4, a first training image is acquired in step S410. The first training image may be the same as or different from the input image shown in FIG. 1; it is used to train the generative neural network and does not have the expected image conversion features.
Next, in step S420, the first training image is subjected to image conversion processing by the generative neural network to generate a first training output image. This process is the same as the step of generating the output image with the generative neural network in FIG. 1 and is not repeated here.
Then, in step S430, the generative neural network is trained based on the first training image and the first training output image. The training aims to optimize the parameters in the network according to its processing results, so that the output image has the expected image conversion features. The expected image conversion features are those that the image conversion processing implemented by the generative neural network is intended to impose, so that an output image having the conversion features is obtained from an input image that does not have them. For example, the expected image conversion features may be the features of Van Gogh's paintings, and the first training image is then a photo without those features; the generative neural network generates the first training output image, and the parameters of the generative neural network are trained by judging whether the first training output image has the features of Van Gogh's paintings.
As shown in FIG. 5, the specific process of training the generative neural network in step S430 includes: inputting the first training output image to a discriminative neural network, which outputs a discrimination label indicating whether the first training output image has the conversion features; and calculating, by a first loss calculation unit, a loss value of the generative neural network from the first training image, the first training output image, and the discrimination label, and optimizing the parameters of the generative neural network.
As shown in FIG. 5, the first loss calculation unit includes an analysis network, a first loss calculator, and an optimizer. In the embodiments of the present disclosure, calculating the loss value of the generative neural network with the first loss calculation unit includes: outputting content features of the first training image and the first training output image with the analysis network; calculating the loss value of the generative neural network with the first loss calculator in accordance with a first loss function from the content features output by the analysis network and the discrimination label of the first training output image; and optimizing the parameters of the generative neural network with the optimizer according to the loss value of the generative neural network, wherein the parameters include the convolution kernels and biases of the convolutional networks in the generative neural network.
The specific structure of the analysis network is shown in FIG. 6. It consists of several convolutional networks and pooling layers and is used to extract the content features of the input image. The output of each convolutional network is a set of features extracted from the input image, and the pooling layers reduce the resolution of the feature images and pass them to the next convolutional network. The feature images after each convolutional network characterize the input image at different levels (such as texture, edges, objects, etc.).
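The disclosure does not name a concrete analysis network; a common concrete choice matching the description (several convolutional networks with interleaved pooling) is a pretrained VGG-16 feature stack, sketched here with the layer index l as a free parameter:

```python
import torch
import torchvision.models as models

# Assumption: a pretrained VGG-16 stands in for the analysis network.
vgg_features = models.vgg16(weights="IMAGENET1K_V1").features.eval()

def content_features(img, layer=15):
    """Return the feature images (F^l or P^l) of a (N, 3, H, W) batch
    at the chosen layer index of the feature stack."""
    x = img
    with torch.no_grad():
        for idx, module in enumerate(vgg_features):
            x = module(x)
            if idx == layer:
                break
    return x
```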
In the embodiments of the present disclosure, the analysis network processes the first training image I1 and the first training output image R1, extracts their content features, and inputs the extracted content features to the first loss calculator.
The first loss calculator calculates the loss value of the generative neural network in accordance with the first loss function from the content features of the first training image I1 and the first training output image R1 output by the analysis network, together with the discrimination label. The first loss calculator inputs the calculated total loss value of the generative neural network to the optimizer, and the optimizer optimizes the convolution kernels and biases of the convolutional networks in the generative neural network according to the loss value, to achieve a processing effect closer to the image conversion features. The kernels and biases can determine how the results used for the cross-level connections (for example, results A and B) are enabled, which increases the flexibility of the system during training. Moreover, the trained generative neural network containing the cross-level connections can retain the original information of the input image while giving the output image the expected conversion features, avoiding inconsistency between the output converted image and the input image.
In the embodiments of the present disclosure, the first loss function includes at least one of a content loss function, a generative neural network loss function, and a normalization loss function. The content loss function characterizes the content loss between the first training image I1 and the first training output image R1. Let P^l and F^l be their respective feature images output at the l-th layer of the analysis network; the content loss function is then defined as:

$$L_{content} = \frac{1}{2C_{1}} \sum_{i,j} \left( F_{ij}^{l} - P_{ij}^{l} \right)^{2}$$

where C1 is a constant used to normalize the result, F^l_ij denotes the value at the j-th position of F^l output by the i-th convolution kernel of the l-th convolutional layer in the analysis network, and P^l_ij denotes the value at the j-th position of P^l output by the i-th convolution kernel of the l-th convolutional layer.
According to the formula of the content loss function, from the feature images output by the first training image I1 and the first training output image R1 in the analysis network, the content loss value L_content_1 between the first training output image R1 processed by the generative neural network and the first training image I1 can be calculated.
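A direct transcription of this content loss into code (the constant C1 is not fixed by the disclosure; 1.0 is a placeholder), reusing the content_features sketch above:

```python
import torch

def content_loss(f_l, p_l, c1=1.0):
    """Sum of squared differences between the feature images of the
    converted image (F^l) and the original image (P^l), scaled by
    1/(2*C1), per the formula above."""
    return ((f_l - p_l) ** 2).sum() / (2.0 * c1)

# e.g.: loss = content_loss(content_features(r1), content_features(i1))
```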
Calculating the content loss value of the generative neural network ensures that the converted image it outputs remains consistent with the input image, so that the processed output image retains sufficient original information on top of the conversion features. In the embodiments of the present disclosure, the generative neural network is trained with the content loss function, ensuring consistency between the converted image and the input image, so that the system is simple and easy to train.
In the embodiments of the present disclosure, the generative neural network loss function is expressed as:

$$L_{G} = \mathrm{E}_{x \sim Pdata(x)}\left[\log D(x)\right] + \mathrm{E}_{z \sim Pz(z)}\left[1 - \log D(G(z))\right]$$
where E denotes taking the average, Pdata is the set of images for which the discriminative neural network outputs 1, that is, training images having the target conversion features, and x is an image belonging to the Pdata set, for example the second sample image. Pz is the set of input images of the generative neural network, and z is an image belonging to the Pz set, for example the first training image. D is the discriminative neural network and G is the generative neural network; D(x) denotes processing the image x with the discriminative neural network and outputting the discrimination label of x; G(z) denotes processing the image z with the generative neural network to generate an output image; and D(G(z)) denotes processing the output image generated by the generative neural network with the discriminative neural network, outputting a discrimination label indicating whether the output image has the conversion features. The first loss calculator calculates the loss value of the generative neural network according to L_G.
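A literal sketch of this generator loss (it transcribes the formula as printed, with the term 1 − log D(G(z)); many GAN implementations use log(1 − D(G(z))) or −log D(G(z)) instead):

```python
import torch

def generator_loss(d_real, d_fake, eps=1e-8):
    """L_G as written above. d_real = D(x) on images with the target
    conversion features; d_fake = D(G(z)) on generated images; both
    are sigmoid outputs in (0, 1). eps guards against log(0)."""
    return (torch.log(d_real + eps).mean()
            + (1.0 - torch.log(d_fake + eps)).mean())
```

Minimizing L_G with respect to the generator acts only through the second term and pushes D(G(z)) toward 1, that is, toward output images the discriminative network accepts as having the conversion features.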
In the embodiments of the present disclosure, the normalization loss function uses a parameter regularization loss function L_L1, though other types of normalization loss may also be used. In a neural network, the convolution kernels and biases are parameters obtained through training. The convolution kernels determine what processing is performed on the input image, and the biases determine whether the output of a kernel is passed to the next layer. A bias can thus be vividly compared to a "switch" that decides whether its kernel is "on" or "off". For different input images, the network turns different kernels on or off to achieve different processing effects.
The mean of the absolute values of all convolution kernels in the neural network is:

$$W = \frac{\sum \lVert w \rVert}{C_{w}}$$

where Σ‖w‖ sums the absolute values of all convolution kernels in the network, and C_w is the number of convolution kernels in the network.

The mean of the absolute values of all biases in the neural network is:

$$B = \frac{\sum \lVert b \rVert}{C_{b}}$$

where Σ‖b‖ sums the absolute values of all biases in the network, and C_b is the number of biases in the network.

The parameter regularization loss function is then:

$$L_{L1} = \frac{W}{B + \varepsilon}$$

where ε is a very small positive number that keeps the denominator from being zero.
In the embodiments of the present disclosure, it is desirable for the biases in a convolutional layer to have larger absolute values than the convolution kernels, so that the "switch" role of the biases is exercised more effectively. During training, the first loss calculator calculates the parameter regularization loss value of the generative neural network according to L_L1.
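A sketch of this regularization term over the convolutional layers of a PyTorch model, assuming, per the formulas above, that W and B are means over all kernel weights and all biases respectively:

```python
import torch
import torch.nn as nn

def l1_regularization_loss(model: nn.Module, eps: float = 1e-8):
    """L_L1 = W / (B + eps): mean absolute convolution weight divided
    by mean absolute bias, computed over all Conv2d layers."""
    w_sum = b_sum = 0.0
    w_cnt = b_cnt = 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            w_sum = w_sum + m.weight.abs().sum()
            w_cnt += m.weight.numel()
            if m.bias is not None:
                b_sum = b_sum + m.bias.abs().sum()
                b_cnt += m.bias.numel()
    return (w_sum / w_cnt) / (b_sum / b_cnt + eps)
```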
In summary, the total loss of the generative neural network is:

$$L_{total} = \alpha L_{content} + \beta L_{G} + \chi R$$

where R is the normalization loss value of the generative neural network, and α, β, and χ are the respective weights of the content loss value, the generative neural network loss value, and the normalization loss value in the total loss. The embodiments of the present disclosure use the parameter regularization loss value described above as the normalization loss value, though other types of normalization loss value may also be used.
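Combining the three sketches above into the total generator loss; the weights alpha, beta, and chi are hyperparameters the disclosure leaves unspecified, so the values here are placeholders only:

```python
alpha, beta, chi = 1.0, 0.5, 0.1  # illustrative weights

def total_generator_loss(f_l, p_l, d_real, d_fake, model):
    """L_total = alpha * L_content + beta * L_G + chi * R, with R
    taken as the parameter regularization loss L_L1."""
    return (alpha * content_loss(f_l, p_l)
            + beta * generator_loss(d_real, d_fake)
            + chi * l1_regularization_loss(model))
```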
The discriminative neural network used in training the generative neural network forms, together with the generative neural network, a set of adversarial networks. The discriminative neural network extracts the content features of the input image with several convolutional layers and pooling layers, reducing the size of the feature images for further feature extraction by the next convolutional layer. The image features are then processed by a fully connected layer and an activation layer, finally outputting a discrimination label indicating whether the input image has the conversion features. The fully connected layer has the same structure as the convolutional neural network, except that the convolution kernels are replaced with scalar values. The activation layer is typically a RELU or sigmoid function. In the embodiments of the present disclosure, the specific structure of the discriminative neural network is shown in FIG. 7, where the activation layer is a sigmoid function and the discrimination label is finally output.
In the adversarial networks, the generative neural network converts an input image having effect M into an output image having effect N, and the discriminative neural network judges whether the output image has the features of effect N and outputs a discrimination label. For example, if the output image is judged to have the features of effect N, the output is close to "1"; if not, the output is close to "0". Through training, the generative neural network gradually generates output images that cause the discriminative neural network to output "1", and the discriminative neural network gradually judges more accurately whether an output image has the conversion features. The two are trained synchronously and compete against each other to obtain better parameters.
FIG. 8 shows a flowchart of training the discriminative neural network, and FIG. 9 shows a block diagram of training the discriminative neural network. The process of training the discriminative neural network is described in detail below with reference to FIGS. 8 and 9.
As shown in FIG. 8, in step S810, a first sample image R2 is generated from a second training image I2 by the generative neural network. The second training image I2 may be the same as or different from the input image shown in FIG. 1; it is used to train the discriminative neural network and does not have the expected image conversion features. This process is the same as the step of generating the output image from the input image with the generative neural network in FIG. 1 and is not repeated here.
Next, in step S820, a second sample image R3 is acquired from a training database; the second sample image contains the expected image conversion features. The sample images in the training database contain the expected conversion features; for example, they may be a set of Van Gogh paintings, which all share similar features of composition, color, structure, and so on, so that the trained generative neural network can convert an input image into an output image having the same features.
In step S830, whether the first sample image R2 and the second sample image R3 have the conversion features is discriminated by the above discriminative neural network, and discrimination labels are output. It should be understood that the second sample image R3 naturally carries a "true" label and serves as the "true sample", while the first sample image R2, being generated by the generative neural network, naturally carries a "false" label and serves as the "false sample".
Finally, in step S840, the discriminative neural network is trained by a second loss calculation unit according to the discrimination labels.
As shown in FIG. 9, the second loss calculation unit includes a second loss calculator and an optimizer. In the image processing method, the second loss calculator calculates the loss value of the discriminative neural network in accordance with a second loss function from the discrimination label of the first sample image R2 and the discrimination label of the second sample image R3, wherein the second loss function is the discriminative neural network loss function; the optimizer optimizes the parameters of the discriminative neural network according to the loss value of the discriminative neural network, wherein the parameters include the convolution kernels and biases of the convolutional networks in the discriminative neural network.
The first sample image R2 is an output image obtained by conversion from effect M to effect N with the generative neural network, and corresponds to a "false" sample. The second sample image R3 acquired from the training database is a "true" sample having effect N. The discriminative neural network judges whether R2 and R3 have effect N and outputs the discrimination labels.
The second loss function includes the discriminative neural network loss function:

$$L_{D} = -\mathrm{E}_{x \sim Pdata(x)}\left[\log D(x)\right] - \mathrm{E}_{z \sim Pz(z)}\left[1 - \log D(G(z))\right]$$
where E denotes taking the average, Pdata is the set of images for which the discriminative neural network outputs 1, that is, training images having the target conversion features, and x is an image belonging to the Pdata set, for example the second sample image. Pz is the set of input images of the generative neural network, and z is an image belonging to the Pz set, for example the first training image. D is the discriminative neural network and G is the generative neural network; D(x) denotes processing the image x with the discriminative neural network and outputting the discrimination label of x; G(z) denotes processing the image z with the generative neural network, that is, generating an output image from the image z; and D(G(z)) denotes processing the output image generated by the generative neural network with the discriminative neural network, outputting a discrimination label indicating whether the output image has the conversion features. The second loss calculator calculates the loss value of the discriminative neural network according to L_D, and the optimizer optimizes the parameters of the discriminative neural network according to the loss value, the parameters including the convolution kernels and biases of the convolutional networks in the discriminative neural network.
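A hedged sketch of this discriminator loss together with one alternating training step; the optimizers and data sources are assumed to be set up elsewhere, and generator_loss is the sketch given earlier:

```python
import torch

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """L_D as printed above (again a literal transcription; the
    classical GAN discriminator loss uses log(1 - D(G(z)))."""
    return (-torch.log(d_real + eps).mean()
            - (1.0 - torch.log(d_fake + eps)).mean())

def train_step(G, D, opt_G, opt_D, z, x_real):
    """One adversarial step: z is a batch of second training images I2,
    x_real a batch of second sample images R3 from the training
    database."""
    r2 = G(z)                                  # first sample image R2
    # Update the discriminative network on the "true"/"false" samples.
    loss_d = discriminator_loss(D(x_real), D(r2.detach()))
    opt_D.zero_grad(); loss_d.backward(); opt_D.step()
    # Update the generative network.
    loss_g = generator_loss(D(x_real).detach(), D(r2))
    opt_G.zero_grad(); loss_g.backward(); opt_G.step()
```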
The trained generative neural network, with optimized parameters, can be used to implement the image conversion processing, generating from an input image an output image having the expected conversion features. The trained discriminative neural network, with optimized parameters, can be used to judge whether an input image has the expected conversion features.
In the present disclosure, the generative neural network and the discriminative neural network are trained with the loss calculation units according to the loss functions, making the system simple and easier to train. Moreover, by establishing cross-level connections between different processing levels in the generative neural network, consistency between the output converted image and the input image is ensured: the converted image both has the conversion features and includes sufficient original image information, avoiding the loss of large amounts of original-image information during the image processing.
Embodiments of the present disclosure further provide an image processing apparatus that can implement the above image processing method. A schematic block diagram of the image processing apparatus is shown in FIG. 10; it includes a generative neural network module. It should be noted that the structure of the image processing apparatus shown in FIG. 10 is merely exemplary, not limiting, and the image processing apparatus may have other components according to actual application needs.
The generative neural network module may include the generative neural network described above. The image processing apparatus provided by the embodiments of the present disclosure performs image conversion processing on the input image with the generative neural network module to output a converted output image. The image processing apparatus may further include a training neural network module for training the generative neural network module according to the input image and the output image of the generative neural network module, so that the output image has the expected image features.
In the generative neural network module, cross-level connections are established between different processing levels of the generative neural network, so that the generative neural network module retains the original information of the input image during the image conversion processing. The output converted image thus both includes the conversion features and retains the original-image information, ensuring the image conversion effect.
Each processing level in the generative neural network module may include at least some of a convolutional network, a down-sampling layer, an up-sampling layer, and a normalization layer, according to the needs of the image processing, wherein the number of down-sampling layers equals the number of up-sampling layers. In the embodiments of the present disclosure, the generative neural network module performs image conversion processing on the input image to output an output image whose image features have been converted.
The training neural network module is used to train the generative neural network module according to the first training image and the first training output image. The generative neural network module outputs, from the first training image I1, the image-converted first training output image R1, and the training neural network module trains the generative neural network module based on the first training image I1 and the first training output image R1. The training aims to optimize the parameters in the network according to the processing results of the generative neural network module, so that it can accomplish the expected image conversion processing.
In the embodiments of the present disclosure, the training neural network module includes a discriminative neural network module and a first loss calculation unit, the discriminative neural network module containing the discriminative neural network described above. The discriminative neural network module is used to output a discrimination label indicating whether the first training output image R1 has the conversion features. The first loss calculation unit is used to calculate the loss value of the generative neural network module from the first training image I1, the first training output image R1, and the discrimination label, and to optimize the parameters of the generative neural network module.
The first loss calculation unit includes an analysis network, a first loss calculator, and an optimizer. The analysis network is used to output the content features of the first training image I1 and the first training output image R1. The first loss calculator is used to calculate the loss value of the generative neural network module in accordance with the first loss function from the content features output by the analysis network and the discrimination label of the first training output image R1, wherein the first loss function includes at least one of a content loss function, a generative neural network loss function, and a normalization loss function. The optimizer is used to optimize the parameters of the generative neural network module according to the loss value of the generative neural network module, wherein the parameters include the convolution kernels and biases of the convolutional networks in the generative neural network module; the kernels and biases can determine how the cross-level connections in the generative neural network are enabled.
According to the embodiments of the present disclosure, the training neural network module in the above image processing apparatus is further used to train the discriminative neural network module according to the discrimination labels of the discriminative neural network module. The above input image serves as a second training image I2, the output image serves as a first sample image R2, and a training image in the training database serves as a second sample image R3. The discriminative neural network module outputs the discrimination labels of the first sample image R2 and the second sample image R3, respectively. The training neural network module further includes a second loss calculation unit for training the discriminative neural network module according to the discrimination label of the first sample image R2 and the discrimination label of the second sample image R3.
The second loss calculation unit includes a second loss calculator and an optimizer. The second loss calculator is used to calculate the loss value of the discriminative neural network module in accordance with the second loss function from the discrimination label of the first sample image R2 and the discrimination label of the second sample image R3, wherein the second loss function is the loss function of the discriminative neural network in the discriminative neural network module. The optimizer is used to optimize the parameters of the discriminative neural network module according to the loss value of the discriminative neural network module, wherein the parameters include the convolution kernels and biases of the convolutional networks in the discriminative neural network module.
The trained generative neural network module can perform image conversion processing as trained, generating output images that cause the discriminative neural network module to output "1", that is, the discriminative neural network module judges that the output image has the conversion features. The trained discriminative neural network module can, as trained, judge more accurately whether the output image produced by the generative neural network module has the conversion features.
The image processing apparatus provided by the embodiments of the present disclosure includes a generative neural network module in which cross-level connections are established between different processing levels. The generative neural network module is trained according to the images in the training database and the loss functions; by optimizing the parameters in the neural network, the trained generative neural network module can output an output image having the expected conversion features while retaining the original information of the input image, ensuring that the output image remains consistent with the input image. The system is simple, easy to train, and highly flexible.
Embodiments of the present disclosure further provide an image processing device, whose structural block diagram is shown in FIG. 11, including a processor 1102 and a memory 1104. It should be noted that the structure of the image processing device shown in FIG. 11 is merely exemplary, not limiting, and the image processing device may have other components according to actual application needs.
In the embodiments of the present disclosure, the processor 1102 and the memory 1104 can communicate with each other directly or indirectly. Components such as the processor 1102 and the memory 1104 can communicate through a network connection. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks, and may include a local area network, the Internet, a telecommunication network, an Internet of Things based on the Internet and/or a telecommunication network, and/or any combination of the above networks. The wired network may communicate by, for example, twisted pair, coaxial cable, or optical fiber transmission; the wireless network may use, for example, a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi. The present disclosure does not limit the type and function of the network here.
The processor 1102 can control other components in the image processing device to perform the desired functions. The processor 1102 can be a central processing unit (CPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another device having data processing and/or program execution capabilities. The central processing unit (CPU) can be of an X86 or ARM architecture, for example. The GPU can be integrated directly on the motherboard alone or built into the Northbridge chip of the motherboard; the GPU can also be built into the central processing unit (CPU), since the GPU has powerful image processing capability.
The memory 1104 can include any combination of one or more computer program products, which can include various forms of computer readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, random access memory (RAM) and/or caches. Non-volatile memory can include, for example, read only memory (ROM), hard disks, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, and the like.
One or more computer readable codes or instructions may be stored in the memory 1104, and the processor 1102 may run the computer instructions to perform the above image processing method or implement the above image processing apparatus. For a detailed description of the image processing method and the image processing apparatus, reference may be made to the related descriptions in this specification, which are not repeated here. Various applications and various data may also be stored in the computer readable storage medium, for example image data sets and various data used and/or generated by the applications (such as training data).
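By way of a speculative sketch of such a device embodiment executing the method (the class CrossLevelGenerator is the sketch given earlier; the file name generator.pth and the variable input_image are hypothetical):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = CrossLevelGenerator().to(device)
generator.load_state_dict(torch.load("generator.pth", map_location=device))
generator.eval()

with torch.no_grad():
    # input_image: a (1, 3, H, W) tensor with H and W divisible by 4
    output_image = generator(input_image.to(device))
```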
Embodiments of the present disclosure provide an image processing method, a processing apparatus, and a processing device for implementing image conversion processing. The image processing method, processing apparatus, and processing device generate an output image bearing the conversion features with a generative neural network, and train the generative neural network with the sample images in the training database and the loss functions, so that the system is simple and easy to train. On this basis, cross-level connections are established between different processing levels in the generative neural network, so that the output image both has the image conversion features and retains the original information of the input image, ensuring consistency between the output image and the input image.
The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (18)

  1. An image processing method, comprising:
    acquiring an input image;
    performing image conversion processing on the input image by using a generative neural network to output a converted output image, wherein
    the generative neural network comprises a plurality of processing levels, wherein an output result of an i-th processing level is input to an (i+1)-th processing level and a j-th processing level, the j-th processing level further receives an output result of a (j-1)-th processing level, and the output result of the (j-1)-th processing level has the same size as the output result of the i-th processing level, wherein:
    i is less than j-1, and i and j are positive integers.
  2. The image processing method according to claim 1, wherein:
    each of the plurality of processing levels of the generative neural network comprises a convolutional network, and at least some of the plurality of processing levels further comprise at least one of a down-sampling layer, an up-sampling layer, and a normalization layer.
  3. The image processing method according to claim 1 or 2, wherein:
    in the generative neural network, the number of down-sampling layers is equal to the number of up-sampling layers.
  4. The image processing method according to any one of claims 1-3, wherein the input image serves as a first training image and the output image serves as a first training output image, the image processing method further comprising:
    training the generative neural network based on the first training image and the first training output image.
  5. The image processing method according to any one of claims 1-4, wherein training the generative neural network comprises:
    inputting the first training output image to a discriminative neural network, and outputting a discrimination label indicating whether the first training output image has conversion features;
    calculating, by a first loss calculation unit, a loss value of the generative neural network from the first training image, the first training output image, and the discrimination label, and optimizing parameters of the generative neural network.
  6. The image processing method according to any one of claims 1-5, wherein the first loss calculation unit comprises an analysis network, a first loss calculator, and an optimizer, and optimizing the parameters of the generative neural network by the first loss calculation unit comprises:
    outputting content features of the first training image and the first training output image by the analysis network;
    calculating, by the first loss calculator, the loss value of the generative neural network in accordance with a first loss function from the content features output by the analysis network and the discrimination label of the first training output image;
    optimizing, by the optimizer, the parameters of the generative neural network according to the loss value of the generative neural network, wherein the parameters comprise convolution kernels and biases of the convolutional networks in the generative neural network.
  7. The image processing method according to any one of claims 1-6, wherein:
    the first loss function comprises at least one of a content loss function, a generative neural network loss function, and a normalization loss function.
  8. The image processing method according to any one of claims 1-7, wherein the input image serves as a second training image and the output image serves as a first sample image, the image processing method further comprising:
    acquiring a second sample image from a training database;
    outputting, by the discriminative neural network, discrimination labels indicating whether the first sample image and the second sample image have conversion features;
    training the discriminative neural network by a second loss calculation unit according to the discrimination label of the first sample image and the discrimination label of the second sample image.
  9. The image processing method according to any one of claims 1-8, wherein the second loss calculation unit comprises a second loss calculator and an optimizer, and training the discriminative neural network by the second loss calculation unit comprises:
    calculating, by the second loss calculator, a loss value of the discriminative neural network in accordance with a second loss function from the discrimination label of the first sample image and the discrimination label of the second sample image, wherein the second loss function comprises a discriminative neural network loss function;
    optimizing, by the optimizer, the parameters of the discriminative neural network according to the loss value of the discriminative neural network, wherein the parameters comprise convolution kernels and biases of the convolutional networks in the discriminative neural network.
  10. The image processing method according to any one of claims 1-9, wherein the training database comprises sample images having conversion features.
  11. An image processing apparatus, comprising:
    a generative neural network module configured to perform image conversion processing on the input image to output a converted output image, wherein:
    the generative neural network module comprises a plurality of processing levels, wherein an output result of an i-th processing level is input to an (i+1)-th processing level and a j-th processing level, the j-th processing level further receives an output result of a (j-1)-th processing level, and the output result of the (j-1)-th processing level has the same size as the output result of the i-th processing level, wherein:
    i is less than j-1, and i and j are positive integers.
  12. The image processing apparatus according to claim 11, wherein each processing level in the generative neural network module comprises a convolutional network, and at least some of the plurality of processing levels further comprise at least one of a down-sampling layer, an up-sampling layer, and a normalization layer, wherein:
    in the generative neural network module, the number of down-sampling layers is equal to the number of up-sampling layers.
  13. The image processing apparatus according to claim 11 or 12, wherein the input image serves as a first training image and the output image serves as a first training output image, the image processing apparatus further comprising:
    a training neural network module configured to train the generative neural network module according to the first training image and the first training output image, wherein the training neural network module comprises:
    a discriminative neural network module configured to output a discrimination label indicating whether the first training output image has conversion features;
    a first loss calculation unit configured to calculate a loss value of the generative neural network module from the first training image, the first training output image, and the discrimination label, and to optimize parameters of the generative neural network module.
  14. The image processing apparatus according to any one of claims 11-13, wherein the first loss calculation unit comprises:
    an analysis network configured to output content features of the first training image and the first training output image;
    a first loss calculator configured to calculate the loss value of the generative neural network module in accordance with a first loss function from the content features output by the analysis network and the discrimination label of the first training output image, wherein the first loss function comprises at least one of a content loss function, a generative neural network loss function, and a normalization loss function; and
    an optimizer configured to optimize the parameters of the generative neural network module according to the loss value of the generative neural network module, wherein the parameters comprise convolution kernels and biases of the convolutional networks in the generative neural network module.
  15. The image processing apparatus according to any one of claims 11-14, wherein the training neural network module is further configured to train the discriminative neural network module according to the discrimination labels of the discriminative neural network module, wherein
    the input image serves as a second training image, the output image serves as a first sample image, an image acquired from a training database serves as a second sample image, and the discriminative neural network module outputs discrimination labels from the first sample image and the second sample image,
    wherein the training neural network module further comprises:
    a second loss calculation unit configured to train the discriminative neural network module according to the discrimination label of the first sample image and the discrimination label of the second sample image.
  16. The image processing apparatus according to any one of claims 11-15, wherein the second loss calculation unit comprises:
    a second loss calculator configured to calculate a loss value of the discriminative neural network module in accordance with a second loss function from the discrimination label of the first sample image and the discrimination label of the second sample image, wherein the second loss function comprises a discriminative neural network module loss function; and
    an optimizer configured to optimize parameters of the discriminative neural network module according to the loss value of the discriminative neural network module, wherein the parameters comprise convolution kernels and biases of the convolutional networks in the discriminative neural network module.
  17. The image processing apparatus according to any one of claims 11-16, wherein the training database comprises sample images having conversion features.
  18. An image processing device, comprising:
    one or more processors;
    one or more memories,
    wherein the memories store computer readable codes which, when run by the one or more processors, perform the image processing method according to any one of claims 1-10 or implement the image processing apparatus according to any one of claims 11-17.
PCT/CN2018/101369 2018-01-26 2018-08-20 Image processing method, processing apparatus and processing device WO2019144608A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/329,893 US11281938B2 (en) 2018-01-26 2018-08-20 Image processing method, processing apparatus and processing device
EP18849416.5A EP3745347A4 (en) 2018-01-26 2018-08-20 IMAGE PROCESSING METHOD, PROCESSING APPARATUS AND PROCESSING DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810079435.8 2018-01-26
CN201810079435.8A CN109754357B (zh) 2018-01-26 2018-01-26 Image processing method, processing apparatus and processing device

Publications (1)

Publication Number Publication Date
WO2019144608A1 true WO2019144608A1 (zh) 2019-08-01

Family

ID=66402335

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/101369 WO2019144608A1 (zh) 2018-01-26 2018-08-20 Image processing method, processing apparatus and processing device

Country Status (4)

Country Link
US (1) US11281938B2 (zh)
EP (1) EP3745347A4 (zh)
CN (1) CN109754357B (zh)
WO (1) WO2019144608A1 (zh)

Also Published As

Publication number Publication date
EP3745347A4 (en) 2021-12-15
CN109754357A (zh) 2019-05-14
EP3745347A1 (en) 2020-12-02
US20210365728A1 (en) 2021-11-25
US11281938B2 (en) 2022-03-22
CN109754357B (zh) 2021-09-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18849416; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2018849416; Country of ref document: EP; Effective date: 20200826)