WO2018205676A1 - Processing method and system for convolutional neural network, and storage medium - Google Patents

Processing method and system for convolutional neural network, and storage medium

Info

Publication number
WO2018205676A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
layer
generator
image
discriminator
Prior art date
Application number
PCT/CN2018/073434
Other languages
English (en)
French (fr)
Inventor
那彦波
刘瀚文
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd.
Priority to US 16/073,195 (US11537873B2)
Publication of WO2018205676A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/048 Activation functions
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 ... using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/776 Validation; Performance evaluation
    • G06V 10/82 ... using neural networks

Definitions

  • Embodiments of the present invention relate to the field of image processing, and more particularly to a processing method and system for a convolutional neural network, and a storage medium.
  • CNN: Convolutional Neural Network
  • At least one embodiment of the present invention provides a processing method for a convolutional neural network, comprising training a generator and training a discriminator. Training the generator comprises: extracting a low resolution color image from a high resolution color image; and, using the low resolution color image and a noise image as input images, training parameters of the generator network by reducing a generator cost function based on parameters of the discriminator network, wherein the generator network includes a lifting layer for increasing the resolution of the luminance component and the chrominance component of the input image, and the generator cost function represents the degree of difference between the output image of the generator network and the high resolution color image. Training the discriminator comprises: respectively inputting the output image of the trained generator network and the high resolution color image to a discriminator network; and training parameters of the discriminator network by reducing a discriminator cost function, wherein the discriminator network includes a pooling layer for reducing resolution, and the discriminator cost function represents the degree to which the output image of the generator network corresponds to the high resolution color image.
  • The processing method further comprises alternately performing the steps of training the generator network and training the discriminator network.
  • The generator cost function consists of a first term, a second term, and an optional third term. The first term is based on the output of the discriminator network given the output image of the generator network; the second term is based on the difference between the low resolution color image and a degraded image obtained by subjecting the output image of the generator network to the same degradation processing used to obtain the low resolution color image from the high resolution color image; and the third term is based on the ratio of the magnitude of the weights of the filters of the convolutional layers to the magnitude of the biases of the activation layers included in the parameters of the generator network.
  • The discriminator cost function consists of a first term, a second term, and an optional third term. The first term is based on the output of the discriminator network given the output image of the generator network; the second term is based on the output of the discriminator network given the high resolution color image; and the third term is based on the output of the discriminator network given a combination of the output image of the generator network and the high resolution color image.
  • The resolution of the luminance component is increased to the same extent as the resolution of the chrominance component, wherein the generator network comprises any one of the following: a first generator network, which has the same number of first lifting layers for both the luminance component and the chrominance component; a second generator network, which has a certain number of first lifting layers for the luminance component and, for the chrominance component, a smaller number of first lifting layers together with a second lifting layer different from the first lifting layer; and a third generator network, which has a certain number of first lifting layers for the luminance component and, for the chrominance component, a second lifting layer different from the first lifting layer.
  • The lifting layer is interposed between a convolutional layer and an activation layer of the generator network, and the degrading layer is interposed between a convolutional layer and an activation layer of the discriminator network.
  • The parameters of the generator network include the weights of the filters of the convolutional layers in the generator network, the biases of the activation layers, and the lifting parameters of the lifting layers; the parameters of the discriminator network include the biases of the activation layers in the discriminator network, the weights of the filters of the convolutional layers, and the degrading parameters of the degrading layers. At initialization, the parameters of the generator network and the parameters of the discriminator network are predetermined or random.
  • The activation layer is a switching unit that turns on when an activation condition is satisfied.
  • The discriminator network further comprises an averager for averaging all pixels of the image whose resolution has been reduced by the pooling layer, to obtain an indicator of whether the input of the discriminator network is the output image of the trained generator network or the high resolution color image.
  • Extracting the low resolution color image comprises: segmenting a series of high resolution color sample images to obtain a plurality of high resolution color images having a size smaller than the high resolution color sample images; and subjecting the plurality of high resolution color images to degradation processing to obtain a plurality of low resolution color images of reduced resolution.
  • the noise image is a white noise image.
  • The lifting layer copies the pixels input to it to a plurality of different positions in its output, which has a higher resolution than the input pixels.
  • The pooling layer comprises at least one of a degrading layer corresponding to the lifting layer, a max-pooling layer, and an average-pooling layer.
  • the resolution of the output image of the generator network is the same as the resolution of the high resolution color image.
  • the low resolution color image is an average of a plurality of consecutive video frames.
  • At least one embodiment of the present invention provides a processing system for a convolutional neural network, comprising: one or more processors; and one or more memories storing computer readable code which, when executed by the one or more processors, performs the processing method according to any embodiment of the present invention.
  • At least one embodiment of the present invention provides a non-transitory computer storage medium storing computer readable code which, when executed by one or more processors, performs the processing method according to any embodiment of the present invention.
  • The present invention introduces a new convolutional neural network system architecture and provides a processing method and system for convolutional neural networks, and a storage medium. It improves the quality of resolution-amplified images by effectively utilizing color information and improving the internal convolutional network architecture and training strategy, and it replaces the traditional training strategy with an adversarial network method, including the necessary additional noise input to allow artificial generation of details.
  • Figure 1 shows a simplified schematic of a convolutional neural network.
  • FIG. 2A shows a flow diagram of a training generator in a processing method for a convolutional neural network, in accordance with one embodiment of the present invention.
  • FIG. 2B shows a flow chart of a training discriminator in a processing method for a convolutional neural network, in accordance with one embodiment of the present invention.
  • FIGS. 3A-3C respectively show schematic diagrams of three alternative types of generator networks in accordance with one embodiment of the present invention.
  • Figure 4 illustrates an example of a lifting layer in a generator network in accordance with one embodiment of the present invention.
  • Figure 5 shows a schematic diagram of a discriminator network in accordance with one embodiment of the present invention.
  • Figure 6A shows a schematic diagram of the location of a lift layer in a generator network, in accordance with an embodiment of the present invention.
  • FIG. 6B shows a schematic diagram of the location of a degrading layer in a discriminator network, in accordance with an embodiment of the present invention.
  • Figures 7A and 7B show two embodiments of an activation layer in a generator network, respectively.
  • FIG. 8 illustrates an exemplary processing system that can be used to implement the processing methods of the present disclosure.
  • Today, deep learning is mainly used to solve artificial intelligence (AI) problems such as recommendation engines, image classification, image captioning and search, facial recognition, age recognition, and speech recognition. Compared with their predecessors, neural networks and deep learning structures are much larger in the number of filters and layers.
  • Convolutional neural networks are a special structure of neural networks. Their main uses are roughly divided into two categories: first, classification; second, super-resolution.
  • For classification problems, the input is high resolution data (e.g., an image or video) and the output is low resolution data (e.g., a label, the location of an object, etc.).
  • The max-pooling layer is one of the most common layers in deep learning structures; it reduces the resolution of a feature image by taking the maximum value among neighboring pixels.
  • Super-resolution problems, on the other hand, take high resolution data (images) as input and increase their size to a larger amount of data (higher resolution), i.e., they increase resolution. This completely changes the design of the deep learning structure.
  • a convolutional network is a neural network structure that uses an image as an input/output and replaces a scalar weight with a filter (convolution).
  • Figure 1 shows a simplified schematic of a convolutional neural network with a simple structure of three layers.
  • The convolutional neural network is used, for example, for image processing, using images as inputs and outputs, and replacing scalar weights by, for example, filters (i.e., convolutional layers).
  • As shown, the structure takes 4 input images at a convolutional layer 101 with four input terminals on the left side, has 3 units (output images) at the central activation layer 102, and has 2 units at another convolutional layer 103, producing 2 output images.
  • Each box with a weight $w_{ij}^{k}$ corresponds to a filter (e.g., a 3×3 or 5×5 kernel), where k is a label indicating the input layer number, and i and j are labels indicating the input and output units, respectively.
  • The bias is a scalar added to the output of the convolution.
  • The result of adding several convolutions and biases is then passed through the activation box of the activation layer, which typically corresponds to a rectifying linear unit (ReLU), a sigmoid function, a hyperbolic tangent, or the like.
  • the weights and offsets of the filters are fixed during operation of the system, obtained through a training process using a set of input/output sample images, and adjusted to suit some optimization criteria depending on the application.
  • A typical configuration involves tens or hundreds of filters in each layer.
  • a network with 3 layers is considered shallow, while a number of layers greater than 5 or 10 is generally considered to be deep.
  • Current deep learning research on super-resolution problems avoids the problem of increasing the input dimension by adding a traditional upscaler (for example, bicubic interpolation) to the convolutional neural network as a first stage, and then applying a deep learning structure that does not reduce or increase the size of the features and output images.
  • The present disclosure introduces a new convolutional neural network system architecture that improves the quality of resolution-amplified images by effectively utilizing color information and improving the internal convolutional network architecture and training strategy. The present disclosure replaces the traditional training strategy with an adversarial network method and includes the necessary additional noise input to allow artificial generation of details.
  • The adversarial network uses two convolutional neural network systems, namely: a so-called "generator", which is a generator network; and a so-called "discriminator" network, used to evaluate the quality of an image whose resolution has been magnified.
  • the "discriminator” can receive a color image as an input and output a number of, for example, -1 or 1. If the output is 1, the "discriminator” considers the color image to correspond to the original high resolution content. If the output is -1, the Discriminator will consider the color image to be the processed output through the generator network. Train the generator to maximize the output of the "discriminator” to make the output as realistic as possible.
  • the "discriminator” is trained to accurately distinguish between the original high-resolution content and the processed content.
  • the two networks alternately perform training steps to compete with each other and obtain optimal parameters.
  • FIG. 2A shows a flow diagram of a training generator in a processing method for a convolutional neural network, in accordance with one embodiment of the present invention.
  • step S201 a high resolution color image is utilized to extract a low resolution color image therefrom for use in training the generator network.
  • In step S202, using the low resolution color image and the noise image as input images, the parameters of the generator network are trained by reducing the generator cost function based on the parameters of the discriminator network, so that the difference between the output image of the generator network and the high resolution color image is reduced.
  • The generator network includes a lifting layer to increase the resolution of the luminance component and the chrominance component of the input image; the generator cost function represents the degree of difference between the output image of the generator network and the high resolution color image.
  • FIG. 2B shows a flow chart of a training discriminator in a processing method for a convolutional neural network, in accordance with one embodiment of the present invention.
  • step S203 the output image of the trained generator network and the high resolution color image are respectively input to the discriminator network.
  • Next, in step S204, the parameters of the discriminator network are trained by reducing the discriminator cost function, so that the discriminator network outputs an indicator of whether its input is the output image of the generator network or the high resolution color image.
  • the discriminator network includes a pooling layer to reduce resolution, for example, the pooling layer may be a degrading layer corresponding to a lifting layer in a generator network.
  • the discriminator cost function represents a degree to which an output image of the generator network corresponds to the high resolution color image.
  • the generator network and the discriminator network are both in the form of a convolutional neural network and have various parameters of the convolutional neural network.
  • The parameters of the generator network may include the weights of the filters of the convolutional layers in the generator network, the biases of the activation layers, and the lifting parameters of the lifting layers, while the parameters of the discriminator network may include the biases of the activation layers in the discriminator network, the weights of the filters of the convolutional layers, and the degrading parameters of the degrading layers.
  • At initialization, the parameters of the generator network and the parameters of the discriminator network may be predetermined or random.
  • Inputting a noise image to the generator network can also add artificial detail to the output image of the generator network, and can produce varying artificial details in each training.
  • The steps of training the generator network and training the discriminator network described above are alternated using different low resolution color images and noise images; by continuously reducing the generator cost function and the discriminator cost function, the final parameters of the generator network and of the discriminator network are obtained.
  • The trained generator network strives to generate output images that cause the discriminator network output to approach 1, and the trained discriminator network strives to distinguish between images generated by the generator network and their original high resolution images; the two are trained further by mutually opposing each other.
  • In this way the generator network and the discriminator network are trained alternately, here in the order: train generator, train discriminator, train generator, train discriminator, and so on, where one generator training step and one discriminator training step may be referred to as an iteration. Alternatively, the order may be exchanged, i.e., alternating as: train discriminator, train generator, train discriminator, train generator, and so on, where one discriminator training step and one generator training step may be referred to as an iteration.
  • the alternate training can also be performed in other manners, and is not limited to the above training sequence.
  • Because training the generator network is based on the parameters of the discriminator network, the training of the generator network depends on the training result of the discriminator network (i.e., the trained parameters of the discriminator network); and because training the discriminator network requires the output image of the generator network, the training of the discriminator network depends on the training result of the generator network (i.e., the trained parameters of the generator network). This is called "adversarial" training: the generator network and the discriminator network "oppose" each other, as sketched below.
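  • As an illustration only, the following minimal sketch shows the alternating schedule in runnable form. Each "network" is reduced to a single scalar parameter and each cost to a toy quadratic (both assumptions made purely for brevity); a real implementation would instead update filter weights, activation-layer biases, and lifting/degrading parameters.

    import numpy as np

    def generator_step(g, d, batch):
        # One gradient step reducing a toy generator cost, with the
        # discriminator parameter d held fixed (it only enters the cost).
        grad = 2.0 * (g - d * batch.mean())
        return g - 0.1 * grad

    def discriminator_step(g, d, batch):
        # One gradient step reducing a toy discriminator cost, using the
        # current (just-updated) generator parameter g.
        grad = 2.0 * (d - g * batch.mean())
        return d - 0.1 * grad

    rng = np.random.default_rng(0)
    g, d = rng.random(2)          # random initialization, as in the text
    for it in range(100):         # one generator + one discriminator step = one iteration
        batch = rng.random(8)     # stands in for low-res color + noise input images
        g = generator_step(g, d, batch)
        d = discriminator_step(g, d, batch)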
  • different low resolution color images and different noise images may be employed.
  • the same noise image may be employed, but with different low resolution color images.
  • The generator cost function may consist of a first term, a second term, and an optional third term. The first term may be based on the output of the discriminator network given the output image of the generator network; the second term may be based on the difference between the low resolution color image and a degraded image obtained by subjecting the output image of the generator network to the same degradation processing used to obtain the low resolution color image from the high resolution color image; and the third term may be based on the ratio of the magnitude of the weights of the filters of the convolutional layers included in the parameters of the generator network to the magnitude of the biases of the activation layers.
  • The first term attempts to maximize the output of the discriminator network, or equivalently attempts to make the output image boosted by the generator network look like an original high resolution color image. If only the first term were used, the generator network would find the simplest realistic images unrelated to the input image, invariant to those low resolution inputs. Therefore, the cost function of the generator network should not consider the first term alone.
  • The second term emphasizes that, after undergoing the same degradation used to extract the low resolution color image from the high resolution color image, the output image of the generator network should match the low resolution color image as closely as possible; this encourages the generator network to find a meaningful solution.
  • The third term is used to improve the results of the generator network by using biases that are larger than the weights of the filters.
  • In this way, the generator cost function may represent the degree of difference between the output image of the generator network and the high resolution color image; because it contains the first term, it is based on the parameters of the discriminator network, achieving the "adversarial" effect described above.
  • The discriminator cost function may consist of a first term, a second term, and an optional third term. The first term may be based on the output of the discriminator network given the output image of the generator network; the second term may be based on the output of the discriminator network given the high resolution color image; and the third term may be based on the output of the discriminator network given a combination of the output image of the generator network and the high resolution color image.
  • the discriminator cost function can represent the extent to which the output image of the generator network corresponds to the high resolution color image.
  • The resolution of the luminance component may be increased to the same extent as the resolution of the chrominance component, wherein the generator network may comprise any of the following: a first generator network, which has the same number of first lifting layers for both the luminance component and the chrominance component; a second generator network, which has a certain number of first lifting layers for the luminance component and, for the chrominance component, a smaller number of first lifting layers together with a second lifting layer different from the first lifting layer; and a third generator network, which has a certain number of first lifting layers for the luminance component and, for the chrominance component, a second lifting layer different from the first lifting layer.
  • For example, the first lifting layer may copy the pixels input to it to a plurality of different positions in its higher-resolution output to achieve the lifting effect, while the second lifting layer may be traditional bicubic interpolation.
  • The structures of the three generator networks offer different performance in increasing the resolution of the chrominance components and require different computational costs. An appropriate generator network structure can therefore be selected based on the required performance and the acceptable computational cost.
  • The lifting layer may be inserted between the convolutional layer and the activation layer of the generator network (i.e., in the order: convolutional layer, lifting layer, activation layer), and the degrading layer may be inserted between the convolutional layer and the activation layer of the discriminator network (i.e., in the order: convolutional layer, degrading layer, activation layer).
  • the activation layer may employ a conventional rectifying linear unit (ReLU) (as described in FIG. 1), or a switching unit for turning on when the activation condition is satisfied.
  • This "switching unit” does not add a constant in its output.
  • the convolutional neural network will not generate a constant term in the output. This is better for interpolation tasks such as image resolution magnification.
  • The discriminator network may further include an averager for averaging all pixels of the image whose resolution has been reduced by the pooling layer, to obtain an indicator of whether the input of the discriminator network is the output image of the trained generator network or the high resolution color image. Such an indicator may be a single number, which more simply indicates which of the two the input is; a minimal sketch follows.
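  • A minimal sketch of such an averager, assuming the pooled image is held in a NumPy array:

    import numpy as np

    def iq_index(pooled_image):
        # Average all pixels of the resolution-reduced image into one
        # scalar indicator of real vs. generated input.
        return float(np.mean(pooled_image))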
  • The low resolution color image may be extracted by segmenting a series of high resolution color sample images to obtain a plurality of high resolution color images having a size smaller than the high resolution color sample images, and then degrading the plurality of high resolution color images to obtain a plurality of low resolution color images of reduced resolution.
  • the noise image may be a white noise image, such as uniform noise, Gaussian noise, etc. with a fixed distribution, where each pixel value is a random number that is uncorrelated with its neighboring values.
  • The noise image is not limited to the above examples; any image that can provide a certain style can be used here.
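  • For instance, a white noise input with a fixed distribution might be generated as follows (patch size and amplitude are arbitrary choices here):

    import numpy as np

    rng = np.random.default_rng(0)
    uniform_noise = rng.uniform(0.0, 1.0, size=(80, 80, 1))   # uniform white noise
    gaussian_noise = rng.normal(0.0, 0.1, size=(80, 80, 1))   # Gaussian white noise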
  • the pooling layer may include at least one of a degradation layer, a max-pooling, and an average-pooling corresponding to the lifting layer.
  • The pooling layer is not limited to the above examples; any layer that performs pooling to reduce resolution can be used.
  • the resolution of the output image of the generator network is the same as the resolution of the high resolution color image. In this way, the difference between the output image of the generator network and the high resolution color image can be compared correspondingly.
  • the low resolution color image comprises an average of a plurality of consecutive video frames.
  • In this way, the two mutually "opposing" networks, the generator network and the discriminator network, compete and continuously improve in each training round based on the better and better results of the other network, so as to train better and ultimately optimal parameters.
  • FIGS. 3A-3C respectively show schematic diagrams of three alternative types of generator networks in accordance with one embodiment of the present invention.
  • Most traditional super-resolution networks only specially upscale the luminance component (the Y component in YUV) of a color image, while the chrominance components (the U and V channels) are typically amplified using bicubic or other standard lifting techniques. For larger amplification factors (e.g., 6x, 8x, ...), the effects of processing the luminance and chrominance components differently can result in visible artifacts.
  • The present disclosure therefore uses three alternative configurations to process color images with different qualities and performances. Note that a noise image is also added to the input; this helps create artificial detail in the output image.
  • In FIG. 3A, the generator network applies a convolutional neural network with two lifting layers (shown as MUX layers in the figure) to all three channels Y (luminance component) and U and V (chrominance components). The CO blocks correspond to conventional convolutional layers or activation layers.
  • Specifically, the generator network converts the RGB (red, green, blue) input image into the three channels Y, U, and V, then passes them through a convolutional layer, a lifting layer, an activation layer, a convolutional layer, a lifting layer, an activation layer, and so on, to obtain an image of increased resolution, and finally converts it back to an RGB image for output via a YUV-to-RGB conversion, as sketched below.
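  • For reference, one possible RGB/YUV conversion pair is sketched below; the matrix is the common full-range BT.601-style transform, an assumption since the patent does not specify which YUV variant is used:

    import numpy as np

    RGB2YUV = np.array([[ 0.299,  0.587,  0.114],
                        [-0.169, -0.331,  0.500],
                        [ 0.500, -0.419, -0.081]])

    def rgb_to_yuv(img):
        # img: H x W x 3 float array in RGB order.
        return img @ RGB2YUV.T

    def yuv_to_rgb(img):
        # Invert the same transform to return to RGB at the output.
        return img @ np.linalg.inv(RGB2YUV).T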
  • In FIG. 3B, the generator network uses a convolutional neural network with 2 MUX layers to boost the Y channel. The U and V channels use only one MUX layer, and the other MUX layer that boosts the Y channel is replaced by a standard resizing technique such as bicubic interpolation.
  • This generator network also passes through RGB-to-YUV at the input and YUV-to-RGB at the output, which will not be described again here.
  • In FIG. 3C, the generator network uses a convolutional neural network with 2 MUX layers for the Y channel only; the resolution of the color channels U and V is amplified using a standard technique such as bicubic interpolation. This generator network also passes through RGB-to-YUV at the input and YUV-to-RGB at the output, which will not be described again here.
  • In all three alternatives, the degree of amplification of the resolution of the luminance component is the same as that of the chrominance components, i.e., the outputs of the luminance component and of the chrominance components are guaranteed to have the same resolution, so that they can be combined into a color output image of a single resolution.
  • In FIG. 3B there are two MUX layers in the processing of the luminance component, each performing 2×2 resolution amplification, for a total amplification of 2×2 × 2×2, while the processing of the chrominance components has only one MUX layer, which first amplifies by 2×2 and then passes through a traditional bicubic layer for another 2×2 amplification.
  • In FIG. 3C there are two MUX layers in the processing of the luminance component, each performing 2×2 resolution amplification, for a total amplification of 2×2 × 2×2, while the chrominance components pass through a traditional bicubic layer for a single 4×4 amplification.
  • FIG. 4 illustrates an example of a lifting layer (shown as a MUX layer in the figure) in a generator network in accordance with one embodiment of the present invention.
  • The general definition of the MUX layer is as follows. A MUX layer with factor $M = M_x \times M_y$ increases the resolution of the input features. $U_1, \dots, U_M$ are upsampling operators that copy the pixels of a feature to different positions of a higher-resolution grid, filling the remaining positions with zeros. The number of output features is constant and equal to the number of input features c, where c denotes the index of the input terminal and (p, q) denotes an input pixel.
  • FIG. 4 only shows an example of a lifting layer, but the present disclosure is not limited thereto, and other lifting layer embodiments may also be employed.
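  • The pixel-copy behavior of such a lifting layer can be sketched as follows. The cyclic assignment of the upsampling operators across a group is an assumption for illustration; the patent's figure defines the exact arrangement:

    import numpy as np

    def upsample_operator(x, i, mx=2, my=2):
        # U_i: copy every pixel of x to position offset i of a grid that is
        # my-times taller and mx-times wider, leaving zeros elsewhere.
        h, w = x.shape
        out = np.zeros((h * my, w * mx), dtype=x.dtype)
        dy, dx = divmod(i, mx)
        out[dy::my, dx::mx] = x
        return out

    def mux_layer(features, mx=2, my=2):
        # Interleave each group of mx*my input features into mx*my
        # higher-resolution outputs, keeping the feature count c constant.
        m = mx * my
        c = len(features)
        assert c % m == 0, "feature count must be divisible by Mx*My"
        outputs = []
        for g in range(0, c, m):
            group = features[g:g + m]
            for j in range(m):  # one output per cyclic shift of the operators
                outputs.append(sum(upsample_operator(group[(i + j) % m], i, mx, my)
                                   for i in range(m)))
        return outputs

    x = [np.arange(4.0).reshape(2, 2) for _ in range(4)]  # c = 4 features, 2x2 each
    y = mux_layer(x)                                      # c = 4 features, 4x4 each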
  • Figure 5 shows a schematic diagram of a discriminator network in accordance with one embodiment of the present invention.
  • The discriminator network shown in Figure 5 includes a degrading layer (shown as a TMUX layer) corresponding to the MUX layer in the generator network, which downgrades the high resolution image input to the discriminator network to a low resolution image with the same resolution as the input image of the generator network.
  • The discriminator network then uses a convolutional neural network to output an image, an "IQ Map", similar to other image quality metrics (e.g., the structural similarity index (SSIM)). Finally, all the pixels in the "IQ Map" are averaged to obtain a single number, the "IQ index", as the output.
  • The degrading layer is not limited to the degrading layer corresponding to the MUX layer in the generator network (exemplified by the TMUX layer); other pooling layers, such as a max-pooling layer or an average-pooling layer, may also be used, as sketched below.
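  • For example, a 2×2 max-pooling, one of the pooling layers mentioned, can be sketched as:

    import numpy as np

    def max_pool(x, k=2):
        # Take the maximum over each k x k block of neighboring pixels,
        # reducing the resolution by k in each dimension.
        h, w = x.shape
        x = x[:h - h % k, :w - w % k]   # crop to a multiple of k
        return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))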
  • Figure 6A shows a schematic diagram of the location of a lift layer in a generator network, in accordance with an embodiment of the present invention.
  • As shown, the lifting layer is interposed between the convolutional layer and the activation layer. This order works better because the output of the convolutional layer is not selected by the activation layer until it has been amplified by the lifting layer.
  • Figure 6B illustrates a schematic representation of the location of a degradation layer in a discriminator network in accordance with an embodiment of the present invention.
  • Similarly, the degrading layer is inserted between the convolutional layer and the activation layer; this order works better because the output of the convolutional layer is not selected by the activation layer until it has been reduced by the degrading layer.
  • Figures 7A and 7B show two embodiments of an activation layer in a generator network, respectively.
  • Figure 7A shows a standard ReLU as the activation layer, where i and j represent the row and column of a pixel of the input image, L represents the L-th layer, n represents the n-th input terminal, and a and b represent coefficients.
  • FIG. 7B shows a switching unit as the activation unit in the activation layer: when the switch condition is satisfied, the output is A1; otherwise, it is B1.
  • This new type of activation layer, called a "switching unit", does not add a constant to the output. When a switching unit is used in a convolutional neural network, the network will not generate constant terms in its output, which is better for interpolation tasks such as image resolution magnification. A sketch follows.
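  • A minimal sketch of one reading of such a switching unit; the exact switch condition in FIG. 7B is not reproduced here, so the sign test below is an assumption:

    import numpy as np

    def switch_unit(a1, b1, condition):
        # Pass signal A1 through where the switch condition is satisfied,
        # and signal B1 otherwise; no constant is added to the output.
        return np.where(condition, a1, b1)

    x = np.linspace(-1.0, 1.0, 5)
    y = switch_unit(a1=x, b1=0.1 * x, condition=x > 0)  # a leaky-ReLU-like choice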
  • As convolutional neural networks, the generator network and the discriminator network include a large number of parameters, for example the weights (W) of the filters of the convolutional layers, the parameters (A) of the lifting or degrading layers, and the biases (B) of the activation layers. One set of parameters corresponds to the generator network and another set corresponds to the discriminator network; the training process seeks the optimal parameter set of the generator network and parameter set of the discriminator network.
  • an image patch is used as training data.
  • An image patch is a subset of a larger image obtained from an image data set, i.e., a smaller image segmented from the larger image. For example, from an image data set of 500 images of 480×320 pixels, a set of 30,000 image patches of 80×80 pixels can be randomly extracted from random locations within each image of the data set. The collection of these image patches can be used as the REFERENCE examples of original high resolution color images.
  • The set of image patches may then be reduced in resolution using a standard degradation algorithm (e.g., area, bicubic, etc.) to obtain a set of INPUT image patches as examples of low resolution color images; a sketch of both steps follows.
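  • A minimal sketch of this patch-extraction and degradation pipeline; the array shapes and the area-style downscaling are assumptions:

    import numpy as np

    def extract_patches(images, n_patches=30000, size=80, seed=0):
        # Randomly crop REFERENCE patches (size x size) from a set of
        # larger high-resolution images, as in the 80x80 example above.
        rng = np.random.default_rng(seed)
        patches = []
        for _ in range(n_patches):
            img = images[rng.integers(len(images))]
            h, w = img.shape[:2]
            y = rng.integers(h - size + 1)
            x = rng.integers(w - size + 1)
            patches.append(img[y:y + size, x:x + size])
        return patches

    def degrade(patch, factor=4):
        # Area-style degradation: average each factor x factor block.
        h, w, c = patch.shape
        patch = patch[:h - h % factor, :w - w % factor]
        return patch.reshape(h // factor, factor,
                             w // factor, factor, c).mean(axis=(1, 3))

    images = [np.random.rand(320, 480, 3) for _ in range(5)]   # stand-in data set
    reference = extract_patches(images, n_patches=10)
    inputs = [degrade(p) for p in reference]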
  • The generator network will use only the low resolution INPUT image patches as its input training data.
  • The discriminator network will use both the output images obtained by passing the low resolution INPUT image patches through the generator network and the high resolution REFERENCE image patches as its input training data.
  • The cost function provides a score of the convolutional neural network's performance that is reduced during the training process.
  • the respective examples (not limitations) of the discriminator cost function and the generator cost function according to various embodiments of the present disclosure are as follows:
  • The discriminator cost function consists of a first term, a second term, and an optional third term. The first term is based on the output of the discriminator network given the output image of the generator network; the second term is based on the output of the discriminator network given the high resolution color image; and the third term is based on the output of the discriminator network given a combination of the output image of the generator network and the high resolution color image.
  • The first term is of the form $\frac{1}{N}\sum_{n=1}^{N} D(\mathrm{Output}(\mathrm{INPUT}_n))$, where D(·) denotes the function of the discriminator network, so that the expression represents the average output of the discriminator network over the output images of the generator network. In the example of the present disclosure, that output is a value in {-1, 1}. N represents the number of INPUT image patches input to the generator network.
  • The second term is of the form $-\frac{1}{N}\sum_{n=1}^{N} D(\mathrm{REFERENCE}_n)$ (the sign follows from the difference described below), where N is the number of REFERENCE image patches, which is the same as the number of low resolution INPUT image patches; the sum is the average of the outputs of the discriminator network for the high resolution REFERENCE image patches.
  • the second term is based on the output of the discriminator network based on the high resolution color image (REFERENCE image patch).
  • The combination of the first term and the second term represents the difference between the average output of the discriminator network for the output images generated from the N low resolution INPUT image patches and its average output for the high resolution REFERENCE image patches.
  • The generator cost function consists of a first term, a second term, and an optional third term. The first term is based on the output of the discriminator network given the output of the generator network; the second term is based on the difference between the low resolution color image and a degraded image obtained by subjecting the output image of the generator network to the same degradation processing used to obtain the low resolution color image from the high resolution color image; and the third term is based on the ratio of the magnitude of the weights of the filters of the convolutional layers included in the parameters of the generator network to the magnitude of the biases of the activation layers.
  • The first term, of the form $-\frac{1}{N}\sum_{n=1}^{N} D(\mathrm{Output}(\mathrm{INPUT}_n))$, represents the (negated) average output of the discriminator network for the output images generated from the N low resolution INPUT image patches.
  • The first term attempts to maximize the output of the discriminator network D(), or equivalently attempts to make the output image boosted by the generator network look like an original high resolution color image. If only the first term were used, the generator network would find the simplest realistic images unrelated to the input image, invariant to those low resolution inputs. Therefore, the cost function of the generator network should not consider the first term alone.
  • The second term emphasizes that, after undergoing the same degradation used to extract the low resolution color image from the high resolution color image, the output image of the generator network should match the low resolution color image as closely as possible; this encourages the generator network to find a meaningful solution.
  • The third term is used to improve the results of the generator network by using biases that are larger than the weights of the filters.
  • ⁇ 1 and ⁇ 2 are weighting coefficients.
  • The low resolution INPUT image patches mentioned above are extracted from the original high resolution REFERENCE image patches by degrading them to reduce resolution. In an embodiment of the present disclosure, the corresponding constraint is added as a regularization term λ1·DownReg(INPUT).
  • Output(INPUT) represents the output image generated by passing the low resolution image patch INPUT through the generator network, as mentioned before.
  • Downscale(Output(INPUT)) denotes the degraded image obtained by subjecting the output image Output(INPUT) of the generator network to the same degradation processing used to extract the low resolution color image (INPUT image patch) from the high resolution color image (REFERENCE image patch).
  • The difference in the second term may be measured, for example, as MSE(Downscale(Output(INPUT)), INPUT) or as SSIM(Downscale(Output(INPUT)), INPUT).
  • MSE denotes the mean square error, $\mathrm{MSE}(X,Y)=\frac{1}{HW}\sum_{i,j}(X_{ij}-Y_{ij})^{2}$, where H and W indicate that the resolution of the image is H (height) × W (width).
  • $\sigma_{XY} = E[XY] - \mu_X \mu_Y$ denotes a covariance; μ_O and μ_R are the means of Output(INPUT) and REFERENCE, respectively, σ_O and σ_R are their variances, and σ_OR is their covariance.
  • The L1 norm in this context refers to the average absolute value.
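  • For reference, the standard SSIM form consistent with these definitions is given below; the constants $c_1$ and $c_2$ are assumed, as the patent's exact values are not reproduced here:

    $\mathrm{SSIM}(O,R) = \dfrac{(2\mu_O\mu_R + c_1)(2\sigma_{OR} + c_2)}{(\mu_O^2 + \mu_R^2 + c_1)(\sigma_O^2 + \sigma_R^2 + c_2)}$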
  • Usually it is convenient for a convolutional neural network to have large biases (B), because they separate features into classes that are processed independently of one another, while the values of these features depend on the weights (W) of the filters of the convolutional layers. Therefore, it is desirable to apply biases (B) that are larger than the weights (W) by using a regularization term, for example of the form $\frac{\lVert W \rVert_{1}}{\lVert B \rVert_{1}}$ (reconstructed from the ratio described above).
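  • Putting the pieces together, a runnable sketch of both cost functions as described is given below; the signs and the MSE choice for the second term are assumptions consistent with the surrounding text:

    import numpy as np

    def discriminator_cost(d_fake, d_real):
        # First term minus second term: average discriminator output on
        # generated images vs. on high-resolution REFERENCE patches.
        return np.mean(d_fake) - np.mean(d_real)

    def generator_cost(d_fake, downscaled_output, inp, weights, biases,
                       lam1=1.0, lam2=1.0):
        first = -np.mean(d_fake)                          # push D(Output(INPUT)) up
        second = np.mean((downscaled_output - inp) ** 2)  # match after downscaling (MSE variant)
        third = np.mean(np.abs(weights)) / (np.mean(np.abs(biases)) + 1e-8)  # favor large biases
        return first + lam1 * second + lam2 * third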
  • the specific formula examples of the generator cost function and the discriminator cost function are described above. However, this is not a limitation.
  • The generator cost function and the discriminator cost function may take other forms, as long as the generator cost function is based on the parameters of the discriminator network (using the output of the discriminator network) and the discriminator cost function is based on the parameters of the generator network (using the output image of the generator network).
  • In this way, according to the innovative cost functions, the two mutually "opposing" networks, the generator network and the discriminator network, compete and continuously improve in each iteration based on the better and better results of the other network, so as to train better and ultimately optimal parameters.
  • Various embodiments of the present disclosure may also be used to enhance the resolution of a video sequence.
  • An easy way is to upscale the video frames one by one. However, this strategy can be problematic because the edges of the output frames and the motion of objects can produce visible flicker.
  • Instead, a weighted average of several video frames can be used as input: $\mathrm{INPUT}_t = \sum_{k=A}^{B} c_k \cdot \mathrm{Frame}_{t+k}$, where $c_k$ is a fixed weight assigned to each frame and A and B are integers (positive or negative). According to this study, such a linear combination of inputs to the generator network is more likely to produce realistic artificial output that runs smoothly and is consistent with the input images.
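  • A minimal sketch of this weighted frame average; the weights below are hypothetical:

    import numpy as np

    def averaged_input(frames, t, coeffs):
        # coeffs maps each integer offset k in [A, B] to its fixed weight c_k.
        return sum(c * frames[t + k] for k, c in coeffs.items())

    frames = [np.random.rand(4, 4, 3) for _ in range(5)]
    coeffs = {-1: 0.25, 0: 0.5, 1: 0.25}   # A = -1, B = 1
    x = averaged_input(frames, t=2, coeffs=coeffs)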
  • FIG. 8 illustrates an exemplary processing system that can be used to implement the processing methods of the present disclosure.
  • the processing system 1000 includes at least one processor 1002 that executes instructions stored in the memory 1004. These instructions may be, for example, instructions for implementing functions described as being performed by one or more of the above-described modules or instructions for implementing one or more of the above methods.
  • the processor 1002 can access the memory 1004 through the system bus 1006. In addition to storing executable instructions, the memory 1004 can also store training data and the like.
  • The processor 1002 can be any of a variety of devices with computing capability, such as a central processing unit (CPU) or a graphics processing unit (GPU). The CPU can be an X86 or ARM processor; the GPU can be integrated directly on the motherboard, embedded in the motherboard's Northbridge, or built into the central processing unit (CPU).
  • Processing system 1000 also includes a data store 1008 that is accessible by processor 1002 via system bus 1006.
  • Data store 1008 can include executable instructions, multiple image training data, and the like.
  • Processing system 1000 also includes an input interface 1010 that allows an external device to communicate with processing system 1000.
  • input interface 1010 can be used to receive instructions from an external computer device, from a user, and the like.
  • Processing system 1000 can also include an output interface 1012 that interfaces processing system 1000 with one or more external devices.
  • processing system 1000 can display images and the like through output interface 1012. It is contemplated that external devices that communicate with processing system 1000 through input interface 1010 and output interface 1012 can be included in an environment that provides a user interface with which virtually any type of user can interact.
  • Examples of user interface types include graphical user interfaces, natural user interfaces, and the like.
  • the graphical user interface can accept input from a user's input device(s), such as a keyboard, mouse, remote control, etc., and provide output on an output device such as a display.
  • The natural user interface may enable a user to interact with the processing system 1000 in a manner free from constraints imposed by input devices such as keyboards, mice, and remote controls.
  • natural user interfaces may rely on speech recognition, touch and stylus recognition, gesture recognition on and near the screen, aerial gestures, head and eye tracking, speech and speech, vision, touch, gestures, and machine intelligence.
  • processing system 1000 is illustrated as a single system, it will be appreciated that the processing system 1000 can also be a distributed system, and can also be arranged as a cloud facility (including a public cloud or a private cloud). Thus, for example, several devices can communicate over a network connection and can collectively perform tasks that are described as being performed by processing system 1000.
  • Computer readable media includes computer readable storage media.
  • the computer readable storage medium can be any available storage medium that can be accessed by a computer.
  • Such computer readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • Computer readable media also includes communication media including any medium that facilitates transfer of a computer program from one place to another.
  • A connection can be, for example, a communication medium. For example, if software is transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of communication medium. Combinations of the above should also be included within the scope of computer readable media.
  • the functions described herein may be performed at least in part by one or more hardware logic components.
  • Illustrative types of hardware logic components that can be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A processing method, system (1000), and computer readable medium for a convolutional neural network. The processing method includes training a generator and training a discriminator. Training the generator includes: extracting a low resolution color image from a high resolution color image (S201); and, using the low resolution color image and a noise image as input images, training parameters of the generator network by reducing a generator cost function based on parameters of the discriminator network (S202), wherein the generator network includes a lifting layer for increasing the resolution of the luminance component and the chrominance component of the input image, and the generator cost function represents the degree of difference between the output image of the generator network and the high resolution color image. Training the discriminator includes: respectively inputting the output image of the trained generator network and the high resolution color image to a discriminator network (S203); and training parameters of the discriminator network by reducing a discriminator cost function (S204), wherein the discriminator network includes a pooling layer for reducing resolution, and the discriminator cost function represents the degree to which the output image of the generator network corresponds to the high resolution color image.

Description

Processing method and system for convolutional neural network, and storage medium
This application claims priority to Chinese Patent Application No. 201710318147.9, filed on May 8, 2017, which is incorporated herein by reference in its entirety as a part of this application.
Technical Field
Embodiments of the present invention relate to the field of image processing, and more particularly to a processing method and system for a convolutional neural network, and a storage medium.
Background
At present, deep learning technology based on artificial neural networks has made great progress in fields such as image classification, image capture and search, facial recognition, and age and speech recognition. A convolutional neural network (CNN) is a kind of artificial neural network that has developed in recent years and attracted wide attention; it can be used in the fields of classification and super-resolution.
Summary
At least one embodiment of the present invention provides a processing method for a convolutional neural network, comprising training a generator and training a discriminator, wherein training the generator comprises: extracting a low resolution color image from a high resolution color image; and, using the low resolution color image and a noise image as input images, training parameters of the generator network by reducing a generator cost function based on parameters of the discriminator network, wherein: the generator network includes a lifting layer for increasing the resolution of the luminance component and the chrominance component of the input image; and the generator cost function represents the degree of difference between the output image of the generator network and the high resolution color image; and wherein training the discriminator comprises: respectively inputting the output image of the trained generator network and the high resolution color image to a discriminator network; and training parameters of the discriminator network by reducing a discriminator cost function, wherein: the discriminator network includes a pooling layer for reducing resolution; and the discriminator cost function represents the degree to which the output image of the generator network corresponds to the high resolution color image.
According to an embodiment of the present invention, the processing method further comprises alternately performing the steps of training the generator network and training the discriminator network.
According to an embodiment of the present invention, the generator cost function consists of a first term, a second term, and an optional third term, the first term being based on the output of the discriminator network given the output image of the generator network, the second term being based on the difference between the low resolution color image and a degraded image obtained by subjecting the output image of the generator network to the same degradation processing used to obtain the low resolution color image from the high resolution color image, and the third term being based on the ratio of the magnitude of the weights of the filters of the convolutional layers to the magnitude of the biases of the activation layers included in the parameters of the generator network.
According to an embodiment of the present invention, the discriminator cost function consists of a first term, a second term, and an optional third term, the first term being based on the output of the discriminator network given the output image of the generator network, the second term being based on the output of the discriminator network given the high resolution color image, and the third term being based on the output of the discriminator network given a combination of the output image of the generator network and the high resolution color image.
According to an embodiment of the present invention, the resolution of the luminance component is increased to the same extent as the resolution of the chrominance component, wherein the generator network comprises any one of the following: a first generator network having the same number of first lifting layers for both the luminance component and the chrominance component; a second generator network having a certain number of first lifting layers for the luminance component and, for the chrominance component, a smaller number of first lifting layers together with a second lifting layer different from the first lifting layer; and a third generator network having a certain number of first lifting layers for the luminance component and, for the chrominance component, a second lifting layer different from the first lifting layer.
According to an embodiment of the present invention, the lifting layer is inserted between a convolutional layer and an activation layer of the generator network, and the degrading layer is inserted between a convolutional layer and an activation layer of the discriminator network, wherein the parameters of the generator network include the weights of the filters of the convolutional layers in the generator network, the biases of the activation layers, and the lifting parameters of the lifting layers, wherein the parameters of the discriminator network include the biases of the activation layers in the discriminator network, the weights of the filters of the convolutional layers, and the degrading parameters of the degrading layers, and wherein, at initialization, the parameters of the generator network and the parameters of the discriminator network are predetermined or random.
According to an embodiment of the present invention, the activation layer is a switching unit that turns on when an activation condition is satisfied.
According to an embodiment of the present invention, the discriminator network further comprises an averager for averaging all pixels of the image whose resolution has been reduced by the pooling layer, to obtain an indicator of whether the input of the discriminator network is the output image of the trained generator network or the high resolution color image.
According to an embodiment of the present invention, extracting the low resolution color image comprises: segmenting a series of high resolution color sample images to obtain a plurality of high resolution color images having a size smaller than the high resolution color sample images; and subjecting the plurality of high resolution color images to degradation processing to obtain a plurality of low resolution color images of reduced resolution.
According to an embodiment of the present invention, the noise image is a white noise image.
According to an embodiment of the present invention, the lifting layer copies the pixels input to it to a plurality of different positions in its output, which has a higher resolution than the input pixels.
According to an embodiment of the present invention, the pooling layer includes at least one of a degrading layer corresponding to the lifting layer, a max-pooling layer, and an average-pooling layer.
According to an embodiment of the present invention, the resolution of the output image of the generator network is the same as the resolution of the high resolution color image.
According to an embodiment of the present invention, the low resolution color image is an average of a plurality of consecutive video frames.
At least one embodiment of the present invention provides a processing system for a convolutional neural network, comprising: one or more processors; and one or more memories storing computer readable code which, when executed by the one or more processors, performs the processing method according to any embodiment of the present invention.
At least one embodiment of the present invention provides a non-transitory computer storage medium storing computer readable code which, when executed by one or more processors, performs the processing method according to any embodiment of the present invention.
The present invention introduces a new convolutional neural network system architecture and provides a processing method and system for convolutional neural networks, and a storage medium. By effectively utilizing color information and improving the internal convolutional network architecture and training strategy, it improves the quality of resolution-amplified images; it replaces the traditional training strategy with an adversarial network method and, on this basis, includes the necessary additional noise input to allow artificial generation of details.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and are not a limitation of the present disclosure.
FIG. 1 shows a simplified schematic diagram of a convolutional neural network.
FIG. 2A shows a flow chart of training a generator in a processing method for a convolutional neural network according to an embodiment of the present invention.
FIG. 2B shows a flow chart of training a discriminator in a processing method for a convolutional neural network according to an embodiment of the present invention.
FIGS. 3A-3C respectively show schematic diagrams of three alternative types of generator networks according to an embodiment of the present invention.
FIG. 4 shows an example of a lifting layer in a generator network according to an embodiment of the present invention.
FIG. 5 shows a schematic diagram of a discriminator network according to an embodiment of the present invention.
FIG. 6A shows a schematic diagram of the position of a lifting layer in a generator network according to an embodiment of the present invention.
FIG. 6B shows a schematic diagram of the position of a degrading layer in a discriminator network according to an embodiment of the present invention.
FIGS. 7A and 7B respectively show two implementations of an activation layer in a generator network.
FIG. 8 shows an exemplary processing system that can be used to implement the processing methods of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. Based on the described embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the scope of protection of the present disclosure.
Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the ordinary meanings understood by persons of ordinary skill in the field to which the present disclosure belongs. The words "first", "second", and the like used in the present disclosure do not indicate any order, quantity, or importance, but are only used to distinguish different components. Similarly, words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections, but may include electrical or signal connections, whether direct or indirect.
The information technology market has invested hugely in deep learning over the past five years. The main uses of this technology today are solving artificial intelligence (AI) problems such as recommendation engines, image classification, image captioning and search, facial recognition, age recognition, speech recognition, and so on. In general, deep learning technology has successfully solved the understanding of such data, for example describing the content of an image, recognizing objects in an image under difficult conditions, or recognizing speech in a noisy environment. Another advantage of deep learning is its general-purpose structure, which allows relatively similar systems to solve very different problems. Compared with their predecessor methods, neural networks and deep learning structures are much larger in the number of filters and layers.
Convolutional neural networks are a special structure of neural network, whose main uses fall roughly into two categories: first, classification; second, super-resolution. For classification problems, the input is high resolution data (e.g., an image or video) and the output is low resolution data (e.g., a label, the location of an object, etc.). For this special case, deep learning has made great progress. For example, one of the most common layers in deep learning structures is the so-called max-pooling layer, which reduces the resolution of a feature image by taking the maximum value among neighboring pixels. Super-resolution problems, on the other hand, take high resolution data (images) as input and increase their size to a larger amount of data (higher resolution), i.e., they increase resolution. This completely changes the design of the deep learning structure.
The main component of a deep learning system is the convolutional network. A convolutional network is a neural network structure that uses images as input/output and replaces scalar weights with filters (convolutions). FIG. 1 shows a simplified schematic diagram of a convolutional neural network with a simple 3-layer structure. This convolutional neural network is used, for example, for image processing, using images as input and output, and replacing scalar weights by, for example, filters (i.e., convolutional layers). As shown in FIG. 1, the structure takes 4 input images at a convolutional layer 101 with four input terminals on the left side, has 3 units (output images) at the central activation layer 102, and has 2 units at another convolutional layer 103, producing 2 output images. Each box having a weight
[formula image PCTCN2018073434-appb-000001]
corresponds to a filter (e.g., a 3×3 or 5×5 kernel), where k is a label indicating the input layer number, and i and j are labels indicating the input and output units, respectively. The bias is a scalar added to the output of the convolution. The result of adding several convolutions and biases is then passed through the activation box of the activation layer, which typically corresponds to a rectifying linear unit (ReLU), a sigmoid function, a hyperbolic tangent, or the like. The weights and biases of the filters are fixed during operation of the system; they are obtained through a training process using a set of input/output sample images and adjusted to suit optimization criteria that depend on the application. A typical configuration involves tens or hundreds of filters in each layer. A network with 3 layers is considered shallow, while a network with more than 5 or 10 layers is generally considered deep.
Current deep learning research on super-resolution problems avoids the problem of increasing the input dimension by adding a traditional upscaler (for example, bicubic interpolation) to the convolutional neural network as a first stage, and then applying a deep learning structure that does not reduce or increase the size of the features and output images.
The present disclosure introduces a new convolutional neural network system architecture that improves the quality of resolution-amplified images by effectively utilizing color information and improving the internal convolutional network architecture and training strategy. The present disclosure replaces the traditional training strategy with an adversarial network method and includes the necessary additional noise input to allow artificial generation of details.
The adversarial network according to embodiments of the present invention uses two convolutional neural network systems, namely: a so-called "generator", which is a generator network; and a so-called "discriminator" network, used to evaluate the quality of an image whose resolution has been magnified. In the present disclosure, the "discriminator" can receive a color image as input and output a number, for example -1 or 1. If the output is 1, the "discriminator" considers the color image to correspond to the original high resolution content; if the output is -1, the "discriminator" considers the color image to be the output processed by the generator network. The generator is trained to maximize the output of the "discriminator" so as to make its output as realistic as possible, and the "discriminator" is trained to accurately distinguish between the original high resolution content and the processed content. The two networks alternately perform training steps, competing with each other to obtain optimal parameters.
图2A示出了根据本发明的一个实施例的用于卷积神经网络的处理方法中训练生成器的流程图。如图2A所示,在步骤S201,利用高分辨率彩色图像,从中提取低分辨率彩色图像,以用于训练生成器网络。
接着,在步骤S202,利用所述低分辨率彩色图像和噪声图像作为输入图像,基于鉴别器网络的参数,通过减小生成器成本函数来训练生成器网络的参数,以使得所述生成器网络的输出图像与所述高分辨率彩色图像的差异减小。其中所述生成器网络包括提升层以对输入图像的亮度分量和色度分量提 升分辨率,所述生成器成本函数表示所述生成器网络的输出图像与所述高分辨率彩色图像之间的差异的程度。
图2B示出了根据本发明的一个实施例的用于卷积神经网络的处理方法中训练鉴别器的流程图。如图2B所示,在步骤S203,将所述训练后的生成器网络的输出图像和所述高分辨率彩色图像分别输入到鉴别器网络。
接着,在步骤S204,通过减小鉴别器成本函数来训练所述鉴别器网络的参数,以使得鉴别器网络输出指示所述鉴别器网络的输入是所述生成器网络的输出图像还是所述高分辨率彩色图像的指标。其中所述鉴别器网络包括池化层以降低分辨率,例如,所述池化层可以是与生成器网络中提升层相对应的降级层。所述鉴别器成本函数表示所述生成器网络的输出图像与所述高分辨率彩色图像对应的程度。
根据本发明的实施例,所述生成器网络和鉴别器网络都是卷积神经网络的形式,且具有卷积神经网络的各个参数。例如,所述生成器网络的参数可以包括所述生成器网络中的所述卷积层的过滤器的权重、所述激活层的偏置、和所述提升层的提升参数,其中,所述鉴别器网络的参数可以包括所述鉴别器网络中的所述激活层的偏置、所述卷积层的过滤器的权重、和所述降级层的降级参数。且在初始化时,所述生成器网络的参数和所述鉴别器网络的参数可以是预定的或随机的。
根据本发明的实施例,将噪声图像也输入到生成器网络可以增加生成器网络的输出图像中的人造细节,且可以在每次训练中产生有变化的人造细节。
根据本发明实施例,利用不同的低分辨彩色图像和噪声图像交替进行如上所述的训练生成器网络和训练鉴别器网络的步骤,通过不断减小所述生成器成本函数和所述鉴别器成本函数来得到最终的生成器网络的参数和鉴别器网络的参数。经过训练的生成器网络努力生成使得鉴别器网络输出接近1的输出图像,经过训练的鉴别器网络努力区分由生成器网络生成的图像和其原始的高分辨率图像,两者之间通过相互对抗而得到进一步地训练。
如此,生成器网络和鉴别器网络被交替训练,在此举例了按训练生成器、训练鉴别器、训练生成器、训练鉴别器等的顺序进行交替训练,其中一个生成器训练步骤和一个鉴别器训练步骤可以被称为一次迭代。或者,训练生成器和训练鉴别器的步骤也可以交换顺序,即按训练鉴别器、训练生成器、训 练鉴别器、训练生成器等进行交替训练,其中一个鉴别器训练步骤和一个生成器训练步骤可以被称为一次迭代。所述交替训练还可以按照其他的方式进行,并不局限于上述的训练顺序。
因为在上述训练生成器网络的过程是基于鉴别器网络的参数进行的,因此生成器网络的训练基于鉴别器网络的训练结果(即鉴别器网络的参数的训练结果),而因为训练鉴别器网络的过程需要用到生成器网络的输出图像,因此鉴别器网络的训练又基于生成器网络的训练结果(即生成器网络的参数的训练结果),这种方式称为“对抗”,即生成器网络和鉴别器网络相互“对抗”。这种方法使得两个相互“对抗”的网络在每次迭代中基于另一网络的越来越好的结果而进行竞争和不断改进,以训练得到越来越优以至于最优的参数。
In one embodiment of the present invention, different low-resolution color images and different noise images may be used. In other embodiments according to the present invention, the same noise image may be used while different low-resolution color images are used.
In one embodiment, the generator cost function may consist of a first term, a second term, and an optional third term. The first term may be based on the output of the discriminator network for the output image of the generator network; the second term may be based on the difference between the low-resolution color image and a downscaled image obtained by subjecting the generator network's output image to the same downscaling used to go from the high-resolution color image to the low-resolution color image; and the third term may be based on the ratio between the magnitude of the filter weights of the convolutional layers and the magnitude of the biases of the activation layers included in the parameters of the generator network.
The first term tries to maximize the output of the discriminator network or, equivalently, tries to make the output image upscaled by the generator network look, to the discriminator network, like the original high-resolution color image. If only this first term were used, the generator network would find the simplest realistic images that are independent of the input image, namely those unchanging low-resolution images. It is therefore desirable not to rely on the first term alone when solving the cost-function problem of the generator network. The second term emphasizes that the generator network's output image, after the same downscaling used to extract the low-resolution color image from the high-resolution color image, should match those low-resolution color images as closely as possible, thereby driving the generator network toward meaningful solutions. The third term improves the generator network's results by using biases that are larger in magnitude than the filter weights. In general, it is convenient for convolutional neural networks to have large biases (B), because the biases separate features into classes that are processed independently of one another, while the values of those features depend on the filter weights (W); therefore, in the third term, biases larger than the weights are encouraged.
Thus the generator cost function can represent the degree of difference between the output image of the generator network and the high-resolution color image, and, because it contains the first term, it can be based on the parameters of the discriminator network, achieving the "adversarial" effect described above.
In one embodiment, the discriminator cost function may consist of a first term, a second term, and an optional third term. The first term may be based on the output of the discriminator network for the output image of the generator network; the second term may be based on the output of the discriminator network for the high-resolution color image; and the third term may be based on the output of the discriminator network for a combination of the generator network's output image and the high-resolution color image.
Thus the discriminator cost function can represent the degree to which the output image of the generator network corresponds to the high-resolution color image.
In one embodiment, the resolution of the luminance component is upscaled to the same degree as that of the chrominance components, and the generator network may include any one of the following: a first generator network having the same number of first upscaling layers for both the luminance component and the chrominance components; a second generator network having a certain number of first upscaling layers for the luminance component and, for the chrominance components, a smaller number of first upscaling layers together with second upscaling layers different from the first upscaling layers; and a third generator network having a certain number of first upscaling layers for the luminance component and, for the chrominance components, second upscaling layers different from the first upscaling layers.
For example, a first upscaling layer may achieve upscaling by copying the pixels input to it to multiple different positions in its output, which has a higher resolution than the input pixels, whereas a second upscaling layer may use a conventional bicubic interpolation scheme.
Thus the three generator-network structures offer different levels of quality for upscaling the chrominance components and require different computational costs, so a suitable generator-network structure can be chosen according to the required performance and the affordable computational cost.
In one embodiment, the upscaling layer may be inserted between a convolutional layer and an activation layer of the generator network (i.e., in the order convolutional layer, upscaling layer, activation layer), and the downscaling layer may be inserted between a convolutional layer and an activation layer of the discriminator network (i.e., in the order convolutional layer, downscaling layer, activation layer).
This ordering works better, because the output of the convolutional layer is not selected by the activation layer until it has been upscaled by the upscaling layer.
In one embodiment, the activation layer may use a conventional rectifying linear unit (ReLU) (as described for FIG. 1), or may use a switch unit that turns on when an activation condition is satisfied. Such a "switch unit" adds no constant to its output. When such switch units are used in a convolutional neural network, the network will not generate a constant term in its output, which is better for interpolation tasks such as image resolution upscaling.
In one embodiment, the discriminator network may further include an averager for averaging all pixels of the image whose resolution has been reduced by the pooling layer, to obtain an indicator of whether the input to the discriminator network is the output image of the trained generator network or the high-resolution color image. Such an indicator may be a single number, which indicates this more simply and clearly.
In one embodiment, the low-resolution color image may be extracted as follows: segmenting a series of high-resolution color sample images to obtain multiple high-resolution color images smaller in size than the high-resolution color sample images, and downscaling the multiple high-resolution color images to obtain multiple low-resolution color images of reduced resolution. The noise image may be a white-noise image, for example uniform noise with a fixed distribution, Gaussian noise, or the like, in which each pixel value is a random number uncorrelated with its neighboring values. Of course, the noise image is not limited to the above examples; any image that can provide a certain style may be used here.
In one embodiment, the pooling layer may include at least one of a downscaling layer corresponding to the upscaling layer, a max-pooling layer, and an average-pooling layer. Of course, the pooling layer is not limited to the above examples; anything that pools to reduce resolution may be used.
In one embodiment, the resolution of the output image of the generator network is the same as that of the high-resolution color image, so the difference between the generator network's output image and the high-resolution color image can be compared correspondingly.
In one embodiment, when upscaling the resolution of a video sequence comprising several consecutive video frames, the low-resolution color image includes an average of multiple consecutive video frames.
In summary, according to the various embodiments of the present disclosure, the two mutually "adversarial" networks, the generator network and the discriminator network, compete and continually improve in each training pass against the other network's ever-better results, so as to train increasingly good and eventually optimal parameters.
FIG. 3A to FIG. 3C are schematic diagrams of three optional types of generator network according to an embodiment of the present invention.
Most traditional super-resolution networks specially upscale only the luminance component of the color image (the Y component in YUV), while the chrominance components (the U and V channels) are usually upscaled with bicubic or other standard upscaling techniques. For large upscaling factors (e.g., 6x, 8x, ...), the effect of treating the luminance and chrominance components differently can lead to visible artifacts. Here the present disclosure uses three optional configurations to process color images with different quality and performance. Note that a noise image is also added to the input image, which helps produce artificial detail in the output image.
Specific details of the various embodiments of the present disclosure are described in more detail below with reference to FIG. 3A to FIG. 3C.
In FIG. 3A, the generator network uses a convolutional neural network with 2 upscaling layers (exemplified in the figure as MUX layers) on all three channels Y (the luminance component) and U and V (the chrominance components). A CO block corresponds to a conventional convolutional layer or activation layer. Specifically, this generator network converts the RGB (red, green, blue) input image into the three channels Y, U, V; passes them through a convolutional layer, an upscaling layer, an activation layer, another convolutional layer, another upscaling layer, another activation layer, and so on, to obtain a resolution-upscaled image; and then converts it back to an RGB image for output via YUV-to-RGB conversion.
In FIG. 3B, the generator network uses a convolutional neural network with 2 MUX layers to upscale only the Y channel. U and V use only 1 MUX layer, with a standard resizing technique (such as bicubic) substituting for the other MUX layer used on the Y channel. This generator network likewise goes through RGB-to-YUV conversion at the input and YUV-to-RGB conversion at the output, which is not repeated here.
In FIG. 3C, the generator network uses a convolutional neural network with 2 MUX layers only for the Y channel. The color channels U and V are upscaled with a standard technique such as bicubic. This generator network likewise goes through RGB-to-YUV conversion at the input and YUV-to-RGB conversion at the output, which is not repeated here.
Of course, in all three optional generator networks, the degree of resolution upscaling of the luminance component equals that of the chrominance components; that is, the outputs for the luminance and chrominance components are guaranteed to have the same resolution, so that they can be combined into a color output image of a single resolution. For example, in FIG. 3B the luminance path has two MUX layers, each performing 2×2 upscaling, for a total upscaling of 2×2×2×2, while the chrominance path has only one MUX layer performing 2×2 upscaling, followed by a conventional bicubic layer performing another 2×2 upscaling. In FIG. 3C the luminance path has two MUX layers, each performing 2×2 upscaling, for a total upscaling of 2×2×2×2, while the chrominance components pass through a conventional bicubic layer performing a single 4×4 upscaling.
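By way of example and not limitation, the following NumPy sketch illustrates a FIG. 3C-style per-channel pipeline; BT.601-style conversion coefficients are assumed, and nearest-neighbour replication stands in both for the MUX layers on the luminance path and for bicubic interpolation on the chrominance paths.

import numpy as np

def rgb_to_yuv(rgb):
    """rgb: (H, W, 3) in [0, 1]; BT.601-style conversion (assumed)."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    return rgb @ m.T

def yuv_to_rgb(yuv):
    m = np.array([[1.0,  0.0,    1.140],
                  [1.0, -0.395, -0.581],
                  [1.0,  2.032,  0.0  ]])
    return yuv @ m.T

def upscale(channel, f):
    """Nearest-neighbour replication as a stand-in for MUX / bicubic upscaling."""
    return np.kron(channel, np.ones((f, f)))

def upscale_yuv_pipeline(rgb, factor=4):
    yuv = rgb_to_yuv(rgb)
    y = upscale(yuv[..., 0], factor)  # luma path: CNN with MUX layers in the patent
    u = upscale(yuv[..., 1], factor)  # chroma paths: bicubic in the patent
    v = upscale(yuv[..., 2], factor)
    return yuv_to_rgb(np.stack([y, u, v], axis=-1))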
FIG. 4 shows an example of an upscaling layer (exemplified in the figure as a MUX layer) in the generator network according to an embodiment of the present invention.
Specifically, as shown in FIG. 4, the MUX layer increases the resolution from the input features to the output features by a factor M = M_x × M_y, where:

the input features are

$$x_1, x_2, \ldots, x_c \in \mathbb{R}^{H \times W},$$

and the output features are

$$y_1, y_2, \ldots, y_c \in \mathbb{R}^{(M_y H) \times (M_x W)}.$$
The general definition of the MUX layer is as follows:
First, U_1, …, U_M are upsampling operators that copy a pixel into one of M distinct positions of an otherwise zero-valued higher-resolution feature, for example:

$$\left[U_n x\right](M_y\,p + b,\; M_x\,q + a) = x(p, q), \qquad n = 1, \ldots, M,$$

where % is the "modulus" operator, b = (n − 1) % M_y, and

$$a = \left\lfloor \frac{n-1}{M_y} \right\rfloor$$

is the largest integer not exceeding (n − 1)/M_y, such that n = M_y·a + b + 1. The MUX layer requires the number of input features to be a multiple of M, that is, c = G·M for an integer G. The number of output features is unchanged, i.e., equal to c. Here c indexes the input terminals and (p, q) denotes an input pixel. The features are processed in groups of M features, so the inputs and outputs of a group are partitioned as x = [x_1 … x_G] and y = [y_1 … y_G]. The output of the MUX layer can then be written as:
$$y_1 = U_1 x_1 + U_2 x_2 + \cdots + U_M x_M$$

$$y_2 = U_2 x_1 + U_3 x_2 + \cdots + U_1 x_M$$

$$\cdots$$

$$y_G = U_M x_1 + U_1 x_2 + \cdots + U_{M-1} x_M$$
In the example of FIG. 4, M_y = M_x = 2 (M = 4).
Thus, adding upscaling layers (MUX layers) upscales the resolution.
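By way of example and not limitation, the following NumPy sketch implements the MUX operation defined above; the row/column placement of the offsets a and b is an assumption consistent with n = M_y·a + b + 1.

import numpy as np

def mux_upscale(x, My=2, Mx=2):
    """MUX layer sketch: x is a (C, H, W) feature stack with C a multiple of
    M = My*Mx; the output is (C, My*H, Mx*W)."""
    M = My * Mx
    C, H, W = x.shape
    assert C % M == 0
    G = C // M

    def U(n, feat):
        # U_n copies pixel (p, q) to (My*p + b, Mx*q + a), zeros elsewhere
        b = (n - 1) % My
        a = (n - 1) // My
        out = np.zeros((H * My, W * Mx), dtype=feat.dtype)
        out[b::My, a::Mx] = feat
        return out

    y = np.zeros((C, H * My, W * Mx), dtype=x.dtype)
    for g in range(G):
        grp = x[g * M:(g + 1) * M]
        for i in range(M):  # cyclic pattern: y_i = U_i x_1 + U_{i+1} x_2 + ...
            y[g * M + i] = sum(U(1 + (i + j) % M, grp[j]) for j in range(M))
    return y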
Of course, FIG. 4 shows only one example of an upscaling layer; the present disclosure is not limited thereto, and other implementations of the upscaling layer may be used.
FIG. 5 is a schematic diagram of a discriminator network according to an embodiment of the present invention.
The discriminator network shown in FIG. 5 includes downscaling layers (exemplified as TMUX layers) corresponding to the MUX layers in the generator network, so as to downscale the high-resolution image input to the discriminator network to a low-resolution image of the same resolution as the input image of the generator network.
The discriminator network uses a convolutional neural network to output an image "IQ map" similar to other image-quality metrics (e.g., the structural similarity index, SSIM). All pixels of the "IQ map" are then averaged to obtain a single number, the "IQ index," as the output.
Of course, the downscaling layer is not limited to the one corresponding to the MUX layer in the generator network (exemplified as a TMUX layer); it may also be another pooling layer, such as a max-pooling layer, an average-pooling layer, or the like.
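By way of example and not limitation, the following NumPy sketch shows a space-to-depth downscaling of the kind a TMUX layer could perform; the exact feature grouping of the TMUX layer is an assumption.

import numpy as np

def tmux_downscale(y, My=2, Mx=2):
    """Inverse of the MUX operation (a space-to-depth sketch): each feature of
    resolution (My*H, Mx*W) is split into M = My*Mx features of resolution (H, W)."""
    C, Hy, Wx = y.shape
    H, W = Hy // My, Wx // Mx
    out = np.empty((C * My * Mx, H, W), dtype=y.dtype)
    k = 0
    for c in range(C):
        for b in range(My):
            for a in range(Mx):
                out[k] = y[c, b::My, a::Mx]  # gather one phase of each block
                k += 1
    return out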
FIG. 6A is a schematic diagram of the position of the upscaling layer in the generator network according to an embodiment of the present invention.
As shown in FIG. 6A, the upscaling layer is inserted between a convolutional layer and an activation layer. This ordering works better, because the output of the convolutional layer is not selected by the activation layer until it has been upscaled by the upscaling layer.
FIG. 6B is a schematic diagram of the position of the downscaling layer in the discriminator network according to an embodiment of the present invention.
As shown in FIG. 6B, the downscaling layer is inserted between a convolutional layer and an activation layer. This ordering works better, because the output of the convolutional layer is not selected by the activation layer until it has been downscaled by the downscaling layer.
FIG. 7A and FIG. 7B respectively illustrate two implementations of the activation layer in the generator network.
First, FIG. 7A shows a standard ReLU as the activation layer, where i, j denote the row and column of a pixel of the input image, L denotes the layer number, n denotes the input-terminal number, and a, b denote coefficients.
FIG. 7B shows a switch unit as the activation unit in the activation layer. In the switch unit, switch(condition, A1, B1) equals A1 if the "condition" is true, and B1 otherwise. This new type of activation layer, called a "switch unit," adds no constant to the output. When switch units are used in a convolutional neural network, the network will not generate a constant term in its output, which is better for interpolation tasks such as image resolution upscaling.
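By way of example and not limitation, the following sketch contrasts a biased ReLU with one possible form of the switch unit; the exact form of the unit in FIG. 7B is an assumption based on the description that no constant is added to the output.

import numpy as np

def relu_with_bias(x, bias):
    """FIG. 7A style activation: the bias is added, so a constant can leak
    into the output."""
    return np.maximum(x + bias, 0.0)

def switch_unit(x, bias):
    """FIG. 7B style switch (form assumed): the bias only decides the
    condition, switch(cond, A1, B1) = A1 if cond else B1, so no constant
    is added to the output."""
    return np.where(x + bias > 0, x, 0.0)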
The cost functions used in the training of the generator network and the discriminator network are described in detail below.
Note that, being convolutional neural networks, the generator network and the discriminator network both comprise a large number of parameters, including, for example, the filter weights (W) of the convolutional layers, the parameters (A) of the upscaling or downscaling layers, and the biases (B) of the activation layers. One set of parameters (a parameter set) corresponds to the generator network, and another set of parameters corresponds to the discriminator network. The training process serves to obtain the optimal parameter set of the generator network and the optimal parameter set of the discriminator network.
In an "adversarial" network as described in the various embodiments of the present disclosure, the "generator network" and the "discriminator network" carry out the parameter-search process (the training process) independently. In the present disclosure, image patches are used as training data. An image patch is a subset of a larger image obtained from an image data set, i.e., a smaller image segmented out of a larger one. For example, from a data set of 500 images of 480x320 pixels, a set of 30000 image patches of 80x80 pixels can be extracted at random positions within the images of the data set. This set of image patches can serve as the REFERENCE of original high-resolution color images. The set of image patches can be downscaled with a standard downscaling algorithm (e.g., area, bicubic interpolation, etc.) to obtain a set of INPUT image patches serving as examples of low-resolution color images. The generator network uses only the low-resolution INPUT image patches as its training input, whereas the discriminator network uses, as its training inputs, the output images produced by passing the low-resolution INPUT image patches through the generator network, together with the high-resolution REFERENCE image patches.
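By way of example and not limitation, the following NumPy sketch prepares REFERENCE and INPUT patch sets as described; the area downscaling shown is one of the standard algorithms mentioned.

import numpy as np

def extract_patches(images, n_patches, size=80, rng=None):
    """Cut random size x size patches from a list of (H, W, 3) images
    (the REFERENCE set)."""
    rng = rng or np.random.default_rng()
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        i = rng.integers(img.shape[0] - size + 1)
        j = rng.integers(img.shape[1] - size + 1)
        patches.append(img[i:i + size, j:j + size])
    return np.stack(patches)

def downscale_area(patches, f=4):
    """Area downscaling: average f x f blocks to produce the INPUT set."""
    n, h, w, c = patches.shape
    return patches.reshape(n, h // f, f, w // f, f, c).mean(axis=(2, 4))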
In addition, respective cost functions are specified for the "generator network" and the "discriminator network." A cost function provides a score of the convolutional neural network's performance that is reduced during the training process. Respective examples (not limitations) of the discriminator cost function and the generator cost function according to embodiments of the present disclosure are as follows:
1) Discriminator cost function:

$$\mathcal{L}_{D} = \frac{1}{N}\sum_{n=1}^{N} D\big(\mathrm{Output}(\mathrm{INPUT}_{n})\big) \;-\; \frac{1}{N}\sum_{n=1}^{N} D(x_{n}) \;+\; \lambda\,\frac{1}{N}\sum_{n=1}^{N}\Big(\big\|\nabla_{\hat{x}_{n}} D(\hat{x}_{n})\big\|_{2} - 1\Big)^{2}$$
The discriminator cost function consists of a first term, a second term, and an optional third term. The first term is based on the output of the discriminator network for the output image of the generator network; the second term is based on the output of the discriminator network for the high-resolution color image; and the third term is based on the output of the discriminator network for a combination of the generator network's output image and the high-resolution color image.
Here, the first term is

$$\frac{1}{N}\sum_{n=1}^{N} D\big(\mathrm{Output}(\mathrm{INPUT}_{n})\big),$$

where Output(INPUT) is the output image generated by passing the input low-resolution image patch INPUT through the generator network, and D( ) denotes the function of the discriminator network. Therefore D(Output(INPUT)) represents the output of the discriminator network for the generator network's output image; in the example of the present disclosure, this output takes a value between −1 and 1. N denotes the number of INPUT image patches fed to the generator network. The first term thus represents the average of the discriminator network's outputs over the N low-resolution INPUT image patches.
The second term is

$$\frac{1}{N}\sum_{n=1}^{N} D(x_{n}),$$

where x is an original high-resolution REFERENCE image patch, and N denotes the number of REFERENCE image patches, which is the same as the number of low-resolution INPUT image patches. This term represents the average of the discriminator network's outputs over the high-resolution REFERENCE image patches; it is based on the high-resolution color images (the REFERENCE image patches) passed through the discriminator network. The combination of the first and second terms represents the difference between the average discriminator output over the N low-resolution INPUT image patches and the average discriminator output over the high-resolution REFERENCE image patches.
In the third term,

$$\hat{x} = \varepsilon\,\mathrm{Output}(\mathrm{INPUT}) + (1-\varepsilon)\,x$$

is a combination of the output image generated from the low-resolution image patch INPUT by the generator network and the high-resolution color image (the REFERENCE image patch), where ε ~ U[0, 1] is a number varied randomly during training. ∇ denotes the gradient operation, taken over all pixels and channels of these images of resolution H (height) × W (width):

$$\big\|\nabla_{\hat{x}} D(\hat{x})\big\|_{2} = \sqrt{\sum_{f=1}^{3}\sum_{i=1}^{H}\sum_{j=1}^{W} \left(\frac{\partial D(\hat{x})}{\partial \hat{x}_{f,i,j}}\right)^{2}},$$

where f denotes one of the R, G, B channels and 3 denotes the three channels R, G, B, and λ is a weighting coefficient. This third term is a refinement made with reference to the detailed definition of stochastic gradient descent (SGD).
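By way of example and not limitation, the following sketch assembles the three terms of the discriminator cost from precomputed discriminator outputs and gradient norms (as an autodiff framework would supply); the unit-norm target in the third term and the weight lam are assumptions following the gradient-penalty formulation.

import numpy as np

def discriminator_cost(d_fake, d_real, grad_norms, lam=10.0):
    """d_fake[n]     = D(Output(INPUT_n)): scores of generator outputs
    d_real[n]     = D(x_n): scores of the HR REFERENCE patches
    grad_norms[n] = L2 norm of the gradient of D at the interpolate
                    x_hat_n = eps*Output(INPUT_n) + (1-eps)*x_n
                    (precomputed elsewhere; lam = 10 is an assumed weight)."""
    return d_fake.mean() - d_real.mean() + lam * ((grad_norms - 1.0) ** 2).mean()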
2) Generator cost function:

$$\mathcal{L}_{G} = -\frac{1}{N}\sum_{n=1}^{N} D\big(\mathrm{Output}(\mathrm{INPUT}_{n})\big) \;+\; \lambda_{1}\,\mathrm{DownReg}(\mathrm{INPUT}) \;+\; \lambda_{2}\,\mathrm{WBratioReg}(W_{G}, b_{G})$$
The generator cost function consists of a first term, a second term, and an optional third term. The first term is based on the output of the discriminator network for the output image of the generator network; the second term is based on the difference between the low-resolution color image and a downscaled image obtained by subjecting the generator network's output image to the same downscaling used to go from the high-resolution color image to the low-resolution color image; and the third term is based on the ratio between the magnitude of the filter weights of the convolutional layers and the magnitude of the biases of the activation layers included in the parameters of the generator network.
Here, the first term,

$$\frac{1}{N}\sum_{n=1}^{N} D\big(\mathrm{Output}(\mathrm{INPUT}_{n})\big),$$

represents the average of the discriminator network's outputs over the N low-resolution INPUT image patches.
The first term tries to maximize the output of the discriminator network D( ) or, equivalently, tries to make the output image upscaled by the generator network look, to the discriminator network, like the original high-resolution color image. If only this first term were used, the generator network would find the simplest realistic images that are independent of the input image, namely those unchanging low-resolution images. It is therefore desirable not to rely on the first term alone when solving the cost-function problem of the generator network. The second term emphasizes that the generator network's output image, after the same downscaling used to extract the low-resolution color image from the high-resolution color image, should match those low-resolution color images as closely as possible, thereby driving the generator network toward meaningful solutions. The third term improves the generator network's results by using biases that are larger in magnitude than the filter weights. In general, it is convenient for convolutional neural networks to have large biases (B), because the biases separate features into classes that are processed independently of one another, while the values of those features depend on the filter weights (W); therefore, in the third term, biases larger than the weights are encouraged.
λ_1 and λ_2 are weighting coefficients.
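By way of example and not limitation, the following sketch assembles the generator cost; the minus sign on the first term reflects that minimizing the cost should maximize the discriminator's output, and the default weights are assumptions.

import numpy as np

def generator_cost(d_fake, down_reg, wb_ratio_reg, lam1=1.0, lam2=1.0):
    """d_fake[n] = D(Output(INPUT_n)). down_reg and wb_ratio_reg are the two
    regularizers defined below; lam1, lam2 are the weighting coefficients
    (values assumed)."""
    return -d_fake.mean() + lam1 * down_reg + lam2 * wb_ratio_reg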
The second term λ_1 DownReg(INPUT) and the third term λ_2 WBratioReg(W_G, b_G) are described in detail below.
1) Reproducing the downscaling process:

The low-resolution INPUT image patches mentioned above are extracted from the original high-resolution REFERENCE image patches by downscaling them to reduce the resolution. To enforce this property on the generator network's output, embodiments of the present disclosure add it as a regularization term λ_1 DownReg(INPUT), using either of:

DownReg(INPUT) = MSE(Downscale(Output(INPUT)), INPUT)

DownReg(INPUT) = 1 − SSIM(Downscale(Output(INPUT)), INPUT)

where Output(INPUT), as mentioned before, denotes the output image generated by passing the input low-resolution image patch INPUT through the generator network, and Downscale(Output(INPUT)) denotes the downscaled image obtained by subjecting the generator network's output image Output(INPUT) to the same downscaling used to extract the low-resolution color images (INPUT image patches) from the high-resolution color images (REFERENCE image patches). MSE(Downscale(Output(INPUT)), INPUT) and SSIM(Downscale(Output(INPUT)), INPUT) measure the degree of difference between the downscaled image obtained above and the low-resolution color image (the INPUT image patch). The functions MSE( ) and SSIM( ) are the conventional mean-squared-error (MSE) and structural-similarity (SSIM) techniques, defined in detail as follows:
Mean squared error (MSE):

$$\mathrm{MSE}(O, R) = \frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W} \big(O_{ij} - R_{ij}\big)^{2}$$

Structural similarity (SSIM):

$$\mathrm{SSIM}(O, R) = \frac{(2\,\mu_{O}\,\mu_{R} + C_{1})\,(2\,\sigma_{OR} + C_{2})}{(\mu_{O}^{2} + \mu_{R}^{2} + C_{1})\,(\sigma_{O}^{2} + \sigma_{R}^{2} + C_{2})}$$

where, for X = O (denoting Output(INPUT)) or X = R (denoting REFERENCE),

$$\mu_{X} = \frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W} X_{ij}, \qquad \sigma_{X}^{2} = \frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W} X_{ij}^{2} - \mu_{X}^{2},$$

$$\sigma_{XY} = \mu_{XY} - \mu_{X}\,\mu_{Y},$$

and where C_1 and C_2 are coefficients, e.g., C_1 = (0.01)^2 and C_2 = (0.03)^2; μ_O and μ_R are the means of Output(INPUT) and REFERENCE, respectively; σ_O^2 and σ_R^2 are their variances; σ_OR is their covariance; and H, W indicate an image resolution of H (height) × W (width).
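By way of example and not limitation, the following NumPy sketch implements the MSE and single-window SSIM defined above, and the SSIM variant of DownReg; the downscale argument stands for whatever downscaling produced the INPUT patches.

import numpy as np

def mse(o, r):
    """Mean squared error over an H x W image pair."""
    return np.mean((o - r) ** 2)

def ssim_global(o, r, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM using global image statistics, per the formulas above."""
    mu_o, mu_r = o.mean(), r.mean()
    var_o, var_r = o.var(), r.var()
    cov = (o * r).mean() - mu_o * mu_r
    return ((2 * mu_o * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_o ** 2 + mu_r ** 2 + c1) * (var_o + var_r + c2))

def down_reg(output_img, input_img, downscale):
    """DownReg(INPUT), SSIM variant: the generator output, downscaled the same
    way the INPUT patches were produced, should match the INPUT patch."""
    return 1.0 - ssim_global(downscale(output_img), input_img)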
2) L1 norm of the weight-to-bias ratio:

Here the mathematical L1, or L1 norm, refers to the mean absolute value. In general, it is convenient for a convolutional neural network to have large bias parameters (B), because the biases separate features into classes that are processed independently of one another, while the values of those features depend on the filter weights (W) of the convolutional layers. It is therefore desirable to impose, via a regularization term, biases (B) that are larger compared with the weights (W):

$$\mathrm{WBratioReg}(W, B) = \frac{L_{1}(W)}{L_{1}(B) + \mathrm{eps}},$$

where

$$L_{1}(W) = \mathrm{mean}\,\big(|W|\big)$$

(i.e., the mean absolute value taken as a sum over all layers L, features F, and filter elements N × M), and

$$L_{1}(B) = \mathrm{mean}\,\big(|B|\big)$$

(i.e., the sum taken over all layers L and features F), and eps is a fixed small number that prevents infinitely large or infinitesimal results in the division, for example eps = 1e−6.
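By way of example and not limitation, the following NumPy sketch computes WBratioReg from per-layer weight and bias arrays.

import numpy as np

def wb_ratio_reg(weights, biases, eps=1e-6):
    """L1(W)/L1(B): mean absolute filter weight over all layers, features, and
    filter elements, divided by the mean absolute bias over all layers and
    features; eps guards the division. weights and biases are lists of
    arrays, one array per layer."""
    l1_w = np.abs(np.concatenate([np.ravel(w) for w in weights])).mean()
    l1_b = np.abs(np.concatenate([np.ravel(b) for b in biases])).mean()
    return l1_w / (l1_b + eps)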
Specific formula examples of the generator cost function and the discriminator cost function have been presented above; however, this is not a limitation. The generator cost function and the discriminator cost function may take other forms, provided the generator cost function is based on the parameters of the discriminator network (making use of the discriminator network's output) and the discriminator cost function is based on the parameters of the generator network (making use of the generator network's output image).
Since the purpose of the generator cost function and the discriminator cost function is to be continually reduced toward their minima so as to obtain the optimal parameter sets of the generator network and the discriminator network, the optimal parameter sets may finally be obtained using the standard stochastic gradient descent (SGD) algorithm, momentum SGD, Adam, RMSProp, AdaDelta, the Wasserstein Generative Adversarial Network (WGAN) and its improved algorithms, and the like. All of these are existing algorithms whose detailed principles are not repeated here.
Thus, according to the various embodiments of the present disclosure and the novel cost functions, the two mutually "adversarial" networks, the generator network and the discriminator network, compete and continually improve in each iteration against the other network's ever-better results, so as to train increasingly good and eventually optimal parameters.
Video upscaling and frame-rate up-conversion
The various embodiments of the present disclosure can also be used to upscale the resolution of a video sequence. A simple approach is to upscale the video frames one by one. For large upscaling factors (large resolution-multiplication factors), this strategy can be problematic, because the motion of edges and objects in the output frames can produce visible flicker.
In one embodiment, an average of several video frames may be used as the input:

$$\mathrm{INPUT}_{t} = \sum_{k=A}^{B} c_{k}\,\mathrm{FRAME}_{t+k},$$

where c_k is a fixed weight assigned to each frame and A, B are integers (positive or negative). According to research, such a linear combination input to the generator network more readily produces realistic artificial output and smooth motion consistent with the input images.
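By way of example and not limitation, the following NumPy sketch forms such a weighted average of neighbouring frames; clipping the indices at the sequence ends is an added assumption.

import numpy as np

def averaged_input(frames, t, coeffs, A, B):
    """INPUT_t = sum_{k=A..B} c_k * FRAME_{t+k}: a weighted average of
    neighbouring frames fed to the generator. frames is a sequence of image
    arrays; coeffs holds the B - A + 1 fixed weights c_k."""
    idx = np.clip(np.arange(t + A, t + B + 1), 0, len(frames) - 1)
    return sum(c * frames[i] for c, i in zip(coeffs, idx))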
FIG. 8 illustrates an exemplary processing system that can be used to implement the processing method of the present disclosure.
The processing system 1000 includes at least one processor 1002 that executes instructions stored in a memory 1004. These instructions may be, for example, instructions for implementing the functions described as being performed by one or more of the modules above, or instructions for implementing one or more steps of the methods above. The processor 1002 may access the memory 1004 through a system bus 1006. In addition to storing executable instructions, the memory 1004 may also store training data and the like. The processor 1002 may be any of various devices with computing capability, such as a central processing unit (CPU) or a graphics processing unit (GPU). The CPU may be an X86 or ARM processor; the GPU may be integrated directly onto the motherboard alone, built into the motherboard's northbridge chip, or built into the central processing unit (CPU).
The processing system 1000 further includes a data store 1008 accessible by the processor 1002 through the system bus 1006. The data store 1008 may include executable instructions, multi-image training data, and the like. The processing system 1000 further includes an input interface 1010 that allows external devices to communicate with the processing system 1000; for example, the input interface 1010 may be used to receive instructions from an external computer device, from a user, etc. The processing system 1000 may also include an output interface 1012 that interfaces the processing system 1000 with one or more external devices; for example, the processing system 1000 may display images and the like through the output interface 1012. It is contemplated that external devices that communicate with the processing system 1000 through the input interface 1010 and the output interface 1012 may be included in an environment providing substantially any type of user interface with which a user can interact. Examples of user-interface types include graphical user interfaces, natural user interfaces, and the like. For example, a graphical user interface may accept input from a user employing input device(s) such as a keyboard, mouse, remote control, or the like, and provide output on an output device such as a display. Furthermore, a natural user interface may enable a user to interact with the processing system 1000 in a manner free from constraints imposed by input devices such as keyboards, mice, and remote controls. Instead, a natural user interface may rely on speech recognition, touch and stylus recognition, gesture recognition on and near the screen, mid-air gestures, head and eye tracking, voice and speech, vision, touch, gestures, machine intelligence, and so forth.
In addition, although the processing system 1000 is shown in the figure as a single system, it will be appreciated that the processing system 1000 may also be a distributed system and may also be arranged as a cloud facility (including a public or private cloud). Thus, for example, several devices may communicate over a network connection and may jointly perform the tasks described as being performed by the processing system 1000.
The functions described herein (including but not limited to convolutional-neural-network modules, selection modules, etc.) may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a (non-transitory) computer-readable medium as one or more instructions or code. Computer-readable media include computer-readable storage media. A computer-readable storage medium may be any available storage medium that can be accessed by a computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, a propagated signal is not included within the scope of computer-readable storage media. Computer-readable media also include communication media, including any medium that facilitates transfer of a computer program from one place to another. A connection may be, for example, a communication medium. For example, if software is transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of communication media. Combinations of the above should also be included within the scope of computer-readable media. Alternatively or additionally, the functions described herein may be performed at least in part by one or more hardware logic components. For example, illustrative types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-chip (SOCs), complex programmable logic devices (CPLDs), and the like.
The above are merely exemplary embodiments of the present disclosure and are not intended to limit the protection scope of the present disclosure, which is determined by the appended claims.

Claims (16)

  1. A processing method for a convolutional neural network, comprising training a generator and training a discriminator, wherein training the generator comprises:
    extracting a low-resolution color image from a high-resolution color image;
    using the low-resolution color image and a noise image as input images, training parameters of a generator network, based on parameters of a discriminator network, by reducing a generator cost function, wherein:
    the generator network comprises upscaling layers for upscaling the resolution of a luminance component and chrominance components of the input image; and the generator cost function represents a degree of difference between an output image of the generator network and the high-resolution color image;
    and wherein training the discriminator comprises:
    inputting the output image of the trained generator network and the high-resolution color image separately into the discriminator network;
    training parameters of the discriminator network by reducing a discriminator cost function, wherein:
    the discriminator network comprises a pooling layer for reducing resolution; and the discriminator cost function represents a degree to which the output image of the generator network corresponds to the high-resolution color image.
  2. The processing method according to claim 1, further comprising performing the steps of training the generator network and training the discriminator network alternately.
  3. The processing method according to claim 1 or 2, wherein the generator cost function consists of a first term, a second term, and an optional third term,
    the first term being based on an output of the discriminator network for the output image of the generator network;
    the second term being based on a difference between the low-resolution color image and a downscaled image obtained by subjecting the output image of the generator network to the same downscaling used to go from the high-resolution color image to the low-resolution color image;
    the third term being based on a ratio between magnitudes of filter weights of convolutional layers and magnitudes of biases of activation layers included in the parameters of the generator network.
  4. The processing method according to any one of claims 1-3, wherein the discriminator cost function consists of a first term, a second term, and an optional third term,
    the first term being based on an output of the discriminator network for the output image of the generator network;
    the second term being based on an output of the discriminator network for the high-resolution color image;
    the third term being based on an output of the discriminator network for a combination of the output image of the generator network and the high-resolution color image.
  5. The processing method according to any one of claims 1-4, wherein the resolution of the luminance component is upscaled to the same degree as that of the chrominance components, and wherein the generator network comprises any one of:
    a first generator network having the same number of first upscaling layers for both the luminance component and the chrominance components;
    a second generator network having a certain number of first upscaling layers for the luminance component and, for the chrominance components, a smaller number of first upscaling layers together with second upscaling layers different from the first upscaling layers;
    a third generator network having a certain number of first upscaling layers for the luminance component and, for the chrominance components, second upscaling layers different from the first upscaling layers.
  6. The processing method according to any one of claims 1-5, wherein:
    the upscaling layer is inserted between a convolutional layer and an activation layer of the generator network, and the downscaling layer is inserted between a convolutional layer and an activation layer of the discriminator network;
    the parameters of the generator network include filter weights of the convolutional layers, biases of the activation layers, and upscaling parameters of the upscaling layers in the generator network;
    the parameters of the discriminator network include biases of the activation layers, filter weights of the convolutional layers, and downscaling parameters of the downscaling layers in the discriminator network;
    at initialization, the parameters of the generator network and the parameters of the discriminator network are predetermined or random.
  7. The processing method according to claim 5, wherein the activation layer is a switch unit for turning on when an activation condition is satisfied.
  8. The processing method according to any one of claims 1-7, wherein the discriminator network further comprises an averager for averaging all pixels of the image whose resolution has been reduced by the pooling layer, to obtain an indicator of whether the input of the discriminator network is the output image of the trained generator network or the high-resolution color image.
  9. The processing method according to any one of claims 1-8, wherein extracting the low-resolution color image comprises:
    segmenting a series of high-resolution color sample images to obtain a plurality of high-resolution color images smaller in size than the high-resolution color sample images;
    downscaling the plurality of high-resolution color images to obtain a plurality of low-resolution color images of reduced resolution.
  10. The processing method according to any one of claims 1-9, wherein the noise image is a white-noise image.
  11. The processing method according to any one of claims 1-10, wherein the upscaling layer copies pixels input to the upscaling layer to a plurality of different positions in an output of the upscaling layer having a higher resolution than the input pixels.
  12. The processing method according to any one of claims 1-11, wherein the pooling layer comprises at least one of a downscaling layer corresponding to the upscaling layer, a max-pooling layer, and an average-pooling layer.
  13. The processing method according to any one of claims 1-12, wherein the resolution of the output image of the generator network is the same as the resolution of the high-resolution color image.
  14. The processing method according to any one of claims 1-13, wherein the low-resolution color image is an average of a plurality of consecutive video frames.
  15. A processing system for a convolutional neural network, comprising:
    one or more processors;
    one or more memories storing computer-readable code which, when executed by the one or more processors, performs the processing method according to any one of claims 1-14.
  16. A non-transitory computer storage medium storing computer-readable code which, when executed by one or more processors, performs the processing method according to any one of claims 1-14.