WO2019091459A1 - Image processing method, processing apparatus, and processing device - Google Patents

Image processing method, processing apparatus, and processing device (图像处理方法、处理装置和处理设备)

Info

Publication number
WO2019091459A1
WO2019091459A1 · PCT/CN2018/114848 · CN2018114848W
Authority
WO
WIPO (PCT)
Prior art keywords
image
neural network
training
output
resolution
Prior art date
Application number
PCT/CN2018/114848
Other languages
English (en)
French (fr)
Inventor
刘瀚文 (Liu Hanwen)
那彦波 (Na Yanbo)
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to JP2020526028A (patent JP7438108B2)
Priority to EP18876502.8A (patent EP3709255A4)
Publication of WO2019091459A1

Classifications

    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06N 3/02: Neural networks
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/08: Learning methods
    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 5/60: Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T 5/70: Denoising; Smoothing
    • G06V 10/764: Image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • H04N 7/0117: Conversion of standards involving conversion of the spatial resolution of the incoming video signal
    • G06T 2207/10024: Color image
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • H04L 63/0428: Network security wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Definitions

  • The present disclosure relates to image processing, and more particularly to an image processing method, a processing apparatus, and a processing device.
  • An embodiment of the present disclosure provides an image processing method, including: performing image conversion processing on an input image according to the input image and a first noise image by using a generative neural network, to output a converted first output image; and performing high-resolution conversion processing on the first output image according to the first output image and a second noise image by using a super-resolution neural network, to output a second output image.
  • The input image includes a first color channel, a second color channel, and a third color channel; the first noise image includes N channels, where N is a positive integer greater than or equal to 1; and the second noise image includes M channels, where M is a positive integer greater than or equal to 1.
  • the input of the generated neural network includes a first noise image channel and a first color channel, a second color channel, and a third color channel of the input image;
  • the output of the generated neural network is a first output image that includes a first color channel, a second color channel, and a third color channel.
  • The generative neural network includes one or more downsampling modules, one or more residual modules, and one or more upsampling modules, wherein: the downsampling module includes a convolution layer, a downsampling layer, and an instance normalization layer connected in sequence; the residual module includes a convolution layer and an instance normalization layer connected in sequence; and the upsampling module includes an upsampling layer, an instance normalization layer, and a convolution layer connected in sequence, where the number of upsampling modules is equal to the number of downsampling modules (see the illustrative sketch below).
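  • For illustration only, the module stack described above can be sketched in PyTorch roughly as follows. The layer counts, channel widths, kernel sizes, and the use of average pooling and nearest-neighbor upsampling are assumptions made for the sketch, not parameters fixed by the disclosure.

```python
import torch
import torch.nn as nn

class DownsamplingModule(nn.Module):
    """Convolution layer -> downsampling layer -> instance normalization layer."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.AvgPool2d(2),              # downsampling (pooling) layer
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class ResidualModule(nn.Module):
    """Convolution + instance normalization with a cross-layer connection."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.block(x)          # input superimposed onto the processed output

class UpsamplingModule(nn.Module):
    """Upsampling layer -> instance normalization layer -> convolution layer."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.InstanceNorm2d(in_ch),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.block(x)

class GenerativeNetwork(nn.Module):
    """4 input channels (RGB + one noise channel, N = 1); 3 output channels (RGB).
    Equal numbers of downsampling and upsampling modules keep the output the
    same size as the input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            DownsamplingModule(4, 32),
            DownsamplingModule(32, 64),
            ResidualModule(64),
            ResidualModule(64),
            UpsamplingModule(64, 32),
            UpsamplingModule(32, 3),
        )

    def forward(self, x):
        return self.net(x)
```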
  • An input of the super-resolution neural network includes a second noise image channel and the first color channel, second color channel, and third color channel of the first output image; the output of the super-resolution neural network is a second output image that includes a first color channel, a second color channel, and a third color channel.
  • The super-resolution neural network includes a lifting module and a transform module that are sequentially connected, and performing high-resolution conversion processing using the super-resolution neural network includes: using the lifting module to perform upsampling processing on the first output image and the second noise image and to output a first intermediate image including a luminance channel, a first color difference channel, and a second color difference channel; and using the transform module to convert the first intermediate image output by the lifting module into a second output image including the first color channel, the second color channel, and the third color channel.
  • The lifting module includes a first sub-network, a second sub-network, and a third sub-network, wherein: the input of each sub-network is the first output image and the second noise image, and each sub-network has the same structure, containing the same number of convolution layers and lifting layers (a sketch follows this item).
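  • As a rough sketch of the three-sub-network lifting module; the channel widths, the choice of nn.PixelShuffle as the lifting layer, and the assumption of M = 1 noise channel are all illustrative, not taken from the disclosure:

```python
import torch
import torch.nn as nn

class LiftingSubNetwork(nn.Module):
    """One of three structurally identical sub-networks; outputs one channel
    (Y, U, or V) at 4x the input resolution."""
    def __init__(self, in_ch=4, scale=4):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(scale // 2):       # one lifting sub-module per 2x factor
            layers += [
                nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, 3, padding=1),
                nn.PixelShuffle(2),       # 2*2 MUX layer: 64 -> 16 channels, 2x size
            ]
            ch = 16
        layers += [nn.Conv2d(ch, 1, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class LiftingModule(nn.Module):
    """Three sub-networks share the same (first output image + noise) input
    and jointly produce the YUV first intermediate image."""
    def __init__(self):
        super().__init__()
        self.y, self.u, self.v = (LiftingSubNetwork() for _ in range(3))

    def forward(self, x):                 # x: (batch, 4, H, W) = RGB + noise
        return torch.cat([self.y(x), self.u(x), self.v(x)], dim=1)
```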
  • The image processing method further includes: generating, by the generative neural network, a first training output image according to a first training image and a first training noise image; generating, by the generative neural network, a second training output image according to the first training image and a second training noise image, wherein the second training noise image is different from the first training noise image; and training the generative neural network based on the first training image, the first training output image, and the second training output image.
  • Training the generative neural network includes: inputting the first training output image to an authentication neural network, which outputs an authentication tag indicating whether the first training output image has a conversion feature; inputting the second training output image to the authentication neural network, which outputs an authentication tag indicating whether the second training output image has the conversion feature; and using a first loss calculation unit to calculate the loss value of the generative neural network according to the first training image, the first training output image, the second training output image, and the corresponding authentication tags, and to optimize the parameters of the generative neural network.
  • The first loss calculation unit includes an analysis network, a first loss calculator, and an optimizer. Calculating the loss value of the generative neural network by using the first loss calculation unit includes: using the analysis network to output content features of the first training image, the first training output image, and the second training output image, as well as style features of the first training output image and the second training output image; using the first loss calculator to calculate the loss value of the generative neural network according to a first loss function, based on the content features and style features extracted by the analysis network and the authentication tags of the first and second training output images; and using the optimizer to optimize the parameters of the generative neural network according to the calculated loss value.
  • The first loss function includes a style difference loss function, and calculating the loss value of the generative neural network includes: using the first loss calculator to calculate a style loss value of the generative neural network according to the style difference loss function, based on the style feature of the first training output image and the style feature of the second training output image.
  • The first loss function further includes a content loss function, and calculating the loss value of the generative neural network includes: calculating a content loss value of the generative neural network according to the content loss function, based on the content features of the first training image, the first training output image, and the second training output image.
  • The image processing method further includes: extracting a low-resolution image from a first sample image as a super-resolution training image, the resolution of the super-resolution training image being lower than the resolution of the first sample image; outputting a second sample image according to the super-resolution training image and a super-resolution training noise image by using the super-resolution neural network, the resolution of the second sample image being equal to the resolution of the first sample image; and, based on the first sample image and the second sample image, optimizing the parameters of the super-resolution neural network by reducing the cost function of the super-resolution neural network.
  • An embodiment of the present disclosure further provides an image processing apparatus, including: a generative neural network module configured to perform image conversion processing on an input image according to the input image and a first noise image, to output a converted first output image; and
  • the super-resolution neural network module is configured to perform high-resolution conversion processing on the first output image according to the first output image and the second noise image to output the second output image.
  • The input image includes a first color channel, a second color channel, and a third color channel; the input of the generative neural network module includes a first noise image channel and the first color channel, second color channel, and third color channel of the input image; and the output of the generative neural network module is a first output image comprising a first color channel, a second color channel, and a third color channel.
  • The generative neural network module includes one or more downsampling modules, one or more residual modules, and one or more upsampling modules, wherein: the downsampling module includes a convolution layer, a downsampling layer, and an instance normalization layer connected in sequence; the residual module includes a convolution layer and an instance normalization layer connected in sequence; and the upsampling module includes an upsampling layer, an instance normalization layer, and a convolution layer connected in sequence, where the number of upsampling modules is equal to the number of downsampling modules.
  • An input of the super-resolution neural network module includes a second noise image channel and the first color channel, second color channel, and third color channel of the first output image; the output of the super-resolution neural network module is a second output image that includes a first color channel, a second color channel, and a third color channel.
  • The super-resolution neural network module includes a lifting module and a transform module that are sequentially connected, wherein: the lifting module is configured to perform upsampling processing on the first output image and the second noise image and to output a first intermediate image including a luminance channel, a first color difference channel, and a second color difference channel; and the transform module is configured to transform the first intermediate image output by the lifting module into a second output image including a first color channel, a second color channel, and a third color channel.
  • the lifting module includes a first sub-network, a second sub-network, and a third sub-network.
  • The inputs of the first sub-network, the second sub-network, and the third sub-network are the first output image and the second noise image; the output first intermediate image has 3 channels, including a luminance channel, a first color difference channel, and a second color difference channel.
  • The first sub-network, the second sub-network, and the third sub-network have the same structure, and each includes at least one lifting sub-module, each of the lifting sub-modules including one or more convolution layers and one lifting layer.
  • The image processing apparatus further includes a training neural network module configured to train the generative neural network module according to the output images of the generative neural network module.
  • The generative neural network module further outputs a converted first training output image according to a first training image and a first training noise image, and further outputs a converted second training output image according to the first training image and a second training noise image, wherein the second training noise image is different from the first training noise image; the training neural network module trains the generative neural network module based on the first training image, the first training output image, and the second training output image.
  • The training neural network module includes: an authentication neural network module configured to output authentication tags indicating whether the first training output image and the second training output image have a conversion feature; and a first loss calculation unit configured to calculate a loss value of the generative neural network module according to the first training image, the first training output image, the second training output image, and the corresponding authentication tags, and to optimize the parameters of the generative neural network module, wherein the first loss calculation unit includes: an analysis network configured to output content features of the first training image, the first training output image, and the second training output image; a first loss calculator configured to calculate the loss value of the generative neural network module according to a first loss function, based on the content features and style features extracted by the analysis network and the authentication tags of the first training output image and the second training output image; and an optimizer configured to optimize the parameters of the generative neural network module according to the calculated loss value.
  • The first loss function includes a style difference loss function for calculating a style loss value of the generative neural network module according to the style feature of the first training output image and the style feature of the second training output image;
  • The first loss function further includes a content loss function for calculating a content loss value of the generative neural network module according to the content features of the first training image, the first training output image, and the second training output image.
  • The training neural network module is further configured to train the super-resolution neural network module according to the output of the super-resolution neural network module. The super-resolution neural network module further outputs a second sample image based on a super-resolution training image and an acquired super-resolution training noise image, wherein the super-resolution training image is a low-resolution image extracted from a first sample image. The training neural network module further comprises a second authentication neural network module configured to output an authentication tag based on the first sample image and the second sample image, wherein the optimizer optimizes the parameters of the super-resolution neural network module by reducing the cost function of the super-resolution neural network module.
  • Embodiments of the present disclosure also provide an image processing apparatus including one or more processors and one or more memories.
  • the memory stores computer readable code that, when executed by the one or more processors, performs the image processing method described above or implements the image processing device described above.
  • FIG. 1 is a flowchart showing an example of an image processing method provided by an embodiment of the present disclosure
  • FIG. 2 shows an example structural diagram of a neural network for implementing the image processing method described in FIG. 1;
  • FIG. 3 is a block diagram showing a specific example structure of the generated neural network in FIG. 2;
  • Figure 4 shows an example schematic of a lifting layer
  • FIG. 5 is a schematic diagram showing an example structure of the super-resolution neural network of FIG. 2;
  • FIG. 6 is a block diagram showing a specific example structure of the super-resolution neural network of FIG. 5;
  • Figure 7 shows an example flow diagram of training the generative neural network;
  • Figure 8 shows an example block diagram of training the generative neural network;
  • FIG. 9 shows a specific example structural diagram of an analysis network
  • FIG. 10 shows a specific example structural diagram of an authentication neural network
  • Figure 11 shows an example flow diagram for training a super-resolution neural network
  • FIG. 12 shows a specific example structural diagram of a second authentication neural network
  • FIG. 13 is a block diagram showing a schematic example of an image processing apparatus provided by an embodiment of the present disclosure.
  • FIG. 14 shows a schematic example block diagram of an image processing device provided by an embodiment of the present disclosure.
  • Embodiments of the present disclosure provide an image processing method, a processing apparatus, and a processing device for implementing image conversion.
  • The image processing method, processing apparatus, and processing device perform image conversion based on a generative neural network, a super-resolution neural network, and content perception.
  • the detail information of the converted image is generated by adding a noise image to the input.
  • the content feature loss function is used to train the generated neural network to ensure that the converted output image has content consistency with the input image, and the neural network is trained by using the style difference loss function between the processing results to ensure the diversity between the output results.
  • This makes the system simple and easy to train. On this basis, the resolution of the converted image produced by the generative neural network is improved by using the super-resolution neural network, yielding a high-resolution converted image that meets product requirements for image resolution.
  • In step S110, an input image to be subjected to image conversion processing is acquired. The input image carries the raw information and includes a first color channel, a second color channel, and a third color channel, which in some embodiments of the present disclosure are the RGB three channels, although the disclosure is not limited thereto.
  • In step S120, a first noise image and a second noise image are acquired, wherein the first noise image includes N channels, and N is a positive integer greater than or equal to 1. In some embodiments, the first noise image may be different from the second noise image.
  • N may be 1, that is, the first noise image is input as a fourth channel to the generated neural network together with the RGB channel information of the input image.
  • the noise may be random noise such as Gaussian noise.
  • N may also be 3, and an input image containing noise information is generated by adding 3 channels of the first noise image to the RGB channels of the original image desired to be subjected to image conversion processing, respectively.
  • the generating neural network performs image conversion processing on the original image according to the input image. This situation is not described in this specification. Since each input noise image contains random noise, multiple image processing operations performed by the same set of generated neural networks according to the same input image can obtain conversion results with different detail information, that is, bring about diversity of conversion results. In addition, the order in which the input image is acquired and the noise image is acquired on the flow does not affect the image processing result.
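  • A minimal sketch of assembling the N = 1 case described above, where fresh Gaussian noise is appended as a fourth input channel on every call (the noise standard deviation is an assumption for illustration):

```python
import torch

def make_generator_input(rgb: torch.Tensor, noise_std: float = 1.0) -> torch.Tensor:
    """Append one channel of Gaussian noise to an RGB batch of shape (B, 3, H, W).

    Because fresh noise is drawn on every call, repeated conversions of the
    same image yield results with different detail information.
    """
    b, _, h, w = rgb.shape
    noise = torch.randn(b, 1, h, w, device=rgb.device) * noise_std
    return torch.cat([rgb, noise], dim=1)   # shape (B, 4, H, W)
```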
  • The acquired input image is input to the generative neural network along with the first noise image (e.g., in some embodiments, depending on the particular implementation of the generative neural network, the input image may be superimposed with the first noise image and input to the generative neural network as a single piece of image data, or the input image and the first noise image may be input into the generative neural network separately as different data channels), and the image processing operation (for example, image conversion processing) is completed.
  • the generated neural network outputs a first output image subjected to image conversion processing, the first output image having 3 channels, which is an RGB three channel in the embodiment of the present disclosure, but the present disclosure is not limited thereto.
  • the generated neural network can implement different image processing through different training processes, such as image style, scene, season, effect or image conversion based on other features.
  • The first output image produced by the generative neural network is input to the super-resolution neural network along with the second noise image (e.g., in some embodiments, depending on the specific implementation of the super-resolution neural network, the first output image may be superimposed with the second noise image and input to the super-resolution neural network as a single piece of image data, or the first output image and the second noise image may be input into the super-resolution neural network separately as different data channels), and a high-resolution conversion process is completed to improve the resolution of the first output image, wherein the second noise image includes M channels, and M is a positive integer greater than or equal to 1.
  • M may be 1.
  • the second noise image is input as a separate channel to the super-resolution neural network for generating image detail information during the super-resolution conversion process.
  • M may also be 3, and a first output image containing noise information is generated by adding 3 channels of the second noise image to the RGB channels of the first output image, respectively.
  • the super-resolution neural network performs resolution enhancement processing on the first output image, and such a situation is not described in this specification.
  • In step S160, the super-resolution neural network outputs a second output image with improved resolution. Since the super-resolution neural network incorporates the information of the second noise image in the process of performing the resolution-enhancement processing, multiple image processing operations performed by the same set of super-resolution neural networks on the same input image can obtain outputs with different detail information, which further contributes to the diversity of the conversion results.
  • An example structural diagram of a neural network for implementing the above image processing method is shown in FIG. 2, which mainly includes two parts: a generative neural network and a super-resolution neural network. FIG. 3 shows a specific example structure of the generative neural network in FIG. 2. The generative neural network is described in detail below with reference to FIGS. 2 and 3.
  • The input of the generative neural network includes the three channels (features) of the input image, specifically a first color channel, a second color channel, and a third color channel, which are the RGB channels in the embodiment of the present disclosure; in addition to these three channels, the input further includes the first noise image channel.
  • the output of the generated neural network is a first output image having 3 channels, which is an RGB three channel in the embodiment of the present disclosure, but the present disclosure is not limited thereto.
  • the generating neural network includes one or more downsampling modules, one or more residual modules, and one or more upsampling modules.
  • The depth of the generative neural network is determined by the numbers of downsampling modules, residual modules, and upsampling modules, which are chosen according to the specific conversion application.
  • the number of downsampling modules and upsampling modules may be the same to ensure that the output image has the same image size as the input image.
  • the downsampling module is configured to perform convolution processing on the input image and the noise image to extract image features and reduce the size of the feature image.
  • the residual module further processes the feature image output by the downsampling module by convolution without changing the feature image size.
  • the upsampling module is configured to perform amplification and normalization processing on the feature image output by the residual module, and output an output image after converting the feature.
  • the conversion feature of the output image is determined by the parameters of the generated neural network. According to the conversion application, the generated neural network is trained by using the training image, and the parameters are optimized to achieve the conversion purpose.
  • The image conversion application may be a conversion of image style, season, effect, scene, etc., for example, converting a landscape image into an image with Van Gogh's stylistic features, converting an image with summer features into an image with winter features, converting an image of a brown horse to have zebra characteristics, or even converting a cat into a dog.
  • the downsampling module includes a convolutional layer, a downsampling layer, and an instance normalization layer that are sequentially connected.
  • a convolution kernel is only connected to a portion of the pixels of the output feature image of the previous convolutional layer, and the convolutional layer can apply a number of convolution kernels to the input image to extract multiple types of features.
  • Each convolution kernel can extract a type of feature.
  • the convolution kernel can achieve reasonable weight by learning.
  • the result obtained by applying a convolution kernel to the input image is referred to as a feature image, the number of which is the same as the number of convolution kernels.
  • Each feature image consists of a number of pixels arranged in a rectangle, produced by convolving the input with a convolution kernel.
  • All pixels of the same feature image share the weights of one convolution kernel.
  • the feature image outputted by one layer of the convolution layer is processed by the next convolution layer to obtain a new feature image.
  • The input image can be processed by a convolution layer to obtain its content features, and those content features can be processed by the convolution layer of the next level to obtain style features.
  • the downsampling layer may downsample the image (for example, may be a pooling layer), reduce the size of the feature image without changing the number of feature images, perform feature compression, and extract the main features.
  • the downsampling layer can reduce the size of the feature image to simplify the computational complexity and reduce the overfitting phenomenon to some extent.
  • The instance normalization layer is used to normalize the feature images output by the previous level; in the embodiment of the present disclosure it normalizes according to the mean and variance of each individual feature image. Assume that the batch size used in training the generative neural network (for example, with the mini-batch training method) is T, the number of feature images output by a convolutional layer is C, and each feature image is a matrix of H rows and W columns, so that the set of feature images is represented as (T, C, W, H). The normalization formula is then:

    $$y_{tijk}=\frac{x_{tijk}-\mu_{ti}}{\sqrt{\sigma_{ti}^{2}+\varepsilon}},\qquad \mu_{ti}=\frac{1}{HW}\sum_{j=1}^{W}\sum_{k=1}^{H}x_{tijk},\qquad \sigma_{ti}^{2}=\frac{1}{HW}\sum_{j=1}^{W}\sum_{k=1}^{H}\left(x_{tijk}-\mu_{ti}\right)^{2}$$

  • Here x_tijk is the value in the jth column and kth row of the ith feature image of the tth batch in the feature image set output by a convolutional layer, y_tijk denotes the result of x_tijk after processing by the instance normalization layer, and ε is a positive number with a small value that prevents the denominator from being 0.
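  • The formula above corresponds to per-feature-image normalization; a minimal sketch, equivalent in effect to torch.nn.InstanceNorm2d:

```python
import torch

def instance_norm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Normalize each feature image of a (T, C, H, W) batch by its own
    mean and variance, as in the formula above."""
    mean = x.mean(dim=(2, 3), keepdim=True)                 # per (t, i) feature image
    var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)
```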
  • The residual module includes both a convolution layer and an instance normalization layer and also includes a cross-layer connection, so that the residual module has two parts: one part is a processing part having a convolution layer and an instance normalization layer; the other part is a cross-layer part that does not process the input, the cross-layer connection directly superimposing the input of the residual module onto the output of the processing part.
  • Introducing cross-layer connections in the residual module brings greater flexibility to the generative neural network. After training of the generative neural network is completed, in the deployment phase of the system, the degree to which the processing part and the cross-layer part of the residual module influence the image processing result can be determined, and the structure of the generative neural network can be tailored accordingly to improve the operating efficiency and processing rate of the network. For example, if it is judged that the effect of the cross-layer connection on the image processing result is much larger than that of the processing part, only the cross-layer part of the residual module may be used when performing image processing with the generative neural network, thereby improving the processing efficiency of the network.
  • the upsampling module includes an upsampling layer, an instance normalization layer, and a convolution layer that are sequentially connected, for extracting features of the input image, and normalizing the feature image.
  • The upsampling layer may be a lifting layer (or MUX layer) that performs pixel interleaving and rearrangement processing on the input images, increasing the size of each image while keeping the total amount of pixel information unchanged.
  • The MUX layer thus increases the number of pixels per image by arranging and combining pixels across different images.
  • Figure 4 shows an example schematic of upsampling using a 2*2 MUX layer.
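  • The 2*2 MUX interleaving of Figure 4 matches the behavior of torch.nn.PixelShuffle, which can serve as a concrete stand-in:

```python
import torch

mux = torch.nn.PixelShuffle(2)    # 2*2 MUX: 4 feature images -> 1 larger image
x = torch.randn(1, 4, 8, 8)       # 4 feature images of size 8x8
y = mux(x)
print(y.shape)                    # torch.Size([1, 1, 16, 16]); pixel count preserved
```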
  • The N channels of the first noise image and the channels of the input image are input into the generative neural network; after the processing by the downsampling modules, residual modules, and upsampling modules described above, the feature images are extracted and a first output image having the conversion features is finally output.
  • The noise image carries random noise used to generate detail information in the first output image, and since every input noise image is different, converted images that differ in detail can be obtained even if the same input image is input twice to the same generative neural network, which enriches the detail in the converted images and provides a better user experience.
  • FIG. 5 shows an example structural diagram of the super-resolution neural network shown in FIG. 2, and FIG. 6 shows a specific example structure of that network. The super-resolution neural network is described in detail below with reference to FIGS. 2, 5, and 6.
  • the input of the super-resolution neural network includes a second noise image channel and a first color channel, a second color channel, and a third color channel of the first output image.
  • the output of the super-resolution neural network is a second output image processed by high resolution conversion, which includes a first color channel, a second color channel, and a third color channel, although the disclosure is not limited thereto.
  • the first color channel, the second color channel, and the third color channel may be RGB channels.
  • the second noise image has random noise such as Gaussian noise for generating image detail information during high-resolution image conversion of the super-resolution neural network, so that the output second output image has higher resolution. It also includes image detail information, that is, the output has image diversity.
  • The super-resolution neural network includes a lifting module and a transform module that are sequentially connected, wherein the high-resolution conversion processing using the super-resolution neural network includes: using the lifting module to perform upsampling processing on the first output image and the second noise image and output a first intermediate image including a luminance channel, a first color difference channel, and a second color difference channel, which are the YUV three channels in the embodiment of the present disclosure; and using the transform module to transform the first intermediate image into a second output image including a first color channel, a second color channel, and a third color channel, which are the RGB three channels in the embodiment of the present disclosure.
  • the first intermediate image has an improved image resolution compared to the first output image, and the multiple of the image resolution is determined by a specific structure of the lifting module.
  • the lifting module may increase the number of pixels of the input image by 16 times, which is called a 4*4 lifting module, that is, if the number of pixels of the first output image is m*n, Then, the number of pixels of the first intermediate image outputted after being processed by the 4*4 lifting module is 4m*4n.
  • the first intermediate image with increased resolution and image detail information is converted to a second output image having RGB three channels via a transform module.
  • A specific example block diagram of a super-resolution neural network including a 4*4 lifting module is shown in FIG. 6.
  • The 4*4 lifting module includes a first sub-network, a second sub-network, and a third sub-network, wherein the input of each sub-network is the first output image and the second noise image, and each sub-network has the same structure, that is, contains the same number of convolutional layers CO and lifting layers MUX. It should be understood that the specific parameters of each sub-network are different.
  • The super-resolution neural network may include a plurality of lifting modules, and the lifting module may include a plurality of sub-networks, which are three sub-networks in the embodiment of the present disclosure.
  • In other embodiments, the lifting module may include one or more sub-networks, and may also incorporate amplification of image resolution by standard techniques such as bicubic interpolation.
  • Each sub-network includes at least one lifting sub-module, and each lifting sub-module includes at least one convolution layer and one MUX layer connected in sequence.
  • Each sub-network may further include at least one convolution layer after the plurality of lifting sub-modules.
  • Each of the lifting sub-modules in each sub-network specifically includes two convolutional layers CO and one MUX layer connected in sequence (the specific structure is as shown in FIG. 6), and the convolutional layers CO are used to extract image features.
  • the MUX layer is configured to perform upsampling processing on the feature image extracted by the convolution layer.
  • the specific functions of the convolution layer and the MUX layer are the same as those in the above-described generation neural network, and are not described herein again.
  • the first sub-network outputs luminance channel information of the first intermediate image, that is, Y channel information
  • The second sub-network outputs first color difference channel information of the first intermediate image, that is, U channel information.
  • the third sub-network outputs second color difference channel information of the first intermediate image, that is, V channel information, but the disclosure is not limited thereto.
  • a first intermediate image comprising a YUV channel is transformed by the transform module into a second output image comprising RGB channels.
  • The resolution of the lower-resolution first output image generated by the generative neural network is improved by the super-resolution network, and a higher-resolution second output image is finally output, so that the image conversion result better satisfies product requirements for image resolution and provides a better user experience.
  • Figure 7 shows an example flow diagram for training the generated neural network
  • Figure 8 shows an example block diagram for training the generated neural network. Next, the process of training the generated neural network will be specifically described with reference to FIGS. 7 and 8.
  • a first training image I1 including three channels is acquired.
  • The first training image I1 may be an image similar to the input image described in connection with FIG. 1.
  • the first training noise image N1 and the second training noise image N2 are acquired, wherein the noise images N1 and N2 have different random noises, for example, may be Gaussian noise.
  • The first training noise image N1 and/or the second training noise image N2 may be a noise image similar to the first noise image described in connection with FIG. 1.
  • In step S730, the generative neural network generates a first training output image Ra according to the first training image I1 and the first training noise image N1, and generates a second training output image Rb according to the first training image I1 and the second training noise image N2. The flow by which the generative neural network converts the input image according to the input image and a noise image, to output a converted image, is the same as the flow shown in FIG. 1 and is not repeated here.
  • In step S740, the generative neural network is trained based on the first training image I1, the first training output image Ra, and the second training output image Rb.
  • the training aims to optimize the parameters in the network according to the processing results of the generated neural network so that it can complete the conversion target.
  • The specific process of training the generative neural network in step S740 includes: inputting the first training output image Ra to an authentication neural network, which outputs an authentication tag indicating whether the first training output image Ra has the conversion feature; and calculating, by the first loss calculation unit, the loss value of the generative neural network according to the first training image I1, the first training output image Ra, the second training output image Rb, and the authentication tag, and optimizing the parameters of the generative neural network.
  • the first training output image Ra may be input to the authentication neural network together with the second training output image Rb, and the authentication tags are respectively output, together for training the generated neural network.
  • the first loss calculation unit includes an analysis network, a first loss calculator, and an optimizer.
  • The specific structure of the analysis network is shown in FIG. 9; it is composed of a number of convolutional layers and pooling layers and is used to extract content features of the input image.
  • The output of each of the convolutional layers is a set of features extracted from the input image, and the pooling layers are used to reduce the resolution of the feature images and pass them to the next convolutional layer.
  • the feature images after each convolution layer characterize the features of the input image at different levels (such as textures, edges, objects, etc.).
  • The first training image I1, the first training output image Ra, and the second training output image Rb are processed by the analysis network, their content features are extracted, and the extracted content features are input to the first loss calculator.
  • The first loss calculator calculates the loss value of the generative network according to the first loss calculation function, based on the content features of the first training image I1, the first training output image Ra, and the second training output image Rb, together with the authentication tags.
  • The first loss calculator inputs the calculated total loss value of the generative neural network to the optimizer, and the optimizer optimizes the convolution kernels and biases in the convolutional layers of the neural network according to the loss value, to achieve a processing effect closer to the image conversion target.
  • The first loss calculation function includes a style difference loss function for calculating the style loss value of the generative neural network according to the style feature of the first training output image Ra and the style feature of the second training output image Rb.
  • In the analysis network, the output of each convolutional layer is a set of features of the input image. Consider a convolutional layer with N_l convolution kernels, whose output contains N_l feature images, and assume that the size of each feature image is M_l (width x height of the feature image).
  • The output of such a layer can be stored in a matrix $F^{l}\in\mathcal{R}^{N_{l}\times M_{l}}$, where $F_{ij}^{l}$ denotes the value at the jth position in the feature image output by the ith convolution kernel in layer l.
  • The difference between the output images is characterized by the style loss value between the training output images Ra and Rb. The style loss function at layer l is:

    $$E_{l}=\frac{1}{4N_{l}^{2}M_{l}^{2}}\sum_{i,j}\left(G_{ij}^{l}-A_{ij}^{l}\right)^{2}$$

  • where N_l indicates that there are N_l convolution kernels in layer l of the analysis network, the output of that convolution layer includes N_l feature images, and the size of each feature image is M_l (width x height of the feature image). The Gram matrices A^l and G^l are defined as:

    $$G_{ij}^{l}=\sum_{k}F_{ik}^{l}F_{jk}^{l},\qquad A_{ij}^{l}=\sum_{k}\tilde{F}_{ik}^{l}\tilde{F}_{jk}^{l}$$

  • where $G_{ij}^{l}$ is the value at the jth position of the Gram matrix (style feature) corresponding to the ith convolution kernel in the lth convolutional layer, computed from the first training output image Ra, and $A_{ij}^{l}$ is the corresponding value computed from the second training output image Rb.
  • The total style loss is expressed as:

    $$L_{style}=\sum_{l}w_{l}E_{l}$$

  • where w_l is the weight of the layer-l style loss in the total style loss.
  • the style feature may be extracted by analyzing multiple convolution layers in the network, or may be extracted by a convolution layer, which is not specifically limited herein.
  • C3 is a constant used to normalize the results. Because the style loss between the two output results should be as large as possible, the style difference loss to be minimized can be expressed in reciprocal form as:

    $$L_{DVST}=\frac{C3}{L_{style}\left(Ra,Rb\right)+\varepsilon}$$
  • The first loss calculator calculates the style loss value between the output images according to the total style difference loss function L_DVST, based on the style features of the first training output image Ra and the second training output image Rb output by the analysis network, ensuring diversity among the output results (see the sketch below).
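  • A sketch of the style-difference computation, assuming per-layer features of shape (N_l, H, W) and the reciprocal form of L_DVST reconstructed above:

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of one layer's feature images; feat has shape (N_l, H, W)."""
    n_l = feat.shape[0]
    f = feat.reshape(n_l, -1)                 # rows of length M_l = H * W
    return f @ f.t()

def style_diff_loss(feats_ra, feats_rb, weights, c3=1.0, eps=1e-8):
    """L_DVST between the two training outputs Ra and Rb: minimizing the
    reciprocal of the weighted Gram-matrix distance pushes their styles apart."""
    total = 0.0
    for fa, fb, w in zip(feats_ra, feats_rb, weights):
        n_l = fa.shape[0]
        m_l = fa.shape[1] * fa.shape[2]
        e_l = ((gram_matrix(fa) - gram_matrix(fb)) ** 2).sum() / (4 * n_l**2 * m_l**2)
        total = total + w * e_l
    return c3 / (total + eps)
```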
  • the first loss calculation function may further include a content loss function.
  • With I1 as the input image and Ra as the first training output image, let P^l and F^l be their respective feature images output by layer l of the analysis network. The content loss function is then defined as:

    $$L_{content}=\frac{1}{2C1}\sum_{i,j}\left(F_{ij}^{l}-P_{ij}^{l}\right)^{2}$$

  • where C1 is a constant used to normalize the result.
  • In this way, the content loss values L_content_a and L_content_b of the first training output image Ra and the second training output image Rb, each relative to the first training image, can be calculated.
  • Training the generative neural network with the content loss function ensures that the converted image is consistent in content with the input image, so the system is simple and easy to train (a sketch follows).
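  • A corresponding sketch of the per-layer content loss, with F^l and P^l as feature tensors of the same shape and C1 a normalizing constant:

```python
import torch

def content_loss(f_l: torch.Tensor, p_l: torch.Tensor, c1: float = 1.0) -> torch.Tensor:
    """L_content at one layer: squared difference between the output-image
    features F^l and the input-image features P^l, normalized by C1."""
    return ((f_l - p_l) ** 2).sum() / (2.0 * c1)
```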
  • The first loss calculation function may further include a loss function of the generator:

    $$L_{G}=E_{x\sim P_{data}(x)}\left[\log D(x)\right]+E_{z\sim P_{z}(z)}\left[\log\left(1-D(G(z))\right)\right]$$

  • where Pdata is the set of images for which the output of the authentication neural network is 1, Pz is the set of input images of the generative neural network, D is the authentication neural network, and G is the generative neural network. The first loss calculator calculates the adversarial loss value of the generative neural network according to L_G.
  • the first loss calculation function may further include a parameter regularization loss function L L1 .
  • convolution kernels and offsets are parameters that need to be trained.
  • the convolution kernel determines how the input image is processed, and the offset determines whether the output of the convolution kernel is input to the next layer.
  • the offset can be visually compared to a "switch" that determines whether the convolution kernel is "on” or "off.”
  • the network turns on or off different convolution kernels to achieve different processing effects.
  • The mean value of the absolute values of all convolution kernels in the neural network is:

    $$W=\frac{\sum_{w}\left|w\right|}{C_{w}}$$

  • where C_w is the number of convolution kernels in the network. With B likewise denoting the mean absolute value of all biases in the network, the parameter regularization loss function can be written as:

    $$L_{L1}=\frac{W}{B+\varepsilon}$$

  • where ε is a very small positive number used to ensure that the denominator is not zero.
  • It is desired that the biases in the convolutional layers have greater absolute values than the convolution kernels, so that the biases function more effectively as "switches."
  • The first loss calculator calculates the parameter regularization loss value of the generative neural network according to L_L1 (see the sketch below).
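  • A sketch of L_L1 over a model's convolutional layers, under the W/(B + ε) reading reconstructed above:

```python
import torch

def l1_regularization(model: torch.nn.Module, eps: float = 1e-8) -> torch.Tensor:
    """L_L1 = W / (B + eps): mean |kernel weight| over mean |bias|; minimizing
    it favors biases that are large relative to kernels, i.e. decisive 'switches'."""
    kernels, biases = [], []
    for m in model.modules():
        if isinstance(m, torch.nn.Conv2d):
            kernels.append(m.weight.abs().mean())
            if m.bias is not None:
                biases.append(m.bias.abs().mean())
    w = torch.stack(kernels).mean()
    b = torch.stack(biases).mean() if biases else torch.tensor(0.0)
    return w / (b + eps)
```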
  • The total loss of the generative neural network can then be expressed as:

    $$L_{total}=\alpha L_{content}+\beta L_{G}+\chi L_{DVST}+\delta R$$

  • where R is the regularization loss value of the generative neural network, and α, β, χ, and δ are respectively the weights of the content loss value, the adversarial loss value, the style loss value, and the regularization loss value in the total loss.
  • In the embodiments of the present disclosure, the parameter regularization loss value described above is used as the regularization loss value, but other types of regularization losses can also be used.
  • The authentication neural network used in the process of training the generative neural network, together with the generative neural network, constitutes a set of adversarial networks.
  • The authentication neural network extracts the content features of the input image using a plurality of convolution layers and pooling layers, reducing the size of the feature images so that the next convolution layer can extract further image features.
  • The image features are then processed using a fully connected layer and an activation layer, and a scalar value is finally output as the authentication tag indicating whether the input image has the conversion feature.
  • the fully connected layer has the same structure as the convolutional neural network except that the convolution kernel is replaced with a scalar value.
  • the activation layer is typically a RELU or sigmoid function. In the embodiment of the present disclosure, the specific structure of the authentication neural network is as shown in FIG. 10, wherein the activation layer may be a sigmoid function, and finally the authentication tag is output, but the present disclosure is not limited thereto.
  • The generative neural network converts an input image having effect A into an output image having effect B; the authentication neural network determines whether the output image has the features of effect B and outputs the authentication tag. For example, if it judges that the output image has the features of effect B, the output is close to "1", and if it judges that the output image does not have the features of effect B, the output is close to "0".
  • Through training, the generative neural network gradually generates output images that cause the authentication neural network to output "1", and the authentication neural network gradually becomes able to more accurately determine whether an output image has the conversion feature. The two are trained synchronously and compete against each other to obtain better parameters.
  • Training the authentication neural network comprises: using the generative neural network to output a first output image as the first sample image Ra according to the input image and the first noise image, and acquiring a sample image Rc from a data set. The first sample image Ra, being an output image obtained by converting effect A to effect B using the generative neural network, is equivalent to a "false" sample.
  • The sample image Rc is taken from the data set as a "true" sample having effect B.
  • The authentication neural network is used to judge whether Ra and Rc have effect B, and to output the corresponding authentication tags.
  • The sample image Rc naturally carries a "true" label, i.e., it has the conversion feature, whereas the first sample image Ra carries a "false" label, since it obtained the conversion feature through image processing by the generative neural network.
  • The authentication neural network is trained based on these authentication tags, gradually enabling it to more accurately determine whether an input image has the corresponding image features (a schematic alternating training step is sketched below).
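  • A schematic alternating step for the adversarial pair, assuming a sigmoid-output authentication network and a generator loss function bundling the terms above; all names and optimizers here are illustrative:

```python
import torch

bce = torch.nn.BCELoss()

def train_step(gen, disc, opt_g, opt_d, input_img, rc, generator_loss):
    """One alternating adversarial step; a sketch, not the exact procedure.

    input_img: (B, 3, H, W) image with effect A; rc: "true" sample with effect B.
    generator_loss bundles content, style-difference, adversarial, and
    regularization terms.
    """
    n1 = torch.randn_like(input_img[:, :1])
    n2 = torch.randn_like(input_img[:, :1])
    ra = gen(torch.cat([input_img, n1], dim=1))   # "false" sample
    rb = gen(torch.cat([input_img, n2], dim=1))

    # Authentication-network step: Rc is labeled "true" (1), Ra "false" (0).
    opt_d.zero_grad()
    true_pred, false_pred = disc(rc), disc(ra.detach())
    d_loss = bce(true_pred, torch.ones_like(true_pred)) + \
             bce(false_pred, torch.zeros_like(false_pred))
    d_loss.backward()
    opt_d.step()

    # Generator step: push the authentication tag for Ra toward 1.
    opt_g.zero_grad()
    g_loss = generator_loss(input_img, ra, rb, disc(ra))
    g_loss.backward()
    opt_g.step()
```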
  • A flowchart of training the super-resolution neural network is shown in FIG. 11; the training is described in detail below with reference to FIG. 11.
  • In step S1110, an input image and a first noise image are acquired, wherein the input image has three channels, which are the RGB three channels in the embodiment of the present disclosure, but the present disclosure is not limited thereto.
  • the first noise image has random noise such as Gaussian noise for generating image detail information during image conversion.
  • The generative neural network performs image conversion processing on the input image according to the acquired input image and the first noise image, and outputs a first output image, which is used as the first sample image R1 for training the super-resolution neural network.
  • In step S1130, the super-resolution training noise image N3 is acquired, and in step S1140, a low-resolution image is extracted from the first sample image R1 as the super-resolution training image I2.
  • the resolution of the super-resolution training image I2 is lower than the resolution of the first sample image R1 and includes the content feature of the first sample image R1. It will be appreciated that the first sample image R1 can be recovered from the super-resolution training image I2.
  • In step S1150, the second sample image R2 is output according to the super-resolution training image I2 and the super-resolution training noise image N3 by using the super-resolution neural network.
  • the resolution of the second sample image R2 is higher than the resolution of the super-resolution training image I2, and may be equal to the resolution of the first sample image R1.
  • Training is performed by inputting the super-resolution training noise image N3 together with the super-resolution training image I2 to the super-resolution neural network; the noise is used for generating detail information in the output image, and since each input noise image is different, varied image details are generated in each image processing pass, so that the output super-resolution images are diverse.
  • In step S1160, the parameters of the super-resolution neural network are optimized according to the first sample image R1 and the second sample image R2, by reducing the cost function of the super-resolution neural network.
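  • A sketch of preparing the training pair used in steps S1140 through S1160; bicubic downscaling stands in for the unspecified low-resolution extraction:

```python
import torch
import torch.nn.functional as F

def make_sr_training_pair(r1: torch.Tensor, factor: int = 4):
    """Extract a low-resolution training image I2 from the first sample image R1.

    r1: (B, 3, H, W). Returns (I2, R1): the network upscales I2 (plus a noise
    image N3) and is scored against R1 at the original resolution.
    """
    i2 = F.interpolate(r1, scale_factor=1.0 / factor, mode="bicubic",
                       align_corners=False)
    return i2, r1
```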
  • the cost function of the super-resolution neural network may be based on an authentication tag of the second authentication neural network.
  • The generating process of the authentication tag includes: inputting the first sample image R1 and the second sample image R2 to a second authentication neural network, which evaluates the image quality of the resolution-boosted second sample image R2 and outputs a label indicating whether its input is an output image of the super-resolution neural network (the second sample image R2) or the original image from which the low-resolution image was extracted (the first sample image R1).
  • The second authentication neural network may receive an input image having the RGB three channels (the second sample image R2 in the embodiment of the present disclosure) and output a number, for example -1 or 1.
  • If the output is 1, the second authentication neural network considers the input image to correspond to the original high-resolution content (in the embodiment of the present disclosure, the first sample image R1); if the output is -1, the second authentication neural network considers the second sample image R2 to be an output image whose resolution was increased via the super-resolution neural network.
  • the super-resolution neural network By training the super-resolution neural network to maximize the authentication tag of the second authentication neural network, the authentication tag is gradually made as realistic as possible.
  • the second identification neural network is trained to accurately distinguish the original high resolution image and the image after the resolution is improved.
  • the super-resolution neural network and the second authentication neural network form a set of confrontation networks. The two groups of networks are alternately trained to compete with one another and obtain optimal parameters.
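  • The alternating scheme can be sketched as follows, continuing the conventions of the previous sketch; loader, sr_net, d_net and the two optimizers are hypothetical stand-ins, and the hinge-style objective is an assumption consistent with the -1/1 label convention above:

```python
import torch
import torch.nn.functional as F

def d_loss(real_lbl, fake_lbl):
    # hinge-style objective (an assumption): push labels toward 1 for the
    # original R1 and toward -1 for the super-resolved R2
    return F.relu(1.0 - real_lbl).mean() + F.relu(1.0 + fake_lbl).mean()

def adversarial_epoch(loader, sr_net, d_net, g_optim, d_optim, scale=4):
    for r1 in loader:                                   # first sample images R1
        i2 = F.avg_pool2d(r1, scale)                    # step S1140
        n3 = torch.randn(i2.size(0), 1, i2.size(2), i2.size(3))
        r2 = sr_net(torch.cat([i2, n3], dim=1))         # step S1150

        # (1) train the second authentication neural network to tell R1 from R2
        d_optim.zero_grad()
        d_loss(d_net(r1), d_net(r2.detach())).backward()
        d_optim.step()

        # (2) train the super-resolution network to maximize its label
        g_optim.zero_grad()
        (-d_net(r2).mean()).backward()
        g_optim.step()
```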
  • The specific structure of the second authentication neural network is shown in FIG. 12; it includes at least one degradation sub-module, each degradation sub-module including at least one convolutional layer and one degradation TMUX layer connected in sequence.
  • Moreover, after the degradation sub-modules, the second authentication neural network may further comprise at least one convolutional layer.
  • For example, each degradation sub-module specifically includes two convolutional layers CO and a TMUX layer connected in sequence.
  • The TMUX layer performs a degradation process corresponding to the MUX layer in the super-resolution neural network, thereby degrading the image input to the second authentication neural network, generated from the second sample image, to a low-resolution image.
  • The degradation that the TMUX layer applies to the input image is the inverse of the lifting process of the MUX layer.
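  • Assuming the MUX layer acts as a 2x2 pixel rearrangement, the TMUX degradation can be sketched as its exact inverse; this is a minimal sketch in PyTorch, not the patented implementation:

```python
import torch.nn.functional as F

def tmux_2x2(x):
    # (B, C, 2H, 2W) -> (B, 4C, H, W): every 2x2 block of pixels is
    # redistributed back onto four feature maps, undoing the MUX lifting.
    return F.pixel_unshuffle(x, downscale_factor=2)
```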
  • The second authentication neural network uses its convolutional layers to output an image, an "IQ Map", similar to other image quality metrics (e.g., the structural similarity index, SSIM). All the pixels in the "IQ Map" are averaged, and the mean is output as the single-number "authentication label".
  • Embodiments of the present disclosure provide an image processing method for image conversion that performs the conversion processing based on a generative neural network, a super-resolution neural network, and content perception.
  • Detail information in the converted image is generated by adding a noise image to the input.
  • The content-feature loss function is used to train the generative neural network to ensure that the converted output image is consistent in content with the input image, and the style-difference loss function between processing results is used to train the network to ensure diversity among the outputs, which keeps the system simple and easy to train.
  • The resolution of the converted image output by the generative network is raised by the super-resolution neural network, and the parameters of the super-resolution neural network are optimized by reducing its cost function.
  • A high-resolution converted image can thus be obtained with the trained generative neural network and super-resolution neural network; the converted image both carries the conversion features and meets product requirements for image resolution.
  • An embodiment of the present disclosure further provides an image processing apparatus, shown in FIG. 13, including a generative neural network module 1302 configured to perform image conversion processing on the input image according to the input image and the first noise image, so as to output the converted first output image.
  • The first noise image includes N channels, N being a positive integer greater than or equal to 1.
  • The generative neural network module can include the generative neural network described above.
  • The image processing apparatus provided by the embodiment of the present disclosure performs image conversion processing on the input image and the noise image with the generative neural network module, so as to output the converted output image.
  • The image processing apparatus further includes a super-resolution neural network module 1304, with which it performs high-resolution conversion processing on the first output image and the second noise image and outputs a second output image.
  • The second noise image includes M channels, M being a positive integer greater than or equal to 1, and the first noise image and the second noise image are different.
  • The input image includes a first color channel, a second color channel, and a third color channel, which are the RGB channels in embodiments of the present disclosure.
  • The input of the generative neural network module includes the first noise image channel and the RGB channels of the input image.
  • The output of the generative neural network module is a first output image that includes a first color channel, a second color channel, and a third color channel, which are the RGB channels in embodiments of the present disclosure.
  • The generative neural network module includes one or more downsampling modules, one or more residual modules, and one or more upsampling modules, wherein:
  • the downsampling module comprises a convolutional layer, a downsampling layer, and an instance normalization layer connected in sequence;
  • the residual module comprises a convolutional layer and an instance normalization layer connected in sequence;
  • the upsampling module comprises an upsampling layer, an instance normalization layer, and a convolutional layer connected in sequence, the number of upsampling modules being equal to the number of downsampling modules. A sketch of such a module stack is given below.
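  • A minimal sketch of this module stack, assuming PyTorch; kernel sizes, channel widths, and the use of average pooling for the downsampling layer are illustrative assumptions rather than the patented configuration:

```python
import torch.nn as nn

def down_block(c_in, c_out):  # convolutional -> downsampling -> instance norm
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.AvgPool2d(2),
                         nn.InstanceNorm2d(c_out))

class ResBlock(nn.Module):    # convolutional -> instance norm, plus skip path
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                  nn.InstanceNorm2d(c))
    def forward(self, x):
        return x + self.body(x)  # cross-layer connection of the residual module

def up_block(c_in, c_out):    # upsampling -> instance norm -> convolutional
    return nn.Sequential(nn.Upsample(scale_factor=2),
                         nn.InstanceNorm2d(c_in),
                         nn.Conv2d(c_in, c_out, 3, padding=1))

generator = nn.Sequential(    # input: 3 RGB channels + 1 noise channel
    down_block(4, 32), down_block(32, 64),
    ResBlock(64), ResBlock(64),
    up_block(64, 32), up_block(32, 3),  # as many up blocks as down blocks
)
```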
  • The input of the super-resolution neural network module includes the second noise image channel and the RGB channels of the first output image.
  • The output of the super-resolution neural network module is a second output image comprising a first color channel, a second color channel, and a third color channel, which are the RGB channels in the disclosed embodiment.
  • The super-resolution neural network module includes a lifting module and a transform module connected in sequence: the lifting module is configured to upsample the first output image and the second noise image and to output a first intermediate image comprising a luminance channel, a first chrominance channel, and a second chrominance channel, which are the YUV channels in the embodiment of the present disclosure.
  • The transform module is configured to transform the first intermediate image output by the lifting module into a second output image including the RGB channels.
  • The lifting module comprises a first sub-network, a second sub-network, and a third sub-network, wherein the input of each sub-network is the first output image and the second noise image, and each sub-network has the same structure, containing the same number of convolutional layers and lifting layers.
  • In the embodiment of the present disclosure, the generative neural network module performs image conversion using the first training image I1 and the first training noise image N1 to output the converted first training output image Ra, and performs image conversion using the first training image I1 and the second training noise image N2 to output the converted second training output image Rb.
  • The training neural network module trains the generative neural network module based on the first training image I1, the first training output image Ra, and the second training output image Rb.
  • The training aims to optimize the parameters of the network according to the processing results of the generative neural network module, so that it can accomplish the conversion goal.
  • The training neural network module includes: an authentication neural network module configured to output an authentication label indicating whether the first training output image Ra has the conversion features; and a first loss calculation unit configured to calculate the loss value of the generative neural network from the first training image I1, the first training output image Ra, the second training output image Rb, and the authentication label, and to optimize the parameters of the generative neural network module.
  • The parameters include the convolution kernels and biases of the convolutional layers in the generative neural network module.
  • The first training output image Ra may be input to the authentication neural network module together with the second training output image Rb; authentication labels are output for each, and both are used to train the generative neural network.
  • After training, the generative neural network module has optimized parameters and can be used to carry out the target image conversion processing.
  • In the present disclosure, the first loss calculation unit performs training using the content features of the input image, the first output image, and the second output image, which simplifies the system and makes it easier to train.
  • The result-diversity loss function is used to ensure diversity among the converted images output by the generative neural network module.
  • The content loss function is used to ensure that the output converted image is consistent with the input image, that is, the converted image both carries the conversion features and retains sufficient original image information, avoiding the loss of a large amount of original image information during image processing.
  • The training neural network module further includes a second authentication neural network module configured to output, based on the first sample image R1 and the second sample image R2, an authentication label indicating whether the second sample image R2 has the content features corresponding to the first sample image.
  • The training neural network module further trains the super-resolution neural network module based on the authentication label output by the second authentication neural network. For example, the optimizer optimizes the parameters of the super-resolution neural network module by reducing the cost function of the super-resolution neural network module.
  • The generative neural network generates a first output image from the input image and the first noise image; the first output image serves as the first sample image, carries the conversion features, and includes the RGB channels.
  • The super-resolution neural network module further outputs a second sample image from the super-resolution training image and the acquired super-resolution training noise image, wherein the super-resolution training image is a low-resolution image extracted from the first sample image.
  • Based on the first sample image and the second sample image, the training neural network module optimizes the parameters of the super-resolution neural network module by reducing its cost function; the parameters may include the convolution kernels and biases of the convolutional layers in the super-resolution neural network module.
  • The image processing apparatus for image conversion performs image conversion processing based on a generative neural network, a super-resolution neural network, and content perception, and includes a generative neural network module and a super-resolution network module.
  • Detail information in the converted image is generated by adding a noise image to the input.
  • The content-feature loss function is used to train the generative neural network module to ensure that the converted output image is consistent in content with the input image, and the style-difference loss function between processing results is used to train the module to ensure diversity among the outputs, which keeps the system simple and easy to train.
  • The resolution of the converted image output by the generative network is raised by the super-resolution neural network module, and the parameters of the super-resolution neural network module are optimized by reducing its cost function.
  • A high-resolution converted image can thus be obtained with the trained generative neural network module and super-resolution neural network module; the converted image both carries the conversion features and meets product requirements for image resolution.
  • An embodiment of the present disclosure further provides an image processing device, shown in FIG. 14, which includes a processor 1402 and a memory 1404. It should be noted that the configuration of the image processing device shown in FIG. 14 is merely exemplary, not restrictive; the image processing device may have other components depending on actual application needs.
  • The processor 1402 and the memory 1404 may communicate with each other directly or indirectly. Communication between components such as the processor 1402 and the memory 1404 can take place over a network connection.
  • The network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
  • The network may include a local area network, the Internet, a telecommunications network, an Internet of Things based on the Internet and/or a telecommunications network, and/or any combination of the above networks, and the like.
  • The wired network may communicate via twisted pair, coaxial cable, or optical fiber transmission, for example, and the wireless network may use a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi, for example.
  • The present disclosure does not limit the type and function of the network.
  • The processor 1402 can control other components in the image processing device to perform the desired functions.
  • The processor 1402 can be a device having data processing capability and/or program execution capability, such as a central processing unit (CPU), a tensor processing unit (TPU), or a graphics processing unit (GPU).
  • The central processing unit (CPU) can have an X86 or ARM architecture, for example.
  • The GPU can be integrated directly on the motherboard or built into the northbridge of the motherboard; it can also be built into the central processing unit (CPU), since the GPU has powerful image processing capability.
  • Memory 1404 can comprise any combination of one or more computer program products, which can comprise various forms of computer readable storage media, such as volatile memory and/or nonvolatile memory.
  • Volatile memory can include, for example, random access memory (RAM) and/or caches and the like.
  • the non-volatile memory may include, for example, a read only memory (ROM), a hard disk, an erasable programmable read only memory (EPROM), a portable compact disk read only memory (CD-ROM), a USB memory, a flash memory, and the like.
  • One or more computer-readable codes or instructions may be stored in the memory 1404, and the processor 1402 may execute the computer instructions to perform the image processing methods described above or to implement the image processing apparatus described above.
  • For a detailed description of the image processing method and the image processing apparatus, reference may be made to the related descriptions of the image processing method and the processing apparatus in this specification, which are not repeated here.
  • Various applications and various data may also be stored in the computer-readable storage medium, such as image data sets and various data used and/or generated by the applications (such as training data).


Abstract

An image processing method, a processing apparatus, and a processing device. The image processing method includes: using a generative neural network to perform image conversion processing on an input image according to the input image and a first noise image, so as to output a converted first output image; and using a super-resolution neural network to perform high-resolution conversion processing on the first output image according to the first output image and a second noise image, so as to output a second output image.

Description

Image processing method, processing apparatus, and processing device
Cross-reference to related applications
This application claims priority to the Chinese patent application No. 201711100015.5, entitled "Image processing method, processing apparatus and processing device" and filed on November 9, 2017, which is hereby incorporated by reference in its entirety.
Technical field
The present disclosure relates to image processing, and in particular to an image processing method, a processing apparatus, and a processing device.
Background
Image processing and conversion with deep neural networks is a technology that has emerged with the development of deep learning. However, the image processing and conversion systems of the related art are structurally complex and difficult to train, and their output images lack diversity. An image processing method, apparatus, and device for image conversion are therefore needed that ensure both consistency between the output image and the input image and diversity among different output images.
Summary
An embodiment of the present disclosure provides an image processing method, including: performing image conversion processing on an input image with a generative neural network according to the input image and a first noise image, so as to output a converted first output image; and performing high-resolution conversion processing on the first output image with a super-resolution neural network according to the first output image and a second noise image, so as to output a second output image.
According to an embodiment of the present disclosure, the input image includes a first color channel, a second color channel, and a third color channel; the first noise image includes N channels, N being a positive integer greater than or equal to 1, and the second noise image includes M channels, M being a positive integer greater than or equal to 1; the input of the generative neural network includes the first noise image channel and the first, second, and third color channels of the input image; and the output of the generative neural network is the first output image, which includes a first color channel, a second color channel, and a third color channel.
According to an embodiment of the present disclosure, the generative neural network includes one or more downsampling modules, one or more residual modules, and one or more upsampling modules, wherein: the downsampling module includes a convolutional layer, a downsampling layer, and an instance normalization layer connected in sequence; the residual module includes a convolutional layer and an instance normalization layer connected in sequence; and the upsampling module includes an upsampling layer, an instance normalization layer, and a convolutional layer connected in sequence, the number of upsampling modules being equal to the number of downsampling modules.
According to an embodiment of the present disclosure, the input of the super-resolution neural network includes the second noise image channel and the first, second, and third color channels of the first output image; the output of the super-resolution neural network is the second output image, which includes a first color channel, a second color channel, and a third color channel.
According to an embodiment of the present disclosure, the super-resolution neural network includes a lifting module and a transform module connected in sequence, and performing the high-resolution conversion processing with the super-resolution neural network includes: upsampling the first output image and the second noise image with the lifting module and outputting a first intermediate image that includes a luminance channel, a first chrominance channel, and a second chrominance channel; and transforming, with the transform module, the first intermediate image output by the lifting module into a second output image that includes a first color channel, a second color channel, and a third color channel.
According to an embodiment of the present disclosure, the lifting module includes a first sub-network, a second sub-network, and a third sub-network, wherein: the input of each sub-network is the first output image and the second noise image; and the sub-networks share the same structure, containing the same number of convolutional layers and lifting layers.
According to an embodiment of the present disclosure, the image processing method further includes: generating a first training output image with the generative neural network according to a first training image and a first training noise image; generating a second training output image with the generative neural network according to the first training image and a second training noise image, the second training noise image being different from the first training noise image; and training the generative neural network based on the first training image, the first training output image, and the second training output image.
According to an embodiment of the present disclosure, training the generative neural network includes: inputting the first training output image to an authentication neural network, which outputs an authentication label indicating whether the first training output image has the conversion features; inputting the second training output image to the authentication neural network, which outputs an authentication label indicating whether the second training output image has the conversion features; and calculating, with a first loss calculation unit, a loss value of the generative neural network from the first training image, the first training output image, the second training output image, and the corresponding authentication labels, and optimizing the parameters of the generative neural network. The first loss calculation unit includes an analysis network, a first loss calculator, and an optimizer, and calculating the loss value of the generative neural network with the first loss calculation unit includes: outputting, with the analysis network, the content features of the first training image, the first training output image, and the second training output image, and the style features of the first training output image and the second training output image; calculating, with the first loss calculator, the loss value of the generative neural network according to a first loss function from the content features and style features extracted by the analysis network and the authentication labels of the first and second training output images; and optimizing, with the optimizer, the parameters of the generative neural network according to the loss value of the generative neural network.
According to an embodiment of the present disclosure, the first loss function includes a style-difference loss function, and calculating the loss value of the generative neural network includes: calculating, with the first loss calculator, a style loss value of the generative neural network according to the style-difference loss function from the style features of the first training output image and the style features of the second training output image. The first loss function further includes a content loss function, and calculating the loss value of the generative neural network includes: calculating a content loss value of the generative neural network according to the content loss function from the content features of the first training image, the first training output image, and the second training output image.
According to an embodiment of the present disclosure, the image processing method further includes: extracting a low-resolution image from a first sample image as a super-resolution training image, the resolution of the super-resolution training image being lower than that of the first sample image; outputting a second sample image with the super-resolution neural network according to the super-resolution training image and a super-resolution training noise image, the resolution of the second sample image being equal to that of the first sample image; and optimizing the parameters of the super-resolution neural network by reducing the cost function of the super-resolution neural network based on the first sample image and the second sample image.
An embodiment of the present disclosure further provides an image processing apparatus, including: a generative neural network module configured to perform image conversion processing on an input image according to the input image and a first noise image, so as to output a converted first output image; and a super-resolution neural network module configured to perform high-resolution conversion processing on the first output image according to the first output image and a second noise image, so as to output a second output image.
According to an embodiment of the present disclosure, the input image includes a first color channel, a second color channel, and a third color channel; the input of the generative neural network module includes the first noise image channel and the first, second, and third color channels of the input image; and the output of the generative neural network module is the first output image, which includes a first color channel, a second color channel, and a third color channel.
According to an embodiment of the present disclosure, the generative neural network module includes one or more downsampling modules, one or more residual modules, and one or more upsampling modules, wherein: the downsampling module includes a convolutional layer, a downsampling layer, and an instance normalization layer connected in sequence; the residual module includes a convolutional layer and an instance normalization layer connected in sequence; and the upsampling module includes an upsampling layer, an instance normalization layer, and a convolutional layer connected in sequence, the number of upsampling modules being equal to the number of downsampling modules.
According to an embodiment of the present disclosure, the input of the super-resolution neural network module includes the second noise image channel and the first, second, and third color channels of the first output image; the output of the super-resolution neural network module is the second output image, which includes a first color channel, a second color channel, and a third color channel.
According to an embodiment of the present disclosure, the super-resolution neural network module includes a lifting module and a transform module connected in sequence: the lifting module is configured to upsample the first output image and the second noise image and to output a first intermediate image including a luminance channel, a first chrominance channel, and a second chrominance channel; the transform module is configured to transform the first intermediate image output by the lifting module into a second output image including a first color channel, a second color channel, and a third color channel. The lifting module includes a first sub-network, a second sub-network, and a third sub-network.
According to an embodiment of the present disclosure, the input of the first, second, and third sub-networks is the first output image and the second noise image, and the output image has three channels, including a luminance channel, a first chrominance channel, and a second chrominance channel; the first, second, and third sub-networks have the same structure, and each includes at least one lifting sub-module, each lifting sub-module including one or more convolutional layers and one lifting layer connected in sequence.
According to an embodiment of the present disclosure, the image processing apparatus further includes a training neural network module configured to train the generative neural network module according to the output images of the generative neural network module. The generative neural network module further outputs a converted first training output image according to a first training image and a first training noise image, and further outputs a converted second training output image according to the first training image and a second training noise image, the second training noise image being different from the first training noise image; the training neural network module trains the generative neural network module based on the first training image, the first training output image, and the second training output image.
The training neural network module includes: an authentication neural network module configured to output authentication labels indicating whether the first training output image and the second training output image have the conversion features; and a first loss calculation unit configured to calculate the loss value of the generative neural network module from the first training image, the first training output image, the second training output image, and the corresponding authentication labels, and to optimize the parameters of the generative neural network module, wherein the first loss calculation unit includes: an analysis network configured to output the content features of the first training image, the first training output image, and the second training output image; a first loss calculator configured to calculate the loss value of the generative neural network module according to a first loss function from the content features and style features extracted by the analysis network and the authentication labels of the first and second training output images; and an optimizer configured to optimize the parameters of the generative neural network module according to the loss value of the generative neural network module.
According to an embodiment of the present disclosure, the first loss function includes a style-difference loss function for calculating a style loss value of the generative neural network module from the style features of the first training output image and those of the second training output image; the first loss function further includes a content loss function for calculating a content loss value of the generative neural network module from the content features of the first training image, the first training output image, and the second training output image.
According to an embodiment of the present disclosure, the training neural network module is further configured to train the super-resolution neural network module according to the output of the super-resolution neural network; the super-resolution neural network module further outputs a second sample image according to a super-resolution training image and an acquired super-resolution training noise image, wherein the super-resolution training image is a low-resolution image extracted from a first sample image, and wherein the training neural network module further includes a second authentication neural network module configured to output an authentication label based on the first sample image and the second sample image; the optimizer optimizes the parameters of the super-resolution neural network module by reducing the cost function of the super-resolution neural network module.
An embodiment of the present disclosure further provides an image processing device including one or more processors and one or more memories, the memories storing computer-readable code which, when run by the one or more processors, performs the image processing method described above or implements the image processing apparatus described above.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present disclosure or of the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 shows an example flowchart of the image processing method provided by an embodiment of the present disclosure;
FIG. 2 shows an example structural diagram of a neural network for implementing the image processing method of FIG. 1;
FIG. 3 shows a specific example structure of the generative neural network of FIG. 2;
FIG. 4 shows an example schematic diagram of a lifting layer;
FIG. 5 shows an example structural diagram of the super-resolution neural network of FIG. 2;
FIG. 6 shows a specific example structure of the super-resolution neural network of FIG. 5;
FIG. 7 shows an example flowchart of training the generative neural network;
FIG. 8 shows an example block diagram of training the generative neural network;
FIG. 9 shows a specific example structure of the analysis network;
FIG. 10 shows a specific example structure of the authentication neural network;
FIG. 11 shows an example flowchart of training the super-resolution neural network;
FIG. 12 shows a specific example structure of the second authentication neural network;
FIG. 13 shows a schematic example block diagram of the image processing apparatus provided by an embodiment of the present disclosure;
FIG. 14 shows a schematic example block diagram of the image processing device provided by an embodiment of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
Embodiments of the present disclosure provide an image processing method, a processing apparatus, and a processing device for image conversion. They perform image conversion based on a generative neural network, a super-resolution neural network, and content perception. A noise image is added to the input to generate detail information in the converted image. The generative neural network is trained with a content-feature loss function to ensure that the converted output image is consistent in content with the input image, and with a style-difference loss function between processing results to ensure diversity among the outputs, keeping the system simple and easy to train. On this basis, the super-resolution neural network raises the resolution of the converted image output by the generative neural network, yielding a high-resolution converted image that meets product requirements for image resolution.
FIG. 1 shows an example flowchart of the image processing method provided by an embodiment of the present disclosure. In step S110, an input image to undergo image conversion processing is acquired; as the original information it includes a first color channel, a second color channel, and a third color channel, which in some embodiments of the present disclosure are the RGB channels, although the present disclosure is not limited thereto. Next, in step S120, a first noise image and a second noise image are acquired, the first noise image including N channels, N being a positive integer greater than or equal to 1. In some embodiments the first noise image and the second noise image may be different. In an embodiment of the present disclosure N may, for example, be 1: the first noise image serves as a fourth channel and is input to the generative neural network together with the RGB channel information of the input image. The noise may be random noise such as Gaussian noise. In other embodiments of the present disclosure N may, for example, be 3: the three channels of the first noise image are added to the RGB channels of the original image to be converted, producing a noise-bearing input image on which the generative neural network performs the image conversion processing; this case is not described further in this specification. Because the noise image input each time carries random noise, multiple image processing operations performed on the same input image with the same generative neural network can produce conversion results with different detail information, i.e., diversity among the results. Moreover, the order in which the input image and the noise images are acquired does not affect the processing result.
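For the case N = 1 described above, the composition of the network input can be sketched as follows (a minimal sketch assuming PyTorch; the image size is illustrative):

```python
import torch

rgb = torch.rand(1, 3, 256, 256)            # input image: R, G, B channels
noise = torch.randn(1, 1, 256, 256)         # first noise image (Gaussian), N = 1
net_input = torch.cat([rgb, noise], dim=1)  # (1, 4, 256, 256), fed to the network
```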
In step S130, the acquired input image is input to the generative neural network together with the first noise image (for example, in some embodiments, depending on the specific implementation of the generative neural network, the input image and the first noise image may be superimposed and input as a single piece of image data, or the data of the input image and of the first noise image may be input as separate data channels), and the image processing operation (for example, the image conversion processing) is carried out. In step S140, the generative neural network outputs the first output image resulting from the image conversion processing; the first output image has three channels, which are the RGB channels in the embodiment of the present disclosure, although the present disclosure is not limited thereto. Through different training procedures the generative neural network can realize different kinds of image processing, such as conversion of image style, scene, season, effect, or conversion based on other features. In step S150, the first output image output by the generative neural network is input to the super-resolution neural network together with the second noise image (again, depending on the specific implementation of the super-resolution neural network, either superimposed as a single piece of image data or supplied as separate data channels) to carry out the high-resolution conversion processing and raise the resolution of the first output image. The second noise image includes M channels, M being a positive integer greater than or equal to 1. In an embodiment of the present disclosure M may, for example, be 1: the second noise image is input to the super-resolution neural network as a separate channel and is used to generate image detail information during the super-resolution conversion. In other embodiments M may, for example, be 3: the three channels of the second noise image are added to the RGB channels of the first output image to produce a noise-bearing first output image whose resolution the super-resolution neural network then raises; this case is not described further in this specification.
In step S160, the super-resolution neural network outputs the resolution-boosted second output image. Because the super-resolution neural network incorporates the information of the second noise image while raising the resolution, multiple image processing operations performed on the same input with the same super-resolution neural network can produce outputs with different detail information, further adding to the diversity of the conversion results.
FIG. 2 shows an example structure of a neural network for implementing the above image processing method; it consists mainly of two parts, a generative neural network and a super-resolution neural network. FIG. 3 shows a specific example structure of the generative neural network of FIG. 2; the generative neural network is described in detail below with reference to FIGS. 2 and 3.
As shown in FIG. 2, the input of the generative neural network includes the three channels (features) of the input image, specifically a first color channel, a second color channel, and a third color channel (the RGB channels in the embodiment of the present disclosure), together with the first noise image. The output of the generative neural network is a first output image with three channels, the RGB channels in the embodiment of the present disclosure, although the present disclosure is not limited thereto. The generative neural network includes one or more downsampling modules, one or more residual modules, and one or more upsampling modules. The depth of the generative neural network is determined by the numbers of downsampling, residual, and upsampling modules and is chosen according to the specific conversion application. In addition, in some embodiments the numbers of downsampling and upsampling modules may be equal, so that the output image has the same size as the input image.
The downsampling module convolves the input image and the noise image to extract image features and reduce the size of the feature maps. The residual module further processes the feature maps output by the downsampling module by convolution without changing their size. The upsampling module enlarges and normalizes the feature maps output by the residual module and outputs the converted output image. The conversion features of this output image are determined by the parameters of the generative neural network; according to the conversion application, those parameters are optimized by training the network with training images, so as to achieve the conversion goal. The image conversion application may be a conversion of image style, season, effect, scene, and so on, for example converting a landscape photograph into an image with the features of a Van Gogh painting, converting an image with summer features into one with winter features, converting an image of a brown horse into one with zebra features, or even converting a cat into a dog.
As shown in FIG. 3, the downsampling module includes a convolutional layer, a downsampling layer, and an instance normalization layer connected in sequence.
In a convolutional layer, one convolution kernel is connected only to some pixels of the feature map output by the previous layer. The convolutional layer can apply several convolution kernels to the input image in order to extract multiple types of features. Each kernel extracts one type of feature; during the training of the generative neural network the kernels learn appropriate weights. The result of applying one kernel to the input image is called a feature map, and the number of feature maps equals the number of kernels. Each feature map consists of rectangularly arranged pixels produced by the convolution, and the pixels of one feature map share the weights of its kernel. A feature map output by one convolutional layer can be processed by the next convolutional layer to obtain a new feature map; for example, a convolutional layer may extract the content features of the input image, and the next convolutional layer may extract style features from them.
The downsampling layer downsamples the image (it may, for example, be a pooling layer); it can reduce the size of the feature maps without changing their number, compressing the features and extracting the main ones. Moreover, by shrinking the feature maps, the downsampling layer simplifies the computation and reduces overfitting to some extent.
The instance normalization layer normalizes the feature maps output by the previous level; in the embodiment of the present disclosure each feature map is normalized by its own mean and variance. Assume that the batch size used when training the generative neural network (e.g., with mini-batch training) is T, that a given convolutional layer outputs C feature maps, and that each feature map is a matrix of H rows and W columns, so that the feature maps are represented as (T, C, W, H). The normalization formula is then

$$y_{tijk}=\frac{x_{tijk}-\mu_{ti}}{\sqrt{\sigma_{ti}^{2}+\varepsilon}},\qquad \mu_{ti}=\frac{1}{HW}\sum_{j=1}^{W}\sum_{k=1}^{H}x_{tijk},\qquad \sigma_{ti}^{2}=\frac{1}{HW}\sum_{j=1}^{W}\sum_{k=1}^{H}\bigl(x_{tijk}-\mu_{ti}\bigr)^{2},$$

where x_tijk is the value in the j-th column and k-th row of the i-th feature map of the t-th batch in the set of feature maps output by the convolutional layer, y_tijk is the result of processing x_tijk with the instance normalization layer, and ε is a small positive number that keeps the denominator from being zero.
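The normalization can be transcribed directly from the formula above; this is a minimal sketch assuming PyTorch tensors laid out as (T, C, H, W), with the spatial dimensions last:

```python
import torch

def instance_norm(x, eps=1e-5):
    # x: feature maps of shape (T, C, H, W); each feature map of each batch
    # entry is normalized by its own spatial mean and variance.
    mu = x.mean(dim=(2, 3), keepdim=True)
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
    return (x - mu) / (var + eps).sqrt()
```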
As shown in FIG. 3, the residual module includes not only a convolutional layer and an instance normalization layer but also a cross-layer connection, giving it two branches: a processing branch consisting of the convolutional and instance normalization layers, and a cross-layer branch that leaves the input image unprocessed; the cross-layer connection adds the input of the residual module directly to the output of the processing branch. Introducing cross-layer connections into the residual modules gives the generative neural network greater flexibility. Once training of the generative neural network is complete, at system deployment the relative influence of the processing branch and the cross-layer branch on the processing result can be assessed, and the structure of the generative neural network can be pruned accordingly to improve its efficiency and processing speed. For example, if it is determined that the cross-layer branch influences the result far more than the processing branch, only the cross-layer branch of the residual module need be used when processing images with the generative neural network, improving the network's processing efficiency.
As shown in FIG. 3, the upsampling module includes an upsampling layer, an instance normalization layer, and a convolutional layer connected in sequence; it extracts features of the input image and normalizes the feature maps.
The upsampling layer may, for example, be a lifting layer (or MUX layer), which interleaves and rearranges the pixels of several input images so that the size of each image grows while the number of images stays the same; the MUX layer thus increases the number of pixels per image by combining pixels across the different images. FIG. 4 shows an example of upsampling with a 2*2 MUX layer. For four input images INPUT 4n, INPUT 4n+1, INPUT 4n+2 and INPUT 4n+3, each with a*b pixels, the pixel rearrangement of the 2*2 MUX layer outputs four images OUTPUT 4n, OUTPUT 4n+1, OUTPUT 4n+2 and OUTPUT 4n+3 with 2a*2b pixels each, increasing the pixel information of every image.
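One way to realize such a 2*2 MUX layer is sketched below, assuming PyTorch; the particular interleaving orders are an illustrative assumption, chosen so that each of the four 2a*2b outputs combines pixels from all four a*b inputs:

```python
import torch
import torch.nn.functional as F

def mux_2x2(feats):
    # feats: (B, 4, a, b) -- the four input images INPUT 4n .. INPUT 4n+3
    outs = []
    for k in range(4):
        perm = [(k + i) % 4 for i in range(4)]           # hypothetical ordering
        outs.append(F.pixel_shuffle(feats[:, perm], 2))  # (B, 1, 2a, 2b)
    return torch.cat(outs, dim=1)                        # four 2a*2b outputs
```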
In the embodiment of the present disclosure, the first noise image channel is input to the generative neural network together with the channels of the input image (the RGB channels in the embodiment of the present disclosure). The input image and the noise image pass through the downsampling, residual, and upsampling modules described above, which extract their feature maps and finally output a first output image bearing the conversion features. The noise image carries random noise used to generate the detail information of the first output image, and because the noise image differs on each input, even feeding the same input image twice to the same generative neural network yields converted images that differ in detail, enriching the detail information of the converted images and providing a better user experience.
FIG. 5 shows an example structure of the super-resolution neural network of FIG. 2, and FIG. 6 shows a specific example structure of it; the super-resolution neural network is described in detail below with reference to FIGS. 2, 5 and 6.
As shown in FIG. 2, the input of the super-resolution neural network includes the second noise image channel and the first, second, and third color channels of the first output image. Its output is the second output image resulting from the high-resolution conversion processing, which includes a first color channel, a second color channel, and a third color channel, although the present disclosure is not limited thereto. In the embodiment of the present disclosure these may be the RGB channels. The second noise image carries random noise such as Gaussian noise, used to generate image detail information while the super-resolution neural network performs the high-resolution conversion, so that the output second output image has both higher resolution and image detail information, i.e., the outputs exhibit image diversity.
As shown in FIG. 5, the super-resolution neural network includes a lifting module and a transform module connected in sequence, and performing the high-resolution conversion with it includes: upsampling the first output image and the second noise image with the lifting module and outputting a first intermediate image including a luminance channel, a first chrominance channel, and a second chrominance channel (the YUV channels in the embodiment of the present disclosure); and transforming the first intermediate image output by the lifting module, with the transform module, into a second output image including a first color channel, a second color channel, and a third color channel (the RGB channels in the embodiment of the present disclosure). The first intermediate image has a higher image resolution than the first output image, and the multiple by which the resolution grows is determined by the specific structure of the lifting module. In the embodiment of the present disclosure the lifting module may, for example, raise the pixel count of the input image by a factor of 16 and is then called a 4*4 lifting module: if the first output image has m*n pixels, the first intermediate image output after the 4*4 lifting module has 4m*4n pixels. The first intermediate image, with its increased resolution and added detail information, is converted by the transform module into a second output image with the three RGB channels.
FIG. 6 shows a specific example structure of a super-resolution neural network containing a 4*4 lifting module. The 4*4 lifting module includes a first sub-network, a second sub-network, and a third sub-network, wherein the input of every sub-network is the first output image and the second noise image, and every sub-network has the same structure, i.e., contains the same numbers of convolutional layers CO and lifting layers MUX; it should be understood that the specific parameters of each sub-network differ. In the embodiment of the present disclosure the super-resolution neural network may include several lifting modules, and a lifting module may include several sub-networks (three in the embodiment of the present disclosure). It should be understood that in other embodiments the lifting module may include one or more sub-networks, and may also use standard techniques such as bicubic interpolation to enlarge the image resolution. Each sub-network includes at least one lifting sub-module, each lifting sub-module including at least one convolutional layer and one MUX layer connected in sequence, and each sub-network may further include at least one convolutional layer after its lifting sub-modules. For example, each lifting sub-module of each sub-network may specifically include two convolutional layers CO and a MUX layer connected in sequence (the specific structure is shown in FIG. 6), where the convolutional layers CO extract image features and the MUX layer upsamples the feature maps extracted by the convolutional layers. The specific functions of the convolutional and MUX layers are the same as in the generative neural network described above and are not repeated here.
In the embodiment of the present disclosure, the first sub-network outputs the luminance channel information of the first intermediate image, i.e., the Y-channel information; the second sub-network outputs its first chrominance channel information, i.e., the U-channel information; and the third sub-network outputs its second chrominance channel information, i.e., the V-channel information, although the present disclosure is not limited thereto. The first intermediate image with its YUV channels is transformed by the transform module into a second output image with RGB channels.
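A minimal sketch of the 4*4 lifting module and the transform module follows, assuming PyTorch; the channel widths and kernel sizes are illustrative assumptions, and the YUV-to-RGB coefficients are the common analog conversion rather than values specified by the present disclosure:

```python
import torch
import torch.nn as nn

def subnet():  # two lifting sub-modules (conv, conv, MUX) plus a final conv
    return nn.Sequential(
        nn.Conv2d(4, 16, 3, padding=1), nn.Conv2d(16, 16, 3, padding=1),
        nn.PixelShuffle(2),              # 16 -> 4 channels, size x2
        nn.Conv2d(4, 16, 3, padding=1), nn.Conv2d(16, 16, 3, padding=1),
        nn.PixelShuffle(2),              # 16 -> 4 channels, size x2 (4*4 overall)
        nn.Conv2d(4, 1, 3, padding=1),   # one output channel: Y, U or V
    )

y_net, u_net, v_net = subnet(), subnet(), subnet()  # same structure, own weights

def transform_module(yuv):               # standard YUV -> RGB conversion
    y, u, v = yuv[:, 0], yuv[:, 1], yuv[:, 2]
    r = y + 1.13983 * v
    g = y - 0.39465 * u - 0.58060 * v
    b = y + 2.03211 * u
    return torch.stack([r, g, b], dim=1)
```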
In the embodiment of the present disclosure, the super-resolution network raises the resolution of the relatively low-resolution first output image produced by the generative neural network and finally outputs a higher-resolution second output image, so that the image conversion result better meets the resolution requirements of display products and provides a better user experience.
FIG. 7 shows an example flowchart of training the generative neural network, and FIG. 8 shows an example block diagram of this training; the training process is described in detail below with reference to FIGS. 7 and 8.
In the image processing method provided by the embodiment of the present disclosure, as shown in FIG. 7, a first training image I1 with three channels is acquired in step S710. In some embodiments the first training image I1 may be an image similar to the input image described in connection with FIG. 1.
In step S720, a first training noise image N1 and a second training noise image N2 are acquired, the noise images N1 and N2 carrying different random noise, which may, for example, be Gaussian noise. In some embodiments the first training noise image N1 and/or the second training noise image N2 may be noise images similar to the first noise image described in connection with FIG. 1.
In step S730, the generative neural network generates a first training output image Ra from the first training image I1 and the first training noise image N1, and generates a second training output image Rb from the first training image I1 and the second training noise image N2; the procedure by which the generative neural network converts an input image according to the input image and a noise image to output a converted image is the same as the procedure shown in FIG. 1 and is not described again here.
Then, in step S740, the generative neural network is trained based on the first training image I1, the first training output image Ra, and the second training output image Rb. The training aims to optimize the parameters of the network according to the processing results of the generative neural network, so that it can accomplish the conversion goal.
As shown in FIG. 8, the specific training procedure of step S740 includes: inputting the first training output image Ra to the authentication neural network, which outputs an authentication label indicating whether Ra has the conversion features; and calculating, with the first loss calculation unit, the loss value of the generative neural network from the first training image I1, the first training output image Ra, the second training output image Rb, and the authentication label, and optimizing the parameters of the generative neural network. In embodiments of the present disclosure, the first training output image Ra may be input to the authentication neural network together with the second training output image Rb; authentication labels are output for each, and both are used to train the generative neural network.
As shown in FIG. 8, the first loss calculation unit consists of three parts: an analysis network, a first loss calculator, and an optimizer. The specific structure of the analysis network is shown in FIG. 9; it consists of several convolutional networks and pooling layers and extracts the content features of the input image. The output of each convolutional layer is a set of features extracted from the input image, and the pooling layers reduce the resolution of the feature maps and pass them on to the next convolutional layer. The feature maps after each convolutional layer characterize the input image at a different level (texture, edges, objects, and so on). In the embodiment of the present disclosure, the analysis network processes the first training image I1, the first training output image Ra, and the second training output image Rb, extracts their content features, and feeds the extracted content features to the first loss calculator.
The first loss calculator computes the loss value of the generative network according to a first loss calculation function from the content features of the first training image I1, the first training output image Ra, and the second training output image Rb, and from the authentication labels. The first loss calculator feeds the computed total loss value of the generative neural network to the optimizer, which optimizes the convolution kernels and biases of the convolutional layers of the generative neural network according to the loss value, so as to achieve a processing effect closer to the image conversion goal.
In the embodiment of the present disclosure, the first loss calculation function includes a style-difference loss function, used to compute a style loss value of the generative neural network from the style features of the first training output image Ra and the style features of the second training output image Rb. In the analysis network (shown in FIG. 9) the output of every convolutional layer is a set of features of the input image. Consider a convolutional layer l with $N_l$ convolution kernels; its output contains $N_l$ feature maps, each of size $M_l$ (width x height of the feature map). The output of layer l can then be stored in a matrix $F^{l}\in\mathcal{R}^{N_{l}\times M_{l}}$, where $F_{ij}^{l}$ denotes the value at position j in the feature map output by the i-th convolution kernel of layer l.

In the embodiment of the present disclosure, the difference between the training output images Ra and Rb is characterized by the style loss between them. Let $\vec{a}$ and $\vec{x}$ be two images input to the analysis network (for example, the first training output image Ra and the second training output image Rb), with Gram matrices $A^{l}$ and $G^{l}$ at layer l. The style loss function at this layer is

$$E_{l}=\frac{1}{4N_{l}^{2}M_{l}^{2}\,C2}\sum_{i,j}\bigl(G_{ij}^{l}-A_{ij}^{l}\bigr)^{2},$$

where $E_l$ denotes the style loss function and C2 is a constant used to normalize the result; $N_l$ is the number of convolution kernels in layer l of the analysis network, whose output therefore contains $N_l$ feature maps, each of size $M_l$ (width x height of the feature map). The Gram matrices $A^{l}$ and $G^{l}$ are defined as

$$G_{ij}^{l}=\sum_{k}F_{ik}^{l}F_{jk}^{l},\qquad A_{ij}^{l}=\sum_{k}\tilde{F}_{ik}^{l}\tilde{F}_{jk}^{l},$$

where $G_{ij}^{l}$ is the value at position (i, j) of the Gram matrix computed from the layer-l features $F^{l}$ of $\vec{x}$ (the style features of $\vec{x}$), and $A_{ij}^{l}$ is the corresponding value computed from the layer-l features $\tilde{F}^{l}$ of $\vec{a}$ (the style features of $\vec{a}$).

Hence, if the analysis network extracts the style features of the input image through L convolutional layers, the total style loss function is

$$L_{style}=\sum_{l=0}^{L}w_{l}E_{l},$$

where $w_l$ is the weight of the layer-l style loss within the total style loss. In embodiments of the present disclosure the style features may be extracted through several convolutional layers of the analysis network or through a single one; no specific limitation is imposed here.

The style difference between the two training output images Ra and Rb is therefore

$$D_{style}(Ra,Rb)=\frac{1}{C3}\,L_{style}(Ra,Rb),$$

where C3 is a constant used to normalize the result.

To make the diversity among the output results more pronounced, the style loss between the two outputs should be as large as possible; the style loss to be minimized is therefore expressed as

$$L_{DVST}=-D_{style}(Ra,Rb).$$

From the style features of the first training output image Ra and the second training output image Rb output by the analysis network, the first loss calculator computes the style loss value between the output images according to the total style loss function $L_{DVST}$ above, ensuring diversity among the output images.
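The sketch below transcribes this machinery, assuming PyTorch tensors: gram computes the Gram matrix of a layer's (N_l x M_l) feature matrix, layer_style_loss is the $E_l$ between the features of two images (with C2 passed in as c2), and l_dvst negates the weighted sum so that minimizing it maximizes the style difference; the per-layer weights correspond to the $w_l$ of the text:

```python
import torch

def gram(f):                    # f: (N_l, M_l) layer-l features of one image
    return f @ f.t()            # G_ij = sum_k F_ik F_jk

def layer_style_loss(fa, fb, c2=1.0):
    n, m = fa.shape
    return ((gram(fa) - gram(fb)) ** 2).sum() / (4 * n**2 * m**2 * c2)

def l_dvst(feats_ra, feats_rb, weights):
    # negate the weighted style difference: minimizing L_DVST drives the
    # two training outputs Ra and Rb apart in style
    return -sum(w * layer_style_loss(fa, fb)
                for w, (fa, fb) in zip(weights, zip(feats_ra, feats_rb)))
```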
In the embodiment of the present disclosure, the first loss calculation function may further include a content loss function. With I1 as the input image and Ra as the first training output image, let $P^{l}$ and $F^{l}$ be their respective feature maps output at layer l of the analysis network. The content loss function is then defined as

$$L_{content}=\frac{1}{2C1}\sum_{i,j}\bigl(F_{ij}^{l}-P_{ij}^{l}\bigr)^{2},$$

where C1 is a constant used to normalize the result, $F_{ij}^{l}$ denotes the value at position j in $F^{l}$ output by the i-th convolution kernel of layer l of the analysis network, and $P_{ij}^{l}$ denotes the value at position j in $P^{l}$ output by the i-th convolution kernel of layer l.

According to the content loss formula, from the feature maps of the first training image I1, the first training output image Ra, and the second training output image Rb output by the analysis network, the content loss values $L_{content\_a}$ and $L_{content\_b}$ of the two training output images processed by the generative neural network, relative to the first training image, can be computed.

Computing the content loss value of the generative neural network ensures that the converted image it outputs is consistent with the input image, so that the processed output image retains enough original information while carrying the conversion features. In the embodiment of the present disclosure the generative neural network is trained with the content loss function, ensuring consistency between the converted image and the input image with a system that is simple and easy to train.
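A minimal sketch of this content loss; f and p are the layer-l feature matrices of a training output image and of the first training image I1, as extracted by the analysis network, and C1 is the normalization constant:

```python
def content_loss(f, p, c1=1.0):
    # f, p: layer-l feature tensors of an output image and of the input I1
    return ((f - p) ** 2).sum() / (2 * c1)

# Applied once per output image:
# L_content_a = content_loss(F_Ra, P_I1); L_content_b = content_loss(F_Rb, P_I1)
```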
In the embodiment of the present disclosure, the first loss calculation function may further include the loss function of the generator:

$$L\_G=\mathbb{E}_{x\sim P_{data}(x)}\bigl[\log D(x)\bigr]+\mathbb{E}_{z\sim P_{z}(z)}\bigl[1-\log D\bigl(G(z)\bigr)\bigr],$$

where $P_{data}$ is the set of images for which the authentication neural network outputs 1, $P_{z}$ is the set of input images of the generative neural network, D is the authentication neural network, and G is the generative neural network. The first loss calculator can compute the adversarial loss value of the generative neural network from L_G.
In the embodiment of the present disclosure, the first loss calculation function may further include a parameter regularization loss function $L_{L1}$. In a neural network both the convolution kernels and the biases are parameters obtained by training. The convolution kernels determine what processing is applied to the input image; the biases determine whether the output of a kernel is fed to the next layer. A bias can thus be pictured as a "switch" that decides whether its kernel is turned "on" or "off"; for different input images the network turns different kernels on or off to achieve different processing effects.

The mean absolute value of all convolution kernels in the network is

$$W=\frac{\sum_{w}\lvert w\rvert}{C_{w}},$$

where $C_w$ is the number of convolution kernels in the network, and the mean absolute value of all biases in the network is

$$B=\frac{\sum_{b}\lvert b\rvert}{C_{b}},$$

where $C_b$ is the number of biases in the network. The parameter regularization loss function is then

$$L_{L1}=\frac{W}{B+\varepsilon},$$

where ε is a very small positive number that keeps the denominator from being zero.

In the embodiment of the present disclosure it is desirable for the biases of the convolutional layers to have larger absolute values than the convolution kernels, so that the "switch" role of the biases is exercised more effectively. During training, the first loss calculator computes the parameter regularization loss value of the generative neural network from $L_{L1}$.
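A sketch of this regularization term, assuming a PyTorch module and that every convolutional layer was created with a bias term:

```python
import torch
import torch.nn as nn

def l_l1(net, eps=1e-8):
    # mean absolute value of all convolution kernels over that of all biases
    convs = [m for m in net.modules() if isinstance(m, nn.Conv2d)]
    w = torch.cat([m.weight.abs().flatten() for m in convs]).mean()
    b = torch.cat([m.bias.abs().flatten() for m in convs]).mean()
    return w / (b + eps)
```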
In summary, in some embodiments the total loss of the generative neural network can be written as

$$L_{total}=\alpha L_{content}+\beta L\_G+\chi L_{DVST}+\delta R,$$

where R is the normalization loss value of the generative neural network and α, β, χ, and δ are the respective weights of the content loss, adversarial loss, style loss, and normalization loss within the total loss. In the embodiment of the present disclosure the parameter regularization loss value above serves as the normalization loss value, but other types of regularization loss may also be used.
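Assembled, the total loss is a single weighted sum; the particular weight values below are illustrative assumptions, not values given by the present disclosure:

```python
def l_total(l_content, l_g, l_dvst, r,
            alpha=1.0, beta=0.5, chi=0.1, delta=1e-3):
    # L_total = alpha*L_content + beta*L_G + chi*L_DVST + delta*R
    return alpha * l_content + beta * l_g + chi * l_dvst + delta * r
```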
The authentication neural network used while training the generative neural network forms, together with the generative neural network, a pair of adversarial networks. The authentication neural network extracts the content features of the input image with several convolutional and pooling layers, reducing the size of the feature maps so that the next convolutional layer can extract further image features; it then processes the image features with a fully connected layer and an activation layer, and finally outputs a scalar value as the authentication label indicating whether the input image has the conversion features. The fully connected layer has the same structure as a convolutional neural network except that scalar values replace the convolution kernels. The activation layer is typically a ReLU or sigmoid function. In the embodiment of the present disclosure the specific structure of the authentication neural network is shown in FIG. 10, where the activation layer may be a sigmoid function and the authentication label is output at the end, although the present disclosure is not limited thereto.
In the adversarial pair, the generative neural network converts an input image with effect A into an output image with effect B, and the authentication neural network judges whether the output image has the features of effect B and outputs an authentication label; for example, it outputs a value close to "1" if it judges the output image to have the features of effect B, and outputs "0" otherwise. Through training, the generative neural network gradually produces output images that make the authentication neural network output "1", while the authentication neural network gradually becomes able to judge more accurately whether an output image has the conversion features; the two are trained synchronously, contending with each other to obtain better parameters.
Training the authentication neural network includes: outputting, with the generative neural network, a first output image from the input image and the first noise image as a first sample image Ra; and obtaining a sample image Rc from a data set. The first sample image Ra, obtained by converting from effect A to effect B with the generative neural network, amounts to a "fake" sample, while the sample image Rc obtained from the data set is a "true" sample with effect B. The authentication neural network judges whether Ra and Rc have effect B and outputs authentication labels. It should be understood that the sample image Rc naturally bears a "true" label, i.e., possesses the conversion features, whereas the first sample image Ra naturally bears a "fake" label, its conversion features having been acquired through the image processing of the generative neural network. The authentication neural network is trained according to the authentication labels so that it gradually judges more accurately whether an input image has the corresponding image features.
In the image processing method provided by the embodiment of the present disclosure, the flowchart of training the super-resolution neural network is shown in FIG. 11; the training is described in detail below with reference to FIG. 11.
As shown in FIG. 11, in step S1110 an input image and a first noise image are acquired, the input image having three channels (the RGB channels in the embodiment of the present disclosure, although the present disclosure is not limited thereto). The first noise image carries random noise, such as Gaussian noise, used to generate image detail information during the image conversion. In step S1120, the generative neural network performs image conversion processing on the input image according to the acquired input image and first noise image and outputs a first output image, which serves as the first sample image R1 for training the super-resolution neural network.
In step S1130 the super-resolution training noise image N3 is acquired, and in step S1140 a low-resolution image is extracted from the first sample image R1 as the super-resolution training image I2. The resolution of the super-resolution training image I2 is lower than that of the first sample image R1, and I2 contains the content features of R1; it should be understood that R1 can be recovered from I2.
Then, in step S1150, the super-resolution neural network outputs a second sample image R2 from the super-resolution training image I2 and the super-resolution training noise image N3. The resolution of the second sample image R2 is higher than that of I2 and may be equal to that of R1. In this step, the super-resolution training noise image N3 is input to the super-resolution neural network together with I2 for training, in order to generate detail information in the output image; since the input noise image differs every time, varied image details arise in each round of processing, giving the output super-resolution images diversity.
In step S1160, the parameters of the super-resolution neural network are optimized by reducing its cost function, according to the first sample image R1 and the second sample image R2.
In the embodiment of the present disclosure, the cost function of the super-resolution neural network may be based on the authentication label of a second authentication neural network. The label is generated by inputting the first sample image R1 and the second sample image R2 to the second authentication neural network, which evaluates the image quality of the resolution-boosted second sample image R2 and outputs an authentication label indicating whether a sample image is an output image of the super-resolution neural network (the second sample image R2) or the original image from which the low-resolution image was extracted (the first sample image R1). In the disclosed embodiment, the second authentication neural network may receive an input image with three RGB channels (the second sample image R2 in the embodiment of the present disclosure) and output a number, e.g., -1 or 1. If the output is 1, the second authentication neural network considers the input image to correspond to the original high-resolution content (the first sample image R1 in the embodiment of the present disclosure); if the output is -1, it considers the second sample image R2 to be an output image whose resolution was raised by the super-resolution neural network. The super-resolution neural network is trained to maximize the authentication label of the second authentication neural network, so that its outputs gradually appear as realistic as possible; at the same time, the second authentication neural network is trained to accurately distinguish the original high-resolution images from the resolution-boosted ones. The super-resolution neural network and the second authentication neural network form a pair of adversarial networks; the two networks are trained alternately, competing with each other to obtain optimal parameters.
The specific structure of the second authentication neural network is shown in FIG. 12; it includes at least one degradation sub-module, each degradation sub-module including at least one convolutional layer and one degradation TMUX layer connected in sequence, and after the degradation sub-modules the second authentication neural network may further include at least one convolutional layer. For example, each degradation sub-module specifically includes two convolutional layers CO and a TMUX layer connected in sequence. The TMUX layer performs the degradation corresponding to the MUX layer of the super-resolution neural network, thereby degrading the image input to the second authentication neural network, generated from the second sample image, to a low-resolution image; the degradation the TMUX layer applies to its input is the inverse of the lifting process of the MUX layer. Using its convolutional layers, the second authentication neural network outputs an image, an "IQ Map", similar to other image quality metrics (e.g., the structural similarity index, SSIM); all pixels of the "IQ Map" are averaged, and the mean is output as the single-number "authentication label".
Embodiments of the present disclosure provide an image processing method for image conversion that performs the conversion processing based on a generative neural network, a super-resolution neural network, and content perception. A noise image is added to the input to generate detail information in the converted image. The generative neural network is trained with a content-feature loss function to ensure that the converted output image is consistent in content with the input image, and with a style-difference loss function between processing results to ensure diversity among the outputs, keeping the system simple and easy to train. On this basis, the super-resolution neural network raises the resolution of the converted image output by the generative network, and its parameters are optimized by reducing its cost function. With the trained generative neural network and super-resolution neural network, a high-resolution converted image can thus be obtained that both carries the conversion features and meets product requirements for image resolution.
An embodiment of the present disclosure further provides an image processing apparatus, shown in FIG. 13, which includes a generative neural network module 1302 configured to perform image conversion processing on an input image according to the input image and a first noise image, so as to output a converted first output image, the first noise image including N channels, N being a positive integer greater than or equal to 1. The generative neural network module may include the generative neural network described above. The image processing apparatus provided by the embodiment of the present disclosure performs image conversion processing on the input image and the noise image with the generative neural network module, so as to output the converted output image. The apparatus further includes a super-resolution neural network module 1304, with which it performs high-resolution conversion processing on the first output image and a second noise image and outputs a second output image; the second noise image includes M channels, M being a positive integer greater than or equal to 1, and the first noise image and the second noise image are different.
The input image includes a first color channel, a second color channel, and a third color channel, the RGB channels in the embodiment of the present disclosure. The input of the generative neural network module includes the first noise image channel and the RGB channels of the input image; its output is the first output image, which includes a first, second, and third color channel, the RGB channels in the embodiment of the present disclosure.
The generative neural network module includes one or more downsampling modules, one or more residual modules, and one or more upsampling modules, wherein: the downsampling module includes a convolutional layer, a downsampling layer, and an instance normalization layer connected in sequence; the residual module includes a convolutional layer and an instance normalization layer connected in sequence; and the upsampling module includes an upsampling layer, an instance normalization layer, and a convolutional layer connected in sequence, the number of upsampling modules being equal to the number of downsampling modules.
The input of the super-resolution neural network module includes the second noise image channel and the RGB channels of the first output image; its output is a second output image including a first, second, and third color channel, the RGB channels in the embodiment of the present disclosure.
The super-resolution neural network module includes a lifting module and a transform module connected in sequence: the lifting module upsamples the first output image and the second noise image and outputs a first intermediate image including a luminance channel, a first chrominance channel, and a second chrominance channel (the YUV channels in the embodiment of the present disclosure), and the transform module transforms the first intermediate image output by the lifting module into a second output image including the RGB channels. The lifting module includes a first sub-network, a second sub-network, and a third sub-network; the input of each sub-network is the first output image and the second noise image, and the sub-networks share the same structure, containing the same numbers of convolutional layers and lifting layers.
In the embodiment of the present disclosure, the generative neural network module performs image conversion with the first training image I1 and the first training noise image N1 to output the converted first training output image Ra, and performs image conversion with the first training image I1 and the second training noise image N2 to output the converted second training output image Rb.
The training neural network module trains the generative neural network module based on the first training image I1, the first training output image Ra, and the second training output image Rb; the training aims to optimize the parameters of the network according to the processing results of the generative neural network module, so that it can accomplish the conversion goal.
The training neural network module includes: an authentication neural network module, used to output an authentication label indicating whether the first training output image Ra has the conversion features; and a first loss calculation unit, used to compute the loss value of the generative neural network from the first training image I1, the first training output image Ra, the second training output image Rb, and the authentication label, and to optimize the parameters of the generative neural network module, for example the convolution kernels and biases of its convolutional layers. In embodiments of the present disclosure, the first training output image Ra may be input to the authentication neural network module together with the second training output image Rb; authentication labels are output for each, and both are used to train the generative neural network.
After training, the generative neural network module has optimized parameters and can be used to carry out the target image conversion processing. In the present disclosure, the first loss calculation unit trains with the content features of the input image, the first output image, and the second output image, which simplifies the system and makes it easier to train. The result-diversity loss function ensures diversity among the converted images output by the generative neural network module, and the content loss function ensures that the output converted image is consistent with the input image, i.e., the converted image both carries the conversion features and retains enough original image information, avoiding the loss of large amounts of original information during image processing.
According to an embodiment of the present disclosure, the training neural network module further includes a second authentication neural network module, used to output, from the first sample image R1 and the second sample image R2, an authentication label indicating whether the second sample image R2 has the content features corresponding to the first sample image. The training neural network module further trains the super-resolution neural network module according to the authentication label output by the second authentication neural network; for example, the optimizer optimizes the parameters of the super-resolution neural network module by reducing its cost function.
In the embodiment of the present disclosure, the generative neural network generates a first output image from the input image and the first noise image; the first output image serves as the first sample image, carries the conversion features, and includes the RGB channels. The super-resolution neural network module further outputs a second sample image from the super-resolution training image and the acquired super-resolution training noise image, the super-resolution training image being a low-resolution image extracted from the first sample image. Based on the first and second sample images, the training neural network module optimizes the parameters of the super-resolution neural network module by reducing its cost function; the parameters may include the convolution kernels and biases of the convolutional layers of the super-resolution neural network module.
The image processing apparatus for image conversion provided by the embodiments of the present disclosure performs image conversion processing based on a generative neural network, a super-resolution neural network, and content perception, and includes a generative neural network module and a super-resolution network module. A noise image is added to the input to generate detail information in the converted image. The generative neural network module is trained with a content-feature loss function to ensure that the converted output image is consistent in content with the input image, and with a style-difference loss function between processing results to ensure diversity among the outputs, keeping the system simple and easy to train. On this basis, the super-resolution neural network module raises the resolution of the converted image output by the generative network, and its parameters are optimized by reducing the module's cost function. With the trained generative neural network module and super-resolution neural network module, a high-resolution converted image can thus be obtained that both carries the conversion features and meets product requirements for image resolution.
An embodiment of the present disclosure further provides an image processing device whose structural block diagram is shown in FIG. 14; it includes a processor 1402 and a memory 1404. It should be noted that the structure of the image processing device shown in FIG. 14 is merely exemplary, not restrictive; the device may have other components according to practical application needs.
In embodiments of the present disclosure, the processor 1402 and the memory 1404 may communicate with each other directly or indirectly. Components such as the processor 1402 and the memory 1404 may communicate over a network connection. The network may include a wireless network, a wired network, and/or any combination of the two, and may include a local area network, the Internet, a telecommunications network, an Internet of Things based on the Internet and/or a telecommunications network, and/or any combination of the above networks. The wired network may communicate via twisted pair, coaxial cable, or optical fiber transmission, for example, and the wireless network may use communication methods such as a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi. The present disclosure does not limit the type or function of the network.
The processor 1402 may control the other components of the image processing device to perform the desired functions. It may be a device with data processing capability and/or program execution capability, such as a central processing unit (CPU), a tensor processing unit (TPU), or a graphics processing unit (GPU). The CPU may have an X86 or ARM architecture, for example. The GPU may be integrated directly on the motherboard or built into its northbridge chip, and may also be built into the CPU, since the GPU has powerful image processing capability.
The memory 1404 may include any combination of one or more computer program products, which may include computer-readable storage media of various forms, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact-disc read-only memory (CD-ROM), USB memory, or flash memory.
One or more computer-readable codes or instructions may be stored in the memory 1404, and the processor 1402 may run these computer instructions to perform the image processing method described above or implement the image processing apparatus described above. For detailed descriptions of the image processing method and the image processing apparatus, reference may be made to the related descriptions in this specification, which are not repeated here. The computer-readable storage medium may also store various applications and various data, such as image data sets and various data used and/or generated by the applications (such as training data).
The above are only specific embodiments of the present disclosure, but the scope of protection of the present disclosure is not limited thereto; the scope of protection of the present disclosure shall be determined by the scope of protection of the claims.

Claims (20)

  1. An image processing method, comprising:
    performing image conversion processing on an input image with a generative neural network according to the input image and a first noise image, so as to output a converted first output image;
    performing high-resolution conversion processing on the first output image with a super-resolution neural network according to the first output image and a second noise image, so as to output a second output image.
  2. The image processing method according to claim 1, wherein:
    the input image comprises a first color channel, a second color channel, and a third color channel;
    the first noise image comprises N channels, N being a positive integer greater than or equal to 1;
    the input of the generative neural network comprises the first noise image channel and the first color channel, the second color channel, and the third color channel of the input image;
    the output of the generative neural network is the first output image, which comprises a first color channel, a second color channel, and a third color channel.
  3. The image processing method according to claim 1, wherein the generative neural network comprises one or more downsampling modules, one or more residual modules, and one or more upsampling modules, wherein:
    the downsampling module comprises a convolutional layer, a downsampling layer, and an instance normalization layer connected in sequence;
    the residual module comprises a convolutional layer and an instance normalization layer connected in sequence;
    the upsampling module comprises an upsampling layer, an instance normalization layer, and a convolutional layer connected in sequence, wherein the number of the upsampling modules is equal to the number of the downsampling modules.
  4. The image processing method according to claim 1, wherein the second noise image comprises M channels, M being a positive integer greater than or equal to 1;
    the input of the super-resolution neural network comprises the second noise image channel and the first color channel, the second color channel, and the third color channel of the first output image;
    the output of the super-resolution neural network is the second output image, which comprises a first color channel, a second color channel, and a third color channel.
  5. The image processing method according to claim 1, wherein the super-resolution neural network comprises a lifting module and a transform module connected in sequence, and performing the high-resolution conversion processing with the super-resolution neural network comprises:
    upsampling the first output image and the second noise image with the lifting module, and outputting a first intermediate image comprising a luminance channel, a first chrominance channel, and a second chrominance channel;
    transforming, with the transform module, the first intermediate image output by the lifting module into a second output image comprising a first color channel, a second color channel, and a third color channel.
  6. The image processing method according to claim 5, wherein the lifting module comprises a first sub-network, a second sub-network, and a third sub-network, wherein:
    the input of each sub-network is the first output image and the second noise image;
    each sub-network has the same structure, containing the same number of convolutional layers and lifting layers.
  7. The image processing method according to claim 1, further comprising:
    generating a first training output image with the generative neural network according to a first training image and a first training noise image;
    generating a second training output image with the generative neural network according to the first training image and a second training noise image, wherein the second training noise image is different from the first training noise image;
    training the generative neural network based on the first training image, the first training output image, and the second training output image.
  8. The image processing method according to claim 7, wherein training the generative neural network comprises:
    inputting the first training output image to an authentication neural network, which outputs an authentication label indicating whether the first training output image has the conversion features; inputting the second training output image to the authentication neural network, which outputs an authentication label indicating whether the second training output image has the conversion features; calculating, with a first loss calculation unit, a loss value of the generative neural network from the first training image, the first training output image, the second training output image, and the corresponding authentication labels, and optimizing the parameters of the generative neural network, wherein:
    the first loss calculation unit comprises an analysis network, a first loss calculator, and an optimizer, and calculating the loss value of the generative neural network with the first loss calculation unit comprises:
    outputting, with the analysis network, the content features of the first training image, the first training output image, and the second training output image, and outputting, with the analysis network, the style features of the first training output image and the second training output image;
    calculating, with the first loss calculator, the loss value of the generative neural network according to a first loss function from the content features and style features extracted by the analysis network and the authentication labels of the first training output image and the second training output image;
    optimizing, with the optimizer, the parameters of the generative neural network according to the loss value of the generative neural network.
  9. The image processing method according to claim 8, wherein:
    the first loss function comprises a style-difference loss function, and calculating the loss value of the generative neural network comprises: calculating, with the first loss calculator, a style loss value of the generative neural network according to the style-difference loss function from the style features of the first training output image and the style features of the second training output image;
    the first loss function further comprises a content loss function, and calculating the loss value of the generative neural network comprises: calculating a content loss value of the generative neural network according to the content loss function from the content features of the first training image, the first training output image, and the second training output image.
  10. The image processing method according to claim 1, further comprising:
    extracting a low-resolution image from a first sample image as a super-resolution training image, the resolution of the super-resolution training image being lower than the resolution of the first sample image;
    outputting a second sample image with the super-resolution neural network according to the super-resolution training image and a super-resolution training noise image, the resolution of the second sample image being equal to the resolution of the first sample image;
    optimizing the parameters of the super-resolution neural network by reducing the cost function of the super-resolution neural network according to the first sample image and the second sample image.
  11. An image processing apparatus, comprising:
    a generative neural network module configured to perform image conversion processing on an input image according to the input image and a first noise image, so as to output a converted first output image;
    a super-resolution neural network module configured to perform high-resolution conversion processing on the first output image according to the first output image and a second noise image, so as to output a second output image.
  12. The image processing apparatus according to claim 11, wherein:
    the input image comprises a first color channel, a second color channel, and a third color channel;
    the input of the generative neural network module comprises the first noise image channel and the first color channel, the second color channel, and the third color channel of the input image;
    the output of the generative neural network module is the first output image, which comprises a first color channel, a second color channel, and a third color channel.
  13. The image processing apparatus according to claim 11, wherein:
    the generative neural network module comprises one or more downsampling modules, one or more residual modules, and one or more upsampling modules, wherein:
    the downsampling module comprises a convolutional layer, a downsampling layer, and an instance normalization layer connected in sequence;
    the residual module comprises a convolutional layer and an instance normalization layer connected in sequence;
    the upsampling module comprises an upsampling layer, an instance normalization layer, and a convolutional layer connected in sequence, the number of the upsampling modules being equal to the number of the downsampling modules.
  14. The image processing apparatus according to claim 11, wherein the input of the super-resolution neural network module comprises the second noise image channel and the first color channel, the second color channel, and the third color channel of the first output image;
    the output of the super-resolution neural network module is the second output image, which comprises a first color channel, a second color channel, and a third color channel.
  15. The image processing apparatus according to claim 11, wherein the super-resolution neural network module comprises a lifting module and a transform module connected in sequence:
    the lifting module is configured to upsample the first output image and the second noise image, and to output a first intermediate image comprising a luminance channel, a first chrominance channel, and a second chrominance channel;
    the transform module is configured to transform the first intermediate image output by the lifting module into a second output image comprising a first color channel, a second color channel, and a third color channel, wherein:
    the lifting module comprises a first sub-network, a second sub-network, and a third sub-network.
  16. The image processing apparatus according to claim 15, wherein:
    the input of the first sub-network, the second sub-network, and the third sub-network is the first output image and the second noise image, and the output image has three channels, comprising a luminance channel, a first chrominance channel, and a second chrominance channel;
    the first sub-network, the second sub-network, and the third sub-network have the same structure, and each comprises at least one lifting sub-module, each lifting sub-module comprising one or more convolutional layers and one lifting layer connected in sequence.
  17. The image processing apparatus according to claim 11, further comprising:
    a training neural network module configured to train the generative neural network module according to the output images of the generative neural network module,
    wherein the generative neural network module further outputs a converted first training output image according to a first training image and a first training noise image, and further outputs a converted second training output image according to the first training image and an acquired second training noise image, the second training noise image being different from the first training noise image;
    the training neural network module trains the generative neural network module based on the first training image, the first training output image, and the second training output image, wherein the training neural network module comprises:
    an authentication neural network module configured to output authentication labels indicating whether the first training output image and the second training output image have the conversion features;
    a first loss calculation unit configured to calculate the loss value of the generative neural network module from the first training image, the first training output image, the second training output image, and the corresponding authentication labels, and to optimize the parameters of the generative neural network module, wherein the first loss calculation unit comprises:
    an analysis network configured to output the content features of the first training image, the first training output image, and the second training output image; a first loss calculator configured to calculate the loss value of the generative neural network module according to a first loss function from the content features and style features extracted by the analysis network and the authentication labels of the first training output image and the second training output image; and an optimizer configured to optimize the parameters of the generative neural network module according to the loss value of the generative neural network module.
  18. The image processing apparatus according to claim 17, wherein:
    the first loss function comprises a style-difference loss function for calculating a style loss value of the generative neural network module from the style features of the first training output image and the style features of the second training output image;
    the first loss function further comprises a content loss function for calculating a content loss value of the generative neural network module from the content features of the first training image, the first training output image, and the second training output image.
  19. The image processing apparatus according to claim 17, wherein the training neural network module is further configured to train the super-resolution neural network module according to the output of the super-resolution neural network,
    the super-resolution neural network module further outputs a second sample image according to a super-resolution training image and an acquired super-resolution training noise image, wherein the super-resolution training image is a low-resolution image extracted from a first sample image,
    wherein the training neural network module further comprises:
    a second authentication neural network module configured to output an authentication label based on the first sample image and the second sample image;
    wherein the optimizer optimizes the parameters of the super-resolution neural network module by reducing the cost function of the super-resolution neural network module.
  20. An image processing device, comprising:
    one or more processors;
    one or more memories,
    wherein the memories store computer-readable code which, when run by the one or more processors, performs the image processing method of any one of claims 1-10 or implements the image processing apparatus of any one of claims 11-19.
PCT/CN2018/114848 2017-11-09 2018-11-09 Image processing method, processing apparatus and processing device WO2019091459A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2020526028A JP7438108B2 (ja) 2017-11-09 2018-11-09 Image processing method, processing apparatus and processing device
EP18876502.8A EP3709255A4 (en) 2017-11-09 2018-11-09 IMAGE PROCESSING PROCESS, PROCESSING APPARATUS AND PROCESSING DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711100015.5A CN107767343B (zh) 2017-11-09 2017-11-09 Image processing method, processing apparatus and processing device
CN201711100015.5 2017-11-09

Publications (1)

Publication Number Publication Date
WO2019091459A1 true WO2019091459A1 (zh) 2019-05-16

Family

ID=61272242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/114848 WO2019091459A1 (zh) 2017-11-09 2018-11-09 Image processing method, processing apparatus and processing device

Country Status (5)

Country Link
US (1) US10430683B2 (zh)
EP (1) EP3709255A4 (zh)
JP (1) JP7438108B2 (zh)
CN (1) CN107767343B (zh)
WO (1) WO2019091459A1 (zh)


Also Published As

Publication number Publication date
JP7438108B2 (ja) 2024-02-26
US20190138838A1 (en) 2019-05-09
CN107767343B (zh) 2021-08-31
US10430683B2 (en) 2019-10-01
EP3709255A4 (en) 2021-07-28
CN107767343A (zh) 2018-03-06
JP2021502644A (ja) 2021-01-28
EP3709255A1 (en) 2020-09-16

