WO2020239026A1 - Image processing method and device, neural network training method, storage medium - Google Patents

Image processing method and device, neural network training method, and storage medium

Info

Publication number
WO2020239026A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
branch
image
training
processing
Prior art date
Application number
PCT/CN2020/092917
Other languages
English (en)
French (fr)
Inventor
刘瀚文
那彦波
朱丹
张丽杰
Original Assignee
京东方科技集团股份有限公司
Priority date
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司
Priority to US 17/281,291 (US11908102B2)
Publication of WO2020239026A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
    • G06T1/00 General purpose image data processing
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the embodiments of the present disclosure relate to an image processing method and device, a neural network training method, and a storage medium.
  • Deep learning technology based on artificial neural networks has made great progress in fields such as object classification, text processing, recommendation engines, image search, facial recognition, age and speech recognition, human-machine dialogue, and emotional computing.
  • Deep learning technology has made breakthroughs in the field of human-like data perception; for example, it can be used to describe image content, identify objects in complex environments, and perform speech recognition in noisy environments.
  • deep learning technology can also solve the problem of image generation and fusion.
  • At least one embodiment of the present disclosure provides an image processing method, including: acquiring an input image; and using a generation network to process the input image to generate an output image; wherein the generation network includes a first sub-network and at least one second sub-network, and using the generation network to process the input image to generate the output image includes: using the first sub-network to process the input image to obtain multiple first feature maps; using the at least one second sub-network to perform branch processing and weight sharing processing on the multiple first feature maps to obtain multiple second feature maps; and processing the multiple second feature maps to obtain the output image.
  • each of the second sub-networks includes a first branch network, a second branch network, and a third branch network.
  • the branch processing includes: dividing the input of each of the second sub-networks into a first branch input, a second branch input, and a third branch input; and using the first branch network to process the first branch input to obtain a first branch output, using the second branch network to process the second branch input to obtain a second branch output, and using the third branch network to process the third branch input to obtain a third branch output; wherein the at least one second sub-network includes a first second sub-network, the first second sub-network is connected to the first sub-network, and the plurality of first feature maps serve as the input of the first second sub-network.
  • each of the second sub-networks further includes a first backbone network
  • the weight sharing processing of each of the second sub-networks includes: connecting the first branch output, the second branch output, and the third branch output to obtain a first intermediate output; and processing the first intermediate output using the first backbone network to obtain the output of each of the second sub-networks.
  • the processing of the first branch network includes standard convolution processing
  • the processing of the second branch network includes standard convolution processing
  • the processing of the third branch network includes standard convolution processing
  • the processing of the first backbone network includes standard convolution processing and down-sampling processing.
  • the generation network further includes a third sub-network
  • processing the multiple second feature maps to obtain the output image includes: processing the multiple second feature maps to obtain multiple third feature maps; using the third sub-network to process the multiple third feature maps to obtain multiple fourth feature maps; and performing synthesis processing on the multiple fourth feature maps to obtain the output image.
  • the third sub-network includes a second backbone network, a fourth branch network, a fifth branch network, and a sixth branch network; using the third sub-network to process the multiple third feature maps to obtain the multiple fourth feature maps includes: using the second backbone network to process the multiple third feature maps to obtain multiple fifth feature maps; dividing the plurality of fifth feature maps into a fourth branch input, a fifth branch input, and a sixth branch input; and using the fourth branch network to process the fourth branch input to obtain the fourth feature map corresponding to the fourth branch network, using the fifth branch network to process the fifth branch input to obtain the fourth feature map corresponding to the fifth branch network, and using the sixth branch network to process the sixth branch input to obtain the fourth feature map corresponding to the sixth branch network; wherein the multiple fourth feature maps include the fourth feature map corresponding to the fourth branch network, the fourth feature map corresponding to the fifth branch network, and the fourth feature map corresponding to the sixth branch network.
  • the processing of the second backbone network includes upsampling processing
  • the processing of the fourth branch network includes standard convolution processing
  • the processing of the fifth branch network includes standard convolution processing
  • the processing of the sixth branch network includes standard convolution processing.
  • the processing of the fourth branch network further includes up-sampling processing
  • the processing of the fifth branch network further includes up-sampling processing
  • the processing of the sixth branch network further includes up-sampling processing.
  • the processing of the first sub-network includes standard convolution processing
  • using the first sub-network to process the input image to obtain the multiple first feature maps includes: performing standard convolution processing on the input image using the first sub-network to obtain the multiple first feature maps.
  • the input image has a first color channel, a second color channel, and a third color channel
  • the first sub-network includes a conversion module, a seventh branch network, an eighth branch network, a ninth branch network, and a third backbone network; using the first sub-network to process the input image to obtain the plurality of first feature maps includes: using the conversion module to convert the data information of the first color channel, the second color channel, and the third color channel of the input image into data information of a first luminance channel, a first color difference channel, and a second color difference channel of an intermediate input image; using the seventh branch network to process the data information of the first luminance channel of the intermediate input image to obtain a seventh branch output, using the eighth branch network to process the data information of the first color difference channel of the intermediate input image to obtain an eighth branch output, and using the ninth branch network to process the data information of the second color difference channel of the intermediate input image to obtain a ninth branch output; and connecting the seventh branch output, the eighth branch output, and the ninth branch output, and processing the connected result using the third backbone network to obtain the plurality of first feature maps.
  • the processing of the seventh branch network includes standard convolution processing and down-sampling processing
  • the processing of the eighth branch network includes standard down-sampling processing
  • the processing of the ninth branch network includes standard down-sampling processing.
  • the processing of the fourth branch network includes standard convolution processing and up-sampling processing
  • the processing of the fifth branch network includes standard convolution processing and up-sampling processing
  • the processing of the sixth branch network includes standard convolution processing and up-sampling processing.
  • the generation network further includes a dense sub-network, the dense sub-network includes N dense modules, and processing the plurality of second feature maps to obtain the multiple third feature maps includes: using the dense sub-network to process the multiple second feature maps to obtain the multiple third feature maps; wherein the multiple second feature maps serve as the input of the first dense module among the N dense modules; the multiple second feature maps are connected with the outputs of the i-1 dense modules preceding the i-th dense module among the N dense modules to serve as the input of the i-th dense module; the multiple second feature maps are connected with the outputs of all the dense modules to serve as the multiple third feature maps; N and i are integers, N ≥ 2, i ≥ 2 and i ≤ N.
  • the processing of each dense module includes dimensionality reduction processing and convolution processing.
  • the generation network further includes a synthesis module, which performs synthesis processing on the plurality of fourth feature maps to obtain the output image, including: using the synthesis The module performs synthesis processing on the plurality of fourth feature maps to obtain the output image.
  • the synthesis module includes a first transformation matrix, and using the synthesis module to synthesize the plurality of fourth feature maps to obtain the output image includes: using the first transformation matrix to convert the data information of the fourth feature map corresponding to the fourth branch network, the data information of the fourth feature map corresponding to the fifth branch network, and the data information of the fourth feature map corresponding to the sixth branch network into data information of the first color channel, data information of the second color channel, and data information of the third color channel of the output image, to obtain the output image.
  • At least one embodiment of the present disclosure further provides a neural network training method, including: training a discriminant network based on the generation network to be trained; training the generation network to be trained based on the discriminant network; and performing the above training processes alternately to obtain the generation network in the image processing method provided by any embodiment of the present disclosure; wherein training the generation network to be trained based on the discriminant network includes: using the generation network to be trained to process a first training input image to generate a first training output image; calculating a system loss value of the generation network to be trained through a system loss function based on the first training output image; and correcting the parameters of the generation network to be trained based on the system loss value.
  • the system loss function includes a generation network adversarial loss function, and the system loss value includes a generation network adversarial loss value; the generation network adversarial loss function is expressed in terms of the following quantities:
  • L_G represents the generation network adversarial loss function
  • z1 represents the first training input image
  • P_z1(z1) represents the set of first training input images
  • G(z1) represents the first training output image
  • D(G(z1)) represents the output of the discriminant network for the first training output image
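A minimal sketch of how such a generator adversarial loss is commonly computed, assuming the standard GAN form L_G = -E_{z1~P_z1(z1)}[log D(G(z1))]; the patent's exact expression is not reproduced in this text, and the function name and tensor shapes are illustrative only (Python/PyTorch):

```python
import torch

def generator_adversarial_loss(d_of_g_z1: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of L_G, assuming the standard generator loss
    L_G = -E_{z1~P_z1(z1)}[log D(G(z1))].
    d_of_g_z1: discriminator outputs D(G(z1)) in (0, 1) for a batch of
    first training output images G(z1)."""
    eps = 1e-8  # numerical stability for the logarithm
    return -torch.log(d_of_g_z1 + eps).mean()
```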
  • the system loss function further includes a content loss function
  • the system loss value further includes a content loss value
  • calculating the system loss value of the generation network to be trained through the system loss function includes: using an analysis network to extract a first content feature map of the first training input image and a second content feature map of the first training output image, and calculating the content loss value of the generation network through the content loss function according to the first content feature map and the second content feature map;
  • the analysis network includes at least one convolution module for extracting the first content feature map and the second content feature map;
  • the content loss function is expressed in terms of the following quantities:
  • L_content represents the content loss function
  • C_m represents the single-layer content loss function of the m-th convolution module in the at least one convolution module
  • w_1m represents the weight of C_m;
  • the single-layer content loss function is expressed in terms of the following quantities:
  • S_1 is a constant; the two remaining quantities are, respectively, the value of the j-th position in the first content feature map of the first training input image extracted by the i-th convolution kernel in the m-th convolution module, and the value of the j-th position in the second content feature map of the first training output image extracted by the i-th convolution kernel in the m-th convolution module.
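A minimal sketch of a content (perceptual) loss of this shape, assuming the common squared-difference form C_m = sum_ij(...)^2 / (2*S_1) and L_content = sum_m w_1m * C_m; the patent's exact expressions are not reproduced above, and the helper names are hypothetical:

```python
import torch

def single_layer_content_loss(f1: torch.Tensor, f2: torch.Tensor, s1: float) -> torch.Tensor:
    """Hypothetical C_m: f1 and f2 are the first and second content feature maps
    extracted by the m-th convolution module of the analysis network, shaped
    (channels, H, W). Assumes C_m = sum_ij (f1 - f2)^2 / (2 * S_1)."""
    return ((f1 - f2) ** 2).sum() / (2.0 * s1)

def content_loss(features1, features2, weights, s1: float) -> torch.Tensor:
    """L_content = sum over modules m of w_1m * C_m."""
    return sum(w * single_layer_content_loss(a, b, s1)
               for w, a, b in zip(weights, features1, features2))
```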
  • the system loss function further includes a color loss function, and the system loss value further includes a color loss value; the color loss function is expressed in terms of the following quantities:
  • L_color represents the color loss function
  • G(z1) represents the first training output image
  • I1 represents the second training input image
  • gaussian() represents the Gaussian blur operation
  • abs() represents the absolute value operation
  • the quality of the second training input image is higher than the quality of the first training input image.
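A minimal sketch of a color loss built from the quantities listed above, assuming L_color = abs(gaussian(G(z1)) - gaussian(I1)) averaged over pixels; the blur kernel size and sigma are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn.functional as F

def gaussian_blur(img: torch.Tensor, sigma: float = 3.0, ksize: int = 21) -> torch.Tensor:
    """Separable Gaussian blur for an image batch shaped (N, C, H, W)."""
    x = torch.arange(ksize, dtype=img.dtype, device=img.device) - (ksize - 1) / 2
    k1d = torch.exp(-(x ** 2) / (2 * sigma ** 2))
    k1d = k1d / k1d.sum()
    c = img.shape[1]
    k_h = k1d.view(1, 1, 1, ksize).repeat(c, 1, 1, 1)  # horizontal pass kernel
    k_v = k1d.view(1, 1, ksize, 1).repeat(c, 1, 1, 1)  # vertical pass kernel
    img = F.conv2d(img, k_h, padding=(0, ksize // 2), groups=c)
    return F.conv2d(img, k_v, padding=(ksize // 2, 0), groups=c)

def color_loss(g_z1: torch.Tensor, i1: torch.Tensor) -> torch.Tensor:
    """Hypothetical L_color = mean(abs(gaussian(G(z1)) - gaussian(I1)))."""
    return torch.abs(gaussian_blur(g_z1) - gaussian_blur(i1)).mean()
```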
  • the first training output image has a first color channel, a second color channel, and a third color channel;
  • the system loss function also includes a comparison loss function, and the system loss value also includes a comparison loss value; the comparison loss function is expressed as:
  • L_L1 = 0.299*abs(F_G(z1) - F_I2) + 0.587*abs(S_G(z1) - S_I2) + 0.299*abs(T_G(z1) - T_I2)
  • L_L1 represents the comparison loss function
  • G(z1) represents the first training output image
  • I2 represents the third training input image
  • F_G(z1), S_G(z1), and T_G(z1) respectively represent the data information of the first color channel, the second color channel, and the third color channel of the first training output image
  • F_I2, S_I2, and T_I2 respectively represent the data information of the first color channel, the second color channel, and the third color channel of the third training input image
  • abs() represents the absolute value operation
  • the third training input image has the same scene as the first training input image, and the quality of the third training input image is higher than the quality of the first training input image.
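A minimal sketch of the comparison loss as printed above; the channel weights follow the text as given (0.299, 0.587, 0.299), and the batch-mean reduction is an assumption. Note that the standard BT.601 luma weights would be (0.299, 0.587, 0.114):

```python
import torch

def comparison_loss(g_z1: torch.Tensor, i2: torch.Tensor) -> torch.Tensor:
    """Hypothetical L_L1: weighted per-channel absolute difference between the
    first training output image G(z1) and the third training input image I2,
    both shaped (N, 3, H, W) with channels ordered (F, S, T)."""
    w = torch.tensor([0.299, 0.587, 0.299], device=g_z1.device, dtype=g_z1.dtype)
    per_channel = torch.abs(g_z1 - i2).mean(dim=(0, 2, 3))  # mean abs diff per channel
    return (w * per_channel).sum()
```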
  • training the discriminant network based on the generation network to be trained includes: using the generation network to be trained to process a fourth training input image to generate a second training output image; calculating a discriminant network adversarial loss value through a discriminant network adversarial loss function based on the second training output image and a fifth training input image; and correcting the parameters of the discriminant network based on the discriminant network adversarial loss value; wherein the quality of the fifth training input image is higher than the quality of the fourth training input image.
  • the discriminant network adversarial loss function is expressed in terms of the following quantities:
  • L_D represents the discriminant network adversarial loss function
  • x represents the fifth training input image
  • P_data(x) represents the set of fifth training input images
  • D(x) represents the output of the discriminant network for the fifth training input image
  • E_{x~P_data(x)} represents the expectation over the set of fifth training input images
  • z2 represents the fourth training input image
  • P_z2(z2) represents the set of fourth training input images
  • G(z2) represents the second training output image
  • D(G(z2)) represents the output of the discriminant network for the second training output image
  • E_{z2~P_z2(z2)} represents the expectation over the set of fourth training input images.
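A minimal sketch of such a discriminator adversarial loss, assuming the standard GAN form L_D = -E_{x~P_data(x)}[log D(x)] - E_{z2~P_z2(z2)}[log(1 - D(G(z2)))]; the patent's exact expression is not reproduced in this text:

```python
import torch

def discriminator_adversarial_loss(d_x: torch.Tensor, d_g_z2: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of L_D.
    d_x: discriminator outputs for the fifth training input images x (high quality);
    d_g_z2: discriminator outputs for the second training output images G(z2)."""
    eps = 1e-8  # numerical stability for the logarithms
    return -(torch.log(d_x + eps).mean() + torch.log(1.0 - d_g_z2 + eps).mean())
```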
  • At least one embodiment of the present disclosure further provides an image processing device, including: a memory for non-transitory storage of computer-readable instructions; and a processor for running the computer-readable instructions; when run by the processor, the computer-readable instructions execute the image processing method provided by any embodiment of the present disclosure or the neural network training method provided by any embodiment of the present disclosure.
  • At least one embodiment of the present disclosure further provides a storage medium that non-transitorily stores computer-readable instructions; when the computer-readable instructions are executed by a computer, the image processing method provided by any embodiment of the present disclosure or the neural network training method provided by any embodiment of the present disclosure can be executed.
  • Figure 1 is a schematic diagram of a convolutional neural network
  • Figure 2A is a schematic diagram of a convolutional neural network
  • Figure 2B is a schematic diagram of the working process of a convolutional neural network
  • Figure 3 is a schematic diagram of another convolutional neural network
  • FIG. 4 is a flowchart of an image processing method provided by at least one embodiment of the present disclosure
  • FIG. 5 is an exemplary flowchart corresponding to step S200 shown in FIG. 4;
  • FIG. 6A is a schematic structural block diagram of a generation network corresponding to the image processing method shown in FIG. 4 provided by at least one embodiment of the present disclosure
  • 6B is a schematic structural block diagram of another generation network corresponding to the image processing method shown in FIG. 4 provided by at least one embodiment of the present disclosure
  • FIG. 7 is a schematic structural diagram of a dense sub-network provided by at least one embodiment of the present disclosure.
  • FIG. 8A is a schematic diagram of an upsampling layer provided by at least one embodiment of the present disclosure.
  • FIG. 8B is a schematic diagram of another upsampling layer provided by at least one embodiment of the present disclosure.
  • Fig. 9A is a schematic diagram of an input image
  • FIG. 9B is a schematic diagram of an output image obtained by processing the input image shown in FIG. 9A according to the generation network shown in FIG. 6A;
  • FIG. 9C is a schematic diagram of an output image obtained by processing the input image shown in FIG. 9A according to the generation network shown in FIG. 6B;
  • FIG. 10 is a flowchart of a neural network training method provided by at least one embodiment of the present disclosure.
  • FIG. 11A is a schematic structural block diagram of training a generating network to be trained corresponding to the training method shown in FIG. 10 according to at least one embodiment of the present disclosure
  • FIG. 11B is a schematic flowchart of a process of training a generation network to be trained according to at least one embodiment of the present disclosure
  • FIG. 12 is a schematic structural diagram of a discrimination network provided by at least one embodiment of the present disclosure.
  • FIG. 13 is a schematic structural diagram of an analysis network provided by at least one embodiment of the present disclosure.
  • FIG. 14A is a schematic structural block diagram of a training discriminant network corresponding to the training method shown in FIG. 10 provided by at least one embodiment of the present disclosure
  • 14B is a schematic flowchart of a process of training a discriminant network provided by at least one embodiment of the present disclosure
  • FIG. 15 is a schematic block diagram of an image processing apparatus provided by at least one embodiment of the present disclosure.
  • FIG. 16 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure.
  • Image enhancement is one of the research hotspots in the field of image processing. Due to the limitations of various physical factors in the image acquisition process (for example, the size of the image sensor of the mobile phone camera is too small and other software and hardware limitations) and the interference of environmental noise, the image quality will be greatly reduced.
  • the purpose of image enhancement is to improve the grayscale histogram of the image and the contrast of the image through image enhancement technology, thereby highlighting the detailed information of the image and improving the visual effect of the image.
  • Using deep neural networks for image enhancement is a technology that has emerged with the development of deep learning technology; based on such networks, low-quality input photos can be processed so that the quality of the output images is close to that of photos taken by a digital single-lens reflex camera (DSLR).
  • For example, the peak signal-to-noise ratio (PSNR) index is commonly used to characterize image quality, where the higher the PSNR value, the closer the image is to a photo taken by a digital single-lens reflex camera.
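For reference, a small sketch of how the PSNR index mentioned above is commonly computed (standard definition, not a formula taken from the patent):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, using the standard definition
    PSNR = 10 * log10(MAX^2 / MSE); a higher value means the test image is
    closer to the reference image."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_value ** 2) / mse)
```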
  • Andrey Ignatov et al. proposed a convolutional neural network method for image enhancement; please refer to the literature: Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool, DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks, arXiv:1704.02470v2 [cs.CV], September 5, 2017. This document is hereby incorporated by reference in its entirety as a part of this application.
  • This method mainly uses convolutional layers, batch normalization layers and residual connections to construct a single-scale convolutional neural network.
  • the network can be used to process input low-quality images (for example, images with low contrast, underexposure or overexposure, or images that are overall too dark or too bright) into higher-quality images.
  • Using color loss, texture loss, and content loss as loss functions during training can achieve better processing results.
  • FIG. 1 shows a schematic diagram of a convolutional neural network.
  • the convolutional neural network can be used for image processing, which uses images as input and output, and replaces scalar weights with convolution kernels.
  • FIG. 1 only shows a convolutional neural network with a 3-layer structure, which is not limited in the embodiment of the present disclosure.
  • the convolutional neural network includes an input layer 101, a hidden layer 102, and an output layer 103.
  • the input layer 101 has 4 inputs
  • the hidden layer 102 has 3 outputs
  • the output layer 103 has 2 outputs.
  • the convolutional neural network finally outputs 2 images.
  • the 4 inputs of the input layer 101 may be 4 images, or 4 feature images of 1 image.
  • the three outputs of the hidden layer 102 may be characteristic images of the image input through the input layer 101.
  • the convolutional layer has weights w_ij^k and biases b_i^k; the weights w_ij^k represent the convolution kernels, and the biases b_i^k are scalars superimposed on the output of the convolutional layer, where k is the label of the input layer 101, and i and j are the labels of the units of the input layer 101 and the hidden layer 102, respectively.
  • the first convolutional layer 201 includes a first set of convolution kernels (w_ij^1 in Figure 1) and a first set of biases (b_i^1 in Figure 1).
  • the second convolutional layer 202 includes a second set of convolution kernels (w_ij^2 in Figure 1) and a second set of biases (b_i^2 in Figure 1).
  • each convolutional layer includes tens or hundreds of convolution kernels. If the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.
  • the convolutional neural network further includes a first activation layer 203 and a second activation layer 204.
  • the first activation layer 203 is located behind the first convolutional layer 201
  • the second activation layer 204 is located behind the second convolutional layer 202.
  • the activation layer (for example, the first activation layer 203 and the second activation layer 204) includes activation functions, which are used to introduce nonlinear factors into the convolutional neural network, so that the convolutional neural network can better solve more complex problems .
  • the activation function may include a rectified linear unit (ReLU) function, a sigmoid function (Sigmoid function), or a hyperbolic tangent function (tanh function).
  • the ReLU function is an unsaturated nonlinear function
  • the Sigmoid function and tanh function are saturated nonlinear functions.
  • the activation layer can be used as a separate layer of the convolutional neural network, or the activation layer can be included in a convolutional layer (for example, the first convolutional layer 201 can include the first activation layer 203, and the second convolutional layer 202 can include the second activation layer 204).
  • For example, in the first convolutional layer 201, several convolution kernels w_ij^1 in the first set of convolution kernels and several biases b_i^1 in the first set of biases are first applied to each input to obtain the output of the first convolutional layer 201; then, the output of the first convolutional layer 201 can be processed by the first activation layer 203 to obtain the output of the first activation layer 203.
  • In the second convolutional layer 202, several convolution kernels w_ij^2 in the second set of convolution kernels and several biases b_i^2 in the second set of biases are applied to the output of the first activation layer 203 to obtain the output of the second convolutional layer 202; then, the output of the second convolutional layer 202 can be processed by the second activation layer 204 to obtain the output of the second activation layer 204.
  • The output of the first convolutional layer 201 may be the result of applying the convolution kernels w_ij^1 to its input and adding the biases b_i^1, and the output of the second convolutional layer 202 may be the result of applying the convolution kernels w_ij^2 to the output of the first activation layer 203 and adding the biases b_i^2.
  • Before using the convolutional neural network for image processing, the convolutional neural network needs to be trained. After training, the convolution kernels and biases of the convolutional neural network remain unchanged during image processing. In the training process, each convolution kernel and bias is adjusted through multiple sets of input/output example images and optimization algorithms to obtain an optimized convolutional neural network model.
  • FIG. 2A shows a schematic diagram of the structure of a convolutional neural network
  • FIG. 2B shows a schematic diagram of the working process of a convolutional neural network.
  • the main components of a convolutional neural network can include multiple convolutional layers, multiple downsampling layers, and fully connected layers.
  • each of these layers refers to the corresponding processing operation, that is, convolution processing, downsampling processing, and fully connected processing.
  • the described neural networks also refer to corresponding processing operations.
  • The instance normalization layer or layer normalization layer to be described below is similar in this respect, and the description is not repeated here.
  • a complete convolutional neural network can be composed of these three layers.
  • FIG. 2A only shows three levels of a convolutional neural network, namely the first level, the second level, and the third level.
  • each level may include a convolution module and a downsampling layer.
  • each convolution module may include a convolution layer. Therefore, the processing procedure of each level may include: convolution processing and sub-sampling/down-sampling processing on the input image.
  • each convolution module may further include an instance normalization layer, so that the processing process at each level may also include standardization processing.
  • the instance normalization layer is used to perform instance normalization processing on the feature images output by the convolutional layer, so that the gray values of the pixels of each feature image change within a predetermined range, thereby simplifying the image generation process and improving the effect of image enhancement.
  • For example, the predetermined range may be [-1, 1].
  • The instance normalization layer performs instance normalization processing on each feature image according to that feature image's own mean and variance.
  • The instance normalization layer can also be used to perform instance normalization processing on a single image.
  • The instance normalization formula of the instance normalization layer involves the following quantities:
  • x_tijk is the value at the j-th row and k-th column of the i-th feature image in the t-th feature block (patch) of the feature image set output by the convolutional layer.
  • y_tijk represents the result obtained after x_tijk is processed by the instance normalization layer.
  • ε1 is a small positive number used to avoid a zero denominator.
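A minimal sketch of the instance normalization described above, assuming the standard per-feature-image formula y = (x - mean) / sqrt(var + eps); the patent's exact formula is not reproduced in this text:

```python
import torch

def instance_normalize(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Hypothetical sketch of the instance normalization layer.
    x: feature image set output by a convolutional layer, shaped (T, I, J, K)
    = (batch, channel, row, column). Each feature image is normalized by its
    own spatial mean and variance; eps plays the role of the small constant
    that keeps the denominator nonzero."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)
```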
  • the convolutional layer is the core layer of the convolutional neural network.
  • a neuron is only connected to some of the neurons in the adjacent layer.
  • the convolutional layer can apply several convolution kernels (also called filters) to the input image to extract multiple types of features of the input image.
  • Each convolution kernel can extract one type of feature.
  • the convolution kernel is generally initialized in the form of a random decimal matrix. During the training process of the convolutional neural network, the convolution kernel will learn to obtain reasonable weights.
  • the result obtained after applying a convolution kernel to the input image is called a feature map, and the number of feature images is equal to the number of convolution kernels.
  • the feature image output by the convolutional layer of one level can be input to the convolutional layer of the next level and processed again to obtain a new feature image.
  • the first-level convolutional layer may output a first feature image, which is input to the second-level convolutional layer and processed again to obtain a second feature image.
  • the convolutional layer can use different convolution kernels to convolve the data of a certain local receptive field of the input image; the convolution result is input to the activation layer, and the activation layer performs calculation according to the corresponding activation function to obtain the feature information of the input image.
  • the down-sampling layer is arranged between adjacent convolutional layers and is one way of performing down-sampling (sub-sampling).
  • the down-sampling layer can be used to reduce the scale of the input image, simplify the calculation complexity, and reduce over-fitting to a certain extent; on the other hand, the down-sampling layer can also perform feature compression to extract the input image Main features.
  • the down-sampling layer can reduce the size of feature images, but does not change the number of feature images.
  • a 2×2 output image can be obtained, which means that 36 pixels of the input image are merged into 1 pixel of the output image.
  • the last downsampling layer or convolutional layer can be connected to one or more fully connected layers, which are used to connect all the extracted features.
  • the output of the fully connected layer is a one-dimensional matrix, which is a vector.
  • Figure 3 shows a schematic diagram of another convolutional neural network.
  • the output of the last convolutional layer (i.e., the t-th convolutional layer) can be input to a flattening layer; the flattening layer can convert feature images (2D images) into vectors (1D).
  • the flattening operation rearranges the elements of a 2D feature matrix into a 1D vector, where:
  • v is a vector containing k elements
  • f is a matrix with i rows and j columns.
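A minimal sketch of such a flattening operation; row-major ordering is an assumption, as the exact ordering is not specified above:

```python
import numpy as np

def flatten(f: np.ndarray) -> np.ndarray:
    """Flattening layer: convert a 2D feature image f with i rows and j columns
    into a 1D vector v with k = i * j elements, i.e. v[k] = f[k // j, k % j]
    under the assumed row-major ordering."""
    i, j = f.shape
    return f.reshape(i * j)
```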
  • the output of the flattening layer (i.e., the 1D vector) can be input to a fully connected layer (FCN).
  • the fully connected layer can have the same structure as the convolutional neural network, but the difference is that the fully connected layer uses different scalar values to replace the convolution kernel.
  • the output of the last convolutional layer can also be input to an averaging layer (AVG).
  • the averaging layer is used to average the output, that is, the mean value of a feature image is used to represent the output image; therefore, a 2D feature image is converted into a scalar.
  • If the convolutional neural network includes an averaging layer, it may not include a flattening layer.
  • The averaging layer or the fully connected layer can be connected to a classifier; the classifier can perform classification according to the extracted features, and the output of the classifier can be used as the final output of the convolutional neural network, that is, a category identifier (label) that characterizes the image category.
  • the classifier may be a support vector machine (SVM) classifier, a softmax classifier, a K-nearest-neighbor (KNN) classifier, or the like.
  • the convolutional neural network includes a softmax classifier.
  • the softmax classifier is a generalization of the logistic function that can compress a K-dimensional vector z containing arbitrary real numbers into a K-dimensional vector σ(z).
  • the formula of the softmax classifier is as follows: σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k), for j = 1, ..., K
  • z_j represents the j-th element in the K-dimensional vector z
  • σ(z) represents the predicted probability of each category identifier (label)
  • each element of σ(z) is a real number in the range (0, 1)
  • the sum of the elements of the K-dimensional vector σ(z) is 1.
  • each category identifier in the K-dimensional vector z is assigned a certain prediction probability, and the category identifier with the largest prediction probability is selected as the identifier or category of the input image.
  • At least one embodiment of the present disclosure provides an image processing method.
  • the image processing method includes: obtaining an input image; and using a generation network to process the input image to generate an output image; wherein the generation network includes a first sub-network and at least one second sub-network, and using the generation network to process the input image to generate the output image includes: using the first sub-network to process the input image to obtain a plurality of first feature maps; using the at least one second sub-network to perform branch processing and weight sharing processing on the multiple first feature maps to obtain multiple second feature maps; and processing the multiple second feature maps to obtain the output image.
  • Some embodiments of the present disclosure also provide image processing devices, neural network training methods, and storage media corresponding to the above-mentioned image processing methods.
  • The image processing method provided by at least one embodiment of the present disclosure combines branch processing and weight sharing processing to perform image enhancement processing, which can reduce the number of parameters and facilitate the calculation of gradients during backpropagation; thus, while outputting high-quality images, it can also increase the processing speed and the convergence speed.
  • FIG. 4 is a flowchart of an image processing method provided by at least one embodiment of the present disclosure.
  • the image processing method includes steps S100 to S200.
  • Step S100 Obtain an input image.
  • the input image may include a photo captured by the camera of a smart phone, the camera of a tablet computer, the camera of a personal computer, the lens of a digital camera, a surveillance camera, or a web camera, and may include a person image, an animal or plant image, a landscape image, or the like; the embodiments of the present disclosure do not limit this.
  • the input image is a low-quality image, and the quality of the input image is lower than, for example, the quality of a photo taken with a digital single-lens reflex camera.
  • the input image may be an RGB image including 3 channels, and the embodiments of the present disclosure include but are not limited to this.
  • Step S200 Use the generation network to process the input image to generate an output image.
  • the generation network may perform image enhancement processing on the input image, so that the output image is a high-quality image, for example, the quality of the output image is close to, for example, a photo taken by a digital single-lens reflex camera.
  • FIG. 5 is an exemplary flowchart corresponding to step S200 shown in FIG. 4, and FIG. 6A is a schematic diagram of a generation network corresponding to the image processing method shown in FIG. 4 provided by at least one embodiment of the present disclosure
  • Figure 6B is a schematic structural block diagram of another generation network corresponding to the image processing method shown in Figure 4 provided by at least one embodiment of the present disclosure.
  • step S200 shown in FIG. 5 will be described in detail with reference to the generation network shown in FIG. 6A.
  • step S200 includes step S210, step S220, and step S225.
  • Step S210 Use the first sub-network to process the input image to obtain multiple first feature maps.
  • the generating network may include the first sub-network N1.
  • the first sub-network N1 may include a convolution module CN, that is, the processing of the first sub-network N1 includes standard convolution processing, so that step S210 may include: using the first sub-network N1 to perform standard convolution processing on the input image IN to obtain multiple first feature maps F1.
  • For example, the convolution module CN may include a convolutional layer and an instance normalization layer; therefore, the standard convolution processing may include convolution processing and instance normalization processing. Similar cases below will not be repeated.
  • Step S220 Use at least one second sub-network to perform branch processing and weight sharing processing on multiple first feature maps to obtain multiple second feature maps.
  • the generating network may include at least one second sub-network N2.
  • For example, the generating network may include two second sub-networks N2, namely a first second sub-network N2 and a second second sub-network N2 (it should be noted that, in FIG. 6A, the second sub-network N2 close to the first sub-network N1 is the first second sub-network); the first second sub-network N2 is connected to the first sub-network N1, and the first second sub-network N2 is connected to the second second sub-network N2, so the two second sub-networks N2 can be used to process the multiple first feature maps F1. For example, as shown in FIG. 6A, the multiple first feature maps F1 can be used as the input of the first second sub-network N2, the output of the first second sub-network N2 can be used as the input of the second second sub-network N2, and the output of the second second sub-network N2 is the plurality of second feature maps F2.
  • It should be noted that the connection of two sub-networks may mean that, in the direction of signal transmission, the output of the first of the two sub-networks is used as the input of the other of the two sub-networks. For example, "the first second sub-network N2 is connected to the first sub-network N1" may indicate that the output of the first sub-network N1 is used as the input of the first second sub-network N2.
  • each second sub-network N2 may include a first branch network N21, a second branch network N22, and a third branch network N23, so that the branch processing of each second sub-network N2 may include: dividing the input of each second sub-network (as shown by dc in each second sub-network N2 in FIG. 6A) into a first branch input B1, a second branch input B2, and a third branch input B3; and using the first branch network N21 to process the first branch input B1 to obtain a first branch output O1, using the second branch network N22 to process the second branch input B2 to obtain a second branch output O2, and using the third branch network N23 to process the third branch input B3 to obtain a third branch output O3.
  • For example, the numbers of feature maps included in the inputs of the mutually corresponding branch networks may be the same, that is, the number of feature maps included in the first branch input B1, the number of feature maps included in the second branch input B2, and the number of feature maps included in the third branch input B3 are the same. It should be noted that the embodiments of the present disclosure do not limit this. For example, in other embodiments of the present disclosure, the numbers of feature maps included in the inputs of the mutually corresponding branch networks may be different or not exactly the same; for example, the number of feature maps included in the second branch input B2 is the same as the number of feature maps included in the third branch input B3, but different from the number of feature maps included in the first branch input B1.
  • branch networks that correspond to each other refer to branch networks at the same level.
  • the first branch network N21, the second branch network N22, and the third branch network N23 are a set of branch networks corresponding to each other.
  • the fourth branch network N31, fifth branch network N32, and sixth branch network N33 to be introduced below are a set of mutually corresponding branch networks, and the seventh branch network N11, eighth branch network N12, and ninth branch network N13 to be introduced below are also a set of mutually corresponding branch networks.
  • The requirements on the numbers of feature maps included in the inputs of the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33, and the requirements on the numbers of feature maps included in the inputs of the seventh branch network N11, the eighth branch network N12, and the ninth branch network N13, can refer to the requirements on the numbers of feature maps included in the inputs of the first branch network N21, the second branch network N22, and the third branch network N23, and will not be repeated hereafter.
  • For example, the input of each second sub-network N2 can include 3n feature maps, where n is a positive integer, so that the 1st to n-th feature maps can be divided into the first branch input B1, the (n+1)-th to 2n-th feature maps can be divided into the second branch input B2, and the (2n+1)-th to 3n-th feature maps can be divided into the third branch input B3.
  • the embodiment of the present disclosure does not limit the specific division manner.
  • the first branch network N21, the second branch network N22, and the third branch network N23 may each include a convolution module CN, so that the first branch input B1, the second branch input B2, and the The three-branch input B3 is subjected to standard convolution processing to obtain the corresponding first branch output O1, second branch output O2, and third branch output O3.
  • For example, the numbers of standard convolution processes of the first branch network N21, the second branch network N22, and the third branch network N23 may be the same; of course, the parameters of the standard convolution processing of the first branch network N21, the second branch network N22, and the third branch network N23 may be different from each other. It should be noted that the embodiments of the present disclosure do not limit this.
  • each second sub-network N2 may also include a first backbone network N20, so that the weight sharing processing of each second sub-network N2 may include: connecting the first branch output O1, the second branch output O2, and the third branch output O3 to obtain a first intermediate output M1 (as shown by c in each second sub-network in FIG. 6A); and processing the first intermediate output M1 using the first backbone network N20 to obtain the output of each second sub-network.
  • For example, suppose the feature maps included in the branch outputs all have H rows and W columns, the number of feature maps included in the first branch output O1 is C1, the number of feature maps included in the second branch output O2 is C2, and the number of feature maps included in the third branch output O3 is C3; then the models of the first branch output O1, the second branch output O2, and the third branch output O3 are (C1, H, W), (C2, H, W), and (C3, H, W), respectively. Therefore, after the first branch output O1, the second branch output O2, and the third branch output O3 are connected, the model of the first intermediate output M1 obtained is (C1+C2+C3, H, W), that is, the number of feature maps included in the first intermediate output M1 is C1+C2+C3; the present disclosure does not limit the arrangement order of the feature maps in the model of the first intermediate output M1. It should be noted that the present disclosure includes but is not limited to this.
  • the first backbone network N20 may include a convolution module CN and a down-sampling layer DS, so that standard convolution processing and down-sampling processing can be performed on the first intermediate output M1.
  • the embodiment of the present disclosure does not limit the sequence of the convolution module CN and the down-sampling layer DS in the first backbone network N20 (that is, the sequence of the standard convolution processing and the down-sampling processing).
  • the down-sampling process is used to reduce the size of the feature map, thereby reducing the data amount of the feature map.
  • down-sampling can be performed through the down-sampling layer, but is not limited to this.
  • For example, the down-sampling layer can use max pooling, average pooling, strided convolution, decimation (for example, selecting fixed pixels), demultiplexing output (demuxout, splitting an input image into multiple smaller images), or other down-sampling methods to implement the down-sampling processing.
  • the down-sampling processing methods and parameters in the first backbone network N20 of different second sub-networks N2 may be the same or different.
  • the embodiment of the present disclosure does not limit this.
  • the number of second sub-networks N2 in FIG. 6A is exemplary.
  • the embodiment of the present disclosure does not specifically limit the number of second sub-networks N2.
  • For example, the number of second sub-networks N2 can also be 1, 3, and so on.
  • For example, the at least one second sub-network may include a first second sub-network, the first second sub-network is connected to the first sub-network N1, and the plurality of first feature maps F1 serve as the input of the first second sub-network; the at least one second sub-network may further include other second sub-networks besides the first second sub-network, each of the other second sub-networks is connected to the second sub-network preceding it, the output of the preceding second sub-network serves as the input of that second sub-network, and the output of the last second sub-network is the plurality of second feature maps F2.
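A minimal sketch of one second sub-network as described above (channel-wise split into three branch inputs, a standard convolution per branch, concatenation, then the first backbone network with a standard convolution and down-sampling); the module names, channel counts, kernel sizes, and pooling choice are illustrative assumptions, not the patent's exact configuration (Python/PyTorch):

```python
import torch
import torch.nn as nn

class ConvModule(nn.Sequential):
    """"Standard convolution processing": a convolution followed by instance normalization."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                         nn.InstanceNorm2d(out_ch))

class SecondSubNetwork(nn.Module):
    """Hypothetical second sub-network N2: split the input feature maps into three
    branch inputs (B1, B2, B3), apply a standard convolution per branch, concatenate
    the branch outputs (weight sharing processing), then apply the first backbone
    network (standard convolution plus 2x down-sampling)."""
    def __init__(self, channels: int = 48):
        super().__init__()
        assert channels % 3 == 0
        c = channels // 3
        self.branch1 = ConvModule(c, c)   # first branch network N21
        self.branch2 = ConvModule(c, c)   # second branch network N22
        self.branch3 = ConvModule(c, c)   # third branch network N23
        self.backbone = nn.Sequential(    # first backbone network N20
            ConvModule(channels, channels),
            nn.AvgPool2d(kernel_size=2),  # down-sampling processing
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b1, b2, b3 = torch.chunk(x, 3, dim=1)  # branch processing: split the input
        m1 = torch.cat([self.branch1(b1), self.branch2(b2), self.branch3(b3)], dim=1)
        return self.backbone(m1)               # first intermediate output -> output
```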
  • Step S225 Process multiple second feature maps to obtain an output image.
  • step S225 may include step S230 to step S250.
  • Step S230 Process multiple second feature maps to obtain multiple third feature maps.
  • the generation network may also include a dense sub-network DenseNet.
  • the dense sub-network DenseNet may be used to process multiple second feature maps F2 to obtain multiple third feature maps F3.
  • FIG. 7 is a schematic structural diagram of a dense sub-network provided by at least one embodiment of the present disclosure.
  • As shown in FIG. 7, the multiple second feature maps F2 may be used as the input of the first dense module among the N dense modules (for example, the multiple second feature maps F2 are the output of the last second sub-network N2 among the at least one second sub-network N2); the multiple second feature maps F2 are also connected with the outputs of the i-1 dense modules preceding the i-th dense module among the N dense modules DenseBlock, and the connected result serves as the input of the i-th dense module; the multiple second feature maps F2 are also connected with the outputs of all the dense modules, and the connected result serves as the multiple third feature maps F3; where i is an integer, i ≥ 2 and i ≤ N. It should be noted that the present disclosure includes but is not limited to this.
  • each dense module DenseBlock may include a bottleneck (Bottleneck) layer B and a convolutional layer Cv, so that the processing of each dense module DenseBlock includes dimensionality reduction processing and convolution processing.
  • the bottleneck layer B can use a 1 ⁇ 1 convolution kernel to reduce the dimensionality of the data, reduce the number of feature maps, thereby reducing the number of parameters in the subsequent convolution processing, reducing the amount of calculation, and increasing the processing speed.
  • the dense sub-network DenseNet has the advantages of greatly reducing parameters, reducing the amount of calculation, being able to effectively solve the problem of gradient disappearance, supporting feature reuse and enhancing feature propagation, and having very good anti-overfitting performance.
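A minimal sketch of a dense sub-network of the kind described above, with each dense module consisting of a 1×1 bottleneck layer followed by a convolutional layer; all hyperparameters are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class DenseModule(nn.Module):
    """Hypothetical dense module (DenseBlock): a 1x1 bottleneck layer B for
    dimensionality reduction followed by a convolutional layer Cv."""
    def __init__(self, in_ch: int, bottleneck_ch: int, growth: int):
        super().__init__()
        self.bottleneck = nn.Conv2d(in_ch, bottleneck_ch, kernel_size=1)  # dimensionality reduction
        self.conv = nn.Conv2d(bottleneck_ch, growth, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.bottleneck(x))

class DenseSubNetwork(nn.Module):
    """The i-th dense module receives the second feature maps concatenated with the
    outputs of all preceding dense modules; the third feature maps are the second
    feature maps concatenated with every dense module's output."""
    def __init__(self, in_ch: int, n_modules: int = 4, bottleneck_ch: int = 32, growth: int = 16):
        super().__init__()
        self.blocks = nn.ModuleList(
            [DenseModule(in_ch + k * growth, bottleneck_ch, growth) for k in range(n_modules)]
        )

    def forward(self, f2: torch.Tensor) -> torch.Tensor:
        feats = [f2]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)  # multiple third feature maps F3
```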
  • Step S240 Use the third sub-network to process multiple third feature maps to obtain multiple fourth feature maps.
  • the generating network may also include a third sub-network N3.
  • the third sub-network N3 may include a second backbone network N30, a fourth branch network N31, a fifth branch network N32, and a sixth branch network N33, so that the processing of the third sub-network N3 may include: using the second backbone network N30 to process the multiple third feature maps F3 to obtain multiple fifth feature maps F5; dividing the multiple fifth feature maps F5 into a fourth branch input B4, a fifth branch input B5, and a sixth branch input B6; and using the fourth branch network N31 to process the fourth branch input B4 to obtain the fourth feature map F4 corresponding to the fourth branch network N31, using the fifth branch network N32 to process the fifth branch input B5 to obtain the fourth feature map F4 corresponding to the fifth branch network N32, and using the sixth branch network N33 to process the sixth branch input B6 to obtain the fourth feature map F4 corresponding to the sixth branch network N33.
  • the multiple fourth feature maps F4 obtained in step S240 include a fourth feature map corresponding to the fourth branch network N31, a fourth feature map corresponding to the fifth branch network N32, and a fourth feature map corresponding to the sixth branch network N33.
  • the second backbone network N30 may include an up-sampling layer US, so that the input multiple third feature maps F3 can be up-sampled to obtain multiple fifth feature maps F5.
  • the up-sampling process is used to increase the size of the feature map, thereby increasing the data volume of the feature map.
  • the up-sampling process can be performed through an up-sampling layer, but is not limited to this.
  • the up-sampling layer can adopt up-sampling methods such as strided transposed convolution and interpolation algorithms to implement up-sampling processing.
  • the interpolation algorithms may include, for example, bilinear interpolation and bicubic interpolation.
  • the interpolation algorithm can be used not only for up-sampling processing, but also for down-sampling processing.
  • For example, when the interpolation algorithm is used for up-sampling processing, the original pixel values and the interpolated values can be retained, thereby increasing the size of the feature map; when the interpolation algorithm is used for down-sampling processing, only the interpolated values can be retained (removing the original pixel values), thereby reducing the size of the feature map.
  • FIG. 8A is a schematic diagram of an upsampling layer provided by at least one embodiment of the present disclosure
  • FIG. 8B is a schematic diagram of another upsampling layer provided by at least one embodiment of the present disclosure.
  • As shown in FIG. 8A, the up-sampling layer uses pixel interpolation to implement up-sampling; in this case, the up-sampling layer can also be called a composite layer.
  • The composite layer adopts a 2×2 up-sampling factor, so that 4 input feature images (i.e., INPUT 4n, INPUT 4n+1, INPUT 4n+2, INPUT 4n+3 in FIG. 8A) can be combined to obtain 1 output feature image with pixels in a fixed order (i.e., OUTPUT n in FIG. 8A).
  • The up-sampling layer obtains a first number of input feature images and interleaves and rearranges their pixel values to produce the same first number of output feature images; compared with the input feature images, the number of output feature images does not change, but the size of each output feature image is increased by a corresponding multiple. Therefore, the composite layer adds more data information through different permutations and combinations, and these combinations can give all possible up-sampling combinations; finally, a selection can be made from the up-sampling combinations through the activation layer.
  • the up-sampling layer adopts the pixel value interleaving rearrangement method to achieve up-sampling.
  • the up-sampling layer may also be called a composite layer.
  • the composite layer also uses a 2×2 up-sampling factor, that is, every 4 input feature images (i.e., INPUT 4n, INPUT 4n+1, INPUT 4n+2, INPUT 4n+3 in FIG. 8B) are taken as a group, and their pixel values are interleaved to generate 4 output feature images (i.e., OUTPUT 4n, OUTPUT 4n+1, OUTPUT 4n+2, OUTPUT 4n+3 in FIG. 8B).
  • the number of input feature images is the same as the number of output feature images obtained after the composite layer processing, and the size of each output feature image is increased to 4 times that of the input feature image, that is, each output feature image has 4 times the number of pixels of the input feature image.
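A minimal sketch of the FIG. 8A-style composite layer using the standard pixel-shuffle operation with a 2×2 up-sampling factor; the exact pixel ordering used by the patent may differ:

```python
import torch
import torch.nn as nn

# Combine every 4 input feature images into 1 output feature image whose pixels are
# interleaved in a fixed order; this is the standard pixel-shuffle operation.
upsample = nn.PixelShuffle(upscale_factor=2)

x = torch.randn(1, 4, 8, 8)   # 4 input feature images (INPUT 4n .. INPUT 4n+3), each 8x8
y = upsample(x)               # 1 output feature image (OUTPUT n), 16x16
print(y.shape)                # torch.Size([1, 1, 16, 16])
```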
  • the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33 may all include the convolution module CN, so that standard convolution processing can be performed on the fourth branch input B4, the fifth branch input B5, and the sixth branch input B6, respectively.
  • the number of standard convolution processes of the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33 may be the same; of course, the parameters of the standard convolution processing of the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33 may also be different from each other. It should be noted that the embodiments of the present disclosure do not limit this.
  • the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33 may all include the up-sampling layer US, so that the processing of each of the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33 may also include up-sampling processing.
  • the number of up-sampling processes of the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33 may be the same; of course, the parameters of the up-sampling processing of the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33 may also be different from each other. It should be noted that the embodiments of the present disclosure do not limit this.
  • the method of upsampling in the second backbone network N30 and the method of upsampling in the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33 can be the same or different.
  • the disclosed embodiment does not limit this.
  • the number of fourth feature maps F4 corresponding to the fourth branch network N31 is 1, the number of fourth feature maps F4 corresponding to the fifth branch network N32 is 1, and the number of fourth feature maps F4 corresponding to the sixth branch network N33 is 1; that is, the multiple fourth feature maps F4 include 3 feature maps.
  • Step S250 Perform synthesis processing on multiple fourth feature maps to obtain an output image.
  • the generation network may also include a synthesis module Merg.
  • the synthesis module Merg may be used to process multiple fourth feature maps F4 to obtain the output image OUT.
  • the synthesis module Merg may include a first conversion matrix, and the first conversion matrix is used to convert the plurality of fourth feature maps F4 into the output image OUT.
  • using the synthesis module Merg to process multiple fourth feature maps F4 to obtain the output image OUT may include: using the first conversion matrix to convert the data information of the fourth feature map F4 corresponding to the fourth branch network N31, the data information of the fourth feature map F4 corresponding to the fifth branch network N32, and the data information of the fourth feature map F4 corresponding to the sixth branch network N33 into the data information of the first color channel, the data information of the second color channel, and the data information of the third color channel of the output image OUT, so as to obtain the output image OUT.
  • the first color channel, the second color channel, and the third color channel may be red (R), green (G), and blue (B) channels, respectively, so that the output image OUT is an image in RGB format .
  • the embodiments of the present disclosure include but are not limited to this.
  • the first conversion matrix may be used to convert an image in YUV format into an image in RGB format.
  • the conversion formula of the first conversion matrix may be expressed as follows:
  • Y, U, and V respectively represent the luminance information (that is, the data information of the first luminance channel), the first chrominance information (that is, the data information of the first color difference channel), and the second chrominance information (that is, the data information of the second color difference channel) of the YUV format image; R, G, and B respectively represent the red information (i.e., the data information of the first color channel), the green information (i.e., the data information of the second color channel), and the blue information (i.e., the data information of the third color channel) of the converted RGB format image.
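  • The concrete coefficients of the first conversion matrix are not reproduced above; purely as an illustration, one commonly used (BT.601-style) YUV-to-RGB conversion consistent with the channel definitions above is:

$$\begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1.13983 \\ 1 & -0.39465 & -0.58060 \\ 1 & 2.03211 & 0 \end{bmatrix} \begin{bmatrix} Y \\ U \\ V \end{bmatrix}$$

  • the exact matrix used in the embodiment may differ; the point is only that the mapping from (Y, U, V) to (R, G, B) is a fixed linear transformation.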
  • the generation network shown in FIG. 6A may be trained first. For example, during the training process, the parameters of the first conversion matrix are fixed.
  • the data information of the fourth feature map F4 output by the fourth branch network N31, the data information of the fourth feature map F4 output by the fifth branch network N32, and the data information of the fourth feature map F4 output by the sixth branch network N33 respectively correspond to the data information of the first luminance channel, the data information of the first color difference channel, and the data information of the second color difference channel, so that the output image OUT in RGB format can be obtained after conversion by the first conversion matrix.
  • the output image OUT retains the content of the input image IN, but the output image OUT is a high-quality image.
  • the quality of the output image OUT may be close to that of photos taken by, for example, a digital single-lens reflex camera.
  • the number of the fourth feature map F4 output by the fourth branch network N31, the fourth feature map F4 output by the fifth branch network N32, and the fourth feature map F4 output by the sixth branch network N33 are all 1. That is, the multiple feature maps F4 include three feature maps (respectively corresponding to the first luminance channel, the first color difference channel, and the second color difference channel), and the first conversion matrix can convert the three feature maps into RGB format output images.
  • the processing of the synthesis module Merg may further include: converting the gray value of the pixel of the output image OUT to the range of [0, 255], for example.
  • For the YUV format, Y represents luminance, and U and V represent chrominance, i.e., the two components of color; in the YUV color space, the first luminance channel (i.e., the Y channel), the first color difference channel (i.e., the U channel), and the second color difference channel (i.e., the V channel) are separated from one another.
  • the YUV format may include YUV444, YUV420, and YUV422 formats.
  • the main difference between YUV444, YUV420, and YUV422 formats is the sampling and storage methods of the U channel and V channel data.
  • the YUV444 format means that in each row of pixels, two kinds of chromaticity information (ie, the first chromaticity information U and the second chromaticity information V) are complete, that is, the two kinds of chromaticity information are stored based on complete sampling.
  • the data stream for storing or processing the 4 pixels is:
  • the mapped pixels are expressed as:
  • the mapped pixels are the original pixels.
  • the YUV420 format means that in each row of pixels there is only one type of chrominance information (the first chrominance information U or the second chrominance information V), and the first chrominance information U or the second chrominance information V is sampled and stored at 1/2 frequency; in the image processing process, adjacent rows process different chrominance information.
  • the data stream for storing or processing the 8 pixels is:
  • the mapped pixels are expressed as:
  • the adjacent 4 pixels in each row only occupy 6 bytes when being stored or processed; compared with the YUV444 sampling format (in which 4 pixels require 12 bytes), the YUV420 format thus reduces the amount of pixel data that is processed and stored.
  • although the mapped pixels are slightly different from the original pixels, these differences do not cause significant changes in the perception of the human eye.
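  • A minimal, illustrative sketch (not part of the embodiment) of why YUV420 stores less chrominance data than YUV444: each U or V plane is subsampled so that one chroma sample covers a 2×2 block of pixels, which matches the 12-byte versus 6-byte comparison above (assuming 1 byte per sample).

```python
import numpy as np

def subsample_chroma_420(plane: np.ndarray) -> np.ndarray:
    # Keep one chroma sample per 2x2 pixel block (a simple YUV420-style
    # subsampling); the U or V plane shrinks to 1/4 of its samples.
    return plane[0::2, 0::2]

def bytes_per_4_pixels(fmt: str) -> int:
    # Approximate storage for 4 adjacent pixels at 1 byte per sample.
    return {"YUV444": 4 * 3, "YUV420": 4 + 2}[fmt]

u_plane = np.arange(16).reshape(4, 4)
print(subsample_chroma_420(u_plane).shape)                         # (2, 2)
print(bytes_per_4_pixels("YUV444"), bytes_per_4_pixels("YUV420"))  # 12 6
```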
  • multiple feature maps F4 may have an image format of YUV444. It should be noted that the embodiments of the present disclosure include but are not limited to this.
  • the generation network shown in FIG. 6B is different from the generation network shown in FIG. 6A mainly in the first sub-network N1 and the third sub-network N3. It should be noted that other structures of the generating network shown in FIG. 6B are basically the same as those of the generating network shown in FIG. 6A, and the repetitions are not repeated here.
  • the input image has a first color channel, a second color channel, and a third color channel.
  • the first color channel, the second color channel, and the third color channel may be red (R), green (G), and blue (B) channels, respectively, and the embodiments of the present disclosure include but are not limited thereto.
  • the first sub-network N1 may include a conversion module Tran, a seventh branch network N11, an eighth branch network N12, a ninth branch network N13, and a third backbone network N10, Therefore, step S210 may include the following steps S211 to S214.
  • Step S211 Use the conversion module Tran to convert the data information of the first color channel, the second color channel and the third color channel of the input image IN into the first luminance channel, the first color difference channel and the second color difference channel of the intermediate input image MIN Data information.
  • the conversion module Tran may include a second conversion matrix for converting the input image IN into an intermediate input image MIN.
  • the second conversion matrix may be used to convert an image in RGB format into an image in YUV format.
  • the conversion formula of the second conversion matrix may be expressed as follows:
  • R, G, and B respectively represent the red information (that is, the data information of the first color channel), the green information (that is, the data information of the second color channel), and the blue information (that is, the data information of the third color channel) of the RGB format image; Y, U, and V respectively represent the luminance information (that is, the data information of the first luminance channel), the first chrominance information (that is, the data information of the first color difference channel), and the second chrominance information (that is, the data information of the second color difference channel) of the converted YUV format image.
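  • Again, the concrete coefficients of the second conversion matrix are not reproduced above; a commonly used (BT.601-style) RGB-to-YUV conversion consistent with these definitions is, for example:

$$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

  • the exact coefficients used in the embodiment may differ from this illustrative instance.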
  • the input image IN has an RGB format
  • the intermediate input image MIN has, for example, a YUV420 format, which reduces the size of the U channel and the V channel and thus reduces the number of convolution kernels required in the generation network. It should be noted that this embodiment includes but is not limited to this.
  • Step S212: Use the seventh branch network to process the data information of the first luminance channel of the intermediate input image to obtain the seventh branch output, use the eighth branch network to process the data information of the first color difference channel of the intermediate input image to obtain the eighth branch output, and use the ninth branch network to process the data information of the second color difference channel of the intermediate input image to obtain the ninth branch output.
  • the data information of the first luminance channel, the first color difference channel, and the second color difference channel of the intermediate input image MIN are taken as the seventh branch input B7, the eighth branch input B8, and the ninth branch input B9, and are processed by the seventh branch network N11, the eighth branch network N12, and the ninth branch network N13, respectively, to obtain the seventh branch output O7, the eighth branch output O8, and the ninth branch output O9.
  • the seventh branch network N11 may include a convolution module CN and a down-sampling layer DS, so that standard convolution processing and down-sampling processing can be performed on the seventh branch input B7; the eighth branch network N12 and the ninth branch network N13 may each include a standard down-sampling layer SDS, so that standard down-sampling processing can be performed on the eighth branch input B8 and the ninth branch input B9, respectively.
  • the standard down-sampling layer can use interpolation algorithms such as interpolation, bilinear interpolation, and bicubic interpolation (Bicubic Interpolation) to perform standard down-sampling processing.
  • when the interpolation algorithm is used for standard down-sampling processing, only the interpolated values are retained (the original pixel values are removed), thereby reducing the size of the feature map.
  • the method of standard downsampling processing in the eighth branch network N12 and the ninth branch network N13 may be the same, but the parameters may be different. It should be noted that the embodiments of the present disclosure include but are not limited to this.
  • the eighth branch network N12 is equivalent to omitting the convolution module that handles the highest resolution of the U channel, and the ninth branch network N13 is equivalent to omitting the convolution module that handles the highest resolution of the V channel, which can increase the processing speed.
  • Step S213 Connect the seventh branch output, the eighth branch output and the ninth branch output to obtain a second intermediate output.
  • the seventh branch output O7, the eighth branch output O8, and the ninth branch output O9 can be connected with reference to the connection mode in the aforementioned second sub-network to obtain the second intermediate output M2; the specific details will not be repeated here.
  • Step S214 Use the third backbone network to process the second intermediate output to obtain multiple first feature maps.
  • the third backbone network N10 may be used to process the second intermediate output M2 to obtain multiple first feature maps F1.
  • the third backbone network N10 may include a convolution module CN, so that standard convolution processing can be performed on the input second intermediate output M2 to obtain multiple first feature maps F1.
  • at least one second sub-network N2 can be used to perform step S220, that is, branch processing and weight sharing processing are performed on the multiple first feature maps F1 to obtain multiple second feature maps F2; for specific details, reference may be made to the corresponding description of performing step S220 based on the generation network shown in FIG. 6A, which will not be repeated here.
  • the number of the second sub-network N2 being 1 is exemplary and should not be regarded as a limitation of the present disclosure.
  • the dense sub-network DenseNet can be used to perform step S230, that is, to process multiple second feature maps F2 to obtain multiple third feature maps F3.
  • for specific details, reference may be made to the corresponding description of performing step S230 based on the generation network shown in FIG. 6A, which is not repeated here.
  • the third sub-network N3 can be used to perform step S240, that is, the third sub-network N3 is used to process multiple third feature maps F3 to obtain multiple fourth feature maps F4.
  • the third sub-network N3 may also include the second backbone network N30, the fourth branch network N31, the fifth branch network N32, and the sixth branch network N33, so that step S240 may also include: using the second backbone network N30 to process the multiple third feature maps F3 to obtain multiple fifth feature maps F5; dividing the multiple fifth feature maps F5 into the fourth branch input B4, the fifth branch input B5, and the sixth branch input B6; and using the fourth branch network N31 to process the fourth branch input B4 to obtain the fourth feature map F4 corresponding to the fourth branch network N31, using the fifth branch network N32 to process the fifth branch input B5 to obtain the fourth feature map F4 corresponding to the fifth branch network N32, and using the sixth branch network N33 to process the sixth branch input B6 to obtain the fourth feature map F4 corresponding to the sixth branch network N33.
  • the second backbone network N30 may also include an up-sampling layer US, so that the multiple input third feature maps F3 can be up-sampled to obtain multiple fifth feature maps F5.
  • the fourth branch network N31 may also include a convolution module and an up-sampling layer to perform standard convolution processing and up-sampling processing.
  • both the fifth branch network N32 and the sixth branch network N33 may include a convolution module CN and a standard up-sampling layer SUS, which are used for standard convolution processing and standard up-sampling processing.
  • the standard up-sampling layer can use interpolation algorithms such as interpolation, bilinear interpolation, and bicubic interpolation (Bicubic Interpolation) to perform standard up-sampling processing.
  • when the interpolation algorithm is used for standard up-sampling processing, both the original pixel values and the interpolated values can be retained, thereby increasing the size of the feature map.
  • the fifth branch network N32 is equivalent to omitting the convolution module that handles the highest resolution of the U channel, and the sixth branch network N33 is equivalent to omitting the convolution module that handles the highest resolution of the V channel, which can increase the processing speed; this is similar to the aforementioned seventh branch network N11, eighth branch network N12, and ninth branch network N13.
  • the standard up-sampling layer SUS in the fifth branch network N32 generally appears in correspondence with the standard down-sampling layer SDS in the eighth branch network N12, and the standard up-sampling layer SUS in the sixth branch network N33 generally appears in correspondence with the standard down-sampling layer SDS in the ninth branch network N13.
  • the embodiments of the present disclosure include but are not limited to this.
  • the synthesis module Merg can be used to perform step S250, that is, the synthesis module Merg is used to process multiple fourth feature maps F4 to obtain the output image OUT.
  • for specific details, reference may be made to the aforementioned corresponding description of performing step S250 based on the generation network shown in FIG. 6A, which is not repeated here.
  • the generation network shown in FIG. 6A and the generation network shown in FIG. 6B are only exemplary and not restrictive. It should also be noted that, before training, the generation network may not have the function of image enhancement processing at all, or it may have the function of image enhancement processing but with a poor enhancement effect; the generation network obtained after training the generation network to be trained has the function of image enhancement processing and can generate high-quality images.
  • Fig. 9A is a schematic diagram of an input image
  • Fig. 9B is a schematic diagram of an output image obtained by processing the input image shown in Fig. 9A according to the generation network shown in Fig. 6A
  • Fig. 9C is a schematic diagram of an output image obtained by processing the input image shown in Fig. 9A according to the generation network shown in Fig. 6B.
  • the output images shown in FIGS. 9B and 9C retain the content of the input image, while the contrast of the image is improved and the problem of the input image being too dark is alleviated.
  • the quality of the output image can be close to that of photos taken by, for example, a digital single-lens reflex camera, that is, the output image is a high-quality image.
  • the image processing method provided by the embodiment of the present disclosure achieves the effect of image enhancement processing.
  • the image processing method provided by the embodiments of the present disclosure can combine branch processing and weight sharing processing to perform image enhancement processing, which can reduce the number of parameters and facilitate the calculation of gradients during backpropagation; thus, while outputting high-quality images, the processing speed and convergence speed can also be improved.
  • FIG. 10 is a flowchart of a neural network training method provided by at least one embodiment of the present disclosure.
  • the training method includes:
  • Step S300 Training the discriminant network based on the generation network to be trained
  • Step S400 Training the generative network to be trained based on the discrimination network.
  • the above training process is performed alternately to obtain the generation network in the image processing method provided by any of the above embodiments of the present disclosure.
  • the structure of the generation network to be trained may be the same as the generation network shown in FIG. 6A or the generation network shown in FIG. 6B, and the embodiments of the present disclosure include but are not limited to this.
  • the generation network to be trained can execute the image processing method provided by any of the above embodiments of the present disclosure after being trained by the training method, that is, the generation network obtained by using the training method can execute the image provided by any of the above embodiments of the present disclosure Approach.
  • FIG. 11A is a schematic structural block diagram for training a generating network to be trained corresponding to the training method shown in FIG. 10 provided by at least one embodiment of the present disclosure
  • FIG. 11B is a training target provided by at least one embodiment of the present disclosure.
  • step S300 includes step S310 to step S330.
  • Step S310 Use the generation network to be trained to process the first training input image to generate the first training output image.
  • the first training input image may include, for example, photos taken by the camera of a smart phone, the camera of a tablet computer, the camera of a personal computer, the lens of a digital camera, a surveillance camera, or a webcam, and may include images of persons, images of animals and plants, or landscape images, etc., which is not limited in the embodiments of the present disclosure.
  • the first training input image is a low-quality image, for example, the quality of the first training input image is lower than, for example, the quality of a photo taken by a digital single-lens reflex camera.
  • the first training input image may be an image in RGB format, and embodiments of the present disclosure include but are not limited to this.
  • the generation network G to be trained may have the structure of the generation network shown in FIG. 6A or the generation network shown in FIG. 6B.
  • the initial parameter of the generation network G to be trained may be a random number, for example, the random number conforms to a Gaussian distribution. It should be noted that the embodiments of the present disclosure do not limit this.
  • step S310 can refer to the relevant description of step S200, that is, the first training input image corresponds to the input image, the first training output image corresponds to the output image, and the first training output image is generated according to the first training input image.
  • the process can refer to the foregoing process of generating an output image based on the input image, which will not be repeated here.
  • Step S320 Based on the first training output image, calculate the system loss value of the generating network to be trained through the system loss function.
  • the system loss function may include generating a network countermeasure loss function, and accordingly, the system loss value may include generating a network countermeasure loss value.
  • in the training process of the generation network G to be trained, the discrimination network D can be used to process the first training output image, and the generation network confrontation loss value is calculated through the generation network confrontation loss function according to the output of the discrimination network D.
  • FIG. 12 is a schematic structural diagram of a discrimination network provided by at least one embodiment of the present disclosure.
  • the discriminant network D includes multiple convolution modules CM, multiple down-sampling layers DS, and fully connected layers FCN.
  • the structures and functions of the convolution module CM, the down-sampling layer DS, and the fully connected layer FCN in the discrimination network D can be referred to the foregoing descriptions of the convolution module, the down-sampling layer, and the fully connected layer, respectively; the embodiments of the present disclosure place no restriction on this.
  • each convolution module CM may include a convolution layer and an instance normalization layer; for example, at least part of the convolution module CM may also omit the instance normalization layer.
  • the discrimination network D further includes an activation layer, which is connected to the fully connected layer FCN.
  • the activation function of the activation layer may adopt a Sigmoid function, so that the output of the activation layer (that is, the output of the discriminant network D) is a value within the range of [0, 1].
  • the discrimination network D can determine whether the quality of the first training output image is close to that of a high-quality image (for example, a photo taken by a digital single-lens reflex camera); taking the first training output image as the input of the discrimination network D as an example, the discrimination network D processes the first training output image to obtain the output of the discrimination network D.
  • the value output by the discrimination network D indicates how close the quality of the first training output image is to that of photos taken by, for example, a digital single-lens reflex camera.
  • the larger the output value of the discrimination network D (for example, the closer it is to 1), the closer the discrimination network D determines the quality of the first training output image to be to the quality of a photo taken by a digital single-lens reflex camera, that is, the higher the quality of the first training output image.
  • the discrimination network shown in FIG. 12 is schematic.
  • the discriminant network shown in FIG. 12 may include more or fewer convolution modules or downsampling layers.
  • the discrimination network shown in FIG. 12 may also include other modules or layer structures, for example, a flattening module is provided before the fully connected layer.
  • some modules or layer structures in the discriminant network shown in Figure 12 can be replaced with other modules or layer structures, for example, the fully connected layer is replaced with a convolutional layer that performs averaging (AVG) (refer to Figure 3 and the aforementioned related description), for example, the activation layer is replaced with a two-class softmax module.
  • the embodiment of the present disclosure does not limit the structure of the discrimination network, which includes but is not limited to the discrimination network structure shown in FIG. 12.
  • the generated network confrontation loss function can be expressed as:
  • L_G represents the generation network confrontation loss function
  • z1 represents the first training input image
  • P_{z1}(z1) represents the set of first training input images (for example, including a batch of multiple first training input images)
  • G(z1) represents the first training output image
  • D(G(z1)) represents the output of the discrimination network D for the first training output image, that is, the output obtained by the discrimination network D processing the first training output image
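  • The explicit expression of the generation network confrontation (adversarial) loss function is not reproduced above; a typical form consistent with the symbols just defined, given here only as a common instantiation rather than the exact formula of the embodiment, is:

$$L_{G} = -\,\mathbb{E}_{z1 \sim P_{z1}(z1)}\big[\log D(G(z1))\big]$$

  • where the expectation is taken over the set of first training input images.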
  • the training goal of the generative network G to be trained is to minimize the system loss value. Therefore, in the training process of the generative network G to be trained, minimizing the system loss value includes reducing the generation network counter loss value.
  • the label of the first training output image is set to 1, that is, the discrimination network D needs to determine that the quality of the first training output image is consistent with the quality of photos taken by, for example, a digital single-lens reflex camera.
  • during the training process, the parameters of the generation network G to be trained are continuously modified, so that the output of the discrimination network D corresponding to the first training output image generated by the generation network G to be trained after the parameter correction constantly approaches 1, thereby continuously reducing the generation network confrontation loss value.
  • the system loss function may also include a content loss function, and accordingly, the system loss value may include a content loss value.
  • the analysis network A can be used to process the first training output image, and the content loss value can be calculated by the content loss function according to the output of the analysis network A.
  • FIG. 13 is a schematic structural diagram of an analysis network provided by at least one embodiment of the present disclosure.
  • the analysis network A includes a plurality of convolution modules CM connected in sequence and a plurality of down-sampling layers DS interposed between adjacent convolution modules CM.
  • each convolution module CM includes a convolution layer, and each convolution layer includes a plurality of convolution kernels.
  • the convolution kernels can be used to extract the content feature and style feature of the input image of the analysis network A.
  • the input of the analysis network A shown in FIG. 13 may include a first training input image and a first training output image.
  • each convolution module CM may include a convolution layer and an instance normalization layer; for example, at least part of the convolution module CM may also omit the instance normalization layer.
  • the analysis network A may adopt a deep neural network capable of classifying images, as shown in FIG. 13, and the input is processed by several convolution modules CM and down-sampling layer DS to extract features.
  • the output of each convolution module CM is a feature image extracted from its input.
  • the down-sampling layer DS can reduce the size of the feature image and pass it to the next-level convolution module.
  • Multiple convolution modules CM can output multiple feature images, which can represent features of different levels of input (for example, texture, edge, object, etc.).
  • the feature image is input to the flattening layer, and the flattening layer converts the feature image into a vector and then passes it to the fully connected layer and the classifier.
  • the classifier layer can include a softmax classifier.
  • the softmax classifier can output the probability that the input belongs to each category identifier, and the identifier with the highest probability will be the final output of the analysis network A.
  • the analysis network A realizes image classification.
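  • For reference, the softmax classifier mentioned above maps a vector of class scores z to probabilities in the standard way:

$$\mathrm{softmax}(z)_{i} = \frac{e^{z_{i}}}{\sum_{j} e^{z_{j}}}$$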
  • the analysis network A can use a trained convolutional neural network model. Therefore, during the training process of the generation network G to be trained, it is not necessary to modify the parameters of the analysis network A (for example, the convolution kernel, etc.).
  • the analysis network A can use neural network models such as AlexNet, GoogleNet, VGG, Deep Residual Learning, etc. to extract input content features and style features.
  • the VGG network is a type of deep convolutional neural network, which was developed by the Visual Geometry Group of Oxford University and has been widely used in the field of visual recognition.
  • a VGG network may include 19 layers, some of which may be normalized.
  • the embodiments of the present disclosure do not limit the specific details of analyzing the structure of the network, extracting style features and content features (for example, the number and level of first convolution modules used to extract style features and content features, etc.). It should also be noted that, in some examples, in the training process of the generation network G to be trained, only the part of the aforementioned analysis network A used to extract the content features of its input is used.
  • the analysis network A is used to receive the first training input image and the first training output image, and to extract and output the content features of the first training input image and the first training output image, respectively.
  • the content feature represents the distribution of objects in the image in the entire image, for example, the content feature includes the content information of the image.
  • the analysis network A can be used to extract the first content feature map of the first training input image and the second content feature map of the first training output image, and the content loss value of the generation network G to be trained is calculated through the content loss function according to the first content feature map and the second content feature map.
  • the single-layer content loss function can be expressed as:
  • S_1 is a constant; the two quantities compared by the single-layer content loss are, respectively, the value of the j-th position in the first content feature map of the first training input image extracted by the i-th convolution kernel in the m-th convolution module of the analysis network A, and the value of the j-th position in the second content feature map of the first training output image extracted by the i-th convolution kernel in the m-th convolution module of the analysis network A.
  • the content features of the input image can be extracted through at least one convolution module CM; then the content loss function can be expressed as:
  • L_content represents the content loss function
  • C_m represents the single-layer content loss function of the m-th convolution module among the at least one convolution module used to extract content features
  • w_{1m} represents the weight of C_m.
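  • The explicit formulas of C_m and L_content are not reproduced above. Writing the two feature-map values defined for the single-layer loss as F^m_{ij} (from the first training input image) and P^m_{ij} (from the first training output image), symbols chosen here only for illustration, a commonly used form consistent with these definitions is:

$$C_{m} = \frac{1}{2 S_{1}} \sum_{ij}\left(F_{ij}^{m} - P_{ij}^{m}\right)^{2}, \qquad L_{content} = \sum_{m} w_{1m}\, C_{m}$$

  • this is a standard instantiation, not necessarily the exact expression of the embodiment.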
  • minimizing the system loss value includes reducing the content loss value.
  • since the generation network G to be trained is used for image enhancement, the output and the input of the generation network G to be trained need to keep the same content characteristics, that is, the first training output image retains the content of the first training input image.
  • during the training process, the parameters of the generation network G to be trained are continuously revised, so that the content features of the first training output image generated by the generation network G to be trained after the parameter correction keep approaching the content features of the first training input image, thereby continuously reducing the content loss value.
  • the system loss function may also include a color loss function, and accordingly, the system loss value may include a color loss value.
  • a color loss function can be established according to the first training output image and the second training input image to calculate the color loss value.
  • the color loss function can be expressed as:
  • L color represents the color loss function
  • G(z1) represents the first training output image
  • I1 represents the second training input image
  • gaussian() represents the Gaussian fuzzification operation
  • abs() represents the absolute value operation.
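  • Using the operations just defined, one plausible instantiation of the color loss (consistent with the terms above, though not necessarily the exact expression of the embodiment) is the absolute difference of the Gaussian-blurred images:

$$L_{color} = \mathrm{abs}\big(\mathrm{gaussian}(G(z1)) - \mathrm{gaussian}(I1)\big)$$

  • blurring suppresses texture and edges, so that mainly the local color distribution and brightness distribution of the two images are compared.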
  • the second training input image may be a high-quality image, that is, the quality of the second training input image is higher than that of the first training input image.
  • the second training input image may be, for example, a photo image taken by a digital single-lens reflex camera.
  • the second training input image may include an image of a person, an image of an animal or a plant, or a landscape image, which is not limited in the embodiment of the present disclosure.
  • the quality of the first training output image is close to, for example, the quality of photos taken by a digital single-lens reflex camera, which can be at least partially embodied in the following:
  • the local color distribution and brightness distribution are close to the same.
  • minimizing the system loss value includes reducing the color loss value.
  • the generation network G to be trained is used for image enhancement processing, it is necessary to make the color distribution and brightness distribution of each part of the first training output image close to the same as the photos taken by a digital single-lens reflex camera.
  • during the training process, the parameters of the generation network G to be trained are continuously modified, so that the color distribution and brightness distribution of each part of the first training output image generated by the generation network G to be trained after the parameter correction are close to those of photos taken by, for example, a digital single-lens reflex camera, thereby continuously reducing the color loss value.
  • the first training output image has a first color channel, a second color channel, and a third color channel.
  • the system loss function may also include a comparison loss function, and accordingly, the system loss value may include a comparison loss value.
  • a comparison loss function can be established according to the first training output image and the third training input image to calculate the comparison loss value.
  • the contrast loss function can be expressed as:
  • L_{L1} = 0.299*abs(F_{G(z1)} - F_{I2}) + 0.587*abs(S_{G(z1)} - S_{I2}) + 0.299*abs(T_{G(z1)} - T_{I2})
  • L L1 represents the comparison loss function
  • G(z1) represents the first training output image
  • I2 represents the third training input image
  • F_{G(z1)}, S_{G(z1)}, and T_{G(z1)} respectively represent the data information of the first color channel, the second color channel, and the third color channel of the first training output image
  • F_{I2}, S_{I2}, and T_{I2} respectively represent the data information of the first color channel, the second color channel, and the third color channel of the third training input image
  • abs() means absolute value calculation.
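  • A minimal NumPy sketch of the contrast loss above (the function and variable names are illustrative; averaging over pixels is an implementation choice made here, not stated in the text):

```python
import numpy as np

def contrast_loss(output_img: np.ndarray, ref_img: np.ndarray) -> float:
    # output_img: first training output image G(z1); ref_img: third training
    # input image I2; both of shape (H, W, 3) with the three color channels.
    f_g, s_g, t_g = output_img[..., 0], output_img[..., 1], output_img[..., 2]
    f_i, s_i, t_i = ref_img[..., 0], ref_img[..., 1], ref_img[..., 2]
    # channel weights exactly as written in the formula above
    diff = (0.299 * np.abs(f_g - f_i)
            + 0.587 * np.abs(s_g - s_i)
            + 0.299 * np.abs(t_g - t_i))
    return float(diff.mean())
```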
  • the third training input image may have the same scene as the first training input image, that is, the content is the same, and the quality of the third training input image is higher than that of the first training input image.
  • the third training input image may be a photo image taken by a digital single-lens reflex camera; since the third training input image can be regarded as the target output image of the generation network G to be trained, adding the contrast loss function to the system loss function can improve the convergence speed and the processing speed.
  • minimizing the system loss value includes reducing the contrast loss value.
  • the first training output image needs to be close to the third training input image.
  • during the training process, the parameters of the generation network G to be trained are continuously revised, so that the first training output image generated by the generation network G to be trained after the parameter correction is close to the third training input image, thereby continuously reducing the contrast loss value.
  • the system loss function of the generation network G to be trained can be expressed as:
  • L_total represents the system loss function
  • ⁇ , ⁇ , ⁇ , and ⁇ are the weights of the system loss function to generate the network confrontation loss function, content loss function, color loss function, and contrast loss function, respectively.
  • the weight ⁇ of the contrast loss function may be less than the weight ⁇ of the content loss function; for example, in some examples, the weight ratio of the contrast loss function does not exceed 20%.
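  • Denoting the four weights listed above generically as w_1, ..., w_4 (the exact symbols of the embodiment are not reproduced here), the system loss function takes the usual weighted-sum form:

$$L_{total} = w_{1}\, L_{G} + w_{2}\, L_{content} + w_{3}\, L_{color} + w_{4}\, L_{L1}$$

  • consistent with the constraint above that the weight of the contrast loss function is smaller than that of the content loss function.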
  • step S320 the system loss value can be calculated by the system loss function represented by the above formula, and then the subsequent step S330 is executed to correct the parameters of the generating network G to be trained, thereby implementing step S300.
  • Step S330 Correct the parameters of the generating network to be trained based on the system loss value.
  • the training process of the generation network G to be trained may also include an optimization function (not shown in FIG. 11A); the optimization function may calculate the error values of the parameters of the generation network G according to the system loss value calculated by the system loss function, and correct the parameters of the generation network G to be trained according to the error values.
  • the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (batch gradient descent, BGD) algorithm, etc. to calculate the error value of the parameters of the generated network G.
  • for example, when the generation network G to be trained has the structure shown in FIG. 6A, correcting the parameters of the generation network G to be trained includes: modifying the parameters of the generation network shown in FIG. 6A except for the parameters of the synthesis module Merg, that is, the parameters of the synthesis module Merg remain unchanged.
  • for example, when the generation network G to be trained has the structure shown in FIG. 6B, correcting the parameters of the generation network G to be trained includes: modifying the parameters of the generation network shown in FIG. 6B except for the parameters of the conversion module Tran and the synthesis module Merg, that is, the parameters of the conversion module Tran and the synthesis module Merg remain unchanged.
  • step S300 may further include: judging whether the training of the generation network G to be trained meets a predetermined condition; if the predetermined condition is not met, the training process of the generation network G to be trained is repeated; if the predetermined condition is met, the training process of the generation network G to be trained at this stage is stopped, and the generation network G trained at this stage is obtained.
  • the generation network G trained in this stage can be used as the generation network G to be trained in the next stage.
  • the foregoing predetermined condition is that the system loss values corresponding to two consecutive (or more) first training input images no longer decrease significantly.
  • the foregoing predetermined condition is that the number of training times or training periods of the generating network G reaches a predetermined number. It should be noted that the embodiments of the present disclosure do not limit this.
  • the training of the generation network G to be trained needs to be performed in conjunction with the discrimination network D and the analysis network A. It should be noted that, during the training process of the generation network G, the parameters of the discrimination network D remain unchanged; when the analysis network A adopts a trained convolutional neural network model, the parameters of the analysis network A can also remain unchanged.
  • the generation network to be trained, the discrimination network, the analysis network, and the various layers or modules included in these neural networks (such as the convolution module, the up-sampling layer, the down-sampling layer, etc.) each correspond to a program/method that executes a corresponding processing procedure, for example implemented by corresponding software, firmware, hardware, etc.; the above example is only a schematic illustration of the training process of the generation network to be trained.
  • in the training phase, a large number of sample images need to be used to train the neural network; at the same time, the training process for each sample image may include multiple iterations to correct the parameters of the generation network to be trained.
  • the training phase may also include fine-tuning the parameters of the generation network to be trained to obtain more optimized parameters.
  • FIG. 14A is a schematic structural block diagram of training a discriminant network corresponding to the training method shown in FIG. 10 provided by at least one embodiment of the present disclosure
  • FIG. 14B is a process of training a discriminant network provided by at least one embodiment of the present disclosure Schematic flowchart.
  • step S400 includes step S410 to step S430, as follows:
  • Step S410 Use the generation network to be trained to process the fourth training input image to generate a second training output image
  • Step S420 Based on the second training output image and the fifth training input image, calculate the discriminant network adversarial loss value through the discriminating network adversarial loss function;
  • Step S430 Correct the parameters of the discriminating network according to the discriminating network counter loss value.
  • training the discrimination network based on the generation network to be trained may also include: judging whether the training of the discrimination network D meets a predetermined condition; if the predetermined condition is not met, the above training process of the discrimination network D is repeated; if the predetermined condition is met, the training process of the discrimination network D at this stage is stopped, and the discrimination network D trained at this stage is obtained.
  • the above-mentioned predetermined condition is that the discriminant network confrontation loss value corresponding to two consecutive (or more) fifth training input images and second training output images is no longer significantly reduced.
  • for another example, the above-mentioned predetermined condition is that the number of training times or training epochs of the discrimination network D reaches a predetermined number. It should be noted that the embodiments of the present disclosure do not limit this.
  • the above is only a schematic illustration of the training process of the discrimination network.
  • Those skilled in the art should know that in the training phase, a large number of sample images need to be used to train the neural network; at the same time, in the training process of each sample image, multiple iterations can be included to correct the parameters of the discriminant network.
  • the training phase may also include fine-tuning the parameters of the discrimination network to obtain more optimized parameters.
  • the initial parameter of the discrimination network D may be a random number, for example, the random number conforms to a Gaussian distribution, which is not limited in the embodiment of the present disclosure.
  • the training process of the discrimination network D can also include an optimization function (not shown in FIG. 14A); the optimization function can calculate the error values of the parameters of the discrimination network D according to the discrimination network confrontation loss value calculated by the discrimination network confrontation loss function, and correct the parameters of the discrimination network D according to the error values.
  • the optimization function can use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (batch gradient descent, BGD) algorithm, etc., to calculate the error value of the parameters of the discriminating network D.
  • the fourth training input image may be the same as the first training input image, for example, the set of fourth training input images and the set of first training input images are the same image set.
  • the fourth training input image may also include photos taken by the camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a surveillance camera or a web camera, etc., which may include images of people, animals and plants Images or landscape images, etc., are not limited in the embodiments of the present disclosure.
  • the fourth training input image is a low-quality image, for example, the quality of the fourth training input image is lower than, for example, the quality of a photo taken by a digital single-lens reflex camera.
  • the fourth training input image may be an image in RGB format, and embodiments of the present disclosure include but are not limited to this.
  • the fifth training input image is a high-quality image, that is, the quality of the fifth training input image is higher than that of the fourth training input image.
  • the fifth training input image may be a photo image taken by a digital single-lens reflex camera.
  • the fifth training input image may include an image of a person, an image of animals and plants, or a landscape image, which is not limited in the embodiment of the present disclosure.
  • the fifth training input image may be the same as the second training input image, for example, the set of fifth training input images and the set of second training input images are the same image set; of course, the fifth training input image may also be different from the second training input image, which is not limited in the embodiments of the present disclosure.
  • the discrimination network D may be the discrimination network shown in FIG. 12, but it is not limited to this.
  • the discriminative network adversarial loss function can be expressed as:
  • L_D represents the discrimination network confrontation loss function
  • x represents the fifth training input image
  • P_data(x) represents the set of fifth training input images (for example, including a batch of multiple fifth training input images)
  • D(x) represents the output of the discrimination network D for the fifth training input image x, that is, the output obtained by the discrimination network D processing the fifth training input image x; the corresponding expectation is taken over the set of fifth training input images
  • z2 represents the fourth training input image
  • P_{z2}(z2) represents the set of fourth training input images (for example, including a batch of multiple fourth training input images)
  • G(z2) represents the second training output image
  • D(G(z2)) represents the output of the discrimination network D for the second training output image, that is, the output obtained by the discrimination network D processing the second training output image; the corresponding expectation is taken over the set of fourth training input images. Accordingly, a batch gradient descent algorithm can be used to optimize the parameters of the discrimination network D.
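  • The explicit expression of the discrimination network confrontation loss function is likewise not reproduced above; a typical form consistent with these definitions, given only as a common instantiation rather than the exact formula of the embodiment, is:

$$L_{D} = -\,\mathbb{E}_{x \sim P_{data}(x)}\big[\log D(x)\big] - \mathbb{E}_{z2 \sim P_{z2}(z2)}\big[\log\big(1 - D(G(z2))\big)\big]$$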
  • discriminant network countermeasure loss function expressed by the above formula is exemplary, and the present disclosure includes but is not limited to this.
  • the training goal of discriminant network D is to minimize the value of discriminant network confrontation loss.
  • the label of the fifth training input image is set to 1, that is, the discrimination network D needs to identify the fifth training input image as, for example, a photo image taken by a digital single-lens reflex camera, that is, a high-quality image;
  • the label of the second training output image is set to 0, that is, the discriminant network D needs to identify that the second training output image is not a photo image taken by, for example, a digital single-lens reflex camera, that is, a low-quality image.
  • during the training process, the parameters of the discrimination network D are continuously modified, so that the discrimination network D after the parameter correction can accurately distinguish the quality of the fifth training input image from that of the second training output image, that is, so that the output of the discrimination network D corresponding to the fifth training input image constantly approaches 1 and the output of the discrimination network D corresponding to the second training output image constantly approaches 0, thereby continuously reducing the discrimination network confrontation loss value.
  • the training of the generating network G to be trained and the training of the discriminant network D are performed alternately and iteratively.
  • generally, the first-stage training is first performed on the discrimination network D to improve the discrimination ability of the discrimination network D (that is, the ability to judge the quality of the input of the discrimination network D), and the discrimination network D trained in the first stage is obtained; then, based on the discrimination network D trained in the first stage, the first-stage training is performed on the generation network G (that is, the generation network G to be trained) to improve the image enhancement processing capability of the generation network G (that is, so that the output of the generation network G is a high-quality image), and the generation network G trained in the first stage is obtained.
  • similar to the first-stage training, in the second-stage training, based on the generation network G trained in the first stage, the second-stage training is performed on the discrimination network D trained in the first stage to improve the discrimination ability of the discrimination network D, and the discrimination network D trained in the second stage is obtained; then, based on the discrimination network D trained in the second stage, the second-stage training is performed on the generation network G trained in the first stage to improve the image enhancement processing capability of the generation network G, and the generation network G trained in the second stage is obtained; and so on, the discrimination network D and the generation network G are trained in the third stage, the fourth stage, and so forth, until the quality of the output of the generation network G can be close to, for example, the quality of photos taken by a digital single-lens reflex camera, that is, the training output image is a high-quality image.
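  • A deliberately tiny, self-contained toy sketch of this alternating schedule (scalars stand in for images and numerical gradients stand in for backpropagation; everything here is illustrative only and is not the embodiment's implementation):

```python
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def numgrad(f, p: float, eps: float = 1e-5) -> float:
    # crude numerical gradient, standing in for backpropagation
    return (f(p + eps) - f(p - eps)) / (2 * eps)

def train_alternating(steps: int = 200, lr: float = 0.5):
    real = 0.8                 # "quality score" of a high-quality reference image
    g, d = 0.2, 0.0            # toy parameters of generator G and discriminator D
    D = lambda x, d_: sigmoid(d_ * (x - 0.5))   # toy discriminator
    for _ in range(steps):
        # 1) train D with G fixed: push D(real) toward 1 and D(G output) toward 0
        d_loss = lambda d_: -math.log(D(real, d_)) - math.log(1.0 - D(g, d_) + 1e-12)
        d -= lr * numgrad(d_loss, d)
        # 2) train G with D fixed: push D(G output) toward 1
        g_loss = lambda g_: -math.log(D(g_, d) + 1e-12)
        g -= lr * numgrad(g_loss, g)
    return g, d

print(train_alternating())     # g drifts toward the "high-quality" region
```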
  • the confrontation between the generation network G and the discrimination network D is embodied in that the output of the generation network G (the image generated by the generation network G) is given different labels in their respective training processes: the label is 1 in the training process of the generation network G, while the label is 0 in the training process of the discrimination network D; accordingly, the second part of the discrimination network confrontation loss function (that is, the part related to the image generated by the generation network G) is opposite to the generation network confrontation loss function in the system loss function.
  • ideally, the image output by the generation network G obtained after training is a high-quality image (that is, close to the quality of photos taken by, for example, a digital single-lens reflex camera), and the outputs of the discrimination network D for the fifth training input image and for the second training output image generated by the generation network G are both 0.5, that is, the generation network G and the discrimination network D reach a Nash equilibrium through the adversarial game.
  • the neural network training method provided by at least one embodiment of the present disclosure combines generative adversarial network technology.
  • the generation network trained by the training method can combine branch processing and weight sharing processing to perform image enhancement processing, which can reduce the number of parameters and facilitate the calculation of gradients during backpropagation, so that the processing speed and convergence speed can be improved while outputting high-quality images.
  • FIG. 15 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present disclosure.
  • the image processing apparatus 500 includes a memory 510 and a processor 520.
  • the memory 510 is used for non-transitory storage of computer readable instructions
  • the processor 520 is used for running the computer-readable instructions; when the computer-readable instructions are run by the processor 520, the image processing method provided by any embodiment of the present disclosure is executed.
  • the memory 510 and the processor 520 may directly or indirectly communicate with each other.
  • components such as the memory 510 and the processor 520 may communicate through a network connection.
  • the network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
  • the network may include a local area network, the Internet, a telecommunication network, the Internet of Things (Internet of Things) based on the Internet and/or a telecommunication network, and/or any combination of the above networks, etc.
  • the wired network can, for example, use twisted pair, coaxial cable, or optical fiber transmission for communication
  • the wireless network can, for example, use 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi.
  • the present disclosure does not limit the types and functions of the network here.
  • the processor 520 may control other components in the image processing apparatus to perform desired functions.
  • the processor 520 may be a central processing unit (CPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another device with data processing capability and/or program execution capability.
  • the central processing unit (CPU) can be an X86 or ARM architecture.
  • the GPU can be directly integrated on the motherboard alone or built into the north bridge chip of the motherboard.
  • the GPU can also be built into the central processing unit (CPU).
  • the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory.
  • Volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
  • one or more computer instructions may be stored in the memory 510, and the processor 520 may execute the computer instructions to implement various functions.
  • the computer-readable storage medium may also store various application programs and various data, such as the first to fifth training input images, and various data used and/or generated by the application programs.
  • some computer instructions stored in the memory 510, when executed by the processor 520, can perform one or more steps in the image processing method provided by any embodiment of the present disclosure, and/or can perform one or more steps in the neural network training method provided by any embodiment of the present disclosure.
  • the image processing apparatus provided by the embodiments of the present disclosure is exemplary rather than restrictive; according to actual application requirements, the image processing apparatus may also include other conventional components or structures. To realize the necessary functions of the image processing apparatus, those skilled in the art may provide other conventional components or structures according to the specific application scenario, which are not limited in the embodiments of the present disclosure.
  • FIG. 16 is a schematic diagram of a storage medium provided by an embodiment of the present disclosure.
  • the storage medium 600 non-transitory stores computer-readable instructions 601.
  • when the non-transitory computer-readable instructions 601 are executed by a computer (including a processor), the image processing method provided by any embodiment of the present disclosure can be performed.
  • one or more computer instructions may be stored on the storage medium 600.
  • Some computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps in the foregoing image processing method.
  • the other computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps in the aforementioned neural network training method.
  • the storage medium may include a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), flash memory, or any combination of the above storage media, and may also be another suitable storage medium.

Abstract

一种图像处理方法及装置、神经网络的训练方法、存储介质。该图像处理方法包括:获取输入图像;以及使用生成网络对输入图像进行处理,以生成输出图像;生成网络包括第一子网络和至少一个第二子网络,使用生成网络对输入图像进行处理,以生成输出图像,包括:使用第一子网络对输入图像进行处理,以得到多个第一特征图;使用至少一个第二子网络对多个第一特征图进行分支处理和权值共享处理,以得到多个第二特征图;以及对多个第二特征图进行处理,以得到输出图像。

Description

图像处理方法及装置、神经网络的训练方法、存储介质
本申请要求于2019年5月30日递交的中国专利申请第201910463969.5号的优先权,在此全文引用上述中国专利申请公开的内容以作为本申请的一部分。
技术领域
本公开的实施例涉及一种图像处理方法及装置、神经网络的训练方法、存储介质。
背景技术
当前,基于人工神经网络的深度学习技术已经在诸如物体分类、文本处理、推荐引擎、图像搜索、面部识别、年龄和语音识别、人机对话以及情感计算等领域取得了巨大进展。随着人工神经网络结构的加深和算法的提升,深度学习技术在类人类数据感知领域取得了突破性的进展,深度学习技术可以用于描述图像内容、识别图像中的复杂环境下的物体以及在嘈杂环境中进行语音识别等。同时,深度学习技术还可以解决图像生成和融合的问题。
发明内容
本公开至少一个实施例提供一种图像处理方法,包括:获取输入图像;以及使用生成网络对所述输入图像进行处理,以生成输出图像;其中,所述生成网络包括第一子网络和至少一个第二子网络,使用所述生成网络对所述输入图像进行处理,以生成所述输出图像,包括:使用所述第一子网络对所述输入图像进行处理,以得到多个第一特征图;使用所述至少一个第二子网络对所述多个第一特征图进行分支处理和权值共享处理,以得到多个第二特征图;以及对所述多个第二特征图进行处理,以得到输出图像。
例如,在本公开一些实施例提供的图像处理方法中,每个所述第二子网络包括第一分支网络、第二分支网络、第三分支网络,每个所述第二子网络的所述分支处理包括:将每个所述第二子网络的输入划分为第一分支输入、第二分支输入和第三分支输入;以及使用所述第一分支网络对所述第一分支输入进行处理,以得到第一分支输出,使用所述第二分支网络对所述第二分支输入进行处理,以得到第二分支输出,使用所述第三分支网络对所述第三分支输入进行处理,以得到第三分支输出;其中,所述至少一个第二子网络包括第一个第二子网络,所述第一个第二子网络与所述第一子网络连接,所述多个第一特征图作为所述第一个第二子网络的输入。
例如,在本公开一些实施例提供的图像处理方法中,每个所述第二子网络还包括第一主干网络,每个所述第二子网络的所述权值共享处理包括:将所述第一分支输出、所述第二分支输出和所述第三分支输出进行连接,以得到第一中间输出;以及使用所述第一主干网络对所述第一中间输出进行处理,以得到每个所述第二子网络的输出。
例如,在本公开一些实施例提供的图像处理方法中,所述第一分支网络的处理包括标 准卷积处理,所述第二分支网络的处理包括标准卷积处理,所述第三分支网络的处理包括标准卷积处理,所述第一主干网络的处理包括标准卷积处理和下采样处理。
例如,在本公开一些实施例提供的图像处理方法中,所述生成网络还包括第三子网络,对所述多个第二特征图进行处理,以得到所述输出图像,包括:对所述多个第二特征图进行处理,以得到多个第三特征图;使用所述第三子网络对所述多个第三特征图进行处理,以得到多个第四特征图;以及对所述多个第四特征图进行合成处理,以得到输出图像。
例如,在本公开一些实施例提供的图像处理方法中,所述第三子网络包括第二主干网络、第四分支网络、第五分支网络和第六分支网络,使用所述第三子网络对所述多个第三特征图进行处理,以得到所述多个第四特征图,包括:使用所述第二主干网络对所述多个第三个特征图进行处理,以得到多个第五特征图;将所述多个第五特征图划分为第四分支输入、第五分支输入和第六分支输入;以及使用所述第四分支网络对所述第四分支输入进行处理,以得到所述第四分支网络对应的第四特征图,使用所述第五分支网络对所述第五分支输入进行处理,以得到所述第五分支网络对应的第四特征图,使用所述第六分支网络对所述第六分支输入进行处理,以得到所述第六分支网络对应的第四特征图;其中,所述多个第四特征图包括所述第四分支网络对应的第四特征图、所述第五分支网络对应的第四特征图和所述第六分支网络对应的第四特征图。
例如,在本公开一些实施例提供的图像处理方法中,所述第二主干网络的处理包括上采样处理,所述第四分支网络的处理包括标准卷积处理,所述第五分支网络的处理包括标准卷积处理,所述第六分支网络的处理包括标准卷积处理。
例如,在本公开一些实施例提供的图像处理方法中,所述第四分支网络的处理还包括上采样处理,所述第五分支网络的处理还包括上采样处理,所述第六分支网络的处理还包括上采样处理。
例如,在本公开一些实施例提供的图像处理方法中,所述第一子网络的处理包括标准卷积处理,使用所述第一子网络对所述输入图像进行处理,以得到所述多个第一特征图,包括:使用所述第一子网络对所述输入图像进行标准卷积处理,以得到所述多个第一特征图。
例如,在本公开一些实施例提供的图像处理方法中,所述输入图像具有第一颜色通道、第二颜色通道和第三颜色通道,所述第一子网络包括转换模块、第七分支网络、第八分支网络、第九分支网络和第三主干网络,使用所述第一子网络对所述输入图像进行处理,以得到所述多个第一特征图,包括:使用所述转换模块将所述输入图像的第一颜色通道、第二颜色通道和第三颜色通道的数据信息转换为中间输入图像的第一亮度信道、第一色差信道和第二色差信道的数据信息;使用所述第七分支网络对所述中间输入图像的第一亮度信道的数据信息进行处理,以得到第七分支输出,使用所述第八分支网络对所述中间输入图像的第一色差信道的数据信息进行处理,以得到第八分支输出,使用所述第九分支网络对所述中间输入图像的第二色差信道的数据信息进行处理,以得到第九分支输出;将所述第 七分支输出、所述第八分支输出和所述第九分支输出进行连接,以得到第二中间输出;以及使用所述第三主干网络对所述第二中间输出进行处理,以得到所述多个第一特征图。
例如,在本公开一些实施例提供的图像处理方法中,所述第七分支网络的处理包括标准卷积处理和下采样处理,所述第八分支网络的处理包括标准下采样处理,所述第九分支网络的处理包括标准下采样处理。
例如,在本公开一些实施例提供的图像处理方法中,所述第四分支网络的处理包括标准卷积处理和上采样处理,所述第五分支网络的处理包括标准卷积处理和标准上采样处理,所述第六分支网络的处理包括标准卷积处理和标准上采样处理。
例如,在本公开一些实施例提供的图像处理方法中,所述生成网络还包括密集子网络,所述密集子网络包括N个密集模块,对所述多个第二特征图进行处理,以得到所述多个第三特征图,包括:使用所述密集子网络对所述多个第二特征图进行处理,以得到所述多个第三特征图;其中,所述多个第二特征图作为所述N个密集模块中的第1个密集模块的输入,所述多个第二特征图与所述N个密集模块中的第i个密集模块之前的i-1个密集模块的输出连接,作为所述第i个密集模块的输入,所述多个第二特征图和每个所述密集模块的输出进行连接,作为所述多个第三特征图,N、i为整数,N≥2,i≥2且i≤N。
例如,在本公开一些实施例提供的图像处理方法中,每个密集模块的处理包括降维处理和卷积处理。
例如,在本公开一些实施例提供的图像处理方法中,所述生成网络还包括合成模块,对所述多个第四特征图进行合成处理,以得到所述输出图像,包括:使用所述合成模块对所述多个第四特征图进行合成处理,以得到所述输出图像。
例如,在本公开一些实施例提供的图像处理方法中,所述合成模块包括第一转换矩阵,使用所述合成模块对所述多个第四特征图进行合成处理,以得到所述输出图像,包括:利用所述第一转换矩阵,将所述第四分支网络对应的第四特征图的数据信息、所述第五分支网络对应的第四特征图的数据信息和所述第六分支网络对应的第四特征图的数据信息转换为所述输出图像的第一颜色通道的数据信息、第二颜色通道的数据信息和第三颜色通道的数据信息,以得到所述输出图像。
本公开至少一个实施例还提供一种神经网络的训练方法,包括:基于待训练的生成网络,对判别网络进行训练;基于所述判别网络,对所述待训练的生成网络进行训练;以及,交替地执行上述训练过程,以得到本公开任一实施例提供的图像处理方法中的所述生成网络;其中,基于所述判别网络,对所述待训练的生成网络进行训练,包括:使用所述待训练的生成网络对第一训练输入图像进行处理,以生成第一训练输出图像;基于所述第一训练输出图像,通过系统损失函数计算所述待训练的生成网络的系统损失值;以及基于所述系统损失值对所述待训练的生成网络的参数进行修正。
例如,在本公开一些实施例提供的训练方法中,所述系统损失函数包括生成网络对抗损失函数,所述系统损失值包括生成网络对抗损失值;所述生成网络对抗损失函数表示为:
L_G = E_{z1~P_z1(z1)}[log(1 − D(G(z1)))]
其中，L_G表示所述生成网络对抗损失函数，z1表示所述第一训练输入图像，P_z1(z1)表示所述第一训练输入图像的集合，G(z1)表示所述第一训练输出图像，D(G(z1))表示所述判别网络针对所述第一训练输出图像的输出，E_{z1~P_z1(z1)}表示针对所述第一训练输入图像的集合求期望以得到所述生成网络对抗损失值。
例如,在本公开一些实施例提供的训练方法中,所述系统损失函数还包括内容损失函数,所述系统损失值还包括内容损失值;
基于所述第一训练输出图像,通过系统损失函数计算所述待训练的生成网络的系统损失值,包括:使用分析网络提取所述第一训练输入图像的第一内容特征图和所述第一训练输出图像的第二内容特征图,根据所述第一内容特征图和所述第二内容特征图,通过所述内容损失函数计算所述生成网络的所述内容损失值,
其中,所述分析网络包括用于提取所述第一内容特征图和所述第二内容特征图的至少一个卷积模块;
所述内容损失函数表示为:
L_content = Σ_m w_1m·C_m
其中，L_content表示所述内容损失函数，C_m表示所述至少一个卷积模块中的第m个卷积模块的单层内容损失函数，w_1m表示C_m的权重；
所述单层内容损失函数表示为：
C_m = (1/(2·S_1))·Σ_ij (F^m_ij − P^m_ij)^2
其中，S_1为常数，F^m_ij表示在所述第m个卷积模块中第i个卷积核提取的所述第一训练输入图像的第一内容特征图中第j个位置的值，P^m_ij表示在所述第m个卷积模块中第i个卷积核提取的所述第一训练输出图像的第二内容特征图中第j个位置的值。
例如,在本公开一些实施例提供的训练方法中,所述系统损失函数还包括颜色损失函数,所述系统损失值还包括颜色损失值;所述颜色损失函数表示为:
L color=abs(gaussian(G(z1))-gaussian(I1))
其中,L color表示所述颜色损失函数,G(z1)表示所述第一训练输出图像,I1表示第二训练输入图像,gaussian()表示高斯模糊化运算,abs()表示求绝对值运算;
所述第二训练输入图像的质量比所述第一训练输入图像的质量高。
例如,在本公开一些实施例提供的训练方法中,所述第一训练输出图像具有第一颜色通道、第二颜色通道和第三颜色通道;
所述系统损失函数还包括对比损失函数,所述系统损失值还包括对比损失值;所述对比损失函数表示为:
L L1=0.299*abs(F G(z1)-F I2)+0.587*abs(S G(z1)-S I2)+0.299*abs(T G(z1)-T I2)
其中,L L1表示所述对比损失函数,G(z1)表示所述第一训练输出图像,I2表示第三训 练输入图像,F G(z1)、S G(z1)和T G(z1)分别表示所述第一训练输出图像的第一颜色通道、第二颜色通道和第三颜色通道的数据信息,F I2、S I2和T I2分别表示所述第三训练输入图像的第一颜色通道、第二颜色通道和第三颜色通道的数据信息,abs()表示求绝对值运算;
所述第三训练输入图像具有与所述第一训练输入图像相同的场景,且所述第三训练输入图像的质量比所述第一训练输入图像的质量高。
例如,在本公开一些实施例提供的训练方法中,基于所述待训练的生成网络,对所述判别网络进行训练,包括:利用所述待训练的生成网络对第四训练输入图像进行处理,以生成第二训练输出图像;基于所述第二训练输出图像和第五训练输入图像,通过判别网络对抗损失函数计算判别网络对抗损失值;以及根据所述判别网络对抗损失值对所述判别网络的参数进行修正;其中,所述第五训练输入图像的质量比所述第四训练输入图像的质量高。
例如,在本公开一些实施例提供的训练方法中,所述判别网络对抗损失函数表示为:
L_D = −E_{x~P_data(x)}[log D(x)] − E_{z2~P_z2(z2)}[log(1 − D(G(z2)))]
其中，L_D表示所述判别网络对抗损失函数，x表示所述第五训练输入图像，P_data(x)表示所述第五训练输入图像的集合，D(x)表示所述判别网络针对所述第五训练输入图像的输出，E_{x~P_data(x)}表示针对所述第五训练输入图像的集合求期望，z2表示所述第四训练输入图像，P_z2(z2)表示所述第四训练输入图像的集合，G(z2)表示所述第二训练输出图像，D(G(z2))表示所述判别网络针对所述第二训练输出图像的输出，E_{z2~P_z2(z2)}表示针对所述第四训练输入图像的集合求期望。
本公开至少一个实施例还提供一种图像处理装置,包括:存储器,用于非暂时性存储计算机可读指令;以及处理器,用于运行所述计算机可读指令,所述计算机可读指令被所述处理器运行时执行本公开任一实施例提供的图像处理方法或本公开任一实施例提供的神经网路的训练方法。
本公开至少一个实施例还提供一种存储介质,非暂时性地存储计算机可读指令,当所述计算机可读指令由计算机执行时可以执行本公开任一实施例提供的图像处理方法或本公开任一实施例提供的神经网路的训练方法。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例的附图作简单地介绍,显而易见地,下面描述中的附图仅仅涉及本公开的一些实施例,而非对本公开的限制。
图1为一种卷积神经网络的示意图;
图2A为一种卷积神经网络的结构示意图;
图2B为一种卷积神经网络的工作过程示意图;
图3为另一种卷积神经网络的结构示意图;
图4为本公开至少一实施例提供的一种图像处理方法的流程图;
图5为一种对应于图4中所示的步骤S200的示例性流程图;
图6A为本公开至少一实施例提供的一种对应于图4中所示的图像处理方法的生成网络的示意性架构框图;
图6B为本公开至少一实施例提供的另一种对应于图4中所示的图像处理方法的生成网络的示意性架构框图;
图7为本公开至少一实施例提供的一种密集子网络的结构示意图;
图8A为本公开至少一实施例提供的一种上采样层的示意图;
图8B为本公开至少一实施例提供的另一种上采样层的示意图;
图9A为一种输入图像的示意图;
图9B为根据图6A所示的生成网络对图9A所示的输入图像进行处理得到的输出图像的示意图;
图9C为根据图6B所示的生成网络对图9A所示的输入图像进行处理得到的输出图像的示意图;
图10为本公开至少一实施例提供的一种神经网络的训练方法的流程图;
图11A为本公开至少一实施例提供的一种对应于图10中所示的训练方法训练待训练的生成网络的示意性架构框图;
图11B为本公开至少一实施例提供的一种训练待训练的生成网络的过程的示意性流程图;
图12为本公开至少一实施例提供的一种判别网络的结构示意图;
图13为本公开至少一实施例提供的一种分析网络的结构示意图;
图14A为本公开至少一实施例提供的一种对应于图10中所示的训练方法训练判别网络的示意性架构框图;
图14B为本公开至少一实施例提供的一种训练判别网络的过程的示意性流程图;
图15为本公开至少一实施例提供的一种图像处理装置的示意性框图;以及
图16为本公开至少一实施例提供的一种存储介质的示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例的附图,对本公开实施例的技术方案进行清楚、完整地描述。显然,所描述的实施例是本公开的一部分实施例,而不是全部的实施例。基于所描述的本公开的实施例,本领域普通技术人员在无需创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
除非另外定义,本公开使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本公开中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。“包括”或者“包含”等类似的词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或者 物件及其等同,而不排除其他元件或者物件。“连接”或者“相连”等类似的词语并非限定于物理的或者机械的连接,而是可以包括电性的连接,不管是直接的还是间接的。“上”、“下”、“左”、“右”等仅用于表示相对位置关系,当被描述对象的绝对位置改变后,则该相对位置关系也可能相应地改变。
下面通过几个具体的实施例对本公开进行说明。为了保持本公开实施例的以下说明清楚且简明,本公开省略了已知功能和已知部件的详细说明。当本公开实施例的任一部件在一个以上的附图中出现时,该部件在每个附图中由相同或类似的参考标号表示。
图像增强是图像处理领域的研究热点之一。由于在图像采集过程中存在各种物理因素的限制(例如,手机相机的图像传感器尺寸太小以及其他软件、硬件的限制等)以及环境噪声的干扰,会导致图像质量大大降低。图像增强的目的是通过图像增强技术,改善图像的灰度直方图,提高图像的对比度,从而凸显图像细节信息,改善图像的视觉效果。
利用深度神经网络进行图像增强是随着深度学习技术的发展而新兴起来的技术。例如,基于卷积神经网络,可以对手机拍摄的低质量的照片(输入图像)进行处理以获得高质量的输出图像,该输出图像的质量可以接近于数码单镜反光相机(Digital Single Lens Reflex Camera,常简称为DSLR,也简称为数码单反相机)拍摄的照片的质量。例如,常用峰值信噪比(Peak Signal to Noise Ratio,PSNR)指标来表征图像质量,其中PSNR值越高表示图像越接近于数码单镜反光相机拍摄的照片。
例如,Andrey Ignatov等人提出了一种卷积神经网络实现图像增强的方法,请参见文献,Andrey Ignatov,Nikolay Kobyshev,Kenneth Vanhoey,Radu Timofte,Luc Van Gool,DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks.arXiv:1704.02470v2[cs.CV],2017年9月5日。在此将该文献全文引用结合于此,以作为本申请的一部分。该方法主要是利用卷积层、批量标准化层及残差连接构建了一种单一尺度的卷积神经网络,利用该网络可以将输入的低质量图像(例如,对比度较低,图像曝光不足或曝光过度,整幅图像过暗或过亮等)处理成一张较高质量图像。利用颜色损失、纹理损失及内容损失作为训练中的损失函数,能够取得较好的处理效果。
最初,卷积神经网络(Convolutional Neural Network,CNN)主要用于识别二维形状,其对图像的平移、比例缩放、倾斜或其他形式的变形具有高度不变性。CNN主要通过局部感知野和权值共享来简化神经网络模型的复杂性、减少权重的数量。随着深度学习技术的发展,CNN的应用范围已经不仅仅限于图像识别领域,其也可以应用在人脸识别、文字识别、动物分类、图像处理等领域。
图1示出了一种卷积神经网络的示意图。例如,该卷积神经网络可以用于图像处理,其使用图像作为输入和输出,并通过卷积核替代标量的权重。图1中仅示出了具有3层结构的卷积神经网络,本公开的实施例对此不作限制。如图1所示,卷积神经网络包括输入层101、隐藏层102和输出层103。输入层101具有4个输入,隐藏层102具有3个输出,输出层103具有2个输出,最终该卷积神经网络最终输出2幅图像。
例如,输入层101的4个输入可以为4幅图像,或者1幅图像的四种特征图像。隐藏层102的3个输出可以为经过输入层101输入的图像的特征图像。
例如，如图1所示，卷积层具有权重w_ij^k和偏置b_i^k。权重w_ij^k表示卷积核，偏置b_i^k是叠加到卷积层的输出的标量，其中，k是表示输入层101的标签，i和j分别是输入层101的单元和隐藏层102的单元的标签。例如，第一卷积层201包括第一组卷积核（图1中的w_ij^1）和第一组偏置（图1中的b_i^1）。第二卷积层202包括第二组卷积核（图1中的w_ij^2）和第二组偏置（图1中的b_i^2）。通常，每个卷积层包括数十个或数百个卷积核，若卷积神经网络为深度卷积神经网络，则其可以包括至少五层卷积层。
例如,如图1所示,该卷积神经网络还包括第一激活层203和第二激活层204。第一激活层203位于第一卷积层201之后,第二激活层204位于第二卷积层202之后。激活层(例如,第一激活层203和第二激活层204)包括激活函数,激活函数用于给卷积神经网络引入非线性因素,以使卷积神经网络可以更好地解决较为复杂的问题。激活函数可以包括线性修正单元(ReLU)函数、S型函数(Sigmoid函数)或双曲正切函数(tanh函数)等。ReLU函数为非饱和非线性函数,Sigmoid函数和tanh函数为饱和非线性函数。例如,激活层可以单独作为卷积神经网络的一层,或者激活层也可以被包含在卷积层(例如,第一卷积层201可以包括第一激活层203,第二卷积层202可以包括第二激活层204)中。
例如，在第一卷积层201中，首先，对每个输入应用第一组卷积核中的若干卷积核w_ij^1和第一组偏置中的若干偏置b_i^1，以得到第一卷积层201的输出；然后，第一卷积层201的输出可以通过第一激活层203进行处理，以得到第一激活层203的输出。在第二卷积层202中，首先，对输入的第一激活层203的输出应用第二组卷积核中的若干卷积核w_ij^2和第二组偏置中的若干偏置b_i^2，以得到第二卷积层202的输出；然后，第二卷积层202的输出可以通过第二激活层204进行处理，以得到第二激活层204的输出。例如，第一卷积层201的输出可以为对其输入应用卷积核w_ij^1后再与偏置b_i^1相加的结果，第二卷积层202的输出可以为对第一激活层203的输出应用卷积核w_ij^2后再与偏置b_i^2相加的结果。
在利用卷积神经网络进行图像处理前,需要对卷积神经网络进行训练。经过训练之后,卷积神经网络的卷积核和偏置在图像处理期间保持不变。在训练过程中,各卷积核和偏置通过多组输入/输出示例图像以及优化算法进行调整,以获取优化后的卷积神经网络模型。
图2A示出了一种卷积神经网络的结构示意图,图2B示出了一种卷积神经网络的工作过程示意图。例如,如图2A和2B所示,输入图像通过输入层输入到卷积神经网络后,依次经过若干个处理过程(如图2A中的每个层级)后输出类别标识。卷积神经网络的主要组成部分可以包括多个卷积层、多个下采样层和全连接层等。在本公开中,应该理解的是,多个卷积层、多个下采样层和全连接层等这些层每个都指代对应的处理操作,即卷积处理、下采样处理、全连接处理等,所描述的神经网络也都指代对应的处理操作,以下将要描述的实例标准化层或层标准化层等也与此类似,这里不再重复说明。例如,一个完整的卷积神经网络可以由这三种层叠加组成。例如,图2A仅示出了一种卷积神经网络的三个层级, 即第一层级、第二层级和第三层级。例如,每个层级可以包括一个卷积模块和一个下采样层。例如,每个卷积模块可以包括卷积层。由此,每个层级的处理过程可以包括:对输入图像进行卷积(convolution)处理以及下采样(sub-sampling/down-sampling)处理。例如,根据实际需要,每个卷积模块还可以包括实例标准化(instance normalization)层,从而每个层级的处理过程还可以包括标准化处理。
例如,实例标准化层用于对卷积层输出的特征图像进行实例标准化处理,以使特征图像的像素的灰度值在预定范围内变化,从而简化图像生成过程,改善图像增强的效果。例如,预定范围可以为[-1,1]。实例标准化层根据每个特征图像自身的均值和方差,对该特征图像进行实例标准化处理。例如,实例标准化层还可用于对单幅图像进行实例标准化处理。
例如,假设小批梯度下降法(mini-batch gradient decent)的尺寸为T,某一卷积层输出的特征图像的数量为C,且每个特征图像均为H行W列的矩阵,则特征图像的模型表示为(T,C,W,H)。从而,实例标准化层的实例标准化公式可以表示如下:
y_tijk = (x_tijk − μ_ti) / sqrt(σ_ti^2 + ε1)，其中，μ_ti和σ_ti^2分别为第t个特征块中第i个特征图像自身所有像素值的均值和方差，即 μ_ti = (1/(H·W))·Σ_j Σ_k x_tijk，σ_ti^2 = (1/(H·W))·Σ_j Σ_k (x_tijk − μ_ti)^2。
其中，x_tijk为该卷积层输出的特征图像集合中的第t个特征块（patch）、第i个特征图像、第j行、第k列的值。y_tijk表示经过实例标准化层处理x_tijk后得到的结果。ε1为一个很小的整数，以避免分母为0。
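A minimal NumPy sketch of the per-feature-map instance normalization described above; the tensor layout (T, C, H, W) follows the text, while the small constant `eps` is an illustrative assumption:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x has shape (T, C, H, W): T patches, C feature maps, each with H rows and W columns.
    mean = x.mean(axis=(2, 3), keepdims=True)   # per-patch, per-feature-map mean
    var = x.var(axis=(2, 3), keepdims=True)     # per-patch, per-feature-map variance
    return (x - mean) / np.sqrt(var + eps)      # normalized feature maps

x = np.random.rand(2, 3, 8, 8).astype(np.float32)
y = instance_norm(x)   # each feature map now has roughly zero mean and unit variance
```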
卷积层是卷积神经网络的核心层。在卷积神经网络的卷积层中,一个神经元只与部分相邻层的神经元连接。卷积层可以对输入图像应用若干个卷积核(也称为滤波器),以提取输入图像的多种类型的特征。每个卷积核可以提取一种类型的特征。卷积核一般以随机小数矩阵的形式初始化,在卷积神经网络的训练过程中卷积核将通过学习以得到合理的权值。对输入图像应用一个卷积核之后得到的结果被称为特征图像(feature map),特征图像的数目与卷积核的数目相等。一个层级的卷积层输出的特征图像可以被输入到相邻的下一个层级的卷积层并再次处理以得到新的特征图像。例如,如图2A所示,第一层级的卷积层可以输出第一特征图像,该第一特征图像被输入到第二层级的卷积层再次处理以得到第二特征图像。
例如,如图2B所示,卷积层可以使用不同的卷积核对输入图像的某一个局部感受域的数据进行卷积,卷积结果被输入激活层,该激活层根据相应的激活函数进行计算以得到输入图像的特征信息。
例如,如图2A和2B所示,下采样层设置在相邻的卷积层之间,下采样层是下采样的一种形式。一方面,下采样层可以用于缩减输入图像的规模,简化计算的复杂度,在一定程度上减小过拟合的现象;另一方面,下采样层也可以进行特征压缩,提取输入图像的主要特征。下采样层能够减少特征图像的尺寸,但不改变特征图像的数量。例如,一个尺寸为12×12的输入图像,通过6×6的卷积核对其进行采样,那么可以得到2×2的输出图像, 这意味着输入图像上的36个像素合并为输出图像中的1个像素。最后一个下采样层或卷积层可以连接到一个或多个全连接层,全连接层用于连接提取的所有特征。全连接层的输出为一个一维矩阵,也就是向量。
图3示出了另一种卷积神经网络的结构示意图。例如,参见图3所示的示例,最后一个卷积层(即第t个卷积层)的输出被输入到平坦化层以进行平坦化操作(Flatten)。平坦化层可以将特征图像(2D图像)转换为向量(1D)。该平坦化操作可以按照如下的方式进行:
v_k = f_{k/j, k%j}
其中,v是包含k个元素的向量,f是具有i行j列的矩阵。
然后,平坦化层的输出(即1D向量)被输入到一个全连接层(FCN)。全连接层可以具有与卷积神经网络相同的结构,但不同之处在于,全连接层使用不同的标量值以替代卷积核。
例如,最后一个卷积层的输出也可以被输入到均化层(AVG)。均化层用于对输出进行平均操作,即利用特征图像的均值表示输出图像,因此,一个2D的特征图像转换成为一个标量。例如,如果卷积神经网络包括均化层,则其可以不包括平坦化层。
例如,根据实际需要,均化层或全连接层可以连接到分类器,分类器可以根据提取的特征进行分类,分类器的输出可以作为卷积神经网络的最终输出,即表征图像类别的类别标识(label)。
例如,分类器可以为支持向量机(Support Vector Machine,SVM)分类器、softmax分类器以及最邻近规则(KNN)分类器等。如图3所示,在一个示例中,卷积神经网络包括softmax分类器,softmax分类器是一种逻辑函数的生成器,可以把一个包含任意实数的K维向量z压缩成K维向量σ(z)。softmax分类器的公式如下:
σ(z)_j = exp(Z_j) / Σ_{k=1}^{K} exp(Z_k)，j = 1, …, K
其中，Z_j表示K维向量z中第j个元素，σ(z)表示每个类别标识（label）的预测概率，σ(z)为实数，且其范围为(0,1)，K维向量σ(z)的和为1。根据以上公式，K维向量z中的每个类别标识均被赋予一定的预测概率，而具有最大预测概率的类别标识被选择作为输入图像的标识或类别。
本公开至少一实施例提供一种图像处理方法。该图像处理方法包括:获取输入图像;以及使用生成网络对所述输入图像进行处理,以生成输出图像;其中,所述生成网络包括第一子网络和至少一个第二子网络,使用所述生成网络对所述输入图像进行处理,以生成所述输出图像,包括:使用所述第一子网络对所述输入图像进行处理,以得到多个第一特征图;使用所述至少一个第二子网络对所述多个第一特征图进行分支处理和权值共享处理,以得到多个第二特征图;以及对所述多个第二特征图进行处理,以得到输出图像。
本公开的一些实施例还提供对应于上述图像处理方法的图像处理装置、神经网络的训 练方法及存储介质。
本公开至少一实施例提供的图像处理方法结合分支处理和权值共享处理以进行图像增强处理,既可以减少参数数量,又可以便于反向传播时计算梯度,从而,在输出高质量图像的同时还可以提高处理速度和收敛速度。
下面结合附图对本公开的一些实施例及其示例进行详细说明。
图4为本公开至少一实施例提供的一种图像处理方法的流程图。例如,如图4所示,该图像处理方法包括步骤S100至步骤S200。
步骤S100:获取输入图像。
例如,在步骤S100中,输入图像可以包括通过智能手机的摄像头、平板电脑的摄像头、个人计算机的摄像头、数码照相机的镜头、监控摄像头或者网络摄像头等拍摄采集的照片,其可以包括人物图像、动植物图像或风景图像等,本公开的实施例对此不作限制。例如,输入图像为低质量图像,输入图像的质量低于例如采用数码单镜反光相机拍摄的照片的质量。例如,输入图像可以为包括3个通道的RGB图像,本公开的实施例包括但不限于此。
步骤S200:使用生成网络对输入图像进行处理,以生成输出图像。
例如,在步骤S200中,生成网络可以对输入图像进行图像增强处理,从而使输出图像为高质量图像,例如,输出图像的质量接近于例如数码单镜反光相机拍摄的照片。
图5为一种对应于图4中所示的步骤S200的示例性流程图,图6A为本公开至少一实施例提供的一种对应于图4中所示的图像处理方法的生成网络的示意性架构框图,图6B为本公开至少一实施例提供的另一种对应于图4中所示的图像处理方法的生成网络的示意性架构框图。
以下,先结合图6A所示的生成网络对图5所示的步骤S200进行详细说明。
例如,如图5所示,使用生成网络对输入图像进行处理,以生成输出图像,即步骤S200,包括步骤S210、步骤S220和步骤S225。
步骤S210:使用第一子网络对输入图像进行处理,以得到多个第一特征图。
例如,在一些示例中,如图6A所示,生成网络可以包括第一子网络N1。例如,如图6A所示,第一子网络N1可以包括卷积模块CN,即第一子网络N1的处理包括标准卷积处理,从而步骤S210可以包括使用第一子网络N1对输入图像IN进行标准卷积处理,以得到多个第一特征图F1。需要说明的是,在本公开的实施例中,卷积模块CN可以包括卷积层和实例标准化层,从而,标准卷积处理可以包括卷积处理和实例标准化处理,下文与此类似,不再重复赘述。
步骤S220:使用至少一个第二子网络对多个第一特征图进行分支处理和权值共享处理,以得到多个第二特征图。
例如,在一些示例中,如图6A所示,生成网络可以包括至少一个第二子网络N2。例如,如图6A所示,在一些实施例中,生成网络可以包括两个第二子网络N2,即第一个第二子网络N2和第二个第二子网络N2(需要说明的是,在图6A中靠近第一子网络N1的第 二子网络N2为第一个第二子网络),第一个第二子网络N2与第一子网络N1连接,第一个第二子网络N2和第二个第二子网络N2连接。从而,可以使用两个第二子网络N2对多个第一特征图F1进行处理。例如,如图6A所示,可以将多个第一特征图F1作为第一个第二子网络N2的输入,第一个第二子网络N2的输出作为第二个第二子网络N2的输入第二个第二子网络N2的输出为多个第二特征图F2。
需要说明的是,在本公开中,两个子网络“连接”可以表示在信号传输的方向上将两个子网络中的靠前的一个子网络的输出作为两个子网络中的靠后的另一个子网络的输入。例如,“第一个第二子网络N2与第一子网络N1连接”可以表示将第一子网络N1的输出作为第一个第二子网络N2的输入。
例如,在一些示例中,如图6A所示,每个第二子网络N2可以包括第一分支网络N21、第二分支网络N22、第三分支网络N23,从而,每个第二子网络N2的分支处理可以包括:将每个第二子网络的输入(如图6A中每个第二子网络N2中的dc所示)划分为第一分支输入B1、第二分支输入B2和第三分支输入B3;以及,使用第一分支网络N21对第一分支输入B1进行处理,以得到第一分支输出O1,使用所第二分支网络N22对第二分支输入B2进行处理,以得到第二分支输出O2,使用第三分支网络N23对第三分支输入B3进行处理,以得到第三分支输出O3。
例如,在本公开的一些实施例中,互相对应的每个分支网络的输入所包括的特征图的数量可以相同,例如,第一分支输入B1所包括的特征图的数量、第二分支输入B2所包括的特征图的数量和第三分支输入B3所包括的特征图的数量均相同。需要说明的是,本公开的实施例对此不作限制。例如,在本公开的另一些实施例中,互相对应的每个分支网络的输入所包括的特征图的数量可以互不相同或者不完全相同,例如,第二分支输入B2所包括的特征图的数量和第三分支输入B3所包括的特征图的数量相同,但不同于第一分支输入B1所包括的特征图的数量。应当理解的是,互相对应的分支网络是指处于同一层级的分支网络,例如,第一分支网络N21、第二分支网络N22和第三分支网络N23是一组互相对应的分支网络,下文中将要介绍的第四分支网络N31、第五分支网络N32和第六分支网络N33是一组互相对应的分支网络,以及下文中将要介绍的第七分支网络N11、第八分支网络N12和第九分支网络N13也是一组互相对应的分支网络。因此,第四分支网络N31、第五分支网络N32和第六分支网络N33的输入所包括的特征图的数量的要求,以及第七分支网络N11、第八分支网络N12和第九分支网络N13的输入所包括的特征图的数量的要求,均可以参考第一分支网络N21、第二分支网络N22和第三分支网络N23的输入所包括的特征图的数量的要求,下文中不再重复赘述。
例如,每个第二子网络N2的输入可以包括3n个特征图,其中n为正整数,从而,可以将第1~n个特征图划分为第一分支输入B1,将第n+1~2n个特征图划分为第二分支输入B2,将第2n+1~3n个特征图划分为第三分支输入B3;或者,也可以将第1、4、7、…、3n-2个特征图划分为第一分支输入B1,将第2、5、8、…、3n-1个特征图划分为第二分支输入 B2,将第3、6、9、…、3n个特征图划分为第三分支输入B3。需要说明的是,本公开的实施例对具体的划分方式不作限制。
例如,如图6A所示,第一分支网络N21、第二分支网络N22和第三分支网络N23均可以包括卷积模块CN,从而可以分别对第一分支输入B1、第二分支输入B2和第三分支输入B3进行标准卷积处理,以得到对应的第一分支输出O1、第二分支输出O2和第三分支输出O3。例如,在一些示例中,第一分支网络N21、第二分支网络N22和第三分支网络N23的标准卷积处理的次数可以相同;当然,第一分支网络N21、第二分支网络N22和第三分支网络N23的标准卷积处理的参数可以互不相同。需要说明的是,本公开的实施例对此均不作限制。
例如,在一些示例中,如图6A所示,每个第二子网络N2还可以包括第一主干网络N20,从而,每个第二子网络N2的权值共享处理可以包括:将第一分支输出O1、第二分支输出O2和第三分支输出O3进行连接,以得到第一中间输出M1(如图6A中每个第二子网络中的c所示);以及,使用第一主干网络N20对第一中间输出M1进行处理,以得到每个第二子网络的输出。
例如,以第一分支输出O1、第二分支输出O2和第三分支输出O3包括的特征图均为H行W列的矩阵为例,第一分支输出O1包括的特征图的数量为C1,第二分支输出O2包括的特征图的数量为C2,第三分支输出O3包括的特征图的数量为C3,则第一分支输出O1、第二分支输出O2和第三分支输出O3的模型分别为(C1,H,W)、(C2,H,W)和(C3,H,W)。从而,将第一分支输出O1、第二分支输出O2和第三分支输出O3进行连接,得到的第一中间输出M1的模型为(C1+C2+C3,H,W)。第一中间输出M1包括的特征图的数量为C1+C2+C3,本公开对第一中间输出M1的模型中各个特征图的排列顺序不作限制。需要说明的是,本公开包括但不限于此。
例如,如图6A所示,第一主干网络N20可以包括卷积模块CN和下采样层DS,从而可以对第一中间输出M1进行标准卷积处理和下采样处理。需要说明的是,本公开的实施例对第一主干网络N20中的卷积模块CN和下采样层DS的先后顺序(即标准卷积处理和下采样处理的先后顺序)不作限制。
下采样处理用于减小特征图的尺寸,从而减少特征图的数据量,例如可以通过下采样层进行下采样处理,但不限于此。例如,下采样层可以采用最大值合并(max pooling)、平均值合并(average pooling)、跨度卷积(strided convolution)、欠采样(decimation,例如选择固定的像素)、解复用输出(demuxout,将输入图像拆分为多个更小的图像)等下采样方法实现下采样处理。
需要说明的是,当至少一个第二子网络N2的数量大于等于2时,不同的第二子网络N2的第一主干网络N20中的下采样处理的方法和参数等可以相同,也可以不同,本公开的实施例对此不作限制。
需要说明的是,图6A中的第二子网络N2的数量为2是示例性的,本公开的实施例对第二子网络N2的数量不作具体限制,例如第二子网络N2的数量还可以为1、3等。例如,所述至少一个第二子网络可以包括第一个第二子网络,第一个第二子网络与第一子网络N1连接,多个第一特征图F1作为第一个第二子网络的输入;又例如,所述至少一个第二子网络可以包括除第一个第二子网络之外的其他第二子网络,其他第二子网络中的每一个第二子网络均以与其连接的前一个第二子网络的输出作为该第二子网络的输入,最后一个第二子网络的输出即为多个第二特征图F2。
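The following PyTorch-style sketch shows one possible reading of a second sub-network as described above: the input feature maps are divided into three branch inputs, each branch applies a standard convolution, the branch outputs are concatenated into the first intermediate output, and a shared first trunk network applies a standard convolution followed by downsampling. The channel counts, kernel sizes, activation and the use of average pooling are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class SecondSubNetwork(nn.Module):
    def __init__(self, in_channels=48, branch_channels=16, out_channels=96):
        super().__init__()
        assert in_channels % 3 == 0
        c = in_channels // 3
        # Three branch networks, each a "standard convolution" (conv + instance norm here).
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c, branch_channels, 3, padding=1),
                          nn.InstanceNorm2d(branch_channels), nn.ReLU())
            for _ in range(3)])
        # First trunk network: standard convolution followed by downsampling (shared weights).
        self.trunk = nn.Sequential(
            nn.Conv2d(3 * branch_channels, out_channels, 3, padding=1),
            nn.InstanceNorm2d(out_channels), nn.ReLU(),
            nn.AvgPool2d(2))                      # one possible downsampling choice

    def forward(self, x):
        b1, b2, b3 = torch.chunk(x, 3, dim=1)     # divide the input into three branch inputs
        o = torch.cat([br(b) for br, b in zip(self.branches, (b1, b2, b3))], dim=1)
        return self.trunk(o)                      # first intermediate output -> sub-network output
```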
步骤S225:对多个第二特征图进行处理,以得到输出图像。
例如,具体地,如图5所示,步骤S225可以包括步骤S230至步骤S250。
步骤S230:对多个第二特征图进行处理,以得到多个第三特征图。
例如,在一些示例中,如图6A所示,生成网络还可以包括密集子网络DenseNet。例如,如图6A所示,在步骤S250中,可以使用密集子网络DenseNet对多个第二特征图F2进行处理,以得到多个第三特征图F3。
图7为本公开至少一实施例提供的一种密集子网络的结构示意图。例如,如图7所示,该密集子网络DenseNet包括多个密集模块DenseBlock,例如密集模块DenseBlock的数量为N,其中N≥2。需要说明的是,图7示出的密集子网络DenseNet中密集模块DenseBlock的数量为N=4是示例性的,不应视作对本公开的限制。
例如,在一些示例中,结合图6A和图7所示,多个第二特征图F2可以作为N个密集模块DenseBlock中的第1个密集模块(例如,第1个密集模块与至少一个第二子网络N2中的最后一个第二子网络N2连接)的输入,该多个第二特征图F2还与N个密集模块DenseBlock中的第i个密集模块之前的i-1个密集模块的输出连接,作为第i个密集模块的输入,该多个第二特征图还和每个密集模块的输出进行连接,作为多个第三特征图F3。i为整数,i≥2且i≤N。需要说明的是,本公开包括但不限于此,例如,在另一些示例中,还可以仅将每个密集模块的输出进行连接,作为多个第三特征图F3。例如,此处的连接操作可以参考前述第二子网络中的连接操作,在此不再赘述。例如,在一些示例中,如图7所示,每个密集模块DenseBlock可以包括瓶颈(Bottleneck)层B和卷积层Cv,从而,每个密集模块DenseBlock的处理包括降维处理和卷积处理。例如,瓶颈层B可以采用1×1卷积核对数据进行降维,减少特征图的数量,从而减少后续卷积处理中的参数数量,降低计算量,从而提高处理速度。
例如,密集子网络DenseNet具有大幅度减少参数、降低计算量、能够有效地解决梯度消失问题、支持特征重用和强化特征传播以及具有非常好的抗过拟合性能等优点。
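A minimal PyTorch-style sketch of a dense sub-network in the spirit described above: each dense module applies a 1×1 bottleneck convolution for dimensionality reduction followed by a convolution, and the input feature maps are concatenated with the outputs of all preceding dense modules. The growth rate, bottleneck width and module count are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class DenseModule(nn.Module):
    def __init__(self, in_channels, growth=16, bottleneck=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, bottleneck, 1),        # bottleneck layer: dimensionality reduction
            nn.ReLU(),
            nn.Conv2d(bottleneck, growth, 3, padding=1),  # convolution layer
            nn.ReLU())

    def forward(self, x):
        return self.block(x)

class DenseSubNetwork(nn.Module):
    def __init__(self, in_channels, num_modules=4, growth=16):
        super().__init__()
        self.dense_modules = nn.ModuleList(
            [DenseModule(in_channels + i * growth, growth) for i in range(num_modules)])

    def forward(self, x):
        features = [x]                               # the second feature maps
        for m in self.dense_modules:
            out = m(torch.cat(features, dim=1))      # input: second feature maps + previous outputs
            features.append(out)
        return torch.cat(features, dim=1)            # third feature maps: input + all module outputs
```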
步骤S240:使用第三子网络对多个第三特征图进行处理,以得到多个第四特征图。
例如,在一些示例中,如图6A所示,生成网络还可以包括第三子网络N3。例如,如图6A所示,第三子网络N3可以包括第二主干网络N30、第四分支网络N31、第五分支网络N32和第六分支网络N33,从而,第三子网络N3的处理可以包括:使用第二主干网络 N30对多个第三个特征图F3进行处理,以得到多个第五特征图F5;将多个第五特征图F5划分为第四分支输入B4、第五分支输入B5和第六分支输入B6;以及,使用第四分支网络N31对第四分支输入B4进行处理,以得到第四分支网络N31对应的第四特征图F4,使用第五分支网络N32对第五分支输入B5进行处理,以得到第五分支网络N32对应的第四特征图F4,使用第六分支网络N33对第六分支输入B6进行处理,以得到第六分支网络N33对应的第四特征图F4。
例如,步骤S240得到的多个第四特征图F4包括第四分支网络N31对应的第四特征图、第五分支网络N32对应的第四特征图和第六分支网络N33对应的第四特征图。
例如,在一些示例中,如图6A所示,第二主干网络N30可以包括上采样层US,从而可以对输入的多个第三特征图F3进行上采样处理,以得到多个第五特征图F5。
上采样处理用于增大特征图的尺寸,从而增加特征图的数据量,例如可以通过上采样层进行上采样处理,但不限于此。例如,上采样层可以采用跨度转置卷积(strided transposed convolution)、插值算法等上采样方法实现上采样处理。插值算法例如可以包括内插值、双线性插值、两次立方插值(Bicubic Interprolation)等算法。
需要说明的是,插值算法不仅可以用于进行上采样处理,也可以用于下采样处理。例如,在利用插值算法进行上采样处理时,可以保留原始像素值和插入值,从而增大特征图的尺寸;例如,在利用插值算法进行下采样处理时,可以仅保留插入值(去除原始像素值),从而减小特征图的尺寸。
图8A为本公开至少一实施例提供的一种上采样层的示意图,图8B为本公开至少一实施例提供的另一种上采样层的示意图。
例如,在一些示例中,如图8A所示,上采样层采用像素插值法实现上采样。此时,该上采样层还可以称为复合层。复合层采用2×2的上采样因子,从而可以将4个输入特征图像(即,图8A中的INPUT 4n,INPUT 4n+1,INPUT 4n+2,INPUT 4n+3)结合以得到1个具有固定像素顺序的输出特征图像(即,图8A中的OUTPUT n)。
例如,在一些示例中,对于二维的特征图像,上采样层获取输入的第一数量的输入特征图像,将这些输入特征图像的像素值交织(interleave)重排以产生相同的第一数量的输出特征图像。相比于输入特征图像,输出特征图像的数量没有改变,但是每个输出特征图像的大小增加相应倍数。由此,该复合层通过不同的排列组合增加更多的数据信息,这些组合可给出所有可能的上采样组合。最后,可通过激活层从上采样组合进行选择。
例如,在图8B所示的示例中,上采样层采用像素值交织重排方法实现上采样。此时,该上采样层也可以称为复合层。复合层同样采用2×2的上采样因子,即以每4个输入特征图像(即,图8B中的INPUT 4n,INPUT 4n+1,INPUT 4n+2,INPUT 4n+3)为一组,将它们的像素值交织生成4个输出特征图像(即,图8B中的OUTPUT 4n,OUTPUT 4n+1,OUTPUT 4n+2,OUTPUT 4n+3)。输入特征图像的数量和经过复合层处理后得到的输出特 征图像的数量相同,而各输出特征图像的大小增加为输入特征图像的4倍,即具有输入特征图像的4倍的像素数量。
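For the Fig.-8A-style composite layer (four input feature maps combined into one larger output with a fixed pixel order), PyTorch's built-in pixel-shuffle operation realizes the same idea of trading channels for spatial resolution. The sketch below is only an illustration of that idea, not necessarily the exact layer used in the patent.

```python
import torch
import torch.nn as nn

up = nn.PixelShuffle(2)              # 2x2 upsampling factor
x = torch.randn(1, 4, 8, 8)          # 4 input feature maps of size 8x8
y = up(x)                            # 1 output feature map of size 16x16
print(y.shape)                       # torch.Size([1, 1, 16, 16])
```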
例如,在一些示例中,如图6A所示,第四分支网络N31、第五分支网络N32和第六分支网络N33均可以包括卷积模块CN,从而可以分别对第四分支输入B4、第五分支输入B5和第六分支输入B6进行标准卷积处理。例如,在一些示例中,第四分支网络N31、第五分支网络N32和第六分支网络N33的标准卷积处理的次数可以相同;当然,第四分支网络N31、第五分支网络N32和第六分支网络N33的标准卷积处理的参数可以互不相同。需要说明的是,本公开的实施例对此均不作限制。
例如,在一些示例中,如图6A所示,第四分支网络N31、第五分支网络N32和第六分支网络N33均可以包括上采样层US,从而第四分支网络N31、第五分支网络N32和第六分支网络N33的处理还均可以包括上采样处理。例如,在一些示例中,第四分支网络N31、第五分支网络N32和第六分支网络N33的上采样处理的次数可以相同;当然,第四分支网络N31、第五分支网络N32和第六分支网络N33的上采样处理的参数可以互不相同。需要说明的是,本公开的实施例对此均不作限制。
需要说明的是,第二主干网络N30中的上采样处理的方法与第四分支网络N31、第五分支网络N32和第六分支网络N33中的上采样处理的方法可以相同,也可以不同,本公开的实施例对此不作限制。
例如,在一些示例中,第四分支网络N31对应的第四特征图F4的数量为1,第五分支网络N32对应的第四特征图F4的数量为1,第六分支网络N33对应的第四特征图F4的数量为1,即多个特征图F4包括3幅特征图。
步骤S250:对多个第四特征图进行合成处理,以得到输出图像。
例如,在一些示例中,如图6A所示,生成网络还可以包括合成模块Merg。例如,如图6A所示,在步骤S250中,可以使用合成模块Merg对多个第四特征图F4进行处理,以得到输出图像OUT。
例如,在一些示例中,合成模块Merg可以包括第一转换矩阵,该第一转换模块用于将多个第四特征图F4转换为输出图像OUT。例如,在一些示例中,具体地,使用合成模块Merg对多个第四特征图F4进行处理,以得到输出图像OUT,可以包括:利用第一转换矩阵,将第四分支网络N31对应的第四特征图F4的数据信息、第五分支网络N32对应的第四特征图F4的数据信息和第六分支网络N33对应的第四特征图F4的数据信息转换为输出图像OUT的第一颜色通道的数据信息、第二颜色通道的数据信息和第三颜色通道的数据信息,以得到输出图像OUT。
例如,在一些示例中,第一颜色通道、第二颜色通道和第三颜色通道可以分别为红色(R)、绿色(G)、蓝色(B)通道,从而输出图像OUT为RGB格式的图像。需要说明的是,本公开的实施例包括但不限于此。
例如,在一些示例中,第一转换矩阵可以用于将YUV格式的图像转换为RGB格式的 图像,例如,第一转换矩阵的转换公式可以表示如下:
Figure PCTCN2020092917-appb-000028
其中,Y、U、V分别表示YUV格式图像的亮度信息(即第一亮度信道的数据信息)、第一色度信息(即第一色差信道的数据信息)和第二色度信息(即第二色差信道的数据信息),R、G和B分别表示转换得到的RGB格式图像的红色信息(即第一颜色通道的数据信息)、绿色信息(即第二颜色通道的数据信息)和蓝色信息(即第三颜色通道的数据信息)。
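For illustration only, the sketch below applies a standard BT.601-style YUV-to-RGB conversion. The exact coefficients of the patent's first conversion matrix are given as an image in the original text, so the standard values assumed here may differ from the patented matrix.

```python
import numpy as np

# Assumed standard conversion coefficients (BT.601, analog YUV); the patent's
# first conversion matrix may use different values.
YUV2RGB = np.array([[1.0,  0.0,      1.13983],
                    [1.0, -0.39465, -0.58060],
                    [1.0,  2.03211,  0.0]])

def yuv_to_rgb(yuv):
    # yuv has shape (3, H, W): Y (luminance), U and V (chrominance) channels.
    return np.tensordot(YUV2RGB, yuv, axes=1)   # resulting shape (3, H, W): R, G, B
```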
需要说明的是,在使用例如图6A所示的生成网络执行本公开的实施例提供的图像处理方法之前,可以先对该生成网络进行训练。例如,在训练过程中,第一转换矩阵的参数固定不变。例如,图6A所示的生成网络经过训练后,其第四分支网络N31输出的第四特征图F4的数据信息、第五分支网络N32输出的第四特征图F4的数据信息和第六分支网络N33输出的第四特征图F4的数据信息分别对应第一亮度信道的数据信息、第一色差信道的数据信息和第二色差信道的数据信息,从而通过第一转换矩阵进行转换后可以得到RGB格式的输出图像OUT。
例如,输出图像OUT保留了输入图像IN的内容,但是,输出图像OUT为高质量图像,例如,输出图像OUT的质量可以接近于例如数码单镜反光相机拍摄的照片的质量。
例如,在一些示例中,第四分支网络N31输出的第四特征图F4、第五分支网络N32输出的第四特征图F4和第六分支网络N33输出的第四特征图F4的数量均为1,即多个特征图F4包括3幅特征图(分别对应第一亮度信道、第一色差信道、第二色差信道),第一转换矩阵可以将该3幅特征图转换为RGB格式的输出图像。
例如,由于第四分支网络N31、第五分支网络N32和第六分支网络N33均包括标准卷积处理(标准卷积处理包括卷积处理和实例标准化处理),多个第四特征图F4的像素的灰度值的范围可以为例如[-1,1],因此,合成模块Merg的处理还可以包括:将输出图像OUT的像素的灰度值转换到例如[0,255]范围内。
对于YUV格式,Y代表亮度,U、V代表色度,U和V是构成彩色的两个分量,在YUV颜色空间中,第一亮度信道(即Y通道)和第一色差信道(即U通道)、第二色差信道(即V通道)是分离的。例如,YUV格式可以包括YUV444、YUV420以及YUV422等格式。YUV444、YUV420以及YUV422等格式的主要区别在于U通道和V通道的数据的采样方式和存储方式。
例如,YUV444格式表示每一行像素中,两种色度信息(即第一色度信息U和第二色度信息V)都是完整的,即两种色度信息均基于完全抽样进行存储。
假设,若一幅图像中的4个像素点分别表示为:
[Y0U0V0][Y1U1V1][Y2U2V2][Y3U3V3]
在图像处理过程中,存放或处理该4个像素点的数据流为:
Y0U0V0Y1U1V1Y2U2V2 Y3U3V3
映射出的像素点分别表示为:
[Y0U0V0][Y1U1V1][Y2U2V2][Y3U3V3]
即映射出的像素点为原始的像素点。
例如,YUV420格式表示每一行像素中,只有一种色度信息(第一色度信息U或第二色度信息V),且第一色度信息U或第二色度信息V以1/2的频率抽样存储。在图像处理过程中,相邻的行处理不同的色度信息。
假设一幅图像中的两行8个像素点分别表示为:
[Y0U0V0][Y1U1V1][Y2U2V2][Y3U3V3]
[Y4U4V4][Y5U5V5][Y6U6V6][Y7U7V7]
在图像处理过程中,存放或处理该8个像素点的数据流为:
Y0U0 Y1 Y2U2 Y3
Y4V4 Y5 Y6V6 Y7
在第一行像素中,只有第一色度信息U;在第二行像素中,只有第二色度信息V。
映射出的像素点表示为:
[Y0 U0 V4][Y1 U0 V4][Y2 U2 V6][Y3 U2 V6]
[Y4 U0 V4][Y5 U0 V4][Y6U2 V7][Y7 U2 V6]
综上,每一行中相邻的4个像素点在存放或处理时仅占用6个字节,相比YUV444(4个像素点需要12个字节)的采样格式,YUV420格式减小了处理和存储的像素点的数据量。尽管映射出的像素点与原始像素点略有不同,但这些不同在人眼的感觉中不会引起明显的变化。
例如,在一些示例中,图6A所示的生成网络中,多个特征图F4可以具有YUV444的图像格式。需要说明的是,本公开的实施例包括但不限于此。
图6B所示的生成网络与图6A所示的生成网络的不同之处主要在于第一子网络N1和第三子网络N3。需要说明的是,图6B所示的生成网络的其他构造与图6A所示的生成网络基本相同,在此重复之处不再赘述。
以下结合图6B所示的生成网络与图6A所示的生成网络的不同之处,对这些不同之处对应的图5所示的流程中的步骤进行详细说明。
例如,在一些示例中,输入图像具有第一颜色通道、第二颜色通道和第三颜色通道。例如第一颜色通道、第二颜色通道和第三颜色通道可以分别为红色(R)、绿色(G)、蓝色(B)通道,本公开的实施例包括但不限于此。例如,相应地,在图6B所示的生成网络中,第一子网络N1可以包括转换模块Tran、第七分支网络N11、第八分支网络N12、第九分支网络N13和第三主干网络N10,从而,步骤S210可以包括以下步骤S211至步骤S214。
步骤S211:使用转换模块Tran将输入图像IN的第一颜色通道、第二颜色通道和第三 颜色通道的数据信息转换为中间输入图像MIN的第一亮度信道、第一色差信道和第二色差信道的数据信息。
例如,在一些示例中,转换模块Tran可以包括第二转换矩阵,该第二转换矩阵用于将输入图像IN转换为中间输入图像MIN。例如,在一些示例中,第二转换矩阵可以用于将RGB格式的图像转换为YUV格式的图像,例如,第二转换矩阵的转换公式可以表示如下:
Figure PCTCN2020092917-appb-000029
其中,R、G和B分别表示RGB格式图像的红色信息(即第一颜色通道的数据信息)、绿色信息(即第二颜色通道的数据信息)和蓝色信息(即第三颜色通道的数据信息),Y、U、V分别表示转换得到的YUV格式图像的亮度信息(即第一亮度信道的数据信息)、第一色度信息(即第一色差信道的数据信息)和第二色度信息(即第二色差信道的数据信息)。
例如,在一些示例中,输入图像IN具有RGB格式,中间输入图像MIN具有例如YUV420格式,从而减小U通道和V通道的尺寸,进而减小生成网络中卷积核的数量。需要说明的是,本实施例包括但不限于此。
步骤S212:使用第七分支网络对中间输入图像的第一亮度信道的数据信息进行处理,以得到第七分支输出,使用第八分支网络对中间输入图像的第一色差信道的数据信息进行处理,以得到第八分支输出,使用第九分支网络对中间输入图像的第二色差信道的数据信息进行处理,以得到第九分支输出。
例如,在一些示例中,如图6B所示,将中间输入图像MIN的第一亮度信道、第一色差信道和第二色差信道的数据信息分别作为第七分支输入B7、第八分支输入B8和第九分支输入B9,并分别经过第七分支网络N11、第八分支网络N12和第九分支网路N13处理,以对应得到第七分支输出O7、第八分支输出O8和第九分支输出O9。
例如,如图6B所示,第七分支网络N11可以包括卷积模块CN和下采样层DS,从而可以对第七分支输入B7进行标准卷积处理和下采样处理;第八分支网络N12和第九分支网络N13均可以包括标准下采样层SDS,从而可以分别对第八分支输入B8和第九分支输入B9进行标准下采样处理。
例如,标准下采样层可以采用内插值、双线性插值、两次立方插值(Bicubic Interprolation)等插值算法进行标准下采样处理。例如,在利用插值算法进行标准下采样处理时,可以仅保留插入值(去除原始像素值),从而减小特征图的尺寸。
例如,在一些示例中,第八分支网络N12和第九分支网络N13中的标准下采样处理的方法可以相同,而其参数可以不同。需要说明的是,本公开的实施例包括但不限于此。
例如,与第七分支网络N11相比,第八分支网络N12中相当于省略了处理U通道最高分辨率的卷积模块,第九分支网络N13中相当于省略了处理V通道最高分辨率的卷积模块,从而可以提高处理速度。
步骤S213:将第七分支输出、第八分支输出和第九分支输出进行连接,以得到第二中间输出。
例如,在一些示例中,如图6B所示,可以参考前述第二子网络中的连接方式,将第七分支输出O7、第八分支输出O8和第九分支输出O9进行连接,以得到第二中间输出M2,具体细节在此不再赘述。
步骤S214:使用第三主干网络对第二中间输出进行处理,以得到多个第一特征图。
例如,在一些示例中,如图6B所示,可以使用第三主干网络N10对第二中间输出M2进行处理,以得到多个第一特征图F1。例如,如图6B所示,第三主干网络N10可以包括卷积模块CN,从而可以对输入的第二中间输出M2进行标准卷积处理,以得到多个第一特征图F1。
例如,在图6B所示的生成网络中,可以使用至少一个第二子网络N2执行步骤S220,即对多个第一特征图F1进行分支处理和权值共享处理,以得到多个第二特征图F2,例如,具体细节可以参考前述基于图6A所示的生成网络执行步骤S220的相应描述,在此不再赘述。需要说明的是,图6B所示的生成网络中,第二子网络N2的数量为1是示例性的,不应视作对本公开的限制。
例如,在图6B所示的生成网络中,可以使用密集子网络DenseNet执行步骤S230,即对多个第二特征图F2进行处理,以得到多个第三特征图F3,例如,具体细节可以参考前述基于图6A所示的生成网络执行步骤S230的相应描述,在此不再赘述。
例如,在图6B所示的生成网络中,可以使用第三子网络N3执行步骤S240,即使用第三子网络N3对多个第三特征图F3进行处理,以得到多个第四特征图F4。例如,与图6A所示的生成网络类似,在图6B所示的生成网络中,第三子网络N3也可以包括第二主干网络N30、第四分支网络N31、第五分支网络N32和第六分支网络N33,从而,该第三子网络N3的处理也可以包括:使用第二主干网络N30对多个第三个特征图F3进行处理,以得到多个第五特征图F5;将多个第五特征图F5划分为第四分支输入B4、第五分支输入B5和第六分支输入B6;以及,使用第四分支网络N31对第四分支输入B4进行处理,以得到第四分支网络N31对应的第四特征图F4,使用第五分支网络N32对第五分支输入B5进行处理,以得到第五分支网络N32对应的第四特征图F4,使用第六分支网络N33对第六分支输入B6进行处理,以得到第六分支网络N33对应的第四特征图F4。
例如,与图6A所示的生成网络类似,在图6B所示的生成网络中,第二主干网络N30也可以包括上采样层US,从而可以对输入的多个第三特征图F3进行上采样处理,以得到多个第五特征图F5。
例如,与图6A所示的生成网络类似,在图6B所示的生成网络中,第四分支网络N31也可以包括卷积模块和上采样层,以用于进行标准卷积处理和上采样处理。例如,与图6B所示的生成网络不同的是,在图6B所示的生成网络中,第五分支网络N32和第六分支网络N33均可以包括卷积模块CN和标准上采样层SUS,以用于进行标准卷积处理和标准上采样 处理。
例如,标准上采样层可以采用内插值、双线性插值、两次立方插值(Bicubic Interprolation)等插值算法进行标准上采样处理。例如,在利用插值算法进行标准上采样处理时,可以保留原始像素值和插入值,从而增大特征图的尺寸。
例如,与第四分支网络N31相比,第五分支网络N32中相当于省略了处理U通道最高分辨率的卷积模块,第六分支网络N33中相当于省略了处理V通道最高分辨率的卷积模块,从而可以提高处理速度。这与前述的第七分支网络N11、第八分支网络N12、第九分支网络N13的情况类似。需要说明的是,第五分支网络中N32中的标准上采样层SUS一般与第八分支网络N12中的标准下采样层SDS对应出现,第六分支网络中N33中的标准上采样层SUS一般与第九分支网络N13中的标准下采样层SDS对应出现。需要说明的是,本公开的实施例包括但不限于此。
例如,在图6B所示的生成网络中,可以使用合成模块Merg执行步骤S250,即使用合成模块Merg对多个第四特征图F4进行处理,以得到输出图像OUT,例如,具体细节可以参考前述基于图6A所示的生成网络执行步骤S250的相应描述,在此不再赘述。
需要说明的是,在本公开的实施例中,图6A所示的生成网络和图6B所示的生成网络都只是示例性的,而非限制性的。还需要说明的是,在对生成网络进行训练之前,生成网络可能完全不具有图像增强处理的功能,或者也可能具有图像增强处理的功能,但是图像增强处理的效果不好;对待训练的生成网络训练后得到的生成网络具有图像增强处理的功能,且能够生成高质量图像。
图9A为一种输入图像的示意图,图9B为根据图6A所示的生成网络对图9A所示的输入图像进行处理得到的输出图像的示意图,图9C为根据图6B所示的生成网络对图9A所示的输入图像进行处理得到的输出图像的示意图。例如,与图9A所示的输入图像相比,图9B和图9C所示的输出图像均保留了输入图像的内容,但是提高了图像的对比度,改善了输入图像过暗的问题,从而,与输入图像相比,输出图像的质量可以接近于例如数码单镜反光相机拍摄的照片的质量,即输出图像为高质量图像。由此,本公开的实施例提供的图像处理方法实现了图像增强处理的效果。
本公开的实施例提供的图像处理方法,可以结合分支处理和权值共享处理以进行图像增强处理,既可以减少参数数量,又可以便于反向传播时计算梯度,从而,在输出高质量图像的同时还可以提高处理速度和收敛速度。
本公开至少一实施例还提供一种神经网络的训练方法。图10为本公开至少一实施例提供的一种神经网络的训练方法的流程图。例如,如图10所示,该训练方法包括:
步骤S300:基于待训练的生成网络,对判别网络进行训练;
步骤S400:基于判别网络,对待训练的生成网络进行训练;以及,
交替地执行上述训练过程,以得到本公开上述任一实施例提供的图像处理方法中的生成网络。
例如,在一些示例中,该待训练的生成网络的构造可以与图6A所示的生成网络或图6B所示的生成网络相同,本公开的实施例包括但不限于此。例如,待训练的生成网络经过该训练方法的训练后可以执行本公开上述任一实施例提供的图像处理方法,即利用该训练方法得到的生成网络可以执行本公开上述任一实施例提供的图像处理方法。
图11A为本公开至少一实施例提供的一种对应于图10中所示的训练方法训练待训练的生成网络的示意性架构框图,图11B为本公开至少一实施例提供的一种训练待训练的生成网络的过程的示意性流程图。
例如,结合图11A和图11B所示,基于判别网络,对待训练的生成网络进行训练,即步骤S300,包括步骤S310至步骤S330。
步骤S310:使用待训练的生成网络对第一训练输入图像进行处理,以生成第一训练输出图像。
例如,与前述步骤S100中的输入图像类似,第一训练输入图像也可以包括通过智能手机的摄像头、平板电脑的摄像头、个人计算机的摄像头、数码照相机的镜头、监控摄像头或者网络摄像头等拍摄采集的照片,其可以包括人物图像、动植物图像或风景图像等,本公开的实施例对此不作限制。例如,第一训练输入图像为低质量图像,例如,第一训练输入图像的质量低于例如数码单镜反光相机拍摄的照片的质量。例如,在一些示例中,第一训练输入图像可以为RGB格式的图像,本公开的实施例包括但不限于此。
例如,待训练的生成网络G可以具有图6A所示的生成网络或图6B所示的生成网络的构架。例如,待训练的生成网络G的初始参数可以为随机数,例如随机数符合高斯分布。需要说明的是,本公开的实施例对此不作限制。
例如,步骤S310的具体过程可以参考前述步骤S200的相关描述,即第一训练输入图像对应于输入图像,第一训练输出图像对应于输出图像,根据第一训练输入图像生成第一训练输出图像的过程可以参考前述根据输入图像生成输出图像的过程,在此不再赘述。
步骤S320:基于第一训练输出图像,通过系统损失函数计算待训练的生成网络的系统损失值。
例如,在一些示例中,如图11A所示,系统损失函数可以包括生成网络对抗损失函数,相应地,系统损失值可以包括生成网络对抗损失值。例如,如图11A所示,在待训练的生成网络G的训练过程中,可以使用判别网络D对第一训练输出图像进行处理,根据判别网络D的输出,通过生成网络对抗损失函数计算生成网络对抗损失值。
图12为本公开至少一实施例提供的一种判别网络的结构示意图。例如,如图12所示,该判别网络D包括多个卷积模块CM、多个下采样层DS和全连接层FCN。例如,判别网络D中的卷积模块CM、下采样层DS和全连接层FCN的结构和作用可以分别参考前述与卷积模块、下采样层、全连接层相关的描述,本公开的实施例对此不作限制。
例如,如图12所示,在该判别网络D中,多个卷积模块CM依次连接,在一些相邻的卷积模块CM之间具有下采样层DS,例如,如图12所示,判别网络D包括依次连接的六 个卷积模块CM,在第二个卷积模块和第三卷积模块之间具有一个下采样层,在第四个卷积模块和第五卷积模块之间具有一个下采样层。全连接层FCN与最后一个卷积模块CM连接。例如,每个卷积模块CM可以包括卷积层和实例标准化层;例如,至少部分卷积模块CM还可以省略实例标准化层。
例如,如图12所示,该判别网络D还包括激活层,该激活层连接到全连接层FCN。例如,如图12所示,该激活层的激活函数可以采用Sigmoid函数,从而,该激活层的输出(即判别网络D的输出)为一个在[0,1]的取值范围内的数值。例如,判别网络D可以判断第一训练输出图像的质量是否接近于高质量图像(例如,数码单镜反光相机拍摄的照片),以第一训练输出图像作为判别网络D的输入为例,判别网络D对第一训练输出图像进行处理,以得到判别网络D输出,判别网络D输出的数值表示第一训练输出图像的质量与例如数码单镜反光相机拍摄的照片的质量的接近程度。例如,该判别网络D输出的数值越大,例如趋近于1,表示判别网络D认定第一训练输出图像的质量越接近于数码单镜反光相机拍摄的照片的质量,即第一训练输出图像的质量越高;例如,该判别网络D输出的数值越小,例如趋近于0,则表示判别网络D认定第一训练输出图像的质量越不接近于数码单镜反光相机拍摄的照片的质量,即第一训练输出图像的质量越低。
需要说明的是,图12所示的判别网络是示意性的。例如,在一些示例中,图12所示的判别网络可以包括更多或更少的卷积模块或下采样层。例如,在一些示例中,图12所示的判别网络还可以包括其他模块或层结构,例如在全连接层之前还具有一个平坦化模块。例如,在一些示例中,图12所示的判别网络中的部分模块或层结构可以替换为其他模块或层结构,例如将全连接层替换为进行平均操作(AVG)的卷积层(参考图3及前述相关描述),又例如将激活层替换为二分类的softmax模块。进一步地,本公开的实施例对判别网络的结构不作限制,即包括但不限于图12所示的判别网络结构。
例如,在一些示例中,生成网络对抗损失函数可以表示为:
L_G = E_{z1~P_z1(z1)}[log(1 − D(G(z1)))]
其中，L_G表示生成网络对抗损失函数，z1表示第一训练输入图像，P_z1(z1)表示第一训练输入图像的集合（例如，包括一个批次的多幅第一训练输入图像），G(z1)表示第一训练输出图像，D(G(z1))表示判别网络D针对第一训练输出图像的输出，即判别网络D对第一训练输出图像进行处理得到的输出，E_{z1~P_z1(z1)}表示针对第一训练输入图像的集合求平均以得到生成网络对抗损失值。由此，可以相应采用批量梯度下降算法对待训练的生成网络G进行参数优化。
需要说明的是,上述公式表示的生成网络对抗损失函数是示例性的,本公开的实施例包括但不限于此。
待训练的生成网络G的训练目标是最小化系统损失值,因此,在待训练的生成网络G的训练过程中,最小化系统损失值包括减小生成网络对抗损失值。例如,在待训练的生成网络G的训练过程中,第一训练输出图像的标签设置为1,即需要使判别网络D鉴别认定 第一训练输出图像的质量与例如数码单镜反光相机拍摄的照片的质量一致。例如,在待训练的生成网络G的训练过程中,待训练的生成网络G的参数被不断地修正,以使经过参数修正后的待训练的生成网络G生成的第一训练输出图像对应的判别网络D的输出不断趋近于1,从而不断地减小生成网络对抗损失值。
例如,在一些示例中,如图11A所示,系统损失函数还可以包括内容损失函数,相应地,系统损失值可以包括内容损失值。例如,如图11A所示,在待训练的生成网络G的训练过程中,可以使用分析网络A对第一训练输出图像进行处理,根据分析网络A的输出,通过内容损失函数计算内容损失值。
图13为本公开至少一实施例提供的一种分析网络的结构示意图。例如,如图13所示,该分析网络A包括依次连接的多个卷积模块CM和间插于相邻卷积模块CM的多个下采样层DS。例如,每个卷积模块CM包括卷积层,每个卷积层包括多个卷积核,该卷积核可以用于提取分析网络A的输入图像的内容特征和风格特征。例如,参考图11A,图13所示的分析网络A的输入可以包括第一训练输入图像和第一训练输出图像。例如,每个卷积模块CM可以包括卷积层和实例标准化层;例如,至少部分卷积模块CM还可以省略实例标准化层。
例如,分析网络A可以采用能够对图像进行分类的深度神经网络如图13所示,输入经过若干个卷积模块CM和下采样层DS处理,以提取特征。每个卷积模块CM的输出都是其输入的特征图像。下采样层DS可以降低特征图像的尺寸并传递给下一层级的卷积模块。多个卷积模块CM可以输出多个特征图像,该多个特征图像可以表征输入的不同级别的特征(例如,纹理、边缘、物体等)。经过若干个卷积模块CM和下采样层DS处理之后,特征图像被输入至平坦化层,平坦化层将特征图像转换成向量然后传递给全连接层以及分类器。分类器层可以包括softmax分类器,softmax分类器可以输出输入属于每一个类别标识的概率,其中概率最大的标识将作为分析网络A最终的输出。由此,分析网络A实现图像分类。
例如,分析网络A可以采用已经训练好的卷积神经网络模型。从而,在待训练的生成网络G的训练过程中,不需对分析网络A的参数(例如,卷积核等)进行修正。例如,分析网络A可以采用AlexNet、GoogleNet、VGG、Deep Residual Learning等神经网络模型实现提取输入的内容特征和风格特征。VGG网络为深度卷积神经网络的一种,其是由牛津大学视觉几何组(Visual Geometry Group)开发,已经在视觉识别领域得到广泛应用。例如,VGG网络可以包括19层,并且可以对其中的一些层进行标准化处理。
需要说明的是,在一些示例中,在待训练的生成网络G的训练过程中,仅需要用到上述分析网络A中用于提取其输入的特征的部分,例如,如图13中虚线框所示的多个卷积模块CM和多个下采样层DS。例如,在本公开的实施例提供的分析网络A中,如图13所示,至少两个卷积模块CM用于提取风格特征,至少一个卷积模块CM用于提取内容特征。需要说明的是,图13所示的分析网络是示意性的。本公开的实施例对分析网络的结构、提取风格特征和内容特征的具体细节(例如,用于提取风格特征和内容特征的第一卷积模块的 数量和层级等)等均不作限制。还需要说明的是,在一些示例中,在待训练的生成网络G的训练过程中,仅需要用到上述分析网络A中用于提取其输入的内容特征的部分。
例如,如图11A所示,在待训练的生成网络G的训练过程中,分析网络A用于接收第一训练输入图像和第一训练输出图像,且分别产生并输出第一训练输入图像的第一内容特征图以及第一训练输出图像的第二内容特征图。例如,内容特征表示图像中物体在整幅图像的分布,例如,内容特征包括图像的内容信息。
例如,在待训练的生成网络G的训练过程中,可以使用分析网络A提取第一训练输入图像的第一内容特征图和第一训练输出图像的第二内容特征图,并根据第一内容特征图和第二内容特征图,通过内容损失函数计算待训练的生成网络G的内容损失值。例如,在一些示例中,对于如图13所示的分析网络A,单层内容损失函数可以表示为:
C_m = (1/(2·S_1))·Σ_ij (F^m_ij − P^m_ij)^2
其中，S_1为常数，F^m_ij表示在分析网络A中第m个卷积模块中第i个卷积核提取的第一训练输入图像的第一内容特征图中第j个位置的值，P^m_ij表示在分析网络A中第m个卷积模块中第i个卷积核提取的第一训练输出图像的第二内容特征图中第j个位置的值。
例如，在如图13所示的分析网络A中，可以通过至少一个卷积模块CM提取输入图像（例如，此处的输入图像包括第一训练输入图像和第一训练输出图像）的内容特征，则内容损失函数可以表示为：
L_content = Σ_m w_1m·C_m
其中，L_content表示内容损失函数，C_m表示用于提取内容特征的至少一个卷积模块中的第m个卷积模块的单层内容损失函数，w_1m表示C_m的权重。
例如,在待训练的生成网络G的训练过程中,最小化系统损失值包括减小内容损失值。例如,在使用待训练的生成网络G进行图像增强处理时,需要使保持待训练的生成网络G的输出和输入具有相同的内容特征,即第一训练输出图像保留了第一训练输入图像的内容。例如,在待训练的生成网络G的训练过程中,待训练的生成网络G的参数被不断地修正,以使经过参数修正后的待训练的生成网络G生成的第一训练输出图像的内容特征不断趋近于第一训练输入图像的内容特征,从而不断地减小内容损失值。
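A minimal sketch of the content-loss computation using feature maps extracted by a fixed analysis network. Here `analysis_net` is a hypothetical callable assumed to return a list of feature tensors (one per convolution module), and the constant `s1` and per-layer weights are illustrative assumptions.

```python
import torch

def content_loss(analysis_net, train_input, train_output, weights, s1=1.0):
    # analysis_net(img) is assumed to return a list of feature maps, one per conv module.
    feats_in = analysis_net(train_input)     # first content feature maps (of the training input)
    feats_out = analysis_net(train_output)   # second content feature maps (of the training output)
    loss = 0.0
    for w, f_in, f_out in zip(weights, feats_in, feats_out):
        c_m = ((f_out - f_in) ** 2).sum() / (2.0 * s1)   # single-layer content loss C_m
        loss = loss + w * c_m                             # weighted sum over conv modules
    return loss
```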
例如,在一些示例中,如图11A所示,系统损失函数还可以包括颜色损失函数,相应地,系统损失值可以包括颜色损失值。例如,如图11A所示,在待训练的生成网络G的训练过程中,可以根据第一训练输出图像和第二训练输入图像建立颜色损失函数以计算颜色损失值。
例如,在一些示例中,颜色损失函数可以表示为:
L color=abs(gaussian(G(z1))-gaussian(I1))
其中,L color表示所述颜色损失函数,G(z1)表示所述第一训练输出图像,I1表示第二训练输入图像,gaussian()表示高斯模糊化运算,abs()表示求绝对值运算。
例如,第二训练输入图像可以为高质量图像,即第二训练输入图像的质量比第一训练输入图像的质量高,例如,第二训练输入图像可以为例如数码单镜反光相机拍摄的照片图像。例如,第二训练输入图像可以包括人物图像、动植物图像或风景图像等,本公开的实施例对此不作限制。
例如,在一些示例中,第一训练输出图像的质量接近于例如数码单镜反光相机拍摄的照片的质量,至少可以部分体现为:第一训练输出图像与数码单镜反光相机拍摄的照片在各个局部的颜色分布和亮度分布等接近一致。
例如,在待训练的生成网络G的训练过程中,最小化系统损失值包括减小颜色损失值。例如,在使用待训练的生成网络G进行图像增强处理时,需要使第一训练输出图像在各个局部的颜色分布和亮度分布等与例如数码单镜反光相机拍摄的照片接近一致。例如,在待训练的生成网络G的训练过程中,待训练的生成网络G的参数被不断地修正,以使经过参数修正后的待训练的生成网络G生成的第一训练输出图像在各个局部的颜色分布和亮度分布等接近于例如数码单镜反光相机拍摄的照片,从而不断地减小颜色损失值。
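A sketch of the color loss: both images are Gaussian-blurred and the absolute difference is taken. The blur kernel size and sigma are illustrative assumptions, `gaussian_blur` from torchvision is used as one possible realization of the blur, and reducing the element-wise absolute difference to a scalar by averaging is also an assumption.

```python
import torch
from torchvision.transforms.functional import gaussian_blur

def color_loss(g_z1, i1, kernel_size=21, sigma=3.0):
    # g_z1: first training output image G(z1); i1: second (high-quality) training input image.
    # Both are tensors of shape (N, 3, H, W).
    blur_out = gaussian_blur(g_z1, [kernel_size, kernel_size], [sigma, sigma])
    blur_ref = gaussian_blur(i1, [kernel_size, kernel_size], [sigma, sigma])
    return (blur_out - blur_ref).abs().mean()   # abs(gaussian(G(z1)) - gaussian(I1))
```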
例如,在一些示例中,第一训练输出图像具有第一颜色通道、第二颜色通道和第三颜色通道,例如,可以参考前述图像处理方法中的输出图像的相关描述,在此不再赘述。例如,如图11A所示,系统损失函数还可以包括对比损失函数,相应地,系统损失值可以包括对比损失值。例如,如图11A所示,在待训练的生成网络G的训练过程中,可以根据第一训练输出图像和第三训练输入图像建立对比损失函数以计算对比损失值。
例如,在一些示例中,对比损失函数可以表示为:
L L1=0.299*abs(F G(z1)-F I2)+0.587*abs(S G(z1)-S I2)+0.299*abs(T G(z1)-T I2)
其中,L L1表示所述对比损失函数,G(z1)表示所述第一训练输出图像,I2表示第三训练输入图像,F G(z1)、S G(z1)和T G(z1)分别表示所述第一训练输出图像的第一颜色通道、第二颜色通道和第三颜色通道的数据信息,F I2、S I2和T I2分别表示所述第三训练输入图像的第一颜色通道、第二颜色通道和第三颜色通道的数据信息,abs()表示求绝对值运算。
例如,第三训练输入图像可以具有与第一训练输入图像相同的场景,即内容相同,且第三训练输入图像的质量比第一训练输入图像的质量高,例如第三训练输入图像可以为例如数码单镜反光相机拍摄的照片图像。由于第三训练输入图像可以相当于待训练的生成网络G的目标输出图像,从而,在系统损失函数中增加对比损失函数,可以提高收敛速度和处理速度。
例如,在待训练的生成网络G的训练过程中,最小化系统损失值包括减小对比损失值。例如,在使用待训练的生成网络G进行图像增强处理时,需要使第一训练输出图像接近于第三训练输入图像。例如,在待训练的生成网络G的训练过程中,待训练的生成网络G的参数被不断地修正,以使经过参数修正后的待训练的生成网络G生成的第一训练输出图像接近于第三训练输入图像,从而不断地减小对比损失值。
例如,在本公开的实施例中,待训练的生成网络G的系统损失函数可以表示为:
L_total = α·L_G + β·L_content + χ·L_color + δ·L_L1
其中,L total表示系统损失函数,α、β、χ和δ分别为系统损失函数中生成网络对抗损失函数、内容损失函数、颜色损失函数和对比损失函数的权重。
例如,在一些示例中,为了防止过拟合,对比损失函数的权重δ可以小于内容损失函数的权重β;例如,在一些示例中,对比损失函数的权重占比不超过20%。
例如,在步骤S320中,可以通过上述公式表示的系统损失函数计算系统损失值,再执行后续步骤S330,对待训练的生成网络G的参数进行修正,由此可以实现步骤S300。
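Putting the pieces together, a sketch of the system loss as a weighted sum of the adversarial, content, color and contrast (L1) terms. The weight values below are purely illustrative, and the individual loss terms are assumed to have been computed as above.

```python
def system_loss(adv_loss, cont_loss, col_loss, l1_loss,
                alpha=1.0, beta=1.0, chi=0.5, delta=0.1):
    # L_total = alpha*L_G + beta*L_content + chi*L_color + delta*L_L1
    # delta is kept smaller than beta so that the contrast term does not dominate (anti-overfitting).
    return alpha * adv_loss + beta * cont_loss + chi * col_loss + delta * l1_loss
```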
步骤S330:基于系统损失值对待训练的生成网络的参数进行修正。
例如,在待训练的生成网络G的训练过程中还可以包括优化函数(图11A中未示出),优化函数可以根据系统损失函数计算得到的系统损失值计算生成网络G的参数的误差值,并根据该误差值对待训练的生成网络G的参数进行修正。例如,优化函数可以采用随机梯度下降(stochastic gradient descent,SGD)算法、批量梯度下降(batch gradient descent,BGD)算法等计算生成网络G的参数的误差值。
例如,以待训练的生成网络G为图6A所示的生成网络为例,对待训练的生成网络G的参数进行修正,包括:对图6A所示的生成网络中的除合成模块Merg的参数以外的参数进行修正,即该合成模块Merg的参数保持不变。例如,以待训练的生成网络G为图6B所示的生成网络为例,对待训练的生成网络G的参数进行修正,包括:对图6B所示的生成网络中的除转换模块Tran和合成模块Merg的参数以外的参数进行修正,即该转换模块Tran和合成模块Merg的参数均保持不变。
例如,基于判别网络,对待训练的生成网络进行训练,即步骤S300还可以包括:判断待训练的生成网络G的训练是否满足预定条件,若不满足预定条件,则重复执行上述待训练的生成网络G的训练过程;若满足预定条件,则停止本阶段的待训练的生成网络G的训练过程,得到本阶段训练好的生成网络G。需要说明的是,本阶段训练好的生成网络G可以作为下一阶段的待训练的生成网络G。例如,在一个示例中,上述预定条件为连续两幅(或更多幅)第一训练输入图像对应的系统损失值不再显著减小。例如,在另一个示例中,上述预定条件为生成网络G的训练次数或训练周期达到预定数目。需要说明的是,本公开的实施例对此不作限制。
例如,如图11A所示,在待训练的生成网络G的训练过程中,需要联合判别网络D和分析网络A进行训练。需要说明的是,在生成网络G的训练过程中,判别网络D的参数保持不变。需要说明的是,当分析网络A采用已经训练好的卷积神经网络模型时,分析网络A的参数也可以保持不变。
需要说明的是,在本公开中,例如,待训练的生成网络、判别网络、分析网络以及这些神经网络包括的各种层或模块(例如卷积模块、上采样层、下采样层等)等每个分别对应执行相应处理过程的程序/方法,例如通过相应的软件、固件、硬件等方式实现;并且,上述示例仅是示意性说明待训练的生成网络的训练过程。本领域技术人员应当知道,在训 练阶段,需要利用大量样本图像对神经网络进行训练;同时,在每一幅样本图像训练过程中,都可以包括多次反复迭代以对待训练的生成网络的参数进行修正。又例如,训练阶段还包括对待训练的生成网络的参数进行微调(fine-tune),以获取更优化的参数。
图14A为本公开至少一实施例提供的一种对应于图10中所示的训练方法训练判别网络的示意性架构框图,图14B为本公开至少一实施例提供的一种训练判别网络的过程的示意性流程图。
例如,结合图14A和图14B所示,基于待训练的生成网络,对判别网络进行训练,即步骤S400,包括步骤S410至步骤S430,如下所示:
步骤S410:利用待训练的生成网络对第四训练输入图像进行处理,以生成第二训练输出图像;
步骤S420:基于第二训练输出图像和第五训练输入图像,通过判别网络对抗损失函数计算判别网络对抗损失值;
步骤S430:根据判别网络对抗损失值对判别网络的参数进行修正。
例如,基于待训练的生成网络,对判别网络进行训练,即步骤S400还可以包括:判断判别网络D的训练是否满足预定条件,若不满足预定条件,则重复执行上述判别网络D的训练过程;若满足预定条件,则停止本阶段的判别网络D的训练过程,得到本阶段训练好的判别网络D。例如,在一个示例中,上述预定条件为连续两幅(或更多幅)第五训练输入图像和第二训练输出图像对应的判别网络对抗损失值不再显著减小。例如,在另一个示例中,上述预定条件为判别网络D的训练次数或训练周期达到预定数目。需要说明的是,本公开的实施例对此不作限制。
例如,如图14A所示,在判别网络D的训练过程中,需要联合待训练的生成网络G进行训练。需要说明的是,在判别网络D的训练过程中,待训练的生成网络G的参数保持不变。
需要说明的是,上述示例仅是示意性说明判别网络的训练过程。本领域技术人员应当知道,在训练阶段,需要利用大量样本图像对神经网络进行训练;同时,在每一幅样本图像训练过程中,都可以包括多次反复迭代以对判别网络的参数进行修正。又例如,训练阶段还包括对判别网络的参数进行微调(fine-tune),以获取更优化的参数。
例如,判别网络D的初始参数可以为随机数,例如随机数符合高斯分布,本公开的实施例对此不作限制。
例如,判别网络D的训练过程中还可以包括优化函数(图14A中未示出),优化函数可以根据判别网络对抗损失函数计算得到的判别网络对抗损失值计算判别网络D的参数的误差值,并根据该误差值对判别网络D的参数进行修正。例如,优化函数可以采用随机梯度下降(stochastic gradient descent,SGD)算法、批量梯度下降(batch gradient descent,BGD)算法等计算判别网络D的参数的误差值。
例如,第四训练输入图像可以与第一训练输入图像相同,例如,第四训练输入图像的集合与第一训练输入图像的集合是同一个图像集合。例如,第四训练输入图像也可以包括通过智能手机的摄像头、平板电脑的摄像头、个人计算机的摄像头、数码照相机的镜头、监控摄像头或者网络摄像头等拍摄采集的照片,其可以包括人物图像、动植物图像或风景图像等,本公开的实施例对此不作限制。例如,第四训练输入图像为低质量图像,例如第四训练输入图像的质量低于例如数码单镜反光相机拍摄的照片的质量。例如,在一些示例中,第四训练输入图像可以为RGB格式的图像,本公开的实施例包括但不限于此。
例如,第五训练输入图像为高质量图像,即第五训练输入图像的质量比第四训练输入图像的质量高,例如,第五训练输入图像可以为数码单镜反光相机拍摄的照片图像。例如,第五训练输入图像可以包括人物图像、动植物图像或风景图像等,本公开的实施例对此不作限制。例如,第五训练输入图像可以与第二训练输入图像相同,例如,第五训练输入图像的集合与第二训练输入图像的集合是同一个图像集合;当然,第五训练输入图像也可以与第二训练输入图像不同,本公开的实施例对此不作限制。
例如,判别网络D可以为如图12所示的判别网络,但不限于此。
例如,在一些示例中,判别网络对抗损失函数可以表示为:
L_D = −E_{x~P_data(x)}[log D(x)] − E_{z2~P_z2(z2)}[log(1 − D(G(z2)))]
其中，L_D表示判别网络对抗损失函数，x表示第五训练输入图像，P_data(x)表示第五训练输入图像的集合（例如，包括一个批次的多幅第五训练输入图像），D(x)表示判别网络D针对第五训练输入图像x的输出，即判别网络D对第五训练输入图像x进行处理得到的输出，E_{x~P_data(x)}表示针对第五训练输入图像的集合求期望，z2表示第四训练输入图像，P_z2(z2)表示第四训练输入图像的集合（例如，包括一个批次的多幅第四训练输入图像），G(z2)表示第二训练输出图像，D(G(z2))表示判别网络D针对第二训练输出图像的输出，即判别网络D对第二训练输出图像进行处理得到的输出，E_{z2~P_z2(z2)}表示针对第四训练输入图像的集合求期望。由此，可以相应采用批量梯度下降算法对判别网络D进行参数优化。
需要说明的是,上述公式表示的判别网络对抗损失函数是示例性的,本公开包括但不限于此。
判别网络D的训练目标是最小化判别网络对抗损失值。例如,在判别网络D的训练过程中,第五训练输入图像的标签设置为1,即需要使判别网络D鉴别认定第五训练输入图像为例如数码单镜反光相机拍摄的照片图像,即为高质量图像;同时,第二训练输出图像的标签设置为0,即需要使判别网络D鉴别认定第二训练输出图像不是例如数码单镜反光相机拍摄的照片图像,即为低质量图像。
例如,在判别网络D的训练过程中,判别网络D的参数被不断地修正,以使经过参数修正后的判别网络D能够准确鉴别第五训练输入图像和第二训练输出图像的质量,也就是,使第五训练输入图像对应的判别网络D的输出不断趋近于1,以及使第二训练输出图像对应的判别网络D的输出不断趋近于0,从而不断地减小生成网络对抗损失值。
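A sketch of one discriminator update step consistent with the description above (label 1 for the fifth training input image, label 0 for the second training output image). The adversarial loss is written here in a binary cross-entropy form as one standard formulation consistent with those labels; the patent's exact loss is given as an image, and the optimizer is an assumed placeholder.

```python
import torch
import torch.nn.functional as F

def discriminator_step(generator, discriminator, opt_d, z2, x):
    # z2: fourth training input image (low quality); x: fifth training input image (high quality).
    with torch.no_grad():
        fake = generator(z2)                      # second training output image; G stays fixed
    pred_real = discriminator(x)                  # should be pushed towards label 1
    pred_fake = discriminator(fake)               # should be pushed towards label 0
    loss_d = F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) + \
             F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_d.item()
```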
例如,在本公开的实施例中,待训练的生成网络G的训练和判别网络D的训练是交替迭代进行的。例如,对于未经训练的生成网络G和判别网络D,一般先对判别网络D进行第一阶段训练,提高判别网络D的鉴别能力(即,鉴别判别网络D的输入的质量高低),得到经过第一阶段训练的判别网络D;然后,基于经过第一阶段训练的判别网络D对生成网络G(即待训练的生成网络G)进行第一阶段训练,提高生成网络G的图像增强处理能力(即,使生成网络G的输出为高质量图像),得到经过第一阶段训练的生成网络G。与第一阶段训练类似,在第二阶段训练中,基于经过第一阶段训练的生成网络G,对经过第一阶段训练的判别网络D进行第二阶段训练,提高判别网络D的鉴别能力,得到经过第二阶段训练的判别网络D;然后,基于经过第二阶段训练的判别网络D对经过第一阶段训练的生成网络G进行第二阶段训练,提高生成网络G的图像增强处理能力,得到经过第二阶段训练的生成网络G,依次类推,接下来对判别网络D和生成网络G进行第三阶段训练、第四阶段训练、……,直到得到的生成网络G的输出的质量可以接近于例如的数码单镜反光相机拍摄的照片的质量,即训练输出图像为高质量图像。
需要说明的是,在生成网络G和判别网络D的交替训练过程中,生成网络G和判别网络D的对抗体现在生成网络G的输出(生成网络G生成的高分辨率图像)在各自单独的训练过程中具有不同的标签(在生成网络G的训练过程中标签为1,在判别网络D的训练过程中标签为0),也体现在判别网络对抗损失函数的第二部分(即与生成网络G生成的高分辨率图像有关的部分)与系统损失函数中的生成网络对抗损失函数相反。还需要说明的是,理想情况下,经过训练得到的生成网络G输出的图像为高质量图像(即接近于例如数码单镜反光相机拍摄的照片的质量),判别网络D针对第五训练输入图像和该生成网络G生成的第二训练输出图像的输出均为0.5,即生成网络G和判别网络D经过对抗博弈达到纳什均衡。
本公开的至少一实施例提供的神经网络的训练方法,结合了生成式对抗网络技术,经过该训练方法训练得到的生成网络可以结合分支处理和权值共享处理以进行图像增强处理,既可以减少参数数量,又可以便于反向传播时计算梯度,从而,在输出高质量图像的同时还可以提高处理速度和收敛速度。
本公开至少一实施例还提供一种图像处理装置。图15为本公开一实施例提供的一种图像处理装置的示意性框图。
例如,如图15所示,该图像处理装置500包括存储器510和处理器520。例如,存储器510用于非暂时性存储计算机可读指令,处理器520用于运行该计算机可读指令,该计算机可读指令被处理器520运行时执行本公开任一实施例提供的图像处理方法。
例如,存储器510和处理器520之间可以直接或间接地互相通信。例如,存储器510和处理器520等组件之间可以通过网络连接进行通信。网络可以包括无线网络、有线网络、和/或无线网络和有线网络的任意组合。网络可以包括局域网、互联网、电信网、基于互联网和/或电信网的物联网(Internet of Things)、和/或以上网络的任意组合等。有线网络例如 可以采用双绞线、同轴电缆或光纤传输等方式进行通信,无线网络例如可以采用3G/4G/5G移动通信网络、蓝牙、Zigbee或者WiFi等通信方式。本公开对网络的类型和功能在此不作限制。
例如,处理器520可以控制图像处理装置中的其它组件以执行期望的功能。处理器520可以是中央处理单元(CPU)、张量处理器(TPU)或者图形处理器GPU等具有数据处理能力和/或程序执行能力的器件。中央处理器(CPU)可以为X86或ARM架构等。GPU可以单独地直接集成到主板上,或者内置于主板的北桥芯片中。GPU也可以内置于中央处理器(CPU)上。
例如,存储器510可以包括一个或多个计算机程序产品的任意组合,计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器、闪存等。
例如,在存储器510上可以存储一个或多个计算机指令,处理器520可以运行所述计算机指令,以实现各种功能。在计算机可读存储介质中还可以存储各种应用程序和各种数据,例如第一至第五训练输入图像、以及应用程序使用和/或产生的各种数据等。
例如,存储器510存储的一些计算机指令被处理器520执行时可以执行本公开任一实施例提供的图像处理方法中的一个或多个步骤,和/或可以执行本公开任一实施例提供的神经网络的训练方法中的一个或多个步骤。
例如,关于图像处理方法的处理过程的详细说明可以参考上述图像处理方法的实施例中的相关描述,关于神经网络的训练方法的处理过程的详细说明可以参考上述神经网络的训练方法的实施例中的相关描述,重复之处不再赘述。
需要说明的是,本公开的实施例提供的图像处理装置是示例性的,而非限制性的,根据实际应用需要,该图像处理装置还可以包括其他常规部件或结构,例如,为实现图像处理装置的必要功能,本领域技术人员可以根据具体应用场景设置其他的常规部件或结构,本公开的实施例对此不作限制。
本公开的至少一实施例提供的图像处理装置的技术效果可以参考上述实施例中关于图像处理方法以及神经网络的训练方法的相应描述,在此不再赘述。
本公开至少一实施例还提供一种存储介质。图16为本公开一实施例提供的一种存储介质的示意图。例如,如图16所示,该存储介质600非暂时性地存储计算机可读指令601,当非暂时性计算机可读指令601由计算机(包括处理器)执行时可以执行本公开任一实施例提供的图像处理方法的指令。
例如,在存储介质600上可以存储一个或多个计算机指令。存储介质600上存储的一些计算机指令可以是例如用于实现上述图像处理方法中的一个或多个步骤的指令。存储介 质上存储的另一些计算机指令可以是例如用于实现上述神经网络的训练方法中的一个或多个步骤的指令。
例如,存储介质可以包括平板电脑的存储部件、个人计算机的硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、光盘只读存储器(CD-ROM)、闪存、或者上述存储介质的任意组合,也可以为其他适用的存储介质。
本公开的实施例提供的存储介质的技术效果可以参考上述实施例中关于图像处理方法以及神经网络的训练方法的相应描述,在此不再赘述。
对于本公开,有以下几点需要说明:
(1)本公开实施例附图中,只涉及到与本公开实施例涉及到的结构,其他结构可参考通常设计。
(2)在不冲突的情况下,本公开同一实施例及不同实施例中的特征可以相互组合。
以上,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以权利要求的保护范围为准。

Claims (25)

  1. 一种图像处理方法,包括:
    获取输入图像;以及
    使用生成网络对所述输入图像进行处理,以生成输出图像;其中,
    所述生成网络包括第一子网络和至少一个第二子网络,
    使用所述生成网络对所述输入图像进行处理,以生成所述输出图像,包括:
    使用所述第一子网络对所述输入图像进行处理,以得到多个第一特征图;
    使用所述至少一个第二子网络对所述多个第一特征图进行分支处理和权值共享处理,以得到多个第二特征图;以及
    对所述多个第二特征图进行处理,以得到输出图像。
  2. 根据权利要求1所述的图像处理方法,其中,每个所述第二子网络包括第一分支网络、第二分支网络、第三分支网络,每个所述第二子网络的所述分支处理包括:
    将每个所述第二子网络的输入划分为第一分支输入、第二分支输入和第三分支输入;以及
    使用所述第一分支网络对所述第一分支输入进行处理,以得到第一分支输出,使用所述第二分支网络对所述第二分支输入进行处理,以得到第二分支输出,使用所述第三分支网络对所述第三分支输入进行处理,以得到第三分支输出;
    其中,所述至少一个第二子网络包括第一个第二子网络,所述第一个第二子网络与所述第一子网络连接,所述多个第一特征图作为所述第一个第二子网络的输入。
  3. 根据权利要求2所述的图像处理方法,其中,每个所述第二子网络还包括第一主干网络,每个所述第二子网络的所述权值共享处理包括:
    将所述第一分支输出、所述第二分支输出和所述第三分支输出进行连接,以得到第一中间输出;以及
    使用所述第一主干网络对所述第一中间输出进行处理,以得到每个所述第二子网络的输出。
  4. 根据权利要求3所述的图像处理方法,其中,所述第一分支网络的处理包括标准卷积处理,所述第二分支网络的处理包括标准卷积处理,所述第三分支网络的处理包括标准卷积处理,所述第一主干网络的处理包括标准卷积处理和下采样处理。
  5. 根据权利要求1-4任一项所述的图像处理方法,其中,所述生成网络还包括第三子网络,
    对所述多个第二特征图进行处理,以得到所述输出图像,包括:
    对所述多个第二特征图进行处理,以得到多个第三特征图;
    使用所述第三子网络对所述多个第三特征图进行处理,以得到多个第四特征图;以及
    对所述多个第四特征图进行合成处理,以得到输出图像。
  6. 根据权利要求5所述的图像处理方法,其中,所述第三子网络包括第二主干网络、第四分支网络、第五分支网络和第六分支网络,
    使用所述第三子网络对所述多个第三特征图进行处理,以得到所述多个第四特征图,包括:
    使用所述第二主干网络对所述多个第三个特征图进行处理,以得到多个第五特征图;
    将所述多个第五特征图划分为第四分支输入、第五分支输入和第六分支输入;以及
    使用所述第四分支网络对所述第四分支输入进行处理,以得到所述第四分支网络对应的第四特征图,使用所述第五分支网络对所述第五分支输入进行处理,以得到所述第五分支网络对应的第四特征图,使用所述第六分支网络对所述第六分支输入进行处理,以得到所述第六分支网络对应的第四特征图;
    其中,所述多个第四特征图包括所述第四分支网络对应的第四特征图、所述第五分支网络对应的第四特征图和所述第六分支网络对应的第四特征图。
  7. 根据权利要求6所述的图像处理方法,其中,所述第二主干网络的处理包括上采样处理,所述第四分支网络的处理包括标准卷积处理,所述第五分支网络的处理包括标准卷积处理,所述第六分支网络的处理包括标准卷积处理。
  8. 根据权利要求7所述的图像处理方法,其中,所述第四分支网络的处理还包括上采样处理,所述第五分支网络的处理还包括上采样处理,所述第六分支网络的处理还包括上采样处理。
  9. 根据权利要求1-8任一项所述的图像处理方法,其中,所述第一子网络的处理包括标准卷积处理,
    使用所述第一子网络对所述输入图像进行处理,以得到所述多个第一特征图,包括:
    使用所述第一子网络对所述输入图像进行标准卷积处理,以得到所述多个第一特征图。
  10. 根据权利要求6所述的图像处理方法,其中,所述输入图像具有第一颜色通道、第二颜色通道和第三颜色通道,
    所述第一子网络包括转换模块、第七分支网络、第八分支网络、第九分支网络和第三主干网络,
    使用所述第一子网络对所述输入图像进行处理,以得到所述多个第一特征图,包括:
    使用所述转换模块将所述输入图像的第一颜色通道、第二颜色通道和第三颜色通道的数据信息转换为中间输入图像的第一亮度信道、第一色差信道和第二色差信道的数据信息;
    使用所述第七分支网络对所述中间输入图像的第一亮度信道的数据信息进行处理,以得到第七分支输出,使用所述第八分支网络对所述中间输入图像的第一色差信道的数据信息进行处理,以得到第八分支输出,使用所述第九分支网络对所述中间输入图像的第二色差信道的数据信息进行处理,以得到第九分支输出;
    将所述第七分支输出、所述第八分支输出和所述第九分支输出进行连接,以得到第二中间输出;以及
    使用所述第三主干网络对所述第二中间输出进行处理,以得到所述多个第一特征图。
  11. 根据权利要求10所述的图像处理方法,其中,所述第七分支网络的处理包括标准卷积处理和下采样处理,所述第八分支网络的处理包括标准下采样处理,所述第九分支网络的处理包括标准下采样处理。
  12. 根据权利要求11所述的图像处理方法,其中,所述第四分支网络的处理包括标准卷积处理和上采样处理,所述第五分支网络的处理包括标准卷积处理和标准上采样处理,所述第六分支网络的处理包括标准卷积处理和标准上采样处理。
  13. 根据权利要求5-8任一项所述的图像处理方法,其中,所述生成网络还包括密集子网络,所述密集子网络包括N个密集模块,
    对所述多个第二特征图进行处理,以得到所述多个第三特征图,包括:
    使用所述密集子网络对所述多个第二特征图进行处理,以得到所述多个第三特征图;
    其中,所述多个第二特征图作为所述N个密集模块中的第1个密集模块的输入,
    所述多个第二特征图与所述N个密集模块中的第i个密集模块之前的i-1个密集模块的输出连接,作为所述第i个密集模块的输入,
    所述多个第二特征图和每个所述密集模块的输出进行连接,作为所述多个第三特征图,N、i为整数,N≥2,i≥2且i≤N。
  14. 根据权利要求13所述的图像处理方法,其中,每个密集模块的处理包括降维处理和卷积处理。
  15. 根据权利要求6-8任一项所述的图像处理方法,其中,所述生成网络还包括合成模块,
    对所述多个第四特征图进行合成处理,以得到所述输出图像,包括:
    使用所述合成模块对所述多个第四特征图进行合成处理,以得到所述输出图像。
  16. 根据权利要求15所述的图像处理方法,其中,所述合成模块包括第一转换矩阵,
    使用所述合成模块对所述多个第四特征图进行合成处理,以得到所述输出图像,包括:
    利用所述第一转换矩阵,将所述第四分支网络对应的第四特征图的数据信息、所述第五分支网络对应的第四特征图的数据信息和所述第六分支网络对应的第四特征图的数据信息转换为所述输出图像的第一颜色通道的数据信息、第二颜色通道的数据信息和第三颜色通道的数据信息,以得到所述输出图像。
  17. 一种神经网络的训练方法,包括:
    基于待训练的生成网络,对判别网络进行训练;
    基于所述判别网络,对所述待训练的生成网络进行训练;以及,
    交替地执行上述训练过程,以得到根据权利要求1-16任一项所述的图像处理方法中的所述生成网络;其中,
    基于所述判别网络,对所述待训练的生成网络进行训练,包括:
    使用所述待训练的生成网络对第一训练输入图像进行处理,以生成第一训练输出图像;
    基于所述第一训练输出图像,通过系统损失函数计算所述待训练的生成网络的系统损失值;以及
    基于所述系统损失值对所述待训练的生成网络的参数进行修正。
  18. 根据权利要求17所述的训练方法,其中,所述系统损失函数包括生成网络对抗损失函数,所述系统损失值包括生成网络对抗损失值;所述生成网络对抗损失函数表示为:
    L_G = E_{z1~P_z1(z1)}[log(1 − D(G(z1)))]
    其中，L_G表示所述生成网络对抗损失函数，z1表示所述第一训练输入图像，P_z1(z1)表示所述第一训练输入图像的集合，G(z1)表示所述第一训练输出图像，D(G(z1))表示所述判别网络针对所述第一训练输出图像的输出，E_{z1~P_z1(z1)}表示针对所述第一训练输入图像的集合求期望以得到所述生成网络对抗损失值。
  19. 根据权利要求18所述的训练方法,其中,所述系统损失函数还包括内容损失函数,所述系统损失值还包括内容损失值;
    基于所述第一训练输出图像,通过系统损失函数计算所述待训练的生成网络的系统损失值,包括:使用分析网络提取所述第一训练输入图像的第一内容特征图和所述第一训练输出图像的第二内容特征图,根据所述第一内容特征图和所述第二内容特征图,通过所述内容损失函数计算所述生成网络的所述内容损失值,
    其中,所述分析网络包括用于提取所述第一内容特征图和所述第二内容特征图的至少一个卷积模块;
    所述内容损失函数表示为:
    L_content = Σ_m w_1m·C_m
    其中，L_content表示所述内容损失函数，C_m表示所述至少一个卷积模块中的第m个卷积模块的单层内容损失函数，w_1m表示C_m的权重；
    所述单层内容损失函数表示为：
    C_m = (1/(2·S_1))·Σ_ij (F^m_ij − P^m_ij)^2
    其中，S_1为常数，F^m_ij表示在所述第m个卷积模块中第i个卷积核提取的所述第一训练输入图像的第一内容特征图中第j个位置的值，P^m_ij表示在所述第m个卷积模块中第i个卷积核提取的所述第一训练输出图像的第二内容特征图中第j个位置的值。
  20. 根据权利要求19所述的训练方法,其中,所述系统损失函数还包括颜色损失函数,所述系统损失值还包括颜色损失值;所述颜色损失函数表示为:
    L color=abs(gaussian(G(z1))-gaussian(I1))
    其中,L color表示所述颜色损失函数,G(z1)表示所述第一训练输出图像,I1表示第二训练输入图像,gaussian()表示高斯模糊化运算,abs()表示求绝对值运算;
    所述第二训练输入图像的质量比所述第一训练输入图像的质量高。
  21. 根据权利要求20所述的训练方法,其中,所述第一训练输出图像具有第一颜色通道、第二颜色通道和第三颜色通道;
    所述系统损失函数还包括对比损失函数,所述系统损失值还包括对比损失值;所述对比损失函数表示为:
    L L1=0.299*abs(F G(z1)-F I2)+0.587*abs(S G(z1)-S I2)+0.299*abs(T G(z1)-T I2)
    其中,L L1表示所述对比损失函数,G(z1)表示所述第一训练输出图像,I2表示第三训练输入图像,F G(z1)、S G(z1)和T G(z1)分别表示所述第一训练输出图像的第一颜色通道、第二颜色通道和第三颜色通道的数据信息,F I2、S I2和T I2分别表示所述第三训练输入图像的第一颜色通道、第二颜色通道和第三颜色通道的数据信息,abs()表示求绝对值运算;
    所述第三训练输入图像具有与所述第一训练输入图像相同的场景,且所述第三训练输入图像的质量比所述第一训练输入图像的质量高。
  22. 根据权利要求17-21任一项所述的训练方法,其中,基于所述待训练的生成网络,对所述判别网络进行训练,包括:
    利用所述待训练的生成网络对第四训练输入图像进行处理,以生成第二训练输出图像;
    基于所述第二训练输出图像和第五训练输入图像,通过判别网络对抗损失函数计算判别网络对抗损失值;以及
    根据所述判别网络对抗损失值对所述判别网络的参数进行修正;
    其中,所述第五训练输入图像的质量比所述第四训练输入图像的质量高。
  23. 根据权利要求22所述的训练方法,其中,所述判别网络对抗损失函数表示为:
    L_D = −E_{x~P_data(x)}[log D(x)] − E_{z2~P_z2(z2)}[log(1 − D(G(z2)))]
    其中，L_D表示所述判别网络对抗损失函数，x表示所述第五训练输入图像，P_data(x)表示所述第五训练输入图像的集合，D(x)表示所述判别网络针对所述第五训练输入图像的输出，E_{x~P_data(x)}表示针对所述第五训练输入图像的集合求期望，z2表示所述第四训练输入图像，P_z2(z2)表示所述第四训练输入图像的集合，G(z2)表示所述第二训练输出图像，D(G(z2))表示所述判别网络针对所述第二训练输出图像的输出，E_{z2~P_z2(z2)}表示针对所述第四训练输入图像的集合求期望。
  24. 一种图像处理装置,包括:
    存储器,用于非暂时性存储计算机可读指令;以及
    处理器,用于运行所述计算机可读指令,所述计算机可读指令被所述处理器运行时执行根据权利要求1-16任一项所述的图像处理方法或根据权利要求17-23任一项所述的神经网络的训练方法。
  25. 一种存储介质,非暂时性地存储计算机可读指令,当所述计算机可读指令由计算机执行时可以执行根据权利要求1-16任一项所述的图像处理方法或根据权利要求17-23任一项所述的神经网络的训练方法。
PCT/CN2020/092917 2019-05-30 2020-05-28 图像处理方法及装置、神经网络的训练方法、存储介质 WO2020239026A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/281,291 US11908102B2 (en) 2019-05-30 2020-05-28 Image processing method and device, training method of neural network, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910463969.5 2019-05-30
CN201910463969.5A CN110188776A (zh) 2019-05-30 2019-05-30 图像处理方法及装置、神经网络的训练方法、存储介质

Publications (1)

Publication Number Publication Date
WO2020239026A1 true WO2020239026A1 (zh) 2020-12-03

Family

ID=67718996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092917 WO2020239026A1 (zh) 2019-05-30 2020-05-28 图像处理方法及装置、神经网络的训练方法、存储介质

Country Status (3)

Country Link
US (1) US11908102B2 (zh)
CN (1) CN110188776A (zh)
WO (1) WO2020239026A1 (zh)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018091486A1 (en) 2016-11-16 2018-05-24 Ventana Medical Systems, Inc. Convolutional neural networks for locating objects of interest in images of biological samples
CN110188776A (zh) * 2019-05-30 2019-08-30 京东方科技集团股份有限公司 图像处理方法及装置、神经网络的训练方法、存储介质
CN110598786B (zh) 2019-09-09 2022-01-07 京东方科技集团股份有限公司 神经网络的训练方法、语义分类方法、语义分类装置
CN110705460B (zh) * 2019-09-29 2023-06-20 北京百度网讯科技有限公司 图像类别识别方法及装置
US11455531B2 (en) * 2019-10-15 2022-09-27 Siemens Aktiengesellschaft Trustworthy predictions using deep neural networks based on adversarial calibration
CN110717851B (zh) * 2019-10-18 2023-10-27 京东方科技集团股份有限公司 图像处理方法及装置、神经网络的训练方法、存储介质
CN110633700B (zh) * 2019-10-21 2022-03-25 深圳市商汤科技有限公司 视频处理方法及装置、电子设备和存储介质
CN111091503B (zh) * 2019-11-09 2023-05-02 复旦大学 基于深度学习的图像去失焦模糊方法
US11887298B2 (en) * 2020-01-07 2024-01-30 Rensselaer Polytechnic Institute Fluorescence lifetime imaging using deep learning
CN113095470B (zh) * 2020-01-08 2024-04-23 字节跳动有限公司 神经网络的训练方法、图像处理方法及装置、存储介质
CN111275128B (zh) * 2020-02-13 2023-08-25 平安科技(深圳)有限公司 图像识别模型训练方法及系统和图像识别方法
CN111507910B (zh) * 2020-03-18 2023-06-06 南方电网科学研究院有限责任公司 一种单图像去反光的方法、装置及存储介质
CN111652262A (zh) * 2020-03-19 2020-09-11 深圳市彬讯科技有限公司 图像物体识别方法、装置、计算机设备及存储介质
JP7446903B2 (ja) * 2020-04-23 2024-03-11 株式会社日立製作所 画像処理装置、画像処理方法及び画像処理システム
CN111709890B (zh) 2020-06-12 2023-11-24 北京小米松果电子有限公司 一种图像增强模型的训练方法、装置及存储介质
CN112541876B (zh) * 2020-12-15 2023-08-04 北京百度网讯科技有限公司 卫星图像处理方法、网络训练方法、相关装置及电子设备
EP4281928A4 (en) * 2021-01-19 2024-10-02 Alibaba Group Holding Ltd NEURAL NETWORK BASED IN-LOOP FILTERING FOR VIDEO CODING
CN113066019A (zh) * 2021-02-27 2021-07-02 华为技术有限公司 一种图像增强方法及相关装置
CN113238375B (zh) * 2021-04-20 2022-04-08 北京理工大学 一种基于深度学习的自由曲面成像系统初始结构生成方法
CN113591771B (zh) * 2021-08-10 2024-03-08 武汉中电智慧科技有限公司 一种多场景配电室物体检测模型的训练方法和设备
CN113762221B (zh) * 2021-11-05 2022-03-25 通号通信信息集团有限公司 人体检测方法及装置
CN114612580A (zh) * 2022-03-15 2022-06-10 中国人民解放军国防科技大学 一种面向低质量相机的高清成像方法
CN115147314B (zh) * 2022-09-02 2022-11-29 腾讯科技(深圳)有限公司 图像处理方法、装置、设备以及存储介质
CN116233626B (zh) * 2023-05-05 2023-09-15 荣耀终端有限公司 图像处理方法、装置及电子设备

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528589B (zh) * 2015-12-31 2019-01-01 上海科技大学 基于多列卷积神经网络的单张图像人群计数算法
EP3433795A4 (en) * 2016-03-24 2019-11-13 Ramot at Tel-Aviv University Ltd. METHOD AND SYSTEM FOR CONVERTING A TEXT IMAGE
CN108229497B (zh) * 2017-07-28 2021-01-05 北京市商汤科技开发有限公司 图像处理方法、装置、存储介质、计算机程序和电子设备
CN109426858B (zh) * 2017-08-29 2021-04-06 京东方科技集团股份有限公司 神经网络、训练方法、图像处理方法及图像处理装置
CN107527044B (zh) * 2017-09-18 2021-04-30 北京邮电大学 一种基于搜索的多张车牌清晰化方法及装置
CN107767343B (zh) * 2017-11-09 2021-08-31 京东方科技集团股份有限公司 图像处理方法、处理装置和处理设备
CN109241880B (zh) * 2018-08-22 2021-02-05 北京旷视科技有限公司 图像处理方法、图像处理装置、计算机可读存储介质
CN109299733A (zh) * 2018-09-12 2019-02-01 江南大学 利用紧凑型深度卷积神经网络进行图像识别的方法
CN109191382B (zh) * 2018-10-18 2023-12-05 京东方科技集团股份有限公司 图像处理方法、装置、电子设备及计算机可读存储介质
CN109559287A (zh) * 2018-11-20 2019-04-02 北京工业大学 一种基于DenseNet生成对抗网络的语义图像修复方法
CN109816764B (zh) * 2019-02-02 2021-06-25 深圳市商汤科技有限公司 图像生成方法及装置、电子设备和存储介质
CN109816612A (zh) 2019-02-18 2019-05-28 京东方科技集团股份有限公司 图像增强方法和装置、计算机可读存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150063688A1 (en) * 2013-09-05 2015-03-05 Anurag Bhardwaj System and method for scene text recognition
CN109033107A (zh) * 2017-06-09 2018-12-18 腾讯科技(深圳)有限公司 图像检索方法和装置、计算机设备和存储介质
CN109255317A (zh) * 2018-08-31 2019-01-22 西北工业大学 一种基于双网络的航拍图像差异检测方法
CN110188776A (zh) * 2019-05-30 2019-08-30 京东方科技集团股份有限公司 图像处理方法及装置、神经网络的训练方法、存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463812A (zh) * 2022-01-18 2022-05-10 赣南师范大学 基于双通道多分支融合特征蒸馏的低分辨率人脸识别方法
CN114463812B (zh) * 2022-01-18 2024-03-26 赣南师范大学 基于双通道多分支融合特征蒸馏的低分辨率人脸识别方法

Also Published As

Publication number Publication date
CN110188776A (zh) 2019-08-30
US20210407041A1 (en) 2021-12-30
US11908102B2 (en) 2024-02-20

Similar Documents

Publication Publication Date Title
WO2020239026A1 (zh) 图像处理方法及装置、神经网络的训练方法、存储介质
US11461639B2 (en) Image processing method, image processing device, and training method of neural network
WO2020200030A1 (zh) 神经网络的训练方法、图像处理方法、图像处理装置和存储介质
WO2021073493A1 (zh) 图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质
US11537873B2 (en) Processing method and system for convolutional neural network, and storage medium
US10970830B2 (en) Image style conversion method, apparatus and device
WO2020215236A1 (zh) 图像语义分割方法和系统
CN111402143B (zh) 图像处理方法、装置、设备及计算机可读存储介质
CN113095470A (zh) 神经网络的训练方法、图像处理方法及装置、存储介质
WO2022067653A1 (zh) 图像处理方法及装置、设备、视频处理方法及存储介质
CN115565043A (zh) 结合多表征特征以及目标预测法进行目标检测的方法
CN113096023A (zh) 神经网络的训练方法、图像处理方法及装置、存储介质
CN114830168A (zh) 图像重建方法、电子设备和计算机可读存储介质
WO2020187029A1 (zh) 图像处理方法及装置、神经网络的训练方法、存储介质
CN113076966A (zh) 图像处理方法及装置、神经网络的训练方法、存储介质
CN116912268A (zh) 一种皮肤病变图像分割方法、装置、设备及存储介质
WO2023029559A1 (zh) 一种数据处理方法以及装置
WO2022183325A1 (zh) 视频块处理方法及装置、神经网络的训练方法和存储介质
CN115797709B (zh) 一种图像分类方法、装置、设备和计算机可读存储介质
Raj et al. CNN Model for Handwritten Digit Recognition with Improved Accuracy and Performance Using MNIST Dataset
CN117876394A (zh) 图像处理方法、电子设备及存储介质
CN116797678A (zh) 基于多头类卷积自注意力的图像特征编码方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20813202

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20813202

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20813202

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26.07.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20813202

Country of ref document: EP

Kind code of ref document: A1