CN111724309B - Image processing method and device, training method of neural network and storage medium - Google Patents


Info

Publication number
CN111724309B
CN111724309B (application CN201910209662.2A)
Authority
CN
China
Prior art keywords
output
image
sampling process
upsampling
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910209662.2A
Other languages
Chinese (zh)
Other versions
CN111724309A (en)
Inventor
刘瀚文
张丽杰
朱丹
那彦波
Current Assignee
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd
Priority to CN201910209662.2A
Priority to PCT/CN2020/077763 (WO2020187029A1)
Publication of CN111724309A
Application granted
Publication of CN111724309B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]


Abstract

An image processing method, an image processing apparatus, a training method of a neural network, and a storage medium. The image processing method includes: receiving a first feature image; and performing at least one multi-scale cyclic sampling process on the first feature image. The multi-scale cyclic sampling process includes a first-level sampling process and a second-level sampling process; the first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition process that are sequentially performed; the second-level sampling process is nested between the first downsampling process and the first upsampling process, and includes a second downsampling process, a second upsampling process, and a second residual link addition process that are sequentially performed. The image processing method can perform image enhancement on a low-quality input image; by repeatedly sampling at a plurality of scales it obtains higher image fidelity, so the quality of the output image can be greatly improved.

Description

Image processing method and device, training method of neural network and storage medium
Technical Field
Embodiments of the present disclosure relate to an image processing method, an image processing apparatus, a training method of a neural network, and a storage medium.
Background
Currently, deep learning techniques based on artificial neural networks have made tremendous progress in fields such as image classification, image capture and search, face recognition, age estimation, and speech recognition. An advantage of deep learning is that very different technical problems can be solved with relatively similar systems sharing a generic architecture. The convolutional neural network (CNN) is an artificial neural network that has been developed and has attracted considerable attention in recent years; it is a highly efficient feed-forward network that is particularly well suited to image recognition. The application range of CNNs is not limited to image recognition; they can also be applied in directions such as face recognition, character recognition, and image processing.
Disclosure of Invention
At least one embodiment of the present disclosure provides an image processing method including: receiving a first feature image; and performing at least one multi-scale cyclic sampling process on the first feature image;
the multi-scale cyclic sampling process includes nested first-level and second-level sampling processes. The first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition process, wherein the first downsampling process performs downsampling based on the input of the first-level sampling process to obtain a first downsampled output, the first upsampling process performs upsampling based on the first downsampled output to obtain a first upsampled output, and the first residual link addition process performs a first residual link addition on the input of the first-level sampling process and the first upsampled output, and then takes the result of the first residual link addition as the output of the first-level sampling process. The second-level sampling process is nested between the first downsampling process and the first upsampling process, receives the first downsampled output as the input of the second-level sampling process, and provides the output of the second-level sampling process as the input of the first upsampling process, such that the first upsampling process performs upsampling based on the first downsampled output. The second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition process, wherein the second downsampling process performs downsampling based on the input of the second-level sampling process to obtain a second downsampled output, the second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output, and the second residual link addition process performs a second residual link addition on the input of the second-level sampling process and the second upsampled output, and then takes the result of the second residual link addition as the output of the second-level sampling process.
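As an illustration only (not the patent's actual network), the nested two-level structure described above can be sketched in Python. The helper names are hypothetical, and 2x2 average pooling / nearest-neighbour repetition are simple stand-ins for the learned convolutional down/upsampling layers:

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling on an (H, W) feature map; a stand-in for a
    # learned downsampling layer (e.g. a strided convolution).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # Nearest-neighbour 2x upsampling; a stand-in for a learned
    # upsampling layer (e.g. a transposed convolution).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def second_level(x):
    down = downsample(x)      # second downsampled output
    up = upsample(down)       # second upsampled output
    return x + up             # second residual link addition

def first_level(x):
    down = downsample(x)      # first downsampled output
    mid = second_level(down)  # second level nested between down- and upsampling
    up = upsample(mid)        # first upsampled output
    return x + up             # first residual link addition

feature = np.ones((4, 4))     # toy stand-in for the first feature image
out = first_level(feature)
print(out.shape)              # (4, 4): same size as the input
```

Because each upsampling restores the size reduced by its matching downsampling, every residual link addition operates on feature images of the same size, consistent with the size condition stated below.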
For example, in the image processing method provided in an embodiment of the present disclosure, the size of the output of the first upsampling process is the same as the size of the input of the first downsampling process; the size of the output of the second upsampling process is the same as the size of the input of the second downsampling process.
For example, in the image processing method provided in an embodiment of the present disclosure, the multi-scale cyclic sampling process further includes a third-level sampling process, the third-level sampling process being nested between the second downsampling process and the second upsampling process, receiving the second downsampled output as an input of the third-level sampling process, and providing an output of the third-level sampling process as an input of the second upsampling process, such that the second upsampling process performs the upsampling process based on the second downsampled output; the third-level sampling process includes a third downsampling process, a third upsampling process, and a third residual link addition process, wherein the third downsampling process performs downsampling process based on an input of the third-level sampling process to obtain a third downsampled output, the third upsampling process performs upsampling process based on the third downsampled output to obtain a third upsampled output, and the third residual link addition process performs third residual link addition on the input of the third-level sampling process and the third upsampled output, and then uses a result of the third residual link addition as an output of the third-level sampling process.
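The nesting pattern just described extends naturally to a third level and beyond: each level wraps the next between its downsampling and upsampling steps. A hedged recursive sketch (the 2x pooling/repetition helpers are illustrative stand-ins, not the patent's learned layers):

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling; stand-in for a learned downsampling layer.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # Nearest-neighbour 2x upsampling; stand-in for a learned upsampling layer.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def nested_sampling(x, levels):
    # One level: downsample, recurse into the next level (if any),
    # upsample, then residual-add with this level's input.
    down = downsample(x)
    if levels > 1:
        down = nested_sampling(down, levels - 1)
    return x + upsample(down)

out = nested_sampling(np.ones((8, 8)), levels=3)  # three nested levels
print(out.shape)  # (8, 8)
```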
For example, in the image processing method provided in an embodiment of the present disclosure, the multi-scale cyclic sampling process includes the second-level sampling process sequentially performed a plurality of times: the first second-level sampling process receives the first downsampled output as its input, each subsequent second-level sampling process receives the output of the previous second-level sampling process as its input, and the output of the last second-level sampling process serves as the input of the first upsampling process.
For example, in the image processing method provided in an embodiment of the present disclosure, the at least one multi-scale cyclic sampling process includes the multi-scale cyclic sampling process sequentially performed a plurality of times, where in each pass the input of the multi-scale cyclic sampling process is taken as the input of the first-level sampling process, and the output of the first-level sampling process is taken as the output of the multi-scale cyclic sampling process. The first multi-scale cyclic sampling process receives the first feature image as its input, each subsequent multi-scale cyclic sampling process receives the output of the previous multi-scale cyclic sampling process as its input, and the output of the last multi-scale cyclic sampling process is used as the output of the at least one multi-scale cyclic sampling process.
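The repeated application described in the two paragraphs above reduces to the same chaining rule, sketched here with a hypothetical `toy_pass` standing in for one full multi-scale cyclic sampling pass (or one second-level pass):

```python
def repeat_process(x, process, times):
    # The first pass receives x; each later pass receives the previous
    # pass's output; the last pass's output is the overall output.
    for _ in range(times):
        x = process(x)
    return x

# Toy stand-in for one sampling pass: a residual-style update.
toy_pass = lambda x: x + x
print(repeat_process(1.0, toy_pass, 3))  # 8.0
```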
For example, in an image processing method provided in an embodiment of the present disclosure, the multi-scale cyclic sampling process further includes: after the first downsampling process, the first upsampling process, the second downsampling process, and the second upsampling process, an instance normalization process or a layer normalization process is performed on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively.
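As a rough illustration of the normalization step above (simplified: fixed epsilon, no learned affine parameters), instance normalization normalizes each channel of a (C, H, W) feature image with that channel's own statistics, whereas layer normalization uses one set of statistics over all of (C, H, W):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x: (C, H, W); each channel is normalized independently.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # One mean/variance over all channels and spatial positions.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

feat = np.random.default_rng(0).normal(2.0, 3.0, size=(3, 4, 4))
normed = instance_norm(feat)
# Each channel of `normed` now has near-zero mean and near-unit variance.
```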
For example, the image processing method provided in an embodiment of the present disclosure further includes: performing the multi-scale cyclic sampling process using a first convolutional neural network; wherein the first convolutional neural network comprises: a first meta-network for performing the first-level sampling process; and a second meta-network for performing the second-level sampling process.
For example, in an image processing method provided in an embodiment of the present disclosure, the first meta-network includes: a first sub-network for performing the first downsampling process; and a second sub-network for performing the first upsampling process. The second meta-network includes: a third sub-network for performing the second downsampling process; and a fourth sub-network for performing the second upsampling process.
For example, in the image processing method provided in an embodiment of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, and a dense network.
For example, in the image processing method provided in an embodiment of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes an instance normalization layer for performing an instance normalization process or a layer normalization layer for performing a layer normalization process.
For example, the image processing method provided in an embodiment of the present disclosure further includes: acquiring an input image; converting an input image into the first feature image using an analysis network; and converting the output of the at least one multi-scale cyclic sampling process into an output image using a synthesis network.
At least one embodiment of the present disclosure further provides a training method of a neural network, wherein the neural network includes: an analysis network, a first sub-neural network, and a synthesis network, wherein the analysis network processes an input image to obtain a first feature image, the first sub-neural network performs at least one multi-scale cyclic sampling process on the first feature image to obtain a second feature image, and the synthesis network processes the second feature image to obtain an output image;
The training method comprises the following steps: acquiring a training input image; processing the training input image using the analysis network to provide a first training feature image; performing the at least one multi-scale cyclic sampling process on the first training feature image by using the first sub-neural network to obtain a second training feature image; processing the second training feature image using the synthesis network to obtain a training output image; calculating a loss value of the neural network through a loss function based on the training output image; correcting parameters of the neural network according to the loss value;
the multi-scale cyclic sampling process includes nested first-level and second-level sampling processes. The first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition process that are sequentially performed, wherein the first downsampling process performs downsampling based on the input of the first-level sampling process to obtain a first downsampled output, the first upsampling process performs upsampling based on the first downsampled output to obtain a first upsampled output, and the first residual link addition process performs a first residual link addition on the input of the first-level sampling process and the first upsampled output, and then takes the result of the first residual link addition as the output of the first-level sampling process. The second-level sampling process is nested between the first downsampling process and the first upsampling process, receives the first downsampled output as the input of the second-level sampling process, and provides the output of the second-level sampling process as the input of the first upsampling process, such that the first upsampling process performs upsampling based on the first downsampled output. The second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition process that are sequentially performed, wherein the second downsampling process performs downsampling based on the input of the second-level sampling process to obtain a second downsampled output, the second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output, and the second residual link addition process performs the second residual link addition on the input of the second-level sampling process and the second upsampled output, and then takes the result of the second residual link addition as the output of the second-level sampling process.
For example, in the training method provided in an embodiment of the present disclosure, the size of the output of the first upsampling process is the same as the size of the input of the first downsampling process; the size of the output of the second upsampling process is the same as the size of the input of the second downsampling process.
For example, in the training method provided in an embodiment of the present disclosure, the first sub-neural network includes: a first meta-network for performing the first-level sampling process; and a second meta-network for performing the second-level sampling process.
For example, in the training method provided in an embodiment of the present disclosure, the first meta-network includes: a first sub-network for performing the first downsampling process; and a second sub-network for performing the first upsampling process. The second meta-network includes: a third sub-network for performing the second downsampling process; and a fourth sub-network for performing the second upsampling process.
For example, in the training method provided in an embodiment of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, and a dense network.
For example, in the training method provided in an embodiment of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes an instance normalization layer for performing an instance normalization process on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively, or a layer normalization layer for performing a layer normalization process on those outputs, respectively.
At least one embodiment of the present disclosure also provides an image processing apparatus including: a memory for non-transitory storage of computer readable instructions; and a processor for executing the computer readable instructions, which when executed by the processor, perform the image processing method provided by any of the embodiments of the present disclosure.
At least one embodiment of the present disclosure also provides a storage medium that non-transitorily stores computer readable instructions that, when executed by a computer, perform the image processing method provided by any of the embodiments of the present disclosure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
FIG. 1 is a schematic diagram of a convolutional neural network;
FIG. 2A is a schematic diagram of a convolutional neural network;
FIG. 2B is a schematic diagram of the operation of a convolutional neural network;
FIG. 3 is a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 4A is a schematic flow diagram of a multi-scale cyclic sampling process corresponding to the image processing method shown in FIG. 3 according to one embodiment of the present disclosure;
FIG. 4B is a schematic flow diagram of a multi-scale cyclic sampling process corresponding to the image processing method shown in FIG. 3 according to another embodiment of the present disclosure;
FIG. 4C is a schematic flow diagram of a multi-scale cyclic sampling process corresponding to the image processing method shown in FIG. 3 according to yet another embodiment of the present disclosure;
FIG. 4D is a schematic flow diagram of a multi-scale cyclic sampling process corresponding to the image processing method shown in FIG. 3 according to yet another embodiment of the present disclosure;
FIG. 5 is a flowchart of an image processing method according to another embodiment of the present disclosure;
FIG. 6A is a schematic diagram of an input image;
FIG. 6B is a schematic diagram of an output image obtained by processing the input image shown in FIG. 6A according to an embodiment of the present disclosure;
FIG. 7A is a schematic structural diagram of a neural network according to an embodiment of the present disclosure;
FIG. 7B is a flowchart of a neural network training method according to an embodiment of the present disclosure;
FIG. 7C is a schematic block diagram of an architecture for training the neural network shown in FIG. 7A, corresponding to the training method shown in FIG. 7B, in accordance with an embodiment of the present disclosure;
FIG. 8 is a schematic block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
FIG. 9 is a schematic diagram of a storage medium according to an embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
The present disclosure is illustrated by the following several specific examples. Detailed descriptions of known functions and known components may be omitted for the sake of clarity and conciseness in the following description of the embodiments of the present disclosure. When any element of an embodiment of the present disclosure appears in more than one drawing, the element is identified by the same or similar reference numeral in each drawing.
Image enhancement is one of the research hotspots in the field of image processing. The image quality is greatly reduced due to limitations of various physical factors in the image acquisition process (e.g., the size of the image sensor of the mobile phone camera is too small, and other software, hardware limitations, etc.), as well as interference from environmental noise. The purpose of image enhancement is to improve the gray level histogram of the image and the contrast of the image by an image enhancement technology, so that the detail information of the image is highlighted and the visual effect of the image is improved.
Image enhancement using deep neural networks is a technology that has emerged with the development of deep learning. For example, based on convolutional neural networks, low-quality photographs (input images) taken by a mobile phone may be processed to obtain high-quality output images whose quality approaches that of photographs taken by a digital single-lens reflex camera (DSLR). For example, the peak signal-to-noise ratio (PSNR) index is commonly used to characterize image quality; a higher PSNR value indicates that the image is closer to a photograph taken by a real DSLR.
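For reference, PSNR for 8-bit images is commonly computed as 10*log10(MAX^2 / MSE), where MAX is the maximum pixel value (255) and MSE is the mean squared error against the reference image. A minimal sketch:

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    # Mean squared error between reference and test images.
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.full((8, 8), 100.0)
b = a + 10.0  # uniform error of 10 gray levels -> MSE = 100
print(round(psnr(a, b), 2))  # 28.13
```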
For example, Andrey Ignatov et al. describe a method for achieving image enhancement with convolutional neural networks; see Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool, "DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks," arXiv:1704.02470v2 [cs.CV], 5 September 2017. This document is incorporated by reference herein in its entirety as part of the present application. The method mainly uses convolution layers, batch normalization layers, and residual connections to construct a single-scale convolutional neural network, and uses this network to process a low-quality input image (e.g., low contrast, underexposure or overexposure, an image that is overall too dark or too bright, etc.) into a higher-quality image. Color loss, texture loss, and content loss are used as loss functions in training, so that a good processing effect can be obtained.
At least one embodiment of the present disclosure provides an image processing method, an image processing apparatus, a training method of a neural network, and a storage medium. The image processing method provides a multi-scale cyclic sampling method based on a convolutional neural network; by repeatedly sampling at a plurality of scales, it obtains higher image fidelity and can greatly improve the quality of the output image, and it is suitable for offline applications with higher image-quality requirements, such as batch processing.
Initially, convolutional neural networks (Convolutional Neural Network, CNN) were used primarily to identify two-dimensional shapes that were highly invariant to translation, scaling, tilting, or other forms of deformation of the image. CNN simplifies the complexity of the neural network model and reduces the number of weights mainly by local perception field and weight sharing. With the development of deep learning technology, the application range of CNN is not limited to the field of image recognition, but can also be applied to the fields of face recognition, word recognition, animal classification, image processing and the like.
Fig. 1 shows a schematic diagram of a convolutional neural network. For example, the convolutional neural network may be used for image processing, which uses images as inputs and outputs, and replaces scalar weights by convolutional kernels. Only convolutional neural networks having a 3-layer structure are shown in fig. 1, to which embodiments of the present disclosure are not limited. As shown in fig. 1, the convolutional neural network includes an input layer 101, a hidden layer 102, and an output layer 103. The input layer 101 has 4 inputs, the hidden layer 102 has 3 outputs, the output layer 103 has 2 outputs, and finally the convolutional neural network outputs 2 images.
For example, the 4 inputs of the input layer 101 may be 4 images, or four feature images of 1 image. The 3 outputs of the hidden layer 102 may be characteristic images of the image input through the input layer 101.
For example, as shown in FIG. 1, the convolution layers have weights w_ij^k and biases b_i^k. The weights w_ij^k represent convolution kernels, and the biases b_i^k are scalars superimposed on the outputs of the convolution layers, where k is a label of the input layer 101, and i and j are labels of the units of the input layer 101 and the hidden layer 102, respectively. For example, the first convolution layer 201 includes a first set of convolution kernels (the w_ij^1 in FIG. 1) and a first set of biases (the b_i^1 in FIG. 1), and the second convolution layer 202 includes a second set of convolution kernels (the w_ij^2 in FIG. 1) and a second set of biases (the b_i^2 in FIG. 1). Typically, each convolution layer includes tens or hundreds of convolution kernels, and a deep convolutional neural network may include at least five convolution layers.
For example, as shown in FIG. 1, the convolutional neural network further includes a first activation layer 203 and a second activation layer 204. The first activation layer 203 is located after the first convolution layer 201, and the second activation layer 204 is located after the second convolution layer 202. The activation layers (e.g., the first activation layer 203 and the second activation layer 204) include an activation function, which is used to introduce nonlinearity into the convolutional neural network so that it can better address more complex problems. The activation function may include a rectified linear unit (ReLU) function, a sigmoid function, a hyperbolic tangent (tanh) function, or the like. The ReLU function is a non-saturating nonlinear function, while the sigmoid and tanh functions are saturating nonlinear functions. For example, an activation layer may stand alone as a layer of the convolutional neural network, or it may be included in a convolution layer (e.g., the first convolution layer 201 may include the first activation layer 203, and the second convolution layer 202 may include the second activation layer 204).
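The named activation functions, in minimal form (illustrative only): ReLU does not saturate for large positive inputs, while sigmoid and tanh flatten out toward their asymptotes.

```python
import numpy as np

def relu(x):
    # Rectified linear unit: max(0, x); non-saturating for x > 0.
    return np.maximum(0.0, x)

def sigmoid(x):
    # S-shaped function; saturates toward 0 and 1 for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
print(relu(x))      # negatives are clipped to 0
print(sigmoid(x))   # extremes approach 0 and 1
print(np.tanh(x))   # extremes approach -1 and 1
```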
For example, in the first convolution layer 201, the convolution kernels w_ij^(1) of the first set of convolution kernels and the biases b_i^(1) of the first set of biases are first applied to each input to obtain the output of the first convolution layer 201; the output of the first convolution layer 201 is then processed by the first activation layer 203 to obtain the output of the first activation layer 203. In the second convolution layer 202, the convolution kernels w_ij^(2) of the second set of convolution kernels and the biases b_i^(2) of the second set of biases are first applied to the input, that is, the output of the first activation layer 203, to obtain the output of the second convolution layer 202; the output of the second convolution layer 202 is then processed by the second activation layer 204 to obtain the output of the second activation layer 204. For example, the output of the first convolution layer 201 may be the result of applying the convolution kernels w_ij^(1) to its input and then adding the biases b_i^(1); the output of the second convolution layer 202 may be the result of applying the convolution kernels w_ij^(2) to the output of the first activation layer 203 and then adding the biases b_i^(2).
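The per-layer computation just described (apply each convolution kernel to the input, add the scalar bias, then apply the activation function) can be sketched as follows. This is a minimal single-channel, stride-1 illustration in NumPy, not the patent's exact network:

```python
import numpy as np

def conv2d_layer(x, kernels, biases):
    """Valid cross-correlation of a single-channel image x of shape (H, W)
    with each kernel in `kernels` (shape (N, kh, kw)), plus a scalar bias
    per kernel, followed by ReLU. Returns N feature images."""
    n, kh, kw = kernels.shape
    h, w = x.shape
    out = np.zeros((n, h - kh + 1, w - kw + 1))
    for i in range(n):
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                out[i, r, c] = np.sum(x[r:r + kh, c:c + kw] * kernels[i]) + biases[i]
    return np.maximum(out, 0.0)  # activation layer (ReLU)

x = np.arange(16.0).reshape(4, 4)
kernels = np.stack([np.eye(3), np.ones((3, 3))])  # 2 kernels -> 2 feature images
biases = np.array([0.0, -1.0])
feat = conv2d_layer(x, kernels, biases)
print(feat.shape)  # (2, 2, 2): the number of feature images equals the number of kernels
```

As the text notes, applying a second such layer to `feat` would again produce one new feature image per kernel of the second layer.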
A convolutional neural network needs to be trained before it is used for image processing. After training, the convolution kernels and biases of the convolutional neural network remain unchanged during image processing. During training, each convolution kernel and bias is adjusted through multiple groups of input/output example images and an optimization algorithm to obtain an optimized convolutional neural network model.
Fig. 2A shows a schematic structural diagram of a convolutional neural network, and fig. 2B shows a schematic working process of a convolutional neural network. For example, as shown in fig. 2A and 2B, after an input image is fed to the convolutional neural network through the input layer, a category label is output after several processing stages (such as each level in fig. 2A) are performed in sequence. The main components of a convolutional neural network may include a plurality of convolution layers, a plurality of downsampling layers, a fully connected layer, and the like. In the present disclosure, it should be understood that each of the plurality of convolution layers, the plurality of downsampling layers, and the fully connected layer refers to a corresponding processing operation, that is, convolution processing, downsampling processing, fully connected processing, and so on; the described neural network likewise refers to the corresponding processing operations, and the instance normalization layer or layer normalization layer described below is similar, so this explanation is not repeated below. For example, a complete convolutional neural network may be formed by stacking these three kinds of layers. For example, fig. 2A shows only three levels of a convolutional neural network, namely a first level, a second level, and a third level. For example, each level may include a convolution module and a downsampling layer. For example, each convolution module may include a convolution layer. Thus, the processing of each level may include: performing convolution and downsampling (sub-sampling) on the input image.
For example, each convolution module may also include an instance normalization (instance normalization) layer or a layer normalization (layer normalization) layer, depending on the actual needs, such that each level of processing may also include an instance normalization process or a layer normalization process.
For example, the instance normalization layer is used for performing instance normalization processing on the feature images output by the convolution layer, so that the gray values of the pixels of each feature image vary within a predetermined range, thereby simplifying image generation and improving the image enhancement effect. For example, the predetermined range may be [-1, 1]. The instance normalization layer performs instance normalization processing on each feature image according to the mean and variance of that feature image. For example, the instance normalization layer may also be used to perform instance normalization processing on a single image.
For example, assume that the mini-batch size of the mini-batch gradient descent method is T, the number of feature images output by a given convolution layer is C, and each feature image is a matrix of H rows and W columns; the shape of the set of feature images is then denoted (T, C, H, W). The instance normalization formula of the instance normalization layer can thus be expressed as follows:

y_tijk = ( x_tijk − μ_ti ) / √( σ_ti² + ε₁ ), where μ_ti = (1/(H·W)) Σ_j Σ_k x_tijk and σ_ti² = (1/(H·W)) Σ_j Σ_k ( x_tijk − μ_ti )²,

where x_tijk is the value in the t-th feature block (patch), the i-th feature image, the j-th row, and the k-th column of the set of feature images output by the convolution layer, y_tijk is the result obtained by processing x_tijk with the instance normalization layer, and ε₁ is a small positive quantity that prevents the denominator from being zero.
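The per-feature-image normalization described here can be sketched numerically as follows (an illustrative NumPy version, with ε₁ as a small constant):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """x has shape (T, C, H, W); normalize each feature image (t, i)
    by its own mean and variance over the H and W dimensions."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(2.0, 3.0, size=(2, 4, 8, 8))
y = instance_norm(x)
# Each (t, i) feature image now has approximately zero mean and unit variance.
print(np.abs(y.mean(axis=(2, 3))).max() < 1e-6)  # True
```

Because the statistics are computed per feature image (not across the batch), the same code also applies when processing a single image, as the text notes.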
For example, the layer normalization layer is similar to the instance normalization layer: it is also used for performing layer normalization processing on the feature images output by the convolution layer, so that the gray values of the pixels of each feature image vary within a predetermined range, thereby simplifying image generation and improving the image enhancement effect. For example, the predetermined range may be [-1, 1]. Unlike the instance normalization layer, the layer normalization layer normalizes each column of each feature image according to the mean and variance of that column, thereby implementing layer normalization processing of the feature image. For example, the layer normalization layer may also be used to perform layer normalization processing on a single image.
For example, still taking the mini-batch gradient descent method above as an example, the shape of the set of feature images is denoted (T, C, H, W). The layer normalization formula of the layer normalization layer can thus be expressed as follows:

y′_tijk = ( x_tijk − μ_tik ) / √( σ_tik² + ε₂ ), where μ_tik = (1/H) Σ_j x_tijk and σ_tik² = (1/H) Σ_j ( x_tijk − μ_tik )²,

where x_tijk is the value in the t-th feature block (patch), the i-th feature image, the j-th row, and the k-th column of the set of feature images output by the convolution layer, y′_tijk is the result obtained by processing x_tijk with the layer normalization layer, and ε₂ is a small positive quantity that prevents the denominator from being zero.
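Following the per-column description above, this variant can be sketched as follows (an illustrative NumPy version; it differs from the instance normalization sketch only in the axis over which statistics are taken):

```python
import numpy as np

def column_layer_norm(x, eps=1e-5):
    """x has shape (T, C, H, W); normalize each column of each feature
    image by that column's mean and variance over the H (row) dimension."""
    mean = x.mean(axis=2, keepdims=True)
    var = x.var(axis=2, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(size=(2, 3, 8, 8))
y = column_layer_norm(x)
# Each column of each feature image now has approximately zero mean.
print(np.abs(y.mean(axis=2)).max() < 1e-6)  # True
```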
The convolutional layer is the core layer of the convolutional neural network. In the convolutional layer of a convolutional neural network, one neuron is connected with only a part of neurons of an adjacent layer. The convolution layer may apply several convolution kernels (also called filters) to the input image to extract various types of features of the input image. Each convolution kernel may extract a type of feature. The convolution kernel is typically initialized in the form of a random decimal matrix, and will learn to obtain reasonable weights during the training process of the convolutional neural network. The result obtained after applying one convolution kernel to the input image is called feature image (feature map), and the number of feature images is equal to the number of convolution kernels. Each feature image is composed of a plurality of neurons in rectangular arrangement, and the neurons of the same feature image share weights, wherein the shared weights are convolution kernels. The feature image output by the convolution layer of one level may be input to the adjacent convolution layer of the next level and processed again to obtain a new feature image. For example, as shown in fig. 2A, a first level of convolution layers may output a first level feature image that is input to a second level of convolution layers for further processing to obtain a second level feature image.
For example, as shown in fig. 2B, the convolution layer may use different convolution kernels to convolve the data of a local receptive field of the input image; the convolution result is input to the activation layer, which computes according to the corresponding activation function to obtain feature information of the input image.
For example, as shown in fig. 2A and 2B, a downsampling layer is provided between adjacent convolution layers; the downsampling layer is one form of downsampling. On the one hand, the downsampling layer can be used to reduce the scale of the input image, simplify the computational complexity, and reduce overfitting to a certain extent; on the other hand, the downsampling layer can also perform feature compression to extract the main features of the input image. The downsampling layer can reduce the size of the feature images without changing their number. For example, an input image of size 12×12 that is sampled by a 6×6 filter yields a 2×2 output image, which means that 36 pixels of the input image are merged into 1 pixel of the output image. The last downsampling layer or convolution layer may be connected to one or more fully connected layers, which connect all of the extracted features. The output of the fully connected layer is a one-dimensional matrix, that is, a vector.
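The 12×12 to 2×2 example above corresponds to non-overlapping 6×6 pooling; a max-pooling sketch of it (illustrative, in NumPy):

```python
import numpy as np

def max_pool(x, k):
    """Non-overlapping k x k max pooling of an (H, W) image; H and W must
    be divisible by k. Each output pixel merges k*k input pixels."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

x = np.arange(144.0).reshape(12, 12)
y = max_pool(x, 6)   # 12x12 -> 2x2: 36 input pixels per output pixel
print(y.shape)       # (2, 2)
```

Average pooling is obtained by replacing `max` with `mean`; the number of feature images is unchanged, only their size shrinks.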
Some embodiments of the present disclosure and examples thereof are described in detail below with reference to the attached drawings.
Fig. 3 is a flowchart of an image processing method according to an embodiment of the present disclosure. For example, as shown in fig. 3, the image processing method includes:
step S110: receiving a first feature image;
step S120: at least one multi-scale cyclic sampling process is performed on the first feature image.
For example, in step S110, the first feature image may include a feature image obtained by processing the input image through one of a convolution layer, a residual network, a dense network, and the like (for example, refer to fig. 2B). For example, the residual network retains a proportion of its input in its output by way of, for example, residual connection addition. For example, a dense network includes a bottleneck layer (bottleneck layer) and a convolution layer; in some examples, the bottleneck layer is used to reduce the dimensionality of the data so as to reduce the number of parameters in subsequent convolution operations, e.g., the bottleneck layer has a 1×1 convolution kernel and the convolution layer has a 3×3 convolution kernel; the present disclosure includes but is not limited to this. For example, the input image is subjected to convolution, downsampling, or the like to obtain the first feature image. Note that this embodiment does not limit the method of acquiring the first feature image. For example, the first feature image may include a plurality of feature images, but is not limited thereto.
For example, the first feature image received in step S110 is input to the multi-scale cyclic sampling process in step S120. For example, the multiscale cyclical sampling process may take a variety of forms including, but not limited to, the three forms shown in fig. 4A-4C, which will be described below.
Fig. 4A is a schematic flow diagram of a multi-scale cyclic sampling process corresponding to the image processing method shown in fig. 3 according to an embodiment of the present disclosure. As shown in fig. 4A, the multi-scale cyclic sampling process includes nested first-level sampling processes and second-level sampling processes.
For example, as shown in fig. 4A, the input of the multi-scale cyclic sampling process is taken as the input of the first-level sampling process, and the output of the first-level sampling process is taken as the output of the multi-scale cyclic sampling process. The output of the multi-scale cyclic sampling process, for example, is referred to as a second feature image, e.g., the size (the number of rows and columns of the pixel array) of the second feature image may be the same as the size of the first feature image.
For example, as shown in fig. 4A, the first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition process, which are sequentially performed. The first downsampling process downsamples the input of the first-level sampling process to obtain a first downsampled output, e.g., the first downsampling process may downsample the input of the first-level sampling process directly to obtain the first downsampled output. The first upsampling process performs upsampling processing based on the first downsampled output to obtain a first upsampled output, e.g., after the first downsampled output has undergone a second level of sampling processing, the upsampling processing is performed to obtain the first upsampled output, i.e., the first upsampling process may indirectly upsample the first downsampled output. The first residual link addition process performs a first residual link addition on the input of the first hierarchical sampling process and the first up-sampled output, and then takes the result of the first residual link addition as the output of the first hierarchical sampling process. For example, the size of the output of the first upsampling process (i.e., the first upsampling output) is the same as the size of the input of the first hierarchical sampling process (i.e., the input of the first downsampling process), such that after the first residual link addition, the size of the output of the first hierarchical sampling process is the same as the size of the input of the first hierarchical sampling process.
For example, as shown in fig. 4A, the second-level sampling process is nested between the first downsampling process and the first upsampling process of the first-level sampling process, the first downsampled output is received as an input to the second-level sampling process, and the output of the second-level sampling process is provided as an input to the first upsampling process, such that the first upsampling process performs the upsampling process based on the first downsampled output.
For example, as shown in fig. 4A, the second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition process, which are performed in sequence. The second downsampling process downsamples the input of the second-level sampling process to obtain a second downsampled output; for example, the second downsampling process may directly downsample the input of the second-level sampling process to obtain the second downsampled output. The second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output; for example, the second upsampling process may directly upsample the second downsampled output to obtain the second upsampled output. The second residual link addition process performs a second residual link addition on the input of the second-level sampling process and the second upsampled output, and then takes the result of the second residual link addition as the output of the second-level sampling process. For example, the size of the output of the second upsampling process (i.e., the second upsampled output) is the same as the size of the input of the second-level sampling process (i.e., the input of the second downsampling process), so that after the second residual link addition, the size of the output of the second-level sampling process is the same as the size of its input.
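As an illustrative sketch (not the patent's actual network), the nesting of the first-level and second-level sampling processes can be written with factor-2 average-pool downsampling, nearest-neighbor upsampling, and residual link additions:

```python
import numpy as np

def down(x):
    # factor-1/2 downsampling via 2x2 average pooling
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    # factor-2 upsampling via nearest-neighbor replication
    return np.kron(x, np.ones((2, 2)))

def second_level(x):
    # second downsampling -> second upsampling -> second residual link addition
    return x + up(down(x))

def first_level(x):
    d1 = down(x)                # first downsampled output
    u1 = up(second_level(d1))   # first upsampling acts on the second level's output
    return x + u1               # first residual link addition

x = np.random.default_rng(1).random((8, 8))  # stands in for the first feature image
y = first_level(x)
print(y.shape == x.shape)  # True: output size matches input size
```

Because each level's upsampling factor matches its downsampling factor, every residual addition combines tensors of identical size, so the output of the whole process has the same size as its input.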
Note that in some embodiments of the present disclosure (not limited to the present embodiment), the flow of sampling processing (e.g., first-level sampling processing, second-level sampling processing, third-level sampling processing to be described in the embodiment shown in fig. 4B, and the like) of each level is similar, and includes downsampling processing, upsampling processing, and residual link addition processing. In addition, taking the feature image as an example, the residual link addition process may include correspondingly adding values of each row and each column of the matrix of the two feature images, but is not limited thereto.
In this disclosure, "nested" refers to one object including another object similar to or identical to the object, including but not limited to a flow or network structure, etc.
It is noted that in some embodiments of the present disclosure, the size of the output of the upsampling process (e.g., the output of the upsampling process is a feature image) in each level is the same as the size of the input of the downsampling process (e.g., the input of the downsampling process is a feature image), so that after the residual link addition, the size of the output of the sampling process (e.g., the output of the sampling process of each level may be a feature image) and the size of the input of the sampling process of each level (e.g., the input of the sampling process of each level may be a feature image) are the same.
It is noted that in some embodiments of the present disclosure, the multiscale cyclic sampling process may be implemented by a convolutional neural network. For example, in some embodiments of the present disclosure, a multi-scale cyclic sampling process may be performed using a first convolutional neural network. For example, in some examples, the first convolutional neural network may include nested first and second meta networks, the first meta network to perform the first hierarchical sampling process and the second meta network to perform the second hierarchical sampling process.
For example, in some examples, the first meta network may include a first sub-network for performing the first downsampling process and a second sub-network for performing the first upsampling process. The second meta network is nested between the first sub-network and the second sub-network of the first meta network. For example, in some examples, the second meta network may include a third sub-network to perform the second downsampling process and a fourth sub-network to perform the second upsampling process. For example, the first and second meta networks are both similar in form to the residual network described previously.
For example, in some examples, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, a dense network, and the like. Specifically, the first sub-network and the third sub-network may include a convolution layer (downsampling layer) having a downsampling function, or may include one of a residual network, a dense network, and the like having a downsampling function; the second sub-network and the fourth sub-network may include a convolution layer (up-sampling layer) having an up-sampling function, or may include one of a residual network, a dense network, etc. having an up-sampling function. It should be noted that the first sub-network and the third sub-network may have the same structure or may have different structures; the second sub-network and the fourth sub-network may have the same structure or may have different structures; embodiments of the present disclosure are not limited in this regard.
Downsampling is used to reduce the size of a feature image and thereby reduce its data amount; for example, the downsampling process may be performed by a downsampling layer, but is not limited thereto. For example, the downsampling layer may implement downsampling using max pooling, average pooling, strided convolution, decimation (e.g., selecting fixed pixels), demuxout (splitting an input image into multiple smaller images), and the like.
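The "splitting an input image into multiple smaller images" form of downsampling mentioned above can be sketched as follows (illustrative; the function name is chosen here):

```python
import numpy as np

def demux(x):
    """Split one (H, W) image into 4 smaller (H/2, W/2) images by taking
    the 4 interleaved pixel grids; no information is discarded."""
    return np.stack([x[0::2, 0::2], x[0::2, 1::2],
                     x[1::2, 0::2], x[1::2, 1::2]])

x = np.arange(16.0).reshape(4, 4)
parts = demux(x)
print(parts.shape)  # (4, 2, 2)
```

Unlike pooling, this operation is lossless: the four small images together contain every pixel of the input, which is why it is sometimes preferred in image-enhancement networks.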
Upsampling is used to increase the size of a feature image and thereby increase its data amount; for example, the upsampling process may be performed by an upsampling layer, but is not limited thereto. For example, the upsampling layer may implement upsampling using strided transposed convolution, an interpolation algorithm, or the like. The interpolation algorithm may include, for example, interpolation, bilinear interpolation, bicubic interpolation (Bicubic Interpolation), etc.
It should be noted that, in some embodiments of the present disclosure, the downsampling factor of the downsampling process of the same hierarchy corresponds to the upsampling factor of the upsampling process, that is: when the downsampling factor of the downsampling process is 1/y, then the upsampling factor of the upsampling process is y, where y is a positive integer, and y is typically greater than 2. Thus, it can be ensured that the output of the upsampling process and the input of the downsampling process at the same level are the same in size.
It should be noted that, in some embodiments of the present disclosure (not limited to the present embodiment), parameters of the downsampling process of different levels (i.e., parameters of a network corresponding to the downsampling process) may be the same or different; the parameters of the up-sampling processes of different levels (i.e., the parameters of the network to which the up-sampling processes correspond) may be the same or different; the parameters of the residual connection addition of different levels may be the same or different. The present disclosure is not limited in this regard.
For example, in some embodiments of the present disclosure (not limited to the present embodiment), in order to improve the global features of brightness, contrast, etc. of the feature image, the multi-scale cyclic sampling process may further include: after the first downsampling process, the first upsampling process, the second downsampling process, and the second upsampling process, an instance normalization process or a layer normalization process is performed on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively. It should be noted that the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output may use the same normalization method (e.g., an example normalization process or a layer normalization process), or may use different normalization methods, which is not limited in this disclosure.
Accordingly, the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network each further include an instance normalization layer or a layer normalization layer; the instance normalization layer is used for performing instance normalization processing, and the layer normalization layer is used for performing layer normalization processing. For example, the instance normalization layer may perform instance normalization processing according to the aforementioned instance normalization formula, and the layer normalization layer may perform layer normalization processing according to the aforementioned layer normalization formula, although this is not limiting of the present disclosure. It should be noted that the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network may include the same kind of normalization layer (an instance normalization layer or a layer normalization layer) or different kinds, which is not limited in this disclosure.
Fig. 4B is a schematic flow diagram of a multi-scale cyclic sampling process corresponding to the image processing method shown in fig. 3 according to another embodiment of the present disclosure. As shown in fig. 4B, this multi-scale cyclic sampling process adds a third-level sampling process to the multi-scale cyclic sampling process shown in fig. 4A. It should be noted that the rest of the flow of the multi-scale cyclic sampling process shown in fig. 4B is substantially the same as that of the multi-scale cyclic sampling process shown in fig. 4A, and the repeated portions are not described again here.
For example, as shown in fig. 4B, a third-level sampling process is nested between a second downsampling process and a second upsampling process of a second-level sampling process, a second downsampled output is received as an input to the third-level sampling process, and an output of the third-level sampling process is provided as an input to the second upsampling process, such that the second upsampling process performs the upsampling process based on the second downsampled output. In this case, similarly to the first upsampling process, the first downsampling output is indirectly upsampled, and the second upsampling process also indirectly upsamples the second downsampling output.
The third-level sampling process includes a third downsampling process, a third upsampling process, and a third residual link addition process, which are performed in sequence. The third downsampling process downsamples the input of the third-level sampling process to obtain a third downsampled output; for example, the third downsampling process may directly downsample the input of the third-level sampling process to obtain the third downsampled output. The third upsampling process performs upsampling based on the third downsampled output to obtain a third upsampled output; for example, the third upsampling process may directly upsample the third downsampled output to obtain the third upsampled output. The third residual link addition process performs a third residual link addition on the input of the third-level sampling process and the third upsampled output, and then takes the result of the third residual link addition as the output of the third-level sampling process. For example, the size of the output of the third upsampling process (i.e., the third upsampled output) is the same as the size of the input of the third-level sampling process (i.e., the input of the third downsampling process), so that after the third residual link addition, the size of the output of the third-level sampling process is the same as the size of its input.
It should be noted that, for more details and implementation (i.e., network structure) of the third-level sampling process, reference may be made to the description of the first-level sampling process and the second-level sampling process in the embodiment shown in fig. 4A, which is not repeated in this disclosure.
It should be noted that, based on the present embodiment, it should be understood by those skilled in the art that the multi-scale cyclic sampling process may further include sampling processes of more layers, for example, a fourth-level sampling process nested in a third-level sampling process, a fifth-level sampling process nested in a fourth-level sampling process, and the like, in a similar manner to the second-level sampling process and the third-level sampling process described above, which is not limited in this disclosure.
Fig. 4C is a schematic flow diagram of a multi-scale cyclic sampling process corresponding to the image processing method shown in fig. 3 according to still another embodiment of the present disclosure. As shown in fig. 4C, this multi-scale cyclic sampling process includes, on the basis of the multi-scale cyclic sampling process shown in fig. 4A, a second-level sampling process that is performed multiple times in sequence. It should be noted that the rest of the flow of the multi-scale cyclic sampling process shown in fig. 4C is substantially the same as that of the multi-scale cyclic sampling process shown in fig. 4A, and the repeated portions are not described again here. It is also noted that the inclusion of two second-level sampling processes in fig. 4C is exemplary; in embodiments of the present disclosure, the multi-scale cyclic sampling process may include two or more sequentially performed second-level sampling processes. The number of second-level sampling processes may be selected according to actual needs, which is not limited by the present disclosure. For example, in some examples, the inventors of the present application found that image enhancement is better with an image processing method having two second-level sampling processes than with one or three, but this should not be considered as limiting the present disclosure.
For example, the first second-level sampling process receives the first downsampled output as its input; each second-level sampling process except the first receives the output of the previous second-level sampling process as its input; and the output of the last second-level sampling process serves as the input of the first upsampling process.
It should be noted that, for more details and implementation of each second-level sampling process, reference may be made to the description of the second-level sampling process in the embodiment shown in fig. 4A, which is not repeated in this disclosure.
It should be noted that, in some embodiments of the present disclosure (not limited to the present embodiment), parameters of the downsampling process of the same level in different orders may be the same or different; the parameters of the up-sampling processing of the same level in different orders can be the same or different; the parameters of the residual connection addition of the same hierarchy in different orders may be the same or different. The present disclosure is not limited in this regard.
It should be noted that, based on the present embodiment, it should be understood by those skilled in the art that, in the multi-scale cyclic sampling process, the first-level sampling process may nest a plurality of sequentially executed second-level sampling processes; further, at least part of the second-level sampling processes may nest one or more third-level sampling processes that are sequentially performed, and the number of the third-level sampling processes that are nested by the at least part of the second-level sampling processes may be the same or different; further, the third level sampling process may nest the fourth level sampling process, and the specific nesting manner may be the same as the manner in which the second level sampling process nests the third level sampling process; and so on.
It should be noted that figs. 4A to 4C illustrate the case where the image processing method provided by the embodiments of the present disclosure includes a single multi-scale cyclic sampling process. In the image processing method provided by the embodiments shown in figs. 4A-4C, the at least one multi-scale cyclic sampling process includes one multi-scale cyclic sampling process, which receives the first feature image as its input; the input of the multi-scale cyclic sampling process serves as the input of the first-level sampling process in the multi-scale cyclic sampling process, the output of the first-level sampling process serves as the output of the multi-scale cyclic sampling process, and the output of the multi-scale cyclic sampling process serves as the output of the at least one multi-scale cyclic sampling process. The present disclosure includes but is not limited to this.
Fig. 4D is a schematic flow diagram of a multi-scale cyclic sampling process corresponding to the image processing method shown in fig. 3 according to another embodiment of the present disclosure. As shown in fig. 4D, in the image processing method provided in this embodiment, the at least one multi-scale cyclic sampling process includes a plurality of sequentially performed multi-scale cyclic sampling processes; for example, it may include two or three sequentially performed multi-scale cyclic sampling processes, but is not limited thereto. In the embodiments of the present disclosure, the number of multi-scale cyclic sampling processes may be selected according to actual needs, and the present disclosure is not limited in this regard. For example, in some examples, the inventors of the present application found that an image processing method with two multi-scale cyclic sampling processes performs image enhancement better than one with a single process or with three processes, but this should not be taken as limiting the present disclosure.
For example, the input of each multi-scale cyclic sampling process is taken as the input of the first-level sampling process in the multi-scale cyclic sampling process, and the output of the first-level sampling process in each multi-scale cyclic sampling process is taken as the output of the multi-scale cyclic sampling process.
For example, as shown in fig. 4D, the first multi-scale cyclic sampling process receives the first feature image as its input, each subsequent multi-scale cyclic sampling process receives the output of the previous multi-scale cyclic sampling process as its input, and the output of the last multi-scale cyclic sampling process serves as the output of the at least one multi-scale cyclic sampling process.
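The chaining of sequential passes can be sketched as a simple loop. `multi_scale_pass` below is a hypothetical stand-in for one complete multi-scale cyclic sampling process; only the wiring (first input, chained intermediate outputs, final output) mirrors the embodiment.

```python
def multi_scale_pass(x):
    # Stand-in for one complete multi-scale cyclic sampling process;
    # a real implementation would be the nested sampling the embodiment describes.
    return [v * 0.5 + 0.5 for v in x]          # placeholder that preserves length

def run_passes(first_feature_image, num_passes):
    out = first_feature_image                  # the first pass receives the first feature image
    for _ in range(num_passes):                # each later pass receives the previous pass's output
        out = multi_scale_pass(out)
    return out                                 # the last pass's output is the overall output

features = [0.0, 1.0, 2.0]
result = run_passes(features, num_passes=2)    # e.g. two sequential passes, per the embodiment
```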
It should be noted that, for more details and implementations of each multi-scale cyclic sampling process, reference may be made to the description of the multi-scale cyclic sampling process in the embodiments shown in fig. 4A-4C, which is not repeated in this disclosure. It should also be noted that the implementation manners (i.e., network structures) and parameters of the multi-scale cyclic sampling processes in different rounds may be the same or different, which is not limited by the present disclosure.
Fig. 5 is a flowchart of an image processing method according to another embodiment of the present disclosure. As shown in fig. 5, the image processing method includes steps S210 to S250. Steps S230 to S240 of the image processing method shown in fig. 5 correspond to steps S110 to S120 of the image processing method shown in fig. 3; that is, the image processing method shown in fig. 5 includes the image processing method shown in fig. 3. Accordingly, for steps S230 to S240, reference may be made to the foregoing description of steps S110 to S120, as well as to the embodiments shown in fig. 4A to 4D. Hereinafter, steps S210 to S250 of the image processing method shown in fig. 5 are described in detail.
Step S210: an input image is acquired.
For example, in step S210, the input image may include a photograph taken by a camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a monitoring camera, a web camera, or the like, and may include a person image, an animal or plant image, a landscape image, or the like, which is not limited in the present disclosure. For example, the quality of the input image is lower than that of a photograph taken by a real digital single-lens reflex camera, i.e., the input image is a low-quality image. For example, in some examples, the input image may include a 3-channel RGB image; in other examples, the input image may include a 3-channel YUV image. Hereinafter, the description takes an input image including an RGB image as an example, but embodiments of the present disclosure are not limited thereto.
Step S220: the input image is converted into a first feature image using an analysis network.
For example, in step S220, the analysis network may be a convolutional neural network including a convolutional layer, a residual network, a dense network, or the like. For example, in some examples, the analysis network may convert the 3-channel RGB image (i.e., the input image) into a plurality of first feature images, e.g., 64 first feature images, but is not limited thereto.
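The channel expansion performed by such an analysis network can be illustrated in its simplest form: a 1×1 convolution, written here as a per-pixel matrix multiplication. The shapes (3 input channels, 64 output feature images, 8×8 pixels) follow the example in the text; the weights are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((3, 8, 8))              # input image: channels x height x width
weights = rng.standard_normal((64, 3))   # 1x1 conv kernel: out_channels x in_channels

# einsum contracts the channel axis: each output channel is a learned mix of R, G, B.
features = np.einsum('oc,chw->ohw', weights, rgb)
assert features.shape == (64, 8, 8)      # 64 first feature images, same spatial size
```

A real analysis network would use larger kernels and possibly residual or dense blocks, as the text notes; this only shows the dimension change from image space to convolutional feature space.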
It should be noted that the embodiments of the present disclosure do not limit the structure and parameters of the analysis network, as long as it can convert the input image into the convolutional feature dimension (i.e., into the first feature image).
Step S230: receiving a first feature image;
step S240: at least one multi-scale cyclic sampling process is performed on the first feature image.
It should be noted that, the steps S230 to S240 may refer to the descriptions of the steps S110 to S120, which are not repeated in the disclosure.
Step S250: the output of the at least one multi-scale cyclic sampling process is converted to an output image using a synthesis network.
For example, in step S250, the synthesis network may be a convolutional neural network including a convolutional layer, a residual network, a dense network, or the like. For example, the output of the at least one multi-scale cyclic sampling process may be referred to as a second feature image, and there may be a plurality of second feature images, but the disclosure is not limited thereto. For example, in some examples, the synthesis network may convert the plurality of second feature images into an output image; for example, the output image may include a 3-channel RGB image, but is not limited thereto.
Fig. 6A is a schematic diagram of an input image, and fig. 6B is a schematic diagram of an output image obtained by processing the input image shown in fig. 6A according to an image processing method (for example, the image processing method shown in fig. 5) according to an embodiment of the disclosure.
For example, as shown in fig. 6A and 6B, compared with the input image, the output image retains the content of the input image while improving its contrast and alleviating the problem of the input image being too dark, so that the quality of the output image can be close to that of a photograph taken by a real digital single-lens reflex camera, i.e., the output image is a high-quality image.
It should be noted that the embodiments of the present disclosure do not limit the structure and parameters of the synthesis network, as long as it can convert the convolutional feature dimension (i.e., the second feature image) into the output image.
The image processing method provided by the embodiments of the present disclosure can perform image enhancement processing on a low-quality input image, and by repeatedly sampling at a plurality of scales it can greatly improve the quality of the output image and achieve higher image fidelity; it is therefore suitable for offline applications with high image-quality requirements, such as batch processing. Specifically, the PSNR of the images output by the image enhancement method proposed in the document by Andrey Ignatov et al. is 20.08, whereas the PSNR of the output images obtained by the image processing method based on the embodiment of fig. 4C of the present disclosure can reach 23.35; that is, the images obtained by the image processing method provided by the embodiments of the present disclosure can be closer to photographs taken by a real digital single-lens reflex camera.
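The PSNR figures above come from the cited comparison; for reference, PSNR itself is computed from the mean squared error against a reference image, as sketched below for 8-bit images (peak value 255).

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher values mean the test image
    # is closer to the reference.
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 128.0)
out = ref + 4.0                 # a uniform error of 4 gray levels
value = psnr(ref, out)          # 10*log10(255^2/16), roughly 36.1 dB
assert 36.0 < value < 36.2
```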
At least one embodiment of the present disclosure also provides a training method for a neural network. Fig. 7A is a schematic structural diagram of a neural network according to an embodiment of the present disclosure, fig. 7B is a flowchart of a training method of the neural network according to an embodiment of the present disclosure, and fig. 7C is a schematic block diagram of a training architecture of the neural network according to an embodiment of the present disclosure, corresponding to the training method shown in fig. 7B.
For example, as shown in fig. 7A, the neural network 300 includes an analysis network 310, a first sub-neural network 320, and a synthesis network 330. For example, the analysis network 310 processes the input image to obtain a first feature image, the first sub-neural network 320 performs at least one multi-scale cyclic sampling process on the first feature image to obtain a second feature image, and the synthesis network 330 processes the second feature image to obtain an output image.
For example, the structure of the analysis network 310 may refer to the description of the analysis network in the aforementioned step S220, which is not limited by the present disclosure; the structure of the first sub-neural network 320 may refer to the description of the implementation of the multi-scale cyclic sampling process in the aforementioned step S120 (i.e., step S240), for example, the first sub-neural network may include, but is not limited to, the aforementioned first convolutional neural network, which is not limited in this disclosure; for example, the synthesis network 330 may refer to the description of the synthesis network in the foregoing step S250, which is not limited by the present disclosure.
For example, the input image and the output image may also refer to the descriptions about the input image and the output image in the image processing method provided in the foregoing embodiment, which are not repeated in this disclosure.
For example, as shown in connection with fig. 7B and 7C, the training method of the neural network includes steps S410 to S460.
Step S410: a training input image is acquired.
For example, similar to the input image in the aforementioned step S210, the training input image may also include a photograph taken by a camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a monitoring camera, a web camera, or the like, and may include a person image, an animal or plant image, a landscape image, or the like, which is not limited in the present disclosure. For example, the quality of the training input image is lower than that of a photograph taken by a real digital single-lens reflex camera, i.e., the training input image is a low-quality image. For example, in some examples, the training input image may include a 3-channel RGB image.
Step S420: the training input image is processed using an analysis network to provide a first training feature image.
For example, similar to the analysis network in the aforementioned step S220, the analysis network 310 may be a convolutional neural network including a convolutional layer, a residual network, a dense network, or the like. For example, in some examples, the analysis network may convert the 3-channel RGB image (i.e., the training input image) into a plurality of first training feature images, e.g., 64 first training feature images, but is not limited thereto.
Step S430: and performing at least one multi-scale cyclic sampling processing on the first training feature image by using the first sub-neural network to obtain a second training feature image.
For example, in step S430, the multi-scale cyclic sampling process may be implemented as the multi-scale cyclic sampling process in any of the embodiments shown in fig. 4A-4D, but is not limited thereto. Hereinafter, the multi-scale cyclic sampling process in step S430 is described as an example of the multi-scale cyclic sampling process shown in fig. 4A.
For example, as shown in fig. 4A, the multi-scale cyclic sampling process nests a first-level sampling process and a second-level sampling process.
For example, as shown in fig. 4A, the input of the multi-scale cyclic sampling process (i.e., the first training feature image) serves as the input of the first-level sampling process, and the output of the first-level sampling process serves as the output of the multi-scale cyclic sampling process (i.e., the second training feature image). For example, the second training feature image may be the same size as the first training feature image.
For example, as shown in fig. 4A, the first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition process, which are sequentially performed. The first downsampling process downsamples the input of the first-level sampling process to obtain a first downsampled output; for example, it may downsample that input directly. The first upsampling process performs upsampling based on the first downsampled output to obtain a first upsampled output; for example, the first downsampled output first undergoes the second-level sampling process and is then upsampled, i.e., the first upsampling process may upsample the first downsampled output indirectly. The first residual link addition process performs a first residual link addition on the input of the first-level sampling process and the first upsampled output, and takes the result of the first residual link addition as the output of the first-level sampling process. For example, the size of the first upsampled output is the same as the size of the input of the first-level sampling process (i.e., the input of the first downsampling process), so that after the first residual link addition, the size of the output of the first-level sampling process is the same as the size of its input.
For example, as shown in fig. 4A, the second-level sampling process is nested between the first downsampling process and the first upsampling process of the first-level sampling process, the first downsampled output is received as an input to the second-level sampling process, and the output of the second-level sampling process is provided as an input to the first upsampling process, such that the first upsampling process performs the upsampling process based on the first downsampled output.
For example, as shown in fig. 4A, the second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition process, which are sequentially performed. The second downsampling process downsamples the input of the second-level sampling process to obtain a second downsampled output; for example, it may downsample that input directly. The second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output; for example, it may upsample the second downsampled output directly. The second residual link addition process performs a second residual link addition on the input of the second-level sampling process and the second upsampled output, and takes the result of the second residual link addition as the output of the second-level sampling process. For example, the size of the second upsampled output is the same as the size of the input of the second-level sampling process (i.e., the input of the second downsampling process), so that after the second residual link addition, the size of the output of the second-level sampling process is the same as the size of its input.
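The two-level structure just described can be traced end to end with placeholder operations. This is a schematic sketch, not the disclosed network: 2×2 average pooling and nearest-neighbor repetition substitute for the learned down/upsampling layers, and the function names are hypothetical.

```python
import numpy as np

def down2(x):
    # Placeholder downsampling: 2x2 average pooling.
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0

def up2(x):
    # Placeholder upsampling: nearest-neighbor by a factor of 2.
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def second_level(x):
    d2 = down2(x)               # second downsampled output
    u2 = up2(d2)                # second upsampled output, same size as x
    return x + u2               # second residual link addition

def first_level(x):
    d1 = down2(x)               # first downsampled output
    nested = second_level(d1)   # second level nested between first down- and upsampling
    u1 = up2(nested)            # first upsampled output, same size as x
    return x + u1               # first residual link addition

feat = np.random.rand(8, 8)     # stand-in for the first training feature image
out = first_level(feat)
assert out.shape == feat.shape  # output of the first-level sampling matches its input size
```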
For example, the first sub-neural network 320 may be implemented as the aforementioned first convolutional neural network, accordingly. For example, the first sub-neural network 320 may include nested first and second sub-networks, the first sub-network for performing the first-level sampling process and the second sub-network for performing the second-level sampling process.
For example, the first sub-network may include a first sub-network for performing the first downsampling process and a second sub-network for performing the first upsampling process, and the second sub-network (i.e., the sub-network for performing the second-level sampling process) is nested between these two sub-networks of the first sub-network. For example, that second sub-network may include a third sub-network for performing the second downsampling process and a fourth sub-network for performing the second upsampling process.
For example, each of the first, second, third, and fourth sub-networks includes a convolutional layer, a residual network, a dense network, or the like. In particular, the first sub-network and the third sub-network may each include a convolutional layer with a downsampling function (downsampling layer), a residual network, a dense network, or the like, and the second sub-network and the fourth sub-network may each include a convolutional layer with an upsampling function (upsampling layer), a residual network, a dense network, or the like. It should be noted that the first sub-network and the third sub-network may have the same or different structures, and the second sub-network and the fourth sub-network may have the same or different structures; the present disclosure is not limited in this regard.
For example, in the embodiments of the present disclosure, in order to improve global features of the feature images such as brightness and contrast, the multi-scale cyclic sampling process may further include: after the first downsampling process, the first upsampling process, the second downsampling process, and the second upsampling process, performing an instance normalization process or a layer normalization process on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively. It should be noted that these four outputs may use the same normalization method (e.g., all instance normalization or all layer normalization) or different normalization methods, which is not limited in the present disclosure.
Accordingly, the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network each further include an instance normalization layer or a layer normalization layer, where the instance normalization layer is used to perform instance normalization processing and the layer normalization layer is used to perform layer normalization processing. For example, the instance normalization layer may perform instance normalization according to the aforementioned instance normalization formula, and the layer normalization layer may perform layer normalization according to the aforementioned layer normalization formula, which is not limiting of the present disclosure. It should be noted that the first, second, third, and fourth sub-networks may include the same type of normalization layer (all instance normalization layers or all layer normalization layers) or different types of normalization layers, which is not limited in the present disclosure.
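The "aforementioned instance normalization formula" is not reproduced in this excerpt; the sketch below uses the standard form of instance normalization (per-sample, per-channel statistics over the spatial dimensions, with the learnable scale and shift omitted), which is the usual reading of that formula.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # Normalize each channel of each sample over its spatial dimensions
    # (standard instance normalization; learnable scale/shift omitted).
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.rand(2, 4, 8, 8)     # batch x channels x height x width
y = instance_norm(x)
# After normalization every (sample, channel) plane has ~zero mean and ~unit variance,
# which equalizes global statistics such as brightness and contrast across feature maps.
assert np.allclose(y.mean(axis=(2, 3)), 0.0, atol=1e-6)
```

Layer normalization differs only in the axes over which the statistics are taken (all channels and spatial positions of one sample rather than one channel at a time).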
It should be noted that, for more implementation and more details of the multi-scale cyclic sampling process in step S430, reference may be made to the description of the multi-scale cyclic sampling process in the foregoing step S120 (i.e. step S240) and the embodiment shown in fig. 4A-4D, which will not be repeated in this disclosure. It should be further noted that, when the multi-scale cyclic sampling process in step S430 is implemented in other forms, the first sub-neural network 320 should be changed accordingly to implement the multi-scale cyclic sampling process in other forms, which will not be described in detail in this disclosure.
For example, in step S430, the number of the second training feature images may be plural, but is not limited thereto.
Step S440: the second training feature image is processed using the synthesis network to obtain a training output image.
For example, similar to the synthesis network in the aforementioned step S250, the synthesis network 330 may be a convolutional neural network including a convolutional layer, a residual network, a dense network, or the like. For example, in some examples, the synthesis network may convert the plurality of second training feature images into a training output image; for example, the training output image may include a 3-channel RGB image, but is not limited thereto.
Step S450: based on the training output image, a loss value of the neural network is calculated by a loss function.
For example, the parameters of the neural network 300 include parameters of the analysis network 310, parameters of the first sub-neural network 320, and parameters of the synthesis network 330. For example, the initial parameter of the neural network 300 may be a random number, e.g., the random number conforms to a gaussian distribution, to which embodiments of the present disclosure are not limited.
For example, the loss function of this embodiment may refer to the loss function in the document by Andrey Ignatov et al.; for example, similar to the loss function in that document, the loss function may include a color loss function, a texture loss function, and a content loss function. Accordingly, the specific procedure for calculating the loss value of the parameters of the neural network 300 through the loss function may also refer to the description in that document. It should be noted that the embodiments of the present disclosure do not limit the specific form of the loss function, i.e., it includes, but is not limited to, the form of the loss function in the above document.
Step S460: and correcting the parameters of the neural network according to the loss value.
For example, the training process of the neural network 300 may further include an optimization function (not shown in fig. 7C); the optimization function may calculate an error value of the parameters of the neural network 300 according to the loss value calculated by the loss function, and correct the parameters of the neural network 300 according to the error value. For example, the optimization function may calculate the error value of the parameters of the neural network 300 using a stochastic gradient descent (SGD) algorithm, a batch gradient descent (BGD) algorithm, or the like.
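The parameter-correction step can be illustrated on a toy scalar model; this is a minimal sketch of gradient descent, not the patent's optimizer. The loss `(w - 3)^2`, its gradient `2*(w - 3)`, and the learning rate are all illustrative choices.

```python
def sgd_step(w, grad, lr=0.1):
    return w - lr * grad           # move the parameter against the gradient

w = 0.0
for _ in range(100):
    grad = 2.0 * (w - 3.0)         # gradient of the toy loss (w - 3)^2 at the current w
    w = sgd_step(w, grad)
assert abs(w - 3.0) < 1e-6         # parameter converges toward the loss minimum
```

In SGD the gradient is estimated from one (or a few) training samples per step, while BGD uses the whole training set; the update rule itself is the same.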
For example, the training method of the neural network may further include: judging whether the training of the neural network satisfies a predetermined condition; if not, repeating the training process (i.e., steps S410 to S460); if so, stopping the training process to obtain the trained neural network. For example, in one example, the predetermined condition is that the loss values corresponding to two (or more) consecutive training output images are no longer significantly reduced. For example, in another example, the predetermined condition is that the number of training iterations or training epochs of the neural network reaches a predetermined number. The present disclosure is not limited in this regard.
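The outer training loop with such a stopping condition can be sketched as follows; `train_step` is a hypothetical stand-in for one round of steps S410 to S460, and the tolerance and iteration cap are illustrative.

```python
def train_until_converged(train_step, max_iters=1000, tol=1e-4):
    # Repeat training until the loss is no longer significantly reduced
    # or a predetermined number of iterations is reached.
    prev_loss = float('inf')
    loss = prev_loss
    for i in range(max_iters):
        loss = train_step()              # one round of steps S410-S460 (stand-in)
        if prev_loss - loss < tol:       # loss no longer significantly reduced
            return i + 1, loss
        prev_loss = loss
    return max_iters, loss

# Simulated per-round losses: the fourth round barely improves, so training stops there.
losses = iter([1.0, 0.5, 0.25, 0.24999])
iters, final = train_until_converged(lambda: next(losses))
```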
For example, the training output image output by the trained neural network 300 retains the content of the training input image, but the quality of the training output image can be close to that of a photograph taken by a real digital single-lens reflex camera, i.e., the training output image is a high-quality image.
It should be noted that the above embodiments are only illustrative of the training process of the neural network. Those skilled in the art will appreciate that, in the training phase, a large number of sample images are required to train the neural network, and the training with each sample image may include multiple repeated iterations to correct the parameters of the neural network. As another example, the training phase may also include fine-tuning the parameters of the neural network to obtain more optimized parameters.
The neural network training method provided by the embodiment of the disclosure can train the neural network adopted in the image processing method of the embodiment of the disclosure, the neural network trained by the training method can carry out image enhancement processing on low-quality input images, and the quality of output images can be greatly improved by repeatedly sampling on a plurality of scales to obtain higher image fidelity, so that the neural network training method is suitable for off-line application such as batch processing with higher image quality requirements.
At least one embodiment of the present disclosure also provides an image processing apparatus. Fig. 8 is a schematic block diagram of an image processing apparatus according to an embodiment of the present disclosure. For example, as shown in fig. 8, the image processing apparatus 500 includes a memory 510 and a processor 520. For example, the memory 510 is used to non-transitory store computer readable instructions that the processor 520 is configured to execute, when executed by the processor 520, the image processing method provided by embodiments of the present disclosure.
For example, the memory 510 and the processor 520 may communicate with each other directly or indirectly. For example, the components of memory 510 and processor 520 may communicate over a network connection. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The network may include a local area network, the internet, a telecommunications network, an internet of things (Internet of Things) based on the internet and/or telecommunications network, any combination of the above, and/or the like. The wired network may use twisted pair, coaxial cable or optical fiber transmission, and the wireless network may use 3G/4G/5G mobile communication network, bluetooth, zigbee or WiFi, for example. The present disclosure is not limited herein with respect to the type and functionality of the network.
For example, the processor 520 may control other components in the image processing apparatus to perform desired functions. The processor 520 may be a central processing unit (CPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another device having data processing and/or program execution capabilities. The central processing unit (CPU) may have an X86 or ARM architecture, etc. The GPU may be integrated directly onto the motherboard or built into the north bridge chip of the motherboard; the GPU may also be built into the central processing unit (CPU).
For example, the memory 510 may comprise any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like.
For example, one or more computer instructions may be stored on memory 510 that may be executed by processor 520 to perform various functions. Various applications and various data, such as training input images, and various data used and/or generated by the applications, may also be stored in the computer readable storage medium.
For example, some of the computer instructions stored by memory 510, when executed by processor 520, may perform one or more steps in accordance with the image processing methods described above. As another example, further computer instructions stored by memory 510 may, when executed by processor 520, perform one or more steps in a method of training a neural network, in accordance with the description above.
For example, the detailed description of the processing procedure of the image processing method may refer to the related description in the embodiment of the image processing method, and the detailed description of the processing procedure of the training method of the neural network may refer to the related description in the embodiment of the training method of the neural network, and the repetition is omitted.
It should be noted that the image processing apparatus provided by the foregoing embodiments of the present disclosure is exemplary rather than limiting; according to practical application requirements, the image processing apparatus may further include other conventional components or structures. For example, to implement the necessary functions of the image processing apparatus, those skilled in the art may provide other conventional components or structures according to the specific application scenario, and the embodiments of the present disclosure are not limited in this respect.
The technical effects of the image processing apparatus provided in the foregoing embodiments of the present disclosure may refer to corresponding descriptions of the image processing method and the training method of the neural network in the foregoing embodiments, which are not repeated herein.
At least one embodiment of the present disclosure also provides a storage medium. Fig. 9 is a schematic diagram of a storage medium according to an embodiment of the disclosure. For example, as shown in fig. 9, the storage medium 600 non-transitory stores computer readable instructions 601, which when the non-transitory computer readable instructions 601 are executed by a computer (including a processor) can execute instructions of an image processing method provided by any of the embodiments of the present disclosure.
For example, one or more computer instructions may be stored on storage medium 600. Some of the computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps in the image processing method described above. Further computer instructions stored on the storage medium may be instructions for implementing one or more steps in the training method of the neural network described above, for example.
For example, the storage medium may include a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), flash memory, any combination of the foregoing storage media, or other suitable storage media.
Technical effects of the storage medium provided by the embodiments of the present disclosure may refer to corresponding descriptions of the image processing method and the training method of the neural network in the above embodiments, which are not described herein again.
Finally, the following points should be noted for the present disclosure:
(1) In the drawings of the embodiments of the present disclosure, only the structures related to the embodiments of the present disclosure are referred to, and other structures may refer to the general design.
(2) Features of the same and different embodiments of the disclosure may be combined with each other without conflict.
The foregoing is merely specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any changes or substitutions that would readily occur to those skilled in the art within the technical scope of the present disclosure shall be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

1. An image processing method, comprising:
receiving a first feature image; and
performing at least one multi-scale cyclic sampling process on the first feature image;
wherein the multi-scale cyclic sampling process comprises a nested first-level sampling process and a second-level sampling process,
the first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition process, wherein the first downsampling process performs downsampling based on the input of the first-level sampling process to obtain a first downsampled output, the first upsampling process performs upsampling based on the first downsampled output to obtain a first upsampled output, and the first residual link addition process performs a first residual link addition on the input of the first-level sampling process and the first upsampled output, and then takes the result of the first residual link addition as the output of the first-level sampling process;
the second-level sampling process is nested between the first downsampling process and the first upsampling process, receives the first downsampled output as an input to the second-level sampling process, and provides an output of the second-level sampling process as an input to the first upsampling process, such that the first upsampling process performs an upsampling process based on the first downsampled output;
the second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition process, wherein the second downsampling process performs downsampling based on the input of the second-level sampling process to obtain a second downsampled output, the second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output, and the second residual link addition process performs a second residual link addition on the input of the second-level sampling process and the second upsampled output, and then takes the result of the second residual link addition as the output of the second-level sampling process.
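To illustrate the nested structure of claim 1, the following is a minimal NumPy sketch. The concrete sampling operators (2x2 average pooling for downsampling, nearest-neighbour enlargement for upsampling) are illustrative assumptions; the claim itself leaves these operators open and, in the patent, they would be learned convolutional layers:

```python
import numpy as np

def downsample(x):
    # assumed 2x2 average pooling (the claim does not fix the operator)
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # assumed nearest-neighbour upsampling back to twice the size
    return x.repeat(2, axis=0).repeat(2, axis=1)

def second_level(x):
    # second downsampling -> second upsampling -> second residual link addition
    down = downsample(x)
    up = upsample(down)
    return x + up  # residual link addition keeps the input's size

def first_level(x):
    # first downsampling, then the nested second-level sampling process,
    # then first upsampling and the first residual link addition
    down = downsample(x)
    nested = second_level(down)  # nested between down- and up-sampling
    up = upsample(nested)
    return x + up  # first residual link addition

feature = np.arange(16.0).reshape(4, 4)  # a stand-in first feature image
out = first_level(feature)
print(out.shape)  # same size as the input, as claim 2 requires
```

Because each residual link addition sums two tensors, the upsampled output must match the size of the corresponding input, which is exactly the size constraint stated in claim 2.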
2. The image processing method according to claim 1, wherein a size of an output of the first upsampling process is the same as a size of an input of the first downsampling process;
the size of the output of the second upsampling process is the same as the size of the input of the second downsampling process.
3. The image processing method according to claim 1 or 2, wherein the multi-scale cyclic sampling process further comprises a third-level sampling process,
the third level sampling process is nested between the second downsampling process and the second upsampling process, receives the second downsampled output as an input to the third level sampling process, and provides an output of the third level sampling process as an input to the second upsampling process, such that the second upsampling process performs an upsampling process based on the second downsampled output;
the third-level sampling process includes a third downsampling process, a third upsampling process, and a third residual link addition process, wherein the third downsampling process performs downsampling based on the input of the third-level sampling process to obtain a third downsampled output, the third upsampling process performs upsampling based on the third downsampled output to obtain a third upsampled output, and the third residual link addition process performs a third residual link addition on the input of the third-level sampling process and the third upsampled output, and then takes the result of the third residual link addition as the output of the third-level sampling process.
4. The image processing method according to claim 1 or 2, wherein the multi-scale cyclic sampling process includes the second-level sampling process being sequentially performed a plurality of times,
the first time the second-level sampling process receives the first downsampled output as an input to the first time the second-level sampling process,
each of the second-level sampling processes except the first of the second-level sampling processes receives the output of the previous second-level sampling process as an input to the current second-level sampling process,
the output of the last second-level sampling process is taken as the input of the first up-sampling process.
5. The image processing method according to claim 1 or 2, wherein the at least one multi-scale cyclic sampling process includes the multi-scale cyclic sampling process performed sequentially a plurality of times,
in each multi-scale cyclic sampling process, the input of the multi-scale cyclic sampling process is used as the input of the first-level sampling process, and the output of the first-level sampling process is used as the output of the multi-scale cyclic sampling process;
The first multi-scale cyclic sampling process receives the first feature image as an input to the first multi-scale cyclic sampling process,
each of the multi-scale cyclic sampling processes except the first of the multi-scale cyclic sampling processes receives as input to the current multi-scale cyclic sampling process the output of the previous multi-scale cyclic sampling process,
the output of the last multi-scale cyclic sampling process is taken as the output of the at least one multi-scale cyclic sampling process.
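The repetitions described in claims 4 and 5 can be sketched as two chained loops: an inner loop repeating the second-level sampling process between the first down- and up-sampling, and an outer loop repeating the whole multi-scale cyclic sampling process. The pooling/enlargement operators below are illustrative assumptions, not the patent's learned layers:

```python
import numpy as np

def downsample(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # assumed 2x2 avg pool

def upsample(x):
    return x.repeat(2, axis=0).repeat(2, axis=1)  # assumed nearest-neighbour

def second_level(x):
    # one second-level sampling process with its residual link addition
    return x + upsample(downsample(x))

def multi_scale(x, inner_repeats=2):
    down = downsample(x)
    # claim 4: the second-level process runs several times in sequence,
    # each repeat consuming the previous repeat's output
    for _ in range(inner_repeats):
        down = second_level(down)
    return x + upsample(down)  # first residual link addition

def cyclic(x, outer_repeats=3, inner_repeats=2):
    # claim 5: the whole multi-scale process is chained, the output of one
    # pass feeding the next; the last pass yields the overall output
    for _ in range(outer_repeats):
        x = multi_scale(x, inner_repeats)
    return x

feature = np.ones((8, 8))
print(cyclic(feature).shape)  # (8, 8): the size is preserved through every pass
```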
6. The image processing method according to claim 1 or 2, wherein the multi-scale cyclic sampling process further includes:
after the first downsampling process, the first upsampling process, the second downsampling process, and the second upsampling process, an instance normalization process or a layer normalization process is performed on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively.
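The instance and layer normalization operations named in claims 6 and 10 differ only in which axes they normalize over. A plain NumPy sketch of both, for a feature tensor of shape (batch, channels, height, width); the epsilon value and the absence of learned affine parameters are simplifying assumptions:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # normalize each channel of each sample over its spatial dimensions
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # normalize each sample over all of its channels and spatial positions
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).random((2, 3, 4, 4))
y = instance_norm(x)
z = layer_norm(x)
print(np.allclose(y.mean(axis=(2, 3)), 0.0))  # True: zero mean per channel
```

Either normalization would be applied to the first/second downsampled and upsampled outputs in turn, as the claim describes.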
7. The image processing method according to claim 1 or 2, further comprising: performing the multi-scale cyclic sampling process using a first convolutional neural network;
wherein the first convolutional neural network comprises:
a first meta-network for performing the first-level sampling process; and
a second meta-network for performing the second-level sampling process.
8. The image processing method according to claim 7, wherein,
the first meta-network comprises:
a first subnetwork for performing the first downsampling process;
a second sub-network for performing the first upsampling process;
the second meta-network includes:
a third sub-network for performing the second downsampling process;
and a fourth sub-network for performing the second upsampling process.
9. The image processing method of claim 8, wherein each of the first, second, third, and fourth sub-networks comprises one of a convolutional layer, a residual network, and a dense network.
10. The image processing method of claim 9, wherein each of the first, second, third and fourth sub-networks includes an instance normalization layer or layer normalization layer,
the instance normalization layer is used for executing instance normalization processing, and the layer normalization layer is used for executing layer normalization processing.
11. The image processing method according to claim 1 or 2, further comprising:
acquiring an input image;
converting the input image into the first feature image using an analysis network; and
the output of the at least one multi-scale cyclic sampling process is converted to an output image using a synthesis network.
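Claim 11's full pipeline (analysis network, multi-scale cyclic sampling, synthesis network) can be sketched with simple stand-ins. Here both the analysis and synthesis stages are modelled as a single 3x3 convolution with an identity kernel, and the sampling stage as one residual down/up pass; all three stand-ins are illustrative assumptions, not the patent's trained networks:

```python
import numpy as np

def conv3x3(image, kernel):
    # zero-padded 3x3 convolution, a stand-in for an analysis/synthesis network
    h, w = image.shape
    padded = np.pad(image, 1)
    return np.array([[np.sum(padded[i:i + 3, j:j + 3] * kernel)
                      for j in range(w)] for i in range(h)])

def multi_scale_pass(x):
    # one multi-scale cyclic sampling pass (assumed avg-pool / nearest-neighbour)
    down = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))
    return x + down.repeat(2, axis=0).repeat(2, axis=1)

kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0  # identity kernel keeps the sketch verifiable

image = np.random.default_rng(1).random((8, 8))   # input image
feature = conv3x3(image, kernel)                  # analysis network -> first feature image
sampled = multi_scale_pass(feature)               # at least one multi-scale cyclic pass
output = conv3x3(sampled, kernel)                 # synthesis network -> output image
print(output.shape)  # (8, 8)
```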
12. A training method of a neural network, wherein,
the neural network includes: an analysis network, a first sub-neural network and a synthesis network,
the analysis network processes an input image to obtain a first feature image, the first sub-neural network performs at least one multi-scale cyclic sampling process on the first feature image to obtain a second feature image, and the synthesis network processes the second feature image to obtain an output image;
the training method comprises the following steps:
acquiring a training input image;
processing the training input image using the analysis network to provide a first training feature image;
performing the at least one multi-scale cyclic sampling process on the first training feature image by using the first sub-neural network to obtain a second training feature image;
processing the second training feature image using the synthesis network to obtain a training output image;
Calculating a loss value of the neural network through a loss function based on the training output image; and
correcting parameters of the neural network according to the loss value;
wherein the multi-scale cyclic sampling process comprises a nested first-level sampling process and a second-level sampling process,
the first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition process that are sequentially performed, wherein the first downsampling process performs downsampling based on the input of the first-level sampling process to obtain a first downsampled output, the first upsampling process performs upsampling based on the first downsampled output to obtain a first upsampled output, and the first residual link addition process performs a first residual link addition on the input of the first-level sampling process and the first upsampled output, and then takes the result of the first residual link addition as the output of the first-level sampling process;
the second-level sampling process is nested between the first downsampling process and the first upsampling process, receives the first downsampled output as an input to the second-level sampling process, and provides an output of the second-level sampling process as an input to the first upsampling process, such that the first upsampling process performs an upsampling process based on the first downsampled output;
the second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition process that are sequentially performed, wherein the second downsampling process performs downsampling based on the input of the second-level sampling process to obtain a second downsampled output, the second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output, and the second residual link addition process performs a second residual link addition on the input of the second-level sampling process and the second upsampled output, and then takes the result of the second residual link addition as the output of the second-level sampling process.
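The training loop of claim 12 (forward pass, loss computation, parameter correction) can be sketched with a toy stand-in: the whole network is reduced to a single learnable gain, and the loss is assumed to be mean squared error (the claim leaves the loss function open). This shows only the train-step structure, not the patent's actual networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(image, p):
    # toy stand-in for analysis + multi-scale sampling + synthesis:
    # a single learnable gain p (purely illustrative)
    return p * image

def mse_loss(output, target):
    # an assumed mean-squared-error loss; the patent does not fix the loss
    return float(np.mean((output - target) ** 2))

image = rng.random((4, 4))   # training input image
target = 2.0 * image         # training target, so the optimum is p = 2

p = 0.5                      # initial parameter
lr = 0.5
for _ in range(100):
    # analytic gradient of the MSE loss with respect to p
    grad = 2.0 * np.mean((forward(image, p) - target) * image)
    p -= lr * grad           # correct the parameter according to the loss value

print(mse_loss(forward(image, p), target) < 1e-8)  # True: loss driven to ~0
```

In the patent's setting the single gain would be replaced by the parameters of the analysis, sampling, and synthesis networks, and the gradient step by backpropagation.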
13. The training method of claim 12, wherein the size of the output of the first upsampling process is the same as the size of the input of the first downsampling process;
the size of the output of the second upsampling process is the same as the size of the input of the second downsampling process.
14. An image processing apparatus comprising:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer readable instructions, wherein the computer readable instructions, when executed by the processor, perform the image processing method according to any one of claims 1-11.
15. A storage medium non-transitorily storing computer readable instructions which, when executed by a computer, can perform the image processing method according to any one of claims 1-11.
CN201910209662.2A 2019-03-19 2019-03-19 Image processing method and device, training method of neural network and storage medium Active CN111724309B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910209662.2A CN111724309B (en) 2019-03-19 2019-03-19 Image processing method and device, training method of neural network and storage medium
PCT/CN2020/077763 WO2020187029A1 (en) 2019-03-19 2020-03-04 Image processing method and device, neural network training method, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910209662.2A CN111724309B (en) 2019-03-19 2019-03-19 Image processing method and device, training method of neural network and storage medium

Publications (2)

Publication Number Publication Date
CN111724309A CN111724309A (en) 2020-09-29
CN111724309B true CN111724309B (en) 2023-07-14

Family

ID=72519587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910209662.2A Active CN111724309B (en) 2019-03-19 2019-03-19 Image processing method and device, training method of neural network and storage medium

Country Status (2)

Country Link
CN (1) CN111724309B (en)
WO (1) WO2020187029A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538287B (en) * 2021-07-29 2024-03-29 广州安思创信息技术有限公司 Video enhancement network training method, video enhancement method and related devices
CN114501012B (en) * 2021-12-31 2024-06-11 浙江大华技术股份有限公司 Image filtering, encoding and decoding methods and related equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303977B2 (en) * 2016-06-28 2019-05-28 Conduent Business Services, Llc System and method for expanding and training convolutional neural networks for large size input images
CA3045333A1 (en) * 2016-12-01 2018-06-07 Berkeley Lights, Inc. Automated detection and repositioning of micro-objects in microfluidic devices
CN107767408B (en) * 2017-11-09 2021-03-12 京东方科技集团股份有限公司 Image processing method, processing device and processing equipment
CN107730474B (en) * 2017-11-09 2022-02-22 京东方科技集团股份有限公司 Image processing method, processing device and processing equipment
CN109360151B (en) * 2018-09-30 2021-03-05 京东方科技集团股份有限公司 Image processing method and system, resolution improving method and readable storage medium

Also Published As

Publication number Publication date
CN111724309A (en) 2020-09-29
WO2020187029A1 (en) 2020-09-24

Similar Documents

Publication Publication Date Title
CN111767979B (en) Training method, image processing method and image processing device for neural network
CN110717851B (en) Image processing method and device, training method of neural network and storage medium
US11461639B2 (en) Image processing method, image processing device, and training method of neural network
CN112634137B (en) Hyperspectral and panchromatic image fusion method for extracting multiscale spatial spectrum features based on AE
WO2020239026A1 (en) Image processing method and device, method for training neural network, and storage medium
JP7438108B2 (en) Image processing method, processing apparatus and processing device
CN109191382B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN112529150B (en) Model structure, model training method, image enhancement method and device
CN113011562B (en) Model training method and device
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
CN113096023B (en) Training method, image processing method and device for neural network and storage medium
CN112257759A (en) Image processing method and device
CN113095470A (en) Neural network training method, image processing method and device, and storage medium
CN109754357B (en) Image processing method, processing device and processing equipment
CN113498521A (en) Text detection method and device and storage medium
CN111724309B (en) Image processing method and device, training method of neural network and storage medium
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN115936992A (en) Garbage image super-resolution method and system of lightweight transform
CN114830168A (en) Image reconstruction method, electronic device, and computer-readable storage medium
CN113076966B (en) Image processing method and device, training method of neural network and storage medium
WO2022133814A1 (en) Omni-scale convolution for convolutional neural networks
CN117576483A (en) Multisource data fusion ground object classification method based on multiscale convolution self-encoder
CN116309226A (en) Image processing method and related equipment thereof
CN113256556A (en) Image selection method and device
WO2022183325A1 (en) Video block processing method and apparatus, neural network training method, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant