CN111724309A - Image processing method and device, neural network training method and storage medium - Google Patents

Image processing method and device, neural network training method and storage medium

Info

Publication number
CN111724309A
Authority
CN
China
Prior art keywords
processing
output
sampling
image
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910209662.2A
Other languages
Chinese (zh)
Other versions
CN111724309B (en)
Inventor
刘瀚文
张丽杰
朱丹
那彦波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd filed Critical BOE Technology Group Co Ltd
Priority to CN201910209662.2A priority Critical patent/CN111724309B/en
Priority to PCT/CN2020/077763 priority patent/WO2020187029A1/en
Publication of CN111724309A publication Critical patent/CN111724309A/en
Application granted granted Critical
Publication of CN111724309B publication Critical patent/CN111724309B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 Computing arrangements based on biological models
                    • G06N 3/02 Neural networks
                        • G06N 3/04 Architecture, e.g. interconnection topology
                        • G06N 3/08 Learning methods
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 5/00 Image enhancement or restoration
                    • G06T 5/60
                • G06T 2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T 2207/20 Special algorithmic details
                        • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
                        • G06T 2207/20081 Training; Learning
                        • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

An image processing method, an image processing apparatus, a neural network training method, and a storage medium. The image processing method includes: receiving a first feature image; and performing multi-scale cyclic sampling on the first feature image at least once. The multi-scale cyclic sampling includes a nested first-level sampling process and second-level sampling process: the first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition performed in sequence; the second-level sampling process is nested between the first downsampling process and the first upsampling process, and includes a second downsampling process, a second upsampling process, and a second residual link addition performed in sequence. The image processing method can perform image enhancement on low-quality input images and, by sampling repeatedly at multiple scales, can greatly improve the quality of the output image and achieve higher image fidelity.

Description

Image processing method and device, neural network training method and storage medium
Technical Field
Embodiments of the present disclosure relate to an image processing method, an image processing apparatus, a training method of a neural network, and a storage medium.
Background
Currently, deep learning techniques based on artificial neural networks have made tremendous progress in areas such as image classification, image capture and search, face recognition, age recognition, and speech recognition. An advantage of deep learning is that very different technical problems can be solved with relatively similar systems built on a generic structure. A convolutional neural network (CNN) is an artificial neural network that has been developed in recent years and has attracted much attention; it is a special image recognition method and a highly effective feed-forward network. The application range of CNNs is no longer limited to image recognition; they can also be applied to face recognition, character recognition, image processing, and other directions.
Disclosure of Invention
At least one embodiment of the present disclosure provides an image processing method, including: receiving a first feature image; and performing at least one multi-scale cyclic sampling process on the first feature image.
The multi-scale cyclic sampling process includes a nested first-level sampling process and second-level sampling process. The first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition: the first downsampling process performs downsampling based on the input of the first-level sampling process to obtain a first downsampled output; the first upsampling process performs upsampling based on the first downsampled output to obtain a first upsampled output; and the first residual link addition adds the input of the first-level sampling process to the first upsampled output, the result of the addition being taken as the output of the first-level sampling process. The second-level sampling process is nested between the first downsampling process and the first upsampling process: it receives the first downsampled output as its input and provides its output as the input of the first upsampling process, so that the first upsampling process performs upsampling based on the first downsampled output. The second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition: the second downsampling process performs downsampling based on the input of the second-level sampling process to obtain a second downsampled output; the second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output; and the second residual link addition adds the input of the second-level sampling process to the second upsampled output, the result of the addition being taken as the output of the second-level sampling process.
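To make the data flow concrete, the following is a minimal sketch of the two-level process in PyTorch; the stride-2 convolutions, the 3×3/4×4 kernel sizes, and the channel count of 32 are illustrative assumptions, not the patent's reference implementation.

    import torch
    import torch.nn as nn

    class SecondLevelSampling(nn.Module):
        """Second level: downsample, upsample, residual link addition."""
        def __init__(self, channels):
            super().__init__()
            # Second downsampling: stride-2 convolution halves height and width.
            self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            # Second upsampling: transposed convolution restores the original size.
            self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)

        def forward(self, x):
            up = self.up(self.down(x))   # second downsampled, then second upsampled output
            return x + up                # second residual link addition

    class FirstLevelSampling(nn.Module):
        """First level: the second level is nested between its down- and upsampling."""
        def __init__(self, channels):
            super().__init__()
            self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            self.nested = SecondLevelSampling(channels)
            self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)

        def forward(self, x):
            down = self.down(x)          # first downsampled output
            mid = self.nested(down)      # second-level sampling at the smaller scale
            up = self.up(mid)            # first upsampled output, same size as x
            return x + up                # first residual link addition

    feat = torch.randn(1, 32, 64, 64)    # a batch of first feature images
    out = FirstLevelSampling(32)(feat)   # same size as the input, as required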
For example, in an image processing method provided by an embodiment of the present disclosure, a size of an output of the first upsampling process is the same as a size of an input of the first downsampling process; the size of the output of the second upsampling process is the same as the size of the input of the second downsampling process.
For example, in the image processing method provided by an embodiment of the present disclosure, the multi-scale cyclic sampling process further includes a third-level sampling process, which is nested between the second downsampling process and the second upsampling process, receives the second downsampled output as its input, and provides its output as the input of the second upsampling process, so that the second upsampling process performs upsampling based on the second downsampled output. The third-level sampling process includes a third downsampling process, a third upsampling process, and a third residual link addition: the third downsampling process performs downsampling based on the input of the third-level sampling process to obtain a third downsampled output; the third upsampling process performs upsampling based on the third downsampled output to obtain a third upsampled output; and the third residual link addition adds the input of the third-level sampling process to the third upsampled output, the result of the addition being taken as the output of the third-level sampling process.
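Since every level repeats the same down/up/residual pattern, arbitrarily deep nesting can be expressed recursively. The sketch below, under the same illustrative PyTorch assumptions as above, builds a three-level variant:

    import torch
    import torch.nn as nn

    class LevelSampling(nn.Module):
        """One sampling level that nests another level until depth is exhausted."""
        def __init__(self, channels, depth):
            super().__init__()
            self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
            # depth == 1 is the innermost level, which has no nested stage.
            self.nested = LevelSampling(channels, depth - 1) if depth > 1 else None

        def forward(self, x):
            down = self.down(x)
            if self.nested is not None:
                down = self.nested(down)  # hand the downsampled output to the next level
            return x + self.up(down)      # residual link addition at this level

    three_level = LevelSampling(32, depth=3)  # first-, second-, and third-level sampling
    out = three_level(torch.randn(1, 32, 64, 64))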
For example, in the image processing method provided by an embodiment of the present disclosure, the multi-scale cyclic sampling process includes the second-level sampling process performed multiple times in sequence: the first second-level sampling process receives the first downsampled output as its input; each subsequent second-level sampling process receives the output of the previous second-level sampling process as its input; and the output of the last second-level sampling process serves as the input of the first upsampling process.
For example, in the image processing method provided by an embodiment of the present disclosure, the at least one multi-scale cyclic sampling process includes multiple multi-scale cyclic sampling processes executed in sequence, where the input of each multi-scale cyclic sampling process serves as the input of its first-level sampling process, and the output of that first-level sampling process serves as the output of the current multi-scale cyclic sampling process. The first multi-scale cyclic sampling process receives the first feature image as its input; each subsequent multi-scale cyclic sampling process receives the output of the previous one as its input; and the output of the last multi-scale cyclic sampling process serves as the output of the at least one multi-scale cyclic sampling process.
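Both forms of repetition compose naturally because every sampling process preserves the size of its input. A hedged sketch, reusing the LevelSampling module from the previous sketch; the repeat counts are illustrative:

    import torch
    import torch.nn as nn

    class FirstLevelWithRepeats(nn.Module):
        """First-level sampling whose nested stage runs the second-level process
        `repeats` times in sequence (a depth-1 LevelSampling is exactly one
        second-level down/up/residual block)."""
        def __init__(self, channels, repeats=2):
            super().__init__()
            self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            self.nested = nn.Sequential(
                *[LevelSampling(channels, depth=1) for _ in range(repeats)])
            self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)

        def forward(self, x):
            return x + self.up(self.nested(self.down(x)))

    # Several multi-scale cyclic sampling passes executed one after another;
    # the output of each pass feeds the next.
    passes = nn.Sequential(*[FirstLevelWithRepeats(32) for _ in range(3)])
    out = passes(torch.randn(1, 32, 64, 64))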
For example, in an image processing method provided in an embodiment of the present disclosure, the multi-scale cyclic sampling process further includes: performing instance normalization or layer normalization on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively, after the first downsampling process, the first upsampling process, the second downsampling process, and the second upsampling process.
For example, the image processing method provided by an embodiment of the present disclosure further includes: performing the multi-scale cyclic sampling process using a first convolutional neural network, where the first convolutional neural network includes a first meta-network for performing the first-level sampling process and a second meta-network for performing the second-level sampling process.
For example, in the image processing method provided by an embodiment of the present disclosure, the first meta-network includes a first sub-network for performing the first downsampling process and a second sub-network for performing the first upsampling process; the second meta-network includes a third sub-network for performing the second downsampling process and a fourth sub-network for performing the second upsampling process.
For example, in an image processing method provided by an embodiment of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, and a dense network.
For example, in an image processing method provided by an embodiment of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes an instance normalization layer for performing an instance normalization process or a layer normalization layer for performing a layer normalization process.
For example, an image processing method provided by an embodiment of the present disclosure further includes: acquiring an input image; converting an input image into the first feature image using an analysis network; and converting the output of the at least one multi-scale cyclic sampling process into an output image using a synthesis network.
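Putting the three stages together, a hedged end-to-end sketch might look as follows; the single-convolution analysis and synthesis networks and all channel counts are assumptions made for illustration, reusing FirstLevelSampling from the earlier sketch:

    import torch
    import torch.nn as nn

    analysis = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())  # image -> first feature image
    sampling = FirstLevelSampling(32)                                    # at least one cyclic sampling pass
    synthesis = nn.Conv2d(32, 3, 3, padding=1)                           # features -> output image

    image = torch.randn(1, 3, 64, 64)              # stand-in for an acquired input image
    output = synthesis(sampling(analysis(image)))  # enhanced output image, same spatial size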
At least one embodiment of the present disclosure further provides a training method of a neural network, where the neural network includes an analysis network, a first sub-neural network, and a synthesis network: the analysis network processes an input image to obtain a first feature image, the first sub-neural network performs at least one multi-scale cyclic sampling process on the first feature image to obtain a second feature image, and the synthesis network processes the second feature image to obtain an output image.
the training method comprises the following steps: acquiring a training input image; processing the training input image using the analysis network to provide a first training feature image; performing the at least one multi-scale cyclic sampling processing on the first training feature image by using the first sub-neural network to obtain a second training feature image; processing the second training feature image using the synthesis network to obtain a training output image; calculating a loss value of the neural network through a loss function based on the training output image; correcting parameters of the neural network according to the loss value;
The multi-scale cyclic sampling process includes a nested first-level sampling process and second-level sampling process. The first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition performed in sequence: the first downsampling process performs downsampling based on the input of the first-level sampling process to obtain a first downsampled output; the first upsampling process performs upsampling based on the first downsampled output to obtain a first upsampled output; and the first residual link addition adds the input of the first-level sampling process to the first upsampled output, the result of the addition being taken as the output of the first-level sampling process. The second-level sampling process is nested between the first downsampling process and the first upsampling process: it receives the first downsampled output as its input and provides its output as the input of the first upsampling process, so that the first upsampling process performs upsampling based on the first downsampled output. The second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition performed in sequence: the second downsampling process performs downsampling based on the input of the second-level sampling process to obtain a second downsampled output; the second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output; and the second residual link addition adds the input of the second-level sampling process to the second upsampled output, the result of the addition being taken as the output of the second-level sampling process.
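A minimal single-step training sketch for this method, assuming PyTorch, an L1 loss against a high-quality target image, and the Adam optimizer; the actual loss function and optimizer are not fixed by this description. The analysis / sampling / synthesis modules are reused from the earlier sketch:

    import torch
    import torch.nn as nn

    net = nn.Sequential(analysis, sampling, synthesis)
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
    loss_fn = nn.L1Loss()  # assumed loss; color/texture/content losses are alternatives

    def train_step(train_input, target):
        optimizer.zero_grad()
        training_output = net(train_input)       # analysis -> cyclic sampling -> synthesis
        loss = loss_fn(training_output, target)  # loss value of the neural network
        loss.backward()
        optimizer.step()                         # correct the parameters from the loss value
        return loss.item()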
For example, in a training method provided by an embodiment of the present disclosure, the size of the output of the first upsampling process is the same as the size of the input of the first downsampling process; the size of the output of the second upsampling process is the same as the size of the input of the second downsampling process.
For example, in the training method provided by an embodiment of the present disclosure, the first sub-neural network includes a first meta-network for performing the first-level sampling process and a second meta-network for performing the second-level sampling process.
For example, in the training method provided by an embodiment of the present disclosure, the first meta-network includes a first sub-network for performing the first downsampling process and a second sub-network for performing the first upsampling process; the second meta-network includes a third sub-network for performing the second downsampling process and a fourth sub-network for performing the second upsampling process.
For example, in a training method provided by an embodiment of the present disclosure, each of the first, second, third, and fourth sub-networks includes one of a convolutional layer, a residual network, and a dense network.
For example, in the training method provided by an embodiment of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes an instance normalization layer or a layer normalization layer; the instance normalization layer is configured to perform instance normalization on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively, and the layer normalization layer is configured to perform layer normalization on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively.
At least one embodiment of the present disclosure also provides an image processing apparatus including: a memory for non-transitory storage of computer readable instructions; and a processor for executing the computer readable instructions, wherein the computer readable instructions, when executed by the processor, perform the image processing method provided by any embodiment of the disclosure.
At least one embodiment of the present disclosure also provides a storage medium that non-transitorily stores computer-readable instructions which, when executed by a computer, can perform the image processing method provided by any embodiment of the present disclosure.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure and are not limiting to the present disclosure.
FIG. 1 is a schematic diagram of a convolutional neural network;
FIG. 2A is a schematic diagram of a convolutional neural network;
FIG. 2B is a schematic diagram of the operation of a convolutional neural network;
FIG. 3 is a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 4A is a schematic flowchart of a multi-scale cyclic sampling process in the image processing method shown in FIG. 3 according to an embodiment of the present disclosure;
FIG. 4B is a schematic flowchart of a multi-scale cyclic sampling process in the image processing method shown in FIG. 3 according to another embodiment of the present disclosure;
FIG. 4C is a schematic flowchart of a multi-scale cyclic sampling process in the image processing method shown in FIG. 3 according to still another embodiment of the present disclosure;
FIG. 4D is a schematic flowchart of a multi-scale cyclic sampling process in the image processing method shown in FIG. 3 according to yet another embodiment of the present disclosure;
FIG. 5 is a flowchart of an image processing method according to another embodiment of the present disclosure;
FIG. 6A is a schematic diagram of an input image;
FIG. 6B is a schematic diagram of an output image obtained by processing the input image shown in FIG. 6A with an image processing method provided by an embodiment of the present disclosure;
FIG. 7A is a schematic structural diagram of a neural network according to an embodiment of the present disclosure;
FIG. 7B is a flowchart of a training method of a neural network according to an embodiment of the present disclosure;
FIG. 7C is a schematic block diagram of an architecture for training the neural network shown in FIG. 7A, corresponding to the training method shown in FIG. 7B, according to an embodiment of the present disclosure;
FIG. 8 is a schematic block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
FIG. 9 is a schematic diagram of a storage medium according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.
Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
The present disclosure is illustrated by the following specific examples. Detailed descriptions of known functions and known components may be omitted in order to keep the following description of the embodiments of the present disclosure clear and concise. When any component of an embodiment of the present disclosure appears in more than one drawing, that component is represented by the same or similar reference numeral in each drawing.
Image enhancement is one of the research hotspots in the field of image processing. Due to the limitations of various physical factors during image acquisition (for example, the small image sensor of a mobile phone camera, and other software and hardware limitations) and the interference of environmental noise, image quality can be greatly reduced. The purpose of image enhancement is to improve the gray-level histogram of an image and increase its contrast, thereby highlighting image detail and improving the visual effect of the image.
The use of deep neural networks for image enhancement is a technology that has emerged with the development of deep learning. For example, based on a convolutional neural network, a low-quality photograph (input image) taken by a mobile phone may be processed to obtain a high-quality output image close in quality to a photograph taken by a digital single-lens reflex camera (DSLR). For example, the peak signal-to-noise ratio (PSNR) is commonly used to measure image quality; a higher PSNR value indicates that an image is closer to a photograph taken by a real DSLR.
For example, Andrey Ignatov et al. have proposed a method for image enhancement using convolutional neural networks; see Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool, "DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks," arXiv:1704.02470v2 [cs.CV], 5 September 2017. This document is hereby incorporated by reference in its entirety as part of the present application. The method mainly uses convolutional layers, batch normalization layers, and residual connections to construct a single-scale convolutional neural network, which can process a low-quality input image (e.g., an image with low contrast, underexposure or overexposure, or an image that is too dark or too bright overall) into a high-quality image. By using color loss, texture loss, and content loss as loss functions during training, a better processing effect can be obtained.
At least one embodiment of the present disclosure provides an image processing method, an image processing apparatus, a training method of a neural network, and a storage medium. The image processing method provides a multi-scale cyclic sampling method based on a convolutional neural network, and the method can greatly improve the quality of an output image by repeatedly sampling on multiple scales to obtain higher image fidelity, and is suitable for offline application such as batch processing with higher requirements on image quality.
Originally, convolutional neural networks (CNNs) were primarily used to recognize two-dimensional shapes, for which they are highly invariant to translation, scaling, tilting, or other forms of image deformation. CNNs simplify the complexity of neural network models and reduce the number of weights mainly through local receptive fields and weight sharing. With the development of deep learning technology, the application range of CNNs is no longer limited to image recognition; they can also be applied to face recognition, character recognition, animal classification, image processing, and other fields.
FIG. 1 shows a schematic diagram of a convolutional neural network. For example, the convolutional neural network may be used for image processing; it uses images as input and output, and replaces scalar weights with convolution kernels. FIG. 1 illustrates only a convolutional neural network with a 3-layer structure, and embodiments of the present disclosure are not limited thereto. As shown in FIG. 1, the convolutional neural network includes an input layer 101, a hidden layer 102, and an output layer 103. The input layer 101 has 4 inputs, the hidden layer 102 has 3 outputs, and the output layer 103 has 2 outputs; the convolutional neural network ultimately outputs 2 images.
For example, the 4 inputs to the input layer 101 may be 4 images, or four feature images of 1 image. The 3 outputs of the hidden layer 102 may be feature images of the image input via the input layer 101.
For example, as shown in FIG. 1, the convolutional layers have weights $w_{ij}^k$ and biases $b_i^k$. The weights $w_{ij}^k$ represent convolution kernels, and the biases $b_i^k$ are scalars superimposed on the outputs of the convolutional layers, where k is a label denoting the layer, and i and j are labels of the units of the input layer 101 and the units of the hidden layer 102, respectively. For example, the first convolutional layer 201 includes a first set of convolution kernels ($w_{ij}^1$ in FIG. 1) and a first set of biases ($b_i^1$ in FIG. 1). The second convolutional layer 202 includes a second set of convolution kernels ($w_{ij}^2$ in FIG. 1) and a second set of biases ($b_i^2$ in FIG. 1). Typically, each convolutional layer includes tens or hundreds of convolution kernels; if the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.
For example, as shown in FIG. 1, the convolutional neural network further includes a first activation layer 203 and a second activation layer 204. The first activation layer 203 is located after the first convolutional layer 201, and the second activation layer 204 is located after the second convolutional layer 202. The activation layers (e.g., the first activation layer 203 and the second activation layer 204) include activation functions, which are used to introduce nonlinear factors into the convolutional neural network so that it can better solve more complex problems. The activation function may include a rectified linear unit (ReLU) function, a sigmoid function, or a hyperbolic tangent (tanh) function, etc. The ReLU function is an unsaturated nonlinear function, while the sigmoid and tanh functions are saturated nonlinear functions. For example, an activation layer may stand alone as a layer of the convolutional neural network, or it may be included in a convolutional layer (e.g., the first convolutional layer 201 may include the first activation layer 203, and the second convolutional layer 202 may include the second activation layer 204).
For example, in the first convolutional layer 201, several convolution kernels $w_{ij}^1$ of the first set and several biases $b_i^1$ of the first set are first applied to each input to obtain the output of the first convolutional layer 201; the output of the first convolutional layer 201 can then be processed by the first activation layer 203 to obtain the output of the first activation layer 203. In the second convolutional layer 202, several convolution kernels $w_{ij}^2$ of the second set and several biases $b_i^2$ of the second set are first applied to the output of the first activation layer 203, which serves as its input, to obtain the output of the second convolutional layer 202; the output of the second convolutional layer 202 can then be processed by the second activation layer 204 to obtain the output of the second activation layer 204. For example, the output of the first convolutional layer 201 may be the result of applying the convolution kernels $w_{ij}^1$ to its input and then adding the biases $b_i^1$, and the output of the second convolutional layer 202 may be the result of applying the convolution kernels $w_{ij}^2$ to the output of the first activation layer 203 and then adding the biases $b_i^2$.
Before image processing is performed by using the convolutional neural network, the convolutional neural network needs to be trained. After training, the convolution kernel and bias of the convolutional neural network remain unchanged during image processing. In the training process, each convolution kernel and bias are adjusted through a plurality of groups of input/output example images and an optimization algorithm to obtain an optimized convolution neural network model.
FIG. 2A shows a schematic structural diagram of a convolutional neural network, and FIG. 2B shows a schematic diagram of the operation of a convolutional neural network. For example, as shown in FIGS. 2A and 2B, after an input image is fed to the convolutional neural network through the input layer, a class identifier is output after several processing stages (e.g., each level in FIG. 2A) are performed in sequence. The main components of the convolutional neural network may include a plurality of convolutional layers, a plurality of downsampling layers, a fully-connected layer, and the like. In the present disclosure, it should be understood that the convolutional layers, downsampling layers, fully-connected layer, and the like each refer to a corresponding processing operation, i.e., convolution, downsampling, fully-connected processing, and so on; the described neural networks likewise refer to corresponding processing operations, as do the instance normalization layers and layer normalization layers described below, and this explanation is not repeated below. For example, a complete convolutional neural network may be composed of a stack of these three kinds of layers. For example, FIG. 2A shows only three levels of a convolutional neural network, namely a first level, a second level, and a third level. For example, each level may include a convolution module and a downsampling layer. For example, each convolution module may include a convolutional layer. Thus, the processing at each level may include: convolving (convolution) and downsampling (sub-sampling/down-sampling) the input image. For example, each convolution module may further include an instance normalization layer or a layer normalization layer according to actual needs, so that the processing at each level may further include instance normalization or layer normalization.
For example, the instance normalization layer is used to perform instance normalization on the feature images output by a convolutional layer, so that the gray values of the pixels of each feature image vary within a predetermined range, which simplifies image generation and improves the image enhancement effect. For example, the predetermined range may be [-1, 1]. The instance normalization layer normalizes each feature image according to that feature image's own mean and variance. For example, the instance normalization layer can also be used to perform instance normalization on a single image.
For example, assuming that the mini-batch size of the mini-batch gradient descent method is T, the number of feature images output by a given convolutional layer is C, and each feature image is a matrix of H rows and W columns, the shape of the set of feature images is denoted (T, C, H, W). The instance normalization formula of the instance normalization layer can then be expressed as follows:

$$y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^2 + \epsilon_1}}, \qquad \mu_{ti} = \frac{1}{HW}\sum_{j=1}^{H}\sum_{k=1}^{W} x_{tijk}, \qquad \sigma_{ti}^2 = \frac{1}{HW}\sum_{j=1}^{H}\sum_{k=1}^{W}\left(x_{tijk} - \mu_{ti}\right)^2$$

where $x_{tijk}$ is the value in the j-th row and k-th column of the i-th feature image of the t-th feature block (patch) in the set of feature images output by the convolutional layer, $y_{tijk}$ is the result of processing $x_{tijk}$ through the instance normalization layer, and $\epsilon_1$ is a small positive number to avoid a zero denominator.
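As a cross-check, the formula above can be implemented directly on a (T, C, H, W) tensor; the sketch below assumes PyTorch, with eps standing in for $\epsilon_1$, and compares against the built-in instance normalization.

    import torch
    import torch.nn.functional as F

    def instance_norm(x, eps=1e-5):
        # x has shape (T, C, H, W); each feature image is normalized with its
        # own mean and variance taken over the H and W dimensions.
        mean = x.mean(dim=(2, 3), keepdim=True)
        var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
        return (x - mean) / torch.sqrt(var + eps)

    x = torch.randn(4, 8, 16, 16)
    assert torch.allclose(instance_norm(x), F.instance_norm(x), atol=1e-5)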
For example, similar to the instance normalization layer, the layer normalization layer is also used to perform layer normalization on the feature images output by a convolutional layer, so that the gray values of the pixels of each feature image vary within a predetermined range, which simplifies image generation and improves the image enhancement effect. For example, the predetermined range may be [-1, 1]. Unlike the instance normalization layer, the layer normalization layer normalizes each column of each feature image according to the mean and variance of that column, thereby implementing layer normalization of the feature image. For example, the layer normalization layer may also be used to perform layer normalization on a single image.
For example, still taking the above mini-batch gradient descent method as an example, the shape of the set of feature images is denoted (T, C, H, W). The layer normalization formula of the layer normalization layer can then be expressed column-wise as follows:

$$y'_{tijk} = \frac{x_{tijk} - \mu_{tik}}{\sqrt{\sigma_{tik}^2 + \epsilon_2}}, \qquad \mu_{tik} = \frac{1}{H}\sum_{j=1}^{H} x_{tijk}, \qquad \sigma_{tik}^2 = \frac{1}{H}\sum_{j=1}^{H}\left(x_{tijk} - \mu_{tik}\right)^2$$

where $x_{tijk}$ is the value in the j-th row and k-th column of the i-th feature image of the t-th feature block (patch) in the set of feature images output by the convolutional layer, $y'_{tijk}$ is the result of processing $x_{tijk}$ through the layer normalization layer, and $\epsilon_2$ is a small positive number to avoid a zero denominator.
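The column-wise variant described here differs from the more common whole-layer normalization; a direct sketch under the same PyTorch assumptions, with eps standing in for $\epsilon_2$:

    import torch

    def columnwise_layer_norm(x, eps=1e-5):
        # x has shape (T, C, H, W); dim=2 is the row dimension, so each column
        # (t, i, :, k) is normalized with its own mean and variance.
        mean = x.mean(dim=2, keepdim=True)
        var = x.var(dim=2, unbiased=False, keepdim=True)
        return (x - mean) / torch.sqrt(var + eps)

    out = columnwise_layer_norm(torch.randn(4, 8, 16, 16))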
Convolutional layers are the core layers of a convolutional neural network. In a convolutional layer, one neuron is connected to only some of the neurons of the adjacent layer. A convolutional layer can apply several convolution kernels (also called filters) to the input image to extract multiple types of features; each convolution kernel can extract one type of feature. A convolution kernel is generally initialized as a matrix of random small values, and during training the kernel learns reasonable weights. The result obtained by applying a convolution kernel to the input image is called a feature image (feature map), and the number of feature images is equal to the number of convolution kernels. Each feature image is composed of a number of neurons arranged in a rectangle, and the neurons of the same feature image share weights, namely the convolution kernel. The feature images output by a convolutional layer at one level can be fed into the adjacent convolutional layer at the next level and processed again to obtain new feature images. For example, as shown in FIG. 2A, a first-level convolutional layer may output first-level feature images, which are fed to a second-level convolutional layer and processed again to obtain second-level feature images.
For example, as shown in FIG. 2B, a convolutional layer may use different convolution kernels to convolve the data of a certain local receptive field of the input image, and the convolution result is fed to the activation layer, which computes according to the corresponding activation function to obtain the feature information of the input image.
For example, as shown in FIGS. 2A and 2B, a downsampling layer is disposed between adjacent convolutional layers; the downsampling layer is one form of downsampling. On the one hand, the downsampling layer can be used to reduce the scale of the input image, simplify the computational complexity, and reduce overfitting to a certain extent; on the other hand, the downsampling layer can perform feature compression to extract the main features of the input image. The downsampling layer can reduce the size of the feature images without changing their number. For example, if an input image of size 12×12 is sampled by a 6×6 filter (with a stride of 6), a 2×2 output image is obtained, which means that every 36 pixels of the input image are merged into 1 pixel of the output image. The last downsampling or convolutional layer may be connected to one or more fully-connected layers, which are used to connect all the extracted features. The output of the fully-connected layer is a one-dimensional matrix, i.e., a vector.
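The 12×12 → 2×2 example corresponds to a 6×6 window moved with stride 6; a quick PyTorch illustration using average pooling (a strided convolution would reduce the size the same way):

    import torch
    import torch.nn as nn

    pool = nn.AvgPool2d(kernel_size=6, stride=6)  # 6x6 window, non-overlapping
    x = torch.randn(1, 1, 12, 12)
    print(pool(x).shape)  # torch.Size([1, 1, 2, 2]): 36 input pixels merge into 1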
Some embodiments of the present disclosure and examples thereof are described in detail below with reference to the accompanying drawings.
Fig. 3 is a flowchart of an image processing method according to an embodiment of the present disclosure. For example, as shown in fig. 3, the image processing method includes:
step S110: receiving a first characteristic image;
step S120: and carrying out multi-scale cyclic sampling processing on the first characteristic image at least once.
For example, in step S110, the first feature image may include a feature image obtained by processing the input image with one of a convolutional layer, a residual network, a dense network, or the like (for example, refer to FIG. 2B). For example, a residual network retains a proportion of its input in its output by means of, e.g., residual link addition. For example, a dense network includes a bottleneck layer and a convolutional layer; in some examples, the bottleneck layer is used to reduce the dimensionality of the data so as to reduce the number of parameters in subsequent convolution operations, e.g., the convolution kernel of the bottleneck layer is a 1×1 convolution kernel and the convolution kernel of the convolutional layer is a 3×3 convolution kernel; the present disclosure includes but is not limited thereto. For example, the input image is subjected to convolution, downsampling, or the like to obtain the first feature image. It should be noted that this embodiment does not limit the manner of acquiring the first feature image. For example, the first feature image may include a plurality of feature images, but is not limited thereto.
For example, the first feature image received in step S110 is input to the multi-scale cyclic sampling process in step S120. For example, the multi-scale cyclic sampling process may take a variety of forms, including but not limited to the three forms shown in FIGS. 4A-4C, which will be described below.
FIG. 4A is a schematic flowchart of a multi-scale cyclic sampling process in the image processing method shown in FIG. 3 according to an embodiment of the present disclosure. As shown in FIG. 4A, the multi-scale cyclic sampling process includes a nested first-level sampling process and second-level sampling process.
For example, as shown in FIG. 4A, the input of the multi-scale cyclic sampling process is fed to the first-level sampling process, and the output of the first-level sampling process serves as the output of the multi-scale cyclic sampling process. For example, the output of the multi-scale cyclic sampling process is referred to as the second feature image; e.g., the size of the second feature image (the number of rows and columns of the pixel array) may be the same as the size of the first feature image.
For example, as shown in FIG. 4A, the first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition, which are performed in sequence. The first downsampling process performs downsampling based on the input of the first-level sampling process to obtain the first downsampled output; e.g., the first downsampling process may directly downsample the input of the first-level sampling process. The first upsampling process performs upsampling based on the first downsampled output to obtain the first upsampled output; e.g., the first downsampled output is first subjected to the second-level sampling process and then upsampled to obtain the first upsampled output, i.e., the first upsampling process may indirectly upsample the first downsampled output. The first residual link addition adds the input of the first-level sampling process to the first upsampled output, and the result of the addition is taken as the output of the first-level sampling process. For example, the size of the output of the first upsampling process (i.e., the first upsampled output) is the same as the size of the input of the first-level sampling process (i.e., the input of the first downsampling process), so that after the first residual link addition, the size of the output of the first-level sampling process is the same as the size of its input.
For example, as shown in fig. 4A, the second-level sampling process is nested between a first downsampling process and a first upsampling process of the first-level sampling process, receives a first downsampled output as an input of the second-level sampling process, and provides an output of the second-level sampling process as an input of the first upsampling process, such that the first upsampling process performs upsampling based on the first downsampled output.
For example, as shown in FIG. 4A, the second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition, which are performed in sequence. The second downsampling process performs downsampling based on the input of the second-level sampling process to obtain the second downsampled output; e.g., the second downsampling process may directly downsample the input of the second-level sampling process. The second upsampling process performs upsampling based on the second downsampled output to obtain the second upsampled output; e.g., the second upsampling process may directly upsample the second downsampled output. The second residual link addition adds the input of the second-level sampling process to the second upsampled output, and the result of the addition is taken as the output of the second-level sampling process. For example, the size of the output of the second upsampling process (i.e., the second upsampled output) is the same as the size of the input of the second-level sampling process (i.e., the input of the second downsampling process), so that after the second residual link addition, the size of the output of the second-level sampling process is the same as the size of its input.
It should be noted that in some embodiments of the present disclosure (not limited to this embodiment), the flow of the sampling process at each level (e.g., the first-level sampling process, the second-level sampling process, and the third-level sampling process to be described in the embodiment shown in FIG. 4B, etc.) is similar, each including a downsampling process, an upsampling process, and a residual link addition. In addition, taking feature images as an example, the residual link addition may include adding the values at corresponding rows and columns of the matrices of the two feature images, but is not limited thereto.
In the present disclosure, "nested" means that one object includes another object similar to or the same as the object, and the object includes but is not limited to a flow or a network structure, etc.
It should be noted that in some embodiments of the present disclosure, the size of the output of the upsampling process at each level (e.g., a feature image) is the same as the size of the input of the downsampling process at that level (e.g., a feature image), so that after the residual link addition, the size of the output of the sampling process at each level is the same as the size of its input.
It should be noted that in some embodiments of the present disclosure, the multi-scale cyclic sampling process may be implemented by a convolutional neural network. For example, in some embodiments of the present disclosure, a first convolutional neural network may be used to perform the multi-scale cyclic sampling process. For example, in some examples, the first convolutional neural network may include a nested first meta-network and second meta-network, the first meta-network being used to perform the first-level sampling process and the second meta-network being used to perform the second-level sampling process.
For example, in some examples, the first meta-network may include a first sub-network for performing the first downsampling process and a second sub-network for performing the first upsampling process; the second meta-network is nested between the first sub-network and the second sub-network of the first meta-network. For example, in some examples, the second meta-network may include a third sub-network for performing the second downsampling process and a fourth sub-network for performing the second upsampling process. For example, the first and second meta-networks are both similar in form to the aforementioned residual network.
For example, in some examples, each of the first, second, third, and fourth sub-networks includes one of a convolutional layer, a residual network, a dense network, and the like. Specifically, the first and third sub-networks may include a convolutional layer with a downsampling function (a downsampling layer), or one of a residual network, a dense network, and the like with a downsampling function; the second and fourth sub-networks may include a convolutional layer with an upsampling function (an upsampling layer), or one of a residual network, a dense network, and the like with an upsampling function. It should be noted that the first and third sub-networks may have the same or different structures, and the second and fourth sub-networks may have the same or different structures; embodiments of the present disclosure are not limited in this regard.
Downsampling is used to reduce the size of a feature image and thus reduce its data amount; the downsampling process may be performed by, e.g., a downsampling layer, but is not limited thereto. For example, the downsampling layer may implement downsampling using methods such as max pooling, average pooling, strided convolution, decimation (e.g., selecting fixed pixels), or demuxout (splitting the input image into a plurality of smaller images).
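Hedged one-line PyTorch examples of the listed downsampling methods, each mapping a (1, 8, 32, 32) tensor to half resolution; the channel count is illustrative:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 8, 32, 32)
    max_pooled = nn.MaxPool2d(2)(x)                        # max pooling
    avg_pooled = nn.AvgPool2d(2)(x)                        # average pooling
    strided = nn.Conv2d(8, 8, 3, stride=2, padding=1)(x)   # strided convolution
    decimated = x[:, :, ::2, ::2]                          # keep fixed pixels
    demuxed = nn.PixelUnshuffle(2)(x)                      # demuxout: 4 smaller images stacked in channels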
Upsampling is used to increase the size of a feature image and thus increase its data amount; the upsampling process may be performed by, e.g., an upsampling layer, but is not limited thereto. For example, the upsampling layer may implement upsampling using methods such as strided transposed convolution or an interpolation algorithm. The interpolation algorithm may include, for example, bilinear interpolation, bicubic interpolation, and the like.
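Corresponding hedged examples of the upsampling methods, each doubling the spatial resolution of a (1, 8, 16, 16) tensor:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(1, 8, 16, 16)
    transposed = nn.ConvTranspose2d(8, 8, 4, stride=2, padding=1)(x)  # strided transposed convolution
    bilinear = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
    bicubic = F.interpolate(x, scale_factor=2, mode="bicubic", align_corners=False)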
It should be noted that in some embodiments of the present disclosure, the downsampling factor of the downsampling process at a given level corresponds to the upsampling factor of the upsampling process at the same level: when the downsampling factor is 1/y, the upsampling factor is y, where y is a positive integer, typically 2 or greater. This ensures that the output of the upsampling process and the input of the downsampling process at the same level have the same size.
It should be noted that, in some embodiments of the present disclosure (not limited to the present embodiment), parameters of downsampling processes at different levels (i.e., parameters of networks corresponding to the downsampling processes) may be the same or different; parameters of upsampling processing of different levels (i.e. parameters of networks corresponding to the upsampling processing) may be the same or different; the parameters for the residual concatenation addition at different levels may be the same or different. The present disclosure is not so limited.
For example, in some embodiments of the present disclosure (not limited to this embodiment), in order to improve global features such as the brightness and contrast of the feature images, the multi-scale cyclic sampling process may further include: performing instance normalization or layer normalization on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively, after the first downsampling process, the first upsampling process, the second downsampling process, and the second upsampling process. It should be noted that the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output may use the same normalization method (instance normalization or layer normalization) or different normalization methods, which is not limited by the present disclosure.
Accordingly, each of the first, second, third, and fourth sub-networks further includes an instance normalization layer for performing instance normalization or a layer normalization layer for performing layer normalization. For example, the instance normalization layer may perform instance normalization according to the instance normalization formula above, and the layer normalization layer may perform layer normalization according to the layer normalization formula above, which is not limited by the present disclosure. It should be noted that the first, second, third, and fourth sub-networks may include the same kind of normalization layer (instance normalization layer or layer normalization layer) or different kinds, which is not limited by the present disclosure.
FIG. 4B is a schematic flowchart of a multi-scale cyclic sampling process in the image processing method shown in FIG. 3 according to another embodiment of the present disclosure. As shown in FIG. 4B, this multi-scale cyclic sampling process adds a third-level sampling process to the multi-scale cyclic sampling process shown in FIG. 4A. It should be noted that the other flows of the multi-scale cyclic sampling process shown in FIG. 4B are substantially the same as those of the multi-scale cyclic sampling process shown in FIG. 4A, and repeated parts are not described here again.
For example, as shown in fig. 4B, the third-level sampling process is nested between the second downsampling process and the second upsampling process of the second-level sampling process, receives the second downsampled output as an input to the third-level sampling process, and provides an output of the third-level sampling process as an input to the second upsampling process, such that the second upsampling process performs an upsampling process based on the second downsampled output. In this case, similarly to the first upsampling process indirectly upsampling the first downsampled output, the second upsampling process indirectly upsamples the second downsampled output.
The third-level sampling process includes a third down-sampling process, a third up-sampling process, and a third residual link addition process, which are sequentially performed. The third downsampling process performs downsampling based on the input of the third-level sampling process to obtain a third downsampled output; for example, the third downsampling process may directly downsample the input of the third-level sampling process to obtain the third downsampled output. The third upsampling process performs upsampling based on the third downsampled output to obtain a third upsampled output; for example, the third upsampling process may directly upsample the third downsampled output to obtain the third upsampled output. The third residual link addition process performs third residual link addition on the input of the third-level sampling process and the third upsampled output, and then takes the result of the third residual link addition as the output of the third-level sampling process. For example, the size of the output of the third upsampling process (i.e., the third upsampled output) is the same as the size of the input of the third-level sampling process (i.e., the input of the third downsampling process), so that after the third residual link addition, the size of the output of the third-level sampling process is the same as the size of the input of the third-level sampling process.
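To make the nesting concrete, the sketch below models one sampling level as a small PyTorch module whose inner level, when present, is executed between the downsampling and upsampling steps; nesting three such modules reproduces the first-, second-, and third-level sampling processes of fig. 4B. The layer types, channel count, and input size are illustrative assumptions only, not the network structure required by the present disclosure.

```python
import torch
import torch.nn as nn

class SamplingLevel(nn.Module):
    # One level of the multi-scale cyclic sampling process: downsampling,
    # an optional nested lower level, upsampling, and residual link addition.
    def __init__(self, channels=64, inner=None):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
            nn.InstanceNorm2d(channels), nn.ReLU(),
        )
        self.inner = inner  # nested lower-level sampling process, or None
        self.up = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),
            nn.InstanceNorm2d(channels), nn.ReLU(),
        )

    def forward(self, x):
        y = self.down(x)           # downsampled output
        if self.inner is not None:
            y = self.inner(y)      # nested level preserves the spatial size
        y = self.up(y)             # upsampled output, same size as x
        return x + y               # residual link addition

# Three nested levels, as in fig. 4B (spatial size assumed divisible by 8):
third = SamplingLevel()
second = SamplingLevel(inner=third)
first = SamplingLevel(inner=second)
out = first(torch.randn(1, 64, 64, 64))  # output size equals input size
```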
It should be noted that more details and implementation manners (i.e., network structures) of the third-level sampling process may refer to the descriptions about the first-level sampling process and the second-level sampling process in the embodiment shown in fig. 4A, and details of this disclosure are not repeated herein.
It should be noted that, based on the present embodiment, a person skilled in the art should understand that the multi-scale cyclic sampling process may further include more levels of sampling processes, for example, a fourth level sampling process nested in a third level sampling process, a fifth level sampling process nested in a fourth level sampling process, and the like, and the nesting manner of the sampling processes is similar to that of the second level sampling process and the third level sampling process described above, which is not limited by the present disclosure.
Fig. 4C is a schematic flow chart diagram of a multi-scale cyclic sampling process in the image processing method shown in fig. 3 according to still another embodiment of the present disclosure. As shown in fig. 4C, the multi-scale cyclic sampling process includes a second-level sampling process performed a plurality of times in sequence, on the basis of the multi-scale cyclic sampling process shown in fig. 4A. It should be noted that the other flows of the multi-scale cyclic sampling process shown in fig. 4C are substantially the same as those of the multi-scale cyclic sampling process shown in fig. 4A, and repeated parts are not described herein again. It should be further noted that the two second-level sampling processes shown in fig. 4C are exemplary; in the embodiments of the present disclosure, the multi-scale cyclic sampling process may include two or more second-level sampling processes performed sequentially, and the number of second-level sampling processes may be selected according to actual needs, which is not limited by the present disclosure. For example, in some examples, the inventors of the present application found that image enhancement performed using an image processing method having two second-level sampling processes is more effective than using an image processing method having one or three second-level sampling processes, but this should not be construed as a limitation of the present disclosure.
For example, the first second-level sampling process receives the first downsampled output as its input; each second-level sampling process other than the first receives the output of the previous second-level sampling process as its input; and the output of the last second-level sampling process serves as the input of the first upsampling process.
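Continuing the SamplingLevel sketch above, the wiring of fig. 4C, in which the first-level sampling process nests two second-level sampling processes executed in sequence, might be expressed as follows; the choice of two repetitions follows the inventors' preferred example and is an assumption of the sketch, not a limitation.

```python
import torch.nn as nn

# SamplingLevel is the module defined in the sketch above. nn.Sequential
# chains the two second-level processes so that each one receives the
# previous one's output, and the result feeds the first upsampling process.
inner = nn.Sequential(SamplingLevel(), SamplingLevel())
first_level = SamplingLevel(inner=inner)
```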
It should be noted that, for more details and implementation of the second-level sampling processing each time, reference may be made to the description about the second-level sampling processing in the embodiment shown in fig. 4A, and details of this disclosure are not repeated here.
It should be noted that, in some embodiments of the present disclosure (not limited to the present embodiment), for sampling processes of the same level executed at different positions in the sequence, the parameters of the downsampling processes may be the same or different, the parameters of the upsampling processes may be the same or different, and the parameters of the residual link additions may be the same or different. The present disclosure is not limited in this regard.
It should be noted that, based on the present embodiment, a person skilled in the art should understand that, in the multi-scale cyclic sampling process, a first-level sampling process may nest a plurality of second-level sampling processes executed in sequence; further, at least some of the second-level sampling processes may nest one or more third-level sampling processes executed in sequence, and the numbers of third-level sampling processes nested by those second-level sampling processes may be the same or different; further, a third-level sampling process may nest a fourth-level sampling process in the same manner as a second-level sampling process nests a third-level sampling process; and so on.
It should be noted that figs. 4A to 4C illustrate the case where the image processing method provided by the embodiments of the present disclosure includes one multi-scale cyclic sampling process, that is, the at least one multi-scale cyclic sampling process includes exactly one multi-scale cyclic sampling process. In this case, the multi-scale cyclic sampling process receives the first feature image as its input, the input of the multi-scale cyclic sampling process serves as the input of the first-level sampling process therein, the output of the first-level sampling process serves as the output of the multi-scale cyclic sampling process, and the output of the multi-scale cyclic sampling process serves as the output of the at least one multi-scale cyclic sampling process. The present disclosure includes but is not limited to this case.
Fig. 4D is a schematic flow chart diagram corresponding to the multi-scale cyclic sampling process in the image processing method shown in fig. 3 according to yet another embodiment of the present disclosure. As shown in fig. 4D, in the image processing method provided in this embodiment, the at least one multi-scale cyclic sampling process includes a plurality of multi-scale cyclic sampling processes that are sequentially executed; for example, the at least one multi-scale cyclic sampling process may include two or three multi-scale cyclic sampling processes that are sequentially executed, but is not limited thereto. It should be noted that, in the embodiments of the present disclosure, the number of multi-scale cyclic sampling processes may be selected according to actual needs, which is not limited by the present disclosure. For example, in some examples, the inventors of the present application found that image enhancement performed using an image processing method with two multi-scale cyclic sampling processes is more effective than using an image processing method with one or three multi-scale cyclic sampling processes, but this should not be construed as a limitation of the present disclosure.
For example, the input of each multi-scale cyclic sampling process is used as the input of the first-level sampling process in the current multi-scale cyclic sampling process, and the output of the first-level sampling process in each multi-scale cyclic sampling process is used as the output of the current multi-scale cyclic sampling process.
For example, as shown in fig. 4D, the first multi-scale cyclic sampling process receives the first feature image as its input; each multi-scale cyclic sampling process except the first receives the output of the previous multi-scale cyclic sampling process as its input; and the output of the last multi-scale cyclic sampling process serves as the output of the at least one multi-scale cyclic sampling process.
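In the same spirit, the chaining of fig. 4D, where several complete multi-scale cyclic sampling processes run in sequence, might be sketched as follows, again reusing the SamplingLevel module from the earlier sketch; the use of two processes and the tensor shape are assumptions of the example.

```python
import torch

# Each multi-scale cyclic sampling process is modeled as a two-level
# SamplingLevel stack; the output of one process feeds the next one.
processes = [SamplingLevel(inner=SamplingLevel()) for _ in range(2)]
features = torch.randn(1, 64, 64, 64)  # stand-in for the first feature image
for p in processes:
    features = p(features)             # output of the last process is the final output
```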
It should be noted that, for more details and implementations of each multi-scale cyclic sampling process, reference may be made to the description of the multi-scale cyclic sampling process in the embodiments shown in figs. 4A to 4C, and details are not repeated here. It should be further noted that the implementation manner (i.e., network structure) and parameters of the multi-scale cyclic sampling processes at different positions in the sequence may be the same or different, which is not limited by the present disclosure.
Fig. 5 is a flowchart of an image processing method according to another embodiment of the present disclosure. As shown in fig. 5, the image processing method includes steps S210 to S250. It should be noted that steps S230 to S240 of the image processing method shown in fig. 5 correspond to steps S110 to S120 of the image processing method shown in fig. 3; that is, the image processing method shown in fig. 5 includes the image processing method shown in fig. 3. Therefore, for steps S230 to S240 of the image processing method shown in fig. 5, reference may be made to the description of steps S110 to S120 of the image processing method shown in fig. 3, as well as to the embodiments shown in figs. 4A to 4D. Hereinafter, steps S210 to S250 of the image processing method shown in fig. 5 will be described in detail.
Step S210: an input image is acquired.
For example, in step S210, the input image may include a photo captured by a camera of a smartphone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a monitoring camera, a web camera, or the like, and may include a person image, an animal image, a landscape image, or the like, which is not limited by the present disclosure. For example, the quality of the input image is lower than that of a photograph taken by a real digital single-lens reflex camera, i.e., the input image is a low-quality image. For example, in some examples, the input image may include a 3-channel RGB image; in other examples, the input image may include a 3-channel YUV image. Hereinafter, the input image is described by taking a 3-channel RGB image as an example, but embodiments of the present disclosure are not limited thereto.
Step S220: an input image is converted into a first feature image using an analysis network.
For example, in step S220, the analysis network may be a convolutional neural network including one of a convolutional layer, a residual network, a dense network, and the like. For example, in some examples, the analysis network may convert a 3-channel RGB image (i.e., the input image) into a plurality of first feature images, such as 64 first feature images; the present disclosure includes but is not limited to this case.
It should be noted that the structure and parameters of the analysis network are not limited by the embodiments of the present disclosure, as long as the input image can be converted into the convolution feature dimension (i.e., into the first feature image).
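As one possibility consistent with the description above, an analysis network reduced to a single convolutional layer might look as follows; the 64-channel width and the kernel size are assumptions of the sketch, and a residual or dense network would equally satisfy step S220.

```python
import torch.nn as nn

# Maps a 3-channel RGB input image to 64 first feature images of the
# same spatial size (step S220).
analysis_net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
)
```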
Step S230: receiving a first feature image;
Step S240: performing multi-scale cyclic sampling processing on the first feature image at least once.
It should be noted that, for steps S230 to S240, reference may be made to the foregoing description of steps S110 to S120, and details are not repeated herein.
Step S250: converting an output of the at least one multi-scale cyclic sampling process into an output image using a synthesis network.
For example, in step S250, the synthesis network may be a convolutional neural network including one of a convolutional layer, a residual network, a dense network, and the like. For example, the output of the at least one multi-scale cyclic sampling process may be referred to as a second feature image. For example, the number of second feature images may be plural, but is not limited thereto. For example, in some examples, the synthesis network may convert the plurality of second feature images into an output image, e.g., the output image may include a 3-channel RGB image; the present disclosure includes but is not limited to this case.
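Mirroring the analysis-network sketch above, a minimal synthesis network might be a single convolution mapping the second feature images back to an RGB image; again, this is an assumption of the sketch rather than the structure required by the present disclosure.

```python
import torch.nn as nn

# Maps the 64 second feature images back to a 3-channel RGB output
# image of the same spatial size (step S250).
synthesis_net = nn.Conv2d(64, 3, kernel_size=3, padding=1)
```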
Fig. 6A is a schematic diagram of an input image, and fig. 6B is a schematic diagram of an output image obtained by processing the input image shown in fig. 6A according to an image processing method (e.g., the image processing method shown in fig. 5) provided by an embodiment of the disclosure.
For example, as shown in figs. 6A and 6B, the output image retains the content of the input image, but compared with the input image, the contrast is improved and the over-dark appearance of the input image is corrected, so that the quality of the output image can be close to that of a photograph taken by a real digital single-lens reflex camera; that is, the output image is a high-quality image.
It should be noted that the embodiments of the present disclosure do not limit the structure and parameters of the synthesis network, as long as it can convert the convolution feature dimension (i.e., the second feature image) into an output image.
The image processing method provided by the embodiments of the present disclosure can perform image enhancement processing on a low-quality input image and, by repeatedly sampling on multiple scales, can greatly improve the quality of the output image to obtain higher image fidelity; it is therefore suitable for offline applications, such as batch processing, with higher requirements on image quality. Specifically, the PSNR of an image output by the image enhancement method proposed in the paper by Andrey Ignatov et al. is 20.08, whereas the PSNR of an output image obtained by the image processing method provided in the embodiment shown in fig. 4C of the present disclosure may reach 23.35; that is, an image obtained by the image processing method provided in the embodiments of the present disclosure may be closer to a photograph taken by a real digital single-lens reflex camera.
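The PSNR values quoted above follow the standard definition, which can be computed as below; the sketch assumes pixel values normalized to the range [0, 1].

```python
import torch

def psnr(output, reference, max_val=1.0):
    # Peak signal-to-noise ratio in dB; higher values indicate the output
    # is closer to the reference (e.g., a DSLR photograph).
    mse = torch.mean((output - reference) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```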
At least one embodiment of the present disclosure further provides a training method of a neural network. Fig. 7A is a schematic structural diagram of a neural network according to an embodiment of the present disclosure, fig. 7B is a flowchart of a training method of the neural network according to an embodiment of the present disclosure, and fig. 7C is a schematic structural block diagram of the neural network shown in fig. 7A trained according to the training method shown in fig. 7B according to an embodiment of the present disclosure.
For example, as shown in fig. 7A, the neural network 300 includes an analysis network 310, a first sub-neural network 320, and a synthesis network 330. For example, the analysis network 310 processes the input image to obtain a first feature image, the first sub-neural network 320 performs at least one multi-scale cyclic sampling process on the first feature image to obtain a second feature image, and the synthesis network 330 processes the second feature image to obtain an output image.
For example, the structure of the analysis network 310 may refer to the description of the analysis network in the foregoing step S220, which is not limited by the present disclosure; the structure of the first sub-neural network 320 may refer to the description of the implementation of the multi-scale cyclic sampling process in the foregoing step S120 (i.e., step S240), e.g., the first sub-neural network may include, but is not limited to, the foregoing first convolutional neural network, which is not limited by the present disclosure; and the structure of the synthesis network 330 may refer to the description of the synthesis network in the foregoing step S250, which is not limited by the present disclosure.
For example, the description about the input image and the output image in the image processing method provided in the foregoing embodiment may also be referred to for the input image and the output image, and details of this disclosure are not repeated here.
For example, as shown in fig. 7B and 7C, the training method of the neural network includes steps S410 to S460.
Step S410: a training input image is acquired.
For example, similar to the input image in the foregoing step S210, the training input image may also include a photo captured by a camera of a smartphone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a monitoring camera, a web camera, or the like, which may include a human image, an animal image, a plant image, a landscape image, or the like, which is not limited by the present disclosure. For example, the quality of the training input image is lower than the quality of a photograph taken by a real digital single-lens reflex camera, i.e. the training input image is a low quality image. For example, in some examples, the training input image may include 3 channels of RGB images.
Step S420: the training input image is processed using an analysis network to provide a first training feature image.
For example, similar to the analysis network in the aforementioned step S220, the analysis network 310 may be a convolutional neural network including one of convolutional layers, residual networks, dense networks, and the like. For example, in some examples, the analysis network may convert 3 channels of RGB images (i.e., training input images) into a plurality of first training feature images, such as 64 first training feature images, including but not limited to this disclosure.
Step S430: and carrying out multi-scale cyclic sampling processing on the first training feature image at least once by using the first sub-neural network to obtain a second training feature image.
For example, in step S430, the multi-scale cyclic sampling process may be implemented as the multi-scale cyclic sampling process in any of the embodiments shown in fig. 4A-4D, but is not limited thereto. The following description will be given taking as an example the implementation of the multi-scale cyclic sampling process in step S430 as the multi-scale cyclic sampling process shown in fig. 4A.
For example, as shown in fig. 4A, the multi-scale cyclic sampling process nests a first level sampling process and a second level sampling process.
For example, as shown in fig. 4A, the input of the multi-scale cyclic sampling process (i.e., the first training feature image) is used as the input of the first-level sampling process, and the output of the first-level sampling process is used as the output of the multi-scale cyclic sampling process (i.e., the second training feature image). For example, the size of the second training feature image may be the same as the size of the first training feature image.
For example, as shown in fig. 4A, the first-level sampling process includes a first downsampling process, a first upsampling process, and a first residual link addition process, which are sequentially performed. The first downsampling process performs downsampling based on the input of the first-level sampling process to obtain a first downsampled output; for example, the first downsampling process may directly downsample the input of the first-level sampling process to obtain the first downsampled output. The first upsampling process performs upsampling based on the first downsampled output to obtain a first upsampled output; for example, after the first downsampled output is subjected to the second-level sampling process, upsampling is performed to obtain the first upsampled output, that is, the first upsampling process may indirectly upsample the first downsampled output. The first residual link addition process performs first residual link addition on the input of the first-level sampling process and the first upsampled output, and then takes the result of the first residual link addition as the output of the first-level sampling process. For example, the size of the output of the first upsampling process (i.e., the first upsampled output) is the same as the size of the input of the first-level sampling process (i.e., the input of the first downsampling process), so that after the first residual link addition, the size of the output of the first-level sampling process is the same as the size of the input of the first-level sampling process.
For example, as shown in fig. 4A, the second-level sampling process is nested between a first downsampling process and a first upsampling process of the first-level sampling process, receives a first downsampled output as an input of the second-level sampling process, and provides an output of the second-level sampling process as an input of the first upsampling process, such that the first upsampling process performs upsampling based on the first downsampled output.
For example, as shown in fig. 4A, the second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual link addition process, which are sequentially performed. The second downsampling process performs downsampling based on the input of the second-level sampling process to obtain a second downsampled output; for example, the second downsampling process may directly downsample the input of the second-level sampling process to obtain the second downsampled output. The second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output; for example, the second upsampling process may directly upsample the second downsampled output to obtain the second upsampled output. The second residual link addition process performs second residual link addition on the input of the second-level sampling process and the second upsampled output, and then takes the result of the second residual link addition as the output of the second-level sampling process. For example, the size of the output of the second upsampling process (i.e., the second upsampled output) is the same as the size of the input of the second-level sampling process (i.e., the input of the second downsampling process), so that after the second residual link addition, the size of the output of the second-level sampling process is the same as the size of the input of the second-level sampling process.
For example, the first sub-neural network 320 may accordingly be implemented as the aforementioned first convolutional neural network. For example, the first sub-neural network 320 may include a nested first metanetwork and second metanetwork, the first metanetwork being used to perform the first-level sampling process and the second metanetwork being used to perform the second-level sampling process.
For example, the first metanetwork may include a first sub-network for performing the first downsampling process and a second sub-network for performing the first upsampling process. The second metanetwork is nested between the first sub-network and the second sub-network of the first metanetwork. For example, the second metanetwork may include a third sub-network for performing the second downsampling process and a fourth sub-network for performing the second upsampling process.
For example, each of the first, second, third, and fourth subnetworks comprises one of a convolutional layer, a residual network, a dense network, and the like. Specifically, the first sub-network and the third sub-network may include one of a convolution layer (down-sampling layer) having a down-sampling function, a residual network, a dense network, and the like; the second sub-network and the fourth sub-network may comprise one of a convolutional layer (upsampling layer) having an upsampling function, a residual network, a dense network, and the like. It should be noted that the first sub-network and the third sub-network may have the same structure or different structures; the second sub-network and the fourth sub-network may have the same structure or may have different structures; the present disclosure is not so limited.
For example, in an embodiment of the present disclosure, in order to improve global features such as brightness and contrast of the feature image, the multi-scale cyclic sampling process may further include: after the first down-sampling process, the first up-sampling process, the second down-sampling process, and the second up-sampling process, an instance normalization process or a layer normalization process is performed on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively. It should be noted that the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output may use the same normalization method (instance normalization or layer normalization) or different normalization methods, which is not limited by the present disclosure.
Accordingly, the first, second, third, and fourth sub-networks each further comprise an instance normalization layer for performing the instance normalization process or a layer normalization layer for performing the layer normalization process. For example, the instance normalization layer may perform instance normalization according to an instance normalization formula, and the layer normalization layer may perform layer normalization according to a layer normalization formula, which is not limited by the present disclosure. It should be noted that the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network may include the same type of normalization layer (instance normalization layer or layer normalization layer) or different types, which is not limited by the present disclosure.
It should be noted that, for more implementation and more details of the multi-scale cyclic sampling processing in step S430, reference may be made to the description of the multi-scale cyclic sampling processing in step S120 (i.e., step S240) and the embodiments shown in fig. 4A to 4D, and details of this disclosure are not repeated here. It should be further noted that, when the multi-scale cyclic sampling processing in step S430 is implemented in other forms, the first sub-neural network 320 should be correspondingly changed to implement the multi-scale cyclic sampling processing in other forms, which is not described in detail herein.
For example, in step S430, the number of the second training feature images may be multiple, but is not limited thereto.
Step S440: and processing the second training feature image by using the synthesis network to obtain a training output image.
For example, similar to the synthesis network in the aforementioned step S250, the synthesis network 330 may be a convolutional neural network including one of a convolutional layer, a residual network, a dense network, and the like. For example, in some examples, the synthesis network may convert the plurality of second training feature images into a training output image, which may include, for example, a 3-channel RGB image; the present disclosure includes but is not limited to this case.
Step S450: based on the training output image, a loss value of the neural network is calculated by a loss function.
For example, the parameters of the neural network 300 include parameters of the analysis network 310, parameters of the first sub-neural network 320, and parameters of the synthesis network 330. For example, the initial parameter of the neural network 300 may be a random number, e.g., the random number conforms to a gaussian distribution, which is not limited by the embodiments of the present disclosure.
For example, the loss function of this embodiment may refer to the loss function in the paper by Andrey Ignatov et al. For example, similar to the loss function in that paper, the loss function may include a color loss function, a texture loss function, and a content loss function; accordingly, the specific process of calculating the loss value of the parameters of the neural network 300 through the loss function may also refer to the description in that paper. It should be noted that the embodiments of the present disclosure do not limit the specific form of the loss function; that is, the loss function is not limited to that in the above-mentioned paper.
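Purely as a structural illustration of such a composite loss, the sketch below combines three weighted terms; the placeholder terms and the weights are assumptions of this sketch only. In the paper by Andrey Ignatov et al., the color loss compares Gaussian-blurred images, the texture loss is adversarial, and the content loss compares VGG feature maps.

```python
import torch.nn.functional as F

def composite_loss(output, target, w_color=0.1, w_texture=0.4, w_content=1.0):
    # Color term: compares low-frequency content; average pooling is a
    # crude stand-in for the Gaussian blur used in the cited paper.
    color = F.mse_loss(F.avg_pool2d(output, 4), F.avg_pool2d(target, 4))
    # Texture term: a plain L1 distance stands in for the adversarial
    # loss of the cited paper.
    texture = F.l1_loss(output, target)
    # Content term: a pixel-space MSE stands in for the VGG feature
    # comparison of the cited paper.
    content = F.mse_loss(output, target)
    return w_color * color + w_texture * texture + w_content * content
```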
Step S460: and correcting parameters of the neural network according to the loss value.
For example, an optimization function (not shown in fig. 7C) may be further included in the training process of the neural network 300, and the optimization function may calculate an error value of the parameter of the neural network 300 according to the loss value calculated by the loss function, and modify the parameter of the neural network 300 according to the error value. For example, the optimization function may calculate error values of parameters of the neural network 300 using a Stochastic Gradient Descent (SGD) algorithm, a Batch Gradient Descent (BGD) algorithm, or the like.
For example, the training method of the neural network may further include: judging whether the training of the neural network satisfies a predetermined condition; if not, repeating the training process (i.e., steps S410 to S460); and if the predetermined condition is satisfied, stopping the training process to obtain the trained neural network. For example, in one example, the predetermined condition is that the loss values corresponding to two (or more) consecutive training output images are no longer significantly reduced. For example, in another example, the predetermined condition is that the number of training iterations or the number of training epochs of the neural network reaches a predetermined number. The present disclosure is not limited in this regard.
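Putting the pieces together, steps S410 to S460 with a simple stopping rule might be sketched as follows, reusing the analysis_net, first_level, synthesis_net, and composite_loss names from the earlier sketches purely for illustration; the SGD learning rate, the stand-in data, the epoch limit, and the plateau threshold are all assumptions of the example.

```python
import torch
import torch.nn as nn

# Assemble the full neural network 300 from the earlier sketches.
neural_network = nn.Sequential(analysis_net, first_level, synthesis_net)
optimizer = torch.optim.SGD(neural_network.parameters(), lr=1e-4)

# One random low-quality / high-quality pair stands in for a real dataset.
loader = [(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))]

prev_loss = float("inf")
for epoch in range(100):                                   # predetermined training period
    for train_input, train_target in loader:               # step S410
        train_output = neural_network(train_input)         # steps S420 to S440
        loss = composite_loss(train_output, train_target)  # step S450
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                   # step S460: correct parameters
    if prev_loss - loss.item() < 1e-4:                     # loss no longer significantly reduced
        break
    prev_loss = loss.item()
```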
For example, the training output image output by the trained neural network 300 retains the content of the training input image, but the quality of the training output image may be close to the quality of a photo taken by a real digital single-lens reflex camera, i.e., the training output image is a high-quality image.
It should be noted that the above embodiments are only schematic illustrations of the training process of the neural network. Those skilled in the art will appreciate that, in the training phase, a large number of sample images are required to train the neural network; meanwhile, the training process for each sample image may include multiple iterations to correct the parameters of the neural network. As another example, the training phase may also include fine-tuning the parameters of the neural network to obtain more optimized parameters.
The neural network training method provided by the embodiments of the present disclosure can train the neural network adopted in the image processing method of the embodiments of the present disclosure. The neural network trained by this training method can perform image enhancement processing on low-quality input images and, by repeatedly sampling on multiple scales, can greatly improve the quality of the output images to obtain higher image fidelity; it is therefore suitable for offline applications, such as batch processing, with higher requirements on image quality.
At least one embodiment of the present disclosure also provides an image processing apparatus. Fig. 8 is a schematic block diagram of an image processing apparatus according to an embodiment of the present disclosure. For example, as shown in fig. 8, the image processing apparatus 500 includes a memory 510 and a processor 520. For example, the memory 510 is used for non-transitory storage of computer readable instructions, and the processor 520 is used for executing the computer readable instructions, and the computer readable instructions are executed by the processor 520 to execute the image processing method provided by the embodiment of the disclosure.
For example, the memory 510 and the processor 520 may be in direct or indirect communication with each other. For example, components such as memory 510 and processor 520 may communicate over a network connection. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The network may include a local area network, the Internet, a telecommunications network, an Internet of Things (Internet of Things) based on the Internet and/or a telecommunications network, and/or any combination thereof, and/or the like. The wired network may communicate by using twisted pair, coaxial cable, or optical fiber transmission, for example, and the wireless network may communicate by using 3G/4G/5G mobile communication network, bluetooth, Zigbee, or WiFi, for example. The present disclosure is not limited herein as to the type and function of the network.
For example, the processor 520 may control other components in the image processing apparatus to perform desired functions. The processor 520 may be a device having data processing capability and/or program execution capability, such as a Central Processing Unit (CPU), a Tensor Processing Unit (TPU), or a Graphics Processing Unit (GPU). The Central Processing Unit (CPU) may be of an X86 or ARM architecture, etc. The GPU may be integrated directly onto the motherboard alone or built into the north bridge chip of the motherboard, and the GPU may also be built into the Central Processing Unit (CPU).
For example, the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory, and the like. Non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, an Erasable Programmable Read-Only Memory (EPROM), a portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like.
For example, one or more computer instructions may be stored on memory 510 and executed by processor 520 to implement various functions. Various applications and various data, such as training input images, and various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
For example, some of the computer instructions stored by memory 510, when executed by processor 520, may perform one or more steps in accordance with the image processing methods described above. As another example, other computer instructions stored by memory 510, when executed by processor 520, may perform one or more steps in a training method according to a neural network described above.
For example, for a detailed description of the processing procedure of the image processing method, reference may be made to the related description in the embodiment of the image processing method, and for a detailed description of the processing procedure of the training method of the neural network, reference may be made to the related description in the embodiment of the training method of the neural network, and repeated details are not repeated.
It should be noted that the image processing apparatus provided in the above embodiments of the present disclosure is illustrative and not restrictive, and the image processing apparatus may further include other conventional components or structures according to practical application needs, for example, in order to implement the necessary functions of the image processing apparatus, a person skilled in the art may set other conventional components or structures according to a specific application scenario, and the embodiments of the present disclosure are not limited thereto.
For technical effects of the image processing apparatus provided by the above embodiments of the present disclosure, reference may be made to corresponding descriptions about an image processing method and a training method of a neural network in the above embodiments, and details are not repeated herein.
At least one embodiment of the present disclosure also provides a storage medium. Fig. 9 is a schematic diagram of a storage medium according to an embodiment of the present disclosure. For example, as shown in fig. 9, the storage medium 600 non-transitorily stores computer-readable instructions 601; when the non-transitory computer-readable instructions 601 are executed by a computer (including a processor), the image processing method provided by any embodiment of the present disclosure may be performed.
For example, one or more computer instructions may be stored on the storage medium 600. Some of the computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps in the image processing method described above. Further computer instructions stored on the storage medium may be, for example, instructions for carrying out one or more steps of the above-described neural network training method.
For example, the storage medium may include a storage component of a tablet computer, a hard disk of a personal computer, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a compact disc read only memory (CD-ROM), a flash memory, or any combination of the above storage media, as well as other suitable storage media.
For technical effects of the storage medium provided by the embodiments of the present disclosure, reference may be made to corresponding descriptions about an image processing method and a training method of a neural network in the foregoing embodiments, and details are not repeated herein.
For the present disclosure, there are the following points to be explained:
(1) in the drawings of the embodiments of the present disclosure, only the structures related to the embodiments of the present disclosure are referred to, and other structures may refer to general designs.
(2) Features of the disclosure in the same embodiment and in different embodiments may be combined with each other without conflict.
The above is only a specific embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present disclosure, and shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

1. An image processing method comprising:
receiving a first characteristic image; and
performing multi-scale cyclic sampling processing on the first characteristic image at least once;
wherein the multi-scale cyclic sampling process comprises a nested first level sampling process and a second level sampling process,
the first-level sampling processing comprises first downsampling processing, first upsampling processing and first residual error link addition processing, wherein the first downsampling processing is carried out based on input of the first-level sampling processing to obtain first downsampled output, the first upsampling processing is carried out based on the first downsampled output to obtain first upsampled output, the first residual error link addition processing is carried out for carrying out first residual error link addition on the input of the first-level sampling processing and the first upsampled output, and then a result of the first residual error link addition is used as output of the first-level sampling processing;
the second-level sampling process is nested between the first downsampling process and the first upsampling process, receives the first downsampled output as an input of the second-level sampling process, and provides an output of the second-level sampling process as an input of the first upsampling process, so that the first upsampling process performs upsampling processing based on the first downsampled output;
the second-level sampling process includes a second downsampling process, a second upsampling process, and a second residual linking and adding process, wherein the second downsampling process performs downsampling based on an input of the second-level sampling process to obtain a second downsampled output, the second upsampling process performs upsampling based on the second downsampled output to obtain a second upsampled output, and the second residual linking and adding process performs second residual linking and adding on the input of the second-level sampling process and the second upsampled output, and then uses a result of the second residual linking and adding as an output of the second-level sampling process.
2. The image processing method according to claim 1, wherein a size of an output of the first upsampling process is the same as a size of an input of the first downsampling process;
the size of the output of the second upsampling process is the same as the size of the input of the second downsampling process.
3. The image processing method according to claim 1 or 2, wherein the multi-scale cyclic sampling process further comprises a third hierarchical sampling process,
the third level sampling process is nested between the second downsampling process and the second upsampling process, receives the second downsampled output as an input to the third level sampling process, and provides an output of the third level sampling process as an input to the second upsampling process, such that the second upsampling process performs upsampling based on the second downsampled output;
the third level sampling processing comprises third down-sampling processing, third up-sampling processing and third residual linking and adding processing, wherein the third down-sampling processing performs down-sampling based on the input of the third level sampling processing to obtain a third down-sampled output, the third up-sampling processing performs up-sampling based on the third down-sampled output to obtain a third up-sampled output, and the third residual linking and adding processing performs third residual linking and adding on the input of the third level sampling processing and the third up-sampled output, and then uses the result of the third residual linking and adding as the output of the third level sampling processing.
4. The image processing method according to claim 1 or 2, wherein the multi-scale loop sampling process includes the second-level sampling process performed a plurality of times in sequence,
a first said second level sampling process receiving said first down-sampled output as an input to a first said second level sampling process,
each of the second-level sampling processes except for the first one receives an output of a previous one of the second-level sampling processes as an input of the present second-level sampling process,
the output of the last second-level sampling process is used as the input of the first upsampling process.
5. The image processing method according to claim 1 or 2, wherein the at least one multi-scale cyclic sampling process includes a plurality of the multi-scale cyclic sampling processes performed in sequence,
the input of each multi-scale cycle sampling processing is used as the input of the first-level sampling processing in the multi-scale cycle sampling processing, and the output of the first-level sampling processing in the multi-scale cycle sampling processing is used as the output of the multi-scale cycle sampling processing;
the first multi-scale cyclic sampling process receives the first feature image as input to the first multi-scale cyclic sampling process,
each time the multi-scale cycle sampling processing except the first multi-scale cycle sampling processing receives the output of the previous multi-scale cycle sampling processing as the input of the multi-scale cycle sampling processing,
and the output of the last multi-scale cycle sampling processing is used as the output of the at least one multi-scale cycle sampling processing.
6. The image processing method according to claim 1 or 2, wherein the multi-scale cyclic sampling process further comprises:
performing instance normalization or layer normalization on the first downsampled output, the first upsampled output, the second downsampled output, and the second upsampled output, respectively, after the first downsampling process, the first upsampling process, the second downsampling process, and the second upsampling process.
7. The image processing method according to claim 1 or 2, further comprising: performing the multi-scale cyclic sampling process using a first convolutional neural network;
wherein the first convolutional neural network comprises:
a first metanetwork for performing the first hierarchical sampling process;
a second element network for performing the second level sampling process.
8. The image processing method according to claim 7,
the first metanetwork includes:
a first sub-network for performing the first down-sampling process;
a second sub-network for performing the first upsampling process;
the second metanetwork comprises:
a third sub-network for performing the second downsampling process;
a fourth sub-network for performing the second upsampling process.
9. The image processing method of claim 8, wherein each of the first, second, third, and fourth sub-networks comprises one of a convolutional layer, a residual network, a dense network.
10. The image processing method of claim 9, wherein each of the first, second, third, and fourth sub-networks comprises an instance normalization layer or a layer normalization layer,
the instance normalization layer is to perform an instance normalization process, and the layer normalization layer is to perform a layer normalization process.
11. The image processing method according to claim 1 or 2, further comprising:
acquiring an input image;
converting an input image into the first feature image using an analysis network; and
converting the output of the at least one multi-scale cyclic sampling process into an output image using a synthesis network.
12. A training method of a neural network, wherein,
the neural network includes: an analysis network, a first sub-neural network and a synthesis network,
the analysis network processes an input image to obtain a first characteristic image, the first sub-neural network performs at least one multi-scale cyclic sampling processing on the first characteristic image to obtain a second characteristic image, and the synthesis network processes the second characteristic image to obtain an output image;
the training method comprises the following steps:
acquiring a training input image;
processing the training input image using the analysis network to provide a first training feature image;
performing the at least one multi-scale cyclic sampling processing on the first training feature image by using the first sub-neural network to obtain a second training feature image;
processing the second training feature image using the synthesis network to obtain a training output image;
calculating a loss value of the neural network through a loss function based on the training output image; and
correcting parameters of the neural network according to the loss value;
wherein the multi-scale cyclic sampling process comprises a nested first level sampling process and a second level sampling process,
the first-level sampling processing comprises first downsampling processing, first upsampling processing and first residual error link addition processing which are sequentially executed, wherein the first downsampling processing is carried out based on input of the first-level sampling processing to obtain first downsampled output, the first upsampling processing is carried out based on the first downsampled output to obtain first upsampled output, the first residual error link addition processing is carried out for carrying out first residual error link addition on the input of the first-level sampling processing and the first upsampled output, and then a result of the first residual error link addition is used as output of the first-level sampling processing;
the second-level sampling process is nested between the first downsampling process and the first upsampling process, receives the first downsampled output as an input of the second-level sampling process, and provides an output of the second-level sampling process as an input of the first upsampling process, so that the first upsampling process performs upsampling processing based on the first downsampled output;
the second-level sampling processing comprises second down-sampling processing, second up-sampling processing and second residual linking and adding processing which are sequentially executed, wherein the second down-sampling processing is based on the input of the second-level sampling processing to perform down-sampling processing to obtain second down-sampling output, the second up-sampling processing is based on the second down-sampling output to perform up-sampling processing to obtain second up-sampling output, the second residual linking and adding processing is used for performing second residual linking and adding on the input of the second-level sampling processing and the second up-sampling output, and then the result of the second residual linking and adding is used as the output of the second-level sampling processing.
13. The training method of claim 12, wherein the size of the output of the first upsampling process is the same as the size of the input of the first downsampling process;
the size of the output of the second upsampling process is the same as the size of the input of the second downsampling process.
14. An image processing apparatus comprising:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer readable instructions, which when executed by the processor perform the image processing method of any of claims 1-11.
15. A storage medium storing, non-temporarily, computer-readable instructions which, when executed by a computer, can carry out the instructions of the image processing method according to any one of claims 1 to 11.