WO2020187029A1 - Image processing method and device, neural network training method, and storage medium - Google Patents


Info

Publication number
WO2020187029A1
Authority
WO
WIPO (PCT)
Prior art keywords
sampling
output
processing
sampling process
level
Application number
PCT/CN2020/077763
Other languages
French (fr)
Chinese (zh)
Inventor
Hanwen LIU (刘瀚文)
Lijie ZHANG (张丽杰)
Dan ZHU (朱丹)
Yanbo NA (那彦波)
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Publication of WO2020187029A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06T5/60
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Definitions

  • the embodiments of the present disclosure relate to an image processing method, an image processing device, a training method of a neural network, and a storage medium.
  • CNN: Convolutional Neural Network
  • At least one embodiment of the present disclosure provides an image processing method, including: receiving a first feature image; and performing multi-scale cyclic sampling processing on the first feature image at least once;
  • the multi-scale cyclic sampling processing includes nested first-level sampling processing and second-level sampling processing
  • the first-level sampling processing includes first down-sampling processing, first up-sampling processing, and first residual link addition processing
  • the first down-sampling processing performs down-sampling processing based on the input of the first-level sampling processing to obtain a first down-sampled output
  • the first up-sampling processing performs up-sampling processing based on the first down-sampled output to obtain a first up-sampled output
  • the first residual link addition processing performs a first residual link addition on the input of the first-level sampling processing and the first up-sampled output, and the result of the first residual link addition is used as the output of the first-level sampling processing
  • the second-level sampling processing is nested between the first down-sampling processing and the first up-sampling processing; it receives the first down-sampled output as the input of the second-level sampling processing, and its output is provided as the input of the first up-sampling processing, so that the first up-sampling processing performs up-sampling based on the first down-sampled output; the second-level sampling processing includes second down-sampling processing, second up-sampling processing, and second residual link addition processing
  • the size of the output of the first up-sampling processing is the same as the size of the input of the first down-sampling processing; the size of the output of the second up-sampling processing is the same as the size of the input of the second down-sampling processing.
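To make the nested data flow concrete, here is a minimal numerical sketch (an illustration only, not the patent's learned networks): average pooling stands in for the learned down-sampling convolutions and nearest-neighbor repetition stands in for the learned up-sampling layers, while the residual link additions and the nesting of the second level between the first down-sampling and first up-sampling follow the structure described above.

```python
import numpy as np

def downsample(x, factor=2):
    # Average pooling as a stand-in for a learned down-sampling convolution.
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(x, factor=2):
    # Nearest-neighbor repetition as a stand-in for a learned up-sampling layer.
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def second_level(x):
    down2 = downsample(x)   # second down-sampling processing
    up2 = upsample(down2)   # second up-sampling processing
    return x + up2          # second residual link addition

def first_level(x):
    down1 = downsample(x)         # first down-sampling processing
    nested = second_level(down1)  # second level nested in between
    up1 = upsample(nested)        # first up-sampling processing
    return x + up1                # first residual link addition

feature = np.random.rand(8, 8)
out = first_level(feature)
assert out.shape == feature.shape  # output size equals input size
```

Because each up-sampling output has the same size as the matching down-sampling input, both residual additions are well defined.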
  • the multi-scale cyclic sampling processing further includes third-level sampling processing. The third-level sampling processing is nested between the second down-sampling processing and the second up-sampling processing; it receives the second down-sampled output as the input of the third-level sampling processing, and its output is provided as the input of the second up-sampling processing, so that the second up-sampling processing performs up-sampling based on the second down-sampled output. The third-level sampling processing includes third down-sampling processing, third up-sampling processing, and third residual link addition processing, where the third down-sampling processing performs down-sampling based on the input of the third-level sampling processing to obtain a third down-sampled output, the third up-sampling processing performs up-sampling based on the third down-sampled output to obtain a third up-sampled output, and the third residual link addition processing performs a third residual link addition on the input of the third-level sampling processing and the third up-sampled output, with the result of the third residual link addition used as the output of the third-level sampling processing.
  • the multi-scale cyclic sampling processing includes the second-level sampling processing executed sequentially multiple times; the first second-level sampling processing receives the first down-sampled output as its input, each second-level sampling processing except the first receives the output of the previous second-level sampling processing as its input, and the output of the last second-level sampling processing is used as the input of the first up-sampling processing.
  • the at least one multi-scale cyclic sampling processing includes the multi-scale cyclic sampling processing performed sequentially multiple times; each time, the input of the multi-scale cyclic sampling processing is used as the input of the first-level sampling processing in that pass, and the output of the first-level sampling processing in that pass is used as the output of that multi-scale cyclic sampling processing. The first multi-scale cyclic sampling processing receives the first feature image as its input, each multi-scale cyclic sampling processing except the first receives the output of the previous multi-scale cyclic sampling processing as its input, and the output of the last multi-scale cyclic sampling processing is used as the output of the at least one multi-scale cyclic sampling processing.
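The sequential repetition can be sketched in the same spirit. The pass function below is a toy placeholder (average-pool down, repeat up, residual add), not the patent's network; the point is the chaining rule: the first pass receives the first feature image, every later pass receives the previous pass's output, and the last pass's output is the overall result.

```python
import numpy as np

def one_cyclic_pass(x):
    # Toy placeholder for one multi-scale cyclic sampling pass.
    h, w = x.shape
    down = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # down-sample
    up = np.repeat(np.repeat(down, 2, axis=0), 2, axis=1)     # up-sample
    return x + up                                             # residual link

def repeated_sampling(x, times=3):
    for _ in range(times):
        x = one_cyclic_pass(x)  # each pass consumes the previous pass's output
    return x

feature = np.random.rand(8, 8)
result = repeated_sampling(feature, times=3)
assert result.shape == feature.shape
```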
  • the multi-scale cyclic sampling processing further includes: after performing the first down-sampling processing, the first up-sampling processing, the second down-sampling processing, and the second up-sampling processing, performing instance standardization processing or layer standardization processing on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively.
  • the image processing method provided by some embodiments of the present disclosure further includes: using a first convolutional neural network to perform the multi-scale cyclic sampling processing; wherein the first convolutional neural network includes: a first meta-network for performing the first-level sampling processing; and a second meta-network for performing the second-level sampling processing.
  • the first meta-network includes: a first sub-network for performing the first down-sampling processing; and a second sub-network for performing the first up-sampling processing; the second meta-network includes: a third sub-network for performing the second down-sampling processing; and a fourth sub-network for performing the second up-sampling processing.
  • each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, and a dense network.
  • each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes an instance standardization layer or a layer standardization layer; the instance standardization layer is used to perform instance standardization processing, and the layer standardization layer is used to perform layer standardization processing.
  • the image processing method provided by some embodiments of the present disclosure further includes: acquiring an input image; using an analysis network to convert the input image into the first feature image; and using a synthesis network to convert the output of the at least one multi-scale cyclic sampling processing into an output image.
  • At least one embodiment of the present disclosure further provides a neural network training method, wherein the neural network includes: an analysis network, a first sub-neural network, and a synthesis network; the analysis network processes the input image to obtain the first feature image, the first sub-neural network performs multi-scale cyclic sampling processing on the first feature image at least once to obtain a second feature image, and the synthesis network processes the second feature image to obtain an output image;
  • the training method includes: obtaining a training input image; using the analysis network to process the training input image to provide a first training feature image; using the first sub-neural network to perform the at least one multi-scale cyclic sampling processing on the first training feature image to obtain a second training feature image; using the synthesis network to process the second training feature image to obtain a training output image; calculating the loss value of the neural network through a loss function based on the training output image; and correcting the parameters of the neural network according to the loss value;
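The training steps listed above (analysis network, then first sub-neural network, then synthesis network, then loss computation and parameter correction) can be sketched with a deliberately tiny stand-in model, where each of the three stages is reduced to a single learnable scale factor and the loss is a plain mean squared error; all names and the toy loss are illustrative assumptions, not the patent's actual networks or loss function.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 4))   # training input image
target = x               # toy training goal: reproduce the input

# Toy stand-ins for the three stages, each a single learnable scale factor.
w_analysis = w_sampling = w_synthesis = 0.5
lr = 0.1
losses = []

for step in range(200):
    f1 = w_analysis * x      # analysis network -> first training feature image
    f2 = w_sampling * f1     # first sub-neural network -> second training feature image
    out = w_synthesis * f2   # synthesis network -> training output image
    loss = np.mean((out - target) ** 2)   # loss value of the network
    losses.append(loss)
    # Correct the parameters according to the loss value (gradient descent).
    g_out = 2.0 * (out - target) / out.size
    g_syn = np.sum(g_out * f2)
    g_samp = np.sum(g_out * w_synthesis * f1)
    g_ana = np.sum(g_out * w_synthesis * w_sampling * x)
    w_synthesis -= lr * g_syn
    w_sampling -= lr * g_samp
    w_analysis -= lr * g_ana
```

The loop mirrors the claimed sequence of steps; with real networks, the hand-written gradients would be replaced by backpropagation through each sub-network.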
  • the multi-scale cyclic sampling processing includes nested first-level sampling processing and second-level sampling processing
  • the first-level sampling processing includes first down-sampling processing, first up-sampling processing, and first residual link addition processing
  • the first down-sampling processing performs down-sampling processing based on the input of the first-level sampling processing to obtain a first down-sampled output
  • the first up-sampling processing performs up-sampling processing based on the first down-sampled output to obtain a first up-sampled output
  • the first residual link addition processing performs a first residual link addition on the input of the first-level sampling processing and the first up-sampled output, and the result of the first residual link addition is used as the output of the first-level sampling processing
  • the second-level sampling processing is nested between the first down-sampling processing and the first up-sampling processing; it receives the first down-sampled output as the input of the second-level sampling processing, and its output is provided as the input of the first up-sampling processing; the second-level sampling processing includes second down-sampling processing, second up-sampling processing, and second residual link addition processing
  • the size of the output of the first up-sampling processing is the same as the size of the input of the first down-sampling processing; the size of the output of the second up-sampling processing is the same as the size of the input of the second down-sampling processing.
  • the multi-scale cyclic sampling processing further includes third-level sampling processing, and the third-level sampling processing is nested between the second down-sampling processing and the second up-sampling processing;
  • the second down-sampling output is received as the input of the third-level sampling process, and the output of the third-level sampling process is provided as the input of the second up-sampling process, so that the second up-sampling process Up-sampling processing is performed based on the second down-sampling output;
  • the third-level sampling processing includes third down-sampling processing, third up-sampling processing, and third residual link addition processing, where the third down-sampling processing performs down-sampling based on the input of the third-level sampling processing to obtain a third down-sampled output, the third up-sampling processing performs up-sampling based on the third down-sampled output to obtain a third up-sampled output, and the third residual link addition processing performs a third residual link addition on the input of the third-level sampling processing and the third up-sampled output, and then uses the result of the third residual link addition as the output of the third-level sampling processing.
  • the multi-scale cyclic sampling processing includes the second-level sampling processing executed sequentially multiple times; the first second-level sampling processing receives the first down-sampled output as its input, each second-level sampling processing except the first receives the output of the previous second-level sampling processing as its input, and the output of the last second-level sampling processing is used as the input of the first up-sampling processing.
  • the at least one multi-scale cyclic sampling processing includes the multi-scale cyclic sampling processing performed sequentially multiple times; each time, the input of the multi-scale cyclic sampling processing is used as the input of the first-level sampling processing in that pass, and the output of the first-level sampling processing in that pass is used as the output of that multi-scale cyclic sampling processing. The first multi-scale cyclic sampling processing receives the first training feature image as its input, each multi-scale cyclic sampling processing except the first receives the output of the previous multi-scale cyclic sampling processing as its input, and the output of the last multi-scale cyclic sampling processing is used as the output of the at least one multi-scale cyclic sampling processing.
  • the multi-scale cyclic sampling processing further includes: after performing the first down-sampling processing, the first up-sampling processing, the second down-sampling processing, and the second up-sampling processing, performing instance standardization processing or layer standardization processing on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively.
  • the first sub-neural network includes: a first meta-network for performing the first-level sampling processing; and a second meta-network for performing the second-level sampling processing.
  • the first meta-network includes: a first sub-network for performing the first down-sampling processing; and a second sub-network for performing the first up-sampling processing; the second meta-network includes: a third sub-network for performing the second down-sampling processing; and a fourth sub-network for performing the second up-sampling processing.
  • each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, and a dense network.
  • each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes an instance standardization layer or a layer standardization layer
  • the instance standardization layer is used to perform instance standardization processing on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively
  • the layer standardization layer is used to perform layer standardization processing on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively.
  • At least one embodiment of the present disclosure further provides an image processing device, including: a memory for non-transitory storage of computer-readable instructions; and a processor for running the computer-readable instructions; when the computer-readable instructions are run by the processor, the image processing method provided by any embodiment of the present disclosure or the neural network training method provided by any embodiment of the present disclosure is executed.
  • At least one embodiment of the present disclosure further provides a storage medium that non-transitorily stores computer-readable instructions; when the computer-readable instructions are executed by a computer, the instructions of the image processing method provided by any embodiment of the present disclosure or the instructions of the neural network training method provided by any embodiment of the present disclosure can be executed.
  • Figure 1 is a schematic diagram of a convolutional neural network
  • Figure 2A is a schematic diagram of a convolutional neural network
  • Figure 2B is a schematic diagram of the working process of a convolutional neural network
  • FIG. 3 is a flowchart of an image processing method provided by an embodiment of the disclosure.
  • FIG. 4A is a schematic flow chart of a multi-scale cyclic sampling process corresponding to the image processing method shown in FIG. 3 according to an embodiment of the present disclosure
  • FIG. 4B is a schematic flowchart diagram corresponding to the multi-scale cyclic sampling processing in the image processing method shown in FIG. 3 according to another embodiment of the present disclosure
  • FIG. 4C is a schematic flowchart diagram corresponding to the multi-scale cyclic sampling processing in the image processing method shown in FIG. 3 according to still another embodiment of the present disclosure
  • FIG. 4D is a schematic flowchart diagram corresponding to the multi-scale cyclic sampling processing in the image processing method shown in FIG. 3 according to another embodiment of the present disclosure
  • FIG. 5 is a flowchart of an image processing method provided by another embodiment of the present disclosure.
  • FIG. 6A is a schematic diagram of an input image
  • FIG. 6B is a schematic diagram of an output image obtained by processing the input image shown in FIG. 6A according to an image processing method provided by an embodiment of the present disclosure
  • FIG. 7A is a schematic structural diagram of a neural network provided by an embodiment of the disclosure.
  • FIG. 7B is a flowchart of a neural network training method provided by an embodiment of the disclosure.
  • FIG. 7C is a schematic structural block diagram of training the neural network shown in FIG. 7A corresponding to the training method shown in FIG. 7B according to an embodiment of the present disclosure
  • FIG. 8 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a storage medium provided by an embodiment of the disclosure.
  • Image enhancement is one of the research hotspots in the field of image processing. Due to the limitations of various physical factors in the image acquisition process (for example, the size of the image sensor of the mobile phone camera is too small and other software and hardware limitations) and the interference of environmental noise, the image quality will be greatly reduced.
  • the purpose of image enhancement is to improve the grayscale histogram of the image and the contrast of the image through image enhancement technology, thereby highlighting the detailed information of the image and improving the visual effect of the image.
  • the use of deep neural networks for image enhancement is a technology emerging with the development of deep learning technology.
  • the quality of the output images can be close to that of photos taken by a digital single-lens reflex (DSLR) camera.
  • For example, the peak signal-to-noise ratio (PSNR) index is commonly used to characterize image quality; the higher the PSNR value, the closer the image is to a real photo taken by a digital single-lens reflex camera.
  • Andrey Ignatov et al. proposed a convolutional neural network method to achieve image enhancement; please refer to the literature: Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool, DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks. arXiv:1704.02470v2 [cs.CV], September 5, 2017. This document is hereby incorporated by reference in its entirety as a part of this application.
  • This method mainly uses convolutional layers, batch normalization layers and residual connections to construct a single-scale convolutional neural network.
  • the network can be used to process input low-quality images (for example, images with low contrast, underexposure or overexposure, or images that are too dark or too bright overall) into higher-quality images.
  • Using color loss, texture loss, and content loss as the loss function in training can achieve better processing results.
  • At least one embodiment of the present disclosure provides an image processing method, an image processing device, a neural network training method, and a storage medium.
  • This image processing method proposes a multi-scale cyclic sampling method based on a convolutional neural network. By repeatedly sampling at multiple scales to obtain higher image fidelity, the quality of the output image can be greatly improved, making the method suitable for offline applications with high quality requirements, such as batch image processing.
  • FIG. 1 shows a schematic diagram of a convolutional neural network.
  • the convolutional neural network can be used for image processing, which uses images as input and output, and replaces scalar weights with convolution kernels.
  • FIG. 1 only shows a convolutional neural network with a 3-layer structure, which is not limited in the embodiment of the present disclosure.
  • the convolutional neural network includes an input layer 101, a hidden layer 102, and an output layer 103.
  • the input layer 101 has 4 inputs
  • the hidden layer 102 has 3 outputs
  • the output layer 103 has 2 outputs.
  • the convolutional neural network finally outputs 2 images.
  • the 4 inputs of the input layer 101 may be 4 images, or 4 feature images of 1 image.
  • the three outputs of the hidden layer 102 may be feature images of the image input through the input layer 101.
  • the convolutional layer has weights w_ij^k and biases b_i^k. The weights w_ij^k represent the convolution kernels, and the biases b_i^k are scalars superimposed on the outputs of the convolutional layer, where k is the label of the input layer 101, and i and j are the labels of the unit of the input layer 101 and the unit of the hidden layer 102, respectively.
  • the first convolutional layer 201 includes a first set of convolution kernels (the kernels w_ij^1 in FIG. 1) and a first set of biases (the biases b_i^1 in FIG. 1).
  • the second convolutional layer 202 includes a second set of convolution kernels (the kernels w_ij^2 in FIG. 1) and a second set of biases (the biases b_i^2 in FIG. 1).
  • each convolutional layer includes tens or hundreds of convolution kernels. If the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.
  • the convolutional neural network further includes a first activation layer 203 and a second activation layer 204.
  • the first activation layer 203 is located behind the first convolutional layer 201
  • the second activation layer 204 is located behind the second convolutional layer 202.
  • the activation layer (for example, the first activation layer 203 and the second activation layer 204) includes activation functions, which are used to introduce nonlinear factors into the convolutional neural network so that the convolutional neural network can better solve more complex problems.
  • the activation function may include a rectified linear unit (ReLU) function, a sigmoid function, or a hyperbolic tangent function (tanh function).
  • the ReLU function is an unsaturated nonlinear function
  • the Sigmoid function and tanh function are saturated nonlinear functions.
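A quick numerical check of this saturation behavior, in plain NumPy for illustration:

```python
import numpy as np

def relu(x):
    # Rectified linear unit: passes positive values through unchanged.
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Saturating nonlinearity: flattens toward 0 and 1 for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, 0.0, 10.0])
print(relu(x))      # [ 0.  0. 10.] -- unsaturated for positive inputs
print(sigmoid(x))   # approaches 0 and 1 at the extremes
print(np.tanh(x))   # approaches -1 and 1 at the extremes
```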
  • the activation layer can be used as a layer of the convolutional neural network alone, or the activation layer can be included in a convolutional layer (for example, the first convolutional layer 201 can include the first activation layer 203, and the second convolutional layer 202 can include the second activation layer 204).
  • For example, in the first convolutional layer 201, first, several convolution kernels w_ij^1 of the first set of convolution kernels and several biases b_i^1 of the first set of biases are applied to each input to obtain the output of the first convolutional layer 201; then, the output of the first convolutional layer 201 can be processed by the first activation layer 203 to obtain the output of the first activation layer 203.
  • In the second convolutional layer 202, first, several convolution kernels w_ij^2 of the second set of convolution kernels and several biases b_i^2 of the second set of biases are applied to the output of the first activation layer 203 to obtain the output of the second convolutional layer 202; then, the output of the second convolutional layer 202 can be processed by the second activation layer 204 to obtain the output of the second activation layer 204.
  • the output of the first convolutional layer 201 can be the result of applying the convolution kernels w_ij^1 to its input and then adding the biases b_i^1; the output of the second convolutional layer 202 can be the result of applying the convolution kernels w_ij^2 to the output of the first activation layer 203 and then adding the biases b_i^2.
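The per-kernel computation just described (convolve the input with a kernel, add a scalar bias, then apply the activation) can be written out directly; the 4×4 image, 3×3 averaging kernel, and bias value below are arbitrary illustrative choices.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d_valid(image, kernel, bias):
    # 'Valid' cross-correlation of a single-channel image with one kernel,
    # plus a scalar bias added to every output element.
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

img = np.arange(16.0).reshape(4, 4)   # toy 4x4 input image
k = np.ones((3, 3)) / 9.0             # 3x3 averaging kernel
feat = relu(conv2d_valid(img, k, bias=-5.0))
print(feat)   # 2x2 feature image: [[0. 1.] [4. 5.]]
```

One kernel produces one feature image; applying several kernels in parallel would produce as many feature images as there are kernels.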
  • Before using the convolutional neural network for image processing, the convolutional neural network needs to be trained. After training, the convolution kernels and biases of the convolutional neural network remain unchanged during image processing. In the training process, each convolution kernel and bias is adjusted through multiple sets of input/output example images and optimization algorithms to obtain an optimized convolutional neural network model.
  • FIG. 2A shows a schematic diagram of the structure of a convolutional neural network
  • FIG. 2B shows a schematic diagram of the working process of a convolutional neural network.
  • the main components of a convolutional neural network can include multiple convolutional layers, multiple downsampling layers, and fully connected layers.
  • each of these layers refers to a corresponding processing operation, that is, convolution processing, downsampling processing, fully connected processing, etc.
  • similarly, each layer of the described neural network refers to the corresponding processing operation; the instance standardization layer or layer standardization layer to be described below is similar to this, and the description is not repeated here.
  • a complete convolutional neural network can be composed of these three layers.
  • FIG. 2A only shows three levels of a convolutional neural network, namely the first level, the second level, and the third level.
  • each level may include a convolution module and a downsampling layer.
  • each convolution module may include a convolution layer.
  • the processing process of each level may include: convolution and down-sampling of the input image.
  • each convolution module may also include an instance normalization layer or a layer normalization layer, so that the processing process of each level may also include instance normalization processing or layer normalization processing.
  • the instance standardization layer is used to perform instance standardization processing on the feature image output by the convolutional layer, so that the gray value of the pixel of the feature image changes within a predetermined range, thereby simplifying the image generation process and improving the effect of image enhancement.
  • the predetermined range may be [-1, 1].
  • the instance standardization layer performs instance standardization processing on each feature image according to its own mean and variance.
  • the instance standardization layer can also be used to perform instance standardization processing on a single image.
  • the instance standardization formula of the instance standardization layer can be expressed as follows:
  • y_tijk = (x_tijk − μ_ti) / sqrt(σ_ti² + ε1), where μ_ti = (1/(H·W)) · Σ_j Σ_k x_tijk and σ_ti² = (1/(H·W)) · Σ_j Σ_k (x_tijk − μ_ti)².
  • x_tijk is the value in the j-th row and k-th column of the i-th feature image of the t-th sample in the set of feature images output by the convolutional layer.
  • y_tijk represents the result obtained after processing x_tijk by the instance standardization layer.
  • ε1 is a small positive number to avoid a zero denominator.
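The instance standardization above is straightforward to express in NumPy; this sketch assumes the feature set is stored as a (T, C, H, W) array and normalizes each feature image with its own mean and variance:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x has shape (T, C, H, W); each (t, i) feature image is standardized
    # with its own spatial mean and variance.
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(1).random((2, 3, 4, 4))
y = instance_norm(x)
# After standardization every feature image has (near-)zero mean and unit variance.
```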
  • the layer standardization layer is similar to the instance standardization layer, and is also used to perform layer standardization processing on the feature image output by the convolutional layer, so that the gray value of the pixel of the feature image changes within a predetermined range, thereby simplifying the image generation process and improving The effect of image enhancement.
  • the predetermined range may be [-1, 1].
  • the layer standardization layer performs layer standardization processing on each column of the characteristic image according to the mean value and variance of each column of each characteristic image, thereby realizing the layer standardization processing of the characteristic image.
  • the layer standardization layer can also be used to perform layer standardization processing on a single image.
  • the shape of the set of feature images is expressed as (T, C, H, W). Accordingly, the layer standardization formula of the layer standardization layer can be expressed as follows:
  • y′_tijk = (x_tijk − μ_t) / sqrt(σ_t² + ε2), where μ_t = (1/(C·H·W)) · Σ_i Σ_j Σ_k x_tijk and σ_t² = (1/(C·H·W)) · Σ_i Σ_j Σ_k (x_tijk − μ_t)².
  • x_tijk is the value in the j-th row and k-th column of the i-th feature image of the t-th sample in the set of feature images output by the convolutional layer.
  • y′_tijk represents the result obtained after processing x_tijk by the layer standardization layer.
  • ε2 is a small positive number to avoid a zero denominator.
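For comparison, a standard layer standardization normalizes each sample over all of its channels and spatial positions; this axis choice is an assumption where the per-column description above is ambiguous.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # x has shape (T, C, H, W); each sample t is standardized using one mean
    # and variance computed over all its channels, rows, and columns.
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(2).random((2, 3, 4, 4))
y = layer_norm(x)
```

Unlike instance standardization, the statistics here are shared across all feature images of one sample, so the relative scale between channels is preserved.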
  • the convolutional layer is the core layer of the convolutional neural network.
  • a neuron is only connected to some of the neurons in the adjacent layer.
  • the convolutional layer can apply several convolution kernels (also called filters) to the input image to extract multiple types of features of the input image.
  • Each convolution kernel can extract one type of feature.
  • the convolution kernel is generally initialized as a matrix of random small decimal values; during the training of the convolutional neural network, the convolution kernel learns reasonable weights.
  • the result obtained after applying a convolution kernel to the input image is called a feature image (feature map), and the number of feature images is equal to the number of convolution kernels.
  • Each feature image is composed of some rectangularly arranged neurons, and the neurons of the same feature image share weights, and the shared weights here are the convolution kernels.
  • the feature image output by the convolutional layer of one level can be input to the convolutional layer of the next level and processed again to obtain a new feature image.
  • the first-level convolutional layer may output a first-level feature image
  • the first-level feature image is input to the second-level convolutional layer and processed again to obtain a second-level feature image.
  • the convolutional layer can use different convolution kernels to convolve the data of a certain local receptive field of the input image; the convolution result is input to the activation layer, which computes the corresponding activation function to obtain the feature information of the input image.
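To make the kernel/feature-map relationship concrete, here is a minimal single-channel "valid" convolution sketch; the image values and the two hand-picked kernels are purely illustrative:

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Apply each kernel to a single-channel image with 'valid'
    padding; each kernel yields one feature image, so the number of
    feature images equals the number of kernels."""
    kh, kw = kernels.shape[1:]
    H, W = image.shape
    out = np.empty((len(kernels), H - kh + 1, W - kw + 1))
    for n, k in enumerate(kernels):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[n, i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

image = np.arange(25.0).reshape(5, 5)
kernels = np.stack([np.ones((3, 3)) / 9.0,                     # mean filter
                    np.array([[1, 0, -1]] * 3, dtype=float)])  # edge filter
maps = conv2d_valid(image, kernels)   # two kernels -> two 3x3 feature images
```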
  • the down-sampling layer, arranged between adjacent convolutional layers, implements a form of down-sampling.
  • on the one hand, the down-sampling layer can be used to reduce the scale of the input image, simplify the computational complexity, and reduce over-fitting to a certain extent; on the other hand, the down-sampling layer can also perform feature compression to extract the main features of the input image.
  • the down-sampling layer can reduce the size of feature images, but does not change the number of feature images.
  • for example, for a 12×12 input image, a 6×6 down-sampling filter yields a 2×2 output image, which means that 36 pixels of the input image are merged into 1 pixel of the output image.
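The merge described above can be sketched with average pooling; the 12×12 input and 6×6 filter sizes are just one consistent choice, and max pooling or strided convolution would work analogously:

```python
import numpy as np

def average_pool(x, f):
    """Down-sample a square image with an f x f averaging filter,
    merging f*f input pixels into one output pixel."""
    H, W = x.shape
    return x.reshape(H // f, f, W // f, f).mean(axis=(1, 3))

x = np.arange(144.0).reshape(12, 12)   # 12x12 input image
y = average_pool(x, 6)                 # 2x2 output: 36 pixels -> 1 pixel
```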
  • the last downsampling layer or convolutional layer can be connected to one or more fully connected layers, which are used to connect all the extracted features.
  • the output of the fully connected layer is a one-dimensional matrix, which is a vector.
  • FIG. 3 is a flowchart of an image processing method provided by an embodiment of the disclosure.
  • the image processing method includes:
  • Step S110 Receive a first characteristic image
  • Step S120 Perform at least one multi-scale cyclic sampling process on the first feature image.
  • the first feature image may include a feature image obtained after the input image is processed by one of a convolutional layer, a residual network, a dense network, etc. (for example, refer to FIG. 2B).
  • the residual network retains a certain proportion of its input in its output, for example by adding residual connections.
  • a dense network includes a bottleneck layer and a convolution layer.
  • the bottleneck layer is used to reduce the dimensionality of the data so as to reduce the number of parameters in the subsequent convolution operation; for example, the convolution kernel of the bottleneck layer is a 1×1 convolution kernel.
  • the convolution kernel of the convolution layer is a 3 ⁇ 3 convolution kernel; the present disclosure includes but is not limited to this.
  • the input image is processed by convolution, down-sampling, etc. to obtain the first feature image.
  • this embodiment does not limit the acquisition method of the first characteristic image.
  • the first characteristic image may include a plurality of characteristic images, but is not limited thereto.
  • the first feature image received in step S110 is used as the input of the multi-scale cyclic sampling process in step S120.
  • the multi-scale cyclic sampling process may have various forms, including but not limited to the three forms shown in FIGS. 4A-4C which will be described below.
  • FIG. 4A is a schematic flowchart diagram corresponding to the multi-scale cyclic sampling processing in the image processing method shown in FIG. 3 according to an embodiment of the present disclosure.
  • the multi-scale cyclic sampling processing includes nested first-level sampling processing and second-level sampling processing.
  • the input of the multi-scale cyclic sampling processing is used as the input of the first-level sampling processing
  • the output of the first-level sampling processing is used as the output of the multi-scale cyclic sampling processing.
  • the output of the multi-scale cyclic sampling process is called the second feature image.
  • the size of the second feature image (the number of rows and columns of the pixel array) may be the same as the size of the first feature image.
  • the first-level sampling process includes a first down-sampling process, a first up-sampling process, and a first residual link addition process that are sequentially executed.
  • the first down-sampling process is performed based on the input of the first-level sampling process to obtain the first down-sampled output.
  • the first down-sampling process can directly down-sample the input of the first-level sampling process to obtain the first down-sampling output.
  • the first up-sampling process performs up-sampling based on the first down-sampling output to obtain the first up-sampling output; for example, the first down-sampling output is first subjected to the second-level sampling process and the result is then up-sampled to obtain the first up-sampling output, i.e., the first up-sampling process can indirectly up-sample the first down-sampling output.
  • the first residual link addition process performs the first residual link addition on the input of the first-level sampling process and the first up-sampling output, and then uses the result of the first residual link addition as the output of the first-level sampling process.
  • the size of the output of the first up-sampling process (i.e., the first up-sampling output) is the same as the size of the input of the first-level sampling process (i.e., the input of the first down-sampling process), so that, after the first residual link addition,
  • the size of the output of the first-level sampling process is the same as the size of the input of the first-level sampling process.
  • the second-level sampling process is nested between the first down-sampling process and the first up-sampling process of the first-level sampling process: it receives the first down-sampling output as the input of the second-level sampling process and provides the output of the second-level sampling process as the input of the first up-sampling process, so that the first up-sampling process performs up-sampling based on the first down-sampling output.
  • the second-level sampling process includes a second down-sampling process, a second up-sampling process, and a second residual link addition process that are sequentially executed.
  • the second down-sampling process performs down-sampling based on the input of the second-level sampling process to obtain the second down-sampled output.
  • the second down-sampling process can directly down-sample the input of the second-level sampling process to obtain the second down-sampling output.
  • the second up-sampling process performs up-sampling based on the second down-sampled output to obtain the second up-sampled output.
  • the second up-sampling process may directly up-sample the second down-sampled output to obtain the second up-sampled output.
  • the second residual link addition process performs the second residual link addition on the input of the second-level sampling process and the second up-sampling output, and then uses the result of the second residual link addition as the output of the second-level sampling process.
  • the size of the output of the second up-sampling process (i.e., the second up-sampling output) is the same as the size of the input of the second-level sampling process (i.e., the input of the second down-sampling process), so that, after the second residual link addition,
  • the size of the output of the second-level sampling process is the same as the size of the input of the second-level sampling process.
  • the procedures of the sampling processing at each level (for example, the first-level sampling processing, the second-level sampling processing, and the third-level sampling processing introduced in the embodiment shown in FIG. 4B) are similar, each including down-sampling processing, up-sampling processing, and residual link addition processing.
  • the residual link addition processing may include correspondingly adding the values of each row and each column of the matrix of the two feature images, but it is not limited to this.
  • “nested” means that an object includes another object that is similar or identical to the object, and the object includes but is not limited to a process or a network structure.
  • in the sampling processing of each level, the size of the output of the up-sampling process (for example, a feature image) is the same as the size of the input of the down-sampling process (for example, also a feature image), so that, after the residual link addition, the size of the output of the sampling processing of each level (for example, a feature image) is the same as the size of the input of the sampling processing of that level.
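The nesting just described can be sketched recursively. The sketch below uses single-channel feature images, a factor-2 average pool for down-sampling, and nearest-neighbor repetition for up-sampling; real embodiments would use learned convolutional sub-networks instead, so all operators here are illustrative stand-ins:

```python
import numpy as np

def downsample(x):
    """Halve each dimension by 2x2 average pooling (stand-in for a
    learned down-sampling sub-network)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Double each dimension by nearest-neighbor repetition (stand-in
    for a learned up-sampling sub-network)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def sampling_level(x, depth):
    """One sampling level: down-sample, optionally run the nested next
    level on the down-sampled output, up-sample, then add the level's
    input back via the residual link. Output size equals input size."""
    down = downsample(x)
    if depth > 1:                       # nest the next-level sampling process
        down = sampling_level(down, depth - 1)
    up = upsample(down)
    return x + up                       # residual link addition

x = np.random.rand(8, 8)                # input of the first-level process
y = sampling_level(x, depth=2)          # first level nesting a second level
```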
  • multi-scale cyclic sampling processing can be implemented by a convolutional neural network.
  • the first convolutional neural network may be used to perform multi-scale cyclic sampling processing.
  • the first convolutional neural network may include a nested first meta network and second meta network; the first meta network is used to perform the first-level sampling processing, and the second meta network is used to perform the second-level sampling processing.
  • the first meta-network may include a first sub-network and a second sub-network, the first sub-network is used to perform the first down-sampling process, and the second sub-network is used to perform the first up-sampling process.
  • the second meta network is nested between the first sub-network and the second sub-network of the first meta network.
  • the second meta-network may include a third sub-network and a fourth sub-network, the third sub-network is used to perform the second down-sampling process, and the fourth sub-network is used to perform the second up-sampling process.
  • both the first meta network and the second meta network are similar to the aforementioned residual network.
  • each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, a dense network, and the like.
  • the first sub-network and the third sub-network may include a convolutional layer (down-sampling layer) with down-sampling function, and may also include one of residual networks and dense networks with down-sampling function;
  • the second sub-network and the fourth sub-network may include a convolutional layer with an up-sampling function (an up-sampling layer), and may also include one of a residual network, a dense network, or the like with an up-sampling function.
  • the first sub-network and the third sub-network may have the same structure or different structures; the second sub-network and the fourth sub-network may have the same structure or different structures; the embodiments of the present disclosure do not limit this.
  • Down-sampling is used to reduce the size of the feature image, thereby reducing the data amount of the feature image.
  • down-sampling can be performed through the down-sampling layer, but is not limited to this.
  • for example, the down-sampling layer can implement down-sampling by methods such as max pooling, average pooling, strided convolution, decimation (e.g., selecting fixed pixels), or demultiplexed output (demuxout, which splits the input image into multiple smaller images).
  • Upsampling is used to increase the size of the feature image, thereby increasing the data volume of the feature image.
  • upsampling can be performed through an upsampling layer, but is not limited to this.
  • the up-sampling layer can adopt up-sampling methods such as strided transposed convolution and interpolation algorithms to implement up-sampling processing.
  • the interpolation algorithm may include, for example, bilinear interpolation and bicubic interpolation.
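As one concrete example of an interpolation-based up-sampling layer, a bilinear up-sampler might look as follows; the pixel-centre coordinate convention used here is one of several possible choices:

```python
import numpy as np

def bilinear_upsample(x, factor):
    """Up-sample a 2-D feature image by bilinear interpolation."""
    H, W = x.shape
    out_h, out_w = H * factor, W * factor
    # map each output pixel centre back into input coordinates
    rows = (np.arange(out_h) + 0.5) / factor - 0.5
    cols = (np.arange(out_w) + 0.5) / factor - 0.5
    r0 = np.clip(np.floor(rows).astype(int), 0, H - 2)
    c0 = np.clip(np.floor(cols).astype(int), 0, W - 2)
    dr = np.clip(rows - r0, 0, 1)[:, None]   # vertical blend weights
    dc = np.clip(cols - c0, 0, 1)[None, :]   # horizontal blend weights
    tl = x[np.ix_(r0, c0)];     tr = x[np.ix_(r0, c0 + 1)]
    bl = x[np.ix_(r0 + 1, c0)]; br = x[np.ix_(r0 + 1, c0 + 1)]
    top = tl * (1 - dc) + tr * dc
    bot = bl * (1 - dc) + br * dc
    return top * (1 - dr) + bot * dr

x = np.array([[0.0, 1.0], [2.0, 3.0]])
y = bilinear_upsample(x, 2)   # 2x2 -> 4x4, corners preserved
```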
  • at the same level, the down-sampling factor of the down-sampling process corresponds to the up-sampling factor of the up-sampling process; that is, when the down-sampling factor of the down-sampling process is 1/y, the up-sampling factor of the up-sampling process is y, where y is a positive integer, usually greater than or equal to 2.
  • it should be noted that the parameters of the down-sampling processes at different levels (that is, the parameters of the networks corresponding to the down-sampling processes) may be the same or different; the parameters of the up-sampling processes at different levels (that is, the parameters of the networks corresponding to the up-sampling processes) may be the same or different; and the parameters of the residual link additions at different levels may be the same or different. The present disclosure does not limit this.
  • the multi-scale cyclic sampling processing may also include: after the first down-sampling process, the first up-sampling process, the second down-sampling process, and the second up-sampling process, respectively performing instance standardization processing or layer standardization processing on the first down-sampling output, the first up-sampling output, the second down-sampling output, and the second up-sampling output.
  • the first down-sampling output, the first up-sampling output, the second down-sampling output, and the second up-sampling output can use the same standardization processing method (instance standardization processing or layer standardization processing) or different standardization processing methods; the present disclosure does not limit this.
  • correspondingly, the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network each also include an instance standardization layer or a layer standardization layer; the instance standardization layer is used to perform instance standardization processing, and the layer standardization layer is used to perform layer standardization processing. For example, the instance standardization layer can perform instance standardization processing according to the aforementioned instance standardization formula, and the layer standardization layer can perform layer standardization processing according to the aforementioned layer standardization formula; this is not limited in the present disclosure.
  • the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network may include the same standardization layer (instance standardization layer or layer standardization layer) or different standardization layers; the present disclosure does not limit this.
  • FIG. 4B is a schematic flowchart diagram corresponding to the multi-scale cyclic sampling processing in the image processing method shown in FIG. 3 according to another embodiment of the present disclosure.
  • the multi-scale cyclic sampling process further includes a third-level sampling process.
  • the other procedures of the multi-scale cyclic sampling processing shown in FIG. 4B are basically the same as the procedures of the multi-scale cyclic sampling processing shown in FIG. 4A, and the repeated parts are not described here again.
  • the third-level sampling process is nested between the second down-sampling process and the second up-sampling process of the second-level sampling process: it receives the second down-sampling output as the input of the third-level sampling process and provides the output of the third-level sampling process as the input of the second up-sampling process, so that the second up-sampling process indirectly up-samples the second down-sampling output.
  • the third-level sampling process includes a third down-sampling process, a third up-sampling process, and a third residual link addition process that are sequentially executed.
  • the third down-sampling process is performed based on the input of the third-level sampling process to obtain the third down-sampled output.
  • the third down-sampling process can directly down-sample the input of the third-level sampling process to obtain the third down-sampling output.
  • the third up-sampling process performs up-sampling based on the third down-sampled output to obtain the third up-sampled output.
  • the third up-sampling process may directly up-sample the third down-sampled output to obtain the third up-sampled output.
  • the third residual link addition process performs the third residual link addition on the input of the third-level sampling process and the third up-sampling output, and then uses the result of the third residual link addition as the output of the third-level sampling process.
  • the size of the output of the third up-sampling process (i.e., the third up-sampling output) is the same as the size of the input of the third-level sampling process (i.e., the input of the third down-sampling process), so that, after the third residual link addition, the size of the output of the third-level sampling process is the same as the size of the input of the third-level sampling process.
  • it should be noted that the multi-scale cyclic sampling processing may also include more levels of sampling processing, for example, a fourth-level sampling processing nested in the third-level sampling processing, a fifth-level sampling processing nested in the fourth-level sampling processing, and so on; the nesting manner is similar to that of the second-level and third-level sampling processing described above, and the present disclosure does not limit this.
  • FIG. 4C is a schematic flowchart diagram corresponding to the multi-scale cyclic sampling processing in the image processing method shown in FIG. 3 according to another embodiment of the present disclosure.
  • in this embodiment, the multi-scale cyclic sampling process includes multiple second-level sampling processes that are executed sequentially.
  • the other procedures of the multi-scale cyclic sampling processing shown in FIG. 4C are basically the same as the procedures of the multi-scale cyclic sampling processing shown in FIG. 4A, and the repeated parts are not described here again.
  • the inclusion of two second-level sampling processes in FIG. 4C is exemplary; the multi-scale cyclic sampling processing may include two or more second-level sampling processes performed sequentially. It should be noted that the number of second-level sampling processes can be selected according to actual needs, and the present disclosure does not limit this.
  • the inventors of the present application found that, compared with image processing methods using one or three second-level sampling processes, an image processing method using two second-level sampling processes achieves a better image enhancement effect; however, this should not be seen as a limitation of the present disclosure.
  • the first second-level sampling process receives the first down-sampling output as its input; every second-level sampling process except the first receives the output of the previous second-level sampling process as its input; and the output of the last second-level sampling process is used as the input of the first up-sampling process.
  • it should be noted that, among sampling processes of the same level executed in different orders, the parameters of the down-sampling processes may be the same or different, the parameters of the up-sampling processes may be the same or different, and the parameters of the residual link additions may be the same or different. The present disclosure does not limit this.
  • it should be noted that the first-level sampling process can nest multiple sequentially executed second-level sampling processes; further, at least some of the second-level sampling processes may nest one or more sequentially executed third-level sampling processes, and the numbers of third-level sampling processes nested in different second-level sampling processes may be the same or different; further, the third-level sampling processes can nest fourth-level sampling processes in the same manner as the second-level sampling processes nest the third-level sampling processes; and so on.
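The FIG. 4C-style arrangement, in which the first-level process nests several sequentially executed second-level processes, can be sketched as follows; factor-2 pooling and nearest-neighbor repetition again stand in for learned sub-networks, and n_second=2 matches the exemplary figure:

```python
import numpy as np

def down(x):
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))  # 2x2 avg pool

def up(x):
    return x.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbor x2

def second_level(x):
    """One second-level sampling process: down, up, residual add."""
    return x + up(down(x))

def first_level(x, n_second=2):
    """First-level sampling process nesting n_second sequentially
    executed second-level processes between its down- and up-sampling."""
    h = down(x)                   # first down-sampling output
    for _ in range(n_second):     # each takes the previous one's output
        h = second_level(h)
    return x + up(h)              # first up-sampling + residual link addition

x = np.random.rand(8, 8)
y = first_level(x)                # output has the same size as the input
```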
  • FIGS. 4A-4C show situations where the image processing method provided by an embodiment of the present disclosure includes one multi-scale cyclic sampling process.
  • at least one multi-scale cyclic sampling processing includes one multi-scale cyclic sampling processing.
  • the multi-scale cyclic sampling processing receives the first feature image as the input of the multi-scale cyclic sampling processing, and the input of the multi-scale cyclic sampling processing is used as the input of the first-level sampling processing in the multi-scale cyclic sampling processing.
  • the output of the first-level sampling processing is used as the output of the multi-scale cyclic sampling processing, and the output of the multi-scale cyclic sampling processing is used as the output of the at least one multi-scale cyclic sampling processing.
  • the present disclosure includes but is not limited to this.
  • FIG. 4D is a schematic flow chart corresponding to the multi-scale cyclic sampling processing in the image processing method shown in FIG. 3 according to another embodiment of the present disclosure.
  • in the embodiment shown in FIG. 4D, the at least one multi-scale cyclic sampling processing includes multiple multi-scale cyclic sampling processes executed sequentially; for example, it may include two or three sequentially executed multi-scale cyclic sampling processes, but is not limited to this. It should be noted that, in the embodiments of the present disclosure, the number of multi-scale cyclic sampling processes can be selected according to actual needs, and the present disclosure does not limit this.
  • the inventors of the present application found that, compared with image processing methods using one or three multi-scale cyclic sampling processes, an image processing method using two multi-scale cyclic sampling processes achieves a better image enhancement effect; however, this should not be seen as a limitation of the present disclosure.
  • the input of each multi-scale cyclic sampling process is used as the input of the first-level sampling process in that multi-scale cyclic sampling process, and the output of the first-level sampling process in each multi-scale cyclic sampling process is used as the output of that multi-scale cyclic sampling process.
  • the first multi-scale cyclic sampling process receives the first feature image as its input; each multi-scale cyclic sampling process except the first receives the output of the previous multi-scale cyclic sampling process as its input; and the output of the last multi-scale cyclic sampling process is used as the output of the at least one multi-scale cyclic sampling processing.
  • Fig. 5 is a flowchart of an image processing method provided by another embodiment of the present disclosure.
  • the image processing method includes step S210 to step S250.
  • steps S230 to S240 of the image processing method shown in FIG. 5 are the same as steps S110 to S120 of the image processing method shown in FIG. 3; that is, the image processing method shown in FIG. 5 includes the image processing method shown in FIG. 3. Therefore, for steps S230 to S240 of the image processing method shown in FIG. 5, reference may be made to the foregoing description of steps S110 to S120 of the image processing method shown in FIG. 3, as well as to the embodiments shown in FIGS. 4A to 4D.
  • steps S210 to S250 of the image processing method shown in FIG. 5 will be described in detail.
  • Step S210 Obtain an input image.
  • the input image may include photos captured by the camera of a smart phone, the camera of a tablet computer, the camera of a personal computer, the lens of a digital camera, a surveillance camera, or a web camera, etc., and may include images of people, images of animals and plants, or landscape images, etc.; this is not limited in the present disclosure.
  • the quality of the input image is lower than the quality of photos taken by a real digital single-lens reflex camera, that is, the input image is a low-quality image.
  • the input image may include a 3-channel RGB image; in other examples, the input image may include a 3-channel YUV image.
  • the input image includes an RGB image as an example, but the embodiment of the present disclosure is not limited to this.
  • Step S220 Use the analysis network to convert the input image into a first feature image.
  • the analysis network may be a convolutional neural network including one of a convolutional layer, a residual network, and a dense network.
  • the analysis network can convert 3 channel RGB images (ie, input images) into multiple first feature images, such as 64 first feature images.
  • the present disclosure includes but is not limited to this.
  • the embodiment of the present disclosure does not limit the structure and parameters of the analysis network, as long as it can convert the input image to the convolution feature dimension (ie, convert it to the first feature image).
  • Step S230 Receive the first characteristic image
  • Step S240 Perform at least one multi-scale cyclic sampling process on the first feature image.
  • step S230 to step S240 reference may be made to the foregoing description of step S110 to step S120, which will not be repeated in this disclosure.
  • Step S250 Use a synthesis network to convert the output of at least one multi-scale cyclic sampling process into an output image.
  • the synthesis network may be a convolutional neural network including one of a convolutional layer, a residual network, a dense network, and the like.
  • the output of at least one multi-scale cyclic sampling process can be referred to as the second feature image.
  • the number of second feature images may be multiple, but is not limited to this.
  • the synthesis network may convert multiple second feature images into output images.
  • the output image may include a 3-channel RGB image; the present disclosure includes but is not limited to this.
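Putting steps S210 to S250 together, the overall data flow can be sketched as below. The random 1×1 convolutions standing in for the analysis and synthesis networks, the 64-feature width, and the doubling placeholder for the multi-scale cyclic sampling pass are all illustrative assumptions; in practice these are trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def analysis(img, n_features=64):
    """Stand-in for the analysis network: a random 1x1 convolution
    lifting a 3-channel RGB image to n_features feature images."""
    w = rng.standard_normal((n_features, img.shape[0]))
    return np.einsum('fc,chw->fhw', w, img)

def multi_scale_cyclic(feats):
    """Stand-in for one multi-scale cyclic sampling pass that keeps the
    feature-image size unchanged (placeholder residual doubling)."""
    return feats + feats

def synthesis(feats):
    """Stand-in for the synthesis network: a random 1x1 convolution
    mapping the second feature images back to a 3-channel image."""
    w = rng.standard_normal((3, feats.shape[0]))
    return np.einsum('cf,fhw->chw', w, feats)

img = rng.random((3, 16, 16))            # step S210: 3-channel input image
feats = analysis(img)                    # step S220: 64 first feature images
for _ in range(2):                       # steps S230-S240: two sampling passes
    feats = multi_scale_cyclic(feats)
out = synthesis(feats)                   # step S250: 3-channel output image
```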
  • FIG. 6A is a schematic diagram of an input image
  • FIG. 6B is a schematic diagram of the output image obtained by processing the input image shown in FIG. 6A according to an image processing method provided by an embodiment of the present disclosure (for example, the image processing method shown in FIG. 5).
  • the output image retains the content of the input image, while the contrast of the image is improved and the problem of the input image being too dark is remedied, so that the quality of the output image can be close to that of a photo taken by a digital single-lens reflex camera.
  • the output image is a high-quality image.
  • the embodiment of the present disclosure does not limit the structure and parameters of the synthesis network, as long as it can convert the convolution feature dimension (ie, the second feature image) into an output image.
  • the image processing method provided by the embodiments of the present disclosure can perform image enhancement processing on low-quality input images, and by repeatedly sampling at multiple scales to obtain higher image fidelity, the quality of output images can be greatly improved.
  • for example, the PSNR of the image output by the image enhancement method proposed by Andrey Ignatov et al. is 20.08, while the PSNR of the output image obtained by the image processing method provided in the embodiment shown in FIG. 4C of the present disclosure can reach 23.35; that is, the image obtained by the image processing method provided by the embodiments of the present disclosure can be closer to a real photo taken by a digital single-lens reflex camera.
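PSNR, the metric quoted in this comparison, can be computed as follows; the 4×4 constant images are a toy illustration, not the patent's test data:

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    processed image; higher values mean the images are closer."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 128.0)
noisy = ref + 8.0                  # constant error of 8 grey levels, MSE = 64
value = psnr(ref, noisy)           # 10*log10(255^2 / 64) ~ 30.07 dB
```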
  • FIG. 7A is a schematic structural diagram of a neural network provided by an embodiment of the disclosure
  • FIG. 7B is a flowchart of a neural network training method provided by an embodiment of the disclosure
  • FIG. 7C is a schematic block diagram, provided by an embodiment of the disclosure, corresponding to training the neural network shown in FIG. 7A with the training method shown in FIG. 7B.
  • the neural network 300 includes an analysis network 310, a first sub-neural network 320, and a synthesis network 330.
  • the analysis network 310 processes the input image to obtain the first feature image
  • the first sub-neural network 320 performs at least one multi-scale cyclic sampling process on the first feature image to obtain the second feature image
  • the synthesis network 330 processes the second feature image to obtain an output image.
  • for the structure of the analysis network 310, reference may be made to the description of the analysis network in the aforementioned step S220; for the structure of the first sub-neural network 320, reference may be made to the description of the multi-scale cyclic sampling processing in the aforementioned step S120 (that is, step S240), and the first sub-neural network may include, but is not limited to, the aforementioned first convolutional neural network; for the structure of the synthesis network 330, reference may be made to the description of the synthesis network in the aforementioned step S250. The present disclosure does not limit these structures.
  • the input image and the output image can also refer to the description of the input image and the output image in the image processing method provided in the foregoing embodiment, which will not be repeated in this disclosure.
  • the training method of the neural network includes step S410 to step S460.
  • Step S410 Obtain training input images.
  • the training input image may also include photos taken by the camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a surveillance camera, or a web camera. It may include images of people, images of animals and plants, or landscapes, etc., which is not limited in the present disclosure.
  • the quality of the training input image is lower than the quality of photos taken by a real digital single-lens reflex camera, that is, the training input image is a low-quality image.
  • the training input image may include 3 channel RGB images.
  • Step S420 Use the analysis network to process the training input image to provide a first training feature image.
  • the analysis network 310 may be a convolutional neural network including one of a convolutional layer, a residual network, and a dense network.
  • the analysis network can convert 3 channel RGB images (ie, training input images) into multiple first training feature images, such as 64 first training feature images.
  • Step S430 Use the first sub-neural network to perform multi-scale cyclic sampling processing on the first training feature image at least once to obtain a second training feature image.
  • the multi-scale cyclic sampling process can be implemented as the multi-scale cyclic sampling process in any of the embodiments shown in FIGS. 4A-4D, but is not limited thereto.
  • the following description takes, as an example, the case in which the multi-scale cyclic sampling processing in step S430 is implemented as the multi-scale cyclic sampling processing shown in FIG. 4A.
  • the multi-scale cyclic sampling process includes a nested first-level sampling process and second-level sampling process.
  • the input of the multi-scale cyclic sampling process (i.e., the first training feature image) serves as the input of the first-level sampling process, and the output of the first-level sampling process serves as the output of the multi-scale cyclic sampling process (i.e., the second training feature image).
  • the size of the second training feature image may be the same as the size of the first training feature image.
  • the first-level sampling process includes a first down-sampling process, a first up-sampling process, and a first residual link addition process that are sequentially executed.
  • the first down-sampling process is performed based on the input of the first-level sampling process to obtain the first down-sampled output.
  • for example, the first down-sampling process can directly down-sample the input of the first-level sampling process to obtain the first down-sampled output.
  • the first up-sampling process performs up-sampling based on the first down-sampled output to obtain the first up-sampled output; for example, the first down-sampled output is first subjected to the second-level sampling process and then up-sampled to obtain the first up-sampled output, that is, the first up-sampling process can indirectly up-sample the first down-sampled output.
  • the first residual link addition process performs the first residual link addition on the input of the first-level sampling process and the first up-sampled output, and then uses the result of the first residual link addition as the output of the first-level sampling process.
  • the size of the output of the first up-sampling process (i.e., the first up-sampled output) is the same as the size of the input of the first-level sampling process (i.e., the input of the first down-sampling process), so that, through the first residual link addition, the size of the output of the first-level sampling process is the same as the size of the input of the first-level sampling process.
  • the second-level sampling process is nested between the first down-sampling process and the first up-sampling process of the first-level sampling process; it receives the first down-sampled output as the input of the second-level sampling process and provides the output of the second-level sampling process as the input of the first up-sampling process, so that the first up-sampling process performs up-sampling based on the first down-sampled output.
  • the second-level sampling process includes a second down-sampling process, a second up-sampling process, and a second residual link addition process that are sequentially executed.
  • the second down-sampling process performs down-sampling based on the input of the second-level sampling process to obtain the second down-sampled output.
  • for example, the second down-sampling process can directly down-sample the input of the second-level sampling process to obtain the second down-sampled output.
  • the second up-sampling process performs up-sampling based on the second down-sampled output to obtain the second up-sampled output.
  • the second up-sampling process may directly up-sample the second down-sampled output to obtain the second up-sampled output.
  • the second residual link addition process performs a second residual link addition on the input of the second-level sampling process and the second up-sampled output, and then uses the result of the second residual link addition as the output of the second-level sampling process.
  • the size of the output of the second up-sampling process (i.e., the second up-sampled output) is the same as the size of the input of the second-level sampling process (i.e., the input of the second down-sampling process), so that, through the second residual link addition, the size of the output of the second-level sampling process is the same as the size of the input of the second-level sampling process.
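The nested two-level structure described above (down-sample, recurse, up-sample, residual add) can be illustrated with a minimal NumPy sketch. This is only a shape-level toy: the `down2`/`up2` helpers (2x2 average pooling and nearest-neighbour repetition) are hypothetical stand-ins for the convolutional down-/up-sampling layers that the disclosure actually uses, and the feature image is a single channel for brevity.

```python
import numpy as np

def down2(x):
    # 2x2 average-pool down-sampling (an assumed choice; the patent only
    # requires some down-sampling layer, e.g. a strided convolution)
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(x):
    # nearest-neighbour up-sampling back to twice the size in each dimension
    return x.repeat(2, axis=0).repeat(2, axis=1)

def second_level(x):
    # second down-sampling -> second up-sampling -> second residual addition
    d2 = down2(x)
    u2 = up2(d2)
    return x + u2          # output has the same size as the input

def first_level(x):
    # first down-sampling, then the nested second-level process,
    # then first up-sampling and the first residual addition
    d1 = down2(x)
    u1 = up2(second_level(d1))
    return x + u1          # again size-preserving

feat = np.arange(16.0).reshape(4, 4)   # toy "first feature image"
out = first_level(feat)
assert out.shape == feat.shape         # multi-scale cyclic sampling keeps size
```

Because each level ends with a residual addition whose up-sampled branch matches the level's input size, the whole nest is size-preserving, which is exactly the property the two preceding paragraphs state.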
  • the first sub-neural network 320 may be implemented as the aforementioned first convolutional neural network.
  • the first sub-neural network 320 may include a nested first meta-network and a second meta-network, the first meta-network is used to perform the first-level sampling processing, and the second meta-network is used to perform the second-level sampling processing.
  • the first meta-network may include a first sub-network and a second sub-network, the first sub-network is used to perform the first down-sampling process, and the second sub-network is used to perform the first up-sampling process.
  • the second meta-network is nested between the first sub-network and the second sub-network of the first meta-network (i.e., between the first down-sampling process and the first up-sampling process).
  • the second meta-network may include a third sub-network and a fourth sub-network, the third sub-network is used to perform the second down-sampling process, and the fourth sub-network is used to perform the second up-sampling process.
  • each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, a dense network, and the like.
  • the first sub-network and the third sub-network may each include one of a convolutional layer with a down-sampling function (a down-sampling layer), a residual network, a dense network, etc.; the second sub-network and the fourth sub-network may each include one of a convolutional layer with an up-sampling function (an up-sampling layer), a residual network, a dense network, etc.
  • the first sub-network and the third sub-network may have the same structure or different structures, and the second sub-network and the fourth sub-network may have the same structure or different structures; the present disclosure does not limit this.
  • the multi-scale cyclic sampling processing may further include: after the first down-sampling process, the first up-sampling process, the second down-sampling process, and the second up-sampling process, performing instance standardization processing or layer standardization processing on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively.
  • the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output may use the same standardization processing method (instance standardization processing or layer standardization processing) or different standardization processing methods; the present disclosure does not limit this.
  • correspondingly, the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network each further include an instance standardization layer or a layer standardization layer; the instance standardization layer is used to perform instance standardization processing, and the layer standardization layer is used to perform layer standardization processing. For example, the instance standardization layer can perform instance standardization processing according to the aforementioned instance standardization formula, and the layer standardization layer can perform layer standardization processing according to the aforementioned layer standardization formula, which is not limited in the present disclosure.
  • the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network may include the same type of standardization layer (instance standardization layer or layer standardization layer) or different types of standardization layers; the present disclosure does not limit this.
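As a rough illustration of the difference between the two standardization options, the NumPy sketch below normalizes a batch of feature maps per (sample, channel) for instance standardization and per sample for layer standardization. The axis choices follow the usual definitions of these operations and are an assumption here, since the disclosure's exact formulas are given elsewhere in the specification.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # instance standardization: each (sample, channel) feature map is
    # normalized over its own H*W spatial values
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # layer standardization: each sample is normalized over all of its
    # channels and spatial positions together
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.rand(2, 64, 8, 8)   # a batch of 64-channel feature maps
y = instance_norm(x)
# each per-channel feature map of y now has (approximately) zero mean
```

Either function could be applied to the first/second down-sampled and up-sampled outputs mentioned above; which one is used (and whether all four outputs share the same choice) is left open by the disclosure.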
  • for more implementation methods and details of the multi-scale cyclic sampling processing in step S430, please refer to the foregoing step S120 (i.e., step S240) and the multi-scale cyclic sampling processing in the embodiments shown in FIGS. 4A-4D; this disclosure will not repeat the description. It should also be noted that when the multi-scale cyclic sampling processing in step S430 is implemented in other forms, the first sub-neural network 320 should be changed accordingly to implement those forms, which will likewise not be repeated in this disclosure.
  • the number of second training feature images may be multiple, but is not limited thereto.
  • Step S440 Use the synthesis network to process the second training feature image to obtain a training output image.
  • the synthesis network 330 may be a convolutional neural network including one of a convolutional layer, a residual network, a dense network, and the like.
  • the synthesis network may convert multiple second training feature images into training output images.
  • the training output image may include a 3-channel RGB image, but the present disclosure is not limited to this.
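As a toy illustration of how a synthesis network might map many second training feature images back to a 3-channel RGB image, the sketch below uses a single 1x1-convolution-like channel-mixing step. The kernel, sizes, and random values are hypothetical; a real synthesis network would use trained convolutional, residual, or dense layers as the surrounding text describes.

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.random((64, 16, 16))   # 64 second training feature images
weights = rng.random((3, 64)) / 64    # hypothetical 1x1 conv kernel: 64 -> 3

# a 1x1 convolution is just a per-pixel linear mix of the 64 channels
rgb = np.tensordot(weights, features, axes=([1], [0]))
assert rgb.shape == (3, 16, 16)       # a 3-channel "training output image"
```

The point is only the shape transformation: the synthesis network collapses the many feature channels into the three channels of the output image while keeping the spatial resolution.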
  • Step S450 Based on the training output image, calculate the loss value of the neural network through the loss function.
  • the parameters of the neural network 300 include the parameters of the analysis network 310, the parameters of the first sub-neural network 320, and the parameters of the synthesis network 330.
  • the initial parameter of the neural network 300 may be a random number, for example, the random number conforms to a Gaussian distribution, which is not limited in the embodiment of the present disclosure.
  • the loss function of this embodiment can refer to the loss function in the literature provided by Andrey Ignatov et al.
  • the loss function can include a color loss function, a texture loss function, and a content loss function; correspondingly, the specific process of calculating the loss value of the parameters of the neural network 300 through the loss function can also refer to the description in that literature.
  • the embodiment of the present disclosure does not limit the specific form of the loss function, which includes but is not limited to the form of the loss function in the above-mentioned documents.
  • Step S460 Correct the parameters of the neural network according to the loss value.
  • the training process of the neural network 300 may also include an optimization function (not shown in FIG. 7C).
  • the optimization function may calculate the error values of the parameters of the neural network 300 according to the loss value calculated by the loss function, and the parameters of the neural network 300 are corrected according to the error values.
  • the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (BGD) algorithm, etc., to calculate the error value of the parameters of the neural network 300.
  • the training method of the neural network may further include: judging whether the training of the neural network satisfies a predetermined condition; if the predetermined condition is not met, the above training process (i.e., step S410 to step S460) is repeated; if the predetermined condition is met, the above training process is stopped and a trained neural network is obtained.
  • for example, the foregoing predetermined condition is that the loss values corresponding to two (or more) consecutive training output images no longer significantly decrease; alternatively, the predetermined condition is that the number of training iterations or training epochs of the neural network reaches a predetermined number. This disclosure does not limit this.
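The train-until-a-predetermined-condition loop of steps S410-S460 can be caricatured with a one-parameter model: compute a loss, correct the parameter from the loss via SGD, and stop when either the loss stops decreasing or a maximum iteration count is reached. Everything here (the toy model, learning rate, thresholds) is an illustrative assumption, not the patent's actual network or loss function.

```python
import numpy as np

# Toy stand-in for the full pipeline: a single weight is "the network",
# mean-squared error is "the loss function", plain SGD is "the optimizer".
rng = np.random.default_rng(0)
w = rng.normal()              # initial parameter drawn from a Gaussian (S450 setup)
lr, prev_loss, epoch = 0.1, float("inf"), 0

while epoch < 1000:           # predetermined maximum number of iterations
    x = rng.normal(size=32)   # a "training batch"
    target = 3.0 * x          # ground truth for the toy task
    loss = np.mean((w * x - target) ** 2)            # loss value (S450)
    grad = np.mean(2 * (w * x - target) * x)         # error value of the parameter
    w -= lr * grad            # correct the parameter from the loss (S460)
    if abs(prev_loss - loss) < 1e-9:                 # loss no longer decreasing
        break
    prev_loss, epoch = loss, epoch + 1
```

Both stopping rules from the preceding paragraphs appear here: the `abs(prev_loss - loss)` check mirrors "loss values no longer significantly decrease", and the `epoch < 1000` bound mirrors "reaches a predetermined number".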
  • the training output image output by the trained neural network 300 retains the content of the training input image, but the quality of the training output image can be close to the quality of photos taken by a real digital single-lens reflex camera; that is, the training output image is a high-quality image.
  • the above-mentioned embodiments only schematically illustrate the training process of the neural network.
  • the training process of each sample image may include multiple iterations to correct the parameters of the neural network.
  • the training phase may also include fine-tuning the parameters of the neural network to obtain more optimized parameters.
  • the neural network training method provided by the embodiments of the present disclosure can train the neural network used in the image processing method of the embodiments of the present disclosure; the neural network trained by this method can perform image enhancement processing on low-quality input images and, by repeatedly sampling at multiple scales, obtain higher image fidelity, so that the quality of the output image can be greatly improved; it is suitable for offline applications with high image-quality requirements, such as batch processing.
  • FIG. 8 is a schematic block diagram of an image processing device provided by an embodiment of the present disclosure.
  • the image processing apparatus 500 includes a memory 510 and a processor 520.
  • the memory 510 is used to non-transitorily store computer-readable instructions, and the processor 520 is used to run those computer-readable instructions; when the computer-readable instructions are run by the processor 520, the image processing method provided by the embodiments of the present disclosure is executed.
  • the memory 510 and the processor 520 may directly or indirectly communicate with each other.
  • components such as the memory 510 and the processor 520 may communicate through a network connection.
  • the network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
  • the network may include a local area network, the Internet, a telecommunication network, the Internet of Things (IoT) based on the Internet and/or a telecommunication network, and/or any combination of the above networks, etc.
  • the wired network may, for example, use twisted pair, coaxial cable, or optical fiber transmission for communication, and the wireless network may use, for example, a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi.
  • the present disclosure does not limit the types and functions of the network here.
  • the processor 520 may control other components in the image processing apparatus to perform desired functions.
  • the processor 520 may be a central processing unit (CPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another device with data processing capabilities and/or program execution capabilities.
  • the central processing unit (CPU) can be an X86 or ARM architecture.
  • the GPU can be directly integrated on the motherboard alone or built into the north bridge chip of the motherboard.
  • the GPU can also be built into the central processing unit (CPU).
  • the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory.
  • Volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
  • one or more computer instructions may be stored in the memory 510, and the processor 520 may execute the computer instructions to implement various functions.
  • the computer-readable storage medium may also store various application programs and various data, such as training input images, and various data used and/or generated by the application programs.
  • for example, when some of the computer instructions are executed by the processor, one or more steps of the image processing method described above may be performed; when other computer instructions are executed, one or more steps of the neural network training method described above may be performed.
  • the image processing device provided by the above embodiments of the present disclosure is exemplary rather than restrictive; according to actual application requirements, the image processing device may also include other conventional components or structures. For example, to realize the necessary functions of the image processing device, those skilled in the art may set other conventional components or structures according to the specific application scenario, which is not limited in the embodiments of the present disclosure.
  • FIG. 9 is a schematic diagram of a storage medium provided by an embodiment of the disclosure.
  • the storage medium 600 non-transitory stores computer-readable instructions 601.
  • when the non-transitory computer-readable instructions 601 are executed by a computer (including a processor), the image processing method provided by any embodiment of the present disclosure can be executed.
  • one or more computer instructions may be stored on the storage medium 600.
  • Some computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps in the foregoing image processing method.
  • the other computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps in the above-mentioned neural network training method.
  • the storage medium may include the storage components of a tablet computer, the hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact-disc read-only memory (CD-ROM), flash memory, or any combination of the above storage media; it may also be another suitable storage medium.

Abstract

An image processing method, an image processing device, a neural network training method, and a storage medium. The image processing method comprises: receiving a first feature image; and performing at least one instance of multi-scale loop sampling processing on the first feature image; wherein the multi-scale loop sampling processing comprises nested first-level sampling processing and second-level sampling processing; the first-level sampling processing comprises, successively executed, first downsampling processing, first upsampling processing, and first residual connection addition processing; the second-level sampling processing is nested between the first downsampling processing and the first upsampling processing, and comprises, successively executed, second downsampling processing, second upsampling processing, and second residual connection addition processing.

Description

Image processing method and device, neural network training method, and storage medium
This application claims priority to Chinese patent application No. 201910209662.2, filed on March 19, 2019; the entire disclosure of the above Chinese patent application is incorporated herein by reference as a part of this application.
Technical field
The embodiments of the present disclosure relate to an image processing method, an image processing device, a training method of a neural network, and a storage medium.
Background art
Currently, deep learning technology based on artificial neural networks has made great progress in fields such as image classification, image capture and search, facial recognition, and age and speech recognition. The advantage of deep learning is that it can solve very different technical problems with relatively similar systems by using a common structure. A convolutional neural network (CNN) is an artificial neural network that has been developed in recent years and has attracted wide attention; it is a special method of image recognition and a very effective network with forward feedback. Now, the application scope of CNN is no longer limited to the field of image recognition; it can also be applied to face recognition, text recognition, image processing, and other applications.
Summary of the invention
At least one embodiment of the present disclosure provides an image processing method, including: receiving a first feature image; and performing multi-scale cyclic sampling processing on the first feature image at least once;
Wherein, the multi-scale cyclic sampling processing includes nested first-level sampling processing and second-level sampling processing. The first-level sampling processing includes first down-sampling processing, first up-sampling processing, and first residual link addition processing, wherein the first down-sampling processing performs down-sampling based on the input of the first-level sampling processing to obtain a first down-sampled output, the first up-sampling processing performs up-sampling based on the first down-sampled output to obtain a first up-sampled output, and the first residual link addition processing performs a first residual link addition on the input of the first-level sampling processing and the first up-sampled output and then uses the result of the first residual link addition as the output of the first-level sampling processing. The second-level sampling processing is nested between the first down-sampling processing and the first up-sampling processing; it receives the first down-sampled output as the input of the second-level sampling processing and provides the output of the second-level sampling processing as the input of the first up-sampling processing, so that the first up-sampling processing performs up-sampling based on the first down-sampled output. The second-level sampling processing includes second down-sampling processing, second up-sampling processing, and second residual link addition processing, wherein the second down-sampling processing performs down-sampling based on the input of the second-level sampling processing to obtain a second down-sampled output, the second up-sampling processing performs up-sampling based on the second down-sampled output to obtain a second up-sampled output, and the second residual link addition processing performs a second residual link addition on the input of the second-level sampling processing and the second up-sampled output and then uses the result of the second residual link addition as the output of the second-level sampling processing.
For example, in the image processing method provided by some embodiments of the present disclosure, the size of the output of the first up-sampling processing is the same as the size of the input of the first down-sampling processing, and the size of the output of the second up-sampling processing is the same as the size of the input of the second down-sampling processing.
For example, in the image processing method provided by some embodiments of the present disclosure, the multi-scale cyclic sampling processing further includes third-level sampling processing. The third-level sampling processing is nested between the second down-sampling processing and the second up-sampling processing; it receives the second down-sampled output as the input of the third-level sampling processing and provides the output of the third-level sampling processing as the input of the second up-sampling processing, so that the second up-sampling processing performs up-sampling based on the second down-sampled output. The third-level sampling processing includes third down-sampling processing, third up-sampling processing, and third residual link addition processing, wherein the third down-sampling processing performs down-sampling based on the input of the third-level sampling processing to obtain a third down-sampled output, the third up-sampling processing performs up-sampling based on the third down-sampled output to obtain a third up-sampled output, and the third residual link addition processing performs a third residual link addition on the input of the third-level sampling processing and the third up-sampled output and then uses the result of the third residual link addition as the output of the third-level sampling processing.
For example, in the image processing method provided by some embodiments of the present disclosure, the multi-scale cyclic sampling processing includes the second-level sampling processing executed multiple times in sequence; the first second-level sampling processing receives the first down-sampled output as its input, each second-level sampling processing other than the first receives the output of the previous second-level sampling processing as its input, and the output of the last second-level sampling processing serves as the input of the first up-sampling processing.
For example, in the image processing method provided by some embodiments of the present disclosure, the at least one multi-scale cyclic sampling processing includes the multi-scale cyclic sampling processing executed multiple times in sequence; the input of each multi-scale cyclic sampling processing serves as the input of the first-level sampling processing in that multi-scale cyclic sampling processing, and the output of the first-level sampling processing in each multi-scale cyclic sampling processing serves as the output of that multi-scale cyclic sampling processing; the first multi-scale cyclic sampling processing receives the first feature image as its input, each multi-scale cyclic sampling processing other than the first receives the output of the previous multi-scale cyclic sampling processing as its input, and the output of the last multi-scale cyclic sampling processing serves as the output of the at least one multi-scale cyclic sampling processing.
For example, in the image processing method provided by some embodiments of the present disclosure, the multi-scale cyclic sampling processing further includes: after the first down-sampling processing, the first up-sampling processing, the second down-sampling processing, and the second up-sampling processing, performing instance standardization processing or layer standardization processing on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively.
For example, the image processing method provided by some embodiments of the present disclosure further includes: using a first convolutional neural network to perform the multi-scale cyclic sampling processing, wherein the first convolutional neural network includes a first meta-network for performing the first-level sampling processing and a second meta-network for performing the second-level sampling processing.
For example, in the image processing method provided by some embodiments of the present disclosure, the first meta-network includes a first sub-network for performing the first down-sampling processing and a second sub-network for performing the first up-sampling processing; the second meta-network includes a third sub-network for performing the second down-sampling processing and a fourth sub-network for performing the second up-sampling processing.
For example, in the image processing method provided by some embodiments of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, and a dense network.
For example, in the image processing method provided by some embodiments of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes an instance standardization layer or a layer standardization layer; the instance standardization layer is used to perform instance standardization processing, and the layer standardization layer is used to perform layer standardization processing.
For example, the image processing method provided by some embodiments of the present disclosure further includes: acquiring an input image; using an analysis network to convert the input image into the first feature image; and using a synthesis network to convert the output of the at least one multi-scale cyclic sampling processing into an output image.
At least one embodiment of the present disclosure further provides a neural network training method, wherein the neural network includes an analysis network, a first sub-neural network, and a synthesis network; the analysis network processes an input image to obtain a first feature image, the first sub-neural network performs multi-scale cyclic sampling processing on the first feature image at least once to obtain a second feature image, and the synthesis network processes the second feature image to obtain an output image;
The training method includes: acquiring a training input image; processing the training input image by using the analysis network to provide a first training feature image; performing the at least one multi-scale cyclic sampling process on the first training feature image by using the first sub-neural network to obtain a second training feature image; processing the second training feature image by using the synthesis network to obtain a training output image; computing a loss value of the neural network through a loss function based on the training output image; and correcting the parameters of the neural network according to the loss value.
The multi-scale cyclic sampling process includes a nested first-level sampling process and second-level sampling process. The first-level sampling process includes a first down-sampling process, a first up-sampling process, and a first residual connection addition, where the first down-sampling process performs down-sampling on the input of the first-level sampling process to obtain a first down-sampled output, the first up-sampling process performs up-sampling based on the first down-sampled output to obtain a first up-sampled output, and the first residual connection addition adds the input of the first-level sampling process to the first up-sampled output, the result of which serves as the output of the first-level sampling process. The second-level sampling process is nested between the first down-sampling process and the first up-sampling process: it receives the first down-sampled output as its input and provides its output as the input of the first up-sampling process, so that the first up-sampling process performs its up-sampling based on the first down-sampled output. The second-level sampling process includes a second down-sampling process, a second up-sampling process, and a second residual connection addition, where the second down-sampling process performs down-sampling on the input of the second-level sampling process to obtain a second down-sampled output, the second up-sampling process performs up-sampling based on the second down-sampled output to obtain a second up-sampled output, and the second residual connection addition adds the input of the second-level sampling process to the second up-sampled output, the result of which serves as the output of the second-level sampling process.
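As an illustration only, and not the disclosed implementation, the nesting described above can be sketched in a few lines of Python. Here a fixed 2×2 average pooling stands in for the learned down-sampling sub-networks and nearest-neighbor duplication stands in for the learned up-sampling sub-networks; in the actual method each of these operations is a trained convolutional sub-network, while the residual connection additions are as described:

```python
def downsample(img):
    # 2x2 average pooling (assumes even dimensions): stand-in for a
    # learned down-sampling sub-network
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)] for r in range(0, h, 2)]

def upsample(img):
    # nearest-neighbor 2x duplication: stand-in for a learned
    # up-sampling sub-network
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def residual_add(a, b):
    # element-wise residual connection addition
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def second_level(x):
    # nested level: down-sample, up-sample, then add the residual connection
    return residual_add(x, upsample(downsample(x)))

def first_level(x):
    # outer level: the second-level process runs between the first
    # down-sampling and the first up-sampling
    down = downsample(x)          # first down-sampled output
    inner = second_level(down)    # nested second-level sampling process
    up = upsample(inner)          # first up-sampled output
    return residual_add(x, up)    # first residual connection addition
```

Because each up-sampling restores the size reduced by the matching down-sampling, the output of `first_level` has the same size as its input.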
For example, in the training method provided by some embodiments of the present disclosure, the size of the output of the first up-sampling process is the same as the size of the input of the first down-sampling process, and the size of the output of the second up-sampling process is the same as the size of the input of the second down-sampling process.
For example, in the training method provided by some embodiments of the present disclosure, the multi-scale cyclic sampling process further includes a third-level sampling process. The third-level sampling process is nested between the second down-sampling process and the second up-sampling process: it receives the second down-sampled output as its input and provides its output as the input of the second up-sampling process, so that the second up-sampling process performs its up-sampling based on the second down-sampled output. The third-level sampling process includes a third down-sampling process, a third up-sampling process, and a third residual connection addition, where the third down-sampling process performs down-sampling on the input of the third-level sampling process to obtain a third down-sampled output, the third up-sampling process performs up-sampling based on the third down-sampled output to obtain a third up-sampled output, and the third residual connection addition adds the input of the third-level sampling process to the third up-sampled output, the result of which serves as the output of the third-level sampling process.
For example, in the training method provided by some embodiments of the present disclosure, the multi-scale cyclic sampling process includes the second-level sampling process executed multiple times in sequence. The first execution of the second-level sampling process receives the first down-sampled output as its input; each execution other than the first receives the output of the previous execution as its input; and the output of the last execution serves as the input of the first up-sampling process.
For example, in the training method provided by some embodiments of the present disclosure, the at least one multi-scale cyclic sampling process includes the multi-scale cyclic sampling process executed multiple times in sequence, where the input of each execution serves as the input of the first-level sampling process in that execution, and the output of the first-level sampling process in each execution serves as the output of that execution. The first execution receives the first training feature image as its input; each execution other than the first receives the output of the previous execution as its input; and the output of the last execution serves as the output of the at least one multi-scale cyclic sampling process.
For example, in the image processing method provided by some embodiments of the present disclosure, the multi-scale cyclic sampling process further includes: after the first down-sampling process, the first up-sampling process, the second down-sampling process, and the second up-sampling process, performing instance normalization or layer normalization on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively.
For example, in the training method provided by an embodiment of the present disclosure, the first sub-neural network includes: a first meta-network for performing the first-level sampling process; and a second meta-network for performing the second-level sampling process.
For example, in the training method provided by an embodiment of the present disclosure, the first meta-network includes: a first sub-network for performing the first down-sampling process; and a second sub-network for performing the first up-sampling process; the second meta-network includes: a third sub-network for performing the second down-sampling process; and a fourth sub-network for performing the second up-sampling process.
For example, in the training method provided by an embodiment of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, and a dense network.
For example, in the training method provided by an embodiment of the present disclosure, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes an instance normalization layer or a layer normalization layer, where the instance normalization layer is configured to perform instance normalization on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively, and the layer normalization layer is configured to perform layer normalization on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively.
At least one embodiment of the present disclosure further provides an image processing device, including: a memory for non-transitory storage of computer-readable instructions; and a processor for running the computer-readable instructions, where the computer-readable instructions, when run by the processor, execute the image processing method provided by any embodiment of the present disclosure or the neural network training method provided by any embodiment of the present disclosure.
At least one embodiment of the present disclosure further provides a storage medium that non-transitorily stores computer-readable instructions, where the computer-readable instructions, when executed by a computer, can execute the image processing method provided by any embodiment of the present disclosure or the neural network training method provided by any embodiment of the present disclosure.
Description of the Drawings

In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and are not a limitation of the present disclosure.

Figure 1 is a schematic diagram of a convolutional neural network;

Figure 2A is a schematic structural diagram of a convolutional neural network;

Figure 2B is a schematic diagram of the working process of a convolutional neural network;

Figure 3 is a flowchart of an image processing method provided by an embodiment of the present disclosure;

Figure 4A is a schematic flow diagram of the multi-scale cyclic sampling process in the image processing method shown in Figure 3, provided by an embodiment of the present disclosure;

Figure 4B is a schematic flow diagram of the multi-scale cyclic sampling process in the image processing method shown in Figure 3, provided by another embodiment of the present disclosure;

Figure 4C is a schematic flow diagram of the multi-scale cyclic sampling process in the image processing method shown in Figure 3, provided by still another embodiment of the present disclosure;

Figure 4D is a schematic flow diagram of the multi-scale cyclic sampling process in the image processing method shown in Figure 3, provided by yet another embodiment of the present disclosure;

Figure 5 is a flowchart of an image processing method provided by another embodiment of the present disclosure;

Figure 6A is a schematic diagram of an input image;

Figure 6B is a schematic diagram of an output image obtained by processing the input image shown in Figure 6A according to an image processing method provided by an embodiment of the present disclosure;

Figure 7A is a schematic structural diagram of a neural network provided by an embodiment of the present disclosure;

Figure 7B is a flowchart of a neural network training method provided by an embodiment of the present disclosure;

Figure 7C is a schematic architectural block diagram of training the neural network shown in Figure 7A according to the training method shown in Figure 7B, provided by an embodiment of the present disclosure;

Figure 8 is a schematic block diagram of an image processing device provided by an embodiment of the present disclosure; and

Figure 9 is a schematic diagram of a storage medium provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure are described clearly and completely below in conjunction with the accompanying drawings of the embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present disclosure. Based on the described embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative labor fall within the protection scope of the present disclosure.
Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the ordinary meanings understood by persons of ordinary skill in the field to which the present disclosure belongs. The terms "first", "second", and the like used in the present disclosure do not indicate any order, quantity, or importance, but are only used to distinguish different components. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connect" or "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like are only used to indicate relative positional relationships; when the absolute position of a described object changes, the relative positional relationship may change accordingly.
The present disclosure is described below through several specific embodiments. To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and known components may be omitted. When any component of an embodiment of the present disclosure appears in more than one drawing, the component is denoted by the same or similar reference numeral in each drawing.
Image enhancement is one of the research hotspots in the field of image processing. Limitations imposed by various physical factors during image acquisition (for example, the small size of a mobile-phone camera's image sensor, along with other software and hardware limitations) and interference from environmental noise can greatly reduce image quality. The purpose of image enhancement is to improve the gray-scale histogram and the contrast of an image through image enhancement technology, thereby highlighting image details and improving the visual effect of the image.
Using deep neural networks for image enhancement is a technology that has emerged with the development of deep learning. For example, based on a convolutional neural network, low-quality photos (input images) taken by a mobile phone can be processed to obtain high-quality output images whose quality approaches that of photos taken by a digital single-lens reflex camera (often abbreviated as DSLR). For example, the peak signal-to-noise ratio (PSNR) is commonly used to characterize image quality; a higher PSNR value indicates that an image is closer to a real photo taken by a digital single-lens reflex camera.
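For reference, PSNR is defined as 10·log₁₀(MAX²/MSE), where MAX is the peak pixel value (255 for 8-bit images) and MSE is the mean squared error against the reference image. A minimal, generic implementation, not specific to this disclosure, is:

```python
import math

def psnr(img, ref, peak=255.0):
    # PSNR = 10 * log10(peak^2 / MSE); higher means closer to the reference
    diffs = [(a - b) ** 2 for ra, rb in zip(img, ref) for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

For example, an 8-bit image whose every pixel differs from the reference by 1 gray level has MSE = 1 and thus a PSNR of 10·log₁₀(255²) ≈ 48.1 dB.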
For example, Andrey Ignatov et al. proposed a method of implementing image enhancement with a convolutional neural network; see Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool, DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks, arXiv:1704.02470v2 [cs.CV], September 5, 2017, which is hereby incorporated by reference in its entirety as a part of this application. This method uses convolutional layers, batch normalization layers, and residual connections to construct a single-scale convolutional neural network, which can process an input low-quality image (for example, an image with low contrast, an under-exposed or over-exposed image, or an image that is overall too dark or too bright) into a higher-quality image. Using color loss, texture loss, and content loss as the loss functions during training achieves good processing results.
At least one embodiment of the present disclosure provides an image processing method, an image processing device, a neural network training method, and a storage medium. The image processing method proposes a multi-scale cyclic sampling method based on a convolutional neural network; by sampling repeatedly at multiple scales to obtain higher image fidelity, it can greatly improve the quality of the output image, and it is suitable for offline applications with high image-quality requirements, such as batch processing.
Initially, convolutional neural networks (CNNs) were mainly used to recognize two-dimensional shapes, for which they are highly invariant to translation, scaling, tilting, or other forms of deformation of the image. A CNN simplifies the complexity of the neural network model and reduces the number of weights mainly through local receptive fields and weight sharing. With the development of deep learning, the application scope of CNNs is no longer limited to image recognition; they can also be applied in fields such as face recognition, text recognition, animal classification, and image processing.
Figure 1 shows a schematic diagram of a convolutional neural network. For example, the convolutional neural network can be used for image processing; it uses images as input and output, and replaces scalar weights with convolution kernels. Figure 1 shows only a convolutional neural network with a three-layer structure, which is not a limitation of the embodiments of the present disclosure. As shown in Figure 1, the convolutional neural network includes an input layer 101, a hidden layer 102, and an output layer 103. The input layer 101 has four inputs, the hidden layer 102 has three outputs, and the output layer 103 has two outputs, so the convolutional neural network finally outputs two images.
For example, the four inputs of the input layer 101 may be four images, or four feature images of one image. The three outputs of the hidden layer 102 may be feature images of the image input through the input layer 101.
For example, as shown in Figure 1, each convolutional layer has weights $w_{ij}^{k}$ and biases $b_{i}$. The weights $w_{ij}^{k}$ represent convolution kernels, and the biases $b_{i}$ are scalars superimposed on the output of the convolutional layer, where k is a label of the input layer 101, and i and j are labels of the units of the input layer 101 and the units of the hidden layer 102, respectively. For example, the first convolutional layer 201 includes a first set of convolution kernels ($w_{ij}^{1}$ in Figure 1) and a first set of biases ($b_{i}^{1}$ in Figure 1). The second convolutional layer 202 includes a second set of convolution kernels ($w_{ij}^{2}$ in Figure 1) and a second set of biases ($b_{i}^{2}$ in Figure 1). Generally, each convolutional layer includes tens or hundreds of convolution kernels; if the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.
For example, as shown in Figure 1, the convolutional neural network further includes a first activation layer 203 and a second activation layer 204. The first activation layer 203 follows the first convolutional layer 201, and the second activation layer 204 follows the second convolutional layer 202. The activation layers (for example, the first activation layer 203 and the second activation layer 204) include activation functions, which introduce nonlinear factors into the convolutional neural network so that it can better solve relatively complex problems. The activation function may include a rectified linear unit (ReLU) function, a sigmoid function, or a hyperbolic tangent (tanh) function. The ReLU function is a non-saturating nonlinear function, while the sigmoid and tanh functions are saturating nonlinear functions. For example, an activation layer may stand alone as a layer of the convolutional neural network, or it may be included in a convolutional layer (for example, the first convolutional layer 201 may include the first activation layer 203, and the second convolutional layer 202 may include the second activation layer 204).
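The three activation functions named above are standard; minimal Python definitions are shown below to make the saturating versus non-saturating behavior concrete:

```python
import math

def relu(x):
    # non-saturating: unbounded for positive inputs, zero for negative ones
    return max(0.0, x)

def sigmoid(x):
    # saturating: output confined to the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # saturating: output confined to the open interval (-1, 1)
    return math.tanh(x)
```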
For example, in the first convolutional layer 201, first, several convolution kernels $w_{ij}^{1}$ of the first set of convolution kernels and several biases $b_{i}^{1}$ of the first set of biases are applied to each input to obtain the output of the first convolutional layer 201; then, the output of the first convolutional layer 201 can be processed by the first activation layer 203 to obtain the output of the first activation layer 203. In the second convolutional layer 202, first, several convolution kernels $w_{ij}^{2}$ of the second set of convolution kernels and several biases $b_{i}^{2}$ of the second set of biases are applied to the output of the first activation layer 203 to obtain the output of the second convolutional layer 202; then, the output of the second convolutional layer 202 can be processed by the second activation layer 204 to obtain the output of the second activation layer 204. For example, the output of the first convolutional layer 201 may be the result of applying the convolution kernels $w_{ij}^{1}$ to its input and then adding the biases $b_{i}^{1}$, and the output of the second convolutional layer 202 may be the result of applying the convolution kernels $w_{ij}^{2}$ to the output of the first activation layer 203 and then adding the biases $b_{i}^{2}$.
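The computation just described, applying a kernel to the input, adding the bias, and then passing the result through the activation layer, can be sketched for a single channel and a single kernel as follows (a toy illustration, not the networks of Figure 1):

```python
def conv2d_valid(img, kernel, bias):
    # valid cross-correlation of one channel with one kernel,
    # plus a scalar bias added to every output value
    kh, kw = len(kernel), len(kernel[0])
    oh = len(img) - kh + 1
    ow = len(img[0]) - kw + 1
    out = []
    for r in range(oh):
        row = []
        for c in range(ow):
            acc = bias
            for u in range(kh):
                for v in range(kw):
                    acc += img[r + u][c + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out

def conv_layer(img, kernel, bias):
    # convolution followed by the ReLU activation of the activation layer
    return [[max(0.0, v) for v in row] for row in conv2d_valid(img, kernel, bias)]
```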
Before a convolutional neural network is used for image processing, it needs to be trained. After training, the convolution kernels and biases of the convolutional neural network remain unchanged during image processing. During training, each convolution kernel and bias is adjusted through multiple sets of input/output example images and an optimization algorithm to obtain an optimized convolutional neural network model.
Figure 2A shows a schematic structural diagram of a convolutional neural network, and Figure 2B shows a schematic diagram of the working process of a convolutional neural network. For example, as shown in Figures 2A and 2B, after an input image is fed into the convolutional neural network through the input layer, it passes in turn through several processing stages (each level in Figure 2A) and then a category identification is output. The main components of a convolutional neural network may include multiple convolutional layers, multiple down-sampling layers, and a fully connected layer. In the present disclosure, it should be understood that each of these layers, such as the multiple convolutional layers, the multiple down-sampling layers, and the fully connected layer, refers to the corresponding processing operation, that is, convolution processing, down-sampling processing, fully connected processing, and so on; the described neural networks likewise refer to the corresponding processing operations, and the instance normalization layers or layer normalization layers described below are similar, so the explanation is not repeated here. For example, a complete convolutional neural network may be formed by stacking these three kinds of layers. For example, Figure 2A shows only three levels of a convolutional neural network, namely a first level, a second level, and a third level. For example, each level may include a convolution module and a down-sampling layer. For example, each convolution module may include a convolutional layer. Thus, the processing at each level may include convolution and sub-sampling/down-sampling of the input image. For example, according to actual needs, each convolution module may further include an instance normalization layer or a layer normalization layer, so that the processing at each level may further include instance normalization or layer normalization.
For example, the instance normalization layer is used to perform instance normalization processing on the feature images output by the convolutional layer, so that the gray values of the pixels of each feature image vary within a predetermined range, thereby simplifying the image generation process and improving the effect of image enhancement. For example, the predetermined range may be [-1, 1]. The instance normalization layer normalizes each feature image according to that feature image's own mean and variance. For example, the instance normalization layer can also be used to perform instance normalization processing on a single image.
For example, assume that the mini-batch size of mini-batch gradient descent is T, the number of feature images output by a certain convolutional layer is C, and each feature image is a matrix of H rows and W columns; the set of feature images is then represented as (T, C, H, W). The instance normalization formula of the instance normalization layer can accordingly be expressed as follows:
$$y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^{2} + \varepsilon_{1}}}, \qquad \mu_{ti} = \frac{1}{HW}\sum_{j=1}^{H}\sum_{k=1}^{W} x_{tijk}, \qquad \sigma_{ti}^{2} = \frac{1}{HW}\sum_{j=1}^{H}\sum_{k=1}^{W}\left(x_{tijk} - \mu_{ti}\right)^{2}$$
where x_tijk is the value at the t-th patch (feature block), the i-th feature image, the j-th row, and the k-th column in the set of feature images output by the convolutional layer; y_tijk denotes the result obtained after the instance normalization layer processes x_tijk; and ε_1 is a very small positive number, used to avoid a zero denominator.
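As an illustrative sketch only (not the claimed implementation), the per-feature-image normalization described above can be written in NumPy as follows; the function name and the value of `eps` are assumptions:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # Normalize each feature image (fixed t and i) by its own spatial
    # mean and variance, as in the instance normalization formula above.
    # x has shape (T, C, H, W); eps plays the role of the small constant
    # epsilon_1 that keeps the denominator non-zero.
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)
```

After this operation, every feature image has approximately zero mean and unit variance, so its pixel values fall into a common range.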
For example, the layer normalization layer is similar to the instance normalization layer, and is also used to perform layer normalization processing on the feature images output by the convolutional layer, so that the gray values of the pixels of each feature image vary within a predetermined range, thereby simplifying the image generation process and improving the effect of image enhancement. For example, the predetermined range may be [-1, 1]. Unlike the instance normalization layer, the layer normalization layer normalizes each column of a feature image according to that column's mean and variance, thereby realizing layer normalization of the feature image. For example, the layer normalization layer can also be used to perform layer normalization processing on a single image.
For example, still taking the above mini-batch gradient descent as an example, the set of feature images is represented as (T, C, H, W). The layer normalization formula of the layer normalization layer can then be expressed as follows:
$$y'_{tijk} = \frac{x_{tijk} - \mu_{tik}}{\sqrt{\sigma_{tik}^{2} + \varepsilon_{2}}}, \qquad \mu_{tik} = \frac{1}{H}\sum_{j=1}^{H} x_{tijk}, \qquad \sigma_{tik}^{2} = \frac{1}{H}\sum_{j=1}^{H}\left(x_{tijk} - \mu_{tik}\right)^{2}$$
where x_tijk is the value at the t-th patch (feature block), the i-th feature image, the j-th row, and the k-th column in the set of feature images output by the convolutional layer; y′_tijk denotes the result obtained after the layer normalization layer processes x_tijk; and ε_2 is a very small positive number, used to avoid a zero denominator.
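Following the per-column description above (statistics taken over the row index j for each column k of each feature image), a hedged NumPy sketch of this layer normalization could look like the following; the function name and `eps` value are assumptions:

```python
import numpy as np

def layer_norm_columns(x, eps=1e-5):
    # Normalize each column (index k) of each feature image by that
    # column's own mean and variance, i.e. statistics over the row
    # index j (axis 2). x has shape (T, C, H, W); eps stands in for
    # the small constant epsilon_2.
    mu = x.mean(axis=2, keepdims=True)
    var = x.var(axis=2, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)
```

Each column of the output then has approximately zero mean, matching the per-column normalization the text describes.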
The convolutional layer is the core layer of a convolutional neural network. In the convolutional layer of a convolutional neural network, a neuron is connected only to some of the neurons in the adjacent layer. The convolutional layer can apply several convolution kernels (also called filters) to the input image to extract multiple types of features from it. Each convolution kernel can extract one type of feature. A convolution kernel is generally initialized as a matrix of small random values; during the training of the convolutional neural network, the kernel learns to obtain reasonable weights. The result obtained by applying one convolution kernel to the input image is called a feature map (feature image), and the number of feature images equals the number of convolution kernels. Each feature image consists of neurons arranged in a rectangular grid; the neurons of the same feature image share weights, and the shared weights are exactly the convolution kernel. The feature images output by the convolutional layer of one level can be input to the convolutional layer of the adjacent next level and processed again to obtain new feature images. For example, as shown in FIG. 2A, the convolutional layer of the first level may output first-level feature images, which are input to the convolutional layer of the second level and processed again to obtain second-level feature images.
For example, as shown in FIG. 2B, the convolutional layer can use different convolution kernels to convolve the data of a local receptive field of the input image; the convolution result is fed into the activation layer, which computes according to the corresponding activation function to obtain the feature information of the input image.
For example, as shown in FIGS. 2A and 2B, a down-sampling layer is arranged between adjacent convolutional layers; the down-sampling layer is one form of down-sampling. On the one hand, the down-sampling layer can be used to reduce the scale of the input image, simplify the computational complexity, and reduce over-fitting to a certain extent; on the other hand, it can also perform feature compression to extract the main features of the input image. The down-sampling layer can reduce the size of the feature images, but does not change their number. For example, if an input image of size 12×12 is sampled with a 6×6 kernel, a 2×2 output image is obtained, which means that every 36 pixels of the input image are merged into 1 pixel of the output image. The last down-sampling layer or convolutional layer can be connected to one or more fully connected layers, which are used to connect all the extracted features. The output of a fully connected layer is a one-dimensional matrix, that is, a vector.
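The 12×12 to 2×2 size arithmetic above can be checked with a toy pooling operator; max pooling is used here only as one illustrative down-sampling method among those the disclosure lists:

```python
import numpy as np

def pool(img, k):
    # Non-overlapping k-by-k max pooling: every k*k block of the input
    # collapses into one output pixel, shrinking the feature image
    # without changing the number of feature images.
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).max(axis=(1, 3))

out = pool(np.arange(144.0).reshape(12, 12), 6)
# A 12x12 input pooled with a 6x6 window yields a 2x2 output,
# i.e. 36 input pixels merge into each output pixel.
```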
Hereinafter, some embodiments of the present disclosure and examples thereof will be described in detail with reference to the accompanying drawings.
FIG. 3 is a flowchart of an image processing method provided by an embodiment of the present disclosure. For example, as shown in FIG. 3, the image processing method includes:
Step S110: receiving a first feature image;
Step S120: performing at least one multi-scale cyclic sampling process on the first feature image.
For example, in step S110, the first feature image may include a feature image obtained after the input image is processed by one of a convolutional layer, a residual network, a dense network, etc. (see, e.g., FIG. 2B). For example, a residual network keeps a certain proportion of its input in its output by means of, e.g., residual-connection addition. For example, a dense network includes a bottleneck layer and a convolutional layer; in some examples, the bottleneck layer is used to reduce the dimensionality of the data so as to reduce the number of parameters in the subsequent convolution operation; for example, the convolution kernel of the bottleneck layer is a 1×1 kernel, and the convolution kernel of the convolutional layer is a 3×3 kernel; the present disclosure includes but is not limited to this. For example, the input image is processed by convolution, down-sampling, etc., to obtain the first feature image. It should be noted that this embodiment does not limit the manner in which the first feature image is obtained. For example, the first feature image may include multiple feature images, but is not limited thereto.
For example, the first feature image received in step S110 serves as the input of the multi-scale cyclic sampling process in step S120. For example, the multi-scale cyclic sampling process may take various forms, including but not limited to the three forms shown in FIGS. 4A-4C, which will be described below.
FIG. 4A is a schematic flow diagram of the multi-scale cyclic sampling process in the image processing method shown in FIG. 3, provided by an embodiment of the present disclosure. As shown in FIG. 4A, the multi-scale cyclic sampling process includes a nested first-level sampling process and second-level sampling process.
For example, as shown in FIG. 4A, the input of the multi-scale cyclic sampling process serves as the input of the first-level sampling process, and the output of the first-level sampling process serves as the output of the multi-scale cyclic sampling process. For example, the output of the multi-scale cyclic sampling process is called the second feature image; for example, the size of the second feature image (the numbers of rows and columns of the pixel array) may be the same as that of the first feature image.
For example, as shown in FIG. 4A, the first-level sampling process includes a first down-sampling process, a first up-sampling process, and a first residual-link addition process executed in sequence. The first down-sampling process performs down-sampling based on the input of the first-level sampling process to obtain a first down-sampled output; for example, the first down-sampling process can directly down-sample the input of the first-level sampling process to obtain the first down-sampled output. The first up-sampling process performs up-sampling based on the first down-sampled output to obtain a first up-sampled output; for example, the first down-sampled output first undergoes the second-level sampling process and is then up-sampled to obtain the first up-sampled output, that is, the first up-sampling process up-samples the first down-sampled output indirectly. The first residual-link addition process adds the input of the first-level sampling process and the first up-sampled output via a first residual link, and the result of the first residual-link addition then serves as the output of the first-level sampling process. For example, the size of the output of the first up-sampling process (i.e., the first up-sampled output) is the same as the size of the input of the first-level sampling process (i.e., the input of the first down-sampling process), so that after the first residual-link addition, the size of the output of the first-level sampling process is the same as the size of its input.
For example, as shown in FIG. 4A, the second-level sampling process is nested between the first down-sampling process and the first up-sampling process of the first-level sampling process; it receives the first down-sampled output as the input of the second-level sampling process and provides the output of the second-level sampling process as the input of the first up-sampling process, so that the first up-sampling process performs up-sampling based on the first down-sampled output.
For example, as shown in FIG. 4A, the second-level sampling process includes a second down-sampling process, a second up-sampling process, and a second residual-link addition process executed in sequence. The second down-sampling process performs down-sampling based on the input of the second-level sampling process to obtain a second down-sampled output; for example, the second down-sampling process can directly down-sample the input of the second-level sampling process to obtain the second down-sampled output. The second up-sampling process performs up-sampling based on the second down-sampled output to obtain a second up-sampled output; for example, the second up-sampling process can directly up-sample the second down-sampled output to obtain the second up-sampled output. The second residual-link addition process adds the input of the second-level sampling process and the second up-sampled output via a second residual link, and the result of the second residual-link addition then serves as the output of the second-level sampling process. For example, the size of the output of the second up-sampling process (i.e., the second up-sampled output) is the same as the size of the input of the second-level sampling process (i.e., the input of the second down-sampling process), so that after the second residual-link addition, the size of the output of the second-level sampling process is the same as the size of its input.
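The data flow of FIG. 4A can be sketched with simple stand-in sampling operators (2×2 average pooling for down-sampling, nearest-neighbour repetition for up-sampling); real embodiments use learned convolutional sub-networks instead, so this is only a structural illustration:

```python
import numpy as np

def down(x):
    # Stand-in 1/2 down-sampling (2x2 average pooling).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    # Stand-in 2x up-sampling (nearest-neighbour repetition).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def second_level(x):
    # Second-level sampling: down-sample, directly up-sample,
    # then residual-link addition with the level's own input.
    return x + up(down(x))

def first_level(x):
    # First-level sampling: the second-level process is nested between
    # the first down-sampling and the first up-sampling, so the first
    # up-sampling acts on the first down-sampled output indirectly.
    return x + up(second_level(down(x)))

y = first_level(np.random.RandomState(1).randn(8, 8))
```

Because each up-sampling restores the size reduced by the matching down-sampling, the output of each level has the same size as its input, as the residual-link addition requires.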
It should be noted that, in some embodiments of the present disclosure (not limited to this embodiment), the flows of the sampling processes at the different levels (for example, the first-level sampling process, the second-level sampling process, and the third-level sampling process to be introduced in the embodiment shown in FIG. 4B) are similar, each including a down-sampling process, an up-sampling process, and a residual-link addition process. In addition, taking feature images as an example, the residual-link addition process may include adding the values of corresponding rows and columns of the matrices of two feature images element by element, but is not limited to this.
In the present disclosure, "nested" means that one object includes another object similar or identical to it; such objects include, but are not limited to, flows or network structures.
It should be noted that, in some embodiments of the present disclosure, in the sampling process of each level, the size of the output of the up-sampling process (for example, the output of the up-sampling process is a feature image) is the same as the size of the input of the down-sampling process (for example, the input of the down-sampling process is a feature image), so that after the residual-link addition, the size of the output of the sampling process of each level is the same as the size of the input of the sampling process of that level (both of which may be feature images, for example).
It should be noted that, in some embodiments of the present disclosure, the multi-scale cyclic sampling process can be implemented by a convolutional neural network. For example, in some embodiments of the present disclosure, a first convolutional neural network can be used to perform the multi-scale cyclic sampling process. For example, in some examples, the first convolutional neural network may include a nested first meta-network and second meta-network; the first meta-network is used to perform the first-level sampling process, and the second meta-network is used to perform the second-level sampling process.
For example, in some examples, the first meta-network may include a first sub-network and a second sub-network; the first sub-network is used to perform the first down-sampling process, and the second sub-network is used to perform the first up-sampling process. The second meta-network is nested between the first sub-network and the second sub-network of the first meta-network. For example, in some examples, the second meta-network may include a third sub-network and a fourth sub-network; the third sub-network is used to perform the second down-sampling process, and the fourth sub-network is used to perform the second up-sampling process. For example, both the first meta-network and the second meta-network are similar in form to the aforementioned residual network.
For example, in some examples, each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, a dense network, etc. Specifically, the first sub-network and the third sub-network may include a convolutional layer with a down-sampling function (a down-sampling layer), or may include one of a residual network, a dense network, etc. with a down-sampling function; the second sub-network and the fourth sub-network may include a convolutional layer with an up-sampling function (an up-sampling layer), or may include one of a residual network, a dense network, etc. with an up-sampling function. It should be noted that the first sub-network and the third sub-network may have the same structure or different structures, and the second sub-network and the fourth sub-network may have the same structure or different structures; the embodiments of the present disclosure do not limit this.
Down-sampling is used to reduce the size of the feature images and thereby reduce their data volume; for example, the down-sampling process can be performed by a down-sampling layer, but is not limited to this. For example, the down-sampling layer can implement the down-sampling process using methods such as max pooling, average pooling, strided convolution, decimation (e.g., selecting fixed pixels), and demuxout (splitting the input image into multiple smaller images).
Up-sampling is used to increase the size of the feature images and thereby increase their data volume; for example, the up-sampling process can be performed by an up-sampling layer, but is not limited to this. For example, the up-sampling layer can implement the up-sampling process using methods such as strided transposed convolution and interpolation algorithms. Interpolation algorithms may include, for example, nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation.
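Of the interpolation methods listed above, nearest-neighbor interpolation is the simplest to illustrate; the following sketch (not part of the claimed embodiments) shows how it enlarges a feature image by an integer factor:

```python
import numpy as np

def nearest_upsample(x, factor):
    # Nearest-neighbour interpolation: each pixel is repeated `factor`
    # times along each axis, so an H x W image becomes (factor*H) x (factor*W).
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

b = nearest_upsample(np.array([[1.0, 2.0], [3.0, 4.0]]), 2)
# b is a 4x4 image whose top-left 2x2 block is all 1.0.
```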
It should be noted that, in some embodiments of the present disclosure, the down-sampling factor of the down-sampling process at a given level corresponds to the up-sampling factor of the up-sampling process at the same level, that is, when the down-sampling factor of the down-sampling process is 1/y, the up-sampling factor of the up-sampling process is y, where y is a positive integer, usually greater than or equal to 2. This ensures that the output of the up-sampling process and the input of the down-sampling process at the same level have the same size.
It should be noted that, in some embodiments of the present disclosure (not limited to this embodiment), the parameters of the down-sampling processes at different levels (i.e., the parameters of the networks corresponding to those down-sampling processes) may be the same or different; the parameters of the up-sampling processes at different levels (i.e., the parameters of the networks corresponding to those up-sampling processes) may be the same or different; and the parameters of the residual-link additions at different levels may be the same or different. The present disclosure does not limit this.
For example, in some embodiments of the present disclosure (not limited to this embodiment), in order to improve global features of the feature images such as brightness and contrast, the multi-scale cyclic sampling process may further include: after the first down-sampling process, the first up-sampling process, the second down-sampling process, and the second up-sampling process, performing instance normalization processing or layer normalization processing on the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output, respectively. It should be noted that the first down-sampled output, the first up-sampled output, the second down-sampled output, and the second up-sampled output may be processed with the same normalization method (instance normalization or layer normalization) or with different normalization methods; the present disclosure does not limit this.
Correspondingly, the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network each further include an instance normalization layer or a layer normalization layer; the instance normalization layer is used to perform instance normalization processing, and the layer normalization layer is used to perform layer normalization processing. For example, the instance normalization layer can perform instance normalization according to the aforementioned instance normalization formula, and the layer normalization layer can perform layer normalization according to the aforementioned layer normalization formula; the present disclosure does not limit this. It should be noted that the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network may include the same kind of normalization layer (instance normalization layer or layer normalization layer) or different kinds; the present disclosure does not limit this either.
FIG. 4B is a schematic flow diagram of the multi-scale cyclic sampling process in the image processing method shown in FIG. 3, provided by another embodiment of the present disclosure. As shown in FIG. 4B, on the basis of the multi-scale cyclic sampling process shown in FIG. 4A, this multi-scale cyclic sampling process further includes a third-level sampling process. It should be noted that the other flows of the multi-scale cyclic sampling process shown in FIG. 4B are basically the same as those of the multi-scale cyclic sampling process shown in FIG. 4A, and the repeated parts are not described here again.
For example, as shown in FIG. 4B, the third-level sampling process is nested between the second down-sampling process and the second up-sampling process of the second-level sampling process; it receives the second down-sampled output as the input of the third-level sampling process and provides the output of the third-level sampling process as the input of the second up-sampling process, so that the second up-sampling process performs up-sampling based on the second down-sampled output. It should be noted that, in this case, similar to the first up-sampling process indirectly up-sampling the first down-sampled output, the second up-sampling process also indirectly up-samples the second down-sampled output.
The third-level sampling process includes a third down-sampling process, a third up-sampling process, and a third residual-link addition process executed in sequence. The third down-sampling process performs down-sampling based on the input of the third-level sampling process to obtain a third down-sampled output; for example, the third down-sampling process can directly down-sample the input of the third-level sampling process to obtain the third down-sampled output. The third up-sampling process performs up-sampling based on the third down-sampled output to obtain a third up-sampled output; for example, the third up-sampling process can directly up-sample the third down-sampled output to obtain the third up-sampled output. The third residual-link addition process adds the input of the third-level sampling process and the third up-sampled output via a third residual link, and the result of the third residual-link addition then serves as the output of the third-level sampling process. For example, the size of the output of the third up-sampling process (i.e., the third up-sampled output) is the same as the size of the input of the third-level sampling process (i.e., the input of the third down-sampling process), so that after the third residual-link addition, the size of the output of the third-level sampling process is the same as the size of its input.
It should be noted that, for more details and implementations (i.e., network structures) of the third-level sampling process, reference may be made to the descriptions of the first-level sampling process and the second-level sampling process in the embodiment shown in FIG. 4A, which are not repeated in this disclosure.
It should be noted that, based on this embodiment, those skilled in the art should understand that the multi-scale cyclic sampling process may further include sampling processes at more levels, for example, a fourth-level sampling process nested in the third-level sampling process, a fifth-level sampling process nested in the fourth-level sampling process, and so on; the nesting manner is similar to that of the second-level and third-level sampling processes described above, and the present disclosure does not limit this.
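Because each deeper level repeats the same down-sample / nest / up-sample / residual-add pattern, the nesting to an arbitrary number of levels can be sketched recursively. This is a structural illustration only, again with stand-in pooling and repetition operators rather than the learned sub-networks of the embodiments:

```python
import numpy as np

def down(x):
    # Stand-in 1/2 down-sampling (2x2 average pooling).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    # Stand-in 2x up-sampling (nearest-neighbour repetition).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def sampling(x, depth):
    # One level of sampling; when depth > 1 the next-level sampling
    # process is nested between this level's down-sampling and
    # up-sampling, mirroring the second/third/fourth-level nesting
    # described above.
    inner = down(x)
    if depth > 1:
        inner = sampling(inner, depth - 1)
    return x + up(inner)

y = sampling(np.ones((16, 16)), 3)  # three nested levels
```

The input must be large enough to halve `depth` times; each level's output keeps the size of its input, so `y` matches the original 16×16 input.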
FIG. 4C is a schematic flow diagram of the multi-scale cyclic sampling process in the image processing method shown in FIG. 3, provided by still another embodiment of the present disclosure. As shown in FIG. 4C, on the basis of the multi-scale cyclic sampling process shown in FIG. 4A, this multi-scale cyclic sampling process includes a plurality of second-level sampling processes executed in sequence. It should be noted that the other flows of the multi-scale cyclic sampling process shown in FIG. 4C are basically the same as those of the multi-scale cyclic sampling process shown in FIG. 4A, and the repeated parts are not described here again. It should also be noted that the inclusion of two second-level sampling processes in FIG. 4C is exemplary; in the embodiments of the present disclosure, the multi-scale cyclic sampling process may include two or more second-level sampling processes executed in sequence. It should be noted that, in the embodiments of the present disclosure, the number of second-level sampling processes can be selected according to actual needs, which is not limited by the present disclosure. For example, in some examples, the inventors of the present application found that, compared with an image processing method having one or three second-level sampling processes, an image processing method having two second-level sampling processes achieves a better image enhancement effect; however, this should not be regarded as a limitation of the present disclosure.
For example, the first second-level sampling process receives the first down-sampling output as its input; each second-level sampling process other than the first receives the output of the previous second-level sampling process as its input; and the output of the last second-level sampling process serves as the input of the first up-sampling process.
It should be noted that, for more details and implementations of each second-level sampling process, reference may be made to the description of the second-level sampling process in the embodiment shown in FIG. 4A, which is not repeated here.
It should be noted that, in some embodiments of the present disclosure (not limited to this embodiment), the parameters of down-sampling processes of the same level in different orders may be the same or different; the parameters of up-sampling processes of the same level in different orders may be the same or different; and the parameters of residual-link additions of the same level in different orders may be the same or different. The present disclosure does not limit this.
It should be noted that, based on this embodiment, those skilled in the art should understand that, in the multi-scale cyclic sampling process, the first-level sampling process may nest multiple second-level sampling processes executed in sequence; further, at least some of the second-level sampling processes may nest one or more third-level sampling processes executed in sequence, and the numbers of third-level sampling processes nested in those second-level sampling processes may be the same or different; further, a third-level sampling process may nest a fourth-level sampling process in the same manner in which a second-level sampling process nests a third-level sampling process; and so on.
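The arbitrary-depth nesting described above can be expressed recursively: the level-k process down-samples, hands the result to the level-(k+1) process, up-samples, and applies a residual-link addition. The sketch below is only an illustration of this structure, not the disclosed network: the learned down-sampling and up-sampling layers are replaced by hypothetical 2×2 average-pooling and nearest-neighbor stand-ins.

```python
import numpy as np

def downsample(x):
    # Stand-in for a learned down-sampling layer: 2x2 average pooling.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # Stand-in for a learned up-sampling layer: nearest-neighbor by 2.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def nested_sampling(x, depth):
    # Level-k sampling nests level-(k+1) between its down-sampling and
    # up-sampling steps; the recursion expresses the "and so on" nesting.
    down = downsample(x)
    if depth > 1:
        down = nested_sampling(down, depth - 1)
    return x + upsample(down)  # residual-link addition preserves the size

feat = np.ones((16, 16))
out = nested_sampling(feat, depth=3)  # three nested sampling levels
assert out.shape == feat.shape
```

Because every level ends with a residual-link addition of its own input, the output size at every level equals the input size, which is what allows levels to nest to any depth.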
It should be noted that FIGS. 4A-4C show the case in which the image processing method provided by the embodiments of the present disclosure includes one multi-scale cyclic sampling process. In the image processing methods provided by the embodiments shown in FIGS. 4A-4C, the at least one multi-scale cyclic sampling process includes a single multi-scale cyclic sampling process. That process receives the first feature image as its input; the input of the multi-scale cyclic sampling process serves as the input of the first-level sampling process within it; the output of the first-level sampling process serves as the output of the multi-scale cyclic sampling process; and the output of the multi-scale cyclic sampling process serves as the output of the at least one multi-scale cyclic sampling process. The present disclosure includes but is not limited to this.
FIG. 4D is a schematic flowchart of the multi-scale cyclic sampling process in the image processing method shown in FIG. 3 according to still another embodiment of the present disclosure. As shown in FIG. 4D, in the image processing method provided by this embodiment, the at least one multi-scale cyclic sampling process includes multiple multi-scale cyclic sampling processes executed in sequence; for example, it may include two or three multi-scale cyclic sampling processes executed in sequence, but is not limited to this. It should be noted that, in the embodiments of the present disclosure, the number of multi-scale cyclic sampling processes can be selected according to actual needs, and the present disclosure does not limit this. For example, in some examples, the inventors of the present application found that an image processing method with two multi-scale cyclic sampling processes yields better image enhancement than one with one or three such processes, but this should not be regarded as a limitation of the present disclosure.
For example, the input of each multi-scale cyclic sampling process serves as the input of the first-level sampling process within that multi-scale cyclic sampling process, and the output of the first-level sampling process within each multi-scale cyclic sampling process serves as the output of that multi-scale cyclic sampling process.
For example, as shown in FIG. 4D, the first multi-scale cyclic sampling process receives the first feature image as its input; each multi-scale cyclic sampling process other than the first receives the output of the previous multi-scale cyclic sampling process as its input; and the output of the last multi-scale cyclic sampling process serves as the output of the at least one multi-scale cyclic sampling process.
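The sequential composition described above (each process consuming the previous process's output) amounts to folding an input through a list of processes. A minimal sketch, with hypothetical placeholder functions standing in for the learned, size-preserving multi-scale cyclic sampling processes:

```python
def run_in_sequence(first_feature_image, processes):
    # The first process receives the first feature image; every later
    # process receives the previous process's output; the output of the
    # last process is the output of the whole chain.
    out = first_feature_image
    for process in processes:
        out = process(out)
    return out

# Placeholder size-preserving processes (illustrative only).
p1 = lambda x: [v + 1 for v in x]
p2 = lambda x: [v * 2 for v in x]
result = run_in_sequence([1, 2, 3], [p1, p2])
assert result == [4, 6, 8]
```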
It should be noted that, for more details and implementations of each multi-scale cyclic sampling process, reference may be made to the descriptions of the multi-scale cyclic sampling process in the embodiments shown in FIGS. 4A-4D, which are not repeated here. It should also be noted that the implementations (i.e., network structures) and parameters of multi-scale cyclic sampling processes in different orders may be the same or different, and the present disclosure does not limit this.
FIG. 5 is a flowchart of an image processing method provided by another embodiment of the present disclosure. As shown in FIG. 5, the image processing method includes steps S210 to S250. It should be noted that steps S230 to S240 of the image processing method shown in FIG. 5 correspond to steps S110 to S120 of the image processing method shown in FIG. 3; that is, the image processing method shown in FIG. 5 includes the image processing method shown in FIG. 3. Therefore, for steps S230 to S240 of the image processing method shown in FIG. 5, reference may be made to the foregoing description of steps S110 to S120 of the image processing method shown in FIG. 3, and of course also to the methods of the embodiments shown in FIGS. 4A-4D. Steps S210 to S250 of the image processing method shown in FIG. 5 are described in detail below.
Step S210: Obtain an input image.
For example, in step S210, the input image may include a photo captured by the camera of a smartphone, the camera of a tablet computer, the camera of a personal computer, the lens of a digital camera, a surveillance camera, a web camera, or the like, and may include an image of a person, an image of animals or plants, a landscape image, or the like; the present disclosure does not limit this. For example, the quality of the input image is lower than that of a photo taken by a real digital single-lens reflex camera; that is, the input image is a low-quality image. For example, in some examples, the input image may be a 3-channel RGB image; in other examples, the input image may be a 3-channel YUV image. In the following, the input image is described as an RGB image by way of example, but the embodiments of the present disclosure are not limited to this.
Step S220: Use an analysis network to convert the input image into a first feature image.
For example, in step S220, the analysis network may be a convolutional neural network including one of a convolutional layer, a residual network, a dense network, or the like. For example, in some examples, the analysis network may convert the 3-channel RGB image (i.e., the input image) into multiple first feature images, for example 64 first feature images; the present disclosure includes but is not limited to this.
It should be noted that the embodiments of the present disclosure do not limit the structure and parameters of the analysis network, as long as it can convert the input image into the convolutional feature dimension (i.e., into the first feature image).
Step S230: Receive the first feature image.
Step S240: Perform at least one multi-scale cyclic sampling process on the first feature image.
It should be noted that, for steps S230 to S240, reference may be made to the foregoing description of steps S110 to S120, which is not repeated here.
Step S250: Use a synthesis network to convert the output of the at least one multi-scale cyclic sampling process into an output image.
For example, in step S250, the synthesis network may be a convolutional neural network including one of a convolutional layer, a residual network, a dense network, or the like. For example, the output of the at least one multi-scale cyclic sampling process may be referred to as the second feature image. For example, there may be multiple second feature images, but this is not limited. For example, in some examples, the synthesis network may convert the multiple second feature images into an output image; for example, the output image may be a 3-channel RGB image. The present disclosure includes but is not limited to this.
FIG. 6A is a schematic diagram of an input image, and FIG. 6B is a schematic diagram of an output image obtained by processing the input image shown in FIG. 6A with an image processing method (for example, the image processing method shown in FIG. 5) provided by an embodiment of the present disclosure.
For example, as shown in FIG. 6A and FIG. 6B, the output image retains the content of the input image, but the contrast is improved and the problem of the input image being too dark is alleviated; thus, compared with the input image, the quality of the output image can approach that of a photo taken by a real digital single-lens reflex camera, i.e., the output image is a high-quality image.
It should be noted that the embodiments of the present disclosure do not limit the structure and parameters of the synthesis network, as long as it can convert the convolutional feature dimension (i.e., the second feature image) into the output image.
The image processing method provided by the embodiments of the present disclosure can perform image enhancement on low-quality input images; by repeatedly sampling at multiple scales to obtain higher image fidelity, it can greatly improve the quality of the output image, and is suitable for offline applications with high image-quality requirements, such as batch processing. Specifically, the PSNR of images output by the image enhancement method proposed in the literature by Andrey Ignatov et al. is 20.08, while the PSNR of output images obtained by the image processing method provided by the embodiment shown in FIG. 4C of the present disclosure can reach 23.35; that is, the images obtained by the image processing method provided by the embodiments of the present disclosure can be closer to real photos taken by a digital single-lens reflex camera.
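PSNR, the metric quoted above, measures how close a processed image is to a reference image on a logarithmic decibel scale (higher is closer). The following is the standard definition for 8-bit images, not code from the disclosure:

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    # Peak signal-to-noise ratio in dB between two images of equal shape.
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((8, 8), 200.0)
noisy = ref + 10.0                 # a constant error of 10 gray levels
print(round(psnr(ref, noisy), 2))  # → 28.13
```

A gain from 20.08 dB to 23.35 dB corresponds to roughly halving the mean squared error against the reference photo.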
At least one embodiment of the present disclosure further provides a training method for a neural network. FIG. 7A is a schematic structural diagram of a neural network provided by an embodiment of the present disclosure, FIG. 7B is a flowchart of a neural network training method provided by an embodiment of the present disclosure, and FIG. 7C is a schematic block diagram of training the neural network shown in FIG. 7A according to the training method shown in FIG. 7B.
For example, as shown in FIG. 7A, the neural network 300 includes an analysis network 310, a first sub-neural network 320, and a synthesis network 330. For example, the analysis network 310 processes an input image to obtain a first feature image; the first sub-neural network 320 performs at least one multi-scale cyclic sampling process on the first feature image to obtain a second feature image; and the synthesis network 330 processes the second feature image to obtain an output image.
For example, for the structure of the analysis network 310, reference may be made to the description of the analysis network in the foregoing step S220, which is not limited in the present disclosure. For the structure of the first sub-neural network 320, reference may be made to the description of the implementation of the multi-scale cyclic sampling process in the foregoing step S120 (i.e., step S240); for example, the first sub-neural network may include but is not limited to the aforementioned first convolutional neural network, which is not limited in the present disclosure. For example, for the synthesis network 330, reference may be made to the description of the synthesis network in the foregoing step S250, which is not limited in the present disclosure.
For example, for the input image and the output image, reference may also be made to the descriptions of the input image and the output image in the image processing methods provided by the foregoing embodiments, which are not repeated here.
For example, as shown in FIG. 7B and FIG. 7C, the training method of the neural network includes steps S410 to S460.
Step S410: Obtain a training input image.
For example, similar to the input image in the foregoing step S210, the training input image may also include a photo captured by the camera of a smartphone, the camera of a tablet computer, the camera of a personal computer, the lens of a digital camera, a surveillance camera, a web camera, or the like, and may include an image of a person, an image of animals or plants, a landscape image, or the like; the present disclosure does not limit this. For example, the quality of the training input image is lower than that of a photo taken by a real digital single-lens reflex camera, i.e., the training input image is a low-quality image. For example, in some examples, the training input image may be a 3-channel RGB image.
Step S420: Use the analysis network to process the training input image to provide a first training feature image.
For example, similar to the analysis network in the foregoing step S220, the analysis network 310 may be a convolutional neural network including one of a convolutional layer, a residual network, a dense network, or the like. For example, in some examples, the analysis network may convert the 3-channel RGB image (i.e., the training input image) into multiple first training feature images, for example 64 first training feature images; the present disclosure includes but is not limited to this.
Step S430: Use the first sub-neural network to perform at least one multi-scale cyclic sampling process on the first training feature image to obtain a second training feature image.
For example, in step S430, the multi-scale cyclic sampling process may be implemented as the multi-scale cyclic sampling process in any of the embodiments shown in FIGS. 4A-4D, but is not limited to this. In the following, the case in which the multi-scale cyclic sampling process in step S430 is implemented as the multi-scale cyclic sampling process shown in FIG. 4A is taken as an example for description.
For example, as shown in FIG. 4A, the multi-scale cyclic sampling process includes nested first-level and second-level sampling processes.
For example, as shown in FIG. 4A, the input of the multi-scale cyclic sampling process (i.e., the first training feature image) serves as the input of the first-level sampling process, and the output of the first-level sampling process serves as the output of the multi-scale cyclic sampling process (i.e., the second training feature image). For example, the size of the second training feature image may be the same as that of the first training feature image.
For example, as shown in FIG. 4A, the first-level sampling process includes a first down-sampling process, a first up-sampling process, and a first residual-link addition that are executed in sequence. The first down-sampling process performs down-sampling based on the input of the first-level sampling process to obtain a first down-sampling output; for example, the first down-sampling process may directly down-sample the input of the first-level sampling process to obtain the first down-sampling output. The first up-sampling process performs up-sampling based on the first down-sampling output to obtain a first up-sampling output; for example, after the first down-sampling output passes through the second-level sampling process, up-sampling is performed to obtain the first up-sampling output, i.e., the first up-sampling process may indirectly up-sample the first down-sampling output. The first residual-link addition adds the input of the first-level sampling process and the first up-sampling output via a first residual link, and the result of the first residual-link addition serves as the output of the first-level sampling process. For example, the size of the output of the first up-sampling process (i.e., the first up-sampling output) is the same as that of the input of the first-level sampling process (i.e., the input of the first down-sampling process), so that after the first residual-link addition, the size of the output of the first-level sampling process is the same as that of its input.
For example, as shown in FIG. 4A, the second-level sampling process is nested between the first down-sampling process and the first up-sampling process of the first-level sampling process; it receives the first down-sampling output as its input and provides its output as the input of the first up-sampling process, so that the first up-sampling process performs up-sampling based on the first down-sampling output.
For example, as shown in FIG. 4A, the second-level sampling process includes a second down-sampling process, a second up-sampling process, and a second residual-link addition that are executed in sequence. The second down-sampling process performs down-sampling based on the input of the second-level sampling process to obtain a second down-sampling output; for example, the second down-sampling process may directly down-sample the input of the second-level sampling process to obtain the second down-sampling output. The second up-sampling process performs up-sampling based on the second down-sampling output to obtain a second up-sampling output; for example, the second up-sampling process may directly up-sample the second down-sampling output to obtain the second up-sampling output. The second residual-link addition adds the input of the second-level sampling process and the second up-sampling output via a second residual link, and the result of the second residual-link addition serves as the output of the second-level sampling process. For example, the size of the output of the second up-sampling process (i.e., the second up-sampling output) is the same as that of the input of the second-level sampling process (i.e., the input of the second down-sampling process), so that after the second residual-link addition, the size of the output of the second-level sampling process is the same as that of its input.
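The two-level structure of FIG. 4A can be sketched concretely. This is only an illustration of the data flow, assuming simple 2×2 average-pooling and nearest-neighbor stand-ins in place of the learned down-sampling and up-sampling layers:

```python
import numpy as np

def downsample(x):
    # Stand-in for a learned down-sampling layer: 2x2 average pooling.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # Stand-in for a learned up-sampling layer: nearest-neighbor by 2.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def second_level(x):
    # Second down-sampling, second up-sampling, second residual-link
    # addition; the output has the same size as the input.
    down2 = downsample(x)
    up2 = upsample(down2)
    return x + up2

def first_level(x):
    # First down-sampling, then the nested second-level process, then
    # first up-sampling and first residual-link addition.
    down1 = downsample(x)
    up1 = upsample(second_level(down1))
    return x + up1

feat = np.arange(64, dtype=np.float64).reshape(8, 8)
out = first_level(feat)
assert out.shape == feat.shape  # sizes are preserved at both levels
```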
For example, correspondingly, the first sub-neural network 320 may be implemented as the aforementioned first convolutional neural network. For example, the first sub-neural network 320 may include a nested first meta-network and second meta-network, where the first meta-network is used to execute the first-level sampling process and the second meta-network is used to execute the second-level sampling process.
For example, the first meta-network may include a first sub-network and a second sub-network, where the first sub-network is used to execute the first down-sampling process and the second sub-network is used to execute the first up-sampling process. The second meta-network is nested between the first sub-network and the second sub-network of the first meta-network. For example, the second meta-network may include a third sub-network and a fourth sub-network, where the third sub-network is used to execute the second down-sampling process and the fourth sub-network is used to execute the second up-sampling process.
For example, each of the first, second, third, and fourth sub-networks includes one of a convolutional layer, a residual network, a dense network, or the like. Specifically, the first and third sub-networks may include one of a convolutional layer with a down-sampling function (a down-sampling layer), a residual network, a dense network, or the like; the second and fourth sub-networks may include one of a convolutional layer with an up-sampling function (an up-sampling layer), a residual network, a dense network, or the like. It should be noted that the first and third sub-networks may have the same or different structures, and the second and fourth sub-networks may have the same or different structures; the present disclosure does not limit this.
For example, in the embodiments of the present disclosure, in order to improve global features of the feature images such as brightness and contrast, the multi-scale cyclic sampling process may further include: after the first down-sampling process, the first up-sampling process, the second down-sampling process, and the second up-sampling process, performing instance normalization or layer normalization on the first down-sampling output, the first up-sampling output, the second down-sampling output, and the second up-sampling output, respectively. It should be noted that the same normalization method (instance normalization or layer normalization) or different normalization methods may be applied to the first down-sampling output, the first up-sampling output, the second down-sampling output, and the second up-sampling output; the present disclosure does not limit this.
Correspondingly, each of the first, second, third, and fourth sub-networks further includes an instance normalization layer or a layer normalization layer, where the instance normalization layer is used to perform instance normalization and the layer normalization layer is used to perform layer normalization. For example, the instance normalization layer may perform instance normalization according to the aforementioned instance normalization formula, and the layer normalization layer may perform layer normalization according to the aforementioned layer normalization formula; the present disclosure does not limit this. It should be noted that the first, second, third, and fourth sub-networks may include the same normalization layer (instance normalization layer or layer normalization layer) or different normalization layers; the present disclosure does not limit this either.
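The difference between the two normalization variants is which axes the statistics are taken over. The sketch below uses the standard textbook definitions (per-channel spatial statistics for instance normalization, per-sample statistics over all channels and positions for layer normalization), without the learnable scale and shift that a trained normalization layer would typically add; the patent's own formulas appear earlier in the disclosure and are not reproduced here.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # Normalize each feature map of each sample independently over its
    # spatial dimensions (H, W); x has shape (N, C, H, W).
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Normalize each sample over all of its channels and spatial
    # positions (C, H, W) together.
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(size=(2, 3, 4, 4))
y = instance_norm(x)
# Each (sample, channel) map now has approximately zero mean.
assert np.allclose(y.mean(axis=(2, 3)), 0.0, atol=1e-6)
```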
It should be noted that, for more implementations and details of the multi-scale cyclic sampling process in step S430, reference may be made to the foregoing step S120 (i.e., step S240) and the descriptions of the multi-scale cyclic sampling process in the embodiments shown in FIGS. 4A-4D, which are not repeated here. It should also be noted that, when the multi-scale cyclic sampling process in step S430 is implemented in another form, the first sub-neural network 320 should be changed accordingly to implement that other form of multi-scale cyclic sampling process, which is not elaborated here.
For example, in step S430, there may be multiple second training feature images, but this is not limited.
Step S440: Use the synthesis network to process the second training feature image to obtain a training output image.
For example, similar to the synthesis network in the foregoing step S250, the synthesis network 330 may be a convolutional neural network including one of a convolutional layer, a residual network, a dense network, or the like. For example, in some examples, the synthesis network may convert the multiple second training feature images into a training output image; for example, the training output image may be a 3-channel RGB image. The present disclosure includes but is not limited to this.
Step S450: Based on the training output image, calculate a loss value of the neural network through a loss function.
For example, the parameters of the neural network 300 include the parameters of the analysis network 310, the parameters of the first sub-neural network 320, and the parameters of the synthesis network 330. For example, the initial parameters of the neural network 300 may be random numbers, for example random numbers conforming to a Gaussian distribution; the embodiments of the present disclosure do not limit this.
For example, for the loss function of this embodiment, reference may be made to the loss function in the literature provided by Andrey Ignatov et al. For example, similar to the loss function in that literature, the loss function may include a color loss function, a texture loss function, and a content loss function; correspondingly, for the specific process of calculating the loss value of the parameters of the neural network 300 through the loss function, reference may also be made to the description in that literature. It should be noted that the embodiments of the present disclosure do not limit the specific form of the loss function, which includes but is not limited to the form of the loss function in the above literature.
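A multi-term loss of the kind named above is typically a weighted sum of its components. The sketch below only shows that combination; the weights and the individual loss terms are placeholders, not the values or formulas used by Ignatov et al. or by this disclosure:

```python
def total_loss(color, texture, content,
               w_color=1.0, w_texture=0.4, w_content=1.0):
    # Weighted sum of the three loss terms; the weights here are
    # illustrative placeholders, chosen arbitrarily for the example.
    return w_color * color + w_texture * texture + w_content * content

# With all three terms equal to 1.0, the total is the sum of the weights.
assert abs(total_loss(1.0, 1.0, 1.0) - 2.4) < 1e-9
```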
Step S460: correct the parameters of the neural network according to the loss value.
For example, the training process of the neural network 300 may further include an optimization function (not shown in FIG. 7C). The optimization function may calculate error values of the parameters of the neural network 300 according to the loss value calculated by the loss function, and correct the parameters of the neural network 300 according to the error values. For example, the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (BGD) algorithm, or the like to calculate the error values of the parameters of the neural network 300.
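The parameter correction performed by such an optimization function reduces, in the SGD case, to moving each parameter against its gradient. A minimal sketch on a toy scalar objective (the real gradients would come from backpropagation through the network):

```python
def sgd_step(params, grads, lr=0.01):
    """One stochastic-gradient-descent correction: each parameter moves
    against its error gradient, scaled by the learning rate."""
    return [p - lr * g for p, g in zip(params, grads)]

# minimise f(p) = p^2 for a single scalar parameter; its gradient is 2p
p = [10.0]
for _ in range(100):
    p = sgd_step(p, [2.0 * p[0]], lr=0.1)
print(abs(p[0]) < 1e-3)  # True: the parameter converges toward the optimum
```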
For example, the training method of the neural network may further include: judging whether the training of the neural network satisfies a predetermined condition; if the predetermined condition is not satisfied, repeating the above training process (i.e., step S410 to step S460); if the predetermined condition is satisfied, stopping the above training process to obtain a trained neural network. For example, in one example, the predetermined condition is that the loss values corresponding to two (or more) consecutive training output images no longer decrease significantly. For example, in another example, the predetermined condition is that the number of training iterations or training epochs of the neural network reaches a predetermined number. The present disclosure does not limit this.
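Both stopping conditions can be combined in a simple loop: stop early when the loss has stopped decreasing significantly over consecutive outputs, otherwise stop at an iteration cap. The tolerance and patience values below are illustrative assumptions:

```python
def train(step_fn, max_iters=1000, tol=1e-4, patience=2):
    """Run step_fn (which performs one training step and returns the loss)
    until the loss has failed to decrease by more than `tol` for `patience`
    consecutive outputs, or until `max_iters` iterations are reached."""
    prev, stall = float("inf"), 0
    for i in range(max_iters):
        loss = step_fn()
        stall = stall + 1 if prev - loss <= tol else 0
        if stall >= patience:
            return i + 1, loss  # converged: loss no longer decreasing significantly
        prev = loss
    return max_iters, loss

# a toy loss that decays geometrically stands in for the real training step
losses = iter(0.5 ** k for k in range(1000))
iters, final = train(lambda: next(losses))
print(iters < 1000)  # True: training stopped well before the iteration cap
```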
For example, the training output image produced by the trained neural network 300 retains the content of the training input image, but the quality of the training output image can approach the quality of a photograph taken by a real digital single-lens reflex camera; that is, the training output image is a high-quality image.
It should be noted that the above-mentioned embodiments only schematically illustrate the training process of the neural network. Those skilled in the art should understand that, in the training phase, a large number of sample images need to be used to train the neural network; meanwhile, the training process for each sample image may include multiple iterations to correct the parameters of the neural network. For another example, the training phase may also include fine-tuning the parameters of the neural network to obtain more optimized parameters.
The neural network training method provided by the embodiments of the present disclosure can train the neural network used in the image processing method of the embodiments of the present disclosure. The neural network trained by this training method can perform image enhancement processing on low-quality input images; by repeatedly sampling at multiple scales to obtain higher image fidelity, the quality of the output image can be greatly improved, which is suitable for offline applications, such as batch processing, that have high requirements on image quality.
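The multi-scale cyclic sampling at the heart of this method can be sketched as follows: each level down-samples its input, recurses into the nested level, up-samples the result back to the input size, and adds the residual link; the whole pass may then be repeated several times. Average pooling and nearest-neighbour up-sampling stand in here for the learned sub-networks:

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling: a stand-in for a learned down-sampling sub-network
    return 0.25 * (x[::2, ::2] + x[1::2, ::2] + x[::2, 1::2] + x[1::2, 1::2])

def upsample(x):
    # nearest-neighbour doubling: a stand-in for a learned up-sampling sub-network
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def sample_level(x, depth):
    """One sampling level: down-sample, recurse into the nested level
    (if any), up-sample, then add the residual link from the input."""
    down = downsample(x)
    inner = sample_level(down, depth - 1) if depth > 1 else down
    return x + upsample(inner)  # residual link addition

def multi_scale_cyclic_sampling(x, levels=2, cycles=3):
    # the sampling processing may be executed multiple times in sequence
    for _ in range(cycles):
        x = sample_level(x, levels)
    return x

feat = np.arange(64, dtype=float).reshape(8, 8)
out = multi_scale_cyclic_sampling(feat)
print(out.shape)  # (8, 8): each level's output keeps its input's size
```

Because of the residual addition, each level's output has the same size as its input, matching the size constraint stated for the up-sampling and down-sampling pairs.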
At least one embodiment of the present disclosure further provides an image processing device. FIG. 8 is a schematic block diagram of an image processing device provided by an embodiment of the present disclosure. For example, as shown in FIG. 8, the image processing device 500 includes a memory 510 and a processor 520. For example, the memory 510 is used for non-transitory storage of computer-readable instructions, and the processor 520 is used to run the computer-readable instructions; when the computer-readable instructions are run by the processor 520, the image processing method provided by the embodiments of the present disclosure is executed.
For example, the memory 510 and the processor 520 may communicate with each other directly or indirectly. For example, components such as the memory 510 and the processor 520 may communicate through a network connection. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The network may include a local area network, the Internet, a telecommunication network, an Internet of Things based on the Internet and/or a telecommunication network, and/or any combination of the above networks, etc. The wired network may, for example, communicate by means of twisted pair, coaxial cable, or optical fiber transmission; the wireless network may, for example, use a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi. The present disclosure does not limit the type and function of the network.
For example, the processor 520 may control other components in the image processing device to perform desired functions. The processor 520 may be a device with data processing capability and/or program execution capability, such as a central processing unit (CPU), a tensor processing unit (TPU), or a graphics processing unit (GPU). The central processing unit (CPU) may be of an X86 or ARM architecture, etc. The GPU may be integrated directly on the motherboard alone, or built into the north bridge chip of the motherboard. The GPU may also be built into the central processing unit (CPU).
For example, the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), USB memory, flash memory, etc.
For example, one or more computer instructions may be stored in the memory 510, and the processor 520 may run the computer instructions to implement various functions. Various application programs and various data, such as training input images and various data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
For example, when some computer instructions stored in the memory 510 are executed by the processor 520, one or more steps of the image processing method described above may be performed. For another example, when other computer instructions stored in the memory 510 are executed by the processor 520, one or more steps of the neural network training method described above may be performed.
For example, for a detailed description of the processing procedure of the image processing method, reference may be made to the relevant description in the embodiments of the image processing method above; for a detailed description of the processing procedure of the neural network training method, reference may be made to the relevant description in the embodiments of the neural network training method above. Repeated parts will not be described again.
需要说明的是,本公开的上述实施例提供的图像处理装置是示例性的,而非限制性的,根据实际应用需要,该图像处理装置还可以包括其他常规部件或结构,例如,为实现图像处理装置的必要功能,本领域技术人员可以根据具体应用场景设置其他的常规部件或结构,本公开的实施例对此不作限制。It should be noted that the image processing device provided by the above-mentioned embodiments of the present disclosure is exemplary rather than restrictive. According to actual application requirements, the image processing device may also include other conventional components or structures, for example, to realize image processing. For necessary functions of the processing device, those skilled in the art can set other conventional components or structures according to specific application scenarios, which are not limited in the embodiments of the present disclosure.
本公开的上述实施例提供的图像处理装置的技术效果可以参考上述实施例中关于图像处理方法以及神经网络的训练方法的相应描述,在此不再赘述。For the technical effects of the image processing device provided in the foregoing embodiment of the present disclosure, reference may be made to the corresponding description of the image processing method and the neural network training method in the foregoing embodiment, which will not be repeated here.
本公开至少一实施例还提供一种存储介质。图9为本公开一实施例提供的一种存储介质的示意图。例如,如图9所示,该存储介质600非暂时性地存储计算机可读指令601,当非暂时性计算机可读指令601由计算机(包括处理器)执行时可以执行本公开任一实施例提供的图像处理方法的指令。At least one embodiment of the present disclosure also provides a storage medium. FIG. 9 is a schematic diagram of a storage medium provided by an embodiment of the disclosure. For example, as shown in FIG. 9, the storage medium 600 non-transitory stores computer-readable instructions 601. When the non-transitory computer-readable instructions 601 are executed by a computer (including a processor), any of the embodiments of the present disclosure can be executed. Instructions for the image processing method.
For example, one or more computer instructions may be stored on the storage medium 600. Some of the computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps of the image processing method described above. Other computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps of the neural network training method described above.
For example, the storage medium may include the storage component of a tablet computer, the hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), flash memory, any combination of the above storage media, or other suitable storage media.
For the technical effects of the storage medium provided by the embodiments of the present disclosure, reference may be made to the corresponding descriptions of the image processing method and the neural network training method in the above embodiments, which will not be repeated here.
For the present disclosure, the following points need to be explained:
(1) The drawings of the embodiments of the present disclosure only involve the structures related to the embodiments of the present disclosure; for other structures, reference may be made to common designs.
(2) Without conflict, the features in the same embodiment and in different embodiments of the present disclosure may be combined with each other.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present disclosure, which shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (19)

  1. An image processing method, comprising:
    receiving a first feature image; and
    performing multi-scale cyclic sampling processing on the first feature image at least once;
    wherein the multi-scale cyclic sampling processing includes nested first-level sampling processing and second-level sampling processing,
    the first-level sampling processing includes first down-sampling processing, first up-sampling processing, and first residual link addition processing, wherein the first down-sampling processing performs down-sampling processing based on the input of the first-level sampling processing to obtain a first down-sampling output, the first up-sampling processing performs up-sampling processing based on the first down-sampling output to obtain a first up-sampling output, and the first residual link addition processing performs a first residual link addition on the input of the first-level sampling processing and the first up-sampling output, and then takes the result of the first residual link addition as the output of the first-level sampling processing;
    the second-level sampling processing is nested between the first down-sampling processing and the first up-sampling processing, receives the first down-sampling output as the input of the second-level sampling processing, and provides the output of the second-level sampling processing as the input of the first up-sampling processing, so that the first up-sampling processing performs up-sampling processing based on the first down-sampling output; and
    the second-level sampling processing includes second down-sampling processing, second up-sampling processing, and second residual link addition processing, wherein the second down-sampling processing performs down-sampling processing based on the input of the second-level sampling processing to obtain a second down-sampling output, the second up-sampling processing performs up-sampling processing based on the second down-sampling output to obtain a second up-sampling output, and the second residual link addition processing performs a second residual link addition on the input of the second-level sampling processing and the second up-sampling output, and then takes the result of the second residual link addition as the output of the second-level sampling processing.
  2. The image processing method according to claim 1, wherein the size of the output of the first up-sampling processing is the same as the size of the input of the first down-sampling processing; and
    the size of the output of the second up-sampling processing is the same as the size of the input of the second down-sampling processing.
  3. The image processing method according to claim 1 or 2, wherein the multi-scale cyclic sampling processing further includes third-level sampling processing,
    the third-level sampling processing is nested between the second down-sampling processing and the second up-sampling processing, receives the second down-sampling output as the input of the third-level sampling processing, and provides the output of the third-level sampling processing as the input of the second up-sampling processing, so that the second up-sampling processing performs up-sampling processing based on the second down-sampling output; and
    the third-level sampling processing includes third down-sampling processing, third up-sampling processing, and third residual link addition processing, wherein the third down-sampling processing performs down-sampling processing based on the input of the third-level sampling processing to obtain a third down-sampling output, the third up-sampling processing performs up-sampling processing based on the third down-sampling output to obtain a third up-sampling output, and the third residual link addition processing performs a third residual link addition on the input of the third-level sampling processing and the third up-sampling output, and then takes the result of the third residual link addition as the output of the third-level sampling processing.
  4. The image processing method according to claim 1 or 2, wherein the multi-scale cyclic sampling processing includes the second-level sampling processing executed multiple times in sequence,
    the first second-level sampling processing receives the first down-sampling output as the input of the first second-level sampling processing,
    each second-level sampling processing other than the first second-level sampling processing receives the output of the previous second-level sampling processing as the input of the current second-level sampling processing, and
    the output of the last second-level sampling processing is taken as the input of the first up-sampling processing.
  5. The image processing method according to any one of claims 1-4, wherein the at least one multi-scale cyclic sampling processing includes the multi-scale cyclic sampling processing executed multiple times in sequence,
    the input of each multi-scale cyclic sampling processing is taken as the input of the first-level sampling processing in the current multi-scale cyclic sampling processing, and the output of the first-level sampling processing in each multi-scale cyclic sampling processing is taken as the output of the current multi-scale cyclic sampling processing;
    the first multi-scale cyclic sampling processing receives the first feature image as the input of the first multi-scale cyclic sampling processing,
    each multi-scale cyclic sampling processing other than the first multi-scale cyclic sampling processing receives the output of the previous multi-scale cyclic sampling processing as the input of the current multi-scale cyclic sampling processing, and
    the output of the last multi-scale cyclic sampling processing is taken as the output of the at least one multi-scale cyclic sampling processing.
  6. The image processing method according to any one of claims 1-5, wherein the multi-scale cyclic sampling processing further includes:
    after the first down-sampling processing, the first up-sampling processing, the second down-sampling processing, and the second up-sampling processing, performing instance normalization processing or layer normalization processing on the first down-sampling output, the first up-sampling output, the second down-sampling output, and the second up-sampling output, respectively.
  7. The image processing method according to any one of claims 1-6, further comprising: using a first convolutional neural network to perform the multi-scale cyclic sampling processing;
    wherein the first convolutional neural network includes:
    a first meta-network for performing the first-level sampling processing; and
    a second meta-network for performing the second-level sampling processing.
  8. The image processing method according to claim 7, wherein
    the first meta-network includes:
    a first sub-network for performing the first down-sampling processing; and
    a second sub-network for performing the first up-sampling processing; and
    the second meta-network includes:
    a third sub-network for performing the second down-sampling processing; and
    a fourth sub-network for performing the second up-sampling processing.
  9. The image processing method according to claim 8, wherein each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes one of a convolutional layer, a residual network, and a dense network.
  10. The image processing method according to claim 9, wherein each of the first sub-network, the second sub-network, the third sub-network, and the fourth sub-network includes an instance normalization layer or a layer normalization layer, and
    the instance normalization layer is used to perform instance normalization processing, and the layer normalization layer is used to perform layer normalization processing.
  11. The image processing method according to any one of claims 1-10, further comprising:
    acquiring an input image;
    using an analysis network to convert the input image into the first feature image; and
    using a synthesis network to convert the output of the at least one multi-scale cyclic sampling processing into an output image.
  12. A neural network training method, wherein
    the neural network includes an analysis network, a first sub-neural network, and a synthesis network,
    and the training method includes:
    acquiring a training input image;
    using the analysis network to process the training input image to provide a first training feature image;
    using the first sub-neural network to perform multi-scale cyclic sampling processing on the first training feature image at least once to obtain a second training feature image;
    using the synthesis network to process the second training feature image to obtain a training output image;
    based on the training output image, calculating a loss value of the neural network through a loss function; and
    correcting parameters of the neural network according to the loss value;
    wherein the multi-scale cyclic sampling processing includes nested first-level sampling processing and second-level sampling processing,
    the first-level sampling processing includes first down-sampling processing, first up-sampling processing, and first residual link addition processing, wherein the first down-sampling processing performs down-sampling processing based on the input of the first-level sampling processing to obtain a first down-sampling output, the first up-sampling processing performs up-sampling processing based on the first down-sampling output to obtain a first up-sampling output, and the first residual link addition processing performs a first residual link addition on the input of the first-level sampling processing and the first up-sampling output, and then takes the result of the first residual link addition as the output of the first-level sampling processing;
    the second-level sampling processing is nested between the first down-sampling processing and the first up-sampling processing, receives the first down-sampling output as the input of the second-level sampling processing, and provides the output of the second-level sampling processing as the input of the first up-sampling processing, so that the first up-sampling processing performs up-sampling processing based on the first down-sampling output; and
    the second-level sampling processing includes second down-sampling processing, second up-sampling processing, and second residual link addition processing, wherein the second down-sampling processing performs down-sampling processing based on the input of the second-level sampling processing to obtain a second down-sampling output, the second up-sampling processing performs up-sampling processing based on the second down-sampling output to obtain a second up-sampling output, and the second residual link addition processing performs a second residual link addition on the input of the second-level sampling processing and the second up-sampling output, and then takes the result of the second residual link addition as the output of the second-level sampling processing.
  13. The training method according to claim 12, wherein the size of the output of the first up-sampling processing is the same as the size of the input of the first down-sampling processing; and
    the size of the output of the second up-sampling processing is the same as the size of the input of the second down-sampling processing.
  14. The training method according to claim 12 or 13, wherein the multi-scale cyclic sampling processing further includes third-level sampling processing,
    the third-level sampling processing is nested between the second down-sampling processing and the second up-sampling processing, receives the second down-sampling output as the input of the third-level sampling processing, and provides the output of the third-level sampling processing as the input of the second up-sampling processing, so that the second up-sampling processing performs up-sampling processing based on the second down-sampling output; and
    the third-level sampling processing includes third down-sampling processing, third up-sampling processing, and third residual link addition processing, wherein the third down-sampling processing performs down-sampling processing based on the input of the third-level sampling processing to obtain a third down-sampling output, the third up-sampling processing performs up-sampling processing based on the third down-sampling output to obtain a third up-sampling output, and the third residual link addition processing performs a third residual link addition on the input of the third-level sampling processing and the third up-sampling output, and then takes the result of the third residual link addition as the output of the third-level sampling processing.
  15. The training method according to claim 12 or 13, wherein the multi-scale cyclic sampling processing includes the second-level sampling processing executed multiple times in sequence,
    the first second-level sampling processing receives the first down-sampling output as the input of the first second-level sampling processing,
    each second-level sampling processing other than the first second-level sampling processing receives the output of the previous second-level sampling processing as the input of the current second-level sampling processing, and
    the output of the last second-level sampling processing is taken as the input of the first up-sampling processing.
  16. The training method according to any one of claims 12-15, wherein the at least one multi-scale cyclic sampling processing includes the multi-scale cyclic sampling processing executed multiple times in sequence,
    the input of each multi-scale cyclic sampling processing is taken as the input of the first-level sampling processing in the current multi-scale cyclic sampling processing, and the output of the first-level sampling processing in each multi-scale cyclic sampling processing is taken as the output of the current multi-scale cyclic sampling processing;
    the first multi-scale cyclic sampling processing receives the first training feature image as the input of the first multi-scale cyclic sampling processing,
    each multi-scale cyclic sampling processing other than the first multi-scale cyclic sampling processing receives the output of the previous multi-scale cyclic sampling processing as the input of the current multi-scale cyclic sampling processing, and
    the output of the last multi-scale cyclic sampling processing is taken as the output of the at least one multi-scale cyclic sampling processing.
  17. The training method according to any one of claims 12-16, wherein the multi-scale cyclic sampling processing further includes:
    after the first down-sampling processing, the first up-sampling processing, the second down-sampling processing, and the second up-sampling processing, performing instance normalization processing or layer normalization processing on the first down-sampling output, the first up-sampling output, the second down-sampling output, and the second up-sampling output, respectively.
  18. An image processing device, comprising:
    a memory for non-transitory storage of computer-readable instructions; and
    a processor for running the computer-readable instructions, wherein the computer-readable instructions, when run by the processor, perform the image processing method according to any one of claims 1-11 or the neural network training method according to any one of claims 12-17.
  19. A storage medium that non-transitorily stores computer-readable instructions, wherein the computer-readable instructions, when executed by a computer, are capable of performing the image processing method according to any one of claims 1-11 or the neural network training method according to any one of claims 12-17.
PCT/CN2020/077763 2019-03-19 2020-03-04 Image processing method and device, neural network training method, and storage medium WO2020187029A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910209662.2A CN111724309B (en) 2019-03-19 2019-03-19 Image processing method and device, training method of neural network and storage medium
CN201910209662.2 2019-03-19

Publications (1)

Publication Number Publication Date
WO2020187029A1 true WO2020187029A1 (en) 2020-09-24

Family

ID=72519587

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/077763 WO2020187029A1 (en) 2019-03-19 2020-03-04 Image processing method and device, neural network training method, and storage medium

Country Status (2)

Country Link
CN (1) CN111724309B (en)
WO (1) WO2020187029A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114501012A * 2021-12-31 2022-05-13 Zhejiang Dahua Technology Co., Ltd. Image filtering, coding and decoding method and related equipment

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN113538287B (en) * 2021-07-29 2024-03-29 广州安思创信息技术有限公司 Video enhancement network training method, video enhancement method and related devices

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107767408A * 2017-11-09 2018-03-06 BOE Technology Group Co., Ltd. Image processing method, processing device and processing equipment
WO2018102748A1 * 2016-12-01 2018-06-07 Berkeley Lights, Inc. Automated detection and repositioning of micro-objects in microfluidic devices
CN109360151A * 2018-09-30 2019-02-19 BOE Technology Group Co., Ltd. Image processing method and system, resolution enhancement method, and readable storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10303977B2 (en) * 2016-06-28 2019-05-28 Conduent Business Services, Llc System and method for expanding and training convolutional neural networks for large size input images
CN107730474B * 2017-11-09 2022-02-22 BOE Technology Group Co., Ltd. Image processing method, processing device and processing equipment


Also Published As

Publication number Publication date
CN111724309A (en) 2020-09-29
CN111724309B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
WO2021073493A1 (en) Image processing method and device, neural network training method, image processing method of combined neural network model, construction method of combined neural network model, neural network processor and storage medium
WO2020239026A1 (en) Image processing method and device, method for training neural network, and storage medium
US11461639B2 (en) Image processing method, image processing device, and training method of neural network
WO2020200030A1 (en) Neural network training method, image processing method, image processing device, and storage medium
US11551333B2 (en) Image reconstruction method and device
US10678508B2 (en) Accelerated quantized multiply-and-add operations
WO2019091181A1 (en) Image processing method, processing apparatus and processing device
EP3923233A1 (en) Image denoising method and apparatus
CN111402130B (en) Data processing method and data processing device
WO2021018163A1 (en) Neural network search method and apparatus
CN111914997B (en) Method for training neural network, image processing method and device
CN112446834A (en) Image enhancement method and device
WO2022134971A1 (en) Noise reduction model training method and related apparatus
WO2020187029A1 (en) Image processing method and device, neural network training method, and storage medium
CN113011562A (en) Model training method and device
CN109754357B (en) Image processing method, processing device and processing equipment
CN113096023B (en) Training method, image processing method and device for neural network and storage medium
TW202133032A (en) Image normalization processing method, apparatus and storage medium
CN113076966B (en) Image processing method and device, training method of neural network and storage medium
WO2023029559A1 (en) Data processing method and apparatus
CN113256556A (en) Image selection method and device
WO2022183325A1 (en) Video block processing method and apparatus, neural network training method, and storage medium
US20240135490A1 (en) Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium
WO2023028866A1 (en) Image processing method and apparatus, and vehicle
CN111767979B (en) Training method, image processing method and image processing device for neural network

Legal Events

Code  Description
121   Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20773582; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
122   Ep: pct application non-entry in european phase (Ref document number: 20773582; Country of ref document: EP; Kind code of ref document: A1)
32PN  Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.02.2022))
