CN117934277A - Image super-resolution reconstruction method and system based on deep neural network - Google Patents

Image super-resolution reconstruction method and system based on deep neural network

Info

Publication number
CN117934277A
Authority
CN
China
Prior art keywords
image
module
residual
neural network
feature information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410089096.7A
Other languages
Chinese (zh)
Inventor
陈学宇
邓磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202410089096.7A
Publication of CN117934277A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The disclosure describes a deep neural network-based image super-resolution reconstruction method and system. The method includes inputting an image to be processed into a trained neural network model to obtain a target image. The neural network model includes a first extraction module, a first upsampling module, a second upsampling module and a residual connection module. The first extraction module includes a residual module and is configured to obtain deep feature information, the residual module including at least one depth separable convolution layer and being configured to obtain residual feature information. The first upsampling module performs channel fusion on the deep feature information to obtain fused feature information and performs pixel recombination on the fused feature information to obtain a recombined image. The second upsampling module performs interpolation upsampling on the image to be processed to obtain an upsampled image, and the residual connection module performs residual connection on the upsampled image and the recombined image to obtain the target image. According to the method and system, the demand of the neural network model on computing resources can be reduced.

Description

Image super-resolution reconstruction method and system based on deep neural network
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to an image super-resolution reconstruction method and system based on a deep neural network.
Background
With the continuous development of technology, image and video technologies are increasingly used in various fields. However, due to limitations of imaging devices, transmission channels and other factors, the acquired images or videos often have the problem of insufficient resolution, which results in unclear image details and poor visual effects. Therefore, an image super-resolution reconstruction method has been developed.
The image super-resolution reconstruction method refers to a method for generating a corresponding high-resolution image from a low-resolution image through algorithmic processing. In recent years, with the rapid development of deep learning technology, neural network models have been widely applied in the field of image super-resolution reconstruction. An image super-resolution reconstruction method based on a neural network learns a mapping between a large number of low-resolution and high-resolution image pairs, thereby realizing the conversion from a low-resolution image to a high-resolution image. Compared with traditional interpolation methods, a neural network model has stronger learning ability and adaptive ability, can better preserve image details, and improves the reconstruction effect.
However, existing neural-network-based image super-resolution reconstruction methods generally involve a large amount of computation, require substantial computing resources, and are difficult to deploy in devices with limited computing resources (such as embedded devices or FPGA (Field-Programmable Gate Array) devices). Therefore, how to reduce the demand of the neural network model on computing resources so as to realize efficient image super-resolution reconstruction is a problem to be solved.
Disclosure of Invention
The present disclosure has been made in view of the above-mentioned circumstances, and an object thereof is to provide a deep neural network-based image super-resolution reconstruction method and system capable of reducing the demand of a neural network model for computational resources.
To this end, a first aspect of the present disclosure provides a deep neural network-based image super-resolution reconstruction method, including: inputting an image to be processed into a trained neural network model to obtain a target image corresponding to the image to be processed, wherein the resolution of the target image is higher than that of the image to be processed; the neural network model comprises a first extraction module, a first upsampling module, a second upsampling module and a residual connection module; the first extraction module comprises a residual module and is configured to perform deep feature extraction on initial feature information related to the image to be processed through the residual module to obtain deep feature information; the residual module comprises at least one depth separable convolution layer and is configured to perform residual connection on the output of the last depth separable convolution layer in the at least one depth separable convolution layer and the input of the residual module to obtain residual feature information, the input of the first residual module being the initial feature information and the residual feature information output by the last residual module being used as the deep feature information; the first upsampling module is configured to perform channel fusion on the deep feature information to obtain fused feature information and to perform pixel recombination on the fused feature information to obtain a recombined image, the resolution of the recombined image being the same as that of the target image; the second upsampling module is configured to perform interpolation upsampling on the image to be processed to obtain an upsampled image, the resolution of the upsampled image being the same as that of the target image; and the residual connection module is configured to perform residual connection on the upsampled image and the recombined image to obtain the target image.
In the first aspect of the disclosure, the residual module comprises at least one depth separable convolution layer and is configured to perform residual connection on the output of the last depth separable convolution layer in the at least one depth separable convolution layer and the input of the residual module to obtain residual feature information, and the residual connection module is configured to perform residual connection on the upsampled image and the recombined image to obtain the target image. In this case, the residual module includes a depth separable convolution layer, and compared with a conventional convolution layer, the depth separable convolution layer performs convolution operations on individual channels of the feature information by decomposing the conventional convolution operation, so that the calculation amount can be reduced and the demand of the neural network model on computing resources can be reduced. In addition, the upsampled image obtained by interpolation upsampling of the image to be processed is residually connected with the recombined image, so that the color information of the image to be processed can be restored, more original information of the image to be processed can be retained, the information loss of the target image relative to the image to be processed can be reduced, and the accuracy of the target image can be improved. Furthermore, the residual module performs residual connection on the output of the last depth separable convolution layer and the input of the residual module to obtain the residual feature information, which helps the residual module retain more original information from its input, so that the accuracy of the target image can be further improved.
In addition, in the deep neural network-based image super-resolution reconstruction method according to the first aspect of the present disclosure, optionally, the neural network model further includes a second extraction module, and the second extraction module is configured to perform shallow feature extraction on the image to be processed to obtain the initial feature information, wherein the shallow feature information has a plurality of channels. In this case, shallow feature extraction is performed on the image to be processed first, so that the calculation amount of the first extraction module can be reduced compared with performing deep feature extraction directly on the image to be processed.
In addition, in the deep neural network-based image super-resolution reconstruction method according to the first aspect of the present disclosure, optionally, the initial feature information is feature information of a luminance channel image of the image to be processed. In this case, the feature information of the luminance channel image of the image to be processed is extracted, and the calculation amount can be further reduced compared with directly performing feature extraction on the multi-channel image.
In addition, in the deep neural network-based image super-resolution reconstruction method according to the first aspect of the present disclosure, optionally, there are a plurality of the residual modules and the plurality of residual modules are cascaded, wherein the input of each residual module other than the first residual module is the output of the preceding residual module. In this case, the plurality of residual modules in the first extraction module are cascaded, so that the feature extraction capability of the first extraction module can be improved and the accuracy of the deep feature information can be improved.
In addition, in the deep neural network-based image super-resolution reconstruction method according to the first aspect of the present disclosure, optionally, the residual module includes a plurality of the depth separable convolution layers and a first activation module, wherein the first activation module includes an activation function and is located between two of the depth separable convolution layers, and the activation function in the first activation module is used to convert an input value into a new value within a first preset interval, the first preset interval having an upper bound and a lower bound. In this case, the activation function lies between two depth separable convolution layers, and through the activation function, the next depth separable convolution layer can capture the nonlinear relationship in the output of the previous depth separable convolution layer, so that the modeling capability of the neural network model for complex relationships can be further improved, and the accuracy of the target image can be further improved. In addition, because the first preset interval has an upper bound and a lower bound, the new values can be kept in the same order of magnitude and different features can be compared and weighted more easily, so that the calculation efficiency of subsequent depth separable convolution layers can be improved and the demand of the neural network model on computing resources can be reduced.
In addition, in the depth neural network-based image super-resolution reconstruction method according to the first aspect of the present disclosure, optionally, the residual module further includes a second activation module, and the second activation module includes an activation function and is configured to take as input an output of a residual connection in the residual module to output the residual feature information. In this case, the output of the residual connection is input to the activation function to output residual feature information, so that the residual module can further capture the nonlinear relationship in the feature information, and the accuracy of the target image can be further improved.
In addition, in the image super-resolution reconstruction method based on a deep neural network related to the first aspect of the present disclosure, optionally, an activation function in the second activation module is used to convert an input numerical value into a new numerical value within a second preset interval, where the second preset interval has an upper bound and a lower bound; and/or an activation function in the depth separable convolutional layer is used to convert the input value to a new value within a third preset interval, the third preset interval having an upper bound and a lower bound. In this case, the input value is converted into a new value in the second preset interval or the input value is converted into a new value in the third preset interval, so that the new values are in the same order of magnitude, the calculation efficiency of the neural network model can be improved, and the demand of the neural network model for calculation resources can be reduced. In addition, when the input numerical value is converted into a new numerical value in the second preset interval and the input numerical value is converted into a new numerical value in the third preset interval, the new numerical values can be further in the same order of magnitude, the calculation efficiency of the neural network model can be further improved, and therefore the demand of the neural network model on calculation resources can be further reduced.
In addition, in the deep neural network-based image super-resolution reconstruction method according to the first aspect of the present disclosure, optionally, the neural network model further includes a truncation module and a quantization module; the truncation module is configured to truncate values in target information of the neural network model using a truncation value as the maximum value, wherein the truncation value is an integer power of 2 and the target information comprises at least one of feature information and network parameters; the quantization module is configured to represent the truncated values with fixed point numbers. In this case, the truncation and quantization of the values can improve the applicability of the neural network model to images of different data types. In addition, the truncation value being an integer power of 2 makes it easier for a computer to represent floating point numbers with fixed point numbers and helps reduce quantization error during subsequent quantization.
In addition, in the image super-resolution reconstruction method based on a deep neural network according to the first aspect of the present disclosure, optionally, the number of channels of the deep feature information is the same as the number of channels of the shallow feature information, and the number of channels of the fused feature information is smaller than the number of channels of the deep feature information. In this case, the number of channels of the fusion feature information is smaller than that of the deep feature information, and the calculation amount can be reduced when the fusion feature information is used for the subsequent pixel reorganization.
A second aspect of the present disclosure provides a deep neural network-based image super-resolution reconstruction system, including an acquisition module and a reconstruction module; the acquisition module is configured to acquire an image to be processed; the reconstruction module is configured to obtain a target image corresponding to the image to be processed using a trained neural network model, the resolution of the target image being higher than that of the image to be processed, wherein the neural network model comprises a first extraction module, a first upsampling module, a second upsampling module and a residual connection module; the first extraction module comprises a residual module and is configured to perform deep feature extraction on initial feature information related to the image to be processed through the residual module to obtain deep feature information; the residual module comprises at least one depth separable convolution layer and is configured to perform residual connection on the output of the last depth separable convolution layer in the at least one depth separable convolution layer and the input of the residual module to obtain residual feature information, the input of the first residual module being the initial feature information and the residual feature information output by the last residual module being used as the deep feature information; the first upsampling module is configured to perform channel fusion on the deep feature information to obtain fused feature information and to perform pixel recombination on the fused feature information to obtain a recombined image, the resolution of the recombined image being the same as that of the target image; the second upsampling module is configured to perform interpolation upsampling on the image to be processed to obtain an upsampled image, the resolution of the upsampled image being the same as that of the target image; and the residual connection module is configured to perform residual connection on the upsampled image and the recombined image to obtain the target image. In this case, the residual module includes a depth separable convolution layer, and compared with a conventional convolution layer, the depth separable convolution layer performs convolution operations on individual channels of the feature information by decomposing the conventional convolution operation, so that the calculation amount can be reduced and the demand of the neural network model for computing resources can be reduced. In addition, the upsampled image obtained by interpolation upsampling of the image to be processed is residually connected with the recombined image, so that the color information of the image to be processed can be restored, more original information can be retained, and the information loss of the target image can be reduced.
According to the deep neural network-based image super-resolution reconstruction method and system of the present disclosure, the demand of the neural network model on computing resources can be reduced.
Drawings
The present disclosure will now be explained in further detail by way of example only with reference to the accompanying drawings.
Fig. 1 is a schematic diagram showing an application scenario of a reconstruction method according to an example of the present disclosure.
Fig. 2 is a block diagram showing the structure of embodiment 1 of the neural network model according to the example of the present disclosure.
Fig. 3 is a schematic diagram illustrating obtaining a target image using a neural network model according to an example of the present disclosure.
Fig. 4 is a schematic diagram showing residual modules in a neural network model to which examples of the present disclosure relate.
Fig. 5 is a schematic diagram illustrating a depth separable convolutional layer in a neural network model to which examples of the present disclosure relate.
Fig. 6 is a block diagram showing the structure of embodiment 2 of the neural network model according to the example of the present disclosure.
Fig. 7 is a schematic diagram illustrating truncation and quantization in a neural network model according to an example of the present disclosure.
Fig. 8 is a flowchart illustrating reconstruction of an image to be processed using a neural network model to obtain a target image, which is related to an example of the present disclosure.
Fig. 9 is a block diagram illustrating a construction of a reconstruction system according to an example of the present disclosure.
Detailed Description
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the following description, the same members are denoted by the same reference numerals, and overlapping description thereof is omitted. In addition, the drawings are schematic, and the ratio of the sizes of the components to each other, the shapes of the components, and the like may be different from actual ones.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in this disclosure mean that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The present disclosure provides an image super-resolution reconstruction method (hereinafter may be simply referred to as a reconstruction method or method) based on a deep neural network, which may obtain a high-resolution image corresponding to a low-resolution image, that is, perform super-resolution image reconstruction. In the deep neural network-based image super-resolution reconstruction method according to the present disclosure, depth separable convolution layers and residual connections are used for image super-resolution reconstruction, so that the calculation amount can be reduced and the demand of the neural network model on computing resources can be reduced.
The image super-resolution reconstruction system based on the deep neural network can be called as a reconstruction system or a system for short.
The feature information to which the present disclosure relates refers to various information extracted from an image to be processed for describing, identifying, and analyzing the content of the image. In some examples, the feature information may include at least one of color features, texture features, shape features, spatial relationship features, and semantic features. In some examples, the feature information may be represented in the form of tensors, vectors, matrices, or the like. In some examples, the characteristic information may be the image itself to be processed.
Hereinafter, a reconstruction method according to the present disclosure will be described in detail with reference to the accompanying drawings.
Fig. 1 is a schematic diagram showing an application scenario of a reconstruction method according to an example of the present disclosure.
Referring to fig. 1, in some examples, the reconstruction methods referred to by the examples of this disclosure may be applied to a computing device 100. The computing device 100 may be used to perform a reconstruction method, to which examples of the present disclosure relate, that may input the image to be processed 10 to a trained neural network model 200 (described later) to obtain a target image 20 corresponding to the image to be processed 10. In some examples, the resolution of the target image 20 may be higher than the resolution of the image 10 to be processed. That is, the image to be processed 10 may be a low resolution image, and the target image 20 may be a high resolution image. For example, the resolution of the image to be processed 10 may be 1024×1024, and the resolution of the target image 20 may be 2048×2048. In some examples, the image to be processed 10 may be a frame of image in a video. In some examples, the image to be processed 10 may be from any electronic device.
In some examples, the neural network model 200 may be trained in advance. In some examples, the neural network model 200 may be trained using a training data set. In some examples, a DIV2K dataset may be used as the training dataset. The DIV2K dataset is a training dataset that is commonly used.
In some examples, the images in the training dataset may be flipped or rotated. In some examples, the images in the training dataset may be bicubic-downsampled to obtain a low-resolution image as the image to be processed 10, with the corresponding original image as the target image 20, and the neural network model 200 is trained with the image to be processed 10 and the corresponding target image 20.
In some examples, an Adam (Adaptive Moment Estimation) optimizer may be used to optimize the neural network model 200. In some examples, the loss function of the neural network model 200 may be an L1 (least absolute deviation) loss function.
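Purely as an illustration, a minimal PyTorch sketch of this training setup (Adam optimizer with an L1 loss) might look as follows; the stand-in model, learning rate, patch size, batch size and step count are assumptions made for the sketch and are not specified by the present disclosure.

import torch
import torch.nn as nn

# Stand-in network; the actual neural network model 200 is described in the sections below.
model = nn.Conv2d(1, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed learning rate
criterion = nn.L1Loss()                                    # least-absolute-deviation (L1) loss

# Dummy training pairs standing in for bicubic-downsampled DIV2K patches.
low_res = torch.rand(8, 1, 48, 48)
high_res = torch.rand(8, 1, 48, 48)  # same spatial size only because the stand-in model does not upscale

for step in range(10):               # illustrative number of steps
    optimizer.zero_grad()
    loss = criterion(model(low_res), high_res)
    loss.backward()
    optimizer.step()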
As described above, the reconstruction method to which the present examples relate may input the image to be processed 10 to the trained neural network model 200 to obtain the target image 20 corresponding to the image to be processed 10.
Fig. 2 is a block diagram showing the structure of embodiment 1 of a neural network model 200 according to an example of the present disclosure. Fig. 3 is a schematic diagram illustrating obtaining a target image 20 using a neural network model 200 in accordance with an example of the present disclosure.
In some examples, referring to fig. 2 and 3, the neural network model 200 may include a first extraction module 210, a first upsampling module 220, a second upsampling module 230, and a residual connection module 240.
In some examples, the first extraction module 210 may be configured to obtain deep feature information related to the image 10 to be processed. The first upsampling module 220 may be configured to obtain a reconstructed image based on the deep feature information. The second upsampling module 230 may be configured to upsample the image to be processed 10 to obtain an upsampled image. The residual connection module 240 may be configured to residual connect the upsampled image with the reconstructed image to obtain the target image 20.
In some examples, the first extraction module 210 may include a residual module 211. In some examples, the first extraction module 210 may be configured to perform deep feature extraction on the initial feature information related to the image to be processed 10 by the residual module 211 to obtain deep feature information.
In some examples, the initial feature information may represent shallow feature information of the image 10 to be processed. The deep feature information may represent information obtained by further processing the initial feature information, and may express richer features of the image 10 to be processed.
In some examples, the initial characteristic information may be the image 10 itself to be processed. In some examples, the initial feature information may be feature information obtained by feature extraction of the image to be processed 10.
Referring to fig. 3, in some examples, the neural network model 200 may further include a second extraction module 250. The second extraction module 250 may be configured to perform shallow feature extraction on the image 10 to be processed to obtain initial feature information. In this case, the shallow feature extraction is performed on the image to be processed 10 first, and the calculation amount of the first extraction module 210 can be reduced compared to the deep feature extraction performed on the image to be processed 10 directly.
In some examples, the second extraction module 250 may include at least one convolution layer. In some examples, the convolutional layer may be a conventional convolutional layer. In some examples, the convolution layer may be a depth separable convolution layer 212 (described later). In some examples, the second extraction module 250 may include a plurality of convolutional layers. For example, the number of convolution layers may be 1, 2, 3, 4, 5, or 10, etc. In some examples, multiple convolutional layers may be cascaded. In this case, the feature extraction capability of the second extraction module 250 can be improved, so that the accuracy of the target image 20 can be improved.
In some examples, the number of channels of the shallow feature information may be multiple. For example, the number of channels of the shallow feature information may be 1, 2, 3, 4, 5, 10, 16, or the like. In this case, the information amount of the shallow feature information can be increased, and the accuracy of the target image 20 can be advantageously increased by subsequently processing the multi-channel shallow feature information.
In some examples, the initial characteristic information may be characteristic information of a luminance channel image of the image to be processed 10. In some examples, the image to be processed 10 may be a multi-channel image. The luminance channel image may be an image of only one channel containing luminance information in the image to be processed 10. In this case, the feature information of the luminance channel image of the image to be processed 10 is extracted, and the calculation amount can be further reduced than the feature extraction of the multi-channel image directly.
The luminance channel image includes more texture information of the image 10 to be processed than the images of the other channels. In general, the human eye is more sensitive to brightness variations, and brightness channel images are more likely to affect the perceived quality of the target image 20 relative to images of other channels. In this case, the feature information of the luminance channel image of the image to be processed 10 is extracted, and the perceived quality of the target image 20 can be made better while the amount of calculation is reduced.
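As a sketch only, the following shows one way to obtain single-channel initial feature information from a multi-channel image to be processed, assuming PyTorch: the luminance channel is computed with BT.601 luma weights (the disclosure does not fix a particular color transform), and a single 3x3 convolution producing 16 channels stands in for the second extraction module 250; both choices are assumptions made for the example.

import torch
import torch.nn as nn

def luminance_channel(rgb: torch.Tensor) -> torch.Tensor:
    # rgb: (N, 3, H, W) tensor with values in [0, 1].
    # BT.601 luma weights are used here only as an example of isolating a Y channel.
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    return 0.299 * r + 0.587 * g + 0.114 * b

# Shallow feature extraction standing in for the second extraction module 250:
# a 3x3 convolution mapping the 1-channel luminance image to 16 channels.
shallow_extractor = nn.Conv2d(1, 16, kernel_size=3, padding=1)

image = torch.rand(1, 3, 64, 64)                                 # dummy image to be processed
initial_features = shallow_extractor(luminance_channel(image))   # (1, 16, 64, 64)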
As described above, the first extraction module 210 may include the residual module 211. The residual module 211 may be used to perform deep feature extraction on the initial feature information to obtain deep feature information.
In some examples, the number of residual modules 211 may be multiple. For example, the number of residual modules 211 may be 1, 2, 3, 4, 5, 10, or the like. In some examples, the plurality of residual modules 211 may be cascaded, wherein the input of each residual module 211 other than the first one of the plurality of residual modules 211 may be the output of the preceding residual module 211. In this case, the plurality of residual modules 211 in the first extraction module 210 are cascaded, so that the feature extraction capability of the first extraction module 210 can be improved, which can be advantageous for improving the accuracy of the deep feature information.
In some examples, the input of the first residual module 211 may be the initial feature information. The residual feature information output by the last residual module 211 may be used as the deep feature information. In some examples, when the number of residual modules 211 is 1, the input of that single residual module 211 may be the initial feature information, and the residual feature information output by that residual module 211 may be regarded as the deep feature information.
Fig. 4 is a schematic diagram illustrating a residual module 211 in a neural network model 200 according to an example of the present disclosure.
Referring to fig. 4, in some examples, the residual module 211 may include at least one depth separable convolution layer 212. The depth separable convolution layer 212 may perform convolution operations on individual channels of the feature information separately. In this case, the residual module 211 includes the depth separable convolution layer 212, and compared with a conventional convolution layer, the depth separable convolution layer 212 can perform convolution operations on single channels of the feature information by decomposing the conventional convolution operation, so that the calculation amount can be reduced and the demand of the neural network model 200 for computing resources can be reduced.
In some examples, residual module 211 may be configured to residual connect an output of a last depth-separable convolutional layer 212 of the at least one depth-separable convolutional layer 212 with an input of residual module 211 to obtain residual characteristic information. In this case, it can be advantageous for the residual module 211 to retain more original information in the input, so that the accuracy of the target image 20 can be further improved.
In some examples, the output of the depth-separable convolutional layer 212 may refer to characteristic information of the output of the depth-separable convolutional layer 212. The input of the residual module 211 may refer to the characteristic information input by the residual module 211. In some examples, residual connection may refer to adding the values of corresponding locations in the feature information separately.
In some examples, the number of depth separable convolution layers 212 in the residual module 211 may be multiple. For example, the number of depth separable convolution layers 212 may be 1, 2, 3, 4, 5, or 10, etc. In some examples, the residual module 211 may include a first activation module 213. In some examples, the residual module 211 may include a plurality of depth separable convolution layers 212 together with the first activation module 213.
In some examples, the first activation module 213 may include an activation function and may be between two depth separable convolutional layers 212. In this case, the activation function is between the two depth-separable convolutional layers 212, and by the activation function, the next depth-separable convolutional layer 212 can capture the nonlinear relationship in the output of the previous depth-separable convolutional layer 212, and the modeling capability of the neural network model 200 on the complex relationship can be further improved, so that the accuracy of the target image 20 can be further improved.
In some examples, the activation function in the first activation module 213 may be used to convert the entered value to a new value within the first preset interval. The first preset interval may have an upper bound and a lower bound. In this case, the first preset interval has an upper bound and a lower bound, so that new values can be in the same order of magnitude, and different features can be compared and weighted more easily, so that the calculation efficiency of the subsequent depth separable convolution layer 212 can be improved, and the requirement of the neural network model 200 on calculation resources can be reduced.
In some examples, the upper bound of the first preset interval may be no greater than 1 and the lower bound of the first preset interval may be no less than-1. In some examples, the activation function in the first activation module 213 may be a saturation activation function. For example, the activation function in the first activation module 213 may be a hard hyperbolic tangent function.
In some examples, the activation function in the neural network model 200 may be a saturation activation function. A saturation activation function may refer to an activation function whose value range has an upper bound and a lower bound. In some examples, the saturation activation function may include a hard hyperbolic tangent function, a hyperbolic tangent function, an arccotangent function, a Sigmoid (S-shaped) function, a clip-ReLU (clipped Rectified Linear Unit) function, and the like. Preferably, the activation function may be a hard hyperbolic tangent function. The hard hyperbolic tangent function may map values into the interval of -1 to 1. That is, the hard hyperbolic tangent function may nonlinearly truncate values to the range of -1 to 1. The hard hyperbolic tangent function may enable the neural network model 200, through learning during training, to distribute values as much as possible within an order of magnitude close to the interval of -1 to 1.
In this case, the hard hyperbolic tangent function is used and, unlike the Sigmoid function, involves no exponential calculation, so that the calculation amount can be reduced. In addition, the hard hyperbolic tangent function maps values into the interval of -1 to 1, so that the distribution of values can be limited to an interval with higher floating point precision (namely the interval of -1 to 1), and the precision loss caused by quantization can be reduced during subsequent quantization.
In some examples, the hard hyperbolic tangent function may satisfy the following formula:
HardTanh(x)=max(min(1,x),-1),
Wherein HardTanh may represent a hard hyperbolic tangent function, x may represent a numerical value, max may represent a maximum function, and min may represent a minimum function.
In some examples, the clip-ReLU function may satisfy the following formula:
clip-ReLU(x)=max(min(x,max_value),min_value),
Wherein clip-ReLU may represent a clip-ReLU function, max_value may represent an upper bound, and min_value may represent a lower bound. The upper and lower bounds may be preset.
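Purely as an illustration, assuming PyTorch, both saturating activations defined by the formulas above can be written with a single clamp; torch.nn.Hardtanh provides the same mapping as the first function in module form.

import torch

def hard_tanh(x: torch.Tensor) -> torch.Tensor:
    # HardTanh(x) = max(min(1, x), -1)
    return torch.clamp(x, min=-1.0, max=1.0)

def clip_relu(x: torch.Tensor, min_value: float, max_value: float) -> torch.Tensor:
    # clip-ReLU(x) = max(min(x, max_value), min_value), with preset bounds
    return torch.clamp(x, min=min_value, max=max_value)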
In some examples, residual module 211 may also include a second activation module 214. In some examples, the second activation module 214 may include an activation function and may be configured to take as input the output of the residual connection in the residual module 211 to output residual characteristic information. In this case, the residual module 211 can be further caused to capture the nonlinear relationship in the feature information, so that the accuracy of the target image 20 can be further improved.
In some examples, the activation function in the second activation module 214 may be used to convert the entered value to a new value within a second preset interval. The second preset interval may have an upper bound and a lower bound. In this case, the new values can be made to be in the same order of magnitude, and the calculation efficiency of the neural network model 200 can be improved, so that the demand for calculation resources by the neural network model 200 can be reduced.
In some examples, the upper bound of the second preset interval may be no greater than 1 and the lower bound of the second preset interval may be no less than-1. In some examples, the activation function in the second activation module 214 may be a saturation activation function. For example, the activation function in the second activation module 214 may be a hard hyperbolic tangent function.
Fig. 5 is a schematic diagram illustrating a depth separable convolutional layer 212 in a neural network model 200 in accordance with examples of the present disclosure.
Referring to fig. 5, in some examples, the depth separable convolution layer 212 may include at least one depthwise convolution layer 2121 and a pointwise convolution layer 2122. The at least one depthwise convolution layer 2121 may be configured to convolve the feature information input to the depth separable convolution layer 212 to obtain first intermediate feature information. The pointwise convolution layer 2122 may be configured to perform pointwise convolution on the first intermediate feature information to obtain second intermediate feature information.
In some examples, depth separable convolutional layer 212 may include an activation function. In some examples, the second intermediate feature information may be input to an activation function to obtain feature information output by the depth separable convolutional layer 212. In this case, the depth separable convolution layer 212 can capture the nonlinear relationship in the input, thereby enabling an improvement in the accuracy of the target image 20.
In some examples, the activation function 2123 in the depth separable convolutional layer 212 may be used to convert the input value to a new value within a third preset interval. The third preset interval may have an upper bound and a lower bound. In this case, the third preset interval has an upper bound and a lower bound, which can make the new values in the same order of magnitude, so that the calculation efficiency of the subsequent depth separable convolution layer 212 can be improved, and further, the requirement of the neural network model 200 on the calculation resources can be reduced.
In some examples, the upper bound of the third preset interval may be no greater than 1 and the lower bound of the third preset interval may be no less than -1. In some examples, the activation function 2123 in the depth separable convolution layer 212 may be a saturation activation function. For example, the activation function 2123 may be a hard hyperbolic tangent function.
In some examples, the activation function in the second activation module 214 may be used to convert the input value to a new value within a second preset interval, which may have an upper bound and a lower bound, and the activation function 2123 in the depth separable convolution layer 212 may be used to convert the input value to a new value within a third preset interval, which may have an upper bound and a lower bound. In this case, the new values can be further made to be in the same order of magnitude, and the calculation efficiency of the neural network model 200 can be further improved, so that the demand for calculation resources by the neural network model 200 can be further reduced.
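The following sketch, assuming PyTorch, puts the pieces described above together: a depth separable convolution layer 212 built from a depthwise convolution 2121, a pointwise convolution 2122 and a hard hyperbolic tangent activation 2123, and a residual module 211 that residually connects the output of its last depth separable convolution layer with its own input before a second activation. The channel count, kernel size and number of layers are example values, not values fixed by the disclosure.

import torch
import torch.nn as nn

class DepthSeparableConv(nn.Module):
    # Depth separable convolution layer 212: depthwise conv 2121 + pointwise conv 2122 + activation 2123.
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.Hardtanh()  # saturating activation bounded to [-1, 1]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.pointwise(self.depthwise(x)))

class ResidualModule(nn.Module):
    # Residual module 211: two depth separable convolution layers with a first activation
    # module between them, a residual connection to the module input, and a second
    # activation module applied to the sum.
    def __init__(self, channels: int = 16):
        super().__init__()
        self.conv1 = DepthSeparableConv(channels)
        self.act1 = nn.Hardtanh()   # first activation module 213
        self.conv2 = DepthSeparableConv(channels)
        self.act2 = nn.Hardtanh()   # second activation module 214

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv2(self.act1(self.conv1(x)))
        return self.act2(out + x)   # residual connection with the module input

features = torch.rand(1, 16, 64, 64)          # dummy 16-channel initial feature information
residual_features = ResidualModule(16)(features)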
In some examples, the first upsampling module 220 may be configured to channel fuse the deep feature information to obtain fused feature information. Channel fusion may refer to convolving deep feature information to obtain fused feature information having fewer channels than deep feature information.
In some examples, the first upsampling module 220 may include at least one convolutional layer. In some examples, the first upsampling module 220 may convolve the deep feature information with a convolution layer to achieve channel fusion. In some examples, the convolutional layer may be a conventional convolutional layer. In some examples, the convolution layer may be a depth separable convolution layer 212.
In some examples, the number of channels of deep feature information may be the same as the number of channels of shallow feature information. For example, the number of channels of the shallow feature information may be 16, and the number of channels of the deep feature information may be 16. This can improve the calculation efficiency.
In some examples, the number of channels of the fused feature information may be less than the number of channels of the deep feature information. For example, the number of channels of the deep feature information may be 16, and the number of channels of the fused feature information may be 4. In this case, the calculation amount can be reduced when the pixel reorganization is performed using the fusion characteristic information later.
In some examples, the first upsampling module 220 may perform pixel rebinning on the fused feature information to obtain a rebinned image. Pixel reorganization may refer to the rearrangement and combination of values (i.e., pixel points) in the fused feature information. In some examples, the resolution of the reconstructed image may be the same as the resolution of the target image 20. In this case, when the resolution of the up-sampled image and the target image 20 is the same, the subsequent residual connection of the up-sampled image and the reorganized image can be facilitated to obtain the target image 20.
In some examples, the number of channels of the reconstructed image may be less than the number of channels of the fused feature information. For example, the number of channels of the fusion feature information may be 4, and the number of channels of the reorganized image may be 1.
In some examples, the resolution of the reconstructed image may be the same as the resolution of the target image 20. In some examples, the size of the target image 20 may be larger than the size of the image 10 to be processed. In other words, the resolution of the target image 20 may be greater than the resolution of the image 10 to be processed. In some examples, the size of the reorganized image may be greater than the size of the image 10 to be processed. For example, the size of the image to be processed 10 is 1024×1024, the size of the recombined image may be 2048×2048, and the size of the target image 20 may be 2048×2048.
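As a sketch under the example channel counts mentioned above (16 deep-feature channels fused to 4 channels for a 2x scale factor), and assuming PyTorch, channel fusion can be a 1x1 convolution and pixel recombination corresponds to a pixel-shuffle operation; these are illustrative choices rather than a definitive implementation of the first upsampling module 220.

import torch
import torch.nn as nn

scale = 2
channel_fusion = nn.Conv2d(16, 1 * scale * scale, kernel_size=1)  # 16 -> 4 channels
pixel_shuffle = nn.PixelShuffle(scale)  # rearranges (N, 4, H, W) into (N, 1, 2H, 2W)

deep_features = torch.rand(1, 16, 64, 64)   # dummy deep feature information (1024x1024 in the text example)
fused = channel_fusion(deep_features)       # fused feature information with 4 channels
recombined_image = pixel_shuffle(fused)     # (1, 1, 128, 128) recombined image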
In some examples, the second upsampling module 230 may be configured to interpolate up-sample the image to be processed 10 to obtain an up-sampled image. Interpolation upsampling is a method of upsampling without using a convolution layer. In some examples, when the image to be processed 10 is a multi-channel image (e.g., a three-channel image), the images of the multiple channels of the image to be processed 10 may be upsampled. That is, the upsampled image may be a multi-channel image.
In some examples, the upsampled image may be the same resolution as the target image 20. For example, the size of the image to be processed 10 is 1024×1024, the size of the recombined image may be 2048×2048, the size of the up-sampled image may be 2048×2048, and the size of the target image 20 may be 2048×2048. In this case, when the resolution of the reconstructed image and the target image 20 is the same, the subsequent residual connection of the upsampled image and the reconstructed image can be facilitated to obtain the target image 20.
In some examples, the method of interpolation up-sampling may include any one of nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation. In this case, the image to be processed 10 can be interpolated up-sampled, and the amount of calculation can be reduced as compared with up-sampling using a convolution layer.
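For illustration, assuming PyTorch, interpolation upsampling of a three-channel image to be processed can be written with a single call; bicubic interpolation is shown here, and nearest-neighbor or bilinear modes would follow the same pattern.

import torch
import torch.nn.functional as F

image = torch.rand(1, 3, 1024, 1024)  # dummy three-channel image to be processed
upsampled = F.interpolate(image, scale_factor=2, mode="bicubic", align_corners=False)
# upsampled: (1, 3, 2048, 2048), obtained without any convolution layer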
In some examples, residual connection module 240 may be configured to residual connect the upsampled image with the reconstructed image to obtain target image 20. Residual connection may refer to adding up the values of the up-sampled image and the corresponding positions of the reconstructed image. In this case, the color information of the image to be processed 10 can be restored, and further, it is also advantageous to retain more original information of the image to be processed 10, so that the information loss of the target image 20 relative to the image to be processed 10 can be reduced, and the accuracy of the target image 20 can be improved.
In some examples, when the initial feature information is the feature information of the luminance channel image of the image to be processed 10, the reconstructed image obtained by processing the initial feature information may represent the feature information of the luminance channel image of the image to be processed 10.
In some examples, when the upsampled image is a multi-channel image, in the process of residually connecting the upsampled image with the recombined image, the recombined image is residually connected with the luminance channel image of the upsampled image. Specifically, in the residual connection, the values at corresponding positions of the recombined image and the luminance channel image may be added, and the channel images other than the luminance channel image in the upsampled image may be kept unchanged.
In addition, the target image 20 may be the output of the residual connection. In some examples, the target image 20 may be a multi-channel image. In some examples, the number of channels of the target image 20 may be the same as the number of channels of the image 10 to be processed. In this case, when the initial feature information is the feature information of the luminance channel image of the image to be processed 10, the calculation amount can be reduced. In addition, residual connection is performed to obtain a multi-channel target image 20, so that color information of the image 10 to be processed can be restored.
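A minimal sketch of this residual connection, assuming PyTorch and assuming (only for the example) that the luminance channel is stored as channel 0 of the upsampled image, might look as follows.

import torch

def residual_connect(upsampled: torch.Tensor, recombined_y: torch.Tensor) -> torch.Tensor:
    # upsampled:    (N, C, H, W) multi-channel upsampled image; channel 0 assumed to be luminance.
    # recombined_y: (N, 1, H, W) recombined image for the luminance channel.
    target = upsampled.clone()
    target[:, 0:1] = target[:, 0:1] + recombined_y  # add only on the luminance channel
    return target                                   # remaining channels are kept unchanged

upsampled = torch.rand(1, 3, 128, 128)
recombined_y = torch.rand(1, 1, 128, 128)
target_image = residual_connect(upsampled, recombined_y)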
In some examples, the neural network model 200 may not include an attention module. In this case, since an attention module generally involves a relatively large amount of computation, omitting the attention module from the neural network model 200 can reduce the calculation amount and thus the demand of the neural network model 200 for computing resources, which can be advantageous for deploying the neural network model 200 in devices with limited computing resources (for example, embedded devices or FPGA (Field-Programmable Gate Array) devices). In addition, the memory requirements of the neural network model 200 can be reduced. In the examples of the present disclosure, by obtaining the residual feature information through the residual connection in the residual module 211 and by residually connecting the upsampled image, obtained by interpolation upsampling of the image to be processed 10, with the recombined image, the accuracy of the target image 20 can be improved without relying on an attention module.
In some examples, the neural network model 200 may also include a truncation module 260 and a quantization module 270. In this case, the numeric value can be converted into a fixed point number by the truncation module 260 and the quantization module 270, so that the applicability of the neural network model 200 to images of different data types can be improved.
Fig. 6 is a block diagram showing the configuration of embodiment 2 of the neural network model 200 according to the example of the present disclosure.
Referring to fig. 6, in some examples, the neural network model 200 may also include a truncation module 260 and a quantization module 270. In this case, the values can be truncated and quantized by the truncation module 260 and the quantization module 270, and the applicability of the neural network model 200 to images of different data types can be improved.
In some examples, the truncation module 260 may be configured to truncate the values in the target information of the neural network model 200 using a truncation value as the maximum value. In some examples, the truncation value may be an integer power of 2. The target information may include at least one of feature information and network parameters. The network parameters may include weight parameters, the size of the convolution kernel, and the like.
In some examples, in the neural network model 200, a learnable initial maximum parameter may be set as the initial maximum value for the truncation and quantization operations. During training, this initial maximum parameter may be adjusted to a suitable value based on the numerical distribution of the feature information. In some examples, when quantizing the weight parameters of a convolution layer, the maximum value of the weight parameters of that layer may be directly selected as the initial maximum value for the quantization operation.
In some examples, the integer power of 2 closest to the trained initial maximum value may be chosen as the final maximum value, i.e., the truncation value. In this case, the truncation value is an integer power of 2, which makes it easier for a computer to represent floating point numbers with fixed point numbers and helps reduce quantization error during subsequent quantization.
In some examples, the truncation value may satisfy the following formula:
a_c = 2^round(log2(a)),
wherein a_c may represent the truncation value and a may represent the initial maximum value.
In some examples, the truncation may satisfy the following formula:
x_c = max(min(x, a_c), -a_c),
wherein x may represent a value in the target information and x_c may represent the value after truncation.
In some examples, quantization module 270 may be configured to represent the truncated value with a fixed point number. In this case, the truncation and quantization of the values can improve the applicability of the neural network model 200 to images of different data types.
In a computer, floating point numbers are typically composed of a sign bit, an exponent, and a mantissa. In the interval -1 to 1, the corresponding exponent is typically small because the values are close to 0, which allows the mantissa portion to provide relatively high precision: the more mantissa digits there are, the more digits can be represented and the higher the precision.
In some examples, the value may be translated into a new value within a preset interval. The preset interval may have an upper bound and a lower bound. In some examples, the upper bound of the preset interval may be no greater than 1 and the lower bound of the preset interval may be no less than-1.
In some examples, the activation function in the neural network model 200 may be a saturation activation function. In some examples, the activation function may be a hard hyperbolic tangent function. The hard hyperbolic tangent function may map values into the interval of -1 to 1. That is, the hard hyperbolic tangent function may nonlinearly truncate values to the range of -1 to 1. The hard hyperbolic tangent function may enable the neural network model 200, through learning during training, to distribute values as much as possible within an order of magnitude close to the interval of -1 to 1.
In this case, the hard hyperbolic tangent function is used and, unlike the Sigmoid function, involves no exponential calculation, so that the calculation amount can be reduced. Further, the hard hyperbolic tangent function maps values into the interval of -1 to 1, so that the distribution of values can be limited to the interval with high floating point precision (i.e., -1 to 1), and the precision loss caused by quantization can be reduced when quantization is performed.
In some examples, the quantization may satisfy the following formula:
x_q = (a_c / 2^(n-1)) × round(x_c × 2^(n-1) / a_c),
wherein x_q may represent the quantized value, n may represent the bit length of the quantized value, and round may represent a rounding operation.
In some examples, the target information may be truncated using the truncating module 260 and the truncated values quantized using the quantizing module 270.
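Under the reconstruction of the formulas given above (which is itself an assumption about the exact quantization scheme), and assuming PyTorch, truncation and fixed-point quantization could be sketched as follows.

import torch

def truncation_value(initial_max: float) -> float:
    # Integer power of 2 closest to the trained initial maximum value.
    return float(2.0 ** torch.round(torch.log2(torch.tensor(initial_max))))

def truncate_and_quantize(x: torch.Tensor, initial_max: float, n_bits: int = 8) -> torch.Tensor:
    a_c = truncation_value(initial_max)
    x_c = torch.clamp(x, min=-a_c, max=a_c)   # truncation with a_c as the maximum value
    step = a_c / (2 ** (n_bits - 1))          # fixed-point step size
    return torch.round(x_c / step) * step     # value representable by an n-bit fixed point number

x = torch.randn(4, 4)
print(truncate_and_quantize(x, initial_max=1.3, n_bits=8))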
Fig. 7 is a schematic diagram illustrating truncation and quantization in the neural network model 200 according to an example of the present disclosure.
Referring to fig. 7, in some examples, the feature information input to the depth separable convolution layer 212 may be truncated and quantized. In some examples, the first network parameters in the depthwise convolution layer 2121 may be quantized. The first network parameters may include the convolution kernel size and the weight parameters of the depthwise convolution layer 2121. In some examples, the first intermediate feature information output by the depthwise convolution layer 2121 may be truncated and quantized.
In some examples, the second network parameters in the pointwise convolution layer 2122 may be quantized. The second network parameters may include the weight parameters of the pointwise convolution layer 2122. In some examples, the second intermediate feature information output by the pointwise convolution layer 2122 may be truncated and quantized.
In some examples, the characteristic information output by the depth separable convolutional layer 212 may be truncated and quantized to obtain a final output. In this case, the accuracy loss due to quantization can be reduced, and the numerical value can be converted into a fixed point number, so that the applicability of the neural network model 200 to images of different data types can be improved.
In some examples, the target information in the second extraction module 250 may be truncated and quantized. In some examples, the target information in the first upsampling module 220 may be truncated and quantized.
Fig. 8 is a flowchart illustrating reconstruction of the image to be processed 10 to obtain the target image 20 using the neural network model 200 in accordance with an example of the present disclosure.
In addition, examples of the present disclosure also provide an exemplary flow of the neural network model 200 reconstructing the image 10 to be processed. It should be noted that the above description of the neural network model 200 applies equally to this exemplary flow, unless there is a conflict.
Referring to fig. 8, in some examples, the exemplary process may include performing deep feature extraction on initial feature information related to the image to be processed 10 using the first extraction module 210 to obtain deep feature information (step S100), performing channel fusion on the deep feature information using the first upsampling module 220 to obtain fused feature information (step S200), performing pixel recombination on the fused feature information using the first upsampling module 220 to obtain a recombined image (step S300), performing interpolation upsampling on the image to be processed 10 using the second upsampling module 230 to obtain an upsampled image (step S400), and performing residual connection of the upsampled image with the recombined image using the residual connection module 240 to obtain the target image 20 (step S500). In this case, the upsampled image obtained by interpolation upsampling of the image to be processed 10 is residually connected with the recombined image, so that the color information of the image to be processed 10 can be restored and more original information of the image to be processed 10 can be retained; therefore, the information loss of the target image 20 relative to the image to be processed 10 can be reduced, and the accuracy of the target image 20 can be improved.
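A minimal end-to-end sketch of steps S100 to S500, assuming a PyTorch implementation, a single luminance input channel, an upscaling factor of 2, bicubic interpolation and a simplified residual module (none of which are fixed by the flow above), is given below:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlockSketch(nn.Module):
    # Simplified stand-in for the residual module 211: a depthwise separable
    # convolution whose output is residually connected to the block input.
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.Hardtanh()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.pointwise(self.depthwise(x))) + x

class ReconstructionSketch(nn.Module):
    def __init__(self, channels: int = 16, scale: int = 2, blocks: int = 4):
        super().__init__()
        self.scale = scale
        self.extract = nn.Conv2d(1, channels, 3, padding=1)             # shallow extraction -> initial features
        self.deep = nn.Sequential(*[ResidualBlockSketch(channels) for _ in range(blocks)])
        self.fuse = nn.Conv2d(channels, scale * scale, kernel_size=1)   # channel fusion
        self.shuffle = nn.PixelShuffle(scale)                           # pixel recombination

    def forward(self, lr: torch.Tensor) -> torch.Tensor:                # lr: (N, 1, H, W) luminance channel
        feat = self.extract(lr)
        deep = self.deep(feat)                                          # step S100: deep feature extraction
        fused = self.fuse(deep)                                         # step S200: channel fusion
        recombined = self.shuffle(fused)                                # step S300: pixel recombination
        upsampled = F.interpolate(lr, scale_factor=self.scale,
                                  mode="bicubic", align_corners=False)  # step S400: interpolation upsampling
        return upsampled + recombined                                   # step S500: residual connection

lr = torch.rand(1, 1, 64, 64)
print(ReconstructionSketch()(lr).shape)  # torch.Size([1, 1, 128, 128])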
Fig. 9 is a block diagram illustrating a construction of a reconstruction system 300 according to an example of the present disclosure.
In some examples, the reconstruction system 300 to which the disclosed examples relate may be used to perform the reconstruction method. Referring to fig. 9, in some examples, the reconstruction system 300 may include an acquisition module 310 and a reconstruction module 320. In some examples, the acquisition module 310 may be configured to acquire the image 10 to be processed. In some examples, the reconstruction module 320 may be configured to obtain the target image 20 corresponding to the image 10 to be processed using the trained neural network model 200. In some examples, the reconstruction module 320 may be the neural network model 200. For details of the neural network model 200, reference may be made to the description above.
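As an illustrative sketch only (the file-path handling, luminance-channel preprocessing and wrapped model below are assumptions, not part of the disclosed system), the acquisition and reconstruction modules could be composed as follows:

import numpy as np
import torch
from PIL import Image

class ReconstructionSystemSketch:
    # Acquisition module 310: obtains the image to be processed.
    # Reconstruction module 320: applies the trained neural network model.
    def __init__(self, model: torch.nn.Module):
        self.model = model.eval()

    def acquire(self, path: str) -> torch.Tensor:
        # Illustrative acquisition: load the luminance channel as a (1, 1, H, W) tensor in [0, 1].
        y = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
        return torch.from_numpy(y)[None, None]

    def reconstruct(self, path: str) -> torch.Tensor:
        with torch.no_grad():
            return self.model(self.acquire(path))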
In order to verify the image super-resolution reconstruction method based on the deep neural network, simulation experiments are carried out on the reconstruction method, and experimental data are shown in tables 1 and 2.
The following are some descriptions of the simulation experiments.
Table 1 is a table comparing experimental results of the neural network model 200 (LSRN) of the present disclosure with existing methods (ECBSR (Edge-oriented Convolution Block for real-time Super-Resolution on mobile devices) and SESR (Collapsible Linear Blocks for Super-Efficient Super Resolution)). The first column is the name of each model and its corresponding parameters, where the numbers following C represent the number of channels. The second column is the computational power required by each model to super-resolve a 1080P image to a 4K image, measured in multiply-add operations (MACs). The unit of the multiply-add operation count is G (10^9 operations). Columns 3 to 7 show the super-resolution reconstruction effect of each model on the test images of different reference data sets, with peak signal-to-noise ratio (PSNR)/structural similarity (SSIM) as the indicators.
TABLE 1
As can be seen from Table 1, with substantially the same computational power, the proposed neural network model 200 (LSRN) achieves better reconstruction performance than the existing methods. For example, the PSNR of LSRN-C12 is 0.09 dB to 0.18 dB higher than that of ECBSR-M4C8 on Set5, Set14, B100 and DIV2K, and the PSNR of LSRN-C28 is 0.08 dB to 0.15 dB higher than that of ECBSR-M4C16 on Set5, Set14, B100 and DIV2K, while the computational power of the former is only 0.89G higher than that of the latter. In addition, for comparable reconstruction performance, the proposed neural network model 200 requires significantly less computational power than the existing methods.
TABLE 2
Table 2 is a table comparing the experimental results of performing INT8 quantization on the neural network model 200 using the truncation module 260 and the quantization module 270 proposed by the present disclosure with those of the existing method (the neural network model 200 without INT8 quantization). The four neural network models 200 in Table 2 differ only in the activation function and in whether quantization is performed; their network structure and computational power requirements are identical. The numbers following C represent the number of channels, "quantized" indicates that INT8 quantization is performed, GELU indicates that the activation function is the GELU (Gaussian Error Linear Unit) function, and HardTanh indicates that the activation function is the hard hyperbolic tangent function.
As can be seen from Table 2, at floating-point calculation accuracy, the image super-resolution reconstruction performance of the neural network model 200 using the hard hyperbolic tangent function as the activation function is 0.1 dB to 0.2 dB lower than that of the model using the GELU function. However, after INT8 quantization, the neural network model 200 using the hard hyperbolic tangent function maintains higher PSNR and SSIM indicators, i.e., better image super-resolution reconstruction performance, on the Set5 and Set14 data sets than the neural network model 200 using the GELU function. After INT8 quantization, the PSNR of the neural network model 200 using the hard hyperbolic tangent function as the activation function drops by only 0.1 dB to 0.12 dB on each test data set, whereas the PSNR of the neural network model 200 using the GELU function as the activation function drops by 0.27 dB to 0.3 dB. This demonstrates that the truncation module 260 and the quantization module 270 proposed by the present disclosure reduce the performance loss caused by INT8 quantization of the proposed neural network model 200.
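For reference, the two evaluation indicators used above can be computed as in the following sketch; the PSNR formula and the per-layer multiply-add count are standard definitions, not values taken from the tables, and the function names are assumptions:

import numpy as np

def psnr(reference: np.ndarray, reconstructed: np.ndarray, max_val: float = 255.0) -> float:
    # Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def conv_macs_g(h_out: int, w_out: int, c_in: int, c_out: int, k: int, groups: int = 1) -> float:
    # Multiply-add operations of one convolution layer, in G (10^9) units.
    return h_out * w_out * c_out * (c_in // groups) * k * k / 1e9

# Example: MACs of a 3x3 convolution with 16 input/output channels at 4K output resolution.
print(conv_macs_g(2160, 3840, 16, 16, 3))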
The method for reconstructing an image based on a deep neural network according to the present disclosure includes inputting an image 10 to be processed into a trained neural network model 200 to obtain a target image 20 corresponding to the image 10 to be processed, wherein the target image 20 has a higher resolution than the image 10 to be processed. The neural network model 200 includes a first extraction module 210, a first upsampling module 220, a second upsampling module 230, and a residual connection module 240. The first extraction module 210 includes a residual module 211 and is configured to perform deep feature extraction on initial feature information related to the image 10 to be processed through the residual module 211 to obtain deep feature information. The residual module 211 includes at least one depth separable convolution layer 212 and is configured to residually connect the output of the last depth separable convolution layer 212 in the at least one depth separable convolution layer 212 with the input of the residual module 211 to obtain residual feature information, wherein the input of the first residual module 211 is the initial feature information, and the residual feature information output by the last residual module 211 is used as the deep feature information. The first upsampling module 220 is configured to perform channel fusion on the deep feature information to obtain fused feature information, and to perform pixel recombination on the fused feature information to obtain a recombined image having the same resolution as the target image 20. The second upsampling module 230 is configured to perform interpolation upsampling on the image 10 to be processed to obtain an upsampled image having the same resolution as the target image 20. The residual connection module 240 is configured to residually connect the upsampled image with the recombined image to obtain the target image 20. In this case, the residual module 211 includes the depth separable convolution layer 212, which, compared with a conventional convolution layer, decomposes the conventional convolution operation so as to perform the convolution operation on single channels of the feature information; the amount of computation can therefore be reduced, and the demand of the neural network model 200 for computing resources can be reduced. In addition, the upsampled image obtained by interpolation upsampling of the image 10 to be processed is residually connected with the recombined image, so that the color information of the image 10 to be processed can be restored and more original information of the image 10 to be processed can be retained; accordingly, the information loss of the target image 20 relative to the image 10 to be processed can be reduced, and the accuracy of the target image 20 can be improved. In addition, the residual module 211 residually connects the output of the last depth separable convolution layer 212 with the input of the residual module 211 to obtain the residual feature information, which helps the residual module 211 retain more original information in its input, so that the accuracy of the target image 20 can be further improved.
While the disclosure has been described in detail in connection with the drawings and examples, it is to be understood that the foregoing description is not intended to limit the disclosure in any way. Modifications and variations of the present disclosure may be made as desired by those skilled in the art without departing from the true spirit and scope of the disclosure, and such modifications and variations fall within the scope of the disclosure.

Claims (10)

1. The image super-resolution reconstruction method based on the deep neural network is characterized by comprising the following steps of:
inputting an image to be processed into a trained neural network model to obtain a target image corresponding to the image to be processed, the target image having a resolution higher than that of the image to be processed, wherein the neural network model comprises a first extraction module, a first upsampling module, a second upsampling module, and a residual connection module,
The first extraction module comprises a residual module and is configured to perform deep feature extraction on initial feature information related to the image to be processed through the residual module to obtain deep feature information, wherein the residual module comprises at least one depth separable convolution layer and is configured to perform residual connection on the output of the last depth separable convolution layer in the at least one depth separable convolution layer and the input of the residual module to obtain residual feature information, the input of the first residual module is the initial feature information, the residual feature information output by the last residual module is used as the deep feature information,
The first upsampling module is configured to channel fuse the deep feature information to obtain fused feature information, and to pixel-recombine the fused feature information to obtain a recombined image having a resolution that is the same as the resolution of the target image,
The second upsampling module is configured to interpolate the image to be processed to obtain an upsampled image, the upsampled image having the same resolution as the target image,
The residual connection module is configured to residual connect the upsampled image with the reconstructed image to obtain the target image.
2. The method for reconstructing an image super-resolution based on a deep neural network according to claim 1, wherein,
The neural network model further comprises a second extraction module, wherein the second extraction module is configured to perform shallow feature extraction on the image to be processed to obtain the initial feature information, and the number of channels of the shallow feature information is multiple.
3. The method for reconstructing an image super-resolution based on a deep neural network according to claim 1 or 2, wherein,
The initial characteristic information is characteristic information of a brightness channel image of the image to be processed.
4. The method for reconstructing an image super-resolution based on a deep neural network according to claim 1, wherein,
The number of the residual modules is plural, and the residual modules are cascaded, wherein the input of each residual module other than the first one of the residual modules is the output of the preceding residual module.
5. The method for reconstructing an image super-resolution based on a deep neural network according to claim 1 or 4, wherein,
The number of depth separable convolutional layers in the residual block is a plurality and includes a first activation block including an activation function and between two of the depth separable convolutional layers, the activation function in the first activation block being used to convert an input value to a new value within a first preset interval, the first preset interval having an upper bound and a lower bound.
6. The method for reconstructing an image super-resolution based on a deep neural network according to claim 1 or 4, wherein,
The residual module further includes a second activation module including an activation function and configured to take as input an output of a residual connection in the residual module to output the residual characteristic information.
7. The method for reconstructing an image super-resolution based on a deep neural network according to claim 6, wherein,
The activation function in the second activation module is used for converting the input numerical value into a new numerical value in a second preset interval, and the second preset interval is provided with an upper bound and a lower bound; and/or
The activation function in the depth separable convolutional layer is used to convert the input value to a new value within a third preset interval, the third preset interval having an upper bound and a lower bound.
8. The method for reconstructing an image super-resolution based on a deep neural network according to claim 1, wherein,
The neural network model also comprises a truncation module and a quantization module;
The truncation module is configured to truncate a value in target information of the neural network model according to a truncation value as a maximum value, wherein the truncation value is an integer exponent power of 2, and the target information comprises at least one of characteristic information and network parameters;
The quantization module is configured to represent the truncated value with a fixed point number.
9. The method for reconstructing an image super-resolution based on a deep neural network according to claim 1, wherein,
The number of channels of the deep characteristic information is the same as that of the shallow characteristic information, and the number of channels of the fusion characteristic information is smaller than that of the deep characteristic information.
10. The image super-resolution reconstruction system based on the deep neural network is characterized by comprising an acquisition module and a reconstruction module;
the acquisition module is configured to acquire an image to be processed;
The reconstruction module is configured to obtain a target image corresponding to the image to be processed using a trained neural network model, the target image having a resolution higher than a resolution of the image to be processed, wherein the neural network model comprises a first extraction module, a first upsampling module, a second upsampling module, and a residual connection module,
The first extraction module comprises a residual module and is configured to perform deep feature extraction on initial feature information related to the image to be processed through the residual module to obtain deep feature information, wherein the residual module comprises at least one depth separable convolution layer and is configured to perform residual connection on the output of the last depth separable convolution layer in the at least one depth separable convolution layer and the input of the residual module to obtain residual feature information, the input of the first residual module is the initial feature information, the residual feature information output by the last residual module is used as the deep feature information,
The first upsampling module is configured to channel fuse the deep feature information to obtain fused feature information, and to pixel-recombine the fused feature information to obtain a recombined image having a resolution that is the same as the resolution of the target image,
The second upsampling module is configured to interpolate the image to be processed to obtain an upsampled image, the upsampled image having the same resolution as the target image,
The residual connection module is configured to residual connect the upsampled image with the reconstructed image to obtain the target image.
CN202410089096.7A 2024-01-22 2024-01-22 Image super-resolution reconstruction method and system based on deep neural network Pending CN117934277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410089096.7A CN117934277A (en) 2024-01-22 2024-01-22 Image super-resolution reconstruction method and system based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410089096.7A CN117934277A (en) 2024-01-22 2024-01-22 Image super-resolution reconstruction method and system based on deep neural network

Publications (1)

Publication Number Publication Date
CN117934277A true CN117934277A (en) 2024-04-26

Family

ID=90769688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410089096.7A Pending CN117934277A (en) 2024-01-22 2024-01-22 Image super-resolution reconstruction method and system based on deep neural network

Country Status (1)

Country Link
CN (1) CN117934277A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination