CN111429347A - Image super-resolution reconstruction method and device and computer-readable storage medium - Google Patents

Image super-resolution reconstruction method and device and computer-readable storage medium Download PDF

Info

Publication number
CN111429347A
CN111429347A (application CN202010201996.8A)
Authority
CN
China
Prior art keywords
image
resolution
convolution
super
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010201996.8A
Other languages
Chinese (zh)
Inventor
陈沅涛
陶家俊
陈曦
张建明
张艺兴
吴一鸣
刘林武
王柳
谷科
余飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha University of Science and Technology filed Critical Changsha University of Science and Technology
Priority to CN202010201996.8A priority Critical patent/CN111429347A/en
Publication of CN111429347A publication Critical patent/CN111429347A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image super-resolution reconstruction method and device and a computer-readable storage medium. The method up-samples a low-resolution image to be processed to obtain a sub-low-resolution image with the same number of pixels as the target high-resolution image; extracts the image features of the sub-low-resolution image and inputs them into an encoding/decoding noise reduction module to obtain a noise-free image feature map, where the module is built by connecting a plurality of convolution layers, each of which gradually reduces the feature map size to obtain the abstract content of the image features, with a plurality of deconvolution layers, each of which gradually enlarges the feature map size to compensate the image feature detail information; and maps the image feature map through a hole convolution network, obtained by cascading multiple hole convolution layers with convolution layers, to obtain the super-resolution reconstructed image. The method and device achieve image super-resolution reconstruction with high-quality image output, strong practicability, and efficient, fast model training.

Description

Image super-resolution reconstruction method and device and computer-readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for reconstructing super-resolution images, and a computer-readable storage medium.
Background
The purpose of Super-Resolution (SR) reconstruction is to reconstruct a High-Resolution (HR) image containing rich details from one or more Low-Resolution (LR) input images. Single Image Super-Resolution (SISR) reconstruction uses the rich information contained in a single image, together with visual prior knowledge learned from sample images, to identify important visual cues, fill in details, and reproduce them as faithfully as possible.
In computer vision tasks, reconstructing a high-resolution image from a low-resolution image is usually achieved with traditional SISR methods or deep learning methods. Traditional SISR methods are based on empirical algorithms: they discard a large amount of high-frequency information, require a certain amount of manual intervention, and their performance and results remain far from the requirements of practical applications.
Deep learning has been a hot research topic in pattern recognition and artificial intelligence, and Convolutional Networks (ConvNets) have achieved great success in computer vision tasks in recent years. For example, SRCNN first applied deep learning to image super-resolution reconstruction using a three-layer convolutional network; the network learns the LR-to-HR mapping end-to-end, needs none of the engineered features of traditional methods, and achieves more advanced performance than they do. DRCN and VDSR overcame some shortcomings of SRCNN and obtained the state-of-the-art performance of their time. Another related technique, ESRGAN, integrates perceptual loss into a generative model centered on an adversarial network to improve the feature reconstruction capability for low-resolution images, restoring missing high-frequency semantic information and producing a more vivid visual effect.
The related art, although successful in introducing deep learning techniques into the super-resolution problem, has corresponding limitations. SRCNN depends on the context information of a small image region, converges slowly during training, and is suitable only for a single scale. Although VDSR addresses the problems of SRCNN, its small receptive field leaves the network's convolution layers simply chain-stacked, so robustness and generalization are not guaranteed. DRCN uses a recursive convolutional network, which leads to long training time and a large storage requirement, inconvenient for practical application. ESRGAN generates the super-resolution image adversarially, which breaks the pixel-wise correspondence between the super-resolution and high-resolution images, and its large network scale is not conducive to training and use. In short, existing image super-resolution reconstruction methods suffer from poor output image quality, low model training efficiency, and poor practicability.
In view of this, how to achieve high-quality image output, strong practicability, and efficient and fast model training image super-resolution reconstruction is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides an image super-resolution reconstruction method, an image super-resolution reconstruction device and a computer-readable storage medium, which realize high-quality image output, strong practicability and efficient and quick model training image super-resolution reconstruction.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
the embodiment of the invention provides an image super-resolution reconstruction method on one hand, which comprises the following steps:
carrying out up-sampling processing on the low-resolution image to be processed to obtain a secondary low-resolution image with the same pixels as the target high-resolution image;
extracting the image characteristics of the secondary low-resolution image, and inputting the image characteristics into a pre-constructed coding and decoding noise reduction module to obtain an image characteristic graph with noise removed;
mapping the image feature map to a pre-constructed cavity convolution network to obtain a super-resolution reconstruction image;
the encoding and decoding noise reduction module is obtained by connecting a plurality of convolution layers and a plurality of deconvolution layers, wherein each convolution layer gradually reduces the size of the image characteristic graph and is used for obtaining abstract contents of the image characteristic, and each deconvolution layer gradually enlarges the size of the image characteristic graph and is used for compensating the detailed information of the image characteristic; the cavity convolution network is obtained by cascading a plurality of layers of cavity convolution layers and is used for reconstructing the image characteristic diagram into a super-resolution image.
Optionally, the hole convolutional network includes a hole convolutional layer and a reconstruction layer;
the hollow convolution layer is
Figure BDA0002419703860000031
The reconstruction layer is
Figure BDA0002419703860000032
In the formula, F2For the image feature map, F3For the output of said void convolution layer, W4、b4Convolution weights and offsets for the hole convolution layerThe setting value is set to the preset value,
Figure BDA0002419703860000033
for convolution operation, ISRFor reconstructing the image at super-resolution, Wc dC convolution kernels of size 1 x 64, c the number of image channels, b5Is the convolution offset value of the reconstruction layer.
Optionally, the hole convolution network further includes a jump connection layer; the jump connection layer uses residual learning to transmit the image features directly to a later layer in the network by adding a jump connection in the hole convolution network;

the jump connection layer is F4(F3) = F3 + F1, where F3 is the output of the hole convolution layer and F1 is the image feature; correspondingly, the reconstruction layer becomes

I_SR = W_c^d ⊛ F4(F3) + b5
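The effect of the hole (dilated) convolution layer above can be sketched in NumPy by spacing the taps of a 3 × 3 kernel by the dilation rate. This is an illustrative single-channel sketch under assumed values, not the patent's trained layer: a 3 × 3 kernel with rate 2 covers a 5 × 5 neighbourhood using the same nine parameters.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate=1):
    """'Same'-padded 2-D dilated convolution (single channel).

    Kernel taps are spaced `rate` pixels apart, so a 3x3 kernel with
    rate=2 covers a 5x5 neighbourhood without extra parameters.
    """
    k = kernel.shape[0]
    eff = (k - 1) * rate + 1          # effective kernel extent
    pad = eff // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * xp[i * rate: i * rate + x.shape[0],
                                     j * rate: j * rate + x.shape[1]]
    return out

x = np.arange(36, dtype=float).reshape(6, 6)
k3 = np.ones((3, 3)) / 9.0
y1 = dilated_conv2d(x, k3, rate=1)   # averages a 3x3 neighbourhood
y2 = dilated_conv2d(x, k3, rate=2)   # averages a 5x5-spread neighbourhood
print(y1.shape, y2.shape)            # both (6, 6): spatial size preserved
```

Note the output size is unchanged in both cases: unlike pooling, dilation enlarges the receptive field without shrinking the feature map.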
Optionally, extracting the image features of the sub-low-resolution image includes:

extracting the image features of the sub-low-resolution image by using a feature extraction sub-network:

F1(X) = max(0, W1 ⊛ X + b1)

where F1(X) is the image feature, X is the sub-low-resolution image, W1 and b1 are respectively the convolution weights and bias values of the feature extraction sub-network, and ⊛ denotes the convolution operation; the convolution of the feature extraction sub-network adds a zero boundary and uses a stride of 1 to keep the input and output image sizes of the sub-network consistent.
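As a quick sanity check on the sizing claim above (zero padding with stride 1 keeps input and output sizes equal), the standard convolution output-size formula can be evaluated directly; the function name and the sample sizes are illustrative, not from the patent.

```python
def conv_out_size(n, k, pad, stride):
    """Standard convolution output size: floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1

# 'Same' convolution: an odd kernel k with zero padding (k - 1) // 2 and stride 1
# leaves the spatial size unchanged, as the feature extraction sub-network requires.
for n in (32, 100, 255):
    for k in (3, 5, 9):
        assert conv_out_size(n, k, (k - 1) // 2, 1) == n
print("same-padding, stride-1 convolution preserves size")
```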
Optionally, the performing an upsampling process on the low-resolution image to be processed to obtain a sub-low-resolution image having the same pixels as the target high-resolution image includes:
and upsampling the size of the low-resolution image to be processed to the size of the target high-resolution image by using a deconvolution network structure so that the low-resolution image to be processed and the target high-resolution image have the same pixels.
Optionally, the encoding/decoding noise reduction module includes a convolutional network structure, a deconvolution network structure, and a jump connection structure;

the convolutional network structure is

H1(F1) = max(0, W2 ⊛ F1 + b2)

used for acquiring the abstract content of the image features;

the deconvolution network structure is H2(H1) = max(0, W3 Θ H1 + b3), used for compensating the image feature detail information;

the jump connection structure is F2(H2) = H2 + F1, used for transferring the image features directly to a later layer in the network by adding a jump connection to the convolutional network structure using residual learning;

where F1 is the image feature, W2 and b2 are the convolution weights and bias values of the convolutional network structure, ⊛ denotes the convolution operation (the convolution of the convolutional network structure adds no zero boundary and uses a stride of 2), H1 is the output of the convolutional network structure and at the same time the input of the deconvolution network structure, W3 and b3 are the convolution weights and bias values of the deconvolution network structure, Θ denotes the deconvolution operation, and F2 is the output of the jump connection structure.
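The size bookkeeping implied by these formulas (stride-2, no-padding convolutions shrink the feature map; the matching deconvolutions enlarge it back) can be walked through with output-size arithmetic. The kernel size 2 and the 64-pixel input are illustrative assumptions chosen so the sizes round-trip exactly, not values from the patent.

```python
def conv_out(n, k, stride=2):
    """Output size of an encoder convolution with no zero padding."""
    return (n - k) // stride + 1

def deconv_out(n, k, stride=2):
    """Output size of the matching deconvolution (transposed convolution)."""
    return (n - 1) * stride + k

n = 64                       # hypothetical feature-map size entering the module
enc1 = conv_out(n, 2)        # 64 -> 32: encoder shrinks the map (abstract content)
enc2 = conv_out(enc1, 2)     # 32 -> 16
dec1 = deconv_out(enc2, 2)   # 16 -> 32: decoder enlarges it (detail compensation)
dec2 = deconv_out(dec1, 2)   # 32 -> 64: back to the input size, so the skip
                             # connection F2 = H2 + F1 is shape-compatible
assert dec2 == n
print(n, enc1, enc2, dec1, dec2)  # 64 32 16 32 64
```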
The embodiment of the invention provides an image super-resolution reconstruction device on the other hand, which comprises a mixed depth convolution neural network, a first image processing unit and a second image processing unit, wherein the mixed depth convolution neural network is used for processing an input low-resolution image to be processed and outputting a reconstructed super-resolution image; the hybrid deep convolutional neural network includes:
the up-sampling module is used for carrying out up-sampling processing on the low-resolution image to be processed to obtain a secondary low-resolution image with the same pixels as the target high-resolution image;
the characteristic extraction module is used for extracting the image characteristics of the secondary low-resolution image;
the coding and decoding noise reduction module is used for carrying out noise reduction processing on the image characteristics to obtain an image characteristic graph with noise removed; the coding and decoding noise reduction module is obtained by connecting a plurality of convolution layers and a plurality of deconvolution layers, each convolution layer gradually reduces the size of the image characteristic graph and is used for obtaining abstract contents of the image characteristic, and each deconvolution layer gradually enlarges the size of the image characteristic graph and is used for compensating the detailed information of the image characteristic;
the image reconstruction module is used for mapping the image feature map to a pre-constructed cavity convolution network to obtain a super-resolution reconstruction image; the cavity convolution network is obtained by cascading a plurality of layers of cavity convolution layers and is used for reconstructing the image characteristic diagram into a super-resolution image.
Optionally, the loss function of the hybrid deep convolutional neural network is:
f_loss = (1/N) Σ_{i=1}^{N} ‖F(X_i) − Y_i‖² + β‖θ‖²

where f_loss is the loss function, N is the total number of samples in the training sample set, i indexes the i-th sample in the training sample set, F(X_i) is the super-resolution reconstructed image, Y_i is the real high-resolution image corresponding to the super-resolution reconstructed image, β is the multiplication coefficient of the weight attenuation, and θ is the network weight parameter to be calculated;

the network weight parameter to be calculated is updated according to a parameter updating relational expression so as to minimize the loss value, where the parameter updating relational expression is:

θ_{t+1} = θ_t − η · m̂_t / (√v̂_t + ε), with m̂_t = m_t / (1 − β1^t), v̂_t = v_t / (1 − β2^t)

where m_t is the first-moment (mean) estimate of the gradient at time t, v_t is the second-moment (variance) estimate of the gradient at time t, η is the learning rate, ε is a small constant preventing division by zero, and β1, β2 are known constants.
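The update rule above is the familiar moment-based adaptive update; a minimal single-parameter sketch is given below. The learning rate, β1, β2, and the toy objective f(θ) = θ² are illustrative defaults, not values claimed by the patent.

```python
import math

def adam_step(theta, grad, m, v, t, eta=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One adaptive-moment update: running mean/variance of the gradient
    with bias correction, then a normalized parameter step."""
    m = beta1 * m + (1 - beta1) * grad          # first moment: gradient mean
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment: gradient variance
    m_hat = m / (1 - beta1 ** t)                # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta**2 (gradient 2*theta) starting from theta = 1.0.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # well below the starting point 1.0, near the minimum at 0
```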
An embodiment of the present invention further provides an image super-resolution reconstruction apparatus, which includes a processor, and the processor is configured to implement the steps of the image super-resolution reconstruction method according to any one of the preceding items when executing the computer program stored in the memory.
Finally, an embodiment of the present invention provides a computer-readable storage medium, on which an image super-resolution reconstruction program is stored, which when executed by a processor implements the steps of the image super-resolution reconstruction method according to any of the previous items.
The technical scheme provided by the application has the following advantages. The encoding/decoding noise reduction module built from convolution and deconvolution removes the noise generated during super-resolution image reconstruction, improving the sharpness of the reconstructed image, outputting a high-quality image, and achieving a better visual effect. Applying hole convolution layers during image reconstruction lets a small convolution kernel obtain a large receptive field, so more neighboring pixel information is available without increasing network capacity; this improves the reconstruction effect, reduces the amount of computation, effectively shortens model training time, improves training efficiency, keeps the storage requirement small, and makes the method convenient and practical for real applications.
In addition, the embodiment of the invention also provides a corresponding realization device and a computer readable storage medium for the image super-resolution reconstruction method, so that the method has higher practicability, and the device and the computer readable storage medium have corresponding advantages.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the related art, the drawings required to be used in the description of the embodiments or the related art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow chart of an image super-resolution reconstruction method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a cascading effect of a multi-void convolutional layer according to an embodiment of the present invention;
FIG. 3a shows the receptive field of a rate-1 hole convolution according to an embodiment of the present invention;
FIG. 3b shows the receptive field of a rate-2 hole convolution according to an embodiment of the present invention;
FIG. 3c shows the receptive field of a rate-4 hole convolution according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a residual structure according to an embodiment of the present invention;
fig. 5 is a block diagram of an embodiment of an image super-resolution reconstruction apparatus according to an embodiment of the present invention;
fig. 6 is a structural diagram of an embodiment of a hybrid deep convolutional neural network according to an embodiment of the present invention;
fig. 7 is a block diagram of another embodiment of an image super-resolution reconstruction apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram comparing a noise reduction module built with deconvolution against a noise reduction module without a deconvolution network according to an embodiment of the present invention;
FIG. 9 is a schematic diagram comparing a convolution network with holes with a convolution network without holes according to an embodiment of the present invention;
FIG. 10 is a graph illustrating a comparison of performance comparison curves using both deconvolution and hole convolution, and neither of them, according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.
Having described the technical solutions of the embodiments of the present invention, various non-limiting embodiments of the present application are described in detail below.
Referring to fig. 1, fig. 1 is a schematic flow chart of an image super-resolution reconstruction method according to an embodiment of the present invention, where the embodiment of the present invention may include the following:
s101: and performing up-sampling processing on the low-resolution image to be processed to obtain a secondary low-resolution image with the same pixels as the target high-resolution image.
In the present application, the low resolution image to be processed may be represented as:
I_L = D I_H B + n    (1)

where I_L is the low-resolution image to be processed, I_H is the target high-resolution image, D is the downsampling operator, B is the blurring operator, and n is additive noise; the target high-resolution image is also the super-resolution reconstructed image obtained by the reconstruction in S103.

Based on this mathematical form of the low-resolution image to be processed, the super-resolution reconstruction process that recovers the high-resolution image from it can be represented as:

I_H = D⁻¹ I_L B⁻¹ − D⁻¹ n B⁻¹    (2)

If the generated noise D⁻¹ n B⁻¹ is abbreviated as S, relation (2) can be expressed as:

I_H = D⁻¹ I_L B⁻¹ − S    (3)

According to relation (3), the low-resolution image to be processed can be reconstructed into a high-resolution image through a series of image processing operations such as upsampling, deblurring, and noise reduction.
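Relation (1) can be simulated directly on a toy array: blur (B), downsample (D), then add noise (n). The box blur kernel, scale factor 2, and noise level below are illustrative assumptions, not parameters from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
I_H = rng.random((8, 8))              # toy 'high-resolution' image

# B: blur with a 3x3 box kernel ('same' size, zero boundary)
blurred = np.zeros_like(I_H)
padded = np.pad(I_H, 1)
for di in range(3):
    for dj in range(3):
        blurred += padded[di:di + 8, dj:dj + 8] / 9.0

# D: downsample by keeping every 2nd pixel; n: additive Gaussian noise
I_L = blurred[::2, ::2] + 0.01 * rng.standard_normal((4, 4))

print(I_H.shape, I_L.shape)   # (8, 8) (4, 4)
```

Super-resolution reconstruction is the inverse problem: recovering an estimate of I_H given only I_L.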
S102: and extracting the image characteristics of the secondary low-resolution image, and inputting the image characteristics into a pre-constructed coding and decoding noise reduction module to obtain an image characteristic graph with noise removed.
It is understood that computer-vision image processing depends heavily on image features, which describe the important information contained in the captured image content with much less data; image features can therefore be understood as a sparse representation of the image itself. On this basis, a deconvolution network (Decn) can be used as a regularization-based sparse image representation method to extract image features and reconstruct the image from the extracted features. Research shows that applying an L1-regularized deconvolution network model to image representation and restoration can effectively remove image noise, and the excellent results of the FCN model (with its deconvolution layers) for image segmentation strikingly demonstrate the great effect of deconvolution networks in image processing. Real blur degradation rarely conforms to an ideal linear convolution model, so a deconvolution network is adopted to capture the image degradation characteristics and then restore the image (as in relation (4)), rather than perfectly modeling the outliers purely from the perspective of a generative model. Relation (4) can be expressed as:

min_Z Σ_c (1/2) ‖ x_c − Σ_k z_k ⊛ f_{k,c} ‖² + Σ_k ‖ z_k ‖_1    (4)

where x is the input picture, c is the number of channels of the picture, i indexes the image pixels (the norms sum over pixels i), k is the number of feature maps, and ⊛ denotes the convolution operation; Z = {z_k} are the feature maps, local latent variables that differ for each input x, and f = {f_{k,c}} are the convolution kernels, global variables that are identical for all inputs x. By combining convolution and deconvolution in this way and increasing the convolution depth, the network plays a greater role in the denoising of super-resolution reconstruction, which further shows that combining deeper convolution layers with deconvolution layers can significantly improve super-resolution performance. That is, the encoding/decoding noise reduction module in S102 may be built from a plurality of convolution layers and a plurality of deconvolution layers, where each convolution layer gradually reduces the feature map size to obtain the abstract content of the image features, and each deconvolution layer gradually enlarges the feature map size to compensate the image feature detail information.
S103: and mapping the image characteristic graph to a pre-constructed cavity convolution network to obtain a super-resolution reconstruction image.
The hole convolution network is constructed and trained in advance. As shown in fig. 2, multiple hole convolution layers and convolution layers are cascaded to obtain the hole convolution network, which can be used to reconstruct the image feature map into a super-resolution image.
VGG inherits part of the framework structure of LeNet and AlexNet and directly exploits the advantage of stacking multiple small convolution kernels: for example, chaining three 3 × 3 convolution kernels is equivalent to one 7 × 7 convolution kernel. This design not only greatly reduces the number of parameters, but also makes it easier to learn a general and expressive feature space. ResNet adds direct connection channels to the network, allowing the original input information to be transmitted directly to later network layers, which both greatly accelerates neural network training and improves the accuracy the trained model can reach.
However, deep convolutional neural networks have some serious drawbacks for such tasks: the downsampling/pooling process causes loss of internal data structure and spatial hierarchical information, and small-object information cannot be reconstructed. Meanwhile, the image super-resolution reconstruction process needs more neighboring pixels, i.e., it depends on a large receptive field. Therefore, finding a balance point between the size of the convolution receptive field and the number of network parameters is a crucial problem. Hole convolution (also called dilated convolution) can provide a larger receptive field with a comparable amount of computation, without using pooling layers that would cause information loss; this coincides with the needs of image super-resolution reconstruction, since a SISR network needs no pooling layers. Hole convolution can therefore be used to find the best balance point between the convolution receptive field size and the number of network parameters.
Hole convolution originated in the field of image segmentation: an image is input into the network, features are extracted by a CNN, and pooling reduces the image scale while enlarging the receptive field. For tasks where an image needs global information, or where speech or text depends on long sequences, hole convolution achieves the corresponding technical effect. Compared with traditional convolution and pooling operations, ordinary stacked convolution grows the receptive field only linearly with the number of convolution layers, whereas using hole convolution together with convolution makes the receptive field grow exponentially with the number of layers. Figs. 3a-3c compare the receptive fields of hole convolution at different rates; the dots in the figures represent convolution points and the gray areas represent the receptive fields. Cascading hole convolution layers expands the convolution receptive field exponentially while the number of parameters grows only linearly, so more neighboring pixel information is obtained without increasing network capacity, which improves the reconstruction effect and at the same time reduces the amount of computation.
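The linear-versus-exponential growth claim can be checked with the standard receptive-field recurrence r_l = r_{l-1} + (k − 1)·d_l for stacked stride-1 convolutions. The dilation schedule 1, 2, 4 below mirrors the rates of Figs. 3a-3c and is illustrative.

```python
def receptive_field(dilations, k=3):
    """Receptive field of stacked stride-1 convolutions with the given dilation rates."""
    r = 1
    for d in dilations:
        r += (k - 1) * d   # each layer widens the field by (k - 1) * dilation
    return r

plain = receptive_field([1, 1, 1])    # plain 3x3 stack: linear growth -> 7
dilated = receptive_field([1, 2, 4])  # dilations 1, 2, 4: exponential growth -> 15
print(plain, dilated)                 # 7 15
```

Both stacks use the same number of 3 × 3 kernels (and thus the same parameter count), but the dilated stack sees a 15 × 15 neighbourhood instead of 7 × 7.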
In the technical scheme provided by the embodiment of the invention, the encoding/decoding noise reduction module built from convolution and deconvolution removes the noise generated during super-resolution image reconstruction, which helps improve the sharpness of the reconstructed image, outputs a high-quality image, and achieves a better visual effect. Applying hole convolution layers during image reconstruction lets a small convolution kernel obtain a large receptive field, so more neighboring pixel information is available without increasing network capacity; this improves the reconstruction effect, reduces the amount of computation, effectively shortens model training time, improves training efficiency, keeps the storage requirement small, and makes the method convenient and practical for real applications.
The above embodiment does not limit how to perform the upsampling operation on the low-resolution image to be processed, and an embodiment of the present invention provides an implementation manner, which may include:
in the image super-resolution reconstruction task based on deep learning, the relation (3) can be used as a reconstruction model of the high-resolution image, and in the reconstruction process the low-resolution image needs to be scaled by upsampling to a sub-low-resolution image with the same number of pixels as the high-resolution image. In one embodiment, the upsampling can be implemented by resampling and interpolation, i.e. by rescaling the input picture to the desired size and calculating the value of each pixel point. For example, a low-resolution image may be upsampled by bicubic interpolation before the image is input into the network, but this method introduces manual intervention and excessive engineered features, which affects the reconstruction effect. As another embodiment, the low-resolution image to be processed may be upsampled to the size of the target high-resolution image using a deconvolution network structure, so that the low-resolution image to be processed and the target high-resolution image have the same number of pixels. This embodiment needs no manual intervention and has little adverse influence on the image reconstruction effect.
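As a minimal sketch of the resampling-and-interpolation idea (the embodiments above use bicubic interpolation or a deconvolution network on 2-D images; this hypothetical 1-D linear version only illustrates how each output pixel is computed from its neighbours):

```python
def upsample_linear_1d(signal, scale):
    """Resample a 1-D signal to `scale` times its length, computing each
    output sample by linear interpolation between its two neighbours."""
    n = len(signal)
    out = []
    for i in range(n * scale):
        x = i / scale                 # map output index back to input coords
        x0 = int(x)
        x1 = min(x0 + 1, n - 1)       # clamp at the right edge
        frac = x - x0
        out.append(signal[x0] * (1 - frac) + signal[x1] * frac)
    return out

print(upsample_linear_1d([0.0, 2.0, 4.0], 2))
# [0.0, 1.0, 2.0, 3.0, 4.0, 4.0]
```

A bicubic variant would weight four neighbours with a cubic kernel instead of two with a linear one; the mapping-back-to-input-coordinates step is the same.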
The above embodiment does not limit how to extract the image features of the sub-low resolution image, and an embodiment of the present invention provides an implementation manner, which may include:
in the conventional image restoration process, image feature extraction generally involves dense extraction of image blocks, which are then represented by a set of pre-trained bases such as PCA (Principal Component Analysis) or DCT (Discrete Cosine Transform). In a convolutional neural network, this part can be incorporated into the network-based optimization process, and the convolution operation can automatically extract image features. That is, the image features of the sub-low-resolution image can be extracted by using the feature extraction sub-network, which can be expressed as:

F1(X) = max(0, W1 ⊗ X + b1)    (5)

In the relation (5), F1(X) is the image feature, X is the sub-low-resolution image, W1 and b1 are respectively the convolution weights and bias values of the feature extraction sub-network, and ⊗ denotes the convolution operation. The convolution of the feature extraction sub-network adds a 0 boundary and uses a step size of 1 so as to keep the sizes of the input and output images of the feature extraction sub-network consistent and prevent boundary rank reduction. W1 may be, for example, 3 × 3 × 64. Meanwhile, ReLU (Rectified Linear Unit, max(0, x)) can be used to activate the convolution features; with ReLU as the activation function, a neuron using the linear rectification activation function passes its output to the next layer of neurons or serves as the output of the entire neural network.
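A toy 1-D analogue of the feature extraction convolution of relation (5), with zero padding, step size 1, and ReLU activation, can illustrate how the input and output sizes stay consistent; the kernel and bias values below are arbitrary assumptions:

```python
def conv1d_same_relu(x, w, b):
    """Stride-1 convolution with a 0 boundary ('same' padding) followed
    by ReLU, so the output has the same length as the input."""
    k = len(w)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    out = []
    for i in range(len(x)):
        s = sum(padded[i + j] * w[j] for j in range(k)) + b
        out.append(max(0.0, s))   # ReLU: max(0, W*x + b)
    return out

x = [1.0, 2.0, 3.0, 4.0]
y = conv1d_same_relu(x, [0.0, 1.0, 0.0], -2.5)
print(len(y) == len(x))   # True: zero padding keeps sizes consistent
print(y)                  # [0.0, 0.0, 0.5, 1.5]
```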
As an optional implementation manner, the present application further provides an implementation manner of performing denoising processing in an image reconstruction process, which may include the following contents:
in the characteristic denoising structure, convolution and deconvolution are cascaded into an encoder-decoder structure, so that the feature noise of the image can be removed to the maximum extent. The convolution layers retain the main image content, and the deconvolution layers compensate detail information, so that a good denoising effect is achieved while the image content is well preserved. The codec noise reduction module may include a convolutional network structure, a deconvolution network structure, and a jump connection structure.

The convolutional network structure may be represented as H1(F1) = max(0, W2 ⊗ F1 + b2), and is used for obtaining the abstract content of the image features; the deconvolution network structure may be represented as H2(H1) = max(0, W3 Θ H1 + b3), and is used for compensating image feature detail information; the jump connection structure may be represented as F2(H2) = H2 + F1, and uses residual learning to transmit the image features directly to a later layer in the network by adding a jump connection to the convolutional network structure.

In the formulas, F1 is the image feature; W2 and b2 are the convolution weights and bias values of the convolutional network structure; ⊗ is the convolution operation, where the convolution of the convolutional network structure adds no 0 boundary and uses a step size of 2; H1 is the output of the convolutional network structure and simultaneously the input of the deconvolution network structure; W3 and b3 are the convolution weights and bias values of the deconvolution network structure; Θ is the deconvolution operation; and F2 is the output of the jump connection structure. W2 may be, for example, 3 × 3 × 64, and W3 may likewise be, for example, 3 × 3 × 64.
In summary, the convolutional layers of the embodiment of the present invention gradually reduce the size of the feature map, retain the main image information, and obtain the abstract content of the image features; the deconvolution layers gradually increase the size of the feature map, enlarging the feature size and restoring the detail information of the image features. The jump connection can be used to solve the problem of gradient dispersion when the network is deep, and at the same time it assists the back propagation of the gradient and accelerates the training process. With the input and output sizes of the coding and decoding structure kept consistent, the testing efficiency under the limited computing capacity of a mobile terminal is also ensured, and an image feature map with the noise removed is obtained.
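The codec data flow described above, with convolution retaining the main content, deconvolution restoring the size, and a jump connection adding the initial features back, can be sketched with toy stand-ins for the actual layers (the downsampling and nearest-neighbour upsampling here are illustrative simplifications, not the patented stride-2 convolution and deconvolution):

```python
def encode(features):
    """Convolution stage stand-in: downsample by 2, keeping main content."""
    return features[::2]

def decode(encoded, target_len):
    """Deconvolution stage stand-in: upsample back to the original size,
    compensating the detail lost in encoding."""
    out = []
    for v in encoded:
        out.extend([v, v])
    return out[:target_len]

def jump_connect(decoded, initial_features):
    """F2 = H2 + F1: add the initial features back elementwise."""
    return [d + f for d, f in zip(decoded, initial_features)]

f1 = [1.0, 2.0, 3.0, 4.0]          # initial image features
h1 = encode(f1)                    # [1.0, 3.0]
h2 = decode(h1, len(f1))           # [1.0, 1.0, 3.0, 3.0]
f2 = jump_connect(h2, f1)          # [2.0, 3.0, 6.0, 7.0]
print(f2)
```

The point of the sketch is the size bookkeeping: the decoder output matches the encoder input, so the jump connection can add them elementwise.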
As another alternative embodiment, the present application further provides an embodiment of performing image reconstruction processing in an image reconstruction process, which may include the following:
in the reconstruction process, a feature map F2 of the hidden state is input and the super-resolution reconstructed image is output, which can be regarded as the inverse operation of the feature extraction stage. In conventional SR methods, the overlapped high-resolution feature maps are typically averaged to produce the final complete image. In the network convolution, the convolution kernel Wdc is used as the basis coefficient, and each position of the high-dimensional hidden-state image feature is regarded as a vector whose dimensions correspond to the pixel of the high-resolution image. Conversely, the feature map can be projected into the image domain to obtain the super-resolution reconstructed image. Based on this, the present application may define a hole convolution network to generate the final high-resolution image, where the hole convolution network may include a hole convolution layer and a reconstruction layer. The hole convolution layer can be expressed as

F3(F2) = max(0, W4 ⊗ F2 + b4)

and the reconstruction layer can be expressed as

ISR = Wdc ⊗ F3 + b5

In the formulas, F2 is the image feature map; F3 is the output of the hole convolution layer; W4 and b4 are the convolution weights and bias values of the hole convolution layer, whose rates can be set to 1, 2 and 4; ⊗ is the convolution operation; ISR is the super-resolution reconstructed image; Wdc denotes c convolution kernels of size 1 × 1 × 64, where c is the number of image channels; and b5 is the convolution bias value of the reconstruction layer.
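A 1-D sketch of the hole convolution operation itself may help: the same three kernel weights are applied with their taps spaced `rate` samples apart, so a larger rate covers a wider span of the input without adding parameters (the input and kernel values below are arbitrary):

```python
def hole_conv1d(x, w, rate):
    """1-D hole (dilated) convolution over the valid region: the kernel
    taps are spaced `rate` samples apart, so a k-tap kernel covers a
    span of (k - 1) * rate + 1 input samples with the same k weights."""
    k = len(w)
    span = (k - 1) * rate + 1
    return [
        sum(x[i + j * rate] * w[j] for j in range(k))
        for i in range(len(x) - span + 1)
    ]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(hole_conv1d(x, [1.0, 1.0, 1.0], 1))  # span 3: [6.0, 9.0, 12.0, 15.0]
print(hole_conv1d(x, [1.0, 1.0, 1.0], 2))  # span 5, same 3 weights: [9.0, 12.0]
```

Stacking such layers with rates 1, 2 and 4, as above, widens the context each output pixel sees while the parameter count stays fixed.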
In order to further improve the training efficiency of the network model and shorten the model training time, the hole convolution network can also comprise a jump connection layer arranged between the hole convolution layer and the reconstruction layer, which uses residual learning to transmit the image features directly to a later layer in the network by adding a jump connection to the hole convolution network. The jump connection layer is F4(F3) = F3 + F1, where F3 is the output of the hole convolution layer and F1 is the image feature. Accordingly, in the embodiment of the present invention, the reconstruction layer may be represented as ISR = Wdc ⊗ F4 + b5.
It should be noted that when a conventional convolutional layer or fully connected layer transmits information, there is more or less a problem of information loss. For example, the residual network ResNet, proposed to solve the difficulty of training deep networks, adds a direct connection channel in the convolutional network, allowing the original input information to pass directly to the following network layers. This greatly accelerates the training of the neural network and improves the accuracy of the model. Specifically, the residual network alleviates the lack of image feature details to some extent by bypassing the input information directly to the output, so that the integrity of the image information is protected; the whole network only needs to learn the residual part between input and output, which simplifies the learning objective and difficulty. The residual block structure is shown in fig. 4: assuming the function originally to be learned by the network is H(x), it is decomposed into H(x) = F(x) + x. After decomposition, the original network (flowing vertically downwards in fig. 4) fits F(x), while the branch x (the curved part in fig. 4) is a jump connection. ResNet thus fits the residual function F(x) = H(x) - x (i.e., the deviation of the target value from the input value), and after training to fit F(x), H(x) is obtained from F(x) + x. If F(x) = 0, this is equivalent to introducing an identity mapping. In the image super-resolution reconstruction task, adding a jump connection in the convolutional network by using residual learning transmits the initial image features directly to a later layer in the network, assists the back propagation of the gradient, accelerates the training process, improves the training efficiency, and improves the model performance.
Residual error learning is introduced in the model training process, so that not only can the network be optimized more quickly, but also the image reconstruction effect can be improved.
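The residual decomposition H(x) = F(x) + x can be illustrated with a hypothetical scalar mapping; the point is that the network only has to fit the small residual F(x) = H(x) - x, and F(x) = 0 reduces to the identity mapping:

```python
def target_mapping(x):
    """Hypothetical mapping H(x) the network would otherwise learn
    directly; near-identity, as is typical in super-resolution."""
    return 1.05 * x + 0.1

def residual(x):
    """With residual learning the network only fits the small
    difference F(x) = H(x) - x."""
    return target_mapping(x) - x

def residual_block(x):
    """H(x) recovered as F(x) + x; F(x) == 0 gives the identity."""
    return residual(x) + x

x = 2.0
# The block reproduces the target mapping exactly, but the quantity
# actually learned (the residual) is much smaller than H(x) itself.
print(residual_block(x), residual(x))
```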
In addition, in the present application, there is no strict sequential execution order among the steps, and as long as the logical order is met, the steps may be executed simultaneously or according to a certain preset order, and fig. 1 is only a schematic way, and does not represent that only such an execution order is available.
The embodiment of the invention also provides a corresponding device for the image super-resolution reconstruction method, so that the method has higher practicability. Wherein the means can be described separately from the functional module point of view and the hardware point of view. The following describes an image super-resolution reconstruction apparatus provided by an embodiment of the present invention, and the image super-resolution reconstruction apparatus described below and the image super-resolution reconstruction method described above may be referred to correspondingly.
From the perspective of functional modules, referring to fig. 5, fig. 5 is a structural diagram of an image super-resolution reconstruction apparatus according to an embodiment of the present invention in a specific implementation. The apparatus may include a hybrid deep convolutional neural network 51 for processing an input low-resolution image to be processed and outputting a reconstructed super-resolution image, and the hybrid deep convolutional neural network 51 may include:
the upsampling module 511 is configured to perform upsampling processing on the low-resolution image to be processed to obtain a sub-low-resolution image having the same pixels as the target high-resolution image.
And a feature extraction module 512, configured to extract image features of the sub-low resolution image.
The encoding and decoding denoising module 513 is configured to perform denoising processing on the image features to obtain an image feature map with noise removed; the coding and decoding noise reduction module is obtained by connecting a plurality of convolution layers and a plurality of deconvolution layers, wherein each convolution layer gradually reduces the size of an image characteristic graph and is used for obtaining abstract contents of image characteristics, and each deconvolution layer gradually enlarges the size of the image characteristic graph and is used for compensating image characteristic detail information.
The image reconstruction module 514 is configured to map the image feature map through a pre-constructed hole convolution network to obtain a super-resolution reconstructed image; the hole convolution network is obtained by cascading a plurality of hole convolution layers and is used for reconstructing the image feature map into a super-resolution image.
Optionally, the structure and data flow of the hybrid deep convolutional neural network 51 can be as shown in fig. 6. In this embodiment of the present invention, the hole convolution network may include a hole convolution layer, a jump connection layer, and a reconstruction layer; the jump connection layer uses residual learning to transmit the image features directly to a later layer in the network by adding a jump connection to the hole convolution network.

The hole convolution layer is F3(F2) = max(0, W4 ⊗ F2 + b4);

the jump connection layer is F4(F3) = F3 + F1;

the reconstruction layer is ISR = Wdc ⊗ F4 + b5.

In the formulas, F1 is the image feature; F2 is the image feature map; F3 is the output of the hole convolution layer; W4 and b4 are the convolution weights and bias values of the hole convolution layer; ⊗ is the convolution operation; ISR is the super-resolution reconstructed image; Wdc denotes c convolution kernels of size 1 × 1 × 64, where c is the number of image channels; and b5 is the convolution bias value of the reconstruction layer.
Optionally, the codec noise reduction module 513 may include a convolutional network structure, a deconvolution network structure, and a jump connection structure:

the convolutional network structure is H1(F1) = max(0, W2 ⊗ F1 + b2), and is used for obtaining the abstract content of the image features;

the deconvolution network structure is H2(H1) = max(0, W3 Θ H1 + b3), and is used for compensating image feature detail information;

the jump connection structure is F2(H2) = H2 + F1, and uses residual learning to transmit the image features directly to a later layer in the network by adding a jump connection to the convolutional network structure.

In the formulas, F1 is the image feature; W2 and b2 are the convolution weights and bias values of the convolutional network structure; ⊗ is the convolution operation, where the convolution of the convolutional network structure adds no 0 boundary and uses a step size of 2; H1 is the output of the convolutional network structure and simultaneously the input of the deconvolution network structure; W3 and b3 are the convolution weights and bias values of the deconvolution network structure; Θ is the deconvolution operation; and F2 is the output of the jump connection structure.
As an alternative embodiment, the present application also provides an embodiment for the training process of the hybrid deep convolutional neural network 51, which may include the following:
for any given training data set {(Xi, Yi)}, the aim is to find an accurate mapping Y = F(X) and minimize the mean square error (MSE) between the obtained super-resolution reconstructed image F(X) and the real high-resolution image Y. Meanwhile, this is also beneficial to improving the peak signal-to-noise ratio (PSNR), an image quality evaluation index. Although a high PSNR value does not by itself represent the absolute quality of the reconstructed image, a superior effect can be observed when the model is evaluated in combination with the alternative evaluation index SSIM. The mean square error can be expressed as:

MSE = (1 / (H × W)) Σi Σj (P(i, j) − T(i, j))²    (6)

In the relation (6), P(i, j) and T(i, j) respectively denote the predicted image and the real image, and H and W respectively denote the height and width of the image.
MSE may be incorporated as part of the loss function of the hybrid deep convolutional neural network 51, i.e. floss = MSE + β‖θ‖², to improve the training precision of the model. The loss function can be expressed as:

floss = (1/N) Σ(i=1..N) ‖F(Xi) − Yi‖² + β‖θ‖²

where floss is the loss function, N is the total number of samples in the training sample set, i is the i-th sample in the training sample set, F(Xi) is the super-resolution reconstructed image, Yi is the real high-resolution image corresponding to it, β is the multiplication coefficient of the weight decay, and θ is the network weight parameter to be calculated. Adding the L2 weight decay regularizes the training; β may be, for example, 10⁻³. In this way, the parameters can be fine-tuned according to the resulting loss to achieve a result as close to optimal as possible.
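The loss above, per-pixel MSE plus an L2 weight decay term β‖θ‖², can be computed directly; the toy predicted/target values and weights below are assumptions for illustration:

```python
def mse(pred, target):
    """Mean squared error between predicted and real images (flattened)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def loss(pred, target, weights, beta=1e-3):
    """f_loss = MSE + beta * ||theta||^2 (L2 weight decay term)."""
    l2 = sum(w ** 2 for w in weights)
    return mse(pred, target) + beta * l2

pred   = [1.0, 2.0, 3.0, 4.0]   # hypothetical predicted pixels
target = [1.0, 2.5, 3.0, 3.0]   # hypothetical ground-truth pixels
theta  = [0.5, -0.5]            # hypothetical network weights
print(mse(pred, target))          # (0 + 0.25 + 0 + 1) / 4 = 0.3125
print(loss(pred, target, theta))  # 0.3125 + 1e-3 * 0.5 = 0.313
```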
Optionally, the weight matrix may be updated using the Adam optimization method to minimize the loss function value. That is, the network weight parameter to be calculated is updated according to a parameter update relation for minimizing the loss function value, where the parameter update relation can be expressed as:

θ_{t+1} = θ_t − η · m̂_t / (√v̂_t + ε)

wherein

m_t = β1 · m_{t−1} + (1 − β1) · g_t,  v_t = β2 · v_{t−1} + (1 − β2) · g_t²

m_t is the first-moment (mean) estimate of the gradient at time t, v_t is the second-moment (variance) estimate of the gradient at time t, η is the learning rate, and β1, β2 and ε are known constants. Optionally, β1 may be 0.9, β2 may be 0.9999, ε may be 10⁻⁸, and η may be 0.0002.
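A single Adam update step, with the moment estimates and bias correction written out, might look as follows in Python (this is the standard Adam rule with the constants quoted above; it is a sketch, not the patent's training code):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.0002,
              beta1=0.9, beta2=0.9999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and
    its square, bias-corrected, then a scaled gradient step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)       # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2 (gradient 2*theta) from theta = 1.0:
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(theta < 1.0)   # True: the parameter moves toward the minimum at 0
```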
Before training the hybrid deep convolutional neural network, the network parameters are initialized for the subsequent network training updates. Values randomly drawn from a normal distribution with a mean of zero and a standard deviation of 0.001 are used to initialize the convolution kernel weights W of each layer, and the biases b are all set to 0. The initial learning rate is set to 0.0002 and is then halved every 20 rounds of training.
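The stated learning-rate schedule (initial rate 0.0002, halved every 20 training rounds) can be expressed as a one-line function:

```python
def learning_rate(epoch, initial=0.0002, drop_every=20):
    """Step-decay schedule: the initial rate is halved every
    `drop_every` training rounds."""
    return initial * (0.5 ** (epoch // drop_every))

print([learning_rate(e) for e in (0, 20, 40)])
# [0.0002, 0.0001, 5e-05]
```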
The functions of the functional modules of the image super-resolution reconstruction apparatus according to the embodiment of the present invention can be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process thereof can refer to the related description of the foregoing method embodiment, and will not be described herein again.
Therefore, the four modules with different functions, all driven in a convolutional-network-like form, are combined to form a hybrid deep convolutional neural network with denoising and receptive-field-enlarging capabilities, which is used for the image super-resolution reconstruction task. High-quality image output, strong practicability, and efficient, fast model training for image super-resolution reconstruction are thereby realized.
The image super-resolution reconstruction device mentioned above is described from the perspective of functional modules, and further, the present application also provides an image super-resolution reconstruction device described from the perspective of hardware. Fig. 7 is a structural diagram of another image super-resolution reconstruction apparatus according to an embodiment of the present application. As shown in fig. 7, the apparatus comprises a memory 70 for storing a computer program;
a processor 71, configured to execute a computer program to implement the steps of the image super-resolution reconstruction method as mentioned in the above embodiments.
The processor 71 may include a main processor and a coprocessor. The main processor is a processor for processing data in the wake-up state, also called a CPU (Central Processing Unit); the coprocessor is a low-power-consumption processor for processing data in the standby state. In some embodiments, the processor 71 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen, and the processor 71 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 70 may comprise one or more computer-readable storage media, which may be non-transitory. The memory 70 may further comprise a high-speed random access memory and a non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In the present embodiment, the memory 70 is at least used for storing a computer program 201 which, when loaded and executed by the processor 71, is capable of implementing the relevant steps of the image super-resolution reconstruction method disclosed in any of the foregoing embodiments.
In some embodiments, the image super-resolution reconstruction apparatus may further include a display screen 72, an input/output interface 73, a communication interface 74, a power supply 75, and a communication bus 76.
Those skilled in the art will appreciate that the configuration shown in fig. 7 does not constitute a limitation of the image super-resolution reconstruction apparatus and may include more or less components than those shown, such as the sensor 77.
The functions of the functional modules of the image super-resolution reconstruction apparatus according to the embodiment of the present invention can be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process thereof can refer to the related description of the foregoing method embodiment, and will not be described herein again.
Therefore, the embodiment of the invention realizes the super-resolution reconstruction of the image with high quality image output, strong practicability and high efficiency and high speed of model training.
It is to be understood that, if the image super-resolution reconstruction method in the above embodiments is implemented in the form of a software functional unit and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application may be substantially or partially implemented in the form of a software product, which is stored in a storage medium and executes all or part of the steps of the methods of the embodiments of the present application, or all or part of the technical solutions. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrically erasable programmable ROM, a register, a hard disk, a removable magnetic disk, a CD-ROM, a magnetic or optical disk, and other various media capable of storing program codes.
Based on this, the embodiment of the present invention further provides a computer-readable storage medium, which stores an image super-resolution reconstruction program, wherein the image super-resolution reconstruction program is executed by a processor, and the image super-resolution reconstruction program comprises the steps of the image super-resolution reconstruction method according to any one of the above embodiments.
The functions of the functional modules of the computer-readable storage medium according to the embodiment of the present invention may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Therefore, the embodiment of the invention realizes the super-resolution reconstruction of the image with high quality image output, strong practicability and high efficiency and high speed of model training.
Finally, in order to verify that the technical scheme of the application can effectively realize the reconstruction of the super-resolution image, the application also carries out a verification test, and the verification test can comprise the following contents:
the hardware environment for the validation experiments of the present application includes: an Intel Core i5-7400 processor; an Nvidia GeForce GTX 1080 GPU; the operating system software is Ubuntu 16.04. The model uses the TensorFlow framework. The network parameters were trained for 100 rounds in the experiment (1000 iterations per round, with an iteration batch size of 32), and the learning process was stopped after 100 rounds.
The training data set may use the public natural image data sets BSD200 (200 images) and T91 (91 images), which are suitable for most experimental conditions. SRCNN/VDSR have shown that depth models generally benefit from training on large amounts of data. In the experiment, the 291 training images are augmented by 90-degree rotation, mirroring, inversion and the like, yielding an enhanced training data set containing 1164 images. Meanwhile, 64 × 64 sub-images (with a cropping stride of 16) are cropped from the enhanced training data set and input into the network, which facilitates the learning of network parameters. Because it is fully convolutional, the network model can be applied to images of arbitrary size, and the cropping makes optimizing the network model parameters more convenient.
The test data set may use the data sets Set5 (5 images), Set14 (14 images), BSD100 (100 images), and the urban image data set Urban100 provided by Huang et al., which are commonly used for benchmarking. Three scale factors are evaluated on the test data sets: ×2, ×3, and ×4.
In the application, the convolution structure in the coding and decoding noise reduction module is used to extract the image features and retain the main features; the deconvolution upsamples the features to restore image details, thereby filtering the image feature noise and achieving the purpose of image feature noise reduction. Fig. 8 shows a structural comparison of the codec with and without the deconvolution network. When there is no deconvolution layer, the network model is a standard convolutional network. By comparison, the network containing the deconvolution coding and decoding denoising part achieves higher precision (taking PSNR as the evaluation index). The hole convolution does not change the operation mode or the parameters of the convolution operation itself; by modifying the structure of the convolution kernel, the filter parameters are used in a different way. The hole convolution applies the same convolution kernel parameters over different ranges with different expansion scale factors, so as to acquire more image context information. Notably, the feature maps of the first layer contain different structures (e.g. edges in different directions), while the feature maps of the second layer differ mainly in strength. A comparative example of the network with and without hole convolution is shown in fig. 9. The use of hole convolution facilitates performance improvement by expanding the convolution receptive field. Finally, the performance comparison curves for using both deconvolution and hole convolution versus using neither are shown in fig. 10, and it was found that the performance improvement is significant when both are used simultaneously.
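Since PSNR is used as the evaluation index throughout, its usual definition from the mean squared error (assuming 8-bit images with a peak value of 255) can be sketched as:

```python
import math

def psnr(mse_value, max_val=255.0):
    """Peak signal-to-noise ratio in dB computed from a mean squared
    error; identical images (MSE = 0) give infinite PSNR."""
    if mse_value == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse_value)

print(round(psnr(100.0), 2))   # an MSE of 100 corresponds to ~28.13 dB
```

Lower MSE maps to higher PSNR, which is why minimizing the MSE-based loss during training also improves this evaluation index.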
The input image carries a lot of detail information in the network computation, but if there are many convolution recursions between the input and the output, gradient problems occur, such as gradient vanishing/explosion, etc. The learning of the mapping relationship between them is very difficult and the learning efficiency is very low. At this time, this problem can be quickly solved by setting residual learning. The present application adds two hopping connections to the network as shown in figure 6. One is local connection of a characteristic denoising part, assists the back propagation of the gradient, and accelerates the training process. The other is long-distance jump connection, so that the training process is accelerated, and the result performance is improved. The results of the specific comparative analyses are shown in Table 1.
Table 1 shows the qualitative and quantitative super-resolution analysis results of the full convolution network on the Urban100 data set for scale factors ×2 and ×4; the evaluation index is PSNR/SSIM.
TABLE 1 residual learning test (learning rate: 0.0002)
In conclusion, the effect of the mapping learned by residual learning is better than that of direct learning, and the learning speed is higher.
The technical scheme of the application is compared with other existing image super-resolution reconstruction methods on a plurality of public data sets to evaluate the characteristics and performance of the image reconstruction results. Experimental results show that the method has a superior effect compared with other methods in terms of both performance indexes and visual experience. Referring to table 2: Bicubic in table 2 is a pixel interpolation method; A+ is an adjusted anchored neighborhood regression method; SelfEx performs super-resolution reconstruction of the picture by exploiting transformed self-similarity; SRCNN is a simple end-to-end fully convolutional image super-resolution method; VDSR is a deep fully convolutional image super-resolution network with residual learning.
TABLE 2 comparison of SR Process Performance
Table 2 shows the results of the various SR methods on multiple public data sets. By comparison, it can be found that on these data sets the present application is obviously superior to traditional SR methods such as A+ and SelfEx. Compared with the other network-based SR methods, the method of the present application improves both PSNR and SSIM.
In addition, for the super-resolution reconstruction result with a scale factor of 2 on image No. 43 of the BSD100 data set, the recovered detail at the top of the ship is observed to be sharper and clearer; for the super-resolution reconstruction result with a scale factor of 4 on image No. 82 of the Urban100 data set, clearer and more continuous recovery of the line parts is observed. By comparison, the reconstructed pictures obtained by the present application are obviously superior to those of other advanced SR methods in both objective data and subjective visual comparison, so the method has obvious superiority over other SR methods.
In view of the above, in the image super-resolution reconstruction method based on the hybrid depth convolution network according to the embodiment of the present invention, the low-resolution image is scaled to a specified size in the up-sampling stage; then extracting the initial features of the low-resolution images in a feature extraction stage; then, the extracted initial features are sent to a convolution coding and decoding structure to carry out image feature denoising; and finally, performing high-dimensional feature extraction and operation on the reconstruction layer by using cavity convolution, and finally reconstructing a high-resolution image. And the residual error learning is used for quickly optimizing the network, so that the definition and the visual effect of the reconstructed image are better while the noise is reduced.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments can be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed therein, its description is kept brief; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The above description details an image super-resolution reconstruction method, an image super-resolution reconstruction device and a computer-readable storage medium provided by the present application. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present application.

Claims (10)

1. An image super-resolution reconstruction method, characterized by comprising the following steps:
up-sampling a low-resolution image to be processed to obtain an intermediate low-resolution image with the same pixel dimensions as the target high-resolution image;
extracting image features of the intermediate low-resolution image, and inputting the image features into a pre-constructed encoding-decoding noise reduction module to obtain a denoised image feature map;
mapping the image feature map through a pre-constructed dilated convolution network to obtain a super-resolution reconstructed image;
wherein the encoding-decoding noise reduction module is formed by connecting a plurality of convolution layers and a plurality of deconvolution layers, each convolution layer gradually reducing the size of the image feature map so as to obtain abstract content of the image features, and each deconvolution layer gradually enlarging the size of the image feature map so as to compensate for detail information of the image features; and the dilated convolution network is formed by cascading a plurality of dilated convolution layers and is used for reconstructing the image feature map into the super-resolution image.
2. The image super-resolution reconstruction method according to claim 1, wherein the dilated convolution network comprises a dilated convolution layer and a reconstruction layer;
the dilated convolution layer is F3(F2) = max(0, W4 ⊗ F2 + b4);
the reconstruction layer is I_SR = W5 ⊗ F3 + b5;
where F2 is the image feature map, F3 is the output of the dilated convolution layer, W4 and b4 are the convolution weights and bias of the dilated convolution layer, ⊗ denotes the convolution operation, I_SR is the super-resolution reconstructed image, W5 consists of c convolution kernels of size 1 × 1 × 64, c is the number of image channels, and b5 is the convolution bias of the reconstruction layer.
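The reconstruction layer described in claim 2 applies c kernels of size 1 × 1 across the 64 feature channels, i.e. a per-pixel linear map from 64 channels down to c image channels. A NumPy sketch under assumed shapes (8 × 8 feature maps, random weights; all names are hypothetical illustrations, not the patented parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes for illustration: a 64-channel feature map F3 of spatial
# size 8 x 8, i.e. the (post-ReLU) output of the dilated convolution layer.
F3 = np.maximum(rng.standard_normal((64, 8, 8)), 0.0)

c = 3                                # number of image channels (RGB assumed)
W5 = rng.standard_normal((c, 64))    # c kernels of size 1 x 1 x 64
b5 = np.zeros(c)                     # reconstruction-layer bias

# A 1x1 convolution is a per-pixel linear map across the channel dimension:
I_SR = np.einsum('oc,chw->ohw', W5, F3) + b5[:, None, None]
```

Because the kernels are 1 × 1, each output pixel depends only on the 64 channel values at the same spatial location.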
3. The image super-resolution reconstruction method according to claim 2, wherein the dilated convolution network further comprises a skip connection layer; the skip connection layer uses residual learning to pass the image features directly to a later layer of the network by adding a skip connection within the dilated convolution network;
the skip connection layer is F4(F3) = F3 + F1, where F3 is the output of the dilated convolution layer and F1 is the image features; correspondingly, the reconstruction layer becomes I_SR = W5 ⊗ F4(F3) + b5.
4. The image super-resolution reconstruction method according to claim 3, wherein extracting the image features of the intermediate low-resolution image comprises:
extracting the image features of the intermediate low-resolution image by using a feature extraction sub-network, the feature extraction sub-network being F1(X) = max(0, W1 ⊗ X + b1);
where F1(X) is the image features, X is the intermediate low-resolution image, W1 and b1 are the convolution weights and bias of the feature extraction sub-network, and ⊗ denotes the convolution operation; the convolution of the feature extraction sub-network adds a 0 boundary and uses a stride of 1, so that the input and output images of the feature extraction sub-network have the same size.
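The zero-padded, stride-1 convolution of claim 4 keeps the output the same size as the input. A small NumPy sketch of such a "same" convolution (odd square kernel assumed; the ReLU matches the max(0, ·) form in the claim):

```python
import numpy as np

def same_conv2d(x, w, b=0.0):
    """Stride-1 convolution with a zero boundary (padding) chosen so the
    output has the same size as the input; odd square kernel assumed."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, p)                    # add the 0 boundary
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w) + b
    return np.maximum(out, 0.0)          # ReLU as in max(0, W1 * X + b1)
```

For a k × k kernel the required padding is (k − 1)/2 on each side; interior pixels see the full kernel support while border pixels see zeros outside the image.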
5. The image super-resolution reconstruction method according to claim 4, wherein up-sampling the low-resolution image to be processed to obtain an intermediate low-resolution image with the same pixel dimensions as the target high-resolution image comprises:
up-sampling the low-resolution image to be processed to the size of the target high-resolution image by using a deconvolution network structure, so that the up-sampled image and the target high-resolution image have the same pixel dimensions.
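The deconvolution (transposed convolution) of claim 5 can be pictured as each low-resolution pixel stamping a kernel onto a stride-spaced high-resolution grid. A minimal single-channel NumPy sketch (function name and shapes are illustrative, not from the patent):

```python
import numpy as np

def deconv2d_upsample(x, w, stride=2):
    """Transposed convolution: every input pixel stamps the kernel onto a
    stride-spaced output grid, so an H x W input grows to
    ((H-1)*stride + k) x ((W-1)*stride + k)."""
    k = w.shape[0]
    H, W = x.shape
    out = np.zeros(((H - 1) * stride + k, (W - 1) * stride + k))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + k,
                j * stride:j * stride + k] += x[i, j] * w
    return out
```

With stride 2 and a 2 × 2 kernel, the stamps tile the output exactly and a 4 × 4 input becomes 8 × 8; in the network, w would be learned rather than fixed.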
6. The image super-resolution reconstruction method according to any one of claims 1 to 5, wherein the encoding-decoding noise reduction module comprises a convolution network structure, a deconvolution network structure and a skip connection structure;
the convolution network structure is H1(F1) = max(0, W2 ⊗ F1 + b2), and is used for obtaining the abstract content of the image features;
the deconvolution network structure is H2(H1) = max(0, W3 Θ H1 + b3), and is used for compensating for the detail information of the image features;
the skip connection structure is F2(H2) = H2 + F1, and uses residual learning to pass the image features directly to a later layer of the network by adding a skip connection to the convolution network structure;
where F1 is the image features, W2 and b2 are the convolution weights and bias of the convolution network structure, ⊗ denotes the convolution operation, and the convolution of the convolution network structure adds no 0 boundary and uses a stride of 2; H1 is the output of the convolution network structure and the input of the deconvolution network structure, W3 and b3 are the convolution weights and bias of the deconvolution network structure, and Θ denotes the deconvolution operation; F2 is the output of the skip connection structure.
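The size bookkeeping behind claim 6 — an unpadded stride-2 convolution shrinking the feature map and the matching deconvolution restoring it — can be checked with the standard output-size formulas (the kernel size 2 below is an assumed value; the patent does not fix the kernel sizes):

```python
def conv_out(n, k, stride, pad=0):
    """Output length of a convolution: floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1

def deconv_out(n, k, stride, pad=0):
    """Output length of the matching transposed convolution."""
    return (n - 1) * stride + k - 2 * pad

n = 64                                    # assumed input feature-map size
h = conv_out(n, k=2, stride=2)            # encoder: no 0 boundary, stride 2
assert h == 32                            # feature map is halved
assert deconv_out(h, k=2, stride=2) == n  # decoder restores the original size
```

Matching encoder and decoder sizes is what makes the element-wise skip addition F2 = H2 + F1 well defined.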
7. An image super-resolution reconstruction device, characterized by comprising a hybrid deep convolutional neural network for processing an input low-resolution image to be processed and outputting a reconstructed super-resolution image, the hybrid deep convolutional neural network comprising:
an up-sampling module for up-sampling the low-resolution image to be processed to obtain an intermediate low-resolution image with the same pixel dimensions as the target high-resolution image;
a feature extraction module for extracting image features of the intermediate low-resolution image;
an encoding-decoding noise reduction module for denoising the image features to obtain a denoised image feature map, the encoding-decoding noise reduction module being formed by connecting a plurality of convolution layers and a plurality of deconvolution layers, each convolution layer gradually reducing the size of the image feature map so as to obtain abstract content of the image features, and each deconvolution layer gradually enlarging the size of the image feature map so as to compensate for detail information of the image features; and
an image reconstruction module for mapping the image feature map through a pre-constructed dilated convolution network to obtain the super-resolution reconstructed image, the dilated convolution network being formed by cascading a plurality of dilated convolution layers and being used for reconstructing the image feature map into the super-resolution image.
8. The image super-resolution reconstruction device according to claim 7, wherein the loss function of the hybrid deep convolutional neural network is:
f_loss = (1/N) Σ_{i=1}^{N} ‖F(X_i) − Y_i‖² + β‖θ‖²;
where f_loss is the loss function, N is the total number of samples in the training sample set, i indexes the i-th sample in the training sample set, F(X) is the super-resolution reconstructed image, Y is the true high-resolution image corresponding to the super-resolution reconstructed image, β is the multiplication coefficient of the weight decay, and θ is the network weight parameter to be calculated;
the network weight parameter to be calculated is updated according to a parameter update relation so as to minimize the loss function value, the parameter update relation being:
θ_{t+1} = θ_t − η · m̂_t / (√v̂_t + ε), with m̂_t = m_t / (1 − β1^t) and v̂_t = v_t / (1 − β2^t);
where m_t is the first-moment (mean) estimate of the gradient at time step t, v_t is the second-moment (uncentered variance) estimate of the gradient at time step t, η is the learning rate, β1 and β2 are known constants, and ε is a small constant preventing division by zero.
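The loss in claim 8 combines a mean-squared data term with β-weighted weight decay, and the update rule built from the first-moment estimate m_t, the second-moment estimate v_t, and the constants β1, β2 matches the Adam scheme. A toy NumPy sketch on a one-parameter stand-in for the network (F(X) = θ·X; all values are illustrative assumptions):

```python
import numpy as np

def loss_and_grad(theta, X, Y, beta):
    """Mean-squared data term plus beta-weighted L2 decay on theta.
    theta is a single scalar standing in for the network, F(X) = theta * X."""
    r = theta * X - Y
    loss = np.mean(r ** 2) + beta * theta ** 2
    grad = 2.0 * np.mean(r * X) + 2.0 * beta * theta
    return loss, grad

# Adam-style update: m, v are first/second moment estimates of the gradient.
beta1, beta2, eta, eps = 0.9, 0.999, 0.1, 1e-8
theta, m, v = 0.0, 0.0, 0.0
X = np.array([1.0, 2.0, 3.0])
Y = 2.0 * X                                  # the true weight is 2
first_loss, _ = loss_and_grad(theta, X, Y, beta=1e-4)
for t in range(1, 201):
    _, g = loss_and_grad(theta, X, Y, beta=1e-4)
    m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta -= eta * m_hat / (np.sqrt(v_hat) + eps)
final_loss, _ = loss_and_grad(theta, X, Y, beta=1e-4)
```

The bias-corrected ratio m̂_t/√v̂_t keeps early steps well scaled even though m and v start at zero; the weight-decay term β‖θ‖² keeps the parameter magnitude in check.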
9. An image super-resolution reconstruction apparatus, characterized by comprising a processor which, when executing a computer program stored in a memory, implements the steps of the image super-resolution reconstruction method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an image super-resolution reconstruction program which, when executed by a processor, implements the steps of the image super-resolution reconstruction method according to any one of claims 1 to 6.
CN202010201996.8A 2020-03-20 2020-03-20 Image super-resolution reconstruction method and device and computer-readable storage medium Withdrawn CN111429347A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010201996.8A CN111429347A (en) 2020-03-20 2020-03-20 Image super-resolution reconstruction method and device and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN111429347A true CN111429347A (en) 2020-07-17

Family

ID=71548413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010201996.8A Withdrawn CN111429347A (en) 2020-03-20 2020-03-20 Image super-resolution reconstruction method and device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111429347A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389556A (en) * 2018-09-21 2019-02-26 五邑大学 The multiple dimensioned empty convolutional neural networks ultra-resolution ratio reconstructing method of one kind and device
CN109741383A (en) * 2018-12-26 2019-05-10 西安电子科技大学 Picture depth estimating system and method based on empty convolution sum semi-supervised learning
WO2019174299A1 (en) * 2018-03-15 2019-09-19 Boe Technology Group Co., Ltd. Image processing method, image processing apparatus, and computer-program product


Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932463A (en) * 2020-08-26 2020-11-13 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN111932463B (en) * 2020-08-26 2023-05-30 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN112419152A (en) * 2020-11-23 2021-02-26 中国科学院深圳先进技术研究院 Image super-resolution method and device, terminal equipment and storage medium
CN112419152B (en) * 2020-11-23 2024-03-29 中国科学院深圳先进技术研究院 Image super-resolution method, device, terminal equipment and storage medium
CN112784897A (en) * 2021-01-20 2021-05-11 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium
CN112784897B (en) * 2021-01-20 2024-03-26 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium
CN112837221A (en) * 2021-01-26 2021-05-25 合肥工业大学 SAR image super-resolution reconstruction method based on dual discrimination
CN112837221B (en) * 2021-01-26 2022-08-19 合肥工业大学 SAR image super-resolution reconstruction method based on dual discrimination
CN112991203B (en) * 2021-03-08 2024-05-07 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and storage medium
CN112991203A (en) * 2021-03-08 2021-06-18 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113096019A (en) * 2021-04-28 2021-07-09 中国第一汽车股份有限公司 Image reconstruction method, image reconstruction device, image processing equipment and storage medium
CN113096019B (en) * 2021-04-28 2023-04-18 中国第一汽车股份有限公司 Image reconstruction method, image reconstruction device, image processing equipment and storage medium
CN113012153A (en) * 2021-04-30 2021-06-22 武汉纺织大学 Aluminum profile flaw detection method
CN113674153A (en) * 2021-08-10 2021-11-19 Oppo广东移动通信有限公司 Image processing chip, electronic device, image processing method, and storage medium
CN113421262A (en) * 2021-08-23 2021-09-21 深圳市信润富联数字科技有限公司 Hub defect detection method and device, electronic equipment and storage medium
CN113687227A (en) * 2021-08-24 2021-11-23 桂林电子科技大学 Motor magnetic shoe defect classification method based on region-of-interest enhancement
CN114022348A (en) * 2021-09-30 2022-02-08 浪潮(北京)电子信息产业有限公司 Super-resolution method, system and related device for image
CN114022357A (en) * 2021-10-29 2022-02-08 北京百度网讯科技有限公司 Image reconstruction method, training method, device and equipment of image reconstruction model
CN114463453A (en) * 2021-12-14 2022-05-10 浙江大华技术股份有限公司 Image reconstruction method, image coding method, image decoding method, image coding device, image decoding device, and image decoding device
CN114429061A (en) * 2021-12-21 2022-05-03 哈尔滨理工大学 ECT two-dimensional image reconstruction method based on void convolution neural network
CN114663280A (en) * 2022-03-15 2022-06-24 北京万里红科技有限公司 Super-resolution reconstruction model of remote iris image, training method, reconstruction method, device and medium
CN114663280B (en) * 2022-03-15 2024-07-16 北京万里红科技有限公司 Super-resolution reconstruction model of long-distance iris image, training method, reconstruction method, device and medium
CN114742915A (en) * 2022-04-20 2022-07-12 南京医科大学 Magnetic resonance reconstruction method of super-resolution convolutional neural network based on cavity convolution
CN114742915B (en) * 2022-04-20 2024-09-24 南京医科大学 Magnetic resonance reconstruction method of super-resolution convolutional neural network based on cavity convolution
CN115115518A (en) * 2022-07-01 2022-09-27 腾讯科技(深圳)有限公司 Method, apparatus, device, medium and product for generating high dynamic range image
CN115115518B (en) * 2022-07-01 2024-04-09 腾讯科技(深圳)有限公司 Method, device, equipment, medium and product for generating high dynamic range image
CN115100420A (en) * 2022-07-22 2022-09-23 南京理工大学 Method for extracting appearance characteristics of small visual target
CN115293985A (en) * 2022-08-11 2022-11-04 北京拙河科技有限公司 Super-resolution noise reduction method and device for image optimization
CN116843555B (en) * 2023-09-04 2023-12-19 镕铭微电子(济南)有限公司 Image interpolation method, device, electronic equipment and storage medium
CN116843555A (en) * 2023-09-04 2023-10-03 镕铭微电子(济南)有限公司 Image interpolation method, device, electronic equipment and storage medium
CN117553777A (en) * 2024-01-12 2024-02-13 航天宏图信息技术股份有限公司 Intelligent sparse and dense curve discrete folding line method, device and equipment for high-precision navigation chart
CN117553777B (en) * 2024-01-12 2024-04-05 航天宏图信息技术股份有限公司 Intelligent sparse and dense curve discrete folding line method, device and equipment for high-precision navigation chart

Similar Documents

Publication Publication Date Title
CN111429347A (en) Image super-resolution reconstruction method and device and computer-readable storage medium
CN113240580B (en) Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN107507134B (en) Super-resolution method based on convolutional neural network
CN110163801B (en) Image super-resolution and coloring method, system and electronic equipment
CN109272452B (en) Method for learning super-resolution network based on group structure sub-band in wavelet domain
CN110610526B (en) Method for segmenting monocular image and rendering depth of field based on WNET
CN103136728B (en) Based on the image super-resolution method of dictionary learning and non local total variance
CN114331831B (en) Light single image super-resolution reconstruction method
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
KR20200132682A (en) Image optimization method, apparatus, device and storage medium
CN112489164A (en) Image coloring method based on improved depth separable convolutional neural network
CN115393191A (en) Method, device and equipment for reconstructing super-resolution of lightweight remote sensing image
CN117197627A (en) Multi-mode image fusion method based on high-order degradation model
CN116797456A (en) Image super-resolution reconstruction method, system, device and storage medium
CN115526777A (en) Blind over-separation network establishing method, blind over-separation method and storage medium
Liu et al. Facial image inpainting using multi-level generative network
CN116957964A (en) Small sample image generation method and system based on diffusion model
CN116152061A (en) Super-resolution reconstruction method based on fuzzy core estimation
CN109272450B (en) Image super-resolution method based on convolutional neural network
Zuo et al. Research on image super-resolution algorithm based on mixed deep convolutional networks
CN113160081A (en) Depth face image restoration method based on perception deblurring
CN113240581A (en) Real world image super-resolution method for unknown fuzzy kernel
CN115511733A (en) Image degradation modeling method, neural network training method and device
Peng Super-resolution reconstruction using multiconnection deep residual network combined an improved loss function for single-frame image
CN117011130A (en) Method, apparatus, device, medium and program product for image super resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200717