CN113191947A - Method and system for image super-resolution

Method and system for image super-resolution

Info

Publication number
CN113191947A
CN113191947A
Authority
CN
China
Prior art keywords
image
local
network
information
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110310875.1A
Other languages
Chinese (zh)
Other versions
CN113191947B (en)
Inventor
赵楠
肖明宇
陈南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110310875.1A
Publication of CN113191947A
Application granted
Publication of CN113191947B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053: Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4046: Scaling using neural networks
    • G06T 5/70: Denoising; Smoothing
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/02: Neural networks
    • G06N 3/045: Architecture, e.g. interconnection topology; Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 2207/10004: Image acquisition modality; Still image; Photographic image
    • G06T 2207/20081: Special algorithmic details; Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]


Abstract

The invention belongs to the technical field of image super-resolution and discloses a method and a system for image super-resolution. The method combines the non-local self-similarity of images with a 3D convolutional neural network (3DCNN) to process the image SR task, providing a 3DCNN-based non-local super-resolution method: non-local similarity is modeled directly with the 3DCNN to extract the non-local similarity information of natural images; a 3DCNN basic model is constructed on an 8-layer fully convolutional network; and the 3D network design within the 3DCNN is further refined with an improved model based on the RNN, of which the basic model becomes a special case. The proposed non-local operation effectively captures non-local similar information in the image and improves SR reconstruction performance; compared with existing CNN models, the method shows clear reconstruction advantages and stands out in image scenes rich in structural information.

Description

Method and system for image super-resolution
Technical Field
The invention belongs to the technical field of image super-resolution, and particularly relates to a method and a system for image super-resolution.
Background
Image super-resolution is an ill-posed problem in computer vision. Learning-based super-resolution has become the mainstream scheme over the past decade owing to its fast computation and excellent performance. Unlike interpolation- and reconstruction-based methods, data-driven super-resolution supports richer application scenarios and yields higher reconstructed image quality. Such methods learn the mapping relationship implied by LR and HR image pairs and then perform super-resolution reconstruction through this relationship. According to the learned object and the learning approach, learning-based super-resolution reconstruction can be divided into: methods based on manifold learning, learning methods based on over-complete dictionaries, learning methods based on K-nearest neighbors, methods based on example learning, and methods based on deep learning.
With the development of deep learning technology across the fields of computing, methods based on deep learning are maturing. Researchers have focused increasing attention on deep learning networks; in particular, the proposal and refinement of the convolutional neural network (CNN) has greatly improved the performance of deep learning in fields such as image classification, semantic segmentation, target detection, and image restoration. The CNN uses convolution kernels in place of the receptive fields of human vision, which reduces the amount of computation, effectively retains the features of the image, and processes images more efficiently. Its two most prominent ideas are the local receptive field and weight sharing, and through their combination the CNN gains advantages such as shift and scale invariance when extracting features. In the field of super-resolution reconstruction, CNN models exhibit excellent feature learning ability through network structures with these characteristics, all of which are necessary conditions for generating high-quality SR images.
In 2014, Dong et al. first proposed using CNNs for image super-resolution with the SRCNN algorithm, introducing the CNN into the SR field. The algorithm upsamples the LR image, feeds it into a network of depth 3 that combines feature extraction, nonlinear mapping, and image reconstruction, and learns the LR-HR mapping end to end, starting the wave of SR research. Dong et al. then proposed FSRCNN. Because SRCNN first interpolates the LR image and then super-resolves it, the network's computational load grows markedly; FSRCNN therefore trains directly on the LR image and reconstructs the image with a deconvolution layer at the end of the network. The network can thus take the small image as input without interpolation, and since the deconvolution layer sits at the tail of the network, the computational load drops sharply. FSRCNN also introduces 1×1 convolutions for dimension reduction and expansion and decomposes the 5×5 convolution kernel into two 3×3 kernels, further reducing computation; later networks widely adopt these techniques. ESPCN, which appeared afterwards, addressed two problems: upsampling the network input by interpolation increases the computation of the network, and deconvolution generates a large amount of computational redundancy, so noise is amplified along with the image and the quality of the reconstructed image suffers.
The main component of ESPCN is the sub-pixel convolution layer. The original low-resolution image is input to the network and passes through three convolutional layers, producing a feature map with r² channels (r is the image magnification factor) of the same size as the input image. The r² channels of each pixel of this feature map are then rearranged into an r × r region, corresponding to an r × r sub-block of the high-resolution image, so that the feature map of size r² × H × W is rearranged into a high-resolution image of size 1 × rH × rW. Although this transformation is called sub-pixel convolution, no convolution operation is actually performed; the interpolation needed to magnify the image from low to high resolution is implicitly contained in the preceding convolutional layers and learned automatically. Since the image size changes only in the last layer and all earlier convolutions operate on the low-resolution image, the efficiency is higher. The sub-pixel convolution layer is widely applied in later, more advanced SR models. VDSR, which appeared next, argued for modeling SR in HR space: an HR picture can be decomposed into high-frequency and low-frequency information, the input and output pictures share the same low-frequency information, whereas SRCNN must carry the input all the way to the end of the network. The concept is similar to auto-encoding, where modeling the residual shortens training time and accelerates convergence. VDSR introduces a residual network and keeps the feature map size unchanged by zero padding in each layer, greatly improving CNN performance. Kim et al. brought the cyclic learning idea of the recurrent neural network (RNN) into CNN design and proposed the DRCN model, further improving SR reconstruction performance through parameter sharing. An interpolated image is input into DRCN, which is divided into three modules: an embedding network, equivalent to feature extraction; an inference network, equivalent to the nonlinear mapping of features; and a reconstruction network, which restores the final reconstruction result from the feature maps. The inference network is recursive, i.e. data passes through the layer cyclically several times; unrolling this loop is equivalent to a cascade of convolutional layers sharing the same set of parameters. The authors of DRRN were subsequently inspired by ResNet, VDSR, and DRCN, using a deeper network structure to gain performance. Every residual unit in DRRN shares one identical input, namely the output of the first convolutional layer in the recursive block; each residual unit contains two convolutional layers, and within a recursive block the parameters of correspondingly positioned convolutional layers in each residual unit are shared.
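For readers unfamiliar with the rearrangement, the following is a minimal TensorFlow sketch of an ESPCN-style sub-pixel upsampling stage; it is an illustration under assumptions (the layer widths and activations are typical choices, not values taken from this patent):

```python
# Sketch of ESPCN-style sub-pixel upsampling; sizes are illustrative.
import tensorflow as tf

r = 3  # magnification factor

def espcn(x):
    # Convolutions in low-resolution space, as ESPCN does.
    h = tf.keras.layers.Conv2D(64, 5, padding="same", activation="tanh")(x)
    h = tf.keras.layers.Conv2D(32, 3, padding="same", activation="tanh")(h)
    # The last layer emits r^2 channels per pixel ...
    h = tf.keras.layers.Conv2D(r * r, 3, padding="same")(h)
    # ... which depth_to_space rearranges into an r x r spatial block,
    # turning an (H, W, r^2) map into an (rH, rW, 1) image.
    return tf.nn.depth_to_space(h, block_size=r)

lr = tf.keras.Input(shape=(None, None, 1))
model = tf.keras.Model(lr, espcn(lr))
```

The "convolution" in sub-pixel convolution is thus only a memory rearrangement; all learnable filtering happens in the low-resolution layers before it.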
The later LapSRN combines the traditional Laplacian pyramid algorithm to design a cascaded pyramid deep learning model, which mainly addresses the following three issues. First, some approaches use a predefined upsampling operation (e.g., bicubic) to bring the input image to the target spatial size before it enters the network, which adds extra computational overhead and produces visible reconstruction artifacts, while methods that replace the predefined upsampling with sub-pixel convolution or deconvolution layers have relatively simple network structures with poor performance and cannot learn the complex mapping from low-resolution to high-resolution images. Second, training the network with an L2-type loss function inevitably produces blurred predictions, and the recovered high-resolution picture is often too smooth. Third, when reconstructing high-resolution images with only one upsampling operation, it is difficult to reach a large upsampling factor (more than 8×), and different applications require training models with different upsampling factors.
Most SR algorithms based on the CNN (convolutional neural network) do not fully consider the non-local similarity of images, a property proven in traditional non-local methods to effectively improve the reconstruction performance of images. Meanwhile, only a few studies currently explore how to combine the non-local self-similarity of images with deep learning to develop SR algorithms with greater potential.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) Upsampling the network input by interpolation increases the computational load of the network, and using deconvolution generates a large amount of computational redundancy, so that noise is amplified along with the image and the quality of the reconstructed image suffers.
(2) The pyramid deep learning model requires a predefined upsampling operation to obtain the target spatial size, which adds extra computational overhead and also causes visible reconstruction artifacts.
(3) Some methods use sub-pixel convolution or deconvolution layers to replace the predefined upsampling operation; their network structures are relatively simple and perform poorly, and they cannot learn the complex mapping from low-resolution to high-resolution images.
(4) Training the network with an L2-type loss function inevitably produces blurred predictions, and the recovered high-resolution picture is often too smooth.
(5) When reconstructing high-resolution images with only one upsampling operation, it is difficult to reach large upsampling factors, and different applications require training models with different upsampling factors.
(6) Most CNN-based SR algorithms do not fully consider the non-local similarity of the image, a property proven in traditional non-local methods to effectively improve the reconstruction performance of images.
The difficulty in solving the above problems and defects is:
By introducing PCA (principal component analysis) dimension reduction, image denoising and image super-resolution are combined: non-local similar blocks of the image are extracted with a block-matching algorithm, an L1-type loss function is used during training, and end-to-end image reconstruction is realized with the 3DCNN. In this pipeline, the original information must be retained while the PCA dimension reduction is carried out, so it is critical to decide which data undergoes dimension reduction and which does not; a KNN algorithm is used in block matching to keep the complexity as low as possible; and finally a reasonable network structure must be designed for the 3DCNN.
The significance of solving the problems and the defects is as follows:
Experiments prove that both the basic model and the improved model of the method achieve effective performance improvements over existing methods. With the same parameter count as the basic model, the improved model achieves the best SR performance among the compared algorithms, providing a new idea in the field of image super-resolution.
Disclosure of Invention
The invention provides an image super-resolution method and system aiming at the defects of a CNN reconstruction model in capturing non-local self-similarity information of an image, in particular to a non-local image super-resolution method, system, medium, equipment and processing terminal based on block matching and a 3D convolution neural network.
The invention is realized in such a way that the method for image super-resolution comprises the following steps: processing the image SR with a 3D convolutional neural network (3DCNN) combined with the non-local self-similarity of the image, providing a 3DCNN-based non-local super-resolution method; directly modeling non-local similarity with the 3DCNN to extract the non-local similarity information of natural images; constructing a 3DCNN basic model based on an 8-layer fully convolutional network; and further designing the 3D network within the 3DCNN, providing an improved model based on the RNN, of which the basic model becomes a special case.
Further, the 3D convolutional neural network includes three parts, which are:
1) the single-layer 3D convolutional layer and ReLU combination is used for converting the input image block set into a feature space;
2) a feature subnetwork for extracting local and non-local information;
3) a single-layer 3D convolutional layer for outputting a residual image block set; the designed 3D network output is the sum between the set of residual image blocks and the set of input image blocks.
Further, the image super-resolution method comprises the following steps:
Step one, in contrast to traditional super-resolution models, designing a 3DCNN basic model based on an 8-layer fully convolutional network as the training model of the image, the third dimension of the model carrying the non-local similarity property;
Step two, extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
Step three, applying PCA (principal component analysis) processing to part of the data set to reduce its dimensionality, removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
Step four, extracting non-local similar image blocks from the LR image data set with the traditional block-matching method, and stacking the image blocks into a three-dimensional image block set as the representation of the non-local image blocks;
Step five, outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
Further, in the first step, in contrast to the traditional super-resolution model, a 3DCNN basic model based on an 8-layer fully convolutional network is designed, the third dimension of the model carrying the non-local similarity property, which includes:
The first two dimensions of the three-dimensional image block set hold the 2D local information of the image, while the third dimension holds the non-local information. For such three-dimensional data, the SR image is obtained more effectively by using the more powerful 3D convolution.
The 2D convolution extracts features in a local neighborhood of the previous layer's feature maps. 3D convolution differs in that the input has one more depth dimension and the convolution kernel one more dimension $K_d$, so the 3D convolution kernel has size $(K_h, K_w, K_d)$. Each correlation of the sliding window with the values inside it yields one value of the output 3D image. The depth of the convolution kernel is smaller than the depth of the input layer (kernel size < channel size); the 3D kernel can therefore move in all three directions of the image (height, width, and channel), and at each position the element-wise multiply-accumulate produces one value, which strengthens the ability to capture the information of such three-dimensional data. The 2D convolution can be expressed by the formula:

$$v_{l,j}^{x,y} = \sigma\Big(b_{l,j} + \sum_i \sum_{p=0}^{P_l-1} \sum_{q=0}^{Q_l-1} w_{l,j,i}^{p,q}\, v_{l-1,i}^{x+p,\,y+q}\Big)$$

where $v_{l,j}^{x,y}$ denotes the value at position (x, y) of the jth feature map of the current layer l, i indexes the feature maps of the previous layer, $w_{l,j,i}^{p,q}$ is the value at position (p, q) of the kernel connected to the ith feature map, and $P_l$ and $Q_l$ are the height and width of the convolution kernel. The 3D convolution is defined analogously as:

$$v_{l,j}^{x,y,z} = \sigma\Big(b_{l,j} + \sum_i \sum_{p=0}^{P_l-1} \sum_{q=0}^{Q_l-1} \sum_{r=0}^{R_l-1} w_{l,j,i}^{p,q,r}\, v_{l-1,i}^{x+p,\,y+q,\,z+r}\Big)$$

where $v_{l,j}^{x,y,z}$ denotes the value at 3D position (x, y, z) of the jth feature map of the current layer l, and $R_l$ is the size of the 3D convolution kernel along the third dimension.
For a three-dimensional image block set, 3D convolution means that within a small three-dimensional receptive field, the computed target pixel value is a joint weighted average of local and non-local pixels. As the convolutional layers are stacked, the expanding receptive field eventually covers the local and non-local information of the whole image block set, achieving the effect of feature extraction.
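As a concrete illustration of this joint weighting, the following sketch applies a single 3D convolution to one block-set tensor in TensorFlow (the framework used by the embodiments); the 9 × 9 × 9 block-set size and the 64 filters of size 3 × 3 × 3 follow the experimental settings reported later, and the rest is assumed:

```python
# Sketch: one 3D convolution over a non-local image block set.
import tensorflow as tf

# One sample: k = 9 non-local similar patches of 9 x 9 pixels stacked along
# the depth axis -> shape (batch, depth, height, width, channels).
block_set = tf.random.normal([1, 9, 9, 9, 1])

conv3d = tf.keras.layers.Conv3D(
    filters=64, kernel_size=(3, 3, 3), padding="same", activation="relu")

# Each output value jointly weights a 3 x 3 local window and the 3
# neighbouring patches in the stack, i.e. local plus non-local information.
features = conv3d(block_set)   # shape (1, 9, 9, 9, 64)
```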
In the network design, the local and non-local information of the image block set is captured by the 3D convolutional layers of a fully convolutional 3D network. The whole network consists of a series of nonlinear units and 3D convolutional layers; features are extracted by the convolution kernels, and a residual learning strategy, i.e. a skip connection between the network's input and output, is adopted to alleviate the problem of vanishing or exploding gradients, allowing more network layers to be stacked or more features to be extracted per layer. Since convolution normally shrinks the data, zero padding is applied to every convolutional layer during training to keep the sizes of the network input and output consistent. For the activation function, the proposed method chooses ReLU, so the 3D convolution and nonlinear operation of layer l can be defined as:
$$H_l(H_{l-1}) = \max(0,\ W_l * H_{l-1} + b_l)$$
where $H_{l-1}$ denotes the output of the previous layer, i.e. the input of the current layer, and $W_l * H_{l-1}$ denotes the 3D convolution operation of the current layer. The 3DCNN is designed so that the learned mapping F(Y) between the LR image block set Y and the HR image block set X is as close to X as possible. The 3DCNN takes the set of LR image blocks as input (i.e., $H_0 = Y$), so the first-layer output of the network can be expressed as:
$$H_1 = H_1(Y) = \max(0,\ W_1 * Y + b_1)$$
The depth of the network is set to D, i.e. there are D convolutional layers. The basic model of the 3DCNN contains 8 convolutional layers in total, i.e. D = 8. The output F(Y) of the entire network is calculated as:
$$F(Y) = H_8(H_7(\cdots H_1(Y))) + Y$$
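A minimal Keras sketch of this base model is given below; the 64 filters of size 3 × 3 × 3 in the hidden layers follow the experimental settings reported later, while the exact layer hyperparameters are otherwise assumptions:

```python
# Sketch of the 8-layer fully convolutional 3DCNN base model with the
# global residual connection F(Y) = H_8(H_7(...H_1(Y))) + Y.
import tensorflow as tf
from tensorflow.keras import layers

def build_base_model(depth=8, filters=64):
    y = tf.keras.Input(shape=(None, None, None, 1))  # LR image block set Y
    h = y
    # D - 1 Conv3D + ReLU layers; 'same' (zero) padding keeps the block-set
    # size unchanged, as the text requires.
    for _ in range(depth - 1):
        h = layers.Conv3D(filters, 3, padding="same", activation="relu")(h)
    # The last layer outputs the residual block set (no ReLU).
    residual = layers.Conv3D(1, 3, padding="same")(h)
    return tf.keras.Model(y, layers.Add()([residual, y]))

model = build_base_model()
```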
Further, in step three, applying PCA processing to part of the data set to reduce its dimensionality, removing noise and redundant information while retaining the original information of the image, includes:
(1) PCA is introduced to reduce the dimensionality of part of the data set: part of the data in the data set is selected and its dimensionality is reduced by principal component analysis (PCA), removing the redundant part of the image information, specifically obtaining the similar characteristics of the image and removing image noise to ensure non-local similarity. Meanwhile, the original data of the other part of the data set is retained, guaranteeing that the original information and local information of the image are not lost.
For n × m image samples, there are n samples, each row being m-dimensional (samples of fewer than m dimensions are interpolated up to m dimensions). The n × m real matrix X can be decomposed as:

$$X = U\Sigma V^T$$

where the orthogonal matrix U has dimension n × m, the orthogonal matrix V has dimension m × m, the orthogonality satisfies $UU^T = V^T V = I$, and Σ is an m × m diagonal matrix. Σ is truncated to its first r rows, denoted $\Sigma_r$; using U and V, the reduced-dimension data sample $Y_r$ is obtained as:

$$Y_r = U\Sigma_r$$
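A NumPy sketch of this SVD-based reduction, under the assumption that the rows of X are the n samples and that the r leading singular directions are kept, could read:

```python
# Sketch of the SVD-based PCA dimension reduction Y_r = U * Sigma_r.
import numpy as np

def pca_reduce(X, r):
    """X: (n, m) matrix of n samples.  Returns the samples projected onto
    the r leading singular directions."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U Sigma V^T
    return U[:, :r] @ np.diag(s[:r])                  # (n, r) reduced data

X = np.random.rand(100, 81)      # e.g. 100 vectorised 9 x 9 patches
Y_r = pca_reduce(X, r=16)        # keep 16 components
```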
(2) The BM (block matching) algorithm is employed to extract the non-local blocks of the LR image. First, the data sample $Y_r$ is broken into a set of image blocks of size p:

$$\Omega_p = \{p(i)\}_{i=1}^{N}$$

where $N = (d_h - p + 1) \times (d_w - p + 1)$ is the number of blocks into which the whole image is broken. The ith target image block p(i) of $\Omega_p$ can be calculated as:

$$p(i) = R(i)\, y_{bic}$$

where $R(i) \in \{0, 1\}^{p^2 \times d_h d_w}$ is a binary sparse matrix that extracts the ith image block.
(3) The BM algorithm collects a series of similar blocks using an s × s search window based on Euclidean distance. Finally, the blocks collected in each window are stacked together as the LR image set Y required for network training. For each element of the image set Y consisting of K image blocks, a blocks are selected from the reduced-dimension data set $Y_r$ and then b blocks from the original data set, with a + b = K. The corresponding generated HR image x is processed in the same way to form the HR image block set $\Omega_x = \{x(i)\}_{i=1}^{N}$.
Meanwhile, when collecting similar image blocks in the HR image, the non-local similar positions found in the interpolated image are reused rather than recomputing the distance search. By representing the non-local similar image blocks as such image block sets, the 3DCNN can well preserve the local and non-local similar information in the image.
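A sketch of the block-matching step under these definitions (p × p patches, an s × s search window, squared Euclidean distance, and the k closest patches stacked along a new depth axis; the function and parameter names are illustrative) might be:

```python
# Sketch of Euclidean-distance block matching around a target patch.
import numpy as np

def match_blocks(img, y0, x0, p=9, s=31, k=9):
    img = img.astype(np.float64)          # assume a single-channel image
    ref = img[y0:y0 + p, x0:x0 + p]       # target patch
    h, w = img.shape
    candidates = []
    for y in range(max(0, y0 - s // 2), min(h - p, y0 + s // 2) + 1):
        for x in range(max(0, x0 - s // 2), min(w - p, x0 + s // 2) + 1):
            patch = img[y:y + p, x:x + p]
            dist = np.sum((patch - ref) ** 2)   # squared Euclidean distance
            candidates.append((dist, (y, x), patch))
    candidates.sort(key=lambda c: c[0])
    # Stack the k most similar patches -> a (k, p, p) non-local block set.
    return np.stack([c[2] for c in candidates[:k]])
```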
Further, in step four, extracting non-local similar image blocks from the LR image data set with the traditional block-matching method and stacking the image blocks into a three-dimensional image block set as the representation of the non-local image blocks includes:
The constructed 3DCNN outputs a series of enhanced image block sets $\hat{\Omega}_Y$; all the enhanced image blocks can be expressed as $\{\hat{p}(i)\}_{i=1}^{N}$, forming a set of image blocks. The final SR image $\hat{x}$ is recovered by image reconstruction from $\{\hat{p}(i)\}_{i=1}^{N}$, defined as:

$$\tilde{x} = \sum_{i=1}^{N} R(i)^T\, \hat{p}(i)$$

Since the elements of $\{\hat{p}(i)\}$ are repeatedly superimposed, a weight vector is needed to average the overlapping element values:

$$w = \sum_{i=1}^{N} R(i)^T \mathbf{1}$$

where $\mathbf{1}$ is a constant column vector whose elements are all 1. Finally, the SR picture $\hat{x}$ can be calculated as:

$$\hat{x} = \tilde{x} \oslash w$$

with $\oslash$ denoting element-wise division.
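In NumPy, this aggregation can be sketched as follows: the action of $R(i)^T$ is realized by adding each enhanced patch back at its source position, and the weight vector becomes a per-pixel coverage count (all names are illustrative):

```python
# Sketch of overlapping-patch aggregation with per-pixel averaging.
import numpy as np

def aggregate(patches, positions, out_shape, p=9):
    acc = np.zeros(out_shape)      # accumulates sum_i R(i)^T p_hat(i)
    weight = np.zeros(out_shape)   # accumulates sum_i R(i)^T 1
    for patch, (y, x) in zip(patches, positions):
        acc[y:y + p, x:x + p] += patch
        weight[y:y + p, x:x + p] += 1.0
    # Element-wise division averages the repeatedly superimposed values.
    return acc / np.maximum(weight, 1.0)
```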
another object of the present invention is to provide an image super-resolution system applying the image super-resolution method, the image super-resolution system comprising:
the 3DCNN basic model building module, used for designing, in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network, the third dimension of the model carrying the non-local similarity property;
the image non-local characteristic extraction module, used for extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
the dimension reduction processing module, used for reducing the dimensionality of part of the data set via PCA (principal component analysis), removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
the non-local similar image block extraction module, used for extracting non-local similar image blocks from the LR image data set with the traditional block-matching method;
the non-local similar image block characterization module, used for stacking the image blocks to form a three-dimensional image block set characterizing the non-local image blocks;
and the SR image output module is used for outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
designing, in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network, the third dimension of the model carrying the non-local similarity property;
extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
applying PCA (principal component analysis) processing to part of the data set to reduce its dimensionality, removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
extracting non-local similar image blocks from the LR image data set with the traditional block-matching method, and stacking the image blocks into a three-dimensional image block set as the representation of the non-local image blocks;
and outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
designing, in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network, the third dimension of the model carrying the non-local similarity property;
extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
applying PCA (principal component analysis) processing to part of the data set to reduce its dimensionality, removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
extracting non-local similar image blocks from the LR image data set with the traditional block-matching method, and stacking the image blocks into a three-dimensional image block set as the representation of the non-local image blocks;
and outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
Another object of the present invention is to provide an information data processing terminal for implementing the system for image super-resolution.
Combining all the above technical schemes, the advantages and positive effects of the invention are as follows. The provided image super-resolution method studies in depth the insufficient extraction of non-local similar image information in CNN models and carries out two lines of work from different angles. At the data level, a non-local SR method based on block matching and a 3D convolutional neural network is provided: block matching extracts non-local similar image blocks from the two-dimensional image and forms a three-dimensional image block set; based on this set, a 3D convolutional neural network is constructed and trained to extract the local and non-local similar information within it and to learn the mapping relationship between the LR and HR image block sets; finally, the HR image is reconstructed from the predicted image block set. Starting from the network structure, an image SR model based on a non-local neural network is provided: the existing CNN-based non-local operation is reworked and combined with the traditional CNN structure into a mixed residual unit; with the mixed residual unit as the recurrent unit, a recurrent network is constructed to extract the local and non-local information of the image in LR space; finally, an upsampling network converts the features into HR space and realizes the reconstruction of the HR image. Experimental results show that the non-local operation effectively captures non-local similar information in the image and improves SR reconstruction performance. Compared with existing CNN models, the proposed non-local residual network shows clear reconstruction advantages and stands out in image scenes rich in structural information.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for super-resolution of an image according to an embodiment of the present invention.
FIG. 2 is a block diagram of a system for super-resolution of images according to an embodiment of the present invention;
In the figure: 1, 3DCNN basic model building module; 2, image non-local characteristic extraction module; 3, dimension reduction processing module; 4, non-local similar image block extraction module; 5, non-local similar image block characterization module; 6, SR image output module.
Fig. 3 is a schematic diagram of a network model according to an embodiment of the present invention.
Fig. 4 is a comparison diagram of operations of 2D convolution and 3D convolution according to an embodiment of the present invention.
Fig. 4(a) is a schematic diagram of a 2D convolution according to an embodiment of the present invention.
Fig. 4(b) is a schematic diagram of 3D convolution according to an embodiment of the present invention.
FIG. 5 is a schematic comparison of results for different ratios r between the PCA-processed part and the non-PCA part of the selected data set against the results on the original data.
FIG. 6 is a comparison of data sets generated by different methods provided by embodiments of the present invention.
Fig. 7 is a schematic diagram of the model with and without the residual introduced into the residual network according to an embodiment of the present invention.
Fig. 8(a) shows an original image (PSNR/SSIM) according to an embodiment of the present invention.
FIG. 8(b) is a schematic diagram of Bicubic (29.55/0.8432) provided by the embodiment of the present invention.
FIG. 8(c) is a schematic diagram of ScSR (30.77/0.8749) provided in the embodiment of the present invention.
FIG. 8(d) is a diagram of SelfExSR (31.18/0.8859) according to an embodiment of the present invention.
Fig. 8(e) is a schematic diagram of an srnnn (31.36/0.8882) provided in the embodiment of the present invention.
FIG. 8(f) is a schematic diagram of FSRCNN (31.50/0.8909) provided by an embodiment of the present invention.
FIG. 8(g) is a schematic diagram of the proposed method without PCA introduced (ours, 32.89/0.9106) provided by an embodiment of the present invention.
FIG. 8(h) is a schematic diagram of the proposed method with PCA introduced (ours, 33.01/0.9109) provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a method and a system for super-resolution of images, which are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the method for super-resolution of an image provided by an embodiment of the present invention includes the following steps:
s101, compared with a traditional super-resolution model, a 3DCNN basic model based on an 8-layer full convolution network is designed, and non-local similarity is used for third-dimensional information of the model;
s102, extracting non-local characteristics of the image according to the mixed residual error unit and the residual error network;
s103, carrying out PCA (principal component analysis) processing on part of the data set to reduce the dimension, simultaneously removing noise and redundant information, retaining original information of the image, and carrying out preprocessing for block matching;
s104, extracting non-local similar image blocks from an LR image data set by adopting a traditional block matching method, and stacking the image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and S105, outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
As shown in fig. 2, the system for super-resolution of images provided by the embodiment of the present invention includes:
the 3DCNN basic model building module 1, used for designing, in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network, the third dimension of the model carrying the non-local similarity property;
the image non-local characteristic extraction module 2, used for extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
the dimension reduction processing module 3, used for reducing the dimensionality of part of the data set via PCA processing, removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
the non-local similar image block extraction module 4, used for extracting non-local similar image blocks from the LR image data set with the traditional block-matching method;
the non-local similar image block characterization module 5, used for stacking the image blocks to form a three-dimensional image block set characterizing the non-local image blocks;
and the SR image output module 6 is used for outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
The technical solution of the present invention will be further described with reference to the following examples.
Example 1
Aiming at the problem of insufficient extraction of non-local similar image information in CNN models, the invention carries out in-depth research along two lines from different angles. At the data level, a non-local SR method based on block matching and a 3D convolutional neural network is proposed: block matching extracts non-local similar image blocks from the two-dimensional image and forms a three-dimensional image block set; based on this set, a 3D convolutional neural network is constructed and trained to extract the local and non-local similar information within it and to learn the mapping relationship between the LR and HR image block sets; finally, the HR image is reconstructed from the predicted image block set. Starting from the network structure, an image SR model based on a non-local neural network is proposed: the existing CNN-based non-local operation is modified and combined with the traditional CNN structure into a mixed residual unit; with the mixed residual unit as the recurrent unit, a recurrent network is constructed to extract the local and non-local information of the image in LR space; finally, an upsampling network converts the features into HR space and realizes the reconstruction of the HR image. Experimental results show that the non-local operation effectively captures non-local similar information in the image and improves SR reconstruction performance. Compared with existing CNN models, the proposed non-local residual network shows clear reconstruction advantages and stands out in image scenes rich in structural information.
Aiming at the defects of a CNN reconstruction model in capturing image non-local self-similarity information, the invention provides a non-local image super-resolution model based on a block matching and 3D convolutional neural network and an image super-resolution model based on a non-local convolutional neural network.
The method comprises, firstly, selecting part of the data in the data set and reducing its dimensionality with principal component analysis (PCA), removing the redundant part of the image information, specifically obtaining the similar characteristics of the image and removing image noise to ensure non-local similarity. Meanwhile, the original data of the other part of the data set is retained, guaranteeing that the original information and local information of the image are not lost.
Second, non-local similar image blocks are extracted from the LR image data set according to the formulas above and characterized. The extraction adopts the traditional block-matching method to find the non-local similar blocks, and finally the image blocks are stacked to form a three-dimensional image block set as the representation of the non-local image blocks.
And thirdly, designing a 3D convolutional neural network to extract the information of the image block set and enhance the non-local image blocks. As shown in fig. 3, the constructed 3D network includes three parts, respectively: 1) a single-layer 3D convolutional layer and ReLU combination for converting the input image block set into a feature space; 2) a feature subnetwork for extracting local and non-local information; 3) a single-layer 3D convolutional layer for outputting a residual image block set. The designed 3D network output is the sum between the set of residual image blocks and the set of input image blocks.
And fourthly, outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
Example 2
The super-resolution method provided by the embodiment of the invention comprises the following steps: combining the non-local self-similarity of the image, the image SR is processed for the first time with a 3D convolutional neural network (3DCNN), and a non-local super-resolution method based on the 3DCNN is provided. The method directly models non-local similarity with the 3DCNN and extracts the non-local similarity information of natural images. A 3DCNN base model based on an 8-layer fully convolutional network is constructed. On this basis, the 3D network design within the 3DCNN is studied further, and an improved model based on the RNN is provided, the basic model becoming a special case of the improved model.
A schematic diagram of a network model provided by the embodiment of the present invention is shown in fig. 3.
The super-resolution method provided by the embodiment of the invention comprises the following steps:
Firstly, most SR methods based on the CNN rest on traditional network structures and do not make full use of non-local self-similarity; in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network is designed, the third dimension of the model carrying the non-local similarity.
And step two, extracting the non-local characteristics of the image according to the mixed residual error unit and the residual error network.
And thirdly, applying PCA (principal component analysis) processing to part of the data set to reduce its dimensionality. Dimension reduction makes the data set easier to use and reduces the computational overhead of the algorithm, while removing noise and redundant information and retaining the original information of the image, serving as preprocessing for the subsequent block matching.
And step four, extracting non-local similar image blocks from the LR image data set and characterizing the non-local similar image blocks. The extraction method adopts a traditional block matching method to extract non-local similar blocks, and finally, the image blocks are stacked to form a three-dimensional image block set as a representation of the non-local image blocks.
And step five, outputting the image block set after training in the 3DCNN, and outputting the reconstructed SR image.
The third step comprises the following steps:
(1) PCA is introduced to reduce the dimensionality of part of the data set: part of the data in the data set is selected and its dimensionality is reduced by principal component analysis (PCA), removing the redundant parts of the image information, specifically obtaining the similar characteristics of the image and removing image noise to ensure non-local similarity. Meanwhile, the original data of the other part of the data set is retained, guaranteeing that the original information and local information of the image are not lost.
For n × m image samples, there are n samples, each row being m-dimensional (samples of fewer than m dimensions are interpolated up to m dimensions). The n × m real matrix X can be decomposed as:

$$X = U\Sigma V^T$$

where the dimension of the orthogonal matrix U is n × m and the dimension of the orthogonal matrix V is m × m (the orthogonality satisfies $UU^T = V^T V = I$), and Σ is an m × m diagonal matrix. Next, Σ is truncated to its first r rows, denoted $\Sigma_r$; the reduced-dimension data sample $Y_r$ can be obtained using U and V:

$$Y_r = U\Sigma_r$$
A conventional BM algorithm is then employed to extract the non-local blocks of the LR image. First, the data sample $Y_r$ is broken into a set of image blocks of size p:

$$\Omega_p = \{p(i)\}_{i=1}^{N}$$

where $N = (d_h - p + 1) \times (d_w - p + 1)$ is the number of blocks into which the whole image is broken. The ith target image block p(i) of $\Omega_p$ can be calculated as:

$$p(i) = R(i)\, y_{bic}$$

where $R(i) \in \{0, 1\}^{p^2 \times d_h d_w}$ is a binary sparse matrix that extracts the ith image block.
2) The BM algorithm then collects a series of similar blocks using an s × s search window based on Euclidean distance. Finally, the blocks collected in each window are stacked together as the LR image set Y required for network training. For each element of the image set Y consisting of K image blocks, in order to make full use of the non-local and local characteristics of the image, a blocks are selected from the reduced-dimension data set $Y_r$ and then b blocks from the original data set, with a + b = K. The corresponding generated HR image x is processed in the same way to form the HR image block set $\Omega_x = \{x(i)\}_{i=1}^{N}$.
Meanwhile, when collecting similar image blocks in the HR image, the non-local similar positions found in the interpolated image are reused rather than recomputing the distance search. By representing the non-local similar image blocks as such image block sets, the 3DCNN can well preserve the local and non-local similar information in the image.
The first step comprises the following steps:
convolution mostly refers to 2D convolution, which is good at processing two-dimensional data information. But for three-dimensional data (e.g., video), 3D convolution can be more efficiently processed. In the proposed method, the first two dimensions of the set of three-dimensional image blocks hold the 2D local information of the image, while the third dimension holds the non-local information. For such three-dimensional data, the SR image will be more efficiently obtained by using a more powerful 3d convolution.
The 2D convolution extracts features in a local neighborhood of the previous layer's feature maps. 3D convolution differs in that the input has one more depth dimension and the convolution kernel one more dimension $K_d$, so the size of the 3D convolution kernel is $(K_h, K_w, K_d)$. Each correlation of the sliding window with the values inside it yields one value of the output 3D image. The depth of the convolution kernel is smaller than the depth of the input layer (kernel size < channel size); the 3D kernel can therefore move in all three directions of the image (height, width, and channel), and at each position the element-wise multiply-accumulate produces one value, which strengthens the ability to capture the information of such three-dimensional data. The 2D convolution can be expressed by the formula:

$$v_{l,j}^{x,y} = \sigma\Big(b_{l,j} + \sum_i \sum_{p=0}^{P_l-1} \sum_{q=0}^{Q_l-1} w_{l,j,i}^{p,q}\, v_{l-1,i}^{x+p,\,y+q}\Big)$$

where $v_{l,j}^{x,y}$ denotes the value at position (x, y) of the jth feature map of the current layer l, i indexes the feature maps of the previous layer, $w_{l,j,i}^{p,q}$ is the value at position (p, q) of the kernel connected to the ith feature map, and $P_l$ and $Q_l$ are the height and width of the convolution kernel. Like the 2D convolution formula, the 3D convolution can be defined as:

$$v_{l,j}^{x,y,z} = \sigma\Big(b_{l,j} + \sum_i \sum_{p=0}^{P_l-1} \sum_{q=0}^{Q_l-1} \sum_{r=0}^{R_l-1} w_{l,j,i}^{p,q,r}\, v_{l-1,i}^{x+p,\,y+q,\,z+r}\Big)$$

where $v_{l,j}^{x,y,z}$ denotes the value at 3D position (x, y, z) of the jth feature map of the current layer l, and $R_l$ is the size of the 3D convolution kernel along the third dimension. A comparison of the 2D and 3D convolution operations is shown in fig. 4.
For a three-dimensional image block set, 3D convolution means that within a small three-dimensional receptive field, the computed target pixel value is a joint weighted average of local and non-local pixels. As the convolutional layers are stacked, the expanding receptive field eventually covers the local and non-local information of the whole image block set, achieving the effect of feature extraction.
In the network design, the local and non-local information of the image block set is captured by a fully convolutional 3D network through 3D convolutional layers (Conv3D for short). The whole network consists of a series of nonlinear units and 3D convolutional layers; features are extracted by the convolution kernels, and a residual learning strategy, i.e. a skip connection between the network's input and output, is adopted to alleviate the problem of vanishing or exploding gradients, allowing more network layers to be stacked or more features to be extracted per layer. Since convolution normally shrinks the data, zero padding is applied to every convolutional layer during training to keep the sizes of the network input and output consistent. For the activation function, the proposed method uses ReLU, and the 3D convolution and nonlinear operation of layer l can be defined as:

$$H_l(H_{l-1}) = \max(0,\ W_l * H_{l-1} + b_l)$$

where $H_{l-1}$ denotes the output of the previous layer, i.e. the input of the current layer, and $W_l * H_{l-1}$ denotes the 3D convolution operation of the current layer. The 3DCNN is designed so that the learned mapping F(Y) between the LR image block set Y and the HR image block set X is as close to X as possible. The 3DCNN takes the set of LR image blocks as input (i.e., $H_0 = Y$), so the first-layer output of the network can be expressed as:

$$H_1 = H_1(Y) = \max(0,\ W_1 * Y + b_1)$$

The depth of the network is set to D, i.e. there are D convolutional layers. The basic model of the 3DCNN contains 8 convolutional layers in total, i.e. D = 8. The output F(Y) of the entire network is calculated as:

$$F(Y) = H_8(H_7(\cdots H_1(Y))) + Y$$
the fourth step comprises the following steps:
The last step of the proposed method is to reconstruct the SR image from the enhanced image blocks. The constructed 3DCNN outputs a series of enhanced image block sets $\hat{\Omega}_Y$; all the enhanced image blocks can be expressed as $\{\hat{p}(i)\}_{i=1}^{N}$, forming a set of image blocks. The final SR image $\hat{x}$ is recovered by image reconstruction from $\{\hat{p}(i)\}_{i=1}^{N}$, defined as:

$$\tilde{x} = \sum_{i=1}^{N} R(i)^T\, \hat{p}(i)$$

Since the elements of $\{\hat{p}(i)\}$ are repeatedly superimposed, a weight vector is needed to average the overlapping element values:

$$w = \sum_{i=1}^{N} R(i)^T \mathbf{1}$$

where $\mathbf{1}$ is a constant column vector whose elements are all 1. Finally, the SR picture can be calculated as:

$$\hat{x} = \tilde{x} \oslash w$$

with $\oslash$ denoting element-wise division.
example 3
Firstly, setting up model
1) Training set and test set
A conventional 291-image set was used, consisting of 91 images from Yang et al. and 200 images from the BSD. This data set is widely used for training SR models. The data set obtained by applying PCA to the 291 images and the original data set of images without PCA processing are together used as the required LR image block sets. The invention selects the common Set5, Set14, and BSD100 test sets to evaluate the SR performance of the algorithm; their image contents are rich and varied, including humans, animals, plants, natural landscapes, buildings, and so on. PSNR and SSIM are adopted as objective indexes for evaluating SR performance.
2) Training arrangement
In the designed 3D convolutional network, every convolutional layer except the output layer uses 64 3D convolution kernels of size 3 × 3 × 3 to extract features. All models are developed and implemented on a laboratory PC based on the TensorFlow deep learning framework. The software environment is the Windows 10 operating system with Python 3.7 and TensorFlow 2.0; the hardware is an Intel Core i7-3770 CPU, an NVIDIA GTX 1080 Ti graphics card, and 16 GB of memory.
3) Model set-up
The weights of the convolutional layers were initialized using the method proposed by He et al., which has been verified to combine well with ReLU. The Adam optimizer is used to train all experimental models, with parameters set to β1 = 0.9, β2 = 0.999, ε = 10⁻⁸. The initial learning rate of the optimizer is set to 3 × 10⁻⁴ and is reduced by a factor of 10 every 30 training rounds. The mini-batch size during training is set to 512.
A residual model is added on top of the basic model. A residual unit mainly comprises two functions, the shortcut connection and the identity mapping: the shortcut connection makes the residual possible, and the identity mapping lets the network grow deeper through activation functions and skip connections. The 3D network adopts end-to-end training, with the loss function guiding the update of the model parameters.
On the image SR task, minimizing the l2 loss function is equivalent to maximizing the PSNR. However, some work in recent years has shown that the l1 loss function has great potential. In the proposed method, the l1 loss function used as the model loss function is expressed as:
l(F(Y), X) = ||F(Y) - X||_1;
the method therefore selects the l1 loss function as the model loss function. Through continuous training of the model, the weight parameters are optimized until the loss function value stabilizes in the interval [0, 0.0005].
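A one-function sketch of this l1 model loss, assuming a mean reduction over the mini-batch:

import tensorflow as tf

# l(F(Y), X) = ||F(Y) - X||_1, averaged over all elements in the batch.
def l1_loss(y_true, y_pred):
    return tf.reduce_mean(tf.abs(y_pred - y_true))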
Second, experimental comparison index
The image reconstruction performance is judged by comparing the PSNR and SSIM of the images. For image reconstruction, the PSNR is determined by the maximum pixel value and the mean square error between the images. Given an m × n original image x and its reconstructed image y, the mean square error and the PSNR are defined as:
MSE = (1/(mn)) Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} [x(i, j) - y(i, j)]^2;
PSNR = 10 · log10(MAX^2 / MSE),
where MAX is the maximum possible pixel value of the image.
The idea of SSIM is to measure the structural similarity between two images, rather than the pixel-wise differences measured by PSNR; the basic assumption is that the human eye is more sensitive to changes in image structure. The SSIM between x and the reconstructed image y can be defined as:
SSIM(x, y) = ((2 u_x u_y + c1)(2 σ_xy + c2)) / ((u_x^2 + u_y^2 + c1)(σ_x^2 + σ_y^2 + c2)),
where u_x and u_y denote the pixel means of the original image x and the reconstructed image y, σ_x and σ_y denote the pixel standard deviations of x and y, σ_xy denotes the covariance between x and y, and the constants c1 and c2 prevent instability.
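For reference, both indexes are available as built-in TensorFlow image operations; a short sketch, assuming batched [height, width, channel] tensors and an 8-bit pixel range:

import tensorflow as tf

# PSNR and SSIM between reference images x and reconstructions y.
def evaluate(x, y, max_val=255.0):
    return tf.image.psnr(x, y, max_val), tf.image.ssim(x, y, max_val)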
Third, experimental analysis
1) Non-local block set
The 291-image set is selected and decomposed, with a step size of 100, into 2750 image blocks of size 100 × 100. These 2750 image blocks are divided into two parts according to a certain proportion r (0 < r < 1): one part is subjected to PCA dimension reduction to extract sets of similar features, ensuring that the non-local information is not contaminated with noise, while the other part is left without PCA processing to ensure that the local information is not lost. The KNN algorithm is used to collect similar blocks from both parts. Finally, the two partial image block sets are combined, giving a total of 2.75 million image block sets as the training set of the model. The method therefore involves the following parameters: the window size s of the KNN search, the size p of the extracted image blocks, the non-local scale k, and the proportion r of dimension-reduced data to selected data. Theoretically, the larger the search window s, the more accurate the search, and the larger the non-local scale k, the more information is extracted. However, increasing these parameters makes the 3DCNN data computation too costly, so based on prior experience a window size of s = 31 is selected; p and k are then determined experimentally, as in the sketch after this paragraph.
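A brute-force sketch of this KNN block matching; the flattened-patch representation and all names are assumptions for illustration:

import numpy as np

# For one target patch, rank the candidate patches inside the search
# window by Euclidean distance and keep the k nearest; stacking the
# winners yields one p x p x k non-local block set.
def knn_match(target, candidates, k):
    # target: (p*p,) flattened patch; candidates: (n, p*p) window patches
    dists = np.sum((candidates - target) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    return candidates[nearest]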
The experimental model is the basic model discussed above.
Table 1 shows the PSNR test results on Set5 and Set14 for 3× SR with different block sizes p and non-local scales k. Considering the computational cost of the experiments, the three values 5, 7 and 9 are compared for each parameter to determine appropriate settings.
Table 1 Image block set parameters and performance
The results in Table 1 show that a larger block size p leads to higher PSNR values, and that increasing the non-local scale k improves SR performance. The settings p = 9 and k = 9 are therefore selected, i.e., non-local image block sets of size 9 × 9 × 9 are used in the experiments that follow.
To further verify the advantage of selecting the non-local image block set after introducing PCA dimension reduction, several sets of comparative experiments are designed. The advantage of PCA dimension reduction for selecting non-local similarity blocks is verified by varying the ratio r of dimension-reduced data to selected data, with r = 0, 0.1, 0.2, 0.3, 0.4 and 0.5; here r denotes the proportion of PCA-processed images in the data set. The comparison demonstrates that applying PCA to part of the data allows the non-local similarity blocks of an image to be extracted more effectively. First, the optimal reduced dimension is verified experimentally: when applying PCA to the verification set, the goal is to remove redundant information rather than to discard useful structure. In the experiment, PCA is used to remove the redundancy of the data set. PCA maps high-dimensional data to a low-dimensional space through a linear mapping such that the variance of the data in the low-dimensional space is as large as possible, so the dimension can be reduced effectively while keeping the relations between the original data points unchanged. Based on this principle, the relation between the reduced dimension and the accuracy is obtained by calculating the proportion of variance explained by each principal component.
TABLE 2 PCA similarity accuracy comparison across dimensions

Dimension    128       64        32        16        8
Accuracy     0.9133    0.9141    0.9237    0.9049    0.8188
It can be seen from Table 2 that when the selected dimension is too large, the redundant information of the image is not removed, while when it is too small, PCA discards useful information along with the redundancy. Dimension 32 is therefore selected as the reduced dimension in the following experiments.
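A small sketch of this dimension study using scikit-learn's PCA; the random patch array is only a stand-in for the real block data:

import numpy as np
from sklearn.decomposition import PCA

# Project flattened image blocks onto 32 principal components (the
# dimension selected in Table 2) and check the variance retained.
patches = np.random.rand(2750, 256).astype(np.float32)  # stand-in data
pca = PCA(n_components=32)
reduced = pca.fit_transform(patches)
print("retained variance:", pca.explained_variance_ratio_.sum())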
2) Experimental comparison of the ratio r between PCA-processed and unprocessed images in the data set
After the reduced dimension is determined, experiments verify the proportion of dimension-reduced data in the image data that gives the best effect: the 3DCNN cannot attend only to non-local similarity, and local information must also be extracted as far as possible. To limit selection bias, the data chosen for PCA processing are selected at random; multiple paired experiments are run with repeated random selections and the results are averaged. The ratio r between the PCA-processed part and the unprocessed part therefore needs to be established experimentally.
As can be seen from Fig. 5, the experimental results are best when the proportion of the PCA-processed part is 0.15; denoising is achieved while the non-local information of the image is effectively retained. As the PCA-processed part grows, the experimental effect worsens and the local information of the image is severely lost. The proportion of the PCA-processed part is therefore set to 0.15 for the subsequent experiments.
3) Comparison of different methods on the data set
Next, comparative experiments demonstrate the effectiveness of applying non-local similarity in the 3DCNN and the benefit of the selected PCA processing. First, the training set is rebuilt: the image blocks in the comparison block set are randomly selected similar blocks found by KNN without PCA processing. The similar blocks found by KNN after PCA processing are then compared against those found by KNN without PCA processing.
As can be seen from Fig. 6, when extracting the non-local information of an image, the randomly scattered data set is clearly inferior to the processed data set. With a moderate number of training epochs, the image blocks selected by the same KNN algorithm from the partially PCA-processed data set outperform those from the unprocessed data set; with many training epochs, the partially PCA-processed data set is only marginally better. The PCA-processed data set can extract image features effectively and reduces the interference of redundancy. As the training epochs increase, the model can extract enough non-local information even from the unprocessed data set, so the advantage of the PCA-processed part becomes less pronounced.
4) Comparison after introducing the residual block
To show the influence of the residual learning strategy, the skip connection between the input and the output of the basic model is removed, and the rest of the network is retrained in the same way.
The basic model adopts a global residual learning strategy, i.e., the 3D network learns a set of residual image blocks. As shown in Fig. 7, the two PSNR contrast curves show that the residual learning strategy in the proposed method achieves faster and more stable model convergence as well as higher reconstruction performance.
5) Image self-fusion
Finally, the SR image is reconstructed from the enhanced image blocks. One image is rotated at multiple angles to obtain a group of images, the group is fed into the model, and the median of the results is taken as the final super-resolution output. This further strengthens the fusion of information across the parts of the image, giving a better image representation.
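A minimal sketch of this self-fusion step, assuming model maps an LR array to an SR array of the same layout:

import numpy as np

# Rotate the input at 0/90/180/270 degrees, super-resolve each copy,
# undo the rotations, and take the per-pixel median of the results.
def self_fusion(model, img):
    outputs = []
    for k in range(4):
        sr = model(np.rot90(img, k))
        outputs.append(np.rot90(sr, -k))  # rotate back
    return np.median(np.stack(outputs), axis=0)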
6) Reconstruction performance comparison
In this section, the experimental model of the present invention is compared with existing SR algorithms, including ScSR based on dictionary learning, SelfExSR based on internal example self-similarity, and the CNN-based SRCNN and FSRCNN. Tables 3 and 4 summarize the PSNR and SSIM evaluations of the proposed method and the comparison algorithms on the Set5 and BSD100 test sets, respectively. As the tables show, the experimental model outperforms the other SR models compared. With the introduced PCA dimension reduction, the model is slightly better in PSNR, and slightly weaker in SSIM, than the model without PCA dimension reduction; this indicates that some information about the overall image structure is lost after PCA processing, but the processing benefits non-local denoising and thus raises performance. The images in the BSD100 test set are relatively varied in type and complex in structure, so results on it are more broadly applicable and persuasive. In the quantitative evaluation on BSD100, the PSNR gains vary considerably across images; the images with the largest gains contain abundant texture information and exhibit strong structural similarity. This improvement directly shows that the proposed method has an advantage in mining the structural-similarity information of images.
TABLE 3 PSNR/SSIM evaluation results of different algorithms on Set5
TABLE 4 PSNR/SSIM evaluation results of different algorithms on BSD100
Next, Fig. 8 shows reconstruction results of the proposed method and the other SR algorithms on images selected from the BSD100 test set; visually, the proposed method produces more vivid structural contours. In animal images in particular, the proposed method reconstructs clearer results, mainly because it fully exploits the very strong non-local similarity present in such images. By extracting and enhancing this similar information, the method shows more outstanding SR reconstruction performance for these scenes.
In the above embodiments, the implementation may be realized wholly or partially in software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it takes the form of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the procedures or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any modification, equivalent replacement, and improvement made by those skilled in the art within the technical scope of the present invention disclosed in the present invention should be covered within the scope of the present invention.

Claims (10)

1. A method for super-resolution of an image, the method comprising: processing the image SR by using a 3D convolutional neural network (3DCNN) in combination with the non-local self-similarity of the image, providing a 3DCNN-based non-local super-resolution method; directly modeling non-local similarity with the 3DCNN to extract the non-local similarity information of natural images; constructing a 3DCNN basic model based on an 8-layer fully convolutional network; and designing the 3D convolutional neural network in the 3DCNN and providing an improved model based on RNN.
2. The method for image super resolution according to claim 1, wherein the 3D convolutional neural network comprises three parts, respectively:
1) the single-layer 3D convolutional layer and ReLU combination is used for converting the input image block set into a feature space;
2) a feature subnetwork for extracting local and non-local information;
3) a single-layer 3D convolutional layer for outputting a residual image block set; the designed 3D network output is the sum between the set of residual image blocks and the set of input image blocks.
3. The method for image super-resolution according to claim 1, wherein the method for image super-resolution comprises the steps of:
step one, compared with a traditional super-resolution model, a 3DCNN basic model based on an 8-layer full convolution network is designed, and non-local similar properties are used for third-dimensional information of the model;
extracting non-local characteristics of the image according to the mixed residual error unit and the residual error network;
performing PCA (principal component analysis) processing on part of the data set to reduce the dimension, removing noise and redundant information, simultaneously retaining original information of the image, and performing preprocessing for block matching;
step four, extracting non-local similar image blocks from an LR image data set by adopting a traditional block matching method, and stacking the image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and fifthly, outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
4. The method for super-resolution of images according to claim 3, wherein in step one, designing the 3DCNN basic model based on an 8-layer fully convolutional network by comparison with the traditional super-resolution model, with the third-dimensional information of the model using the non-local similarity property, comprises:
the first two dimensions of the set of three-dimensional image blocks hold the 2D local information of the image, while the third dimension holds the non-local information; for such three-dimensional data, the SR image can be obtained more effectively by adopting the more capable 3D convolution;
the 2D convolution extracts features in the local neighborhood of the previous layer's feature maps; the 3D convolution differs from the 2D convolution in that the input image has one more depth dimension and the convolution kernel has one more dimension K_d, so the size of the 3D convolution kernel is (K_h, K_w, K_d); each correlation operation between the sliding window and the values inside the window yields one value of the output 3D image; the depth of the convolution kernel is smaller than the depth of the input layer, i.e., the kernel size is smaller than the channel size; the 3D convolution kernel can move in all three directions, namely the height, the width and the channel of the image, and the element-by-element multiplication and addition at each position yields one value, which strengthens the ability to capture information from three-dimensional data; the 2D convolution can be expressed by the formula:
v_l^j(x, y) = Σ_i Σ_{p=0}^{P_l-1} Σ_{q=0}^{Q_l-1} w_{l,j,i}(p, q) · v_{l-1}^i(x+p, y+q);
wherein v_l^j(x, y) denotes the value at position (x, y) of the jth feature map of the current layer l, i indexes the feature maps of the previous layer, w_{l,j,i}(p, q) denotes the value at position (p, q) of the kernel connected to the ith feature map, and P_l and Q_l are the sizes of the convolution kernel; the 3D convolution is defined as:
v_l^j(x, y, z) = Σ_i Σ_{p=0}^{P_l-1} Σ_{q=0}^{Q_l-1} Σ_{r=0}^{R_l-1} w_{l,j,i}(p, q, r) · v_{l-1}^i(x+p, y+q, z+r);
wherein v_l^j(x, y, z) denotes the value at 3D position (x, y, z) of the jth feature map of the current layer l, and R_l is the size of the 3D convolution kernel along the third dimension;
for a three-dimensional image block set, 3D convolution means that, within a small three-dimensional receptive field, the computed target pixel value is a jointly weighted average of local and non-local pixels; as the convolutional layers stack, the expanding receptive field eventually covers the local and non-local information of the whole image block set, achieving the effect of feature extraction;
in the network design, a fully convolutional 3D network captures the local and non-local information of the image block set with 3D convolutional layers; the whole network consists of a series of nonlinear units and 3D convolutional layers, with features extracted by the convolution kernels; a residual learning strategy, i.e., a skip connection between the input and output of the network, is adopted to alleviate the problem of vanishing or exploding gradients, so that more network layers can be stacked or more features extracted per layer; since convolution shrinks the data size, zero padding is applied to every convolutional layer during training to keep the network input and output sizes consistent; for the activation function, the proposed method uses ReLU, and the 3D convolution and nonlinear operation of the lth layer can be defined as:
H_l(H_{l-1}) = max(0, W_l * H_{l-1} + b_l);
wherein H_{l-1} denotes the output of the previous layer, i.e., the input of the current layer, and W_l * H_{l-1} is defined as the 3D convolution operation of the current layer; the 3DCNN is designed to learn a mapping F(Y) between the LR image block set Y and the HR image block set X such that F(Y) is as close to X as possible; the 3DCNN takes the LR image block set as input, i.e., H_0 = Y, so the first-layer output of the network can be expressed as:
H_1 = H_1(Y) = max(0, W_1 * Y + b_1);
setting the network depth to D, i.e., there are D convolutional layers; the basic 3DCNN model contains a total of 8 convolutional layers, i.e., D = 8; the output of the entire network, F(Y), can be calculated as:
F(Y) = H_8(H_7(...H_1(Y))) + Y.
5. The method for super-resolution of images according to claim 3, wherein in step three, performing PCA on part of the data set to reduce the dimension, removing noise and redundant information while preserving the original information of the images, comprises:
(1) introducing PCA to reduce the dimension of part of the data set: part of the data in the data set is selected and its dimension is reduced by principal component analysis (PCA), removing the redundant part of the image information; this yields the similar features of the images and removes image noise, ensuring non-local similarity; meanwhile, the original data of the other part of the data set is retained, ensuring that the original information and local information of the images are not lost;
for n × m image samples there are n samples, each row being m-dimensional; samples that do not have m dimensions are interpolated to m dimensions; the n × m real matrix X can then be decomposed as:
X = U Σ V^T;
wherein the orthogonal matrix U has dimension n × m and the orthogonal matrix V has dimension m × m, satisfying U U^T = V^T V = I; Σ is an m × m diagonal matrix; the first r columns of Σ are kept and denoted Σ_r; the dimension-reduced data sample Y_r is then obtained from U and Σ_r:
Y_r = U Σ_r;
(2) extracting the non-local blocks of the LR image with the BM algorithm; first, the data sample Y_r is broken into a set of image blocks of size p:
Ω_p = {p(i)}_{i=1}^{N};
wherein N = (d_h - p + 1) × (d_w - p + 1) denotes the number of blocks extracted from the whole image; the ith target image block p(i) of Ω_p can be calculated as:
p(i) = R(i) y_bic;
wherein R(i) is a binary sparse matrix representing the extraction of the ith image block, and d_h and d_w are the height and width of the image;
(3) based on the Euclidean distance, the BM algorithm collects a series of similar blocks using a search window of size s × s; finally, the blocks collected by each window are stacked together as the LR image set Y required for network training; for each element in the image set Y consisting of k image blocks, a blocks are selected from the dimension-reduced data set Y_r and b blocks from the original data set, where a + b = k; the corresponding HR image x is processed in the same way to form the HR image block set Ω_X = {x(i)}_{i=1}^{N}; meanwhile, when collecting similar image blocks in the HR image, the non-local similar positions found in the interpolated image are reused instead of recalculating the distance search; the 3DCNN represents the non-local similar image blocks in this image-block-set form, which preserves the local and non-local similarity information in the image well.
6. The method for image super-resolution according to claim 3, wherein in step four, said extracting non-local similar image blocks from the LR image data set by the conventional block matching method and stacking the image blocks to form a three-dimensional image block set as the representation of the non-local image blocks comprises:
the constructed 3DCNN outputs a series of enhanced image block sets, which together form the enhanced set Ω̂_Y = {ŷ(i)}_{i=1}^{N}; the final SR image x̂ is recovered by image reconstruction from Ω̂_Y, defined as:
x̂ = Σ_{i=1}^{N} R(i)^T ŷ(i);
in view of the fact that the elements of Ω̂_Y are repeatedly superimposed where blocks overlap, a weight vector needs to be considered to average the overlapping element values:
w = Σ_{i=1}^{N} R(i)^T 1;
wherein 1 is a constant column vector with element value 1; finally, the SR image x̂ can be calculated as:
x̂ = (Σ_{i=1}^{N} R(i)^T ŷ(i)) ./ w;
where ./ denotes element-wise division.
7. a system for super-resolution of images, which implements the method for super-resolution of images according to any one of claims 1 to 6, wherein the system for super-resolution of images comprises:
the 3DCNN basic model building module is used for comparing a traditional super-resolution model and designing a 3DCNN basic model based on an 8-layer full convolution network, and the third-dimensional information of the model uses non-local similar properties;
the image non-local characteristic extraction module is used for extracting the non-local characteristics of the image according to the mixed residual error unit and the residual error network;
the dimensionality reduction processing module is used for carrying out dimensionality reduction on part of the data set through PCA (principal component analysis), removing noise and redundant information, simultaneously reserving original information of an image, and carrying out preprocessing for block matching;
the non-local similar image block extraction module is used for extracting a non-local similar image block from an LR image data set by adopting a traditional block matching method;
the non-local similar image block representation module is used for stacking image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and the SR image output module is used for outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
compared with a traditional super-resolution model, a 3DCNN basic model based on an 8-layer full convolution network is designed, and the third-dimensional information of the model has non-local similarity;
extracting non-local characteristics of the image according to the mixed residual error unit and the residual error network;
carrying out PCA (principal component analysis) processing on part of the data set to reduce the dimension, removing noise and redundant information, simultaneously retaining original information of an image, and carrying out preprocessing for block matching;
extracting non-local similar image blocks from an LR image data set by adopting a traditional block matching method, and stacking the image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
compared with a traditional super-resolution model, a 3DCNN basic model based on an 8-layer full convolution network is designed, and the third-dimensional information of the model has non-local similarity;
extracting non-local characteristics of the image according to the mixed residual error unit and the residual error network;
carrying out PCA (principal component analysis) processing on part of the data set to reduce the dimension, removing noise and redundant information, simultaneously retaining original information of an image, and carrying out preprocessing for block matching;
extracting non-local similar image blocks from an LR image data set by adopting a traditional block matching method, and stacking the image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
10. An information data processing terminal characterized by being used for a system for realizing the image super-resolution as set forth in claim 7.
CN202110310875.1A 2021-03-23 2021-03-23 Image super-resolution method and system Active CN113191947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110310875.1A CN113191947B (en) 2021-03-23 2021-03-23 Image super-resolution method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110310875.1A CN113191947B (en) 2021-03-23 2021-03-23 Image super-resolution method and system

Publications (2)

Publication Number Publication Date
CN113191947A true CN113191947A (en) 2021-07-30
CN113191947B CN113191947B (en) 2024-05-14

Family

ID=76973652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110310875.1A Active CN113191947B (en) 2021-03-23 2021-03-23 Image super-resolution method and system

Country Status (1)

Country Link
CN (1) CN113191947B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505860A (en) * 2021-09-07 2021-10-15 天津所托瑞安汽车科技有限公司 Screening method and device for blind area detection training set, server and storage medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952228A (en) * 2017-03-10 2017-07-14 北京工业大学 The super resolution ratio reconstruction method of single image based on the non local self-similarity of image
CN111602147A (en) * 2017-11-17 2020-08-28 脸谱公司 Machine learning model based on non-local neural network
WO2020056791A1 (en) * 2018-09-21 2020-03-26 五邑大学 Method and apparatus for super-resolution reconstruction of multi-scale dilated convolution neural network
CN112308772A (en) * 2019-08-02 2021-02-02 四川大学 Super-resolution reconstruction method based on deep learning local and non-local information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Tang Yanqiu; Pan Hong; Zhu Yaping; Li Xinde: "A Survey of Image Super-Resolution Reconstruction", Acta Electronica Sinica, no. 07 *
Zhai Sen; Ren Chao; Xiong Shuhua; Zhan Wenshu: "Single-Image Super-Resolution Reconstruction Based on Local and Non-Local Information via Deep Learning", Modern Computer, no. 33 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505860A (en) * 2021-09-07 2021-10-15 天津所托瑞安汽车科技有限公司 Screening method and device for blind area detection training set, server and storage medium
CN113505860B (en) * 2021-09-07 2021-12-31 天津所托瑞安汽车科技有限公司 Screening method and device for blind area detection training set, server and storage medium

Also Published As

Publication number Publication date
CN113191947B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
Liu et al. Multi-level wavelet convolutional neural networks
Dong et al. Model-guided deep hyperspectral image super-resolution
Wang et al. Ultra-dense GAN for satellite imagery super-resolution
Chen et al. The face image super-resolution algorithm based on combined representation learning
CN109035142B (en) Satellite image super-resolution method combining countermeasure network with aerial image prior
CN110599401A (en) Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
Zhang et al. One-two-one networks for compression artifacts reduction in remote sensing
Muqeet et al. HRAN: Hybrid residual attention network for single image super-resolution
CN110246084B (en) Super-resolution image reconstruction method, system and device thereof, and storage medium
Li et al. Deep learning methods in real-time image super-resolution: a survey
Kim et al. MAMNet: Multi-path adaptive modulation network for image super-resolution
CN111951167B (en) Super-resolution image reconstruction method, super-resolution image reconstruction device, computer equipment and storage medium
CN110363068B (en) High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network
Rivadeneira et al. Thermal image super-resolution challenge-pbvs 2021
CN112017116B (en) Image super-resolution reconstruction network based on asymmetric convolution and construction method thereof
CN110533591B (en) Super-resolution image reconstruction method based on codec structure
CN113538246A (en) Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network
Zhang et al. Deformable and residual convolutional network for image super-resolution
CN115393191A (en) Method, device and equipment for reconstructing super-resolution of lightweight remote sensing image
Muqeet et al. Hybrid residual attention network for single image super resolution
CN115660955A (en) Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
CN114926337A (en) Single image super-resolution reconstruction method and system based on CNN and Transformer hybrid network
CN112950478B (en) Face super-resolution method and system based on dual identity attribute constraint
CN113191947B (en) Image super-resolution method and system
CN116977651B (en) Image denoising method based on double-branch and multi-scale feature extraction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant