CN113191947B - Image super-resolution method and system - Google Patents
- Publication number
- CN113191947B (application CN202110310875.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- local
- convolution
- network
- information
- Prior art date
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention belongs to the technical field of image super-resolution and discloses an image super-resolution method and system. The method combines the non-local self-similarity of images with a 3D convolutional neural network (3DCNN) to process image super-resolution (SR), providing a non-local SR method based on the 3DCNN: the 3DCNN directly models non-local similarity and extracts the non-local similarity information of natural images; a 3DCNN base model is constructed on an 8-layer fully convolutional network; the 3D convolutional network design within the 3DCNN is then refined and an RNN-based improved model is proposed, of which the base model becomes a special case. The non-local operation provided by the invention effectively captures non-local similar information in the image and improves SR reconstruction performance; compared with existing CNN models, the method has clear reconstruction advantages and is particularly prominent in image scenes rich in structural information.
Description
Technical Field
The invention belongs to the technical field of image super-resolution, and particularly relates to a method and a system for image super-resolution.
Background
Image super-resolution is an ill-posed problem in computer vision. Learning-based super-resolution has become the mainstream scheme in recent decades because of its high computational speed and excellent performance. Unlike interpolation-based and reconstruction-based methods, learning-based super-resolution applies to a richer range of scenes and can achieve higher reconstructed image quality. It learns the implicit mapping relation between LR and HR images and then performs super-resolution reconstruction according to that relation. Depending on the learning objective and learning scheme, learning-based super-resolution can be divided into: manifold-learning-based methods, overcomplete-dictionary-based methods, K-nearest-neighbor-based methods, example-based methods, and deep-learning-based methods.
With the development of deep learning across many areas of computing, deep-learning-based methods have matured. More researchers have focused on studying deep learning networks; in particular, the proposal and refinement of the convolutional neural network (CNN) has greatly improved performance in fields such as image classification, semantic segmentation, object detection, and image restoration. A CNN uses convolution kernels in place of the receptive fields of human vision, reducing the amount of computation while effectively retaining image features and processing images more efficiently. Local receptive fields and weight sharing are its two most prominent ideas; combined and extended, they give the CNN advantages such as shift and scale invariance in feature extraction. In the field of super-resolution reconstruction, CNN models exhibit excellent feature-learning ability through network structures with these characteristics, which are necessary conditions for generating high-quality SR images.
In 2014, Dong et al. first proposed using a CNN for image super-resolution, presenting the SRCNN algorithm and introducing the CNN into the SR field. The algorithm first upsamples the LR image and then feeds it into a network of depth 3 that combines three parts (feature extraction, nonlinear mapping, and image reconstruction), learning the LR-HR mapping relation end to end; it started the wave of SR research. Dong et al. then further proposed FSRCNN. Since SRCNN first interpolates the LR image and then reconstructs it, the network computation increases significantly, so FSRCNN puts the LR image directly into the network for training and finally reconstructs the image with a deconvolution layer. The network can thus take small, non-interpolated inputs, and because the deconvolution layer sits at the end of the network, the computation is greatly reduced. At the same time, the network introduces 1×1 convolutions for dimension reduction and expansion and decomposes the 5×5 convolution kernel into two 3×3 kernels, reducing the computation again; later networks all adopted this way of reducing the computational load. ESPCN appeared thereafter, observing that upsampling the network input by interpolation increases the computation of the network, while using deconvolution creates a large amount of computational redundancy and amplifies noise along with the image, affecting the quality of the reconstructed image.
The core of ESPCN is the sub-pixel convolution layer (sub-pixel convolutional layer). The network takes the original low-resolution image as input and, through three convolution layers, produces a feature map of the same size as the input with r² channels (r being the magnification factor). The r² channels of each pixel are rearranged into an r×r region, so that the r²×H×W feature map is rearranged into a 1×rH×rW high-resolution image. Although this transformation is called sub-pixel convolution, no convolution operation actually takes place: the interpolation that enlarges the image from low to high resolution is implicitly contained in, and automatically learned by, the preceding convolution layers. Because those convolutions operate on the low-resolution image, they are efficient; only the last layer changes the image size. The sub-pixel convolution layer was widely adopted in later, more advanced SR models. The subsequent VDSR observed that SRCNN models SR in HR space: an HR picture can be decomposed into high-frequency and low-frequency information, the input and output pictures share the same low-frequency information, and SRCNN must carry the input through to the end to construct the residual, which resembles autoencoding and costs training time, so VDSR models the residual directly to accelerate convergence. The model introduces a residual network, keeps feature-map sizes unchanged by zero padding in every layer, and greatly improves CNN performance. Kim et al. brought the recursive learning idea of the recurrent neural network (RNN) into CNN design and proposed the DRCN model, further advancing SR reconstruction performance through parameter sharing. DRCN takes an interpolated image as input and is divided into three modules: an embedding network for feature extraction, an inference network for the nonlinear mapping of features, and a reconstruction network that recovers the final result from the feature maps. The inference network is recursive, i.e., data passes through the layer multiple times; unrolling this loop is equivalent to multiple serially connected convolution layers sharing one set of parameters. The authors of DRRN were then inspired by ResNet, VDSR, and DRCN to use a deeper network structure for further gains: every residual unit in DRRN shares one input, namely the output of the first convolution layer in the recursive block; each residual unit contains 2 convolution layers; and within a recursive block, the convolution-layer parameters at the same position of each residual unit are shared. The later LapSRN combines the traditional Laplacian pyramid algorithm into a cascaded pyramid deep-learning model, motivated by three observations about prior methods. First, some methods use a predefined upsampling operation (e.g., bicubic) to bring the input image to the target spatial size before it enters the network, which adds extra computational overhead and also causes visible reconstruction artifacts.
Other methods replace the predefined upsampling with sub-pixel convolution or deconvolution layers, but their network structures are relatively simple and perform poorly, unable to learn the complex mapping from low-resolution to high-resolution images. Second, training a network with an L2-type loss function inevitably produces blurred predictions, and the recovered high-resolution pictures are often too smooth. Third, when a high-resolution image is reconstructed with only a single upsampling operation, large upsampling factors (above 8×) are difficult to obtain, and different applications require training models with different upsampling factors.
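To make the sub-pixel rearrangement above concrete, the snippet below reproduces it with TensorFlow's depth_to_space; the tensor sizes are illustrative assumptions, not values from the patent.

```python
import tensorflow as tf

# Sub-pixel (pixel-shuffle) upsampling as popularized by ESPCN: a feature map
# with r^2 channels is rearranged into an image r times larger in each axis.
r = 3                                             # magnification factor
features = tf.random.normal([1, 24, 24, r * r])   # H x W map with r^2 channels
sr = tf.nn.depth_to_space(features, block_size=r) # no convolution takes place
print(sr.shape)  # (1, 72, 72, 1): each pixel's r^2 channels fill an r x r region
```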
Most CNN (convolutional neural network) based SR algorithms do not adequately exploit the non-local similarity of images, which has been demonstrated in traditional non-local methods to effectively improve image reconstruction performance. Meanwhile, only a few studies so far have explored how to combine the non-local self-similarity of images with deep learning to discover SR algorithms with greater potential.
Through the above analysis, the problems and defects existing in the prior art are as follows:
(1) Upsampling the input to the network by interpolation increases the computational load of the network, while using deconvolution creates a large amount of computational redundancy and amplifies noise along with the image, degrading the quality of the reconstructed image.
(2) The pyramid deep learning model requires the use of predefined upsampling operations to obtain the spatial dimensions of the target, which adds additional computational overhead and also results in visible reconstruction artifacts.
(3) Some methods use sub-pixel convolution or deconvolution in place of the predefined upsampling operation, but their network structures are relatively simple, perform poorly, and cannot learn the complex mapping from low-resolution to high-resolution images.
(4) When using the L2 type loss function in training the network, blurred predictions are inevitably generated, and the recovered high resolution pictures tend to be too smooth.
(5) When reconstructing high-resolution images with only a single upsampling operation, large upsampling factors are difficult to obtain, and different applications require training models with different upsampling factors.
(6) Most CNN-based SR algorithms do not adequately exploit the non-local similarity of images, which has been demonstrated in traditional non-local methods to effectively improve image reconstruction performance.
The difficulties in solving the above problems and defects are as follows:
By introducing PCA dimension reduction, image denoising and image super-resolution are combined; non-local similar blocks of the image are extracted with a block-matching algorithm, an L1-type loss function is used during training, and end-to-end image reconstruction is realized with the 3DCNN. The notable difficulty is that the original information must be preserved while PCA (principal component analysis) reduces the dimension, so deciding which data undergo dimension-reduction processing and which do not is critical; at the same time, the KNN algorithm used in block matching should keep complexity as low as possible; finally, a reasonable network structure must be designed for the 3DCNN.
The significance of solving the above problems and defects is as follows:
Experiments prove that the base model and the improved model of the proposed method achieve effective performance improvements over existing methods. With the same number of parameters as the base model, the improved model achieves the best SR performance among the compared algorithms and provides a new line of thought in the field of image super-resolution.
Disclosure of Invention
Aiming at the deficiency of CNN reconstruction models in capturing the non-local self-similarity information of an image, the invention provides an image super-resolution method and system, and particularly relates to a non-local image super-resolution method, system, medium, device, and processing terminal based on block matching and a 3D convolutional neural network.
The invention is realized as follows: an image super-resolution method comprises: combining the non-local self-similarity of images, processing image SR with a 3D convolutional neural network (3DCNN), and providing a non-local super-resolution method based on the 3DCNN; directly modeling non-local similarity with the 3DCNN and extracting the non-local similarity information of natural images; constructing a 3DCNN base model based on an 8-layer fully convolutional network; and refining the 3D convolutional network design within the 3DCNN to provide an RNN-based improved model, so that the base model becomes a special case of the improved model.
Further, the 3D convolutional neural network includes three parts, respectively:
1) A single 3D convolution layer combined with a ReLU, transforming the input image-block set into feature space;
2) A feature subnetwork for extracting local and non-local information;
3) A single 3D convolution layer for outputting a set of residual image blocks; the output of the designed 3D network is the sum of the residual image-block set and the input image-block set.
Further, the method for super resolution of the image comprises the following steps:
Step one, in contrast to a traditional super-resolution model, designing a 3DCNN base model based on an 8-layer fully convolutional network as the training model for the image, with the third-dimension information of the model using non-local similarity;
Step two, extracting the non-local features of the image according to the hybrid residual unit and the residual network;
Step three, reducing the dimension of part of the data set by PCA processing, removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
Step four, extracting non-local similar image blocks from the LR image data set with a traditional block-matching method, and stacking the image blocks into a three-dimensional image-block set that characterizes the non-local image blocks;
Step five, outputting the trained image-block set from the 3DCNN and outputting the reconstructed SR image.
In step one, designing the 3DCNN base model based on the 8-layer fully convolutional network in contrast to traditional super-resolution models, with the third-dimension information of the model using the non-local similarity property, includes:
The first two dimensions of the three-dimensional image-block set hold the 2D local information of the image, while the third dimension holds non-local information. For such three-dimensional data, the more powerful 3D convolution obtains the SR image more effectively.
2D convolution extracts features in a local neighborhood of the previous layer's feature maps. 3D convolution differs in that the input image has one more depth dimension and the convolution kernel one more dimension $K_d$, so the size of a 3D convolution kernel is $(K_h, K_w, K_d)$. At each step, correlating the sliding window with the values inside it produces one value of the output 3D image. The kernel depth is smaller than the input-layer depth (kernel size < channel size); the 3D convolution kernel can therefore move in all three directions (the height, width, and channel of the image), and at each position the element-wise multiplication and addition yields one value, which for such three-dimensional data strengthens the ability to capture data information. The 2D convolution can be expressed by the formula:

$$v_{l,j}^{x,y} = f\Big(b_{l,j} + \sum_{i}\sum_{p=0}^{P_l-1}\sum_{q=0}^{Q_l-1} w_{l,j,i}^{p,q}\, v_{l-1,i}^{x+p,\,y+q}\Big)$$

where $v_{l,j}^{x,y}$ is the value at position $(x, y)$ of the $j$-th feature map of the current layer $l$, $i$ indexes the feature maps of the previous layer, $w_{l,j,i}^{p,q}$ is the kernel value at position $(p, q)$ for the connection to the $i$-th feature map, and $P_l$ and $Q_l$ are the sizes of the convolution kernel. The 3D convolution is defined analogously as:

$$v_{l,j}^{x,y,z} = f\Big(b_{l,j} + \sum_{i}\sum_{p=0}^{P_l-1}\sum_{q=0}^{Q_l-1}\sum_{r=0}^{R_l-1} w_{l,j,i}^{p,q,r}\, v_{l-1,i}^{x+p,\,y+q,\,z+r}\Big)$$

where $v_{l,j}^{x,y,z}$ is the value at 3D position $(x, y, z)$ of the $j$-th feature map of the current layer $l$, and $R_l$ is the scale of the 3D convolution kernel along the third dimension.
For a three-dimensional image-block set, 3D convolution computes a target pixel value as a jointly weighted average of local and non-local pixels within a small three-dimensional receptive field. As convolution layers are stacked, the expanding receptive field eventually covers the local and non-local information of the whole image-block set, achieving the effect of feature extraction.
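As a concrete illustration of this extra dimension, the hedged TensorFlow snippet below contrasts the tensor shapes consumed by 2D and 3D convolutions; all shapes and filter counts are illustrative assumptions.

```python
import tensorflow as tf

# A 2D convolution slides over (height, width); a 3D convolution also slides
# over the depth axis, here the stack of K non-local similar patches.
x2d = tf.random.normal([1, 32, 32, 8])            # (N, H, W, C)
y2d = tf.keras.layers.Conv2D(16, (3, 3), padding='same')(x2d)

x3d = tf.random.normal([1, 8, 32, 32, 1])         # (N, K, H, W, C)
y3d = tf.keras.layers.Conv3D(16, (3, 3, 3), padding='same')(x3d)
print(y2d.shape)  # (1, 32, 32, 16)
print(y3d.shape)  # (1, 8, 32, 32, 16): features mix local and non-local context
```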
Local and non-local information of the image-block set is captured by a fully convolutional 3D network built from 3D convolution layers. The whole network consists of a series of nonlinear units and 3D convolution layers; features are extracted by the convolution kernels, and a residual learning strategy (skip connections between the network input and output) is adopted to alleviate the gradient vanishing and explosion problems, so that more network layers can be stacked or each layer can extract more features. In general, convolution shrinks the data, so zero padding is applied to every convolution layer during training to keep the network input and output the same size. For the activation function the proposed method chooses the ReLU, so the 3D convolution and nonlinear operation of the $l$-th layer can be defined as:

$$H_l(H_{l-1}) = \max(0,\; W_l * H_{l-1} + b_l);$$

where $H_{l-1}$ denotes the output of the previous layer, i.e., the input of the current layer, and $W_l * H_{l-1}$ is the 3D convolution operation of the current layer. The purpose of the 3DCNN design is to learn a mapping $F(Y)$ from the LR image-block set $Y$ that is as close as possible to the HR image-block set $X$. The 3DCNN takes the LR block set as input (i.e., $H_0 = Y$), so the first-layer output of the network can be expressed as:

$$H_1 = H_1(Y) = \max(0,\; W_1 * Y + b_1);$$

Let the depth of the network be $D$, i.e., there are $D$ convolution layers. The base model of the 3DCNN contains 8 convolution layers in total, i.e., $D = 8$, so the output $F(Y)$ of the whole network can be calculated as:

$$F(Y) = H_8(H_7(\cdots H_1(Y))) + Y.$$
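The Keras sketch below shows one way such an 8-layer fully convolutional residual 3D network could be assembled. The depth, the filter count (64 kernels of size 3×3×3, per the experimental section), the zero padding, and the global skip follow the text; the input layout (K matched patches as the depth axis, one channel) and the non-activated output layer are assumptions for illustration, not the patented implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_3dcnn(depth=8, filters=64, kernel=3):
    """8-layer fully convolutional 3D base model with a global residual skip:
    F(Y) = H_8(H_7(...H_1(Y))) + Y."""
    inputs = layers.Input(shape=(None, None, None, 1))  # (K patches, H, W, 1)
    y = inputs
    for _ in range(depth - 1):
        # 'same' zero padding keeps the input and output sizes consistent
        y = layers.Conv3D(filters, kernel, padding='same', activation='relu')(y)
    residual = layers.Conv3D(1, kernel, padding='same')(y)  # residual block set
    return tf.keras.Model(inputs, layers.Add()([residual, inputs]))

model = build_3dcnn()
model.compile(optimizer='adam', loss='mae')  # l1-type loss, as in the text
```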
In step three, reducing the dimension of part of the data set by PCA processing while removing noise and redundant information and retaining the original information of the image includes:
(1) Introduce PCA to reduce the dimension of part of the data set: select part of the data in the data set, reduce its dimension with principal component analysis (PCA), remove the redundant portion of the image information, keep the similar features of the images, and remove image noise to ensure non-local similarity. Meanwhile, keep the original data of the remaining part of the data set as the other portion, so that the original information and local information of the image are not lost.
For an $n \times m$ matrix of image samples ($n$ samples, each a row of $m$ dimensions; samples lacking $m$ dimensions are brought to $m$ dimensions by interpolation), the real matrix can be decomposed as:

$$X = U \Sigma V^T;$$

where the orthogonal matrix $U$ has dimension $n \times m$, the orthogonal matrix $V$ has dimension $m \times m$, the orthogonal matrices satisfy $U U^T = V^T V = I$, and $\Sigma$ is an $m \times m$ diagonal matrix. Taking the first $r$ columns of $\Sigma$, denoted $\Sigma_r$, the reduced-dimension data sample $Y_r$ is obtained from $U$ and $\Sigma_r$:

$$Y_r = U \Sigma_r;$$
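A minimal NumPy sketch of this truncated decomposition, assuming the image blocks are flattened into the rows of X; centering before the SVD and the component count r = 64 are added assumptions for illustration.

```python
import numpy as np

def pca_reduce(X, r):
    """Keep the first r principal components of the n x m sample matrix X
    via SVD: X = U @ diag(s) @ Vt, so Y_r = U[:, :r] @ diag(s[:r])."""
    Xc = X - X.mean(axis=0)          # center the features (an added assumption)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r] * s[:r]          # n x r reduced-dimension samples

X = np.random.rand(500, 1024)        # e.g. 500 flattened 32x32 blocks (illustrative)
Y_r = pca_reduce(X, r=64)
print(Y_r.shape)                     # (500, 64)
```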
(2) The BM (block-matching) algorithm is used to extract the non-local blocks of the LR image. First, the LR image from the data sample $Y_r$ is broken into a set of image blocks of size $p \times p$, $\Omega_p = \{p(i)\}_{i=1}^{N}$, where $N = (d_h - p + 1) \times (d_w - p + 1)$ is the number of blocks into which the whole image is decomposed. The $i$-th target image block $p(i)$ in $\Omega_p$ can be calculated as:

$$p(i) = R(i)\, y_{bic};$$

where $R(i) \in \mathbb{R}^{p^2 \times d_h d_w}$ is a binary sparse matrix that extracts the $i$-th image block.
(3) The BM algorithm then uses an $s \times s$ search window and the Euclidean distance to collect a series of similar blocks. Finally, the blocks collected by each window are stacked together as the LR image set $Y$ required for network training. For each element of the image set $Y$ consisting of $K$ image blocks, $a$ blocks are selected from the reduced-dimension data set $Y_r$ and $b$ blocks from the original data set, where $a + b = K$. The generated corresponding HR image $x$ is processed in the same way to form the HR image-block set $X$. Meanwhile, similar image blocks in the HR image are collected at the non-local similar positions found in the interpolated image, rather than by recomputing the distance search. The 3DCNN represents non-local similar image blocks as image-block sets, which preserve both the local and the non-local similar information of the image.
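The following NumPy sketch gives one hedged reading of this search-and-stack step; the parameters p = 7 and K = 8 are illustrative assumptions, while the window size 31 follows the experimental section.

```python
import numpy as np

def nonlocal_set(img, i0, j0, p=7, s=31, K=8):
    """Stack the K blocks nearest (Euclidean distance) to the p x p target
    block at (i0, j0), searched inside an s x s window around it."""
    target = img[i0:i0 + p, j0:j0 + p]
    half, h, w = s // 2, *img.shape
    cands = []
    for i in range(max(0, i0 - half), min(h - p, i0 + half) + 1):
        for j in range(max(0, j0 - half), min(w - p, j0 + half) + 1):
            blk = img[i:i + p, j:j + p]
            cands.append((np.sum((blk - target) ** 2), blk))
    cands.sort(key=lambda c: c[0])                  # nearest blocks first
    return np.stack([blk for _, blk in cands[:K]])  # (K, p, p) 3D block set

img = np.random.rand(100, 100)
print(nonlocal_set(img, 40, 40).shape)              # (8, 7, 7)
```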
Further, in step four, extracting non-local similar image blocks from the LR image data set with the traditional block-matching method and stacking them into a three-dimensional image-block set that characterizes the non-local image blocks includes:
The constructed 3DCNN outputs a series of enhanced image-block sets $\Omega_{\hat{Y}}$. After vectorization, all enhanced image blocks in $\Omega_{\hat{Y}}$ can be represented as $\{\hat{p}(i)\}_{i=1}^{N}$, forming an image-block set. The final SR image $\hat{x}$ is recovered from $\{\hat{p}(i)\}_{i=1}^{N}$; the aggregate of the back-projected blocks is defined as:

$$\tilde{x} = \sum_{i=1}^{N} R(i)^T\, \hat{p}(i);$$

Considering that overlapping elements of $\tilde{x}$ are superimposed repeatedly, a weight vector is needed to average the overlapping element values:

$$w = \sum_{i=1}^{N} R(i)^T\, \mathbf{1};$$

where $\mathbf{1}$ is a constant column vector whose elements are all 1. Finally, the SR image $\hat{x}$ can be calculated by element-wise division:

$$\hat{x} = \tilde{x} \oslash w.$$
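A short NumPy sketch of this overlap averaging, assuming block positions are tracked explicitly: each predicted block is accumulated through the adjoint of its extraction operator and every pixel is divided by its overlap count, mirroring the weight vector above. All sizes are illustrative.

```python
import numpy as np

def aggregate(blocks, positions, shape):
    """Sum predicted p x p blocks back at their source positions (R(i)^T) and
    divide each pixel by its overlap count, i.e. the weight vector above."""
    acc, weight = np.zeros(shape), np.zeros(shape)
    p = blocks.shape[-1]
    for blk, (i, j) in zip(blocks, positions):
        acc[i:i + p, j:j + p] += blk
        weight[i:i + p, j:j + p] += 1.0   # R(i)^T applied to an all-ones vector
    return acc / np.maximum(weight, 1e-8) # element-wise averaging of overlaps

blocks = np.random.rand(4, 7, 7)
positions = [(0, 0), (0, 3), (3, 0), (3, 3)]
print(aggregate(blocks, positions, (10, 10)).shape)  # (10, 10)
```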
another object of the present invention is to provide an image super-resolution system applying the method of image super-resolution, the system comprising:
The 3DCNN basic model construction module is used for comparing the traditional super-resolution model, designing a 3DCNN basic model based on an 8-layer full convolution network, and using non-local similarity for the third-dimensional information of the model;
the image non-local feature extraction module is used for extracting the non-local features of the image according to the hybrid residual unit and the residual network;
the dimension reduction processing module is used for reducing the dimension of a part of data set through PCA processing, simultaneously removing noise and redundant information, and simultaneously keeping the original information of the image, and preprocessing for block matching;
The non-local similar image block extraction module is used for extracting the non-local similar image blocks from the LR image data set by adopting a traditional block matching method;
The non-local similar image block characterization module is used for stacking the image blocks to form a three-dimensional image block set as a characterization of the non-local image blocks;
and the SR image output module is used for outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
It is a further object of the present invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
Comparing the traditional super-resolution model, designing a 3DCNN basic model based on an 8-layer full convolution network, wherein the third-dimensional information of the model uses non-local similarity;
extracting the non-local features of the image according to the hybrid residual unit and the residual network;
reducing the dimension of a part of data set through PCA processing, simultaneously removing noise and redundant information, and simultaneously keeping the original information of an image, and preprocessing for block matching;
Extracting non-local similar image blocks from an LR image data set by a traditional block-matching method, and stacking the image blocks into a three-dimensional image-block set that characterizes the non-local image blocks;
And outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
Another object of the present invention is to provide a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
Comparing the traditional super-resolution model, designing a 3DCNN basic model based on an 8-layer full convolution network, wherein the third-dimensional information of the model uses non-local similarity;
extracting the non-local features of the image according to the hybrid residual unit and the residual network;
reducing the dimension of a part of data set through PCA processing, simultaneously removing noise and redundant information, and simultaneously keeping the original information of an image, and preprocessing for block matching;
Extracting non-local similar image blocks from an LR image data set by a traditional block-matching method, and stacking the image blocks into a three-dimensional image-block set that characterizes the non-local image blocks;
And outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
Another object of the present invention is to provide an information data processing terminal for implementing the above-mentioned image super-resolution system.
Combining all the above technical schemes, the advantages and positive effects of the invention are as follows: aiming at the insufficient extraction of non-local similar image information in CNN models, the proposed image super-resolution method carries out intensive research and develops the following two works from different angles. From the data level, a non-local SR method based on block matching and a 3D convolutional neural network is provided: block matching extracts non-local similar image blocks from the two-dimensional image and forms a three-dimensional image-block set; on this basis, a 3D convolutional neural network is constructed and trained to extract local and non-local similar information and to learn the mapping relation between the LR and HR image-block sets; finally, the HR image is reconstructed from the predicted image-block set. From the network-structure level, an image SR model based on a non-local neural network is proposed: the existing CNN-based non-local operation is reworked and combined with the traditional CNN structure into a hybrid residual unit; with the hybrid residual unit as the recurrent unit, a recurrent network extracts the local and non-local information of the image in LR space; finally, an upsampling network converts the features into HR space and realizes the reconstruction of the HR image. Experimental results show that the non-local operation effectively captures the non-local similar information in the image and improves SR reconstruction performance. Compared with existing CNN models, the proposed non-local residual network has clear reconstruction advantages and excels in image scenes rich in structural information.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for image super resolution according to an embodiment of the present invention.
FIG. 2 is a block diagram of a system architecture for super resolution of an image provided by an embodiment of the present invention;
In the figure: 1. a 3DCNN basic model building module; 2. an image non-local characteristic extraction module; 3. the dimension reduction processing module; 4. a non-local similar image block extraction module; 5. a non-local similar image block characterization module; 6. and an SR image output module.
Fig. 3 is a schematic diagram of a network model according to an embodiment of the present invention.
Fig. 4 is a comparison of the operation of a 2D convolution and a 3D convolution provided by an embodiment of the present invention.
Fig. 4 (a) is a schematic diagram of a 2D convolution provided by an embodiment of the present disclosure.
Fig. 4 (b) is a schematic diagram of a 3D convolution provided by an embodiment of the present disclosure.
Fig. 5 is a schematic comparison of results for different ratios r between the PCA-processed and non-PCA-processed parts of the data set against the original-data result.
FIG. 6 is a comparison of data sets made by different methods provided by embodiments of the present invention.
Fig. 7 is a schematic diagram of the models with and without the introduced residual network provided by an embodiment of the present invention.
Fig. 8 (a) is the original image (PSNR/SSIM) provided by an embodiment of the present invention.
FIG. 8 (b) is a schematic diagram of Bicubic (29.55/0.8432) provided by an embodiment of the present invention.
FIG. 8 (c) is a schematic diagram of ScSR (30.77/0.8749) provided by an embodiment of the present invention.
FIG. 8 (d) is a schematic diagram of SelfExSR (31.18/0.8859) provided by an embodiment of the present invention.
FIG. 8 (e) is a schematic diagram of SRCNN (31.36/0.8882) provided by an embodiment of the present invention.
FIG. 8 (f) is a schematic diagram of FSRCNN (31.50/0.8909) provided by an embodiment of the present invention.
FIG. 8 (g) is a schematic diagram of the proposed method without PCA (32.89/0.9106) provided by an embodiment of the present invention.
FIG. 8 (h) is a schematic diagram of the proposed method with PCA (33.01/0.9109) provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Aiming at the problems existing in the prior art, the invention provides a method and a system for super resolution of an image, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the method for image super-resolution provided by the embodiment of the invention comprises the following steps:
S101, comparing a traditional super-resolution model, designing a 3DCNN basic model based on an 8-layer full convolution network, wherein the third-dimensional information of the model uses non-local similarity;
S102, extracting the non-local features of the image according to the hybrid residual unit and the residual network;
S103, performing PCA processing on part of the data set to reduce the dimension, removing noise and redundant information, and simultaneously retaining original information of the image, and preprocessing for block matching;
S104, extracting non-local similar image blocks from the LR image data set by a traditional block-matching method, and stacking the image blocks into a three-dimensional image-block set that characterizes the non-local image blocks;
S105, outputting the trained image-block set from the 3DCNN and outputting the reconstructed SR image.
As shown in fig. 2, the system for super resolution of an image provided by the embodiment of the invention includes:
The 3DCNN basic model building module 1 is used for comparing the traditional super-resolution model, designing a 3DCNN basic model based on an 8-layer full convolution network, and using non-local similarity for the third-dimensional information of the model;
The image non-local feature extraction module 2 is used for extracting the non-local features of the image according to the hybrid residual unit and the residual network;
the dimension reduction processing module 3 is used for reducing the dimension of a part of data set through PCA processing, and simultaneously removing noise and redundant information while retaining the original information of the image, and preprocessing for block matching;
A non-local similar image block extraction module 4, configured to extract a non-local similar image block from the LR image dataset by using a conventional block matching method;
A non-local similar image block characterization module 5, configured to stack image blocks to form a three-dimensional image block set as a characterization of the non-local image blocks;
And the SR image output module 6 is used for outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
The technical scheme of the invention is further described below by combining the embodiments.
Example 1
Aiming at the insufficient extraction of non-local similar image information in CNN models, the invention carries out intensive research and develops the following two works from different angles. From the data level, a non-local SR method based on block matching and a 3D convolutional neural network is provided: block matching extracts non-local similar image blocks from the two-dimensional image and forms a three-dimensional image-block set; on this basis, a 3D convolutional neural network is constructed and trained to extract local and non-local similar information and to learn the mapping relation between the LR and HR image-block sets; finally, the HR image is reconstructed from the predicted image-block set. From the network-structure level, an image SR model based on a non-local neural network is proposed: the existing CNN-based non-local operation is reworked and combined with the traditional CNN structure into a hybrid residual unit; with the hybrid residual unit as the recurrent unit, a recurrent network extracts the local and non-local information of the image in LR space; finally, an upsampling network converts the features into HR space and realizes the reconstruction of the HR image. Experimental results show that the non-local operation effectively captures the non-local similar information in the image and improves SR reconstruction performance. Compared with existing CNN models, the proposed non-local residual network has clear reconstruction advantages and excels in image scenes rich in structural information.
Aiming at the deficiency of CNN reconstruction models in capturing the non-local self-similarity information of an image, the invention provides a non-local image super-resolution model based on block matching and a 3D convolutional neural network, and an image super-resolution model based on a non-local convolutional neural network.
First, select part of the data in the data set and reduce its dimension with principal component analysis (PCA), removing the redundant portion of the image information, keeping the similar features of the images, and removing image noise to ensure non-local similarity. Meanwhile, keep the original data of the remaining part of the data set as the other portion, so that the original information and local information of the image are not lost.
Second, extract non-local similar image blocks from the LR image data set according to the formulas and characterize them: the traditional block-matching method is adopted to extract the non-local similar blocks, and finally the image blocks are stacked into a three-dimensional image-block set that characterizes the non-local image blocks.
Third, design a 3D convolutional neural network to extract the image-block-set information and enhance these non-local image blocks. As shown in fig. 3, the constructed 3D network includes three parts: 1) a single 3D convolution layer combined with a ReLU, transforming the input image-block set into feature space; 2) a feature subnetwork for extracting local and non-local information; 3) a single 3D convolution layer outputting the set of residual image blocks. The output of the designed 3D network is the sum of the residual image-block set and the input image-block set.
And fourthly, outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
Example 2
The super-resolution method provided by the embodiment of the invention comprises: combining the non-local self-similarity of images, a 3D convolutional neural network (3DCNN) is used for the first time to process image SR, and a non-local super-resolution method based on the 3DCNN is provided. The method directly models non-local similarity with the 3DCNN and extracts the non-local similarity information of natural images. A 3DCNN base model (Basemodel) is constructed based on an 8-layer fully convolutional network. On this basis, the 3D network design within the 3DCNN is further studied, and an RNN-based improved model is provided, so that the base model becomes a special case of the improved model.
A network model schematic diagram provided by the embodiment of the invention is shown in FIG. 3.
The super-resolution method provided by the embodiment of the invention comprises the following steps:
Step one, most CNN-based SR methods rely on traditional network structures and do not make full use of the non-local self-similarity property; in contrast to traditional super-resolution models, a 3DCNN base model based on an 8-layer fully convolutional network is designed, whose third-dimension information carries the non-local similarity property.
And step two, extracting the non-local features of the image according to the hybrid residual unit and the residual network.
And step three, first reducing the dimension of part of the data set by PCA processing; dimension reduction makes the data set easier to use and reduces the computational cost of the algorithm, while removing noise and redundant information and retaining the original information of the image, as preprocessing for the subsequent block matching.
And step four, extracting non-local similar image blocks from the LR image data set and characterizing them: the traditional block-matching method is adopted to extract the non-local similar blocks, and finally the image blocks are stacked into a three-dimensional image-block set that characterizes the non-local image blocks.
And fifthly, outputting the trained image block set in the 3DCNN, and outputting the reconstructed SR image.
The third step comprises the following steps:
(1) Introduce PCA to reduce the dimension of part of the data set: select part of the data in the data set, reduce its dimension with principal component analysis (PCA), remove the redundant portion of the image information, keep the similar features of the images, and remove image noise to ensure non-local similarity. Meanwhile, keep the original data of the remaining part of the data set as the other portion, so that the original information and local information of the image are not lost.
For an $n \times m$ matrix of image samples ($n$ samples, each a row of $m$ dimensions; samples lacking $m$ dimensions are brought to $m$ dimensions by interpolation), the real matrix can be decomposed as:

$$X = U \Sigma V^T;$$

where the orthogonal matrix $U$ has dimension $n \times m$, the orthogonal matrix $V$ has dimension $m \times m$ (the orthogonal matrices satisfy $U U^T = V^T V = I$), and $\Sigma$ is an $m \times m$ diagonal matrix. Next, the first $r$ columns of $\Sigma$ are taken, denoted $\Sigma_r$; the reduced-dimension data sample $Y_r$ can be obtained from $U$ and $\Sigma_r$:

$$Y_r = U \Sigma_r;$$
A traditional BM algorithm is then used to extract the non-local blocks of the LR image. First, the LR image from the data sample $Y_r$ is broken into a set of image blocks of size $p \times p$, $\Omega_p = \{p(i)\}_{i=1}^{N}$, where $N = (d_h - p + 1) \times (d_w - p + 1)$ is the number of blocks into which the whole image is decomposed. The $i$-th target image block $p(i)$ in $\Omega_p$ can be calculated as:

$$p(i) = R(i)\, y_{bic};$$

where $R(i) \in \mathbb{R}^{p^2 \times d_h d_w}$ is a binary sparse matrix that extracts the $i$-th image block.
2) The BM algorithm then uses an $s \times s$ search window and the Euclidean distance to collect a series of similar blocks. Finally, the blocks collected by each window are stacked together as the LR image set $Y$ required for network training. For each element of the image set $Y$ consisting of $K$ image blocks, in order to make full use of the non-local and local properties of the image, $a$ blocks are selected from the reduced-dimension data set $Y_r$ and $b$ blocks from the original data set, where $a + b = K$. The generated corresponding HR image $x$ is processed in the same way to form the HR image-block set $X$. Meanwhile, similar image blocks in the HR image are collected at the non-local similar positions found in the interpolated image, rather than by recomputing the distance search. The 3DCNN represents non-local similar image blocks as image-block sets, which preserve both the local and the non-local similar information of the image.
The first step comprises the following steps:
Convolution usually refers to 2D convolution, which is good at processing two-dimensional data. For three-dimensional data (such as video), however, 3D convolution processes the data more effectively. In the proposed method, the first two dimensions of the three-dimensional image-block set hold the 2D local information of the image, while the third dimension holds non-local information. For such three-dimensional data, the more powerful 3D convolution obtains the SR image more effectively.
2D convolution extracts features in a local neighborhood of the previous layer's feature maps. 3D convolution differs in that the input image has one more depth dimension and the convolution kernel one more dimension $K_d$, so the size of a 3D convolution kernel is $(K_h, K_w, K_d)$. At each step, correlating the sliding window with the values inside it produces one value of the output 3D image. The kernel depth is smaller than the input-layer depth (kernel size < channel size); the 3D convolution kernel can move in all three directions (the height, width, and channel of the image), and at each position the element-wise multiplication and addition yields one value, which for such three-dimensional data strengthens the ability to capture data information. The 2D convolution can be expressed by the formula:

$$v_{l,j}^{x,y} = f\Big(b_{l,j} + \sum_{i}\sum_{p=0}^{P_l-1}\sum_{q=0}^{Q_l-1} w_{l,j,i}^{p,q}\, v_{l-1,i}^{x+p,\,y+q}\Big)$$

where $v_{l,j}^{x,y}$ is the value at position $(x, y)$ of the $j$-th feature map of the current layer $l$, $i$ indexes the feature maps of the previous layer, $w_{l,j,i}^{p,q}$ is the kernel value at position $(p, q)$ for the connection to the $i$-th feature map, and $P_l$ and $Q_l$ are the sizes of the convolution kernel. Similar to the 2D convolution formula, the 3D convolution can be defined as:

$$v_{l,j}^{x,y,z} = f\Big(b_{l,j} + \sum_{i}\sum_{p=0}^{P_l-1}\sum_{q=0}^{Q_l-1}\sum_{r=0}^{R_l-1} w_{l,j,i}^{p,q,r}\, v_{l-1,i}^{x+p,\,y+q,\,z+r}\Big)$$

where $v_{l,j}^{x,y,z}$ is the value at 3D position $(x, y, z)$ of the $j$-th feature map of the current layer $l$, and $R_l$ is the scale of the 3D convolution kernel along the third dimension. A comparison of the 2D and 3D convolution operations is shown in fig. 4.
For a three-dimensional image-block set, 3D convolution computes a target pixel value as a jointly weighted average of local and non-local pixels within a small three-dimensional receptive field. As convolution layers are stacked, the expanding receptive field eventually covers the local and non-local information of the whole image-block set, achieving the effect of feature extraction.
Local and non-local information of the image-block set is captured by a fully convolutional 3D network built from 3D convolution layers (Conv3D for short). The whole network consists of a series of nonlinear units and 3D convolution layers; features are extracted by the convolution kernels, and a residual learning strategy (skip connections between the network input and output) is adopted to alleviate the gradient vanishing and explosion problems, so that more network layers can be stacked or each layer can extract more features. In general, convolution shrinks the data, so zero padding is applied to every convolution layer during training to keep the network input and output the same size. For the activation function the proposed method chooses the ReLU, so the 3D convolution and nonlinear operation of the $l$-th layer can be defined as:

$$H_l(H_{l-1}) = \max(0,\; W_l * H_{l-1} + b_l);$$

where $H_{l-1}$ denotes the output of the previous layer, i.e., the input of the current layer, and $W_l * H_{l-1}$ is the 3D convolution operation of the current layer. The purpose of the 3DCNN design is to learn a mapping $F(Y)$ from the LR image-block set $Y$ that is as close as possible to the HR image-block set $X$. The 3DCNN takes the LR block set as input (i.e., $H_0 = Y$), so the first-layer output of the network can be expressed as:

$$H_1 = H_1(Y) = \max(0,\; W_1 * Y + b_1);$$

Let the depth of the network be $D$, i.e., there are $D$ convolution layers. The base model of the 3DCNN contains 8 convolution layers in total, i.e., $D = 8$, so the output $F(Y)$ of the whole network can be calculated as:

$$F(Y) = H_8(H_7(\cdots H_1(Y))) + Y.$$
The fourth step comprises the following steps:
The final step of the proposed method is to reconstruct the SR image from the enhanced image blocks. The constructed 3DCNN outputs a series of enhanced image-block sets $\Omega_{\hat{Y}}$. After vectorization, all enhanced image blocks in $\Omega_{\hat{Y}}$ can be represented as $\{\hat{p}(i)\}_{i=1}^{N}$, forming an image-block set. The final SR image $\hat{x}$ is recovered from $\{\hat{p}(i)\}_{i=1}^{N}$; the aggregate of the back-projected blocks is defined as:

$$\tilde{x} = \sum_{i=1}^{N} R(i)^T\, \hat{p}(i);$$

Considering that overlapping elements of $\tilde{x}$ are superimposed repeatedly, a weight vector is needed to average the overlapping element values:

$$w = \sum_{i=1}^{N} R(i)^T\, \mathbf{1};$$

where $\mathbf{1}$ is a constant column vector whose elements are all 1. Finally, the SR image $\hat{x}$ can be calculated by element-wise division:

$$\hat{x} = \tilde{x} \oslash w.$$
Example 3
1. Model arrangement
1) Training set and test set
A common 291-image set was used, consisting of 91 images compiled by Yang et al. and 200 images from BSD; this data set is widely used for training SR models. The set obtained by PCA-processing the 291 images, together with the original, non-PCA-processed data, serves as the required LR image-block set. The common Set5, Set14, and BSD100 test sets are selected to evaluate the SR performance of the algorithm; their image contents are rich and varied, including human beings, animals, plants, natural landscapes, buildings, and so on. PSNR and SSIM are adopted as the objective indices for evaluating SR performance.
2) Training arrangement
In the designed 3D convolutional network, every convolution layer except the output layer uses 64 3D convolution kernels of size 3 × 3 × 3 to extract features. All models were developed and implemented on a laboratory PC based on the TensorFlow deep learning framework. The software environment is the Windows 10 operating system with Python 3.7 and TensorFlow 2.0; the hardware is an Intel Core i7-3770 CPU, an NVIDIA GTX 1080 Ti graphics card and 16 GB of memory.
3) Model arrangement
The weights of the convolution layers are initialized by the method proposed by He et al., which has been verified to combine well with the ReLU. The Adam optimizer is used to train all experimental models, with parameters set as β1 = 0.9, β2 = 0.999, ε = 10^-8. The initial learning rate is set to 3 × 10^-4 and is reduced by a factor of 10 every 30 training epochs. The mini-batch size during training is set to 512.
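These settings can be sketched in TensorFlow 2 as follows; the number of steps per epoch is an assumption that depends on the dataset size and the mini-batch size of 512:

```python
import tensorflow as tf

steps_per_epoch = 1000  # assumption: depends on dataset size / batch size 512
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=3e-4,
    decay_steps=30 * steps_per_epoch,  # drop every 30 training epochs
    decay_rate=0.1,                    # reduce by a factor of 10
    staircase=True)
optimizer = tf.keras.optimizers.Adam(
    learning_rate=lr_schedule, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
```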
On top of the basic model, a residual model is added. The residual unit mainly provides two functions, the shortcut connection and the identity mapping: the shortcut connection is what makes residual learning possible, while the identity mapping, through the activation functions and skip connections, allows the network to become deeper. The 3D network is trained end-to-end, with the loss function guiding the update of the model parameters.
Minimizing the l2 loss function is equivalent to maximizing the PSNR on the image SR task. However, work in recent years has shown that the l1 loss function has great potential. In the proposed method, the l1 model loss function is expressed as:
l(F(Y), X) = ||F(Y) − X||_1;
The method therefore selects the l1 loss function as the model loss function. The weight parameters are optimized by continuously training the model until the loss value stabilizes within the interval [0, 0.0005].
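A minimal sketch of this l1 loss in TensorFlow follows; averaging over the batch is the usual Keras convention rather than something stated in the text:

```python
import tensorflow as tf

def l1_loss(x_true: tf.Tensor, f_y: tf.Tensor) -> tf.Tensor:
    # l(F(Y), X) = ||F(Y) - X||_1, averaged over the mini-batch.
    return tf.reduce_mean(tf.abs(f_y - x_true))

# Usage with the optimizer defined above:
# model.compile(optimizer=optimizer, loss=l1_loss)
```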
2. Experimental comparison indices
The image reconstruction performance is judged by comparing the PSNR and SSIM of the images. For image reconstruction, the PSNR is determined by the maximum pixel value and the squared error of the image. Given an m × n pre-compression image x and a reconstructed image y after compression, the mean squared error and PSNR are defined as:
MSE = (1 / (m·n)) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [x(i, j) − y(i, j)]²;
PSNR = 10 · log10(MAX² / MSE);
where MAX is the maximum pixel value of the image.
The idea of SSIM is to measure the structural similarity between two images, rather than the pixel-wise differences measured by PSNR. The basic assumption is that the human eye is more sensitive to changes in image structure. The SSIM between x and the compressed reconstructed image y can be defined as:
SSIM(x, y) = [(2·u_x·u_y + c1)(2·σ_xy + c2)] / [(u_x² + u_y² + c1)(σ_x² + σ_y² + c2)];
where u_x and u_y represent the pixel means of the original image x and the compressed reconstructed image y, σ_x and σ_y represent the pixel standard deviations of x and y, σ_xy represents the covariance between x and y, and c1 and c2 are small constants that prevent instability.
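The two indices can be sketched in Python as follows for 8-bit images; the global single-window form of SSIM is used here for brevity (practical SSIM implementations average the formula over local windows), and the constants c1 and c2 use the commonly chosen values, which the text does not specify:

```python
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # common constants
    ux, uy = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # sigma_x^2, sigma_y^2
    sxy = ((x - ux) * (y - uy)).mean()        # covariance between x and y
    return ((2 * ux * uy + c1) * (2 * sxy + c2) /
            ((ux ** 2 + uy ** 2 + c1) * (vx + vy + c2)))
```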
3. Experimental analysis
1) Non-local block set
First, the 291-image set is decomposed with a stride of 100 into 2750 image blocks of size 100 × 100, which are divided into 2 parts according to a proportion r (0 < r < 1). One part undergoes PCA dimension-reduction processing before similar-feature blocks are extracted, so that the non-local information is not contaminated by noise; the other part is left unprocessed by PCA, so that the local information of the image blocks is not lost. Similar blocks are collected from both parts with the KNN algorithm, as shown in the sketch below. Finally, the two partial block sets are combined, and 2.75 million image block sets are taken as the training set of the model. Accordingly, the following parameters need to be considered: the window size s of the KNN search, the size p of the decomposed image blocks, the non-local scale k, and the proportion r of dimension-reduced data to selected data. Theoretically, the larger the search window s, the more accurate the search, and the larger the non-local scale k, the more information is extracted. However, increasing these parameters is costly for the 3DCNN computation; based on prior experience, a window size of s = 31 is most suitable. The remaining parameters p and k are then determined experimentally.
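A minimal NumPy sketch of the KNN block search inside one window follows; the brute-force candidate scan and the inclusion of the target block among its own k neighbours are our assumptions about the implementation:

```python
import numpy as np

def collect_block_set(img, i, j, p=9, k=9, s=31):
    """Stack the k patches most similar to the one at (i, j) into (p, p, k)."""
    target = img[i:i + p, j:j + p]
    half = s // 2
    candidates, dists = [], []
    # Scan every candidate top-left corner inside the s x s search window.
    for u in range(max(0, i - half), min(img.shape[0] - p, i + half) + 1):
        for v in range(max(0, j - half), min(img.shape[1] - p, j + half) + 1):
            patch = img[u:u + p, v:v + p]
            candidates.append(patch)
            dists.append(np.sum((patch - target) ** 2))  # Euclidean distance
    order = np.argsort(dists)[:k]  # k nearest (the target itself is included)
    return np.stack([candidates[t] for t in order], axis=-1)
```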
The experimental model is the base model discussed above.
Table 1 shows the PSNR test results on Set5 and Set14 at 3× SR for different image block sizes p and non-local scales k; considering the computational cost, three comparison values (5, 7, 9) were chosen for each parameter to determine appropriate settings.
Table 1 image block set and performance table
The results in Table 1 show that a larger block size p brings higher PSNR values and that increasing the non-local scale k improves SR performance. The invention finally selects the setting p = 9, k = 9, i.e., non-local image block sets of size 9 × 9 × 9 are used in the subsequent experiments.
To further verify the advantage of selecting non-local image blocks after introducing PCA dimension reduction, several groups of comparison experiments were designed, comparing different proportions r of dimension-reduced data to selected data, with r = 0, 0.1, 0.2, 0.3, 0.4, 0.5. Here r denotes the proportion of the dataset that undergoes PCA processing; the comparison demonstrates that the non-local similarity blocks of an image are better extracted when part of the data is PCA-processed. Experiments on the verification set show that the optimal reduced dimension is reached where the benefit of removing redundant information just outweighs the information lost to dimension reduction. The experiment uses PCA to remove redundancy from the dataset: PCA linearly maps high-dimensional data to a low-dimensional space while making the variance of the low-dimensional data as large as possible, so the dimension can be reduced effectively while the relationships between the original data points are preserved. Based on this principle, the relationship between the reduced dimension and the precision is obtained by accumulating the variance proportion of each principal component.
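This PCA step can be sketched via the SVD as follows; flattening the blocks to rows and reading the retained variance ratio off the singular values follow standard PCA practice rather than details stated here:

```python
import numpy as np

def pca_reduce(patches: np.ndarray, dim: int = 32):
    """patches: (n, m) matrix of flattened image blocks, one block per row."""
    centered = patches - patches.mean(axis=0)
    u, sigma, vt = np.linalg.svd(centered, full_matrices=False)  # X = U S V^T
    reduced = centered @ vt[:dim].T  # Y_r = U * Sigma_r: top-dim projection
    var_ratio = float((sigma[:dim] ** 2).sum() / (sigma ** 2).sum())
    return reduced, var_ratio  # variance proportion kept by the first dim PCs
```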
Table 2. PCA similarity precision for different reduced dimensions

Dimension | 128 | 64 | 32 | 16 | 8
---|---|---|---|---|---
Precision | 0.9133 | 0.9141 | 0.9237 | 0.9049 | 0.8188
As can be seen from Table 2, a dimension that is too large does not help remove the redundant information of the image, while a dimension that is too small causes PCA to lose more information than the redundancy it removes; therefore dimension 32 is selected as the reduced dimension in the following experiments.
2) Experimental comparison of the proportion r of PCA-processed to unprocessed images in the dataset
After determining the reduced dimension, the next experiments verify which proportion of PCA-processed image data gives the best effect; in the 3DCNN, non-local similarity cannot be the only consideration, and local information also needs to be extracted as far as possible. The data selected for PCA processing is chosen at random to keep the experiment unbiased, and several paired experiments with repeated random selections are run and their results averaged, so that the proportion r of PCA-processed to unprocessed data is established experimentally.
As can be seen from Fig. 5, the experimental result is best when the proportion selected for PCA processing is 0.15: denoising is achieved while the non-local information of the image is effectively preserved. As the selected portion grows larger, the experimental effect worsens because the local information of the image is lost more severely. A PCA-processed proportion of 0.15 is therefore selected for the next experiments.
3) Comparison of different dataset construction methods
Next, to demonstrate the effectiveness of the non-local similarity design in the 3DCNN and of the selected PCA processing, comparison experiments were performed. The training set was rebuilt in three ways: randomly selected image blocks, similar image blocks found by KNN without PCA processing, and similar image blocks found by KNN after PCA processing; the three variants are then compared.
As can be seen from Fig. 6, the randomly scattered dataset is clearly inferior to the processed datasets at extracting the non-local information of the image. With the same KNN algorithm, the block selection processed by PCA outperforms the unprocessed one when the number of training rounds is moderate, and remains marginally better when the number of training rounds is large. This shows that the image features of the PCA-processed dataset can be extracted effectively, with the interference of redundancy reduced. As the training rounds gradually increase, the model can extract enough non-local information even from the dataset without PCA processing, so the advantage of the PCA-processed portion becomes relatively less obvious.
4) Comparison after introducing the residual block
To demonstrate the influence of the residual learning strategy, the skip connection between the input and output of the basic model is removed, and this second model is retrained in the same way.
In the basic model, a global residual learning strategy is adopted, i.e., the 3D network learns a residual image block set. The PSNR comparison curves of the 2 models in Fig. 7 show that, in the proposed method, the residual learning strategy achieves faster and more stable model convergence and higher reconstruction performance.
5) Image self-fusion
The final step is to reconstruct the SR image from the enhanced image blocks. One image is rotated at multiple angles to obtain a group of images, the group is fed into the model, and the median of the results is taken as the final super-resolution output. This further strengthens the fusion of information from each part of the image, so the image characterization achieves a better effect.
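A sketch of this self-fusion follows, assuming the rotations are the four multiples of 90 degrees (the text does not fix the angles) and that sr_model maps an image to its SR estimate:

```python
import numpy as np

def self_fusion(img: np.ndarray, sr_model) -> np.ndarray:
    outputs = []
    for k in range(4):  # 0, 90, 180 and 270 degree rotations
        rotated = np.rot90(img, k)
        restored = np.rot90(sr_model(rotated), -k)  # undo the rotation
        outputs.append(restored)
    # Per-pixel median over the group of restored images.
    return np.median(np.stack(outputs, axis=0), axis=0)
```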
4. Reconstruction performance comparison
In this section, the experimental models of the invention are compared with existing SR algorithms, including ScSR based on dictionary learning, SelfExSR based on internal example self-similarity, and SRCNN and FSRCNN based on CNNs. Tables 3 and 4 summarize the PSNR and SSIM performance of the proposed method and the comparison algorithms on the Set5 and BSD100 test sets, respectively. The tables show that the experimental model outperforms the other comparison SR models; the model with PCA dimension-reduction processing is slightly better in PSNR than the model without it, and slightly weaker in SSIM. This indicates that some information about the overall structure of the image is lost after PCA processing, but that PCA helps performance through non-local denoising. BSD100, whose image types are relatively rich and whose structures are complex, is the more representative and convincing test set. In the quantitative evaluation on the BSD100 test set, the PSNR improvement varies considerably across images; the images that improve most contain rich texture information and exhibit strong structural similarity. This performance improvement directly shows that the proposed method has an advantage in mining the structural-similarity information of images.
TABLE 3 PSNR/SSIM evaluation results of different algorithms on Set5
TABLE 4 PSNR/SSIM evaluation results of different algorithms on BSD100
Next, Fig. 8 shows reconstruction results of the proposed method and other SR algorithms. The pictures are selected from the BSD100 test set, and the method provided by the invention visually shows a more vivid structural outline. In animal images in particular, the proposed method reconstructs clearer animals, mainly because it fully exploits the very strong non-local similarity present in such images. Through the extraction and enhancement of this similar information, the proposed method shows more outstanding SR reconstruction performance for these scenes.
In the above embodiments, the invention may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may take the form, in whole or in part, of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center containing an integration of one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
The foregoing is merely illustrative of specific embodiments of the present invention, and the protection scope of the invention is not limited thereto; any modification, equivalent replacement or improvement made by those skilled in the art within the spirit and principles of the present invention falls within the protection scope of the present invention.
Claims (7)
1. A method of image super resolution, the method of image super resolution comprising: combining with the non-local self-similarity of the images, utilizing a 3D convolutional neural network 3DCNN to process the image SR, and providing a non-local super-resolution method based on the 3 DCNN; modeling non-local similarity by directly utilizing 3DCNN, and extracting non-local similarity information of natural images; constructing a 3DCNN basic model based on an 8-layer full convolution network; designing a 3D convolutional neural network in a 3DCNN, and providing an improved model based on the RNN;
The 3D convolutional neural network comprises three parts, namely:
1) A single layer 3D convolution layer and ReLU combination to transform the input image block set into a feature space;
2) A feature subnetwork for extracting local and non-local information;
3) A single layer 3D convolution layer for outputting a set of residual image blocks; the 3D network output is the sum between the residual image block set and the input image block set;
the method for super-resolution of the image comprises the following steps:
step one, comparing a traditional super-resolution model, designing a 3DCNN basic model based on an 8-layer full convolution network, wherein the third-dimensional information of the model uses non-local similarity;
step two, extracting non-local characteristics of the image according to the mixed residual error unit and the residual error network;
step three, reducing the dimension of a part of data set through PCA processing, simultaneously removing noise and redundant information, and simultaneously keeping the original information of the image, and preprocessing for block matching;
Extracting non-local similar image blocks from the LR image data set by adopting a traditional block matching method, and stacking the image blocks to form a three-dimensional image block set serving as a characterization non-local image block;
step five, outputting a trained image block set from the 3DCNN, and outputting a reconstructed SR image;
In the first step, the 3DCNN base model based on the 8-layer full convolution network is designed by comparing the traditional super resolution model, and the third dimension information of the model uses non-local similarity property, which comprises the following steps:
The first two dimensions of the three-dimensional image block set hold the 2D local information of the image, while the third dimension holds the non-local information; for such three-dimensional data, the SR image can be obtained more effectively by adopting the more capable 3D convolution;
The 2D convolution extracts features in a local neighborhood of the previous layer's feature maps. 3D convolution differs from 2D convolution in that the input image has one more depth dimension and the convolution kernel one more dimension K_d, so the size of the 3D convolution kernel is (K_h, K_w, K_d); at each step the sliding window is correlated with the values inside the window to obtain one value of the output 3D image; wherein the convolution kernel depth is smaller than the input layer depth, i.e., kernel size < channel size; the 3D convolution kernel can move in all three directions (the height, width and channel of the image), and the element-wise multiply-accumulate yields one value at each position, which enhances the ability to capture information in three-dimensional data; wherein the 2D convolution can be expressed by the formula:
v_{l,j}^{x,y} = Σ_i Σ_{p=0}^{P_l−1} Σ_{q=0}^{Q_l−1} w_{l,j,i}^{p,q} · v_{l−1,i}^{x+p, y+q} + b_{l,j};
wherein v_{l,j}^{x,y} represents the value at position (x, y) of the j-th feature map of the current layer l, i indexes the feature maps of the previous layer, w_{l,j,i}^{p,q} is the kernel weight at position (p, q) of the connection to the i-th feature map, and P_l and Q_l are the sizes of the convolution kernels; the 3D convolution is defined as:
v_{l,j}^{x,y,z} = Σ_i Σ_{p=0}^{P_l−1} Σ_{q=0}^{Q_l−1} Σ_{r=0}^{R_l−1} w_{l,j,i}^{p,q,r} · v_{l−1,i}^{x+p, y+q, z+r} + b_{l,j};
wherein v_{l,j}^{x,y,z} represents the value at the (x, y, z) position of the j-th 3D feature map of the current layer l, and R_l represents the third-dimensional scale of the 3D convolution kernel;
For a three-dimensional image block set, 3D convolution means that in a small three-dimensional receptive field range, the calculated target pixel value is a joint weighted average of local and non-local pixel points; with the stacking of the convolution layers, the local and non-local information of the whole image block set is finally covered with the expansion of the receptive field, so that the effect of feature extraction is achieved;
The local and non-local information of the image block set is captured by a fully convolutional 3D network built from 3D convolution layers on the network design side; the whole network consists of a series of nonlinear units and 3D convolution layers, with features extracted by convolution kernels; a residual learning strategy, i.e. a skip connection between the network input and output, is adopted to relieve the gradient vanishing or explosion problem and allows more network layers to be stacked or each layer to extract more features; the data size is reduced after a convolution operation, and the consistency of the network input and output sizes is ensured by zero-padding each convolution layer during training; for the activation function, the proposed method chooses the ReLU, so the 3D convolution and nonlinear operation of the l-th layer can be defined as:
H_l(H_{l-1}) = max(0, W_l * H_{l-1} + b_l);
Wherein H_{l-1} represents the output of the previous layer, i.e., the input of the current layer; W_l * H_{l-1} is defined as the 3D convolution operation of the current layer; the purpose of the 3DCNN design is to learn the mapping F(Y) between the LR image block set Y and the HR image block set X so that F(Y) is as close as possible to X; the 3DCNN takes the LR image block set as input, i.e., H_0 = Y, so the first-layer output of the network can be expressed as:
H_1 = H_1(Y) = max(0, W_1 * Y + b_1);
The depth of the network is set to D, i.e., there are D convolution layers; the base model of the 3DCNN contains 8 convolution layers in total, i.e., D = 8; the output F(Y) of the entire network can be calculated as:
F(Y) = H_8(H_7(...H_1(Y))) + Y.
2. the method of super resolution of an image as claimed in claim 1, wherein in the third step, the step of subjecting the partial data set to PCA processing to reduce the dimension while retaining original information of the image while removing noise and redundant information includes:
(1) PCA is introduced to reduce the dimension of part of the dataset: part of the data in the dataset is selected and its dimension is reduced by principal component analysis (PCA), removing the redundant part of the image information, keeping the similar features of the image, and removing image noise to ensure non-local similarity; meanwhile, the original data of the remaining part of the dataset is retained as the other part, so that the original information and local information of the image are not lost;
For n × m image samples (n samples in total, each a row of m dimensions; samples that do not reach m dimensions are brought to m dimensions by interpolation), the real n × m matrix X can be decomposed as:
X = UΣV^T;
wherein the orthogonal matrix U has dimension n × m, the orthogonal matrix V has dimension m × m, and they satisfy U^T U = V^T V = I; Σ is an m × m diagonal matrix; taking the first r columns of Σ, denoted Σ_r, the dimension-reduced data sample Y_r is obtained as:
Y_r = UΣ_r;
(2) A BM algorithm is adopted to extract non-local blocks of the LR image; first, the LR image y_bic of the data sample Y_r is broken up into an image block set Ω_p of blocks of size p × p, wherein N = (d_h − p + 1) × (d_w − p + 1) represents the number of blocks of the whole d_h × d_w image; the i-th target image block p(i) in Ω_p can be calculated as:
p(i) = R(i) · y_bic;
wherein R(i) is a binary sparse matrix of size p² × d_h·d_w representing the extraction of the i-th image block;
(3) The BM algorithm uses a search window of size s × s and Euclidean distance to collect a series of similar blocks; finally, the blocks collected by each window are stacked together as the LR image set Y required for network training; for each element in the image set Y consisting of K image blocks, a data blocks are selected from the dimension-reduced dataset Y_r and b blocks from the original dataset, where a + b = K; the corresponding generated HR image x is processed in the same way to form an HR image block set; meanwhile, collecting similar image blocks in the HR image reuses the non-local similar positions found in the interpolated image instead of recalculating the distance search; the 3DCNN represents non-local similar image blocks in the form of image block sets, which can well preserve the local and non-local similar information in images.
3. The method of image super-resolution as claimed in claim 1, wherein in step four, the extracting non-local similar image blocks from the LR image data set using the conventional block matching method and stacking the image blocks to form a three-dimensional image block set as characterizing the non-local image blocks includes:
The constructed 3DCNN outputs a series of enhanced image block sets Ω_Ŷ; all enhanced image block sets in Ω_Ŷ can be vectorized and concatenated to form an image block set ŷ; the final SR image x̂ is restored from ŷ, and the accumulated image is defined as:
x̃ = Σ_i R(i)^T ŷ(i);
taking into account that pixels covered by overlapping blocks are repeatedly superimposed, a weight vector needs to be considered to average the overlapping element values:
W = Σ_i R(i)^T 1;
wherein 1 is a constant column vector with an element value of 1; finally, the SR image x̂ can be calculated as:
x̂ = x̃ / W, where the division is element-wise.
4. An image super-resolution system for implementing the method of image super-resolution as claimed in any one of claims 1 to 3, characterized in that the image super-resolution system comprises:
The 3DCNN basic model construction module is used for comparing the traditional super-resolution model, designing a 3DCNN basic model based on an 8-layer full convolution network, and using non-local similarity for the third-dimensional information of the model;
the image non-local characteristic extraction module is used for extracting the non-local characteristic of the image according to the mixed residual error unit and the residual error network;
the dimension reduction processing module is used for reducing the dimension of a part of data set through PCA processing, simultaneously removing noise and redundant information, and simultaneously keeping the original information of the image, and preprocessing for block matching;
The non-local similar image block extraction module is used for extracting the non-local similar image blocks from the LR image data set by adopting a traditional block matching method;
The non-local similar image block characterization module is used for stacking the image blocks to form a three-dimensional image block set as a characterization of the non-local image blocks;
and the SR image output module is used for outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
5. A computer device, characterized in that it comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the method of image super resolution according to any one of claims 1-3.
6. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the method of image super resolution of any one of claims 1 to 3.
7. An information data processing terminal, characterized in that the information data processing terminal is used for realizing the system of super resolution of an image as claimed in claim 4.