CN113191947A - Method and system for image super-resolution

Method and system for image super-resolution

Info

Publication number
CN113191947A
CN113191947A
Authority
CN
China
Prior art keywords
image
local
network
information
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110310875.1A
Other languages
Chinese (zh)
Other versions
CN113191947B (en)
Inventor
赵楠
肖明宇
陈南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110310875.1A
Publication of CN113191947A
Application granted
Publication of CN113191947B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053: Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4046: Scaling using neural networks
    • G06T 5/70: Denoising; Smoothing
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/02: Neural networks
    • G06N 3/045: Architecture, e.g. interconnection topology; Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 2207/10004: Image acquisition modality; Still image; Photographic image
    • G06T 2207/20081: Special algorithmic details; Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]


Abstract

The invention belongs to the technical field of image super-resolution and discloses a method and a system for image super-resolution. The method combines the non-local self-similarity of images with a 3D convolutional neural network (3DCNN) to process the image SR task, providing a 3DCNN-based non-local super-resolution method: non-local similarity is modeled directly with the 3DCNN to extract the non-local similarity information of natural images; a 3DCNN basic model is constructed on an 8-layer fully convolutional network; and the 3D network design within the 3DCNN is further refined with an improved model based on the RNN, of which the basic model becomes a special case. The proposed non-local operation effectively captures non-local similar information in the image and improves SR reconstruction performance; compared with existing CNN models, the method shows clear reconstruction advantages and stands out in image scenes rich in structural information.

Description

Method and system for image super-resolution
Technical Field
The invention belongs to the technical field of image super-resolution, and particularly relates to a method and a system for image super-resolution.
Background
Image super-resolution is an ill-posed problem in computer vision. Learning-based super-resolution has become the mainstream scheme over the past decade owing to its fast computation and excellent performance. Unlike interpolation- and reconstruction-based methods, data-driven super-resolution supports richer application scenarios and yields higher reconstructed image quality. Such methods learn the mapping relationship implied by LR and HR image pairs and then perform super-resolution reconstruction through this relationship. According to the learned object and the learning approach, learning-based super-resolution reconstruction can be divided into: methods based on manifold learning, learning methods based on over-complete dictionaries, learning methods based on K-nearest neighbors, methods based on example learning, and methods based on deep learning.
With the development of deep learning technology across the fields of computing, methods based on deep learning are maturing. Researchers have focused increasing attention on deep learning networks; in particular, the proposal and refinement of the convolutional neural network (CNN) has greatly improved the performance of deep learning in fields such as image classification, semantic segmentation, target detection, and image restoration. The CNN uses convolution kernels in place of the receptive fields of human vision, which reduces the amount of computation, effectively retains the features of the image, and processes images more efficiently. Its two most prominent ideas are the local receptive field and weight sharing, and through their combination the CNN gains advantages such as shift and scale invariance when extracting features. In the field of super-resolution reconstruction, CNN models exhibit excellent feature learning ability through network structures with these characteristics, all of which are necessary conditions for generating high-quality SR images.
In 2014, Dong et al. first proposed using CNNs for image super-resolution with the SRCNN algorithm, introducing the CNN into the SR field. The algorithm upsamples the LR image, feeds it into a network of depth 3 that combines feature extraction, nonlinear mapping, and image reconstruction, and learns the LR-HR mapping end to end, starting the wave of SR research. Dong et al. then proposed FSRCNN. Because SRCNN first interpolates the LR image and then super-resolves it, the network's computational load grows markedly; FSRCNN therefore trains directly on the LR image and reconstructs the image with a deconvolution layer at the end of the network. The network can thus take the small image as input without interpolation, and since the deconvolution layer sits at the tail of the network, the computational load drops sharply. FSRCNN also introduces 1×1 convolutions for dimension reduction and expansion and decomposes the 5×5 convolution kernel into two 3×3 kernels, further reducing computation; later networks widely adopt these techniques. ESPCN, which appeared afterwards, addressed two problems: upsampling the network input by interpolation increases the computation of the network, and deconvolution generates a large amount of computational redundancy, so noise is amplified along with the image and the quality of the reconstructed image suffers.
The main component of ESPCN is the sub-pixel convolution layer. The original low-resolution image is input to the network and passes through three convolutional layers, producing a feature map with r² channels (r is the image magnification factor) of the same size as the input image. The r² channels of each pixel of this feature map are then rearranged into an r × r region, corresponding to an r × r sub-block of the high-resolution image, so that the feature map of size r² × H × W is rearranged into a high-resolution image of size 1 × rH × rW. Although this transformation is called sub-pixel convolution, no convolution operation is actually performed; the interpolation needed to magnify the image from low to high resolution is implicitly contained in the preceding convolutional layers and learned automatically. Since the image size changes only in the last layer and all earlier convolutions operate on the low-resolution image, the efficiency is higher. The sub-pixel convolution layer is widely applied in later, more advanced SR models. VDSR, which appeared next, argued for modeling SR in HR space: an HR picture can be decomposed into high-frequency and low-frequency information, the input and output pictures share the same low-frequency information, whereas SRCNN must carry the input all the way to the end of the network. The concept is similar to auto-encoding, where modeling the residual shortens training time and accelerates convergence. VDSR introduces a residual network and keeps the feature map size unchanged by zero padding in each layer, greatly improving CNN performance. Kim et al. brought the cyclic learning idea of the recurrent neural network (RNN) into CNN design and proposed the DRCN model, further improving SR reconstruction performance through parameter sharing. An interpolated image is input into DRCN, which is divided into three modules: an embedding network, equivalent to feature extraction; an inference network, equivalent to the nonlinear mapping of features; and a reconstruction network, which restores the final reconstruction result from the feature maps. The inference network is recursive, i.e. data passes through the layer cyclically several times; unrolling this loop is equivalent to a cascade of convolutional layers sharing the same set of parameters. The authors of DRRN were subsequently inspired by ResNet, VDSR, and DRCN, using a deeper network structure to gain performance. Every residual unit in DRRN shares one identical input, namely the output of the first convolutional layer in the recursive block; each residual unit contains two convolutional layers, and within a recursive block the parameters of correspondingly positioned convolutional layers in each residual unit are shared.
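For readers unfamiliar with the rearrangement, the following is a minimal TensorFlow sketch of an ESPCN-style sub-pixel upsampling stage; it is an illustration under assumptions (the layer widths and activations are typical choices, not values taken from this patent):

```python
# Sketch of ESPCN-style sub-pixel upsampling; sizes are illustrative.
import tensorflow as tf

r = 3  # magnification factor

def espcn(x):
    # Convolutions in low-resolution space, as ESPCN does.
    h = tf.keras.layers.Conv2D(64, 5, padding="same", activation="tanh")(x)
    h = tf.keras.layers.Conv2D(32, 3, padding="same", activation="tanh")(h)
    # The last layer emits r^2 channels per pixel ...
    h = tf.keras.layers.Conv2D(r * r, 3, padding="same")(h)
    # ... which depth_to_space rearranges into an r x r spatial block,
    # turning an (H, W, r^2) map into an (rH, rW, 1) image.
    return tf.nn.depth_to_space(h, block_size=r)

lr = tf.keras.Input(shape=(None, None, 1))
model = tf.keras.Model(lr, espcn(lr))
```

The "convolution" in sub-pixel convolution is thus only a memory rearrangement; all learnable filtering happens in the low-resolution layers before it.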
The later LapSRN combines the traditional Laplacian pyramid algorithm to design a cascaded pyramid deep learning model, which mainly addresses the following three issues. First, some approaches use a predefined upsampling operation (e.g., bicubic) to bring the input image to the target spatial size before it enters the network, which adds extra computational overhead and produces visible reconstruction artifacts, while methods that replace the predefined upsampling with sub-pixel convolution or deconvolution layers have relatively simple network structures with poor performance and cannot learn the complex mapping from low-resolution to high-resolution images. Second, training the network with an L2-type loss function inevitably produces blurred predictions, and the recovered high-resolution picture is often too smooth. Third, when reconstructing high-resolution images with only one upsampling operation, it is difficult to reach a large upsampling factor (more than 8×), and different applications require training models with different upsampling factors.
Most SR algorithms based on the CNN (convolutional neural network) do not fully consider the non-local similarity of images, a property proven in traditional non-local methods to effectively improve the reconstruction performance of images. Meanwhile, only a few studies currently explore how to combine the non-local self-similarity of images with deep learning to develop SR algorithms with greater potential.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) Upsampling the network input by interpolation increases the computational load of the network, and using deconvolution generates a large amount of computational redundancy, so that noise is amplified along with the image and the quality of the reconstructed image suffers.
(2) The pyramid deep learning model requires a predefined upsampling operation to obtain the target spatial size, which adds extra computational overhead and also causes visible reconstruction artifacts.
(3) Some methods use sub-pixel convolution or deconvolution layers to replace the predefined upsampling operation; their network structures are relatively simple and perform poorly, and they cannot learn the complex mapping from low-resolution to high-resolution images.
(4) Training the network with an L2-type loss function inevitably produces blurred predictions, and the recovered high-resolution picture is often too smooth.
(5) When reconstructing high-resolution images with only one upsampling operation, it is difficult to reach large upsampling factors, and different applications require training models with different upsampling factors.
(6) Most CNN-based SR algorithms do not fully consider the non-local similarity of the image, a property proven in traditional non-local methods to effectively improve the reconstruction performance of images.
The difficulty in solving the above problems and defects is:
By introducing PCA (principal component analysis) dimension reduction, image denoising and image super-resolution are combined: non-local similar blocks of the image are extracted with a block-matching algorithm, an L1-type loss function is used during training, and end-to-end image reconstruction is realized with the 3DCNN. In this pipeline, the original information must be retained while the PCA dimension reduction is carried out, so it is critical to decide which data undergoes dimension reduction and which does not; a KNN algorithm is used in block matching to keep the complexity as low as possible; and finally a reasonable network structure must be designed for the 3DCNN.
The significance of solving the problems and the defects is as follows:
Experiments prove that both the basic model and the improved model of the method achieve effective performance improvements over existing methods. With the same parameter count as the basic model, the improved model achieves the best SR performance among the compared algorithms, providing a new idea in the field of image super-resolution.
Disclosure of Invention
The invention provides an image super-resolution method and system aiming at the defects of a CNN reconstruction model in capturing non-local self-similarity information of an image, in particular to a non-local image super-resolution method, system, medium, equipment and processing terminal based on block matching and a 3D convolution neural network.
The invention is realized in such a way that the method for image super-resolution comprises the following steps: processing the image SR with a 3D convolutional neural network (3DCNN) combined with the non-local self-similarity of the image, providing a 3DCNN-based non-local super-resolution method; directly modeling non-local similarity with the 3DCNN to extract the non-local similarity information of natural images; constructing a 3DCNN basic model based on an 8-layer fully convolutional network; and further designing the 3D network within the 3DCNN, providing an improved model based on the RNN, of which the basic model becomes a special case.
Further, the 3D convolutional neural network includes three parts, which are:
1) the single-layer 3D convolutional layer and ReLU combination is used for converting the input image block set into a feature space;
2) a feature subnetwork for extracting local and non-local information;
3) a single-layer 3D convolutional layer for outputting a residual image block set; the designed 3D network output is the sum between the set of residual image blocks and the set of input image blocks.
Further, the image super-resolution method comprises the following steps:
Step one, in contrast to traditional super-resolution models, designing a 3DCNN basic model based on an 8-layer fully convolutional network as the training model of the image, the third dimension of the model carrying the non-local similarity property;
Step two, extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
Step three, applying PCA (principal component analysis) processing to part of the data set to reduce its dimensionality, removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
Step four, extracting non-local similar image blocks from the LR image data set with the traditional block-matching method, and stacking the image blocks into a three-dimensional image block set as the representation of the non-local image blocks;
Step five, outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
Further, in the first step, in contrast to the traditional super-resolution model, a 3DCNN basic model based on an 8-layer fully convolutional network is designed, the third dimension of the model carrying the non-local similarity property, which includes:
The first two dimensions of the three-dimensional image block set hold the 2D local information of the image, while the third dimension holds the non-local information. For such three-dimensional data, the SR image is obtained more effectively by using the more powerful 3D convolution.
The 2D convolution extracts features in a local neighborhood of the previous layer's feature maps. 3D convolution differs in that the input has one more depth dimension and the convolution kernel one more dimension $K_d$, so the 3D convolution kernel has size $(K_h, K_w, K_d)$. Each correlation of the sliding window with the values inside it yields one value of the output 3D image. The depth of the convolution kernel is smaller than the depth of the input layer (kernel size < channel size); the 3D kernel can therefore move in all three directions of the image (height, width, and channel), and at each position the element-wise multiply-accumulate produces one value, which strengthens the ability to capture the information of such three-dimensional data. The 2D convolution can be expressed by the formula:

$$v_{l,j}^{x,y} = \sigma\Big(b_{l,j} + \sum_i \sum_{p=0}^{P_l-1} \sum_{q=0}^{Q_l-1} w_{l,j,i}^{p,q}\, v_{l-1,i}^{x+p,\,y+q}\Big)$$

where $v_{l,j}^{x,y}$ denotes the value at position (x, y) of the jth feature map of the current layer l, i indexes the feature maps of the previous layer, $w_{l,j,i}^{p,q}$ is the value at position (p, q) of the kernel connected to the ith feature map, and $P_l$ and $Q_l$ are the height and width of the convolution kernel. The 3D convolution is defined analogously as:

$$v_{l,j}^{x,y,z} = \sigma\Big(b_{l,j} + \sum_i \sum_{p=0}^{P_l-1} \sum_{q=0}^{Q_l-1} \sum_{r=0}^{R_l-1} w_{l,j,i}^{p,q,r}\, v_{l-1,i}^{x+p,\,y+q,\,z+r}\Big)$$

where $v_{l,j}^{x,y,z}$ denotes the value at 3D position (x, y, z) of the jth feature map of the current layer l, and $R_l$ is the size of the 3D convolution kernel along the third dimension.
For a three-dimensional image block set, 3D convolution means that within a small three-dimensional receptive field, the computed target pixel value is a joint weighted average of local and non-local pixels. As the convolutional layers are stacked, the expanding receptive field eventually covers the local and non-local information of the whole image block set, achieving the effect of feature extraction.
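As a concrete illustration of this joint weighting, the following sketch applies a single 3D convolution to one block-set tensor in TensorFlow (the framework used by the embodiments); the 9 × 9 × 9 block-set size and the 64 filters of size 3 × 3 × 3 follow the experimental settings reported later, and the rest is assumed:

```python
# Sketch: one 3D convolution over a non-local image block set.
import tensorflow as tf

# One sample: k = 9 non-local similar patches of 9 x 9 pixels stacked along
# the depth axis -> shape (batch, depth, height, width, channels).
block_set = tf.random.normal([1, 9, 9, 9, 1])

conv3d = tf.keras.layers.Conv3D(
    filters=64, kernel_size=(3, 3, 3), padding="same", activation="relu")

# Each output value jointly weights a 3 x 3 local window and the 3
# neighbouring patches in the stack, i.e. local plus non-local information.
features = conv3d(block_set)   # shape (1, 9, 9, 9, 64)
```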
In the network design, the local and non-local information of the image block set is captured by the 3D convolutional layers of a fully convolutional 3D network. The whole network consists of a series of nonlinear units and 3D convolutional layers; features are extracted by the convolution kernels, and a residual learning strategy, i.e. a skip connection between the network's input and output, is adopted to alleviate the problem of vanishing or exploding gradients, allowing more network layers to be stacked or more features to be extracted per layer. Since convolution normally shrinks the data, zero padding is applied to every convolutional layer during training to keep the sizes of the network input and output consistent. For the activation function, the proposed method chooses ReLU, so the 3D convolution and nonlinear operation of layer l can be defined as:
$$H_l(H_{l-1}) = \max(0,\ W_l * H_{l-1} + b_l)$$
where $H_{l-1}$ denotes the output of the previous layer, i.e. the input of the current layer, and $W_l * H_{l-1}$ denotes the 3D convolution operation of the current layer. The 3DCNN is designed so that the learned mapping F(Y) between the LR image block set Y and the HR image block set X is as close to X as possible. The 3DCNN takes the set of LR image blocks as input (i.e., $H_0 = Y$), so the first-layer output of the network can be expressed as:
$$H_1 = H_1(Y) = \max(0,\ W_1 * Y + b_1)$$
The depth of the network is set to D, i.e. there are D convolutional layers. The basic model of the 3DCNN contains 8 convolutional layers in total, i.e. D = 8. The output F(Y) of the entire network is calculated as:
$$F(Y) = H_8(H_7(\cdots H_1(Y))) + Y$$
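A minimal Keras sketch of this base model is given below; the 64 filters of size 3 × 3 × 3 in the hidden layers follow the experimental settings reported later, while the exact layer hyperparameters are otherwise assumptions:

```python
# Sketch of the 8-layer fully convolutional 3DCNN base model with the
# global residual connection F(Y) = H_8(H_7(...H_1(Y))) + Y.
import tensorflow as tf
from tensorflow.keras import layers

def build_base_model(depth=8, filters=64):
    y = tf.keras.Input(shape=(None, None, None, 1))  # LR image block set Y
    h = y
    # D - 1 Conv3D + ReLU layers; 'same' (zero) padding keeps the block-set
    # size unchanged, as the text requires.
    for _ in range(depth - 1):
        h = layers.Conv3D(filters, 3, padding="same", activation="relu")(h)
    # The last layer outputs the residual block set (no ReLU).
    residual = layers.Conv3D(1, 3, padding="same")(h)
    return tf.keras.Model(y, layers.Add()([residual, y]))

model = build_base_model()
```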
Further, in step three, applying PCA processing to part of the data set to reduce its dimensionality, removing noise and redundant information while retaining the original information of the image, includes:
(1) PCA is introduced to reduce the dimensionality of part of the data set: part of the data in the data set is selected and its dimensionality is reduced by principal component analysis (PCA), removing the redundant part of the image information, specifically obtaining the similar characteristics of the image and removing image noise to ensure non-local similarity. Meanwhile, the original data of the other part of the data set is retained, guaranteeing that the original information and local information of the image are not lost.
For n × m image samples, there are n samples, each row being m-dimensional (samples of fewer than m dimensions are interpolated up to m dimensions). The n × m real matrix X can be decomposed as:

$$X = U\Sigma V^T$$

where the orthogonal matrix U has dimension n × m, the orthogonal matrix V has dimension m × m, the orthogonality satisfies $UU^T = V^T V = I$, and Σ is an m × m diagonal matrix. Σ is truncated to its first r rows, denoted $\Sigma_r$; using U and V, the reduced-dimension data sample $Y_r$ is obtained as:

$$Y_r = U\Sigma_r$$
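A NumPy sketch of this SVD-based reduction, under the assumption that the rows of X are the n samples and that the r leading singular directions are kept, could read:

```python
# Sketch of the SVD-based PCA dimension reduction Y_r = U * Sigma_r.
import numpy as np

def pca_reduce(X, r):
    """X: (n, m) matrix of n samples.  Returns the samples projected onto
    the r leading singular directions."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U Sigma V^T
    return U[:, :r] @ np.diag(s[:r])                  # (n, r) reduced data

X = np.random.rand(100, 81)      # e.g. 100 vectorised 9 x 9 patches
Y_r = pca_reduce(X, r=16)        # keep 16 components
```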
(2) The BM (block matching) algorithm is employed to extract the non-local blocks of the LR image. First, the data sample $Y_r$ is broken into a set of image blocks of size p:

$$\Omega_p = \{p(i)\}_{i=1}^{N}$$

where $N = (d_h - p + 1) \times (d_w - p + 1)$ is the number of blocks into which the whole image is broken. The ith target image block p(i) of $\Omega_p$ can be calculated as:

$$p(i) = R(i)\, y_{bic}$$

where $R(i) \in \{0, 1\}^{p^2 \times d_h d_w}$ is a binary sparse matrix that extracts the ith image block.
(3) The BM algorithm collects a series of similar blocks using an s × s search window based on Euclidean distance. Finally, the blocks collected in each window are stacked together as the LR image set Y required for network training. For each element of the image set Y consisting of K image blocks, a blocks are selected from the reduced-dimension data set $Y_r$ and then b blocks from the original data set, with a + b = K. The corresponding generated HR image x is processed in the same way to form the HR image block set $\Omega_x = \{x(i)\}_{i=1}^{N}$.
Meanwhile, when collecting similar image blocks in the HR image, the non-local similar positions found in the interpolated image are reused rather than recomputing the distance search. By representing the non-local similar image blocks as such image block sets, the 3DCNN can well preserve the local and non-local similar information in the image.
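A sketch of the block-matching step under these definitions (p × p patches, an s × s search window, squared Euclidean distance, and the k closest patches stacked along a new depth axis; the function and parameter names are illustrative) might be:

```python
# Sketch of Euclidean-distance block matching around a target patch.
import numpy as np

def match_blocks(img, y0, x0, p=9, s=31, k=9):
    img = img.astype(np.float64)          # assume a single-channel image
    ref = img[y0:y0 + p, x0:x0 + p]       # target patch
    h, w = img.shape
    candidates = []
    for y in range(max(0, y0 - s // 2), min(h - p, y0 + s // 2) + 1):
        for x in range(max(0, x0 - s // 2), min(w - p, x0 + s // 2) + 1):
            patch = img[y:y + p, x:x + p]
            dist = np.sum((patch - ref) ** 2)   # squared Euclidean distance
            candidates.append((dist, (y, x), patch))
    candidates.sort(key=lambda c: c[0])
    # Stack the k most similar patches -> a (k, p, p) non-local block set.
    return np.stack([c[2] for c in candidates[:k]])
```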
Further, in step four, extracting non-local similar image blocks from the LR image data set with the traditional block-matching method and stacking the image blocks into a three-dimensional image block set as the representation of the non-local image blocks includes:
The constructed 3DCNN outputs a series of enhanced image block sets $\hat{\Omega}_Y$; all the enhanced image blocks can be expressed as $\{\hat{p}(i)\}_{i=1}^{N}$, forming a set of image blocks. The final SR image $\hat{x}$ is recovered by image reconstruction from $\{\hat{p}(i)\}_{i=1}^{N}$, defined as:

$$\tilde{x} = \sum_{i=1}^{N} R(i)^T\, \hat{p}(i)$$

Since the elements of $\{\hat{p}(i)\}$ are repeatedly superimposed, a weight vector is needed to average the overlapping element values:

$$w = \sum_{i=1}^{N} R(i)^T \mathbf{1}$$

where $\mathbf{1}$ is a constant column vector whose elements are all 1. Finally, the SR picture $\hat{x}$ can be calculated as:

$$\hat{x} = \tilde{x} \oslash w$$

with $\oslash$ denoting element-wise division.
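In NumPy, this aggregation can be sketched as follows: the action of $R(i)^T$ is realized by adding each enhanced patch back at its source position, and the weight vector becomes a per-pixel coverage count (all names are illustrative):

```python
# Sketch of overlapping-patch aggregation with per-pixel averaging.
import numpy as np

def aggregate(patches, positions, out_shape, p=9):
    acc = np.zeros(out_shape)      # accumulates sum_i R(i)^T p_hat(i)
    weight = np.zeros(out_shape)   # accumulates sum_i R(i)^T 1
    for patch, (y, x) in zip(patches, positions):
        acc[y:y + p, x:x + p] += patch
        weight[y:y + p, x:x + p] += 1.0
    # Element-wise division averages the repeatedly superimposed values.
    return acc / np.maximum(weight, 1.0)
```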
another object of the present invention is to provide an image super-resolution system applying the image super-resolution method, the image super-resolution system comprising:
the 3DCNN basic model building module, used for designing, in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network, the third dimension of the model carrying the non-local similarity property;
the image non-local characteristic extraction module, used for extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
the dimension reduction processing module, used for reducing the dimensionality of part of the data set via PCA (principal component analysis), removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
the non-local similar image block extraction module, used for extracting non-local similar image blocks from the LR image data set with the traditional block-matching method;
the non-local similar image block characterization module, used for stacking the image blocks to form a three-dimensional image block set characterizing the non-local image blocks;
and the SR image output module is used for outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
designing, in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network, the third dimension of the model carrying the non-local similarity property;
extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
applying PCA (principal component analysis) processing to part of the data set to reduce its dimensionality, removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
extracting non-local similar image blocks from the LR image data set with the traditional block-matching method, and stacking the image blocks into a three-dimensional image block set as the representation of the non-local image blocks;
and outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
designing, in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network, the third dimension of the model carrying the non-local similarity property;
extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
applying PCA (principal component analysis) processing to part of the data set to reduce its dimensionality, removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
extracting non-local similar image blocks from the LR image data set with the traditional block-matching method, and stacking the image blocks into a three-dimensional image block set as the representation of the non-local image blocks;
and outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
Another object of the present invention is to provide an information data processing terminal for implementing the system for image super-resolution.
Combining all the above technical schemes, the advantages and positive effects of the invention are as follows. The provided image super-resolution method studies in depth the insufficient extraction of non-local similar image information in CNN models and carries out two lines of work from different angles. At the data level, a non-local SR method based on block matching and a 3D convolutional neural network is provided: block matching extracts non-local similar image blocks from the two-dimensional image and forms a three-dimensional image block set; based on this set, a 3D convolutional neural network is constructed and trained to extract the local and non-local similar information within it and to learn the mapping relationship between the LR and HR image block sets; finally, the HR image is reconstructed from the predicted image block set. Starting from the network structure, an image SR model based on a non-local neural network is provided: the existing CNN-based non-local operation is reworked and combined with the traditional CNN structure into a mixed residual unit; with the mixed residual unit as the recurrent unit, a recurrent network is constructed to extract the local and non-local information of the image in LR space; finally, an upsampling network converts the features into HR space and realizes the reconstruction of the HR image. Experimental results show that the non-local operation effectively captures non-local similar information in the image and improves SR reconstruction performance. Compared with existing CNN models, the proposed non-local residual network shows clear reconstruction advantages and stands out in image scenes rich in structural information.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for super-resolution of an image according to an embodiment of the present invention.
FIG. 2 is a block diagram of a system for super-resolution of images according to an embodiment of the present invention;
In the figure: 1, 3DCNN basic model building module; 2, image non-local characteristic extraction module; 3, dimension reduction processing module; 4, non-local similar image block extraction module; 5, non-local similar image block characterization module; 6, SR image output module.
Fig. 3 is a schematic diagram of a network model according to an embodiment of the present invention.
Fig. 4 is a comparison diagram of operations of 2D convolution and 3D convolution according to an embodiment of the present invention.
Fig. 4(a) is a schematic diagram of a 2D convolution according to an embodiment of the present invention.
Fig. 4(b) is a schematic diagram of 3D convolution according to an embodiment of the present invention.
FIG. 5 is a schematic comparison of results for different ratios r between the PCA-processed part and the non-PCA part of the selected data set against the results on the original data.
FIG. 6 is a comparison of data sets generated by different methods provided by embodiments of the present invention.
Fig. 7 is a schematic diagram of the model with and without the residual introduced into the residual network according to an embodiment of the present invention.
Fig. 8(a) shows an original image (PSNR/SSIM) according to an embodiment of the present invention.
FIG. 8(b) is a schematic diagram of Bicubic (29.55/0.8432) provided by the embodiment of the present invention.
FIG. 8(c) is a schematic diagram of ScSR (30.77/0.8749) provided in the embodiment of the present invention.
FIG. 8(d) is a diagram of SelfExSR (31.18/0.8859) according to an embodiment of the present invention.
Fig. 8(e) is a schematic diagram of an srnnn (31.36/0.8882) provided in the embodiment of the present invention.
FIG. 8(f) is a schematic diagram of FSRCNN (31.50/0.8909) provided by an embodiment of the present invention.
FIG. 8(g) is a schematic diagram of the proposed method without PCA introduced (ours, 32.89/0.9106) provided by an embodiment of the present invention.
FIG. 8(h) is a schematic diagram of the proposed method with PCA introduced (ours, 33.01/0.9109) provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a method and a system for super-resolution of images, which are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the method for super-resolution of an image provided by an embodiment of the present invention includes the following steps:
s101, compared with a traditional super-resolution model, a 3DCNN basic model based on an 8-layer full convolution network is designed, and non-local similarity is used for third-dimensional information of the model;
s102, extracting non-local characteristics of the image according to the mixed residual error unit and the residual error network;
s103, carrying out PCA (principal component analysis) processing on part of the data set to reduce the dimension, simultaneously removing noise and redundant information, retaining original information of the image, and carrying out preprocessing for block matching;
s104, extracting non-local similar image blocks from an LR image data set by adopting a traditional block matching method, and stacking the image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and S105, outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
As shown in fig. 2, the system for super-resolution of images provided by the embodiment of the present invention includes:
the 3DCNN basic model building module 1, used for designing, in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network, the third dimension of the model carrying the non-local similarity property;
the image non-local characteristic extraction module 2, used for extracting the non-local characteristics of the image with the mixed residual unit and the residual network;
the dimension reduction processing module 3, used for reducing the dimensionality of part of the data set via PCA processing, removing noise and redundant information while retaining the original information of the image, as preprocessing for block matching;
the non-local similar image block extraction module 4, used for extracting non-local similar image blocks from the LR image data set with the traditional block-matching method;
the non-local similar image block characterization module 5, used for stacking the image blocks to form a three-dimensional image block set characterizing the non-local image blocks;
and the SR image output module 6 is used for outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
The technical solution of the present invention will be further described with reference to the following examples.
Example 1
Aiming at the problem of insufficient extraction of non-local similar image information in CNN models, the invention carries out in-depth research along two lines from different angles. At the data level, a non-local SR method based on block matching and a 3D convolutional neural network is proposed: block matching extracts non-local similar image blocks from the two-dimensional image and forms a three-dimensional image block set; based on this set, a 3D convolutional neural network is constructed and trained to extract the local and non-local similar information within it and to learn the mapping relationship between the LR and HR image block sets; finally, the HR image is reconstructed from the predicted image block set. Starting from the network structure, an image SR model based on a non-local neural network is proposed: the existing CNN-based non-local operation is modified and combined with the traditional CNN structure into a mixed residual unit; with the mixed residual unit as the recurrent unit, a recurrent network is constructed to extract the local and non-local information of the image in LR space; finally, an upsampling network converts the features into HR space and realizes the reconstruction of the HR image. Experimental results show that the non-local operation effectively captures non-local similar information in the image and improves SR reconstruction performance. Compared with existing CNN models, the proposed non-local residual network shows clear reconstruction advantages and stands out in image scenes rich in structural information.
Aiming at the defects of a CNN reconstruction model in capturing image non-local self-similarity information, the invention provides a non-local image super-resolution model based on a block matching and 3D convolutional neural network and an image super-resolution model based on a non-local convolutional neural network.
The method comprises, firstly, selecting part of the data in the data set and reducing its dimensionality with principal component analysis (PCA), removing the redundant part of the image information, specifically obtaining the similar characteristics of the image and removing image noise to ensure non-local similarity. Meanwhile, the original data of the other part of the data set is retained, guaranteeing that the original information and local information of the image are not lost.
Second, non-local similar image blocks are extracted from the LR image data set according to the formulas above and characterized. The extraction adopts the traditional block-matching method to find the non-local similar blocks, and finally the image blocks are stacked to form a three-dimensional image block set as the representation of the non-local image blocks.
And thirdly, designing a 3D convolutional neural network to extract the information of the image block set and enhance the non-local image blocks. As shown in fig. 3, the constructed 3D network includes three parts, respectively: 1) a single-layer 3D convolutional layer and ReLU combination for converting the input image block set into a feature space; 2) a feature subnetwork for extracting local and non-local information; 3) a single-layer 3D convolutional layer for outputting a residual image block set. The designed 3D network output is the sum between the set of residual image blocks and the set of input image blocks.
And fourthly, outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
Example 2
The super-resolution method provided by the embodiment of the invention comprises the following steps: combining the non-local self-similarity of the image, the image SR is processed for the first time with a 3D convolutional neural network (3DCNN), and a non-local super-resolution method based on the 3DCNN is provided. The method directly models non-local similarity with the 3DCNN and extracts the non-local similarity information of natural images. A 3DCNN base model based on an 8-layer fully convolutional network is constructed. On this basis, the 3D network design within the 3DCNN is studied further, and an improved model based on the RNN is provided, the basic model becoming a special case of the improved model.
A schematic diagram of a network model provided by the embodiment of the present invention is shown in fig. 3.
The super-resolution method provided by the embodiment of the invention comprises the following steps:
Firstly, most SR methods based on the CNN rest on traditional network structures and do not make full use of non-local self-similarity; in contrast to traditional super-resolution models, a 3DCNN basic model based on an 8-layer fully convolutional network is designed, the third dimension of the model carrying the non-local similarity.
And step two, extracting the non-local characteristics of the image according to the mixed residual error unit and the residual error network.
And thirdly, applying PCA (principal component analysis) processing to part of the data set to reduce its dimensionality. Dimension reduction makes the data set easier to use and reduces the computational overhead of the algorithm, while removing noise and redundant information and retaining the original information of the image, serving as preprocessing for the subsequent block matching.
And step four, extracting non-local similar image blocks from the LR image data set and characterizing the non-local similar image blocks. The extraction method adopts a traditional block matching method to extract non-local similar blocks, and finally, the image blocks are stacked to form a three-dimensional image block set as a representation of the non-local image blocks.
And step five, outputting the image block set after training in the 3DCNN, and outputting the reconstructed SR image.
The third step comprises the following steps:
(1) PCA is introduced to reduce the dimensionality of part of the data set: part of the data in the data set is selected and its dimensionality is reduced by principal component analysis (PCA), removing the redundant parts of the image information, specifically obtaining the similar characteristics of the image and removing image noise to ensure non-local similarity. Meanwhile, the original data of the other part of the data set is retained, guaranteeing that the original information and local information of the image are not lost.
For n × m image samples, there are n samples, each row being m-dimensional (samples of fewer than m dimensions are interpolated up to m dimensions). The n × m real matrix X can be decomposed as:

$$X = U\Sigma V^T$$

where the dimension of the orthogonal matrix U is n × m and the dimension of the orthogonal matrix V is m × m (the orthogonality satisfies $UU^T = V^T V = I$), and Σ is an m × m diagonal matrix. Next, Σ is truncated to its first r rows, denoted $\Sigma_r$; the reduced-dimension data sample $Y_r$ can be obtained using U and V:

$$Y_r = U\Sigma_r$$
A conventional BM algorithm is then employed to extract the non-local blocks of the LR image. First, the data sample $Y_r$ is broken into a set of image blocks of size p:

$$\Omega_p = \{p(i)\}_{i=1}^{N}$$

where $N = (d_h - p + 1) \times (d_w - p + 1)$ is the number of blocks into which the whole image is broken. The ith target image block p(i) of $\Omega_p$ can be calculated as:

$$p(i) = R(i)\, y_{bic}$$

where $R(i) \in \{0, 1\}^{p^2 \times d_h d_w}$ is a binary sparse matrix that extracts the ith image block.
2) The BM algorithm then collects a series of similar blocks using an s × s search window based on Euclidean distance. Finally, the blocks collected in each window are stacked together as the LR image set Y required for network training. For each element of the image set Y consisting of K image blocks, in order to make full use of the non-local and local characteristics of the image, a blocks are selected from the reduced-dimension data set $Y_r$ and then b blocks from the original data set, with a + b = K. The corresponding generated HR image x is processed in the same way to form the HR image block set $\Omega_x = \{x(i)\}_{i=1}^{N}$.
Meanwhile, when collecting similar image blocks in the HR image, the non-local similar positions found in the interpolated image are reused rather than recomputing the distance search. By representing the non-local similar image blocks as such image block sets, the 3DCNN can well preserve the local and non-local similar information in the image.
The first step comprises the following steps:
convolution mostly refers to 2D convolution, which is good at processing two-dimensional data information. But for three-dimensional data (e.g., video), 3D convolution can be more efficiently processed. In the proposed method, the first two dimensions of the set of three-dimensional image blocks hold the 2D local information of the image, while the third dimension holds the non-local information. For such three-dimensional data, the SR image will be more efficiently obtained by using a more powerful 3d convolution.
The 2D convolution extracts features in a local neighborhood of the previous layer's feature maps. 3D convolution differs in that the input has one more depth dimension and the convolution kernel one more dimension $K_d$, so the size of the 3D convolution kernel is $(K_h, K_w, K_d)$. Each correlation of the sliding window with the values inside it yields one value of the output 3D image. The depth of the convolution kernel is smaller than the depth of the input layer (kernel size < channel size); the 3D kernel can therefore move in all three directions of the image (height, width, and channel), and at each position the element-wise multiply-accumulate produces one value, which strengthens the ability to capture the information of such three-dimensional data. The 2D convolution can be expressed by the formula:

$$v_{l,j}^{x,y} = \sigma\Big(b_{l,j} + \sum_i \sum_{p=0}^{P_l-1} \sum_{q=0}^{Q_l-1} w_{l,j,i}^{p,q}\, v_{l-1,i}^{x+p,\,y+q}\Big)$$

where $v_{l,j}^{x,y}$ denotes the value at position (x, y) of the jth feature map of the current layer l, i indexes the feature maps of the previous layer, $w_{l,j,i}^{p,q}$ is the value at position (p, q) of the kernel connected to the ith feature map, and $P_l$ and $Q_l$ are the height and width of the convolution kernel. Like the 2D convolution formula, the 3D convolution can be defined as:

$$v_{l,j}^{x,y,z} = \sigma\Big(b_{l,j} + \sum_i \sum_{p=0}^{P_l-1} \sum_{q=0}^{Q_l-1} \sum_{r=0}^{R_l-1} w_{l,j,i}^{p,q,r}\, v_{l-1,i}^{x+p,\,y+q,\,z+r}\Big)$$

where $v_{l,j}^{x,y,z}$ denotes the value at 3D position (x, y, z) of the jth feature map of the current layer l, and $R_l$ is the size of the 3D convolution kernel along the third dimension. A comparison of the 2D and 3D convolution operations is shown in fig. 4.
For a three-dimensional image block set, 3D convolution means that within a small three-dimensional receptive field, the computed target pixel value is a joint weighted average of local and non-local pixels. As the convolutional layers are stacked, the expanding receptive field eventually covers the local and non-local information of the whole image block set, achieving the effect of feature extraction.
In the network design, the local and non-local information of the image block set is captured by a fully convolutional 3D network through 3D convolutional layers (Conv3D for short). The whole network consists of a series of nonlinear units and 3D convolutional layers; features are extracted by the convolution kernels, and a residual learning strategy, i.e. a skip connection between the network's input and output, is adopted to alleviate the problem of vanishing or exploding gradients, allowing more network layers to be stacked or more features to be extracted per layer. Since convolution normally shrinks the data, zero padding is applied to every convolutional layer during training to keep the sizes of the network input and output consistent. For the activation function, the proposed method uses ReLU, and the 3D convolution and nonlinear operation of layer l can be defined as:

$$H_l(H_{l-1}) = \max(0,\ W_l * H_{l-1} + b_l)$$

where $H_{l-1}$ denotes the output of the previous layer, i.e. the input of the current layer, and $W_l * H_{l-1}$ denotes the 3D convolution operation of the current layer. The 3DCNN is designed so that the learned mapping F(Y) between the LR image block set Y and the HR image block set X is as close to X as possible. The 3DCNN takes the set of LR image blocks as input (i.e., $H_0 = Y$), so the first-layer output of the network can be expressed as:

$$H_1 = H_1(Y) = \max(0,\ W_1 * Y + b_1)$$

The depth of the network is set to D, i.e. there are D convolutional layers. The basic model of the 3DCNN contains 8 convolutional layers in total, i.e. D = 8. The output F(Y) of the entire network is calculated as:

$$F(Y) = H_8(H_7(\cdots H_1(Y))) + Y$$
the fourth step comprises the following steps:
The last step of the proposed method is to reconstruct the SR image from the enhanced image blocks. The constructed 3DCNN outputs a series of enhanced image block sets $\hat{\Omega}_Y$; all the enhanced image blocks can be expressed as $\{\hat{p}(i)\}_{i=1}^{N}$, forming a set of image blocks. The final SR image $\hat{x}$ is recovered by image reconstruction from $\{\hat{p}(i)\}_{i=1}^{N}$, defined as:

$$\tilde{x} = \sum_{i=1}^{N} R(i)^T\, \hat{p}(i)$$

Since the elements of $\{\hat{p}(i)\}$ are repeatedly superimposed, a weight vector is needed to average the overlapping element values:

$$w = \sum_{i=1}^{N} R(i)^T \mathbf{1}$$

where $\mathbf{1}$ is a constant column vector whose elements are all 1. Finally, the SR picture can be calculated as:

$$\hat{x} = \tilde{x} \oslash w$$

with $\oslash$ denoting element-wise division.
example 3
Firstly, setting up model
1) Training set and test set
A conventional 291-image set was used, consisting of 91 images from Yang et al. and 200 images from the BSD. This data set is widely used for training SR models. The data set obtained by applying PCA to the 291 images and the original data set of images without PCA processing are together used as the required LR image block sets. The invention selects the common Set5, Set14, and BSD100 test sets to evaluate the SR performance of the algorithm; their image contents are rich and varied, including humans, animals, plants, natural landscapes, buildings, and so on. PSNR and SSIM are adopted as objective indexes for evaluating SR performance.
2) Training arrangement
In the designed 3D convolutional network, every convolutional layer except the output layer uses 64 3D convolution kernels of size 3 × 3 × 3 to extract features. All models are developed and implemented on a laboratory PC based on the TensorFlow deep learning framework. The software environment is the Windows 10 operating system with Python 3.7 and TensorFlow 2.0; the hardware is an Intel Core i7-3770 CPU, an NVIDIA GTX 1080 Ti graphics card, and 16 GB of memory.
3) Model set-up
The weights of the convolutional layers were initialized using the method proposed by He et al., which has been verified to combine well with ReLU. The Adam optimizer is used to train all experimental models, with parameters set to β1 = 0.9, β2 = 0.999, ε = 10⁻⁸. The initial learning rate of the optimizer is set to 3 × 10⁻⁴ and is reduced by a factor of 10 every 30 training rounds. The mini-batch size during training is set to 512.
A residual model is added on top of the basic model. A residual unit mainly comprises two functions, the shortcut connection and the identity mapping: the shortcut connection makes the residual possible, and the identity mapping lets the network grow deeper through activation functions and skip connections. The 3D network adopts end-to-end training, with the loss function guiding the update of the model parameters.
On the image SR task, minimizing the l2 loss function is equivalent to maximizing the PSNR. However, some work in recent years has shown that the l1 loss function has great potential. In the proposed method, the l1 loss function used as the model loss function is expressed as:
l(F(Y), X) = ||F(Y) - X||_1;
the method therefore selects the l1 loss function as the model loss function. Through continuous training of the model, the weight parameters are optimized until the loss function value stabilizes in the interval [0, 0.0005].
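A one-function sketch of this l1 model loss, assuming a mean reduction over the mini-batch:

import tensorflow as tf

# l(F(Y), X) = ||F(Y) - X||_1, averaged over all elements in the batch.
def l1_loss(y_true, y_pred):
    return tf.reduce_mean(tf.abs(y_pred - y_true))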
Second, experimental comparison index
The image reconstruction performance is judged by comparing the PSNR and SSIM of the images. For image reconstruction, the PSNR is determined by the maximum pixel value and the mean square error between the images. Given an m × n original image x and its reconstructed image y, the mean square error and the PSNR are defined as:
MSE = (1/(mn)) Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} [x(i, j) - y(i, j)]^2;
PSNR = 10 · log10(MAX^2 / MSE),
where MAX is the maximum possible pixel value of the image.
The idea of SSIM is to measure the structural similarity between two images, rather than the pixel-wise differences measured by PSNR; the basic assumption is that the human eye is more sensitive to changes in image structure. The SSIM between x and the reconstructed image y can be defined as:
SSIM(x, y) = ((2 u_x u_y + c1)(2 σ_xy + c2)) / ((u_x^2 + u_y^2 + c1)(σ_x^2 + σ_y^2 + c2)),
where u_x and u_y denote the pixel means of the original image x and the reconstructed image y, σ_x and σ_y denote the pixel standard deviations of x and y, σ_xy denotes the covariance between x and y, and the constants c1 and c2 prevent instability.
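For reference, both indexes are available as built-in TensorFlow image operations; a short sketch, assuming batched [height, width, channel] tensors and an 8-bit pixel range:

import tensorflow as tf

# PSNR and SSIM between reference images x and reconstructions y.
def evaluate(x, y, max_val=255.0):
    return tf.image.psnr(x, y, max_val), tf.image.ssim(x, y, max_val)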
Third, experimental analysis
1) Non-local block set
The 291-image set is selected and decomposed, with a step size of 100, into 2750 image blocks of size 100 × 100. These 2750 image blocks are divided into two parts according to a certain proportion r (0 < r < 1): one part is subjected to PCA dimension reduction to extract sets of similar features, ensuring that the non-local information is not contaminated with noise, while the other part is left without PCA processing to ensure that the local information is not lost. The KNN algorithm is used to collect similar blocks from both parts. Finally, the two partial image block sets are combined, giving a total of 2.75 million image block sets as the training set of the model. The method therefore involves the following parameters: the window size s of the KNN search, the size p of the extracted image blocks, the non-local scale k, and the proportion r of dimension-reduced data to selected data. Theoretically, the larger the search window s, the more accurate the search, and the larger the non-local scale k, the more information is extracted. However, increasing these parameters makes the 3DCNN data computation too costly, so based on prior experience a window size of s = 31 is selected; p and k are then determined experimentally, as in the sketch after this paragraph.
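A brute-force sketch of this KNN block matching; the flattened-patch representation and all names are assumptions for illustration:

import numpy as np

# For one target patch, rank the candidate patches inside the search
# window by Euclidean distance and keep the k nearest; stacking the
# winners yields one p x p x k non-local block set.
def knn_match(target, candidates, k):
    # target: (p*p,) flattened patch; candidates: (n, p*p) window patches
    dists = np.sum((candidates - target) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    return candidates[nearest]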
The experimental model is the basic model discussed above.
Table 1 shows the PSNR test results on Set5 and Set14 for 3× SR with different block sizes p and non-local scales k. Considering the computational cost of the experiments, the three values 5, 7 and 9 are compared for each parameter to determine appropriate settings.
Table 1 Image block set parameters and performance
The results in Table 1 show that a larger block size p leads to higher PSNR values, and that increasing the non-local scale k improves SR performance. The settings p = 9 and k = 9 are therefore selected, i.e., non-local image block sets of size 9 × 9 × 9 are used in the experiments that follow.
To further verify the advantage of selecting the non-local image block set after introducing PCA dimension reduction, several sets of comparative experiments are designed. The advantage of PCA dimension reduction for selecting non-local similarity blocks is verified by varying the ratio r of dimension-reduced data to selected data, with r = 0, 0.1, 0.2, 0.3, 0.4 and 0.5; here r denotes the proportion of PCA-processed images in the data set. The comparison demonstrates that applying PCA to part of the data allows the non-local similarity blocks of an image to be extracted more effectively. First, the optimal reduced dimension is verified experimentally: when applying PCA to the verification set, the goal is to remove redundant information rather than to discard useful structure. In the experiment, PCA is used to remove the redundancy of the data set. PCA maps high-dimensional data to a low-dimensional space through a linear mapping such that the variance of the data in the low-dimensional space is as large as possible, so the dimension can be reduced effectively while keeping the relations between the original data points unchanged. Based on this principle, the relation between the reduced dimension and the accuracy is obtained by calculating the proportion of variance explained by each principal component.
TABLE 2 PCA similarity accuracy comparison across dimensions

Dimension    128       64        32        16        8
Accuracy     0.9133    0.9141    0.9237    0.9049    0.8188
It can be seen from Table 2 that when the selected dimension is too large, the redundant information of the image is not removed, while when it is too small, PCA discards useful information along with the redundancy. Dimension 32 is therefore selected as the reduced dimension in the following experiments.
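A small sketch of this dimension study using scikit-learn's PCA; the random patch array is only a stand-in for the real block data:

import numpy as np
from sklearn.decomposition import PCA

# Project flattened image blocks onto 32 principal components (the
# dimension selected in Table 2) and check the variance retained.
patches = np.random.rand(2750, 256).astype(np.float32)  # stand-in data
pca = PCA(n_components=32)
reduced = pca.fit_transform(patches)
print("retained variance:", pca.explained_variance_ratio_.sum())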
2) Experimental comparison of the ratio r between PCA-processed and unprocessed images in the data set
After the reduced dimension is determined, experiments verify the proportion of dimension-reduced data in the image data that gives the best effect: the 3DCNN cannot attend only to non-local similarity, and local information must also be extracted as far as possible. To limit selection bias, the data chosen for PCA processing are selected at random; multiple paired experiments are run with repeated random selections and the results are averaged. The ratio r between the PCA-processed part and the unprocessed part therefore needs to be established experimentally.
As can be seen from Fig. 5, the experimental results are best when the proportion of the PCA-processed part is 0.15; denoising is achieved while the non-local information of the image is effectively retained. As the PCA-processed part grows, the experimental effect worsens and the local information of the image is severely lost. The proportion of the PCA-processed part is therefore set to 0.15 for the subsequent experiments.
3) Comparison of different methods on the data set
Next, comparative experiments demonstrate the effectiveness of applying non-local similarity in the 3DCNN and the benefit of the selected PCA processing. First, the training set is rebuilt: the image blocks in the comparison block set are randomly selected similar blocks found by KNN without PCA processing. The similar blocks found by KNN after PCA processing are then compared against those found by KNN without PCA processing.
As can be seen from Fig. 6, when extracting the non-local information of an image, the randomly scattered data set is clearly inferior to the processed data set. With a moderate number of training epochs, the image blocks selected by the same KNN algorithm from the partially PCA-processed data set outperform those from the unprocessed data set; with many training epochs, the partially PCA-processed data set is only marginally better. The PCA-processed data set can extract image features effectively and reduces the interference of redundancy. As the training epochs increase, the model can extract enough non-local information even from the unprocessed data set, so the advantage of the PCA-processed part becomes less pronounced.
4) Comparison after introducing the residual block
To show the influence of the residual learning strategy, the skip connection between the input and the output of the basic model is removed, and the rest of the network is retrained in the same way.
The basic model adopts a global residual learning strategy, i.e., the 3D network learns a set of residual image blocks. As shown in Fig. 7, the two PSNR contrast curves show that the residual learning strategy in the proposed method achieves faster and more stable model convergence as well as higher reconstruction performance.
5) Image self-fusion
Finally, the SR image is reconstructed from the enhanced image blocks. One image is rotated at multiple angles to obtain a group of images, the group is fed into the model, and the median of the results is taken as the final super-resolution output. This further strengthens the fusion of information across the parts of the image, giving a better image representation.
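A minimal sketch of this self-fusion step, assuming model maps an LR array to an SR array of the same layout:

import numpy as np

# Rotate the input at 0/90/180/270 degrees, super-resolve each copy,
# undo the rotations, and take the per-pixel median of the results.
def self_fusion(model, img):
    outputs = []
    for k in range(4):
        sr = model(np.rot90(img, k))
        outputs.append(np.rot90(sr, -k))  # rotate back
    return np.median(np.stack(outputs), axis=0)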
6) Reconstruction performance comparison
In this section, the experimental model of the present invention is compared with existing SR algorithms, including ScSR based on dictionary learning, SelfExSR based on internal example self-similarity, and the CNN-based SRCNN and FSRCNN. Tables 3 and 4 summarize the PSNR and SSIM evaluations of the proposed method and the comparison algorithms on the Set5 and BSD100 test sets, respectively. As the tables show, the experimental model outperforms the other SR models compared. With the introduced PCA dimension reduction, the model is slightly better in PSNR, and slightly weaker in SSIM, than the model without PCA dimension reduction; this indicates that some information about the overall image structure is lost after PCA processing, but the processing benefits non-local denoising and thus raises performance. The images in the BSD100 test set are relatively varied in type and complex in structure, so results on it are more broadly applicable and persuasive. In the quantitative evaluation on BSD100, the PSNR gains vary considerably across images; the images with the largest gains contain abundant texture information and exhibit strong structural similarity. This improvement directly shows that the proposed method has an advantage in mining the structural-similarity information of images.
TABLE 3 PSNR/SSIM evaluation results of different algorithms on Set5
TABLE 4 PSNR/SSIM evaluation results of different algorithms on BSD100
Next, Fig. 8 shows reconstruction results of the proposed method and the other SR algorithms on images selected from the BSD100 test set; visually, the proposed method produces more vivid structural contours. In animal images in particular, the proposed method reconstructs clearer results, mainly because it fully exploits the very strong non-local similarity present in such images. By extracting and enhancing this similar information, the method shows more outstanding SR reconstruction performance for these scenes.
In the above embodiments, the implementation may be realized wholly or partially in software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it takes the form of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the procedures or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any modification, equivalent replacement, and improvement made by those skilled in the art within the technical scope of the present invention disclosed in the present invention should be covered within the scope of the present invention.

Claims (10)

1. A method for super-resolution of an image, the method comprising: processing the image SR by using a 3D convolutional neural network (3DCNN) in combination with the non-local self-similarity of the image, providing a 3DCNN-based non-local super-resolution method; directly modeling non-local similarity with the 3DCNN to extract the non-local similarity information of natural images; constructing a 3DCNN basic model based on an 8-layer fully convolutional network; and designing the 3D convolutional neural network in the 3DCNN and providing an improved model based on RNN.
2. The method for image super resolution according to claim 1, wherein the 3D convolutional neural network comprises three parts, respectively:
1) the single-layer 3D convolutional layer and ReLU combination is used for converting the input image block set into a feature space;
2) a feature subnetwork for extracting local and non-local information;
3) a single-layer 3D convolutional layer for outputting a residual image block set; the designed 3D network output is the sum between the set of residual image blocks and the set of input image blocks.
3. The method for image super-resolution according to claim 1, wherein the method for image super-resolution comprises the steps of:
step one, compared with a traditional super-resolution model, a 3DCNN basic model based on an 8-layer full convolution network is designed, and non-local similar properties are used for third-dimensional information of the model;
extracting non-local characteristics of the image according to the mixed residual error unit and the residual error network;
performing PCA (principal component analysis) processing on part of the data set to reduce the dimension, removing noise and redundant information, simultaneously retaining original information of the image, and performing preprocessing for block matching;
step four, extracting non-local similar image blocks from an LR image data set by adopting a traditional block matching method, and stacking the image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and fifthly, outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
4. The method for super-resolution of images according to claim 3, wherein in step one, designing the 3DCNN basic model based on an 8-layer fully convolutional network by comparison with the traditional super-resolution model, with the third-dimensional information of the model using the non-local similarity property, comprises:
the first two dimensions of the set of three-dimensional image blocks hold the 2D local information of the image, while the third dimension holds the non-local information; for such three-dimensional data, the SR image can be obtained more effectively by adopting the more capable 3D convolution;
the 2D convolution extracts features in the local neighborhood of the previous layer's feature maps; the 3D convolution differs from the 2D convolution in that the input image has one more depth dimension and the convolution kernel has one more dimension K_d, so the size of the 3D convolution kernel is (K_h, K_w, K_d); each correlation operation between the sliding window and the values inside the window yields one value of the output 3D image; the depth of the convolution kernel is smaller than the depth of the input layer, i.e., the kernel size is smaller than the channel size; the 3D convolution kernel can move in all three directions, namely the height, the width and the channel of the image, and the element-by-element multiplication and addition at each position yields one value, which strengthens the ability to capture information from three-dimensional data; the 2D convolution can be expressed by the formula:
v_l^j(x, y) = Σ_i Σ_{p=0}^{P_l-1} Σ_{q=0}^{Q_l-1} w_{l,j,i}(p, q) · v_{l-1}^i(x+p, y+q);
wherein v_l^j(x, y) denotes the value at position (x, y) of the jth feature map of the current layer l, i indexes the feature maps of the previous layer, w_{l,j,i}(p, q) denotes the value at position (p, q) of the kernel connected to the ith feature map, and P_l and Q_l are the sizes of the convolution kernel; the 3D convolution is defined as:
v_l^j(x, y, z) = Σ_i Σ_{p=0}^{P_l-1} Σ_{q=0}^{Q_l-1} Σ_{r=0}^{R_l-1} w_{l,j,i}(p, q, r) · v_{l-1}^i(x+p, y+q, z+r);
wherein v_l^j(x, y, z) denotes the value at 3D position (x, y, z) of the jth feature map of the current layer l, and R_l is the size of the 3D convolution kernel along the third dimension;
for a three-dimensional image block set, 3D convolution means that, within a small three-dimensional receptive field, the computed target pixel value is a jointly weighted average of local and non-local pixels; as the convolutional layers stack, the expanding receptive field eventually covers the local and non-local information of the whole image block set, achieving the effect of feature extraction;
in the network design, a fully convolutional 3D network captures the local and non-local information of the image block set with 3D convolutional layers; the whole network consists of a series of nonlinear units and 3D convolutional layers, with features extracted by the convolution kernels; a residual learning strategy, i.e., a skip connection between the input and output of the network, is adopted to alleviate the problem of vanishing or exploding gradients, so that more network layers can be stacked or more features extracted per layer; since convolution shrinks the data size, zero padding is applied to every convolutional layer during training to keep the network input and output sizes consistent; for the activation function, the proposed method uses ReLU, and the 3D convolution and nonlinear operation of the lth layer can be defined as:
H_l(H_{l-1}) = max(0, W_l * H_{l-1} + b_l);
wherein H_{l-1} denotes the output of the previous layer, i.e., the input of the current layer, and W_l * H_{l-1} is defined as the 3D convolution operation of the current layer; the 3DCNN is designed to learn a mapping F(Y) between the LR image block set Y and the HR image block set X such that F(Y) is as close to X as possible; the 3DCNN takes the LR image block set as input, i.e., H_0 = Y, so the first-layer output of the network can be expressed as:
H_1 = H_1(Y) = max(0, W_1 * Y + b_1);
setting the network depth to D, i.e., there are D convolutional layers; the basic 3DCNN model contains a total of 8 convolutional layers, i.e., D = 8; the output of the entire network, F(Y), can be calculated as:
F(Y) = H_8(H_7(...H_1(Y))) + Y.
5. The method for super-resolution of images according to claim 3, wherein in step three, performing PCA on part of the data set to reduce the dimension, removing noise and redundant information while preserving the original information of the images, comprises:
(1) introducing PCA to reduce the dimension of part of the data set: part of the data in the data set is selected and its dimension is reduced by principal component analysis (PCA), removing the redundant part of the image information; this yields the similar features of the images and removes image noise, ensuring non-local similarity; meanwhile, the original data of the other part of the data set is retained, ensuring that the original information and local information of the images are not lost;
for n × m image samples there are n samples, each row being m-dimensional; samples that do not have m dimensions are interpolated to m dimensions; the n × m real matrix X can then be decomposed as:
X = U Σ V^T;
wherein the orthogonal matrix U has dimension n × m and the orthogonal matrix V has dimension m × m, satisfying U U^T = V^T V = I; Σ is an m × m diagonal matrix; the first r columns of Σ are kept and denoted Σ_r; the dimension-reduced data sample Y_r is then obtained from U and Σ_r:
Y_r = U Σ_r;
(2) extracting the non-local blocks of the LR image with the BM algorithm; first, the data sample Y_r is broken into a set of image blocks of size p:
Ω_p = {p(i)}_{i=1}^{N};
wherein N = (d_h - p + 1) × (d_w - p + 1) denotes the number of blocks extracted from the whole image; the ith target image block p(i) of Ω_p can be calculated as:
p(i) = R(i) y_bic;
wherein R(i) is a binary sparse matrix representing the extraction of the ith image block, and d_h and d_w are the height and width of the image;
(3) based on the Euclidean distance, the BM algorithm collects a series of similar blocks using a search window of size s × s; finally, the blocks collected by each window are stacked together as the LR image set Y required for network training; for each element in the image set Y consisting of k image blocks, a blocks are selected from the dimension-reduced data set Y_r and b blocks from the original data set, where a + b = k; the corresponding HR image x is processed in the same way to form the HR image block set Ω_X = {x(i)}_{i=1}^{N}; meanwhile, when collecting similar image blocks in the HR image, the non-local similar positions found in the interpolated image are reused instead of recalculating the distance search; the 3DCNN represents the non-local similar image blocks in this image-block-set form, which preserves the local and non-local similarity information in the image well.
6. The method for image super-resolution according to claim 3, wherein in step four, said extracting non-local similar image blocks from the LR image data set by the conventional block matching method and stacking the image blocks to form a three-dimensional image block set as the representation of the non-local image blocks comprises:
the constructed 3DCNN outputs a series of enhanced image block sets, which together form the enhanced set Ω̂_Y = {ŷ(i)}_{i=1}^{N}; the final SR image x̂ is recovered by image reconstruction from Ω̂_Y, defined as:
x̂ = Σ_{i=1}^{N} R(i)^T ŷ(i);
in view of the fact that the elements of Ω̂_Y are repeatedly superimposed where blocks overlap, a weight vector needs to be considered to average the overlapping element values:
w = Σ_{i=1}^{N} R(i)^T 1;
wherein 1 is a constant column vector with element value 1; finally, the SR image x̂ can be calculated as:
x̂ = (Σ_{i=1}^{N} R(i)^T ŷ(i)) ./ w;
where ./ denotes element-wise division.
7. a system for super-resolution of images, which implements the method for super-resolution of images according to any one of claims 1 to 6, wherein the system for super-resolution of images comprises:
the 3DCNN basic model building module is used for comparing a traditional super-resolution model and designing a 3DCNN basic model based on an 8-layer full convolution network, and the third-dimensional information of the model uses non-local similar properties;
the image non-local characteristic extraction module is used for extracting the non-local characteristics of the image according to the mixed residual error unit and the residual error network;
the dimensionality reduction processing module is used for carrying out dimensionality reduction on part of the data set through PCA (principal component analysis), removing noise and redundant information, simultaneously reserving original information of an image, and carrying out preprocessing for block matching;
the non-local similar image block extraction module is used for extracting a non-local similar image block from an LR image data set by adopting a traditional block matching method;
the non-local similar image block representation module is used for stacking image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and the SR image output module is used for outputting the trained image block set from the 3DCNN and outputting the reconstructed SR image.
8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
compared with a traditional super-resolution model, a 3DCNN basic model based on an 8-layer full convolution network is designed, and the third-dimensional information of the model has non-local similarity;
extracting non-local characteristics of the image according to the mixed residual error unit and the residual error network;
carrying out PCA (principal component analysis) processing on part of the data set to reduce the dimension, removing noise and redundant information, simultaneously retaining original information of an image, and carrying out preprocessing for block matching;
extracting non-local similar image blocks from an LR image data set by adopting a traditional block matching method, and stacking the image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
compared with a traditional super-resolution model, a 3DCNN basic model based on an 8-layer full convolution network is designed, and the third-dimensional information of the model has non-local similarity;
extracting non-local characteristics of the image according to the mixed residual error unit and the residual error network;
carrying out PCA (principal component analysis) processing on part of the data set to reduce the dimension, removing noise and redundant information, simultaneously retaining original information of an image, and carrying out preprocessing for block matching;
extracting non-local similar image blocks from an LR image data set by adopting a traditional block matching method, and stacking the image blocks to form a three-dimensional image block set as a representation of the non-local image blocks;
and outputting the trained image block set from the 3DCNN, and outputting the reconstructed SR image.
10. An information data processing terminal characterized by being used for a system for realizing the image super-resolution as set forth in claim 7.
CN202110310875.1A 2021-03-23 2021-03-23 Image super-resolution method and system Active CN113191947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110310875.1A CN113191947B (en) 2021-03-23 2021-03-23 Image super-resolution method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110310875.1A CN113191947B (en) 2021-03-23 2021-03-23 Image super-resolution method and system

Publications (2)

Publication Number Publication Date
CN113191947A true CN113191947A (en) 2021-07-30
CN113191947B CN113191947B (en) 2024-05-14

Family

ID=76973652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110310875.1A Active CN113191947B (en) 2021-03-23 2021-03-23 Image super-resolution method and system

Country Status (1)

Country Link
CN (1) CN113191947B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505860A (en) * 2021-09-07 2021-10-15 天津所托瑞安汽车科技有限公司 Screening method and device for blind area detection training set, server and storage medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952228A (en) * 2017-03-10 2017-07-14 北京工业大学 The super resolution ratio reconstruction method of single image based on the non local self-similarity of image
CN111602147A (en) * 2017-11-17 2020-08-28 脸谱公司 Machine learning model based on non-local neural network
WO2020056791A1 (en) * 2018-09-21 2020-03-26 五邑大学 Method and apparatus for super-resolution reconstruction of multi-scale dilated convolution neural network
CN112308772A (en) * 2019-08-02 2021-02-02 四川大学 Super-resolution reconstruction method based on deep learning local and non-local information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Tang Yanqiu; Pan Hong; Zhu Yaping; Li Xinde: "A Survey of Image Super-Resolution Reconstruction", Acta Electronica Sinica, no. 07 *
Zhai Sen; Ren Chao; Xiong Shuhua; Zhan Wenshu: "Single-Image Super-Resolution Reconstruction Based on Local and Non-Local Information via Deep Learning", Modern Computer, no. 33 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505860A (en) * 2021-09-07 2021-10-15 天津所托瑞安汽车科技有限公司 Screening method and device for blind area detection training set, server and storage medium
CN113505860B (en) * 2021-09-07 2021-12-31 天津所托瑞安汽车科技有限公司 Screening method and device for blind area detection training set, server and storage medium

Also Published As

Publication number Publication date
CN113191947B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
Liu et al. Multi-level wavelet convolutional neural networks
Dong et al. Model-guided deep hyperspectral image super-resolution
Wang et al. Ultra-dense GAN for satellite imagery super-resolution
Chen et al. The face image super-resolution algorithm based on combined representation learning
CN109035142B (en) Satellite image super-resolution method combining countermeasure network with aerial image prior
CN110599401A (en) Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
Zhang et al. One-two-one networks for compression artifacts reduction in remote sensing
Muqeet et al. HRAN: Hybrid residual attention network for single image super-resolution
CN110246084B (en) Super-resolution image reconstruction method, system and device thereof, and storage medium
Li et al. Deep learning methods in real-time image super-resolution: a survey
Kim et al. MAMNet: Multi-path adaptive modulation network for image super-resolution
CN111951167B (en) Super-resolution image reconstruction method, super-resolution image reconstruction device, computer equipment and storage medium
CN110363068B (en) High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network
Rivadeneira et al. Thermal image super-resolution challenge-pbvs 2021
CN112017116B (en) Image super-resolution reconstruction network based on asymmetric convolution and construction method thereof
CN110533591B (en) Super-resolution image reconstruction method based on codec structure
CN113538246A (en) Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network
Zhang et al. Deformable and residual convolutional network for image super-resolution
CN115393191A (en) Method, device and equipment for reconstructing super-resolution of lightweight remote sensing image
Muqeet et al. Hybrid residual attention network for single image super resolution
CN115660955A (en) Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
CN114926337A (en) Single image super-resolution reconstruction method and system based on CNN and Transformer hybrid network
CN112950478B (en) Face super-resolution method and system based on dual identity attribute constraint
CN113191947B (en) Image super-resolution method and system
CN116977651B (en) Image denoising method based on double-branch and multi-scale feature extraction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant