CN113362223B - Image super-resolution reconstruction method based on attention mechanism and two-channel network

Image super-resolution reconstruction method based on attention mechanism and two-channel network

Info

Publication number
CN113362223B
CN113362223B (application CN202110573693.3A)
Authority
CN
China
Prior art keywords: image, layer, channel, information, network
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number
CN202110573693.3A
Other languages
Chinese (zh)
Other versions
CN113362223A (en)
Inventor
张旭 (Zhang Xu)
何涛 (He Tao)
夏英 (Xia Ying)
Current Assignee
Dragon Totem Technology Hefei Co ltd
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN202110573693.3A
Publication of CN113362223A
Application granted
Publication of CN113362223B

Classifications

    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G06T3/4023 Decimation- or insertion-based scaling, e.g. pixel or line decimation
    • G06T3/4046 Scaling the whole image or part thereof using neural networks
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06T5/20 Image enhancement or restoration by the use of local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T7/11 Region-based segmentation
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging

Abstract

The invention belongs to the fields of artificial intelligence, deep learning and image processing, and particularly relates to an image super-resolution reconstruction method based on an attention mechanism and a two-channel network. The method comprises: acquiring an image to be detected in real time and preprocessing it; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, marking the high-definition reconstructed image according to the evaluation result. The image super-resolution reconstruction model is based on a convolutional neural network. The invention uses a dual-channel network: one channel uses an improved residual structure to extract valuable high-frequency features, namely high-level features, while the other uses an improved VGG network, whose input and output images have the same size, to extract abundant low-frequency features; the two sets of features are finally fused, so that the reconstructed image is clearer.

Description

Image super-resolution reconstruction method based on attention mechanism and two-channel network
Technical Field
The invention belongs to the fields of artificial intelligence, deep learning and image processing, and particularly relates to an image super-resolution reconstruction method based on an attention mechanism and a dual-channel network.
Background
Image super-resolution reconstruction techniques use a set of low-quality, low-resolution images (or a motion sequence) to produce a high-quality, high-resolution image. Image super-resolution reconstruction is applied in various computer vision tasks, including surveillance imaging, medical imaging and object recognition. In practice, limited by the cost of image acquisition equipment, the transmission bandwidth of video images, or the technical bottlenecks of an imaging modality, a large high-definition image with sharp edges and no block blur cannot always be obtained. Super-resolution reconstruction techniques emerged to meet this demand: the classical problem of image super-resolution (SR) reconstruction is defined as recovering a high-resolution (HR) image from its low-resolution (LR) observation. Image super-resolution reconstruction can improve the recognition capability and recognition accuracy of an image and enables focused analysis of a target object, so that an image of higher spatial resolution can be obtained for a region of interest without directly acquiring full high-spatial-resolution imagery with its huge data volume. Learning-based image super-resolution reconstruction has been a popular direction in recent years: by means of end-to-end convolutional neural networks in deep learning, it learns the mapping between low-resolution and high-resolution images and estimates the high-frequency details lost in the low-resolution image, yielding a high-quality image with clear edges and rich texture detail. In recent years, with the introduction of the attention mechanism, more and more models attempt to improve their results with attention; in image super-resolution reconstruction, embedding attention mechanisms into the model improves the accuracy of the result.
Traditional image reconstruction methods usually compute with a CNN, and much smooth (low-frequency) information is gradually lost during the computation; since the main purpose of image super-resolution reconstruction is to obtain a high-precision, high-definition picture, the loss of any information can affect the final reconstruction accuracy. Moreover, a CNN works with local receptive fields: limited by the convolution kernel size, the information it obtains is local to the picture, and the global information of the whole picture cannot be captured. Yet the pixels in a picture are correlated, and such long-range dependency information is also important for image reconstruction.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an image super-resolution reconstruction method based on an attention mechanism and a two-channel network, which comprises: acquiring an image to be detected in real time, and preprocessing the image to be detected; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, and marking the high-definition reconstructed image according to the evaluation result. The image super-resolution reconstruction model is based on a convolutional neural network.
the process of training the image super-resolution reconstruction model comprises the following steps:
s1: acquiring an original high-definition picture data set, and scaling the pictures in the data set with a bicubic interpolation degradation model;
s2: preprocessing the scaled data set to obtain a training data set;
s3: inputting each image of the training data set into the shallow feature channel and the deep feature channel of the image super-resolution reconstruction model for feature extraction;
s4: extracting initial features of the input image with the first convolution layer, and inputting the initial features into the information cascade modules, which aggregate the hierarchical feature information of the convolution layers;
s5: inputting the hierarchical feature information aggregated by the information cascade modules into the improved residual modules to obtain channel-wise relevance and global spatial dependency information;
s6: performing global feature extraction on the dependency information with non-local dilated convolution to obtain the final deep feature map;
s7: extracting initial features of the input image with the second convolution layer, and inputting them into the improved VGG network to extract the shallow features of the image and obtain a shallow feature map;
s8: fusing the deep feature map and the shallow feature map, and up-sampling the fused feature map to obtain a high-definition reconstructed image;
s9: constraining the difference between the high-definition reconstructed image and the original high-definition image with a loss function, and continuously adjusting the model parameters until the model converges, completing the training of the model.
Preferably, the pictures in the data set are scaled by factors of 2, 3, 4 and 8 using the bicubic interpolation degradation model.
Preferably, the bicubic interpolation degradation model is formulated as:
$$I_{LR}=H_{dn}I_{HR}+n$$
preferably, the process of preprocessing the scaled data set includes performing enhancement processing on the image, including performing translation processing and flipping processing in horizontal and vertical directions on the image; and dividing the enhanced data into different small image blocks, and collecting the divided images to obtain a training data set.
Preferably, the information cascade module comprises a feature aggregation structure stacked 10 times; the feature aggregation structure comprises at least three convolutional neural network layers, a feature channel merging layer, a channel attention layer and a channel-number transformation layer. The convolutional layers are connected in sequence; branches from the outputs of all convolutional layers except the last are connected to the feature channel merging layer; and the feature channel merging layer, the channel attention layer and the channel-number transformation layer are connected in sequence to form the information cascade module. The module processes image data as follows: each convolutional layer extracts feature information of the input image in sequence; the feature information extracted by the layers is merged in the feature channel merging layer; a channel attention mechanism weighs the importance of the merged information; and the number of channels is finally reduced back to the number of input channels. These steps are repeated 10 times to obtain the hierarchical feature information of the aggregated convolutional layers.
Preferably, the improved residual module comprises a residual network structure, a channel attention layer and a spatial attention layer, the residual network structure consisting of a convolutional layer, a nonlinear activation layer and a second convolutional layer. The module processes image data as follows: the hierarchical feature information is input into the residual network structure to extract feature information; a channel attention mechanism obtains the channel-wise relevance of the extracted features and passes it downward; and a spatial attention mechanism obtains the dependency over the global space.
Preferably, the non-local dilated convolution block comprises four parallel dilated (hole) convolution layers with dilation rates 1, 2, 4 and 6, and three ordinary convolutional layers. The module processes image data as follows: feature information is first extracted from the dependency information output by the improved residual modules, using the four dilated convolutions with different dilation rates and two ordinary convolutional layers; the feature information obtained by the four dilated convolutions is then fused on the feature channels, while the feature information extracted by the ordinary convolutions is fused element-wise on the pixel matrix; finally, the two fused feature maps are added to obtain the global feature information.
Preferably, the improved VGG network is obtained by embedding pooling layers among the ordinary convolutional layers; it comprises 10 ordinary convolutional layers and 3 pooling layers. The module processes image data as follows: 64-channel feature information is first extracted with 2 convolutional layers and one pooling layer; 128-channel feature information is then extracted with 2 convolutional layers and one pooling layer; 512-channel feature information is then extracted with 3 convolutional layers and one pooling layer; and the 512 channels are finally restored to 64 channels with 3 convolutional layers. The pooling layers use padding so that the feature dimensions remain unchanged.
Preferably, the loss function of the image super-resolution reconstruction model is expressed as:
$$L(\theta)=\frac{1}{N}\sum_{i=1}^{N}\left\|C_{HR}\left(I_{LR}^{(i)}\right)-I_{HR}^{(i)}\right\|_{1}$$
preferably, the formula for evaluating the reconstructed image by using the peak signal-to-noise ratio and the structural similarity is as follows:
Figure BDA0003083483500000042
Figure BDA0003083483500000043
the invention has the advantages that:
1. The invention uses a dual-channel network: one channel uses an improved residual structure to extract valuable high-frequency features, namely high-level features, while the other uses an improved VGG network (the parameters of the VGG convolutional and pooling layers are fine-tuned so that the input and output images have the same size, and the final fully connected layer is discarded) to extract rich low-frequency features; the features are finally fused.
2. The invention uses dense connections at specific positions of the model (the two groups of information cascade modules at the head and tail), aggregating the information of each convolutional layer so that the convolutional-layer information is fully utilized; a channel attention mechanism then computes channel weights for the merged information instead of simply reducing the channels.
3. The invention uses a spatial attention mechanism, added after the existing channel attention mechanism, so that global information is extracted more fully and the features are used more comprehensively. Meanwhile, before up-sampling, a non-local dilated convolution performs one more pass of global-dependency feature extraction on the preceding results, so that the output is more closely correlated and the feature information is richer.
Drawings
FIG. 1 is a general structure diagram of an image super-resolution reconstruction model according to the present invention;
FIG. 2 is a structure diagram of the information cascade module of the present invention;
FIG. 3 is a diagram of the residual structure of the present invention;
FIG. 4 is a diagram of the channel attention and spatial attention configurations of the present invention;
FIG. 5 is a structure diagram of the non-local dilated convolution of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An image super-resolution reconstruction method based on an attention mechanism and a two-channel network comprises: acquiring an image to be detected in real time, and preprocessing the image to be detected; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, and marking the high-definition reconstructed image according to the evaluation result. The image super-resolution reconstruction model is based on a convolutional neural network.
The structure of the image super-resolution reconstruction model is shown in fig. 1 and comprises a deep feature channel, a shallow feature channel, an up-sampling layer and a third convolution layer. The deep feature channel comprises a first convolution layer, information cascade modules, improved residual modules and a non-local dilated convolution block; after the first convolution layer, the input image is processed in sequence by the information cascade modules, the improved residual modules and the non-local dilated convolution block to obtain the deep feature map. The shallow feature channel comprises a second convolution layer and an improved VGG network; after the second convolution layer, the input image is processed by the improved VGG network to obtain the shallow feature map. The deep feature map and the shallow feature map are fused, the fused features are up-sampled by the up-sampling layer, and the up-sampled features pass through the third convolution layer to obtain the high-definition reconstructed image.
Optionally, the deep feature channel comprises n information cascade modules and m improved residual modules, where all information cascade modules are connected in series to form an information cascade module group and all improved residual modules are connected in series to form an improved residual module group.
Preferably, the deep feature channel comprises 2n information cascade modules: n of them are connected in series to form a first information cascade module group, and the remaining n are connected in series to form a second information cascade module group; the first and second information cascade module groups are arranged at the input end and the output end of the improved residual module group, respectively, as shown in the sketch below.
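A structural sketch of this dual-channel wiring, assuming PyTorch, is given below; the sub-modules are shape-preserving stand-ins (plain 3x3 convolutions), and the element-wise fusion and the sub-pixel up-sampling head are illustrative assumptions, so only the data flow of FIG. 1 is represented.

```python
# Structural sketch of the dual-channel model of FIG. 1 (stand-in sub-modules).
import torch
import torch.nn as nn

class DualChannelSR(nn.Module):
    def __init__(self, scale: int = 2, feats: int = 64, n_cascade: int = 2, m_residual: int = 4):
        super().__init__()
        block = lambda: nn.Conv2d(feats, feats, 3, padding=1)     # placeholder sub-module
        self.first_conv = nn.Conv2d(3, feats, 3, padding=1)       # deep-branch entry
        self.second_conv = nn.Conv2d(3, feats, 3, padding=1)      # shallow-branch entry
        self.cascade_front = nn.Sequential(*[block() for _ in range(n_cascade)])
        self.residual_group = nn.Sequential(*[block() for _ in range(m_residual)])
        self.cascade_back = nn.Sequential(*[block() for _ in range(n_cascade)])
        self.non_local_dilated = block()                          # non-local dilated conv stand-in
        self.vgg_branch = block()                                 # improved VGG stand-in
        self.upsample = nn.Sequential(                            # sub-pixel up-sampling (assumption)
            nn.Conv2d(feats, feats * scale * scale, 3, padding=1), nn.PixelShuffle(scale))
        self.third_conv = nn.Conv2d(feats, 3, 3, padding=1)

    def forward(self, x):
        deep = self.first_conv(x)                                 # deep feature channel
        deep = self.cascade_back(self.residual_group(self.cascade_front(deep)))
        deep = self.non_local_dilated(deep)
        shallow = self.vgg_branch(self.second_conv(x))            # shallow feature channel
        fused = deep + shallow                                    # feature fusion (element-wise here)
        return self.third_conv(self.upsample(fused))

print(DualChannelSR(scale=2)(torch.rand(1, 3, 24, 24)).shape)     # torch.Size([1, 3, 48, 48])
```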
The process of training the image super-resolution reconstruction model comprises the following steps:
s1: acquiring an original high-definition picture data set, and scaling the pictures in the data set with a bicubic interpolation degradation model;
s2: preprocessing the scaled data set to obtain a training data set;
s3: inputting each image of the training data set into the shallow feature channel and the deep feature channel of the image super-resolution reconstruction model for feature extraction;
s4: extracting initial features of the input image with the first convolution layer, and inputting the initial features into the information cascade modules, which aggregate the hierarchical feature information of the convolution layers;
s5: inputting the hierarchical feature information aggregated by the information cascade modules into the improved residual modules to obtain channel-wise relevance and global spatial dependency information;
s6: performing global feature extraction on the dependency information with non-local dilated convolution to obtain the final deep feature map;
s7: extracting initial features of the input image with the second convolution layer, and inputting them into the improved VGG network to extract the shallow features of the image and obtain a shallow feature map;
s8: fusing the deep feature map and the shallow feature map, and up-sampling the fused feature map to obtain a high-definition reconstructed image;
s9: constraining the difference between the high-definition reconstructed image and the original high-definition image with a loss function, and continuously adjusting the model parameters until the model converges, completing the training of the model.
The DIV2K data set is used: eight hundred high-definition (HR) pictures, together with the corresponding low-resolution (LR) pictures produced by the degradation model (bicubic interpolation degradation), serve as the training set, and five pictures serve as the validation set. Five data sets, Set5, Set14, Urban100, Manga109 and BSD100, are used as test sets; their texture information is very rich, most of which is lost in the degraded low-resolution pictures, so they pose a demanding test of the accuracy of image super-resolution reconstruction. The evaluation indices are the traditional PSNR and SSIM, where PSNR denotes the peak signal-to-noise ratio and SSIM the structural similarity.
One forward and one backward propagation of all the training data through the neural network is called an epoch (round); the model parameters are updated in every epoch, and the maximum number of epochs is set to 1000. The learning rate is updated every 200 epochs, and during the 1000 training epochs the model and parameters that achieve the best result on the test data set are saved.
The pictures in the original high-definition data set are scaled by factors of 2, 3, 4 and 8 using the bicubic interpolation degradation model. The degradation model is formulated as:
$$I_{LR}=H_{dn}I_{HR}+n$$
where $I_{LR}$ denotes the low-resolution image, $H_{dn}$ the degradation model, $I_{HR}$ the original high-resolution image, and $n$ additive noise.
Preprocessing the scaled data set comprises enhancing the images, including translation and flipping in the horizontal and vertical directions; the enhanced data are divided into small image blocks, and the divided blocks are collected to obtain the training data set.
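A minimal sketch of this data preparation (steps s1-s2), assuming PyTorch; the patch size, the optional noise level, and realizing translations through the random cropping usually done at batch time are illustrative assumptions.

```python
# Sketch of s1 (bicubic degradation) and s2 (augmentation + patch splitting).
import torch
import torch.nn.functional as F

def degrade(hr: torch.Tensor, scale: int, noise_std: float = 0.0) -> torch.Tensor:
    """I_LR = H_dn * I_HR + n: bicubic down-scaling plus optional additive noise."""
    lr = F.interpolate(hr, scale_factor=1.0 / scale, mode="bicubic", align_corners=False)
    if noise_std > 0:
        lr = lr + noise_std * torch.randn_like(lr)
    return lr.clamp(0.0, 1.0)

def augment(img: torch.Tensor) -> list:
    """Enhancement by horizontal and vertical flips (translation is assumed to be
    covered by random cropping when patches are sampled)."""
    return [img, torch.flip(img, dims=[-1]), torch.flip(img, dims=[-2])]

def to_patches(img: torch.Tensor, patch: int = 48) -> torch.Tensor:
    """Split an NCHW batch into non-overlapping patch x patch training blocks."""
    n, c, h, w = img.shape
    img = img[:, :, : h - h % patch, : w - w % patch]            # crop to a multiple of patch
    p = img.unfold(2, patch, patch).unfold(3, patch, patch)      # (n, c, nh, nw, patch, patch)
    return p.reshape(n, c, -1, patch, patch).permute(0, 2, 1, 3, 4).reshape(-1, c, patch, patch)

hr = torch.rand(1, 3, 192, 192)                                  # stand-in for a DIV2K crop
for s in (2, 3, 4, 8):                                           # the four scale factors of s1
    print(s, degrade(hr, s).shape)
print(to_patches(hr).shape)                                      # torch.Size([16, 3, 48, 48])
```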
As shown in fig. 2, the information cascade module consists of the following structure stacked 10 times: three convolutional neural network layers in sequence, a feature channel merging layer, a channel attention layer and a channel-number transformation layer. The module processes image data as follows: each convolutional layer extracts feature information of the input image in sequence; the feature information extracted by the layers is merged in the feature channel merging layer; a channel attention mechanism weighs the importance of the merged information; the number of channels is finally reduced back to the number of input channels; and these steps are repeated 10 times to obtain the hierarchical feature information of the aggregated convolutional layers.
The information cascade module aggregates image information and fully retains the information of every convolutional layer. When the image has just entered the convolutional neural network, low-frequency information is sufficient and abundant; but as the network deepens, attention shifts to more abstract features, and much edge-texture and smooth information is gradually lost. The information cascade module therefore captures more low-frequency information and fuses it into the model.
$$F_{IC}=H_{IC}(I_{LR})$$
where $I_{LR}$ denotes the low-resolution input image, $H_{IC}$ the convolution operation of the cascade modules, and $F_{IC}$ the result of the convolution calculation.
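A minimal sketch of one feature-aggregation unit and its 10-fold stacking, assuming PyTorch; the 3x3 kernels, the ReLU activations, and the channel-attention reduction ratio are assumptions.

```python
# Sketch of the information cascade module of FIG. 2 (one unit, stacked 10 times).
import torch
import torch.nn as nn

class CascadeUnit(nn.Module):
    """Three sequential convs; their outputs are merged on the channel axis,
    weighted by channel attention, then reduced back to the input channel count."""
    def __init__(self, ch: int = 64, reduction: int = 16):
        super().__init__()
        self.convs = nn.ModuleList([nn.Conv2d(ch, ch, 3, padding=1) for _ in range(3)])
        self.attn = nn.Sequential(                        # channel attention on the merged map
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(3 * ch, 3 * ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(3 * ch // reduction, 3 * ch, 1), nn.Sigmoid())
        self.reduce = nn.Conv2d(3 * ch, ch, 1)            # channel-number transformation layer

    def forward(self, x):
        feats, h = [], x
        for conv in self.convs:                           # sequential feature extraction
            h = torch.relu(conv(h))
            feats.append(h)
        merged = torch.cat(feats, dim=1)                  # feature channel merging layer
        return self.reduce(merged * self.attn(merged))    # weigh importance, reduce channels

cascade = nn.Sequential(*[CascadeUnit(64) for _ in range(10)])   # the structure stacked 10 times
print(cascade(torch.rand(1, 64, 24, 24)).shape)                  # torch.Size([1, 64, 24, 24])
```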
As shown in fig. 3, the improved residual module comprises a residual network structure, a channel attention layer and a spatial attention layer, the residual network structure consisting of a convolutional layer, a nonlinear activation layer and a second convolutional layer. The module processes image data as follows: the hierarchical feature information is input into the residual network structure to extract feature information; a channel attention mechanism obtains the channel-wise relevance of the extracted features and passes it downward; and a spatial attention mechanism obtains the dependency over the global space.
The output of the cascade modules serves as the input of the improved residual modules. A channel attention mechanism and a spatial attention mechanism are connected after each ResNet block, capturing channel-wise relevance and global spatial dependency information and integrating them into the convolutional neural network; this enriches the feature information and stabilizes the training of the deep network.
$$F_{RBC}=H_{RBC}(F_{IC})$$
$$F_{CA}=H_{CA}(F_{RBC})$$
$$F_{SA}=H_{SA}(F_{CA})$$
where $H_{RBC}$ denotes the convolution operation of the residual block structure with its input (source) connection, i.e. the input information is merged with the output through the residual block, and $F_{RBC}$ denotes the feature information output by the residual block, which can be written as $[f_{1},f_{2},f_{3},\ldots,f_{n}]$, the channel features computed by each convolution kernel. A channel attention mechanism is then applied to each channel feature, and the weight of each channel is multiplied with the original input data: $H_{CA}$ denotes the channel attention operation and $F_{CA}$ the feature information after channel attention. A spatial attention mechanism then computes the global dependency information on these output features and fuses it with the original input data: $H_{SA}$ denotes the spatial attention operation and $F_{SA}$ the feature information after spatial attention.
As shown in fig. 4, the channel attention structure consists, in sequence, of a global average pooling layer, a 1x1 convolutional layer, a nonlinear activation layer and a 1x1 convolutional layer. It processes image data as follows: a weight representative of each channel is first obtained through global average pooling; a 1x1 convolutional layer reduces the number of channels and a nonlinear activation layer introduces nonlinearity; another 1x1 convolutional layer restores the number of channels; and the result is multiplied back onto the original input feature information, giving the relevance over the feature channels. The spatial attention structure consists, in sequence, of a 1x1 convolutional layer, a softmax activation layer, a 1x1 convolutional layer and a nonlinear activation layer. It processes image data as follows: the input feature information of size CxHxW is first converted into an HWx1x1 global feature map through a 1x1 convolutional layer; the global feature map is normalized with a softmax function and multiplied back onto the original input information; and a final 1x1 convolutional layer and nonlinear activation layer yield the dependency information over the global space.
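The two attention structures and their placement after a residual block can be sketched as follows, assuming PyTorch; the kernel sizes, the reduction ratio, and the reading of the spatial branch as a softmax-normalized position map are assumptions.

```python
# Sketch of channel attention, spatial attention and the improved residual block.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch: int, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global average pooling
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.body(x)                             # multiply weights back onto the input

class SpatialAttention(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.score = nn.Conv2d(ch, 1, 1)                    # CxHxW -> 1xHxW position scores
        self.out = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        n, c, h, w = x.shape
        attn = torch.softmax(self.score(x).view(n, 1, h * w), dim=-1)  # normalize over HW
        return self.out(x * attn.view(n, 1, h, w))          # global spatial dependency

class ImprovedResidualBlock(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.res = nn.Sequential(                           # conv - ReLU - conv
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.ca, self.sa = ChannelAttention(ch), SpatialAttention(ch)

    def forward(self, x):
        f = x + self.res(x)                                 # F_RBC: input merged with the output
        return self.sa(self.ca(f))                          # F_SA = H_SA(H_CA(F_RBC))

print(ImprovedResidualBlock()(torch.rand(1, 64, 24, 24)).shape)
```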
As shown in fig. 5, the non-local dilated convolution block consists of four parallel dilated (hole) convolution layers with dilation rates 1, 2, 4 and 6, and three ordinary convolutional layers. It processes image data as follows: feature information is first extracted simultaneously by the four dilated convolutions and two ordinary convolutional layers; the feature information from the four dilated convolutions is then fused on the feature channels, while the feature information from the ordinary convolutions is fused element-wise on the pixel matrix; finally, the two fused feature maps are added to obtain the global feature information.
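A minimal sketch of the block, assuming PyTorch; treating the third ordinary convolutional layer as a 1x1 fusion of the four dilated branches is an assumption about FIG. 5.

```python
# Sketch of the non-local dilated convolution block of FIG. 5.
import torch
import torch.nn as nn

class NonLocalDilatedBlock(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        # padding = dilation keeps the spatial size for a 3x3 kernel
        self.dilated = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4, 6)])
        self.plain1 = nn.Conv2d(ch, ch, 3, padding=1)       # two ordinary extraction convs
        self.plain2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.fuse = nn.Conv2d(4 * ch, ch, 1)                # channel fusion of the four branches

    def forward(self, x):
        branch_a = self.fuse(torch.cat([conv(x) for conv in self.dilated], dim=1))
        branch_b = self.plain1(x) + self.plain2(x)          # fusion on the pixel-value matrix
        return branch_a + branch_b                          # added: global feature information

print(NonLocalDilatedBlock()(torch.rand(1, 64, 24, 24)).shape)
```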
The improved VGG network is obtained by embedding pooling layers among the ordinary convolutional layers; it comprises 10 ordinary convolutional layers and 3 pooling layers. It processes image data as follows: 64-channel feature information is first extracted with 2 convolutional layers and one pooling layer; 128-channel feature information is then extracted with 2 convolutional layers and one pooling layer; 512-channel feature information is then extracted with 3 convolutional layers and one pooling layer; and the 512 channels are finally restored to 64 channels with 3 convolutional layers. The pooling layers use padding so that the feature dimensions remain unchanged.
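A minimal sketch of the improved VGG branch, assuming PyTorch; realizing the size-preserving pooling as 3x3 max pooling with stride 1 and padding 1 is an assumption about how "padding keeps the feature dimension unchanged".

```python
# Sketch of the improved VGG branch: 10 padded convs, 3 size-preserving pools.
import torch
import torch.nn as nn

def conv(cin: int, cout: int) -> nn.Module:
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

def pool() -> nn.Module:
    return nn.MaxPool2d(kernel_size=3, stride=1, padding=1)   # size-preserving pooling

improved_vgg = nn.Sequential(
    conv(64, 64), conv(64, 64), pool(),                       # 2 convs + pool -> 64 channels
    conv(64, 128), conv(128, 128), pool(),                    # 2 convs + pool -> 128 channels
    conv(128, 512), conv(512, 512), conv(512, 512), pool(),   # 3 convs + pool -> 512 channels
    conv(512, 512), conv(512, 512), conv(512, 64),            # 3 convs restore 64 channels
)                                                             # no fully connected layers

x = torch.rand(1, 64, 24, 24)              # output of the second convolution layer
print(improved_vgg(x).shape)               # torch.Size([1, 64, 24, 24]): size preserved
```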
Global features are extracted with the non-local dilated convolution, and the extracted feature information is up-sampled to the required output size. By setting the dilation rate, dilated convolution enlarges the receptive field without adding parameters; embedding it into the non-local convolution significantly reduces the amount of computation while obtaining global information at different scales, making the feature extraction more comprehensive.
$$F_{NLHC}=H_{NLHC}(F_{SA})$$
where $H_{NLHC}$ denotes the non-local dilated convolution operation and $F_{NLHC}$ the feature information obtained after it. After up-sampling, the final feature information is output as the corresponding high-definition reconstructed image, i.e. the reconstruction formula is:
$$F_{Up}=H_{Up}(F_{NLHC})$$
where $H_{Up}$ denotes the up-sampling convolution operation and $F_{Up}$ the up-sampled output features.
The loss function of the image super-resolution reconstruction model is expressed as:
$$L(\theta)=\frac{1}{N}\sum_{i=1}^{N}\left\|C_{HR}\left(I_{LR}^{(i)}\right)-I_{HR}^{(i)}\right\|_{1}$$
where $\theta$ denotes the model parameters, $C_{HR}$ the super-resolution calculation equation, $I_{LR}^{(i)}$ and $I_{HR}^{(i)}$ the i-th low-resolution image and its corresponding high-resolution image, and N the number of images in the data set; HR denotes high resolution and LR low resolution.
The super-resolution calculation equation is expressed as:
$$C_{HR}=F_{Up}\left(F_{NLHC}\left(F_{SA}\left(F_{CA}\left(F_{RBC}\left(F_{IC}(I_{LR})\right)\right)\right)\right)\right)$$
where $F_{Up}$ denotes the up-sampled output information, $F_{NLHC}$ the information extracted by the non-local dilated convolution, $F_{SA}$ the information extracted by the spatial attention mechanism, $F_{CA}$ the information extracted by the channel attention mechanism, $F_{RBC}$ the information extracted by the residual blocks, and $F_{IC}$ the information output by the cascade modules.
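A minimal training-loop sketch for step s9 and the schedule described above (1000 epochs, learning-rate update every 200 epochs, saving the best model), assuming PyTorch; the stand-in model, the one-batch loader, the Adam optimizer, the decay factor and the L1 form of the loss are illustrative assumptions.

```python
# Sketch of the training loop that minimizes the loss L(theta).
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)       # stand-in for the full C_HR network
loader = [(torch.rand(4, 3, 24, 24), torch.rand(4, 3, 24, 24))]   # (I_LR, I_HR) pairs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
criterion = nn.L1Loss()                     # constrains the reconstruction difference

for epoch in range(1000):                   # one epoch = all data forward + backward once
    for lr_img, hr_img in loader:
        optimizer.zero_grad()
        loss = criterion(model(lr_img), hr_img)
        loss.backward()
        optimizer.step()
    scheduler.step()                        # learning-rate update every 200 epochs
    # evaluate PSNR/SSIM on the test set here and save the model when they improve
```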
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used as the evaluation indices:
$$PSNR=10\cdot\log_{10}\left(\frac{MAX^{2}}{MSE}\right)$$
$$SSIM(X,Y)=\frac{(2\mu_{X}\mu_{Y}+C_{1})(2\sigma_{XY}+C_{2})}{(\mu_{X}^{2}+\mu_{Y}^{2}+C_{1})(\sigma_{X}^{2}+\sigma_{Y}^{2}+C_{2})}$$
where MSE denotes the mean square error, MAX the maximum pixel value, $\mu_{X}$ and $\mu_{Y}$ the pixel means of image X and image Y, $\sigma_{X}$ and $\sigma_{Y}$ the pixel standard deviations of image X and image Y, $\sigma_{XY}$ the covariance of image X and image Y, and $C_{1}$, $C_{2}$ small constants that stabilize the division.
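Both indices follow directly from their formulas. The sketch below, assuming NumPy, implements PSNR and the single-window (global) form of SSIM with the customary constants C1 = (0.01 MAX)^2 and C2 = (0.03 MAX)^2; practical evaluations usually compute SSIM over local windows and average.

```python
# Sketch of the PSNR and (global) SSIM evaluation indices.
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    x, y = x.astype(np.float64), y.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()                      # pixel means
    var_x, var_y = x.var(), y.var()                      # pixel variances (sigma^2)
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()            # covariance sigma_XY
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

a = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
print(psnr(a, a), ssim_global(a, a))                     # inf 1.0
```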
The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalents or improvements made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (10)

1. An image super-resolution reconstruction method based on an attention mechanism and a two-channel network is characterized by comprising: acquiring an image to be detected in real time, and preprocessing the image to be detected; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, and marking the high-definition reconstructed image according to the evaluation result;
the process of training the image super-resolution reconstruction model comprises the following steps:
s1: acquiring an original high-definition picture data set, and scaling the pictures in the data set with a bicubic interpolation degradation model;
s2: preprocessing the scaled data set to obtain a training data set;
s3: inputting each image of the training data set into the shallow feature channel and the deep feature channel of the image super-resolution reconstruction model for feature extraction;
s4: extracting initial features of the input image with the first convolution layer, and inputting the initial features into the information cascade modules, which aggregate the hierarchical feature information of the convolution layers;
s5: inputting the hierarchical feature information aggregated by the information cascade modules into the improved residual modules to obtain channel-wise relevance and global spatial dependency information;
s6: performing global feature extraction on the dependency information with the non-local dilated convolution block to obtain the final deep feature map;
s7: extracting initial features of the input image with the second convolution layer, and inputting them into the improved VGG network to extract the shallow features of the image and obtain a shallow feature map;
s8: fusing the deep feature map and the shallow feature map, and up-sampling the fused feature map to obtain a high-definition reconstructed image;
s9: constraining the difference between the high-definition reconstructed image and the original high-definition image with a loss function, and continuously adjusting the model parameters until the model converges, completing the training of the model.
2. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the pictures in the data set are scaled by factors of 2, 3, 4 and 8 using the bicubic interpolation degradation model.
3. The image super-resolution reconstruction method based on the attention mechanism and the two-channel network as claimed in claim 1, wherein the bicubic interpolation degradation model is formulated as:
$$I_{LR}=H_{dn}I_{HR}+n$$
where $I_{LR}$ denotes the low-resolution image, $H_{dn}$ the degradation model, $I_{HR}$ the original high-resolution image, and $n$ additive noise.
4. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein preprocessing the scaled data set comprises enhancing the images, including translation and flipping in the horizontal and vertical directions; the enhanced data are divided into small image blocks, and the divided blocks are collected to obtain the training data set.
5. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the information cascade module comprises a feature aggregation structure stacked 10 times; the feature aggregation structure comprises at least three convolutional neural network layers, a feature channel merging layer, a channel attention layer and a channel-number transformation layer; the convolutional layers are connected in sequence, branches from the outputs of all convolutional layers except the last are connected to the feature channel merging layer, and the feature channel merging layer, the channel attention layer and the channel-number transformation layer are connected in sequence to form the information cascade module; the module processes image data as follows: each convolutional layer extracts feature information of the input image in sequence; the feature information extracted by the layers is merged in the feature channel merging layer; a channel attention mechanism weighs the importance of the merged information; the number of channels is finally reduced back to the number of input channels; and these steps are repeated 10 times to obtain the hierarchical feature information of the aggregated convolutional layers.
6. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the improved residual module comprises a residual network structure, a channel attention layer and a spatial attention layer, the residual network structure consisting of a convolutional layer, a nonlinear activation layer and a second convolutional layer; the module processes image data as follows: the hierarchical feature information is input into the residual network structure to extract feature information; a channel attention mechanism obtains the channel-wise relevance of the extracted features and passes it downward; and a spatial attention mechanism obtains the dependency over the global space.
7. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the non-local dilated convolution block comprises four parallel dilated (hole) convolution layers with dilation rates 1, 2, 4 and 6, and three ordinary convolutional layers; the module processes image data as follows: feature information is first extracted from the dependency information input by the improved residual modules, using the four dilated convolutions with different dilation rates and two ordinary convolutional layers; the feature information obtained by the four dilated convolutions is then fused on the feature channels, while the feature information extracted by the ordinary convolutions is fused element-wise on the pixel matrix; finally, the two fused feature maps are added to obtain the global feature information.
8. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the improved VGG network is obtained by embedding pooling layers among the ordinary convolutional layers and comprises 10 ordinary convolutional layers and 3 pooling layers; the module processes image data as follows: 64-channel feature information is first extracted with 2 convolutional layers and one pooling layer; 128-channel feature information is then extracted with 2 convolutional layers and one pooling layer; 512-channel feature information is then extracted with 3 convolutional layers and one pooling layer; and the 512 channels are finally restored to 64 channels with 3 convolutional layers; the pooling layers use padding so that the feature dimensions remain unchanged.
9. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the loss function of the image super-resolution reconstruction model is expressed as:
$$L(\theta)=\frac{1}{N}\sum_{i=1}^{N}\left\|C_{HR}\left(I_{LR}^{(i)}\right)-I_{HR}^{(i)}\right\|_{1}$$
where $\theta$ denotes the model parameters, $C_{HR}$ the super-resolution calculation equation, $I_{LR}^{(i)}$ and $I_{HR}^{(i)}$ the i-th low-resolution image and its corresponding high-resolution image, and N the number of images in the data set; HR denotes high resolution and LR low resolution.
10. The image super-resolution reconstruction method based on the attention mechanism and the two-channel network as claimed in claim 1, wherein the formulas for evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity are:
$$PSNR=10\cdot\log_{10}\left(\frac{MAX^{2}}{MSE}\right)$$
$$SSIM(X,Y)=\frac{(2\mu_{X}\mu_{Y}+C_{1})(2\sigma_{XY}+C_{2})}{(\mu_{X}^{2}+\mu_{Y}^{2}+C_{1})(\sigma_{X}^{2}+\sigma_{Y}^{2}+C_{2})}$$
where PSNR denotes the peak signal-to-noise ratio, MSE the mean square error, MAX the maximum pixel value, SSIM the structural similarity, $\mu_{X}$ and $\mu_{Y}$ the pixel means of image X and image Y respectively, $\sigma_{X}$ and $\sigma_{Y}$ the pixel standard deviations of image X and image Y respectively, $\sigma_{XY}$ the covariance of image X and image Y, and $C_{1}$, $C_{2}$ small constants that stabilize the division.
CN202110573693.3A 2021-05-25 2021-05-25 Image super-resolution reconstruction method based on attention mechanism and two-channel network Active CN113362223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110573693.3A CN113362223B (en) 2021-05-25 2021-05-25 Image super-resolution reconstruction method based on attention mechanism and two-channel network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110573693.3A CN113362223B (en) 2021-05-25 2021-05-25 Image super-resolution reconstruction method based on attention mechanism and two-channel network

Publications (2)

Publication Number Publication Date
CN113362223A CN113362223A (en) 2021-09-07
CN113362223B true CN113362223B (en) 2022-06-24

Family

ID=77527539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110573693.3A Active CN113362223B (en) 2021-05-25 2021-05-25 Image super-resolution reconstruction method based on attention mechanism and two-channel network

Country Status (1)

Country Link
CN (1) CN113362223B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034408B (en) * 2021-04-30 2022-08-12 广东工业大学 Infrared thermal imaging deep learning image denoising method and device
CN114283486B (en) * 2021-12-20 2022-10-28 北京百度网讯科技有限公司 Image processing method, model training method, image processing device, model training device, image recognition method, model training device, image recognition device and storage medium
CN114332592B (en) * 2022-03-11 2022-06-21 中国海洋大学 Ocean environment data fusion method and system based on attention mechanism
CN114638762A (en) * 2022-03-24 2022-06-17 华南理工大学 Modularized hyperspectral image scene self-adaptive panchromatic sharpening method
CN114882203A (en) * 2022-05-20 2022-08-09 周莉莎 Image super-resolution reconstruction method for power grid inspection robot
WO2024007160A1 (en) * 2022-07-05 2024-01-11 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Convolutional neural network (cnn) filter for super-resolution with reference picture resampling (rpr) functionality
CN115082317B (en) * 2022-07-11 2023-04-07 四川轻化工大学 Image super-resolution reconstruction method for attention mechanism enhancement
CN115170398B (en) * 2022-07-11 2023-10-13 重庆芸山实业有限公司 Image super-resolution reconstruction method and device for chrysanthemum storage warehouse
CN115511748A (en) * 2022-09-30 2022-12-23 北京航星永志科技有限公司 Image high-definition processing method and device and electronic equipment
CN116129143B (en) * 2023-02-08 2023-09-08 山东省人工智能研究院 Edge broad extraction method based on series-parallel network feature fusion
CN116523800B (en) * 2023-07-03 2023-09-22 南京邮电大学 Image noise reduction model and method based on residual dense network and attention mechanism
CN117274064B (en) * 2023-11-15 2024-04-02 中国科学技术大学 Image super-resolution method
CN117576573A (en) * 2024-01-16 2024-02-20 广州航海学院 Building atmosphere evaluation method, system, equipment and medium based on improved VGG16 model
CN117788477A (en) * 2024-02-27 2024-03-29 贵州健易测科技有限公司 Image reconstruction method and device for automatically quantifying tea leaf curl

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827295A (en) * 2019-10-31 2020-02-21 北京航空航天大学青岛研究院 Three-dimensional semantic segmentation method based on coupling of voxel model and color information
CN111414888A (en) * 2020-03-31 2020-07-14 杭州博雅鸿图视频技术有限公司 Low-resolution face recognition method, system, device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120008868A1 (en) * 2010-07-08 2012-01-12 Compusensor Technology Corp. Video Image Event Attention and Analysis System and Method
CN107437100A (en) * 2017-08-08 2017-12-05 重庆邮电大学 A kind of picture position Forecasting Methodology based on the association study of cross-module state
CN110570416B (en) * 2019-09-12 2020-06-30 杭州海睿博研科技有限公司 Method for visualization and 3D printing of multi-modal cardiac images

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827295A (en) * 2019-10-31 2020-02-21 北京航空航天大学青岛研究院 Three-dimensional semantic segmentation method based on coupling of voxel model and color information
CN111414888A (en) * 2020-03-31 2020-07-14 杭州博雅鸿图视频技术有限公司 Low-resolution face recognition method, system, device and storage medium

Also Published As

Publication number Publication date
CN113362223A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN110706157B (en) Face super-resolution reconstruction method for generating confrontation network based on identity prior
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
CN109859106B (en) Image super-resolution reconstruction method of high-order fusion network based on self-attention
CN111861961B (en) Single image super-resolution multi-scale residual error fusion model and restoration method thereof
CN112734646B (en) Image super-resolution reconstruction method based on feature channel division
CN111028150A (en) Rapid space-time residual attention video super-resolution reconstruction method
CN111986108A (en) Complex sea-air scene image defogging method based on generation countermeasure network
CN112288627A (en) Recognition-oriented low-resolution face image super-resolution method
CN111861884A (en) Satellite cloud image super-resolution reconstruction method based on deep learning
CN113538243B (en) Super-resolution image reconstruction method based on multi-parallax attention module combination
CN112699844A (en) Image super-resolution method based on multi-scale residual error level dense connection network
CN116664397B (en) TransSR-Net structured image super-resolution reconstruction method
CN113139489A (en) Crowd counting method and system based on background extraction and multi-scale fusion network
Luo et al. Bi-GANs-ST for perceptual image super-resolution
CN111640116A (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN112906675B (en) Method and system for detecting non-supervision human body key points in fixed scene
CN112102388B (en) Method and device for obtaining depth image based on inspection robot monocular image
CN111080533B (en) Digital zooming method based on self-supervision residual sensing network
CN112418229A (en) Unmanned ship marine scene image real-time segmentation method based on deep learning
CN115131206A (en) Semantic understanding-based satellite video super-resolution reconstruction method and system
CN110853040B (en) Image collaborative segmentation method based on super-resolution reconstruction
CN113240589A (en) Image defogging method and system based on multi-scale feature fusion
CN111951177B (en) Infrared image detail enhancement method based on image super-resolution loss function
CN115797183B (en) Image super-resolution reconstruction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240111

Address after: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee after: Dragon totem Technology (Hefei) Co.,Ltd.

Address before: 400065 Chongwen Road, Nanshan Street, Nanan District, Chongqing

Patentee before: Chongqing University of Posts and Telecommunications