CN113362223B - Image super-resolution reconstruction method based on attention mechanism and two-channel network - Google Patents
- Publication number
- CN113362223B (application CN202110573693.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T3/4053 — Super resolution, i.e. output image resolution higher than sensor resolution
- G06T3/4007 — Interpolation-based scaling, e.g. bilinear interpolation
- G06T3/4023 — Decimation- or insertion-based scaling, e.g. pixel or line decimation
- G06T3/4046 — Scaling the whole image or part thereof using neural networks
- G06T5/30 — Erosion or dilatation, e.g. thinning
- G06T5/50 — Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
- G06T7/11 — Region-based segmentation
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; Image merging
Abstract
The invention belongs to the fields of artificial intelligence, deep learning and image processing, and relates to an image super-resolution reconstruction method based on an attention mechanism and a two-channel network. The method comprises the following steps: acquiring an image to be processed in real time and preprocessing it; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, marking the high-definition reconstructed image according to the evaluation result. The image super-resolution reconstruction model is based on a convolutional neural network. The invention uses a two-channel network: one channel uses an improved residual structure to extract valuable high-frequency features, i.e. high-level features, while the other uses an improved VGG network, in which the input and output image sizes are kept consistent, to extract rich low-frequency features. Finally the features are fused, so that the reconstructed image is sharper.
Description
Technical Field
The invention belongs to the fields of artificial intelligence, deep learning and image processing, and particularly relates to an image super-resolution reconstruction method based on an attention mechanism and a two-channel network.
Background
Image super-resolution reconstruction techniques use a set of low-quality, low-resolution images (or a motion sequence) to produce a high-quality, high-resolution image. Image super-resolution reconstruction is applied in many computer vision tasks, including surveillance imaging, medical imaging and object recognition. In practice, limited by the cost of image acquisition equipment, the transmission bandwidth of video images, or technical bottlenecks of the imaging modality, a large, sharp-edged, blur-free high-definition image cannot always be obtained. Super-resolution reconstruction techniques arose to meet this demand: the classical image super-resolution (SR) reconstruction problem is defined as recovering a high-resolution (HR) image from its low-resolution (LR) counterpart. Image super-resolution reconstruction can improve the recognition capability and recognition accuracy of an image and allows focused analysis of a target object, so that an image of higher spatial resolution can be obtained for a region of interest without directly acquiring a high-spatial-resolution image of huge data volume. Learning-based image super-resolution reconstruction has been a popular direction in recent years: by means of end-to-end convolutional neural networks from deep learning, the mapping between low-resolution and high-resolution images is learned and the high-frequency details lost in the low-resolution image are estimated, yielding a high-quality image with clear edges and rich texture details.
In recent years, with the introduction of the attention mechanism, more and more models have tried to improve their results by using it; in image super-resolution reconstruction, the attention mechanism is embedded into the model to improve the accuracy of the result.
Traditional image reconstruction methods usually compute with a CNN (convolutional neural network), and much smooth information is gradually lost during the computation; since the main purpose of image super-resolution reconstruction is to obtain a high-precision, high-definition picture, the loss of any information affects the final reconstruction accuracy. Moreover, a CNN uses local receptive fields: constrained by the size of the convolution kernel, the information it obtains is local to the picture, and the global information of the whole picture cannot be captured. Yet the pixels in a picture are correlated, and such long-range dependency information is also important dependency information for image reconstruction.
Disclosure of Invention
To solve the problems in the prior art, the invention provides an image super-resolution reconstruction method based on an attention mechanism and a two-channel network, which comprises the following steps: acquiring an image to be processed in real time and preprocessing it; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, marking the high-definition reconstructed image according to the evaluation result; the image super-resolution reconstruction model is based on a convolutional neural network;
the process of training the image super-resolution reconstruction model comprises the following steps:
S1: obtaining an original high-definition picture data set, and downscaling the pictures in the data set with a bicubic interpolation degradation model;
S2: preprocessing the downscaled data set to obtain a training data set;
S3: inputting each image in the training data set into the shallow feature channel and the deep feature channel of the image super-resolution reconstruction model for feature extraction;
S4: extracting initial features of the input image with the first convolution layer; inputting the initial features into the information cascade modules to aggregate the hierarchical feature information of the convolution layers;
S5: inputting the hierarchical feature information aggregated by the information cascade modules into the improved residual modules to obtain channel-wise relevance and global spatial dependency information;
S6: performing global feature extraction on the dependency information with non-local hole (dilated) convolution to obtain the final deep feature map;
S7: extracting initial features of the input image with the second convolution layer; inputting the initial features into the improved VGG network to extract shallow features and obtain a shallow feature map;
S8: fusing the deep feature map and the shallow feature map, and upsampling the fused feature map to obtain a high-definition reconstructed image;
S9: constraining the difference between the high-definition reconstructed image and the original high-definition image with a loss function, and adjusting the model parameters until the model converges, completing the training of the model.
Preferably, the images in the data set are scaled by factors of 2, 3, 4 and 8 using the bicubic interpolation degradation model.
Preferably, the formula of the bicubic interpolation degradation model is as follows:
I_LR = H_dn · I_HR + n
preferably, the process of preprocessing the scaled data set includes performing enhancement processing on the image, including performing translation processing and flipping processing in horizontal and vertical directions on the image; and dividing the enhanced data into different small image blocks, and collecting the divided images to obtain a training data set.
Preferably, the information cascade module comprises a feature aggregation structure stacked 10 times. The feature aggregation structure comprises at least three convolutional neural network layers, a feature channel merging layer, a channel attention layer and a channel number conversion layer: the convolutional layers are connected in sequence; the output of every convolutional layer except the last branches off to the feature channel merging layer; and the feature channel merging layer, channel attention layer and channel number conversion layer are connected in sequence to form the information cascade module. The module processes image data as follows: each convolutional layer in turn extracts feature information from the input image; the feature information extracted by each convolutional layer is merged on the feature channel merging layer; a channel attention mechanism weighs the importance of the merged information; finally the number of channels is reduced back to the number of input channels. These steps are repeated 10 times to obtain the aggregated hierarchical feature information of the convolutional layers.
Preferably, the improved residual module comprises a residual network structure, a channel attention layer and a spatial attention layer, the residual network structure consisting of a convolutional layer, a nonlinear activation layer and a second convolutional layer. The module processes image data as follows: the hierarchical feature information is input into the residual network structure to extract feature information; a channel attention mechanism obtains the channel-wise relevance of the extracted features and passes it on; and a spatial attention mechanism obtains the dependency over the global space.
Preferably, the non-local hole convolution block comprises four parallel hole (dilated) convolution layers with dilation parameters 1, 2, 4 and 6, and three ordinary convolutional neural network layers. The module processes image data as follows: first, feature information of the dependency information output by the improved residual network is extracted by the hole convolutions with the four different dilation parameters and by two ordinary convolutional neural networks; then the feature information obtained by the four hole convolutions is fused on the feature channel, and the feature information extracted by the ordinary convolutional neural networks is fused according to the values of the pixel matrix; finally, the two kinds of fused feature information are added to obtain the global feature information.
Preferably, the improved VGG network structure is obtained by interleaving pooling layers among ordinary convolutional layers, comprising 10 ordinary convolutional layers and 3 pooling layers. The module processes image data as follows: first, 64-channel feature information is extracted with 2 convolutional layers and one pooling layer; then 128-channel feature information is extracted with 2 convolutional layers and one pooling layer; then 512-channel feature information is extracted with 3 convolutional layers and one pooling layer; finally the 512-channel information is restored to 64 channels with 3 convolutional layers. The pooling layers use padding to keep the feature dimensions unchanged.
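The patent states only that the pooling layers keep the feature dimensions unchanged via padding; stride-1 pooling is one way to realize this. A minimal numpy sketch under that assumption (the 3x3 pooling window is also an illustrative choice, not a value fixed by the patent):

```python
import numpy as np

def same_maxpool(x, k=3):
    """Stride-1 max pooling with padding so that the output spatial size
    equals the input size (the 'dimension-preserving pooling' of the text).
    Padding uses -inf so border maxima are taken over valid pixels only."""
    pad = k // 2
    xp = np.pad(x, pad, constant_values=-np.inf)
    h, w = x.shape
    out = np.full((h, w), -np.inf)
    for di in range(k):                       # slide the k x k window
        for dj in range(k):
            out = np.maximum(out, xp[di:di + h, dj:dj + w])
    return out

x = np.arange(16.0).reshape(4, 4)
y = same_maxpool(x)                           # still 4x4, unlike strided pooling
```

With ordinary stride-2 VGG pooling the map would shrink to 2x2; here the shape is preserved, which is what lets the shallow channel's output align with the input image.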
Preferably, the loss function expression of the image super-resolution reconstruction model is as follows:
Preferably, the formula for evaluating the reconstructed image by using the peak signal-to-noise ratio and the structural similarity is as follows:
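The evaluation formulas themselves are omitted from the text above. For reference, the standard definitions of PSNR and SSIM can be sketched as follows (the SSIM here is computed globally over the whole image for brevity, whereas SSIM is normally averaged over local sliding windows):

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(ref, rec, peak=255.0):
    """Structural similarity over the whole image (no sliding window),
    with the usual stabilizing constants c1, c2."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = ref.mean(), rec.mean()
    vx, vy = ref.var(), rec.var()
    cov = ((ref - mx) * (rec - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = np.full((8, 8), 100.0)
b = a + 10.0          # uniform error of 10 per pixel -> MSE = 100
```

Higher PSNR and SSIM values indicate a reconstruction closer to the original high-definition image.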
the invention has the advantages that:
1. The invention uses a two-channel network: one channel uses an improved residual structure to extract valuable high-frequency features, i.e. high-level features, while the other uses an improved VGG network (the parameters of the VGG convolutional and pooling layers are fine-tuned so that the input and output image sizes are consistent, and the final fully connected layer is discarded) to extract rich low-frequency features; finally the features are fused.
2. The invention uses dense connections at specific positions of the model (the two groups of information cascade modules at its head and tail), aggregating the information of each convolutional layer so that the convolutional-layer information is fully utilized; finally a channel attention mechanism computes channel weights for the merged information, rather than simply reducing the channels.
3. The invention uses a spatial attention mechanism, added after the existing channel attention mechanism, so that global information is extracted more fully and the features are used more comprehensively. Meanwhile, before upsampling, non-local hole convolution performs one more global-dependency feature extraction on the preceding results, so that the output is more closely related and the feature information is richer.
Drawings
FIG. 1 is a general structure diagram of an image super-resolution reconstruction model according to the present invention;
FIG. 2 is a structure diagram of the information cascade module of the present invention;
FIG. 3 is a diagram of the residual structure of the present invention;
FIG. 4 is a diagram of the channel attention and spatial attention configurations of the present invention;
FIG. 5 is a structure diagram of the non-local hole convolution of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An image super-resolution reconstruction method based on an attention mechanism and a two-channel network comprises the following steps: acquiring an image to be processed in real time and preprocessing it; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, marking the high-definition reconstructed image according to the evaluation result. The image super-resolution reconstruction model is based on a convolutional neural network.
The structure of the image super-resolution reconstruction model is shown in fig. 1; it comprises a deep feature channel, a shallow feature channel, an upsampling layer and a third convolution layer. The deep feature channel comprises a first convolution layer, information cascade modules, improved residual modules and a non-local hole convolution block: after the first convolution layer, the input image is processed in turn by the information cascade modules, the improved residual modules and the non-local hole convolution block to obtain the deep feature map. The shallow feature channel comprises a second convolution layer and an improved VGG network: after the second convolution layer, the input image is processed by the improved VGG network to obtain the shallow feature map. The deep and shallow feature maps are fused, the fused image is upsampled by the upsampling layer, and a convolution operation with the third convolution layer yields the high-definition reconstructed image.
Optionally, the deep feature channel includes n information cascade modules and m improved residual modules; all information cascade modules are connected in series into an information cascade module group, and all improved residual modules are connected in series into an improved residual module group.
Preferably, the deep feature channel comprises 2n information cascade modules: n of them connected in series form the first information cascade module group, and the remaining n connected in series form the second information cascade module group; the two groups are placed at the input end and the output end of the improved residual module group, respectively.
The process of training the image super-resolution reconstruction model comprises the following steps:
S1: obtaining an original high-definition picture data set, and downscaling the pictures in the data set with a bicubic interpolation degradation model;
S2: preprocessing the downscaled data set to obtain a training data set;
S3: inputting each image in the training data set into the shallow feature channel and the deep feature channel of the image super-resolution reconstruction model for feature extraction;
S4: extracting initial features of the input image with the first convolution layer; inputting the initial features into the information cascade modules to aggregate the hierarchical feature information of the convolution layers;
S5: inputting the hierarchical feature information aggregated by the information cascade modules into the improved residual modules to obtain channel-wise relevance and global spatial dependency information;
S6: performing global feature extraction on the dependency information with non-local hole (dilated) convolution to obtain the final deep feature map;
S7: extracting initial features of the input image with the second convolution layer; inputting the initial features into the improved VGG network to extract shallow features and obtain a shallow feature map;
S8: fusing the deep feature map and the shallow feature map, and upsampling the fused feature map to obtain a high-definition reconstructed image;
S9: constraining the difference between the high-definition reconstructed image and the original high-definition image with a loss function, and adjusting the model parameters until the model converges, completing the training of the model.
The DIV2K data set is adopted: eight hundred high-definition (HR) pictures, together with the corresponding low-resolution (LR) pictures produced by the degradation model (bicubic interpolation degradation), are used as the training set, and five pictures are used as the validation set. Five data sets — Set5, Set14, Urban100, Manga109 and BSD100 — are used as test sets; their texture information is very rich, most of which is lost in the degraded low-resolution pictures, which makes them a demanding test of super-resolution reconstruction accuracy. The evaluation indexes are the traditional PSNR and SSIM, where PSNR denotes peak signal-to-noise ratio and SSIM denotes structural similarity.
One forward and one backward propagation of all the data in the training set through the neural network is called an epoch; the model parameters are updated every epoch, and the maximum number of epochs is set to 1000. The learning rate is updated every 200 epochs, and the model and parameters that achieve the best results on the test data set during the 1000 training epochs are saved.
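The 200-epoch learning-rate update can be sketched as a simple step-decay schedule. The patent specifies only the interval; the base learning rate 1e-4 and the halving factor below are illustrative assumptions:

```python
def step_lr(base_lr, epoch, step=200, gamma=0.5):
    """Step decay: scale the learning rate by `gamma` every `step` epochs.
    `gamma=0.5` (halving) is an assumed value, not fixed by the patent."""
    return base_lr * gamma ** (epoch // step)

# Learning rate over the 1000-epoch training run described above.
schedule = [step_lr(1e-4, e) for e in (0, 199, 200, 400, 999)]
```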
The images in the original high-definition data set are scaled by factors of 2, 3, 4 and 8 with the bicubic interpolation degradation model. The formula of the degradation model is:
I_LR = H_dn · I_HR + n
where I_LR denotes the low-resolution image, H_dn the degradation model, I_HR the original high-resolution image, and n additive noise.
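The degradation equation treats downsampling as a linear operator plus noise. A minimal numpy sketch of that model, with simple block averaging standing in for the patent's bicubic kernel H_dn:

```python
import numpy as np

def degrade(hr, scale=2, noise_std=0.0, rng=None):
    """Sketch of I_LR = H_dn * I_HR + n: a linear downsampling operator
    followed by optional additive Gaussian noise n.  Block averaging is a
    stand-in for the bicubic interpolation kernel used in the patent."""
    h, w = hr.shape
    h, w = h - h % scale, w - w % scale      # crop to a multiple of the scale
    lr = hr[:h, :w].reshape(h // scale, scale,
                            w // scale, scale).mean(axis=(1, 3))
    if noise_std > 0:
        rng = rng if rng is not None else np.random.default_rng(0)
        lr = lr + rng.normal(0.0, noise_std, lr.shape)
    return lr

hr = np.arange(16.0).reshape(4, 4)
lr = degrade(hr, scale=2)                    # 4x4 -> 2x2
```

Running the operator at scales 2, 3, 4 and 8 produces the four training resolutions described above.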
The preprocessing of the scaled data set includes enhancing the images, namely translating them and flipping them in the horizontal and vertical directions; the enhanced data are then divided into small image blocks, and the blocks are collected to obtain the training data set.
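The preprocessing above can be sketched in numpy. The 2x2 patch size and the 1-pixel circular shift (standing in for the translation step) are illustrative choices, not values specified by the patent:

```python
import numpy as np

def augment(img):
    """Enhancement: original, horizontal flip, vertical flip, and a 1-pixel
    circular shift as a crude stand-in for the translation processing."""
    return [img, img[:, ::-1], img[::-1, :], np.roll(img, 1, axis=1)]

def to_patches(img, patch=2, stride=2):
    """Divide an image into small patch x patch blocks (training samples)."""
    h, w = img.shape
    return [img[i:i + patch, j:j + patch]
            for i in range(0, h - patch + 1, stride)
            for j in range(0, w - patch + 1, stride)]

img = np.arange(16.0).reshape(4, 4)
# Collect the patches of every augmented view into one training set.
dataset = [p for a in augment(img) for p in to_patches(a)]
```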
As shown in fig. 2, the information cascade module consists of the following structure stacked 10 times: three convolutional neural network layers, a feature channel merging layer, a channel attention layer and a channel number conversion layer, in sequence. The module processes image data as follows: each convolutional layer in turn extracts feature information from the input image; the feature information extracted by each convolutional layer is merged on the feature channel merging layer; a channel attention mechanism weighs the importance of the merged information; finally the number of channels is reduced back to the number of input channels. These steps are repeated 10 times to obtain the aggregated hierarchical feature information of the convolutional layers.
The information cascade module aggregates image information so that the information of every convolutional layer is fully retained. When the image has just entered the convolutional neural network, low-frequency information is abundant; but as the network deepens, it attends to increasingly abstract features, and much edge-texture and smooth information is gradually lost. The information cascade module therefore captures more of the low-frequency information and fuses it into the model.
F_IC = H_IC(I_LR)
where I_LR denotes the low-resolution input image, H_IC the convolution operations of the cascade module, and F_IC the result of the convolution calculation.
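The cascade computation F_IC = H_IC(I_LR) can be sketched at the shape level in numpy. This is a simplified illustration, not the patent's trained model: 1x1 (pointwise) convolutions stand in for its convolutional layers, the weights are random, and the channel attention is reduced to a sigmoid gate on the global channel averages:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, c_out):
    """Pointwise convolution with random stand-in weights; x is (C, H, W)."""
    w = rng.normal(size=(c_out, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum('oc,chw->ohw', w, x)

def channel_attention(x):
    """Scale each channel by a sigmoid of its global average (simplified)."""
    s = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))
    return x * s[:, None, None]

def cascade_step(x):
    """One aggregation step: 3 convs -> channel-axis merge of the taps ->
    channel attention -> reduce back to the input channel count."""
    c = x.shape[0]
    taps, h = [], x
    for _ in range(3):
        h = np.maximum(conv1x1(h, c), 0.0)          # conv + ReLU
        taps.append(h)
    merged = channel_attention(np.concatenate(taps, axis=0))
    return conv1x1(merged, c)                       # channel number conversion

f = rng.normal(size=(8, 4, 4))                      # stand-in for conv1(I_LR)
for _ in range(10):                                 # structure stacked 10 times
    f = cascade_step(f)
```

The point of the sketch is the shape bookkeeping: each step widens to 3C channels when merging the taps and the conversion layer restores C, so the 10-fold stack composes cleanly.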
As shown in fig. 3, the improved residual module consists of a residual network structure, a channel attention layer and a spatial attention layer, the residual network structure comprising a convolutional layer, a nonlinear activation layer and a second convolutional layer. The module processes image data as follows: the hierarchical feature information is input into the residual network structure to extract feature information; a channel attention mechanism obtains the channel-wise relevance of the extracted features and passes it on; and a spatial attention mechanism obtains the dependency over the global space.
The output of the cascade modules serves as the input of the improved residual modules. A channel attention mechanism and a spatial attention mechanism are connected after each ResNet block; channel-wise relevance and global spatial dependency information are thereby captured and integrated into the convolutional neural network, enriching the feature information and stabilizing the training of the deep network.
F_RBC = H_RBC(F_IC)
F_CA = H_CA(F_RBC)
F_SA = H_SA(F_CA)
where H_RBC denotes the convolution operation of the residual block structure with its input (skip) connection, i.e. the input information is merged with the output of the residual block. F_RBC denotes the feature information output by the residual block, which can be written as [f_1, f_2, f_3, …, f_n], the channel features computed by the respective convolution kernels. A channel attention mechanism is then applied to each channel feature, yielding the product of each channel's weight with the original input data; H_CA denotes the channel attention convolution operation and F_CA the feature information after channel attention. Finally, a spatial attention mechanism computes global dependency information on the output features and fuses it with the original input data; H_SA denotes the spatial attention convolution operation and F_SA the feature information after spatial attention.
As shown in fig. 4, the channel attention structure consists, in sequence, of a global average pooling layer, a 1x1 convolutional layer, a nonlinear activation layer and a second 1x1 convolutional layer. The module processes image data as follows: a weight descriptor for each channel is first obtained through global average pooling; a 1x1 convolutional layer then reduces the number of channels, a nonlinear activation layer introduces nonlinearity, and a second 1x1 convolutional layer restores the number of channels; finally, the result is multiplied back onto the original input feature information to obtain the relevance along the feature channels. The spatial attention structure consists, in sequence, of a 1x1 convolutional layer, a softmax activation layer, a 1x1 convolutional layer and a nonlinear activation layer. The module processes image data as follows: the input feature information of size CxHxW is first converted into a HWx1x1 global feature map through a 1x1 convolutional layer; the global feature map is then normalized with a softmax function and multiplied back onto the original input information; finally, the dependency information over the global space is obtained through a 1x1 convolutional layer and a nonlinear activation layer.
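The improved residual block of fig. 3 and the two attention structures of fig. 4 can be sketched in PyTorch as follows. The channel-reduction ratio, the sigmoid gate on the channel weights and the 3x3 kernel size of the residual convolutions are assumptions, since the text does not fix them:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Fig. 4 channel branch: global average pool -> 1x1 conv (reduce) -> ReLU
    -> 1x1 conv (restore) -> gate, multiplied back onto the input features."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),  # assumed gating nonlinearity (not stated in the text)
        )

    def forward(self, x):
        return x * self.body(x)

class SpatialAttention(nn.Module):
    """Fig. 4 spatial branch: 1x1 conv to a single map, softmax over the HxW
    positions, reweight the input, then 1x1 conv + nonlinear activation."""
    def __init__(self, ch):
        super().__init__()
        self.proj = nn.Conv2d(ch, 1, 1)
        self.out = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        b, _, h, w = x.shape
        attn = torch.softmax(self.proj(x).view(b, 1, -1), dim=-1).view(b, 1, h, w)
        return self.out(x * attn)

class ImprovedResidualBlock(nn.Module):
    """F_RBC = x + conv(relu(conv(x))); F_CA = CA(F_RBC); F_SA = SA(F_CA)."""
    def __init__(self, ch):
        super().__init__()
        self.res = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.ca = ChannelAttention(ch)
        self.sa = SpatialAttention(ch)

    def forward(self, x):
        f_rbc = x + self.res(x)  # input merged with output via the skip connection
        return self.sa(self.ca(f_rbc))
```

The block preserves the feature-map shape, so several of them can be chained directly after the cascade module.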
As shown in fig. 5, the non-local hole (dilated) convolution block comprises four parallel hole convolutional layers with dilation parameters 1, 2, 4 and 6, and three ordinary convolutional neural network layers. The module processes image data as follows: feature information is first extracted simultaneously by the four hole convolutions with different dilation parameters and by two ordinary convolutional neural networks; the feature information obtained by the four hole convolutions is then fused along the feature channel, while the feature information extracted by the ordinary convolutional neural networks is fused by the values of the pixel matrices; finally, the two kinds of fused feature information are added to obtain the global feature information.
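A minimal PyTorch sketch of the non-local hole convolution block of fig. 5. The 1x1 channel-fusion layer (taken here to be the third ordinary layer) and the element-wise fusion of exactly two plain convolutions are assumptions about details the text leaves open:

```python
import torch
import torch.nn as nn

class NonLocalDilatedBlock(nn.Module):
    """Four parallel 3x3 hole convolutions (dilation 1, 2, 4, 6) fused along
    the channel axis, plus two ordinary convolutions fused element-wise; the
    two fused feature maps are added to give the global feature information."""
    def __init__(self, ch):
        super().__init__()
        # padding = dilation keeps the spatial size constant for 3x3 kernels
        self.dilated = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4, 6))
        self.channel_fuse = nn.Conv2d(4 * ch, ch, 1)  # assumed 1x1 fusion layer
        self.plain_a = nn.Conv2d(ch, ch, 3, padding=1)
        self.plain_b = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        fused_dilated = self.channel_fuse(
            torch.cat([c(x) for c in self.dilated], dim=1))
        fused_plain = self.plain_a(x) + self.plain_b(x)  # pixel-matrix fusion
        return fused_dilated + fused_plain
```

Because each dilated branch pads by its own dilation rate, all four branches stay spatially aligned before the channel concatenation.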
The improved VGG network structure is obtained by embedding each pooling layer into the ordinary convolutional layers. The module processes image data as follows: 64-channel feature information is first extracted with 2 convolutional layers and one pooling layer, then 128-channel feature information with 2 convolutional layers and one pooling layer, then 512-channel feature information with 3 convolutional layers and one pooling layer, and finally the 512-channel information is restored to 64 channels with 3 convolutional layers; the pooling layers use padding so that the feature dimensions remain unchanged.
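The improved VGG branch can be sketched as below. The ReLU activations and the intermediate widths of the 3-conv tail (512 → 256 → 128 → 64) are assumptions; the stride-1, padded pooling keeps the feature dimensions unchanged as the text requires:

```python
import torch
import torch.nn as nn

def stage(cin, cout, n_convs):
    """n 3x3 convolutions followed by a padded, stride-1 pooling layer that
    leaves the spatial feature dimensions unchanged."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(3, stride=1, padding=1))  # padding keeps H x W
    return layers

shallow_branch = nn.Sequential(
    *stage(64, 64, 2),     # 2 convs + pool -> 64 channels
    *stage(64, 128, 2),    # 2 convs + pool -> 128 channels
    *stage(128, 512, 3),   # 3 convs + pool -> 512 channels
    nn.Conv2d(512, 256, 3, padding=1),  # assumed widths for the 3-conv tail
    nn.Conv2d(256, 128, 3, padding=1),  # that restores the information
    nn.Conv2d(128, 64, 3, padding=1),   # to 64 channels
)
```

With 2 + 2 + 3 + 3 = 10 convolutional layers and 3 pooling layers, this matches the layer count stated in claim 8.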
Global features are extracted with the non-local hole convolution, and the extracted feature information is up-sampled to the required output size. By setting the dilation rate, hole convolution enlarges the receptive field without adding parameters; embedding it into the non-local convolution significantly reduces the amount of computation while capturing global information at different scales, making the feature extraction more comprehensive.
F_NLHC = H_NLHC(F_SA)
where H_NLHC denotes the convolution operation of the non-local hole convolution, and F_NLHC denotes the feature information obtained after the non-local hole convolution. The final feature information is up-sampled and output as the corresponding high-definition reconstructed image, i.e. the reconstruction formula is:
F_Up = H_Up(F_NLHC)
where H_Up denotes the convolution operation of the up-sampling, and F_Up denotes the up-sampled output features.
The loss function expression of the image super-resolution reconstruction model is as follows:

L(θ) = (1/N) Σ_{i=1}^{N} || C_HR(I_LR^i) − I_HR^i ||_1

where θ represents the parameters of the model, C_HR represents the super-resolution calculation equation, I_LR^i and I_HR^i respectively represent the ith low-resolution image and its corresponding high-resolution image, N represents the number of images in the data set, HR denotes high resolution, and LR denotes low resolution.
The expression of the super-resolution calculation equation is as follows:
C_HR = F_UP(F_NLHC(F_SA(F_CA(F_RBC(F_IC(I_LR))))))
where F_UP represents the up-sampled output information, F_NLHC the information extracted by the non-local hole convolution, F_SA the information extracted by the spatial attention mechanism, F_CA the information extracted by the channel attention mechanism, F_RBC the information extracted by the residual block, and F_IC the information output by the cascade module.
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were used as the result evaluation indices:

PSNR = 10 · log10(MAX² / MSE)

SSIM(X, Y) = (2 μ_X μ_Y + C1)(2 σ_XY + C2) / ((μ_X² + μ_Y² + C1)(σ_X² + σ_Y² + C2))

where MSE represents the mean square error, MAX the maximum pixel value, μ_X and μ_Y the pixel means of image X and image Y, σ_X and σ_Y the pixel standard deviations of image X and image Y, σ_XY the covariance of image X and image Y, and C1 and C2 small stabilizing constants.
The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalents and improvements made within the spirit and principle of the present invention shall fall within its protection scope.
Claims (10)
1. An image super-resolution reconstruction method based on an attention mechanism and a two-channel network is characterized by comprising the following steps: acquiring an image to be detected in real time, and preprocessing the image to be detected; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstruction image; evaluating the reconstructed image by adopting the peak signal-to-noise ratio and the structural similarity, and marking the high-definition reconstructed image according to an evaluation result;
the process of training the image super-resolution reconstruction model comprises the following steps:
s1: obtaining an original high-definition picture data set, and scaling the pictures in the data set with a bicubic interpolation degradation model;
s2: preprocessing the zoomed data set to obtain a training data set;
s3: inputting each image data in the training data set into a shallow layer characteristic channel and a deep layer characteristic channel in an image super-resolution reconstruction model respectively for characteristic extraction;
s4: extracting initial features of the input image by adopting the first convolution layer; inputting the initial characteristics into an information cascade module, and aggregating the hierarchical characteristic information of the convolutional layer;
s5: inputting hierarchical characteristic information aggregated by the information cascade module into an improved residual error module to obtain relevance on a channel and dependency information on a global space;
s6: adopting a non-local hole convolution block to perform global feature extraction on the dependency information to obtain the final deep feature map;
s7: extracting initial features of the input image by adopting the second convolution layer; inputting the initial features into an improved VGG network, and extracting shallow features of the image to obtain a shallow feature map;
s8: fusing the deep layer characteristic diagram and the shallow layer characteristic diagram, and performing up-sampling on the fused characteristic diagram to obtain a high-definition reconstruction diagram;
s9: and (5) constraining the difference between the high-definition reconstructed image and the original high-definition image by using a loss function, and continuously adjusting the parameters of the model until the model is converged to finish the training of the model.
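Steps S3-S9 can be condensed into a toy training step. The single-conv stand-ins for the two branches and the sub-pixel (PixelShuffle) upsampling are assumptions used only to make the data flow of the dual-channel design concrete:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# toy stand-ins for the two branches (the real branches are the cascade +
# residual + non-local blocks and the improved VGG, respectively)
deep = nn.Conv2d(3, 16, 3, padding=1)      # deep feature channel (S4-S6)
shallow = nn.Conv2d(3, 16, 3, padding=1)   # shallow feature channel (S7)
fuse = nn.Conv2d(32, 3 * 4, 3, padding=1)  # S8: fuse, then sub-pixel x2 upsample
up = nn.PixelShuffle(2)

params = list(deep.parameters()) + list(shallow.parameters()) + list(fuse.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

lr_img = torch.rand(1, 3, 16, 16)
hr_img = torch.rand(1, 3, 32, 32)

for _ in range(2):  # S9: constrain SR output against the HR target and update
    sr = up(fuse(torch.cat([deep(lr_img), shallow(lr_img)], dim=1)))
    loss = F.l1_loss(sr, hr_img)
    opt.zero_grad(); loss.backward(); opt.step()
```

Swapping the two stand-in convolutions for the full branch modules leaves the fusion, upsampling and loss logic unchanged.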
2. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the scaling factors applied to the pictures in the data set by the bicubic interpolation degradation model are 2, 3, 4 and 8.
3. The image super-resolution reconstruction method based on the attention mechanism and the two-channel network as claimed in claim 1, wherein the formula of the bicubic interpolation degradation model is as follows:
I_LR = H_dn · I_HR + n
where I_LR represents the low-resolution image, H_dn represents the degradation model, I_HR represents the original high-resolution image, and n represents the additional noise.
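The degradation formula above can be sketched as follows; average-pool downsampling stands in for the bicubic kernel H_dn, which the claim does not spell out, and the Gaussian noise term is optional:

```python
import numpy as np

def degrade(hr, scale=2, noise_sigma=0.0, rng=None):
    """Stand-in for I_LR = H_dn * I_HR + n: average-pool downsampling (H_dn)
    on a 2-D image, plus optional additive Gaussian noise (n)."""
    h, w = hr.shape[:2]
    h, w = h - h % scale, w - w % scale          # crop to a multiple of scale
    lr = hr[:h, :w].reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    if noise_sigma > 0:
        rng = rng or np.random.default_rng(0)
        lr = lr + rng.normal(0.0, noise_sigma, lr.shape)
    return lr
```

A constant image is left constant by the pooling, which is a quick way to check that the reshape-based downsampling is wired correctly.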
4. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein preprocessing the scaled data set comprises enhancing the images, including translating and flipping them in the horizontal and vertical directions; the enhanced data are then divided into small image blocks, and the divided images are collected to obtain the training data set.
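The enhancement and block-division of claim 4 can be sketched as below; the flip probabilities, translation range and patch size are assumptions, since the claim fixes none of them:

```python
import numpy as np

def augment(img, rng):
    """Random horizontal / vertical flip plus a small circular translation."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    if rng.random() < 0.5:
        img = img[::-1, :]
    img = np.roll(img, shift=(rng.integers(-4, 5), rng.integers(-4, 5)), axis=(0, 1))
    return img

def extract_patches(img, patch=32, stride=32):
    """Divide an image into small training blocks of size patch x patch."""
    h, w = img.shape[:2]
    return [img[i:i + patch, j:j + patch]
            for i in range(0, h - patch + 1, stride)
            for j in range(0, w - patch + 1, stride)]
```

The circular `np.roll` translation avoids introducing empty borders, at the cost of wrapping content around the image edge.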
5. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the information cascade module comprises a feature aggregation structure stacked 10 times; the feature aggregation structure comprises at least three convolutional neural network layers, a feature channel merging layer, a channel attention layer and a channel number conversion layer, wherein the convolutional neural network layers are connected in sequence, the outputs of all convolutional layers except the last branch into the feature channel merging layer, and the feature channel merging layer, the channel attention layer and the channel number conversion layer are connected in sequence to form the information cascade module; the module processes image data as follows: each convolutional neural network layer first extracts feature information of the input image in sequence; the feature information extracted by each convolutional layer is then merged in the feature channel merging layer, and a channel attention mechanism distinguishes the importance of the merged information; finally, the number of channels is reduced to the number of input channels, and the above steps are repeated 10 times to obtain the hierarchical feature information of the aggregated convolutional layers.
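One feature aggregation stage and its 10-fold stack can be sketched as follows. Merging all three conv outputs (rather than all but the last) and the attention reduction ratio are simplifying assumptions:

```python
import torch
import torch.nn as nn

class ChannelWeight(nn.Module):
    """Minimal channel-attention gate used inside the aggregation structure."""
    def __init__(self, ch, r=4):
        super().__init__()
        self.f = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.f(x)

class FeatureAggregation(nn.Module):
    """One stage: three sequential 3x3 convs; their outputs are merged on the
    feature channel, weighted by channel attention, and a 1x1 conv (the
    channel-number conversion layer) restores the input channel count."""
    def __init__(self, ch):
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv2d(ch, ch, 3, padding=1) for _ in range(3))
        self.attend = ChannelWeight(3 * ch)
        self.restore = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        feats = []
        for conv in self.convs:
            x = conv(x)
            feats.append(x)
        return self.restore(self.attend(torch.cat(feats, dim=1)))

# the information cascade module stacks the aggregation structure 10 times
cascade = nn.Sequential(*(FeatureAggregation(64) for _ in range(10)))
```

Because each stage restores the input channel count, the 10 stages compose without any glue layers.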
6. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the improved residual module comprises a residual network structure, a channel attention mechanism layer and a spatial attention mechanism layer, the residual network structure comprising a convolutional neural network layer, a nonlinear activation layer and a second convolutional neural network layer; the module processes image data as follows: the hierarchical feature information is input into the residual network structure to extract feature information, the channel attention mechanism captures the relevance of the extracted features across channels and passes it downwards, and the spatial attention mechanism captures the dependency over the global space.
7. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the non-local hole convolution block comprises four parallel hole convolutional layers with dilation parameters 1, 2, 4 and 6, and three ordinary convolutional neural network layers; the module processes image data as follows: feature information is first extracted from the dependency information output by the improved residual network, using the four hole convolutions with different dilation parameters and two ordinary convolutional neural networks; the feature information obtained by the four hole convolutions is then fused along the feature channel, while the feature information extracted by the ordinary convolutional neural networks is fused by the values of the pixel matrices; finally, the two kinds of fused feature information are added to obtain the global feature information.
8. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the improved VGG network structure is obtained by embedding pooling layers into the ordinary convolutional layers, and comprises 10 ordinary convolutional layers and 3 pooling layers; the module processes image data as follows: 64-channel feature information is first extracted with 2 convolutional layers and one pooling layer, then 128-channel feature information with 2 convolutional layers and one pooling layer, then 512-channel feature information with 3 convolutional layers and one pooling layer, and finally the 512-channel information is restored to 64 channels with 3 convolutional layers; the pooling layers use padding so that the feature dimensions remain unchanged.
9. The method for image super-resolution reconstruction based on the attention mechanism and the two-channel network as claimed in claim 1, wherein the loss function expression of the image super-resolution reconstruction model is as follows:

L(θ) = (1/N) Σ_{i=1}^{N} || C_HR(I_LR^i) − I_HR^i ||_1

where θ represents the parameters of the model, C_HR represents the super-resolution calculation equation, I_LR^i and I_HR^i respectively represent the ith low-resolution image and its corresponding high-resolution image, N represents the number of images in the data set, HR denotes high resolution, and LR denotes low resolution.
10. The image super-resolution reconstruction method based on the attention mechanism and the two-channel network as claimed in claim 1, wherein the formulas for evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity are as follows:

PSNR = 10 · log10(MAX² / MSE)

SSIM(X, Y) = (2 μ_X μ_Y + C1)(2 σ_XY + C2) / ((μ_X² + μ_Y² + C1)(σ_X² + σ_Y² + C2))

where PSNR represents the peak signal-to-noise ratio, MSE the mean square error, MAX the maximum pixel value, SSIM the structural similarity, μ_X and μ_Y the pixel means of image X and image Y, σ_X and σ_Y the pixel standard deviations of image X and image Y, σ_XY the covariance of image X and image Y, and C1 and C2 small stabilizing constants.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110573693.3A CN113362223B (en) | 2021-05-25 | 2021-05-25 | Image super-resolution reconstruction method based on attention mechanism and two-channel network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113362223A CN113362223A (en) | 2021-09-07 |
CN113362223B true CN113362223B (en) | 2022-06-24 |
Family
ID=77527539
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827295A (en) * | 2019-10-31 | 2020-02-21 | 北京航空航天大学青岛研究院 | Three-dimensional semantic segmentation method based on coupling of voxel model and color information |
CN111414888A (en) * | 2020-03-31 | 2020-07-14 | 杭州博雅鸿图视频技术有限公司 | Low-resolution face recognition method, system, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
TR01 | Transfer of patent right ||
Effective date of registration: 20240111 Address after: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province Patentee after: Dragon totem Technology (Hefei) Co.,Ltd. Address before: 400065 Chongwen Road, Nanshan Street, Nanan District, Chongqing Patentee before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS |