CN113362223B - Image super-resolution reconstruction method based on attention mechanism and two-channel network - Google Patents
- Publication number
- CN113362223B (application CN202110573693.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T3/4053 — Super resolution, i.e. output image resolution higher than sensor resolution
- G06T3/4007 — Interpolation-based scaling, e.g. bilinear interpolation
- G06T3/4023 — Decimation- or insertion-based scaling, e.g. pixel or line decimation
- G06T3/4046 — Scaling the whole image or part thereof using neural networks
- G06T5/30 — Erosion or dilatation, e.g. thinning
- G06T5/50 — Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
- G06T7/11 — Region-based segmentation
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; Image merging
Abstract
The invention belongs to the fields of artificial intelligence, deep learning and image processing, and relates to an image super-resolution reconstruction method based on an attention mechanism and a two-channel network. The method comprises the following steps: acquiring an image to be processed in real time and preprocessing it; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, marking the high-definition reconstructed image according to the evaluation result. The image super-resolution reconstruction model is based on a convolutional neural network. The invention uses a two-channel network: one channel uses an improved residual structure to extract valuable high-frequency features, i.e. high-level features, while the other uses an improved VGG network, in which the input and output image sizes are kept consistent, to extract rich low-frequency features. Finally the features are fused, so that the reconstructed image is sharper.
Description
Technical Field
The invention belongs to the fields of artificial intelligence, deep learning and image processing, and particularly relates to an image super-resolution reconstruction method based on an attention mechanism and a two-channel network.
Background
Image super-resolution reconstruction techniques use a set of low-quality, low-resolution images (or a motion sequence) to produce a high-quality, high-resolution image. Image super-resolution reconstruction is applied in many computer vision tasks, including surveillance imaging, medical imaging and object recognition. In practice, limited by the cost of image acquisition equipment, the transmission bandwidth of video images, or technical bottlenecks of the imaging modality, a large, sharp-edged, blur-free high-definition image cannot always be obtained. Super-resolution reconstruction techniques arose to meet this demand: the classical image super-resolution (SR) reconstruction problem is defined as recovering a high-resolution (HR) image from its low-resolution (LR) counterpart. Image super-resolution reconstruction can improve the recognition capability and recognition accuracy of an image and allows focused analysis of a target object, so that an image of higher spatial resolution can be obtained for a region of interest without directly acquiring a high-spatial-resolution image of huge data volume. Learning-based image super-resolution reconstruction has been a popular direction in recent years: by means of end-to-end convolutional neural networks from deep learning, the mapping between low-resolution and high-resolution images is learned and the high-frequency details lost in the low-resolution image are estimated, yielding a high-quality image with clear edges and rich texture details.
In recent years, with the introduction of the attention mechanism, more and more models have tried to improve their results by using it; in image super-resolution reconstruction, the attention mechanism is embedded into the model to improve the accuracy of the result.
Traditional image reconstruction methods usually compute with a CNN (convolutional neural network), and much smooth information is gradually lost during the computation; since the main purpose of image super-resolution reconstruction is to obtain a high-precision, high-definition picture, the loss of any information affects the final reconstruction accuracy. Moreover, a CNN uses local receptive fields: constrained by the size of the convolution kernel, the information it obtains is local to the picture, and the global information of the whole picture cannot be captured. Yet the pixels in a picture are correlated, and such long-range dependency information is also important dependency information for image reconstruction.
Disclosure of Invention
To solve the problems in the prior art, the invention provides an image super-resolution reconstruction method based on an attention mechanism and a two-channel network, which comprises the following steps: acquiring an image to be processed in real time and preprocessing it; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, marking the high-definition reconstructed image according to the evaluation result; the image super-resolution reconstruction model is based on a convolutional neural network;
the process of training the image super-resolution reconstruction model comprises the following steps:
S1: obtaining an original high-definition picture data set, and downscaling the pictures in the data set with a bicubic interpolation degradation model;
S2: preprocessing the downscaled data set to obtain a training data set;
S3: inputting each image in the training data set into the shallow feature channel and the deep feature channel of the image super-resolution reconstruction model for feature extraction;
S4: extracting initial features of the input image with the first convolution layer; inputting the initial features into the information cascade modules to aggregate the hierarchical feature information of the convolution layers;
S5: inputting the hierarchical feature information aggregated by the information cascade modules into the improved residual modules to obtain channel-wise relevance and global spatial dependency information;
S6: performing global feature extraction on the dependency information with non-local hole (dilated) convolution to obtain the final deep feature map;
S7: extracting initial features of the input image with the second convolution layer; inputting the initial features into the improved VGG network to extract shallow features and obtain a shallow feature map;
S8: fusing the deep feature map and the shallow feature map, and upsampling the fused feature map to obtain a high-definition reconstructed image;
S9: constraining the difference between the high-definition reconstructed image and the original high-definition image with a loss function, and adjusting the model parameters until the model converges, completing the training of the model.
Preferably, the images in the data set are scaled by factors of 2, 3, 4 and 8 using the bicubic interpolation degradation model.
Preferably, the formula of the bicubic interpolation degradation model is as follows:
I_LR = H_dn · I_HR + n
preferably, the process of preprocessing the scaled data set includes performing enhancement processing on the image, including performing translation processing and flipping processing in horizontal and vertical directions on the image; and dividing the enhanced data into different small image blocks, and collecting the divided images to obtain a training data set.
Preferably, the information cascade module comprises a feature aggregation structure stacked 10 times. The feature aggregation structure comprises at least three convolutional neural network layers, a feature channel merging layer, a channel attention layer and a channel number conversion layer: the convolutional layers are connected in sequence; the output of every convolutional layer except the last branches off to the feature channel merging layer; and the feature channel merging layer, channel attention layer and channel number conversion layer are connected in sequence to form the information cascade module. The module processes image data as follows: each convolutional layer in turn extracts feature information from the input image; the feature information extracted by each convolutional layer is merged on the feature channel merging layer; a channel attention mechanism weighs the importance of the merged information; finally the number of channels is reduced back to the number of input channels. These steps are repeated 10 times to obtain the aggregated hierarchical feature information of the convolutional layers.
Preferably, the improved residual module comprises a residual network structure, a channel attention layer and a spatial attention layer, the residual network structure consisting of a convolutional layer, a nonlinear activation layer and a second convolutional layer. The module processes image data as follows: the hierarchical feature information is input into the residual network structure to extract feature information; a channel attention mechanism obtains the channel-wise relevance of the extracted features and passes it on; and a spatial attention mechanism obtains the dependency over the global space.
Preferably, the non-local hole convolution block comprises four parallel hole (dilated) convolution layers with dilation parameters 1, 2, 4 and 6, and three ordinary convolutional neural network layers. The module processes image data as follows: first, feature information of the dependency information output by the improved residual network is extracted by the hole convolutions with the four different dilation parameters and by two ordinary convolutional neural networks; then the feature information obtained by the four hole convolutions is fused on the feature channel, and the feature information extracted by the ordinary convolutional neural networks is fused according to the values of the pixel matrix; finally, the two kinds of fused feature information are added to obtain the global feature information.
Preferably, the improved VGG network structure is obtained by interleaving pooling layers among ordinary convolutional layers, comprising 10 ordinary convolutional layers and 3 pooling layers. The module processes image data as follows: first, 64-channel feature information is extracted with 2 convolutional layers and one pooling layer; then 128-channel feature information is extracted with 2 convolutional layers and one pooling layer; then 512-channel feature information is extracted with 3 convolutional layers and one pooling layer; finally the 512-channel information is restored to 64 channels with 3 convolutional layers. The pooling layers use padding to keep the feature dimensions unchanged.
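The patent states only that the pooling layers keep the feature dimensions unchanged via padding; stride-1 pooling is one way to realize this. A minimal numpy sketch under that assumption (the 3x3 pooling window is also an illustrative choice, not a value fixed by the patent):

```python
import numpy as np

def same_maxpool(x, k=3):
    """Stride-1 max pooling with padding so that the output spatial size
    equals the input size (the 'dimension-preserving pooling' of the text).
    Padding uses -inf so border maxima are taken over valid pixels only."""
    pad = k // 2
    xp = np.pad(x, pad, constant_values=-np.inf)
    h, w = x.shape
    out = np.full((h, w), -np.inf)
    for di in range(k):                       # slide the k x k window
        for dj in range(k):
            out = np.maximum(out, xp[di:di + h, dj:dj + w])
    return out

x = np.arange(16.0).reshape(4, 4)
y = same_maxpool(x)                           # still 4x4, unlike strided pooling
```

With ordinary stride-2 VGG pooling the map would shrink to 2x2; here the shape is preserved, which is what lets the shallow channel's output align with the input image.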
Preferably, the loss function expression of the image super-resolution reconstruction model is as follows:
Preferably, the formula for evaluating the reconstructed image by using the peak signal-to-noise ratio and the structural similarity is as follows:
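The evaluation formulas themselves are omitted from the text above. For reference, the standard definitions of PSNR and SSIM can be sketched as follows (the SSIM here is computed globally over the whole image for brevity, whereas SSIM is normally averaged over local sliding windows):

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(ref, rec, peak=255.0):
    """Structural similarity over the whole image (no sliding window),
    with the usual stabilizing constants c1, c2."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = ref.mean(), rec.mean()
    vx, vy = ref.var(), rec.var()
    cov = ((ref - mx) * (rec - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = np.full((8, 8), 100.0)
b = a + 10.0          # uniform error of 10 per pixel -> MSE = 100
```

Higher PSNR and SSIM values indicate a reconstruction closer to the original high-definition image.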
the invention has the advantages that:
1. The invention uses a two-channel network: one channel uses an improved residual structure to extract valuable high-frequency features, i.e. high-level features, while the other uses an improved VGG network (the parameters of the VGG convolutional and pooling layers are fine-tuned so that the input and output image sizes are consistent, and the final fully connected layer is discarded) to extract rich low-frequency features; finally the features are fused.
2. The invention uses dense connections at specific positions of the model (the two groups of information cascade modules at its head and tail), aggregating the information of each convolutional layer so that the convolutional-layer information is fully utilized; finally a channel attention mechanism computes channel weights for the merged information, rather than simply reducing the channels.
3. The invention uses a spatial attention mechanism, added after the existing channel attention mechanism, so that global information is extracted more fully and the features are used more comprehensively. Meanwhile, before upsampling, non-local hole convolution performs one more global-dependency feature extraction on the preceding results, so that the output is more closely related and the feature information is richer.
Drawings
FIG. 1 is a general structure diagram of an image super-resolution reconstruction model according to the present invention;
FIG. 2 is a structure diagram of the information cascade module of the present invention;
FIG. 3 is a diagram of the residual structure of the present invention;
FIG. 4 is a diagram of the channel attention and spatial attention configurations of the present invention;
FIG. 5 is a structure diagram of the non-local hole convolution of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An image super-resolution reconstruction method based on an attention mechanism and a two-channel network comprises the following steps: acquiring an image to be processed in real time and preprocessing it; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; and evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity, marking the high-definition reconstructed image according to the evaluation result. The image super-resolution reconstruction model is based on a convolutional neural network.
The structure of the image super-resolution reconstruction model is shown in fig. 1; it comprises a deep feature channel, a shallow feature channel, an upsampling layer and a third convolution layer. The deep feature channel comprises a first convolution layer, information cascade modules, improved residual modules and a non-local hole convolution block: after the first convolution layer, the input image is processed in turn by the information cascade modules, the improved residual modules and the non-local hole convolution block to obtain the deep feature map. The shallow feature channel comprises a second convolution layer and an improved VGG network: after the second convolution layer, the input image is processed by the improved VGG network to obtain the shallow feature map. The deep and shallow feature maps are fused, the fused image is upsampled by the upsampling layer, and a convolution operation with the third convolution layer yields the high-definition reconstructed image.
Optionally, the deep feature channel includes n information cascade modules and m improved residual modules; all information cascade modules are connected in series into an information cascade module group, and all improved residual modules are connected in series into an improved residual module group.
Preferably, the deep feature channel comprises 2n information cascade modules: n of them connected in series form the first information cascade module group, and the remaining n connected in series form the second information cascade module group; the two groups are placed at the input end and the output end of the improved residual module group, respectively.
The process of training the image super-resolution reconstruction model comprises the following steps:
S1: obtaining an original high-definition picture data set, and downscaling the pictures in the data set with a bicubic interpolation degradation model;
S2: preprocessing the downscaled data set to obtain a training data set;
S3: inputting each image in the training data set into the shallow feature channel and the deep feature channel of the image super-resolution reconstruction model for feature extraction;
S4: extracting initial features of the input image with the first convolution layer; inputting the initial features into the information cascade modules to aggregate the hierarchical feature information of the convolution layers;
S5: inputting the hierarchical feature information aggregated by the information cascade modules into the improved residual modules to obtain channel-wise relevance and global spatial dependency information;
S6: performing global feature extraction on the dependency information with non-local hole (dilated) convolution to obtain the final deep feature map;
S7: extracting initial features of the input image with the second convolution layer; inputting the initial features into the improved VGG network to extract shallow features and obtain a shallow feature map;
S8: fusing the deep feature map and the shallow feature map, and upsampling the fused feature map to obtain a high-definition reconstructed image;
S9: constraining the difference between the high-definition reconstructed image and the original high-definition image with a loss function, and adjusting the model parameters until the model converges, completing the training of the model.
The DIV2K data set is adopted: eight hundred high-definition (HR) pictures, together with the corresponding low-resolution (LR) pictures produced by the degradation model (bicubic interpolation degradation), are used as the training set, and five pictures are used as the validation set. Five data sets — Set5, Set14, Urban100, Manga109 and BSD100 — are used as test sets; their texture information is very rich, most of which is lost in the degraded low-resolution pictures, which makes them a demanding test of super-resolution reconstruction accuracy. The evaluation indexes are the traditional PSNR and SSIM, where PSNR denotes peak signal-to-noise ratio and SSIM denotes structural similarity.
One forward and one backward propagation of all the data in the training set through the neural network is called an epoch; the model parameters are updated every epoch, and the maximum number of epochs is set to 1000. The learning rate is updated every 200 epochs, and the model and parameters that achieve the best results on the test data set during the 1000 training epochs are saved.
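The 200-epoch learning-rate update can be sketched as a simple step-decay schedule. The patent specifies only the interval; the base learning rate 1e-4 and the halving factor below are illustrative assumptions:

```python
def step_lr(base_lr, epoch, step=200, gamma=0.5):
    """Step decay: scale the learning rate by `gamma` every `step` epochs.
    `gamma=0.5` (halving) is an assumed value, not fixed by the patent."""
    return base_lr * gamma ** (epoch // step)

# Learning rate over the 1000-epoch training run described above.
schedule = [step_lr(1e-4, e) for e in (0, 199, 200, 400, 999)]
```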
The images in the original high-definition data set are scaled by factors of 2, 3, 4 and 8 with the bicubic interpolation degradation model. The formula of the degradation model is:
I_LR = H_dn · I_HR + n
where I_LR denotes the low-resolution image, H_dn the degradation model, I_HR the original high-resolution image, and n additive noise.
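The degradation equation treats downsampling as a linear operator plus noise. A minimal numpy sketch of that model, with simple block averaging standing in for the patent's bicubic kernel H_dn:

```python
import numpy as np

def degrade(hr, scale=2, noise_std=0.0, rng=None):
    """Sketch of I_LR = H_dn * I_HR + n: a linear downsampling operator
    followed by optional additive Gaussian noise n.  Block averaging is a
    stand-in for the bicubic interpolation kernel used in the patent."""
    h, w = hr.shape
    h, w = h - h % scale, w - w % scale      # crop to a multiple of the scale
    lr = hr[:h, :w].reshape(h // scale, scale,
                            w // scale, scale).mean(axis=(1, 3))
    if noise_std > 0:
        rng = rng if rng is not None else np.random.default_rng(0)
        lr = lr + rng.normal(0.0, noise_std, lr.shape)
    return lr

hr = np.arange(16.0).reshape(4, 4)
lr = degrade(hr, scale=2)                    # 4x4 -> 2x2
```

Running the operator at scales 2, 3, 4 and 8 produces the four training resolutions described above.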
The preprocessing of the scaled data set includes enhancing the images, namely translating them and flipping them in the horizontal and vertical directions; the enhanced data are then divided into small image blocks, and the blocks are collected to obtain the training data set.
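The preprocessing above can be sketched in numpy. The 2x2 patch size and the 1-pixel circular shift (standing in for the translation step) are illustrative choices, not values specified by the patent:

```python
import numpy as np

def augment(img):
    """Enhancement: original, horizontal flip, vertical flip, and a 1-pixel
    circular shift as a crude stand-in for the translation processing."""
    return [img, img[:, ::-1], img[::-1, :], np.roll(img, 1, axis=1)]

def to_patches(img, patch=2, stride=2):
    """Divide an image into small patch x patch blocks (training samples)."""
    h, w = img.shape
    return [img[i:i + patch, j:j + patch]
            for i in range(0, h - patch + 1, stride)
            for j in range(0, w - patch + 1, stride)]

img = np.arange(16.0).reshape(4, 4)
# Collect the patches of every augmented view into one training set.
dataset = [p for a in augment(img) for p in to_patches(a)]
```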
As shown in fig. 2, the information cascade module consists of the following structure stacked 10 times: three convolutional neural network layers, a feature channel merging layer, a channel attention layer and a channel number conversion layer, in sequence. The module processes image data as follows: each convolutional layer in turn extracts feature information from the input image; the feature information extracted by each convolutional layer is merged on the feature channel merging layer; a channel attention mechanism weighs the importance of the merged information; finally the number of channels is reduced back to the number of input channels. These steps are repeated 10 times to obtain the aggregated hierarchical feature information of the convolutional layers.
The information cascade module aggregates image information so that the information of every convolutional layer is fully retained. When the image has just entered the convolutional neural network, low-frequency information is abundant; but as the network deepens, it attends to increasingly abstract features, and much edge-texture and smooth information is gradually lost. The information cascade module therefore captures more of the low-frequency information and fuses it into the model.
F_IC = H_IC(I_LR)
where I_LR denotes the low-resolution input image, H_IC the convolution operations of the cascade module, and F_IC the result of the convolution calculation.
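The cascade computation F_IC = H_IC(I_LR) can be sketched at the shape level in numpy. This is a simplified illustration, not the patent's trained model: 1x1 (pointwise) convolutions stand in for its convolutional layers, the weights are random, and the channel attention is reduced to a sigmoid gate on the global channel averages:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, c_out):
    """Pointwise convolution with random stand-in weights; x is (C, H, W)."""
    w = rng.normal(size=(c_out, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum('oc,chw->ohw', w, x)

def channel_attention(x):
    """Scale each channel by a sigmoid of its global average (simplified)."""
    s = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))
    return x * s[:, None, None]

def cascade_step(x):
    """One aggregation step: 3 convs -> channel-axis merge of the taps ->
    channel attention -> reduce back to the input channel count."""
    c = x.shape[0]
    taps, h = [], x
    for _ in range(3):
        h = np.maximum(conv1x1(h, c), 0.0)          # conv + ReLU
        taps.append(h)
    merged = channel_attention(np.concatenate(taps, axis=0))
    return conv1x1(merged, c)                       # channel number conversion

f = rng.normal(size=(8, 4, 4))                      # stand-in for conv1(I_LR)
for _ in range(10):                                 # structure stacked 10 times
    f = cascade_step(f)
```

The point of the sketch is the shape bookkeeping: each step widens to 3C channels when merging the taps and the conversion layer restores C, so the 10-fold stack composes cleanly.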
As shown in fig. 3, the improved residual module consists of a residual network structure, a channel attention layer and a spatial attention layer, the residual network structure comprising a convolutional layer, a nonlinear activation layer and a second convolutional layer. The module processes image data as follows: the hierarchical feature information is input into the residual network structure to extract feature information; a channel attention mechanism obtains the channel-wise relevance of the extracted features and passes it on; and a spatial attention mechanism obtains the dependency over the global space.
The output of the cascade modules serves as the input of the improved residual modules. A channel attention mechanism and a spatial attention mechanism are connected after each ResNet block; channel-wise relevance and global spatial dependency information are thereby captured and integrated into the convolutional neural network, enriching the feature information and stabilizing the training of the deep network.
F_RBC = H_RBC(F_IC)
F_CA = H_CA(F_RBC)
F_SA = H_SA(F_CA)
where H_RBC denotes the convolution operation of the residual block structure with its input (skip) connection, i.e. the input information is merged with the output of the residual block. F_RBC denotes the feature information output by the residual block, which can be written as [f_1, f_2, f_3, …, f_n], the channel features computed by the respective convolution kernels. A channel attention mechanism is then applied to each channel feature, yielding the product of each channel's weight with the original input data; H_CA denotes the channel attention convolution operation and F_CA the feature information after channel attention. Finally, a spatial attention mechanism computes global dependency information on the output features and fuses it with the original input data; H_SA denotes the spatial attention convolution operation and F_SA the feature information after spatial attention.
As shown in fig. 4, the channel attention structure consists, in sequence, of a global average pooling layer, a 1x1 convolutional layer, a nonlinear activation layer and a second 1x1 convolutional layer. The module processes image data as follows: a weight descriptor for each channel is first obtained through global average pooling; a 1x1 convolutional layer then reduces the number of channels, a nonlinear activation layer introduces nonlinearity, and a second 1x1 convolutional layer restores the number of channels; finally, the result is multiplied back onto the original input feature information to obtain the relevance along the feature channels. The spatial attention structure consists, in sequence, of a 1x1 convolutional layer, a softmax activation layer, a 1x1 convolutional layer and a nonlinear activation layer. The module processes image data as follows: the input feature information of size CxHxW is first converted into a HWx1x1 global feature map through a 1x1 convolutional layer; the global feature map is then normalized with a softmax function and multiplied back onto the original input information; finally, the dependency information over the global space is obtained through a 1x1 convolutional layer and a nonlinear activation layer.
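The improved residual block of fig. 3 and the two attention structures of fig. 4 can be sketched in PyTorch as follows. The channel-reduction ratio, the sigmoid gate on the channel weights and the 3x3 kernel size of the residual convolutions are assumptions, since the text does not fix them:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Fig. 4 channel branch: global average pool -> 1x1 conv (reduce) -> ReLU
    -> 1x1 conv (restore) -> gate, multiplied back onto the input features."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),  # assumed gating nonlinearity (not stated in the text)
        )

    def forward(self, x):
        return x * self.body(x)

class SpatialAttention(nn.Module):
    """Fig. 4 spatial branch: 1x1 conv to a single map, softmax over the HxW
    positions, reweight the input, then 1x1 conv + nonlinear activation."""
    def __init__(self, ch):
        super().__init__()
        self.proj = nn.Conv2d(ch, 1, 1)
        self.out = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        b, _, h, w = x.shape
        attn = torch.softmax(self.proj(x).view(b, 1, -1), dim=-1).view(b, 1, h, w)
        return self.out(x * attn)

class ImprovedResidualBlock(nn.Module):
    """F_RBC = x + conv(relu(conv(x))); F_CA = CA(F_RBC); F_SA = SA(F_CA)."""
    def __init__(self, ch):
        super().__init__()
        self.res = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.ca = ChannelAttention(ch)
        self.sa = SpatialAttention(ch)

    def forward(self, x):
        f_rbc = x + self.res(x)  # input merged with output via the skip connection
        return self.sa(self.ca(f_rbc))
```

The block preserves the feature-map shape, so several of them can be chained directly after the cascade module.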
As shown in fig. 5, the non-local hole (dilated) convolution block comprises four parallel hole convolutional layers with dilation parameters 1, 2, 4 and 6, and three ordinary convolutional neural network layers. The module processes image data as follows: feature information is first extracted simultaneously by the four hole convolutions with different dilation parameters and by two ordinary convolutional neural networks; the feature information obtained by the four hole convolutions is then fused along the feature channel, while the feature information extracted by the ordinary convolutional neural networks is fused by the values of the pixel matrices; finally, the two kinds of fused feature information are added to obtain the global feature information.
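A minimal PyTorch sketch of the non-local hole convolution block of fig. 5. The 1x1 channel-fusion layer (taken here to be the third ordinary layer) and the element-wise fusion of exactly two plain convolutions are assumptions about details the text leaves open:

```python
import torch
import torch.nn as nn

class NonLocalDilatedBlock(nn.Module):
    """Four parallel 3x3 hole convolutions (dilation 1, 2, 4, 6) fused along
    the channel axis, plus two ordinary convolutions fused element-wise; the
    two fused feature maps are added to give the global feature information."""
    def __init__(self, ch):
        super().__init__()
        # padding = dilation keeps the spatial size constant for 3x3 kernels
        self.dilated = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4, 6))
        self.channel_fuse = nn.Conv2d(4 * ch, ch, 1)  # assumed 1x1 fusion layer
        self.plain_a = nn.Conv2d(ch, ch, 3, padding=1)
        self.plain_b = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        fused_dilated = self.channel_fuse(
            torch.cat([c(x) for c in self.dilated], dim=1))
        fused_plain = self.plain_a(x) + self.plain_b(x)  # pixel-matrix fusion
        return fused_dilated + fused_plain
```

Because each dilated branch pads by its own dilation rate, all four branches stay spatially aligned before the channel concatenation.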
The improved VGG network structure is obtained by embedding each pooling layer into the ordinary convolutional layers. The module processes image data as follows: 64-channel feature information is first extracted with 2 convolutional layers and one pooling layer, then 128-channel feature information with 2 convolutional layers and one pooling layer, then 512-channel feature information with 3 convolutional layers and one pooling layer, and finally the 512-channel information is restored to 64 channels with 3 convolutional layers; the pooling layers use padding so that the feature dimensions remain unchanged.
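The improved VGG branch can be sketched as below. The ReLU activations and the intermediate widths of the 3-conv tail (512 → 256 → 128 → 64) are assumptions; the stride-1, padded pooling keeps the feature dimensions unchanged as the text requires:

```python
import torch
import torch.nn as nn

def stage(cin, cout, n_convs):
    """n 3x3 convolutions followed by a padded, stride-1 pooling layer that
    leaves the spatial feature dimensions unchanged."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(3, stride=1, padding=1))  # padding keeps H x W
    return layers

shallow_branch = nn.Sequential(
    *stage(64, 64, 2),     # 2 convs + pool -> 64 channels
    *stage(64, 128, 2),    # 2 convs + pool -> 128 channels
    *stage(128, 512, 3),   # 3 convs + pool -> 512 channels
    nn.Conv2d(512, 256, 3, padding=1),  # assumed widths for the 3-conv tail
    nn.Conv2d(256, 128, 3, padding=1),  # that restores the information
    nn.Conv2d(128, 64, 3, padding=1),   # to 64 channels
)
```

With 2 + 2 + 3 + 3 = 10 convolutional layers and 3 pooling layers, this matches the layer count stated in claim 8.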
Global features are extracted with the non-local hole convolution, and the extracted feature information is up-sampled to the required output size. By setting the dilation rate, hole convolution enlarges the receptive field without adding parameters; embedding it into the non-local convolution significantly reduces the amount of computation while capturing global information at different scales, making the feature extraction more comprehensive.
F_NLHC = H_NLHC(F_SA)
where H_NLHC denotes the convolution operation of the non-local hole convolution, and F_NLHC denotes the feature information obtained after the non-local hole convolution. The final feature information is up-sampled and output as the corresponding high-definition reconstructed image, i.e. the reconstruction formula is:
F_Up = H_Up(F_NLHC)
where H_Up denotes the convolution operation of the up-sampling, and F_Up denotes the up-sampled output features.
The loss function expression of the image super-resolution reconstruction model is as follows:

L(θ) = (1/N) Σ_{i=1}^{N} || C_HR(I_LR^i) − I_HR^i ||_1

where θ represents the parameters of the model, C_HR represents the super-resolution calculation equation, I_LR^i and I_HR^i respectively represent the ith low-resolution image and its corresponding high-resolution image, N represents the number of images in the data set, HR denotes high resolution, and LR denotes low resolution.
The expression of the super-resolution calculation equation is as follows:
C_HR = F_UP(F_NLHC(F_SA(F_CA(F_RBC(F_IC(I_LR))))))
where F_UP represents the up-sampled output information, F_NLHC the information extracted by the non-local hole convolution, F_SA the information extracted by the spatial attention mechanism, F_CA the information extracted by the channel attention mechanism, F_RBC the information extracted by the residual block, and F_IC the information output by the cascade module.
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were used as the result evaluation indices:

PSNR = 10 · log10(MAX² / MSE)

SSIM(X, Y) = (2 μ_X μ_Y + C1)(2 σ_XY + C2) / ((μ_X² + μ_Y² + C1)(σ_X² + σ_Y² + C2))

where MSE represents the mean square error, MAX the maximum pixel value, μ_X and μ_Y the pixel means of image X and image Y, σ_X and σ_Y the pixel standard deviations of image X and image Y, σ_XY the covariance of image X and image Y, and C1 and C2 small stabilizing constants.
The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalents and improvements made within the spirit and principle of the present invention shall fall within its protection scope.
Claims (10)
1. An image super-resolution reconstruction method based on an attention mechanism and a two-channel network is characterized by comprising the following steps: acquiring an image to be detected in real time, and preprocessing the image to be detected; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstruction image; evaluating the reconstructed image by adopting the peak signal-to-noise ratio and the structural similarity, and marking the high-definition reconstructed image according to an evaluation result;
the process of training the image super-resolution reconstruction model comprises the following steps:
s1: obtaining an original high-definition picture data set, and scaling the pictures in the data set with a bicubic interpolation degradation model;
s2: preprocessing the zoomed data set to obtain a training data set;
s3: inputting each image data in the training data set into a shallow layer characteristic channel and a deep layer characteristic channel in an image super-resolution reconstruction model respectively for characteristic extraction;
s4: extracting initial features of the input image by adopting the first convolution layer; inputting the initial characteristics into an information cascade module, and aggregating the hierarchical characteristic information of the convolutional layer;
s5: inputting hierarchical characteristic information aggregated by the information cascade module into an improved residual error module to obtain relevance on a channel and dependency information on a global space;
s6: adopting a non-local hole convolution block to perform global feature extraction on the dependency information to obtain the final deep feature map;
s7: extracting initial features of the input image by adopting the second convolution layer; inputting the initial features into an improved VGG network, and extracting shallow features of the image to obtain a shallow feature map;
s8: fusing the deep layer characteristic diagram and the shallow layer characteristic diagram, and performing up-sampling on the fused characteristic diagram to obtain a high-definition reconstruction diagram;
s9: and (5) constraining the difference between the high-definition reconstructed image and the original high-definition image by using a loss function, and continuously adjusting the parameters of the model until the model is converged to finish the training of the model.
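Steps S3-S9 can be condensed into a toy training step. The single-conv stand-ins for the two branches and the sub-pixel (PixelShuffle) upsampling are assumptions used only to make the data flow of the dual-channel design concrete:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# toy stand-ins for the two branches (the real branches are the cascade +
# residual + non-local blocks and the improved VGG, respectively)
deep = nn.Conv2d(3, 16, 3, padding=1)      # deep feature channel (S4-S6)
shallow = nn.Conv2d(3, 16, 3, padding=1)   # shallow feature channel (S7)
fuse = nn.Conv2d(32, 3 * 4, 3, padding=1)  # S8: fuse, then sub-pixel x2 upsample
up = nn.PixelShuffle(2)

params = list(deep.parameters()) + list(shallow.parameters()) + list(fuse.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

lr_img = torch.rand(1, 3, 16, 16)
hr_img = torch.rand(1, 3, 32, 32)

for _ in range(2):  # S9: constrain SR output against the HR target and update
    sr = up(fuse(torch.cat([deep(lr_img), shallow(lr_img)], dim=1)))
    loss = F.l1_loss(sr, hr_img)
    opt.zero_grad(); loss.backward(); opt.step()
```

Swapping the two stand-in convolutions for the full branch modules leaves the fusion, upsampling and loss logic unchanged.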
2. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the scaling factors applied to the pictures in the data set by the bicubic interpolation degradation model are 2, 3, 4 and 8.
3. The image super-resolution reconstruction method based on the attention mechanism and the two-channel network as claimed in claim 1, wherein the formula of the bicubic interpolation degradation model is as follows:
I_LR = H_dn · I_HR + n
where I_LR represents the low-resolution image, H_dn represents the degradation model, I_HR represents the original high-resolution image, and n represents the additional noise.
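The degradation formula above can be sketched as follows; average-pool downsampling stands in for the bicubic kernel H_dn, which the claim does not spell out, and the Gaussian noise term is optional:

```python
import numpy as np

def degrade(hr, scale=2, noise_sigma=0.0, rng=None):
    """Stand-in for I_LR = H_dn * I_HR + n: average-pool downsampling (H_dn)
    on a 2-D image, plus optional additive Gaussian noise (n)."""
    h, w = hr.shape[:2]
    h, w = h - h % scale, w - w % scale          # crop to a multiple of scale
    lr = hr[:h, :w].reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    if noise_sigma > 0:
        rng = rng or np.random.default_rng(0)
        lr = lr + rng.normal(0.0, noise_sigma, lr.shape)
    return lr
```

A constant image is left constant by the pooling, which is a quick way to check that the reshape-based downsampling is wired correctly.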
4. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein preprocessing the scaled data set comprises enhancing the images, including translating and flipping them in the horizontal and vertical directions; the enhanced data are then divided into small image blocks, and the divided images are collected to obtain the training data set.
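The enhancement and block-division of claim 4 can be sketched as below; the flip probabilities, translation range and patch size are assumptions, since the claim fixes none of them:

```python
import numpy as np

def augment(img, rng):
    """Random horizontal / vertical flip plus a small circular translation."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    if rng.random() < 0.5:
        img = img[::-1, :]
    img = np.roll(img, shift=(rng.integers(-4, 5), rng.integers(-4, 5)), axis=(0, 1))
    return img

def extract_patches(img, patch=32, stride=32):
    """Divide an image into small training blocks of size patch x patch."""
    h, w = img.shape[:2]
    return [img[i:i + patch, j:j + patch]
            for i in range(0, h - patch + 1, stride)
            for j in range(0, w - patch + 1, stride)]
```

The circular `np.roll` translation avoids introducing empty borders, at the cost of wrapping content around the image edge.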
5. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the information cascade module comprises a feature aggregation structure stacked 10 times; the feature aggregation structure comprises at least three convolutional neural network layers, a feature channel merging layer, a channel attention layer and a channel number conversion layer, wherein the convolutional neural network layers are connected in sequence, the outputs of all convolutional layers except the last branch into the feature channel merging layer, and the feature channel merging layer, the channel attention layer and the channel number conversion layer are connected in sequence to form the information cascade module; the module processes image data as follows: each convolutional neural network layer first extracts feature information of the input image in sequence; the feature information extracted by each convolutional layer is then merged in the feature channel merging layer, and a channel attention mechanism distinguishes the importance of the merged information; finally, the number of channels is reduced to the number of input channels, and the above steps are repeated 10 times to obtain the hierarchical feature information of the aggregated convolutional layers.
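One feature aggregation stage and its 10-fold stack can be sketched as follows. Merging all three conv outputs (rather than all but the last) and the attention reduction ratio are simplifying assumptions:

```python
import torch
import torch.nn as nn

class ChannelWeight(nn.Module):
    """Minimal channel-attention gate used inside the aggregation structure."""
    def __init__(self, ch, r=4):
        super().__init__()
        self.f = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.f(x)

class FeatureAggregation(nn.Module):
    """One stage: three sequential 3x3 convs; their outputs are merged on the
    feature channel, weighted by channel attention, and a 1x1 conv (the
    channel-number conversion layer) restores the input channel count."""
    def __init__(self, ch):
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv2d(ch, ch, 3, padding=1) for _ in range(3))
        self.attend = ChannelWeight(3 * ch)
        self.restore = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        feats = []
        for conv in self.convs:
            x = conv(x)
            feats.append(x)
        return self.restore(self.attend(torch.cat(feats, dim=1)))

# the information cascade module stacks the aggregation structure 10 times
cascade = nn.Sequential(*(FeatureAggregation(64) for _ in range(10)))
```

Because each stage restores the input channel count, the 10 stages compose without any glue layers.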
6. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the improved residual module comprises a residual network structure, a channel attention mechanism layer and a spatial attention mechanism layer, the residual network structure comprising a convolutional neural network layer, a nonlinear activation layer and a second convolutional neural network layer; the module processes image data as follows: the hierarchical feature information is input into the residual network structure to extract feature information, the channel attention mechanism captures the relevance of the extracted features across channels and passes it downwards, and the spatial attention mechanism captures the dependency over the global space.
7. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the non-local hole convolution block comprises four parallel hole convolutional layers with dilation parameters 1, 2, 4 and 6, and three ordinary convolutional neural network layers; the module processes image data as follows: feature information is first extracted from the dependency information output by the improved residual network, using the four hole convolutions with different dilation parameters and two ordinary convolutional neural networks; the feature information obtained by the four hole convolutions is then fused along the feature channel, while the feature information extracted by the ordinary convolutional neural networks is fused by the values of the pixel matrices; finally, the two kinds of fused feature information are added to obtain the global feature information.
8. The method for image super-resolution reconstruction based on attention mechanism and two-channel network as claimed in claim 1, wherein the improved VGG network structure is obtained by embedding pooling layers into the ordinary convolutional layers, and comprises 10 ordinary convolutional layers and 3 pooling layers; the module processes image data as follows: 64-channel feature information is first extracted with 2 convolutional layers and one pooling layer, then 128-channel feature information with 2 convolutional layers and one pooling layer, then 512-channel feature information with 3 convolutional layers and one pooling layer, and finally the 512-channel information is restored to 64 channels with 3 convolutional layers; the pooling layers use padding so that the feature dimensions remain unchanged.
9. The method for image super-resolution reconstruction based on the attention mechanism and the two-channel network as claimed in claim 1, wherein the loss function expression of the image super-resolution reconstruction model is as follows:

L(θ) = (1/N) Σ_{i=1}^{N} || C_HR(I_LR^i) − I_HR^i ||_1

where θ represents the parameters of the model, C_HR represents the super-resolution calculation equation, I_LR^i and I_HR^i respectively represent the ith low-resolution image and its corresponding high-resolution image, N represents the number of images in the data set, HR denotes high resolution, and LR denotes low resolution.
10. The image super-resolution reconstruction method based on the attention mechanism and the two-channel network as claimed in claim 1, wherein the formulas for evaluating the reconstructed image with the peak signal-to-noise ratio and the structural similarity are as follows:

PSNR = 10 · log10(MAX² / MSE)

SSIM(X, Y) = (2 μ_X μ_Y + C1)(2 σ_XY + C2) / ((μ_X² + μ_Y² + C1)(σ_X² + σ_Y² + C2))

where PSNR represents the peak signal-to-noise ratio, MSE the mean square error, MAX the maximum pixel value, SSIM the structural similarity, μ_X and μ_Y the pixel means of image X and image Y, σ_X and σ_Y the pixel standard deviations of image X and image Y, σ_XY the covariance of image X and image Y, and C1 and C2 small stabilizing constants.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110573693.3A CN113362223B (en) | 2021-05-25 | 2021-05-25 | Image super-resolution reconstruction method based on attention mechanism and two-channel network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113362223A CN113362223A (en) | 2021-09-07 |
CN113362223B true CN113362223B (en) | 2022-06-24 |
Family
ID=77527539
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827295A (en) * | 2019-10-31 | 2020-02-21 | 北京航空航天大学青岛研究院 | Three-dimensional semantic segmentation method based on coupling of voxel model and color information |
CN111414888A (en) * | 2020-03-31 | 2020-07-14 | 杭州博雅鸿图视频技术有限公司 | Low-resolution face recognition method, system, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
TR01 | Transfer of patent right ||
Effective date of registration: 20240111 Address after: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province Patentee after: Dragon totem Technology (Hefei) Co.,Ltd. Address before: 400065 Chongwen Road, Nanshan Street, Nanan District, Chongqing Patentee before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS |