CN116934590A - Image super-resolution reconstruction method and system - Google Patents

Image super-resolution reconstruction method and system

Info

Publication number
CN116934590A
Authority
CN
China
Prior art keywords
image
convolution
feature
resolution
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310750678.0A
Other languages
Chinese (zh)
Inventor
贺强
李学武
冯文昕
郭天炜
李兵
王玉峰
石永立
刘洋
何双利
吕刚
杨明权
李道豫
谢辰昱
陈田济帆
方睿
叶露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guiyang Bureau Extra High Voltage Power Transmission Co
Original Assignee
Guiyang Bureau Extra High Voltage Power Transmission Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guiyang Bureau Extra High Voltage Power Transmission Co filed Critical Guiyang Bureau Extra High Voltage Power Transmission Co
Priority to CN202310750678.0A priority Critical patent/CN116934590A/en
Publication of CN116934590A publication Critical patent/CN116934590A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image super-resolution reconstruction method and system, wherein the method comprises the following steps: preprocessing an original image to obtain a preprocessed image; performing feature extraction on the preprocessed image to obtain a first feature extraction image; performing nonlinear mapping on the first feature extraction image to obtain a first convolution feature image, and generating a second feature image according to the first convolution feature image; and performing aggregation reconstruction on the first feature extraction image and the second feature image to obtain a second convolution feature image, and generating a super-resolution image according to the second convolution feature image. The application obtains the high-resolution image using an end-to-end multi-layer image super-resolution network structure that handles classical degradation models with different scale factors, blur kernels and noise levels. It combines the advantages of modeling-based and learning-based methods, achieving both reconstruction accuracy and flexibility.

Description

Image super-resolution reconstruction method and system
Technical Field
The application belongs to the technical field of artificial intelligence visual detection, and particularly relates to an image super-resolution reconstruction method and system.
Background
In recent years, video surveillance cameras have spread rapidly nationwide. According to an IDC forecast of September 30, 2018, the number of video surveillance cameras deployed in China was expected to reach 2.76 billion by 2022. In addition, with the rapid development of visual communication and image processing technologies, the digital image data available to people has grown explosively, greatly promoting the development of the computer vision field. As information carriers for visual tasks, digital images have become closely tied to people's daily lives. Computer-vision-based research is increasingly important, and digital images are widely applied in security monitoring, medical identification, satellite detection and daily life. Accordingly, digital image processing technology urgently needs continuous improvement and refinement.
At present, imaging devices commonly used in daily life include digital cameras, mobile phones, surveillance cameras and the like. However, the imaging process is limited by factors such as sensor and lens quality, which may leave the acquired digital image blurred and at too low a resolution to record the scene clearly. Such images cannot meet human perceptual requirements, which limits their application value and further degrades subsequent visual tasks. Therefore, an efficient image super-resolution algorithm is needed to recover reliable high-definition images from low-resolution images. Many super-resolution methods have been proposed to solve this problem, including early conventional methods and recent learning-based methods. Conventional methods rely heavily on prior information and their optimization processes are very time-consuming, while deep learning methods improve performance by increasing network depth, which raises the computational cost and makes them unsuitable for portable devices such as mobile phones and cameras. The lightweight super-resolution algorithms proposed later trade some performance for a smaller computational cost.
However, in current schemes, the features of the low-resolution image are not extracted thoroughly, making it difficult to restore more of the texture details of the high-resolution image.
Disclosure of Invention
The application is intended to solve the technical problem that reconstruction accuracy, flexibility and efficiency cannot be achieved simultaneously in the prior art. The application provides an image super-resolution reconstruction method and system, wherein the method comprises the following steps:
s1, preprocessing an original image to obtain a preprocessed image;
s2, carrying out feature extraction on the preprocessed image to obtain a first feature extraction image;
s3, performing nonlinear mapping on the first feature extraction image to obtain a first convolution feature image, and generating a second feature image according to the first convolution feature image;
and S4, carrying out aggregation reconstruction on the first feature extraction image and the second feature image to obtain a second convolution feature image, and generating a super-resolution image according to the second convolution feature image.
Optionally, the method for reconstructing a super-resolution image further includes:
s5, performing size expansion and pixel cleaning on the second convolution characteristic image by using first sub-pixel convolution to obtain a third convolution image, and generating a fourth characteristic image according to the third convolution image;
and S6, performing size expansion and pixel cleaning on the fourth characteristic image by using second sub-pixel convolution to obtain a fourth convolution image, and generating a fifth characteristic image, namely an optimized super-resolution image, according to the fourth convolution image.
Optionally, the method for acquiring the original image in step S1 includes:
the method comprises the steps of converting an RGB format into a YCBCR format image according to a field sampling image, wherein the YCBCR format image is an original image.
Optionally, in step S1, the preprocessing of the original image includes:
the original image is enlarged to the target size by bicubic interpolation as a low resolution first image.
Optionally, in step S2, the process of obtaining the first feature extraction image includes:
receiving the first image, downsampling feature blocks in the first image, representing the downsampled result as a vector group, and obtaining the first feature extraction image according to the vector group.
Optionally, in step S3, the process of generating the second feature image includes:
performing a nonlinear mapping operation on the first feature extraction image;
mapping each set of feature extraction images to a high-dimensional vector, the high-dimensional vector being a representation of a high-resolution feature block;
and generating a second feature image according to the high-dimensional vector.
Also included is an image super-resolution reconstruction system comprising: a preprocessing module, a first feature extraction module, a second feature extraction module, a reconstruction module, a first sub-pixel convolution module and a second sub-pixel convolution module;
the preprocessing module is used for preprocessing an original image to obtain a preprocessed image;
the first feature extraction module is used for carrying out feature extraction on the preprocessed image to obtain a first feature extraction image;
the second feature extraction module is used for carrying out nonlinear mapping on the first feature extraction image to obtain a first convolution feature image, and generating a second feature image according to the first convolution feature image;
the reconstruction module is used for carrying out aggregation reconstruction on the first feature extraction image and the second feature image to obtain a second convolution feature image, and generating a super-resolution image according to the second convolution feature image;
the first sub-pixel convolution module is used for performing size expansion and pixel cleaning on the second convolution characteristic image by using first sub-pixel convolution to obtain a third convolution image, and generating a fourth characteristic image according to the third convolution image;
the second sub-pixel convolution module is configured to perform size expansion and pixel cleaning on the fourth feature image by using second sub-pixel convolution to obtain a fourth convolution image, and generate a fifth feature image, that is, an optimized super-resolution image, according to the fourth convolution image.
Optionally, the process of cleaning the pixels includes:
the method comprises the steps of inputting an image, and obtaining a characteristic image with the same channel number as a preset value and the same size as the input image through three convolution layers;
rearranging preset channels of each pixel of the characteristic image, and corresponding to sub-blocks with preset sizes in the high-resolution image to obtain the rearranged high-resolution image.
Compared with the prior art, the application has the beneficial effects that:
according to the image super-resolution reconstruction method provided by the application, the end-to-end image super-resolution multilayer network structure is used for obtaining the high-resolution image, and because the image super-resolution multilayer convolution network structure adopts a single end-to-end training model to process classical degradation models with different scale factors, fuzzy kernels and noise levels, the advantages of a modeling-based method and a learning-based method are simultaneously considered, and the technical effects of detecting reconstruction precision and flexibility can be realized.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments are briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a method step diagram of an image super-resolution reconstruction method and system according to an embodiment of the present application;
FIG. 2 is a roadmap of an improved image super-resolution multi-layer convolution reconstruction network technique of an image super-resolution reconstruction method and system according to an embodiment of the application;
FIG. 3 is a schematic diagram of a multi-layer convolution reconstruction portion of an image super-resolution reconstruction method and system according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a mechanism of a sub-pixel convolution part of an image super-resolution reconstruction method and system according to an embodiment of the present application;
FIG. 5 is a technical roadmap of YoloV3 series target detection of an image super-resolution reconstruction method and system according to an embodiment of the application;
fig. 6 is a system configuration diagram of an image super-resolution reconstruction method and system according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description.
Embodiment one:
in this embodiment, as shown in fig. 1-2, a method and a system for reconstructing super-resolution image, the method includes: s1, preprocessing an original image to obtain a preprocessed image;
s2, carrying out feature extraction on the preprocessed image to obtain a first feature extraction image;
s3, performing nonlinear mapping on the first feature extraction image to obtain a first convolution feature image, and generating a second feature image according to the first convolution feature image;
and S4, carrying out aggregation reconstruction on the first feature extraction image and the second feature image to obtain a second convolution feature image, and generating a super-resolution image according to the second convolution feature image.
The reconstruction method of the super-resolution image further comprises the following steps:
s5, performing size expansion and pixel cleaning on the second convolution characteristic image by using the first sub-pixel convolution to obtain a third convolution image, and generating a fourth characteristic image according to the third convolution image;
and S6, performing size expansion and pixel cleaning on the fourth characteristic image by using second sub-pixel convolution to obtain a fourth convolution image, and generating a fifth characteristic image, namely an optimized super-resolution image, according to the fourth convolution image.
The original image acquisition method in step S1 includes:
the images in the YCBCR format are original images, which are converted from RGB format to YCBCR format according to the live sampled images.
In step S1, the process of preprocessing the original image includes:
the original image is enlarged to the target size by bicubic interpolation as a low resolution first image.
In step S2, the process of obtaining the first feature extraction image includes:
receiving the first image, downsampling feature blocks in the first image, expressing the downsampled result as a vector group, and obtaining the first feature extraction image according to the vector group.
In step S3, the process of generating the second feature image includes:
performing a nonlinear mapping operation on the first feature extraction image;
mapping each set of feature extraction images to a high-dimensional vector, the high-dimensional vector being a representation of a high-resolution feature block;
and generating a second feature image according to the high-dimensional vector.
In step S4, the process of generating the super-resolution image includes: performing aggregation reconstruction on the first feature extraction image and the second feature image to obtain a second convolution feature image, which is of high resolution, and generating a third feature image, namely the super-resolution image, according to the second convolution feature image.
In steps S5-S6, a low-resolution image acquisition unit is used for acquiring a low-resolution image, and an image reconstruction unit is used for inputting the low-resolution image into the well-trained image super-resolution reconstruction model to obtain a high-resolution image. The image super-resolution reconstruction model comprises an image super-resolution multi-layer convolution reconstruction network sub-model and a pixel cleaning model connected after it.
The image super-resolution network model comprises a first convolution layer, a first activation layer, a second convolution layer, a second activation layer, a third convolution layer, a third activation layer, a fourth convolution layer, a first pixel cleaning layer and a fourth activation layer; the low-resolution image passes through a feature extraction unit, a nonlinear mapping unit, an image reorganization unit and a sub-pixel convolution unit to obtain a target residual feature image.
In some possible implementations, the convolution units include a first lower convolution unit, a second lower convolution unit, a third lower convolution unit, and a sub-pixel convolution unit in cascade.
The activation system comprises a first activation layer, a second activation layer, a third activation layer and a sub-pixel activation layer, which are cascaded at intervals;
the first convolution unit is used for receiving the low-resolution image and obtaining a feature extraction image;
the first lower convolution unit is used for downsampling the feature extraction image to obtain a first downsampled image, and generating a first feature image according to the first downsampled image; the second lower convolution unit is used for downsampling the first feature image to obtain a second downsampled image, and generating a second feature image according to the second downsampled image;
the third lower convolution unit is used for downsampling the second feature image to obtain a third downsampled image, and generating a third feature image according to the third downsampled image;
the first sub-pixel convolution unit is used for receiving the third feature image and generating a fourth feature image according to it;
the second sub-pixel convolution unit is used for receiving the fourth feature image and generating a fifth feature image according to it;
the first convolution layer is used for carrying out convolution operation on the first downsampled image to obtain a first convolution image;
the activation function layer is used for activating the first convolution image based on the ReLU activation function to obtain an activated image; the second convolution layer is used for performing a convolution operation on the activated image to obtain a convolution image;
and the second fusion layer is used for superimposing the first downsampled image and the convolution image to obtain a first residual feature image.
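As a minimal sketch of the conv → ReLU → conv branch superimposed on its input described above (PyTorch, the class name, the channel count of 64 and the 3×3 kernels are all assumptions not fixed by the text):

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    """Conv -> ReLU -> conv branch whose output is superimposed on the input,
    yielding the 'first residual feature image' described above."""
    def __init__(self, channels: int = 64):  # channel count is an assumption
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # fusion layer: superimpose the input and the convolved branch
        return x + self.conv2(self.relu(self.conv1(x)))
```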
The region extraction layer is used for extracting an image of the region of interest from the feature connection map based on the weight values.
In some possible implementations, the basic block includes two basic units connected in sequence, each basic unit including a basic convolution layer and a basic ReLU function layer;
the basic convolution layer is used for extracting features from the target feature image to obtain a basic feature image;
the batch normalization layer is used for normalizing the basic feature image to obtain a normalized image;
the basic ReLU function layer is used for nonlinearly mapping the normalized image to obtain the feature connection map.
Specifically, a corresponding nonlinear loss function is designed according to the PSNR and SSIM values to promote convergence of these indices. PSNR and SSIM are used as indices for evaluating model performance: PSNR (peak signal-to-noise ratio) measures the difference between pixel values of the generated high-resolution image and the real high-resolution image; SSIM (structural similarity) measures the similarity of two images in brightness, contrast and structure.
If the model performance of the image super-resolution reconstruction model is below the performance threshold, the image super-resolution reconstruction model is updated.
The super-resolution image is then preprocessed: it is resized to 416×416 by a resize function.
The resized picture is input into a target detection network based on the YOLOv3 model to detect and identify faults.
The image is first downsampled to a 13×13 feature map by the DarkNet-53 full convolutional network.
The DarkNet-53 full convolutional network runs from layer 0 up to layer 74, with a total of 53 convolutional layers; the remainder are Res layers.
The Res layers comprise five groups of Res layers with different scales and depths, which only perform residual operations between the outputs of different layers.
As the main network structure used by YOLOv3 for feature extraction, DarkNet uses a series of 3×3 and 1×1 convolutional layers to perform feature extraction.
The 13×13 feature map is then upsampled and combined with a feature map obtained in the middle of DarkNet-53 to obtain a 26×26 feature map.
Here the Concat module performs tensor stitching (concatenation), which expands the channel dimension of the two tensors: for example, stitching a 26×26×256 tensor with a 26×26×256 tensor yields a 26×26×512 tensor.
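A minimal sketch of this tensor stitching, with shapes following the 26×26×256 example above (PyTorch is an assumption; the patent names no framework):

```python
import torch

a = torch.randn(1, 256, 26, 26)  # N x C x H x W
b = torch.randn(1, 256, 26, 26)
c = torch.cat([a, b], dim=1)     # stitch along the channel dimension
print(c.shape)                   # torch.Size([1, 512, 26, 26])
```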
The 26×26 feature map is then upsampled and combined with a feature map obtained in the middle of DarkNet-53 to obtain a 52×52 feature map.
So far, picture identification data at three different scales have been obtained, each scale outputting 3×(4+1+80) = 255 channels per grid cell.
These data represent 3 prior boxes of different shapes, each box having 4 coordinate values, 1 confidence value and 80 category values.
After the algorithm model outputs these data, the same object may be identified by multiple boxes, so non-maximum suppression is required.
Non-maximum suppression (NMS) selects the prediction boxes with the highest scores in a neighborhood (i.e., the highest probability of correctly identifying the target) and suppresses prediction boxes with low scores.
After suppression, each target retains only one optimal prediction box, but these data then need to be output in the form of a picture.
The data are located and regressed by supplementary code, and the output result is a picture with prediction boxes and classification labels (i.e., target identification is successfully completed).
As shown in figs. 3-4, step one, image acquisition and preprocessing (preparing the input for feature extraction): image information is collected at the job site and its RGB image is converted into a YCbCr image;
wherein the hue, chroma and saturation of the RGB color mode are mixed together and difficult to separate, whereas Y refers to the luminance component, Cb refers to the difference between the blue part of the RGB input signal and the luminance value of the RGB signal, and Cr refers to the difference between the red part of the RGB input signal and the luminance value of the RGB signal.
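A minimal NumPy sketch of the RGB → YCbCr conversion; the patent does not specify which coefficient set is used, so the standard ITU-R BT.601 (JPEG) coefficients below are an assumption:

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB image to YCbCr (BT.601/JPEG convention, assumed)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b               # luminance component Y
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference Cb
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference Cr
    return np.stack([y, cb, cr], axis=-1)
```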
The original images generally include a low-resolution image and a high-resolution image: the high-resolution image is the image before downsampling, and the low-resolution image is the image after downsampling. Downsampling by a factor of 2 is generally adopted, so the length and width of the image become 1/2 of the original.
The downsampling factor may also be adjusted to 3 or 4.
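A minimal sketch of generating the low-resolution image by downsampling and then enlarging it back to the target size with bicubic interpolation, as step S1 describes; the patent names bicubic interpolation only for the enlargement, so using it for the downsampling too, and OpenCV itself, are assumptions:

```python
import cv2

def make_lr_pair(hr, factor: int = 2):
    """Downsample the HR image by `factor`, then enlarge back with bicubic
    interpolation to produce the low-resolution 'first image' of step S1."""
    h, w = hr.shape[:2]
    lr = cv2.resize(hr, (w // factor, h // factor), interpolation=cv2.INTER_CUBIC)
    lr_up = cv2.resize(lr, (w, h), interpolation=cv2.INTER_CUBIC)
    return lr, lr_up
```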
Image preprocessing: because the neural network adopted later requires input images of consistent length and width, and the images in the dataset have inconsistent lengths and widths, the images need to be cropped. The method adopted here is to first locate the center of each picture, then extend n pixels in each of the four directions from the center, and crop the picture into a 2n×2n square.
Generally, 150 pixels are extended, finally giving a 300×300 square image.
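A minimal sketch of this center crop (n = 150 gives the 300×300 image mentioned above; assuming every picture is at least 2n pixels in each dimension):

```python
import numpy as np

def center_crop(img: np.ndarray, n: int = 150) -> np.ndarray:
    """Locate the picture center, extend n pixels in the four directions,
    and crop a 2n x 2n square (assumes the image is at least 2n x 2n)."""
    h, w = img.shape[:2]
    cy, cx = h // 2, w // 2
    return img[cy - n:cy + n, cx - n:cx + n]
```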
Step two: extracting features of the preprocessed image to obtain a first feature extraction image;
step three: performing nonlinear mapping on the first feature extraction image to obtain a first convolution feature image, and generating a second feature image according to the first convolution feature image;
step four: and carrying out aggregation reconstruction on the first feature extraction image and the second feature image to obtain a second convolution feature image, and generating a super-resolution image according to the convolution feature image.
Steps two to four each comprise a convolution layer and a ReLU activation layer.
The first feature extraction step extracts (overlapping) feature blocks from the low-resolution image and represents each feature block as a high-dimensional vector. These vectors comprise a set of feature maps whose number equals the dimension of the vector; the number of input channels of this unit equals the number of input image channels, the convolution kernel size is 9×9, and there are 64 filters.
The second step nonlinearly maps each high-dimensional vector to another high-dimensional vector, each mapped vector being a representation of a high-resolution feature block. These vectors comprise another set of feature maps; the convolution kernel size of this unit is 1×1, with 64 input channels and 32 filters.
Image reconstruction aggregates the high-resolution patch-wise representations (a patch being a region between the pixel level and the image level) to produce the final high-resolution image, using a convolution kernel of size 5×5.
The model is trained on the Y channel of the low-resolution image. Different models are trained for the different magnification factors 2, 3 and 4, each network model being trained with only one magnification factor. For each magnification factor (2, 3 and 4), the original high-resolution images in the test set are correspondingly downsampled to obtain low-resolution images; a low-resolution image L is then magnified and restored to a high-resolution image H', which is compared with the original high-resolution image H. The difference between H' and H is used to adjust the model parameters; iterative training minimizes this difference, and after adjustment the required result is obtained through the three convolution layers in the model.
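A minimal PyTorch sketch of the three-layer structure described in steps two to four (9×9 with 64 filters for extraction, 1×1 mapping 64 → 32, 5×5 for aggregation); the class name and padding values are assumptions, as is PyTorch itself:

```python
import torch
import torch.nn as nn

class SRNet(nn.Module):
    """Feature extraction (9x9, 64 filters), nonlinear mapping (1x1, 64 -> 32),
    and aggregation reconstruction (5x5) on the single Y channel."""
    def __init__(self):
        super().__init__()
        self.extract = nn.Conv2d(1, 64, kernel_size=9, padding=4)
        self.map = nn.Conv2d(64, 32, kernel_size=1)
        self.reconstruct = nn.Conv2d(32, 1, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        y = self.relu(self.extract(y))   # step two: feature extraction
        y = self.relu(self.map(y))       # step three: nonlinear mapping
        return self.reconstruct(y)       # step four: aggregation reconstruction
```

Training this sketch with, for example, nn.MSELoss on (H', H) pairs, one model per magnification factor, would match the procedure described above.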
Step five: performing size expansion and pixel cleaning on the second convolution characteristic image by using first sub-pixel convolution to obtain a third convolution image, and generating a fourth characteristic image according to the third convolution image;
step six: and performing size expansion and pixel cleaning on the fourth characteristic image by using second sub-pixel convolution to obtain a fourth convolution image, and generating a fifth characteristic image, namely an optimized super-resolution image, according to the fourth convolution image.
The reconstructed high-resolution image is input to the sub-pixel convolution unit for pixel cleaning to obtain the final high-resolution image.
The input is the high-resolution image obtained from the original low-resolution image through the multi-layer convolution reconstruction network; after passing through three convolution layers, a feature image is obtained whose channel number is r² and whose size is the same as that of the input image. The r² channels of each pixel of the feature image are rearranged into an r×r region, corresponding to a sub-block of size r×r in the high-resolution image, so that the feature image of size H×W×r² is rearranged into a high-resolution image of size rH×rW×1.
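This rearrangement of r² channels into r×r sub-blocks is exactly what PyTorch's nn.PixelShuffle performs; a minimal sketch (r = 2 and the feature size are assumptions, as is PyTorch itself):

```python
import torch
import torch.nn as nn

r = 2                                  # upscaling factor (assumed)
shuffle = nn.PixelShuffle(r)
feat = torch.randn(1, r * r, 64, 64)   # H x W feature image with r^2 channels
hr = shuffle(feat)                     # rearranged to rH x rW with 1 channel
print(hr.shape)                        # torch.Size([1, 1, 128, 128])
```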
After the high-resolution image is finally obtained, the peak signal-to-noise ratio (PSNR) is taken as the evaluation index: the larger the value, the smaller the distortion, and a PSNR above 38 dB is generally considered acceptable.
The specific evaluation method designs a corresponding nonlinear loss function according to the PSNR and SSIM values to promote convergence of these indices. PSNR and SSIM are used as indices for evaluating model performance: PSNR (peak signal-to-noise ratio) measures the difference between pixel values of the generated high-resolution image and the real high-resolution image; SSIM (structural similarity) measures the similarity of two images in brightness, contrast and structure.
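A minimal NumPy sketch of the PSNR index (the 38 dB threshold above would be checked against this value); for SSIM, an existing implementation such as skimage.metrics.structural_similarity could be used:

```python
import numpy as np

def psnr(pred: np.ndarray, truth: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio: the larger the value, the smaller the distortion."""
    mse = np.mean((pred.astype(np.float64) - truth.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```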
The image size is unified to 416×416: since the cumulative downsampling factor in the DarkNet-53 network architecture is 32, the picture size input into the model must be a multiple of 32, and 416×416 is a size typically used in practical model training. The picture is resized to 416×416 by a resize function before being input into the algorithm model. The resize function stretches the image to the specified size without any cropping compared with the original.
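A minimal sketch of this stretch-resize (OpenCV and linear interpolation are assumptions; the text only says the resize function stretches without cropping):

```python
import cv2

def resize_for_detection(img, size: int = 416):
    """Stretch the whole image to size x size; no cropping, aspect ratio not kept.
    size must be a multiple of 32 because DarkNet-53 downsamples by 32 overall."""
    assert size % 32 == 0
    return cv2.resize(img, (size, size), interpolation=cv2.INTER_LINEAR)
```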
The resized image is input into the YOLOv3 network for feature extraction.
As shown in fig. 5, the picture is first downsampled through DarkNet-53. Downsampling can be understood as shrinking the image, reducing the number of sampling points of the matrix. For an image of size 416×416, downsampling by a factor of s yields an image of size (416/s)×(416/s); naturally, s should be a divisor of 416. The network mainly consists of a series of 1×1 and 3×3 convolutional layers (each convolutional layer is followed by a BN layer and a LeakyReLU layer). Because each Res group contains 1+2×n convolutional layers, the whole backbone contains 1 + (1+2×1) + (1+2×2) + (1+2×8) + (1+2×8) + (1+2×4) = 52 convolutional layers; adding one FC fully connected layer forms the DarkNet-53 classification network.
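A short sketch of this layer count, using the Res group repeat counts just given (the list form is our notation):

```python
# Res group repeat counts in DarkNet-53; each group contributes one downsampling
# conv plus 2 convs per residual block, preceded by one stem conv.
repeats = [1, 2, 8, 8, 4]
conv_layers = 1 + sum(1 + 2 * n for n in repeats)
print(conv_layers)  # 52; adding the final FC layer gives the 53 of DarkNet-53
```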
The picture is downsampled to a 13×13 feature map through the DarkNet-53 full convolutional network, then upsampled (which can be colloquially understood as enlarging the image, increasing the number of sampling points of the matrix) and combined through a Concat module with a feature map obtained inside DarkNet-53 to produce a 26×26 feature map; the Concat module refers to tensor stitching, which expands the channel dimension of two tensors. Similarly, the 26×26 map is upsampled and passed through a Concat module to obtain a 52×52 feature map.
So far, picture identification data at three different scales have been obtained:
Y1 is suitable for large targets, with an output dimension of 13×13×255: 13×13 is the grid size; 255 = (80+5)×3, where 80 is the number of identified object classes, 5 corresponds to x, y, w, h (anchor box coordinates) and c (confidence), and 3 means each grid point predicts 3 target boxes.
Y2 is suitable for medium targets, with an output dimension of 26×26×255: 26×26 is the grid size, and the 255 channels decompose in the same way.
Y3 is suitable for small targets, with an output dimension of 52×52×255: 52×52 is the grid size, and the 255 channels decompose in the same way.
Non-maximum suppression is then applied to the obtained prediction boxes of the three scales.
All three groups of data, namely Y1, Y2 and Y3, carry the parameter c (confidence). The data are classified and identified by a classifier, and each prediction box obtains a score. When the data pass through the non-maximum suppression (NMS) network, the prediction box with the highest score in a neighborhood (namely, the highest probability of correctly identifying a target) is retained, and prediction boxes with low scores are cleared. At this point the data of the optimal prediction box are established, each box being described by its center-point coordinates, size and confidence.
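A minimal NumPy sketch of this greedy NMS (the corner-format boxes x1, y1, x2, y2 and the 0.5 IoU threshold are assumptions consistent with the text):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Greedy non-maximum suppression. boxes: Nx4 as (x1, y1, x2, y2)."""
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # suppress neighbors that overlap the kept box too much
        order = order[1:][iou <= iou_thresh]
    return keep
```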
However, these data still need to be output in the form of a picture, so additional code is used to locate and regress them. YOLOv3 uses logistic regression to predict the confidence of each bounding box. If a predicted bounding box overlaps the real object more than any other anchor box does, its confidence value should be 1. If a box's overlap with the real object is not the best but exceeds a certain threshold (the threshold is adjustable; here it is set to 0.5), its prediction is ignored and it contributes no loss. Only one anchor box is assigned to predict each object (the one with the highest confidence) and contributes to all terms of the loss function; otherwise, as a negative example, its confidence label is zero. Detection and identification of the target are thus realized.
Embodiment two:
As shown in fig. 6, an image super-resolution reconstruction system includes: a preprocessing module, a first feature extraction module, a second feature extraction module, a reconstruction module, a first sub-pixel convolution module and a second sub-pixel convolution module;
the preprocessing module is used for preprocessing an original image to obtain a preprocessed image, and for acquiring low-resolution images with a unified format and index.
The first feature extraction module is used for carrying out feature extraction on the preprocessed image to obtain a first feature extraction image;
the second feature extraction module is used for carrying out nonlinear mapping on the first feature extraction image to obtain a first convolution feature image, and generating a second feature image according to the first convolution feature image;
the reconstruction module is used for carrying out aggregation reconstruction on the first feature extraction image and the second feature image to obtain a second convolution feature image, and generating a super-resolution image according to the second convolution feature image; it is also used for inputting the low-resolution image into the well-trained image super-resolution reconstruction model to obtain a high-resolution image.
The first sub-pixel convolution module is used for performing size expansion and pixel cleaning on the second convolution characteristic image by using first sub-pixel convolution to obtain a third convolution image, and generating a fourth characteristic image according to the third convolution image;
the second sub-pixel convolution module is configured to perform size expansion and pixel cleaning on the fourth feature image by using second sub-pixel convolution to obtain a fourth convolution image, and generate a fifth feature image, that is, an optimized super-resolution image, according to the fourth convolution image.
Optionally, the process of cleaning the pixels includes:
the method comprises the steps of inputting an image, and obtaining a characteristic image with the same channel number as a preset value and the same size as the input image through three convolution layers;
rearranging preset channels of each pixel of the characteristic image, and corresponding to sub-blocks with preset sizes in the high-resolution image to obtain the rearranged high-resolution image.
The above embodiments merely illustrate preferred embodiments of the present application, and the scope of the application is not limited thereto. Various modifications and improvements made by those skilled in the art without departing from the spirit of the application all fall within the scope of the application as defined by the appended claims.

Claims (8)

1. An image super-resolution reconstruction method, characterized by comprising the following steps:
s1, preprocessing an original image to obtain a preprocessed image;
s2, carrying out feature extraction on the preprocessed image to obtain a first feature extraction image;
s3, performing nonlinear mapping on the first feature extraction image to obtain a first convolution feature image, and generating a second feature image according to the first convolution feature image;
and S4, carrying out aggregation reconstruction on the first feature extraction image and the second feature image to obtain a second convolution feature image, and generating a super-resolution image according to the second convolution feature image.
2. The image super-resolution reconstruction method according to claim 1, further comprising:
s5, performing size expansion and pixel cleaning on the second convolution characteristic image by using first sub-pixel convolution to obtain a third convolution image, and generating a fourth characteristic image according to the third convolution image;
and S6, performing size expansion and pixel cleaning on the fourth characteristic image by using second sub-pixel convolution to obtain a fourth convolution image, and generating a fifth characteristic image, namely an optimized super-resolution image, according to the fourth convolution image.
3. The image super-resolution reconstruction method as recited in claim 1, wherein the method for acquiring the original image in step S1 comprises:
the method comprises the steps of converting an RGB format into a YCBCR format image according to a field sampling image, wherein the YCBCR format image is an original image.
4. The image super-resolution reconstruction method as set forth in claim 1, wherein in step S1, the preprocessing of the original image comprises:
the original image is enlarged to the target size by bicubic interpolation as a low resolution first image.
5. The image super-resolution reconstruction method as recited in claim 4, wherein in step S2, the process of obtaining the first feature extraction image comprises:
receiving the first image, downsampling feature blocks in the first image, representing the downsampled result as a vector group, and obtaining the first feature extraction image according to the vector group.
6. The image super-resolution reconstruction method as recited in claim 5, wherein in step S3, the generating of the second feature image comprises:
performing a nonlinear mapping operation on the first feature extraction image;
mapping each set of feature extraction images to a high-dimensional vector, the high-dimensional vector being a representation of a high-resolution feature block;
and generating a second characteristic image according to the high-dimensional vector.
7. An image super-resolution reconstruction system, characterized by comprising: a preprocessing module, a first feature extraction module, a second feature extraction module, a reconstruction module, a first sub-pixel convolution module and a second sub-pixel convolution module;
the preprocessing module is used for preprocessing an original image to obtain a preprocessed image;
the first feature extraction module is used for carrying out feature extraction on the preprocessed image to obtain a first feature extraction image;
the second feature extraction module is used for carrying out nonlinear mapping on the first feature extraction image to obtain a first convolution feature image, and generating a second feature image according to the first convolution feature image;
the reconstruction module is used for carrying out aggregation reconstruction on the first feature extraction image and the second feature image to obtain a second convolution feature image, and generating a super-resolution image according to the second convolution feature image;
the first sub-pixel convolution module is used for performing size expansion and pixel cleaning on the second convolution characteristic image by using first sub-pixel convolution to obtain a third convolution image, and generating a fourth characteristic image according to the third convolution image;
the second sub-pixel convolution module is configured to perform size expansion and pixel cleaning on the fourth feature image by using second sub-pixel convolution to obtain a fourth convolution image, and generate a fifth feature image, that is, an optimized super-resolution image, according to the fourth convolution image.
8. The image super-resolution reconstruction system according to claim 7, wherein the process of pixel cleaning comprises:
the method comprises the steps of inputting an image, and obtaining a characteristic image with the same channel number as a preset value and the same size as the input image through three convolution layers;
rearranging preset channels of each pixel of the characteristic image, and corresponding to sub-blocks with preset sizes in the high-resolution image to obtain the rearranged high-resolution image.
CN202310750678.0A 2023-06-25 2023-06-25 Image super-resolution reconstruction method and system Pending CN116934590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310750678.0A CN116934590A (en) 2023-06-25 2023-06-25 Image super-resolution reconstruction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310750678.0A CN116934590A (en) 2023-06-25 2023-06-25 Image super-resolution reconstruction method and system

Publications (1)

Publication Number Publication Date
CN116934590A true CN116934590A (en) 2023-10-24

Family

ID=88388638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310750678.0A Pending CN116934590A (en) 2023-06-25 2023-06-25 Image super-resolution reconstruction method and system

Country Status (1)

Country Link
CN (1) CN116934590A (en)

Similar Documents

Publication Publication Date Title
Bashir et al. A comprehensive review of deep learning-based single image super-resolution
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
Engin et al. Cycle-dehaze: Enhanced cyclegan for single image dehazing
CN111476737B (en) Image processing method, intelligent device and computer readable storage medium
CN112163449B (en) Lightweight multi-branch feature cross-layer fusion image semantic segmentation method
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN112330574B (en) Portrait restoration method and device, electronic equipment and computer storage medium
EP4109392A1 (en) Image processing method and image processing device
CN111861961B (en) Single image super-resolution multi-scale residual error fusion model and restoration method thereof
WO2021238420A1 (en) Image defogging method, terminal, and computer storage medium
CN113240679A (en) Image processing method, image processing device, computer equipment and storage medium
CN109886906B (en) Detail-sensitive real-time low-light video enhancement method and system
CN113658057A (en) Swin transform low-light-level image enhancement method
Cao et al. New architecture of deep recursive convolution networks for super-resolution
CN114627034A (en) Image enhancement method, training method of image enhancement model and related equipment
Rasheed et al. LSR: Lightening super-resolution deep network for low-light image enhancement
CN113409355A (en) Moving target identification system and method based on FPGA
CN112330613A (en) Method and system for evaluating quality of cytopathology digital image
Li et al. RGSR: A two-step lossy JPG image super-resolution based on noise reduction
CN116029905A (en) Face super-resolution reconstruction method and system based on progressive difference complementation
CN114119428B (en) Image deblurring method and device
WO2023110880A1 (en) Image processing methods and systems for low-light image enhancement using machine learning models
CN116934590A (en) Image super-resolution reconstruction method and system
CN114219738A (en) Single-image multi-scale super-resolution reconstruction network structure and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination