CN112837224A - Super-resolution image reconstruction method based on convolutional neural network - Google Patents

Super-resolution image reconstruction method based on convolutional neural network

Info

Publication number
CN112837224A
Authority
CN
China
Prior art keywords
resolution
super
convolutional neural
neural network
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110337413.9A
Other languages
Chinese (zh)
Inventor
李鹏飞
李丽丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202110337413.9A
Publication of CN112837224A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction based on discrimination criteria, e.g. discriminant analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

A super-resolution image reconstruction method based on a convolutional neural network, belonging to the technical field of image processing. It addresses the problems of existing super-resolution convolutional neural networks: slow operation, unsatisfactory image quality, low image resolution, and vanishing gradients during training. The technical points of the invention are improvements to the existing super-resolution convolutional neural network model: the scaled image undergoes a post-upsampling operation in the rear section of the model; the post-upsampling is a learning-based method, namely the sub-pixel method; the network is deepened and a residual network is added to it. The improved super-resolution convolutional neural network model is further used as the generator in a generative adversarial network and combined with the adversarial network to further improve image reconstruction efficiency. The method can be widely applied in the field of super-resolution image reconstruction research.

Description

Super-resolution image reconstruction method based on convolutional neural network
Technical Field
The invention relates to the technical field of image processing, in particular to a super-resolution image reconstruction method based on a convolutional neural network.
Background
Neural networks have become an important technical means in image recognition and image classification, and the trend is growing; research on reconstructing blurred images into high-definition images with super-resolution convolutional neural network techniques therefore has important value and significance for computer vision and the development of artificial intelligence. When computer technology is used to process targets such as pictures, videos and speech, performance is limited by hardware such as the CPU and GPU. When high-quality target content must be obtained more efficiently and quickly, the network structure (the framework of the network model, the upsampling method, the network design, and so on) can be modified and re-integrated to meet these requirements as far as possible.
Super-Resolution Image Reconstruction (SRIR) has been a research hotspot in computer vision and image processing in recent years; because of its wide range of application scenarios and practical theoretical value, it has attracted close attention from researchers. Super-resolution image reconstruction is essentially the technique of processing one or more Low-Resolution (LR) images with a network model to generate a High-Resolution (HR) image that has good visual quality and is closer to the real image. In daily applications of the technique, to alleviate the image-quality problems caused by transferring and storing images, a downsampling operation is usually applied to reduce image size; but this quality-reducing operation is an irreversible transformation, so recovering the original image is an ill-posed problem. The key to reconstructing a low-resolution image into a high-resolution image is finding an approximate mapping between the low-resolution and high-resolution images.
Currently, mainstream super-resolution reconstruction techniques can be roughly divided into three categories: interpolation-based, reconstruction-based and learning-based methods. Interpolation-based methods (such as nearest-neighbor and bicubic interpolation) can simply and effectively increase image resolution, but edges in parts of the image become blurred. Reconstruction-based methods can recover high-frequency information lost from simple images and are easy to apply with low workload, but they cannot handle image content with complex structure well. Learning-based methods, the mainstream in recent years, aim to establish an approximate mapping between low-resolution and high-resolution images by learning from a large number of data samples. Compared with traditional hand-crafted features and shallow convolutional networks that fit only simple functions, deep-learning-based super-resolution can automatically learn feature representations at different levels, approximate more complex nonlinear functions, and therefore has higher practical value. Deep-learning-based algorithms thus outperform many earlier classical algorithms. However, existing convolutional-neural-network-based super-resolution methods still suffer from many parameters, heavy computation, long training time and blurred image texture, making it difficult to obtain high-quality reconstructed images.
Disclosure of Invention
In view of the above problems, the invention provides a super-resolution image reconstruction method based on a convolutional neural network, which solves the problems that existing super-resolution convolutional neural networks run slowly, produce unsatisfactory image quality and low image resolution, and suffer from vanishing gradients during training.
According to one aspect of the present invention, a super-resolution image reconstruction method based on a convolutional neural network is provided, the method comprising the following steps:
step one, acquiring a training image data set and a test image data set; wherein the image is a low resolution image;
step two, setting training parameters and a content loss function, and constructing an improved super-resolution convolutional neural network model;
step three, using the training image data set as the input of the improved super-resolution convolutional neural network model, and adjusting the training parameters until the content loss function is minimized to obtain a trained super-resolution convolutional neural network model;
and step four, inputting the test image data set into the trained super-resolution convolutional neural network model to obtain a high-resolution image.
Further, the training parameters in step two include the learning rate and the number of training iterations; the content loss function is the mean square error.
Further, the improved super-resolution convolutional neural network model in step two comprises a feature extraction module, a nonlinear mapping module, an upsampling module and a feature recombination module; the output of the feature extraction module is connected to the input of the nonlinear mapping module, the output of the nonlinear mapping module is connected to the input of the upsampling module, and the output of the upsampling module is connected to the input of the feature recombination module. The feature extraction module extracts feature information from the low-resolution image using convolution operations; the nonlinear mapping module maps the feature information to high-dimensional vectors to obtain a plurality of high-dimensional feature maps; the upsampling module enlarges the high-dimensional feature maps; and the feature recombination module recombines the enlarged high-dimensional feature maps to obtain a high-resolution image.
Furthermore, a plurality of skip-connected residual blocks are added to the nonlinear mapping module; the structure of each residual block is, in sequence, convolution, batch normalization, activation function, convolution, batch normalization, and an element-wise sum with the skip input, where the activation function is the ReLU function. The upsampling module reconstructs the feature map by a sub-pixel layer convolution method: single pixels on the multi-channel feature maps are combined into a feature-map unit, and the pixels on each feature map are equivalent to sub-pixels on the new feature map.
Further, the improved super-resolution convolutional neural network model in step two also comprises a dimensionality reduction module, which uses a convolution operation to reduce the number of channels of the feature map to the 3 RGB channels.
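The modules described above can be sketched end to end. The following is a minimal PyTorch sketch for illustration only (PyTorch itself, the channel count, the kernel sizes and the number of residual blocks are assumptions; the claims do not fix any of them):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> ReLU -> Conv -> BN, plus an element-wise skip connection."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return self.body(x) + x  # H(x) = F(x) + x

class SRNet(nn.Module):
    """Feature extraction -> nonlinear mapping -> sub-pixel upsampling -> reduction to RGB."""
    def __init__(self, scale=4, ch=64, n_blocks=4):
        super().__init__()
        self.extract = nn.Sequential(nn.Conv2d(3, ch, 9, padding=4), nn.ReLU(inplace=True))
        self.mapping = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_blocks)])
        self.upsample = nn.Sequential(
            nn.Conv2d(ch, ch * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))  # learning-based upsampling at the rear of the model
        self.reduce = nn.Conv2d(ch, 3, 3, padding=1)  # back to 3 RGB channels
    def forward(self, x):
        return self.reduce(self.upsample(self.mapping(self.extract(x))))
```

Note that all convolutions before `PixelShuffle` operate at the low-resolution size, reflecting the post-upsampling framework.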
According to another aspect of the present invention, a super-resolution image reconstruction method based on a convolutional neural network is further provided, the method comprising the following steps:
step one, acquiring a training image data set and a test image data set; wherein the image is a low resolution image;
step two, setting training parameters and a content loss function, and constructing an improved super-resolution convolutional neural network model;
step three, using the training image data set as the input of the improved super-resolution convolutional neural network model, and adjusting the training parameters until the content loss function is minimized to obtain a trained super-resolution convolutional neural network model;
step four, using the trained super-resolution convolutional neural network model as the generator network model in a generative adversarial network model, and training the generator network model and the discriminator network model simultaneously with a generative adversarial loss function until the generator and discriminator reach Nash equilibrium, to obtain the finally trained generator and discriminator network models;
and step five, inputting the test image data set into the finally trained generator network model of step four to obtain a high-resolution image.
Further, the training parameters in step two include the learning rate and the number of training iterations; the content loss function is the mean square error.
Further, the improved super-resolution convolutional neural network model in step two comprises a feature extraction module, a nonlinear mapping module, an upsampling module and a feature recombination module; the output of the feature extraction module is connected to the input of the nonlinear mapping module, the output of the nonlinear mapping module is connected to the input of the upsampling module, and the output of the upsampling module is connected to the input of the feature recombination module. The feature extraction module extracts feature information from the low-resolution image using convolution operations; the nonlinear mapping module maps the feature information to high-dimensional vectors to obtain a plurality of high-dimensional feature maps; the upsampling module enlarges the high-dimensional feature maps; and the feature recombination module recombines the enlarged high-dimensional feature maps to obtain a high-resolution image.
Furthermore, a plurality of skip-connected residual blocks are added to the nonlinear mapping module; the structure of each residual block is, in sequence, convolution, batch normalization, activation function, convolution, batch normalization, and an element-wise sum with the skip input, where the activation function is the ReLU function. The upsampling module reconstructs the feature map by a sub-pixel layer convolution method: single pixels on the multi-channel feature maps are combined into a feature-map unit, and the pixels on each feature map are equivalent to sub-pixels on the new feature map.
Further, the generative adversarial loss function in step four comprises a content loss function and an adversarial loss function, and is expressed by the following formula:

l^SR = l^SR_content + λ · l^SR_adv

where l^SR_content represents the content loss function, i.e. the mean square error; l^SR_adv represents the adversarial loss function; and λ is a weighting coefficient.
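The combined generator loss can be sketched numerically as follows. This is an illustrative NumPy sketch, not the patent's exact formulation: the weighting coefficient value and the negative-log form of the adversarial term are assumptions.

```python
import numpy as np

def content_loss(sr, hr):
    """Mean square error between the reconstructed image and the ground truth."""
    return float(np.mean((sr - hr) ** 2))

def generator_loss(sr, hr, d_fake, lam=1e-3):
    """Content loss plus a weighted adversarial term.

    d_fake: discriminator scores in (0, 1] for the generated images
    (higher means the discriminator believes the image is real).
    """
    adversarial = float(-np.mean(np.log(d_fake + 1e-12)))
    return content_loss(sr, hr) + lam * adversarial
```

When the discriminator is fully fooled (scores near 1), the adversarial term vanishes and only the content loss remains.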
The beneficial technical effects of the invention are as follows:
according to the method, firstly, the pre-up-sampling operation of the zoomed image at the front section of the model is improved into the post-up-sampling operation at the rear section, so that the problems of high calculation complexity, low running speed and the like caused by the fact that the zoomed image occurs in a high-dimensional space are solved, and the network structure is clearer and easier to understand; secondly, an interpolation-based up-sampling method is improved to a learning-based up-sampling method, so that the problems of poor image quality, low image pixel, blurred image edge and the like of an image generated after interpolation by the interpolation method are solved, image content with better quality effect and higher resolution can be obtained, and the image resolution is improved; the number of network layers is deepened again, a residual error network is added in the network in order to solve the problems of gradient disappearance and the like, and a network model is more stable due to cross-layer connection of the residual error network; and finally, the improved super-resolution convolutional neural network model is used as a generation network in a generation countermeasure network and is combined with the countermeasure network, so that the image reconstruction effect is further improved.
Drawings
The invention may be better understood by referring to the following description in conjunction with the accompanying drawings, in which like reference numerals are used throughout the figures to indicate like or similar parts. The accompanying drawings, which are incorporated in and form a part of this specification, illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further explain the principles and advantages of the invention.
FIG. 1 is a diagram illustrating a pre-upsampling operation in a conventional super-resolution convolutional neural network model;
FIG. 2 is a schematic diagram of a post-up-sampling operation in the improved super-resolution convolutional neural network model of the present invention;
FIG. 3 is a schematic diagram of the structure of the improved super-resolution convolutional neural network model of the present invention;
FIG. 4 is a process schematic of an up-sampling method in the improved super-resolution convolutional neural network model of the present invention;
FIG. 5 is a schematic diagram of the structure of the generator network model in the generative adversarial network of the present invention;
FIG. 6 is a schematic diagram of the structure of the discriminator network model in the generative adversarial network of the present invention;
FIG. 7 is a schematic comparison of reconstructed images produced by the method of the present invention and by the conventional super-resolution convolutional neural network (SRCNN) method; wherein (a) represents the SRCNN method; (b) represents the method of the invention; (c) represents the original image;
FIG. 8 is a schematic comparison of reconstructed images produced by the method of the present invention combined with the adversarial network and by the conventional super-resolution convolutional neural network (SRCNN) method; wherein (a) represents the original image; (b) represents the method of the invention; (c) represents the SRCNN method;
FIG. 9 is a graph comparing the PSNR values of different image reconstruction methods.
Detailed Description
Exemplary embodiments of the present invention will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in the specification. It should be noted that, in order to avoid obscuring the present invention by unnecessary details, only the device structures and/or processing steps that are closely related to the scheme according to the present invention are shown in the drawings, and other details that are not so relevant to the present invention are omitted.
The super-resolution convolutional neural network is a method that changes the size of an image (generally enlarging it) while improving on the quality of the original image. It directly learns the mapping between low- and high-resolution images. The traditional super-resolution convolutional neural network is based on interpolation: first the image is preprocessed, i.e. the original low-resolution image is enlarged to the required size by bicubic interpolation; then the mapping is learned through the following operations: extracting and representing feature patches, nonlinearly mapping the feature vectors, and reconstructing the feature image.
On the basis of the traditional super-resolution convolutional neural network, the invention improves the model framework, the upsampling method (sub-pixels) and the network design, and further improves the adversarial network by combining existing image classification network models and visual recognition algorithms, addressing the problems of existing convolutional-neural-network-based image super-resolution algorithms: many parameters, heavy computation, long training time and blurred image texture.
First, the improvement of the model framework is explained as follows.
As shown in FIG. 1, the traditional super-resolution convolutional neural network framework adopts pre-upsampling [1]: after the image is input into the network, the input image is first enlarged to the target size by interpolation.
With pre-upsampling, an image of the target size is obtained at the very beginning, so subsequent operations need not change the image size and only need to perform detail operations such as feature extraction on the upsampled image; however, this makes learning difficult in the enlarged, high-dimensional space. Because feature extraction and all other operations after pre-upsampling are performed on the enlarged image, they inherit problems such as noise and blurring in that image, and because every operation runs on a large image, the processing time is clearly longer than that of network frameworks operating on small images. Therefore, to increase the running speed of the network and reduce the computational complexity of the model, the pre-upsampling operation is improved.
As shown in fig. 2, the invention improves on this by first performing feature extraction, nonlinear mapping and similar operations on the input image without scaling it to the target size, and only then enlarging it to obtain the desired high-quality image. Such an upsampling arrangement not only places feature extraction and related operations in a low-dimensional space, but also significantly reduces the spatial and computational complexity of the model and increases the training speed of the network. The improved model therefore adopts a post-upsampling model framework.
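The computational saving can be illustrated with a back-of-the-envelope count of multiply-accumulate operations for a single 3x3 convolution layer (the channel counts and image size below are illustrative assumptions, not values from the patent):

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count of one k x k convolution over an h x w feature map."""
    return h * w * c_in * c_out * k * k

r = 4                 # upscaling factor
h, w = 64, 64         # low-resolution input size
pre  = conv_macs(h * r, w * r, 64, 64, 3)   # pre-upsampling: conv runs on the enlarged map
post = conv_macs(h, w, 64, 64, 3)           # post-upsampling: conv runs on the small map
print(pre // post)    # prints 16: each conv layer is r*r times cheaper
```

Since every layer before the final upsampling enjoys this factor, the whole network speeds up roughly in proportion to r squared.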
FIG. 3 shows the improved super-resolution convolutional neural network model of the present invention. As shown in fig. 3, in the first part, different feature maps are first extracted from the low-resolution image by convolution with different convolution kernels, and the extracted feature maps are represented as high-dimensional vectors. This part is expressed by the following formula:

F_1(Y) = max(0, W_1 * Y + B_1)    (1)

where Y is the input low-resolution image, and W_1 and B_1 are the weights and biases of the first convolutional layer.
in the feature extraction section, each of the obtained feature maps is determined by the features of the input image, and the modified linear unit function ReLU is used as an activation function after feature extraction.
Then, the improvement of the network design is explained as follows.
The residual network is a common network model, generally applied in deep convolutional neural networks. Structurally, a residual connection takes the input received from a previous layer and passes it on to a later, non-adjacent layer of the network. The difference between a residual block and an ordinary network connection is the cross-layer connection (also called a skip connection), which is the main reason residual networks can alleviate the vanishing-gradient problem caused by increasing network depth in deep neural networks.
A residual network is generally composed of multiple residual modules. A connection within one residual module usually spans two to three layers, and may span more layers if needed, though the computational difficulty of the module then also increases. Residual learning adds shortcut connections between nodes at different depths in the network model, reducing the degradation problem caused by deepening the network. The output of a single residual module is calculated as:
H(x) = F(x) + x    (2)
where x is the input, F (x) is the output of the convolutional layer, and H (x) is the output of the residual block.
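Equation (2) can be demonstrated in a few lines of NumPy. The branch function F below is a stand-in for illustration, not the patent's exact conv-BN-ReLU-conv-BN branch:

```python
import numpy as np

def residual_block(x, f):
    """H(x) = F(x) + x: the skip connection adds the input to the branch output."""
    return f(x) + x

x = np.array([1.0, -2.0, 3.0])

# If the residual branch F outputs zero (e.g. near initialization), the block
# reduces to the identity, which is why gradients pass through unattenuated.
identity_out = residual_block(x, lambda t: np.zeros_like(t))

# A simple nonlinear stand-in branch: ReLU of the input, added back to the input.
relu_out = residual_block(x, lambda t: np.maximum(t, 0.0))
```

The identity case is the key property: however deep the stack of blocks, the input always has a direct additive path to the output.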
As shown in fig. 3, in the second part, the nonlinear mapping maps the high-dimensional vectors of the feature maps to other, higher-dimensional vectors, and cross-layer residual connections are added in these layers, so that each mapped vector can represent a high-resolution feature patch while healthy gradient flow through the network is preserved. The vectors are combined according to a specific rule to form new feature maps. After the nonlinear mapping operation, a number of feature maps are obtained, which are used to reconstruct the lower-level high-resolution feature patches.
In the third part, the high-resolution feature patches obtained from the previous layer are convolved and recombined across channels to obtain a high-resolution feature map, reconstructing the final complete three-channel color high-resolution image. The final reconstruction formula is:

F(Y) = W_r * F_n(Y) + B_r    (3)

where F_n(Y) is the output of the previous layer, and W_r and B_r are the weights and biases of the reconstruction layer.
then, the improvement of the up-sampling method is explained as follows.
In the super-resolution convolutional neural network framework, the commonly used way to realize upsampling is interpolation. Interpolation-based upsampling enlarges the image size purely from the image's own pixel points and introduces no other information. Because interpolated pixels are computed from existing pixels, the interpolated image can suffer negative consequences such as poor quality, low pixel resolution and blurred edges. To reduce these image-quality problems and better realize the effect of the super-resolution convolutional neural network, the upsampling method here is a learning-based one, namely the sub-pixel method.
Macroscopically, any two pixels are closely connected, but microscopically there is a gap between any two pixels, and within these gaps lie many "small pixels" invisible to the naked eye. These "small pixels" between any two pixels are considered to actually exist and are generally referred to as "sub-pixels".
As shown in fig. 4, the sub-pixel layer improves the resolution of the image by implementing a mapping from a small rectangle to a large rectangle. The reconstruction from a low-resolution image to a high-resolution image is realized by sub-pixel layer convolution: single pixels on the multi-channel feature maps are combined, according to a fixed pattern, into a feature-map unit, and the pixels on each feature map are equivalent to sub-pixels on the new feature map. If the original image is to be magnified by a factor of r, then r^2 feature maps of equal size (as channels) should be generated; these r^2 same-size feature maps are then spliced into one large map enlarged r times. This operation is called the sub-pixel layer operation. The sub-pixel formula for the l-th layer can be expressed as:

I^SR = f^l(I^LR) = PS(W_l * f^(l-1)(I^LR) + b_l)    (4)

where f^l denotes the feature map output by the l-th layer and f^(l-1) the feature map of the previous layer; PS denotes the periodic shuffling operator, which transforms a tensor of shape H x W x C·r^2 into one of shape rH x rW x C (H and W are the length and width of the l-th layer feature map, and C·r^2 is its number of channels); W_l denotes the weight variable of the l-th layer; and b_l denotes the bias of the layer, a vector of dimension C·r^2.
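The PS operator can be written directly as a NumPy reshape and transpose. This sketch follows the common channels-first layout used by PyTorch's PixelShuffle; the layout choice is an assumption for illustration:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) tensor into (C, r*H, r*W).

    Each group of r^2 channels supplies the r x r sub-pixels of one output pixel.
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into the r x r sub-pixel grid
    x = x.transpose(0, 3, 1, 4, 2)    # interleave to (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# Four 1x1 feature maps become one 2x2 map: channel k lands at sub-pixel (k//r, k%r).
tiny = np.arange(4, dtype=float).reshape(4, 1, 1)
print(pixel_shuffle(tiny, 2))   # [[[0. 1.]
                                #   [2. 3.]]]
```

No pixel values are invented: the operation only rearranges the channel dimension into spatial positions, which is why all the new detail must be learned by the convolutions that feed it.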
Compared with interpolation, the learning-based upsampling operation yields images with better quality and higher resolution, so the improved upsampling method is a feasible way to raise the resolution of super-resolution convolutional neural network images.
Finally, the generation of the countermeasure network is explained as follows.
The super-resolution network based on the generation countermeasure network enables a realistic texture effect to be generated in a generated image at a single image super-resolution. In order to further improve the problem of generating artifacts on image details, three main parts of a network structure, an antagonism loss and a perception domain loss of a super-resolution generation countermeasure network are improved, and the enhanced super-resolution generation counteracts network birth. Enhanced super-resolution generation improves perceptual domain loss against the network, using pre-activation features, which can provide greater supervision for brightness consistency and texture recovery. The enhanced super-resolution generation countermeasure network can result in better visual quality and more realistic and natural image texture.
In the improvements of the enhanced super-resolution generative adversarial network, the basic residual block in the network is replaced with the residual-in-residual dense block (RRDB); the generative adversarial network is changed to a relativistic average generative adversarial network; and finally the perceptual loss function is improved by using pre-activation VGG features. These improvements provide sharper edges and more visually pleasing results [2].
A basic generative adversarial network comprises two neural network models: a generative model (G) and a discriminative model (D), also called the generator (denoted G) and the discriminator (denoted D). The generator produces data whose distribution imitates real data; the discriminator judges whether data is generated or real. Both the generator and the discriminator can be implemented as neural network models. The training process of a generative adversarial network drives the data generated by G to be realistic enough to pass for real, until the probability that D judges a sample as real or fake is 0.5 in both cases.
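The equilibrium just described, where D outputs 0.5 for both real and generated samples, can be illustrated numerically with a minimal sketch of the GAN value function for one real and one generated sample (the function name and probabilities are invented for illustration):

```python
import math

def gan_value(d_real, d_fake):
    """Value of the GAN minimax objective V(D, G) = log D(x) + log(1 - D(G(z)))
    for one real sample (D outputs d_real) and one generated sample (d_fake)."""
    return math.log(d_real) + math.log(1.0 - d_fake)

# Early in training D separates real from fake well, so the value is high (D is winning).
print(gan_value(0.95, 0.05))   # ≈ -0.103
# At equilibrium D outputs 0.5 for both kinds of sample, the minimax saddle point.
print(gan_value(0.5, 0.5))     # ≈ -1.386, i.e. 2 * log(0.5)
```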
The Super-Resolution Generative Adversarial Network (SRGAN) is a super-resolution network model based on the generative adversarial network; it likewise divides into a generation network model and a discrimination network model. Owing to the inherent shortcomings of generative adversarial networks, the optimal state, namely the Nash equilibrium, is difficult to reach during training. Concretely, the adversarial network model is unstable during training, the computed loss fluctuates, the training result fails to converge to the optimum, convergence is incomplete, and mode collapse may occur, so the training effect is very poor and cannot be remedied simply by increasing the training time. The network model of SRGAN suffers from the same problems. The present invention therefore improves on the model.
The structure of the generation network model of the image super-resolution method based on the generative adversarial network is shown in Fig. 5, where different module colors denote different network layers and the same color denotes the same operation. The network first takes a low-resolution image as input and processes it with a convolution and the ReLU activation function to extract the image's features. It then enters a residual module containing 5 residual blocks; the structure within each residual block is, in order, convolution, batch normalization, activation function, convolution, batch normalization, and an element-wise sum, with ReLU as the activation function. Each residual block uses a skip connection; the purpose of using residual blocks is that their skip connections preserve the gradient and thereby avoid degradation of the network model. After the residual module come two upsampling convolutional layers, implemented with sub-pixel convolution; each layer magnifies by a factor of 2, so the two layers together magnify by a factor of 4. After the upsampling, a final convolution reduces the number of feature-map channels to the 3 RGB channels, and a high-resolution color image is output.
The adversarial network, also called the discrimination network, is essentially a feature extraction module. An input image first passes through a convolutional layer and an activation function, then enters a stack of 7 identical modules, each containing a convolutional layer, an activation function, and batch normalization; the activation function used in this part is Leaky ReLU. Across the 7 modules of the model, however, the number of convolution kernels in each convolutional layer differs, the counts being 64, 128, 256, 512, and 512 respectively. After these operations the features pass through a fully connected layer, an activation function (also Leaky ReLU), and another fully connected layer, and finally a Sigmoid activation function outputs the classification result for the image. The structural model of the network is shown in Fig. 6. In the field of convolutional neural networks, a discriminative model is a method for modeling the relationship between unknown and known data, grounded in probability theory: given the input variable X, the discriminative model predicts Y by constructing the conditional probability distribution P(Y|X).
In traditional learning-based image super-resolution methods, the loss function used is mostly the mean square error (MSE). In the MSE formula the error (true value minus predicted value) is squared, so any error greater than 1 is amplified by the MSE. If the data contain outliers, the error value becomes large; computing the loss with MSE therefore gives more weight to outliers, which reduces the overall performance of the model. Moreover, at an image magnification factor of 4, the MSE loss function produces smooth images lacking realistic detail. The present invention therefore adopts the loss function defined for the generative adversarial network.
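The outlier sensitivity of MSE described above can be demonstrated with a small, self-contained sketch (the values are invented for illustration):

```python
def mse(y_true, y_pred):
    """Mean square error: squaring amplifies any error larger than 1."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

true_vals    = [1.0, 2.0, 3.0, 4.0]
small_errors = [1.1, 2.1, 2.9, 3.9]   # every prediction off by 0.1
with_outlier = [1.1, 2.1, 2.9, 9.0]   # one prediction wildly off

print(mse(true_vals, small_errors))   # ≈ 0.01   (small errors stay small)
print(mse(true_vals, with_outlier))   # ≈ 6.2575 (one outlier dominates the loss)
```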
The loss of the generative adversarial network is also called the perceptual loss; it comprises a content loss and an adversarial loss. The content loss is the loss incurred by the image during generation; the adversarial loss is the loss produced by the discrimination network. In improving the final quality of the generated image, this loss function is more effective than the MSE loss. The formula of the generative adversarial loss function is:
l^SR = l_X^SR + 10^(-3) · l_Gen^SR

where l_X^SR denotes the content loss, i.e. the MSE function, and l_Gen^SR denotes the adversarial loss, computed as the cross-entropy between the probability returned by the discrimination network (a value between 0 and 1) and the ideal value 1; the weight 10^(-3) placed before the latter term controls the influence of the two kinds of loss on the total loss.

l_Gen^SR = Σ_{n=1}^{N} −log D_θD(G_θG(I^LR))

where I^LR denotes the low-resolution image and D_θD(G_θG(I^LR)) denotes the probability estimate for the reconstructed image. For better gradient behavior, −log D_θD(G_θG(I^LR)) is minimized instead of log(1 − D_θD(G_θG(I^LR))).
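A minimal sketch of this combined loss, assuming scalar pixel lists and a given discriminator probability (all names and numbers are illustrative, not the patent's implementation):

```python
import math

def perceptual_loss(hr, sr, d_sr, w=1e-3):
    """SRGAN-style perceptual loss sketch: content (MSE) + w * adversarial term.

    hr, sr : lists of pixel values of the original and reconstructed image
    d_sr   : discriminator's probability estimate D(G(I_LR)) for the SR image
    w      : weight (10^-3 in the text) balancing the two kinds of loss
    """
    content = sum((h - s) ** 2 for h, s in zip(hr, sr)) / len(hr)
    adversarial = -math.log(d_sr)   # -log D(G(I_LR)), the better-gradient form
    return content + w * adversarial

hr = [0.2, 0.5, 0.9]
sr = [0.25, 0.45, 0.85]
print(perceptual_loss(hr, sr, d_sr=0.8))
```

A lower discriminator probability (the discriminator is less fooled) raises the adversarial term, pushing the generator toward more realistic output.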
The loss calculations all occur at the output of the discriminator D. The goal is for the discriminator to judge ever more accurately whether an image is real, while the generator G produces images ever closer to the real image, i.e. increasingly realistic. During training, before the discriminator has become a perfect discriminator, it produces a larger error; that larger error updates G, and G then generates a more realistic image. Repeating these operations finally reaches a balanced state: the images produced by the generator are so close to the original that the discriminator cannot distinguish a real image from a fake one produced by the generator.
Detailed description of the preferred embodiment
This embodiment adopts the DIV2K data set from the training repository https://github.com/xinntao/BasicSR, which comprises a training set of 800 high-resolution (HR) images and an HR test set of 100 images. All images in the DIV2K data set are color images; it is a high-quality (2K-resolution) image data set used for image restoration tasks.
The evaluation criterion of this embodiment is the Peak Signal-to-Noise Ratio (PSNR) computed between the original image and the generated image. PSNR is commonly used to measure images reconstructed by lossy transformations (e.g., image compression, image restoration) and is a quantitative quality method for evaluating and comparing models; it represents how close the pixel values of the reconstructed image are to those of the original. The pixel-value error of an image is generally defined by the mean square error (MSE). The PSNR formula is:
PSNR = 10 · log10( (2^n − 1)² / MSE )

where (2^n − 1)² represents the square of the maximum pixel value of the image, and n represents the number of bits of a sample point (typically each sample point is represented by 8 bits, so 2^8 − 1 = 255). The higher the PSNR value, the closer the pixel values of the reconstruction result are to the reference.
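The PSNR formula can be written as a short Python function (a generic sketch, not the embodiment's actual code; the sample values are invented):

```python
import math

def psnr(original, reconstructed, n_bits=8):
    """PSNR = 10 * log10((2^n - 1)^2 / MSE); higher means closer to the original."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")            # identical images
    peak = (2 ** n_bits - 1) ** 2      # 255^2 for 8-bit samples
    return 10 * math.log10(peak / mse)

orig  = [52, 55, 61, 66, 70, 61, 64, 73]
recon = [54, 55, 60, 67, 72, 60, 63, 76]
print(round(psnr(orig, recon), 2))     # 43.94
```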
In this embodiment, OpenCV is first used to compress the trained high-resolution images; the compressed images serve as the input low-resolution images, from which a high-definition image is then generated by the method of the invention. To improve the learning ability of the network, the images in the training set are trained in random order. The GPU is used for computation; the low-resolution images are fed into both the traditional super-resolution convolutional neural network model and the improved super-resolution convolutional neural network model, and the result images generated by the two models are compared. The specific procedure of this embodiment is as follows.
Step 1: converting high-definition images in a DIV2K training set into low-resolution images as image input for generating a network;
step 2: the first layer adopts convolution operation to extract features (the size of a convolution kernel is 3 multiplied by 3, the step length is 1, and the number of convolutions is 64), and then calculates an activation function ReLU;
and step 3: and entering a residual error network. The residual error network comprises 3 residual error blocks and 1 convolution layer, and each residual error block has the following structure: convolution layer → batch process standardization → convolution layer → batch process standardization, and the calculation of the activation function ReLU is performed after each layer of batch process standardization;
and 4, step 4: performing convolution according to the output result of the residual error network (the convolution kernel size is 3 multiplied by 3, the step length is 1, and the number of convolutions is 256); entering sub-pixels for convolution upsampling; this operation was repeated twice;
and 5: and finally, performing convolution (the size of a convolution kernel is 1 multiplied by 1), calculating by using an activation function Tanh, and outputting a reconstructed target image.
The model is trained on the training set and, after training, applied to test images to generate high-resolution images. To verify the effectiveness of the method, four images were selected at random and processed respectively with the traditional super-resolution convolutional neural network SRCNN [3] and with the method of the present invention (Improved method). The results are shown in Fig. 7, and the peak signal-to-noise ratio (PSNR) results are shown in Table 1.
Table 1 PSNR comparison data table of images
[Table 1: PSNR (dB) of SRCNN vs. the improved method on Image 1 to Image 4; the table appears as an image in the source and its full values are not recoverable.]
The comparative experiments above use the same training set, test set, and number of training iterations to ensure the accuracy of the data. As Table 1 shows, compared with the traditional super-resolution convolutional neural network SRCNN, the PSNR of Image 1 reconstructed by the SRCNN method is 19.87 dB while that reconstructed by the present method is 22.14 dB, an improvement of 11.42%; the PSNR values of Image 2, Image 3, and Image 4 reconstructed by the present method exceed those of the SRCNN method by 19.94%, 14.25%, and 8.54%, respectively. The improved super-resolution convolutional neural network model therefore performs better under objective evaluation.
As can be seen from Fig. 7, the images of Image 1, Image 2, Image 3, and Image 4 reconstructed by the method of the present invention are sharper and clearer, with well-defined edges; the improvement is evident, and the overall visual effect is effectively enhanced and closer to the original image.
Detailed description of the invention
This embodiment adds the discrimination network of the generative adversarial network on top of the network model of the previous embodiment. Training is likewise performed on the CPU, using the open-source DIV2K data set for training and testing of the model. Image processing is performed with the PIL library in Python, which supports filtering with different convolution kernels, color-space conversion, image resizing, image rotation, and various affine transformations. The trained high-resolution images are first degraded into low-resolution images to be processed, which are then processed by the improved super-resolution convolutional neural network method to obtain reconstructed high-resolution images. The images in the training set are trained in random order to improve the generalization ability of the network.
(1) Practical techniques for training the deep network: scale the residual information, i.e., multiply it by a number between 0 and 1, to prevent instability of the network; use smaller initialization, since the residual structure is easier to train when the variance of the initialization parameters is smaller.
(2) Training details: magnification factor: 4; mini-batch size: 16; the high-resolution images are downsampled in Python to obtain the low-resolution images; high-resolution patch size: 128 × 128. Experimental experience shows that training deep networks with large patches is more effective, because the enlarged receptive field helps the model capture more meaningful information.
(3) Training process: first train a PSNR-oriented model (L1 loss), with initial learning rate 2 × 10^(-4), multiplying the learning rate by 0.5 every 200,000 mini-batches; then take the model trained in the first embodiment as the initialization of the generation network, with initial learning rate 10^(-4), halved after 50k, 100k, 200k, and 300k iterations.
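The two learning-rate schedules described in (3) can be sketched as plain functions (the function names are illustrative, and the exact behavior at the milestone iterations is assumed):

```python
def learning_rate(iteration, base_lr=2e-4, step=200_000, factor=0.5):
    """Pre-training schedule from the text: start at 2e-4 and multiply the
    learning rate by 0.5 every 200,000 mini-batches."""
    return base_lr * factor ** (iteration // step)

def gan_learning_rate(iteration, base_lr=1e-4,
                      milestones=(50_000, 100_000, 200_000, 300_000)):
    """GAN-stage schedule: start at 1e-4 and halve at 50k/100k/200k/300k iterations."""
    passed = sum(1 for m in milestones if iteration >= m)
    return base_lr * 0.5 ** passed

print(learning_rate(0))            # 0.0002
print(learning_rate(400_000))      # 5e-05
print(gan_learning_rate(120_000))  # 2.5e-05
```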
The pre-training model optimized with the pixel loss function helps the generation network avoid poor local optima. After pre-training, the input images received by the discrimination network are of relatively good quality rather than fully initialized images, so the discriminator concentrates on judging image texture. Finally, Adam (β1 = 0.9, β2 = 0.999) is used to alternately update the generation network and the discrimination network until convergence. This is a primary reason the model based on the generative adversarial network helps produce more visually pleasing results.
(4) The experimental steps are as follows:
Step 1: use Python to train on the DIV2K data set, which is rich in texture information, converting the high-definition images of the training set into low-resolution images as the image input of the generation network;
Step 2: the first layer is a feature extraction operation with 64 convolutions of size 3 × 3, followed by the ReLU activation function;
Step 3: enter a residual network comprising 3 residual blocks and 1 convolutional layer, where each residual block has the structure: convolutional layer → batch normalization → ReLU → convolutional layer → batch normalization → ReLU;
The purposes of using batch normalization in this layer are: A. speeding up convergence: when the data distributions of the layers of a deep neural network are inconsistent, the network is difficult to converge and train, so the values of each layer are normalized to the 0-1 range, a distribution under which training converges more easily; B. preventing gradient explosion and gradient vanishing; C. preventing overfitting: during training the network draws small random batches each time, so it learns in different directions, which avoids overfitting to a certain extent;
Step 4: according to the output of the previous layer, perform another convolution (convolution kernel size 3 × 3, stride 1, 256 convolution kernels), then perform upsampling;
Step 5: finally, perform a convolution with a 1 × 1 kernel to reconstruct the target image;
The network structure above is used to generate a high-resolution image and serves as the generation network of the generative adversarial network model.
Step 6: use the improved adversarial network of the generative adversarial network model as the adversarial network of the improved super-resolution convolutional neural network, to judge whether an image was produced by the generation network.
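The batch normalization used in step 3 can be sketched in NumPy (a generic illustration with gamma and beta fixed to their defaults, not the embodiment's code):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over a mini-batch: shift each feature to zero mean
    and unit variance, then apply the learnable scale (gamma) and shift (beta).
    Normalizing every layer's activations keeps their distributions consistent,
    which speeds convergence and tames exploding/vanishing gradients."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])  # features on very different scales
out = batch_norm(batch)
print(out.mean(axis=0))   # ≈ [0. 0.]
print(out.std(axis=0))    # ≈ [1. 1.]
```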
The improved network model is trained with the training set of DIV2K and then tested with the test set, finally generating high-resolution images. As shown in Fig. 8, 3 images from the test set of DIV2K were reconstructed at random; their corresponding PSNR values are shown in Table 2.
Table 2 PSNR comparison data table of images
[Table 2: PSNR (dB) of the three reconstructed test images; the table appears as an image in the source and its values are not recoverable.]
Because this embodiment adds an adversarial network to judge the generated images, the generated images are closer to the original. Evaluated by PSNR, the method of the present invention improves over the traditional super-resolution convolutional neural network SRCNN by 13.72 dB, 14.21 dB, and 9.46 dB on the three images respectively, showing that the method recovers the feature information of the image better.
As can be seen from Fig. 8, the method of the present invention significantly improves the quality of the reconstructed image. The position of the upsampling module can aggravate artifacts caused by pixel degradation in the input image; changing the upsampling position in the generation network therefore effectively suppresses these artifacts and yields a sharper image. When the adversarial network is used to train and optimize the model, Tables 1 and 2 show that the PSNR value improves to some extent, because the model is optimized specifically for that index. After the network improvements, image quality increases accordingly. These improvements make the super-resolution images produced by the network model closer to real images, which helps reconstruct clearer high-resolution images once the adversarial network is added.
Finally, the PSNR values of the images generated by the image reconstruction method of this embodiment, the traditional super-resolution convolutional neural network SRCNN method, and bicubic interpolation are compared, as shown in Fig. 9. It can be clearly seen that the PSNR value of each network model rises gradually as the number of training iterations increases. The PSNR curve of the bicubic interpolation algorithm lies below the curves of the other methods at the same iteration count. The PSNR curve of the traditional SRCNN method also rises gradually, but it crosses the curve of the method of this embodiment several times: at first the method of this embodiment has the highest PSNR, followed by the traditional SRCNN method and then the bicubic interpolation algorithm; as the iterations increase, the PSNR values of the method of this embodiment and of the SRCNN method remain closely matched, while the bicubic interpolation curve stays below both. With further training, the PSNR of the method of this embodiment gradually rises to the highest value, and the PSNR curve of the SRCNN method is second only to it. As the iterations continue to increase, the PSNR of every curve still grows, but the curves rise more slowly and by smaller amounts than at the start of the iterations, indicating that the results of network model training have essentially stabilized.
Therefore, the performance of the improved super-resolution convolutional neural network is superior to that of the traditional super-resolution convolutional neural network.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.
The documents cited in the present invention are as follows:
[1] DONG C, LOY C C, HE K, et al. Learning a deep convolutional network for image super-resolution[C]//European Conference on Computer Vision. Springer, Cham, 2014.
[2] WANG X, YU K, WU S, et al. ESRGAN: Enhanced super-resolution generative adversarial networks[C]//Computer Vision, ECCV 2018 Workshops. 2018.
[3] DONG C, LOY C C, HE K, et al. Image super-resolution using deep convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295-307.

Claims (10)

1. a super-resolution image reconstruction method based on a convolutional neural network is characterized by comprising the following steps:
step one, acquiring a training image data set and a test image data set; wherein the image is a low resolution image;
step two, setting training parameters and a content loss function, and constructing an improved super-resolution convolutional neural network model;
step three, using the training image data set as the input of the improved super-resolution convolutional neural network model, and adjusting the training parameters until the content loss function is minimized to obtain a trained super-resolution convolutional neural network model;
and step four, inputting the test image data set into the trained super-resolution convolutional neural network model to obtain a high-resolution image.
2. The super-resolution image reconstruction method based on the convolutional neural network as claimed in claim 1, wherein the training parameters in step two include learning rate and training times; the content loss function is a mean square error.
3. The super-resolution image reconstruction method based on a convolutional neural network of claim 2, wherein the improved super-resolution convolutional neural network model in step two comprises a feature extraction module, a nonlinear mapping module, an upsampling module and a feature reconstruction module, wherein the output of the feature extraction module is connected with the input of the nonlinear mapping module, the output of the nonlinear mapping module is connected with the input of the upsampling module, and the output of the upsampling module is connected with the input of the feature reconstruction module; the feature extraction module is used for extracting feature information from the low-resolution image by a convolution operation; the nonlinear mapping module is used for mapping the feature information to high-dimensional vectors to obtain a plurality of high-dimensional feature maps; the upsampling module is used for magnifying the high-dimensional feature maps; and the feature reconstruction module is used for recombining the magnified high-dimensional feature maps to obtain a high-resolution image.
4. The super-resolution image reconstruction method based on a convolutional neural network as claimed in claim 3, wherein a plurality of residual blocks with skip connections are added to the nonlinear mapping module, the structure of each residual block being, in order, convolution, batch normalization, activation function, convolution, batch normalization, and element-wise summation, wherein the activation function is the ReLU function; the upsampling module reconstructs the feature map by a sub-pixel convolution method, in which single pixels from the multi-channel feature maps are combined into a unit of a new feature map, the pixels of each original feature map serving as sub-pixels of the new feature map.
5. The super-resolution image reconstruction method based on a convolutional neural network of claim 4, wherein the improved super-resolution convolutional neural network model in step two further comprises a dimension reduction module, the dimension reduction module being configured to reduce the number of feature-map channels to the 3 RGB channels by a convolution operation.
6. A super-resolution image reconstruction method based on a convolutional neural network is characterized by comprising the following steps:
step one, acquiring a training image data set and a test image data set; wherein the image is a low resolution image;
step two, setting training parameters and a content loss function, and constructing an improved super-resolution convolutional neural network model;
step three, using the training image data set as the input of the improved super-resolution convolutional neural network model, and adjusting the training parameters until the content loss function is minimized to obtain a trained super-resolution convolutional neural network model;
step four, using the trained super-resolution convolutional neural network model as the generation network model in a generative adversarial network model, and simultaneously training the generation network model and the discrimination network model of the generative adversarial network model with a generative adversarial loss function until the generation network and the discrimination network reach Nash equilibrium, thereby obtaining the finally trained generation network model and discrimination network model;
and step five, inputting the test image data set into the finally trained generation network model in the step four to obtain a high-resolution image.
7. The super-resolution image reconstruction method based on the convolutional neural network as claimed in claim 6, wherein the training parameters in step two include learning rate and training times; the content loss function is a mean square error.
8. The super-resolution image reconstruction method based on a convolutional neural network of claim 7, wherein the improved super-resolution convolutional neural network model in step two comprises a feature extraction module, a nonlinear mapping module, an upsampling module and a feature reconstruction module, wherein the output of the feature extraction module is connected with the input of the nonlinear mapping module, the output of the nonlinear mapping module is connected with the input of the upsampling module, and the output of the upsampling module is connected with the input of the feature reconstruction module; the feature extraction module is used for extracting feature information from the low-resolution image by a convolution operation; the nonlinear mapping module is used for mapping the feature information to high-dimensional vectors to obtain a plurality of high-dimensional feature maps; the upsampling module is used for magnifying the high-dimensional feature maps; and the feature reconstruction module is used for recombining the magnified high-dimensional feature maps to obtain a high-resolution image.
9. The super-resolution image reconstruction method based on a convolutional neural network of claim 8, wherein a plurality of residual blocks with skip connections are added to the nonlinear mapping module, the structure of each residual block being, in order, convolution, batch normalization, activation function, convolution, batch normalization, and element-wise summation, wherein the activation function is the ReLU function; the upsampling module reconstructs the feature map by a sub-pixel convolution method, in which single pixels from the multi-channel feature maps are combined into a unit of a new feature map, the pixels of each original feature map serving as sub-pixels of the new feature map.
10. The super-resolution image reconstruction method based on a convolutional neural network of claim 9, wherein the generative adversarial loss function in step four comprises a content loss function and an adversarial loss function, and the formula of the generative adversarial loss function is expressed as:
l^SR = l_X^SR + 10^(-3) · l_Gen^SR

wherein l_X^SR represents the content loss function, i.e. the mean square error, and l_Gen^SR represents the adversarial loss function.
CN202110337413.9A 2021-03-30 2021-03-30 Super-resolution image reconstruction method based on convolutional neural network Pending CN112837224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110337413.9A CN112837224A (en) 2021-03-30 2021-03-30 Super-resolution image reconstruction method based on convolutional neural network


Publications (1)

Publication Number Publication Date
CN112837224A true CN112837224A (en) 2021-05-25

Family

ID=75930655




Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110599401A (en) * 2019-08-19 2019-12-20 中国科学院电子学研究所 Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
CN111583109A (en) * 2020-04-23 2020-08-25 华南理工大学 Image super-resolution method based on generation countermeasure network
CN111754403A (en) * 2020-06-15 2020-10-09 南京邮电大学 Image super-resolution reconstruction method based on residual learning
CN112396554A (en) * 2019-08-14 2021-02-23 天津大学青岛海洋技术研究院 Image super-resolution algorithm based on generation countermeasure network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Lei et al., "Image super-resolution reconstruction based on multi-scale recursive network", Acta Optica Sinica *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298716A (en) * 2021-05-31 2021-08-24 重庆师范大学 Image super-resolution reconstruction method based on convolutional neural network
CN113298716B (en) * 2021-05-31 2023-09-12 重庆师范大学 Image super-resolution reconstruction method based on convolutional neural network
CN113409195A (en) * 2021-07-06 2021-09-17 中国标准化研究院 Image super-resolution reconstruction method based on improved deep convolutional neural network
CN113706379A (en) * 2021-07-29 2021-11-26 山东财经大学 Interlayer interpolation method and system based on medical image processing
CN113706379B (en) * 2021-07-29 2023-05-26 山东财经大学 Interlayer interpolation method and system based on medical image processing
CN114067018A (en) * 2021-11-19 2022-02-18 长春理工大学 Infrared image colorization method for generating countermeasure network based on expansion residual error
CN114067018B (en) * 2021-11-19 2024-04-09 长春理工大学 Infrared image colorization method for generating countermeasure network based on expansion residual error
CN114463175A (en) * 2022-01-18 2022-05-10 哈尔滨工业大学 Mars image super-resolution method based on deep convolution neural network
CN114463175B (en) * 2022-01-18 2022-11-01 哈尔滨工业大学 Mars image super-resolution method based on deep convolutional neural network
CN115052187A (en) * 2022-04-26 2022-09-13 复旦大学 Super-resolution live broadcast system based on online training
CN115311135A * 2022-06-24 2022-11-08 西南交通大学 3D-CNN-based isotropic MRI resolution reconstruction method

Similar Documents

Publication Publication Date Title
CN112837224A (en) Super-resolution image reconstruction method based on convolutional neural network
CN110570353B (en) Super-resolution reconstruction method for generating single image of countermeasure network by dense connection
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
Lin et al. Image super-resolution using a dilated convolutional neural network
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN110060204B (en) Single image super-resolution method based on reversible network
CN102915527A (en) Face image super-resolution reconstruction method based on morphological component analysis
Chen et al. Single image super resolution using local smoothness and nonlocal self-similarity priors
Singla et al. A review on Single Image Super Resolution techniques using generative adversarial network
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
Guan et al. Srdgan: learning the noise prior for super resolution with dual generative adversarial networks
CN117651965A (en) High definition image operation method and system using neural network
CN112163998A (en) Single-image super-resolution analysis method matched with natural degradation conditions
CN116739899A (en) Image super-resolution reconstruction method based on SAUGAN network
Liu et al. Facial image inpainting using attention-based multi-level generative network
CN113240581A (en) Real world image super-resolution method for unknown fuzzy kernel
CN116681592A (en) Image super-resolution method based on multi-scale self-adaptive non-local attention network
Hasan et al. Single image super-resolution using back-propagation neural networks
Liu et al. Arbitrary-scale super-resolution via deep learning: A comprehensive survey
Baccarelli et al. Twinned Residual Auto-Encoder (TRAE)—A new DL architecture for denoising super-resolution and task-aware feature learning from COVID-19 CT images
Vo et al. StarSRGAN: Improving real-world blind super-resolution
Zhang et al. Deep residual network based medical image reconstruction
CN111047514B (en) Single image super-resolution method
Xu et al. A survey of image super resolution based on CNN
Huang et al. Cascading and Residual Connected Network for Single Image Superresolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210525