CN109949224B - Deep learning-based cascade super-resolution reconstruction method and device

Info

Publication number: CN109949224B
Application number: CN201910143361.4A
Authority: CN
Inventors: 王玄音; 马洪兵; 刘刚; 顾桂华
Original assignee: Beijing Yuetu Remote Sensing Technology Development Co ltd
Current assignee: Beijing Yuetu Remote Sensing Technology Development Co ltd
Priority date: 2019-02-26
Filing date: 2019-02-26
Publication date: 2023-06-30
Anticipated expiration: 2039-02-26
Also published as: CN109949224A

Abstract

The embodiment of the invention provides a cascade super-resolution reconstruction method and device based on deep learning. The super-resolution reconstruction model is obtained by training an initial construction model through machine learning. The initial construction model uses a plurality of cascade groups, and each cascade group comprises a convolution layer structure formed by serially connected convolution layers and a parallel deconvolution layer structure formed by deconvolution layers connected in parallel. This design strengthens the extraction of high- and low-frequency detail features and improves the reconstruction effect. Because the parallel deconvolution layer structure comprises a plurality of parallel deconvolution layers, the trained model can raise the resolution of the initial picture by different multiples, which broadens the application scenarios of the model and improves its applicability.

Description

Deep learning-based cascade super-resolution reconstruction method and device

Technical Field

The embodiment of the invention relates to the technical field of image processing, in particular to a cascade super-resolution reconstruction method and device based on deep learning.

Background

Image super-resolution (SR) reconstruction is an important digital image processing technique that uses one or more low-resolution images (or motion sequences) to reconstruct a high-resolution, information-rich image through a corresponding algorithm. The technique overcomes the resolution limit of the image sensor and can raise image resolution and quality without changing or upgrading the image acquisition hardware. The results benefit visual interpretation of the earth's surface, algorithmic recognition of image targets, analysis of ground-object types, and the accuracy of quantitative inversion, and they increase the information content and practical value of remote sensing images. In applications, super-resolution reconstruction of optical targets can provide technical support for ground target identification, interpretation of target type and number, and automatic target detection. Optical target super-resolution reconstruction can be implemented in various ways, for example in the frequency domain or the spatial domain, and from a single frame or from multiple frames.

At present, there is also much research on deep learning-based super-resolution reconstruction methods, such as ESPCN, SRCNN, SRGAN, VDSR and FSRCNN, which use convolutional neural networks to reconstruct images at super-resolution and improve the spatial resolution of image targets. Each of these methods has its own characteristics and shortcomings, but in general their training speed, reconstruction speed and processing accuracy still need improvement: existing models struggle to balance training convergence time against processing quality, and an efficient, lightweight super-resolution reconstruction model is lacking. As a result, they have limited practical value in real-time super-resolution reconstruction tasks.

In the course of implementing the embodiments of the invention, the inventors found that existing super-resolution reconstruction models have a single structure and a poor reconstruction effect, and can only raise the resolution by one specific multiple, so their applicability is low.

Disclosure of Invention

The invention aims to solve the technical problem that existing super-resolution reconstruction models have a single structure and a poor reconstruction effect, and can only raise the resolution by one specific multiple, so their applicability is low.

Aiming at the technical problems, the embodiment of the invention provides a cascade super-resolution reconstruction method based on deep learning, which comprises the following steps:

acquiring an initial picture to be subjected to super-resolution reconstruction and a target multiple for improving the resolution of the initial picture;

inputting the initial picture into a pre-trained super-resolution reconstruction model, acquiring a target picture which is output by the super-resolution reconstruction model and matched with the target multiple, and outputting the target picture;

the super-resolution reconstruction model is obtained by machine learning training from an initial construction model; the initial construction model comprises at least two cascade groups which are connected in sequence, and the output of the former cascade group is used as the input of the latter cascade group; each cascade group comprises a convolution layer structure and a parallel deconvolution layer structure, wherein the output of the convolution layer structure is used as the input of the parallel deconvolution layer structure, and the parallel deconvolution layer structure comprises at least two parallel deconvolution layers.

The embodiment of the invention provides a device for cascade super-resolution reconstruction based on deep learning, which comprises:

the acquisition module is used for acquiring an initial picture to be subjected to super-resolution reconstruction and a target multiple for improving the resolution of the initial picture;

the processing module is used for inputting the initial picture into a pre-trained super-resolution reconstruction model, acquiring a target picture which is output by the super-resolution reconstruction model and matched with the target multiple, and outputting the target picture;

An embodiment of the present invention provides an electronic apparatus including:

at least one processor, at least one memory, a communication interface, and a bus; wherein,

the processor, the memory and the communication interface complete the communication with each other through the bus;

the communication interface is used for information transmission between the electronic device and communication devices of other electronic devices;

the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of the above.

The present embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the method of any one of the above.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a method for cascade super-resolution reconstruction based on deep learning according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a super-resolution reconstruction model according to another embodiment of the present invention;

FIG. 3 is a graph of the function PReLU(x_i) provided by another embodiment of the present invention;

FIG. 4 is a block diagram of an apparatus for cascade super-resolution reconstruction based on deep learning according to another embodiment of the present invention;

FIG. 5 is a block diagram of an electronic device according to another embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 shows a flow chart of a method for cascade super-resolution reconstruction based on deep learning according to the present embodiment, referring to fig. 1, the method for cascade super-resolution reconstruction based on deep learning includes:

101: acquiring an initial picture to be subjected to super-resolution reconstruction and a target multiple for improving the resolution of the initial picture;

102: inputting the initial picture into a pre-trained super-resolution reconstruction model, acquiring a target picture which is output by the super-resolution reconstruction model and matched with the target multiple, and outputting the target picture;

The method provided by this embodiment is executed by a device capable of calling or hosting a super-resolution reconstruction model, such as a server, a terminal device, or a device dedicated to super-resolution reconstruction of pictures. The super-resolution reconstruction model is a model that can raise the resolution of an input initial picture. A lower-resolution initial picture is input into the super-resolution reconstruction model, the model outputs pictures whose resolution has been raised by different multiples, and the picture whose multiple equals the target multiple is selected from them as the target picture. For example, suppose the target multiple is 4 and the super-resolution reconstruction model can raise the resolution of the input initial picture by 2, 3 and 4 times. After the initial picture is input, the model outputs 3 pictures with resolution raised by 2, 3 and 4 times respectively, and the picture raised by 4 times is the target picture.
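The selection step in the example above can be sketched as follows. This is only an illustration: the `Picture` type, the function name `select_target_picture`, and the idea of keying the model outputs by multiple in a dictionary are assumptions made for the sketch, and the blank pictures stand in for real model outputs.

```python
from typing import Dict, List

# Hypothetical representation: a grayscale picture as rows of pixel values.
Picture = List[List[float]]


def select_target_picture(outputs: Dict[int, Picture], target_multiple: int) -> Picture:
    """Return the model output whose upscaling multiple equals the target multiple."""
    if target_multiple not in outputs:
        raise ValueError(
            f"no x{target_multiple} output; available multiples: {sorted(outputs)}"
        )
    return outputs[target_multiple]


# Stand-in for the model: blank pictures enlarged by 2, 3 and 4 times.
initial = [[0.0, 1.0], [2.0, 3.0]]
outputs = {
    m: [[0.0] * (len(initial[0]) * m) for _ in range(len(initial) * m)]
    for m in (2, 3, 4)
}
target = select_target_picture(outputs, 4)
```

Requesting a multiple that the model does not produce raises an error instead of silently returning a picture at another scale.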

The initial construction model is an untrained model structure built from convolution kernels and deconvolution kernels. The model obtained after the initial construction model has been trained repeatedly is the super-resolution reconstruction model. FIG. 2 is a schematic structural diagram of the super-resolution reconstruction model provided in this embodiment, which is a DSRCNN super-resolution reconstruction model. As can be seen from FIG. 2, the initial construction model used to train the super-resolution reconstruction model is composed of multiple cascade groups; taking 5 cascade groups as an example, only the first and fifth cascade groups are shown in FIG. 2. Each cascade group includes a convolution layer structure, namely the 3 convolution layers conv-1, conv-2 and conv-3 in FIG. 2. Each cascade group also includes a parallel deconvolution layer structure, namely the 3 parallel Dconv deconvolution layers in FIG. 2. The feature map output by the convolution layer structure serves as the input of the parallel deconvolution layer structure.

The number of deconvolution layers connected in parallel in the parallel deconvolution layer structure equals the number of different multiples at which the trained model can output super-resolution pictures. For example, since the parallel deconvolution layer structure in FIG. 2 includes 3 parallel deconvolution layers, the super-resolution reconstruction model in FIG. 2 can output pictures at 3 different multiples. Therefore, the number of multiples by which the model can raise picture resolution can be adjusted by changing the number of parallel deconvolution layers in the initial construction model, which gives the model good extensibility and applicability. The way the cascade groups are combined in the initial construction model, together with the combination of convolution and deconvolution layers within each cascade group, allows features to be extracted along multiple dimensions, which strengthens the extraction of high- and low-frequency detail features and improves the reconstruction effect.

This embodiment provides a cascade super-resolution reconstruction method based on deep learning, which performs super-resolution reconstruction on an initial picture using a pre-trained super-resolution reconstruction model to obtain a target picture whose resolution is raised by the target multiple. The super-resolution reconstruction model is obtained by training an initial construction model through machine learning. The initial construction model uses a plurality of cascade groups, and each cascade group comprises a convolution layer structure formed by serially connected convolution layers and a parallel deconvolution layer structure formed by deconvolution layers connected in parallel. This design strengthens the extraction of high- and low-frequency detail features and improves the reconstruction effect. Because the parallel deconvolution layer structure comprises a plurality of parallel deconvolution layers, the trained model can raise the resolution of the initial picture by different multiples, which broadens the application scenarios of the model and improves its applicability.

Further, on the basis of the above-described embodiment,

for any non-first cascade group in the initial construction model, the input of the parallel deconvolution layer structure in that group comprises the output of the convolution layer structure in that group and a residual calculation result;

and the residual calculation result is the result of a residual calculation between the output of the cascade group preceding the non-first cascade group and the initial picture.

Further, the residual calculation result is calculated by an Eltwise method.

In each non-first cascade group, an Eltwise operation is added before deconvolution to perform the residual calculation, which speeds up computation in the deep convolutional network and avoids gradient explosion and overfitting. The residual calculation formula gives the residual of the i-th feature map of the layer in terms of y and h(x), where y is the true value of the high-resolution image and h(x) is the simulated reconstructed high-resolution image.

As shown in FIG. 2, the input of the parallel deconvolution layer structure of the first cascade group (the first group in FIG. 2) is the output of the convolution layers in the first cascade group. For each non-first cascade group, such as the fifth group in FIG. 2, the input of the parallel deconvolution layer structure includes not only the output of the convolution layers in that group but also the residual calculation result. The residual calculation result is the result of a residual calculation between the output of the previous cascade group (namely, the output of the parallel deconvolution layers of the previous cascade group) and the initial picture.
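The residual step can be illustrated with a small sketch. The text does not reproduce the residual formula, so the code assumes the residual is the element-wise difference between a reference map (playing the role of y) and a reconstructed map (playing the role of h(x)) of the same size; the function name `eltwise_residual` and the list-of-rows representation are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # hypothetical: one feature map as rows of values


def eltwise_residual(reference: FeatureMap, reconstructed: FeatureMap) -> FeatureMap:
    """Element-wise difference reference - reconstructed (assumed form of the residual)."""
    same_shape = len(reference) == len(reconstructed) and all(
        len(r) == len(s) for r, s in zip(reference, reconstructed)
    )
    if not same_shape:
        raise ValueError("element-wise operations need maps of identical size")
    return [
        [y - hx for y, hx in zip(ref_row, rec_row)]
        for ref_row, rec_row in zip(reference, reconstructed)
    ]


y_true = [[5.0, 6.0], [7.0, 8.0]]
h_x = [[3.0, 6.0], [8.0, 4.0]]
residual = eltwise_residual(y_true, h_x)
```

The shape check reflects that an element-wise operation is only defined for inputs of the same size.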

The embodiment provides a cascade super-resolution reconstruction method based on deep learning, wherein in a constructed cascade group, the input of each parallel deconvolution layer structure further comprises a residual calculation result, so that overfitting is avoided, and the quality of a super-resolution reconstructed picture is improved.

Further, on the basis of the above embodiments, the obtaining the super-resolution reconstruction model by the initial build model through machine learning training includes:

obtaining a sample picture and at least two preset multiples by which the super-resolution reconstruction model raises the resolution of an input picture, and, for each preset multiple, reducing the resolution of the sample picture by that multiple to obtain a corresponding picture, thereby obtaining a first picture set composed of the pictures corresponding to the preset multiples;

taking the first picture set as an input sample of the initial construction model, taking a second picture set formed by the sample pictures as an output sample, training the initial construction model through a plurality of groups of the input samples and the output samples, and taking the trained model as the super-resolution reconstruction model;

wherein the number of sample pictures in the second set of pictures is equal to the number of pictures in the first set of pictures.

It should be noted that, the preset multiple is a multiple that the super-resolution reconstruction model can improve the resolution of the picture. The number of preset multiples is typically equal to the number of deconvolution layers contained in the parallel deconvolution layer structure in the initial build model. The values of the preset multiples are set manually, for example, for an initial build model of a parallel deconvolution layer structure comprising 3 deconvolution layers, the preset multiples are set to 2-fold, 3-fold and 4-fold or 2-fold, 4-fold and 8-fold, respectively.

When training the initial construction model, a higher-resolution sample picture is used as the output sample, and the pictures obtained by reducing the resolution of the sample picture by each preset multiple are used as the input sample, so that each sample picture yields one group of training samples. The initial construction model is trained on multiple groups of training samples to obtain the super-resolution reconstruction model. The first picture set is the set of pictures obtained by reducing the resolution of the sample picture by each preset multiple; for example, reducing the resolution of the sample picture by 2, 3 and 4 times gives 3 pictures, and these 3 pictures form the first picture set used as the input sample. The second picture set consists of the sample picture itself, for example a set of 3 copies of the sample picture.

Specifically, the original high-resolution sample picture is downsampled by each preset multiple to obtain low-resolution images; the low-resolution images are the input samples and the original high-resolution sample picture is the output sample. Each low-resolution image and the original high-resolution image at the corresponding location thus form an image mapping pair, and multiple such sets of mappings are used for model training.
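The construction of one group of training samples can be sketched as follows. The sketch assumes block-average downsampling for simplicity, whereas the text specifies interpolation-based downsampling; the names `downsample` and `build_training_group` are hypothetical.

```python
from typing import List, Sequence, Tuple

Picture = List[List[float]]  # hypothetical: grayscale picture as rows of pixels


def downsample(picture: Picture, factor: int) -> Picture:
    """Reduce resolution by an integer factor using block averaging (an assumption)."""
    height = len(picture) // factor
    width = len(picture[0]) // factor
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            block = [
                picture[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def build_training_group(
    sample: Picture, multiples: Sequence[int]
) -> Tuple[List[Picture], List[Picture]]:
    """Return (first picture set, second picture set) for one sample picture."""
    first_set = [downsample(sample, m) for m in multiples]  # input samples
    second_set = [sample for _ in multiples]                # output samples
    return first_set, second_set


# A 12x12 sample divides evenly by the preset multiples 2, 3 and 4.
sample = [[float(r * 12 + c) for c in range(12)] for r in range(12)]
first_set, second_set = build_training_group(sample, (2, 3, 4))
```

The two sets have the same number of pictures, matching the requirement that the second picture set contain as many sample pictures as the first picture set contains downsampled pictures.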

Further, a pre-collected remote sensing image data set is taken as a sample picture, and the collected remote sensing image data set is preprocessed. Preprocessing includes original high resolution target image brightness/contrast enhancement, color space conversion, interpolation downsampling, affine transformation, multi-scale scaling, rotation, cropping.

Further, the preprocessing specifically comprises: enhancing the brightness and contrast of the high-resolution image so that image targets can be clearly distinguished; converting the image to the YCbCr color space and extracting the luminance (Y) component; downsampling the luminance component to 1/2 size using cubic convolution interpolation; applying a random affine transformation to the four corner coordinates of the image target; and rotating in 90-degree steps and cropping to the minimum bounding rectangle of the image target, to obtain a training sample set. During cropping, the minimum bounding rectangle of each building target is used as the extent and each target is cropped individually, producing a data set of 2000 target image samples. A mapping is then built between each reduced-resolution image and the corresponding original-resolution remote sensing image, generating a high-/low-resolution mapping matrix.
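Two of the preprocessing steps, luminance extraction and rotation in 90-degree steps, can be sketched as follows. The luminance weights are the standard ITU-R BT.601 coefficients, which the text does not state explicitly and are therefore an assumption here; the function names and the tuple-based RGB representation are hypothetical.

```python
from typing import List, Tuple

RgbPicture = List[List[Tuple[float, float, float]]]  # hypothetical RGB layout
Picture = List[List[float]]


def rgb_to_luminance(rgb_picture: RgbPicture) -> Picture:
    """Extract the Y component using BT.601 weights (assumed, not given in the text)."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_picture
    ]


def rotate90_clockwise(picture: Picture) -> Picture:
    """Rotate a picture by 90 degrees clockwise."""
    return [list(row) for row in zip(*picture[::-1])]


def rotations(picture: Picture) -> List[Picture]:
    """Return the picture rotated by 0, 90, 180 and 270 degrees."""
    result = [picture]
    for _ in range(3):
        result.append(rotate90_clockwise(result[-1]))
    return result


rgb = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
y = rgb_to_luminance(rgb)
augmented = rotations(y)
```

Each source picture yields four rotated variants, which is one way to enlarge the training sample set.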

The embodiment provides a cascade super-resolution reconstruction method based on deep learning, which is used for obtaining training samples by carrying out resolution reduction processing of preset multiples on sample pictures, thereby realizing training of an initial construction model. The training convergence speed is high, and the reconstruction efficiency is high.

Further, on the basis of the above embodiments, the training process of the initial build model further includes:

and acquiring an output picture set obtained after the input sample is input into the initial construction model of the current training, and taking the initial construction model of the current training as the super-resolution reconstruction model if the resolution deviation of each picture in the output picture set and the sample picture is within the tolerance range of a preset error function.

While the initial construction model is being trained, whether training needs to continue can be decided by comparing the pictures output by the model under training with the sample picture. For example, the deviation between the two pictures can be computed with an error function; if the deviation is within the tolerance range of the preset error function, the currently trained model is used as the super-resolution reconstruction model, and otherwise training continues.
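The stopping check can be sketched as follows. The text does not name the error function, so mean squared error is used here purely as an assumption; `mean_squared_error`, `training_converged` and the tolerance value are hypothetical.

```python
from typing import List

Picture = List[List[float]]  # hypothetical: grayscale picture as rows of pixels


def mean_squared_error(a: Picture, b: Picture) -> float:
    """Mean squared pixel difference between two pictures of the same size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pictures must have the same size")
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def training_converged(output_set: List[Picture], sample: Picture, tolerance: float) -> bool:
    """True when every output picture deviates from the sample by at most the tolerance."""
    return all(mean_squared_error(out, sample) <= tolerance for out in output_set)


sample = [[1.0, 2.0], [3.0, 4.0]]
close = [[1.0, 2.0], [3.0, 4.5]]   # small deviation from the sample
far = [[0.0, 0.0], [0.0, 0.0]]     # large deviation from the sample

done = training_converged([sample, close], sample, tolerance=0.1)
not_done = training_converged([sample, far], sample, tolerance=0.1)
```

Training would stop in the first case and continue in the second.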

The embodiment provides a cascade super-resolution reconstruction method based on deep learning, and the quality of pictures output by a model is guaranteed through the inspection of the model.

Further, on the basis of the foregoing embodiments, the inputting the initial picture into a pre-trained super-resolution reconstruction model includes:

and acquiring the number of pictures with different multiples which can be output by the super-resolution reconstruction model, taking the number of pictures as the set number of multiples, and inputting initial pictures with the number equal to the set number of multiples into a pre-trained super-resolution reconstruction model.

According to the training process of the initial construction model, the pictures input into the super-resolution reconstruction model are multiple pictures, and when the resolution of a certain initial picture is actually required to be improved through the super-resolution reconstruction model, the multiple initial pictures can be used as the input of the super-resolution reconstruction model, so that errors in the calculation process are avoided.

Further, on the basis of the above embodiments, the convolution layer structure in each cascade group includes at least two convolution layers connected in sequence, the output of the former convolution layer serves as the input of the latter convolution layer, and the feature map of each convolution layer is calculated using a PReLU function.

Further, on the basis of the above embodiments, the initial construction model includes 5 cascade groups connected in sequence, the convolution layer structure in each cascade group includes 3 convolution layers connected in sequence, and the parallel deconvolution layer structure in each cascade group includes 3 parallel deconvolution layers.

Further, the super-resolution reconstruction model is implemented with the Caffe deep learning framework, covering network construction, parameter setting, strategy setting, model calculation, format conversion, result output and other functions of the cascade multi-scale super-resolution reconstruction convolutional neural network. Caffe functions are called through an interface, and a GPU is used for training the cascade multi-scale super-resolution model and for reconstruction calculation.

As shown in FIG. 2, the super-resolution reconstruction model consists of 5 identical cascade groups. The super-resolution reconstruction result of each cascade group is used as the low-resolution input image of the next cascade group, and deep extraction of target features is achieved through successive convolution and deconvolution operations. Because the 5 cascade groups have identical structures, the model is equivalent to one cascade group executed 5 times in a loop, and the last cascade group outputs the final super-resolution reconstructed image.

Each cascade group consists of 3 convolution layers and a parallel deconvolution layer structure of 3 deconvolution layers. Because the 3 deconvolution layers are in parallel, they count as one layer of network depth, so each cascade group has a depth of 4 layers and the whole super-resolution reconstruction network containing 5 cascade groups has a depth of 20 layers.

The 3 convolution layers perform linear computation on spatial-domain pixel values; the first convolution kernel is 3×3, the second is 5×5, and the third is 3×3. Each convolution layer extracts image feature information, and the high-dimensional feature map produced by each convolution layer is passed through the PReLU piecewise function (as shown in FIG. 2), which realizes a nonlinear mapping between neurons of the same dimension (end-to-end image information mapping). The feature map output dimension of each convolution layer is 64. The PReLU calculation formula is as follows:

PReLU(x_i) = max(0, x_i) + a * min(0, x_i)

that is, PReLU(x_i) equals x_i when x_i > 0 and a * x_i otherwise, where x is the neuron weight value, i is the image channel index, and a is a learnable parameter initialized to 0.25. FIG. 3 is a graph of the function PReLU(x_i); in FIG. 3, f(y) denotes PReLU(x_i) and y denotes x_i.
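The PReLU activation described here can be sketched as a scalar function and applied to a feature map. The sketch uses the standard PReLU definition with the initial value a = 0.25 from the text; the function names and the list-of-rows feature map are hypothetical, and in a real network a would be learned rather than fixed.

```python
from typing import List


def prelu(x: float, a: float = 0.25) -> float:
    """Standard PReLU: identity for positive inputs, slope a for the rest."""
    return x if x > 0 else a * x


def prelu_map(feature_map: List[List[float]], a: float = 0.25) -> List[List[float]]:
    """Apply PReLU to every value of one feature map."""
    return [[prelu(v, a) for v in row] for row in feature_map]


activated = prelu_map([[2.0, -4.0], [0.0, -1.0]])
```

Unlike ReLU, negative inputs are scaled by a rather than set to zero, so they still carry gradient during training.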

The 3 deconvolution layers form a parallel structure. By establishing a multi-scale mixed mapping between high- and low-resolution images, the deep convolutional neural network raises the low-resolution image by 2 times (denoted "×2"), 3 times ("×3") and 4 times ("×4") simultaneously, so that the network parameters for all three scales are trained in one model; accordingly, a training sample set with multiple resolution scales is constructed for training the super-resolution model. Within a cascade group, the three parallel deconvolution kernels differ in size: the first deconvolution kernel (×2) is 4×4, the second (×3) is 5×5, and the third (×4) is 6×6. Table 1 shows the core parameters of the super-resolution reconstruction model provided in this embodiment.

Table 1: Core parameters of the super-resolution reconstruction model provided in this embodiment

Core parameter setting	Parameter value
Network depth of each cascade group	4
Number of cascade groups in the convolutional neural network	5
Model network depth	20
First layer convolution kernel structure	3×3
Second layer convolution kernel structure	5×5
Third layer convolution kernel structure	3×3
Fourth layer deconvolution kernel structure (×2)	4×4
Fourth layer deconvolution kernel structure (×3)	5×5
Fourth layer deconvolution kernel structure (×4)	6×6
Training image padding value (first/third/fourth layer)	1
Training image padding value (second layer)	2
Convolution kernel stride (first/second/third layer)	1
Fourth layer deconvolution stride (×2)	2
Fourth layer deconvolution stride (×3)	3
Fourth layer deconvolution stride (×4)	4
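The parameters in Table 1 can be written down as a small configuration, which also makes the depth count explicit. This is a descriptive sketch, not the Caffe network definition used in the embodiment; the `Layer` class, the layer names for the deconvolution branches and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Layer:
    name: str
    kernel: int
    stride: int
    pad: int


# One cascade group: 3 serial convolution layers, then 3 parallel deconvolution layers.
CONV_LAYERS: List[Layer] = [
    Layer("conv-1", kernel=3, stride=1, pad=1),
    Layer("conv-2", kernel=5, stride=1, pad=2),
    Layer("conv-3", kernel=3, stride=1, pad=1),
]
DECONV_LAYERS: Dict[int, Layer] = {
    2: Layer("dconv-x2", kernel=4, stride=2, pad=1),
    3: Layer("dconv-x3", kernel=5, stride=3, pad=1),
    4: Layer("dconv-x4", kernel=6, stride=4, pad=1),
}
NUM_CASCADE_GROUPS = 5


def group_depth() -> int:
    """Parallel deconvolution layers count as a single layer of depth."""
    return len(CONV_LAYERS) + 1


def network_depth() -> int:
    return NUM_CASCADE_GROUPS * group_depth()


supported_multiples = sorted(DECONV_LAYERS)
```

Adding or removing an entry in `DECONV_LAYERS` changes the set of supported multiples without changing the depth, since the parallel branches share one depth level.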

The convolution layers perform multi-stage feature extraction on the image target. The size of the output image changes after a convolution operation is applied to the input low-resolution image. Assuming the original low-resolution image has width w₀ and height h₀, the image size after the convolution operation is:

w = (w₀ + 2*pad - kernel_size)/stride + 1

h = (h₀ + 2*pad - kernel_size)/stride + 1

The deconvolution layers perform the reverse operation on the multi-stage feature maps, enlarging the image by 2, 3 and 4 times according to the preset super-resolution reconstruction multiples. The image size after the deconvolution operation is:

w = (w₀ - 1)*stride + kernel_size - 2*pad

h = (h₀ - 1)*stride + kernel_size - 2*pad

where w and h are the width and height of the image after the operation, stride is the step size of the convolution or deconvolution kernel, kernel_size is the size of the kernel, and pad is the number of pixel rows added as padding.
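The size formulas can be checked against the parameters in Table 1 with a short sketch. The function names are hypothetical, and the use of integer (floor) division in the convolution formula is an assumption for inputs that do not divide evenly.

```python
def conv_output_size(size: int, kernel_size: int, stride: int, pad: int) -> int:
    """Output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel_size) // stride + 1


def deconv_output_size(size: int, kernel_size: int, stride: int, pad: int) -> int:
    """Output size of a deconvolution along one dimension."""
    return (size - 1) * stride + kernel_size - 2 * pad


w0 = 32  # example input width

# The three convolution layers (3x3 pad 1, 5x5 pad 2, 3x3 pad 1, stride 1) keep the size.
w = conv_output_size(w0, kernel_size=3, stride=1, pad=1)
w = conv_output_size(w, kernel_size=5, stride=1, pad=2)
w = conv_output_size(w, kernel_size=3, stride=1, pad=1)

# The parallel deconvolution layers (all with pad 1) enlarge it by 2, 3 and 4 times.
w_x2 = deconv_output_size(w, kernel_size=4, stride=2, pad=1)
w_x3 = deconv_output_size(w, kernel_size=5, stride=3, pad=1)
w_x4 = deconv_output_size(w, kernel_size=6, stride=4, pad=1)
```

With these kernel, stride and padding values the convolution layers preserve the spatial size, and each deconvolution layer multiplies it by exactly its stride, because kernel_size - 2*pad equals stride in all three cases.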

For improving the resolution of building targets in remote sensing images, the method achieves higher accuracy and faster training and reconstruction. Accuracy was verified using quantitative objective indexes (such as PSNR, SSIM and MTF) and subjective evaluation of targets, and the test results show that the spatial resolution of remote sensing images is raised by 2, 3 and 4 times. Multi-scale super-resolution reconstruction is achieved with a single super-resolution model, and the training efficiency is markedly higher than that of a single-scale super-resolution reconstruction model.

According to the embodiments of the invention, a multi-level, multi-scale super-resolution reconstruction model is designed for optical images. The cascaded, multi-scale mixed reconstruction structure increases the network depth: the output image of each cascade group is used as the low-resolution input image of the next cascade group, so that multi-level cyclic reconstruction is performed on the low-resolution image. Each cascade group combines deconvolution with residual calculation, which effectively avoids overfitting and local optima and improves both the training efficiency and the reconstruction effect of the super-resolution model. The 3 parallel deconvolution layers added to the cascade multi-scale super-resolution reconstruction model allow a low-resolution image to be raised by 2, 3 and 4 times simultaneously in one model, which greatly improves training efficiency and enhances the applicability of image super-resolution reconstruction.

In a second aspect, FIG. 4 is a block diagram of an apparatus for cascade super-resolution reconstruction based on deep learning according to this embodiment. Referring to FIG. 4, the apparatus includes an acquisition module 401 and a processing module 402, where,

the acquisition module 401 is configured to obtain an initial picture to be reconstructed with super resolution and a target multiple by which to increase the resolution of the initial picture;

the processing module 402 is configured to input the initial picture into a pre-trained super-resolution reconstruction model, acquire a target picture that is output by the super-resolution reconstruction model and matches the target multiple, and output the target picture.

The device for cascade super-resolution reconstruction based on deep learning provided in this embodiment is applicable to the method for cascade super-resolution reconstruction based on deep learning provided in the foregoing embodiment, and its details are not described herein again.

The embodiment provides a device for cascade super-resolution reconstruction based on deep learning, which performs super-resolution reconstruction on an initial picture through a super-resolution reconstruction model obtained through pre-training to obtain a target picture with improved resolution by a target multiple. The super-resolution reconstruction model is obtained by machine learning an initial construction model, wherein the initial construction model adopts a plurality of cascade groups, and each cascade group comprises a convolution layer structure formed by serially connected convolution layers and a parallel deconvolution layer structure formed by parallelly connected deconvolution layers. The design of the initial construction model enhances the high-low frequency detail feature extraction capability and improves the reconstruction effect. The parallel deconvolution layer structure comprises a plurality of parallel deconvolution layers, so that the trained model can improve the resolution of the initial picture by different multiples, the application scene of the model is enlarged, and the applicability is improved.
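The division of work between the two modules can be sketched as follows. This is only an illustration under stated assumptions: the class names, the `supported_multiples` attribute and the dictionary-of-outputs interface are hypothetical, and nearest-neighbor upscaling stands in for the trained super-resolution reconstruction model, which it does not reproduce.

```python
from typing import Dict, List, Tuple

Picture = List[List[int]]


def nearest_neighbor_upscale(picture: Picture, factor: int) -> Picture:
    """Stand-in for one model output; NOT the trained network."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in picture
        for _ in range(factor)
    ]


class StandInModel:
    """Hypothetical model interface: one call returns a picture per multiple."""

    supported_multiples = (2, 3, 4)

    def __call__(self, picture: Picture) -> Dict[int, Picture]:
        return {m: nearest_neighbor_upscale(picture, m) for m in self.supported_multiples}


class AcquisitionModule:
    """Acquires the initial picture and the target multiple."""

    def acquire(self, picture: Picture, target_multiple: int) -> Tuple[Picture, int]:
        return picture, target_multiple


class ProcessingModule:
    """Runs the model and returns the output matching the target multiple."""

    def __init__(self, model: StandInModel) -> None:
        self.model = model

    def process(self, picture: Picture, target_multiple: int) -> Picture:
        outputs = self.model(picture)
        if target_multiple not in outputs:
            raise ValueError(f"unsupported target multiple: {target_multiple}")
        return outputs[target_multiple]


acquisition = AcquisitionModule()
processing = ProcessingModule(StandInModel())
initial_picture, multiple = acquisition.acquire([[1, 2], [3, 4]], 2)
target_picture = processing.process(initial_picture, multiple)
print(target_picture)
```

The point of the sketch is that a single model call yields outputs at several multiples and the processing module only selects the one that matches the requested target multiple.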

In a third aspect, fig. 5 is a block diagram showing the structure of an electronic apparatus provided in the present embodiment.

Referring to fig. 5, the electronic device includes: a processor (processor) 501, a memory (memory) 502, a communication interface (Communications Interface) 503, and a bus 504;

wherein,

the processor 501, the memory 502, and the communication interface 503 perform communication with each other through the bus 504;

the communication interface 503 is used for information transmission between the electronic device and communication devices of other electronic devices;

the processor 501 is configured to invoke the program instructions in the memory 502 to perform the methods provided in the above method embodiments, for example, including: acquiring an initial picture to be subjected to super-resolution reconstruction and a target multiple for improving the resolution of the initial picture; inputting the initial picture into a pre-trained super-resolution reconstruction model, acquiring a target picture which is output by the super-resolution reconstruction model and matched with the target multiple, and outputting the target picture; the super-resolution reconstruction model is obtained by machine learning training from an initial construction model; the initial construction model comprises at least two cascade groups which are connected in sequence, and the output of the former cascade group is used as the input of the latter cascade group; each cascade group comprises a convolution layer structure and a parallel deconvolution layer structure, wherein the output of the convolution layer structure is used as the input of the parallel deconvolution layer structure, and the parallel deconvolution layer structure comprises at least two parallel deconvolution layers.

The present embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above-described method embodiments, for example, including: acquiring an initial picture to be subjected to super-resolution reconstruction and a target multiple for improving the resolution of the initial picture; inputting the initial picture into a pre-trained super-resolution reconstruction model, acquiring a target picture which is output by the super-resolution reconstruction model and matched with the target multiple, and outputting the target picture; the super-resolution reconstruction model is obtained by machine learning training from an initial construction model; the initial construction model comprises at least two cascade groups which are connected in sequence, and the output of the former cascade group is used as the input of the latter cascade group; each cascade group comprises a convolution layer structure and a parallel deconvolution layer structure, wherein the output of the convolution layer structure is used as the input of the parallel deconvolution layer structure, and the parallel deconvolution layer structure comprises at least two parallel deconvolution layers.

The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the methods provided by the above-described method embodiments, for example, comprising: acquiring an initial picture to be subjected to super-resolution reconstruction and a target multiple for improving the resolution of the initial picture; inputting the initial picture into a pre-trained super-resolution reconstruction model, acquiring a target picture which is output by the super-resolution reconstruction model and matched with the target multiple, and outputting the target picture; the super-resolution reconstruction model is obtained by machine learning training from an initial construction model; the initial construction model comprises at least two cascade groups which are connected in sequence, and the output of the former cascade group is used as the input of the latter cascade group; each cascade group comprises a convolution layer structure and a parallel deconvolution layer structure, wherein the output of the convolution layer structure is used as the input of the parallel deconvolution layer structure, and the parallel deconvolution layer structure comprises at least two parallel deconvolution layers.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware associated with program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

The above-described embodiments of the electronic device and the like are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the embodiments of the present invention and not to limit them. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for cascade super-resolution reconstruction based on deep learning, comprising the steps of:

2. The method of claim 1, wherein for any non-first cascade group in the initial construction model, the input of the parallel deconvolution layer structure in the non-first cascade group comprises the output of the convolution layer structure in the non-first cascade group and a residual calculation result;

3. The method of claim 1, wherein obtaining the super-resolution reconstruction model by machine learning training from the initial construction model comprises:

taking the first picture set as an input sample of the initial construction model, taking a second picture set consisting of a corresponding number of sample pictures as an output sample, training the initial construction model through a plurality of groups of the input samples and the output samples, and taking the trained model as the super-resolution reconstruction model;

4. The method of claim 3, further comprising, in training the initial construction model:

5. The method of claim 1, wherein the inputting the initial picture into a pre-trained super-resolution reconstruction model comprises:

and acquiring the number of pictures with different multiples which can be output by the super-resolution reconstruction model, taking the number of pictures as the set number of multiples, and inputting the initial pictures with the number equal to the set number of multiples into a pre-trained super-resolution reconstruction model.

6. The method of claim 1, wherein the convolution layer structure within each cascade group comprises at least two serially connected convolution layers, the output of a preceding convolution layer being the input of a subsequent convolution layer, and wherein the feature map of each convolution layer is calculated using a PReLU function.

7. The method of claim 1, wherein the initial construction model comprises 5 sequentially connected cascade groups, the convolution layer structure within each cascade group comprises serially connected convolution layers, and the parallel deconvolution layer structure within each cascade group comprises 3 parallel deconvolution layers.

8. A device for cascade super-resolution reconstruction based on deep learning, comprising:

9. An electronic device, comprising:

the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1-7.

10. A non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the method of any one of claims 1 to 7.