CN111223046A - Image super-resolution reconstruction method and device

Image super-resolution reconstruction method and device

Info

Publication number
CN111223046A
CN111223046A (application CN201911140450.XA)
Authority
CN
China
Prior art keywords
module
image
output
input
convolution layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911140450.XA
Other languages
Chinese (zh)
Other versions
CN111223046B (en)
Inventor
孙旭
董晓宇
高连如
雷莉萍
张兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Remote Sensing and Digital Earth of CAS
Original Assignee
Institute of Remote Sensing and Digital Earth of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Remote Sensing and Digital Earth of CAS filed Critical Institute of Remote Sensing and Digital Earth of CAS
Priority to CN201911140450.XA priority Critical patent/CN111223046B/en
Publication of CN111223046A publication Critical patent/CN111223046A/en
Application granted granted Critical
Publication of CN111223046B publication Critical patent/CN111223046B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image super-resolution reconstruction method and device, wherein the method comprises: inputting a low-resolution image and a resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image. The trained preset network comprises a preset number of multi-perception branch modules, and any one multi-perception branch module comprises a plurality of cascaded residual channel attention groups; any one residual channel attention group comprises a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module and a second summation module: the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; and the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module. Images reconstructed by the method have higher spatial resolution and higher information fidelity.

Description

Image super-resolution reconstruction method and device
Technical Field
The present application relates to the field of image processing, and in particular, to a method and an apparatus for reconstructing super-resolution images.
Background
Super-resolution (SR) reconstruction refers to restoring a high-resolution image from one or more low-resolution images of the same scene.
Super-resolution reconstruction is an important digital image processing technology with wide application in medicine, remote sensing, and many areas of daily life. A current mainstream approach is super-resolution reconstruction based on deep learning: a neural network is constructed to learn the mapping between pairs of high-resolution and low-resolution training samples, and the learned prior knowledge is then used to perform high-resolution reconstruction on the various low-resolution images input to the network.
However, the high-resolution images reconstructed in this way have low fidelity.
Disclosure of Invention
The application provides an image super-resolution reconstruction method and device, aiming to solve the problem that images obtained through super-resolution reconstruction have low fidelity.
In order to achieve the above object, the present application provides the following technical solutions:
the application provides an image super-resolution reconstruction method, which comprises the following steps:
acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branch modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module; the first summing module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for performing up-sampling operation of the resolution improvement multiple on the output of the first channel attention module; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: a second convolution layer, a third convolution layer, a rectification module and a second summation module; the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image;
and outputting the high-resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolutional layer;
and the fourth convolution layer performs convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Optionally, any of the residual channel attention groups further includes: a second channel attention module, a fifth convolutional layer, and a third summation module;
the image input to the residual channel attention group is input into the first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block is input into the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block in the residual channel attention group is input into the B-th enhanced residual block in the residual channel attention group; the output of the B-th enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images and outputting the multi-channel image.
Optionally, the rectification module is a linear rectification module.
The application also provides an image super-resolution reconstruction device, which comprises:
the acquisition module is used for acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
the reconstruction module is used for inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branch modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module; the first summing module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for performing up-sampling operation of the resolution improvement multiple on the output of the first channel attention module; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: a second convolution layer, a third convolution layer, a rectification module and a second summation module; the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image;
and the output module is used for outputting the high-resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolutional layer;
and the fourth convolution layer performs convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Optionally, any of the residual channel attention groups further includes: a second channel attention module, a fifth convolutional layer, and a third summation module;
the image input to the residual channel attention group is input into the first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block is input into the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block in the residual channel attention group is input into the B-th enhanced residual block in the residual channel attention group; the output of the B-th enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images and outputting the multi-channel image.
Optionally, the rectification module is a linear rectification module.
The present application also provides a storage medium comprising a stored program, wherein the program executes any one of the above-mentioned image super-resolution reconstruction methods.
The application also provides a device, which comprises at least one processor, at least one memory connected with the processor, and a bus; the processor and the memory complete mutual communication through the bus; the processor is used for calling the program instructions in the memory to execute any one of the image super-resolution reconstruction methods described above.
In the image super-resolution reconstruction method and device provided by the application, the preset network comprises a preset number of multi-perception branch modules; any one multi-perception branch module comprises a plurality of cascaded residual channel attention groups, and each residual channel attention group comprises a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module and a second summation module. The image input to the enhanced residual block is input into the second convolution layer, the output of the second convolution layer is input into the rectification module, and the output of the rectification module is input into the third convolution layer.
The image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module, which sums the pixel values of the pixel points at the same position of the same channel in its inputs to obtain the images of all channels. In other words, the enhanced residual block extracts three levels of information: the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer. All three levels of feature information serve as input to the subsequently cascaded enhanced residual block; by analogy, any one multi-perception branch module can extract many levels of information (each level comprising a plurality of channels), and the preset number of multi-perception branch modules can extract still more.
In addition, the preset network provided by the application further comprises a first summation module and a first channel attention module. The first summation module sums the pixel values of the pixel points at the same position of the same channel in the multi-level information output by the first convolution layer and each multi-perception branch module, obtaining a multi-channel image; the first channel attention module then assigns different weights to different channels of that multi-channel image, so that different channel features are adaptively exploited to different degrees. Meanwhile, the up-sampling module raises the resolution of the output of the first channel attention module by the preset multiple. Compared with a high-resolution image reconstructed in the existing manner, the high-resolution image reconstructed by the embodiment of the application therefore has higher fidelity.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a multi-perception attention network disclosed in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of any one of the multi-perception branch modules disclosed in the embodiments of the present application;
fig. 3 is a schematic structural diagram of any one of the residual channel attention groups disclosed in the embodiment of the present application;
fig. 4 is a schematic structural diagram of any one of the enhanced residual blocks disclosed in the embodiments of the present application;
FIG. 5 is a schematic diagram of a training process of the multi-perception attention network disclosed in an embodiment of the present application;
FIG. 6 is a flowchart of a super-resolution image reconstruction method disclosed in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image super-resolution reconstruction apparatus disclosed in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The inventors of the present application found in research that the reasons why the fidelity of high-resolution images reconstructed by existing deep-learning-based super-resolution methods is low include the following. First, the image information learned by the neural network is not fully utilized, i.e., its perception capability is limited. Second, the information extracted at different levels of the neural network is used directly for the final reconstruction, ignoring the channel-level feature differences between the information extracted at different levels.
In one aspect, the network proposed in an embodiment of the present application includes a preset number of multi-perception branch modules; any one multi-perception branch module includes a plurality of cascaded residual channel attention groups, and each residual channel attention group includes a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module and a second summation module. The image input to the enhanced residual block is input into the second convolution layer, the output of the second convolution layer is input into the rectification module, and the output of the rectification module is input into the third convolution layer.
On the other hand, the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module, which sums the pixel values of the pixel points at the same position of the same channel in its inputs to obtain the images of all channels. In other words, the enhanced residual block extracts three levels of information: the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer. All three levels of feature information serve as input to the subsequently cascaded enhanced residual block; by analogy, any one multi-perception branch module can extract many levels of information (each level comprising a plurality of channels), and the preset number of multi-perception branch modules can extract still more levels of information.
In addition, the network provided by the embodiment of the present application further includes a first summation module and a first channel attention module. The first summation module sums the pixel values of the pixel points at the same position of the same channel in the multi-level information output by the first convolution layer and each multi-perception branch module to obtain a multi-channel image, and the first channel attention module assigns different weights to different channels in the multi-channel image output by the first summation module, so that different channel features are adaptively exploited to different degrees.
In summary, compared with a high-resolution image reconstructed by the existing method, the high-resolution image reconstructed by the embodiment of the application has higher fidelity.
Fig. 1 shows the structure of a multi-perception attention network according to an embodiment of the present application, comprising:
a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module, an up-sampling module, and a fourth convolution layer.
The low-resolution image to be reconstructed is input into the first convolution layer, and the output of the first convolution layer is input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module, which sums the pixel values of the pixel points at the same position of the same channel in its input images. The output of the first summation module is input into the first channel attention module; the output of the first channel attention module is input into the up-sampling module, which performs an r-fold up-sampling operation on it; the output of the up-sampling module is input into the fourth convolution layer, which performs a convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Specifically, the meaning of the first summation module summing the pixel values of the pixel points at the same position of the same channel in the input images is as follows. Assume the output image of each multi-perception branch module is an n-channel image, with channels 1, 2, 3, …, n, and that the output of the first convolution layer is likewise an n-channel image with channels 1, 2, 3, …, n. In this embodiment, the first summation module sums the pixel values of the pixel points at the same position in channel 1 of the image output by each multi-perception branch module and channel 1 of the image output by the first convolution layer; it does the same for channel 2, and so on up to channel n. That is, the first summation module sums, channel by channel and position by position, the images output by each multi-perception branch module and the image output by the first convolution layer.
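For illustration, the following minimal PyTorch snippet shows the element-wise summation the first summation module performs; the tensor shapes and the number of branches are illustrative assumptions, not values fixed by this embodiment.

```python
import torch

# n-channel feature maps: one from the first convolution layer, one per branch module
x0 = torch.randn(1, 64, 48, 48)                  # output of the first convolution layer
branch_outputs = [torch.randn(1, 64, 48, 48) for _ in range(2)]

# sum the pixel values at the same position of the same channel across all inputs
fused = x0 + sum(branch_outputs)                 # still a (1, 64, 48, 48) n-channel image
```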
It should be noted that, in the embodiment of the present application, the fourth convolutional layer is optional, and if the multi-perception attention network includes the fourth convolutional layer, the fourth convolutional layer outputs a high-resolution image, and if the multi-perception attention network does not include the fourth convolutional layer, the upsampling module outputs the high-resolution image.
Specifically, in the present embodiment, the low-resolution image input to the multi-perception attention network (hereinafter referred to as MPAN for convenience of description) is denoted $X$, where $X$ is an image of $M$ rows, $N$ columns and $C$ channels. The parameter of the first convolution layer is denoted $w_{MPAN,1}$ and the parameter of the fourth convolution layer is denoted $w_{MPAN,2}$. The output of the first convolution layer is denoted $X_0$, specifically $X_0 = F_{Conv}(X, w_{MPAN,1})$, where $F_{Conv}$ denotes a convolution operation. If the first convolution layer comprises $n$ convolution kernels, then $X_0$ is an image of $M$ rows, $N$ columns and $n$ channels.
In this embodiment, the output of the first convolution layer is input into each multi-perception branch module (for convenience of description, any one multi-perception branch is abbreviated as MPB). The convolutional layer set of the $k$-th MPB in the MPAN is denoted $W_{MPB}^{(k)}$, with $k = 1, 2, \ldots, K$, where $K$ is the number of multi-perception branch modules in the MPAN. In practice, to ensure that the high-resolution image reconstructed by the MPAN has high fidelity, experiments suggest a value of $K$ not less than 2; of course, $K$ may also take other values in practice, and this embodiment does not specifically limit it.
In this embodiment, the calculation formula of the low-resolution image X input into the MPAN network and passing through the MPAN network is shown in the following formula (1):
Figure BDA0002280778660000091
in the formula (I), the compound is shown in the specification,
Figure BDA0002280778660000092
x representing the output of the first MPB to the first convolution layer0As a result of the calculation of (a),
Figure BDA0002280778660000093
x representing the output of the second MPB to the first convolution layer0As a result of the calculation of (a),
Figure BDA0002280778660000094
x representing the output of the Kth MPB to the first convolution layer0The calculation result of (2). Wherein any MPB outputs X to the first convolution layer0The results of the calculations of (1) are all M rows and N columns of N channel images.
In the formula (I), the compound is shown in the specification,
Figure BDA0002280778660000095
represents: the first summation module outputs X to the first convolution layer0And in the output of each MPB, summing the pixel values of the pixel points at the same position of the same channel. The first summing module outputs M rows and N columns of N-channel images.
In the formula (I), the compound is shown in the specification,
Figure BDA0002280778660000096
represents: the first channel attention module calculates M rows and N columns of N channel images output by the first summation module. Suppose X is adopted1To represent
Figure BDA0002280778660000097
Then
Figure BDA0002280778660000098
Can be expressed as
Figure BDA0002280778660000099
Wherein wdownA convolutional layer of n/r' convolutional kernels of size 1X 1 Xn, wupA convolutional layer consisting of n convolutional kernels of size 1 × 1 × n/r ', r' being a vector dimension transform factor in the pass attention module.
In the formula (I), the compound is shown in the specification,
Figure BDA00022807786600000910
indicating that the upsampling module upsamples the output of the first summing module. Wherein, r refers to the image size magnification factor, or the resolution improvement factor, or the up-sampling rate.
In the formula (I), the compound is shown in the specification,
Figure BDA00022807786600000911
and the fourth convolution layer is expressed to carry out convolution calculation on the output of the up-sampling module, so that a high-resolution image is reconstructed.
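As an illustration of the modules just described, the following is a minimal PyTorch sketch of a channel attention module matching the $F_{CA}$ expression above (global average pooling, a 1x1 channel-reduction convolution, ReLU, a 1x1 channel-expansion convolution and a sigmoid gate), together with one plausible realization of the up-sampling module $F_{up}$ as a sub-pixel (PixelShuffle) layer. All class names and default values are illustrative assumptions; the patent does not fix a particular up-sampling operator.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """F_CA(X1) = X1 * sigmoid(conv_up(relu(conv_down(GAP(X1)))))."""
    def __init__(self, n: int, r_prime: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                    # F_GP: global average pooling
            nn.Conv2d(n, n // r_prime, kernel_size=1),  # w_down: n/r' kernels of size 1x1xn
            nn.ReLU(inplace=True),
            nn.Conv2d(n // r_prime, n, kernel_size=1),  # w_up: n kernels of size 1x1x(n/r')
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)                         # per-channel re-weighting

class Upsampler(nn.Module):
    """r-fold up-sampling F_up via sub-pixel convolution (one plausible choice)."""
    def __init__(self, n: int, r: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n, n * r * r, kernel_size=3, padding=1),
            nn.PixelShuffle(r),                         # (n*r*r, M, N) -> (n, r*M, r*N)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```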
In the present embodiment, the $k$-th MPB in the MPAN comprises $G^{(k)}$ residual channel attention groups (hereinafter referred to as RCAGs for convenience of description) and a sixth convolution layer; the structure of the $k$-th MPB is shown in fig. 2. As can be seen from fig. 2, the $G^{(k)}$ RCAGs and the sixth convolution layer are cascaded: the output of the first RCAG is input into the second RCAG, the output of the second RCAG is input into the third RCAG, and so on; the output of the $(G^{(k)}-1)$-th RCAG is input into the $G^{(k)}$-th RCAG, and the output of the $G^{(k)}$-th RCAG is input into the sixth convolution layer. In this example, the convolutional layer set of the $g$-th RCAG in the $k$-th MPB is denoted $W_{RCAG}^{(k,g)}$ and the sixth convolution layer is denoted $w_{MPB}^{(k)}$; the convolutional layer set of the $k$-th MPB in the MPAN is therefore $W_{MPB}^{(k)} = \{W_{RCAG}^{(k,1)}, W_{RCAG}^{(k,2)}, \ldots, W_{RCAG}^{(k,G^{(k)})}, w_{MPB}^{(k)}\}$.
For any MPB, assuming that the input data is $X$, the calculation of the MPB on the data $X$ is shown in the following formula (2):

$$F_{MPB}(X) = F_{Conv}\left(F_{RCAG}^{(G)}\left(\cdots F_{RCAG}^{(2)}\left(F_{RCAG}^{(1)}(X)\right)\cdots\right),\, w_{MPB}\right) \qquad (2)$$

where $X$ is the input image, $W_{RCAG}^{(g)}$ denotes the convolutional layer set of the $g$-th RCAG, and $w_{MPB}$ denotes the sixth convolution layer in the MPB. $F_{RCAG}^{(g)}(\cdot)$ denotes the operation of the $g$-th RCAG in the MPB on its input; for example, $F_{RCAG}^{(1)}(X)$ denotes operating on $X$ with the convolutional layer set $W_{RCAG}^{(1)}$.
The structure of the $g$-th RCAG in the $k$-th MPB is shown in fig. 3; it comprises a plurality of cascaded enhanced residual blocks, a second channel attention module, a fifth convolution layer and a third summation module. Assume the image input to the RCAG is $X_2$. First, $X_2$ is input into the first enhanced residual block; the output of the first enhanced residual block is input into the second enhanced residual block, the output of the second enhanced residual block is input into the third enhanced residual block, and so on; the output of the last enhanced residual block is input into the second channel attention module, and the output of the second channel attention module is input into the fifth convolution layer. The output of the fifth convolution layer and the image $X_2$ input to the RCAG are both input into the third summation module, which sums the pixel values of the pixel points at the same position of the same channel in its inputs and outputs the resulting multi-channel image. If the image $X_2$ input to the RCAG is an image of $M$ rows, $N$ columns and $n$ channels, the output of the third summation module is likewise an $M$-row, $N$-column, $n$-channel image.
Specifically, the calculation of any one RCAG on its input image is shown in the following formula (3):

$$F_{RCAG}(X) = X + F_{Conv}\left(F_{CA}\left(F_{ERB}^{(B)}\left(\cdots F_{ERB}^{(1)}(X)\cdots\right)\right),\, w_{RCAG}\right) \qquad (3)$$

where $X$ is the input image of the RCAG, $W_{RCAG}$ is the set of convolutional layers in the RCAG, $W_{ERB}^{(b)}$ denotes the convolutional layer set of the $b$-th ERB in the RCAG, and $F_{ERB}^{(b)}(\cdot)$ denotes operating on the input with $W_{ERB}^{(b)}$. $w_{RCAG}$ denotes the convolution layer at the end of the RCAG, i.e., the fifth convolution layer in the present embodiment. For $w_{up}$ and $w_{down}$ in the channel attention operation $F_{CA}$, see the first channel attention module in the MPAN; they are not described in detail here again.
In this embodiment, the structure of any one enhanced residual block (hereinafter ERB) in the $g$-th RCAG of the $k$-th MPB is shown in fig. 4. It comprises a second convolution layer, a third convolution layer, a rectification module and a second summation module. The calculation performed by the enhanced residual block on its input image is as follows: the image input into the enhanced residual block is input into the second convolution layer, the output of the second convolution layer is input into the rectification module, the output of the rectification module is input into the third convolution layer, and the image input into the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module.
Assuming that the image input to the enhanced residual block is denoted $X$, the calculation of the enhanced residual block on the image is shown in the following formula (4):

$$F_{ERB}(X) = X + F_{ReLU}\left(F_{Conv}(X, w_{ERB,1})\right) + F_{Conv}\left(F_{ReLU}\left(F_{Conv}(X, w_{ERB,1})\right),\, w_{ERB,2}\right) \qquad (4)$$

In the formula, $W_{ERB} = \{w_{ERB,1}, w_{ERB,2}\}$ denotes the set of convolutional layers in the enhanced residual block, where $w_{ERB,1}$ denotes the second convolution layer and $w_{ERB,2}$ denotes the third convolution layer in the enhanced residual block. $F_{Conv}(X, w_{ERB,1})$ denotes that the second convolution layer performs a convolution operation on the image $X$; $F_{ReLU}(F_{Conv}(X, w_{ERB,1}))$ denotes that the rectification module performs a rectification calculation on the output of the second convolution layer; and $F_{Conv}(F_{ReLU}(F_{Conv}(X, w_{ERB,1})), w_{ERB,2})$ denotes that the third convolution layer performs a convolution calculation on the output of the rectification module.
In this embodiment, the rectification module may specifically be a linear rectification module, i.e., the rectification module operates on the output of the second convolution layer according to a linear rectification (ReLU) function. The linear rectification function is prior art and is not described here again.
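Putting the pieces together, the following PyTorch sketch assembles the enhanced residual block of formula (4), the RCAG of formula (3), the MPB of formula (2) and the overall MPAN of formula (1), reusing the ChannelAttention and Upsampler classes from the earlier sketch. Block counts, kernel sizes and class names are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class EnhancedResidualBlock(nn.Module):
    """Formula (4): F_ERB(X) = X + ReLU(conv1(X)) + conv2(ReLU(conv1(X)))."""
    def __init__(self, n: int):
        super().__init__()
        self.conv1 = nn.Conv2d(n, n, 3, padding=1)   # second convolution layer (w_ERB,1)
        self.conv2 = nn.Conv2d(n, n, 3, padding=1)   # third convolution layer (w_ERB,2)
        self.relu = nn.ReLU()                        # rectification module

    def forward(self, x):
        rectified = self.relu(self.conv1(x))
        # second summation module: input image + rectified output + third-conv output
        return x + rectified + self.conv2(rectified)

class RCAG(nn.Module):
    """Formula (3): B cascaded ERBs, channel attention, fifth conv, skip connection."""
    def __init__(self, n: int, n_blocks: int = 4):
        super().__init__()
        self.blocks = nn.Sequential(*[EnhancedResidualBlock(n) for _ in range(n_blocks)])
        self.ca = ChannelAttention(n)                # second channel attention module
        self.conv = nn.Conv2d(n, n, 3, padding=1)    # fifth convolution layer (w_RCAG)

    def forward(self, x):
        # third summation module: group input plus the processed branch
        return x + self.conv(self.ca(self.blocks(x)))

class MPB(nn.Module):
    """Formula (2): G cascaded RCAGs followed by the sixth convolution layer."""
    def __init__(self, n: int, n_groups: int = 3):
        super().__init__()
        self.groups = nn.Sequential(*[RCAG(n) for _ in range(n_groups)])
        self.conv = nn.Conv2d(n, n, 3, padding=1)    # sixth convolution layer (w_MPB)

    def forward(self, x):
        return self.conv(self.groups(x))

class MPAN(nn.Module):
    """Formula (1): first conv, K parallel MPBs, sum, channel attention, up-sampling, final conv."""
    def __init__(self, in_ch: int = 3, n: int = 64, n_branches: int = 2, r: int = 2):
        super().__init__()
        self.head = nn.Conv2d(in_ch, n, 3, padding=1)        # first convolution layer (w_MPAN,1)
        self.branches = nn.ModuleList([MPB(n) for _ in range(n_branches)])
        self.ca = ChannelAttention(n)                        # first channel attention module
        self.up = Upsampler(n, r)                            # up-sampling module, factor r
        self.tail = nn.Conv2d(n, in_ch, 3, padding=1)        # fourth convolution layer (w_MPAN,2)

    def forward(self, x):
        x0 = self.head(x)
        # first summation module: X0 plus the output of every branch
        fused = x0 + sum(branch(x0) for branch in self.branches)
        return self.tail(self.up(self.ca(fused)))
```

For example, `MPAN(in_ch=3, n=64, n_branches=2, r=2)` maps an M x N RGB image to a 2M x 2N reconstruction.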
Fig. 5 shows the training process of the multi-perception attention network according to an embodiment of the present application, comprising the following steps:
s501, acquiring an image set to be trained and a resolution improvement multiple.
In this embodiment, the resolution improvement multiple is a super-resolution improvement multiple that needs to be achieved by the MPAN network obtained by training in this embodiment. For example, if the MPAN network obtained by training in this embodiment is expected to achieve the effect of increasing the resolution by 3 times, the resolution increase multiple in this step is 3.
In this step, the image set to be trained comprises: a preset high-resolution image set and a preset low-resolution image set. The low-resolution image set is obtained by degrading the high-resolution image set by a factor of r, where the value of r is the resolution improvement multiple of this step.
It should be noted that the resolution improvement multiple in this embodiment may be set according to an actual situation, and the specific value of the resolution improvement multiple is not limited in this embodiment.
Specifically, the high-resolution image set is denoted $\{Y^{(i)}\}_{i=1}^{P}$ and the low-resolution image set is denoted $\{X^{(i)}\}_{i=1}^{P}$, where each $X^{(i)}$ is the $r$-fold degradation of $Y^{(i)}$ and $r$ is the resolution improvement multiple.
S502, initializing convolution kernels in each convolution layer in the MPAN network.
Specifically, the MPAN network in this embodiment is the MPAN network shown in fig. 1 in this embodiment.
In this step, the size of the convolution kernel in each convolution layer in the MPAN network is t × t × C, and the number of convolution kernels in each convolution layer may be n.
S503, inputting the low-resolution image set and the resolution improvement multiple into an MPAN network, and obtaining a result image set after the MPAN network reconstructs the low-resolution image set.
In this step, after the low-resolution image set and the resolution improvement multiple are input to the MPAN network, the MPAN network reconstructs the low-resolution image and outputs a high-resolution image reconstructed from the low-resolution image set, which is called a result image for convenience of description.
S504, calculating a loss function value between the result image set and the high-resolution image set according to a preset loss function.
In the present embodiment, the loss function may be the mean absolute error shown in the following formula (6):

$$L_1(W_{MPAN}) = \frac{1}{P} \sum_{i=1}^{P} \left\| F_{MPAN}\left(X^{(i)}, W_{MPAN}\right) - Y^{(i)} \right\|_1 \qquad (6)$$

In the formula, $L_1(W_{MPAN})$ denotes the loss function value, $F_{MPAN}(X^{(i)}, W_{MPAN})$ denotes the result image computed by the MPAN network from the low-resolution image $X^{(i)}$ in the low-resolution image set, and $Y^{(i)}$ denotes the corresponding high-resolution image in the high-resolution image set.
In this embodiment, the loss function may also be the mean squared error shown in the following formula (7):

$$L_2(W_{MPAN}) = \frac{1}{P} \sum_{i=1}^{P} \left\| F_{MPAN}\left(X^{(i)}, W_{MPAN}\right) - Y^{(i)} \right\|_2^2 \qquad (7)$$

In the formula, $L_2(W_{MPAN})$ denotes the loss function value, with the remaining symbols as in formula (6).
In practice, the calculation formula of the loss function may also adopt other forms of formulas, and the embodiment does not limit the specific form of the loss function.
And S505, adjusting convolution operation weights of all convolution layers in the MPAN according to the loss function value, returning to execute the step of inputting the low-resolution image set and the resolution improvement multiple into the MPAN to obtain a result image set reconstructed by the MPAN on the low-resolution image set until the loss function value is not reduced any more, and obtaining the trained MPAN.
The purpose of training the MPAN network in this embodiment is as follows: by adjusting the convolution operation weights of all convolution layers in the MPAN network, the loss function value between the result images computed by the MPAN network and the high-resolution images corresponding to the low-resolution images reaches its minimum, yielding the trained MPAN network. That is, for the specified resolution improvement multiple r, the set formed by the convolution operation weights of all convolution layers in the trained MPAN network is optimal.
Specifically, in this step, the specific implementation process of adjusting the convolution operation weights of all convolution layers in the MPAN network according to the loss function value is the prior art, and is not described herein again.
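The following is a minimal training loop matching steps S501 to S505, assuming the MPAN class from the sketch above, a data loader yielding paired low/high-resolution tensors, and the L1 loss of formula (6); the optimizer choice and the hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def train_mpan(model, loader, epochs: int = 100, lr: float = 1e-4, device: str = "cuda"):
    """loader yields (lr_batch, hr_batch) pairs; hr is the r-fold larger ground truth."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        running = 0.0
        for lr_imgs, hr_imgs in loader:
            lr_imgs, hr_imgs = lr_imgs.to(device), hr_imgs.to(device)
            sr_imgs = model(lr_imgs)                 # S503: reconstruct the result images
            loss = F.l1_loss(sr_imgs, hr_imgs)       # S504: formula (6); use F.mse_loss for (7)
            optimizer.zero_grad()
            loss.backward()                          # S505: adjust the convolution weights
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch}: mean L1 loss {running / len(loader):.4f}")
```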
Fig. 6 is a super-resolution image reconstruction method provided in an embodiment of the present application, including the following steps:
s601, acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple.
In this step, the manner of obtaining the low-resolution image to be reconstructed is the prior art, and is not described herein. The resolution improvement multiple in this step may be set by the user according to an actual situation, and the value of the resolution improvement multiple is not limited in this embodiment.
And S602, inputting the low-resolution image and the resolution improvement multiple into the trained preset network to obtain a high-resolution reconstructed image.
The trained preset network in this step is the MPAN network obtained through the training in the embodiment corresponding to fig. 5, and the high-resolution reconstructed image output by the trained MPAN network is obtained.
And S603, outputting the high-resolution reconstructed image.
In this step, a specific implementation manner of outputting the high-resolution reconstructed image is the prior art, and is not described herein again.
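As a usage illustration of steps S601 to S603, assuming a trained MPAN instance from the sketches above and a low-resolution image already converted to a tensor:

```python
import torch

@torch.no_grad()
def reconstruct(model, lr_image: torch.Tensor) -> torch.Tensor:
    """lr_image: (C, M, N) tensor in [0, 1]; returns the (C, r*M, r*N) reconstruction."""
    model.eval()
    sr = model(lr_image.unsqueeze(0))       # add a batch dimension and run the network
    return sr.squeeze(0).clamp(0.0, 1.0)    # drop the batch dimension, clip to valid range
```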
Fig. 7 is an image super-resolution reconstruction apparatus provided in an embodiment of the present application, including: an obtaining module 701, a reconstructing module 702 and an outputting module 703.
The obtaining module 701 is configured to obtain a low-resolution image to be reconstructed and a preset resolution improvement multiple. The reconstruction module 702 is configured to input the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image. The trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one multi-perception branch module comprises a plurality of cascaded residual channel attention groups, and any one residual channel attention group comprises a plurality of cascaded enhanced residual blocks.
The low-resolution image is input into the first convolution layer, and the output of the first convolution layer is input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module, which sums pixel values of pixel points at the same position of the same channel in the input images. The output of the first summation module is input into the first channel attention module; the output of the first channel attention module is input into the up-sampling module, which performs an up-sampling operation of the resolution improvement multiple on it and obtains the high-resolution reconstructed image.
Any one of the enhanced residual blocks includes a second convolution layer, a third convolution layer, a rectification module and a second summation module: the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; and the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module, which sums pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image.
an output module 703 is configured to output a high resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer; the output of the up-sampling module is input into the fourth convolution layer; and the fourth convolution layer performs convolution operation on the output of the up-sampling module to obtain a high-resolution reconstructed image.
Optionally, any residual channel attention group further includes: a second channel attention module, a fifth convolution layer, and a third summation module. The image input to the residual channel attention group is input into the first enhanced residual block of the group; the output of the first enhanced residual block is input into the second enhanced residual block; the output of the (B-1)-th enhanced residual block is input into the B-th enhanced residual block; the output of the B-th enhanced residual block is input into the second channel attention module; and the output of the second channel attention module is input into the fifth convolution layer. The image input to the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module, which sums pixel values of pixel points at the same position of the same channel in the input images and outputs the multi-channel image.
Optionally, the rectification module is a linear rectification module.
The embodiment of the application provides a device, as shown in fig. 8, the device includes at least one processor, and at least one memory and a bus connected with the processor; the processor and the memory complete mutual communication through a bus; the processor is used for calling program instructions in the memory so as to execute the image super-resolution reconstruction method. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include transitory computer readable media (transmyedia) such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. An image super-resolution reconstruction method is characterized by comprising the following steps:
acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branch modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module; the first summing module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for performing up-sampling operation of the resolution improvement multiple on the output of the first channel attention module; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: a second convolution layer, a third convolution layer, a rectification module and a second summation module; the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image;
and outputting the high-resolution reconstructed image.
2. The method of claim 1, wherein the preset number is not less than 2.
3. The method of claim 1, wherein the pre-provisioned network further comprises: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolutional layer;
and the fourth convolution layer performs convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
4. The method of claim 3, wherein any one of the residual channel attention groups further comprises: a second channel attention module, a fifth convolutional layer, and a third summation module;
the image input to the residual channel attention group is input into the first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block is input into the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block in the residual channel attention group is input into the B-th enhanced residual block in the residual channel attention group; the output of the B-th enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images and outputting the multi-channel image.
5. The method according to any one of claims 1 to 4, wherein the rectification module is a linear rectification module.
6. An image super-resolution reconstruction apparatus, comprising:
the acquisition module is used for acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
the reconstruction module is used for inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branch modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module; the first summing module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for performing up-sampling operation of the resolution improvement multiple on the output of the first channel attention module; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: a second convolution layer, a third convolution layer, a rectification module and a second summation module; the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image;
and the output module is configured to output the high-resolution reconstructed image.
7. The apparatus of claim 6, wherein the preset network further comprises a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolution layer;
and the fourth convolution layer is configured to perform a convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
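Taken together, claims 6 and 7 describe the end-to-end data flow sketched below. The branch modules and the first channel attention module are passed in from outside (a multi-perception branch module would be a cascade of the residual channel attention groups sketched after claim 4); the pixel-shuffle up-sampler and all kernel sizes and widths are assumptions, since the claims only require an up-sampling operation by the resolution improvement multiple followed by the fourth convolution layer.

import torch
import torch.nn as nn

class SuperResolutionNetwork(nn.Module):
    """First convolution layer -> parallel multi-perception branch modules ->
    first summation module (branch outputs plus the first convolution's
    output) -> first channel attention module -> up-sampling module ->
    fourth convolution layer (claim 7)."""
    def __init__(self, branches, attention, scale: int,
                 in_channels: int = 3, channels: int = 64):
        super().__init__()
        self.head = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1)
        self.branches = nn.ModuleList(branches)
        self.attention = attention
        self.upsample = nn.Sequential(  # assumed pixel-shuffle up-sampling
            nn.Conv2d(channels, channels * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.tail = nn.Conv2d(channels, in_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.head(x)
        s = f  # first summation module: the first convolution's output ...
        for branch in self.branches:
            s = s + branch(f)  # ... plus the output of every branch module
        return self.tail(self.upsample(self.attention(s)))

A hypothetical wiring that reuses the earlier sketches: each branch could be nn.Sequential(*[ResidualChannelAttentionGroup(EnhancedResidualBlock, num_blocks=B) for _ in range(G)]), attention could be ChannelAttention(64), and scale would be set to the resolution improvement multiple.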
8. The apparatus of claim 6, wherein each residual channel attention group further comprises a second channel attention module, a fifth convolution layer and a third summation module;
the image input to the residual channel attention group is fed into the first enhanced residual block of the residual channel attention group; the output of the first enhanced residual block is input into the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block in the residual channel attention group is input into the B-th enhanced residual block in the residual channel attention group; the output of the B-th enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are each input into the third summation module;
and the third summation module is configured to sum the pixel values of pixels at the same position in the same channel of its input images and to output the resulting multi-channel image.
9. A storage medium comprising a stored program, wherein the program, when executed, performs the image super-resolution reconstruction method according to any one of claims 1 to 5.
10. An apparatus comprising at least one processor, at least one memory, and a bus connected to the processor; the processor and the memory communicate with each other through the bus; and the processor is configured to call program instructions in the memory to execute the image super-resolution reconstruction method according to any one of claims 1 to 5.
CN201911140450.XA 2019-11-20 2019-11-20 Image super-resolution reconstruction method and device Active CN111223046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911140450.XA CN111223046B (en) 2019-11-20 2019-11-20 Image super-resolution reconstruction method and device

Publications (2)

Publication Number Publication Date
CN111223046A 2020-06-02
CN111223046B CN111223046B (en) 2023-04-25

Family

ID=70832773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911140450.XA Active CN111223046B (en) 2019-11-20 2019-11-20 Image super-resolution reconstruction method and device

Country Status (1)

Country Link
CN (1) CN111223046B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192588A1 (en) * 2018-04-04 2019-10-10 华为技术有限公司 Image super resolution method and device
CN108734660A (en) * 2018-05-25 2018-11-02 上海通途半导体科技有限公司 A kind of image super-resolution rebuilding method and device based on deep learning
CN109003229A (en) * 2018-08-09 2018-12-14 成都大学 Magnetic resonance super resolution ratio reconstruction method based on three-dimensional enhancing depth residual error network
CN110111256A (en) * 2019-04-28 2019-08-09 西安电子科技大学 Image Super-resolution Reconstruction method based on residual error distillation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CUI Shun: "Research on Image Super-Resolution Reconstruction Technology Based on Deep Learning" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021249523A1 (en) * 2020-06-12 2021-12-16 华为技术有限公司 Image processing method and device
WO2022217746A1 (en) * 2021-04-13 2022-10-20 湖南大学 High-resolution hyperspectral calculation imaging method and system, and medium

Also Published As

Publication number Publication date
CN111223046B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
US11200424B2 (en) Space-time memory network for locating target object in video content
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
Kim et al. Accurate image super-resolution using very deep convolutional networks
US10311547B2 (en) Image upscaling system, training method thereof, and image upscaling method
CN111476719B (en) Image processing method, device, computer equipment and storage medium
CN112037129B (en) Image super-resolution reconstruction method, device, equipment and storage medium
CN111754404B (en) Remote sensing image space-time fusion method based on multi-scale mechanism and attention mechanism
CN111523449A (en) Crowd counting method and system based on pyramid attention network
CN114049332A (en) Abnormality detection method and apparatus, electronic device, and storage medium
CN112889084B (en) Method, system and computer readable medium for improving color quality of image
CN113066034A (en) Face image restoration method and device, restoration model, medium and equipment
CN111696038A (en) Image super-resolution method, device, equipment and computer-readable storage medium
CN111223046B (en) Image super-resolution reconstruction method and device
CN110809126A (en) Video frame interpolation method and system based on adaptive deformable convolution
CN112997479A (en) Method, system and computer readable medium for processing images across a phase jump connection
WO2020000877A1 (en) Method and device for generating image
CN113591528A (en) Document correction method, device, computer equipment and storage medium
CN114830168A (en) Image reconstruction method, electronic device, and computer-readable storage medium
CN116071279A (en) Image processing method, device, computer equipment and storage medium
KR20150099964A (en) Method and apparatus for extracting image feature
CN110782398B (en) Image processing method, generative countermeasure network system and electronic device
CN110335228B (en) Method, device and system for determining image parallax
CN116630152A (en) Image resolution reconstruction method and device, storage medium and electronic equipment
Xu et al. Attention-based multi-channel feature fusion enhancement network to process low-light images
CN113947521A (en) Image resolution conversion method and device based on deep neural network and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant