CN111223046A - Image super-resolution reconstruction method and device

Image super-resolution reconstruction method and device

Info

Publication number
CN111223046A
CN111223046A (application CN201911140450.XA)
Authority
CN
China
Prior art keywords
module
image
output
input
convolution layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911140450.XA
Other languages
Chinese (zh)
Other versions
CN111223046B (en)
Inventor
孙旭
董晓宇
高连如
雷莉萍
张兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Remote Sensing and Digital Earth of CAS
Original Assignee
Institute of Remote Sensing and Digital Earth of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Remote Sensing and Digital Earth of CAS filed Critical Institute of Remote Sensing and Digital Earth of CAS
Priority to CN201911140450.XA priority Critical patent/CN111223046B/en
Publication of CN111223046A publication Critical patent/CN111223046A/en
Application granted granted Critical
Publication of CN111223046B publication Critical patent/CN111223046B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image super-resolution reconstruction method and device, wherein the method comprises: inputting a low-resolution image and a resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image. The trained preset network comprises a preset number of multi-perception branch modules, and any one multi-perception branch module comprises a plurality of cascaded residual channel attention groups; any one residual channel attention group comprises a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module and a second summation module: the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; and the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module. Images reconstructed by the method have higher spatial resolution and higher information fidelity.

Description

Image super-resolution reconstruction method and device
Technical Field
The present application relates to the field of image processing, and in particular, to a method and an apparatus for reconstructing super-resolution images.
Background
Super-resolution (SR) reconstruction refers to restoring a high-resolution image from one or more low-resolution images of the same scene.
Super-resolution reconstruction is an important digital image processing technology with wide application in medicine, remote sensing, and many areas of daily life. A current mainstream approach is super-resolution reconstruction based on deep learning: a neural network is constructed to learn the mapping between pairs of high-resolution and low-resolution training samples, and the learned prior knowledge is then used to perform high-resolution reconstruction on the various low-resolution images input to the network.
However, the high-resolution images reconstructed in this way have low fidelity.
Disclosure of Invention
The application provides an image super-resolution reconstruction method and device, aiming to solve the problem that images obtained through super-resolution reconstruction have low fidelity.
In order to achieve the above object, the present application provides the following technical solutions:
the application provides an image super-resolution reconstruction method, which comprises the following steps:
acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branch modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module; the first summing module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for performing up-sampling operation of the resolution improvement multiple on the output of the first channel attention module; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: a second convolution layer, a third convolution layer, a rectification module and a second summation module; the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image;
and outputting the high-resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolutional layer;
and the fourth convolution layer performs convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Optionally, any of the residual channel attention groups further includes: a second channel attention module, a fifth convolutional layer, and a third summation module;
the image input to the residual channel attention group is input into the first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block is input into the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block in the residual channel attention group is input into the B-th enhanced residual block in the residual channel attention group; the output of the B-th enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images and outputting the multi-channel image.
Optionally, the rectification module is a linear rectification module.
The application also provides an image super-resolution reconstruction device, which comprises:
the acquisition module is used for acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
the reconstruction module is used for inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branch modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module; the first summing module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for performing up-sampling operation of the resolution improvement multiple on the output of the first channel attention module; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: a second convolution layer, a third convolution layer, a rectification module and a second summation module; the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image;
and the output module is used for outputting the high-resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolutional layer;
and the fourth convolution layer performs convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Optionally, any of the residual channel attention groups further includes: a second channel attention module, a fifth convolutional layer, and a third summation module;
the image input to the residual channel attention group is input into the first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block is input into the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block in the residual channel attention group is input into the B-th enhanced residual block in the residual channel attention group; the output of the B-th enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images and outputting the multi-channel image.
Optionally, the rectification module is a linear rectification module.
The present application also provides a storage medium comprising a stored program, wherein the program executes any one of the above-mentioned image super-resolution reconstruction methods.
The application also provides a device, which comprises at least one processor, at least one memory connected with the processor, and a bus; the processor and the memory complete mutual communication through the bus; the processor is used for calling the program instructions in the memory to execute any one of the image super-resolution reconstruction methods described above.
In the image super-resolution reconstruction method and device provided by the application, the preset network comprises a preset number of multi-perception branch modules; any one multi-perception branch module comprises a plurality of cascaded residual channel attention groups, and each residual channel attention group comprises a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module and a second summation module. The image input to the enhanced residual block is input into the second convolution layer, the output of the second convolution layer is input into the rectification module, and the output of the rectification module is input into the third convolution layer.
The image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module, which sums the pixel values of the pixel points at the same position of the same channel in its inputs to obtain the images of all channels. In other words, the enhanced residual block extracts three levels of information: the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer. All three levels of feature information serve as input to the subsequently cascaded enhanced residual block; by analogy, any one multi-perception branch module can extract many levels of information (each level comprising a plurality of channels), and the preset number of multi-perception branch modules can extract still more.
In addition, the preset network provided by the application further comprises a first summation module and a first channel attention module. The first summation module sums the pixel values of the pixel points at the same position of the same channel in the multi-level information output by the first convolution layer and each multi-perception branch module, obtaining a multi-channel image; the first channel attention module then assigns different weights to different channels of that multi-channel image, so that different channel features are adaptively exploited to different degrees. Meanwhile, the up-sampling module raises the resolution of the output of the first channel attention module by the preset multiple. Compared with a high-resolution image reconstructed in the existing manner, the high-resolution image reconstructed by the embodiment of the application therefore has higher fidelity.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a multi-perception attention network disclosed in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of any one of the multi-perception branch modules disclosed in the embodiments of the present application;
fig. 3 is a schematic structural diagram of any one of the residual channel attention groups disclosed in the embodiment of the present application;
fig. 4 is a schematic structural diagram of any one of the enhanced residual blocks disclosed in the embodiments of the present application;
FIG. 5 is a schematic diagram of a training process of the multi-perception attention network disclosed in an embodiment of the present application;
FIG. 6 is a flowchart of a super-resolution image reconstruction method disclosed in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image super-resolution reconstruction apparatus disclosed in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The inventors of the present application found in research that the reasons why the fidelity of high-resolution images reconstructed by existing deep-learning-based super-resolution methods is low include the following. First, the image information learned by the neural network is not fully utilized, i.e., its perception capability is limited. Second, the information extracted at different levels of the neural network is used directly for the final reconstruction, ignoring the channel-level feature differences between the information extracted at different levels.
In one aspect, the network proposed in an embodiment of the present application includes a preset number of multi-perception branch modules; any one multi-perception branch module includes a plurality of cascaded residual channel attention groups, and each residual channel attention group includes a plurality of cascaded enhanced residual blocks. Any one enhanced residual block comprises a second convolution layer, a third convolution layer, a rectification module and a second summation module. The image input to the enhanced residual block is input into the second convolution layer, the output of the second convolution layer is input into the rectification module, and the output of the rectification module is input into the third convolution layer.
On the other hand, the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module, which sums the pixel values of the pixel points at the same position of the same channel in its inputs to obtain the images of all channels. In other words, the enhanced residual block extracts three levels of information: the image input to the enhanced residual block, the output of the rectification module, and the output of the third convolution layer. All three levels of feature information serve as input to the subsequently cascaded enhanced residual block; by analogy, any one multi-perception branch module can extract many levels of information (each level comprising a plurality of channels), and the preset number of multi-perception branch modules can extract still more levels of information.
In addition, the network provided by the embodiment of the present application further includes a first summation module and a first channel attention module. The first summation module sums the pixel values of the pixel points at the same position of the same channel in the multi-level information output by the first convolution layer and each multi-perception branch module to obtain a multi-channel image, and the first channel attention module assigns different weights to different channels in the multi-channel image output by the first summation module, so that different channel features are adaptively exploited to different degrees.
In summary, compared with a high-resolution image reconstructed by the existing method, the high-resolution image reconstructed by the embodiment of the application has higher fidelity.
Fig. 1 shows the structure of a multi-perception attention network according to an embodiment of the present application, comprising:
a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module, an up-sampling module, and a fourth convolution layer.
The low-resolution image to be reconstructed is input into the first convolution layer, and the output of the first convolution layer is input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module, which sums the pixel values of the pixel points at the same position of the same channel in its input images. The output of the first summation module is input into the first channel attention module; the output of the first channel attention module is input into the up-sampling module, which performs an r-fold up-sampling operation on it; the output of the up-sampling module is input into the fourth convolution layer, which performs a convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
Specifically, the meaning of the first summation module summing the pixel values of the pixel points at the same position of the same channel in the input images is as follows. Assume the output image of each multi-perception branch module is an n-channel image, with channels 1, 2, 3, …, n, and that the output of the first convolution layer is likewise an n-channel image with channels 1, 2, 3, …, n. In this embodiment, the first summation module sums the pixel values of the pixel points at the same position in channel 1 of the image output by each multi-perception branch module and channel 1 of the image output by the first convolution layer; it does the same for channel 2, and so on up to channel n. That is, the first summation module sums, channel by channel and position by position, the images output by each multi-perception branch module and the image output by the first convolution layer.
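For illustration, the following minimal PyTorch snippet shows the element-wise summation the first summation module performs; the tensor shapes and the number of branches are illustrative assumptions, not values fixed by this embodiment.

```python
import torch

# n-channel feature maps: one from the first convolution layer, one per branch module
x0 = torch.randn(1, 64, 48, 48)                  # output of the first convolution layer
branch_outputs = [torch.randn(1, 64, 48, 48) for _ in range(2)]

# sum the pixel values at the same position of the same channel across all inputs
fused = x0 + sum(branch_outputs)                 # still a (1, 64, 48, 48) n-channel image
```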
It should be noted that, in the embodiment of the present application, the fourth convolutional layer is optional, and if the multi-perception attention network includes the fourth convolutional layer, the fourth convolutional layer outputs a high-resolution image, and if the multi-perception attention network does not include the fourth convolutional layer, the upsampling module outputs the high-resolution image.
Specifically, in the present embodiment, the low-resolution image input to the multi-perception attention network (hereinafter referred to as MPAN for convenience of description) is denoted $X$, where $X$ is an image of $M$ rows, $N$ columns and $C$ channels. The parameter of the first convolution layer is denoted $w_{MPAN,1}$ and the parameter of the fourth convolution layer is denoted $w_{MPAN,2}$. The output of the first convolution layer is denoted $X_0$, specifically $X_0 = F_{Conv}(X, w_{MPAN,1})$, where $F_{Conv}$ denotes a convolution operation. If the first convolution layer comprises $n$ convolution kernels, then $X_0$ is an image of $M$ rows, $N$ columns and $n$ channels.
In this embodiment, the output of the first convolution layer is input into each multi-perception branch module (for convenience of description, any one multi-perception branch is abbreviated as MPB). The convolutional layer set of the $k$-th MPB in the MPAN is denoted $W_{MPB}^{(k)}$, with $k = 1, 2, \ldots, K$, where $K$ is the number of multi-perception branch modules in the MPAN. In practice, to ensure that the high-resolution image reconstructed by the MPAN has high fidelity, experiments suggest a value of $K$ not less than 2; of course, $K$ may also take other values in practice, and this embodiment does not specifically limit it.
In this embodiment, the calculation formula of the low-resolution image X input into the MPAN network and passing through the MPAN network is shown in the following formula (1):
Figure BDA0002280778660000091
in the formula (I), the compound is shown in the specification,
Figure BDA0002280778660000092
x representing the output of the first MPB to the first convolution layer0As a result of the calculation of (a),
Figure BDA0002280778660000093
x representing the output of the second MPB to the first convolution layer0As a result of the calculation of (a),
Figure BDA0002280778660000094
x representing the output of the Kth MPB to the first convolution layer0The calculation result of (2). Wherein any MPB outputs X to the first convolution layer0The results of the calculations of (1) are all M rows and N columns of N channel images.
In the formula (I), the compound is shown in the specification,
Figure BDA0002280778660000095
represents: the first summation module outputs X to the first convolution layer0And in the output of each MPB, summing the pixel values of the pixel points at the same position of the same channel. The first summing module outputs M rows and N columns of N-channel images.
In the formula (I), the compound is shown in the specification,
Figure BDA0002280778660000096
represents: the first channel attention module calculates M rows and N columns of N channel images output by the first summation module. Suppose X is adopted1To represent
Figure BDA0002280778660000097
Then
Figure BDA0002280778660000098
Can be expressed as
Figure BDA0002280778660000099
Wherein wdownA convolutional layer of n/r' convolutional kernels of size 1X 1 Xn, wupA convolutional layer consisting of n convolutional kernels of size 1 × 1 × n/r ', r' being a vector dimension transform factor in the pass attention module.
In the formula (I), the compound is shown in the specification,
Figure BDA00022807786600000910
indicating that the upsampling module upsamples the output of the first summing module. Wherein, r refers to the image size magnification factor, or the resolution improvement factor, or the up-sampling rate.
In the formula (I), the compound is shown in the specification,
Figure BDA00022807786600000911
and the fourth convolution layer is expressed to carry out convolution calculation on the output of the up-sampling module, so that a high-resolution image is reconstructed.
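As an illustration of the modules just described, the following is a minimal PyTorch sketch of a channel attention module matching the $F_{CA}$ expression above (global average pooling, a 1x1 channel-reduction convolution, ReLU, a 1x1 channel-expansion convolution and a sigmoid gate), together with one plausible realization of the up-sampling module $F_{up}$ as a sub-pixel (PixelShuffle) layer. All class names and default values are illustrative assumptions; the patent does not fix a particular up-sampling operator.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """F_CA(X1) = X1 * sigmoid(conv_up(relu(conv_down(GAP(X1)))))."""
    def __init__(self, n: int, r_prime: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                    # F_GP: global average pooling
            nn.Conv2d(n, n // r_prime, kernel_size=1),  # w_down: n/r' kernels of size 1x1xn
            nn.ReLU(inplace=True),
            nn.Conv2d(n // r_prime, n, kernel_size=1),  # w_up: n kernels of size 1x1x(n/r')
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)                         # per-channel re-weighting

class Upsampler(nn.Module):
    """r-fold up-sampling F_up via sub-pixel convolution (one plausible choice)."""
    def __init__(self, n: int, r: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n, n * r * r, kernel_size=3, padding=1),
            nn.PixelShuffle(r),                         # (n*r*r, M, N) -> (n, r*M, r*N)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```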
In the present embodiment, the $k$-th MPB in the MPAN comprises $G^{(k)}$ residual channel attention groups (hereinafter referred to as RCAGs for convenience of description) and a sixth convolution layer; the structure of the $k$-th MPB is shown in fig. 2. As can be seen from fig. 2, the $G^{(k)}$ RCAGs and the sixth convolution layer are cascaded: the output of the first RCAG is input into the second RCAG, the output of the second RCAG is input into the third RCAG, and so on; the output of the $(G^{(k)}-1)$-th RCAG is input into the $G^{(k)}$-th RCAG, and the output of the $G^{(k)}$-th RCAG is input into the sixth convolution layer. In this example, the convolutional layer set of the $g$-th RCAG in the $k$-th MPB is denoted $W_{RCAG}^{(k,g)}$ and the sixth convolution layer is denoted $w_{MPB}^{(k)}$; the convolutional layer set of the $k$-th MPB in the MPAN is therefore $W_{MPB}^{(k)} = \{W_{RCAG}^{(k,1)}, W_{RCAG}^{(k,2)}, \ldots, W_{RCAG}^{(k,G^{(k)})}, w_{MPB}^{(k)}\}$.
For any MPB, assuming that the input data is $X$, the calculation of the MPB on the data $X$ is shown in the following formula (2):

$$F_{MPB}(X) = F_{Conv}\left(F_{RCAG}^{(G)}\left(\cdots F_{RCAG}^{(2)}\left(F_{RCAG}^{(1)}(X)\right)\cdots\right),\, w_{MPB}\right) \qquad (2)$$

where $X$ is the input image, $W_{RCAG}^{(g)}$ denotes the convolutional layer set of the $g$-th RCAG, and $w_{MPB}$ denotes the sixth convolution layer in the MPB. $F_{RCAG}^{(g)}(\cdot)$ denotes the operation of the $g$-th RCAG in the MPB on its input; for example, $F_{RCAG}^{(1)}(X)$ denotes operating on $X$ with the convolutional layer set $W_{RCAG}^{(1)}$.
The structure of the $g$-th RCAG in the $k$-th MPB is shown in fig. 3; it comprises a plurality of cascaded enhanced residual blocks, a second channel attention module, a fifth convolution layer and a third summation module. Assume the image input to the RCAG is $X_2$. First, $X_2$ is input into the first enhanced residual block; the output of the first enhanced residual block is input into the second enhanced residual block, the output of the second enhanced residual block is input into the third enhanced residual block, and so on; the output of the last enhanced residual block is input into the second channel attention module, and the output of the second channel attention module is input into the fifth convolution layer. The output of the fifth convolution layer and the image $X_2$ input to the RCAG are both input into the third summation module, which sums the pixel values of the pixel points at the same position of the same channel in its inputs and outputs the resulting multi-channel image. If the image $X_2$ input to the RCAG is an image of $M$ rows, $N$ columns and $n$ channels, the output of the third summation module is likewise an $M$-row, $N$-column, $n$-channel image.
Specifically, the calculation of any one RCAG on its input image is shown in the following formula (3):

$$F_{RCAG}(X) = X + F_{Conv}\left(F_{CA}\left(F_{ERB}^{(B)}\left(\cdots F_{ERB}^{(1)}(X)\cdots\right)\right),\, w_{RCAG}\right) \qquad (3)$$

where $X$ is the input image of the RCAG, $W_{RCAG}$ is the set of convolutional layers in the RCAG, $W_{ERB}^{(b)}$ denotes the convolutional layer set of the $b$-th ERB in the RCAG, and $F_{ERB}^{(b)}(\cdot)$ denotes operating on the input with $W_{ERB}^{(b)}$. $w_{RCAG}$ denotes the convolution layer at the end of the RCAG, i.e., the fifth convolution layer in the present embodiment. For $w_{up}$ and $w_{down}$ in the channel attention operation $F_{CA}$, see the first channel attention module in the MPAN; they are not described in detail here again.
In this embodiment, the structure of any one enhanced residual block (hereinafter ERB) in the $g$-th RCAG of the $k$-th MPB is shown in fig. 4. It comprises a second convolution layer, a third convolution layer, a rectification module and a second summation module. The calculation performed by the enhanced residual block on its input image is as follows: the image input into the enhanced residual block is input into the second convolution layer, the output of the second convolution layer is input into the rectification module, the output of the rectification module is input into the third convolution layer, and the image input into the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module.
Assuming that the image input to the enhanced residual block is denoted $X$, the calculation of the enhanced residual block on the image is shown in the following formula (4):

$$F_{ERB}(X) = X + F_{ReLU}\left(F_{Conv}(X, w_{ERB,1})\right) + F_{Conv}\left(F_{ReLU}\left(F_{Conv}(X, w_{ERB,1})\right),\, w_{ERB,2}\right) \qquad (4)$$

In the formula, $W_{ERB} = \{w_{ERB,1}, w_{ERB,2}\}$ denotes the set of convolutional layers in the enhanced residual block, where $w_{ERB,1}$ denotes the second convolution layer and $w_{ERB,2}$ denotes the third convolution layer in the enhanced residual block. $F_{Conv}(X, w_{ERB,1})$ denotes that the second convolution layer performs a convolution operation on the image $X$; $F_{ReLU}(F_{Conv}(X, w_{ERB,1}))$ denotes that the rectification module performs a rectification calculation on the output of the second convolution layer; and $F_{Conv}(F_{ReLU}(F_{Conv}(X, w_{ERB,1})), w_{ERB,2})$ denotes that the third convolution layer performs a convolution calculation on the output of the rectification module.
In this embodiment, the rectification module may specifically be a linear rectification module, i.e., the rectification module operates on the output of the second convolution layer according to a linear rectification (ReLU) function. The linear rectification function is prior art and is not described here again.
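Putting the pieces together, the following PyTorch sketch assembles the enhanced residual block of formula (4), the RCAG of formula (3), the MPB of formula (2) and the overall MPAN of formula (1), reusing the ChannelAttention and Upsampler classes from the earlier sketch. Block counts, kernel sizes and class names are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class EnhancedResidualBlock(nn.Module):
    """Formula (4): F_ERB(X) = X + ReLU(conv1(X)) + conv2(ReLU(conv1(X)))."""
    def __init__(self, n: int):
        super().__init__()
        self.conv1 = nn.Conv2d(n, n, 3, padding=1)   # second convolution layer (w_ERB,1)
        self.conv2 = nn.Conv2d(n, n, 3, padding=1)   # third convolution layer (w_ERB,2)
        self.relu = nn.ReLU()                        # rectification module

    def forward(self, x):
        rectified = self.relu(self.conv1(x))
        # second summation module: input image + rectified output + third-conv output
        return x + rectified + self.conv2(rectified)

class RCAG(nn.Module):
    """Formula (3): B cascaded ERBs, channel attention, fifth conv, skip connection."""
    def __init__(self, n: int, n_blocks: int = 4):
        super().__init__()
        self.blocks = nn.Sequential(*[EnhancedResidualBlock(n) for _ in range(n_blocks)])
        self.ca = ChannelAttention(n)                # second channel attention module
        self.conv = nn.Conv2d(n, n, 3, padding=1)    # fifth convolution layer (w_RCAG)

    def forward(self, x):
        # third summation module: group input plus the processed branch
        return x + self.conv(self.ca(self.blocks(x)))

class MPB(nn.Module):
    """Formula (2): G cascaded RCAGs followed by the sixth convolution layer."""
    def __init__(self, n: int, n_groups: int = 3):
        super().__init__()
        self.groups = nn.Sequential(*[RCAG(n) for _ in range(n_groups)])
        self.conv = nn.Conv2d(n, n, 3, padding=1)    # sixth convolution layer (w_MPB)

    def forward(self, x):
        return self.conv(self.groups(x))

class MPAN(nn.Module):
    """Formula (1): first conv, K parallel MPBs, sum, channel attention, up-sampling, final conv."""
    def __init__(self, in_ch: int = 3, n: int = 64, n_branches: int = 2, r: int = 2):
        super().__init__()
        self.head = nn.Conv2d(in_ch, n, 3, padding=1)        # first convolution layer (w_MPAN,1)
        self.branches = nn.ModuleList([MPB(n) for _ in range(n_branches)])
        self.ca = ChannelAttention(n)                        # first channel attention module
        self.up = Upsampler(n, r)                            # up-sampling module, factor r
        self.tail = nn.Conv2d(n, in_ch, 3, padding=1)        # fourth convolution layer (w_MPAN,2)

    def forward(self, x):
        x0 = self.head(x)
        # first summation module: X0 plus the output of every branch
        fused = x0 + sum(branch(x0) for branch in self.branches)
        return self.tail(self.up(self.ca(fused)))
```

For example, `MPAN(in_ch=3, n=64, n_branches=2, r=2)` maps an M x N RGB image to a 2M x 2N reconstruction.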
Fig. 5 shows the training process of the multi-perception attention network according to an embodiment of the present application, comprising the following steps:
s501, acquiring an image set to be trained and a resolution improvement multiple.
In this embodiment, the resolution improvement multiple is a super-resolution improvement multiple that needs to be achieved by the MPAN network obtained by training in this embodiment. For example, if the MPAN network obtained by training in this embodiment is expected to achieve the effect of increasing the resolution by 3 times, the resolution increase multiple in this step is 3.
In this step, the image set to be trained comprises: a preset high-resolution image set and a preset low-resolution image set. The low-resolution image set is obtained by degrading the high-resolution image set by a factor of r, where the value of r is the resolution improvement multiple of this step.
It should be noted that the resolution improvement multiple in this embodiment may be set according to an actual situation, and the specific value of the resolution improvement multiple is not limited in this embodiment.
Specifically, the high-resolution image set is denoted $\{Y^{(i)}\}_{i=1}^{P}$ and the low-resolution image set is denoted $\{X^{(i)}\}_{i=1}^{P}$, where each $X^{(i)}$ is the $r$-fold degradation of $Y^{(i)}$ and $r$ is the resolution improvement multiple.
S502, initializing convolution kernels in each convolution layer in the MPAN network.
Specifically, the MPAN network in this embodiment is the MPAN network shown in fig. 1 in this embodiment.
In this step, the size of the convolution kernel in each convolution layer in the MPAN network is t × t × C, and the number of convolution kernels in each convolution layer may be n.
S503, inputting the low-resolution image set and the resolution improvement multiple into an MPAN network, and obtaining a result image set after the MPAN network reconstructs the low-resolution image set.
In this step, after the low-resolution image set and the resolution improvement multiple are input to the MPAN network, the MPAN network reconstructs the low-resolution image and outputs a high-resolution image reconstructed from the low-resolution image set, which is called a result image for convenience of description.
S504, calculating a loss function value between the result image set and the high-resolution image set according to a preset loss function.
In the present embodiment, the loss function may be the mean absolute error shown in the following formula (6):

$$L_1(W_{MPAN}) = \frac{1}{P} \sum_{i=1}^{P} \left\| F_{MPAN}\left(X^{(i)}, W_{MPAN}\right) - Y^{(i)} \right\|_1 \qquad (6)$$

In the formula, $L_1(W_{MPAN})$ denotes the loss function value, $F_{MPAN}(X^{(i)}, W_{MPAN})$ denotes the result image computed by the MPAN network from the low-resolution image $X^{(i)}$ in the low-resolution image set, and $Y^{(i)}$ denotes the corresponding high-resolution image in the high-resolution image set.
In this embodiment, the loss function may also be the mean squared error shown in the following formula (7):

$$L_2(W_{MPAN}) = \frac{1}{P} \sum_{i=1}^{P} \left\| F_{MPAN}\left(X^{(i)}, W_{MPAN}\right) - Y^{(i)} \right\|_2^2 \qquad (7)$$

In the formula, $L_2(W_{MPAN})$ denotes the loss function value, with the remaining symbols as in formula (6).
In practice, the calculation formula of the loss function may also adopt other forms of formulas, and the embodiment does not limit the specific form of the loss function.
And S505, adjusting convolution operation weights of all convolution layers in the MPAN according to the loss function value, returning to execute the step of inputting the low-resolution image set and the resolution improvement multiple into the MPAN to obtain a result image set reconstructed by the MPAN on the low-resolution image set until the loss function value is not reduced any more, and obtaining the trained MPAN.
The purpose of training the MPAN network in this embodiment is as follows: by adjusting the convolution operation weights of all convolution layers in the MPAN network, the loss function value between the result images computed by the MPAN network and the high-resolution images corresponding to the low-resolution images reaches its minimum, yielding the trained MPAN network. That is, for the specified resolution improvement multiple r, the set formed by the convolution operation weights of all convolution layers in the trained MPAN network is optimal.
Specifically, in this step, the specific implementation process of adjusting the convolution operation weights of all convolution layers in the MPAN network according to the loss function value is the prior art, and is not described herein again.
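The following is a minimal training loop matching steps S501 to S505, assuming the MPAN class from the sketch above, a data loader yielding paired low/high-resolution tensors, and the L1 loss of formula (6); the optimizer choice and the hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def train_mpan(model, loader, epochs: int = 100, lr: float = 1e-4, device: str = "cuda"):
    """loader yields (lr_batch, hr_batch) pairs; hr is the r-fold larger ground truth."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        running = 0.0
        for lr_imgs, hr_imgs in loader:
            lr_imgs, hr_imgs = lr_imgs.to(device), hr_imgs.to(device)
            sr_imgs = model(lr_imgs)                 # S503: reconstruct the result images
            loss = F.l1_loss(sr_imgs, hr_imgs)       # S504: formula (6); use F.mse_loss for (7)
            optimizer.zero_grad()
            loss.backward()                          # S505: adjust the convolution weights
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch}: mean L1 loss {running / len(loader):.4f}")
```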
Fig. 6 is a super-resolution image reconstruction method provided in an embodiment of the present application, including the following steps:
s601, acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple.
In this step, the manner of obtaining the low-resolution image to be reconstructed is the prior art, and is not described herein. The resolution improvement multiple in this step may be set by the user according to an actual situation, and the value of the resolution improvement multiple is not limited in this embodiment.
And S602, inputting the low-resolution image and the resolution improvement multiple into the trained preset network to obtain a high-resolution reconstructed image.
The trained preset network in this step is the MPAN network obtained through the training in the embodiment corresponding to fig. 5, and the high-resolution reconstructed image output by the trained MPAN network is obtained.
And S603, outputting the high-resolution reconstructed image.
In this step, a specific implementation manner of outputting the high-resolution reconstructed image is the prior art, and is not described herein again.
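As a usage illustration of steps S601 to S603, assuming a trained MPAN instance from the sketches above and a low-resolution image already converted to a tensor:

```python
import torch

@torch.no_grad()
def reconstruct(model, lr_image: torch.Tensor) -> torch.Tensor:
    """lr_image: (C, M, N) tensor in [0, 1]; returns the (C, r*M, r*N) reconstruction."""
    model.eval()
    sr = model(lr_image.unsqueeze(0))       # add a batch dimension and run the network
    return sr.squeeze(0).clamp(0.0, 1.0)    # drop the batch dimension, clip to valid range
```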
Fig. 7 is an image super-resolution reconstruction apparatus provided in an embodiment of the present application, including: an obtaining module 701, a reconstructing module 702 and an outputting module 703.
The obtaining module 701 is configured to obtain a low-resolution image to be reconstructed and a preset resolution improvement multiple. The reconstruction module 702 is configured to input the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image. The trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one multi-perception branch module comprises a plurality of cascaded residual channel attention groups, and any one residual channel attention group comprises a plurality of cascaded enhanced residual blocks.
The low-resolution image is input into the first convolution layer, and the output of the first convolution layer is input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module, which sums pixel values of pixel points at the same position of the same channel in the input images. The output of the first summation module is input into the first channel attention module; the output of the first channel attention module is input into the up-sampling module, which performs an up-sampling operation of the resolution improvement multiple on it and obtains the high-resolution reconstructed image.
Any one of the enhanced residual blocks includes a second convolution layer, a third convolution layer, a rectification module and a second summation module: the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; and the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module, which sums pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image.
an output module 703 is configured to output a high resolution reconstructed image.
Optionally, the preset number is not less than 2.
Optionally, the preset network further includes: a fourth convolution layer; the output of the up-sampling module is input into the fourth convolution layer; and the fourth convolution layer performs convolution operation on the output of the up-sampling module to obtain a high-resolution reconstructed image.
Optionally, any residual channel attention group further includes: a second channel attention module, a fifth convolution layer, and a third summation module. The image input to the residual channel attention group is input into the first enhanced residual block of the group; the output of the first enhanced residual block is input into the second enhanced residual block; the output of the (B-1)-th enhanced residual block is input into the B-th enhanced residual block; the output of the B-th enhanced residual block is input into the second channel attention module; and the output of the second channel attention module is input into the fifth convolution layer. The image input to the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module, which sums pixel values of pixel points at the same position of the same channel in the input images and outputs the multi-channel image.
Optionally, the rectification module is a linear rectification module.
The embodiment of the application provides a device, as shown in fig. 8, the device includes at least one processor, and at least one memory and a bus connected with the processor; the processor and the memory complete mutual communication through a bus; the processor is used for calling program instructions in the memory so as to execute the image super-resolution reconstruction method. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include transitory computer readable media (transmyedia) such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. An image super-resolution reconstruction method is characterized by comprising the following steps:
acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branch modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module; the first summing module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for performing up-sampling operation of the resolution improvement multiple on the output of the first channel attention module; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: a second convolution layer, a third convolution layer, a rectification module and a second summation module; the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image;
and outputting the high-resolution reconstructed image.
2. The method of claim 1, wherein the preset number is not less than 2.
3. The method of claim 1, wherein the pre-provisioned network further comprises: a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolutional layer;
and the fourth convolution layer performs convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
4. The method of claim 3, wherein any one of the residual channel attention groups further comprises: a second channel attention module, a fifth convolutional layer, and a third summation module;
the image input to the residual channel attention group is input into the first enhanced residual block in the residual channel attention group; the output of the first enhanced residual block is input into the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block in the residual channel attention group is input into the B-th enhanced residual block in the residual channel attention group; the output of the B-th enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are respectively input into the third summation module;
and the third summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images and outputting the multi-channel image.
5. The method according to any one of claims 1 to 4, wherein the rectification module is a linear rectification module.
6. An image super-resolution reconstruction apparatus, comprising:
the acquisition module is used for acquiring a low-resolution image to be reconstructed and a preset resolution improvement multiple;
the reconstruction module is used for inputting the low-resolution image and the resolution improvement multiple into a trained preset network to obtain a high-resolution reconstructed image; the trained preset network comprises a first convolution layer, a preset number of multi-perception branch modules, a first summation module, a first channel attention module and an up-sampling module; any one of the multi-perception branch modules comprises a plurality of cascaded residual channel attention groups; any one of the residual channel attention groups comprises a plurality of cascaded enhanced residual blocks;
the low-resolution image is input into the first convolution layer, and the output of the first convolution layer is respectively input into each multi-perception branch module; the output of each multi-perception branch module and the output of the first convolution layer are respectively input into the first summation module; the first summing module is used for summing pixel values of pixel points at the same position of the same channel in the input image; the output of the first summing module is input to the first channel attention module; the output of the first channel attention module is input into the up-sampling module; the up-sampling module is used for performing up-sampling operation of the resolution improvement multiple on the output of the first channel attention module; the up-sampling module obtains the high-resolution reconstructed image;
any one of the enhanced residual blocks includes: a second convolution layer, a third convolution layer, a rectification module and a second summation module; the image input to the enhanced residual block is input into the second convolution layer; the output of the second convolution layer is input into the rectification module; the output of the rectification module is input into the third convolution layer; the image input to the enhanced residual block, the output of the rectification module and the output of the third convolution layer are respectively input into the second summation module; the second summation module is used for summing pixel values of pixel points at the same position of the same channel in the input images to obtain a multi-channel image;
and the output module is configured to output the high-resolution reconstructed image.
7. The apparatus of claim 6, wherein the preset network further comprises a fourth convolution layer;
the output of the up-sampling module is input into the fourth convolution layer;
and the fourth convolution layer is configured to perform a convolution operation on the output of the up-sampling module to obtain the high-resolution reconstructed image.
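Taken together, claims 6 and 7 describe the end-to-end data flow sketched below. The branch modules and the first channel attention module are passed in from outside (a multi-perception branch module would be a cascade of the residual channel attention groups sketched after claim 4); the pixel-shuffle up-sampler and all kernel sizes and widths are assumptions, since the claims only require an up-sampling operation by the resolution improvement multiple followed by the fourth convolution layer.

import torch
import torch.nn as nn

class SuperResolutionNetwork(nn.Module):
    """First convolution layer -> parallel multi-perception branch modules ->
    first summation module (branch outputs plus the first convolution's
    output) -> first channel attention module -> up-sampling module ->
    fourth convolution layer (claim 7)."""
    def __init__(self, branches, attention, scale: int,
                 in_channels: int = 3, channels: int = 64):
        super().__init__()
        self.head = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1)
        self.branches = nn.ModuleList(branches)
        self.attention = attention
        self.upsample = nn.Sequential(  # assumed pixel-shuffle up-sampling
            nn.Conv2d(channels, channels * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.tail = nn.Conv2d(channels, in_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.head(x)
        s = f  # first summation module: the first convolution's output ...
        for branch in self.branches:
            s = s + branch(f)  # ... plus the output of every branch module
        return self.tail(self.upsample(self.attention(s)))

A hypothetical wiring that reuses the earlier sketches: each branch could be nn.Sequential(*[ResidualChannelAttentionGroup(EnhancedResidualBlock, num_blocks=B) for _ in range(G)]), attention could be ChannelAttention(64), and scale would be set to the resolution improvement multiple.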
8. The apparatus of claim 6, wherein each residual channel attention group further comprises a second channel attention module, a fifth convolution layer and a third summation module;
the image input to the residual channel attention group is fed into the first enhanced residual block of the residual channel attention group; the output of the first enhanced residual block is input into the second enhanced residual block of the residual channel attention group; the output of the (B-1)-th enhanced residual block in the residual channel attention group is input into the B-th enhanced residual block in the residual channel attention group; the output of the B-th enhanced residual block is input into the second channel attention module; the output of the second channel attention module is input into the fifth convolution layer;
the image input to the residual channel attention group and the output of the fifth convolution layer are each input into the third summation module;
and the third summation module is configured to sum the pixel values of pixels at the same position in the same channel of its input images and to output the resulting multi-channel image.
9. A storage medium comprising a stored program, wherein the program, when executed, performs the image super-resolution reconstruction method according to any one of claims 1 to 5.
10. An apparatus comprising at least one processor, at least one memory, and a bus connected to the processor; the processor and the memory communicate with each other through the bus; and the processor is configured to call program instructions in the memory to execute the image super-resolution reconstruction method according to any one of claims 1 to 5.
CN201911140450.XA 2019-11-20 2019-11-20 Image super-resolution reconstruction method and device Active CN111223046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911140450.XA CN111223046B (en) 2019-11-20 2019-11-20 Image super-resolution reconstruction method and device

Publications (2)

Publication Number Publication Date
CN111223046A 2020-06-02
CN111223046B CN111223046B (en) 2023-04-25

Family

ID=70832773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911140450.XA Active CN111223046B (en) 2019-11-20 2019-11-20 Image super-resolution reconstruction method and device

Country Status (1)

Country Link
CN (1) CN111223046B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192588A1 (en) * 2018-04-04 2019-10-10 华为技术有限公司 Image super resolution method and device
CN108734660A (en) * 2018-05-25 2018-11-02 上海通途半导体科技有限公司 A kind of image super-resolution rebuilding method and device based on deep learning
CN109003229A (en) * 2018-08-09 2018-12-14 成都大学 Magnetic resonance super resolution ratio reconstruction method based on three-dimensional enhancing depth residual error network
CN110111256A (en) * 2019-04-28 2019-08-09 西安电子科技大学 Image Super-resolution Reconstruction method based on residual error distillation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CUI Shun: "Research on Image Super-Resolution Reconstruction Technology Based on Deep Learning" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021249523A1 (en) * 2020-06-12 2021-12-16 华为技术有限公司 Image processing method and device
WO2022217746A1 (en) * 2021-04-13 2022-10-20 湖南大学 High-resolution hyperspectral calculation imaging method and system, and medium

Also Published As

Publication number Publication date
CN111223046B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
US11200424B2 (en) Space-time memory network for locating target object in video content
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
Kim et al. Accurate image super-resolution using very deep convolutional networks
US10311547B2 (en) Image upscaling system, training method thereof, and image upscaling method
CN111476719B (en) Image processing method, device, computer equipment and storage medium
CN112037129B (en) Image super-resolution reconstruction method, device, equipment and storage medium
CN111754404B (en) Remote sensing image space-time fusion method based on multi-scale mechanism and attention mechanism
CN111523449A (en) Crowd counting method and system based on pyramid attention network
CN114049332A (en) Abnormality detection method and apparatus, electronic device, and storage medium
CN112889084B (en) Method, system and computer readable medium for improving color quality of image
CN113066034A (en) Face image restoration method and device, restoration model, medium and equipment
CN111696038A (en) Image super-resolution method, device, equipment and computer-readable storage medium
CN111223046B (en) Image super-resolution reconstruction method and device
CN110809126A (en) Video frame interpolation method and system based on adaptive deformable convolution
CN112997479A (en) Method, system and computer readable medium for processing images across a phase jump connection
WO2020000877A1 (en) Method and device for generating image
CN113591528A (en) Document correction method, device, computer equipment and storage medium
CN114830168A (en) Image reconstruction method, electronic device, and computer-readable storage medium
CN116071279A (en) Image processing method, device, computer equipment and storage medium
KR20150099964A (en) Method and apparatus for extracting image feature
CN110782398B (en) Image processing method, generative countermeasure network system and electronic device
CN110335228B (en) Method, device and system for determining image parallax
CN116630152A (en) Image resolution reconstruction method and device, storage medium and electronic equipment
Xu et al. Attention-based multi-channel feature fusion enhancement network to process low-light images
CN113947521A (en) Image resolution conversion method and device based on deep neural network and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant