CN114972024A - Image super-resolution reconstruction device and method based on graph representation learning - Google Patents

Image super-resolution reconstruction device and method based on graph representation learning

Publication number: CN114972024A
Authority: CN (China)
Prior art keywords: features, module, attention, layer, feature
Legal status (assumed, not a legal conclusion): Pending
Application number: CN202210532655.8A
Original language: Chinese (zh)
Inventors: 梁吉业, 唐胜贵, 姚凯旋, 王智强
Current Assignee: Shanxi University
Original Assignee: Shanxi University
Application filed by Shanxi University (application CN202210532655.8A)
Publication: CN114972024A


Classifications

    • G06T 3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 — Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06T 5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 2207/20081 — Training; Learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/20104 — Interactive definition of region of interest [ROI]
    • G06T 2207/20221 — Image fusion; Image merging


Abstract

The invention relates to the technical field of artificial intelligence, and in particular to an image super-resolution reconstruction device and method based on graph representation learning. The method combines graph representation learning with channel-spatial attention: using an adaptive attention mechanism, the network learns the global dependencies among different depths, channels and positions. Specifically, a layer feature attention module captures long-distance dependencies between layers, while a channel-spatial attention module integrates channel and context information into each layer. Applying the two attention modules cooperatively to multi-level features captures more information-rich features, which improves the network's ability to discriminate and learn features and thereby improves the quality of super-resolution reconstruction.

Description

Image super-resolution reconstruction device and method based on graph representation learning
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a device and a method for reconstructing image super-resolution based on graph representation learning.
Background
Image super-resolution reconstruction aims to overcome or compensate for problems such as image blurring and low quality caused by limitations of the image acquisition system or acquisition environment. Common super-resolution reconstruction methods fall into three categories: interpolation-based, reconstruction-based and learning-based. Learning-based methods are clearly superior to the other two in efficiency and reconstruction quality, and have attracted wide attention in academia and industry.
Existing deep-learning-based image super-resolution reconstruction methods mainly focus on wider or deeper architectures, ignoring image detail information and the potential relations between features. In recent years, some researchers have begun to explore image super-resolution reconstruction through graph representation learning. However, such work has been limited to processing information over the whole image or relations between different feature maps within the same layer, ignoring the interdependence between features extracted at different layers; moreover, traditional convolutional neural network models do not consider channel-dimension and spatial-dimension information simultaneously, which hinders the extraction of deeper image features.
Accordingly, a layer feature map attention module is provided to capture the interdependence between features of different layers, and a spatial aggregation module and a channel attention residual module are embedded in the model. The model thus accounts for channel-dimension and spatial-scale information in addition to the interdependence between features extracted at different layers, improving its expressive power and giving the reconstructed high-resolution image better detail.
Disclosure of Invention
The invention aims to provide a device and a method for image super-resolution reconstruction based on graph representation learning that achieve the technical effect of improving the quality of image super-resolution reconstruction.
An image super-resolution reconstruction apparatus based on graph representation learning, comprising:
s1, an input module is used for preparing a data set, and establishing a training set according to an image degradation model to obtain m low-resolution images and m high-resolution images, wherein the m high-resolution images correspond to real high-resolution images; m is an integer greater than 1;
s2, a shallow feature extraction module (100) inputs the low-resolution image into the shallow feature extraction module to extract the shallow features of the image;
s3, a deep layer feature extraction module (200) inputs the shallow layer features into the deep layer feature extraction module to extract the deep layer features;
s4, a reconstruction module (300) inputs the deep features into the reconstruction module, performs sub-pixel convolution to complete up-sampling processing and reconstructs a high-resolution image;
s5, a device optimization module, which optimizes the image super-resolution reconstruction device through a loss function, and calculates an average L1 error between the m reconstructed high-resolution images and the corresponding real high-resolution images by using the data set in the S1, wherein the expression is as follows:
Figure BDA0003643296550000021
wherein L (Θ) represents a loss function, H FSGCN Representing a function of the image super-resolution reconstruction apparatus;
and S6, an output module, namely inputting the low-resolution image according to the optimal model obtained by training in the S5, and outputting the final high-resolution image by the system.
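The average L1 error used in S5 can be illustrated with a minimal NumPy sketch (the reconstructions produced by H_FSGCN are assumed to be precomputed arrays; this is an illustration, not the patent's implementation):

```python
import numpy as np

def average_l1_loss(sr_images, hr_images):
    # Mean absolute (L1) error, averaged over the m reconstructed / ground-truth pairs.
    m = len(sr_images)
    return sum(float(np.mean(np.abs(sr - hr)))
               for sr, hr in zip(sr_images, hr_images)) / m
```

Minimizing this value over the device parameters Θ drives the reconstructed images toward the real high-resolution images.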
Further, the deep feature extraction module includes a residual group composed of a plurality of channel attention residual blocks, a layer feature map attention module, a spatial attention module, and a feature fusion module, and the S3 includes:
s31, extracting the shallow feature through a residual group (210) consisting of a plurality of serially connected channel attention residual blocks (211) to obtain features of different depths;
s32, learning the correlation among different layer characteristics through the layer characteristic diagram attention module (220);
s33, making the interested area more prominent through the space attention module (230);
and S34, adding the shallow features and the fusion features through a feature fusion module to obtain corresponding deep features.
The layer feature map attention module comprises a feature combination module, a feature relation graph calculation module and a feature updating module,
the S32 includes:
combining the features of different depths obtained by the S31;
and calculating the relations among the different features through the feature relation graph calculation module, constructing a graph structure over the residual features by setting a threshold, and updating the features through graph attention.
The spatial attention module comprises a feature dimension reduction module, a feature processing module and a feature updating module, wherein S33 comprises:
performing dimension reduction processing on the features of different depths obtained in the step S31;
extracting feature information through feature processing, and then performing feature dimension-raising processing;
and calculating attention coefficients of different features through a feature updating module, and giving new representations to the features.
The feature update module includes computing an attention coefficient and a weighted sum (240); the steps of the feature update module include:
obtaining a graph structure and coefficients of edges among all nodes through the calculation characteristic relation graph;
performing attention coefficient normalization on the coefficients of edges among all nodes through Softmax;
and performing a weighted summation of the features using the computed attention coefficients to complete the feature update, splicing with a Concat layer, adding the features obtained in S31, the features obtained in S33 and the shallow features, and finally reducing the dimension through a 1 × 1 convolutional layer.
An image super-resolution reconstruction method based on graph representation learning is applied to a device comprising: a shallow feature extraction module, a deep feature extraction module and a reconstruction module;
the shallow feature extraction module is used for extracting shallow features from an input low-resolution image;
the deep feature extraction module comprises residual groups composed of a plurality of channel attention residual blocks, a layer feature map attention module, a spatial attention module and a feature fusion module; the residual groups extract features of different depths from the shallow features, which are then fused to obtain the corresponding fusion features; the layer feature map attention module builds a graph from the features of each residual group and computes the correlations among the features of different groups; the spatial attention module spatially aggregates the features extracted by the residual groups so that the image's regions of interest receive more attention; the feature fusion module fuses and adds the shallow features, residual group features, layer feature map attention features and spatial attention features to obtain the corresponding deep features;
the reconstruction module is used for performing up-sampling and feature reconstruction on the deep features and outputting a final high-resolution image;
the image super-resolution reconstruction method based on graph representation learning comprises a plurality of residual groups, wherein each residual group comprises a plurality of channel attention residual blocks, and the output channels of the channel attention residual blocks are 64 feature graphs; and each channel attention residual block is arranged in series; the output of each channel attention residual block in the residual group is the input of the next channel attention residual block; the channel attention residual block comprises two convolution layers of 3 multiplied by 3, a computing layer, a global pooling layer and a Sigmoid layer and aims to acquire attention weights for corresponding feature maps; the convolutional layer is used for extracting features; the computing layer accumulates the output of each channel; then multiplying the global pooling layer by the Sigmoid layer to obtain corresponding characteristics; and finally inputting the data to a next channel attention residual block for feature extraction.
The image super-resolution reconstruction method based on graph representation learning comprises a layer feature map attention module. The input of the layer feature map attention module is the features extracted by a plurality of different residual groups; graph attention assigns a weight to each group's features and multiplies the features by that weight, and finally a multi-dimensional vector is output for feature fusion with the output at the tail of the model.
The image super-resolution reconstruction method based on graph representation learning comprises a spatial attention module. The input of the spatial attention module is the features extracted by a plurality of different residual groups; a 1 × 1 convolutional layer performs dimension reduction, the features are then down-sampled through convolution and pooling operations, up-sampled after being learned by one residual group, raised back in dimension through a 1 × 1 convolutional layer, and finally multiplied by a weight computed with a Sigmoid; the output is feature-fused with the outputs described above.
In a first aspect of the present invention, the present invention provides a super-resolution image reconstruction apparatus based on graph representation learning, including: the input module is used for preparing a data set, establishing a training set according to an image degradation model and obtaining m low-resolution images and m high-resolution images, wherein the m high-resolution images correspond to real high-resolution images, and m is an integer greater than 1; the shallow feature extraction module is used for inputting the low-resolution image into the shallow feature extraction module to extract the shallow features of the image; the deep layer feature extraction module inputs the shallow layer features into the deep layer feature extraction module to extract the deep layer features; the reconstruction module is used for inputting the deep features into the reconstruction module, performing sub-pixel convolution to complete up-sampling processing and reconstructing a high-resolution image; the device optimization module is used for optimizing the image super-resolution reconstruction device through a loss function, and calculating an average L1 error between the m reconstructed high-resolution images and the corresponding real high-resolution images by using the data set in S1, wherein the expression is as follows:
L(Θ) = (1/m) Σ_{i=1}^{m} ||H_FSGCN(I_LR^i) − I_HR^i||_1

wherein L(Θ) represents the loss function and H_FSGCN represents the function computed by the image super-resolution reconstruction device; and the output module is used for inputting the low-resolution image into the optimal model obtained by the training of S5 and outputting the final high-resolution image.
In a second aspect, the present invention provides a graph representation learning-based image super-resolution reconstruction method, which is applied to the image super-resolution reconstruction apparatus described above, and includes: the device comprises a shallow layer feature extraction module, a deep layer feature extraction module and a reconstruction module; the shallow feature extraction module is used for extracting shallow features from the input low-resolution image; the deep layer feature extraction module comprises a residual group consisting of a plurality of channel attention residual blocks, a layer feature map attention module, a space attention module and a feature fusion module; extracting features of different depths from the shallow features by the residual group, and then fusing the features to obtain corresponding fusion features; the layer characteristic graph attention module is used for constructing a graph by acquiring the characteristics of each block residual error group, and calculating to obtain the correlation among different block characteristics; the spatial attention module is used for carrying out spatial aggregation on a plurality of residual error groups to extract features, so that the interested region of the image is more concerned; fusing and adding the shallow layer features, the residual group features, the layer feature map attention module features and the spatial attention module features through a feature fusion module to obtain corresponding deep layer features; and the reconstruction module is used for performing up-sampling and feature reconstruction on the deep features and outputting a final high-resolution image.
The invention achieves the following beneficial effects: it combines graph representation learning with channel-spatial attention, and the network adaptively learns the global dependencies among different depths, channels and positions using a self-attention mechanism. Specifically, the layer feature attention module captures long-distance dependencies between layers, while the channel-spatial attention module integrates channel and context information into each layer. Applying the two attention modules cooperatively to multi-level features captures more information-rich features, which improves the network's ability to discriminate and learn features and thereby improves the quality of super-resolution reconstruction.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic structural diagram of an image super-resolution reconstruction apparatus according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a channel attention residual group according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a channel attention residual block according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a layer feature map attention module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a spatial attention module according to an embodiment of the present invention;
fig. 6 is a flowchart illustrating an image super-resolution reconstruction method according to an embodiment of the present invention.
Reference numerals: 10-image super-resolution reconstruction device; 100-shallow feature extraction module; 200-deep feature extraction module; 210-channel attention residual group; 211-channel attention residual block; 220-layer feature map attention module; 230-spatial attention module; 240-feature fusion module; 300-reconstruction module.
Detailed Description
The technical solution in the embodiment of the present invention will be described below with reference to the drawings in the embodiment of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an image super-resolution reconstruction apparatus according to an embodiment of the present invention.
With the development of deep learning, image super-resolution reconstruction (SR) has advanced considerably. However, recent work applying graph representation learning to super-resolution reconstruction has been limited to processing information over the whole image or relations between feature maps within the same layer, ignoring the interdependence between features extracted at different layers; moreover, traditional convolutional neural network models do not consider channel-dimension and spatial-scale information simultaneously, which hinders the extraction of deeper image features. The handling of detail features during image reconstruction therefore still leaves room for improvement, and the embodiment of the invention provides an image super-resolution reconstruction method based on graph representation learning to solve the above problems.
In one embodiment, the image super-resolution reconstruction device provided by the embodiment of the invention comprises a shallow layer feature extraction module, a deep layer feature extraction module and a reconstruction module;
the shallow layer feature extraction module is used for extracting low-resolution images I from input LR Middle extracted shallow feature F 0
F 0 =Conv(I LR )
The deep feature extraction module comprises N residual groups, each composed of channel attention residual blocks:

F_n = H_RG^n(F_{n−1}), n = 1, 2, …, N

wherein H_RG^n denotes the n-th channel attention residual group and F_n its output. The input of the first residual block of the first residual group is the shallow feature F_0; each residual group is composed of m residual blocks, and the output of each residual block is the input of the next residual block.
a layer profile attention module:
F F_GAT =H F_GAT (concatenate(F 1 ,F 2 ,...,F N ))
wherein:
Figure BDA0003643296550000081
F F_GAT represents H F_GAT Learning the relationship among all residual error groups from the output features of the residual error groups, so that the feature layer with high contribution is strengthened and the low contribution is suppressed;
a spatial attention module:
Figure BDA0003643296550000082
F S_AGG the features screened by the channel space attention module.
A feature fusion module, which fuses and adds the shallow feature, the layer feature map attention feature and the spatial attention feature to obtain the corresponding deep feature: F_0 + F_L_GAT + F_S_AGG.
The reconstruction module performs up-sampling and feature reconstruction on the deep features and outputs the final high-resolution image:

I_SR = U↑(F_0 + F_L_GAT + F_S_AGG)

wherein U↑ denotes the sub-pixel convolution (up-sampling) operation and I_SR denotes the result of SR reconstruction.
As shown in fig. 1, in the implementation process, the number of channel attention residual groups may be set to N (N is an integer greater than 2); the shallow feature extraction module comprises an n × n convolutional layer, where n is an odd number greater than 1; all the channel attention residual groups are connected in series and extract features of different depths; the feature fusion module adds the shallow features to the fused deep features so that the network focuses on learning the high-frequency residual features; the reconstruction module comprises a sub-pixel convolutional layer and a 1 × 1 convolutional layer, and performs up-sampling and feature reconstruction on the deep features output by the deep feature extraction module to output the final high-resolution image.
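The core rearrangement step of the sub-pixel convolution layer (often called pixel shuffle) can be sketched as follows; the preceding convolution that produces the C·r² channels is omitted, so this shows only how low-resolution channels are rearranged into a higher-resolution image:

```python
import numpy as np

def pixel_shuffle(x, r):
    # Sub-pixel rearrangement: (C*r*r, H, W) -> (C, H*r, W*r).
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # reorder to (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)   # interleave the r*r sub-pixels
```

For a ×2 upscale, each group of 4 input channels at a spatial position becomes one 2 × 2 patch of the output.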
Referring to fig. 2 and fig. 3, fig. 2 is a schematic structural diagram of a channel attention residual group according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a channel attention residual block according to an embodiment of the present invention;
in one embodiment, one channel attention residual group comprises M channel attention residual blocks, different residual blocks are connected in series and connected end to end; the features learned by the previous channel attention residual group are transmitted into the next channel attention residual group; one residual block comprises a convolutional layer-ReLU layer-convolutional layer and a channel attention module, and the head residual and the tail residual are connected; the features learned by the previous channel attention residual block are passed to the next channel attention residual block.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a layer feature diagram attention module according to an embodiment of the present invention;
in one embodiment, the features learned by the N channel attention residual groups are summarized into N one-dimensional vectors; each vector is multiplied by other N vectors including the vector to calculate the correlation, an N multiplied by N two-dimensional matrix is obtained, then the average value of each line is subtracted from each line, the value smaller than 0 is marked as 0, then the attention coefficient between each node and different nodes is calculated according to N nodes including the node and other nodes, then the attention coefficient is normalized, and finally the node is updated; the step is that the node sufficiently fuses the information of N nodes related to the node, so that the node has stronger expression capability.
In the implementation described above, the input to the module is the extracted set of intermediate features, with dimensions N × H × W × C. The feature set is reshaped into a two-dimensional matrix F_GS of dimensions N × HWC. The relation matrix M is computed by multiplying F_GS with its transpose; M_i denotes the i-th row of M. The mean of each row measures the average importance of the corresponding node and is used as its threshold:

T_i = (1/N) Σ_{j=1}^{N} M_{i,j}

The final weight of the i-th and j-th nodes is

w_{i,j} = softmax(relu(M_{i,j} − T_i)), i, j = 1, 2, …, N

and finally the reshaped feature set F_GS is multiplied by the resulting attention matrix W.
The layer feature map attention module allows the attention information of the network to be concentrated on the intermediate features, and the expression capacity of the network is further improved.
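The steps above — pairwise correlations, row-mean threshold T_i, relu pruning, softmax normalisation, and the final multiplication with the reshaped feature set — can be sketched in NumPy (each residual group's output is flattened to one row of F; a sketch of the computation, not the trained module):

```python
import numpy as np

def layer_graph_attention(F):
    # F: (N, D), one flattened feature vector per residual group.
    M = F @ F.T                                  # pairwise correlation matrix (N, N)
    T = M.mean(axis=1, keepdims=True)            # per-row mean importance as threshold T_i
    S = np.maximum(M - T, 0.0)                   # relu: prune edges below the threshold
    W = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)   # softmax per row
    return W @ F                                 # update nodes by weighted aggregation
```

Each output row is a convex combination of the input rows, so high-contribution feature layers are strengthened and low-contribution ones suppressed, as described above.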
Referring to fig. 5, fig. 5 is a schematic structural diagram of a spatial attention module according to an embodiment of the present invention.
As shown in FIG. 5, in one embodiment, the features learned by the N channel attention residual groups are aggregated as the input of the spatial attention module. The data is reduced in dimension through a 1 × 1 convolution (1 × 1 Conv), down-sampled through convolution and pooling, then up-sampled, and the result is added to the dimension-reduced features; the data is then raised back in dimension through a 1 × 1 convolution and the weight is computed. Forcing the features to concentrate on the region of interest yields a better feature representation when aggregating the features output by each residual group.
Specifically, 1 × 1 convolutional layers reduce the channel dimension to cut the computation of the whole module. Then, to reduce the spatial size of the features and obtain a larger receptive field, a convolution with stride 2 and MaxPooling + ConvGroups are used, where ConvGroups consist of 7 × 7 MaxPooling and stride-3 convolution to further enlarge the receptive field. Next, up-sampling layers corresponding to the previous steps restore the spatial size together with a 1 × 1 convolution. Finally a Sigmoid layer generates the attention parameter W_S_AGG, which is multiplied with the input.
The invention also uses a skip connection that connects the input directly to the up-sampling layer, so that the attention weight is computed from the sum of the up-sampled features and the input; the screened feature is then

F_S_AGG = W_S_AGG ⊗ F_in

wherein F_in denotes the input features and ⊗ denotes element-wise multiplication.
Compared with traditional spatial attention and channel attention, S_AGG explicitly models the dependencies among channel and spatial features and can thus adaptively learn attention within and between channels.
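The data flow of this spatial aggregation module can be sketched as follows. The learned convolutions are replaced by parameter-free mean pooling and nearest-neighbour up-sampling, so this shows only the down-sample / up-sample / skip-connection / Sigmoid-gate structure, not the actual learned module:

```python
import numpy as np

def spatial_attention(x):
    # x: (C, H, W) with even H and W. Learned convs replaced by pooling for illustration.
    c, h, w = x.shape
    pooled = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))   # down-sample (2x2 mean)
    up = pooled.repeat(2, axis=1).repeat(2, axis=2)                 # up-sample back to (H, W)
    gate = 1.0 / (1.0 + np.exp(-(up + x)))       # Sigmoid of (up-sampled + skip input)
    return x * gate                              # multiply attention weights with the input
```

The skip connection feeds the input straight into the gate, mirroring the description above of connecting the input directly to the up-sampling layer.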
Referring to fig. 6, fig. 6 is a flowchart illustrating an image super-resolution reconstruction apparatus according to an embodiment of the present invention.
In one implementation, an embodiment of the present invention further provides an image super-resolution reconstruction apparatus applied to the above image super-resolution reconstruction model, which is described in detail below.
S1, an input module, which is used for preparing a data set and establishing a training set according to an image degradation model:

I_x = (I_y ⊗ k)↓_s + n

wherein I_y is the real high-resolution image, I_x is the degraded low-resolution image, k is a blur kernel, ↓_s denotes downsampling with scale factor s, and n is additive noise; the training set thus provides m low-resolution images LR and m high-resolution images HR corresponding to the real high-resolution images, where m is an integer greater than 1;
S2, a shallow feature extraction module: the low-resolution image is input into the shallow feature extraction module to extract shallow features of the image, F_0 = Conv(I_LR);
S3, a deep feature extraction module: the shallow features are input into the deep feature extraction module to extract deep features F_0 + F_L_GAT + F_S_AGG;
S4, a reconstruction module, which is used for inputting the deep features into the reconstruction module, performing sub-pixel convolution to complete the upsampling processing and reconstructing a high-resolution image I_SR = U↑(F_0 + F_L_GAT + F_S_AGG);
S5, a device optimization module, which optimizes the image super-resolution reconstruction device through a loss function, wherein the loss function uses the average L1 error between the m reconstructed high-resolution images and the corresponding real high-resolution images, with the expression:

L(Θ) = (1/m) Σ_{i=1}^{m} ‖ H_FSGCN(I_LR^i) − I_HR^i ‖_1

wherein L(Θ) represents the loss function and H_FSGCN represents the function of the image super-resolution reconstruction device;
And S6, an output module: the low-resolution image is input to the optimal model obtained by the training in S5, and the system outputs the final high-resolution image.
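As an illustration of steps S4 and S5, sub-pixel convolution can be viewed as a pixel rearrangement (pixel shuffle) of the deep feature channels into a larger spatial grid, and the loss is a plain average L1 error over the m image pairs. The following NumPy sketch is a simplified stand-in, not the trained H_FSGCN; the names `pixel_shuffle` and `average_l1_loss`, the shapes, and the random data are assumptions for demonstration.

```python
import numpy as np

def pixel_shuffle(x, s):
    # sub-pixel convolution rearrangement: (C*s*s, H, W) -> (C, H*s, W*s)
    cs, h, w = x.shape
    c = cs // (s * s)
    x = x.reshape(c, s, s, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # interleave: (C, H, s, W, s)
    return x.reshape(c, h * s, w * s)

def average_l1_loss(sr_images, hr_images):
    # average L1 error over the m reconstructed / real image pairs
    m = len(sr_images)
    return sum(np.abs(sr - hr).mean() for sr, hr in zip(sr_images, hr_images)) / m

rng = np.random.default_rng(1)
feat = rng.standard_normal((3 * 2 * 2, 8, 8))   # deep features for upscale factor s = 2
sr = pixel_shuffle(feat, 2)                      # reconstructed image, shape (3, 16, 16)
hr = rng.standard_normal((3, 16, 16))            # stand-in for the real high-resolution image
loss = average_l1_loss([sr], [hr])
print(sr.shape, float(loss))
```

The rearrangement shows why the reconstruction module needs s²·C feature channels to produce a C-channel image enlarged by factor s in each spatial direction.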
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. An image super-resolution reconstruction apparatus based on graph representation learning, comprising:
S1, an input module, which is used for preparing a data set and establishing a training set according to an image degradation model to obtain m low-resolution images and m high-resolution images, wherein the m high-resolution images correspond to real high-resolution images, and m is an integer greater than 1;
S2, a shallow feature extraction module (100): the low-resolution image is input into the shallow feature extraction module to extract shallow features of the image;
S3, a deep feature extraction module (200): the shallow features are input into the deep feature extraction module to extract deep features;
S4, a reconstruction module (300): the deep features are input into the reconstruction module, and sub-pixel convolution is performed to complete the upsampling processing and reconstruct a high-resolution image;
S5, a device optimization module, which optimizes the image super-resolution reconstruction device through a loss function, using the data set in the S1 to calculate the average L1 error between the m reconstructed high-resolution images and the corresponding real high-resolution images, with the expression:

L(Θ) = (1/m) Σ_{i=1}^{m} ‖ H_FSGCN(I_LR^i) − I_HR^i ‖_1

wherein L(Θ) represents the loss function and H_FSGCN represents the function of the image super-resolution reconstruction apparatus;
and S6, an output module, which is used for outputting a final high-resolution image after the low-resolution image is input to the optimal model obtained by the training in the S5.
2. The apparatus of claim 1, wherein the deep feature extraction module comprises a residual group consisting of a plurality of channel attention residual blocks, a layer feature map attention module, a spatial attention module, and a feature fusion module, and the S3 comprises:
S31, extracting the shallow features through a residual group (210) consisting of a plurality of serially connected channel attention residual blocks (211) to obtain features of different depths;
S32, learning the correlations among the features of different layers through the layer feature map attention module (220);
S33, making the region of interest more prominent through the spatial attention module (230);
and S34, adding the shallow features and the fused features through the feature fusion module to obtain the corresponding deep features.
3. The apparatus of claim 2, wherein the layer feature map attention module comprises a feature combination module, a feature relation graph calculation module and a feature update module, and wherein the S32 comprises:
combining the features of different depths obtained in the S31;
and calculating the relations among different features through the feature relation graph, constructing a graph structure among the residual features by setting a threshold, and updating the features through graph attention.
4. The apparatus of claim 2, wherein the spatial attention module comprises a feature dimension reduction module, a feature processing module and a feature update module, and wherein the S33 comprises:
performing dimension reduction processing on the features of different depths obtained in the S31;
extracting feature information through the feature processing, and then performing feature dimension-raising processing;
and calculating attention coefficients of different features through the feature update module, and giving the features new representations.
5. The apparatus of claim 3, wherein the feature update module comprises attention coefficient calculation and weighted summation (240); the steps of the feature update module include:
obtaining a graph structure and the coefficients of the edges among all nodes through the feature relation graph calculation;
normalizing the coefficients of the edges among all nodes into attention coefficients through Softmax;
and performing weighted summation on the features through the calculated attention coefficients to complete the feature update, splicing through a Concat layer, adding the features obtained in the S31, the features obtained in the S33 and the shallow features, and finally performing dimension reduction through a 1 × 1 convolutional layer.
6. A graph representation learning-based image super-resolution reconstruction method applied to the image super-resolution reconstruction apparatus according to any one of claims 1 to 5, wherein the apparatus comprises a shallow feature extraction module, a deep feature extraction module and a reconstruction module;
the shallow feature extraction module is used for extracting shallow features from an input low-resolution image;
the deep feature extraction module comprises a residual group consisting of a plurality of channel attention residual blocks, a layer feature map attention module, a spatial attention module and a feature fusion module; the residual groups extract features of different depths from the shallow features, and these features are then fused to obtain corresponding fused features; the layer feature map attention module constructs a graph from the features acquired from each residual group and calculates the correlations among the features of different blocks; the spatial attention module spatially aggregates the features extracted by the residual groups, so that more attention is paid to the region of interest of the image; and the shallow features, the residual group features, the layer feature map attention module features and the spatial attention module features are fused and added through the feature fusion module to obtain the corresponding deep features;
and the reconstruction module is used for performing up-sampling and feature reconstruction on the deep features and outputting a final high-resolution image.
7. The image super-resolution reconstruction method according to claim 6, wherein the graph representation learning-based image super-resolution reconstruction method comprises a number of residual groups, each residual group comprising a number of channel attention residual blocks, and each channel attention residual block outputting 64 feature maps; the channel attention residual blocks are arranged in series, and the output of each channel attention residual block in a residual group is the input of the next channel attention residual block; the channel attention residual block comprises two 3 × 3 convolutional layers, a computing layer, a global pooling layer and a Sigmoid layer, the purpose of which is to acquire attention weights for the corresponding feature maps; the convolutional layers are used for extracting features; the computing layer accumulates the output of each channel; the output of the global pooling layer is then multiplied by that of the Sigmoid layer to obtain the corresponding features; and the features are finally input to the next channel attention residual block for feature extraction.
8. The image super-resolution reconstruction method according to claim 6, wherein the graph representation learning-based image super-resolution reconstruction method comprises a layer feature map attention module; the input of the layer feature map attention module is the features extracted by a plurality of different residual groups; taking the features extracted by the different groups as input, graph attention assigns a weight to the features extracted by each group and multiplies the features by the weight, and finally a multi-dimensional vector is output and fused with the output at the tail of the model.
9. The image super-resolution reconstruction method according to claim 6, wherein the graph representation learning-based image super-resolution reconstruction method comprises a spatial attention module; the input of the spatial attention module is the features extracted by a plurality of different residual groups; a 1 × 1 convolutional layer performs dimension reduction processing, the features are then downsampled through convolution and pooling operations, upsampled after being learned by one residual group, restored in dimension by a 1 × 1 convolutional layer, and finally multiplied by the weights calculated by a Sigmoid; and the output is fused with the output features of claim 3.
CN202210532655.8A 2022-05-13 2022-05-13 Image super-resolution reconstruction device and method based on graph representation learning Pending CN114972024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210532655.8A CN114972024A (en) 2022-05-13 2022-05-13 Image super-resolution reconstruction device and method based on graph representation learning


Publications (1)

Publication Number Publication Date
CN114972024A true CN114972024A (en) 2022-08-30

Family

ID=82982968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210532655.8A Pending CN114972024A (en) 2022-05-13 2022-05-13 Image super-resolution reconstruction device and method based on graph representation learning

Country Status (1)

Country Link
CN (1) CN114972024A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091317A (en) * 2023-02-02 2023-05-09 苏州大学 Super-resolution method and system for secondary electron image of scanning electron microscope
CN117078516A (en) * 2023-08-11 2023-11-17 济宁安泰矿山设备制造有限公司 Mine image super-resolution reconstruction method based on residual mixed attention
CN117078516B (en) * 2023-08-11 2024-03-12 济宁安泰矿山设备制造有限公司 Mine image super-resolution reconstruction method based on residual mixed attention


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination