CN117196960B - Full-scale feature refinement lightweight image super-resolution method and device - Google Patents

Full-scale feature refinement lightweight image super-resolution method and device

Info

Publication number
CN117196960B
Authority
CN
China
Prior art keywords
module
feature
layer
full
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311475299.1A
Other languages
Chinese (zh)
Other versions
CN117196960A (en)
Inventor
林明昕
黄德天
刘航
宋佳讯
陈龙涛
施一帆
曾焕强
陈婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaqiao University
Original Assignee
Huaqiao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202311475299.1A priority Critical patent/CN117196960B/en
Publication of CN117196960A publication Critical patent/CN117196960A/en
Application granted granted Critical
Publication of CN117196960B publication Critical patent/CN117196960B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a full-scale feature refinement lightweight image super-resolution method and device, relating to the field of image processing. The method comprises the following steps: constructing and training a full-scale feature refinement lightweight image super-resolution model to obtain a trained model; inputting a low-resolution image into the trained model, where it first passes through a first convolution layer to obtain a first feature map and then through K feature distillation extraction modules connected in series, with the output of each feature distillation extraction module transmitted to a second convolution layer; obtaining a second feature map through a third convolution layer and adding it to the first feature map to obtain the final feature map; and inputting the final feature map into an up-sampling module to reconstruct a high-resolution image. The method solves the problem that the feature information extracted by the original super-resolution models is too limited, and removes redundant features through distillation to make the model lighter.

Description

Full-scale feature refinement lightweight image super-resolution method and device
Technical Field
The invention relates to the field of image processing, in particular to a full-scale feature refinement lightweight image super-resolution method and device.
Background
Single-image super-resolution (SISR) is widely used in the field of computer vision, for example in medical imaging, video surveillance, remote sensing, and video transmission. SISR generates a corresponding high-resolution (HR) image from an existing low-resolution (LR) image through software processing. With the development of deep learning, methods based on convolutional neural networks (CNNs) have far surpassed traditional interpolation algorithms: they learn a more accurate mapping relation from HR-LR image pairs, and the reconstructed HR images are of higher quality. CNN-based methods are therefore the mainstream approach to single-image super-resolution at the current stage.
Super-resolution methods based on deep learning can be roughly divided into two categories. The first is based on generative adversarial networks (GANs). By optimizing a perceptual loss, the generated HR image better matches human subjective visual perception. However, the PSNR and SSIM indexes of HR images reconstructed by such algorithms are low, and their detail textures differ greatly from those of the original image, so the drawbacks in practical applications are obvious.
The second category attaches more importance to the details and texture features of the reconstructed image, and its objective indexes are higher than those of the first category. However, this type of approach still has some problems. First, to improve reconstruction quality, a large number of modules often need to be stacked to increase the network depth, which makes model training difficult and time-consuming. Second, some super-resolution algorithms lack careful design and study of the feature extraction module, so the extracted deep features are weak. In addition, these models lack the ability to adaptively distinguish important features from secondary ones and treat all feature information in the image equally, which directly degrades the high-frequency features of the reconstructed image.
Disclosure of Invention
To address the technical problems mentioned above, the embodiments of the present application aim to provide a full-scale feature refinement lightweight image super-resolution method and device, which solve the problem that the feature information extracted by the original classical super-resolution models is too limited, provide local, regional, and global full-scale feature information, and remove redundant features through distillation so that the model becomes lighter.
In a first aspect, the present invention provides a full-scale feature refinement lightweight image super-resolution method, comprising the steps of:
acquiring a low-resolution image to be reconstructed;
constructing and training a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module;
the method comprises the steps of inputting a low-resolution image into a trained full-scale feature refinement lightweight image super-resolution model, inputting the low-resolution image into a first convolution layer to obtain a first feature image, transmitting the output of each feature distillation extraction module to a second convolution layer through K feature distillation extraction modules connected in series, obtaining a second feature image through a third convolution layer, adding the second feature image with the first feature image to obtain a final feature image, inputting the final feature image into an up-sampling module, and reconstructing to obtain a high-resolution image.
Preferably, the calculation process of the full-scale feature refinement lightweight image super-resolution model is as follows:
F = W_p(I_LR);
I_SR = Subpixel(F + W_p(W_c(D_1, D_2, ..., D_K)));
D_1 = Distil(F);
D_i = Distil(D_{i-1}), i > 1;
where I_LR denotes the input low-resolution image, I_SR denotes the reconstructed high-resolution image, Subpixel(·) denotes the sub-pixel convolution up-sampling operation in the up-sampling module, W_p(·) denotes a convolution with a 3×3 kernel, W_c(·) denotes a convolution with a 1×1 kernel, Distil(·) denotes the operation of a feature distillation extraction module, and D_i denotes the output of the i-th feature distillation extraction module.
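For illustration, a minimal PyTorch-style sketch of this computation is given below. It is a sketch under stated assumptions, not the patented implementation: the class name FSFRNet, the channel width of 48, the default K = 6, and the placeholder fdem module factory are assumed for readability.

```python
import torch
import torch.nn as nn

class FSFRNet(nn.Module):
    # Sketch of F = W_p(I_LR); I_SR = Subpixel(F + W_p(W_c(D_1, ..., D_K)))
    def __init__(self, fdem, channels=48, K=6, scale=4):
        super().__init__()
        self.first_conv = nn.Conv2d(3, channels, 3, padding=1)            # W_p: 3x3 convolution
        self.blocks = nn.ModuleList([fdem(channels) for _ in range(K)])   # K feature distillation extraction modules
        self.fuse_1x1 = nn.Conv2d(K * channels, channels, 1)              # W_c: 1x1 convolution over concatenated D_i
        self.fuse_3x3 = nn.Conv2d(channels, channels, 3, padding=1)       # W_p: 3x3 convolution for local context
        self.upsample = nn.Sequential(                                     # sub-pixel convolution up-sampling
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        f = self.first_conv(lr)                    # first feature map F
        d, outs = f, []
        for block in self.blocks:                  # D_1 = Distil(F), D_i = Distil(D_{i-1})
            d = block(d)
            outs.append(d)
        second = self.fuse_3x3(self.fuse_1x1(torch.cat(outs, dim=1)))  # second feature map
        return self.upsample(second + f)           # final feature map -> reconstructed HR image
```

With this wiring, every D_i contributes directly to the fused representation, matching the dense collection of intermediate outputs described above.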
Preferably, the feature distillation extraction module comprises 3 full-scale feature fusion layers, 5 depth-separable convolution layers, an enhanced spatial attention layer, and a Concat layer. The 3 full-scale feature fusion layers and 1 depth-separable convolution layer with a 3×3 kernel are connected in series in sequence to form a first branch; 3 depth-separable convolution layers with 3×3 kernels form a second branch, each connected in parallel with a full-scale feature fusion layer of the first branch. The output of the first branch and the output of each depth-separable convolution layer in the second branch are input into the Concat layer for concatenation, then pass through 1 depth-separable convolution layer with a 3×3 kernel and the enhanced spatial attention layer connected in series. The output of the enhanced spatial attention layer is combined with the input of the feature distillation extraction module through a residual connection to obtain the output of the feature distillation extraction module.
Preferably, the feature distillation extraction module feeds the 3/4-channel features into the full-scale feature fusion layers of the first branch and the 1/4-channel features into the depth-separable convolution layers of the second branch.
Preferably, the full-scale feature fusion layer comprises a first local feature extraction module and a recursive Transformer module based on the anchored stripe attention mechanism. The first local feature extraction module comprises 3 convolution layers with 1×1 kernels, 1 depth-separable convolution layer with a 3×3 kernel, a ReLU activation function, and channel attention. The input features of the full-scale feature fusion layer pass in sequence through the 1st 1×1 convolution layer, the 3×3 depth-separable convolution layer, the ReLU activation function, the channel attention, and the 2nd 1×1 convolution layer; the output of the 2nd 1×1 convolution layer is added to the input features of the full-scale feature fusion layer and then fed into the 3rd 1×1 convolution layer.
Preferably, the Transformer module based on the anchored stripe attention mechanism is connected to the 3rd 1×1 convolution layer, and a recursive mechanism is used to feed the output of the Transformer module based on the anchored stripe attention mechanism back into that module as its next input.
Preferably, the Transformer module based on the anchored stripe attention mechanism comprises a channel attention module, an anchored stripe attention module, a second local feature extraction module, and a multi-layer perceptron. The anchored stripe attention module is located on the upper branch of the Transformer module, and the channel attention module and the second local feature extraction module are located on the lower branch. The second local feature extraction module comprises two sequentially connected depth-separable convolution layers with 3×3 kernels and a ReLU activation function. The computation of the Transformer module based on the anchored stripe attention mechanism is as follows:
F_out = MLP(F_in + concatenate(ASA(split(F_in))) + CA(ELE(F_in)));
where F_in and F_out denote the input and output of the Transformer module based on the anchored stripe attention mechanism, respectively; MLP(·) denotes the multi-layer perceptron; ASA(·) denotes the anchored stripe attention module; CA(·) denotes the channel attention module; ELE(·) denotes the second local feature extraction module; split(·) denotes cutting the input into blocks of the same size; and concatenate(·) denotes concatenating the results obtained for each block. The internal operation of the anchored stripe attention module is defined as:
Y = M_e · Z = M_e · (M_d · V);
where LayerNorm is the layer normalization operation, M_e ∈ R^(N×M) denotes the self-attention map of the horizontal stripe, M_d ∈ R^(M×N) denotes the self-attention map of the vertical stripe, N is the number of input vectors, and M < N; Q, K, V ∈ R^(N×d) are respectively the query, key, and value matrices corresponding to the vectors input to the anchored stripe attention module, A ∈ R^(M×d) is an intermediate anchor matrix, and d is the dimension of the input vectors; Avgpool(·) denotes average pooling, W_c(·) denotes a 1×1 point-wise convolution, and W_d(·) denotes a depth-separable convolution with a 3×3 kernel; X, Y ∈ R^(N×d), where X is the set of all input vectors of the anchored stripe attention module, X̂ is the result of applying layer normalization to X, Z denotes the product of M_d and V, Y is the set of all output vectors of the anchored stripe attention module, and Softmax is the activation function.
In a second aspect, the present invention provides a full-scale feature refinement lightweight image super-resolution apparatus, comprising:
a data acquisition module configured to acquire a low resolution image to be reconstructed;
the model construction module is configured to construct and train a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module;
The reconstruction module is configured to input the low-resolution image into the trained full-scale feature refinement lightweight image super-resolution model: the low-resolution image first passes through the first convolution layer to obtain a first feature map; the first feature map then passes through the K feature distillation extraction modules connected in series, with the output of each feature distillation extraction module transmitted to the second convolution layer; a second feature map is obtained through the third convolution layer and added to the first feature map to obtain the final feature map; and the final feature map is input into the up-sampling module and reconstructed to obtain a high-resolution image.
In a third aspect, the present invention provides an electronic device comprising one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
(1) In the full-scale feature refinement lightweight image super-resolution method provided by the invention, the feature distillation extraction module, the full-scale feature fusion layer, and the anchored stripe attention mechanism are introduced into the full-scale feature refinement lightweight image super-resolution model. Redundant features are removed to make the model lighter, while the model's ability to extract local and global features is enhanced, so the network can adaptively identify features and assign different weights to different types of features.
(2) Compared with the original classical single-frame super-resolution method, the full-scale feature refinement lightweight image super-resolution method provided by the invention can greatly improve the reconstruction performance of a network model, realize further restoration of texture details of a reconstructed image and effectively reduce the number of model parameters.
(3) The full-scale feature refinement lightweight image super-resolution method provided by the invention can solve the problem that the feature information extracted by the original classical super-resolution model is too single, provide the feature information of local, regional and global full scale, and remove redundant features by distillation so as to enable the model to be lighter.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary device architecture diagram to which an embodiment of the present application may be applied;
FIG. 2 is a flow diagram of a full-scale feature refinement lightweight image super-resolution method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a full-scale feature refinement lightweight image super-resolution model of a full-scale feature refinement lightweight image super-resolution method of an embodiment of the present application;
FIG. 4 is a schematic diagram of the Transformer module based on the anchored stripe attention mechanism in the full-scale feature refinement lightweight image super-resolution method of an embodiment of the present application;
FIG. 5 is a schematic diagram of the anchored stripe attention module in the full-scale feature refinement lightweight image super-resolution method of an embodiment of the present application;
FIG. 6 is a schematic diagram of a full-scale feature fusion layer of a full-scale feature refinement lightweight image super-resolution method of an embodiment of the present application;
FIG. 7 is a schematic diagram of a feature distillation extraction module of a full-scale feature refinement lightweight image super-resolution method of an embodiment of the present application;
FIG. 8 is a schematic diagram of a full-scale feature refinement lightweight image super-resolution device of an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer device suitable for use in implementing the electronic device of the embodiments of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 illustrates an exemplary device architecture 100 in which a full-scale feature refinement lightweight image super-resolution method or full-scale feature refinement lightweight image super-resolution device of embodiments of the present application may be applied.
As shown in fig. 1, the apparatus architecture 100 may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages, etc. Various applications, such as a data processing class application, a file processing class application, and the like, may be installed on the terminal device one 101, the terminal device two 102, and the terminal device three 103.
The first terminal device 101, the second terminal device 102 and the third terminal device 103 may be hardware or software. When the first terminal device 101, the second terminal device 102, and the third terminal device 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the first terminal apparatus 101, the second terminal apparatus 102, and the third terminal apparatus 103 are software, they can be installed in the above-listed electronic apparatuses. Which may be implemented as multiple software or software modules (e.g., software or software modules for providing distributed services) or as a single software or software module. The present invention is not particularly limited herein.
The server 105 may be a server that provides various services, such as a background data processing server that processes files or data uploaded by the terminal device one 101, the terminal device two 102, and the terminal device three 103. The background data processing server can process the acquired file or data to generate a processing result.
It should be noted that, the full-scale feature refinement lightweight image super-resolution method provided in the embodiment of the present application may be executed by the server 105, or may be executed by the first terminal device 101, the second terminal device 102, or the third terminal device 103, and accordingly, the full-scale feature refinement lightweight image super-resolution device may be set in the server 105, or may be set in the first terminal device 101, the second terminal device 102, or the third terminal device 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the processed data does not need to be acquired from a remote location, the above-described apparatus architecture may not include a network, but only a server or terminal device.
FIG. 2 shows a full-scale feature refinement lightweight image super-resolution method provided by an embodiment of the present application, comprising the steps of:
s1, acquiring a low-resolution image to be reconstructed.
Specifically, the low-resolution image to be reconstructed is collected so that it can be used as the input of the full-scale feature refinement lightweight image super-resolution model.
S2, constructing and training a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module.
In a specific embodiment, the calculation process of the full-scale feature refinement lightweight image super-resolution model is as follows:
F = W_p(I_LR);
I_SR = Subpixel(F + W_p(W_c(D_1, D_2, ..., D_K)));
D_1 = Distil(F);
D_i = Distil(D_{i-1}), i > 1;
where I_LR denotes the input low-resolution image, I_SR denotes the reconstructed high-resolution image, Subpixel(·) denotes the sub-pixel convolution up-sampling operation in the up-sampling module, W_p(·) denotes a convolution with a 3×3 kernel, W_c(·) denotes a convolution with a 1×1 kernel, Distil(·) denotes the operation of a feature distillation extraction module, and D_i denotes the output of the i-th feature distillation extraction module.
Specifically, referring to fig. 3, a full-scale feature refinement lightweight image super-resolution model is constructed. The model is a single-frame image super-resolution network based on feature distillation, composed of K feature distillation extraction modules, 3 convolution layers, and an up-sampling module based on sub-pixel convolution. The input low-resolution image is processed by a 3×3 convolution layer to obtain a first feature map F, which is fed into the feature distillation extraction modules for further processing. The K feature distillation extraction modules are connected in series, and the output of each feature distillation extraction module is transmitted to a final 1×1 convolution layer that adjusts the number of channels; local context information is then obtained through a 3×3 convolution layer to produce a second feature map, which is added to the first feature map F to obtain the final feature map. Finally, up-sampling is performed by the up-sampling module to obtain the final output high-resolution image.
S3, inputting the low-resolution image into the trained full-scale feature refinement lightweight image super-resolution model: the low-resolution image first passes through the first convolution layer to obtain a first feature map; the first feature map then passes through the K feature distillation extraction modules connected in series, and the output of each feature distillation extraction module is transmitted to the second convolution layer; a second feature map is obtained through the third convolution layer and added to the first feature map to obtain the final feature map; the final feature map is input into the up-sampling module and reconstructed to obtain a high-resolution image.
In a specific embodiment, the feature distillation extraction module comprises 3 full-scale feature fusion layers, 5 depth-separable convolution layers, an enhanced spatial attention layer, and a Concat layer. The 3 full-scale feature fusion layers and 1 depth-separable convolution layer with a 3×3 kernel are connected in series in sequence to form a first branch; 3 depth-separable convolution layers with 3×3 kernels form a second branch, each connected in parallel with a full-scale feature fusion layer of the first branch. The output of the first branch and the output of each depth-separable convolution layer in the second branch are input into the Concat layer for concatenation, then pass through 1 depth-separable convolution layer with a 3×3 kernel and the enhanced spatial attention layer connected in series. The output of the enhanced spatial attention layer is combined with the input of the feature distillation extraction module through a residual connection to obtain the output of the feature distillation extraction module.
In a specific embodiment, the feature distillation extraction module feeds the 3/4-channel features into the full-scale feature fusion layers of the first branch and the 1/4-channel features into the depth-separable convolution layers of the second branch.
In a specific embodiment, the full-scale feature fusion layer comprises a first local feature extraction module and a recursive Transformer module based on the anchored stripe attention mechanism. The first local feature extraction module comprises 3 convolution layers with 1×1 kernels, 1 depth-separable convolution layer with a 3×3 kernel, a ReLU activation function, and channel attention. The input features of the full-scale feature fusion layer pass in sequence through the 1st 1×1 convolution layer, the 3×3 depth-separable convolution layer, the ReLU activation function, the channel attention, and the 2nd 1×1 convolution layer; the output of the 2nd 1×1 convolution layer is added to the input features of the full-scale feature fusion layer and then fed into the 3rd 1×1 convolution layer.
In a specific embodiment, the Transformer module based on the anchored stripe attention mechanism is connected to the 3rd 1×1 convolution layer, and a recursive mechanism is used to feed the output of the Transformer module based on the anchored stripe attention mechanism back into that module as its next input.
In a specific embodiment, the Transformer module based on the anchored stripe attention mechanism comprises a channel attention module, an anchored stripe attention module, a second local feature extraction module, and a multi-layer perceptron. The anchored stripe attention module is located on the upper branch of the Transformer module, and the channel attention module and the second local feature extraction module are located on the lower branch. The second local feature extraction module comprises two sequentially connected depth-separable convolution layers with 3×3 kernels and a ReLU activation function. The computation of the Transformer module based on the anchored stripe attention mechanism is as follows:
F_out = MLP(F_in + concatenate(ASA(split(F_in))) + CA(ELE(F_in)));
where F_in and F_out denote the input and output of the Transformer module based on the anchored stripe attention mechanism, respectively; MLP(·) denotes the multi-layer perceptron; ASA(·) denotes the anchored stripe attention module; CA(·) denotes the channel attention module; ELE(·) denotes the second local feature extraction module; split(·) denotes cutting the input into blocks of the same size; and concatenate(·) denotes concatenating the results obtained for each block. The internal operation of the anchored stripe attention module is defined as:
Y = M_e · Z = M_e · (M_d · V);
where LayerNorm is the layer normalization operation, M_e ∈ R^(N×M) denotes the self-attention map of the horizontal stripe, M_d ∈ R^(M×N) denotes the self-attention map of the vertical stripe, N is the number of input vectors, and M < N; Q, K, V ∈ R^(N×d) are respectively the query, key, and value matrices corresponding to the vectors input to the anchored stripe attention module, A ∈ R^(M×d) is an intermediate anchor matrix, and d is the dimension of the input vectors; Avgpool(·) denotes average pooling, W_c(·) denotes a 1×1 point-wise convolution, and W_d(·) denotes a depth-separable convolution with a 3×3 kernel; X, Y ∈ R^(N×d), where X is the set of all input vectors of the anchored stripe attention module, X̂ is the result of applying layer normalization to X, Z denotes the product of M_d and V, Y is the set of all output vectors of the anchored stripe attention module, and Softmax is the activation function.
Specifically, referring to FIG. 4, a Transformer module based on the anchored stripe attention mechanism is constructed. The module consists of a channel attention module (Channel Attention, CA), an anchored stripe attention module (Anchored Stripe Attention, ASA), a second local feature extraction module (Efficient Local feature Extraction module, ELE), and a multi-layer perceptron (MLP). The anchored stripe attention module is located on the upper branch of the Transformer module, the channel attention module and the second local feature extraction module are located on the lower branch, and the multi-layer perceptron is located on the final branch. The Transformer module based on the anchored stripe attention mechanism performs global and regional modeling of the features through the anchored stripe attention mechanism, and performs local modeling of the features through two 3×3 depth-separable convolutions, a ReLU activation function, and a channel self-attention mechanism. It thereby makes full use of feature information at different scales of the image, achieves true full-scale feature modeling, and greatly improves image reconstruction quality.
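As a rough illustration of how these branches combine, the following PyTorch-style sketch implements F_out = MLP(F_in + concatenate(ASA(split(F_in))) + CA(ELE(F_in))) under stated assumptions: the asa and ca sub-modules are passed in as placeholders, the MLP is assumed to be two 1×1 convolutions with a GELU, and split is assumed to cut the feature map into equal blocks along the height.

```python
import torch
import torch.nn as nn

def dsconv(c):
    # depth-separable 3x3 convolution: depthwise 3x3 followed by a pointwise 1x1
    return nn.Sequential(nn.Conv2d(c, c, 3, padding=1, groups=c), nn.Conv2d(c, c, 1))

class ASTBlock(nn.Module):
    # F_out = MLP(F_in + concatenate(ASA(split(F_in))) + CA(ELE(F_in)))
    def __init__(self, channels, asa, ca, n_splits=4):
        super().__init__()
        self.asa, self.ca, self.n_splits = asa, ca, n_splits
        self.ele = nn.Sequential(dsconv(channels), nn.ReLU(inplace=True), dsconv(channels))  # ELE
        self.mlp = nn.Sequential(nn.Conv2d(channels, 2 * channels, 1), nn.GELU(),
                                 nn.Conv2d(2 * channels, channels, 1))                        # assumed MLP form

    def forward(self, x):
        blocks = torch.chunk(x, self.n_splits, dim=2)              # split(.): equal-size blocks (assumed along height)
        upper = torch.cat([self.asa(b) for b in blocks], dim=2)    # ASA on each block, then concatenate(.)
        lower = self.ca(self.ele(x))                               # lower branch: ELE followed by channel attention
        return self.mlp(x + upper + lower)
```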
Referring to fig. 5, the advantages of the anchored stripe attention module (ASA) over an ordinary multi-head self-attention module are as follows. First, the anchored stripe attention module applies a 1×1 convolution to the input to aggregate pixel-level cross-channel context, and then uses a 3×3 depth-separable convolution to encode channel-level context, generating the Q, K, and V matrices with fewer parameters and less computation than the traditional approach. Second, the anchored stripe method computes the self-attention maps of the horizontal stripe and the vertical stripe separately and then combines the two to obtain the overall attention map; an intermediate anchor matrix A is introduced in this process, whose number of parameters is much smaller than those of Q, K, and V. The computational complexity of this new self-attention computation is O(NMd) and its spatial complexity is O(NM), whereas the conventional self-attention computation has a computational complexity of O(N^2 d) and a spatial complexity of O(N^2), so the model becomes much lighter.
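For intuition, here is a minimal PyTorch-style sketch of such an anchored attention computation. How Q, K, and the pooled anchor A are combined into M_e and M_d is an assumption here (a scaled dot-product against the anchor tokens), since the text above only states Y = M_e·(M_d·V); the separate horizontal/vertical stripe partitioning is also omitted and assumed to be handled by the surrounding split operation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnchoredStripeAttention(nn.Module):
    # Sketch of Y = M_e · (M_d · V) with an anchor matrix A of M << N tokens (complexity O(NMd)).
    def __init__(self, channels, anchor_stride=4):
        super().__init__()
        self.qkv = nn.Sequential(
            nn.Conv2d(channels, 3 * channels, 1),                                          # W_c: 1x1 point-wise conv
            nn.Conv2d(3 * channels, 3 * channels, 3, padding=1, groups=3 * channels),      # W_d: 3x3 depthwise conv
        )
        self.pool = nn.AvgPool2d(anchor_stride)         # Avgpool: produces M = N / stride^2 anchor tokens
        self.norm = nn.GroupNorm(1, channels)           # stands in for the layer normalization giving X-hat

    def forward(self, x):                                # h and w assumed divisible by anchor_stride
        b, c, h, w = x.shape
        q, k, v = self.qkv(self.norm(x)).chunk(3, dim=1)
        q = q.flatten(2).transpose(1, 2)                 # Q in R^{N x d}, N = h*w, d = c
        k = k.flatten(2).transpose(1, 2)
        v = v.flatten(2).transpose(1, 2)
        a = self.pool(x).flatten(2).transpose(1, 2)      # A in R^{M x d}
        m_e = F.softmax(q @ a.transpose(1, 2) / c ** 0.5, dim=-1)   # M_e in R^{N x M} (assumed construction)
        m_d = F.softmax(a @ k.transpose(1, 2) / c ** 0.5, dim=-1)   # M_d in R^{M x N} (assumed construction)
        y = m_e @ (m_d @ v)                              # Z = M_d·V, then Y = M_e·Z
        return y.transpose(1, 2).reshape(b, c, h, w)
```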
Referring to fig. 6, a full-scale feature fusion layer is further constructed, consisting of a first local feature extraction module and a recursive Transformer module based on the anchored stripe attention mechanism.
The first local feature extraction module consists of 3 convolution layers with 1×1 kernels, 1 depth-separable convolution with a 3×3 kernel, a ReLU activation function, and channel attention, and can efficiently extract local features. The Transformer module based on the anchored stripe attention mechanism is introduced at the tail of this full-scale feature fusion based depth feature extraction module in a recursive manner: the output of the module is fed back into the module as its input, and this is repeated a specified number of times. In this way, the Transformer module based on the anchored stripe attention mechanism can be trained sufficiently without greatly increasing GPU memory consumption or model parameters. At minimal cost, this compensates for the fact that the first local feature extraction module does not have a large enough receptive field to capture global information, so the model is both lightweight and high-performing.
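A minimal PyTorch-style sketch of this layer follows; the channel attention module, the anchored-stripe Transformer block, and the number of recursions are passed in or assumed as placeholders rather than taken from the patent.

```python
import torch.nn as nn

class FullScaleFusionLayer(nn.Module):
    # First local feature extraction followed by a recursively applied anchored-stripe Transformer block.
    def __init__(self, channels, ast_block, ca, recursions=2):
        super().__init__()
        self.pw1 = nn.Conv2d(channels, channels, 1)                       # 1st 1x1 conv
        self.dw = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
                                nn.Conv2d(channels, channels, 1))         # 3x3 depth-separable conv
        self.act = nn.ReLU(inplace=True)
        self.ca = ca                                                      # channel attention (placeholder)
        self.pw2 = nn.Conv2d(channels, channels, 1)                       # 2nd 1x1 conv
        self.pw3 = nn.Conv2d(channels, channels, 1)                       # 3rd 1x1 conv
        self.ast = ast_block                                              # shared weights across recursions
        self.recursions = recursions

    def forward(self, x):
        local = self.pw2(self.ca(self.act(self.dw(self.pw1(x)))))        # first local feature extraction
        y = self.pw3(local + x)                                           # residual add, then 3rd 1x1 conv
        for _ in range(self.recursions):                                  # recursive mechanism: output fed back in
            y = self.ast(y)
        return y
```

Because the same ast_block instance is reused at every recursion, the extra expressiveness comes without adding parameters, which is the lightweight property the paragraph above describes.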
Referring to fig. 7, the full-scale feature fusion layer is introduced into the feature distillation extraction module as a component of its upper branch, completing the construction of the full-scale feature refinement lightweight image super-resolution model. The feature distillation extraction module consists of 3 full-scale feature fusion layers, 5 depth-separable convolution layers, an enhanced spatial attention layer, and a Concat layer. The first branch is formed by the 3 full-scale feature fusion layers and 1 depth-separable convolution layer with a 3×3 kernel connected in series; the second branch consists of 3 depth-separable convolution layers with 3×3 kernels, each connected in parallel with a full-scale feature fusion layer of the first branch. The results of the two branches are fed into the Concat layer for concatenation and then processed by 1 depth-separable convolution layer with a 3×3 kernel and the enhanced spatial attention layer connected in series. Finally, the output of the enhanced spatial attention layer is summed with the input of the feature distillation extraction module transmitted through the residual connection to obtain the output of the feature distillation extraction module.
Further, the input of the feature distillation extraction module is divided into two parts that are processed by the two branches respectively: the 3/4-channel features are further refined by the full-scale feature fusion layers of the first branch, and the 1/4-channel fine features are further refined by the second branch; redundant features are removed, so that more accurate high-frequency information for reconstructing the high-resolution image is finally obtained and the training burden of the model is reduced. Before entering each full-scale feature fusion layer, the channels are split once into two parts: one part accounts for 3/4 of the original number of channels and contains coarse features, and the other part accounts for 1/4 of the original number of channels and contains fine features that need no further processing.
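The following PyTorch-style sketch shows one way this progressive channel splitting could be wired up. The exact channel bookkeeping (each fusion layer restoring the full channel count, the first-branch tail convolution producing 1/4 of the channels so that the concatenation returns to the original width) and the fusion_layer and esa factories are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FDEModule(nn.Module):
    # Feature distillation extraction module: at each stage 1/4 of the channels are distilled
    # through a 3x3 depth-separable conv (second branch) while 3/4 are refined by a
    # full-scale feature fusion layer (first branch).
    def __init__(self, channels, fusion_layer, esa):
        super().__init__()
        d, r = channels // 4, channels - channels // 4                  # distilled / remaining channels
        def dsconv(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cin, 3, padding=1, groups=cin),
                                 nn.Conv2d(cin, cout, 1))
        self.fusions = nn.ModuleList([fusion_layer(r, channels) for _ in range(3)])  # 3/4-channel path
        self.distils = nn.ModuleList([dsconv(d, d) for _ in range(3)])  # 1/4-channel path
        self.branch1_tail = dsconv(channels, d)                         # last DSConv of the first branch
        self.post = dsconv(4 * d, channels)                             # DSConv after the Concat layer
        self.esa = esa                                                  # enhanced spatial attention (placeholder)

    def forward(self, x):
        feats, cur = [], x
        for fusion, distil in zip(self.fusions, self.distils):
            fine, coarse = torch.split(cur, [cur.shape[1] // 4, cur.shape[1] - cur.shape[1] // 4], dim=1)
            feats.append(distil(fine))                                  # second branch: refine fine features
            cur = fusion(coarse)                                        # first branch: full-scale fusion
        feats.append(self.branch1_tail(cur))
        out = self.esa(self.post(torch.cat(feats, dim=1)))              # Concat -> DSConv 3x3 -> ESA
        return out + x                                                  # residual connection with module input
```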
And finally, reconstructing the low-resolution image to be reconstructed by using the trained full-scale feature refinement lightweight image super-resolution model to obtain a reconstructed high-resolution image.
The labels S1-S3 above are merely step identifiers and do not by themselves limit the order of the steps.
With further reference to fig. 8, as an implementation of the method shown in the foregoing drawings, the present application provides an embodiment of a full-scale feature refinement lightweight image super-resolution apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied in various electronic devices.
The embodiment of the application provides a full-scale feature refinement lightweight image super-resolution device, which comprises:
a data acquisition module 1 configured to acquire a low resolution image to be reconstructed;
the model construction module 2 is configured to construct and train a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module;
the reconstruction module 3 is configured to input a low-resolution image into a trained full-scale feature refinement lightweight image super-resolution model, input the low-resolution image into a first convolution layer to obtain a first feature image, transmit the output of each feature distillation extraction module to a second convolution layer through K feature distillation extraction modules connected in series, obtain a second feature image through a third convolution layer, add the second feature image with the first feature image to obtain a final feature image, input the final feature image into an up-sampling module, and reconstruct to obtain a high-resolution image.
Referring now to fig. 9, there is illustrated a schematic diagram of a computer apparatus 900 suitable for use in implementing an electronic device (e.g., a server or terminal device as illustrated in fig. 1) of an embodiment of the present application. The electronic device shown in fig. 9 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in fig. 9, the computer apparatus 900 includes a Central Processing Unit (CPU) 901 and a Graphics Processing Unit (GPU) 902, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 903 or a program loaded from a storage section 909 into a Random Access Memory (RAM) 904. The RAM 904 also stores various programs and data required for the operation of the apparatus 900. The CPU 901, GPU 902, ROM 903, and RAM 904 are connected to each other by a bus 905. An input/output (I/O) interface 906 is also connected to the bus 905.
The following components are connected to the I/O interface 906: an input section 907 including a keyboard, a mouse, and the like; an output section 908 including, for example, a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 909 including a hard disk or the like; and a communication section 910 including a network interface card such as a LAN card, a modem, or the like. The communication section 910 performs communication processing via a network such as the Internet. A drive 911 may also be connected to the I/O interface 906 as needed. A removable medium 912, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 911 as needed, so that a computer program read therefrom is installed into the storage section 909 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 910, and/or installed from the removable medium 912. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 901 and a Graphics Processor (GPU) 902.
It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, and the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments described in the present application may be implemented by software, or may be implemented by hardware. The described modules may also be provided in a processor.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment, or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a low-resolution image to be reconstructed; construct and train a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, where the model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules, and an up-sampling module; and input the low-resolution image into the trained model: the low-resolution image first passes through the first convolution layer to obtain a first feature map; the first feature map then passes through the K feature distillation extraction modules connected in series, with the output of each feature distillation extraction module transmitted to the second convolution layer; a second feature map is obtained through the third convolution layer and added to the first feature map to obtain the final feature map; and the final feature map is input into the up-sampling module and reconstructed to obtain a high-resolution image.
The foregoing description covers only the preferred embodiments of the present application and explains the technical principles employed. Those skilled in the art will appreciate that the scope of the invention referred to in this application is not limited to the specific combinations of the technical features described above, and is also intended to cover other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in the present application.

Claims (5)

1. A full-scale feature refinement lightweight image super-resolution method, characterized by comprising the following steps:
acquiring a low-resolution image to be reconstructed;
constructing and training a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, H feature distillation extraction modules and an up-sampling module;
the characteristic distillation extraction module comprises 3 full-scale characteristic fusion layers, 5 depth separable convolution layers, an enhanced space attention layer and a Concat layer, wherein the 3 full-scale characteristic fusion layers and 1 depth separable convolution layer with the convolution kernel size of 3 multiplied by 3 are sequentially connected in series to form a first branch, before entering each full-scale characteristic fusion layer, the original channel number is divided into two parts by one channel segmentation, the first part enters the full-scale characteristic fusion layer, the second part enters the 3 multiplied by 3 depth separable convolution layer, the output of the first branch and the output of each 3 multiplied by 3 depth separable convolution layer are input into the Concat layer, and then are subjected to serial connection to form 1 convolution kernel size of 3 multiplied by 3, and the output of the enhanced space attention layer is connected with the input of the characteristic distillation extraction module in a residual way to obtain the output of the characteristic distillation extraction module;
the full-scale feature fusion layer comprises a first local feature extraction module and a recursive Transformer module based on the anchored stripe attention mechanism; the first local feature extraction module comprises 3 convolution layers with 1×1 kernels, 1 depth-separable convolution layer with a 3×3 kernel, a ReLU activation function, and channel attention; the input features of the full-scale feature fusion layer pass in sequence through the 1st 1×1 convolution layer, the 3×3 depth-separable convolution layer, the ReLU activation function, the channel attention, and the 2nd 1×1 convolution layer, and the output features of the 2nd 1×1 convolution layer are added to the input features of the full-scale feature fusion layer and then input into the 3rd 1×1 convolution layer;
the transporter module based on the anchoring strip frame attention mechanism comprises a channel attention module, an anchoring strip frame attention module, a second local feature extraction module and a multi-layer perceptron, wherein the anchoring strip frame attention module is positioned on the upper branch of the transporter module based on the anchoring strip frame attention mechanism, the channel attention module and the second local feature extraction module are positioned on the lower branch of the transporter module based on the anchoring strip frame attention mechanism, the second local feature extraction module comprises two depth separable convolution layers with convolution kernel sizes of 3×3 and a ReLu activation function which are connected in sequence, and the calculation process of the transporter module based on the anchoring strip frame attention mechanism is as follows:
F out =MLP(F in +concatenate(ASA(split(F in )))+CA(ELE(F in )));
Wherein F is in And F out Respectively representing the input and the output of the transducer module based on the attention mechanism of the anchoring bar frame; MLP (& gt) represents a multi-layer perceptron; ASA (-) represents the anchor bar frame attention module; CA (-) represents the channel attention module; ELE (·) represents a second local feature extraction module; split (·) means that the input is cut into pieces of the same size; concatate (·) indicates that the result obtained by each block in front is spliced; wherein, the internal operation of the anchoring bar frame attention module is defined as:
wherein Layernormal is a layer normalization operation, M e Self-attention diagram representing a lateral bar, M d A self-attention diagram representing a longitudinal bar; q, K, V is respectively a query, a key value and a value matrix corresponding to the vector input into the anchoring bar frame attention module, A is an intermediate matrix, and d is the dimension of the input vector; avgpool (·) represents average pooling; x is the set of all vectors of the input of the anchor bar frame attention module,for the result of X after layer normalization, Z represents matrix M d And V 0 Y is the result of multiplication and is all vector sets of the output of the anchoring strip frame attention module, and Softmax is an activation function;
inputting the low-resolution image into the first convolution layer to obtain a first feature map, the first feature map being transmitted through the H feature distillation extraction modules connected in series, with the output of each feature distillation extraction module transmitted to the second convolution layer; obtaining a second feature map through the third convolution layer; adding the second feature map to the first feature map to obtain the final feature map; and inputting the final feature map into the up-sampling module to reconstruct a high-resolution image.
2. The full-scale feature refinement lightweight image super-resolution method according to claim 1, wherein the Transformer module based on the anchored stripe attention mechanism is connected to the 3rd convolution layer of 1×1 size, and the recursive Transformer module based on the anchored stripe attention mechanism employs a recursive mechanism to re-input the output of the Transformer module based on the anchored stripe attention mechanism into the Transformer module based on the anchored stripe attention mechanism.
3. A full-scale feature refinement lightweight image super-resolution device, comprising:
a data acquisition module configured to acquire a low resolution image to be reconstructed;
the model construction module is configured to construct and train a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, H feature distillation extraction modules and an up-sampling module;
the feature distillation extraction module comprises 3 full-scale feature fusion layers, 5 depthwise separable convolution layers, an enhanced spatial attention layer and a Concat layer; the 3 full-scale feature fusion layers and 1 depthwise separable convolution layer with a convolution kernel size of 3×3 are sequentially connected in series to form a first branch; before entering each full-scale feature fusion layer, the original channels are divided into two parts by a channel split, the first part enters the full-scale feature fusion layer and the second part enters a 3×3 depthwise separable convolution layer; the output of the first branch and the output of each 3×3 depthwise separable convolution layer are input into the Concat layer and then sequentially pass through 1 depthwise separable convolution layer with a convolution kernel size of 3×3 and the enhanced spatial attention layer; the output of the enhanced spatial attention layer is connected with the input of the feature distillation extraction module by a residual connection to obtain the output of the feature distillation extraction module;
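For illustration only, a minimal PyTorch sketch of the feature distillation extraction module as read from this paragraph follows: a channel split before each of the three full-scale feature fusion layers, a distilled path through a 3×3 depthwise separable convolution, concatenation of the main-branch output with the distilled outputs, a further 3×3 depthwise separable convolution, an enhanced spatial attention layer, and a residual connection with the module input. The fusion-layer and enhanced-spatial-attention factories, the 1:1 split ratio, and the assumption that each fusion layer restores the full channel count are placeholders, not the claimed design.

```python
import torch
import torch.nn as nn

def dsconv(cin: int, cout: int) -> nn.Sequential:
    # depthwise separable convolution: depthwise 3x3 followed by pointwise 1x1
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, padding=1, groups=cin),
        nn.Conv2d(cin, cout, 1),
    )

class FDEMSketch(nn.Module):
    def __init__(self, channels: int, fusion_factory, esa_factory, split: float = 0.5):
        super().__init__()
        keep = int(channels * split)
        distill = channels - keep
        self.keep, self.distill = keep, distill
        # assumed: each fusion layer maps the kept part back to `channels`
        # so that the same channel split can precede every fusion layer
        self.fusions = nn.ModuleList([fusion_factory(keep, channels) for _ in range(3)])
        self.distill_convs = nn.ModuleList([dsconv(distill, distill) for _ in range(3)])
        self.branch_tail = dsconv(channels, channels)          # DSC closing the first branch
        self.fuse = dsconv(channels + 3 * distill, channels)   # DSC after the Concat layer
        self.esa = esa_factory(channels)                       # enhanced spatial attention layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        distilled, main = [], x
        for fusion, dconv in zip(self.fusions, self.distill_convs):
            part_keep, part_distill = torch.split(main, [self.keep, self.distill], dim=1)
            distilled.append(dconv(part_distill))   # distilled features set aside
            main = fusion(part_keep)                # refined features continue along the branch
        main = self.branch_tail(main)
        out = self.fuse(torch.cat([main] + distilled, dim=1))
        return self.esa(out) + x                    # residual connection with the module input
```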
The full-scale feature fusion layer comprises a first local feature extraction module and a recursive Transformer module based on the anchored stripe attention mechanism, wherein the first local feature extraction module comprises 3 convolution layers of size 1×1, 1 depthwise separable convolution layer of size 3×3, a ReLU activation function and channel attention; the input features of the full-scale feature fusion layer sequentially pass through the 1st 1×1 convolution layer, the 3×3 depthwise separable convolution layer, the ReLU activation function, the channel attention and the 2nd 1×1 convolution layer, and the output features of the 2nd 1×1 convolution layer are added to the input features of the full-scale feature fusion layer before being input into the 3rd 1×1 convolution layer;
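For illustration only, a minimal PyTorch sketch of the first local feature extraction path of the full-scale feature fusion layer follows, mirroring the order given above (1×1 convolution, 3×3 depthwise separable convolution, ReLU, channel attention, 2nd 1×1 convolution, residual addition with the layer input, 3rd 1×1 convolution); the squeeze-and-excitation form of the channel attention and the channel widths are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttentionSketch(nn.Module):
    # assumed squeeze-and-excitation style channel attention
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)

class LocalFeatureExtractionSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 1)    # 1st 1x1 convolution layer
        self.dsconv = nn.Sequential(                     # 3x3 depthwise separable convolution layer
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
        )
        self.act = nn.ReLU(inplace=True)
        self.ca = ChannelAttentionSketch(channels)
        self.conv2 = nn.Conv2d(channels, channels, 1)    # 2nd 1x1 convolution layer
        self.conv3 = nn.Conv2d(channels, channels, 1)    # 3rd 1x1 convolution layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv2(self.ca(self.act(self.dsconv(self.conv1(x)))))
        return self.conv3(y + x)   # residual addition, then the 3rd 1x1 convolution
```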
the Transformer module based on the anchored stripe attention mechanism comprises a channel attention module, an anchored stripe attention module, a second local feature extraction module and a multi-layer perceptron, wherein the anchored stripe attention module is located on the upper branch of the Transformer module based on the anchored stripe attention mechanism, and the channel attention module and the second local feature extraction module are located on the lower branch of the Transformer module based on the anchored stripe attention mechanism; the second local feature extraction module comprises two depthwise separable convolution layers with a convolution kernel size of 3×3 and a ReLU activation function connected in sequence; the calculation process of the Transformer module based on the anchored stripe attention mechanism is as follows:
F_out = MLP(F_in + concatenate(ASA(split(F_in))) + CA(ELE(F_in)));
wherein F_in and F_out respectively represent the input and the output of the Transformer module based on the anchored stripe attention mechanism; MLP(·) represents the multi-layer perceptron; ASA(·) represents the anchored stripe attention module; CA(·) represents the channel attention module; ELE(·) represents the second local feature extraction module; split(·) means that the input is cut into blocks of the same size; concatenate(·) means that the results obtained from the preceding blocks are spliced together; the internal operation of the anchored stripe attention module is defined as:
Y = M_e·Z = M_e·(M_d·V_0);
wherein Layernormal denotes the layer normalization operation; M_e denotes the self-attention map of the horizontal stripes and M_d denotes the self-attention map of the vertical stripes; Q, K and V are respectively the query, key and value matrices corresponding to the vectors input into the anchored stripe attention module; A is an intermediate matrix and d is the dimension of the input vectors; Avgpool(·) denotes average pooling; X is the set of all input vectors of the anchored stripe attention module and X̄ is the result of X after layer normalization; Z denotes the result of multiplying the matrices M_d and V_0; Y is the set of all output vectors of the anchored stripe attention module; Softmax is an activation function;
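For illustration only, a minimal PyTorch sketch of the block-level computation F_out = MLP(F_in + concatenate(ASA(split(F_in))) + CA(ELE(F_in))) follows. The anchored stripe attention is abstracted as a factory producing modules that map a feature map to a feature map (for example, a wrapper around the earlier attention sketch); splitting into equal channel groups, the placement of the ReLU inside the second local feature extraction module, the channel-attention form, and the MLP realized as 1×1 convolutions are assumptions, not the claimed design.

```python
import torch
import torch.nn as nn

class ASATransformerBlockSketch(nn.Module):
    def __init__(self, channels: int, asa_factory, num_splits: int = 2, expansion: int = 2):
        super().__init__()
        assert channels % num_splits == 0
        group = channels // num_splits
        self.num_splits = num_splits
        # upper branch: anchored stripe attention applied to each split block
        self.asa = nn.ModuleList([asa_factory(group) for _ in range(num_splits)])
        # lower branch: ELE = two 3x3 depthwise separable convolutions with a ReLU
        self.ele = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
        )
        # lower branch: CA = simple gated channel attention
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        # MLP realized with 1x1 convolutions
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 1),
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, 1),
        )

    def forward(self, f_in: torch.Tensor) -> torch.Tensor:
        parts = torch.chunk(f_in, self.num_splits, dim=1)                      # split(F_in)
        upper = torch.cat([asa(p) for asa, p in zip(self.asa, parts)], dim=1)  # concatenate(ASA(...))
        lower = self.ele(f_in)
        lower = lower * self.ca(lower)                                         # CA(ELE(F_in))
        return self.mlp(f_in + upper + lower)                                  # F_out
```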
the reconstruction module is configured to input the low-resolution image into the first convolution layer to obtain a first feature map; the first feature map passes through H feature distillation extraction modules connected in series, the output of each feature distillation extraction module is transmitted to the second convolution layer, and the result then passes through the third convolution layer to obtain a second feature map; the second feature map is added to the first feature map to obtain a final feature map, and the final feature map is input into the up-sampling module to reconstruct a high-resolution image.
4. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-2.
5. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-2.
CN202311475299.1A 2023-11-08 2023-11-08 Full-scale feature refinement lightweight image super-resolution method and device Active CN117196960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311475299.1A CN117196960B (en) 2023-11-08 2023-11-08 Full-scale feature refinement lightweight image super-resolution method and device

Publications (2)

Publication Number Publication Date
CN117196960A (en) 2023-12-08
CN117196960B (en) 2024-03-01

Family

ID=88989189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311475299.1A Active CN117196960B (en) 2023-11-08 2023-11-08 Full-scale feature refinement lightweight image super-resolution method and device

Country Status (1)

Country Link
CN (1) CN117196960B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422614B (en) * 2023-12-19 2024-03-12 Huaqiao University Single-frame image super-resolution method and device based on hybrid feature interaction Transformer

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240580A (en) * 2021-04-09 2021-08-10 Jinan University Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN115082306A (en) * 2022-05-10 2022-09-20 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Image super-resolution method based on blueprint separable residual error network
WO2022203886A1 (en) * 2021-03-24 2022-09-29 Microsoft Technology Licensing, Llc Super resolution for satellite images
CN115953294A (en) * 2022-11-22 2023-04-11 Xiangtan University Single-image super-resolution reconstruction method based on shallow channel separation and aggregation
CN116152062A (en) * 2022-12-26 2023-05-23 Chongqing University Lightweight super-resolution reconstruction method
CN116485654A (en) * 2023-05-06 2023-07-25 Dalian University Lightweight single-image super-resolution reconstruction method combining convolutional neural network and Transformer

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Attention Network with Information Distillation for Super-Resolution; Huaijuan Zan et al.; full text *
Image super-resolution reconstruction with a hierarchical feature fusion attention network; Lei Pengcheng; Liu Cong; Tang Jiangang; Peng Dunlu; Journal of Image and Graphics (09); full text *
Video super-resolution reconstruction based on attention residual convolutional network; Dong Meng; Wu Ge; Cao Hongyu; Jing Wenbo; Yu Hongyang; Journal of Changchun University of Science and Technology (Natural Science Edition) (01); full text *
Super-resolution reconstruction of infrared inspection images fusing residual dense and generative adversarial networks; Liu Zhijian et al.; Journal of Kunming University of Science and Technology (Natural Science Edition); full text *

Similar Documents

Publication Publication Date Title
CN113240580B (en) Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN110599401A (en) Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
CN117196960B (en) Full-scale feature refinement lightweight image super-resolution method and device
CN110782395B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111832570A (en) Image semantic segmentation model training method and system
Li et al. Hst: Hierarchical swin transformer for compressed image super-resolution
CN117237197B (en) Image super-resolution method and device based on cross attention mechanism
CN112132770A (en) Image restoration method and device, computer readable medium and electronic equipment
CN115170915A (en) Infrared and visible light image fusion method based on end-to-end attention network
CN116596846A (en) Image segmentation method, image segmentation model construction method, device and medium
CN112419179A (en) Method, device, equipment and computer readable medium for repairing image
CN116547694A (en) Method and system for deblurring blurred images
Zheng et al. Double-branch dehazing network based on self-calibrated attentional convolution
CN116797461A (en) Binocular image super-resolution reconstruction method based on multistage attention-strengthening mechanism
CN116468605A (en) Video super-resolution reconstruction method based on time-space layered mask attention fusion
CN115082306A (en) Image super-resolution method based on blueprint separable residual error network
Zhang et al. LiteEnhanceNet: A lightweight network for real-time single underwater image enhancement
Wang et al. An efficient swin transformer-based method for underwater image enhancement
CN117455770A (en) Lightweight image super-resolution method based on layer-by-layer context information aggregation network
CN116778539A (en) Human face image super-resolution network model based on attention mechanism and processing method
CN114596203A (en) Method and apparatus for generating images and for training image generation models
CN114493971A (en) Media data conversion model training and digital watermark embedding method and device
CN112419216A (en) Image interference removing method and device, electronic equipment and computer readable storage medium
CN117422614B (en) Single-frame image super-resolution method and device based on hybrid feature interaction Transformer
CN112686807A (en) Image super-resolution reconstruction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant