CN117196960A - Full-scale feature refinement lightweight image super-resolution method and device - Google Patents
- Publication number: CN117196960A
- Application number: CN202311475299.1A
- Authority: CN (China)
- Legal status: Granted
Classifications
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a full-scale feature refinement lightweight image super-resolution method and device, relating to the field of image processing. The method comprises: constructing and training a full-scale feature refinement lightweight image super-resolution model; inputting a low-resolution image into the trained model, where a first convolution layer produces a first feature map, K serially connected feature distillation extraction modules each transmit their output to a second convolution layer, and a third convolution layer produces a second feature map; adding the second feature map to the first feature map to obtain a final feature map; and inputting the final feature map into an up-sampling module to reconstruct a high-resolution image. The method solves the problem that the feature information extracted by existing super-resolution models is overly homogeneous, and removes redundant features through distillation so that the model is lighter.
Description
Technical Field
The invention relates to the field of image processing, in particular to a full-scale feature refinement lightweight image super-resolution method and device.
Background
Single-image super-resolution (SISR) is widely used in the field of computer vision, for example in medical imaging, video surveillance, remote sensing and video transmission. SISR generates a corresponding high-resolution (HR) image from an existing low-resolution (LR) image through software processing. With the development of deep learning, methods based on convolutional neural networks (CNNs) have far surpassed traditional interpolation algorithms: they learn a more accurate mapping from HR-LR image-block pairs, and the reconstructed HR images are of higher quality. CNN-based methods are therefore the mainstream approach to single-image super-resolution research at the current stage.
Super-resolution methods based on deep learning can be roughly divided into two categories. The first is based on generative adversarial networks (GANs). By optimizing a perceptual loss, the generated HR images better match subjective human visual perception. However, the PSNR and SSIM scores of HR images reconstructed by such algorithms are low, and their detail textures differ greatly from those of the original image, so their shortcomings in practical applications are obvious.
The second category attaches more importance to the details and texture features of the reconstructed image, and its objective metrics are higher than those of the first. However, this type of approach still has several problems. First, to improve reconstruction quality, a large number of modules often must be stacked in the model to increase network depth, which makes training difficult and time-consuming. Second, because some super-resolution algorithms give insufficient consideration to the design of the feature extraction module, the depth features they extract are weak. In addition, these models lack the ability to adaptively distinguish important features from secondary ones, treating all feature information in the image equally, which directly degrades the high-frequency content of the reconstructed image.
Disclosure of Invention
To solve the technical problems mentioned in the Background section, embodiments of the present application provide a full-scale feature refinement lightweight image super-resolution method and device, which address the problem that the feature information extracted by classical super-resolution models is overly homogeneous, supply local, regional and global full-scale feature information, and remove redundant features through distillation to make the model lighter.
In a first aspect, the present invention provides a full-scale feature refinement lightweight image super-resolution method, comprising the steps of:
acquiring a low-resolution image to be reconstructed;
constructing and training a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module;
the method comprises the steps of inputting a low-resolution image into a trained full-scale feature refinement lightweight image super-resolution model, inputting the low-resolution image into a first convolution layer to obtain a first feature image, transmitting the output of each feature distillation extraction module to a second convolution layer through K feature distillation extraction modules connected in series, obtaining a second feature image through a third convolution layer, adding the second feature image with the first feature image to obtain a final feature image, inputting the final feature image into an up-sampling module, and reconstructing to obtain a high-resolution image.
Preferably, the calculation process of the full-scale feature refinement lightweight image super-resolution model is as follows:
$$F = H_{3\times 3}(I_{LR})$$

$$F_k = H_{FDEM}^{k}(F_{k-1}),\quad k = 1,2,\dots,K,\quad F_0 = F$$

$$F_{out} = H_{3\times 3}\big(H_{1\times 1}([F_1, F_2, \dots, F_K])\big) + F$$

$$I_{SR} = U_{pix}(F_{out})$$

wherein $I_{LR}$ denotes the input low-resolution image, $I_{SR}$ denotes the reconstructed high-resolution image, $U_{pix}$ denotes the sub-pixel convolution up-sampling operation in the up-sampling module, $H_{3\times 3}$ denotes a convolution operation with a kernel size of $3\times 3$, $H_{1\times 1}$ denotes a convolution operation with a kernel size of $1\times 1$, $[\cdot]$ denotes channel-wise splicing, $H_{FDEM}^{k}$ denotes the operation of the feature distillation extraction module, and $F_k$ denotes the output of the $k$-th feature distillation extraction module.
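For illustration only (not part of the claims), the following is a minimal PyTorch-style sketch of this computation. The channel width, the number of modules K, the scale factor and the placeholder FDEM body are assumptions; a fuller sketch of the feature distillation extraction module is given in the detailed description below.

```python
import torch
import torch.nn as nn

class FDEM(nn.Module):
    """Placeholder for the feature distillation extraction module;
    a fuller sketch appears later in the detailed description."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.body(x)

class FullScaleSRNet(nn.Module):
    """Pipeline sketch: 3x3 conv -> K serial FDEMs -> 1x1 fuse ->
    3x3 conv -> global residual -> sub-pixel up-sampling."""
    def __init__(self, channels=48, k=4, scale=4):
        super().__init__()
        self.conv_first = nn.Conv2d(3, channels, 3, padding=1)       # first convolution layer
        self.fdems = nn.ModuleList([FDEM(channels) for _ in range(k)])
        self.conv_fuse = nn.Conv2d(k * channels, channels, 1)        # second convolution layer
        self.conv_ctx = nn.Conv2d(channels, channels, 3, padding=1)  # third convolution layer
        self.upsample = nn.Sequential(                               # sub-pixel convolution
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, lr):
        f = self.conv_first(lr)                  # first feature map F
        outs, x = [], f
        for m in self.fdems:                     # K serial distillation modules
            x = m(x)
            outs.append(x)                       # every module output is kept
        f_out = self.conv_ctx(self.conv_fuse(torch.cat(outs, dim=1))) + f
        return self.upsample(f_out)              # reconstructed high-resolution image

# usage: FullScaleSRNet()(torch.randn(1, 3, 32, 32)) -> tensor of shape (1, 3, 128, 128)
```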
Preferably, the feature distillation extraction module comprises 3 full-scale feature fusion layers, 5 depth separable convolution layers, an enhanced spatial attention layer and a Concat layer, wherein the 3 full-scale feature fusion layers and 1 depth separable convolution layer with a convolution kernel size of 3×3 are serially connected in turn to form a first branch; 3 depth separable convolution layers with a convolution kernel size of 3×3 in a second branch are respectively connected in parallel with the full-scale feature fusion layers in the first branch; the output of the first branch and of each depth separable convolution layer in the second branch is input into the Concat layer for splicing and then passes through 1 serially connected depth separable convolution layer with a convolution kernel size of 3×3 and the enhanced spatial attention layer; and the output of the enhanced spatial attention layer is connected with the input of the feature distillation extraction module by a residual connection to obtain the output of the feature distillation extraction module.
Preferably, the feature distillation extraction module inputs 3/4 channel features into the full scale feature fusion layer in the first branch and inputs 1/4 channel features into the depth separable convolution layer in the second branch.
Preferably, the full-scale feature fusion layer comprises a local feature extraction module and a recursive Transformer module based on an anchored stripe attention mechanism, wherein the local feature extraction module comprises 3 convolution layers with a convolution kernel size of 1×1, 1 depth separable convolution layer with a convolution kernel size of 3×3, a ReLU activation function and channel attention; the input features of the full-scale feature fusion layer sequentially pass through the 1st 1×1 convolution layer, the 3×3 depth separable convolution layer, the ReLU activation function, the channel attention and the 2nd 1×1 convolution layer; and the output of the 2nd 1×1 convolution layer is added to the input features of the full-scale feature fusion layer and then input into the 3rd 1×1 convolution layer.
Preferably, the Transformer module based on the anchored stripe attention mechanism is connected to the 3rd 1×1 convolution layer, and the output of the Transformer module based on the anchored stripe attention mechanism is input into the module again using a recursive mechanism.
Preferably, the Transformer module based on the anchored stripe attention mechanism comprises a channel attention module, an anchored stripe attention module, an efficient local feature extraction module and a multi-layer perceptron, wherein the anchored stripe attention module is located on the upper branch of the Transformer module, the channel attention module and the efficient local feature extraction module are located on the lower branch, and the efficient local feature extraction module comprises two serially connected depth separable convolution layers with a convolution kernel size of 3×3 and ReLU activation functions. The computation process of the Transformer module based on the anchored stripe attention mechanism is as follows:

$$X_{out} = \mathrm{MLP}\Big(\mathrm{Merge}\big(\mathrm{ASA}(\mathrm{Split}(X_{in}))\big) + \mathrm{ELE}\big(\mathrm{CA}(X_{in})\big) + X_{in}\Big)$$

wherein $X_{in}$ and $X_{out}$ respectively denote the input and output of the Transformer module based on the anchored stripe attention mechanism; $\mathrm{MLP}$ denotes the multi-layer perceptron; $\mathrm{ASA}$ denotes the anchored stripe attention module; $\mathrm{CA}$ denotes the channel attention module; $\mathrm{ELE}$ denotes the efficient local feature extraction module; $\mathrm{Split}$ denotes cutting the input into blocks of the same size; and $\mathrm{Merge}$ denotes splicing the results obtained for each block. The internal operation of the anchored stripe attention module is defined as:

$$\hat{X} = \mathrm{LN}(X)$$

$$Q = W_{d}^{Q} W_{p}^{Q} \hat{X}, \quad K = W_{d}^{K} W_{p}^{K} \hat{X}, \quad V = W_{d}^{V} W_{p}^{V} \hat{X}$$

$$A = \mathrm{Pool}\big(W_{d}^{A} W_{p}^{A} \hat{X}\big)$$

$$M_h = \mathrm{Softmax}\big(Q A^{\mathsf T} / \sqrt{d}\big)$$

$$M_v = \mathrm{Softmax}\big(A K^{\mathsf T} / \sqrt{d}\big)$$

$$Y = M_h M_v V$$

wherein $\mathrm{LN}$ is the layer normalization operation; $M_h$ denotes the self-attention map of the horizontal stripe and $M_v$ the self-attention map of the vertical stripe; $N$ is the number of input vectors and $M$ the number of anchor vectors, with $M \ll N$; $Q$, $K$, $V \in \mathbb{R}^{N\times d}$ are respectively the query, key and value matrices corresponding to the vectors input into the anchored stripe attention module; $A \in \mathbb{R}^{M\times d}$ is the intermediate anchor matrix; $d$ is the dimension of the input vectors; $\mathrm{Pool}$ denotes average pooling; $W_{p}$ denotes $1\times 1$ point-wise convolution and $W_{d}$ denotes depth separable convolution with a kernel size of $3\times 3$; $X$ is the set of all vectors input into the anchored stripe attention module and $\hat{X}$ is the result of $X$ after layer normalization; $M_h M_v$ denotes the result of multiplying matrix $M_h$ by matrix $M_v$; $Y$ is the set of all vectors output by the anchored stripe attention module; and $\mathrm{Softmax}$ is the activation function.
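A minimal single-head PyTorch sketch of this anchored attention, for illustration: the stripe partitioning and the convolutional Q/K/V projections of the module are simplified to one linear layer on token sequences, and the anchor pooling ratio is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnchoredStripeAttention(nn.Module):
    """Single-head sketch: the anchor A is an average-pooled (shorter)
    sequence, so attention is factored into an (N x M) map and an
    (M x N) map instead of one (N x N) map."""
    def __init__(self, dim, pool=4):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)   # stands in for the 1x1 + depthwise convs
        self.pool = pool                     # anchor down-sampling ratio (assumed)

    def forward(self, x):                    # x: (B, N, d), already layer-normalized
        d = x.shape[-1]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        a = F.avg_pool1d(k.transpose(1, 2), self.pool).transpose(1, 2)  # (B, M, d)
        m_h = torch.softmax(q @ a.transpose(1, 2) / d ** 0.5, dim=-1)   # (B, N, M)
        m_v = torch.softmax(a @ k.transpose(1, 2) / d ** 0.5, dim=-1)   # (B, M, N)
        return m_h @ (m_v @ v)               # O(N*M*d) rather than O(N^2*d)
```

Evaluating `m_v @ v` first keeps every intermediate tensor at size N×M or M×d, which is where the claimed memory saving over ordinary N×N self-attention comes from.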
In a second aspect, the present invention provides a full-scale feature refinement lightweight image super-resolution apparatus, comprising:
a data acquisition module configured to acquire a low resolution image to be reconstructed;
the model construction module is configured to construct and train a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module;
The reconstruction module is configured to input the low-resolution image into the trained full-scale feature refinement lightweight image super-resolution model, wherein the low-resolution image is input into the first convolution layer to obtain a first feature map; the first feature map passes through the K serially connected feature distillation extraction modules, and the output of each feature distillation extraction module is transmitted to the second convolution layer; a second feature map is obtained through the third convolution layer; the second feature map is added to the first feature map to obtain a final feature map; and the final feature map is input into the up-sampling module to reconstruct a high-resolution image.
In a third aspect, the present invention provides an electronic device comprising one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
(1) According to the full-scale feature refinement lightweight image super-resolution method provided by the invention, the feature distillation extraction module, the full-scale feature fusion layer and the anchored stripe attention mechanism are introduced into the full-scale feature refinement lightweight image super-resolution model; redundant features are removed to make the model lighter, while the model's ability to extract local and global features is enhanced, so that the network can adaptively identify features and assign different weights to different types of features.
(2) Compared with the original classical single-frame super-resolution method, the full-scale feature refinement lightweight image super-resolution method provided by the invention can greatly improve the reconstruction performance of a network model, realize further restoration of texture details of a reconstructed image and effectively reduce the number of model parameters.
(3) The full-scale feature refinement lightweight image super-resolution method provided by the invention solves the problem that the feature information extracted by classical super-resolution models is overly homogeneous, provides local, regional and global full-scale feature information, and removes redundant features through distillation so that the model is lighter.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary device architecture diagram to which an embodiment of the present application may be applied;
FIG. 2 is a flow chart of a full-scale feature refinement lightweight image super-resolution method according to an embodiment of the application;
FIG. 3 is a schematic diagram of a full-scale feature refinement lightweight image super-resolution model of a full-scale feature refinement lightweight image super-resolution method of an embodiment of the application;
FIG. 4 is a schematic diagram of the Transformer module based on the anchored stripe attention mechanism of the full-scale feature refinement lightweight image super-resolution method of an embodiment of the present application;
FIG. 5 is a schematic diagram of the anchored stripe attention module of the full-scale feature refinement lightweight image super-resolution method of an embodiment of the application;
FIG. 6 is a schematic diagram of a full-scale feature fusion layer of a full-scale feature refinement lightweight image super-resolution method of an embodiment of the application;
FIG. 7 is a schematic diagram of a feature distillation extraction module of a full-scale feature refinement lightweight image super-resolution method according to an embodiment of the application;
FIG. 8 is a schematic diagram of a full-scale feature refinement lightweight image super-resolution device according to an embodiment of the application;
fig. 9 is a schematic structural view of a computer device suitable for use in an electronic apparatus for implementing an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
FIG. 1 illustrates an exemplary device architecture 100 for a full-scale feature refinement lightweight image super-resolution method or full-scale feature refinement lightweight image super-resolution device to which embodiments of the application may be applied.
As shown in fig. 1, the apparatus architecture 100 may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102 and the third terminal device 103, to receive or send messages and the like. Various applications, such as data processing applications and file processing applications, may be installed on the first terminal device 101, the second terminal device 102 and the third terminal device 103.
The first terminal device 101, the second terminal device 102 and the third terminal device 103 may be hardware or software. When the first terminal device 101, the second terminal device 102, and the third terminal device 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the first terminal apparatus 101, the second terminal apparatus 102, and the third terminal apparatus 103 are software, they can be installed in the above-listed electronic apparatuses. Which may be implemented as multiple software or software modules (e.g., software or software modules for providing distributed services) or as a single software or software module. The present application is not particularly limited herein.
The server 105 may be a server that provides various services, such as a background data processing server that processes files or data uploaded by the first terminal device 101, the second terminal device 102 and the third terminal device 103. The background data processing server can process the acquired files or data to generate a processing result.
It should be noted that, the full-scale feature refinement lightweight image super-resolution method provided by the embodiment of the present application may be executed by the server 105, or may be executed by the first terminal device 101, the second terminal device 102, or the third terminal device 103, and accordingly, the full-scale feature refinement lightweight image super-resolution device may be set in the server 105, or may be set in the first terminal device 101, the second terminal device 102, or the third terminal device 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the processed data does not need to be acquired from a remote location, the above-described apparatus architecture may not include a network, but only a server or terminal device.
FIG. 2 shows a full-scale feature refinement lightweight image super-resolution method provided by an embodiment of the application, comprising the steps of:
s1, acquiring a low-resolution image to be reconstructed.
Specifically, a low-resolution image to be reconstructed is collected so that it can be used as the input of the full-scale feature refinement lightweight image super-resolution model.
S2, constructing and training a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module.
In a specific embodiment, the calculation process of the full-scale feature refinement lightweight image super-resolution model is as follows:
$$F = H_{3\times 3}(I_{LR})$$

$$F_k = H_{FDEM}^{k}(F_{k-1}),\quad k = 1,2,\dots,K,\quad F_0 = F$$

$$F_{out} = H_{3\times 3}\big(H_{1\times 1}([F_1, F_2, \dots, F_K])\big) + F$$

$$I_{SR} = U_{pix}(F_{out})$$

wherein $I_{LR}$ denotes the input low-resolution image, $I_{SR}$ denotes the reconstructed high-resolution image, $U_{pix}$ denotes the sub-pixel convolution up-sampling operation in the up-sampling module, $H_{3\times 3}$ denotes a convolution operation with a kernel size of $3\times 3$, $H_{1\times 1}$ denotes a convolution operation with a kernel size of $1\times 1$, $[\cdot]$ denotes channel-wise splicing, $H_{FDEM}^{k}$ denotes the operation of the feature distillation extraction module, and $F_k$ denotes the output of the $k$-th feature distillation extraction module.
Specifically, referring to fig. 3, a full-scale feature refinement lightweight image super-resolution model is constructed. The model is a single-frame image super-resolution network based on feature distillation and consists of K feature distillation extraction modules, 3 convolution layers and an up-sampling module based on sub-pixel convolution. The input low-resolution image is first processed by a convolution layer with a kernel size of 3×3 to obtain the first feature map F. The first feature map F is input into the feature distillation extraction modules for further processing: the K feature distillation extraction modules are connected in series, and the output of each feature distillation extraction module is transmitted to a final convolution layer with a kernel size of 1×1, which reduces the number of channels of the spliced outputs; a convolution layer with a kernel size of 3×3 then captures local context information to obtain the second feature map. The second feature map is added to the first feature map F to obtain the final feature map $F_{out}$, and the up-sampling module finally performs up-sampling to obtain the output high-resolution image.
S3, inputting the low-resolution image into the trained full-scale feature refinement lightweight image super-resolution model, where the low-resolution image is input into the first convolution layer to obtain a first feature map; the first feature map passes through the K serially connected feature distillation extraction modules, and the output of each feature distillation extraction module is transmitted to the second convolution layer; a second feature map is obtained through the third convolution layer; the second feature map is added to the first feature map to obtain a final feature map; and the final feature map is input into the up-sampling module to reconstruct a high-resolution image.
In a specific embodiment, the feature distillation extraction module comprises 3 full-scale feature fusion layers, 5 depth separable convolution layers, an enhanced spatial attention layer and a Concat layer, wherein the 3 full-scale feature fusion layers and 1 depth separable convolution layer with a convolution kernel size of 3×3 are serially connected in turn to form a first branch; 3 depth separable convolution layers with a convolution kernel size of 3×3 in a second branch are respectively connected in parallel with the full-scale feature fusion layers in the first branch; the output of the first branch and of each depth separable convolution layer in the second branch is input into the Concat layer for splicing and then passes through 1 serially connected depth separable convolution layer with a convolution kernel size of 3×3 and the enhanced spatial attention layer; and the output of the enhanced spatial attention layer is connected with the input of the feature distillation extraction module by a residual connection to obtain the output of the feature distillation extraction module.
In a specific embodiment, the feature distillation extraction module inputs 3/4 channel features into the full scale feature fusion layer in the first branch and 1/4 channel features into the depth separable convolution layer in the second branch.
In a specific embodiment, the full-scale feature fusion layer comprises a local feature extraction module and a recursive Transformer module based on an anchored stripe attention mechanism, wherein the local feature extraction module comprises 3 convolution layers with a convolution kernel size of 1×1, 1 depth separable convolution layer with a convolution kernel size of 3×3, a ReLU activation function and channel attention; the input features of the full-scale feature fusion layer sequentially pass through the 1st 1×1 convolution layer, the 3×3 depth separable convolution layer, the ReLU activation function, the channel attention and the 2nd 1×1 convolution layer; and the output of the 2nd 1×1 convolution layer is added to the input features of the full-scale feature fusion layer and then input into the 3rd 1×1 convolution layer.
In a specific embodiment, the Transformer module based on the anchored stripe attention mechanism is connected to the 3rd 1×1 convolution layer, and the output of the Transformer module based on the anchored stripe attention mechanism is input into the module again using a recursive mechanism.
In a specific embodiment, the Transformer module based on the anchored stripe attention mechanism comprises a channel attention module, an anchored stripe attention module, an efficient local feature extraction module and a multi-layer perceptron, wherein the anchored stripe attention module is located on the upper branch of the Transformer module, the channel attention module and the efficient local feature extraction module are located on the lower branch, and the efficient local feature extraction module comprises two serially connected depth separable convolution layers with a convolution kernel size of 3×3 and ReLU activation functions. The computation process of the Transformer module based on the anchored stripe attention mechanism is as follows:

$$X_{out} = \mathrm{MLP}\Big(\mathrm{Merge}\big(\mathrm{ASA}(\mathrm{Split}(X_{in}))\big) + \mathrm{ELE}\big(\mathrm{CA}(X_{in})\big) + X_{in}\Big)$$

wherein $X_{in}$ and $X_{out}$ respectively denote the input and output of the Transformer module based on the anchored stripe attention mechanism; $\mathrm{MLP}$ denotes the multi-layer perceptron; $\mathrm{ASA}$ denotes the anchored stripe attention module; $\mathrm{CA}$ denotes the channel attention module; $\mathrm{ELE}$ denotes the efficient local feature extraction module; $\mathrm{Split}$ denotes cutting the input into blocks of the same size; and $\mathrm{Merge}$ denotes splicing the results obtained for each block. The internal operation of the anchored stripe attention module is defined as:

$$\hat{X} = \mathrm{LN}(X)$$

$$Q = W_{d}^{Q} W_{p}^{Q} \hat{X}, \quad K = W_{d}^{K} W_{p}^{K} \hat{X}, \quad V = W_{d}^{V} W_{p}^{V} \hat{X}$$

$$A = \mathrm{Pool}\big(W_{d}^{A} W_{p}^{A} \hat{X}\big)$$

$$M_h = \mathrm{Softmax}\big(Q A^{\mathsf T} / \sqrt{d}\big)$$

$$M_v = \mathrm{Softmax}\big(A K^{\mathsf T} / \sqrt{d}\big)$$

$$Y = M_h M_v V$$

wherein $\mathrm{LN}$ is the layer normalization operation; $M_h$ denotes the self-attention map of the horizontal stripe and $M_v$ the self-attention map of the vertical stripe; $N$ is the number of input vectors and $M$ the number of anchor vectors, with $M \ll N$; $Q$, $K$, $V \in \mathbb{R}^{N\times d}$ are respectively the query, key and value matrices corresponding to the vectors input into the anchored stripe attention module; $A \in \mathbb{R}^{M\times d}$ is the intermediate anchor matrix; $d$ is the dimension of the input vectors; $\mathrm{Pool}$ denotes average pooling; $W_{p}$ denotes $1\times 1$ point-wise convolution and $W_{d}$ denotes depth separable convolution with a kernel size of $3\times 3$; $X$ is the set of all vectors input into the anchored stripe attention module and $\hat{X}$ is the result of $X$ after layer normalization; $M_h M_v$ denotes the result of multiplying matrix $M_h$ by matrix $M_v$; $Y$ is the set of all vectors output by the anchored stripe attention module; and $\mathrm{Softmax}$ is the activation function.
Specifically, referring to FIG. 4, the Transformer module based on the anchored stripe attention mechanism is constructed. The module consists of a channel attention module (Channel Attention Module, CA), an anchored stripe self-attention module (Anchored Stripe Attention, ASA), an efficient local feature extraction module (Efficient Local feature Extraction module, ELE) and a multi-layer perceptron (MLP). The anchored stripe attention module is located on the upper branch of the Transformer module, the channel attention module and the efficient local feature extraction module are located on the lower branch, and the multi-layer perceptron is located at the end of the module. The Transformer module realizes global and regional modeling of features through the anchored stripe attention mechanism, and realizes local modeling of features through 2 depth separable convolutions with a kernel size of 3×3, the ReLU activation function and the channel self-attention mechanism. It thereby fully exploits feature information of the image at different scales and truly realizes full-scale feature modeling, which greatly improves image reconstruction quality.
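A sketch of this block, for illustration only: `AnchoredStripeAttention` is the earlier sketch; the SE-style body of the channel attention, the MLP expansion ratio and the omission of the Split/Merge block partition are assumptions.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Assumed SE-style channel attention used on the lower branch."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)

class AnchorTransformerBlock(nn.Module):
    """Upper branch: anchored stripe attention on tokens; lower branch:
    channel attention then two 3x3 depthwise convs + ReLU (ELE);
    an MLP refines the merged result."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.asa = AnchoredStripeAttention(dim)
        self.ca = ChannelAttention(dim)
        self.ele = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim))
        self.mlp = nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(), nn.Linear(2 * dim, dim))

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)                # (B, H*W, C)
        upper = self.asa(self.norm(tokens))                  # global/regional modeling
        lower = self.ele(self.ca(x)).flatten(2).transpose(1, 2)  # local modeling
        y = self.mlp(upper + lower + tokens)
        return y.transpose(1, 2).reshape(b, c, h, w)
```

The single residual-free form mirrors the equation reconstructed above; a per-branch residual variant would be an equally plausible reading of FIG. 4.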
Referring to fig. 5, the advantages of the anchored stripe attention module (ASA) over the common multi-head self-attention module are as follows. First, the anchored stripe attention module initially applies a $1\times 1$ convolution to the input to aggregate pixel-level cross-channel context, and then uses a $3\times 3$ depth separable convolution to encode channel-level context; this procedure generates the $Q$, $K$, $V$ matrices. Second, the anchored stripe method is introduced: when the self-attention maps are calculated, the self-attention maps of the horizontal stripe and the vertical stripe are calculated separately, and the overall attention map is then obtained from these two maps. An intermediate anchor matrix $A$ is introduced in this process, and its parameter count is much smaller than that of $Q$, $K$, $V$. The new self-attention calculation process is $Y = \mathrm{Softmax}(QA^{\mathsf T}/\sqrt{d})\,\mathrm{Softmax}(AK^{\mathsf T}/\sqrt{d})\,V$, with computational complexity $O(NMd)$ and spatial complexity $O(NM)$, compared with the computational complexity $O(N^{2}d)$ and spatial complexity $O(N^{2})$ of the original self-attention calculation $\mathrm{Softmax}(QK^{\mathsf T}/\sqrt{d})\,V$. As an illustration, with $N = 4096$ tokens and $M = 256$ anchors, the attention maps shrink from roughly 16.8 million entries to about 1 million. Since $M \ll N$, the model is made significantly lighter.
Referring to fig. 6, a full-scale feature fusion layer is further constructed, which consists of a local feature extraction module and a recursive Transformer module based on the anchored stripe attention mechanism.
The local feature extraction module is composed of 3 convolution layers with a kernel size of 1×1, 1 depth separable convolution layer with a kernel size of 3×3, the ReLU activation function and channel attention, and can extract local features efficiently. The Transformer module based on the anchored stripe attention mechanism is introduced at the tail of the depth feature extraction module based on full-scale feature fusion in a recursive manner. The recursive mechanism means that the output of the module is fed into the module again as its input, repeated a specified number of times, so that the Transformer module can be fully trained without a large increase in GPU memory consumption or model parameters. At minimal model cost, this compensates for the insufficient receptive field of the preceding local feature extraction module in acquiring global information, giving the model both light weight and high performance.
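The full-scale feature fusion layer can accordingly be sketched as follows (recursion depth is an assumption; `ChannelAttention` and `AnchorTransformerBlock` are the sketches above):

```python
import torch.nn as nn

class FullScaleFusionLayer(nn.Module):
    """Local path: 1x1 conv -> 3x3 depthwise conv -> ReLU -> channel
    attention -> 1x1 conv, inner residual, 1x1 conv; then the anchored
    Transformer block is applied recursively with shared weights."""
    def __init__(self, dim, recursions=2):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, dim, 1)
        self.dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.act = nn.ReLU(inplace=True)
        self.ca = ChannelAttention(dim)
        self.conv2 = nn.Conv2d(dim, dim, 1)
        self.conv3 = nn.Conv2d(dim, dim, 1)
        self.block = AnchorTransformerBlock(dim)   # one instance, reused
        self.recursions = recursions

    def forward(self, x):
        y = self.ca(self.act(self.dw(self.conv1(x))))
        y = self.conv3(self.conv2(y) + x)          # inner residual, then the 3rd 1x1 conv
        for _ in range(self.recursions):           # recursive mechanism: output fed back in
            y = self.block(y)
        return y
```

Because the same `block` instance is called on each recursion, the loop adds computation but no parameters, matching the stated motivation.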
Referring to fig. 7, the full-scale feature fusion layer is introduced into the feature distillation extraction module as a component of its upper branch, completing the construction of the full-scale feature refinement lightweight image super-resolution model. The feature distillation extraction module consists of 3 full-scale feature fusion layers, 5 depth separable convolution layers, an enhanced spatial attention layer and a Concat layer. The first branch consists of the 3 full-scale feature fusion layers and 1 depth separable convolution layer with a kernel size of 3×3 connected in series; the second branch consists of 3 depth separable convolution layers with a kernel size of 3×3, each connected in parallel with a full-scale feature fusion layer of the first branch. The results of the two branches are spliced in the Concat layer and then pass through 1 serially connected depth separable convolution layer with a kernel size of 3×3 and the enhanced spatial attention layer; finally, the output of the enhanced spatial attention layer is summed with the input of the feature distillation extraction module transmitted by the residual connection to obtain the output of the feature distillation extraction module.
Further, the input of the feature distillation extraction module is divided into two parts that are processed by the two branches respectively: the 3/4 channel features are further refined by the full-scale feature fusion layers of the first branch, while the 1/4 channel features, as already fine features, are distilled by the second branch to remove redundant features, finally yielding more accurate high-frequency information for reconstructing the high-resolution image and reducing the training burden of the model. Before entering each full-scale feature fusion layer, the channels are split once into two parts: one part accounts for 3/4 of the original channel number (the coarse features, which are refined further), and the other part accounts for 1/4 of the original channel number (the fine features, which need no further processing).
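The distillation structure described above can be sketched as follows, for illustration only: the 64-channel width, the per-stage channel widths and the simple sigmoid-gated stand-in for the enhanced spatial attention layer (`ESA`) are assumptions that merely satisfy the 3/4 : 1/4 split; `FullScaleFusionLayer` is the sketch above.

```python
import torch
import torch.nn as nn

class ESA(nn.Module):
    """Stand-in gate for the enhanced spatial attention layer (assumed)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)

def dw3(c):
    return nn.Conv2d(c, c, 3, padding=1, groups=c)  # 3x3 depthwise convolution

class FDEM(nn.Module):
    """Per stage: split channels into 3/4 (coarse, refined further) and
    1/4 (fine, distilled by a parallel depthwise conv); concatenate all
    distilled slices plus the first-branch output, compress, apply ESA,
    then add the residual from the block input."""
    def __init__(self, dim=64):
        super().__init__()
        widths = [dim, dim * 3 // 4, dim * 9 // 16, dim * 27 // 64]   # 64, 48, 36, 27
        self.splits = [(widths[i + 1], widths[i] - widths[i + 1]) for i in range(3)]
        self.fusion = nn.ModuleList([FullScaleFusionLayer(c) for c, _ in self.splits])
        self.distill = nn.ModuleList([dw3(d) for _, d in self.splits])
        self.branch_tail = dw3(widths[3])    # DW conv closing the first branch
        self.fuse = dw3(dim)                 # DW conv after the Concat layer
        self.esa = ESA(dim)

    def forward(self, x):
        outs, feat = [], x
        for (keep, dist), fusion, dconv in zip(self.splits, self.fusion, self.distill):
            coarse, fine = torch.split(feat, [keep, dist], dim=1)
            outs.append(dconv(fine))         # 1/4 path: fine features, distilled
            feat = fusion(coarse)            # 3/4 path: coarse features, refined
        outs.append(self.branch_tail(feat))
        return self.esa(self.fuse(torch.cat(outs, dim=1))) + x   # residual to input
```

With dim = 64, the distilled slices have 16, 12 and 9 channels and the first branch ends at 27, so the Concat layer restores exactly 64 channels before the ESA gate.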
And finally, reconstructing the low-resolution image to be reconstructed by using the trained full-scale feature refinement lightweight image super-resolution model to obtain a reconstructed high-resolution image.
The labels S1-S3 above are step identifiers and do not by themselves prescribe a strict execution order between the steps.
With further reference to fig. 8, as an implementation of the method shown in the foregoing figures, the present application provides an embodiment of a full-scale feature refinement lightweight image super-resolution apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
The embodiment of the application provides a full-scale feature refinement lightweight image super-resolution device, which comprises:
a data acquisition module 1 configured to acquire a low resolution image to be reconstructed;
the model construction module 2 is configured to construct and train a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module;
the reconstruction module 3 is configured to input the low-resolution image into the trained full-scale feature refinement lightweight image super-resolution model, wherein the low-resolution image is input into the first convolution layer to obtain a first feature map; the first feature map passes through the K serially connected feature distillation extraction modules, and the output of each feature distillation extraction module is transmitted to the second convolution layer; a second feature map is obtained through the third convolution layer; the second feature map is added to the first feature map to obtain a final feature map; and the final feature map is input into the up-sampling module to reconstruct a high-resolution image.
Referring now to fig. 9, there is illustrated a schematic diagram of a computer apparatus 900 suitable for use in an electronic device (e.g., a server or terminal device as illustrated in fig. 1) for implementing an embodiment of the present application. The electronic device shown in fig. 9 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in fig. 9, the computer apparatus 900 includes a Central Processing Unit (CPU) 901 and a Graphics Processor (GPU) 902, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 903 or a program loaded from a storage section 909 into a Random Access Memory (RAM) 904. In the RAM 904, various programs and data required for the operation of the apparatus 900 are also stored. The CPU 901, GPU902, ROM 903, and RAM 904 are connected to each other by a bus 905. An input/output (I/O) interface 906 is also connected to bus 905.
The following components are connected to the I/O interface 906: an input section 907 including a keyboard, a mouse, and the like; an output section 908 including a display such as a liquid crystal display (LCD), a speaker, and the like; a storage section 909 including a hard disk or the like; and a communication section 910 including a network interface card such as a LAN card or a modem. The communication section 910 performs communication processing via a network such as the Internet. A drive 911 may also be connected to the I/O interface 906 as needed. A removable medium 912, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is installed on the drive 911 as needed, so that a computer program read out therefrom is installed into the storage section 909 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 910, and/or installed from the removable medium 912. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 901 and a Graphics Processor (GPU) 902.
It should be noted that the computer readable medium according to the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor apparatus or device, or a combination of any of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution apparatus or device. In the present application, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor.
As another aspect, the present application also provides a computer readable medium that may be contained in the electronic device described in the above embodiment, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a low-resolution image to be reconstructed; construct and train a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module; and input the low-resolution image into the trained full-scale feature refinement lightweight image super-resolution model, wherein the low-resolution image is input into the first convolution layer to obtain a first feature map, the first feature map passes through the K serially connected feature distillation extraction modules with the output of each feature distillation extraction module transmitted to the second convolution layer, a second feature map is obtained through the third convolution layer, the second feature map is added to the first feature map to obtain a final feature map, and the final feature map is input into the up-sampling module to reconstruct a high-resolution image.
The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the application referred to in the present application is not limited to the specific combinations of the technical features described above, but also covers other technical features formed by any combination of the technical features described above or their equivalents without departing from the inventive concept described above. Such as the above-mentioned features and the technical features disclosed in the present application (but not limited to) having similar functions are replaced with each other.
Claims (10)
1. A full-scale feature refinement lightweight image super-resolution method, characterized by comprising the following steps:
acquiring a low-resolution image to be reconstructed;
constructing and training a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module;
inputting the low-resolution image into the trained full-scale feature refinement lightweight image super-resolution model, wherein the low-resolution image is input into the first convolution layer to obtain a first feature map; the first feature map passes through the K serially connected feature distillation extraction modules, and the output of each feature distillation extraction module is transmitted to the second convolution layer; a second feature map is obtained through the third convolution layer; the second feature map is added to the first feature map to obtain a final feature map; and the final feature map is input into the up-sampling module and a high-resolution image is reconstructed.
2. The full-scale feature refinement lightweight image super-resolution method according to claim 1, wherein the calculation process of the full-scale feature refinement lightweight image super-resolution model is as follows:
$$F = H_{3\times 3}(I_{LR})$$

$$F_k = H_{FDEM}^{k}(F_{k-1}),\quad k = 1,2,\dots,K,\quad F_0 = F$$

$$F_{out} = H_{3\times 3}\big(H_{1\times 1}([F_1, F_2, \dots, F_K])\big) + F$$

$$I_{SR} = U_{pix}(F_{out})$$

wherein $I_{LR}$ denotes the input low-resolution image, $I_{SR}$ denotes the reconstructed high-resolution image, $U_{pix}$ denotes the sub-pixel convolution up-sampling operation in the up-sampling module, $H_{3\times 3}$ denotes a convolution operation with a convolution kernel size of $3\times 3$, $H_{1\times 1}$ denotes a convolution operation with a convolution kernel size of $1\times 1$, $[\cdot]$ denotes channel-wise splicing, $H_{FDEM}^{k}$ denotes the operation of the feature distillation extraction module, and $F_k$ denotes the output of the $k$-th feature distillation extraction module.
3. The full-scale feature refinement lightweight image super-resolution method of claim 1, wherein the feature distillation extraction module comprises 3 full-scale feature fusion layers, 5 depth separable convolution layers, an enhanced spatial attention layer and a Concat layer, wherein the 3 full-scale feature fusion layers and 1 depth separable convolution layer with a convolution kernel size of 3×3 are serially connected in sequence to form a first branch; 3 depth separable convolution layers with a convolution kernel size of 3×3 in a second branch are respectively connected in parallel with the full-scale feature fusion layers in the first branch; the output of the first branch and of each depth separable convolution layer in the second branch is input into the Concat layer for splicing and then sequentially passes through 1 serially connected depth separable convolution layer with a convolution kernel size of 3×3 and the enhanced spatial attention layer; and the output of the enhanced spatial attention layer is connected with the input of the feature distillation extraction module by a residual connection to obtain the output of the feature distillation extraction module.
4. A full-scale feature refinement lightweight image super-resolution method as claimed in claim 3, wherein the feature distillation extraction module inputs 3/4 channel features into the full-scale feature fusion layer in the first branch and 1/4 channel features into the depth separable convolution layer in the second branch.
5. The full-scale feature refinement lightweight image super-resolution method of claim 1, wherein the full-scale feature fusion layer comprises a local feature extraction module and a recursive Transformer module based on an anchored stripe attention mechanism, the local feature extraction module comprising 3 convolution layers with a convolution kernel size of 1×1, 1 depth separable convolution layer with a convolution kernel size of 3×3, a ReLU activation function and channel attention; the input features of the full-scale feature fusion layer sequentially pass through the 1st 1×1 convolution layer, the 3×3 depth separable convolution layer, the ReLU activation function, the channel attention and the 2nd 1×1 convolution layer; and the output of the 2nd 1×1 convolution layer is added to the input features of the full-scale feature fusion layer and then input into the 3rd 1×1 convolution layer.
6. The full-scale feature refinement lightweight image super-resolution method of claim 5, wherein the Transformer module based on the anchored stripe attention mechanism is connected to the 3rd 1×1 convolution layer, and the output of the Transformer module based on the anchored stripe attention mechanism is input into the module again using a recursive mechanism.
7. The full-scale feature refinement lightweight image super-resolution method of claim 5, wherein the anchor frame attention mechanism-based Transformer module comprises a channel attention module, an anchor frame attention module, an efficient local feature extraction module and a multi-layer perceptron, wherein the anchor frame attention module is located at an upper branch of the anchor frame attention mechanism-based Transformer module, the channel attention module and the efficient local feature extraction module are located at a lower branch of the anchor frame attention mechanism-based Transformer module, the efficient local feature extraction module comprises two convolution kernel sizes of 3 connected in sequence 3 depth separable convolutional layer and ReLu activation function, tran based on the Anchor frame attention mechanismThe calculation process of the sformer module is as follows:
$$X_{\mathrm{out}} = \mathrm{MLP}\Big(\mathrm{Concat}\big(\mathrm{ASA}(\mathrm{Patch}(X_{\mathrm{in}}))\big) + \mathrm{ELFE}\big(\mathrm{CA}(X_{\mathrm{in}})\big)\Big) + X_{\mathrm{in}}$$

wherein $X_{\mathrm{in}}$ and $X_{\mathrm{out}}$ respectively represent the input and the output of the anchor frame attention mechanism-based Transformer module; $\mathrm{MLP}(\cdot)$ represents the multi-layer perceptron; $\mathrm{ASA}(\cdot)$ represents the anchor bar frame attention module; $\mathrm{CA}(\cdot)$ represents the channel attention module; $\mathrm{ELFE}(\cdot)$ represents the efficient local feature extraction module; $\mathrm{Patch}(\cdot)$ represents cutting the input into blocks of the same size; and $\mathrm{Concat}(\cdot)$ represents splicing the results obtained for each block. The internal operation of the anchor bar frame attention module is defined as:

$$X_N = \mathrm{LN}(X)$$

$$Q = X_N W_Q, \qquad K = X_N W_K, \qquad V = X_N W_V$$

$$A = \mathrm{AvgPool}\big(\mathrm{DWConv}(\mathrm{PWConv}(X_N))\big)$$

$$\mathcal{A}_h = \sigma\!\left(\frac{Q A^{\top}}{\sqrt{d}}\right)$$

$$\mathcal{A}_v = \sigma\!\left(\frac{A K^{\top}}{\sqrt{d}}\right)$$

$$Y = \mathcal{A}_h \, \mathcal{A}_v \, V + X$$

wherein $\mathrm{LN}(\cdot)$ is the layer normalization operation; $\mathcal{A}_h$ represents the self-attention map of the horizontal bar and $\mathcal{A}_v$ represents the self-attention map of the vertical bar; $N$ is the number of input vectors; $Q$, $K$ and $V$ respectively correspond to the query, key and value matrices of the vectors input into the anchor bar frame attention module; $A$ is the intermediate anchor matrix; $d$ is the dimension of the input vectors; $\mathrm{AvgPool}(\cdot)$ represents average pooling; $\mathrm{PWConv}(\cdot)$ represents 1×1 point-by-point convolution; $\mathrm{DWConv}(\cdot)$ represents a depth separable convolution with a kernel size of 3×3; $X$ is the set of all vectors input to the anchor bar frame attention module; $X_N$ is the result of $X$ after layer normalization; $QA^{\top}$ represents the result of multiplying the matrices $Q$ and $A$; $Y$ is the set of all vectors output by the anchor bar frame attention module; and $\sigma(\cdot)$ is the activation function.
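Reading the formulas above, the anchor bar frame attention avoids the quadratic cost of full self-attention by routing queries and keys through a much smaller set of anchor vectors: with $M \ll N$ anchors, $\mathcal{A}_h \in \mathbb{R}^{N \times M}$ and $\mathcal{A}_v \in \mathbb{R}^{M \times N}$, so $\mathcal{A}_h \mathcal{A}_v V$ costs $O(NMd)$ rather than $O(N^2 d)$. A minimal PyTorch sketch follows; the softmax choice for $\sigma$, the pooled anchor projection, and the `num_anchors` parameter are illustrative assumptions rather than the patented construction:

```python
import math
import torch
import torch.nn as nn


class AnchorAttention(nn.Module):
    """Anchored attention: softmax(Q A^T / sqrt(d)) @ softmax(A K^T / sqrt(d)) @ V.

    With M anchors and N tokens the cost is O(N*M*d) instead of O(N^2*d).
    The pooled anchor projection below is a stand-in for the
    AvgPool/PWConv/DWConv anchor construction in the claim.
    """

    def __init__(self, dim, num_anchors=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
        self.num_anchors = num_anchors
        self.scale = 1.0 / math.sqrt(dim)

    def forward(self, x):                      # x: (B, N, dim)
        xn = self.norm(x)                      # X_N = LN(X)
        q, k, v = self.q(xn), self.k(xn), self.v(xn)
        # Anchor tokens A: average pooling over the token axis (assumption).
        a = torch.nn.functional.adaptive_avg_pool1d(
            xn.transpose(1, 2), self.num_anchors).transpose(1, 2)  # (B, M, dim)
        attn_qa = (q @ a.transpose(1, 2) * self.scale).softmax(dim=-1)  # (B, N, M)
        attn_ak = (a @ k.transpose(1, 2) * self.scale).softmax(dim=-1)  # (B, M, N)
        # Associate right-to-left so no N x N map is ever materialized.
        return attn_qa @ (attn_ak @ v) + x     # Y = A_h A_v V + residual
```

Note that evaluating `attn_ak @ v` first is what realizes the complexity saving: the full $N \times N$ attention map is never formed.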
8. A full-scale feature refinement lightweight image super-resolution device, comprising:
a data acquisition module configured to acquire a low resolution image to be reconstructed;
the model construction module is configured to construct and train a full-scale feature refinement lightweight image super-resolution model to obtain a trained full-scale feature refinement lightweight image super-resolution model, wherein the full-scale feature refinement lightweight image super-resolution model comprises a first convolution layer, a second convolution layer, a third convolution layer, K feature distillation extraction modules and an up-sampling module;
the reconstruction module is configured to input the low-resolution image into the trained full-scale feature refinement lightweight image super-resolution model, wherein the low-resolution image is input into the first convolution layer to obtain a first feature map; the first feature map passes through the K feature distillation extraction modules connected in series, and the output of each feature distillation extraction module is transmitted to the second convolution layer and then through the third convolution layer to obtain a second feature map; the second feature map is added to the first feature map to obtain a final feature map; and the final feature map is input into the up-sampling module for reconstruction to obtain a high-resolution image.
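The end-to-end flow described by the reconstruction module maps onto a compact PyTorch model. The sketch below assumes PixelShuffle up-sampling, concatenation of the K block outputs before the second (1×1) convolution layer, and a plain convolution placeholder for the feature distillation extraction module so that the example stays self-contained; none of these choices is fixed by the claim text:

```python
import torch
import torch.nn as nn


class FSFRNet(nn.Module):
    """Pipeline sketch of claim 8: conv -> K distillation blocks
    -> concat + 1x1 conv -> 3x3 conv -> global residual -> PixelShuffle."""

    def __init__(self, channels=64, num_blocks=6, scale=4, block=None):
        super().__init__()
        make_block = block or (lambda c: nn.Conv2d(c, c, 3, padding=1))  # placeholder
        self.head = nn.Conv2d(3, channels, 3, padding=1)             # first conv layer
        self.blocks = nn.ModuleList(make_block(channels) for _ in range(num_blocks))
        self.reduce = nn.Conv2d(channels * num_blocks, channels, 1)  # second conv layer
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)    # third conv layer
        self.upsample = nn.Sequential(                               # up-sampling module
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        f0 = self.head(lr)                 # first feature map
        feats, f = [], f0
        for blk in self.blocks:            # K serial distillation modules
            f = blk(f)
            feats.append(f)                # every block's output feeds the fusion conv
        f = self.refine(self.reduce(torch.cat(feats, dim=1)))  # second feature map
        return self.upsample(f + f0)       # global residual, then reconstruction


# Usage: sr = FSFRNet()(torch.randn(1, 3, 48, 48))  # -> (1, 3, 192, 192)
```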
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311475299.1A CN117196960B (en) | 2023-11-08 | 2023-11-08 | Full-scale feature refinement lightweight image super-resolution method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117196960A true CN117196960A (en) | 2023-12-08 |
CN117196960B CN117196960B (en) | 2024-03-01 |
Family
ID=88989189
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311475299.1A Active CN117196960B (en) | 2023-11-08 | 2023-11-08 | Full-scale feature refinement lightweight image super-resolution method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117196960B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022203886A1 (en) * | 2021-03-24 | 2022-09-29 | Microsoft Technology Licensing, Llc | Super resolution for satellite images |
CN113240580A (en) * | 2021-04-09 | 2021-08-10 | Jinan University | Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation |
CN115082306A (en) * | 2022-05-10 | 2022-09-20 | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences | Image super-resolution method based on blueprint separable residual network |
CN115953294A (en) * | 2022-11-22 | 2023-04-11 | Xiangtan University | Single-image super-resolution reconstruction method based on shallow channel separation and aggregation |
CN116152062A (en) * | 2022-12-26 | 2023-05-23 | Chongqing University | Lightweight super-resolution reconstruction method |
CN116485654A (en) * | 2023-05-06 | 2023-07-25 | Dalian University | Lightweight single-image super-resolution reconstruction method combining convolutional neural network and Transformer |
Non-Patent Citations (4)
Title |
---|
HUAIJUAN ZAN ET AL.: "Attention Network with Information Distillation for Super-Resolution" *
LIU Zhijian et al.: "Super-Resolution Reconstruction of Infrared Inspection Images Fusing Residual Dense and Generative Adversarial Networks", Journal of Kunming University of Science and Technology (Natural Science Edition) *
DONG Meng; WU Ge; CAO Hongyu; JING Wenbo; YU Hongyang: "Video Super-Resolution Reconstruction Based on Attention Residual Convolutional Network", Journal of Changchun University of Science and Technology (Natural Science Edition), no. 01 *
LEI Pengcheng; LIU Cong; TANG Jiangang; PENG Dunlu: "Hierarchical Feature Fusion Attention Network for Image Super-Resolution Reconstruction", Journal of Image and Graphics, no. 09 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117422614A (en) * | 2023-12-19 | 2024-01-19 | Huaqiao University | Single-frame image super-resolution method and device based on hybrid feature interaction Transformer |
CN117422614B (en) * | 2023-12-19 | 2024-03-12 | Huaqiao University | Single-frame image super-resolution method and device based on hybrid feature interaction Transformer |
CN118261798A (en) * | 2024-05-30 | 2024-06-28 | Guizhou University | DWI zero-sample super-resolution reconstruction method based on dynamic early stopping |
CN118261798B (en) * | 2024-05-30 | 2024-07-30 | Guizhou University | DWI zero-sample super-resolution reconstruction method based on dynamic early stopping |
Also Published As
Publication number | Publication date |
---|---|
CN117196960B (en) | 2024-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117196960B (en) | Full-scale feature refinement lightweight image super-resolution method and device | |
CN111369440B (en) | Model training and image super-resolution processing method, device, terminal and storage medium | |
CN111832570A (en) | Image semantic segmentation model training method and system | |
CN110189260B (en) | Image noise reduction method based on multi-scale parallel gated neural network | |
CN110246084A (en) | A kind of super-resolution image reconstruction method and its system, device, storage medium | |
CN112132770A (en) | Image restoration method and device, computer readable medium and electronic equipment | |
CN116433914A (en) | Two-dimensional medical image segmentation method and system | |
CN117237197B (en) | Image super-resolution method and device based on cross attention mechanism | |
CN115170915A (en) | Infrared and visible light image fusion method based on end-to-end attention network | |
CN117422614B (en) | Single-frame image super-resolution method and device based on hybrid feature interaction Transformer | |
CN115082306A (en) | Image super-resolution method based on blueprint separable residual network | |
CN112419179A (en) | Method, device, equipment and computer readable medium for repairing image | |
WO2024175014A1 (en) | Image processing method and related device thereof | |
CN117726513A (en) | Depth map super-resolution reconstruction method and system based on color image guidance | |
Zheng et al. | Double-branch dehazing network based on self-calibrated attentional convolution | |
WO2024188171A1 (en) | Image processing method and related device thereof | |
Wang et al. | An efficient swin transformer-based method for underwater image enhancement | |
CN114332574A (en) | Image processing method, device, equipment and storage medium | |
CN117455770A (en) | Lightweight image super-resolution method based on layer-by-layer context information aggregation network | |
CN117095168A (en) | Remote sensing image segmentation method and system based on improved SwinTransformer | |
CN116188641A (en) | Digital person generation method and device, electronic equipment and storage medium | |
CN115691770A (en) | Cross-modal medical image completion method, device and equipment based on condition score | |
CN114493971A (en) | Media data conversion model training and digital watermark embedding method and device | |
CN114596203A (en) | Method and apparatus for generating images and for training image generation models | |
CN112419216A (en) | Image interference removing method and device, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |