CN116823656B

CN116823656B - Image blind deblurring method and system based on frequency domain local feature attention mechanism

Info

Publication number: CN116823656B
Application number: CN202310764762.8A
Authority: CN
Inventors: 李庆利; 毛欣天; 王妍
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2023-06-27
Filing date: 2023-06-27
Publication date: 2024-06-28
Anticipated expiration: 2043-06-27
Also published as: CN116823656A

Abstract

The invention discloses an image blind deblurring method and system based on a frequency domain local feature attention mechanism. The method comprises the following steps: acquiring an image deblurring data set and preprocessing it to obtain a training set; training an initial frequency domain local feature attention mechanism network on the blurred images and clear images in the training set to obtain a target frequency domain local feature attention mechanism network; and inputting a blurred image to be processed into the target frequency domain local feature attention mechanism network to perform image blind deblurring and obtain a target clear image. The invention combines the frequency domain information and the spatial information of the feature maps, and the frequency domain information of the feature maps effectively helps the convolutional neural network restore a blurred image to a clearer image.

Description

Image blind deblurring method and system based on frequency domain local feature attention mechanism

Technical Field

The invention belongs to the technical field of computer vision and image processing, and particularly relates to an image blind deblurring method and system based on a frequency domain local feature attention mechanism.

Background

Image deblurring aims to remove blur and restore a sharp image. Many factors can cause blurring, such as irregular movement of the camera or object and optical defocus. Low-quality blurred images pose significant challenges for subsequent high-level vision tasks such as medical diagnosis and object recognition.

Driven by the wave of global feature learning methods, significant progress has been made in the field of image restoration. Among existing MLP-based methods, as shown in FIG. 5 (a), MAXIM sparsely decomposes global MLP operations into window-MLP and grid-MLP. In addition to the MLP-based approaches, recent studies such as Restormer, Uformer and Stripformer have shown the ability of the attention mechanism in image deblurring tasks. The self-attention (SA) mechanism of the Transformer model is the key to capturing long-range dependencies, but its computational complexity is quadratic in the number of pixels, which makes it unsuitable for high-resolution image deblurring tasks. To make the computation feasible, existing methods try various ways of reducing the number of pixels involved in SA in the spatial domain, and they can be divided into three categories. (1) Local spatial-wise SA (Spa-LS). As in FIG. 5 (b), Uformer proposes a locally enhanced window Transformer block to capture local context information, which makes it difficult to model long-range information efficiently. (2) Global SA. As in FIG. 5 (c), Stripformer explores horizontal and vertical intra-stripe and inter-stripe SA, which relies on the strong assumption that image blur is generally region-oriented. (3) Coarse-grained global SA. As in FIG. 5 (d), Restormer captures long-range interactions through global channel-wise SA (Spa-GC). Although Spa-GC can learn global information of the features, it inevitably focuses more on extracting the low-frequency components of an image, because (1) the energy of an image is mainly concentrated at low frequencies, and (2) during feature learning, the high-frequency part is in practice generally more difficult to handle than the low-frequency part. The low-frequency part is coarse-grained information, namely the basic structure of the object; the high-frequency part is fine-grained information, namely texture details. Therefore, a coarse-grained global SA such as Spa-GC suffers from insufficient modeling of fine-grained correlations.

Disclosure of Invention

In order to model long-range dependencies without compromising fine-grained detail, the present invention proposes a frequency domain local feature attention mechanism network (LoFormer) for image deblurring, as shown in FIG. 2. In particular, the present invention proposes a frequency domain local channel self-attention structure (Freq-LC), as shown in FIG. 3. First, the present invention converts features to the frequency domain through the Discrete Cosine Transform (DCT). The DCT represents the original features as coefficients of different basis images. As shown in FIG. 5 (e), the basis images may be arranged in a rectangular grid with the low-frequency components at the upper left corner and the high-frequency components at the lower right corner. The top-left basis image represents the average intensity of the entire image, while the remaining basis images capture increasingly fine detail and texture. Evidently, the coefficient of any frequency carries global information. In order to give the coarse-grained structure and the fine-grained detail equal learning opportunities, the invention designs a window-based frequency feature extraction paradigm, namely splitting the frequency coefficients into non-overlapping windows.
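The two ideas in this paragraph, transforming a feature map with the DCT and splitting the coefficients into non-overlapping windows, can be illustrated with a minimal NumPy sketch. The helper names (`dct_matrix`, `dct2`, `idct2`, `split_windows`) and the orthonormal DCT-II scaling are assumptions made for illustration; the patent text does not specify a particular DCT normalization or implementation.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix D, so that D @ x is the 1D DCT of x."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale
    return d

def dct2(x: np.ndarray) -> np.ndarray:
    """2D DCT of an (H, W) array: transform the rows, then the columns."""
    h, w = x.shape
    return dct_matrix(h) @ x @ dct_matrix(w).T

def idct2(c: np.ndarray) -> np.ndarray:
    """Inverse of dct2 (the DCT matrix is orthogonal)."""
    h, w = c.shape
    return dct_matrix(h).T @ c @ dct_matrix(w)

def split_windows(c: np.ndarray, win: int = 8) -> np.ndarray:
    """Split (H, W) coefficients into non-overlapping win x win windows.

    Returns an array of shape (K, win, win) with K = H * W / win**2,
    ordered row by row, so window 0 holds the lowest frequencies.
    """
    h, w = c.shape
    assert h % win == 0 and w % win == 0, "H and W must be multiples of win"
    return (c.reshape(h // win, win, w // win, win)
             .transpose(0, 2, 1, 3)
             .reshape(-1, win, win))

# Example: the top-left coefficient is proportional to the mean intensity.
feature = np.random.default_rng(0).random((16, 16))
coeffs = dct2(feature)
windows = split_windows(coeffs, win=8)
```

With this scaling, `coeffs[0, 0]` equals the sum of the feature map divided by the square root of H×W, which is why the top-left coefficient tracks the average intensity, while windows further from the top-left corner hold progressively higher frequencies.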

In order to achieve the above object, the present invention provides the following solutions:

An image blind deblurring method based on a frequency domain local feature attention mechanism comprises the following steps:

acquiring an image deblurring dataset;

Preprocessing the image deblurring data set to obtain a training set of the image deblurring data set;

Training the initial frequency domain local feature attention mechanism network based on the blurred image and the clear image in the training set to obtain a target frequency domain local feature attention mechanism network;

And inputting the blurred image to be detected into the target frequency domain local feature attention mechanism network to perform image blind deblurring processing, and obtaining a target clear image.

Preferably, the image deblurring dataset comprises: the GoPro dataset, the HIDE dataset, the RealBlur dataset and the REDS dataset.

Preferably, the method for training the initial frequency domain local feature attention mechanism network based on the blurred image and the clear image in the training set comprises the following steps:

S1: inputting the blurred image into an initial frequency domain local feature attention mechanism network to obtain an output image;

S2: calculating a loss based on the difference between the output image and the clear image, back-propagating the gradient, and updating the parameters of the initial frequency domain local feature attention mechanism network;

S3: repeating step S1 and step S2 until the number of training iterations reaches a preset number, to obtain the target frequency domain local feature attention mechanism network.
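The control flow of steps S1 to S3 can be sketched with a deliberately tiny stand-in model. In the sketch below the "network" is a single learnable scalar `theta` that is added to the blurred input as the predicted residual, the loss is a mean absolute difference, and the update is plain gradient descent. All of these (the scalar model, the learning rate, the optimizer, the function name `train`) are illustrative assumptions; the actual network is the U-shaped LoFormer described later.

```python
import numpy as np

def train(blur: np.ndarray, sharp: np.ndarray,
          num_iters: int = 100, lr: float = 0.01):
    """Toy training loop following S1-S3 with a one-parameter model."""
    theta = 0.0          # toy stand-in for the network parameters
    history = []
    for _ in range(num_iters):                  # S3: repeat a preset number of times
        output = blur + theta                   # S1: input plus predicted residual
        diff = output - sharp
        loss = float(np.mean(np.abs(diff)))     # S2: loss from the difference
        grad = float(np.mean(np.sign(diff)))    # S2: gradient d(loss)/d(theta)
        theta -= lr * grad                      # S2: parameter update
        history.append(loss)
    return theta, history

# Example: the sharp image differs from the blurred one by a constant offset.
rng = np.random.default_rng(0)
blur_img = rng.random((8, 8))
sharp_img = blur_img + 0.3
theta, history = train(blur_img, sharp_img)
```

In a real setting the gradient would come from automatic differentiation in a deep learning framework rather than from a hand-derived formula.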

Preferably, the method for inputting the blurred image into the initial frequency domain local feature attention mechanism network to obtain an output image comprises the following steps:

inputting the blurred image into the initial frequency domain local feature attention mechanism network, and processing the blurred image through a U-shaped encoder and decoder to obtain corresponding blurred features;

and adding the blurred image and the corresponding blurred feature to obtain an output image.

Preferably, the method of calculating the loss based on the difference between the output image and the clear image comprises:

comparing the clear image in the training set with the output image of the frequency domain local feature attention mechanism network to obtain the difference;

and calculating the loss from the difference.

The invention also provides an image blind deblurring system based on the frequency domain local feature attention mechanism, which comprises: the system comprises a data set acquisition module, a training set acquisition module, a network acquisition module and an image acquisition module;

the data set acquisition module is used for acquiring an image deblurring data set;

The training set acquisition module is used for preprocessing the image deblurring data set to acquire a training set of the image deblurring data set;

the network acquisition module is used for training the initial frequency domain local feature attention mechanism network based on the blurred image and the clear image in the training set to obtain a target frequency domain local feature attention mechanism network;

The image acquisition module is used for inputting the blurred image to be detected into the target frequency domain local feature attention mechanism network to perform image blind deblurring processing, and a target clear image is obtained.

Preferably, the network acquisition module includes: an output image obtaining unit, a calculating unit, and a network obtaining unit;

The output image obtaining unit is used for inputting the blurred image into an initial frequency domain local feature attention mechanism network to obtain an output image;

The calculating unit is used for calculating a loss based on the difference between the output image and the clear image, back-propagating the gradient, and updating the parameters of the initial frequency domain local feature attention mechanism network;

The network obtaining unit is used for repeating the training steps of the output image obtaining unit and the calculating unit until the training times reach the preset number, and obtaining the target frequency domain local feature attention mechanism network.

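Preferably, in the output image obtaining unit, the process of inputting the blurred image into the initial frequency domain local feature attention mechanism network to obtain an output image includes:

inputting the blurred image into the initial frequency domain local feature attention mechanism network, and processing the blurred image through a U-shaped encoder and decoder to obtain corresponding blurred features;

and adding the blurred image and the corresponding blurred features to obtain an output image.

Preferably, in the calculating unit, the process of calculating the loss based on the difference between the output image and the clear image includes:

comparing the clear image in the training set with the output image of the frequency domain local feature attention mechanism network to obtain the difference;

and calculating the loss from the difference.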

Compared with the prior art, the invention has the beneficial effects that:

The invention provides an image blind deblurring method based on a frequency domain local feature attention mechanism, wherein a frequency domain local feature attention mechanism network comprises a frequency domain normalization (DCT-LN) module, and from the perspective of an image frequency domain, the training stability is improved in a frequency domain information normalization mode. Furthermore, the frequency domain local feature attention mechanism (Freq-LC) module utilizes the frequency domain space to realize the decomposition of different fine granularity features, thereby achieving the balanced utilization of high and low frequency information. Further, the MLP gating mechanism (MGate) enhances the aggregation capability of global information, effectively restoring blurred images to clearer images.

Drawings

In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the embodiments are briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of an embodiment of the present invention;

FIG. 2 is a schematic diagram of the frequency domain local feature attention mechanism network in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the frequency domain local feature attention module in an embodiment of the present invention;

FIG. 4 is a schematic diagram of a frequency domain normalization process according to an embodiment of the present invention;

FIG. 5 is a diagram showing a comparison of a frequency domain local feature attention method and other global learning methods in an embodiment of the present invention;

FIG. 6 is a diagram of an example of GoPro test data in an embodiment of the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

Example 1

As shown in fig. 1, the present embodiment provides an image blind deblurring method based on a frequency domain local feature attention mechanism, which includes two stages, namely a network training stage and a network prediction stage.

The training phase of the network and the prediction phase of the network comprise the following steps:

1. Preparing image deblurring standard data sets, the selected data sets being the GoPro dataset, the HIDE dataset, the RealBlur dataset and the REDS dataset; preprocessing the data sets, namely, before the experimental data is input into the model for training, randomly cropping the data to 384 × 384 and applying random horizontal and vertical flips, to obtain a training set of the image deblurring data set;

2. Training the initial frequency domain local feature attention mechanism network based on the blurred image and the clear image in the training set to obtain a target frequency domain local feature attention mechanism network;

In this embodiment, a blurred image is input into the initial frequency domain local feature attention mechanism network to obtain an output image; based on the difference between the output image and the clear image, the loss is calculated, the gradient is back-propagated, and the parameters of the initial frequency domain local feature attention mechanism network are updated; these training steps are repeated until the number of training iterations reaches the preset number, to obtain the target frequency domain local feature attention mechanism network;

The method for inputting the blurred image into the initial frequency domain local feature attention mechanism network and obtaining the output image comprises the following steps: inputting the blurred image into the initial frequency domain local feature attention mechanism network, and processing the blurred image through a U-shaped encoder and decoder to obtain a corresponding blurred feature prediction; adding the blurred image and the corresponding blurred features to obtain an output image;

The method for calculating the loss based on the difference between the output image and the clear image comprises the steps of comparing the clear image in the training set with the network output image of the frequency domain local feature attention mechanism to obtain the difference, and calculating to obtain the loss;

3. Inputting blurred images in the training set of the image deblurring standard dataset into the target frequency domain local feature attention mechanism network (Local Frequency Transformer) to obtain the estimated clear images;
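The preprocessing in step 1 (a random 384 × 384 crop followed by random flips) has to be applied identically to the blurred image and its clear counterpart so that the pair stays aligned. A minimal NumPy sketch is given below; the function name `paired_augment`, the flip probability of 0.5 and the (H, W, C) array layout are assumptions, since the text only names the operations.

```python
import numpy as np

def paired_augment(blur: np.ndarray, sharp: np.ndarray,
                   crop: int = 384, rng=None):
    """Apply the same random crop and flips to a blurred/clear image pair."""
    rng = np.random.default_rng() if rng is None else rng
    assert blur.shape == sharp.shape, "the pair must have identical shapes"
    h, w = blur.shape[:2]
    assert h >= crop and w >= crop, "image smaller than the crop size"

    # Random crop: one shared top-left corner for both images.
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    blur = blur[top:top + crop, left:left + crop]
    sharp = sharp[top:top + crop, left:left + crop]

    # Random horizontal flip (assumed probability 0.5).
    if rng.random() < 0.5:
        blur, sharp = blur[:, ::-1], sharp[:, ::-1]
    # Random vertical flip (assumed probability 0.5).
    if rng.random() < 0.5:
        blur, sharp = blur[::-1], sharp[::-1]

    return np.ascontiguousarray(blur), np.ascontiguousarray(sharp)
```

Passing a seeded generator, for example `rng=np.random.default_rng(0)`, makes the augmentation reproducible.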

Further, the difference between the network output image and the clear image in the training set is calculated to obtain the loss, and the gradient is back-propagated to update the parameters of the network; this training step is repeated until the number of training iterations reaches the preset number. Let Ŝ and S denote the reconstruction result output by the network and the real clear data, respectively. The training loss of the network is a combination of the following two loss functions:

1) L1 reconstruction Loss function (L1 Loss):

2) Frequency domain reconstruction Loss function (FR Loss):

The final loss function is as follows:

L = L₁ + a·L_fr

where a is a hyperparameter, set to 0.01.
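The combined loss can be sketched as follows. The text above names the two terms but their formulas are not reproduced here, so both definitions in this sketch are assumptions: the L1 term is taken as the mean absolute difference between the network output and the clear image, and the frequency domain reconstruction term is taken as the mean magnitude of the difference between their 2D Fourier transforms, a form commonly used in deblurring work. Only the weighting a = 0.01 comes from the text.

```python
import numpy as np

def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Assumed L1 reconstruction loss: mean absolute pixel difference."""
    return float(np.mean(np.abs(pred - target)))

def fr_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Assumed frequency reconstruction loss.

    Mean magnitude of the difference between the 2D FFTs taken over
    the first two (spatial) axes. This definition is an assumption.
    """
    fp = np.fft.fft2(pred, axes=(0, 1))
    ft = np.fft.fft2(target, axes=(0, 1))
    return float(np.mean(np.abs(fp - ft)))

def total_loss(pred: np.ndarray, target: np.ndarray, a: float = 0.01) -> float:
    """L = L1 + a * L_fr, with a = 0.01 as stated in the text."""
    return l1_loss(pred, target) + a * fr_loss(pred, target)
```

Some implementations apply the L1 norm to the real and imaginary parts separately instead of to the complex magnitude; either choice fits the same weighted sum.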

Referring to FIG. 2, the frequency domain local feature attention mechanism network (Local Frequency Transformer, LoFormer) adopts a U-shaped blurred image restoration network structure (UNet) and comprises an encoder, a latent layer feature processing module and a decoder, wherein the encoder and the decoder each comprise 3 scales, and three skip connections exist between the encoder and the decoder. The experimental data first passes through the encoder and the latent layer feature processing module, and finally through the decoder to obtain a blurred feature prediction, which is then added to the corresponding input to obtain the restored image. The invention designs two models, LoFormer-S and LoFormer-B, by changing the number of frequency domain local feature attention modules (Local Frequency Transformer Block, LoFT block) contained in the encoder, the latent layer feature processing module and the decoder at different scales (Stages 1-4 and the Refinement stage in FIG. 2). In LoFormer-S, the encoder and decoder contain 2, 4, 6 and 14 LoFT blocks in Stages 1-4, respectively, and the Refinement stage contains 2 LoFT blocks. In LoFormer-B, the encoder and decoder contain 2, 4, 12 and 18 LoFT blocks in Stages 1-4, respectively, and the Refinement stage contains 2 LoFT blocks.
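The two variants differ only in their per-stage block counts, which can be captured in a small configuration object. The class and field names below are hypothetical. The `total_blocks` helper additionally assumes that Stages 1-3 appear once in the encoder and once in the decoder while Stage 4 is the latent layer module and is counted once; the text does not state this explicitly, so the totals are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class LoFormerConfig:
    """Per-stage LoFT block counts for one model variant (hypothetical names)."""
    name: str
    stage_blocks: Tuple[int, int, int, int]  # LoFT blocks in Stages 1-4
    refinement_blocks: int = 2               # LoFT blocks in the Refinement stage

    def total_blocks(self) -> int:
        # Assumption: Stages 1-3 exist in both encoder and decoder,
        # Stage 4 is the latent module and is counted once.
        encoder_decoder = 2 * sum(self.stage_blocks[:3])
        return encoder_decoder + self.stage_blocks[3] + self.refinement_blocks

LOFORMER_S = LoFormerConfig("LoFormer-S", (2, 4, 6, 14))
LOFORMER_B = LoFormerConfig("LoFormer-B", (2, 4, 12, 18))
```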

Referring to fig. 3, the frequency domain local feature attention module process is as follows:

(1) Calculating the 2D discrete cosine transform of X_in to obtain X_dct ∈ R^(H×W×C);

(2) Applying LayerNorm over the channel dimension of X_dct to obtain X_LN = LN(X_dct);

(3) Passing X_LN through a 1×1 Conv and a 3×3 DConv and partitioning the result into 8×8 windows to obtain Q, K, V ∈ R^(K×n×C), where n = 64 and K = H×W/n;

(4) Obtaining an attention matrix A ∈ R^(K×C×C) from Q and K through matrix multiplication;

(5) Multiplying V by the attention matrix A to obtain V_attn, performing an MLP-GELU operation on the second dimension of V to obtain V_MGate, and multiplying V_attn by V_MGate to obtain V_out = V_attn × V_MGate;

(6) Performing inverse window partitioning on V_out to obtain Z'_dct ∈ R^(H×W×C), and applying a 1×1 Conv to obtain Z_dct ∈ R^(H×W×C);

(7) Calculating the 2D inverse discrete cosine transform of Z_dct to obtain Z ∈ R^(H×W×C).

Through the above operations, the invention models the local correlation of the frequency domain information. The final output is calculated as Y = X_in + Z.
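Steps (1) to (7) and the residual connection can be sketched as a single forward pass in NumPy. This is a sketch under several assumptions that the text does not fix: DConv is read as a depthwise 3×3 convolution with zero padding, the attention matrix is normalized with a softmax after scaling by the square root of n, the MLP in MGate is a single linear map over the n window positions, LayerNorm has no learnable scale or shift, and all parameter names in the dictionary `p` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are cosine basis vectors)."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0] = np.sqrt(1.0 / n)
    return d

def dct2(x):    # (H, W, C) -> (H, W, C), DCT over the spatial axes
    dh, dw = dct_matrix(x.shape[0]), dct_matrix(x.shape[1])
    return np.einsum("uh,hwc,vw->uvc", dh, x, dw)

def idct2(z):   # inverse of dct2
    dh, dw = dct_matrix(z.shape[0]), dct_matrix(z.shape[1])
    return np.einsum("uh,uvc,vw->hwc", dh, z, dw)

def layer_norm(x, eps=1e-6):   # normalize over the channel axis
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def dwconv3x3(x, kernel):      # depthwise 3x3, zero padding; kernel: (3, 3, C)
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    return sum(xp[i:i + h, j:j + w] * kernel[i, j]
               for i in range(3) for j in range(3))

def to_windows(x, win=8):      # (H, W, C) -> (K, n, C), n = win * win
    h, w, c = x.shape
    x = x.reshape(h // win, win, w // win, win, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, win * win, c)

def from_windows(x, h, w, win=8):   # inverse of to_windows
    c = x.shape[-1]
    x = x.reshape(h // win, w // win, win, win, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):                   # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def init_params(c, n=64, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    return {"w_qkv": 0.1 * rng.standard_normal((c, 3 * c)),    # 1x1 conv
            "k_dw": 0.1 * rng.standard_normal((3, 3, 3 * c)),  # 3x3 DConv
            "w_gate": 0.1 * rng.standard_normal((n, n)),       # MGate MLP
            "w_out": 0.1 * rng.standard_normal((c, c))}        # output 1x1 conv

def loft_attention(x_in, p, win=8):
    h, w, _ = x_in.shape
    x_ln = layer_norm(dct2(x_in))                           # steps (1)-(2)
    qkv = dwconv3x3(x_ln @ p["w_qkv"], p["k_dw"])           # step (3)
    q, k, v = (to_windows(t, win) for t in np.split(qkv, 3, axis=-1))
    a = softmax(q.transpose(0, 2, 1) @ k / np.sqrt(q.shape[1]))  # (4): (K, C, C)
    v_attn = v @ a.transpose(0, 2, 1)                       # step (5): (K, n, C)
    v_mgate = gelu(np.einsum("mn,knc->kmc", p["w_gate"], v))
    v_out = v_attn * v_mgate
    z_dct = from_windows(v_out, h, w, win) @ p["w_out"]     # step (6)
    return x_in + idct2(z_dct)                              # step (7), Y = X_in + Z
```

Because the attention matrix has shape K×C×C, its size grows linearly with the number of windows K = H×W/n rather than quadratically with the number of pixels, which is the practical benefit of attending over channels inside each frequency window.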

Referring to fig. 4, the frequency domain features after LN are distributed more uniformly than the frequency domain features before LN.
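One way to see this effect numerically is to compare the magnitude of the DCT coefficients at each frequency position before and after channel-wise LayerNorm. The sketch below uses a random non-negative feature map as a stand-in for real network features and measures the root-mean-square value over channels at every frequency position; the metric, the test data and the helper names are assumptions chosen for illustration and do not reproduce FIG. 4.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0] = np.sqrt(1.0 / n)
    return d

def dct2(x):
    """2D DCT over the spatial axes of an (H, W, C) array."""
    dh, dw = dct_matrix(x.shape[0]), dct_matrix(x.shape[1])
    return np.einsum("uh,hwc,vw->uvc", dh, x, dw)

def channel_layer_norm(x, eps=1e-6):
    """LayerNorm over the channel axis, without learnable scale or shift."""
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def per_position_rms(x):
    """Root-mean-square over channels at each frequency position."""
    return np.sqrt(np.mean(x ** 2, axis=-1))

rng = np.random.default_rng(0)
features = rng.random((16, 16, 8))      # stand-in for non-negative features
x_dct = dct2(features)

rms_before = per_position_rms(x_dct)                      # dominated by low frequencies
rms_after = per_position_rms(channel_layer_norm(x_dct))   # close to 1 everywhere

spread_before = rms_before.max() / np.median(rms_before)
spread_after = rms_after.max() / np.median(rms_after)
```

Before normalization the lowest-frequency position is far larger than a typical position, so `spread_before` is large; after normalization every position has a magnitude close to one, so `spread_after` is close to one.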

Referring to fig. 5, the difference between the frequency domain local feature attention mechanism proposed by the present invention and other mainstream MLP methods and attention mechanism methods is that the frequency domain local feature attention mechanism proposed by the present invention uses features contained in a local window of a frequency domain feature to implement global modeling of spatial information.

Referring to FIG. 6, the large image is the original clear image, and the small images are, from top left to bottom right, the clear image, the blurred image, and the results of MIMO-UNet+, MPRNet, DeepRFT+, NAFNet, Restormer and LoFormer-B, respectively. Comparing the Restormer result with that of the frequency domain local feature attention mechanism network (LoFormer-B) shows that the invention effectively improves the deblurring capability of the convolutional neural network, and that, compared with current mainstream deblurring neural networks, the frequency domain local feature attention mechanism network (LoFormer) has better deblurring capability.

Example two

The training set acquisition module is used for preprocessing the image deblurring data set to obtain a training set of the image deblurring data set;

The image acquisition module is used for inputting the blurred image to be detected into the target frequency domain local feature attention mechanism network to perform image blind deblurring processing, and obtaining a target clear image.

In this embodiment, the image deblurring dataset comprises: the GoPro dataset, the HIDE dataset, the RealBlur dataset and the REDS dataset.

In this embodiment, the network acquisition module includes: an output image obtaining unit, a calculating unit, and a network obtaining unit;

The calculating unit is used for calculating a loss based on the difference between the output image and the clear image, back-propagating the gradient, and updating the parameters of the initial frequency domain local feature attention mechanism network;

The network obtaining unit is used for repeating the training steps of the output image obtaining unit and the calculating unit until the number of training iterations reaches the preset number, to obtain the target frequency domain local feature attention mechanism network.

In this embodiment, in the output image obtaining unit, the process of inputting the blurred image into the initial frequency domain local feature attention mechanism network to obtain the output image includes:

Inputting the blurred image into the initial frequency domain local feature attention mechanism network, and processing the blurred image through a U-shaped encoder and decoder to obtain corresponding blurred features;

And adding the blurred image and the corresponding blurred features to obtain an output image.

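In the present embodiment, the process of calculating the loss based on the difference between the output image and the clear image in the calculating unit includes:

comparing the clear image in the training set with the output image of the frequency domain local feature attention mechanism network to obtain the difference;

and calculating the loss from the difference.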

The main feature of the invention is that it provides a frequency domain local feature attention mechanism network (LoFormer) for the image deblurring task. Unlike previous Transformer-based methods, which learn either local self-attention or coarse-grained global self-attention in order to reduce computational complexity, LoFormer models both coarse-grained and fine-grained long-range dependencies by simply performing channel self-attention within each local window of the frequency domain features. In order to filter out invalid features and enhance the global learning ability, the invention further designs an MLP gating mechanism to enhance the aggregation of frequency domain information. LoFormer addresses the insufficient fitting of high-frequency (detail) information by traditional coarse-grained global self-attention, and effectively improves the deblurring performance of the model.

The above embodiments are merely illustrative of the preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, but various modifications and improvements made by those skilled in the art to which the present invention pertains are made without departing from the spirit of the present invention, and all modifications and improvements fall within the scope of the present invention as defined in the appended claims.

Claims

1. The image blind deblurring method based on the frequency domain local feature attention mechanism is characterized by comprising the following steps of:

acquiring an image deblurring dataset;

inputting the blurred image to be detected into the target frequency domain local feature attention mechanism network to perform image blind deblurring processing to obtain a target clear image;

The target frequency domain local feature attention mechanism network adopts a U-shaped blurred image restoration network structure UNet, and comprises an encoder, a latent layer feature processing module and a decoder, wherein the encoder and the decoder each comprise 3 scales, and three skip connections exist between the encoder and the decoder; the experimental data first passes through the encoder and the latent layer feature processing module, and finally through the decoder to obtain a blurred feature prediction, which is then added to the corresponding input to obtain a restored image; two models, LoFormer-S and LoFormer-B, are designed by changing the number of frequency domain local feature attention modules (LoFT blocks) contained in the encoder, the latent layer feature processing module and the decoder at different scales; wherein in LoFormer-S the encoder and decoder contain 2, 4, 6 and 14 LoFT blocks in Stages 1-4, respectively, and the Refinement stage contains 2 LoFT blocks; in LoFormer-B the encoder and decoder contain 2, 4, 12 and 18 LoFT blocks in Stages 1-4, respectively, and the Refinement stage contains 2 LoFT blocks;

The frequency domain local feature attention module processing procedure is as follows:

Calculating the 2D discrete cosine transform of X_in to obtain X_dct ∈ R^(H×W×C);

Applying LayerNorm over the channel dimension of X_dct to obtain X_LN = LN(X_dct);

Passing X_LN through a 1×1 Conv and a 3×3 DConv and partitioning the result into 8×8 windows to obtain Q, K, V ∈ R^(K×n×C), where n = 64 and K = H×W/n;

Obtaining an attention matrix A ∈ R^(K×C×C) from Q and K through matrix multiplication;

Multiplying V by the attention matrix A to obtain V_attn, performing an MLP-GELU operation on the second dimension of V to obtain V_MGate, and multiplying V_attn by V_MGate to obtain V_out = V_attn × V_MGate;

Performing inverse window partitioning on V_out to obtain Z'_dct ∈ R^(H×W×C), and applying a 1×1 Conv to obtain Z_dct ∈ R^(H×W×C);

Calculating the 2D inverse discrete cosine transform of Z_dct to obtain Z ∈ R^(H×W×C);

Modeling the local correlation of the frequency domain information, and finally obtaining the output through the calculation Y = X_in + Z;

the frequency domain features after LN are distributed more uniformly than the frequency domain features before LN.

2. The method of image blind deblurring based on a frequency domain local feature attention mechanism of claim 1, wherein the image deblurring dataset comprises: the GoPro dataset, the HIDE dataset, the RealBlur dataset and the REDS dataset.

3. The method for blind deblurring of images based on frequency domain local feature attention mechanisms of claim 1, wherein the method for training the initial frequency domain local feature attention mechanism network based on blurred images and sharp images in the training set comprises:

4. A method of blind deblurring of an image based on a frequency domain local feature attention mechanism according to claim 3, wherein inputting the blurred image into an initial frequency domain local feature attention mechanism network, the method of obtaining an output image comprises:

5. The method for blind deblurring of an image based on a frequency domain local feature attention mechanism according to claim 3, wherein the method for calculating a loss based on a gap between said output image and said sharp image comprises,

And calculating the difference to obtain loss.

6. An image blind deblurring system based on a frequency domain local feature attention mechanism, comprising: the system comprises a data set acquisition module, a training set acquisition module, a network acquisition module and an image acquisition module;

The image acquisition module is used for inputting a to-be-detected blurred image into the target frequency domain local feature attention mechanism network to perform image blind deblurring processing to obtain a target clear image;

7. The image blind deblurring system based on a frequency domain local feature attention mechanism of claim 6, wherein the image deblurring dataset comprises: the GoPro dataset, the HIDE dataset, the RealBlur dataset and the REDS dataset.

8. The image blind deblurring system based on a frequency domain local feature attention mechanism of claim 6, wherein said network acquisition module comprises: an output image obtaining unit, a calculating unit, and a network obtaining unit;

9. The image blind deblurring system based on the frequency domain local feature attention mechanism according to claim 8, wherein the process of inputting the blurred image into the initial frequency domain local feature attention mechanism network in the output image obtaining unit, to obtain an output image, comprises:

10. The image blind deblurring system based on the frequency domain local feature attention mechanism according to claim 8, wherein the process of calculating the loss based on the difference between the output image and the clear image in the calculating unit includes,

And calculating the difference to obtain loss.