CN117408916A - Image deblurring method based on multi-scale residual Swin Transformer and related product - Google Patents

Image deblurring method based on multi-scale residual Swin Transformer and related product

Info

Publication number
CN117408916A
CN117408916A (application CN202311348430.8A)
Authority
CN
China
Prior art keywords
feature
features
swin
shallow
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311348430.8A
Other languages
Chinese (zh)
Inventor
赵振兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ruishi Zhixin Technology Co ltd
Original Assignee
Shenzhen Ruishi Zhixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ruishi Zhixin Technology Co ltd filed Critical Shenzhen Ruishi Zhixin Technology Co ltd
Priority to CN202311348430.8A priority Critical patent/CN117408916A/en
Publication of CN117408916A publication Critical patent/CN117408916A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The application provides an image deblurring method based on a multi-scale residual Swin Transformer and a related product. A blurred APS image and corresponding EVS data are input into a shallow feature extraction module to extract shallow features respectively; the extracted APS shallow features and EVS shallow features are input to a feature fusion module for feature fusion; the fusion feature serves as the input of the first multi-scale residual Swin Transformer module in a multi-scale deep feature extraction network, while the EVS shallow features are simultaneously fed to a plurality of multi-scale residual Swin Transformer modules of different scales, which extract deep features; the APS shallow features and the deep features are then superimposed, input into a feature reconstruction module for feature reconstruction, and a clear APS image corresponding to the blurred APS image is output. The scheme fully exploits the high dynamic characteristics of the event data and the global information capturing capability of the multi-scale residual Swin Transformer model, improves the deblurring effect on APS images, and can effectively meet the image quality enhancement requirements of practical application scenarios.

Description

Image deblurring method based on multi-scale residual Swin Transformer and related product
Technical Field
The present application relates to the field of artificial intelligence, and in particular to the field of computer vision, which is applicable to deblurring scenes of APS (Active-Pixel Sensor) images. More specifically, the application discloses an image deblurring method based on a multi-scale residual Swin Transformer and a related product.
Background
In the process of acquiring an image, if there is relative motion between the image acquisition device and the shooting target, for example when the image acquisition device and/or the shooting target is in motion, the captured image will be blurred. Image deblurring techniques were developed to address this, and researchers aim to enhance the quality of blurred images through image deblurring algorithms.
In the related art, an APS image is generally input to a neural network model for APS image deblurring. However, the neural network models used in the related art are typically built from ordinary convolutions, whose receptive field is limited, and the guidance information provided by the APS image alone is also limited. As a result, the image deblurring effect is poor and it is difficult to meet the image quality enhancement requirements of practical applications.
It is noted that the techniques described in this section are not necessarily ones that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the techniques described in this section qualify as prior art merely because they are included in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The main objective of the present application is to provide an image deblurring method based on a multi-scale residual Swin Transformer and related products, which can at least solve the problem that the deblurring effect of the image deblurring methods provided by the related art is poor and difficult to meet the image quality enhancement requirements of practical applications.
The first aspect of the present application provides an image deblurring method based on a multi-scale residual Swin Transformer, comprising:
inputting the blurred APS image and corresponding EVS data into a shallow feature extraction module to extract shallow features respectively to obtain APS shallow features and EVS shallow features;
inputting the APS shallow features and the EVS shallow features into a feature fusion module for feature fusion to obtain fusion features;
taking the fusion feature as the input of a first multi-scale residual Swin Transformer module in a multi-scale deep feature extraction network, taking the EVS shallow feature as the input of all multi-scale residual Swin Transformer modules at the same time, and sequentially extracting deep features by using a plurality of multi-scale residual Swin Transformer modules with different scales to obtain deep features;
and superposing the APS shallow layer features and the deep layer features, inputting the superposed features into a feature reconstruction module for feature reconstruction, and outputting a clear APS image corresponding to the blurred APS image.
A second aspect of the present application provides an image deblurring device based on a multi-scale residual Swin Transformer, comprising:
the shallow feature extraction module is used for inputting the blurred APS image and corresponding EVS data into the shallow feature extraction module to extract shallow features respectively, so as to obtain APS shallow features and EVS shallow features;
the feature fusion module is used for inputting the APS shallow features and the EVS shallow features into the feature fusion module for feature fusion to obtain fusion features;
the deep feature extraction module is used for taking the fusion feature as the input of a first multi-scale residual Swin Transformer module in a multi-scale deep feature extraction network, taking the EVS shallow feature as the input of all multi-scale residual Swin Transformer modules at the same time, and sequentially extracting the deep features by using a plurality of multi-scale residual Swin Transformer modules with different scales to obtain deep features;
and the feature reconstruction module is used for inputting the superimposed APS shallow layer features and the deep layer features to the feature reconstruction module for feature reconstruction and outputting a clear APS image corresponding to the blurred APS image.
A third aspect of the present application provides an electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the image deblurring method based on the multi-scale residual Swin Transformer provided in the first aspect of the present application.
A fourth aspect of the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the image deblurring method based on the multi-scale residual Swin Transformer provided in the first aspect of the present application.
From the above, according to the image deblurring method based on the multi-scale residual Swin Transformer and related products provided by the present application, the blurred APS image and corresponding EVS data are input to the shallow feature extraction module to extract shallow features respectively, obtaining APS shallow features and EVS shallow features; the APS shallow features and the EVS shallow features are input into a feature fusion module for feature fusion to obtain fusion features; the fusion feature is taken as the input of the first multi-scale residual Swin Transformer module in a multi-scale deep feature extraction network, the EVS shallow feature is taken as the input of all multi-scale residual Swin Transformer modules at the same time, and a plurality of multi-scale residual Swin Transformer modules with different scales are used in sequence to extract deep features; the APS shallow features and the deep features are superimposed and input into a feature reconstruction module for feature reconstruction, and a clear APS image corresponding to the blurred APS image is output. Through this scheme, synchronously acquired event data guides the neural network model based on the multi-scale residual Swin Transformer to perform APS image deblurring; the high dynamic characteristics of the event data and the global information capturing capability of the multi-scale residual Swin Transformer model are fully utilized, the deblurring effect on APS images is improved, and the image quality enhancement requirements of practical application scenarios can be effectively met.
It should be understood that the description of this section is not intended to identify key or critical features of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The drawings are shown for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
Fig. 1 is a schematic diagram of an image deblurring model based on a multi-scale residual Swin Transformer according to an embodiment of the present application;
fig. 2 is a basic flow diagram of an image deblurring method based on a multi-scale residual Swin Transformer according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a shallow feature extraction module according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a multi-scale residual Swin Transformer module according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a residual Swin Transformer module according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a Swin Transformer module according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a multi-head fusion attention module based on windows according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a feature reconstruction module according to an embodiment of the present disclosure;
fig. 9 is a schematic functional block diagram of an image deblurring device based on a multi-scale residual Swin Transformer according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the description of the embodiments of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present application, the term "plurality" means two or more, unless specifically defined otherwise.
In order to solve the problem that the image deblurring effect of the method provided by the related art is poor and difficult to meet the image quality enhancement requirements of practical applications, an embodiment of the present application provides an image deblurring method based on a multi-scale residual Swin Transformer, which is applied to an image deblurring model based on a multi-scale residual Swin Transformer as shown in fig. 1. The input of the model is divided into two parts: APS images (i.e., images in fig. 1) and EVS data (i.e., events in fig. 1). The backbone network of the model includes four parts: a shallow feature extraction module (FE), a fusion module (Merge_block), a multi-scale deep feature extraction network and a feature reconstruction module (Reconstruction). The shallow feature extraction module comprises an EVS shallow feature extraction module and an APS shallow feature extraction module (i.e., E_FE and I_FE in fig. 1), the multi-scale deep feature extraction network comprises a plurality of cascaded multi-scale residual Swin Transformer modules of different scales (i.e., Fusion_MS-RSTB in fig. 1), and the output out of the model is the deblurred, clear APS image.
Fig. 2 is a basic flowchart of an image deblurring method based on a multi-scale residual Swin Transformer according to the present embodiment, where the image deblurring method based on the multi-scale residual Swin Transformer includes the following steps:
and step 201, inputting the blurred APS image and corresponding EVS data to a shallow feature extraction module to extract shallow features respectively, so as to obtain APS shallow features and EVS shallow features.
Specifically, in the present embodiment, APS images are acquired by an Active-Pixel Sensor (APS), and event data is acquired by an Event-based Vision Sensor (EVS). In practical applications, the active pixel sensor and the event-based vision sensor of the present embodiment may be discrete image sensors or an integrated image sensor (also referred to as a fusion sensor). In the integrated case, the sensor is divided into a plurality of sub-photosensitive areas whose pixel arrays correspond to an APS data mode and an EVS data mode respectively; compared with separately arranged sensor modules, this effectively reduces device volume and is more conducive to miniaturization of the overall hardware architecture. It should be noted that the event-based vision sensor is a novel sensor that mimics the human retina and responds with pixel-level pulses to brightness changes generated by motion; it can therefore capture scene brightness (i.e., light intensity) changes at an extremely high frame rate and record events at specific time points and specific positions in the image, forming event streams instead of frame streams, which alleviates the problems of information redundancy, data storage and heavy real-time processing load of traditional cameras.
It should be appreciated that an EVS pixel generates event data in response to a change in brightness: when the change exceeds a certain threshold, it outputs an event comprising the pixel coordinates (x, y), a timestamp (t) and an event polarity (p, with values +1 and -1 representing an increase and a decrease in brightness respectively). Each event is represented in the form e = (t, x, y, p).
In some implementations of the present embodiment, before the step of inputting the blurred APS image and the corresponding EVS data to the shallow feature extraction module for shallow feature extraction, the method further includes: acquiring all EVS data synchronously acquired within the APS exposure time corresponding to the blurred APS image; splitting all EVS data into positive polarity event data and negative polarity event data; dividing the positive polarity event data and the negative polarity event data into six equal parts respectively and splicing each into a six-channel matrix; and splicing the two six-channel matrices to obtain the EVS data corresponding to the blurred APS image.
Specifically, in this embodiment, an APS image and the EVS data within its exposure time form a data pair. The positive and negative event data are separated, the separated positive and negative EVS event data are each divided into 6 equal parts according to the start time (Timestamp) and the end time (Sof) and spliced into a 6-channel matrix, and finally a 12-channel matrix is obtained (6 channels each for positive and negative events). It should be noted that the positive and negative events are separated in this embodiment because, after division into 6 equal parts, both positive and negative events may occur at the same coordinate within one time slice; to avoid positive and negative events cancelling each other, they are placed in different channels, preserving the integrity of the EVS data as much as possible.
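For illustration only, the following Python/NumPy sketch shows one plausible way to implement the pre-processing described above; the function and variable names are hypothetical and not taken from the patent, and details such as per-bin normalization may differ in the actual implementation.

```python
import numpy as np

def events_to_tensor(events, t_start, t_end, height, width, num_bins=6):
    """Sketch: accumulate events of each polarity into `num_bins` temporal bins
    and stack them into a (2 * num_bins)-channel tensor (positive bins first).
    `events` is an iterable of (t, x, y, p) with p in {+1, -1}."""
    tensor = np.zeros((2 * num_bins, height, width), dtype=np.float32)
    bin_length = max((t_end - t_start) / num_bins, 1e-9)
    for t, x, y, p in events:
        bin_idx = min(int((t - t_start) / bin_length), num_bins - 1)
        channel = bin_idx if p > 0 else num_bins + bin_idx
        tensor[channel, int(y), int(x)] += 1.0  # count events per bin and pixel
    return tensor
```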
Then, the APS data (shape = [1, 3, h, w]) and the EVS data (shape = [1, 12, h, w]) are sent to the corresponding branches of the network and pass through I_FE and E_FE respectively, yielding an APS feature (shape = [1, 64, h/2, w/2]) and an EVS feature (shape = [1, 64, h/2, w/2]).
In some implementations of the present embodiment, the step of inputting the blurred APS image and the corresponding EVS data to the shallow feature extraction module to extract the shallow features, obtaining the APS shallow features and the EVS shallow features, includes: inputting the blurred APS image and the corresponding EVS data to their respective shallow feature extraction modules, and extracting shallow features by using a cascaded first convolution layer, GELU activation function layer and second convolution layer to obtain the APS shallow features and the EVS shallow features.
As shown in fig. 3, which is a schematic structural diagram of the shallow feature extraction module provided in this embodiment, each shallow feature extraction module FE (feature extraction) includes a first convolution layer, a GELU activation function layer and a second convolution layer connected in sequence. It should be noted that a convolution-based shallow feature extraction module performs well in early visual processing, helps the network optimize more stably, and provides a simple way to map the input image space to a higher-dimensional feature space.
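For reference, a minimal PyTorch sketch of such an FE block is given below. The stride-2 first convolution is an assumption made to match the halved spatial resolution (h/2, w/2) mentioned above, and the class name and channel counts are illustrative.

```python
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """Sketch of an FE block: conv -> GELU -> conv, mapping the input to a
    64-channel feature map (stride-2 in the first conv is assumed)."""
    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x):
        return self.body(x)

# The APS branch (I_FE) takes 3-channel images; the EVS branch (E_FE) takes the
# 12-channel event tensor produced by the pre-processing above.
i_fe = ShallowFeatureExtractor(in_channels=3)
e_fe = ShallowFeatureExtractor(in_channels=12)
```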
Step 202, inputting the APS shallow features and the EVS shallow features into a feature fusion module to perform feature fusion, and obtaining fusion features.
Specifically, in this embodiment, the APS features and EVS features are input to the feature fusion module Merge_block at the same time for feature fusion, obtaining Merge features and thereby adding information about where events occur. It should be understood that the feature fusion module of this embodiment may also be implemented with the same structure as the shallow feature extraction module.
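One plausible realization of Merge_block is sketched below, under the assumption that fusion is a channel-wise concatenation followed by the same conv-GELU-conv pattern used for shallow extraction; the patent does not pin down these details, so treat the structure and names as illustrative.

```python
import torch
import torch.nn as nn

class MergeBlock(nn.Module):
    """Sketch of feature fusion: concatenate APS and EVS shallow features along
    the channel axis, then project back to 64 channels."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, aps_feat, evs_feat):
        return self.body(torch.cat([aps_feat, evs_feat], dim=1))
```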
Step 203, taking the fusion feature as the input of the first multi-scale residual Swin Transformer module in the multi-scale deep feature extraction network, taking the EVS shallow feature as the input of all multi-scale residual Swin Transformer modules at the same time, and sequentially extracting deep features by using a plurality of multi-scale residual Swin Transformer modules with different scales to obtain deep features.
Specifically, in this embodiment, the Merge feature and the EVS feature are simultaneously input into the multi-scale deep feature extraction network. The EVS feature is reused multiple times, that is, it is simultaneously input into all of the multi-scale residual Swin Transformer modules (Fusion_MS-RSTB), whereas the Merge feature is used serially, that is, it serves only as the input of the first Fusion_MS-RSTB among the cascaded Fusion_MS-RSTB modules.
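In pseudocode form, the serial use of the Merge feature and the repeated use of the EVS feature across the cascade can be sketched as follows; the function and variable names are illustrative.

```python
def deep_feature_extraction(ms_rstb_modules, merge_feat, evs_feat):
    """Sketch of the cascade: the Merge feature flows serially through the
    Fusion_MS-RSTB modules, while the EVS feature is fed to every module."""
    x = merge_feat
    for ms_rstb in ms_rstb_modules:  # cascaded Fusion_MS-RSTB modules
        x = ms_rstb(x, evs_feat)     # EVS shallow feature is reused each time
    return x                         # deep features
```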
In an optional implementation manner of this embodiment, the step of sequentially extracting deep features by using a plurality of multi-scale residual Swin Transformer modules with different scales to obtain deep features includes: using the multi-scale residual Swin Transformer module of each scale in sequence to downsample the initially input fusion features and EVS shallow features respectively to obtain corresponding downsampled features; inputting the initial input features and the downsampled features into residual Swin Transformer modules of corresponding scales respectively, and performing feature extraction on the input features by using the residual Swin Transformer modules to obtain first extracted features; upsampling the first extracted features corresponding to the downsampled features a corresponding number of times to obtain upsampled features; fusing the upsampled features with the first extracted feature corresponding to the initial input feature, and using the result as the initially input fusion feature of the multi-scale residual Swin Transformer module of the next scale; and when the multi-scale residual Swin Transformer module of the last scale finishes processing, taking the fusion feature it outputs as the deep feature extracted by the multi-scale deep feature extraction network.
Fig. 4 is a schematic structural diagram of the multi-scale residual Swin Transformer module provided in this embodiment. For the Fusion_MS-RSTB modules of different scales in the multi-scale deep feature extraction network, the initial inputs are Merge features and EVS features of different scales; a single Fusion_MS-RSTB in turn contains a plurality of residual Swin Transformer modules (Fusion_RSTB) of different scales.
In this embodiment, the fusion_ms-RSTB shown in fig. 4 is taken as an example, where the merge_ms-RSTB is taken as two initial inputs of the module, the merge_feature 1 and the EVS-feature 1 are respectively obtained by downsampling the two initial inputs by one time to obtain the merge_feature 2 and the EVS-feature 2, the merge_feature 3 and the EVS-feature 3 are obtained by downsampling the two times, the merge_feature 1 and the EVS-feature 3 are obtained by processing the merge_rstb of the same scale, and the downsampled feature is obtained by downsampling the fusion_rstb by the same number of times as downsampling.
In an optional implementation manner of this embodiment, the step of performing feature extraction on the input features by using the residual Swin Transformer module to obtain first extracted features includes: for the residual Swin Transformer module of each scale, taking the fusion feature vector in the input features as the input of the first Swin Transformer module, taking the EVS feature vector in the input features as the input of all the Swin Transformer modules at the same time, and sequentially performing feature extraction by using a plurality of Swin Transformer modules of different scales in the residual Swin Transformer module to obtain a second extracted feature; and performing convolution processing on the second extracted feature by using a convolution layer and then fusing it with the fusion feature vector in the input features to obtain the first extracted feature.
Fig. 5 is a schematic diagram of the residual Swin Transformer module provided in this embodiment. The residual Swin Transformer module (Fusion_RSTB) includes a plurality of Swin Transformer modules (Fusion_STB) connected in sequence and a convolution layer (conv) connected to the output of the last Fusion_STB. The EVS feature in the initial input features is reused by every Fusion_STB, while the Merge feature is used serially, and the convolution layer output is combined with the Merge feature in the initial input features through a skip connection.
In an optional implementation manner of this embodiment, the step of sequentially performing feature extraction by using a plurality of Swin Transformer modules of different scales in the residual Swin Transformer module to obtain the second extracted feature includes: for the Swin Transformer module of each scale in the residual Swin Transformer module, normalizing the fusion feature vector by using a normalization layer, and inputting the normalized fusion feature vector and the EVS feature vector into the window-based multi-head fusion attention module for feature extraction to obtain a third extracted feature; normalizing the fused feature vector obtained by fusing the third extracted feature and the EVS feature vector, inputting the normalized result into a multi-layer perceptron for feature transformation, fusing the fused feature vector and the transformed feature vector, and outputting the result to the Swin Transformer module of the next scale; and taking the output feature of the last Swin Transformer module as the second extracted feature extracted by the residual Swin Transformer module.
Fig. 6 is a schematic structural diagram of the Swin Transformer module provided in this embodiment. Each Fusion_STB includes two LayerNorm layers, a W-MFA (Window-based Multi-head Fusion Attention) module and an MLP (Multilayer Perceptron) module; the W-MFA layer and the MLP layer are each preceded by one LayerNorm layer, the input of one LayerNorm layer is connected by a skip connection to the output of the W-MFA layer, and the input of the other LayerNorm layer is connected by a skip connection to the output of the MLP layer.
It should be noted that the Swin Transformer module of this embodiment restricts the attention mechanism to windows, which greatly reduces the amount of computation. Meanwhile, the Window Attention and Shifted Window Attention adopted by the Swin Transformer module handle the boundary problem between windows well.
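A hedged PyTorch sketch of one Fusion_STB is given below, assuming the standard Swin-style residual wiring (LayerNorm before the attention and MLP branches, skip connections around both); window partitioning and shifting are omitted for brevity, `wmfa` stands for the window-based multi-head fusion attention module sketched further below, and all names are illustrative.

```python
import torch.nn as nn

class FusionSTB(nn.Module):
    """Sketch of a Swin-style block: LN -> W-MFA -> skip, then LN -> MLP -> skip.
    `wmfa` is the window-based multi-head fusion attention module."""
    def __init__(self, dim, wmfa, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.wmfa = wmfa
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, merge_tokens, evs_tokens):
        # attention branch with a skip connection from the block input
        x = merge_tokens + self.wmfa(self.norm1(merge_tokens), evs_tokens)
        # MLP branch with a skip connection from the attention output
        x = x + self.mlp(self.norm2(x))
        return x
```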
In an optional implementation manner of this embodiment, the step of inputting the normalized fusion feature vector and the EVS feature vector into the window-based multi-head fusion attention module for feature extraction to obtain the third extracted feature includes: inputting the normalized fusion feature vector and the EVS feature vector into the window-based multi-head fusion attention module, and performing linear transformation with fully connected layers respectively to obtain a first linear transformation matrix corresponding to the normalized fusion feature vector and a second linear transformation matrix corresponding to the EVS feature vector, wherein the first linear transformation matrix comprises a first query matrix, a first key matrix and a first value matrix, and the second linear transformation matrix comprises a second query matrix and a second key matrix; performing an element-by-element multiplication on the first query matrix and the first key matrix, performing an element-by-element multiplication on the second query matrix and the second key matrix and fusing the two products, and performing linear regression processing on the fused feature vector by using a softmax layer to obtain a feature vector after linear regression; and performing an element-by-element multiplication on the feature vector after linear regression and the first value matrix, and then performing linear transformation with a fully connected layer to obtain the third extracted feature.
Fig. 7 is a schematic structural diagram of the window-based multi-head fusion attention module. The Merge feature and the EVS feature are the two inputs of this module: the Merge feature passes through a fully connected layer to obtain the corresponding q_m, k_m, v_m, and the EVS feature passes through a fully connected layer to obtain the corresponding q_e, k_e. Since the blurred regions of an APS image largely coincide with positions where events occur, q_e and k_e carry position information that focuses attention on where events happen; to better capture the blurred-region characteristics within the fusion features, q_e and k_e are added in addition to q_m and k_m. To control the ratio between (q_m, k_m) and (q_e, k_e), this embodiment sets an adjustable parameter α for the element-by-element multiplication. It should be appreciated that fusing the APS and EVS features makes the model focus more on regions where events occur, and q_e, k_e generated from the EVS feature in the W-MFA module have the same effect. As a result, the method achieves a better deblurring effect in regions where the APS image is blurred and events occur, and a comparable deblurring effect in regions where only the APS image is blurred.
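The following single-head sketch illustrates this fusion attention as one possible interpretation: the event-derived query/key term is weighted by a learnable α and added to the merge-derived term before the softmax. Window partitioning, multi-head splitting and relative position bias are omitted, scaled dot-product attention is assumed where the text speaks of element-by-element multiplication, and all names are illustrative.

```python
import torch
import torch.nn as nn

class WMFA(nn.Module):
    """Sketch of window-based multi-head fusion attention (single head, no
    window partitioning): event-derived q_e, k_e are blended into the
    merge-derived attention map with a learnable scale alpha."""
    def __init__(self, dim):
        super().__init__()
        self.qkv_m = nn.Linear(dim, 3 * dim)  # q_m, k_m, v_m from Merge tokens
        self.qk_e = nn.Linear(dim, 2 * dim)   # q_e, k_e from EVS tokens
        self.alpha = nn.Parameter(torch.ones(1))
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, merge_tokens, evs_tokens):
        q_m, k_m, v_m = self.qkv_m(merge_tokens).chunk(3, dim=-1)
        q_e, k_e = self.qk_e(evs_tokens).chunk(2, dim=-1)
        attn = q_m @ k_m.transpose(-2, -1) + self.alpha * (q_e @ k_e.transpose(-2, -1))
        attn = (attn * self.scale).softmax(dim=-1)
        return self.proj(attn @ v_m)
```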
Step 204, superimposing the APS shallow features and the deep features, inputting the result into the feature reconstruction module for feature reconstruction, and outputting a clear APS image corresponding to the blurred APS image.
Specifically, in this embodiment, the shallow feature extraction module corresponding to the APS image is connected to the multi-scale deep feature extraction network through a skip connection, allowing the aggregation of features from different levels. The reconstruction module reconstructs a high-quality image by aggregating shallow features and deep features: the shallow features contain low-frequency information, while the deep features focus on recovering the lost high-frequency information. This embodiment passes the low-frequency information directly to the reconstruction module, so the multi-scale deep feature extraction network only needs to focus on the high-frequency information. The deblurring model of this embodiment can thus attend to regions blurred to different degrees and achieves a better deblurring effect.
Fig. 8 is a schematic structural diagram of the feature reconstruction module provided in this embodiment. The feature reconstruction module Reconstruction includes a convolution layer (conv) and a pixel shuffle layer (PixelShuffle). PixelShuffle acts as an end-to-end learnable upsampling step that partitions the low-resolution feature map at the pixel level and rearranges the pixels to form a high-resolution image.
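A minimal sketch of such a reconstruction head is shown below; the 2x upscale factor (undoing the assumed stride-2 shallow extraction) and the 3-channel RGB output are assumptions, as the patent does not specify them.

```python
import torch.nn as nn

class Reconstruction(nn.Module):
    """Sketch of the reconstruction head: a convolution expands channels for
    PixelShuffle, which rearranges them into a 2x higher-resolution RGB image."""
    def __init__(self, channels=64, out_channels=3, upscale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, out_channels * upscale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(upscale),
        )

    def forward(self, x):
        return self.body(x)
```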
It should be understood that, the sequence number of each step in this embodiment does not mean the order of execution of the steps, and the execution order of each step should be determined by its functions and internal logic, and should not be construed as a unique limitation on the implementation process of the embodiments of the present application.
Fig. 9 is a schematic diagram of an image deblurring device based on a multi-scale residual Swin Transformer according to an embodiment of the present application, where the image deblurring device based on the multi-scale residual Swin Transformer may be used to implement the image deblurring method based on the multi-scale residual Swin Transformer in the foregoing embodiment, and mainly includes:
the shallow feature extraction module 901 is configured to input the blurred APS image and corresponding EVS data to the shallow feature extraction module to perform shallow feature extraction respectively, so as to obtain APS shallow features and EVS shallow features;
the feature fusion module 902 is configured to input the APS shallow features and the EVS shallow features to the feature fusion module for feature fusion, so as to obtain fusion features;
the deep feature extraction module 903 is configured to take the fusion feature as an input of a first multi-scale residual Swin transform module in the multi-scale deep feature extraction network, take the EVS shallow feature as an input of all multi-scale residual Swin transform modules at the same time, and sequentially utilize a plurality of multi-scale residual Swin transform modules with different scales to extract the deep feature, so as to obtain a deep feature;
the feature reconstruction module 904 is configured to superimpose the APS shallow features and the deep features, input the superimposed APS shallow features and the superimposed APS deep features to the feature reconstruction module to perform feature reconstruction, and output a clear APS image corresponding to the blurred APS image.
In an optional implementation manner of this embodiment, the image deblurring device based on the multi-scale residual Swin Transformer further includes an event data processing module, configured to: acquire all EVS data synchronously acquired within the APS exposure time corresponding to the blurred APS image; split all EVS data into positive polarity event data and negative polarity event data; divide the positive polarity event data and the negative polarity event data into six equal parts respectively and splice each into a six-channel matrix; and splice the two six-channel matrices to obtain the EVS data corresponding to the blurred APS image.
It should be noted that, the image deblurring method based on the multi-scale residual Swin Transformer in the foregoing embodiment may be implemented based on the image deblurring device based on the multi-scale residual Swin Transformer provided in the foregoing embodiment, and those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the image deblurring device based on the multi-scale residual Swin Transformer described in the foregoing embodiment may refer to the corresponding working process in the foregoing method embodiment, which is not repeated herein.
Based on the technical scheme of the embodiments of the present application, the blurred APS image and corresponding EVS data are input into the shallow feature extraction module to extract shallow features respectively; the extracted APS shallow features and EVS shallow features are input to the feature fusion module for feature fusion; the fusion feature is taken as the input of the first multi-scale residual Swin Transformer module in the multi-scale deep feature extraction network, while the EVS shallow features are simultaneously taken as the input of a plurality of multi-scale residual Swin Transformer modules of different scales to extract deep features; the APS shallow features and the deep features are superimposed and input into the feature reconstruction module for feature reconstruction, and a clear APS image corresponding to the blurred APS image is output. This scheme fully utilizes the high dynamic characteristics of the event data and the global information capturing capability of the multi-scale residual Swin Transformer model, improves the deblurring effect on APS images, and can effectively meet the image quality enhancement requirements of practical application scenarios.
Fig. 10 shows an electronic device according to an embodiment of the present application. The electronic device may be used to implement the image deblurring method based on the multi-scale residual Swin Transformer in the foregoing embodiments and mainly includes: a memory 1001, a processor 1002, and a computer program 1003 stored on the memory 1001 and executable on the processor 1002; when the processor 1002 executes the computer program 1003, the image deblurring method based on the multi-scale residual Swin Transformer in the foregoing embodiments is implemented. The number of processors 1002 may be one or more.
The memory 1001 may be a high-speed random access memory (RAM) or a non-volatile memory, such as disk storage. The memory 1001 is used to store executable program code, and the processor 1002 is coupled to the memory 1001.
Further, the embodiment of the application further provides a computer readable storage medium, which may be provided in the electronic device in each embodiment, and the computer readable storage medium may be a memory in the embodiment shown in fig. 10.
The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the image deblurring method based on the multi-scale residual Swin Transformer in the foregoing embodiments. Further, the computer-readable medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be understood that the apparatus and method disclosed in accordance with the embodiments provided herein may be implemented in any other equivalent manner. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a readable storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned readable storage medium includes: a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, etc.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The foregoing is a description of the image deblurring method based on the multi-scale residual Swin Transformer and related products provided in the present application, and it is understood by those skilled in the art that the present application is not limited to this description, since the specific implementation and application range may vary according to the concepts of the embodiments of the present application.

Claims (10)

1. An image deblurring method based on a multi-scale residual Swin Transformer, comprising the steps of:
inputting the blurred APS image and corresponding EVS data into a shallow feature extraction module to extract shallow features respectively to obtain APS shallow features and EVS shallow features;
inputting the APS shallow features and the EVS shallow features into a feature fusion module for feature fusion to obtain fusion features;
taking the fusion feature as the input of a first multi-scale residual Swin Transformer module in a multi-scale deep feature extraction network, taking the EVS shallow feature as the input of all multi-scale residual Swin Transformer modules at the same time, and sequentially extracting deep features by using a plurality of multi-scale residual Swin Transformer modules with different scales to obtain deep features;
and superposing the APS shallow layer features and the deep layer features, inputting the superposed features into a feature reconstruction module for feature reconstruction, and outputting a clear APS image corresponding to the blurred APS image.
2. The image deblurring method according to claim 1, wherein before the step of inputting the blurred APS image and the corresponding EVS data to the shallow feature extraction module for shallow feature extraction, respectively, the method further comprises:
acquiring all EVS data synchronously acquired within the APS exposure time corresponding to the blurred APS image;
splitting all EVS data into positive polarity event data and negative polarity event data;
equally dividing the positive polarity event data and the negative polarity event data into six equal parts respectively, and then respectively splicing the six equal parts into a six-channel matrix;
and splicing the two six-channel matrixes to obtain EVS data corresponding to the fuzzy APS image.
3. The image deblurring method according to claim 1, wherein the step of extracting deep features by sequentially using a plurality of multi-scale residual Swin Transformer modules with different scales to obtain deep features includes:
the method comprises the steps that downsampling processing is respectively carried out on fusion features and EVS shallow features which are input initially by using a multi-scale residual Swin Transformer module of each scale in sequence, so that corresponding downsampling features are obtained;
respectively inputting the initial input feature and the downsampling feature into a residual Swin Transformer module of a corresponding scale, and carrying out feature extraction on the input feature by using the residual Swin Transformer module to obtain a first extracted feature;
performing up-sampling processing on the first extracted features corresponding to the down-sampling features for corresponding times to obtain up-sampling features;
after the up-sampling feature and the first extraction feature corresponding to the initial input feature are fused, the up-sampling feature is used as a fusion feature which is initially input by a multi-scale residual Swin Transformer module of the next scale;
and when the multi-scale residual Swin Transformer module of the last scale finishes processing, taking the fusion characteristic output by the multi-scale residual Swin Transformer module as the deep characteristic extracted by the multi-scale deep characteristic extraction network.
4. The image deblurring method according to claim 3, wherein the step of extracting features of the input features by using the residual Swin Transformer module to obtain first extracted features includes:
for each scale residual Swin Transformer module, taking a fusion feature vector in an input feature as the input of a first Swin Transformer module, taking EVS feature vectors in the input feature as the input of all Swin Transformer modules at the same time, and sequentially carrying out feature extraction by utilizing a plurality of Swin Transformer modules with different scales in the residual Swin Transformer module to obtain a second extraction feature;
and carrying out convolution processing on the second extracted feature by using a convolution layer, and then fusing the second extracted feature with a fused feature vector in the input feature to obtain a first extracted feature.
5. The image deblurring method according to claim 4, wherein the step of sequentially extracting features by using a plurality of Swin Transformer modules with different scales in the residual Swin Transformer module to obtain second extracted features includes:
normalizing the fusion feature vector by utilizing a normalization layer for each scale of the Swin Transformer module in the residual Swin Transformer module, and inputting the normalized fusion feature vector and the EVS feature vector to a multi-head fusion attention module based on a window for feature extraction to obtain a third extraction feature;
normalizing the fusion feature vector obtained by fusing the third extracted feature and the EVS feature vector, inputting the normalized fusion feature vector into a multi-layer perceptron to perform feature transformation, fusing the fusion feature vector and the transformed feature vector, and outputting the fused feature vector and the transformed feature vector to a Swin Transformer module of the next scale;
taking the output characteristic of the last Swin Transformer module as a second extracted characteristic extracted by the residual Swin Transformer module.
6. The image deblurring method according to claim 5, wherein the step of inputting the normalized fusion feature vector and the EVS feature vector to a window-based multi-head fusion attention module for feature extraction, to obtain a third extracted feature, includes:
inputting the normalized fusion feature vector and the EVS feature vector to a window-based multi-head fusion attention module, and respectively carrying out linear transformation by using a full-connection layer to obtain a first linear transformation matrix corresponding to the normalized fusion feature vector and a second linear transformation matrix corresponding to the EVS feature vector; the first linear transformation matrix comprises a first query matrix, a first key matrix and a first value matrix, and the second linear transformation matrix comprises a second query matrix and a second key matrix;
performing element-by-element multiplication operation on the first query matrix and the first key matrix, performing fusion processing after performing element-by-element multiplication operation on the second query matrix and the second key matrix, and performing linear regression processing on the fusion feature vector by utilizing a softmax layer to obtain a feature vector after linear regression;
and performing element-by-element multiplication operation on the feature vector subjected to linear regression and the first value matrix, and then performing linear transformation by using a full-connection layer to obtain a third extracted feature.
7. The image deblurring method according to any one of claims 1 to 6, wherein the step of inputting the blurred APS image and the corresponding EVS data to the shallow feature extraction module to perform shallow feature extraction, respectively, to obtain APS shallow features and EVS shallow features includes:
and respectively inputting the blurred APS image and corresponding EVS data to respective shallow feature extraction modules, and extracting shallow features by using the cascaded first convolution layer, the GELU activation function layer and the second convolution layer to obtain APS shallow features and EVS shallow features.
8. An image deblurring device based on a multi-scale residual Swin Transformer, comprising:
the shallow feature extraction module is used for inputting the blurred APS image and corresponding EVS data into the shallow feature extraction module to extract shallow features respectively, so as to obtain APS shallow features and EVS shallow features;
the feature fusion module is used for inputting the APS shallow features and the EVS shallow features into the feature fusion module for feature fusion to obtain fusion features;
the deep feature extraction module is used for taking the fusion feature as the input of a first multi-scale residual Swin Transformer module in a multi-scale deep feature extraction network, taking the EVS shallow feature as the input of all multi-scale residual Swin Transformer modules at the same time, and sequentially extracting the deep features by using a plurality of multi-scale residual Swin Transformer modules with different scales to obtain deep features;
and the feature reconstruction module is used for inputting the superimposed APS shallow layer features and the deep layer features to the feature reconstruction module for feature reconstruction and outputting a clear APS image corresponding to the blurred APS image.
9. An electronic device comprising a memory and a processor, wherein:
the processor is used for executing the computer program stored on the memory;
the processor, when executing the computer program, implements the steps of the multi-scale residual Swin Transformer-based image deblurring method according to any of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program, which, when executed by a processor, implements the steps of the multi-scale residual Swin Transformer based image deblurring method according to any of claims 1 to 7.
CN202311348430.8A 2023-10-17 2023-10-17 Image deblurring method based on multi-scale residual Swin Transformer and related product Pending CN117408916A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311348430.8A CN117408916A (en) 2023-10-17 2023-10-17 Image deblurring method based on multi-scale residual Swin Transformer and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311348430.8A CN117408916A (en) 2023-10-17 2023-10-17 Image deblurring method based on multi-scale residual Swin Transformer and related product

Publications (1)

Publication Number Publication Date
CN117408916A true CN117408916A (en) 2024-01-16

Family

ID=89493756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311348430.8A Pending CN117408916A (en) 2023-10-17 2023-10-17 Image deblurring method based on multi-scale residual Swin Transformer and related product

Country Status (1)

Country Link
CN (1) CN117408916A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117876835A (en) * 2024-02-29 2024-04-12 重庆师范大学 Medical image fusion method based on residual Transformer

Similar Documents

Publication Publication Date Title
JP7093886B2 (en) Image processing methods and devices, electronic devices and storage media
CN112308200B (en) Searching method and device for neural network
EP4109392A1 (en) Image processing method and image processing device
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
US20210398252A1 (en) Image denoising method and apparatus
JP2018170003A (en) Detection device and method for event in video, and image processor
CN112446835B (en) Image restoration method, image restoration network training method, device and storage medium
CN111951195A (en) Image enhancement method and device
CN117408916A (en) Image deblurring method based on multi-scale residual Swin Transformer and related product
Dong et al. Mobilexnet: An efficient convolutional neural network for monocular depth estimation
CN115131256A (en) Image processing model, and training method and device of image processing model
CN115578262A (en) Polarization image super-resolution reconstruction method based on AFAN model
Xin et al. Video face super-resolution with motion-adaptive feedback cell
CN117391938B (en) Infrared image super-resolution reconstruction method, system, equipment and terminal
Hua et al. Dynamic scene deblurring with continuous cross-layer attention transmission
Tang et al. Structure-embedded ghosting artifact suppression network for high dynamic range image reconstruction
CN114885112B (en) High-frame-rate video generation method and device based on data fusion
CN116797640A (en) Depth and 3D key point estimation method for intelligent companion line inspection device
CN114119428B (en) Image deblurring method and device
CN114885144B (en) High frame rate 3D video generation method and device based on data fusion
CN110545373B (en) Spatial environment sensing method and device
EP4075343A1 (en) Device and method for realizing data synchronization in neural network inference
CN111861897A (en) Image processing method and device
CN113284054B (en) Image enhancement method and image enhancement device
WO2021237649A1 (en) Image processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination