CN117726540A - Image denoising method based on an enhanced gated Transformer - Google Patents

Image denoising method based on an enhanced gated Transformer

Publication number: CN117726540A
Authority: CN (China)
Prior art keywords: image, noise, convolution, denoising, model
Legal status: Pending (assumed, not a legal conclusion)
Application number: CN202311780241.8A
Other languages: Chinese (zh)
Inventors:
张�杰
黄雯潇
王延峰
陈宜滨
张焕龙
张雷
李林伟
王凤仙
Current Assignee: Zhengzhou University of Light Industry
Original Assignee: Zhengzhou University of Light Industry
Application filed by Zhengzhou University of Light Industry
Priority: CN202311780241.8A
Publication: CN117726540A
Legal status: Pending

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Image Processing (AREA)

Abstract

The invention provides an image denoising method based on an enhanced gated Transformer, which comprises the following steps: preprocessing all images in the dataset and generating original-noise image pairs as a training set; taking a U-shaped structure as the main framework, taking Transformer blocks with up- and down-sampling as the encoding and decoding structures, and connecting them through skip connections to form an image denoising model; inputting the noisy images of the training set into the image denoising model for learning and training to obtain model weights for different noise levels; inputting a noisy image into the trained image denoising model and processing it with the model weights of the corresponding noise level to obtain the denoised image. The invention performs excellently on high-noise-level synthetic denoising tasks, recovers more detail information in real denoising tasks, and effectively addresses the noise introduced during sensor acquisition, image transmission and processing.

Description

Image denoising method based on an enhanced gated Transformer
Technical Field
The invention relates to the technical field of image denoising, in particular to an image denoising method based on an enhanced gated Transformer, which realizes high-quality reconstruction of highly noisy images.
Background
In the formation and transmission of an actual optical image, the image is inevitably contaminated by various types of noise, including impulse noise, Gaussian noise and Poisson noise, due to various interference factors, so the captured image contains a large amount of noise information. Therefore, how to obtain a high-quality reconstructed image from an image containing random noise is an important research problem.
Conventional image denoising methods use prior image information to solve the degradation problem. However, these methods often rely heavily on manually selected features, which is time-consuming and of limited practical utility in real-world scenarios. In addition, conventional methods are limited in high-noise environments because of the difficulty of capturing high-level features and structures in highly noisy images. In contrast, deep-learning-based methods show significant performance on image denoising tasks by automatically learning efficient feature representations from data. Convolutional networks have achieved notable success in various computer vision tasks, learning hierarchical representations of images and effectively capturing low-level and high-level features. However, convolution ignores local-global correlations during image denoising, resulting in a loss of valuable information. In recent years, the Transformer architecture, originally used for natural language processing (NLP) tasks, has also been successfully applied to computer vision; by using a self-attention mechanism, a Transformer network can capture global and local dependencies in an image and thus better restore image details and structures, which brings new potential for image denoising tasks.
The Transformer network adopts a hierarchical structure and a multi-head attention mechanism and can learn feature representations at different scales simultaneously, so the details and structure of the image can be better recovered. In addition, the Transformer network can increase its depth and complexity by stacking more layers and heads, further improving its modeling capability. However, the computational complexity of a Transformer network grows quadratically with spatial resolution, making it unsuitable for high-resolution, high-noise image restoration tasks. Moreover, the core idea of the attention mechanism is to focus attention on pixels that have an important impact on the reconstructed image while weakening pixels that do not contribute to it; low-weight tokens are nevertheless always present and may have an adverse effect during denoising.
The invention patent with application number 202310492058.1 discloses a self-supervised real-image denoising method. First, a densely sampled patch-mask convolution is introduced for local information; based on prior statistics of the spatial correlation of real noise, more neighboring pixels are taken into account, giving the network a denser receptive field so it can recover finer structures. A dilated Transformer is introduced for global information to better exploit long-range interactions, fully utilizing local and long-range information respectively. That invention can complete the denoising process without the original image, markedly improves the quality of existing self-supervised real-image denoising methods, and is suitable for applications such as deep-sea and near-earth detection under low illumination. However, the large convolution kernels in its densely sampled patch-mask convolution make the model difficult or slow to train, and the dilated convolutions in its Transformer feed-forward network increase the computational burden.
Disclosure of Invention
Aiming at the technical problems that the computational complexity of existing high-resolution image denoising is too high and some detail information is over-smoothed, the invention provides an image denoising method based on an enhanced gated Transformer, which effectively relieves the computational complexity and effectively preserves the edge detail features of the image.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows: an image denoising method based on an enhanced gated Transformer comprises the following steps:
Step one: preprocessing all images in a dataset and generating original-noise image pairs as a training set;
Step two: taking a U-shaped structure as the main framework, taking Transformer blocks with up- and down-sampling as the encoding and decoding structures, and connecting them through skip connections to form an image denoising model;
Step three: inputting the noisy images of the training set into the image denoising model for learning and training to obtain model weights for different noise levels;
Step four: inputting a noisy image into the trained image denoising model and processing it with the model weights of the corresponding image denoising model to obtain a denoised reconstructed image.
Preferably, the preprocessing method in step one is as follows: for the synthetic noise denoising task, all images in the dataset are cropped into 512×512 patches with an overlap of 96 pixels until the images are completely covered; the patches are then rotated by 90, 180 or 270 degrees for augmentation, and Gaussian noise is added to generate original-noise image pairs, which are used as the training set; for the real noise denoising task, all images in the dataset are randomly cropped to 256×256 and then rotated by 90, 180 or 270 degrees for augmentation, generating original-noise image pairs as the training set.
Preferably, the image denoising model is a U-shaped structure formed by an encoder, a decoder and skip connections, divided into three parts: shallow feature extraction, deep feature extraction and image reconstruction; first, the input image passes through a 3×3 convolution layer that expands the channels to extract shallow features; then Transformer blocks are combined with up- and down-sampling to form the codec, and deep features are extracted through skip connections; finally, a 3×3 convolution layer restores the original number of channels, yielding the reconstructed image.
Preferably, the codec structure comprises K = 4 stages, i.e. 4 encoding blocks, 4 decoding blocks and one bottom module, each block containing 2 Transformer blocks; the Transformer blocks in each encoding block are followed by a 4×4 convolution with a stride of 2 for downsampling, which halves the size of the feature map and doubles the number of channels; the Transformer blocks in each decoding block are preceded by upsampling using a transposed convolution with a 2×2 kernel and a stride of 2, which doubles the size of the feature map and halves the number of channels.
Preferably, each Transformer block comprises a deformable convolution module, an enhanced multi-head self-attention module and a feed-forward network with a gating strategy, and feature fusion between the parallel branches is performed by element-wise multiplication of their feature maps.
Preferably, after layer normalization (LayerNorm) the deformable convolution module is divided into 3 stages: the first stage uses a 3×3 deformable convolution to initially extract the texture features of the image; the second stage uses a 3×3 ordinary convolution to further explore image features and capture deeper information; the final stage uses another 3×3 deformable convolution, the same as the first stage, to further extract texture features; the input features before LayerNorm and the features processed by the final stage are added to obtain enhanced local features; each stage is followed by a GELU activation function.
Preferably, the enhanced multi-head self-attention module combines a multi-head window self-attention module with a GELU-activated edge-enhanced convolution block through layer normalization (LayerNorm) and a residual structure; the LayerNorm is placed in front of the multi-head window self-attention module and the edge-enhanced convolution block, and the features obtained from the two are added directly to obtain edge-enhanced feature information; the edge-enhanced convolution block performs a 3×3 convolution on the input feature map to obtain a feature mapping, computes the absolute difference between the input and its average-pooling result, and applies a 3×3 convolution to this absolute difference to obtain an edge feature map; the input feature map and the edge feature map are added, and the final output is obtained through a GELU activation function.
Preferably, the feed-forward network with the gating strategy passes through layer normalization (LayerNorm) and a 1×1 convolution, then performs gated selection between a branch consisting of the local perception unit and a GELU activation function and a parallel depth-wise convolution feature map, and finally restores the original number of channels through a convolution with a 1×1 kernel.
Preferably, the function of the local perception unit is: LPU(X) = DWConv(X) + X, where X is the input feature and DWConv is a depth-wise convolution;
the multi-head window self-attention module comprises 8 heads; after the input is normalized, the normalized result is divided into non-overlapping regions using non-overlapping local windows, the features of each non-overlapping region are linearly transformed to obtain the Q, K and V matrices, and self-attention is computed within each local window as follows:

Attention(Q, K, V) = SoftMax(QK^T / √d_k + B) · V

wherein SoftMax represents the normalized exponential (softmax) function over a finite discrete probability distribution, d_k represents the dimension of the Q and K matrices, and B is the relative position bias matrix;
the gel activation function is:
where x represents the input feature.
Preferably, for denoising synthetic noise images, Gaussian noise in each noise-level range is added to the training-set images respectively, which are then input into the image denoising model for learning and training, yielding trained image denoising models for the different noise levels; for denoising real noise images, the noisy images are input directly into the image denoising model for learning and training to obtain the real-noise image denoising model;
Gaussian noise of different levels conforming to a normal random distribution is added to the images to obtain noisy images of different noise levels, which are input into the image denoising model as a training set for batch training;
in the image denoising model training process, a Charbonnier loss function is adopted for training, wherein the Charbonnier loss function is:

L(I′, Î) = √((I′ − Î)² + ε²)

wherein I′ represents the denoised image, Î represents the original image, I represents the noisy image, and ε is an empirical value;
in the training process, an AdamW optimizer is adopted for an image denoising model;
for the synthetic noise experiments, a staged learning-rate decay strategy is adopted: the learning rate is initialized to 1e-4 and multiplied by 0.5 every 50 epochs, finally decaying to 1e-9; for the real noise experiments, training uses a cosine annealing strategy with an initial learning rate of 2e-4 decayed to 1e-6; the batch size is set to 16;
measuring the noise standard deviation of the noisy image, and inputting the noisy image into an image denoising model of a noise level matched with the noise standard deviation; and loading the parameters of the saved model weights, and directly reconstructing and outputting the denoised picture.
Compared with the prior art, the invention has the following beneficial effects. Data preprocessing generates image pairs: all pictures in the dataset are preprocessed to produce image pairs as the training set. The denoising model takes a U-shaped structure as the main framework, takes Transformer blocks with up- and down-sampling as the codec, and forms the overall network model through skip connections. Network model training: the noisy images are input into the network model for learning and training to obtain model weights for different noise levels; a noisy picture is then input into the trained image denoising model and predicted with the corresponding model weights to obtain the reconstructed image. The enhanced multi-head self-attention module combines an edge-enhancement mechanism with a multi-head window self-attention mechanism, so that edge regions retain more detail information while the computational complexity is reduced. The feed-forward network with the gating strategy adjusts pixel weights, emphasizing attention to locally effective pixels and suppressing low-weight pixels that do not help reconstruct the image. Deformable convolution strengthens context-information interaction, adapting well to texture variations in the image and effectively addressing the over-smoothing of detail information that remains in Transformer-based denoising.
When denoising high-noise images, the method produces markedly better results and obtains high-quality images, effectively solving the high-noise denoising problem; it performs excellently on high-noise-level synthetic denoising tasks, recovers more detail information in real denoising tasks, and effectively addresses the noise introduced during sensor acquisition, image transmission and processing.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a block diagram of a denoising model of the present invention.
Fig. 3 is a block diagram of a deformable convolution module (DFCB) in accordance with the present invention.
Fig. 4 is a block diagram of an enhanced multi-headed self-attention module (EMWSA) in accordance with the present invention.
Fig. 5 is a block diagram of a feed forward network (LGFN) with gating strategy in the present invention.
FIG. 6 shows the results of an image test set with Gaussian white noise according to the present invention, where (a) is the original image, (b) is the noisy image, and (c) is the denoised image.
Fig. 7 shows the results of a test set of true noise images according to the present invention, where (a) is the original image, (b) is the noise image, and (c) is the denoised image.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
The hardware environment for implementation of the invention is as follows: CPU, intel (R) Core (TM) i9-12900H; GPU RTX 4080ti; the RAM is 64GB; the software environment running is: pyCharm Integrated Environment and Windows 11.
As shown in FIG. 1, the enhanced gated Transformer image denoising method comprises two parts: constructing the denoising network model and denoising with the model weights. The ideas of the invention are: (1) a deformable convolution module is proposed that combines deformable and ordinary convolutions used in different stages, effectively capturing texture variations and extracting deeper feature information from the image; (2) an enhanced multi-head self-attention module is proposed that relieves computational complexity while extracting more edge details; (3) a feed-forward network with a gating strategy is proposed that makes better use of context information and thus extracts more texture details. The basic flow of this embodiment is shown in fig. 1, and the steps include:
step one, data preprocessing, namely generating an image pair: all images in the dataset are preprocessed and an original-noise image pair is generated as a training set.
For the synthetic noise denoising task, all pictures in DIV2K, Flickr2K, WED and BSD400 are used as the dataset; all images are cropped into 512×512 patches with an overlap of 96 pixels until the images are completely covered, the patches are rotated by 90, 180 or 270 degrees for augmentation, and Gaussian noise is added to generate original-noise image pairs as the training set. For the real noise denoising task, all images in the SIDD-Medium dataset are used as the training set; the images are randomly cropped to 256×256 and then rotated by 90, 180 or 270 degrees for augmentation, generating original-noise image pairs as the training set. For the synthetic noise datasets, owing to the large variety of images, cropping to 512×512 effectively reduces training time while still allowing adequate training. The data augmentation strategy expands the number of pictures, avoiding overfitting and improving the robustness of the model.
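The cropping, augmentation and noise-injection steps above can be sketched in numpy (the helper names are illustrative, not from the patent; the stride follows the 512×512 patch size with a 96-pixel overlap described here):

```python
import numpy as np

def crop_patches(img, patch=512, overlap=96):
    """Slide a patch x patch window with `overlap` shared pixels between
    neighbouring crops (stride = patch - overlap), covering the image.
    Assumes the image is at least patch-sized in each dimension."""
    stride = patch - overlap
    h, w = img.shape[:2]
    ys = list(range(0, h - patch + 1, stride))
    xs = list(range(0, w - patch + 1, stride))
    # make sure the bottom/right borders are fully covered
    if ys[-1] != h - patch:
        ys.append(h - patch)
    if xs[-1] != w - patch:
        xs.append(w - patch)
    return [img[y:y + patch, x:x + patch] for y in ys for x in xs]

def augment_and_noise(patch, sigma=25, rng=None):
    """Rotate by a random multiple of 90 degrees and add Gaussian noise,
    yielding one original-noise training pair."""
    rng = np.random.default_rng(rng)
    k = int(rng.integers(0, 4))          # 0 / 90 / 180 / 270 degrees
    clean = np.rot90(patch, k)
    noisy = clean + rng.normal(0.0, sigma, clean.shape)
    return clean, noisy
```

A 1024×1024 image, for example, yields a 3×3 grid of overlapping 512×512 patches under this scheme.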
Step two, constructing the image denoising model: the U-shaped structure is used as the main framework, Transformer blocks and up- and down-sampling are used as the encoding and decoding structures, and they are connected through skip connections to form the image denoising model.
The structure of the image denoising model is shown in fig. 2: a U-shaped structure formed by an encoder, a decoder and skip connections, divided into three parts: shallow feature extraction, deep feature extraction and image reconstruction. First, the input image passes through a 3×3 convolution layer that expands the channels to extract shallow features; then Transformer blocks are combined with up- and down-sampling to form the codec, and deep features are extracted through skip connections; finally, a 3×3 convolution layer restores the original number of channels, yielding the reconstructed image.
As shown in fig. 2, the codec structure includes K = 4 stages, i.e., 4 encoding blocks, 4 decoding blocks, and one bottom module, where each block contains L = 2 Transformer blocks. The Transformer blocks in each encoding block are followed by a 4×4 convolution with a stride of 2 for downsampling, which halves the size of the feature map and doubles the number of channels; the Transformer blocks in each decoding block are preceded by upsampling using a transposed convolution with a 2×2 kernel and a stride of 2, which doubles the size of the feature map and halves the number of channels. Downsampling makes model computation more efficient, and increasing the number of channels extracts more image detail information. Each Transformer block comprises a deformable convolution module, an enhanced multi-head self-attention module, and a feed-forward network with a gating strategy; the feature maps of the parallel branches are fused by element-wise multiplication. By multiplying the two feature maps element by element, the image denoising model can concentrate on the key features with higher responses in both maps, improving its perception of important information.
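The halve-size/double-channels rule of the 4-stage encoder can be checked with a small bookkeeping helper (a hypothetical function, shown only to illustrate the shape arithmetic; the decoder inverts it exactly):

```python
def codec_shapes(h, w, c, stages=4):
    """Track (H, W, C) through the encoder stages: each 4x4 stride-2
    convolution halves the spatial size and doubles the channels."""
    shapes = [(h, w, c)]
    for _ in range(stages):
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((h, w, c))
    return shapes
```

For a 256×256 input with 32 shallow-feature channels (the channel count is an assumption), the bottom module would see a 16×16×512 feature map.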
As shown in fig. 3, the deformable convolution module first applies layer normalization (LayerNorm) and is then divided into 3 stages: the first stage uses a 3×3 deformable convolution to initially extract the texture features of the image; the second stage uses a 3×3 ordinary convolution to further explore image features and capture deeper information; the final stage uses another 3×3 deformable convolution, similar to the first stage, to further extract texture features. LayerNorm stabilizes the forward input distribution and accelerates convergence. By combining deformable convolution with ordinary convolution, texture variations can be effectively captured and richer feature information extracted from the image. Each stage is followed by a GELU activation function to ensure that the valid information of each stage is propagated. In fig. 3, the input features before LayerNorm and the features processed by the final stage are added to obtain enhanced local features, strengthening the generalization capability of the model.
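Deformable convolution samples each kernel tap at a learned fractional offset instead of the fixed grid. A minimal single-output-position numpy sketch (not the patent's implementation; real models use batched, learned-offset deformable convolution as in torchvision):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2-D feature map at a fractional (y, x) position.
    Assumes 0 <= y, x within the map (no negative-coordinate handling)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feat[y0, x0] + (1 - dy) * dx * feat[y0, x1]
            + dy * (1 - dx) * feat[y1, x0] + dy * dx * feat[y1, x1])

def deformable_conv_point(feat, weights, offsets, cy, cx):
    """One output position of a 3x3 deformable convolution: each of the
    9 taps is shifted by a (dy, dx) offset and sampled bilinearly."""
    taps = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    out = 0.0
    for w_k, (i, j), (dy, dx) in zip(weights.ravel(), taps, offsets):
        out += w_k * bilinear_sample(feat, cy + i + dy, cx + j + dx)
    return out
```

With all offsets at zero this reduces to an ordinary 3×3 convolution, which is why the module can mix the two convolution types freely.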
As shown in FIG. 4, the enhanced multi-head self-attention module combines a multi-head window self-attention module with a GELU-activated edge-enhanced convolution block through layer normalization (LayerNorm) and a residual structure, which helps improve the image denoising model's perception of target edges. The LayerNorm is placed in front of the multi-head window self-attention module and the edge-enhanced convolution block to stabilize the forward input distribution, and the features obtained from the two branches are added directly to obtain edge-enhanced feature information. The edge-enhanced convolution block first performs a 3×3 convolution on the input feature map to obtain a feature mapping; it then computes the absolute difference between the input and its average-pooling result and applies a 3×3 convolution to this absolute difference to obtain an edge feature map; finally, the input feature map and the edge feature map are added, and the final output is obtained through a GELU activation function.
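The edge branch of the block, the absolute difference between the input and its average pooling, can be illustrated in numpy (the surrounding 3×3 convolutions are omitted; `avg_pool3` is a hypothetical stand-in for the block's pooling):

```python
import numpy as np

def avg_pool3(x):
    """3x3 mean filter with edge replication, standing in for the
    block's average pooling."""
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    return sum(p[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0

def edge_map(x):
    """|input - local average|: near zero on flat regions, large where
    intensity changes, i.e. an edge indicator."""
    return np.abs(x - avg_pool3(x))
```

On a constant image the map is exactly zero; only pixels near intensity transitions survive, which is what lets the branch emphasize edge detail.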
The multi-head window self-attention module comprises 8 heads. Given an input X, after normalization the result is divided into non-overlapping regions of size M×M using non-overlapping local windows. The larger the value of M, the higher the computational complexity of the model and the stronger its generalization capability; weighing computational complexity against generalization, M = 16 is taken. The features of each non-overlapping window are linearly transformed to obtain the corresponding Q (query), K (key) and V (value) matrices. These linear transformations map the input features into a lower-dimensional space for the subsequent self-attention computation, which each local window performs separately. The self-attention computation is formulated as:
Attention(Q, K, V) = SoftMax(QK^T / √d_k + B) · V

wherein SoftMax represents the normalized exponential (softmax) function over a finite discrete probability distribution, d_k represents the dimension of the Q and K matrices, and B is the relative position bias matrix, whose values are taken from B ∈ R^((2M−1)×(2M−1)).
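A compact numpy sketch of the window partition and the windowed self-attention (single head and small illustrative shapes; the patent's module uses 8 heads and M = 16):

```python
import numpy as np

def window_partition(x, M):
    """Split an (H, W, C) feature map into non-overlapping M x M windows,
    returning (num_windows, M*M, C). Assumes H and W are multiples of M."""
    h, w, c = x.shape
    x = x.reshape(h // M, M, w // M, M, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, M * M, c)

def window_attention(q, k, v, bias=None):
    """Self-attention inside each window: SoftMax(Q K^T / sqrt(d_k) + B) V."""
    d_k = q.shape[-1]
    logits = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    if bias is not None:
        logits = logits + bias
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v
```

Restricting attention to M×M windows is what keeps the cost linear in the number of windows rather than quadratic in the full resolution.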
The GELU activation function is:

GELU(x) = x · Φ(x) = 0.5x(1 + erf(x/√2))

where x represents the input feature and Φ is the standard normal cumulative distribution function. The edge-enhanced convolution block obtains its final output through the GELU activation function: the output features of the edge-enhanced convolution block are fed into the GELU activation function and activated.
The derivative of the GELU activation function is continuous, which allows for easier propagation of gradients when training deep neural networks, avoiding the problem of discontinuous derivative at x=0, thus reducing the problem of gradient disappearance during training.
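The exact GELU can be written with the standard normal CDF (a sketch; deep-learning frameworks often substitute a tanh approximation for speed):

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Unlike ReLU, the function and its derivative are smooth through zero, which is the continuity property the text refers to.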
As shown in fig. 5, the feed-forward network with the gating strategy passes through LayerNorm and a 1×1 convolution, then through the local perception unit (LPU); gated selection is performed between a GELU-activated branch and a parallel depth-wise convolution (DWConv) feature map, and finally the original number of channels is restored by a convolution with a 1×1 kernel. The gating mechanism helps the model control the information flow, selectively passing and filtering features through the activation function and thus controlling the expression of local features more flexibly. The 1×1 convolutions increase and then reduce the number of channels in order to extract more feature information. The LPU extracts local feature information, enhancing the feed-forward network's ability to perceive the relationship between adjacent pixels and to make better use of local features. The LPU is expressed as:
LPU(X)=DWConv(X)+X
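The LPU formula above can be sketched with a naive depth-wise 3×3 filter in numpy (a slow loop version for clarity, using cross-correlation as deep-learning frameworks do; not a production implementation):

```python
import numpy as np

def depthwise_conv3(x, kernels):
    """Per-channel 3x3 filtering (depth-wise): channel c of the input is
    filtered only by kernels[c]. x: (H, W, C), kernels: (C, 3, 3)."""
    h, w, c = x.shape
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for ch in range(c):
        for i in range(3):
            for j in range(3):
                out[:, :, ch] += kernels[ch, i, j] * p[i:i + h, j:j + w, ch]
    return out

def lpu(x, kernels):
    """Local perception unit: LPU(X) = DWConv(X) + X."""
    return depthwise_conv3(x, kernels) + x
```

The residual term guarantees that the unit can fall back to the identity when the depth-wise kernels contribute nothing, so adding it never destroys information.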
step three: training the image denoising model by using a training set: and (3) inputting the noisy images in the training set into the image denoising model in the second step for learning and training to obtain model weights with different noise levels.
For denoising of synthetic noise images, Gaussian noise in each noise-level range is added to the training-set images respectively, which are then input into the image denoising model for learning and training, yielding trained image denoising models for the different noise levels. For denoising of real noise images, the noisy images are input directly into the image denoising model for learning and training to obtain the real-noise image denoising model.
Gaussian noise of defined levels conforming to a normal random distribution is added to the images to obtain noisy images of different noise levels, which are input into the image denoising model as a training set for batch training. The Gaussian noise probability density function is:

p(x) = (1 / (√(2π)σ)) · exp(−(x − μ)² / (2σ²))

where x represents the gray value of the image, μ represents the mean of the gray values, and σ² represents the variance of the gray value x.
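The Gaussian density above can be evaluated directly (a trivial sketch for checking the formula):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of the Gaussian noise model N(mu, sigma^2)."""
    return (math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
            / (math.sqrt(2.0 * math.pi) * sigma))
```

The density peaks at x = μ with value 1/(√(2π)σ) and is symmetric about the mean.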
The Charbonnier loss function is adopted for training the image denoising model; it exhibits strong robustness, handles outliers effectively, and improves model performance. The Charbonnier loss function formula is:

L(I′, Î) = √((I′ − Î)² + ε²)

wherein I′ represents the denoised image, Î represents the original image, I represents the noisy image, and ε is an empirical value, taken as ε = 10⁻³.
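With ε = 10⁻³ the Charbonnier loss is a one-liner (averaged over pixels here, which is an assumption; the patent does not state the reduction):

```python
import numpy as np

def charbonnier_loss(denoised, target, eps=1e-3):
    """Charbonnier loss sqrt((I' - I_hat)^2 + eps^2), a smooth L1 variant
    that stays differentiable at zero and is robust to outliers."""
    return float(np.mean(np.sqrt((denoised - target) ** 2 + eps ** 2)))
```

For identical images the loss bottoms out at ε rather than 0, which is the small bias that buys differentiability everywhere.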
During training, the image denoising model uses the AdamW optimizer with the momentum terms set to 0.9 and 0.999. For the synthetic-noise experiments, a staged learning-rate decay strategy is adopted: the learning rate is initialized to 1e-4 and decayed by a factor of 0.5 every 50 epochs, finally reaching 1e-9. For the real-noise experiments, training uses a cosine annealing strategy with an initial learning rate of 2e-4 decayed to 1e-6. The batch size is set to 16, which lets the model train stably while achieving its best performance.
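The two learning-rate schedules above can be sketched as follows; the text does not state the total epoch count, so `total_epochs` is an assumption:

```python
import math

def step_decay_lr(epoch, base_lr=1e-4, factor=0.5, step=50, floor=1e-9):
    """Synthetic-noise schedule: halve the LR every 50 epochs, floored at 1e-9."""
    return max(base_lr * factor ** (epoch // step), floor)

def cosine_annealing_lr(epoch, total_epochs, base_lr=2e-4, min_lr=1e-6):
    """Real-noise schedule: cosine decay from 2e-4 down to 1e-6."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))
```

In a PyTorch training loop these would typically be replaced by `torch.optim.lr_scheduler.StepLR` and `CosineAnnealingLR` attached to the AdamW optimizer.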
Step four: inputting the image with noise into a trained image denoising model, and processing by using the model weight of the corresponding image denoising model to obtain a denoising reconstructed image.
The noise standard deviation of the noisy image is measured, and the image is then input into the image denoising model whose noise level matches that standard deviation. When the model is applied, the input picture needs no cropping or preprocessing: the noisy picture can be fed directly into the trained image denoising model, the saved model-weight parameters are loaded, and the model directly reconstructs and outputs the denoised picture.
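A sketch of matching a noisy image to the closest pretrained noise level. The difference-based sigma estimator below is an illustrative assumption, not the measurement procedure specified by the patent:

```python
import numpy as np

NOISE_LEVELS = (15, 25, 50)  # sigma values the models were trained on

def estimate_noise_sigma(img):
    """Rough estimate: for i.i.d. Gaussian noise on a smooth image, the std of
    horizontal pixel differences is approximately sigma * sqrt(2)."""
    d = np.diff(np.asarray(img, dtype=np.float64), axis=1)
    return d.std() / np.sqrt(2.0)

def pick_model_level(sigma_est, levels=NOISE_LEVELS):
    """Select the pretrained weight whose noise level is closest to the estimate."""
    return min(levels, key=lambda s: abs(s - sigma_est))
```

The selected level would then index into the saved weight files before running inference.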
According to the method and the specific implementation steps, the effectiveness of the invention is verified through experiments.
The experimental parameters and training set adopted in the experiment of the invention are shown in the specific steps, and CBSD68, kodak24, mcMaster and Urban100 data sets are used as test sets for synthetic noise; for real noise, the SIDD test set and DND test set were used for testing.
The Set12 dataset is additionally adopted as a test set, and the performance of the invention is evaluated with the objective metrics PSNR and SSIM. PSNR measures the denoising effect of the image denoising model: the higher the PSNR value, the better the denoising effect. SSIM measures the similarity between two images: its maximum value is 1, and the higher the value, the more similar the two images. The PSNR and SSIM formulas are:

PSNR = 10 · log₁₀(MAX_I² / MSE)

SSIM(x, y) = ((2·μ_x·μ_y + c₁)(2·σ_xy + c₂)) / ((μ_x² + μ_y² + c₁)(σ_x² + σ_y² + c₂))

where MSE represents the mean square error between the original image and the reconstructed image, MAX_I represents the maximum possible pixel value of the original image, μ_x is the mean of image x, μ_y is the mean of image y, σ_x² is the variance of image x, σ_y² is the variance of image y, σ_xy is the covariance of images x and y, and c₁ and c₂ are constants that maintain stability.
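NumPy sketches of the two metrics. For brevity, the SSIM here uses a single global window; standard evaluations compute it over a sliding Gaussian window and average:

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """PSNR = 10 * log10(MAX_I^2 / MSE); infinite for identical images."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM with the usual stability constants c1, c2."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    x = np.asarray(x, float); y = np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For published numbers such as those in Tables 1 and 2, a windowed implementation (e.g. `skimage.metrics.structural_similarity`) is the usual choice.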
Analysis of experimental results: for synthetic noise, Gaussian noise with standard deviation 15, 25 and 50 is added to the four test sets CBSD68, Kodak24, McMaster and Urban100 respectively, noise-free images are predicted using the pretrained model weights for the corresponding noise levels, and the resulting PSNR and SSIM values give Table 1; FIG. 6 shows the denoising effect on some pictures from the four test sets. For real noise, training is performed directly on the SIDD Medium dataset; after training, the real-noise model weights are used to denoise the real-noise image test sets, and the resulting PSNR and SSIM values give Table 2; the denoising results on real noise images are shown in FIG. 7.
As can be seen from the PSNR and SSIM values in Tables 1 and 2 and the visual comparison in FIGS. 6-7, the invention achieves a good denoising effect under different noise standard deviations, and relevant detail and edge features are effectively preserved, so the restored images have a relatively good visual effect.
TABLE 1 synthetic noise image denoising results
TABLE 2 Real noise image denoising result comparison

          DnCNN   SwinIR  The invention
PSNR(dB)  23.66   39.77   39.87
SSIM      0.583   0.958   0.960
Wherein DnCNN is a classical convolutional network denoising algorithm, and SwinIR is a Transformer-based image restoration algorithm.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. An image denoising method for an enhanced gated Transformer, characterized by comprising the following steps:
step one: preprocessing all images in a data set, and generating original-noise image pairs as a training set;
step two: taking a U-shaped structure as the main framework, using Transformer blocks together with up-sampling and down-sampling as the encoding and decoding structures, and connecting the Transformer blocks through skip connections to form an image denoising model;
step three: inputting the noisy images in the training set into an image denoising model for learning and training to obtain model weights with different noise levels;
step four: inputting the image with noise into a trained image denoising model, and processing by using the model weight of the corresponding image denoising model to obtain a denoising reconstructed image.
2. The image denoising method for the enhanced gated Transformer according to claim 1, wherein the preprocessing in step one is as follows: for the synthetic-noise denoising task, all images in the dataset are cropped into images of size 512×512 with 96 overlapping pixels until the images are completely cropped; the images are then rotated by 90°, 180° or 270° for augmentation, and Gaussian noise is added to generate original-noise image pairs used as the training set; for the real-noise denoising task, all images in the dataset are randomly cropped into images of size 256×256 and then rotated by 90°, 180° or 270° for augmentation, generating original-noise image pairs used as the training set.
3. The image denoising method for the enhanced gated Transformer according to claim 1 or 2, wherein the image denoising model is a U-shaped structure consisting of an encoder, a decoder and skip connections, divided into three parts: shallow feature extraction, deep feature extraction and image reconstruction; first, the input image passes through a 3×3 convolution layer that expands the channels to extract shallow features; then Transformer blocks are combined with up-sampling and down-sampling to form the codec, and deep features are extracted with skip connections; finally, a 3×3 convolution layer restores the number of channels to the original, yielding the reconstructed image.
4. The method of claim 3, wherein the codec structure comprises K=4 stages, i.e. 4 encoding blocks, 4 decoding blocks and a bottom block, each block containing 2 Transformer blocks; the Transformer block in each encoding block is followed by a 4×4 convolution with stride 2 for downsampling, which halves the size of the feature map and doubles the number of channels; the Transformer block in each decoding block is preceded by upsampling using a transposed convolution with a 2×2 kernel and stride 2, which doubles the size of the feature map and halves the number of channels.
5. The method of claim 4, wherein each Transformer block comprises a deformable convolution module, an enhanced multi-head self-attention module, and a feedforward network with gating strategy; the deformable convolution module is connected in parallel with the enhanced multi-head self-attention module and the feedforward network with gating strategy, and feature fusion is performed by multiplying the feature maps of the parallel branches.
6. The image denoising method for the enhanced gated Transformer according to claim 5, wherein the deformable convolution module is subdivided into 3 stages after layer normalization LayerNorm: the first stage uses a 3×3 deformable convolution to initially extract texture features of the image; the second stage uses a 3×3 ordinary convolution to further explore image features and capture deeper information; the final stage uses another 3×3 deformable convolution, identical to the first stage, to further extract texture features of the image; the input features before LayerNorm are added to the features processed in the final stage to obtain enhanced local features; each stage is followed by a GELU activation function.
7. The image denoising method for the enhanced gated Transformer according to claim 5, wherein the enhanced multi-head self-attention module combines a multi-head window self-attention module with a GELU-activated edge-enhanced convolution block through layer normalization LayerNorm and a residual structure; LayerNorm is placed before both the multi-head window self-attention module and the edge-enhanced convolution block, and the features produced by the two are added directly to obtain edge-enhanced feature information; the edge-enhanced convolution block performs a 3×3 convolution on the input feature map to obtain a feature mapping, computes the absolute difference between the input and its average-pooling result, and applies another 3×3 convolution to that absolute difference to obtain an edge feature map; the input feature map and the edge feature map are added, and the final output is obtained through a GELU activation function.
8. The image denoising method according to claim 5, wherein in the feedforward network with gating strategy, the input is normalized by LayerNorm and passed through a 1×1 convolution, gating selection is then performed by a local perception unit and a GELU activation function in parallel with a depth-wise convolution feature map, and finally the original channel number is restored by a 1×1 convolution.
9. The image denoising method for the enhanced gated Transformer according to claim 8, wherein the function of the local perception unit is: LPU(X)=DWConv(X)+X, where X is the input feature and DWConv is a depth-wise convolution;
the multi-head window self-attention module comprises 8 heads; after the input is normalized, the normalization result is divided into non-overlapping regions using non-overlapping local windows, the features of each region are linearly transformed to obtain the Q, K and V matrices, and self-attention is computed within each local window as:

Attention(Q, K, V) = SoftMax(QKᵀ / √d_k + B) · V

where SoftMax represents the gradient log-normalization function of a finite discrete probability distribution, d_k represents the dimension of the Q and K matrices, and B is the relative position bias matrix;
the gel activation function is:
where x represents the input feature.
10. The image denoising method for the enhanced gated Transformer according to claim 9, wherein for denoising synthetic noise images, Gaussian noise in each noise-level range is added to the training-set images, which are then input into the image denoising model for learning and training, yielding trained image denoising models for the different noise levels; for denoising real noise images, the noisy images are input directly into the image denoising model for learning and training to obtain the real-noise image denoising model;
Gaussian noise conforming to a normal random distribution is added to the images at different levels to obtain noisy images of different noise levels, which are input as a training set into the image denoising model for batch training;
in the image denoising model training process, the Charbonnier loss function is adopted for training:

L(I′, Î) = √(‖I′ − Î‖² + ε²)

where I′ represents the denoised image, Î represents the original image, I represents the noisy image, and ε is an empirical value;
in the training process, an AdamW optimizer is adopted for an image denoising model;
for the synthetic-noise experiments, a staged learning-rate decay strategy is adopted: the learning rate is initialized to 1e-4 and decayed by a factor of 0.5 every 50 epochs, finally reaching 1e-9; for the real-noise experiments, training uses a cosine annealing strategy with an initial learning rate of 2e-4 decayed to 1e-6; the batch size is set to 16;
measuring the noise standard deviation of the noisy image, and inputting the noisy image into an image denoising model of a noise level matched with the noise standard deviation; and loading the parameters of the saved model weights, and directly reconstructing and outputting the denoised picture.
CN202311780241.8A 2023-12-22 2023-12-22 Image denoising method for enhanced gate control converter Pending CN117726540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311780241.8A CN117726540A (en) 2023-12-22 2023-12-22 Image denoising method for enhanced gate control converter

Publications (1)

Publication Number Publication Date
CN117726540A true CN117726540A (en) 2024-03-19

Family

ID=90210538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311780241.8A Pending CN117726540A (en) 2023-12-22 2023-12-22 Image denoising method for enhanced gate control converter

Country Status (1)

Country Link
CN (1) CN117726540A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118247411A (en) * 2024-05-28 2024-06-25 淘宝(中国)软件有限公司 Material map generation method and device
CN118279182A (en) * 2024-06-03 2024-07-02 南京邮电大学 OCT single-frame image self-supervision denoising method for strengthening edge details


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination