CN116416156A - Swin Transformer-based medical image denoising method - Google Patents

Swin Transformer-based medical image denoising method

Info

Publication number
CN116416156A
Authority
CN
China
Prior art keywords
swin
image
network
denoising
medical image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310246661.1A
Other languages
Chinese (zh)
Inventor
苏进
李学俊
王华彬
张弓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Canada Institute Of Health Engineering Hefei Co ltd
Original Assignee
China Canada Institute Of Health Engineering Hefei Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Canada Institute Of Health Engineering Hefei Co ltd filed Critical China Canada Institute Of Health Engineering Hefei Co ltd
Priority to CN202310246661.1A priority Critical patent/CN116416156A/en
Publication of CN116416156A publication Critical patent/CN116416156A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a Swin Transformer-based medical image denoising method, belonging to the technical field of image processing. The invention comprises the following steps: step one, obtaining a noisy medical image and a clean medical image as a training set and a testing set; step two, adding a Swin Transformer to the neural network to design an RSTB block; step three, training a network by adopting an Adam algorithm and constructing a medical image denoising network; step four, inputting the noisy medical image into the network to obtain a denoising result; step five, evaluating the network by using image quality evaluation indices. According to the invention, the Swin Transformer module serves as the main body of the network structure, so texture details can be refined using the global information of the picture; downsampling increases the receptive field, allowing the 3×3 convolutions to extract features over a larger image range, and the model can better reconstruct the details and texture of the image.

Description

Swin Transformer-based medical image denoising method
Technical Field
The invention relates to the technical field of image processing, in particular to a Swin Transformer-based medical image denoising method.
Background
Image restoration is a long-standing problem in low-level vision; its purpose is to restore a high-quality, noise-free image from a low-quality image, such as a downscaled, noisy, or compressed image. Advanced image restoration methods are based on convolutional neural networks, but due to the limitations of local modeling they cannot solve the problem of long-range information dependence.
Most CNN-based approaches focus on complex architectural designs such as residual learning and dense connections. Although they significantly improve performance over traditional model-based approaches, they face two fundamental problems, both of which stem from the underlying convolutional layers. First, the interaction between the image and the convolution kernel is content-independent, and using the same convolution kernel to recover different image regions may not be the best choice. Second, under the principle of local processing, convolution is ineffective for modeling long-range dependencies.
The Transformer introduced a self-attention mechanism to capture global interactions between contexts and has shown good performance on several vision problems. However, vision Transformers for image restoration typically divide an input image into fixed-size patches (e.g., 48×48) and process each patch independently. This strategy inevitably brings two disadvantages. First, boundary pixels cannot use neighboring pixels outside the patch for image restoration. Second, the restored image may introduce boundary artifacts around each patch. Although this problem can be alleviated by patch overlapping, overlapping imposes additional computational burden.
After searching, Chinese patent No. CN114140353A, titled "Channel attention-based Swin-Transformer image denoising method and system", was found. In that application, a noisy image is input into a trained and optimized denoising network model; a shallow feature extraction network in the model first extracts shallow feature information such as noise and channel information of the noisy image, the extracted shallow feature information is then input into a deep feature extraction network to obtain deep feature information, and the shallow and deep feature information are input into a reconstruction network for feature fusion to obtain a clean image. However, that application differs from the present patent in how the Swin Transformer is applied to medical image denoising.
Disclosure of Invention
1. Technical problem to be solved by the invention
In view of the shortcomings of the prior art, the invention provides a Swin Transformer-based medical image denoising method and a strong baseline image restoration model, USwinTrans, based on the Swin Transformer, which combines the advantages of CNNs and Transformers: on one hand, thanks to the local attention mechanism, it has the CNN advantage of processing large images; on the other hand, with the advantage of the Transformer, long-range dependencies can be modeled with the shifted window scheme.
2. Technical proposal
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
the invention discloses a Swin Transformer-based medical image denoising method, which comprises the following steps:
step one, obtaining a noisy medical image and a clean medical image as a training set and a testing set;
step two, adding a Swin Transformer to the neural network to design an RSTB block;
step three, training a network by adopting an Adam algorithm and constructing a medical image denoising network;
step four, inputting the noisy medical image into the network to obtain a denoising result;
step five, evaluating the network by using image quality evaluation indices.
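The five steps above can be sketched as a runnable pipeline. The sketch below is illustrative only: a hypothetical 3×3 mean filter stands in for the trained USwinTrans network of steps two to four, the "medical image" is a synthetic smooth ramp, and PSNR serves as the image quality evaluation index of step five.

```python
import numpy as np

# Step one: obtain a noisy and a clean image (synthetic ramp stands in
# for a real medical image; noise is additive Gaussian).
rng = np.random.default_rng(0)
clean = np.outer(np.linspace(0, 1, 32), np.linspace(0, 1, 32))
noisy = clean + rng.normal(0.0, 0.05, clean.shape)

# Steps two-four stand-in: the real denoiser F(.) is the USwinTrans network
# built from RSTB blocks and trained with Adam; a 3x3 mean filter is used
# here only so the pipeline runs end to end.
def denoise(img):
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

# Step five: evaluate the output with an image quality index (PSNR here).
def psnr(ref, test, peak=1.0):
    mse = np.mean((ref - test) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

denoised = denoise(noisy)
```

Swapping the placeholder `denoise` for a trained model and PSNR for SSIM/RMSE recovers the full evaluation protocol used in the embodiment.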
Further, step one performs data augmentation on the obtained image training set by cropping the images.
Furthermore, a depth feature extraction module is added between the encoder and the decoder of the U-net network; the module introduces a Swin Transformer and combines it with convolution operations to extract local and global information respectively.
Still further, the encoder of the U-net network uses two downsampling operations with a step size of 2, and the decoder uses two upsampling operations with a step size of 2.
Still further, the depth feature extraction module comprises a plurality of RSTB blocks and a convolution block; each RSTB block comprises a plurality of Swin Transformer layers and a convolution block, connected in series and followed by a residual connection.
Still further, the depth feature extraction module comprises 5 RSTB blocks and 1 convolution block, and each RSTB block is a residual module formed by connecting 6 Swin Transformer layers and 1 convolution block in series.
Furthermore, the Swin Transformer layer consists of two residual blocks: the first residual block is normalized by a LayerNorm layer followed by the multi-head self-attention module MSA; the second residual block is normalized by a LayerNorm layer followed by the multi-layer perceptron MLP.
Still further, step three trains the network using the Adam optimizer, and the L2 loss function is calculated as:

$\mathcal{L} = \left\| F(\hat{X}) - X \right\|_2^2$

where $X$ represents the clean image, $\hat{X}$ represents the noisy image, and $F(\cdot)$ represents the network.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following remarkable effects:
(1) The Swin Transformer-based medical image denoising method improves a U-net-based medical image denoising model and constructs a strong baseline image restoration model, USwinTrans, based on the Swin Transformer. The Swin Transformer is added into the model and combined with a convolutional module to form the depth feature extraction part, a self-attention mechanism is introduced into the model, and finally a convolutional layer is used as the decoder to output the denoising result, so the details and textures of the image can be reconstructed better and the performance is superior to other denoising methods;
(2) On the basic U-net structure, the Swin Transformer and a convolutional module are introduced to improve the depth feature extraction part of the network, so the model not only improves its ability to capture local image information but also promotes its understanding of information between image patches; meanwhile, the Swin Transformer extracts more global information and achieves a good effect in denoising medical images. The method is not only effective for denoising medical images but also produces good visual results when denoising natural images.
Drawings
FIG. 1 is a medical image denoising flowchart of the present invention;
FIG. 2 is a schematic diagram of the medical image denoising model structure;
FIG. 3 is a structure diagram of the Residual Swin Transformer Block (RSTB);
FIG. 4 is a schematic diagram of the Swin Transformer Layer (STL);
FIG. 5 is a schematic diagram of the processing results of the denoising methods on a noisy grayscale image with Gaussian noise σ=0.001 and multiplicative noise σ=0.005;
FIG. 6 is a schematic diagram of the processing results of the denoising methods on a noisy grayscale nuclear magnetic resonance image with Gaussian noise σ=0.001 and multiplicative noise σ=0.005.
Detailed Description
For a further understanding of the present invention, the present invention will be described in detail with reference to the drawings and examples.
Example 1
Referring to fig. 1, this embodiment mainly comprises original medical image data augmentation, construction of a medical image denoising model, training the network on noisy medical images, and testing the training result; the method specifically comprises the following steps:
Step one, acquire a medical image data set and divide it into an image training set and an image testing set as required; perform data augmentation on the obtained image training set by cropping the images;
Step two, add a depth feature extraction module between the encoder and decoder of the U-net network, introduce a Swin Transformer, and combine it with convolution operations to extract local and global information respectively. The strong baseline image restoration model based on the Swin Transformer constructed in this embodiment is called USwinTrans; the specific structure is shown in FIG. 2.
The image encoder uses two downsampling operations with a step size of 2, and the decoder uses two upsampling operations with a step size of 2. The depth feature extraction module consists of five residual Swin Transformer blocks (RSTB blocks) and one convolution block; each RSTB block comprises six Swin Transformer layers and one convolution block, connected in series and followed by a residual connection. Multi-level information of the image is obtained through up- and downsampling, and the depth feature extraction module focuses on recovering the high-frequency information of the image.
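The shape flow through the two stride-2 downsampling and two stride-2 upsampling operations can be checked with a minimal stand-in (2×2 average pooling and nearest-neighbour repetition; the network itself uses learned operations, so this is only a shape sketch):

```python
import numpy as np

def down2(x):
    # stride-2 downsampling stand-in: 2x2 average pooling
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(x):
    # stride-2 upsampling stand-in: nearest-neighbour repetition
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.zeros((64, 64))   # input feature map
e1 = down2(x)            # encoder stage 1: 32 x 32
e2 = down2(e1)           # encoder stage 2: 16 x 16 (RSTB depth features operate here)
d1 = up2(e2)             # decoder stage 1: 32 x 32
d0 = up2(d1)             # decoder stage 2: 64 x 64, back to input resolution
```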
The Residual Swin Transformer Block (RSTB) is a residual block consisting of Swin Transformer Layers (STL) and a convolution block; the structure is shown in fig. 3. This design has two benefits. First, although the Transformer can be viewed as a specific instance of spatially varying convolution, a convolution layer with spatially invariant filters can enhance the translational invariance of USwinTrans. Second, the residual connection provides an identity-based connection from different blocks to the reconstruction module, allowing features at different levels to be aggregated.
The Swin Transformer Layer (STL) is based on the standard multi-head self-attention mechanism of the original Transformer layer; the main differences are local attention and the shifted window mechanism. The STL structure is shown in fig. 4: it consists of two residual blocks, the first normalized by a LayerNorm layer followed by a multi-head self-attention module (MSA), the second normalized by a LayerNorm layer followed by a multi-layer perceptron (MLP).
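The two residual blocks of the STL can be sketched as follows; `msa` is passed in as a callable because the windowed attention itself is treated as a black box here, and the GELU uses its common tanh approximation (both stand-in choices, not the patent's exact implementation):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm over the channel (last) dimension
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of the GELU nonlinearity
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def stl_forward(x, msa, w1, w2):
    # first residual block: x + MSA(LN(x))
    x = x + msa(layer_norm(x))
    # second residual block: x + MLP(LN(x)),
    # MLP = two fully connected layers with GELU between them
    return x + gelu(layer_norm(x) @ w1) @ w2
```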
Given an input of size H×W×C, the Swin Transformer first reshapes the input into features of size (HW/M²)×M²×C by partitioning it into non-overlapping M×M local windows, where HW/M² is the total number of windows. Then, the Swin Transformer computes standard self-attention (i.e., local attention) for each window separately. For a local window feature $X$, the query, key, and value matrices $Q$, $K$, $V$ in the attention mechanism are computed as:

$Q = XP_Q,\quad K = XP_K,\quad V = XP_V$

where $P_Q$, $P_K$, and $P_V$ are projection matrices shared across windows. The attention matrix is then computed within the local window by the self-attention mechanism:

$\mathrm{Attention}(Q,K,V) = \mathrm{SoftMax}\!\left(QK^{T}/\sqrt{d} + B\right)V$

where $B$ is a learnable relative position code and $d$ is the dimension of $K$. This embodiment performs the attention function $h$ times in parallel and concatenates the results as the multi-head self-attention module (MSA). Next, a multi-layer perceptron (MLP) with two fully connected layers and a GELU nonlinearity between them performs further feature transformation. A LayerNorm (LN) layer is added before both MSA and MLP, and both modules use residual connections.
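The window partitioning and per-window attention above can be sketched directly (single head, hypothetical shapes; the learnable projections $P_Q$, $P_K$, $P_V$ and the relative position code $B$ are random/zero placeholders):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def window_partition(x, m):
    # (H, W, C) -> (HW/m^2, m^2, C): non-overlapping m x m local windows
    h, w, c = x.shape
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)

def window_attention(x, p_q, p_k, p_v, b):
    # Attention(Q, K, V) = SoftMax(Q K^T / sqrt(d) + B) V, computed per window
    q, k, v = x @ p_q, x @ p_k, x @ p_v
    d = k.shape[-1]
    return softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d) + b) @ v
```

Because attention is computed inside each M×M window, the cost grows linearly with the number of windows rather than quadratically with image size.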
Step three, train the network using the Adam optimizer and an L2 loss function, where the L2 loss function is calculated as:

$\mathcal{L} = \left\| F(\hat{X}) - X \right\|_2^2$

where $X$ represents the clean image, $\hat{X}$ represents the noisy image, and $F(\cdot)$ represents the network.
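The Adam/L2 pairing of step three can be illustrated on a toy one-parameter "network" $F(x) = w\,x$ (a hypothetical stand-in for USwinTrans); the Adam update follows the standard bias-corrected form:

```python
import numpy as np

def l2_loss(pred, clean):
    # L = || F(noisy) - clean ||_2^2
    return float(np.sum((pred - clean) ** 2))

def adam_step(w, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    # one Adam update with bias-corrected first/second moment estimates
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

rng = np.random.default_rng(0)
clean = rng.random(256)
noisy = clean + rng.normal(0.0, 0.05, clean.shape)
w, m, v = 0.2, 0.0, 0.0
for t in range(1, 301):
    grad = np.sum(2 * (w * noisy - clean) * noisy)  # dL/dw for F(x) = w * x
    w, m, v = adam_step(w, grad, m, v, t)
```

In the real method the gradient is obtained by backpropagation through the whole USwinTrans network rather than by this closed-form derivative.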
Step four, input the noisy medical image into the network to obtain the denoising result.
Step five, test USwinTrans with the image testing set from step one, and evaluate the model using the image evaluation indices.
In this embodiment, the Swin Transformer module serves as the main body of the network structure and can refine texture details using the global information of the picture; the downsampling operations increase the receptive field, enabling the 3×3 convolutions to extract features over a larger image range.
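The receptive-field claim can be checked with the standard recurrence r ← r + (k − 1)·j, j ← j·s over the layer sequence (kernel k, stride s). Modeling each stride-2 downsampling as a kernel-2, stride-2 layer (an assumption for illustration), a single 3×3 convolution after two downsamples covers a 12×12 input region instead of 3×3:

```python
def receptive_field(layers):
    # layers: list of (kernel_size, stride); returns receptive field at the input
    r, j = 1, 1  # receptive field size and cumulative stride ("jump")
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

rf_plain = receptive_field([(3, 1)])                    # 3x3 conv at full resolution
rf_down2 = receptive_field([(2, 2), (2, 2), (3, 1)])    # after two stride-2 downsamples
```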
This embodiment performs experiments on natural images (grayscale and color) and medical images respectively. The experiments are divided into three groups to verify the effectiveness of the algorithm, and the method is compared with several other methods (PM, LEPM, DEPS, FDOGC).
Referring to tables 1-3, for each group of experiments this embodiment selects a plurality of pictures to be tested with the different methods, calculates the corresponding indices (PSNR, SSIM, RMSE) from each result image and the corresponding clean noise-free image, and averages the indices. From the quantitative analysis, the indices of the experimental results of the invention are the best values (maximum PSNR and SSIM, minimum RMSE), which shows that the method of the invention is superior to the other methods in maintaining image structural similarity and improving the signal-to-noise ratio of images.
Table 1 comparison of results under various indices for different methods of gray natural images
Method PSNR SSIM RMSE
PM 30.1910 0.8247 0.0246
LEPM 27.7090 0.7866 0.0336
DEPS 30.2095 0.8248 0.0245
FDOGC 27.7117 0.7865 0.0464
USwinTrans 31.1550 0.8472 0.0216
Table 2 comparison of results under various indices for different methods of color natural images
Method PSNR SSIM RMSE
PM 31.9112 0.9230 0.0263
LEPM 30.6955 0.8970 0.0311
DEPS 31.9161 0.9230 0.0263
FDOGC 30.6891 0.8970 0.0312
USwinTrans 33.0638 0.9345 0.0228
Table 3 comparison of results under various indices for different methods of greyscale medical images
Method PSNR SSIM RMSE
PM 27.8698 0.6919 0.0234
LEPM 27.5688 0.6765 0.0242
DEPS 27.7713 0.6919 0.0236
FDOGC 27.4868 0.7254 0.0244
USwinTrans 27.9732 0.7320 0.0231
Fig. 5 and 6 show the comparison between the method of the invention and the other denoising methods, where (A) is the input image, (B), (C), (D), and (E) are the processing results of the PM, LEPM, DEPS, and FDOGC denoising methods respectively, (F) is the processing result of the USwinTrans denoising method, and (G) is the clean image. From the qualitative analysis, the method of the invention reconstructs the details and textures of the image better; its denoising effect is clearly superior to the other methods, and it is stronger in preserving detail textures, demonstrating its effectiveness for denoising both medical images and natural images.
The invention and its embodiments have been described above by way of illustration and not limitation, and the actual structure is not limited to what is shown in the accompanying drawings. Therefore, if a person of ordinary skill in the art, informed by this disclosure, devises structural modes and embodiments similar to this technical scheme without creative design and without departing from the gist of the present invention, they shall fall within the scope of protection of the invention.

Claims (8)

1. A Swin Transformer-based medical image denoising method, characterized by comprising the following steps:
step one, obtaining a noisy medical image and a clean medical image as a training set and a testing set;
step two, adding a Swin Transformer to the neural network to design an RSTB block;
step three, training a network by adopting an Adam algorithm and constructing a medical image denoising network;
step four, inputting the noisy medical image into the network to obtain a denoising result;
step five, evaluating the network by using image quality evaluation indices.
2. The Swin Transformer-based medical image denoising method according to claim 1, wherein: step one performs data augmentation on the obtained image training set by cropping the images.
3. The Swin Transformer-based medical image denoising method according to claim 1 or 2, wherein: step two adds a depth feature extraction module between the encoder and decoder of the U-net network, introduces a Swin Transformer, and combines it with convolution operations to extract local and global information respectively.
4. The Swin Transformer-based medical image denoising method according to claim 3, wherein: the encoder of the U-net network uses two downsampling operations with a step size of 2, and the decoder uses two upsampling operations with a step size of 2.
5. The Swin Transformer-based medical image denoising method according to claim 4, wherein: the depth feature extraction module comprises a plurality of RSTB blocks and a convolution block; each RSTB block comprises a plurality of Swin Transformer layers and a convolution block, connected in series and followed by a residual connection.
6. The Swin Transformer-based medical image denoising method according to claim 5, wherein: the depth feature extraction module comprises 5 RSTB blocks and 1 convolution block, and each RSTB block is a residual module formed by connecting 6 Swin Transformer layers and 1 convolution block in series.
7. The Swin Transformer-based medical image denoising method according to claim 6, wherein: the Swin Transformer layer consists of two residual blocks; the first residual block is normalized by a LayerNorm layer followed by the multi-head self-attention module MSA, and the second residual block is normalized by a LayerNorm layer followed by the multi-layer perceptron MLP.
8. The Swin Transformer-based medical image denoising method according to claim 7, wherein: step three trains the network using the Adam optimizer and an L2 loss function, and the L2 loss function is calculated as:

$\mathcal{L} = \left\| F(\hat{X}) - X \right\|_2^2$

where $X$ represents the clean image, $\hat{X}$ represents the noisy image, and $F(\cdot)$ represents the network.
CN202310246661.1A 2023-03-10 2023-03-10 Swin Transformer-based medical image denoising method Pending CN116416156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310246661.1A CN116416156A (en) 2023-03-10 2023-03-10 Swin Transformer-based medical image denoising method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310246661.1A CN116416156A (en) 2023-03-10 2023-03-10 Swin Transformer-based medical image denoising method

Publications (1)

Publication Number Publication Date
CN116416156A true CN116416156A (en) 2023-07-11

Family

ID=87052398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310246661.1A Pending CN116416156A (en) 2023-03-10 2023-03-10 Swin transducer-based medical image denoising method

Country Status (1)

Country Link
CN (1) CN116416156A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115149A (en) * 2023-10-20 2023-11-24 北京邮电大学 Image quality evaluation method, device, equipment and storage medium
CN117115149B (en) * 2023-10-20 2024-02-06 北京邮电大学 Image quality evaluation method, device, equipment and storage medium
CN117250657A (en) * 2023-11-17 2023-12-19 东北石油大学三亚海洋油气研究院 Seismic data reconstruction denoising integrated method
CN117250657B (en) * 2023-11-17 2024-03-08 东北石油大学三亚海洋油气研究院 Seismic data reconstruction denoising integrated method

Similar Documents

Publication Publication Date Title
CN114140353B (en) Swin-Transformer image denoising method and system based on channel attention
CN111062872B (en) Image super-resolution reconstruction method and system based on edge detection
CN110992275A (en) Refined single image rain removing method based on generation countermeasure network
CN116416156A (en) Swin Transformer-based medical image denoising method
CN111127374B (en) Pan-sharing method based on multi-scale dense network
CN110443768B (en) Single-frame image super-resolution reconstruction method based on multiple consistency constraints
CN103093433B (en) Natural image denoising method based on regionalism and dictionary learning
CN109214989B (en) Single image super resolution ratio reconstruction method based on Orientation Features prediction priori
CN106952228A (en) The super resolution ratio reconstruction method of single image based on the non local self-similarity of image
CN111127354B (en) Single-image rain removing method based on multi-scale dictionary learning
CN112270654A (en) Image denoising method based on multi-channel GAN
CN111598804B (en) Deep learning-based image multi-level denoising method
CN112037304B (en) Two-stage edge enhancement QSM reconstruction method based on SWI phase image
CN114266957A (en) Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation
CN114266939A (en) Brain extraction method based on ResTLU-Net model
CN115631107A (en) Edge-guided single image noise removal
Gangeh et al. Document enhancement system using auto-encoders
CN113240581A (en) Real world image super-resolution method for unknown fuzzy kernel
CN117408924A (en) Low-light image enhancement method based on multiple semantic feature fusion network
Liu et al. Multi-level wavelet network based on CNN-Transformer hybrid attention for single image deraining
CN115861108A (en) Image restoration method based on wavelet self-attention generation countermeasure network
CN115861749A (en) Remote sensing image fusion method based on window cross attention
CN111325765B (en) Image edge detection method based on redundant wavelet transform
CN114219738A (en) Single-image multi-scale super-resolution reconstruction network structure and method
CN112907456B (en) Deep neural network image denoising method based on global smooth constraint prior model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination