CN116167929A - Low-dose CT image denoising network based on residual error multi-scale feature extraction - Google Patents

Low-dose CT image denoising network based on residual error multi-scale feature extraction

Info

Publication number
CN116167929A
CN116167929A (Application CN202211588468.8A)
Authority
CN
China
Prior art keywords
feature extraction
layer
image
convolution
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211588468.8A
Other languages
Chinese (zh)
Inventor
贾丽娜
黄爱敏
贺旭
李宗洋
贾蓓蓓
Current Assignee
Shanxi University
Original Assignee
Shanxi University
Priority date
Filing date
Publication date
Application filed by Shanxi University filed Critical Shanxi University
Priority to CN202211588468.8A
Publication of CN116167929A
Legal status: Pending

Classifications

    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention relates to a low-dose CT image denoising network based on residual multi-scale feature extraction. The network adopts an encoder-decoder framework and comprises: adding a multi-scale feature extraction module to the residual connections between the encoder and decoder convolutional layers; and guiding the denoising network with a mixed loss function composed of MSE loss, SSIM loss and perceptual loss. By adding the multi-scale feature extraction module to the residual connections, the invention increases the information utilization of the input LDCT image and improves running speed while achieving higher performance indices; BN layers alleviate the overfitting that accompanies increased model complexity; and, to generate denoised images that correlate strongly with human perception, the MSE loss, SSIM loss and perceptual loss jointly guide the denoising model, further improving the visual quality of the denoised image.

Description

Low-dose CT image denoising network based on residual error multi-scale feature extraction
Technical Field
The invention belongs to the technical field of medical image processing, and particularly relates to a low-dose CT image denoising network based on residual multi-scale feature extraction.
Background
Computed tomography (CT) is widely used in clinical diagnosis (e.g., tumor screening, cancer staging, and multi-site puncture biopsy) owing to its fast imaging speed, good clarity, low demands on the patient, and non-invasive, painless operation. However, its ionizing radiation can adversely affect the patient and increase the risk of lesions in the examined region. The ALARA (as low as reasonably achievable) principle should therefore be followed when examining patients: the radiation dose of a CT scan should be reduced as far as possible without compromising the clinical diagnosis. Low-dose CT (LDCT) has consequently attracted growing attention; however, the lower the dose, the noisier the reconstructed CT image, which makes diagnosis difficult.
Over the past few decades, researchers have developed many algorithms to address the image-quality problem of LDCT. They fall into two main categories: traditional LDCT noise reduction algorithms and learning-based LDCT noise reduction algorithms. Traditional algorithms mainly comprise projection-domain filtering, iterative reconstruction, and post-processing. Projection-domain filtering, also called projection-domain processing, takes the projection data as the optimization variable, filters its random noise, and finally applies filtered back projection to obtain the reconstructed image; typical methods include structure-adaptive filtering, bilateral filtering, and penalized weighted least squares (PWLS). Iterative reconstruction algorithms mainly draw on Bayesian and imaging-physics theory: according to the prior characteristics of the projection data, a likelihood function links the projections to the reconstructed image, and the statistical properties of the image to be reconstructed are incorporated to construct an objective function. Image post-processing algorithms denoise the reconstructed image again; unlike the former two classes, they do not depend on the raw projection data, can be applied directly to the LDCT image, and realize end-to-end mapping from the LDCT image to the normal-dose CT (NDCT) image. However, such methods cannot determine the exact distribution of the noise, so it is difficult to balance feature preservation against noise removal.
In recent years, driven by the power of data, deep learning has been widely applied to denoising and artifact suppression in LDCT images thanks to its strong feature-learning ability, wide applicability, and good portability, and deep-learning post-processing has become a research hotspot. Such methods mainly train a convolutional neural network (CNN) to realize end-to-end mapping from LDCT to NDCT images. Chen et al. proposed the low-dose CT residual encoder-decoder convolutional neural network (RED-CNN), combining shortcut connections from the deconvolution network into a CNN model and raising LDCT image quality to new heights in both quantitative metrics and subjective visual quality; however, because the mean squared error (MSE) was used as the loss function, the denoised images are over-smoothed, which hampers clinical diagnosis. Yang et al. proposed an LDCT denoising algorithm based on a generative adversarial network (GAN) that uses the Wasserstein distance together with a perceptual similarity term, transferring visual-perception knowledge into the image denoising task so that the noise level is reduced while key information is preserved. Li et al. proposed a residual attention module (RAM) and incorporated it into two neural networks for LDCT denoising, yielding REDCNN-RAM and WGAN-RAM; although these achieve good results, the models are complex and run slowly.
Disclosure of Invention
The invention aims to provide a low-dose CT image denoising network based on residual multi-scale feature extraction which improves running speed and achieves a better denoising effect.
in order to achieve the above purpose, the invention adopts the following technical scheme:
a low dose CT image denoising network based on residual multi-scale feature extraction employing an encoder-decoder framework comprising:
adding a multi-scale feature extraction module to the residual connection of the encoder and decoder convolutional layers;
the mixed loss function consisting of MSE loss, SSIM loss and perception loss is adopted to guide the denoising network.
Preferably, the encoder is composed of 6 convolution layers: 2 shallow 5×5 feature extraction layers and 4 deep 3×3 feature extraction layers; the decoder is composed of 6 deconvolution layers: 4 3×3 deconvolution layers and 2 5×5 deconvolution layers.
Preferably, the convolutional layers each use zero padding to ensure consistent image sizes for both input and output.
Preferably, a BN layer is added after each convolution layer; the convolutional layer of the encoder is activated by a ReLU function; the deconvolution layer of the decoder is activated by the PReLU function.
Preferably, the residual connection comprises a first residual connection, a second residual connection and a third residual connection;
the first residual connection is connected from the original input to the BN layer of the last layer of the decoder;
the second residual connection is connected from after the encoder second layer BN to after the decoder penultimate layer BN;
the third residual connection is connected from after the BN of the fourth layer of the encoder to after the BN of the fourth layer of the decoder.
Preferably, the multi-scale feature extraction module is denoted as an MSFE module; the MSFE module comprises an MSFEA module and an MSFEB module; the MSFEA module is added to a second residual connection; the MSFEB module is added to the third residual connection.
Preferably, the MSFEA module divides the input into 4 paths for multi-scale feature extraction: 3 convolution branches and 1 direct (identity) branch. The convolution kernel sizes of the 3 branches are (1×1), (1×1, 3×3), and (1×1, 3×3, 3×3), with channel numbers 16, (16, 16), and (16, 24, 32), respectively. The outputs of the 3 convolution branches are concatenated along the channel dimension, passed through a 1×1 convolution layer with 96 channels, added element-wise to the direct branch, and sent to the PReLU layer of the denoising network. Each convolution layer of the MSFEA module is followed by a BN layer and ReLU nonlinear activation.
Preferably, the MSFEB module divides the input into 3 paths for multi-scale feature extraction: 2 convolution branches and a direct branch. The convolution kernel sizes of the 2 branches are (1×1) and (1×1, 1×7, 7×1), with channel numbers 96 and (96, 80, 64), respectively. The outputs of the 2 convolution branches are concatenated along the channel dimension, passed through a 1×1 convolution layer with 96 channels, added element-wise to the direct branch, and sent to the PReLU layer of the denoising network. Each convolution layer of the MSFEB module is followed by a BN layer and ReLU nonlinear activation.
Preferably, the MSE loss is:
L_MSE = (1/N) ‖T(X) − Y‖²  (1)
wherein in formula (1), T(·) denotes the denoising network, X denotes the LDCT image, and Y denotes the NDCT image;
the perceptual loss is:
L_per = (1/N) Σ_i ‖Φ_i(X_denoised) − Φ_i(Y)‖²  (2)
wherein in formula (2), X_denoised denotes the denoised image generated by the denoising model, and Φ_i(·) denotes the i-th layer of features extracted by the feature extraction network;
the SSIM is:
SSIM(X_denoised, Y) = (2 μ_X_denoised μ_Y + c1)(2 σ_X_denoised,Y + c2) / ((μ_X_denoised² + μ_Y² + c1)(σ_X_denoised² + σ_Y² + c2))  (3)
wherein in formula (3), μ_X_denoised and μ_Y denote the means of X_denoised and Y respectively, σ_X_denoised,Y denotes the covariance of X_denoised and Y, σ_X_denoised² and σ_Y² denote the variances of X_denoised and Y respectively, and c1, c2 are constants;
the SSIM loss is:
L_SSIM = 1 − SSIM(X_denoised, Y)  (4)
Preferably, the mixed loss function is:
L_total = λ1·L_MSE + λ2·L_per + λ3·L_SSIM  (5)
wherein in formula (5), λ1, λ2, λ3 are the weight coefficients of the corresponding loss functions.
According to the invention, the multi-scale feature extraction (MSFE) module added to the residual connections increases the information utilization of the input LDCT image; zero padding in all convolution layers keeps the input and output images the same size and mitigates the loss of image structure information caused by successive downsampling; BN layers accelerate convergence and alleviate network overfitting; and the introduced mixed loss function of MSE loss, SSIM loss and perceptual loss further improves the visual quality of the denoised image.
Drawings
FIG. 1 is a diagram of the overall framework of a denoising network according to the present invention;
FIG. 2 is a schematic diagram of an MSFEA module;
FIG. 3 is a schematic diagram of an MSFEB module;
FIG. 4 is a downsampling process;
FIG. 5 is a graph of denoising results for several methods with representative slices on the AAPM 1/10 chest dataset;
FIG. 6 is a graph of denoising results for several methods with representative slices on the AAPM 1/10 chest dataset;
FIG. 7 is an enlarged view of the region of interest of FIG. 5;
FIG. 8 is an enlarged view of the region of interest of FIG. 6;
FIG. 9 is a graph of denoising results for several methods with representative slices on the AAPM 1/4 abdomen dataset;
FIG. 10 is a graph of denoising results for several methods with representative slices on the AAPM 1/4 abdomen dataset;
FIG. 11 is an enlarged view of the region of interest of FIG. 9;
fig. 12 is an enlarged view of the region of interest of fig. 10.
Detailed Description
The invention is further described below with reference to the drawings and specific examples.
As shown in fig. 1, the low dose CT image denoising network based on residual multi-scale feature extraction of the present invention employs an encoder-decoder framework, comprising:
adding a multi-scale feature extraction module to the residual connection of the encoder and decoder convolutional layers;
Specifically, the multi-scale feature extraction module is a lightweight module, denoted the MSFE module. Adding the MSFE module to the residual connections increases the information utilization of the input LDCT image and improves running speed while achieving higher performance indices;
Specifically, the denoising network adopts a classical architecture, the encoder-decoder network: the encoder compresses the input image, and the decoder decompresses it;
the encoder is composed of 6 convolution layers: 2 shallow 5×5 feature extraction layers and 4 deep 3×3 feature extraction layers; the decoder is composed of 6 deconvolution layers: 4 3×3 deconvolution layers and 2 5×5 deconvolution layers. A BN layer is added after each convolution layer to alleviate the overfitting that accompanies increased model complexity; the encoder part is activated by the ReLU function, while the decoder part is activated by the PReLU function, so that gradients in the negative part can also be updated as the network trains. In the denoising network, except in the MSFEA and MSFEB modules, the number of channels of all convolution kernels is set to 96;
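The encoder-decoder described above can be sketched in PyTorch (the framework used in the experiments below). This is a minimal sketch, not the patented implementation: the layer counts, kernel sizes, BN placement, ReLU/PReLU activations, zero padding, and 96-channel width follow the text, while the class and function names are my own, and the MSFE modules and the second and third residual connections are omitted for brevity.

```python
import torch
import torch.nn as nn

def enc_layer(cin, cout, k):
    # convolution + BN + ReLU; zero padding keeps the spatial size unchanged
    return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

def dec_layer(cin, cout, k):
    # deconvolution + BN + PReLU; padding again preserves the spatial size
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, k, padding=k // 2),
                         nn.BatchNorm2d(cout), nn.PReLU())

class EncoderDecoder(nn.Module):
    def __init__(self, ch=96):
        super().__init__()
        enc_ks = [5, 5, 3, 3, 3, 3]          # 2 shallow 5x5 + 4 deep 3x3 layers
        dec_ks = [3, 3, 3, 3, 5, 5]          # 4 deep 3x3 + 2 shallow 5x5 layers
        self.enc = nn.ModuleList(enc_layer(1 if i == 0 else ch, ch, k)
                                 for i, k in enumerate(enc_ks))
        self.dec = nn.ModuleList(dec_layer(ch, 1 if i == len(dec_ks) - 1 else ch, k)
                                 for i, k in enumerate(dec_ks))

    def forward(self, x):
        h = x
        for m in self.enc:
            h = m(h)
        for m in self.dec:
            h = m(h)
        return h + x   # first residual connection: input added to the decoder output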
in this embodiment, the convolution layers all use zero padding to ensure consistent input-output image sizes;
The encoder is made up of consecutive downsampling blocks and, correspondingly, the decoder is made up of consecutive upsampling blocks. Because consecutive downsampling is likely to lose image structure information and distort the final denoised image, zero padding is used to keep the input and output images the same size, so that richer detail information is extracted and the loss of structural information caused by consecutive downsampling is mitigated;
normal downsampling and downsampling with zero padding are illustrated in Fig. 4. Assuming the input image is H×W, Fig. 4(a) shows that a normal downsampling operation yields an output of size (H−2)×(W−2); this size reduction tends to lose detail information, and the loss becomes more obvious over consecutive downsampling operations. As Fig. 4(b) shows, with zero padding the output size matches the input, reducing the loss of detail information.
In this embodiment, as shown in fig. 1, the residual connection includes a first residual connection, a second residual connection, and a third residual connection; the first residual connection is connected from the original input to the BN layer of the last layer of the decoder; the second residual connection is connected from after the encoder second layer BN to after the decoder penultimate layer BN; the third residual connection is connected from after the BN of the fourth layer of the encoder to after the BN of the fourth layer of the decoder.
In this embodiment, the MSFE module includes an MSFEA module and an MSFEB module; the MSFEA module is added to a second residual connection; the MSFEB module is added to a third residual connection;
As shown in fig. 2, the MSFEA module divides the input into 4 paths for multi-scale feature extraction: 3 convolution branches and 1 direct branch. The convolution kernel sizes of the 3 branches are (1×1), (1×1, 3×3), and (1×1, 3×3, 3×3), with channel numbers 16, (16, 16), and (16, 24, 32), respectively. The outputs of the 3 convolution branches are concatenated along the channel dimension, passed through a 1×1 convolution layer with 96 channels, added element-wise to the direct branch, and finally sent to the PReLU layer of the main denoising network. As shown in fig. 3, the MSFEB module divides the input into 3 paths: 2 convolution branches and one direct branch. The kernel sizes of the 2 branches are (1×1) and (1×1, 1×7, 7×1), with channel numbers 96 and (96, 80, 64), respectively. Their outputs are concatenated along the channel dimension, passed through a 1×1 convolution layer with 96 channels, added element-wise to the direct branch, and finally sent to the PReLU layer of the main denoising network. Each convolution layer of the MSFEA and MSFEB modules is followed by a BN layer, to prevent network overfitting, and ReLU nonlinear activation.
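As a concrete reading of the MSFEA description, the block below sketches it in PyTorch. It is a hedged sketch rather than the patented module: the third branch's kernel sequence (1×1, 3×3, 3×3) is inferred from its three channel numbers, and all names are my own.

```python
import torch
import torch.nn as nn

def cbr(cin, cout, k):
    # convolution + BN + ReLU, with padding that keeps the spatial size
    return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class MSFEA(nn.Module):
    """Inception-style multi-scale block placed on the second residual connection."""
    def __init__(self, ch=96):
        super().__init__()
        self.b1 = cbr(ch, 16, 1)                                        # 1x1
        self.b2 = nn.Sequential(cbr(ch, 16, 1), cbr(16, 16, 3))         # 1x1 -> 3x3
        self.b3 = nn.Sequential(cbr(ch, 16, 1), cbr(16, 24, 3),
                                cbr(24, 32, 3))                         # 1x1 -> 3x3 -> 3x3
        self.fuse = cbr(16 + 16 + 32, ch, 1)   # 1x1 conv back to 96 channels

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)
        return self.fuse(y) + x                # element-wise add with the direct branch
```

The MSFEB module would follow the same pattern with (1×1) and (1×1, 1×7, 7×1) branches and rectangular padding.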
A traditional CNN structure extracts features with convolution kernels of a single size; the extraction is not diverse enough in scale, so the utilization of input-image information is often low and denoising performance is poor. The MSFEA module divides the input into 4 paths for multi-scale feature extraction (3 convolution branches and 1 direct branch) and the MSFEB module into 3 paths (2 convolution branches and one direct branch), which widens the network and increases its adaptability to scale. Because different branches have different receptive fields, more nonlinear features can be combined; convolving at several scales simultaneously extracts features of different scales, so the features are richer and the information utilization of the image is higher.
A mixed loss function formed as the weighted sum of the MSE loss, SSIM loss and perceptual loss guides the denoising network, solving the over-smoothing of the denoised image caused by guiding the network with the MSE loss alone;
in the context of LDCT image denoising, the MSE loss is the mean of the squared differences between the denoised image and the NDCT image, with mathematical expression:
L_MSE = (1/N) ‖T(X) − Y‖²  (1)
wherein in formula (1), T(·) denotes the denoising network, X denotes the LDCT image, and Y denotes the NDCT image;
the perceptual loss compares the features obtained by convolving the denoised image with those obtained by convolving the NDCT image, so that they are similar in content and in global context structure, guiding the denoising network to generate denoised images ever closer to the NDCT image. The process can be represented by formula (2):
L_per = (1/N) Σ_i ‖Φ_i(X_denoised) − Φ_i(Y)‖²  (2)
wherein in formula (2), X_denoised denotes the denoised image generated by the denoising model, and Φ_i(·) denotes the i-th layer of features extracted by the feature extraction network; the feature extraction network used in this embodiment is a VGG19 network;
SSIM is an index measuring the similarity of two images: a higher SSIM indicates higher similarity of the two images, and a lower SSIM lower similarity. It can be expressed as:
SSIM(X_denoised, Y) = (2 μ_X_denoised μ_Y + c1)(2 σ_X_denoised,Y + c2) / ((μ_X_denoised² + μ_Y² + c1)(σ_X_denoised² + σ_Y² + c2))  (3)
wherein in formula (3), μ_X_denoised and μ_Y denote the means of X_denoised and Y respectively, σ_X_denoised,Y denotes the covariance of X_denoised and Y, σ_X_denoised² and σ_Y² denote the variances of X_denoised and Y respectively, and c1, c2 are constants;
the SSIM loss is:
L_SSIM = 1 − SSIM(X_denoised, Y)  (4)
Thus, the mixed loss function is:
L_total = λ1·L_MSE + λ2·L_per + λ3·L_SSIM  (5)
wherein in formula (5), λ1, λ2, λ3 are the weight coefficients of the corresponding loss functions.
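The mixed loss can be sketched as follows. This is a simplified stand-in rather than the exact training objective: `ssim` here uses a single global window instead of the usual 11×11 Gaussian-window SSIM, `feat` stands for the frozen feature extraction network (VGG19 in this embodiment), and the function names are my own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # single global-window SSIM per image, averaged over the batch;
    # a simplified stand-in for windowed SSIM
    mu_x = x.mean(dim=(1, 2, 3))
    mu_y = y.mean(dim=(1, 2, 3))
    var_x = x.var(dim=(1, 2, 3), unbiased=False)
    var_y = y.var(dim=(1, 2, 3), unbiased=False)
    cov = ((x - mu_x[:, None, None, None]) *
           (y - mu_y[:, None, None, None])).mean(dim=(1, 2, 3))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def mixed_loss(denoised, target, feat, lam=(1.0, 0.02, 0.01)):
    # L_total = lam1*L_MSE + lam2*L_per + lam3*L_SSIM, as in formula (5);
    # default lam follows the weights reported in the experiments section
    l_mse = F.mse_loss(denoised, target)               # formula (1)
    l_per = F.mse_loss(feat(denoised), feat(target))   # formula (2), feat = frozen extractor
    l_ssim = 1.0 - ssim(denoised, target)              # formula (4)
    return lam[0] * l_mse + lam[1] * l_per + lam[2] * l_ssim
```

For identical inputs every term vanishes, so the total loss is (numerically) zero, which is a quick sanity check on the implementation.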
The invention was validated with the following experiments.
the denoising network of the present invention was named MSFREDCNN, compared to several related methods, including REDCN N-RAM, WGAN-RAM, and REDCNN, and was selected to optimize the denoising model by Adam optimizer, with Patch_size set to 64×64, sliding interval set to 10, and initial learning rate set to 1e-4. The learning rate is reduced by half every 2000 iteration steps, the total iteration times is set to 200, and the super-parameters of the mixed loss function are finally set to be: lambda (lambda) 1 =1,λ 2 =0.02,λ 3 =0.01,
For a fair comparison, all experiments were trained in a PyTorch 1.11 environment using an NVIDIA RTX 3080 (10 GB) graphics accelerator.
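The optimizer settings above (Adam, initial learning rate 1e-4, halved every 2000 steps) map directly onto PyTorch's `StepLR` scheduler; the one-layer `model` below is only a stand-in for the denoising network, and the forward/backward pass is omitted.

```python
import torch

model = torch.nn.Conv2d(1, 1, 3, padding=1)   # stand-in for the denoising network
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# halve the learning rate every 2000 iteration steps
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=2000, gamma=0.5)

for step in range(4000):
    # forward / backward pass on a training patch omitted in this sketch
    opt.step()
    sched.step()

print(sched.get_last_lr())   # halved twice after 4000 steps: [2.5e-05]
```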
the performance of the method of the present invention was evaluated by experiments performed on 1/10 chest CT dataset and 1/4 abdomen CT dataset in "2016 NIH-AAPM-Mayo clinic Low dose CT big challenge" authorized by the Meaoh clinic, and the results are shown in FIGS. 5-8. Fig. 5 and 6 are denoising results of several methods with representative slices on the AAPM 1/10 chest dataset, as can be seen from fig. 5 (a) and 6 (a), the untreated LDCT image does present serious noise and artifacts, making LDCT structural details difficult to resolve, making diagnosis difficult for the clinician. When compared with an LDCT image, all algorithms can effectively inhibit noise and artifacts, and when compared with an NDCT image, as can be seen from fig. 5 (c) and 6 (c), REDCNN-RAM inhibits noise to a certain extent, but introduces new noise, and the image has a certain degree of blurring; as can be seen from fig. 5 (d) and fig. 6 (d), the WGAN-RAM, although generating a denoising image with a high degree of similarity to human perception, still has no obvious detail information recovery; as can be seen from fig. 5 (e) and fig. 6 (e), the REDCNN achieves good denoising results, but the use of the MSE loss function to guide the denoising network causes the denoising image to be too smooth, which still makes clinical diagnosis difficult. As can be seen from fig. 5 (f) and fig. 6 (f), the best visual effect is seen by the best image detail recovered by MSFREDCNN. In order to further improve the visual effect of the denoising image and generate the denoising image highly similar to human perception, the invention introduces the perception loss and the structural similarity loss to guide the whole denoising network together with the MSE, and as can be seen from the FIG. 5 (g) and the FIG. 6 (g), the denoising network acquires the denoising image closest to the NDCT image under the common guidance of the MSE loss, the perception loss and the structural similarity loss. Fig. 
7 and 8 are denoising and enlarging views of the region of interest (Region of interest, ROI) marked by the white boxes in fig. 5 (b) and 6 (b), respectively, and it can be seen from the white circles and white arrows in the figures that MSFREDCNN is closest to the NDCT image in terms of both structural similarity and visual effect.
Figs. 9 and 10 show the denoising results of several methods on representative slices of two AAPM 1/4 abdomen datasets. As Figs. 9(a) and 10(a) show, although the 1/4-dose abdominal CT image has a better visual effect than the 1/10-dose chest CT image, some noise and artifacts that affect the doctor's diagnosis remain. From the overall effect of Figs. 9 and 10, these improved methods suppress noise to some extent and improve the visual effect. However, when compared with the NDCT images, in which lesions are clearer, WGAN-RAM and REDCNN still retain some noise; and since REDCNN-RAM and the proposed MSFREDCNN both use the MSE objective to guide the denoising network, some blurring remains. When MSFREDCNN is guided with the proposed mixed loss function, the restored denoised image is visually closest to NDCT. Observing the enlarged ROI areas, such as those marked by white circles in Figs. 11 and 12, MSFREDCNN is visually closer to the level of human perception; and in restoring microstructure, the areas marked by white arrows in Fig. 11 show that MSFREDCNN, with and without the mixed loss, restores the structure more clearly.
The image denoising effect is objectively evaluated using the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and root mean square error (RMSE). Tables 1 and 2 give the mean quantitative index values on the 1/10-dose chest CT dataset and the 1/4-dose abdomen CT dataset, respectively.
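The PSNR and RMSE metrics can be computed directly (a sketch with NumPy; `data_range` is the assumed display range of the normalized image, and SSIM follows formula (3) above):

```python
import numpy as np

def rmse(x, y):
    # root mean square error between two images
    return float(np.sqrt(np.mean((x - y) ** 2)))

def psnr(x, y, data_range=1.0):
    # peak signal-to-noise ratio: 20 * log10(MAX / RMSE)
    return float(20 * np.log10(data_range / rmse(x, y)))

# a constant offset of 0.1 on a [0, 1] image gives RMSE 0.1 and PSNR 20 dB
a = np.zeros((64, 64))
b = np.full((64, 64), 0.1)
print(rmse(a, b), psnr(a, b))
```

Note that PSNR and RMSE are monotonically related, so the two indices always rank methods identically on a fixed data range; SSIM adds the structural comparison.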
Table 1 quantitative index mean values for different algorithms on 1/10 chest CT dataset
[table available as an image in the original document]
TABLE 2 quantitative index mean values for different algorithms on 1/4 abdomen CT dataset
[table available as an image in the original document]
Tables 1 and 2 show that the proposed denoising network MSFREDCNN runs nearly 3 times faster than the REDCNN-RAM and WGAN-RAM algorithms, and nearly 2 times faster when guided by the proposed mixed loss. On the quantitative indices, the proposed MSFREDCNN obtains the best values (bold in the tables), showing that the MSFREDCNN model has better denoising performance in both structure preservation and noise suppression. Although the quantitative index values after adding the mixed loss function do not exceed those without it, this is due to a peculiarity of LDCT images: the background, identical in LDCT and NDCT, occupies a large part of the CT image and usually contributes nothing to the doctor's diagnosis. The ablation section therefore reports the quantitative indices on the ROI, where the denoising network with the mixed loss obtains the best performance indices; this result further confirms the preceding visual comparison.
Since the MSFREDCNN denoising network contains multiple modules, ablation studies are required to verify the effectiveness of each. The proposed model is ablated on both the network structure and the objective function, with the following configurations:
BL: baseline model (a plain network without any enhancement modules);
PA+BN: BN layers and zero padding added to the baseline model;
MSFE: the MSFEA and MSFEB multi-scale feature extraction modules added to the BL+PA+BN model;
MSE: the proposed MSFREDCNN network guided by MSE;
MSE+VGG: the proposed MSFREDCNN network guided by MSE+VGG;
MSE+SSIM: the proposed MSFREDCNN network guided by MSE+SSIM;
MSE+SSIM+VGG: the proposed MSFREDCNN network guided by MSE+SSIM+VGG.
In the network-structure ablation study, one enhancement module is added to the baseline model at a time; the quantitative results are shown in Table 3. It can be observed that the denoising performance of the model increases steadily as each module is added.
In the objective-function ablation study, the influence of different objective functions on the proposed denoising model is verified. Owing to the peculiarity of CT images, the background and bone regions, identical in LDCT and NDCT, occupy a significant portion of the image and are usually ineffective for the physician's diagnosis. The invention therefore selects a lesion region of interest on a representative slice to compute the quantitative indices; the results are shown in Table 4.
Table 3 network structure ablation experiments
[table available as an image in the original document]
Table 4 Objective function ablation experiments (table reproduced as an image in the original document)

Claims (10)

1. A low dose CT image denoising network based on residual multi-scale feature extraction, employing an encoder-decoder framework, wherein:
a multi-scale feature extraction module is added to the residual connections between the encoder and decoder convolution layers; and
a mixed loss function consisting of MSE loss, SSIM loss, and perceptual loss is used to guide the denoising network.
2. The low dose CT image denoising network based on residual multi-scale feature extraction of claim 1, wherein the encoder consists of 6 convolution layers: 2 shallow feature extraction layers with 5×5 kernels and 4 deep feature extraction layers with 3×3 kernels; the decoder consists of 6 deconvolution layers: 4 deconvolution layers with 3×3 kernels and 2 deconvolution layers with 5×5 kernels.
3. The low dose CT image denoising network based on residual multiscale feature extraction of claim 2, wherein the convolution layers each use zero padding to ensure consistent input and output image sizes.
4. The low dose CT image denoising network based on residual multi-scale feature extraction according to claim 2 or 3, wherein a BN layer is added after each convolution layer; the convolution layers of the encoder are activated by the ReLU function; the deconvolution layers of the decoder are activated by the PReLU function.
5. The low dose CT image denoising network based on residual multi-scale feature extraction of claim 4, wherein the residual connections comprise a first residual connection, a second residual connection, and a third residual connection;
the first residual connection runs from the original input to the BN layer of the last decoder layer;
the second residual connection runs from after the BN layer of the second encoder layer to after the BN layer of the penultimate decoder layer;
the third residual connection runs from after the BN layer of the fourth encoder layer to after the BN layer of the fourth decoder layer.
6. The low dose CT image denoising network based on residual multi-scale feature extraction of claim 5, wherein the multi-scale feature extraction module is denoted as the MSFE module; the MSFE module comprises an MSFEA module and an MSFEB module; the MSFEA module is added to the second residual connection; the MSFEB module is added to the third residual connection.
7. The low-dose CT image denoising network according to claim 6, wherein the MSFEA module splits its input into 4 paths for multi-scale feature extraction: 3 convolution branches and 1 direct-connection branch; the convolution kernel sizes of the 3 convolution branches are (1×1), (1×1, 3×3), and (3×3, 3×3, 3×3), with channel numbers 16, (16, 16), and (16, 24, 32), respectively; the outputs of the 3 convolution branches are concatenated along the channel dimension, passed through a convolution layer with a 1×1 kernel and 96 channels, added to the direct-connection branch, and then fed into the PReLU layer of the denoising network; each convolution layer of the MSFEA module is followed by a BN layer and nonlinearly activated by a ReLU.
8. The low-dose CT image denoising network according to claim 6, wherein the MSFEB module splits its input into 3 paths for multi-scale feature extraction: 2 convolution branches and 1 direct-connection branch; the convolution kernel sizes of the 2 convolution branches are (1×1) and (1×1, 1×7, 7×1), with channel numbers 96 and (96, 80, 64), respectively; the outputs of the 2 convolution branches are concatenated along the channel dimension, passed through a convolution layer with a 1×1 kernel and 96 channels, added to the direct-connection branch, and then fed into the PReLU layer of the denoising network; each convolution layer of the MSFEB module is followed by a BN layer and nonlinearly activated by a ReLU.
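The fusion step shared by the MSFEA and MSFEB modules (channel-wise concatenation, a 1×1 convolution, then a residual addition with the direct-connection branch) can be sketched in NumPy. The spatial size and weight values here are illustrative, and the branch outputs are random stand-ins for real feature maps:

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution is a per-pixel linear map over channels.
    x: (C_in, H, W), w: (C_out, C_in) -> output (C_out, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

rng = np.random.default_rng(0)
h = w_sz = 32
identity = rng.random((96, h, w_sz))       # direct-connection branch (96 channels assumed here)

# Last-layer outputs of the three MSFEA branches: 16, 16 and 32 channels (per claim 7).
b1 = rng.random((16, h, w_sz))
b2 = rng.random((16, h, w_sz))
b3 = rng.random((32, h, w_sz))

fused = np.concatenate([b1, b2, b3], axis=0)     # channel-wise concat -> 64 channels
weights = 0.01 * rng.standard_normal((96, 64))   # 1x1 conv with 96 output channels
out = conv1x1(fused, weights) + identity         # residual addition; the network then applies PReLU
```

The 1×1 convolution restores the channel count to 96 so the element-wise addition with the direct-connection branch is shape-compatible.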
9. The low dose CT image denoising network based on residual multi-scale feature extraction of claim 1, wherein the MSE loss is:

$$L_{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left\|T(X_i) - Y_i\right\|_2^2 \qquad (1)$$

wherein in formula (1), $T(\cdot)$ denotes the denoising network, $X$ denotes the LDCT image, and $Y$ denotes the NDCT image;
the perceptual loss is:

$$L_{per} = \frac{1}{N}\left\|\Phi_i(X_{denoised}) - \Phi_i(Y)\right\|_2^2 \qquad (2)$$

wherein in formula (2), $X_{denoised}$ denotes the denoised image generated by the denoising model, and $\Phi_i(\cdot)$ denotes the $i$-th layer features extracted by the feature extraction network;
the SSIM is as follows:
Figure QLYQS_3
wherein in the formula (3)
Figure QLYQS_4
μ Y Respectively represent X denoised Average value of Y,/->
Figure QLYQS_5
X represents denoised Covariance of Y,>
Figure QLYQS_6
and->
Figure QLYQS_7
Respectively represent X denoised And variance of Y, c 1 ,c 2 Is a constant value, and is used for the treatment of the skin,
the SSIM loss is:
L SSIM =1-SSIM(X denoised ,Y) (4)。
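A minimal NumPy sketch of formulas (3) and (4), computing a single global SSIM window; the $c_1$, $c_2$ values are illustrative, and practical implementations average SSIM over local sliding windows:

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window (global) SSIM per formula (3)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def ssim_loss(x, y):
    """Formula (4): L_SSIM = 1 - SSIM(X_denoised, Y)."""
    return 1.0 - ssim_global(x, y)
```

For identical images the numerator and denominator of formula (3) coincide, so the SSIM is exactly 1 and the loss is 0.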
10. The low dose CT image denoising network based on residual multi-scale feature extraction of claim 9, wherein the mixed loss function is:

$$L_{total} = \lambda_1 L_{MSE} + \lambda_2 L_{per} + \lambda_3 L_{SSIM} \qquad (5)$$

wherein in formula (5), $\lambda_1$, $\lambda_2$, $\lambda_3$ are the weight coefficients of the corresponding loss functions.
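The weighted combination in formula (5) can be sketched as follows. The lambda values are illustrative, not the patent's, and a simple gradient comparison stands in for the VGG-based perceptual features of formula (2) so the sketch stays self-contained:

```python
import numpy as np

def mse_loss(x, y):
    """Formula (1): mean squared error between network output and NDCT target."""
    return float(np.mean((x - y) ** 2))

def perceptual_stub(x, y):
    """Stand-in for formula (2): the patent compares VGG feature maps; simple
    image gradients are used here only as hypothetical 'features'."""
    return float(np.mean((np.diff(x, axis=0) - np.diff(y, axis=0)) ** 2)
                 + np.mean((np.diff(x, axis=1) - np.diff(y, axis=1)) ** 2))

def ssim_loss(x, y, c1=1e-4, c2=9e-4):
    """Formulas (3)-(4), evaluated globally over the image."""
    mu_x, mu_y = x.mean(), y.mean()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    s = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
        ((mu_x ** 2 + mu_y ** 2 + c1) * (x.var() + y.var() + c2))
    return 1.0 - s

def total_loss(x, y, lam1=1.0, lam2=0.1, lam3=0.5):
    """Formula (5); the weight coefficients here are placeholders."""
    return lam1 * mse_loss(x, y) + lam2 * perceptual_stub(x, y) + lam3 * ssim_loss(x, y)
```

Each component vanishes when the denoised output equals the NDCT target, so the total loss is zero exactly at a perfect reconstruction.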
CN202211588468.8A 2022-12-12 2022-12-12 Low-dose CT image denoising network based on residual error multi-scale feature extraction Pending CN116167929A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211588468.8A CN116167929A (en) 2022-12-12 2022-12-12 Low-dose CT image denoising network based on residual error multi-scale feature extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211588468.8A CN116167929A (en) 2022-12-12 2022-12-12 Low-dose CT image denoising network based on residual error multi-scale feature extraction

Publications (1)

Publication Number Publication Date
CN116167929A true CN116167929A (en) 2023-05-26

Family

ID=86412253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211588468.8A Pending CN116167929A (en) 2022-12-12 2022-12-12 Low-dose CT image denoising network based on residual error multi-scale feature extraction

Country Status (1)

Country Link
CN (1) CN116167929A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409100A (en) * 2023-12-15 2024-01-16 山东师范大学 CBCT image artifact correction system and method based on convolutional neural network


Similar Documents

Publication Publication Date Title
CN109166161B (en) Low-dose CT image processing system based on noise artifact suppression convolutional neural network
Kang et al. Deep convolutional framelet denosing for low-dose CT via wavelet residual network
CN108961237B (en) Low-dose CT image decomposition method based on convolutional neural network
CN111968195B (en) Dual-attention generation countermeasure network for low-dose CT image denoising and artifact removal
CN110443867B (en) CT image super-resolution reconstruction method based on generation countermeasure network
CN109559359A (en) Artifact minimizing technology based on the sparse angular data reconstruction image that deep learning is realized
CN111047524A (en) Low-dose CT lung image denoising method based on deep convolutional neural network
CN112258415B (en) Chest X-ray film super-resolution and denoising method based on generation countermeasure network
CN113034641B (en) Sparse angle CT reconstruction method based on wavelet multi-scale convolution feature coding
CN110223255A (en) A kind of shallow-layer residual error encoding and decoding Recursive Networks for low-dose CT image denoising
Yang et al. High-frequency sensitive generative adversarial network for low-dose CT image denoising
Yan et al. Image denoising for low-dose CT via convolutional dictionary learning and neural network
CN109598680B (en) Shear wave transformation medical CT image denoising method based on rapid non-local mean value and TV-L1 model
CN116167929A (en) Low-dose CT image denoising network based on residual error multi-scale feature extraction
Li et al. An adaptive self-guided wavelet convolutional neural network with compound loss for low-dose CT denoising
CN111915538B (en) Image enhancement method and system for digital blood vessel subtraction
CN112258438B (en) LDCT image recovery method based on unpaired data
Du et al. X-ray CT image denoising with MINF: A modularized iterative network framework for data from multiple dose levels
Gholizadeh-Ansari et al. Low-dose CT denoising using edge detection layer and perceptual loss
Liu et al. Cascade resunet with noise power spectrum loss for low dose ct imaging
Wang et al. Low dose CT image denoising method based on improved generative adversarial network
Yang et al. Adaptive Non-Local Generative Adversarial Networks for Low-Dose CT Image Denoising
CN115731158A (en) Low-dose CT reconstruction method based on residual error domain iterative optimization network
Li et al. Medical image enhancement in F-shift transformation domain
Xu et al. Low-dose CT denoising using a structure-preserving kernel prediction network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination