CN116563554A - Low-dose CT image denoising method based on hybrid characterization learning - Google Patents


Publication number
CN116563554A
Authority
CN
China
Prior art keywords: image, block, dose, convolution, layer
Prior art date
Legal status: Pending (assumption, not a legal conclusion)
Application number
CN202310454243.1A
Other languages
Chinese (zh)
Inventor
张聚
叶列立
王奔
叶智毅
龚伟伟
应长钢
Current Assignee
Hangzhou Normal University
Original Assignee
Hangzhou Normal University
Priority date
Filing date
Publication date
Application filed by Hangzhou Normal University
Priority: CN202310454243.1A
Publication: CN116563554A
Legal status: Pending

Classifications

    • G06V10/30 — Image preprocessing: noise filtering
    • G06V10/36 — Image preprocessing: local operators, non-linear local filtering (e.g. median filtering)
    • G06V10/42 — Feature extraction: global features (e.g. frequency-domain transformations or autocorrelation)
    • G06V10/44 — Feature extraction: local features (e.g. edges, contours, corners); connectivity analysis
    • G06V10/52 — Feature extraction: scale-space analysis (e.g. wavelet analysis)
    • G06V10/54 — Feature extraction: texture features
    • G06V10/806 — Fusion of extracted features
    • G06V10/82 — Image recognition or understanding using neural networks
    • G06V2201/03 — Recognition of patterns in medical or anatomical images
    • G06N3/045, G06N3/0455 — Combinations of networks; auto-encoder networks, encoder-decoder networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/048 — Activation functions
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • Y02T10/40 — Engine management systems (automatic tagging)

Abstract

The invention relates to a low-dose CT image denoising method based on hybrid characterization learning. The invention uses a symmetric encoder-decoder structure and adds a mixed representation block at each encoder and decoder stage. Convolution mapping information is passed to the self-attention module, realizing information interaction both inside and between windows, and is simultaneously concatenated with the edge-enhancement information produced by the Canny operator, so that noise reduction is achieved to the greatest extent without damaging the local information of the original image. The mean square error is combined with a Resnet-based multi-scale perceptual function, and the final output noise image undergoes residual processing with the low-dose CT image, which guarantees the denoising effect of the denoised image while retaining more detailed texture information, improving the denoising of low-dose CT images. The invention better captures global information and local detail information, solves the problem of edge blurring, and strengthens the detail texture of CT images.

Description

Low-dose CT image denoising method based on hybrid characterization learning
Technical Field
The invention belongs to the technical field of medical image denoising, and relates to a low-dose CT image denoising method based on hybrid characterization learning.
Background
Modern medical diagnosis has developed rapidly, and one of its most important tools is computed tomography (CT), a reliable and noninvasive medical imaging modality that helps detect pathological abnormalities of the human body, such as diseases of the head, neck, cardiovascular system, chest, abdomen and pelvis. Beyond diagnosis, CT is also useful in guiding various clinical treatments, such as radiation therapy and surgery. However, the X-ray radiation received during repeated CT scans can harm the human body: it may reduce immune function, disturb metabolism and damage reproductive organs, and it increases the risk of leukemia, cancer and genetic diseases. For this reason, more and more examinations in recent years use low-dose CT (LDCT), which reduces the X-ray dose of the CT scan while ensuring that image quality still meets diagnostic requirements. But this leads to problems such as increased noise, reduced contrast at edges, corners and sharp features, and excessive smoothing of the image.
In recent years, extensive research has addressed the LDCT denoising problem. These methods fall broadly into three categories. 1. Filter-based methods: since the same image contains many similar image blocks, noise can be removed by stacking non-local similar blocks, e.g. the non-local means (NLM) algorithm based on non-local self-similarity, block-matching 3D filtering (BM3D), wavelet transforms, etc. 2. Model-based methods: e.g. the LNLTV algorithm, a composite local and non-local total-variation regularized image denoising model that exploits the local structure and non-local similarity of images, combining the total variation (TV) model and the non-local total variation (NLTV) model so that each mitigates the other's shortcomings and both of their advantages are fully used for denoising. 3. Learning-based methods: with the advent of deep-network methods, which obtain more promising denoising results than filter-based, model-based and traditional learning-based methods, these have become the dominant approach. They focus on learning the latent mapping from the noisy image to the clean image. CNN-based LDCT denoising networks perform well owing to the strong feature-learning and feature-mapping capability of CNNs, but there remains a risk that over-smoothing of the denoised image loses key details and affects correct analysis of lesion locations. The Vision Transformer (ViT), by capturing long-range dependencies in data through global self-attention, has become a further breakthrough in computer vision.
On this basis, more and more Transformer structures have appeared in the image field and achieved good results, including in medical image denoising. Recently, many studies have attempted to fuse CNNs with Transformers and achieved a certain effect. However, combining the two may cause partial information to become inaccurate due to parameter weight sharing.
In summary, a plain convolutional neural network has advantages in local feature extraction but weak global modeling capability, while a window-based self-attention mechanism builds a global representation through complex spatial transformations and long-range feature dependencies but may ignore local feature details. Either shortcoming can lead to over-smoothed CT images with less detailed texture information, thereby affecting diagnosis.
Disclosure of Invention
The invention aims to provide a low-dose CT image denoising method based on hybrid characterization learning. The CT image first undergoes edge-enhancement processing; depth convolution then interacts with the self-attention of local windows, so that CT image information both inside and between windows is obtained; finally, combining the mean square error with a Resnet-based multi-scale perceptual function, the output noise image undergoes residual processing with the low-dose CT image. This guarantees the denoising effect of the denoised image, retains more detailed texture information, and improves the denoising of low-dose CT images.
The method uses a symmetric encoder-decoder structure and adds a mixed representation block in each encoder and decoder stage. Convolution mapping information is passed to the self-attention module, realizing information interaction inside and between windows, and is simultaneously concatenated with the edge-enhancement information processed by the Canny operator, achieving noise reduction to the greatest extent without damaging the local information of the original image. Meanwhile, through skip connections, each output decoding block is fused on the channel dimension with the input encoding block at the corresponding position; fusing low-level and high-level information retains more texture details. This also prevents the vanishing-gradient problem and improves training speed.
The method comprises the following specific steps:
step one, constructing a low-dose CT data set:
A certain number of patients are selected, and low-dose CT images with corresponding normal-dose CT images of different body parts are collected. Poisson noise is inserted until the noise level approaches 25% of the noise level under the full dose, forming a CT image data set (x, y), where x is the low-dose CT image and y is the normal-dose CT image corresponding to x.
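As a hedged illustration of the dose-reduction step above, the sketch below simulates low-dose acquisition by treating normalized pixel intensities as expected photon counts and sampling from a Poisson distribution. The `photons_per_pixel` dose knob, the normalization, and the use of numpy are assumptions for illustration, not part of the patent.

```python
import numpy as np

def add_poisson_noise(ct_image, photons_per_pixel=2500, rng=None):
    """Simulate a low-dose scan: treat normalized pixel intensities as
    expected photon counts, sample Poisson counts, and rescale back.
    Lower `photons_per_pixel` means fewer photons and stronger noise."""
    rng = np.random.default_rng(rng)
    img = np.clip(ct_image, 0.0, 1.0)               # normalized intensities
    counts = rng.poisson(img * photons_per_pixel)   # photon-count statistics
    return counts.astype(np.float64) / photons_per_pixel

rng = np.random.default_rng(0)
y = rng.random((64, 64))                            # stand-in "normal dose" slice
x = add_poisson_noise(y, photons_per_pixel=2500, rng=1)
noise_std = (x - y).std()
```

Because Poisson noise is signal-dependent, brighter regions receive proportionally more absolute noise, which is closer to real CT photon statistics than additive Gaussian noise.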
Step two, constructing a denoising network model:
A symmetric encoding-decoding network model is constructed based on mixed-block learning, comprising a preprocessing block (Pre-processing), an input mapping block (Input Projection), input encoding blocks (MED), intermediate-layer mixed blocks (Mixed Block), output decoding blocks (MEU) and an output mapping block (Output Projection).
In the image preprocessing stage, an input image is filtered by a Canny operator to generate an image with enhanced edge characteristics, and the image flows into a subsequent network block as auxiliary information to highlight the edge characteristics of the image and increase the receptive field of the model.
Considering that, when a low-dose CT image is fed directly into a neural network model, the output may be over-smoothed and lose part of its detail features, an image preprocessing layer based on the Canny operator is designed to strengthen the image edge features. Specifically, the Canny-filtered image is concatenated directly to the input mapping block, the encoding blocks and the decoding blocks, serving as an auxiliary feature for image denoising.
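A minimal sketch of the edge-enhancement idea, assuming numpy only: a Sobel gradient-magnitude map with a single threshold stands in for the full Canny operator (no Gaussian smoothing, non-maximum suppression, or hysteresis), since the patent does not spell out the operator's parameters.

```python
import numpy as np

def sobel_edges(img, threshold=0.25):
    """Simplified stand-in for the Canny operator: Sobel gradient
    magnitude followed by a single threshold. Returns a binary edge map."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for i in range(3):                 # correlate with the two 3x3 kernels
        for j in range(3):
            patch = pad[i:i + h, j:j + w]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-12           # normalize magnitude to [0, 1]
    return (mag > threshold).astype(float)

img = np.zeros((32, 32))
img[:, 16:] = 1.0                      # vertical step edge at column 16
edges = sobel_edges(img)
```

The binary edge map plays the role of the "Hand-processed features" that are concatenated into the network blocks as an auxiliary channel.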
The input mapping block comprises a multi-layer perceptron MLP, and maps the input to a fixed dimension through two layers of neurons, so that the subsequent characterization operation is facilitated.
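The pixel-wise two-layer mapping described above might be sketched as follows; the hidden width, the ReLU activation, and the H×W×C memory layout are illustrative assumptions.

```python
import numpy as np

def input_projection(x, w1, b1, w2, b2):
    """Sketch of the input mapping block: a two-layer MLP applied
    pixel-wise, lifting a 1-channel HxW image to a fixed C-channel
    feature map (layout HxWxC here for simplicity)."""
    h = np.maximum(x[..., None] @ w1 + b1, 0.0)   # hidden layer with ReLU
    return h @ w2 + b2                            # project to C channels

rng = np.random.default_rng(0)
C, hidden = 32, 16
w1 = rng.standard_normal((1, hidden))
b1 = np.zeros(hidden)
w2 = rng.standard_normal((hidden, C))
b2 = np.zeros(C)
feat = input_projection(rng.random((64, 64)), w1, b1, w2, b2)
```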
The input encoding block comprises an input mixed representation block, an edge-enhancement feature layer, a separable convolution layer and a downsampling layer. First, the output of the input mapping block is fed into the mixed representation block as Query, Key and Value for representation learning; the edge-enhancement feature map processed by the Canny operator is then concatenated with the representation result and passed through a separable convolution; finally the result enters the downsampling layer to complete the encoding of the input module, where the downsampling convolution kernel is 3×3 with stride 2 and padding 1.
The intermediate-layer mixed representation block includes window self-attention (WSA) and depth convolution (Dwconv). Window self-attention reduces dependence on regions outside the window and captures correlations inside the feature-space window while remaining computationally efficient. The depth convolution performs an independent convolution on each channel to mine channel information without changing the number of channels.
The middle layer mixed representation block is designed, through the bidirectional interaction of the channel layer and the space layer, the problem that a window self-attention mechanism is limited in receptive field is solved, meanwhile, the defect caused by deep convolution weight sharing is also eliminated, texture information interaction and information gain inside and among windows are realized, and the global modeling capability of CT images is effectively improved. In a practical configuration, the attention module window size is 7×7 and the convolution kernel size of the depth convolution is 3×3.
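The two components of the mixed representation block can be sketched in numpy as follows: self-attention restricted to non-overlapping 7×7 windows (with identity Q/K/V projections for brevity, an assumption) and a per-channel 3×3 depth convolution that leaves the channel count unchanged.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, win=7):
    """Self-attention restricted to non-overlapping win x win windows.
    x: (H, W, C) with H and W divisible by `win`."""
    H, W, C = x.shape
    xw = x.reshape(H // win, win, W // win, win, C)
    xw = xw.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)  # windows x tokens x C
    attn = softmax(xw @ xw.transpose(0, 2, 1) / np.sqrt(C))     # scaled dot-product
    out = attn @ xw
    out = out.reshape(H // win, W // win, win, win, C)
    return out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def depthwise_conv3x3(x, kernels):
    """Depth convolution: one independent 3x3 kernel per channel,
    channel count unchanged. kernels: (3, 3, C)."""
    H, W, C = x.shape
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += pad[i:i + H, j:j + W] * kernels[i, j]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 8))
y_attn = window_self_attention(x, win=7)
y_dw = depthwise_conv3x3(x, rng.standard_normal((3, 3, 8)))
```

Note that the attention here never mixes tokens across window boundaries, which is exactly the limited-receptive-field issue the depth-convolution branch is meant to compensate.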
The output decoding block MEU comprises an upsampling layer, an edge-enhancement feature layer, a separable convolution and an output mixed representation block. An upsampling operation with a 4×4 convolution kernel and stride 2 is performed first, because during deconvolution, if the kernel size is not divisible by the stride, the result may exhibit a checkerboard effect. The edge-feature image processed by the Canny operator is then concatenated in the same way, and feature knowledge of more textures is learned through a separable convolution, finally followed by a mixed representation block (Mixed Block).
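The divisibility condition mentioned above can be checked with a small 1-D counting experiment: each output position of a transposed convolution receives contributions from some number of kernel taps, and uneven counts are what appear visually as a checkerboard. The function below is an illustration, not part of the patent.

```python
import numpy as np

def contribution_map(kernel, stride, in_len=8):
    """Count how many kernel taps contribute to each output position of a
    1-D transposed convolution, ignoring the borders."""
    out = np.zeros(in_len * stride + kernel - stride)
    for i in range(in_len):
        out[i * stride: i * stride + kernel] += 1   # each input "paints" a window
    return out[kernel:-kernel]                      # drop border effects

even = contribution_map(kernel=4, stride=2)   # 4 % 2 == 0 -> uniform coverage
odd = contribution_map(kernel=3, stride=2)    # 3 % 2 != 0 -> alternating counts
```

With kernel 4 and stride 2 every interior output position is covered by exactly two taps, so the 4×4/stride-2 choice in the MEU block avoids the checkerboard; kernel 3 with stride 2 alternates between one and two taps.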
The input mixed representation block, the middle layer mixed representation block and the output mixed representation block have the same structure; and jump connection is adopted between the input coding blocks and the output decoding blocks, and in a symmetrical structure, each output decoding block is fused with the input coding block at the corresponding position on the channel, and more texture details are reserved through fusion of bottom layer information and high layer information. Secondly, the problem of network gradient disappearance can also be prevented.
The output mapping block (Output Projection) includes one MLP layer, maps the output to 1×H×W, and restores the original image size.
Step three, data enhancement:
Medical images suffer from problems such as complex sample data and sparse annotation. To further increase the training samples, the invention applies an image-enhancement strategy, performing various augmentation operations on the collected data set with different probabilities, such as horizontal-vertical flipping and random cropping, to construct more images.
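A possible sketch of the paired augmentation, assuming numpy: the same random flips and crop must be applied to the low-dose image and its normal-dose target so that pixel correspondence between x and y is preserved.

```python
import numpy as np

def augment_pair(x, y, crop=128, p_flip=0.5, rng=None):
    """Apply identical random horizontal/vertical flips and an identical
    random crop to a paired low-dose / normal-dose slice."""
    rng = np.random.default_rng(rng)
    if rng.random() < p_flip:
        x, y = x[:, ::-1], y[:, ::-1]      # horizontal flip
    if rng.random() < p_flip:
        x, y = x[::-1, :], y[::-1, :]      # vertical flip
    H, W = x.shape
    i = rng.integers(0, H - crop + 1)      # shared crop offsets
    j = rng.integers(0, W - crop + 1)
    return (x[i:i + crop, j:j + crop].copy(),
            y[i:i + crop, j:j + crop].copy())

rng = np.random.default_rng(0)
x = rng.random((512, 512))
y = x + 0.01                               # toy target, fixed offset from x
xa, ya = augment_pair(x, y, crop=128, rng=3)
```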
Step four, model optimization:
Two loss functions are used to optimize the model and further improve its performance.
First, the mean square error (MSE) is used to converge the error to a minimum; the loss function L_1 is expressed as:

L_1 = (1/N) Σ_{i=1}^{N} ‖(x_i − R(x_i)) − y_i‖²

L_1 evaluates the pixel-level similarity between the model's denoised image and the real image, i.e. whether the model can accurately restore the original image, where R(x_i) denotes the pure-noise image mapped from the low-dose noise image x_i through residual learning, and y_i denotes the corresponding normal-dose CT image.
Secondly, a Resnet-based multi-scale perceptual function is used to realize the residual relation between the low-dose picture and the noise picture; the loss function L_2 is expressed as:

L_2 = (1/N) Σ_{i=1}^{N} ‖α(x_i − R(x_i)) − α(y_i)‖²

The L_2 loss mainly evaluates the structural similarity between the image predicted by the model and the real image, where α denotes the classical feature-extraction network Resnet50 used as the feature extractor, whose weights are frozen after pre-training on the ImageNet dataset with the pooling layer removed; R(x_i) denotes the pure-noise image mapped from the low-dose noise image x_i through residual learning, and y_i denotes the normal-dose CT image corresponding to x_i.
The total loss function is L = λ_1 L_1 + λ_2 L_2, where λ_1 and λ_2 are adjustable hyperparameters.
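The combined objective can be sketched as below; the fixed random projection standing in for the frozen Resnet50 feature extractor, and the λ values, are illustrative assumptions.

```python
import numpy as np

def mse_loss(denoised, target):
    """L_1 in the text: pixel-level mean squared error."""
    return np.mean((denoised - target) ** 2)

def perceptual_loss(denoised, target, feat):
    """L_2 in the text: MSE in a feature space. `feat` stands in for the
    frozen Resnet50 extractor (here a fixed random projection)."""
    return np.mean((feat(denoised) - feat(target)) ** 2)

def total_loss(x, noise_pred, y, feat, lam1=1.0, lam2=0.1):
    """Residual learning: the network predicts pure noise R(x); the
    denoised image is x - R(x). L = lam1 * L_1 + lam2 * L_2."""
    denoised = x - noise_pred
    return (lam1 * mse_loss(denoised, y)
            + lam2 * perceptual_loss(denoised, y, feat))

rng = np.random.default_rng(0)
P = rng.standard_normal((64 * 64, 128)) / 64.0       # toy "feature extractor"
feat = lambda img: img.reshape(-1) @ P
y = rng.random((64, 64))
noise = 0.05 * rng.standard_normal((64, 64))
x = y + noise
loss_perfect = total_loss(x, noise, y, feat)          # exact noise prediction
loss_none = total_loss(x, np.zeros_like(x), y, feat)  # predicting no noise
```

Predicting the exact noise drives both terms to (numerically) zero, while predicting no noise leaves the full noise energy in the loss, which is the behaviour the residual formulation relies on.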
The model is optimized by continuously adjusting the learning rate and the adjustable hyperparameters to obtain the optimal parameters and output the optimal model.
Step five, a low-dose CT image is randomly selected and put into the denoising model, and the final output is the denoised low-dose CT image.
Compared with the prior art, the technical scheme adopted by the invention has the following innovations and advantages:
the depth convolution and window self-attention mechanism is applied to the characterization learning of the CT image, so that the texture information interaction and the information gain of the CT image can be realized to a great extent, and the texture information interaction and the information gain of the CT image are spliced with the edge information image strengthened by the Canny operator, so that a mixed feature map is formed, and the generated denoising image can better keep the original integral structure and local texture details.
The original low-dose image is processed by the Canny operator to generate an edge-feature image, which is fused through the MED and MEU modules at every downsampling or upsampling step, enhancing the details of the CT image and preventing over-smoothing and the loss of key detail features. Meanwhile, through the skip connections between corresponding MED and MEU blocks, the feature maps at corresponding encoder positions can be fused on the channel dimension during each upsampling stage of the network. By fusing low-level and high-level features, the network retains more of the texture detail information contained in the high-level feature maps, improving its feature-representation capability. The skip connections also alleviate the vanishing-gradient problem and accelerate training, thereby improving network performance.
Model optimization adopts the common mean square error together with a Resnet-based multi-scale perceptual loss function. The mean square error converges faster, while with the multi-scale perceptual loss the denoising mapping to be learned is close to an identity mapping, so the residual mapping is easier to optimize. By implicitly removing the latent clean image in the hidden layers through residual learning, noise information is identified better and a better denoising effect is achieved. Compared with directly outputting a clean image, this avoids overfitting and greatly improves the denoising effect.
The model reduces the parameter quantity as much as possible, and can achieve better denoising effect under the same parameter complexity.
The invention can remove noise and artifacts in the low-dose CT image, can also retain the whole structure and local texture details of the original image, solves the problem of edge blurring, prevents the image from losing partial local information due to excessive smoothness, and is convenient for assisting clinical diagnosis.
Drawings
FIG. 1 is a schematic diagram of the overall network architecture of the present invention;
FIG. 2 is a schematic diagram of the input encoding block (MED) architecture of the present invention;
fig. 3 is a schematic diagram of the output decoding block (MEU) structure of the present invention;
FIG. 4 is a schematic diagram of the structure of an intermediate layer hybrid Block (Mixed Block) of the present invention;
FIG. 5 is a schematic view of a low dose CT image of an embodiment;
fig. 6 is a schematic view of the image of the low dose CT image of fig. 5 after denoising.
Detailed Description
The present invention will be specifically explained below with reference to the drawings.
A low-dose CT image denoising method based on hybrid characterization learning comprises the following specific steps:
step one, preprocessing a CT image data set:
dividing the CT image data set into a training set, a verification set and a test set; each group of paired images of the training set and the verification set is randomly cut to the input size of 128×128 pixels for training, local information of the images is obtained, and the sample size is expanded.
Step two, constructing a denoising network model, as shown in fig. 1, specifically comprising the following steps:
the construction of a denoising model is realized by utilizing a symmetrical structure of an encoder-decoder, and the aim is to denoise a low-dose CT image containing noise to obtain a normal-dose image after the model is input with the low-dose CT image containing noise. The overall network model can be divided into six core modules, a preprocessing Block (Pre-processing), an Input mapping Block (Input project), an Input coding Block (MED), an intermediate layer hybrid Block (Mixed Block), an output decoding Block (MEU), and an output mapping Block (Output Projection).
Considering the situations that the image is excessively smooth, part of detail features are lost and the like after the low-dose CT image is directly input into a neural network model for processing, an image preprocessing layer based on a Canny operator is designed to strengthen the edge features of the image. In the image preprocessing stage, an input image is filtered by a Canny operator to generate an image with enhanced edge characteristics, and the image is marked as Hand-processed features. The image is used as auxiliary information to flow into a subsequent network block so as to highlight the edge characteristics of the image and increase the receptive field of the model.
First, the input low-dose CT image X_a is put into the input mapping block (Input Projection), which consists of a multi-layer perceptron (MLP): two layers of neurons map the input to a fixed dimension C×H×W, denoted X_c, to facilitate the subsequent representation operations.
The encoder stage in this embodiment includes three input encoding blocks (MED). As shown in fig. 2, the mapped low-dose CT feature map X_c first passes through an input mixed representation block (Mixed Block): the output of the input mapping block is fed into the Mixed Block as Query, Key and Value for representation learning, and is then concatenated with the edge-enhanced image (Hand-processed features), at which point the two-dimensional feature map becomes X_c ∈ R^{2C×H×W}. Next, a separable convolution is applied, in which a depth convolution is performed first, followed by a point-by-point convolution. The point-by-point convolution fixes the number of output channels, changing the feature map back to X_c ∈ R^{C×H×W}. Finally, one downsampling is performed to extract deeper detail information, with a 3×3 convolution kernel, stride 2 and padding 1, so that the feature map becomes R^{C×(H/2)×(W/2)}. The above operation is repeated in two further MED blocks, after which the feature map size becomes R^{C×(H/8)×(W/8)}.
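The size bookkeeping in this stage can be checked with the standard convolution output-size formula; the initial channel width C = 32 and input size 128×128 are illustrative assumptions.

```python
def conv_out(n, k=3, s=2, p=1):
    """Output length of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Track (C, H, W) through the three MED blocks: concatenation with the
# edge map gives 2C, the separable convolution restores C, and the
# 3x3 / stride-2 / padding-1 downsampling halves H and W.
shapes = [(32, 128, 128)]
c, h, w = shapes[-1]
for _ in range(3):
    c_after_concat = 2 * c               # 2C x H x W after edge concatenation
    c = c_after_concat // 2              # separable conv fixes channels back to C
    h, w = conv_out(h), conv_out(w)      # spatial halving via downsampling
    shapes.append((c, h, w))
```

After three MED blocks the map reaches (32, 16, 16), i.e. C×(H/8)×(W/8), which is the size the decoder then inverts.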
A Mixed Block is used in the intermediate layer, consisting of window self-attention (WSA) and depth convolution (Dwconv). Window self-attention mainly reduces dependence on regions outside the window and captures correlations inside the feature-space window while remaining computationally efficient. The depth convolution performs an independent convolution on each channel to mine channel information without changing the number of channels. Through the bidirectional interaction of the channel and spatial levels, the limited receptive field of the window self-attention mechanism is remedied, the drawback brought by depth-convolution weight sharing is eliminated, texture-information interaction and information gain inside and between windows are realized, and the global modeling capability for CT images is effectively enhanced. In the actual configuration, the attention window size is 7×7 and the depth-convolution kernel is 3×3. The image after the intermediate layer is denoted X_m.
As shown in fig. 4, the specific structure is as follows. In the figure, (a) is the channel-interaction link: a global average pooling layer (GAP) is applied, followed by two convolution layers with 1×1 kernels and stride 1, each with batch normalization and simple gating block processing, and finally a sigmoid activation function passes the result into the V of the self-attention module. The simple gating block splits the features into two C/2-channel halves and multiplies them elementwise, replacing part of the linear functions in a nonlinear manner and thereby reducing computation. In the figure, (b) is the spatial-interaction link: two convolution layers with 1×1 kernels and stride 1, with batch normalization and simple gating block processing inside, followed by a sigmoid activation function, generate a spatial self-attention map, providing a larger spatial range for the depth-convolution part. The new module thus formed is named Mixed Block and can be expressed as:
X′_{l+1} = MIX(LN(X_l), WSA, Dwconv) + X_l
X_{l+1} = FFN(LN(X′_{l+1})) + X′_{l+1}
where the MIX(·) function represents the feature fusion of WSA with Dwconv, LN denotes layer normalization, and FFN(·) is an MLP consisting of two linear layers with a simple gating block in between. X_l denotes the input feature tensor of the l-th layer; X′_{l+1} is the intermediate state after the window self-attention and depth convolution interaction; X_{l+1} is the output after the l-th mixed representation layer and serves as the input of the next mixed representation block.
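The two-step update and the simple gating block might be sketched as follows; the toy `mix` and `ffn` callables are placeholders for the WSA/Dwconv fusion and the two-linear-layer MLP, which the patent describes but does not fully parameterize.

```python
import numpy as np

def simple_gate(x):
    """Simple gating block: split the C channels into two halves and
    multiply them elementwise (channel count halves, C -> C/2)."""
    a, b = np.split(x, 2, axis=-1)
    return a * b

def layer_norm(x, eps=1e-6):
    """LN over the channel dimension."""
    mu = x.mean(axis=-1, keepdims=True)
    sig = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sig + eps)

def mixed_block(x, mix, ffn):
    """The two-step update of the mixed representation block:
    X'_{l+1} = MIX(LN(X_l)) + X_l ; X_{l+1} = FFN(LN(X'_{l+1})) + X'_{l+1}."""
    x1 = mix(layer_norm(x)) + x        # residual around the MIX branch
    return ffn(layer_norm(x1)) + x1    # residual around the FFN branch

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))
out = mixed_block(x, mix=lambda t: 0.5 * t, ffn=lambda t: 0.1 * t)
g = simple_gate(x)
```

The pre-norm residual form keeps the block shape-preserving, so blocks can be stacked freely in the encoder, intermediate layer and decoder.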
In the decoder stage, as shown in fig. 3, the decoder consists of three output decoding blocks (MEU). The MEU block takes the image X_m processed by the intermediate-layer Mixed Block and first performs an upsampling operation with a 4×4 convolution kernel and stride 2, implemented by transposed convolution, enlarging the feature map from R^{C×(H/8)×(W/8)} to R^{C×(H/4)×(W/4)}. After concatenation with the edge-feature map, the feature map becomes R^{2C×(H/4)×(W/4)}; a separable convolution then fixes the output channels, changing the two-dimensional feature map back to R^{C×(H/4)×(W/4)}. Finally, an output mixed representation block is added to learn feature knowledge of more textures. The same steps are repeated in two further MEU blocks, so that the feature map size returns to R^{C×H×W}; the resulting image is denoted X_u.
The image X_u is then put into the output mapping block (Output Projection), which consists of one layer of MLP, mapping the output to 1×H×W and restoring the original image size. This output is a pure noise image, denoted X_o.
The input hybrid characterization block, the intermediate-layer hybrid characterization block and the output hybrid characterization block have the same structure. Skip connections are adopted between the input encoding blocks and the output decoding blocks: in the symmetric structure, each output decoding block is fused on the channel dimension with the input encoding block at the corresponding position, and fusing bottom-layer information with high-layer information preserves more texture details; in addition, the skip connections prevent the vanishing-gradient problem and accelerate network training.
Finally, residual processing is carried out with the original image to obtain the denoised CT image X_i.
Step three, data enhancement:
A certain number of patients are selected and low-dose CT images of different body parts are collected; data enhancement is then performed on these images. The collected data set undergoes several enhancement operations, such as horizontal and vertical flipping and random cropping, each applied with a different probability, constructing a larger number of images that form the training set and the test set.
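The paired augmentation step can be sketched in NumPy as follows. This is a minimal sketch: the flip probabilities and the crop size are assumptions, and the key point illustrated is that the same random transform must be applied to the low-dose image and its normal-dose counterpart so the supervision stays pixel-aligned.

```python
import numpy as np

def augment_pair(low, full, crop=64, rng=np.random.default_rng()):
    """Apply identical random flips and a random crop to a
    low-dose / normal-dose CT pair (2-D arrays of the same shape)."""
    if rng.random() < 0.5:                 # horizontal flip
        low, full = low[:, ::-1], full[:, ::-1]
    if rng.random() < 0.5:                 # vertical flip
        low, full = low[::-1, :], full[::-1, :]
    h, w = low.shape
    top = rng.integers(0, h - crop + 1)    # shared crop window
    left = rng.integers(0, w - crop + 1)
    low = low[top:top + crop, left:left + crop]
    full = full[top:top + crop, left:left + crop]
    return low.copy(), full.copy()
```

Calling this several times per source slice yields the enlarged training set described above.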
Step four, model optimization:
In order to achieve a better effect, two loss functions are constructed to optimize the model and further improve its performance.
First, the mean square error (MSE) is used to converge the error to a minimum. The loss function L_1 is expressed as:

L_1 = (1/N) Σ_{i=1}^{N} ‖(x_i − R(x_i)) − y_i‖²

L_1 evaluates the pixel-level similarity between the model-denoised image and the real image, i.e. whether the model can accurately restore the original image;
wherein R(x_i) denotes the pure noise image mapped from the low-dose noise image x_i through residual learning, and y_i denotes the corresponding normal-dose CT image.
Second, a multi-scale perceptual function based on ResNet is used to realize the residual function between the low-dose picture and the noise picture. The loss function L_2 is expressed as:

L_2 = (1/N) Σ_{i=1}^{N} ‖α(x_i − R(x_i)) − α(y_i)‖²

The loss function L_2 evaluates the structural similarity between the model-predicted image and the real image;
where α denotes the classical feature extraction network ResNet50 used as the feature extractor, with its weights frozen on the ImageNet dataset after the pooling layer is deleted; R(x_i) denotes the pure noise image mapped from the low-dose noise image x_i through residual learning; y_i denotes the corresponding normal-dose CT image.
The total loss function is L = λ_1 L_1 + λ_2 L_2, wherein λ_1 and λ_2 are adjustable hyperparameters.
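The combined loss can be sketched as follows. This is a sketch under stated assumptions: the default λ values are placeholders, and any frozen feature extractor is accepted where the patent uses ResNet50 (ImageNet weights, pooling layer removed), so the test can run without downloading pretrained weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mixed_loss(noise_pred, low, full, feat_extractor, lam1=1.0, lam2=0.1):
    """L = lam1*L1 + lam2*L2: L1 is the pixel-wise MSE between the
    denoised image (low minus the predicted noise, residual learning)
    and the normal-dose image; L2 compares frozen deep features of the
    same two images (perceptual term)."""
    denoised = low - noise_pred                # residual mapping
    l1 = F.mse_loss(denoised, full)            # pixel-level similarity
    with torch.no_grad():
        target_feat = feat_extractor(full)     # features need no gradient
    l2 = F.mse_loss(feat_extractor(denoised), target_feat)
    return lam1 * l1 + lam2 * l2
```

In actual use, `feat_extractor` would be `torchvision.models.resnet50` truncated before its pooling layer and set to `eval()` with gradients disabled.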
In this implementation, the Adam optimizer is used to train and update the network weights. The model is continuously optimized by adjusting the learning rate and the adjustable hyperparameters. The denoising effect of the trained model is then judged from multiple dimensions, and the denoising model parameters with the best performance on the training set are preserved.
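The optimization step can be sketched as a standard Adam training loop. The learning rate and loop structure are assumptions; `model` stands for the denoising network above, predicting the pure-noise image from the low-dose input, and `loss_fn` for the combined loss.

```python
import torch

def train_epoch(model, loader, loss_fn, lr=1e-4):
    """One epoch of Adam optimization: the network predicts the
    pure-noise image and is updated to minimise the combined loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    total = 0.0
    for low, full in loader:
        opt.zero_grad()
        noise_pred = model(low)        # network output is the noise image
        loss = loss_fn(noise_pred, low, full)
        loss.backward()
        opt.step()
        total += loss.item()
    return total / max(len(loader), 1)
```

In practice one would keep a single optimizer across epochs (so Adam's moment estimates persist), decay the learning rate, and checkpoint the parameters that score best on the validation metrics.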
Step five, denoising the low-dose CT image: the trained denoising network denoises the low-dose CT image shown in fig. 5 to generate the CT image with a good denoising effect shown in fig. 6.
The embodiments described in the present specification are merely examples of implementation forms of the inventive concept. The scope of protection of the present invention should not be construed as being limited to the specific forms set forth in the embodiments; it also covers equivalent technical means that those skilled in the art can conceive based on the inventive concept.

Claims (4)

1. A low-dose CT image denoising method based on hybrid characterization learning, characterized by comprising the following steps:
step one, constructing a low-dose CT data set:
selecting a certain number of patients, collecting low-dose CT images of different body parts and their corresponding normal-dose CT images, and inserting Poisson noise until the noise level is close to 25% of the noise level under the full dose, forming a CT image data set (x, y), wherein x is a low-dose CT image and y is the normal-dose CT image corresponding to x;
step two, constructing a denoising network model:
constructing a symmetric coding-decoding network model based on hybrid characterization learning; the model comprises a preprocessing block, an input mapping block, an input encoding block, an intermediate layer hybrid characterization block, an output decoding block and an output mapping block;
in the image preprocessing stage, an input image is filtered by a Canny operator to generate an image with enhanced edge characteristics, and the image flows into a subsequent network block as auxiliary information to highlight the edge characteristics of the image and increase the receptive field of the model;
the input mapping block includes a multi-layer perceptron: the input is mapped to a fixed dimension through two layers of neurons, so that the subsequent characterization operation is facilitated;
the input encoding block includes an input hybrid representation block, an edge enhancement feature layer, a separable convolution and downsampling layer: firstly, inputting the output of an input mapping block as a query set, keys and values to a mixed characterization block for characterization learning, cascading an edge enhancement feature map processed by a Canny operator with a characterization result, and finally inputting the result into a downsampling layer to realize the coding of an input module through a separable convolution;
the intermediate layer hybrid characterization block includes window self-attention and depth convolution; window self-attention can reduce dependence on the outside of the window, capturing the correlation inside the feature space window while efficiently computing; the depth convolution is to carry out independent convolution operation on each channel to mine channel information on the premise of not changing the number of convolution channels;
the intermediate-layer hybrid characterization block is designed so that, through bidirectional interaction between the channel layer and the spatial layer, the limited receptive field of the window self-attention mechanism is overcome and the defect caused by depth-convolution weight sharing is eliminated; texture information interaction and information gain within and between windows are realized, effectively enhancing the global modeling capability for CT images;
the output decoding block comprises an up-sampling layer, an edge enhancement feature layer, a separable convolution and an output mixed characterization block, wherein the up-sampling operation is firstly carried out, then edge feature images processed by a Canny operator are spliced in the same way, and feature knowledge of more textures is learned by the aid of the separable convolution and the mixed characterization block;
the input mixed representation block, the middle layer mixed representation block and the output mixed representation block have the same structure; jump connection is adopted between the input coding blocks and the output decoding blocks, in a symmetrical structure, each output decoding block is fused with the input coding block at the corresponding position on a channel, and more texture details are reserved through fusion of bottom layer information and high layer information; secondly, the problem of network gradient disappearance can be prevented, and network training is accelerated;
the output mapping block comprises a layer of MLP, maps the output to 1 XH x W, and restores the original image size;
step three, data enhancement:
to further increase the number of training samples, a greater number of images are constructed by performing a horizontal-to-vertical flip or random cropping on the collected data set;
step four, model optimization:
model optimization is carried out by adopting two loss functions so as to further improve the performance of the model;
first, the mean square error MSE is used to converge the error to a minimum; the loss function L_1 is expressed as:

L_1 = (1/N) Σ_{i=1}^{N} ‖(x_i − R(x_i)) − y_i‖²

L_1 evaluates the pixel-level similarity between the model-denoised image and the real image, i.e. whether the model can accurately restore the original image;
wherein R(x_i) represents the pure noise image mapped from the low-dose noise image x_i through residual learning, and y_i represents the normal-dose CT image corresponding to x_i;
secondly, a multi-scale perceptual function based on ResNet is used to realize the residual function between the low-dose picture and the noise picture; the loss function L_2 is expressed as:

L_2 = (1/N) Σ_{i=1}^{N} ‖α(x_i − R(x_i)) − α(y_i)‖²

the loss function L_2 evaluates the structural similarity between the model-predicted image and the real image;
wherein α denotes the classical feature extraction network ResNet50 used as the feature extractor, with its weights frozen on the ImageNet dataset after the pooling layer is deleted; R(x_i) represents the pure noise image mapped from the low-dose noise image x_i through residual learning; y_i represents the normal-dose CT image corresponding to x_i;
the total loss function is L = λ_1 L_1 + λ_2 L_2, wherein λ_1 and λ_2 are adjustable hyperparameters;
optimizing the model by continuously adjusting the learning rate and the adjustable super parameters to obtain optimal parameters and outputting an optimal model;
and fifthly, randomly selecting a low-dose CT image, putting the low-dose CT image into the optimized denoising model, and outputting a final result to obtain the denoised low-dose CT image.
2. The low-dose CT image denoising method based on hybrid characterization learning as set forth in claim 1, wherein: in the second step, in the image preprocessing stage, the image filtered by the Canny operator is directly spliced into the input mapping block, the encoding blocks and the decoding blocks, serving as an auxiliary feature to assist image denoising.
3. The low-dose CT image denoising method based on hybrid characterization learning as set forth in claim 1, wherein:
the input encoding block first passes the input image through an input hybrid characterization block, then splices it with the image filtered by the Canny operator to strengthen detail texture information, and then applies a separable convolution, which decomposes a standard convolution into a depth convolution and a point-by-point convolution that respectively process spatial and channel information; the separable convolution reduces the amount of computation and adjusts the number of channels; finally, one downsampling is performed to reduce the size of the feature map and thereby extract more detailed features, wherein the downsampling convolution kernel is 3×3, the stride is 2, and the padding is 1;
the output decoding block first performs upsampling with a convolution kernel of 4×4 and a stride of 2, then splices in the edge-enhanced feature image, fixes the output size through a separable convolution, and finally further improves the feature expression capability of the CT image through an output hybrid characterization block.
4. The low-dose CT image denoising method based on hybrid characterization learning as set forth in claim 1, wherein: the bidirectional interaction between the channel layer and the space layer is specifically as follows:
in the channel interaction link, a global average pooling layer is used, followed by two convolution layers with a kernel size of 1×1 and a stride of 1; batch normalization and simple gating block processing are carried out inside the convolution layers, and finally a sigmoid activation function passes the result into V of the self-attention module; the simple gating block splits the feature along the channel dimension into two halves of C/2 channels each and multiplies them, replacing part of the linear function in a nonlinear manner and thereby reducing the amount of computation; in the spatial interaction link, two convolution layers with a kernel size of 1×1 and a stride of 1 are adopted, batch normalization and simple gating block processing are carried out inside the convolution layers, and finally a sigmoid activation function generates a spatial self-attention map, providing a larger spatial range for the depth convolution part; the new module thus formed is named the hybrid characterization block and is expressed as:
X′_{l+1} = MIX(LN(X_l), WSA, Dwconv) + X_l
X_{l+1} = FFN(LN(X′_{l+1})) + X′_{l+1}
wherein the MIX() function represents the function that performs feature fusion of the local window self-attention WSA with the depth convolution Dwconv; LN represents layer normalization, while FFN() is an MLP consisting of two linear layers with a simple gating block in between; X_l represents the input feature tensor of layer l; X′_{l+1} represents the intermediate result after the window self-attention and depth convolution interaction; X_{l+1} represents the output after the layer-l hybrid characterization processing, which can also serve as the input of the next hybrid characterization block.
CN202310454243.1A 2023-04-25 2023-04-25 Low-dose CT image denoising method based on hybrid characterization learning Pending CN116563554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310454243.1A CN116563554A (en) 2023-04-25 2023-04-25 Low-dose CT image denoising method based on hybrid characterization learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310454243.1A CN116563554A (en) 2023-04-25 2023-04-25 Low-dose CT image denoising method based on hybrid characterization learning

Publications (1)

Publication Number Publication Date
CN116563554A true CN116563554A (en) 2023-08-08

Family

ID=87485384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310454243.1A Pending CN116563554A (en) 2023-04-25 2023-04-25 Low-dose CT image denoising method based on hybrid characterization learning

Country Status (1)

Country Link
CN (1) CN116563554A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315063A (en) * 2023-09-07 2023-12-29 先进能源科学与技术广东省实验室 Low-dose CT image reconstruction method and system based on deep learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination