CN117315063A - Low-dose CT image reconstruction method and system based on deep learning - Google Patents


Info

Publication number
CN117315063A
CN117315063A
Authority
CN
China
Prior art keywords
module
Hipro-former
neural network
data set
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311155965.3A
Other languages
Chinese (zh)
Inventor
Yang Haibo (杨海波)
Zhang Shiyu (张世宇)
Zhao Chengxin (赵承心)
Jia Yanhao (贾彦灏)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Advanced Energy Science And Technology
Institute of Modern Physics of CAS
Original Assignee
Guangdong Provincial Laboratory Of Advanced Energy Science And Technology
Institute of Modern Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Advanced Energy Science And Technology, Institute of Modern Physics of CAS filed Critical Guangdong Provincial Laboratory Of Advanced Energy Science And Technology
Priority to CN202311155965.3A
Publication of CN117315063A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06T 11/003 - Reconstruction from projections, e.g. tomography
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2210/00 - Indexing scheme for image generation or computer graphics
    • G06T 2210/41 - Medical

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a low-dose CT image reconstruction method based on deep learning, which comprises the following steps: S1, extracting a first CT image data set; S2, processing the first CT image data set with a convolutional neural network, where the convolutional neural network comprises two convolutional network layers and three Hipro-former modules, the three Hipro-former modules jointly construct a HformerNet, and the HformerNet comprises three scales; in the HformerNet, from the first scale to the third scale, the number of channels of each convolutional network layer is k times that of the previous layer; S3, after the first CT image data set is processed by the convolutional neural network, outputting a second CT image data set. By processing the CT image data set with a convolutional neural network comprising two convolutional network layers and three Hipro-former modules, the low-dose CT image reconstruction method can rapidly and accurately improve the resolution of low-dose CT images, providing more accurate data for clinical diagnosis while reducing the radiation dose risk to patients.

Description

Low-dose CT image reconstruction method and system based on deep learning
Technical Field
The invention relates to the technical field of medical CT image processing, in particular to a low-dose CT image reconstruction method and system based on deep learning.
Background
Since the discovery of X-rays in the 1890s, they have been widely used in industrial inspection and medical diagnosis. As the computer field has advanced, medical diagnosis has also gradually entered a brand-new information age. Medical diagnosis using computed tomography (Computed Tomography, CT) is an important technology of this age.
In practical applications, the CT image is inevitably affected by physical factors such as dose, environment and angle during imaging, resulting in low image quality and unbalanced noise distribution; such low-quality imaging results can seriously affect a doctor's clinical diagnosis.
Although CT reconstruction at a normal dose can clinically yield high-resolution images, the larger radiation injury easily causes secondary harm to the patient during treatment and increases the patient's risk of illness. Reducing the radiation dose while ensuring that the image accuracy still meets the needs of clinical diagnosis has therefore become particularly critical, and low-dose CT imaging is drawing increasing interest from academia and industry.
In 1990, Naidich et al. proposed using low-dose CT (LDCT) for lung cancer screening. LDCT reduces the radiation dose by changing the scan parameters of the CT scanner, mainly by reducing the tube current intensity while keeping the other parameters unchanged. However, when the tube current is weakened, the number of photons received by the detector decreases, so the image contains obvious noise, the reconstructed image has poor accuracy, and diagnosis is seriously affected.
Algorithms for CT image reconstruction mainly include projection-domain filtering methods, iterative methods and image-domain methods; image-domain methods are further divided into conventional methods, such as Block-Matching and 3D filtering (BM3D), and image-domain deep learning techniques. Conventional methods mostly learn the noise distribution of a low-dose CT image through mathematical modeling. Although their experimental results are accurate, they suffer from high computational cost and long training time, and because the noise distribution is random, conventional algorithms generalize poorly. With the continued advancement of deep learning, deep learning models show dramatic advantages in learning data distributions. However, existing image-domain deep learning techniques still over-smooth detail textures in the reconstructed image and blur the processed CT image.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a low-dose CT image reconstruction method based on deep learning, which processes CT image data with a convolutional neural network comprising two convolutional network layers and three Hipro-former modules, can rapidly and accurately improve the resolution of low-dose CT images, and provides more accurate data for clinical diagnosis while reducing the radiation dose risk to patients. The invention also provides a low-dose CT image reconstruction system based on deep learning.
In order to achieve the above object, the present invention provides the following technical solutions:
the low-dose CT image reconstruction method based on deep learning comprises the following steps:
s1, extracting a first CT image data set;
s2, processing the first CT image data set by adopting a convolutional neural network;
the convolutional neural network comprises two convolutional network layers and three Hipro-former modules,
the three Hipro-former modules jointly construct a HformerNet, which comprises three scales; in the HformerNet, from the first scale to the third scale, the number of channels of each convolutional network layer is k times that of the previous layer;
s3, after the first CT image data set is processed by the convolutional neural network, a second CT image data set is output.
As a preferred embodiment, in the convolutional neural network of S2, each scale is provided with a downsampling module based on a convolution block with a stride of n×n and an upsampling module based on a deconvolution block with a stride of n×n.
As a preferred embodiment, the Hipro-former modules can share multi-scale data information between the convolutional network layers through skip connections.
As a preferred embodiment, the two convolutional network layers comprise a first convolutional network layer module and a second convolutional network layer module, and the three Hipro-former modules comprise a first Hipro-former module, a second Hipro-former module and a third Hipro-former module.
As a preferred embodiment, the processing of the first CT image dataset with a convolutional neural network at S2 specifically comprises the steps of:
s21, inputting a first CT image data set into a convolutional neural network;
s22, a first convolution network layer module processes a first CT image data set;
s23, respectively processing by a downsampling module before entering the first Hipro-former module and the second Hipro-former module for processing;
s24, before the data enter a third Hipro former module and a second convolution network layer module for processing, the data are respectively processed by an up-sampling module;
s25, after being processed by the three Hipro former modules and the second convolution network layer module, the first CT image data set is output from the convolution neural network.
As a preferred embodiment, the downsampling module process and the upsampling module process are both image feature extraction processes.
As a preferred embodiment, the Hipro-former module comprises a depth separable convolution module and a lightweight self-attention module that calculates attention in the channel dimension as follows:
101. given an input X ∈ R^(H×W×C), generate the corresponding feature values Query (Q), Key (K) and Value (V) with the original attention mechanism: Q = XW_Q, K = XW_K, V = XW_V;
102. a dot product of Q and K generates a weight matrix of size R^(N×N), where N = H×W;
wherein W_Q, W_K and W_V are all fully connected operations;
103. downsample Q and K with max pooling of stride K to obtain two smaller feature values Q′ and K′, wherein Q′ = MaxPool_K(Q) and K′ = MaxPool_K(K).
as a preferred embodiment, the lightweight self-attention module calculates attention in the channel dimension, further comprising the steps of:
104. transpose Q 'to obtain Q' T
105. Using Q' T And K' performing dot product operation on the channel dimension, and obtaining a concentration score matrix Attn with dimension of C×C by using softmax regression on the obtained calculation result channels
106. Attention score matrix Attn channels Matrix multiplication with V results in a final attention map.
As a preferred embodiment, the lightweight self-attention module calculates attention in the channel dimension through the functional relationship Attention(Q, K, V) = V · softmax(Q′^T K′).
as a preferred embodiment, the depth separable convolution module extracts and reconstructs shallow features, which specifically includes the following steps:
201. the shallow characteristic layer is normalized by combining with the standard layer after being processed by the depth separable convolution module;
202. local characterization enhancement and channel dimension transformation using two projection convolutions;
203. after the first projection convolution, connecting a Gaussian error linear unit to carry out nonlinear feature mapping;
204. the information is stably propagated forwards and backwards by using a residual connection mode.
The invention also provides a low-dose CT image reconstruction system based on deep learning, which adopts a convolutional neural network to process the first CT image data set;
the convolutional neural network comprises two convolutional network layers and three Hipro-former modules,
the three Hipro-former modules jointly construct a HformerNet, which comprises three scales;
in the HformerNet, from the first scale to the third scale, the number of channels of each convolutional network layer is k times that of the previous layer;
the first CT image data set is processed by a convolutional neural network and then a second CT image data set is output;
the Hipro-former module includes a depth separable convolution module and a lightweight self-attention module; the lightweight self-attention module calculates attention in the channel dimension, and the depth separable convolution module extracts and reconstructs shallow features.
Based on the technical scheme, compared with the CT image reconstruction method in the prior art, the method has the following technical effects:
(1) The low-dose CT image reconstruction method based on deep learning provided by the invention processes CT image data with an improved convolutional neural network comprising two convolutional network layers and three Hipro-former modules, so the method is suitable not only for unsupervised learning of the noise distribution but also for restoring images and completing noise reduction tasks. The Hipro-former module comprises a depth separable convolution module and a lightweight self-attention module: the lightweight self-attention module calculates attention in the channel dimension, and the depth separable convolution module extracts and reconstructs shallow features; processing CT image data cooperatively in this way can rapidly and accurately improve the resolution of low-dose CT images and provide more accurate data for clinical diagnosis while reducing the radiation dose risk to patients.
(2) The low-dose CT image reconstruction system based on deep learning provided by the invention processes CT image data with a convolutional neural network containing Hipro-former modules; during reconstruction, the details and overall structure of the CT image can be well restored, and noise and artifacts can be effectively reduced.
Drawings
Fig. 1 is a flow chart of the low-dose CT image reconstruction method of Example 1.
Fig. 2 is a schematic diagram of the low-dose CT image reconstruction method of Example 1.
Fig. 3 is a schematic flow chart of processing the first CT image data set with the convolutional neural network in Example 1.
Fig. 4 is a schematic diagram of the Hipro-former module of Example 1.
Fig. 5 is a structural block diagram of the Hipro-former module of Example 1.
Fig. 6 is a schematic model diagram of the depth separable convolution module of Example 1.
Fig. 7 is a schematic flow chart of the lightweight self-attention module of Example 1 calculating attention in the channel dimension.
Fig. 8 is a schematic flow chart of the depth separable convolution module of Example 1 extracting and reconstructing shallow features.
Fig. 9 is an abdominal CT image of patient L506 in Example 2.
Fig. 10 is the ROI image of Fig. 9 in Example 2.
Fig. 11 is another abdominal CT image of patient L506 in Example 2.
Fig. 12 is the ROI image of Fig. 11 in Example 2.
Fig. 13 is a structural block diagram of the convolutional neural network of Example 3.
Detailed Description
In order that the invention may be readily understood, a more particular description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings. The drawings illustrate preferred embodiments of the invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Example 1
As shown in Figs. 1-8, a low-dose CT image reconstruction method based on deep learning is used to improve the image resolution of clinical low-dose CT imaging and to reduce noise.
The low-dose CT image reconstruction method comprises the following steps:
s1, extracting a first CT image data set;
s2, processing the first CT image data set by adopting a convolutional neural network;
s3, after the first CT image data set is processed by the convolutional neural network, outputting a second CT image data set.
The convolutional neural network is an improved neural network; the convolutional neural network 100 includes two convolutional network layers and three Hipro-former modules, and the three Hipro-former modules jointly construct a HformerNet, which is an autoencoder structure using residual learning.
Furthermore, the HformerNet comprises three scales, and in the HformerNet, from the first scale to the third scale, the number of channels of each convolutional network layer is k times that of the previous layer; in addition, each scale is provided with a downsampling module based on a convolution block with a stride of n×n and an upsampling module based on a deconvolution block with a stride of n×n.
The modules can also share multi-scale information through skip connections between them; that is, between the convolutional network layers, the Hipro-former modules can share multi-scale data information through skip connections. The convolutional neural network is suitable not only for unsupervised learning of the noise distribution but also for the tasks of restoring and denoising CT images.
In the convolutional neural network 100, a plurality of nodes are interconnected to form a convolutional network layer. The nodes, i.e., neurons, are separated into different layers, and each neuron is connected to neurons of the adjacent layers. Each layer of neurons has an input and an output, and the input of each layer is the output of the previous layer. In this embodiment, two convolutional network layers are employed.
In some embodiments, the low-dose CT image reconstruction method was validated on the AAPM-Mayo Clinic Low-Dose CT Grand Challenge data set, obtaining PSNR 33.4405, RMSE 8.6956 and SSIM 0.9163 without requiring a large number of learnable parameters, thereby achieving state-of-the-art (SOTA) results.
Specifically, the two convolutional network layers 10 include a first convolutional network layer module 101 and a second convolutional network layer module 102, and the three Hipro-former modules 20 include a first Hipro-former module 201, a second Hipro-former module 202 and a third Hipro-former module 203. In this embodiment, the Hipro-former module is also simply referred to as Hformer.
The processing of the first CT image dataset with the convolutional neural network at S2 specifically includes the steps of:
s21, inputting a first CT image data set into a convolutional neural network;
s22, a first convolution network layer module processes a first CT image data set;
s23, respectively processing by a downsampling module before entering the first Hipro-former module and the second Hipro-former module for processing;
s24, before the data enter a third Hipro former module and a second convolution network layer module for processing, the data are respectively processed by an up-sampling module;
s25, after being processed by the three Hipro former modules and the second convolution network layer module, the first CT image data set is output from the convolution neural network.
The downsampling module processing and the upsampling module processing are both image feature extraction processes.
As described above, the modules can also share multi-scale information through skip connections. For example, the first Hipro-former module 201 and the third Hipro-former module 203 can share multi-scale data information between the first convolutional network layer module 101 and the second convolutional network layer module 102 through a skip connection.
Owing to the nature of deep learning, a large number of samples is required to train the convolutional neural network. In practice, however, it is often difficult to obtain adequate samples, especially in clinical imaging. In the embodiments provided by the present invention, overlapping slices of a CT image are used. This strategy has proven effective in previous studies: more slices detect perceptual differences in local areas and greatly increase the number of samples. In the embodiments provided by the invention, fixed-size patches are extracted from the LDCT images and the corresponding NDCT images.
Thus, in this embodiment, the first CT image data set uses overlapping slices of the CT images, and fixed-size patches are extracted (Patch Extraction) from the LDCT images and the corresponding NDCT images.
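The overlapping-patch extraction described above can be sketched as follows; the 64×64 patch size and 32-pixel stride are hypothetical choices, as the text only states that the patches have a fixed size.

```python
import numpy as np

def extract_patches(image, patch_size=64, stride=32):
    """Extract overlapping fixed-size patches from one 2-D CT slice.

    A stride smaller than patch_size yields overlapping patches, which
    greatly increases the number of training samples."""
    h, w = image.shape
    patches = [
        image[y:y + patch_size, x:x + patch_size]
        for y in range(0, h - patch_size + 1, stride)
        for x in range(0, w - patch_size + 1, stride)
    ]
    return np.stack(patches)

# Paired extraction: the same grid is applied to the LDCT slice and the
# corresponding NDCT slice so that training pairs stay aligned.
ldct = np.zeros((512, 512), dtype=np.float32)
ndct = np.zeros((512, 512), dtype=np.float32)
print(extract_patches(ldct).shape, extract_patches(ndct).shape)
```

For a 512×512 slice this grid yields 15×15 = 225 patches per image, far more samples than one slice alone would provide.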
It should be noted that the Hipro-former module 20 of the present embodiment includes a depth separable convolution module 21 and a lightweight self-attention module 22, where the lightweight self-attention module 22 is used to calculate attention in the channel dimension and the depth separable convolution module 21 is used to extract and reconstruct shallow features.
The conventional self-attention module has a huge computational overhead, which places a heavy burden on the computing power of the device. To solve this problem while still obtaining valid global context information, the Hipro-former module 20 in the present application reduces the feature map dimension and calculates attention in the channel dimension.
Specifically, the lightweight self-attention module 22 calculates attention in the channel dimension as follows:
101. given an input X ∈ R^(H×W×C), generate the corresponding feature values Query (Q), Key (K) and Value (V) with the original attention mechanism: Q = XW_Q, K = XW_K, V = XW_V;
102. a dot product of Q and K generates a weight matrix of size R^(N×N), where N = H×W;
wherein W_Q, W_K and W_V are all fully connected operations.
103. downsample Q and K with max pooling of stride K to obtain two smaller feature values Q′ and K′, wherein Q′ = MaxPool_K(Q) and K′ = MaxPool_K(K).
In step 103, the calculation of the conventional self-attention module is performed along the spatial dimension between Q and K, and the result is Attention(Q, K, V) = softmax(QK^T / √d)V, where √d is a scaling coefficient calculated based on the depth of the network. During this calculation, the large feature size of the input data consumes a large amount of computing resources (video memory), making training and deployment of the neural network difficult. To solve this problem, as shown in Fig. 4, the invention downsamples Query and Key with max pooling of stride K to obtain the two smaller feature values K′ and Q′; the invention then calculates attention in the channel dimension, i.e. reduces the algorithmic complexity to a linear relationship with respect to the image resolution, to further reduce the model overhead.
The original self-attention calculation multiplies three groups of H×W×C matrices, with a total cost of (H×W)²×C³. Applying max pooling (maxpooling) once or twice reduces the dimensions of Q and K, shrinking the original H×W spatial size to (H/K)×(W/K). Since C is unchanged, the final matrix size is unchanged: the overhead is reduced only inside the self-attention computation, and the scale of the final result is not affected.
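The cost argument above can be checked with a rough multiply count. The sizes below (H = W = 64, C = 64) are illustrative assumptions; the point is only that the spatial N×N score matrix grows with (HW)², while the channel C×C score matrix grows with C²·HW.

```python
# Rough multiply counts for the dominant score-matrix products.
H = W = 64          # spatial size (assumed)
C = 64              # channels (assumed)
N = H * W           # number of spatial positions

spatial_attention = N * N * C   # Q K^T over the spatial dimension
channel_attention = C * C * N   # Q'^T K' over the channel dimension

print(spatial_attention // channel_attention)  # N / C = 64x fewer multiplies
```

Since C is fixed by the architecture while N grows with the image resolution, the channel-dimension variant is linear in the number of pixels, matching the complexity claim in the text.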
Further, the lightweight self-attention module 22 calculates attention in the channel dimension through the following additional steps:
104. transpose Q′ to obtain Q′^T;
105. perform a dot product of Q′^T and K′ along the channel dimension, and apply softmax regression to the result to obtain an attention score matrix Attn_channels of size C×C;
106. multiply the attention score matrix Attn_channels with V to obtain the final attention map.
In some embodiments, after the attention score matrix Attn_channels is multiplied with V, a further linear projection may also be applied.
Based on the above operations, the invention successfully reduces the computational cost of self-attention to C²(HW), making it linear with respect to the image resolution.
The lightweight self-attention module computes the attention operation in the channel dimension, and its functional relationship is expressed as Attention(Q, K, V) = V · softmax(Q′^T K′).
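Steps 101-106 can be sketched in NumPy. Identity projections stand in for the learned W_Q, W_K and W_V, and the √N′ scaling inside the softmax is an added assumption; both are illustrative choices, not details claimed by the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(X, pool_stride=2):
    """Lightweight self-attention over the channel dimension.

    X has shape (N, C), with N = H*W flattened spatial positions."""
    N, C = X.shape
    Q = K = V = X  # identity stand-ins for X W_Q, X W_K, X W_V (assumption)
    # 103: max-pool Q and K along the spatial axis with stride `pool_stride`
    n_pooled = N // pool_stride
    Qp = Q[:n_pooled * pool_stride].reshape(n_pooled, pool_stride, C).max(axis=1)
    Kp = K[:n_pooled * pool_stride].reshape(n_pooled, pool_stride, C).max(axis=1)
    # 104-105: Q'^T K' along the channel dimension, then softmax -> (C, C)
    attn_channels = softmax(Qp.T @ Kp / np.sqrt(n_pooled), axis=-1)
    # 106: multiply with V for the final attention map, shape (N, C)
    return V @ attn_channels

X = np.random.rand(16 * 16, 8)
print(channel_attention(X).shape)  # (256, 8)
```

Note that the C×C score matrix leaves the output the same shape as V, so pooling Q and K reduces cost without changing the scale of the result, as stated above.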
it should be further noted that, considering that the shallow layer information of the CT image has more detail information, such as contour, edge, color, texture and shape features, the use of the convolutional neural network can extract features by sharing the convolutional kernel, thereby ensuring a lower network parameter number and improving model efficiency. In addition, the convolutional neural network has two inherent inductive biases, namely translational invariance and local correlation, which enables the convolutional neural network to capture more local information.
On this basis, the shallow feature extraction (reconstruction) module designed by the invention mainly consists of a depth separable convolution module. The depth separable convolution module extracts and reconstructs shallow features through the following steps:
201. after being processed by the depth separable convolution module, the shallow feature layer is normalized with layer normalization (LayerNorm);
202. local representations are enhanced and the channel dimension is transformed using two projection convolutions;
203. the first projection convolution is followed by a Gaussian Error Linear Unit (GELU) for nonlinear feature mapping;
204. residual connections are used so that information propagates stably forwards and backwards.
In some embodiments, linear projections may also be applied during the local representation enhancement and channel dimension transformation with the two projection convolutions, as well as in the residual connection.
The extraction and reconstruction process has the following functional relationship:
x i+1 =x i +Linear(GELU(Linear(LN(DWConv 7×7 (x i )))))。
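The functional relationship above can be sketched directly in NumPy. The depthwise 7×7 convolution is replaced by a mean filter and the two Linear projections by small random matrices; both are stand-ins for learned parameters and are assumptions, not the patent's weights.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):  # tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def dwconv7x7(x):
    """Depthwise 7x7 mean filter standing in for the learned DWConv (assumption)."""
    h, w, _ = x.shape
    pad = np.pad(x, ((3, 3), (3, 3), (0, 0)), mode="edge")
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = pad[i:i + 7, j:j + 7].mean(axis=(0, 1))
    return out

def dsc_block(x, w1, w2):
    """x_{i+1} = x_i + Linear(GELU(Linear(LN(DWConv_7x7(x_i)))))"""
    return x + gelu(layer_norm(dwconv7x7(x)) @ w1) @ w2

rng = np.random.default_rng(0)
x = rng.random((16, 16, 4))        # (H, W, C)
w1 = rng.random((4, 8)) * 0.1      # first projection (hypothetical C -> 2C)
w2 = rng.random((8, 4)) * 0.1      # second projection (hypothetical 2C -> C)
print(dsc_block(x, w1, w2).shape)  # (16, 16, 4), same shape as the input
```

The residual addition requires the second projection to map back to C channels, which is why the block preserves the input shape and lets information propagate stably in both directions.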
The low-dose CT image reconstruction method based on deep learning provided by this embodiment processes CT image data with an improved convolutional neural network comprising two convolutional network layers and three Hipro-former modules, so it is suitable not only for unsupervised learning of the noise distribution but also for restoring images and completing noise reduction tasks. The Hipro-former module comprises a depth separable convolution module and a lightweight self-attention module: the lightweight self-attention module calculates attention in the channel dimension, and the depth separable convolution module extracts and reconstructs shallow features; processing CT image data cooperatively in this way can rapidly and accurately improve the resolution of low-dose CT images and provide more accurate data for clinical diagnosis while reducing the radiation dose risk to patients.
Example 2
Building on the deep-learning-based low-dose CT image reconstruction method of Example 1, this example uses the clinical data set published by the 2016 NIH-AAPM-Mayo Clinic LDCT Grand Challenge for model training and testing. The clinical data set includes 2378 low-dose (quarter-dose) CT images and 2378 normal-dose (full-dose) CT images from 10 anonymized patients.
Among the compared methods, SCUNet and Uformer are mainstream deep-learning-based image denoising algorithms. RED-CNN is a representative work among CT denoising algorithms, and CTformer and DU-GAN are currently the most advanced denoising algorithms on LDCT data sets, with excellent performance in image denoising tasks.
To evaluate the deep-learning-based low-dose CT image reconstruction method proposed in Example 1, its denoising ability was compared with that of the above algorithms.
This example presents two representative results from the test set of patient L506 data, together with their corresponding ROI images.
Figs. 9 and 11 each show an abdominal CT image. The noise shown in Figs. 10 and 12 is mainly distributed in the abdominal region, and the contours and tissue structure details of each organ are greatly affected by the noise. In Fig. 9, significant streak artifacts are visible in regions such as the spine and liver, greatly affecting clinical diagnosis of the diseased region.
The convolution-based RED-CNN can effectively remove most of the noise and artifacts and preserves details well. However, RED-CNN restores the structure of the image poorly: because of the local nature of convolution, it extracts high-frequency information such as texture details more effectively, while the limited receptive field prevents it from extracting more global information.
Uformer and CTformer, lacking convolution layers, over-smooth the detail textures, resulting in blurred CT images.
The low-dose CT image reconstruction method using the Hipro-former module proposed in this embodiment also surpasses SCUNet in noise reduction and in retaining detail structure. In Fig. 11, the noise suppression in the lesion area processed by the reconstruction method of this embodiment is significantly better than that achieved by the SCUNet denoising algorithm.
The Hipro former module forming the HformerNet comprises a depth separable convolution module and a lightweight self-care module, which exhibit stronger generalization ability based on a multi-scale depth separable convolution module and a lightweight self-care module, and are higher in the effect of reconstructing the LDCT than the SCUNet based on a parallel structure combined with convolution and self-care.
Fig. 12 is an enlarged image of the ROI marked with a dotted rectangle in Fig. 11. To further demonstrate the performance of the Hformer module, note that the arrowed region in Fig. 12 should be a block of tissue with a uniform density distribution; apart from Hformer and DU-GAN, the other methods fail to properly reconstruct the internal detail of the focal region. SCUNet, Uformer, RED-CNN and CTformer all introduce extra noise into the image, making it difficult to distinguish the density distribution of the tissue.
Both DU-GAN and the Hformer proposed in this embodiment recover the details and overall structure well, and the Hformer is better than DU-GAN at artifact suppression.
Embodiment 3
This embodiment provides a low-dose CT image reconstruction system based on deep learning, which processes a first CT image data set with a convolutional neural network.
The convolutional neural network comprises two convolutional network layers and three Hformer modules; the three Hformer modules together construct HformerNet, which comprises three scales.
In HformerNet, from the first scale to the third scale, the number of channels of each convolutional network layer is k times that of the previous layer.
After the first CT image data set is processed by the convolutional neural network, a second CT image data set is output.
The Hformer module includes a depth separable convolution module and a lightweight self-attention module that calculates attention in the channel dimension; the depth separable convolution module extracts and reconstructs shallow features.
By processing CT image data with a convolutional neural network equipped with Hformer modules, the details and overall structure of the CT image can be well recovered during reconstruction, and noise and artifacts are effectively reduced.
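The three-scale structure described above, in which spatial resolution shrinks and channel count grows k-fold between scales, can be traced with a small shape sketch. The concrete values (H = W = 512, C = 64, k = 2, stride n = 2) are illustrative assumptions, not values specified in the patent:

```python
def hformernet_scales(H=512, W=512, C=64, k=2, n=2):
    """Trace the (height, width, channels) of the feature map at each
    of HformerNet's three scales. Between scales, a stride-n convolution
    halves each spatial dimension (for n=2) and multiplies channels by k.
    All default values are illustrative, not taken from the patent."""
    scales = []
    h, w, c = H, W, C
    for s in range(3):
        scales.append((h, w, c))
        if s < 2:  # downsampling happens only between the three scales
            h, w, c = h // n, w // n, c * k
    return scales

print(hformernet_scales())
# three (H, W, C) tuples, one per scale
```

Symmetric stride-n deconvolutions on the decoder side simply invert this progression, restoring the original resolution at the output.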
The foregoing is merely illustrative and explanatory of the invention and is not to be construed as limiting its scope. It should be noted that those skilled in the art can make modifications and improvements without departing from the spirit of the invention, and such obvious alternatives fall within the scope of the invention.

Claims (11)

1. A low-dose CT image reconstruction method based on deep learning, characterized by comprising the following steps:
S1, extracting a first CT image data set;
S2, processing the first CT image data set with a convolutional neural network;
wherein the convolutional neural network comprises two convolutional network layers and three Hformer modules,
the three Hformer modules together construct a HformerNet, which comprises three scales;
in the HformerNet, from the first scale to the third scale, the number of channels of each convolutional network layer is k times that of the previous layer;
S3, outputting a second CT image data set after the first CT image data set is processed by the convolutional neural network.
2. The method of claim 1, wherein in the convolutional neural network of S2, a downsampling module based on an n×n convolution block with stride n and an upsampling module based on an n×n deconvolution block with stride n are provided for each scale.
3. The method of claim 2, wherein the Hformer modules share multi-scale data information between the convolutional network layers through skip connections.
4. The low dose CT image reconstruction method of claim 1, wherein the two convolutional network layers comprise a first convolutional network layer module and a second convolutional network layer module, and the three Hformer modules comprise a first Hformer module, a second Hformer module, and a third Hformer module.
5. The low dose CT image reconstruction method as recited in claim 4, wherein said processing of the first CT image data set with a convolutional neural network in S2 comprises the following steps:
S21, inputting the first CT image data set into the convolutional neural network;
S22, processing the first CT image data set with the first convolutional network layer module;
S23, processing the data with a downsampling module before it enters the first Hformer module and the second Hformer module, respectively;
S24, processing the data with an upsampling module before it enters the third Hformer module and the second convolutional network layer module, respectively;
S25, outputting the first CT image data set from the convolutional neural network after it has been processed by the three Hformer modules and the second convolutional network layer module.
6. The method of claim 5, wherein the downsampling module processing and the upsampling module processing are image feature extraction processing.
7. The low dose CT image reconstruction method as recited in claim 1, wherein the hipro former module comprises a depth separable convolution module and a lightweight self-attention module that calculates attention in a channel dimension by:
101. given an input value X of the size R H×W×C The method comprises the steps of carrying out a first treatment on the surface of the Generating corresponding characteristic values Query, key and Value by an original attention mechanism;
102. dot product operation is carried out on the Query and the Key to generate a product with the size of R N×N Is a weight matrix of (a):
wherein W is Q 、W K And W is V All are fully connected operations;
103. respectively carrying out downsampling treatment on the Query and the Key by adopting a maximum pooling method with the step length of K to obtain two relatively smaller characteristic values K 'and Q'; wherein,
8. The low dose CT image reconstruction method of claim 7, wherein the lightweight self-attention module calculates attention in the channel dimension, further comprising the following steps:
104. transposing Q' to obtain Q'^T;
105. performing a dot product operation of Q'^T and K' in the channel dimension, and applying softmax regression to the result to obtain an attention score matrix Attn_channels of dimension C×C:
Attn_channels = softmax(Q'^T K');
106. multiplying the attention score matrix Attn_channels with V to obtain the final attention map.
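Steps 101 to 106 can be sketched in a few lines of NumPy. The flattened (N, C) token layout, the C×C projection matrices, and the default pooling stride are illustrative assumptions made for this sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def light_channel_attention(X, Wq, Wk, Wv, pool=2):
    """Lightweight channel self-attention sketch.
    X: (N, C) flattened spatial tokens (N = H*W);
    Wq, Wk, Wv: (C, C) fully connected projections (assumed shapes)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # step 101: (N, C) each
    # step 103: stride-`pool` max pooling along the token axis
    N = Q.shape[0] - Q.shape[0] % pool
    Qp = Q[:N].reshape(-1, pool, Q.shape[1]).max(axis=1)   # Q': (N/pool, C)
    Kp = K[:N].reshape(-1, pool, K.shape[1]).max(axis=1)   # K': (N/pool, C)
    # steps 104-105: Q'^T K' with softmax gives a C x C channel score matrix
    attn_channels = softmax(Qp.T @ Kp, axis=-1)            # (C, C)
    # step 106: multiply with V for the final attention map
    return V @ attn_channels                               # (N, C)
```

The key point of the lightweight design is that the score matrix is C×C rather than N×N, so the attention cost no longer grows quadratically with image size.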
9. The method of claim 7, wherein the lightweight self-attention module calculates attention in the channel dimension as the function:
Attention(Q, K, V) = V · softmax(Q'^T K').
10. The method of claim 7, wherein the depth separable convolution module extracts and reconstructs shallow features by the following steps:
201. processing the shallow feature layer with the depth separable convolution module and then applying layer normalization;
202. using two projection convolutions for local representation enhancement and channel dimension transformation;
203. connecting a Gaussian error linear unit after the first projection convolution for nonlinear feature mapping;
204. using residual connections so that information propagates stably both forwards and backwards.
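Steps 201 to 204 can be sketched as follows. The 3×3 depthwise kernel, the tanh approximation of GELU, and the treatment of the two projection convolutions as per-pixel C×C matrix multiplications (i.e. 1×1 convolutions) are illustrative assumptions:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the Gaussian error linear unit (step 203)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    # normalize over the channel dimension (step 201)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def depthwise_sep_block(x, dw_kernel, W1, W2):
    """x: (H, W, C) feature map; dw_kernel: (kh, kw, C) depthwise filters;
    W1, W2: (C, C) projection weights. Shapes are assumptions for this sketch."""
    H, Wd, C = x.shape
    kh, kw, _ = dw_kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))  # zero padding keeps H, W
    y = np.zeros_like(x)
    # depthwise convolution: each channel is filtered independently
    for i in range(H):
        for j in range(Wd):
            patch = xp[i:i + kh, j:j + kw, :]
            y[i, j, :] = (patch * dw_kernel).sum(axis=(0, 1))
    y = layer_norm(y)        # step 201: depthwise conv + layer normalization
    y = gelu(y @ W1)         # steps 202-203: first projection + GELU
    y = y @ W2               # step 202: second projection convolution
    return x + y             # step 204: residual connection
```

The residual connection in the last line is what lets the gradient (and the shallow features themselves) propagate stably through a stack of such blocks.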
11. A low-dose CT image reconstruction system based on deep learning, characterized in that the system processes a first CT image data set with a convolutional neural network;
the convolutional neural network comprises two convolutional network layers and three Hformer modules,
the three Hformer modules together construct a HformerNet, which comprises three scales;
in the HformerNet, from the first scale to the third scale, the number of channels of each convolutional network layer is k times that of the previous layer;
after the first CT image data set is processed by the convolutional neural network, a second CT image data set is output;
the Hformer module includes a depth separable convolution module and a lightweight self-attention module that calculates attention in the channel dimension; the depth separable convolution module extracts and reconstructs shallow features.
CN202311155965.3A 2023-09-07 2023-09-07 Low-dose CT image reconstruction method and system based on deep learning Pending CN117315063A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311155965.3A CN117315063A (en) 2023-09-07 2023-09-07 Low-dose CT image reconstruction method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN117315063A true CN117315063A (en) 2023-12-29

Family

ID=89236299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311155965.3A Pending CN117315063A (en) 2023-09-07 2023-09-07 Low-dose CT image reconstruction method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN117315063A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109166161A (en) * 2018-07-04 2019-01-08 Southeast University Low-dose CT image processing system based on a noise- and artifact-suppressing convolutional neural network
US20230036359A1 (en) * 2020-01-07 2023-02-02 Raycan Technology Co., Ltd. (Suzhou) Image reconstruction method, device, equipment, system, and computer-readable storage medium
CN116385317A (en) * 2023-06-02 2023-07-04 Hebei University of Technology Low-dose CT image recovery method based on adaptive convolution and Transformer hybrid structure
CN116563554A (en) * 2023-04-25 2023-08-08 Hangzhou Normal University Low-dose CT image denoising method based on hybrid representation learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHI YU ZHANG et al.: "Hformer: highly efficient vision transformer for low-dose CT denoising", SPRINGERLINK, 26 April 2023 (2023-04-26), pages 1-14 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination