CN116012478B - CT metal artifact removal method based on convergence type diffusion model - Google Patents


Info

Publication number
CN116012478B
Authority
CN
China
Prior art keywords
image
noise
metal
metal artifact
time
Prior art date
Legal status
Active
Application number
CN202211714237.7A
Other languages
Chinese (zh)
Other versions
CN116012478A (en)
Inventor
骆功宁 (Luo Gongning)
马兴华 (Ma Xinghua)
王宽全 (Wang Kuanquan)
王玮 (Wang Wei)
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202211714237.7A priority Critical patent/CN116012478B/en
Publication of CN116012478A publication Critical patent/CN116012478A/en
Application granted granted Critical
Publication of CN116012478B publication Critical patent/CN116012478B/en

Classifications

    • Y — General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02 — Technologies or applications for mitigation or adaptation against climate change
    • Y02P — Climate change mitigation technologies in the production or processing of goods
    • Y02P 90/00 — Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 — Computing systems specially adapted for manufacturing

Abstract

A CT metal artifact removal method based on a convergence type diffusion model relates to CT metal artifact removal. To solve the problem that existing MAR methods cannot effectively cope with zero-training-sample MAR scenarios, the method removes metal artifacts with a convergence type diffusion model: first, Gaussian noise up to time T is superimposed on the CT image accompanied by metal artifact noise, i.e., the forward diffusion process corresponding to the image to be denoised; then, taking the image-information-biased Gaussian noise as input, the artifact noise m and the Gaussian distance ξ at the current time t are estimated by a neural network, and denoising inference from time t to time t-1 yields the biased Gaussian distribution y'_{t-1}; these steps are repeated from time T to time 0 to obtain the CT image with metal artifacts removed. The method is used for removing metal artifacts from CT images.

Description

CT metal artifact removal method based on convergence type diffusion model
Technical Field
The invention belongs to the technical field of medical image analysis, and particularly relates to a CT metal artifact removal method.
Background
In clinical diagnostics based on computed tomography (Computed Tomography, CT), potential metal implants in a patient (e.g. dental fillings, artificial hip joints, spinal implants, etc.) may cause loss of attenuation coefficients associated with human tissue during projection of X-rays and cause strong streak-like or shadow-like artifacts in the reconstructed CT image. The metal artifacts that reduce the imaging quality not only severely interfere with the doctor's diagnosis of potential lesions in CT images, but also have a detrimental effect on dose calculation in radiation therapy. With the widespread popularity and application of metal implants, metal artifact removal (Metal Artifact Reduction, MAR) has become a medical image analysis task with high clinical value based on CT systems.
Although existing deep-learning-based MAR methods can cope with MAR scenarios whose samples are visible during model training, their simple fitting to the preferred training samples leaves them unable to handle zero-training-sample MAR scenarios, i.e., images of unknown CT scanning positions corrupted by unknown metal implants. During X-ray transmission, the CT system produces different attenuation distributions when imaging different scanning positions; metal implants of different numbers, shapes and sizes produce mutually interfering artifacts of varying amplitudes, and the attenuation distribution of unseen interfering artifacts that were not involved in knowledge learning is difficult to transform into a uniform denoising distribution by fitting. Consequently, even a logically sound optimization of the sample-generalization architecture can hardly adapt deep-learning-based methods to zero-training-sample MAR scenarios, which prevents their large-scale adoption in clinical applications.
Disclosure of Invention
The invention aims to solve the problem that existing MAR methods cannot effectively cope with zero-training-sample MAR scenarios, and provides a CT metal artifact removal method based on a convergence type diffusion model.
A CT metal artifact removal method based on a convergence type diffusion model takes a CT image accompanied with metal artifact noise as an input of the convergence type diffusion model, and uses the convergence type diffusion model to remove the metal artifact;
the convergent diffusion model is trained in advance, and the construction and training process of the convergent diffusion model comprises the following steps:
step S1, acquiring an original real CT image, and screening the CT image to obtain a screened CT image set; obtaining a mask of the metal implant, and screening the mask to obtain a screened mask set;
s2, selecting a CT image and a metal implant mask from the CT image and the mask set obtained by screening respectively, and synthesizing the CT image and the metal implant mask into a CT image with metal artifact noise;
step S3, repeatedly executing the step S2 to obtain a CT image accompanied by metal artifact noise, taking the CT image accompanied by metal artifact noise as a model input, taking an original CT corresponding to the synthesized CT image accompanied by metal artifact noise as a GroundTruth, and constructing a data set for training a convergence type diffusion model;
s4, constructing a convergence type diffusion model, wherein the convergence type diffusion model comprises two forward diffusion processes and a convergence type reverse diffusion process which respectively correspond to the image to be denoised and the denoising target; training a convergence type diffusion model by using the data set obtained in the step S3, and stopping training until the maximum iteration times are reached, so as to obtain a trained convergence type diffusion model;
in the training process, the image to be denoised corresponds to the synthesized CT image with metal artifact noise, and the denoising target corresponds to the CT image without metal artifact;
two forward diffusion processes: the image to be denoised x_0 and the denoising target y_0 are fixed on two independent Markov chains; according to the gradually increasing schedule parameters $\{\beta_t\}_{t=1}^{T}$, x_0 and y_0 are corrupted into the image-information-biased Gaussian noises x_T and y_T, where at any time t the image-information-biased Gaussian noises x_t and y_t are defined as:

$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),\qquad q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\bar{\alpha}_t}\,y_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right)$$

where the weight $\alpha_t$ is $1-\beta_t$, $\bar{\alpha}_t$ is $\prod_{s=1}^{t}\alpha_s$, $\mathcal{N}$ denotes a Gaussian distribution, and $\mathbf{I}$ represents standard Gaussian noise;
the convergence type reverse diffusion process samples the image-information-biased Gaussian noise time by time based on the theoretical correlation between CT images with and without artifact noise; at any time t, the inference from x_t to y_{t-1} is defined as:

$$q(y_{t-1} \mid x_t, y_0, m) = \mathcal{N}\!\left(y_{t-1};\ \tilde{\mu}_t(x_t, y_0, m),\ \tilde{\beta}_t\,\mathbf{I}\right)$$

where m is the metal artifact noise of the CT image, $\tilde{\mu}_t$ denotes the mean and $\tilde{\beta}_t$ the variance;
according to Bayes' rule, $q(y_{t-1} \mid x_t, y_0, m)$ is jointly inferred from $q(x_t \mid y_{t-1}, m)$, $q(y_{t-1} \mid y_0)$ and $q(x_t \mid y_0, m)$, where $q(x_t \mid y_{t-1}, m)$ incorporates the artifact noise m through the correlation $x_0 = y_0 + m$;
the mean $\tilde{\mu}_t$ of the joint Gaussian distribution $q(y_{t-1} \mid x_t, y_0, m)$ is the quantity to be solved in the inference process;
The above convergence type reverse diffusion process is repeated T times on the image to be denoised, progressively approaching the sample distribution of y_0. In this process, the Gaussian distribution inferred by the (t+1)-th convergent reverse diffusion step is denoted y'_t. For the t-th reverse diffusion step, the artifact noise m and the Gaussian distance ξ are estimated by the metal artifact denoising network built into the convergent reverse diffusion process; the biased Gaussian distribution y'_{t-1} is then inferred from y'_t according to the estimated m and ξ; the (t-1)-th convergent reverse diffusion step is then performed based on y'_{t-1}, and so on until y'_0, the CT image with artifacts removed, is inferred.
Further, the process of selecting a CT image and a metal implant mask from the CT image and the mask set obtained by screening, respectively, and synthesizing the two into a CT image accompanied with metal artifact noise includes the following steps:
s21, converting the selected CT image into an attenuation coefficient distribution map, and dividing the attenuation coefficient distribution map into two distribution maps based on a preset threshold value, wherein the two distribution maps correspond to tissues and bones respectively;
s22, converting the two distribution diagrams into sine domain images through an FP algorithm, and adding beam hardening and Poisson noise influence;
step S23, similarly, converting the metal implant mask into a sine domain image by using an FP algorithm;
step S24, superposing three channels of a sinusoidal domain image corresponding to tissues and bones and a sinusoidal domain image corresponding to a metal implant mask as one sinusoidal domain image to obtain a sinusoidal domain image with metal artifact noise;
step S25, converting the sinusoidal domain image with the metal artifact noise into a CT image with the metal artifact noise through an FBP algorithm.
Further, in the step S3, in the process of repeatedly executing the step S2, the masks in the metal implant mask set are divided into five size classes according to the ratio of True values, and one mask is randomly selected from each size class; true values represent metal implant regions;
each CT image is data-synthesized separately with the five metal masks randomly selected from the different size classes, producing five CT images with metal artifact noise in total, whose GroundTruth is the CT image without metal artifact;
n CT images are selected, each synthesizing 5 CT images with metal artifact noise, and the 5n synthesized CT images together with their corresponding GroundTruth constitute the data set for training the convergence type diffusion model.
Further, the process of estimating the artifact noise m and the Gaussian distance ξ using the neural network comprises the following steps:
Step S421, the image-information-biased Gaussian noise y'_t is taken as input and is first projected into image information features by an encoder formed by alternately connected convolution and pooling layers;
Step S422, the time t is projected into a time embedding by a multi-layer perceptron formed by two sequentially connected fully-connected layers;
Step S423, the image information features obtained in step S421 are divided into image feature patches of equal size along the spatial dimensions, the number of channels of each patch being consistent with that of the image information features;
Step S424, the image feature patches obtained in step S423 are input into a feature-embedding conversion module, which flattens the values of each patch and maps them through a fully-connected layer into image information embeddings;
Step S425, the image information embeddings and the time embedding obtained in step S422 are jointly input into an embedding association module formed by a Transformer encoder, yielding image information embeddings with embedded associations;
Step S426, the image information embeddings are passed through a fully-connected layer mapping inverse to that of step S424, and the mapped flattened information is restored into image feature patches;
Step S427, steps S424, S425 and S426 are repeated 4 times to obtain highly associated image feature patches incorporating the time information;
Step S428, the image feature patches are stitched back into image information features according to their dividing positions, and m and ξ with the same image size as x_t are obtained through separate decoders; m and ξ each correspond to one decoder, and the two decoders have the same structure, formed by alternately connected convolution and deconvolution layers.
Further, the encoder formed by alternately connecting the convolution layers and the pooling layers is formed by alternately connecting five convolution layers with convolution cores of 3×3 and five 2×2 largest pooling layers.
Preferably, the patch is sized to be 4×4.
Preferably, the number of heads of the Transformer encoder is set to 8, and the dimension of a single head is set to 64.
Further, the decoder is as follows:
first four deconvolution layers with a convolution kernel of 3 x 3 and four convolution layers with a convolution kernel of 3 x 3 are alternately connected, and then one convolution layer with a convolution kernel of 1 x 1 with a channel number of 1 is connected.
Further, the process of training the convergent diffusion model using the data set obtained in step S3 includes the steps of:
Step S431, the Gaussian noises ε_x and ε_y corresponding to the image to be denoised and the denoising target are sampled from a Gaussian distribution, and an integer is then randomly selected in the interval [1, T] as the time t of this training iteration;
Step S432, using ε_x and ε_y, x_0 and y_0 are corrupted into the image-information-biased Gaussian noises x_t and y_t;
Step S433, x_t is input into the neural network to estimate m and ξ, the mean square errors against the real artifact noise and Gaussian distance are calculated, and the sum of the mean square errors corresponding to m and ξ is taken as the loss function;
step S434, performing back propagation, and optimizing network parameters in the neural network;
step 435, repeatedly executing steps 431 to 434 until all the batch data in the data set are used for network training, then dividing the data set into batches again for the training, and iterating repeatedly until the maximum iteration number is reached.
Further, the process of metal artifact removal using the convergent diffusion model comprises the steps of:
Step S51, Gaussian noise up to time T is superimposed on the CT image accompanied by metal artifact noise, i.e., the forward diffusion process corresponding to the image to be denoised;
Step S52, taking the image-information-biased Gaussian noise as input, m and ξ at the current time t are estimated by the neural network, and denoising inference from time t to time t-1 yields the biased Gaussian distribution y'_{t-1};
Step S53, step S52 is repeated T times, from time T down to time 0, yielding the CT image with metal artifacts removed.
The beneficial effects of the invention are as follows:
the method acquires CT images through an electronic computer tomography technology, synthesizes the CT images with metal artifact noise by using the CT images and a metal implant mask, and trains network parameters of a metal artifact denoising network in a convergent diffusion model for removing metal artifacts by using a data set formed by the images, so that the convergent diffusion model can accurately and reliably finish a metal artifact removing task. The convergent diffusion model provided by the invention is a novel image denoising framework with high knowledge expansibility, can solve the problem that the prior MAR method cannot effectively zero training a sample MAR scene, and can be widely used in clinical application.
Drawings
FIG. 1 is a schematic diagram of a convergence diffusion model for CT metal artifact removal according to the present invention;
fig. 2 is a flow chart of a metal artifact denoising network in a convergence type diffusion model according to the present invention.
Detailed Description
Detailed description of the invention: the present embodiment is described with reference to fig. 1 and fig. 2.
The method for removing CT metal artifacts based on the convergence type diffusion model specifically comprises the following steps:
step S1, acquiring an original real CT image, and screening the CT image to obtain a screened CT image set; obtaining a mask of the metal implant, and screening the mask to obtain a screened mask set;
the CT image is obtained by an electronic computer tomography technique.
The mask for the metal implant is boolean data of the same size as the CT image, where True values represent the metal implant regions and False values represent the non-metal implant regions.
In this embodiment, the image sizes of the CT image and the mask of the metal implant are both unified to 256×256, so as to avoid interference caused by inconsistent image sizes on the CT image synthesis accompanied by metal artifact noise. The image size may be unified to other sizes than 256×256 according to actual needs.
S2, selecting a CT image and a metal implant mask from the CT image and the mask set obtained by screening respectively, and synthesizing the CT image and the metal implant mask into a CT image with metal artifact noise; the specific process comprises the following steps:
s21, converting the selected CT image into an attenuation coefficient distribution map, and dividing the attenuation coefficient distribution map into two distribution maps based on a preset threshold value, wherein the two distribution maps correspond to tissues and bones respectively;
step S22, converting the two distribution diagrams into sine domain images through Forward Projection (FP) algorithm, and adding beam hardening and Poisson noise effects;
step S23, similarly, converting the metal implant mask into a sine domain image by using an FP algorithm;
step S24, superposing three channels of a sinusoidal domain image corresponding to tissues and bones and a sinusoidal domain image corresponding to a metal implant mask as one sinusoidal domain image to obtain a sinusoidal domain image with metal artifact noise;
Step S25, converting the sinusoidal domain image with metal artifact noise into a CT image with metal artifact noise by the Filtered Back Projection (FBP) algorithm.
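The following Python sketch illustrates steps S21 to S25. It is only a rough illustration under stated assumptions: skimage's radon/iradon stand in for the FP and FBP algorithms, and the tissue threshold, the beam-hardening surrogate and the metal scaling factor are hypothetical placeholders rather than values specified by the invention.

```python
import numpy as np
from skimage.transform import radon, iradon  # stand-ins for the FP / FBP algorithms

def synthesize_metal_artifact_ct(ct_img, metal_mask, tissue_thr=0.2, angles=None):
    """Hypothetical sketch of steps S21-S25: synthesize a CT image with metal artifact noise.

    ct_img     : 2-D array of attenuation coefficients (the selected clean CT image)
    metal_mask : boolean array of the same size, True at metal implant pixels
    tissue_thr : assumed threshold separating tissue from bone in the attenuation map
    """
    if angles is None:
        angles = np.linspace(0., 180., ct_img.shape[0], endpoint=False)

    # S21: split the attenuation map into tissue and bone components by a preset threshold
    tissue = np.where(ct_img < tissue_thr, ct_img, 0.0)
    bone = np.where(ct_img >= tissue_thr, ct_img, 0.0)

    # S22: forward-project both components to the sinogram domain (FP algorithm),
    # then add a crude beam-hardening surrogate on the bone channel and Poisson noise
    sino_tissue = radon(tissue, theta=angles)
    sino_bone = radon(bone, theta=angles)
    sino_bone = sino_bone + 0.01 * sino_bone ** 2
    noisy = np.random.poisson(np.maximum(sino_tissue + sino_bone, 0) * 1e3) / 1e3

    # S23: forward-project the metal implant mask
    sino_metal = radon(metal_mask.astype(float), theta=angles)

    # S24: superimpose the three channels into a single metal-corrupted sinogram
    sino_ma = noisy + sino_metal * 5.0  # assumed metal attenuation scale

    # S25: filtered back projection (FBP algorithm) back to the image domain
    return iradon(sino_ma, theta=angles, filter_name='ramp')
```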
Step S3, repeatedly executing the step S2 to obtain a CT image accompanied by metal artifact noise, taking the CT image accompanied by metal artifact noise as a model input, taking an original CT corresponding to the synthesized CT image as a groundTruth, and constructing a data set for model training;
in the process of repeatedly executing step S2, the masks in the metal implant mask set are divided into five size classes according to the proportion of True values, and one mask is randomly selected from each size class; each CT image is data-synthesized separately with the five metal masks randomly selected from the different size classes, producing five CT images with metal artifact noise in total, whose GroundTruth is the CT image without metal artifact.
n CT images are selected, each synthesizing 5 CT images with metal artifact noise, and the 5n synthesized CT images together with their corresponding GroundTruth constitute the data set for training the convergence type diffusion model.
In the present embodiment, the number n of CT images for data synthesis is set to 20,000, and the number n of CT images for data synthesis can be modified to other values according to actual needs. Wherein each CT image is data synthesized with five masks from different size classes of metal implants, a total of 100,000 CT images with metal artifact noise are synthesized for model training.
Step S4, constructing a convergence type diffusion model, as shown in FIG. 1, wherein the convergence type diffusion model comprises two forward diffusion processes and one convergence type reverse diffusion process which respectively correspond to the image to be denoised and the denoising target; training the constructed model by using the data set obtained in the step S3 until the maximum iteration times are reached; in the training process, the image to be denoised corresponds to the synthesized CT image with metal artifact noise, and the denoising target corresponds to the CT image without metal artifact;
the process of removing metal artifact from CT image with metal artifact noise by using convergent diffusion model comprises:
Step S41, two forward diffusion processes: the image to be denoised x_0 and the denoising target y_0 are fixed on two independent Markov chains; according to the gradually increasing schedule parameters $\{\beta_t\}_{t=1}^{T}$, x_0 and y_0 are corrupted into the image-information-biased Gaussian noises x_T and y_T, where at any time t the image-information-biased Gaussian noises x_t and y_t are defined as:

$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),\qquad q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\bar{\alpha}_t}\,y_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right)$$

where the weight $\alpha_t$ is $1-\beta_t$, $\bar{\alpha}_t$ is $\prod_{s=1}^{t}\alpha_s$, $\mathcal{N}$ denotes a Gaussian distribution, and $\mathbf{I}$ represents standard Gaussian noise;
in the present embodiment, the maximum diffusion time T of the convergence type diffusion model is set to 250, and the schedule parameters β_t are assigned linearly increasing values from 0.1 at time 0 to 0.9 at time T. According to actual needs, the maximum time T can be modified to other values, and the schedule parameters β_t may use other increasing intervals within (0, 1) or other non-linear increasing assignment schemes.
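As a concrete illustration of this forward process, the following Python sketch (an illustrative example, not the invention's reference implementation) builds the linearly increasing schedule of this embodiment and applies the one-shot corruption at an arbitrary time t:

```python
import numpy as np

T = 250                                   # maximum diffusion time used in this embodiment
betas = np.linspace(0.1, 0.9, T)          # linearly increasing schedule beta_1..beta_T (0.1 -> 0.9)
alphas = 1.0 - betas                      # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)           # bar(alpha)_t = prod_{s<=t} alpha_s

def forward_diffuse(img0, t, eps=None):
    """Sample the image-information-biased Gaussian noise at time t (1-indexed):
    N(sqrt(bar_alpha_t) * img0, (1 - bar_alpha_t) * I)."""
    eps = np.random.randn(*img0.shape) if eps is None else eps
    ab = alpha_bars[t - 1]
    return np.sqrt(ab) * img0 + np.sqrt(1.0 - ab) * eps, eps

# the same corruption is applied independently to x_0 (the image to be denoised)
# and y_0 (the denoising target) on their two Markov chains
```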
Step S42, the convergence type reverse diffusion process samples the image-information-biased Gaussian noise time by time, based on the theoretical correlation between CT images with and without artifact noise, to complete the metal artifact noise removal of the CT image. At any time t, the inference from x_t to y_{t-1} is defined as:

$$q(y_{t-1} \mid x_t, y_0, m) = \mathcal{N}\!\left(y_{t-1};\ \tilde{\mu}_t(x_t, y_0, m),\ \tilde{\beta}_t\,\mathbf{I}\right)$$

where m is the metal artifact noise of the CT image, $\tilde{\mu}_t$ denotes the mean and $\tilde{\beta}_t$ the variance.
The theoretical correlation between CT images with and without artifact noise rests on the superposability of attenuation distributions in the CT data modality; the linear attenuation distribution μ(E) with metal implants is defined as:

$$\mu(E) = \mu_t(E)\odot(1-M) + \mu_m(E)\odot M$$

where $\mu_t(E)$ and $\mu_m(E)$ are the attenuation distributions of the tissue and metal regions respectively, and M is the metal implant mask;

the sinusoidal domain image $S_{ma}$ projected by the FP algorithm is defined as:

$$S_{ma} = -\ln \int \eta(E)\,\exp\!\big(-\mathcal{P}\,\mu(E)\big)\, dE$$

where $\mathcal{P}$ is the FP algorithm and η(E) is the intensity distribution of the spectral energy at E.
Based on the above theory, the CT image with a metal implant can be regarded as the superposition of the attenuation distributions of tissue and the metal implant; the theoretical correlation between CT images with and without artifact noise is therefore defined as x_0 = y_0 + m, where m is the metal artifact noise of the CT image.
According to Bayes' rule, $q(y_{t-1} \mid x_t, y_0, m)$ can be jointly inferred from $q(x_t \mid y_{t-1}, m)$, $q(y_{t-1} \mid y_0)$ and $q(x_t \mid y_0, m)$, where $q(x_t \mid y_{t-1}, m)$ incorporates the artifact noise m through the correlation $x_0 = y_0 + m$.
Since $y_{t-1}$ obeys a Gaussian distribution and is proportional to a power of e, the mean $\tilde{\mu}$ of the joint Gaussian distribution $q(y_{t-1} \mid x_t, y_0, m)$ to be solved in the inference process is expressed in terms of the artifact noise m and the Gaussian noise $\epsilon$, with $y_0$ correlated with the Gaussian noise $\epsilon$.
According to the above, $\tilde{\mu}$ can therefore be solved by combining the artifact noise m and the Gaussian noise $\epsilon$. Since x_t and y_t superimpose Gaussian noise in parallel on two independent Markov chains, the Gaussian distance ξ between the noises superimposed on x_t and y_t is their difference, namely $\xi = \epsilon_x - \epsilon_y$.
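Under the assumption that the standard DDPM posterior applies once the biased sample is shifted onto the artifact-free chain via $x_0 = y_0 + m$ and $\xi = \epsilon_x - \epsilon_y$, the quantities above can be written out as follows; this is a sketch, and the exact expressions used by the invention may differ:

```latex
% assuming the standard DDPM posterior; \epsilon denotes the forward-process noise on the x-chain
\begin{aligned}
y_t &= x_t - \sqrt{\bar{\alpha}_t}\, m - \sqrt{1-\bar{\alpha}_t}\,\xi,\\[2pt]
y_0 &= \tfrac{1}{\sqrt{\bar{\alpha}_t}}\!\left(x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon\right) - m,\\[2pt]
\tilde{\mu}_t &= \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, y_t
              + \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\, y_0,
\qquad
\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t .
\end{aligned}
```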
The above convergence type reverse diffusion process is repeated T times on the image to be denoised, progressively approaching the sample distribution of y_0. In this process, the Gaussian distribution inferred by the (t+1)-th convergent reverse diffusion step is denoted y'_t (for the first convergent reverse diffusion step, y'_T is the image-information-biased Gaussian noise x_T obtained through the T-step forward diffusion process). For the t-th reverse diffusion step, the artifact noise m and the Gaussian distance ξ are estimated by the metal artifact denoising network built into the convergent reverse diffusion process; the biased Gaussian distribution y'_{t-1} is then inferred from y'_t according to the estimated m and ξ; the (t-1)-th convergent reverse diffusion step is then performed based on y'_{t-1}, and so on until y'_0, the CT image with artifacts removed, is inferred. The specific process comprises the following steps:
Step S421, the image-information-biased Gaussian noise y'_t is input into the metal artifact denoising network;
in the metal artifact denoising network, the input is first projected into image information features by an encoder formed by alternately connected convolution and pooling layers;
in this embodiment, the encoder is composed of five convolution layers with convolution kernels of 3×3 and five 2×2 max-pooling layers alternately connected, wherein the number of output channels of the convolution layers is 32, 64, 128, 256, and 512, respectively.
The encoder for converting the input into image information features may be replaced with other combinations of network elements that may be used to extract the features, and the number of convolutional layers may be adjusted according to training conditions and the decreasing trend of the mean square error loss.
Step S422, the projection of the multi-layer perceptron formed by sequentially connecting two full-connection layers at the time t is embedded at the time;
step S423, dividing the image information features obtained in the step S421 into image features patch with the same size according to the dimension, wherein the number of the channels of the patch is consistent with the number of the channels of the image information features;
step S424, inputting the image feature patch obtained in step S423 to a feature-embedding conversion module, flattening the numerical value of each patch by the feature-embedding conversion module, and mapping the numerical value into image information to be embedded through a full connection layer;
in the present embodiment, the size of the divided patch is set to 4×4;
Step S425, the image information embeddings and the time embedding obtained in step S422 are jointly input into an embedding association module formed by a Transformer encoder, yielding image information embeddings with embedded associations;
in this embodiment, the number of heads of the Transformer encoder is set to 8, and the dimension of a single head is set to 64. The Transformer encoder used for embedding association in this embodiment may be replaced by other sequence-to-sequence network units such as GRU units and LSTM units, and the number of heads and the dimension of a single head in the Transformer encoder may be adjusted according to the training situation and the decreasing trend of the mean square error loss.
Step S426, the image information embeddings are passed through a fully-connected layer mapping inverse to that of step S424, and the mapped flattened information is restored into image feature patches;
step S427, repeating steps S424, S425 and S426 for 4 times to obtain a highly correlated image feature patch incorporating time information;
Step S428, the image feature patches are stitched back into image information features according to their dividing positions, and m and ξ with the same image size as x_t are obtained through separate decoders; m and ξ each correspond to one decoder, and the two decoders have the same structure, formed by alternately connected convolution and deconvolution layers;
in this embodiment, the decoder first alternately connects four deconvolution layers with 3×3 kernels and four convolution layers with 3×3 kernels, where the numbers of output channels of the deconvolution layers are 256, 128, 64 and 32 and the number of output channels of each convolution layer equals its number of input channels; this is followed by an output convolution layer with 1 channel and a 1×1 kernel.
The decoder for converting the image information features into output in this embodiment may be replaced by other network element combinations that can be used for signal decoding, and the number of convolution layers or deconvolution layers may be adjusted according to the training situation and the decreasing trend of the mean square error loss.
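A PyTorch sketch of this network is given below. The channel widths, the 4×4 patch size, the 8-head / 64-dimension-per-head Transformer encoder and the paired decoders follow the embodiment, while the exact wiring (how the time embedding joins the token sequence, the ReLU activations, and the final upsampling that restores the 256×256 resolution) is assumed for illustration only:

```python
import torch
import torch.nn as nn

class MetalArtifactDenoisingNet(nn.Module):
    """Sketch of the metal artifact denoising network of steps S421-S428 (assumptions noted above)."""

    def __init__(self, img_size=256, patch=4, emb_dim=512, heads=8):
        super().__init__()
        chans = [1, 32, 64, 128, 256, 512]
        enc = []
        for cin, cout in zip(chans[:-1], chans[1:]):           # five 3x3 convs alternated with
            enc += [nn.Conv2d(cin, cout, 3, padding=1),        # five 2x2 max-pooling layers (S421)
                    nn.ReLU(), nn.MaxPool2d(2)]
        self.encoder = nn.Sequential(*enc)
        self.feat_hw, self.patch = img_size // 2 ** 5, patch

        # time embedding: two sequentially connected fully-connected layers (S422)
        self.time_mlp = nn.Sequential(nn.Linear(1, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))

        # feature <-> embedding conversion (S424 / S426) and Transformer association module (S425)
        self.to_emb = nn.Linear(512 * patch * patch, emb_dim)
        self.from_emb = nn.Linear(emb_dim, 512 * patch * patch)
        self.assoc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=emb_dim, nhead=heads, batch_first=True), num_layers=1)

        self.dec_m, self.dec_xi = self._decoder(), self._decoder()   # two identical decoders (S428)

    def _decoder(self):
        chans, layers = [512, 256, 128, 64, 32], []
        for cin, cout in zip(chans[:-1], chans[1:]):           # four 3x3 deconvs alternated with 3x3 convs
            layers += [nn.ConvTranspose2d(cin, cout, 3, stride=2, padding=1, output_padding=1),
                       nn.ReLU(), nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU()]
        # assumed extra upsampling so the output matches the input resolution, then the 1x1 output conv
        layers += [nn.Upsample(scale_factor=2), nn.Conv2d(32, 1, 1)]
        return nn.Sequential(*layers)

    def forward(self, y_t, t):
        b = y_t.shape[0]
        feat = self.encoder(y_t)                               # (b, 512, 8, 8) for a 256x256 input
        temb = self.time_mlp(t.float().view(b, 1))

        for _ in range(4):                                     # S424-S426 repeated 4 times (S427)
            p = feat.unfold(2, self.patch, self.patch).unfold(3, self.patch, self.patch)
            p = p.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, 512 * self.patch * self.patch)
            tok = torch.cat([temb.unsqueeze(1), self.to_emb(p)], dim=1)   # prepend the time token
            p = self.from_emb(self.assoc(tok)[:, 1:])                     # associate, drop time token
            n = self.feat_hw // self.patch
            feat = (p.reshape(b, n, n, 512, self.patch, self.patch)
                     .permute(0, 3, 1, 4, 2, 5).reshape(b, 512, self.feat_hw, self.feat_hw))

        return self.dec_m(feat), self.dec_xi(feat)             # m and xi, same size as the input image
```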
Step S429, the estimated m and ξ are used to infer the joint distribution q(y_{t-1} | x_t, y_0, m); starting from the image-information-biased Gaussian noise obtained by forward diffusion, the estimation and inference are executed T times, and the CT image with metal artifact noise removed is obtained by sampling.
Step S43, inputting the data of one batch in the data set into a convergence type diffusion model to train parameters in a metal artifact denoising network:
Step S431, the Gaussian noises ε_x and ε_y corresponding to the image to be denoised and the denoising target are sampled from a Gaussian distribution, and an integer is then randomly selected in the interval [1, T] as the time t of this training iteration;
Step S432, using ε_x and ε_y, x_0 and y_0 are corrupted into the image-information-biased Gaussian noises x_t and y_t;
Step S433, x_t is input into the metal artifact denoising network to estimate m and ξ, the mean square errors against the real artifact noise and Gaussian distance are calculated, and the sum of the mean square errors corresponding to m and ξ is taken as the loss function;
step S434, performing back propagation, and optimizing network parameters in the metal artifact denoising network; in this embodiment, the optimizer for optimizing the network parameters is set as Adam optimizer. Adam optimizers for optimizing network parameters may be replaced with other optimizers.
Step 435, repeatedly executing steps 431 to 434 until all the batch data in the data set are used for network training, then dividing the data set into batches again for the training, and iterating repeatedly until the maximum iteration number is reached.
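A minimal training-iteration sketch of steps S431 to S434 follows. It assumes the noised image x_t is the network input, takes the ground-truth artifact noise as m = x_0 − y_0 and the Gaussian distance as ξ = ε_x − ε_y per the correlations above, and leaves batching and the Adam optimizer configuration to the caller; these choices are illustrative assumptions rather than a fixed specification of the invention:

```python
import torch

def train_step(net, x0, y0, optimizer, alpha_bars, T=250):
    """One training iteration (steps S431-S434). alpha_bars is a torch tensor of
    cumulative products of alpha_t on the same device as x0 / y0."""
    b = x0.shape[0]
    # S431: sample the two Gaussian noises and a random time step t in [1, T]
    eps_x, eps_y = torch.randn_like(x0), torch.randn_like(y0)
    t = torch.randint(1, T + 1, (b,), device=x0.device)
    ab = alpha_bars[t - 1].view(b, 1, 1, 1)

    # S432: corrupt x0 (and, implicitly through xi, y0) into the biased Gaussian noise x_t
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps_x

    # S433: estimate m and xi and sum the two mean square errors as the loss
    m_true, xi_true = x0 - y0, eps_x - eps_y        # x_0 = y_0 + m  and  xi = eps_x - eps_y
    m_pred, xi_pred = net(x_t, t)
    loss = torch.mean((m_pred - m_true) ** 2) + torch.mean((xi_pred - xi_true) ** 2)

    # S434: back-propagate and update the denoising network parameters (Adam in this embodiment)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```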
Step S5, taking CT image accompanied with metal artifact noise as input, and removing metal artifact by using a convergent diffusion model.
The process of metal artifact removal using a convergent diffusion model comprises the steps of:
Step S51, Gaussian noise up to time T is superimposed on the CT image accompanied by metal artifact noise, i.e., the forward diffusion process corresponding to the image to be denoised;
Step S52, taking the image-information-biased Gaussian noise as input, m and ξ at the current time t are estimated by the metal artifact denoising network, and denoising inference from time t to time t-1 yields the biased Gaussian distribution y'_{t-1};
Step S53, step S52 is repeated T times, from time T down to time 0, yielding the CT image with metal artifacts removed.
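The sampler below sketches steps S51 to S53. Beyond what the text above states, it assumes the DDPM-style posterior reconstructed earlier and uses the known artifact-corrupted input to form y_0 ≈ x_0 − m; both assumptions are illustrative, and the invention's exact update may differ:

```python
import torch

@torch.no_grad()
def remove_metal_artifacts(net, x0_ma, betas, alpha_bars, T=250):
    """x0_ma: CT image with metal artifact noise, shape (1, 1, H, W);
    betas / alpha_bars: torch tensors of length T on the same device."""
    # S51: superimpose Gaussian noise up to time T (forward diffusion of the input image)
    y = alpha_bars[-1].sqrt() * x0_ma + (1 - alpha_bars[-1]).sqrt() * torch.randn_like(x0_ma)

    # S52-S53: repeat the convergent reverse diffusion step from t = T down to t = 1
    for t in range(T, 0, -1):
        ab = alpha_bars[t - 1]
        ab_prev = alpha_bars[t - 2] if t > 1 else alpha_bars.new_tensor(1.0)
        beta, alpha = betas[t - 1], 1 - betas[t - 1]

        m, xi = net(y, torch.full((y.shape[0],), t, device=y.device))
        y_t = y - ab.sqrt() * m - (1 - ab).sqrt() * xi     # shift onto the artifact-free chain
        y0_hat = x0_ma - m                                 # assumed: y_0 = x_0 - m, x_0 known at inference

        # assumed DDPM-style posterior mean / variance for the convergent update
        mean = (alpha.sqrt() * (1 - ab_prev) / (1 - ab)) * y_t \
             + (ab_prev.sqrt() * beta / (1 - ab)) * y0_hat
        var = (1 - ab_prev) / (1 - ab) * beta
        y = (mean + var.sqrt() * torch.randn_like(y)) if t > 1 else mean

    return y  # y'_0: the CT image with metal artifacts removed
```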
The above examples of the present invention are only for describing the calculation model and calculation flow of the present invention in detail, and are not limiting of the embodiments of the present invention. Other variations and modifications of the above description will be apparent to those of ordinary skill in the art, and it is not intended to be exhaustive of all embodiments, all of which are within the scope of the invention.

Claims (10)

1. A CT metal artifact removal method based on a convergence type diffusion model is characterized in that CT images accompanied with metal artifact noise are used as input of the convergence type diffusion model, and the convergence type diffusion model is used for metal artifact removal;
the convergent diffusion model is trained in advance, and the construction and training process of the convergent diffusion model comprises the following steps:
step S1, acquiring an original real CT image, and screening the CT image to obtain a screened CT image set; obtaining a mask of the metal implant, and screening the mask to obtain a screened mask set;
s2, selecting a CT image and a metal implant mask from the CT image and the mask set obtained by screening respectively, and synthesizing the CT image and the metal implant mask into a CT image with metal artifact noise;
step S3, repeatedly executing the step S2 to obtain a CT image accompanied by metal artifact noise, taking the CT image accompanied by metal artifact noise as a model input, taking an original CT corresponding to the synthesized CT image accompanied by metal artifact noise as a GroundTruth, and constructing a data set for training a convergence type diffusion model;
s4, constructing a convergence type diffusion model, wherein the convergence type diffusion model comprises two forward diffusion processes and a convergence type reverse diffusion process which respectively correspond to the image to be denoised and the denoising target; training a convergence type diffusion model by using the data set obtained in the step S3, and stopping training until the maximum iteration times are reached, so as to obtain a trained convergence type diffusion model;
in the training process, the image to be denoised corresponds to the synthesized CT image with metal artifact noise, and the denoising target corresponds to the CT image without metal artifact;
two forward diffusion processes: the image to be denoised x_0 and the denoising target y_0 are fixed on two independent Markov chains; according to the gradually increasing schedule parameters $\{\beta_t\}_{t=1}^{T}$, x_0 and y_0 are corrupted into the image-information-biased Gaussian noises x_T and y_T, where at any time t the image-information-biased Gaussian noises x_t and y_t are defined as:

$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),\qquad q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\bar{\alpha}_t}\,y_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right)$$

where the weight $\alpha_t$ is $1-\beta_t$, $\bar{\alpha}_t$ is $\prod_{s=1}^{t}\alpha_s$, $\mathcal{N}$ denotes a Gaussian distribution, and $\mathbf{I}$ represents standard Gaussian noise;
the convergence type reverse diffusion process samples the image-information-biased Gaussian noise time by time based on the theoretical correlation between CT images with and without artifact noise; at any time t, the inference from x_t to y_{t-1} is defined as:

$$q(y_{t-1} \mid x_t, y_0, m) = \mathcal{N}\!\left(y_{t-1};\ \tilde{\mu}_t(x_t, y_0, m),\ \tilde{\beta}_t\,\mathbf{I}\right)$$

where m is the metal artifact noise of the CT image, $\tilde{\mu}_t$ denotes the mean and $\tilde{\beta}_t$ the variance;
according to Bayes' rule, $q(y_{t-1} \mid x_t, y_0, m)$ is jointly inferred from $q(x_t \mid y_{t-1}, m)$, $q(y_{t-1} \mid y_0)$ and $q(x_t \mid y_0, m)$, where $q(x_t \mid y_{t-1}, m)$ incorporates the artifact noise m through the correlation $x_0 = y_0 + m$;
the mean $\tilde{\mu}_t$ of the joint Gaussian distribution $q(y_{t-1} \mid x_t, y_0, m)$ is the quantity to be solved in the inference process;
The above convergence type reverse diffusion process is repeated T times on the image to be denoised, progressively approaching the sample distribution of y_0. In this process, the biased Gaussian distribution inferred by the (t+1)-th convergent reverse diffusion step is denoted y'_t. For the t-th reverse diffusion step, the artifact noise m and the Gaussian distance ξ are estimated by the metal artifact denoising network built into the convergent reverse diffusion process; the biased Gaussian distribution y'_{t-1} is then inferred from y'_t according to the estimated m and ξ; the (t-1)-th convergent reverse diffusion step is then performed based on y'_{t-1}, and so on until y'_0, the CT image with artifacts removed, is inferred.
2. The method of claim 1, wherein the steps of selecting a CT image and a metal implant mask from the set of CT images and masks obtained by screening, and synthesizing the two CT images into a CT image with metal artifact noise, respectively, comprise the steps of:
s21, converting the selected CT image into an attenuation coefficient distribution map, and dividing the attenuation coefficient distribution map into two distribution maps based on a preset threshold value, wherein the two distribution maps correspond to tissues and bones respectively;
s22, converting the two distribution diagrams into sine domain images through an FP algorithm, and adding beam hardening and Poisson noise influence;
step S23, similarly, converting the metal implant mask into a sine domain image by using an FP algorithm;
step S24, superposing three channels of a sinusoidal domain image corresponding to tissues and bones and a sinusoidal domain image corresponding to a metal implant mask as one sinusoidal domain image to obtain a sinusoidal domain image with metal artifact noise;
step S25, converting the sinusoidal domain image with the metal artifact noise into a CT image with the metal artifact noise through an FBP algorithm.
3. The method for removing metal artifacts of CT based on convergence diffusion model according to claim 2, wherein in the step S3, in the process of repeatedly executing step S2, the masks in the mask set of the metal implant are divided into five size classes according to the ratio of True values, and one mask is randomly selected from each size class; true values represent metal implant regions;
each CT image is data-synthesized separately with the five metal masks randomly selected from the different size classes, producing five CT images with metal artifact noise in total, whose GroundTruth is the CT image without metal artifact;
n CT images are selected, each synthesizing 5 CT images with metal artifact noise, and the 5n synthesized CT images together with their corresponding GroundTruth constitute the data set for training the convergence type diffusion model.
4. A method for removing CT metal artifacts based on a convergence type diffusion model as set forth in claim 1, 2 or 3, wherein the process of estimating the artifact noise m and the Gaussian distance ξ using a neural network comprises the following steps:
Step S421, y'_t is taken as input and is first projected into image information features by an encoder formed by alternately connected convolution and pooling layers;
Step S422, the time t is projected into a time embedding by a multi-layer perceptron formed by two sequentially connected fully-connected layers;
Step S423, the image information features obtained in step S421 are divided into image feature patches of equal size along the spatial dimensions, the number of channels of each patch being consistent with that of the image information features;
Step S424, the image feature patches obtained in step S423 are input into a feature-embedding conversion module, which flattens the values of each patch and maps them through a fully-connected layer into image information embeddings;
Step S425, the image information embeddings and the time embedding obtained in step S422 are jointly input into an embedding association module formed by a Transformer encoder, yielding image information embeddings with embedded associations;
Step S426, the image information embeddings are passed through a fully-connected layer mapping inverse to that of step S424, and the mapped flattened information is restored into image feature patches;
Step S427, steps S424, S425 and S426 are repeated 4 times to obtain highly associated image feature patches incorporating the time information;
Step S428, the image feature patches are stitched back into image information features according to their dividing positions, and m and ξ with the same image size as x_t are obtained through separate decoders; m and ξ each correspond to one decoder, and the two decoders have the same structure, formed by alternately connected convolution and deconvolution layers.
5. The method of claim 4, wherein the encoder is composed of five convolution layers with 3×3 convolution kernels and five 2×2 largest pooling layers alternately connected.
6. The method of claim 4, wherein the patch is sized to be 4 x 4.
7. The method of claim 4, wherein the number of heads of the Transformer encoder is set to 8 and the dimension of a single head is set to 64.
8. The method of claim 4, wherein the decoder is configured to:
first four deconvolution layers with a convolution kernel of 3 x 3 and four convolution layers with a convolution kernel of 3 x 3 are alternately connected, and then one convolution layer with a convolution kernel of 1 x 1 with a channel number of 1 is connected.
9. The method of claim 4, wherein the step of training the convergence type diffusion model using the data set obtained in the step S3 comprises the steps of:
Step S431, the Gaussian noises ε_x and ε_y corresponding to the image to be denoised and the denoising target are sampled from a Gaussian distribution, and an integer is then randomly selected in the interval [1, T] as the time t of this training iteration;
Step S432, using ε_x and ε_y, x_0 and y_0 are corrupted into the image-information-biased Gaussian noises x_t and y_t;
Step S433, x_t is input into the neural network to estimate m and ξ, the mean square errors against the real artifact noise and Gaussian distance are calculated, and the sum of the mean square errors corresponding to m and ξ is taken as the loss function;
step S434, performing back propagation, and optimizing network parameters in the neural network;
step 435, repeatedly executing steps 431 to 434 until all the batch data in the data set are used for network training, then dividing the data set into batches again for the training, and iterating repeatedly until the maximum iteration number is reached.
10. The method of claim 9, wherein the step of performing metal artifact removal using the convergence diffusion model comprises the steps of:
Step S51, Gaussian noise up to time T is superimposed on the CT image accompanied by metal artifact noise, i.e., the forward diffusion process corresponding to the image to be denoised;
Step S52, taking the image-information-biased Gaussian noise as input, m and ξ at the current time t are estimated by the neural network, and denoising inference from time t to time t-1 yields the biased Gaussian distribution y'_{t-1};
Step S53, step S52 is repeated T times, from time T down to time 0, yielding the CT image with metal artifacts removed.
CN202211714237.7A 2022-12-27 2022-12-27 CT metal artifact removal method based on convergence type diffusion model Active CN116012478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211714237.7A CN116012478B (en) 2022-12-27 2022-12-27 CT metal artifact removal method based on convergence type diffusion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211714237.7A CN116012478B (en) 2022-12-27 2022-12-27 CT metal artifact removal method based on convergence type diffusion model

Publications (2)

Publication Number Publication Date
CN116012478A CN116012478A (en) 2023-04-25
CN116012478B true CN116012478B (en) 2023-08-18

Family

ID=86031353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211714237.7A Active CN116012478B (en) 2022-12-27 2022-12-27 CT metal artifact removal method based on convergence type diffusion model

Country Status (1)

Country Link
CN (1) CN116012478B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116485682B (en) * 2023-05-04 2024-03-15 北京联合大学 Image shadow removing system and method based on potential diffusion model
CN116228916B (en) * 2023-05-10 2023-07-11 中日友好医院(中日友好临床医学研究所) Image metal artifact removal method, system and equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006082563A1 (en) * 2005-02-03 2006-08-10 Koninklijke Philips Electronics N.V. Radial adaptive filter for metal artifact correction
CN109214990A (en) * 2018-07-02 2019-01-15 广东工业大学 A kind of depth convolutional neural networks image de-noising method based on Inception model
CN110675461A (en) * 2019-09-03 2020-01-10 天津大学 CT image recovery method based on unsupervised learning
CN112017131A (en) * 2020-10-15 2020-12-01 南京安科医疗科技有限公司 CT image metal artifact removing method and device and computer readable storage medium
CN112258423A (en) * 2020-11-16 2021-01-22 腾讯科技(深圳)有限公司 Deartifact method, device, equipment and storage medium based on deep learning
CN112419169A (en) * 2020-10-14 2021-02-26 浙江工业大学 CNN medical CT image denoising method based on noise prior
CN113706643A (en) * 2020-09-09 2021-11-26 南京邮电大学 Homomorphic adaptation learning-based head CT metal artifact correction method
CN113744155A (en) * 2021-09-04 2021-12-03 重庆大学 Rock and ore sample CT image metal artifact correction method based on triple convolution network
CN113902824A (en) * 2021-10-12 2022-01-07 哈尔滨工业大学 Guide wire artifact removing method for intravascular optical coherence tomography
CN115049753A (en) * 2022-05-13 2022-09-13 沈阳铸造研究所有限公司 Cone beam CT artifact correction method based on unsupervised deep learning
CN115359049A (en) * 2022-10-19 2022-11-18 首都师范大学 Finite angle CT image reconstruction method and device based on nonlinear diffusion model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11120551B2 (en) * 2017-11-27 2021-09-14 Rensselaer Polytechnic Institute Training a CNN with pseudo ground truth for CT artifact reduction
US11315221B2 (en) * 2019-04-01 2022-04-26 Canon Medical Systems Corporation Apparatus and method for image reconstruction using feature-aware deep learning

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006082563A1 (en) * 2005-02-03 2006-08-10 Koninklijke Philips Electronics N.V. Radial adaptive filter for metal artifact correction
CN109214990A (en) * 2018-07-02 2019-01-15 广东工业大学 A kind of depth convolutional neural networks image de-noising method based on Inception model
CN110675461A (en) * 2019-09-03 2020-01-10 天津大学 CT image recovery method based on unsupervised learning
CN113706643A (en) * 2020-09-09 2021-11-26 南京邮电大学 Homomorphic adaptation learning-based head CT metal artifact correction method
CN112419169A (en) * 2020-10-14 2021-02-26 浙江工业大学 CNN medical CT image denoising method based on noise prior
CN112017131A (en) * 2020-10-15 2020-12-01 南京安科医疗科技有限公司 CT image metal artifact removing method and device and computer readable storage medium
CN112258423A (en) * 2020-11-16 2021-01-22 腾讯科技(深圳)有限公司 Deartifact method, device, equipment and storage medium based on deep learning
CN113744155A (en) * 2021-09-04 2021-12-03 重庆大学 Rock and ore sample CT image metal artifact correction method based on triple convolution network
CN113902824A (en) * 2021-10-12 2022-01-07 哈尔滨工业大学 Guide wire artifact removing method for intravascular optical coherence tomography
CN115049753A (en) * 2022-05-13 2022-09-13 沈阳铸造研究所有限公司 Cone beam CT artifact correction method based on unsupervised deep learning
CN115359049A (en) * 2022-10-19 2022-11-18 首都师范大学 Finite angle CT image reconstruction method and device based on nonlinear diffusion model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Transformer-based structure-enhanced IVOCT guidewire artifact removal method; Guo Jinwen et al.; Journal of Computer Applications; Vol. 43, No. 05; pp. 1596-1605 *

Also Published As

Publication number Publication date
CN116012478A (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN116012478B (en) CT metal artifact removal method based on convergence type diffusion model
CN109978778B (en) Convolutional neural network medical CT image denoising method based on residual learning
CN107330949B (en) Artifact correction method and system
CN110009669B (en) 3D/2D medical image registration method based on deep reinforcement learning
JP4919408B2 (en) Radiation image processing method, apparatus, and program
JP5491174B2 (en) Deformable registration of images for image-guided radiation therapy
US20100266188A1 (en) Chest x-ray registration, subtraction and display
Lafci et al. Deep learning for automatic segmentation of hybrid optoacoustic ultrasound (OPUS) images
US20180253838A1 (en) Systems and methods for medical imaging of patients with medical implants for use in revision surgery planning
Xiao et al. Estimating reference bony shape models for orthognathic surgical planning using 3D point-cloud deep learning
CN111260748A (en) Digital synthesis X-ray tomography method based on neural network
JP2022161857A (en) System and method for utilizing deep learning network to correct bad pixel in computed tomography detector
CN113344876B (en) Deformable registration method between CT and CBCT
Jiang et al. Ultrasound to X-ray synthesis generative attentional network (UXGAN) for adolescent idiopathic scoliosis
KR102382602B1 (en) 3D convolutional neural network based cone-beam artifact correction system and method
CN111127371B (en) Image enhancement parameter automatic optimization method, storage medium and X-ray scanning device
CN109964249A (en) The device delineated for the adaptive profile to body part
WO2022045210A1 (en) Image processing device, image processing method, learning device, learning method, and program
CN114511507A (en) Non-radiation X-ray image generation method and device based on ultrasonic coronal plane image
CN113592968A (en) Method and device for reducing metal artifacts in tomographic images
CN111583303A (en) System and method for generating pseudo CT image based on MRI image
Lang et al. Hybrid-supervised deep learning for domain transfer 3D protoacoustic Image reconstruction
CN112862722B (en) Dual-energy X-ray subtraction method and device
CN116485684A (en) High-precision CT metal artifact correction method based on self-supervised learning
CN117611695A (en) Dental cone beam CT metal artifact correction method and device based on diffusion model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant