CN116664710A - CT image metal artifact unsupervised correction method based on Transformer - Google Patents
CT image metal artifact unsupervised correction method based on Transformer
- Publication number
- CN116664710A (application CN202310598218.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- metal
- layer
- artifact
- metal artifact
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/003—Reconstruction from projections, e.g. tomography
- G06T11/008—Specific post-processing after tomographic reconstruction, e.g. voxelisation, metal artifact correction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides a Transformer-based unsupervised correction method for metal artifacts in CT images. The method comprises the following steps: constructing a metal artifact unsupervised correction network based on a CycleGAN architecture; introducing a relative adversarial GAN loss function, a cycle consistency loss function, and a projection retention loss function to construct a total loss function; constructing a training data set Z1, specifically: collecting a set of real CT images, determining whether each CT image in the set contains metal artifacts based on the CT values of its pixels, and then randomly pairing a CT image with metal artifacts and a CT image without metal artifacts to form an actual training data pair; training the metal artifact unsupervised correction network with the training data set Z1 to obtain a trained metal artifact unsupervised correction network; and inputting the CT image with metal artifacts to be corrected into the trained generator G_AB to obtain the corresponding metal-artifact-free CT image.
Description
Technical Field
The invention relates to the technical field of computer medical imaging, and in particular to a Transformer-based unsupervised correction method for metal artifacts in CT images.
Background
Computed tomography (CT) systems can obtain information about the internal structure of an object under examination non-destructively, and have become an important tool for medical diagnosis, evaluation, treatment planning, and guidance. In actual CT scanning, metal implants carried in the patient's body, such as metal dentures and hip prostheses, produce inconsistent X-ray projection information. This inconsistent projection information causes strong radial or streak artifacts in the reconstructed CT image. These artifacts interfere with interpretation of the image, greatly reduce image quality, and severely hamper subsequent medical image analysis and treatment. Metal artifacts not only degrade the visual quality of CT images but also increase the dose delivered in radiation therapy.
For metal artifact correction, researchers have made many efforts over the last few decades. The first class is projection interpolation correction algorithms, whose basic principle is to reconstruct corrected projection data with a filtered back projection reconstruction algorithm. Because the presence of a metal implant causes nonlinear changes in the projection data, and filtered back projection (FBP) is particularly sensitive to imperfections in the original projections, direct reconstruction leads to severe metal artifacts in the reconstructed image. To reduce this interference, researchers complete and correct the incomplete projection information in the projection domain. Another class of metal artifact correction methods starts from the reconstructed image and aims to directly estimate and remove the streak artifacts in the contaminated image by image processing techniques.
In recent years, deep learning has developed rapidly and achieved good results on a variety of tasks, especially in image processing. Deep learning extracts statistical feature information from a large amount of training data and handles different tasks based on that information. Existing deep-learning-based metal artifact correction methods require training on pairs of structurally identical CT images, one containing a metal implant and the other not. Such paired images cannot be obtained in practice, so most supervised methods can only be trained on simulated data. However, owing to the diversity of metal artifacts and differences between CT devices, it is difficult to accurately reproduce real scenes with synthetic artifacts, so these supervised methods cannot achieve optimal results in practical applications.
Disclosure of Invention
Aiming at the problem that a paired data set cannot be acquired, the invention provides a Transformer-based unsupervised correction method for metal artifacts in CT images, which suppresses metal artifacts while better retaining the original effective information through global feature extraction and a projection retention loss constraint.
The invention provides a Transformer-based unsupervised correction method for metal artifacts in CT images, comprising the following steps:
step 1: constructing a metal artifact unsupervised correction network based on a CycleGAN architecture; the generating network comprises a generator G_AB and a generator G_BA, where G_AB converts a CT image with metal artifacts into a CT image without metal artifacts, and G_BA converts a CT image without metal artifacts into a CT image with metal artifacts; both generators are Transformer-based, and the discriminators are relative discriminators;
step 2: introducing a relative adversarial GAN loss function, a cycle consistency loss function, and a projection retention loss function to construct a total loss function;
step 3: constructing a training data set Z1, specifically: collecting a set of real CT images, determining whether each CT image in the set contains metal artifacts based on the CT values of its pixels, and then randomly pairing a CT image with metal artifacts and a CT image without metal artifacts to form an actual training data pair;
step 4: training the metal artifact unsupervised correction network with the training data set Z1 to obtain a trained metal artifact unsupervised correction network;
step 5: inputting the CT image with metal artifacts to be corrected into the trained generator G_AB to obtain the corresponding metal-artifact-free CT image.
Further, in step 3, determining whether each CT image in the set contains metal artifacts based on the CT values of its pixels specifically comprises: if the maximum CT value in a CT image is smaller than a first set threshold, the image is a CT image without metal artifacts; otherwise, the image is added to a suspected-artifact CT image set;
for each CT image in the suspected-artifact CT image set, regions whose CT values exceed a second set threshold are marked as metal regions; CT images whose largest connected metal region exceeds a set number of pixels are then selected as CT images with metal artifacts.
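The two-threshold rule above can be sketched numerically. The sketch below is illustrative, not the patent's code; the threshold values (2000, 2500, 400 pixels) are taken from Embodiment 1, and a simple 4-connected flood fill stands in for whatever connected-component routine an implementation would actually use.

```python
import numpy as np

T1, T2, MIN_PIXELS = 2000, 2500, 400  # thresholds from Embodiment 1 (assumption)

def classify_slice(ct: np.ndarray) -> str:
    """Label a CT slice per the two-threshold rule described in the text."""
    if ct.max() < T1:
        return "no_metal"                      # first threshold: clearly artifact-free
    metal_mask = ct > T2                       # second threshold: candidate metal regions
    visited = np.zeros_like(metal_mask, dtype=bool)
    best, (h, w) = 0, metal_mask.shape
    for i in range(h):                         # largest 4-connected metal region
        for j in range(w):
            if metal_mask[i, j] and not visited[i, j]:
                stack, size = [(i, j)], 0
                visited[i, j] = True
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and metal_mask[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                best = max(best, size)
    return "metal_artifact" if best > MIN_PIXELS else "ambiguous"
```

A slice whose peak value stays below the first threshold is kept as artifact-free; only slices with a sufficiently large connected metal region are used as the artifact-bearing half of a training pair.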
Further, before step 4, the method further comprises:
constructing a simulation data set Z2, which specifically comprises the following steps:
collecting a set of real metal-artifact-free CT images, and synthesizing metal artifacts on each metal-artifact-free CT image in the set to obtain the corresponding CT image with metal artifacts; a CT image with metal artifacts and its metal-artifact-free counterpart form a simulated training data pair;
training the metal artifact unsupervised correction network with the simulation data set Z2 to obtain a trained metal artifact unsupervised correction network;
the performance of the trained generator G_AB is tested; if the performance requirement is met, step 4 is carried out; if not, the structure of the metal artifact unsupervised correction network is adjusted and optimized.
Further, the image processing of the generating network comprises a forward cycle and a backward cycle. In the forward cycle, a real image x with metal artifacts is input to generator G_AB to obtain a fake metal-artifact-free image G_AB(x) and the artifact features F_M extracted during encoding; G_AB(x) and F_M are input to the other generator G_BA to obtain a fake image with metal artifacts G_BA(G_AB(x), F_M). In the backward cycle, a real metal-artifact-free image y is input to generator G_BA to obtain a fake image with metal artifacts G_BA(y, F_M); G_BA(y, F_M) is input to the other generator G_AB to obtain a fake metal-artifact-free image G_AB(G_BA(y, F_M)).
Further, generator G_AB and generator G_BA share the same backbone architecture, comprising an encoding module and a decoding module. The encoding module consists of a convolution layer followed by a 5-layer encoder; the decoding module consists of a 5-layer decoder followed by a convolution layer. In the encoding module, from shallow to deep, the number of Transformer blocks in each encoder layer is 1, 2, 2, 2, 4 in sequence, and the number of attention heads in each layer is 2, 2, 2, 2, 4 in sequence;
correspondingly, F_M = [f_1, f_2, f_3, f_4], where f_k (k = 1, 2, 3, 4) is the artifact feature output by the k-th encoder layer;
correspondingly, the process by which generator G_BA converts a metal-artifact-free CT image into a CT image with metal artifacts specifically comprises: G_BA extracts image features from the input metal-artifact-free CT image; the image features are spliced with f_4 and input to the first decoder layer of G_BA; the output of the first decoder layer is spliced with f_3 and input to the second decoder layer; the output of the second decoder layer is spliced with f_2 and input to the third decoder layer; the output of the third decoder layer is spliced with f_1 and then processed in turn by the fourth and fifth decoder layers; finally, the output of the fifth decoder layer is spliced with the input metal-artifact-free CT image and passed through a convolution layer to obtain the CT image with metal artifacts.
Further, downsampling in generator G_AB and generator G_BA uses a pixel-unshuffle operation, and upsampling uses a pixel-shuffle operation.
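The pixel-unshuffle/pixel-shuffle pair is lossless resampling: unshuffle folds each r×r spatial block into channels, and shuffle is its exact inverse. A minimal numpy sketch (channels-last layout; deep-learning frameworks provide these as built-in ops):

```python
import numpy as np

def pixel_unshuffle(x: np.ndarray, r: int) -> np.ndarray:
    """(H, W, C) -> (H/r, W/r, C*r*r): downsample by folding pixels into channels."""
    h, w, c = x.shape
    x = x.reshape(h // r, r, w // r, r, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h // r, w // r, c * r * r)

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """(H, W, C*r*r) -> (H*r, W*r, C): upsample by unfolding channels into pixels."""
    h, w, crr = x.shape
    c = crr // (r * r)
    x = x.reshape(h, w, r, r, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h * r, w * r, c)
```

Because no information is discarded, these operations avoid the aliasing of strided convolution and the checkerboard artifacts of transposed convolution, which is presumably why they are chosen here.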
Further, the image processing of a Transformer block specifically comprises:
optimization of the channel attention mechanism on the input tensor, specifically: first, a layer normalization operation is applied; then 1×1 convolutions aggregate pixel-wise cross-channel image features and 3×3 depthwise separable convolutions encode the spatial context, generating the projections of the query matrix Q, key matrix K, and value matrix V; the shapes of the Q, K, and V projections are then readjusted, and the adjusted Q and K are multiplied to generate a transposed attention matrix whose size depends on the number of channels; finally, the transposed attention matrix is multiplied with the adjusted V, realizing the optimization of the channel attention mechanism;
feature transformation is then applied through a feed-forward network to the feature map optimized by the channel attention mechanism, specifically: the shape of the optimized feature map is readjusted and added pixel-wise to the tensor input to the Transformer block, yielding a new feature; a layer normalization operation is applied to the new feature, and the spatial features are optimized through an upper channel and a lower channel; each channel consists of a 1×1 convolution followed by a 3×3 depthwise separable convolution; in the upper channel, the depthwise separable convolution is followed by a GELU nonlinear activation function; the outputs of the two channels are multiplied pixel-wise, processed by a 1×1 convolution, and added pixel-wise to the new feature to obtain the final output.
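The key property of the transposed ("channel") attention above is that the attention map is C×C rather than (HW)×(HW), so its cost scales with the channel count, not quadratically with image size. A minimal numpy sketch of just the attention step, under the assumption that Q, K, V have already been produced and reshaped to (C, H·W) — the 1×1 and depthwise 3×3 convolutions that generate them are omitted and replaced by random matrices:

```python
import numpy as np

def channel_attention(q, k, v):
    """Transposed attention: softmax((Q K^T)/sqrt(HW)) V over the channel axis."""
    attn = q @ k.T / np.sqrt(q.shape[1])          # (C, C) transposed attention map
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax
    return attn @ v                               # (C, H*W), same shape as V

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
q = rng.standard_normal((C, H * W))               # stand-ins for projected Q, K, V
k = rng.standard_normal((C, H * W))
v = rng.standard_normal((C, H * W))
out = channel_attention(q, k, v)
```

For a 512×512 CT slice, spatial attention would need a 262144×262144 map, whereas the channel map here stays at C×C regardless of resolution.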
Further, the relative discriminator comprises 5 layers, specifically: the first to fourth layers consist of 4×4 convolution layers and a Leaky ReLU activation function, containing 64, 128, 256, and 512 convolution kernels respectively; the stride of the first to third layers is 2, and that of the fourth layer is 1; the fifth layer consists of a 4×4 convolution layer and a sigmoid activation function, contains 1 convolution kernel, and has a stride of 1.
Further, the total loss function is constructed according to formulas (1) to (5):

$L_{RaGAN}^{AB} = -\mathbb{E}_{y \sim P_{data}(y)}[\log D_B^{Ra}(y, G_{AB}(x))] - \mathbb{E}_{x \sim P_{data}(x)}[\log(1 - D_B^{Ra}(G_{AB}(x), y))]$  (1)

$L_{RaGAN}^{BA} = -\mathbb{E}_{x \sim P_{data}(x)}[\log D_A^{Ra}(x, G_{BA}(y, F_M))] - \mathbb{E}_{y \sim P_{data}(y)}[\log(1 - D_A^{Ra}(G_{BA}(y, F_M), x))]$  (2)

$L_{cyc} = \mathbb{E}_{x \sim P_{data}(x)}[\|G_{BA}(G_{AB}(x), F_M) - x\|_1] + \mathbb{E}_{y \sim P_{data}(y)}[\|G_{AB}(G_{BA}(y, F_M)) - y\|_1]$  (3)

$L_{proj} = \|M_{free} \odot (P(G_{AB}(x)) - X_{sino})\|_1$  (4)

$L_{total} = L_{RaGAN}^{AB} + L_{RaGAN}^{BA} + \lambda_{cyc} L_{cyc} + \lambda_{proj} L_{proj}$  (5)

where $L_{RaGAN}^{AB}$ and $L_{RaGAN}^{BA}$ denote the relative adversarial GAN loss functions, $L_{cyc}$ denotes the cycle consistency loss function, $L_{proj}$ denotes the projection retention loss function, x and y denote CT images with and without metal artifacts respectively, $\mathbb{E}$ denotes the expectation operator, $P_{data}$ denotes the distribution of the real image data, $\|\cdot\|_1$ denotes the $L_1$ norm, $M_{free}$ denotes the mask undisturbed by the metal projection, $P(\cdot)$ denotes forward projection, and $X_{sino}$ denotes the actually acquired projection image.
The invention has the beneficial effects that:
(1) The Transformer-based feature encoding and decoding architecture fully exploits the global feature correlations of the image, realizes thorough mining of image feature information, and can effectively decompose features into artifact components (i.e., metal artifacts, noise, and the like) and content components (i.e., anatomical structures). In the projection domain, only the projection values affected by metal undergo nonlinear changes; introducing the originally acquired projection data into the network training process and using the information of the part unaffected by the artifacts as a projection-domain constraint further improves the correction effect. The method suppresses the metal artifacts in the image well without introducing false structural information, while preserving the edge detail information of the image.
(2) The projection retention loss is designed, introducing a projection constraint into the network training process and further improving the correction effect. Images processed by the method recover low-attenuation detail structures that other methods fail to recover, and no false structural information appears in the generated results, so metal artifacts are well suppressed while effective information is retained.
Drawings
Fig. 1 is a schematic diagram of a framework of an unsupervised correction network for metal artifacts according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a generating network in an unsupervised correction network for metal artifacts according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the discriminator in the metal artifact unsupervised correction network according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a Transformer block in the generating network according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
This embodiment provides a Transformer-based unsupervised correction method for metal artifacts in CT images, comprising the following steps:
s101: constructing a metal artifact unsupervised correction network based on a CycleGAN architecture;
specifically, the overall architecture of the metal artifact unsupervised correction network is shown in fig. 1. The generation network comprises a generator G AB Sum generator G BA ,G AB For converting a CT image with metal artifacts into a CT image without metal artifacts, G BA For converting a metal artifact free CT image into a metal artifact free CT image.
The image processing of the generating network comprises a forward cycle and a backward cycle. In the forward cycle, a real image x with metal artifacts is input to generator G_AB to obtain a fake metal-artifact-free image G_AB(x) and the artifact features F_M extracted during encoding; G_AB(x) and F_M are input to the other generator G_BA to obtain a fake image with metal artifacts G_BA(G_AB(x), F_M). In the backward cycle, a real metal-artifact-free image y is input to generator G_BA to obtain a fake image with metal artifacts G_BA(y, F_M); G_BA(y, F_M) is input to the other generator G_AB to obtain a fake metal-artifact-free image G_AB(G_BA(y, F_M)).
In the present embodiment, generator G_AB and generator G_BA share the same backbone architecture: a Transformer-based generator comprising an encoding module and a decoding module. The discriminator is a relative discriminator.
As shown in fig. 2, the encoding module consists of a convolution layer followed by a 5-layer encoder, and the decoding module consists of a 5-layer decoder followed by a convolution layer. In the encoding module, from shallow to deep, the number of Transformer blocks in each encoder layer is 1, 2, 2, 2, 4 in sequence, and the number of attention heads in each layer is 2, 2, 2, 2, 4 in sequence. Gradually increasing the number of Transformer blocks per encoder layer improves the efficiency of network processing.
Specifically, starting from the input, the spatial size is reduced layer by layer in the encoder while the channel capacity is expanded. The decoder takes the low-resolution latent features as input and progressively restores the high-resolution representation.
The image processing procedure by which generator G_AB converts a CT image with metal artifacts into a CT image without metal artifacts (i.e., the upper generator in fig. 2) comprises: first, a convolution (in this embodiment, kernel size 3×3, 64 channels, with Leaky ReLU as the activation function) extracts shallow features from the input CT image with metal artifacts, and a 5-layer encoder then converts the shallow features into deep features. Meanwhile, the artifact features extracted by each encoder layer are collected: F_M = [f_1, f_2, f_3, f_4], where f_k (k = 1, 2, 3, 4) is the artifact feature output by the k-th encoder layer, corresponding to artifact features 1 through 4 in fig. 2. The deep features are then restored by a 5-layer decoder; the recovered high-resolution features are spliced with the input CT image with metal artifacts and processed by a convolution layer to obtain the CT image without metal artifacts.
The image processing procedure by which generator G_BA converts a metal-artifact-free CT image into a CT image with metal artifacts (i.e., the lower generator in fig. 2) specifically comprises: G_BA extracts image features from the input metal-artifact-free CT image; the image features are spliced with f_4 and input to the first decoder layer of G_BA; the output of the first decoder layer is spliced with f_3 and input to the second decoder layer; the output of the second decoder layer is spliced with f_2 and input to the third decoder layer; the output of the third decoder layer is spliced with f_1 and then processed in turn by the fourth and fifth decoder layers; finally, the output of the fifth decoder layer is spliced with the input metal-artifact-free CT image and passed through a convolution layer (in this embodiment, kernel size 1×1) to obtain the CT image with metal artifacts.
Specifically, after the input artifact features and image features are spliced, the number of channels is doubled; a 1×1 convolution then halves the number of channels, restoring the original count and facilitating subsequent operations.
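The splice-then-halve step can be sketched with numpy. A 1×1 convolution is just a per-pixel matrix multiply over channels, so the channel reduction below (random weights standing in for the learned kernel, which is an assumption of this sketch) reproduces the shapes described above:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 16, 8, 8
image_feat = rng.standard_normal((C, H, W))      # decoder-side image feature
artifact_feat = rng.standard_normal((C, H, W))   # f_k from the matching encoder layer

# Splicing doubles the channel count ...
spliced = np.concatenate([image_feat, artifact_feat], axis=0)    # (2C, H, W)

# ... and a 1x1 convolution (per-pixel matrix multiply) halves it back.
w_1x1 = rng.standard_normal((C, 2 * C))                          # learned kernel stand-in
restored = np.einsum("oc,chw->ohw", w_1x1, spliced)              # (C, H, W)
```

This keeps the decoder's channel widths identical whether or not artifact features are injected, so the same backbone serves both generators.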
In this embodiment, during the cycle, the features containing artifacts are input into an image feature encoding network and an artifact feature encoding network respectively, yielding the artifact features and the artifact-conditioned image features, while the artifact-free low-level image features are input into the encoder to obtain artifact-free image features. Based on the resulting features, a decoder processes the different feature streams: the artifact-conditioned image features are used to obtain an image with metal artifacts removed, while the artifact features together with the artifact-free image features are input into the decoder to add metal artifacts to the original image. By transmitting the artifact features, the metal artifact features can be effectively identified, extracted, and reused, improving the generating network's ability to decompose artifact features while ensuring the consistency of the artifact features throughout the generation process.
In addition, downsampling in generator G_AB and generator G_BA uses a pixel-unshuffle operation, and upsampling uses a pixel-shuffle operation.
In this embodiment, the discriminator is a relative discriminator, as shown in fig. 3. It comprises 5 layers, specifically: the first to fourth layers consist of 4×4 convolution layers and a Leaky ReLU activation function, containing 64, 128, 256, and 512 convolution kernels respectively; the stride of the first to third layers is 2, and that of the fourth layer is 1; the fifth layer consists of a 4×4 convolution layer and a sigmoid activation function, contains 1 convolution kernel, and has a stride of 1. The difference computed over a 64×64 block is finally fed to the sigmoid activation function to obtain a one-dimensional output.
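The spatial sizes through the five discriminator layers follow from the standard convolution output formula. The sketch below traces a 64×64 input (the block size mentioned above); padding of 1 is an assumption, conventional for this PatchGAN-style 4×4/stride layout but not stated in the text:

```python
def conv_out(size: int, kernel: int = 4, stride: int = 2, pad: int = 1) -> int:
    """Spatial output size of a convolution: floor((n + 2p - k)/s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

channels = [64, 128, 256, 512, 1]   # kernels per layer, from the text
strides = [2, 2, 2, 1, 1]           # strides per layer, from the text

size, trace = 64, []
for ch, st in zip(channels, strides):
    size = conv_out(size, stride=st)
    trace.append((ch, size))        # (channel count, spatial size) after each layer
```

Under these assumptions a 64×64 block shrinks to a 6×6 single-channel map at the output, each element judging one receptive-field patch of the input.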
S102: introducing a relative adversarial GAN loss function, a cycle consistency loss function, and a projection retention loss function to construct a total loss function;
Specifically, a deep network must learn the mapping between the input information and the output target during training, and constraints on different detail features can be realized by designing targeted loss functions. To achieve effective removal of metal artifacts, the total loss function of the network model proposed in this embodiment is constructed according to formulas (1) to (5):

$L_{RaGAN}^{AB} = -\mathbb{E}_{y \sim P_{data}(y)}[\log D_B^{Ra}(y, G_{AB}(x))] - \mathbb{E}_{x \sim P_{data}(x)}[\log(1 - D_B^{Ra}(G_{AB}(x), y))]$  (1)

$L_{RaGAN}^{BA} = -\mathbb{E}_{x \sim P_{data}(x)}[\log D_A^{Ra}(x, G_{BA}(y, F_M))] - \mathbb{E}_{y \sim P_{data}(y)}[\log(1 - D_A^{Ra}(G_{BA}(y, F_M), x))]$  (2)

$L_{cyc} = \mathbb{E}_{x \sim P_{data}(x)}[\|G_{BA}(G_{AB}(x), F_M) - x\|_1] + \mathbb{E}_{y \sim P_{data}(y)}[\|G_{AB}(G_{BA}(y, F_M)) - y\|_1]$  (3)

$L_{proj} = \|M_{free} \odot (P(G_{AB}(x)) - X_{sino})\|_1$  (4)

$L_{total} = L_{RaGAN}^{AB} + L_{RaGAN}^{BA} + \lambda_{cyc} L_{cyc} + \lambda_{proj} L_{proj}$  (5)

where $L_{RaGAN}^{AB}$ and $L_{RaGAN}^{BA}$ denote the relative adversarial GAN loss functions, $L_{cyc}$ denotes the cycle consistency loss function, $L_{proj}$ denotes the projection retention loss function, x and y denote CT images with and without metal artifacts respectively, $\mathbb{E}$ denotes the expectation operator, $P_{data}$ denotes the distribution of the real image data, $\|\cdot\|_1$ denotes the $L_1$ norm, $M_{free}$ denotes the mask undisturbed by the metal projection, $P(\cdot)$ denotes forward projection, and $X_{sino}$ denotes the actually acquired projection image.
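The image-domain and projection-domain loss terms can be sketched numerically. In this sketch the mean absolute error stands in for the L1 norm and a toy parallel-beam projector (column sums) stands in for the forward projection P(·); both are illustrative assumptions, not the patent's operators:

```python
import numpy as np

def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error, standing in for the L1 norm."""
    return float(np.abs(a - b).mean())

def project(img: np.ndarray) -> np.ndarray:
    """Toy single-angle parallel-beam forward projection: column sums."""
    return img.sum(axis=0)

def cycle_loss(x, x_rec, y, y_rec) -> float:
    """Cycle consistency: x vs G_BA(G_AB(x), F_M) and y vs G_AB(G_BA(y, F_M))."""
    return l1(x, x_rec) + l1(y, y_rec)

def projection_retention_loss(corrected, sino, metal_free_mask) -> float:
    """Penalize projection mismatch only where the mask marks metal-free rays."""
    return float(np.abs(metal_free_mask * (project(corrected) - sino)).mean())
```

The mask zeroes out the rays occluded by metal, so the corrected image is pinned to the measured sinogram only where the measurement is trustworthy.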
In particular, a relative discriminator is used instead of a standard discriminator; it predicts the probability that an actual artifact-free image is relatively more realistic than a corrected image. The standard discriminator can be written as D A (y) = σ(C(y)), where σ denotes the sigmoid function and C(y) denotes the untransformed discriminator output; the relative discriminator is written as D RA (y, y') = σ(C(y) − E[C(y')]), where y' denotes a corrected (generated) image. Thus, the relative adversarial GAN loss functions can be defined as equations (1) and (2) above.
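A minimal sketch of the relativistic average discriminator loss, assuming the standard RaGAN formulation (the patent's exact equations (1) and (2) appear only as images in the original and are not reproduced here):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relativistic_d_loss(c_real, c_fake):
    """Relativistic average discriminator loss: train the discriminator so that real
    samples score higher than the average fake score, and fakes score lower than the
    average real score. c_real, c_fake are raw (pre-sigmoid) discriminator outputs."""
    d_real = sigmoid(c_real - c_fake.mean())   # P(real more realistic than avg fake)
    d_fake = sigmoid(c_fake - c_real.mean())   # P(fake more realistic than avg real)
    eps = 1e-12                                # numerical floor for the logarithms
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
```

The generator loss uses the same relative scores with the roles of real and fake swapped, which gives the generator gradient signal from both real and generated batches.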
The cycle consistency loss constrains x and G BA (G AB (x), F M ) in the forward cycle, and y and G AB (G BA (y, F M )) in the backward cycle, ensuring that the mappings learned by the network successfully convert the source input into the target output. This loss constrains G BA (G AB (x), F M ) ≈ x and G AB (G BA (y, F M )) ≈ y, preventing degradation of the adversarial learning.
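The cycle consistency constraint reduces to an L1 penalty between each input and its round-trip reconstruction; a minimal sketch (illustrative, not the patent's code):

```python
import numpy as np

def cycle_consistency_loss(x, x_rec, y, y_rec):
    """L1 cycle loss: x vs its round trip G_BA(G_AB(x), F_M), and
    y vs its round trip G_AB(G_BA(y, F_M))."""
    return float(np.mean(np.abs(x - x_rec)) + np.mean(np.abs(y - y_rec)))
```

When both round trips reproduce their inputs exactly the loss is zero; any deviation is penalized linearly, which tends to preserve anatomical structure better than a squared penalty.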
The projection image data contains a large amount of real and valid original projection information; only the projection information in the part occluded by the metal projection is changed. Provided that the original projection image is available, the projection of the network's output should be kept as consistent as possible with the original projection at every pixel outside the metal trace. This embodiment further constrains the generator with the loss function constructed from projection features, as shown in equation (4).
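A minimal sketch of such a projection retention loss, assuming a masked L1 over the sinogram; the two-view toy projector and all names here are illustrative assumptions, not the patent's projector:

```python
import numpy as np

def toy_forward_project(img):
    """Hypothetical two-view parallel projector: column sums, then row sums."""
    return np.concatenate([img.sum(axis=0), img.sum(axis=1)])

def projection_retention_loss(corrected, x_sino, metal_trace,
                              forward_project=toy_forward_project):
    """Masked L1 between the forward projection of the corrected image and the
    acquired sinogram, restricted to bins untouched by the metal trace."""
    keep = ~metal_trace                          # 1 where the projection is undisturbed
    diff = np.abs(forward_project(corrected) - x_sino)
    return float((diff * keep).sum() / max(keep.sum(), 1))
```

Bins lying on the metal trace are excluded because their measured values are already corrupted; only the trustworthy bins constrain the generator.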
S103: constructing a training data set Z1 specifically comprises the following steps: collecting a real CT image set, distinguishing whether metal artifacts exist in each CT image in the image set based on CT values of CT image pixel points, and then randomly selecting a CT image with metal artifacts and a CT image without metal artifacts to form an actual training data pair;
specifically, in this embodiment, the public SpineWeb vertebra localization and identification dataset is selected to construct the training dataset Z1. The first set threshold is 2000, the second set threshold is 2500, and the set pixel number is 400. Each CT image in the SpineWeb dataset is classified as containing metal artifacts or not based on the CT values of its pixels, specifically:
if the maximum CT value in a CT image is smaller than 2000, the image is a CT image without metal artifacts; otherwise, it is added to the suspected-artifact CT image set;
for each CT image in the suspected-artifact CT image set, regions with CT values greater than 2500 are marked as metal regions; CT images whose largest connected metal region exceeds 400 pixels are then selected as CT images with metal artifacts.
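The two-threshold selection described above can be sketched as follows (an illustrative reimplementation; the pure-Python flood fill stands in for a connected-component routine such as `scipy.ndimage.label`):

```python
import numpy as np

def has_metal_artifact(ct_values, t1=2000, t2=2500, min_pixels=400):
    """Decide whether a CT slice is treated as metal-affected, per the
    two-threshold rule: max CT value below t1 -> artifact-free; otherwise
    require a 4-connected metal region (CT > t2) larger than min_pixels."""
    if ct_values.max() < t1:
        return False                       # no suspicion of metal at all
    metal = ct_values > t2
    visited = np.zeros_like(metal, dtype=bool)
    best = 0
    h, w = metal.shape
    for i in range(h):
        for j in range(w):
            if metal[i, j] and not visited[i, j]:
                stack, size = [(i, j)], 0  # flood fill one component
                visited[i, j] = True
                while stack:
                    a, b = stack.pop()
                    size += 1
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        na, nb = a + da, b + db
                        if 0 <= na < h and 0 <= nb < w and metal[na, nb] and not visited[na, nb]:
                            visited[na, nb] = True
                            stack.append((na, nb))
                best = max(best, size)
    return best > min_pixels
```

The size requirement filters out isolated high-attenuation pixels (calcifications, noise) that exceed the metal threshold without forming a real implant.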
Through the above selection procedure, 6270 CT images with metal artifacts and 21190 CT images without metal artifacts are obtained. In this embodiment, 200 CT images with metal artifacts are randomly selected as the test set, and the remaining CT images are divided into a training set and a validation set; note that the CT images in the test set no longer participate in network training or validation.
S104: training the metal artifact non-supervision correction network by utilizing a training data set Z1 to obtain a trained metal artifact non-supervision correction network;
specifically, in a PyTorch environment on an AMAX workstation, an Adam optimizer with a base learning rate of 2×10⁻⁴ is used to train the metal artifact unsupervised correction network on the training set of dataset Z1. Network performance is continuously evaluated during training; after training of the network model is completed, the test set is used for testing and verification to evaluate the generalization ability and actual performance of the network on new data. Hyperparameters are adjusted and the model is optimized according to its performance on the test set, improving the performance and robustness of the model. This training process takes 10 hours on a 1000-pair dataset. Four GeForce RTX 3090 graphics cards, each with 24 GB of memory, are used for training and testing; the two CPUs are Intel Xeon Silver 4210 with 128 GB of available memory.
S105: inputting the CT image with metal artifacts to be corrected into the trained generator G AB to obtain the corresponding metal-artifact-free CT image.
In the Transformer-based CT image metal artifact unsupervised correction method, the Transformer-based feature encoding and decoding framework fully exploits the global feature correlations of the image, so that image feature information is thoroughly mined and features can be effectively decomposed into artifact components (i.e., metal artifacts, noise, etc.) and content components (i.e., anatomical structures). In the projection domain, only the projection values affected by metal undergo nonlinear changes; introducing the originally acquired projection data into the network training process and using the information unaffected by artifacts as a projection-domain constraint further improves the correction effect. The method suppresses metal artifacts in the image well while preserving edge detail information, without introducing false structural information.
Example 2
On the basis of embodiment 1, before the proposed metal artifact unsupervised correction network is formally applied to an actual scene, this embodiment first constructs a simulation dataset Z2 and trains the metal artifact unsupervised correction network with it to obtain a trained network. The performance of the trained generator G AB is then tested; if the performance requirement is met, step S104 is performed; otherwise, the structure of the metal artifact unsupervised correction network is adjusted and optimized.
In this embodiment, the simulation dataset Z2 is constructed based on the DeepLesion dataset, and the process of constructing Z2 includes:
images containing metal artifacts are selected in the deep version dataset and the metal is manually segmented to obtain the typical metal implant shape in real cases. 8704 artifact-free CT images are randomly selected as reference images. And carrying out orthographic projection on the selected reference image to obtain a corresponding artifact-free sinogram. One or more of the extracted typical metal implants are then randomly selected, placed in the appropriate anatomical location, an image containing only the metal implant is acquired, and orthographic projection is performed to acquire a sinogram containing only the metal projection traces. And then, carrying out nonlinear correction on the projection value corresponding to the metal projection limit position in the artifact-free sinogram according to a nonlinear projection correction formula.
where p represents a normal projection value, p' represents a projection value containing metal projection deviation information, m represents the projection attenuation threshold, and c represents the magnitude of the nonlinear attenuation; m is set to 70% of the maximum projection value and c is set to 0.6.
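The exact correction formula appears only as an image in the original patent and is not reproduced in this text; the sketch below assumes a simple soft-compression form p' = m + c·(p − m) for trace values above the threshold, purely as an illustration of the role of m and c:

```python
import numpy as np

def attenuate_metal_projections(sino, metal_trace, c=0.6, m_frac=0.7):
    """Nonlinearly attenuate projection values on the metal trace.
    NOTE: the compression p' = m + c * (p - m) is an assumed placeholder,
    not the patent's exact formula."""
    m = m_frac * sino.max()                # projection attenuation threshold
    out = sino.astype(float).copy()
    over = metal_trace & (sino > m)        # only trace bins above the threshold
    out[over] = m + c * (sino[over] - m)   # compress the excess above m
    return out
```

Any value not on the metal trace, or below the threshold m, is left untouched, mirroring the observation that only metal-occluded projection values deviate nonlinearly.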
After the projection correction is completed, filtered back projection reconstruction is performed to obtain a CT image with synthetic metal artifacts.
In this embodiment, 8192 synthetic image pairs are used for training, 256 pairs for validation, and the remaining 256 pairs for testing. The training process follows step S104 in embodiment 1 and is not repeated here.
Example 3
On the basis of the above embodiments, as shown in fig. 4, an efficient Transformer block is provided in this embodiment. The image processing process of the Transformer block specifically includes:
channel attention optimization is first performed on the input tensor, specifically: a layer normalization operation is applied; cross-channel pixel-level image features are then aggregated by 1×1 convolutions, and the spatial context is encoded by unbiased depth-separable convolutions with a 3×3 kernel, generating the projections of the query matrix Q, key matrix K and value matrix V. The shapes of Q, K and V are then readjusted, and the adjusted Q and K are multiplied to generate a transposed attention matrix whose size depends on the number of channels; finally, the transposed attention matrix is multiplied with the adjusted V, completing the channel attention optimization;
in this embodiment, the efficient transducer block effectively reduces the data size and the computation size by converting the global attention computation of the original transducer into the channel attention computation.
The feature map after channel attention optimization then undergoes feature transformation through a feed-forward network, specifically: the shape of the optimized feature map is readjusted and added pixel by pixel to the tensor input to the Transformer block, yielding new features. A layer normalization operation is applied to the new features, and the spatial features are optimized through an upper and a lower channel; each channel consists in turn of a convolution with a 1×1 kernel and an unbiased depth-separable convolution with a 3×3 kernel. In the upper channel, the depth-separable convolution is followed by a GELU nonlinear activation function. The outputs of the two channels are multiplied pixel by pixel, processed by a convolution with a 1×1 kernel, and then added pixel by pixel to the new features to obtain the final output.
In this embodiment, the pixel-by-pixel multiplication of the two channel outputs implements a weighting operation on the spatial features. Through this spatially weighted optimization, each channel can interact with the features of the other channels.
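The gated two-channel feed-forward can be sketched as follows, modelling the 1×1 convolutions as channel-mixing matrices and omitting the depth-separable convolutions for brevity (an illustrative assumption, not the patent's implementation):

```python
import numpy as np

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def gated_feed_forward(x, w_plain, w_gate):
    """Gated feed-forward over channels: x has shape (C, H, W). The 1x1
    convolutions are modelled as per-pixel channel-mixing matrices
    w_plain and w_gate of shape (C_out, C)."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)
    upper = gelu(w_gate @ flat)            # gating branch with GELU nonlinearity
    lower = w_plain @ flat                 # plain branch
    return (upper * lower).reshape(-1, h, w)
```

The element-wise product lets the GELU-gated branch suppress or pass features of the plain branch per pixel, which is the weighting behaviour the text describes.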
This method creatively proposes a Transformer-based CT image metal artifact unsupervised correction network that suppresses metal artifacts by exploiting the strong feature extraction capability of deep networks. Images processed by this method recover low-attenuation detail structures that other methods fail to recover, the results contain no false structural information, and suppression of metal artifacts is achieved while effective information is preserved.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. A Transformer-based CT image metal artifact unsupervised correction method, characterized by comprising the following steps:
step 1: constructing a metal artifact unsupervised correction network based on a CycleGAN architecture; wherein the generation network comprises a generator G AB Sum generator G BA ,G AB For converting a CT image with metal artifacts into a CT image without metal artifacts, G BA For converting a metal artifact free CT image into a metal artifact free CT image; both generators adopt a generator based on a transducer, and the discriminator adopts a relative discriminator;
step 2: introducing a relative contrast GAN loss function, a cyclic consistency loss function, and a projection retention loss function to construct a total loss function;
step 3: constructing a training data set Z1 specifically comprises the following steps: collecting a real CT image set, distinguishing whether metal artifacts exist in each CT image in the image set based on CT values of CT image pixel points, and then randomly selecting a CT image with metal artifacts and a CT image without metal artifacts to form an actual training data pair;
step 4: training the metal artifact non-supervision correction network by utilizing a training data set Z1 to obtain a trained metal artifact non-supervision correction network;
step 5: inputting the CT image of the metal artifact to be corrected into a trained generator G AB And obtaining a corresponding metal artifact free CT image.
2. The method of claim 1, wherein in step 3, distinguishing whether each CT image in the image set has metal artifacts based on the CT values of its pixels comprises: if the maximum CT value in a CT image is smaller than a first set threshold, the image is a CT image without metal artifacts; otherwise, it is added to the suspected-artifact CT image set;
for each CT image in the suspected-artifact CT image set, setting regions with CT values greater than a second set threshold as metal regions; then selecting CT images whose largest connected metal region exceeds a set pixel number as CT images with metal artifacts.
3. The method of claim 1, further comprising, prior to step 4:
constructing a simulation data set Z2, which specifically comprises the following steps:
collecting a real CT image set without metal artifact, and synthesizing metal artifact aiming at any CT image without metal artifact in the image set to obtain a corresponding CT image with metal artifact, wherein one CT image with metal artifact and the CT image without metal artifact form a simulation training data pair;
training the metal artifact non-supervision correction network by using a simulation data set Z2 to obtain a trained metal artifact non-supervision correction network;
the performance of the trained generator G AB in the metal artifact unsupervised correction network is tested; if the performance requirement is met, step 4 is performed; otherwise, the structure of the metal artifact unsupervised correction network is adjusted and optimized.
4. The method of claim 1, wherein the image processing of the generation network comprises a forward cycle and a backward cycle; in the forward cycle, a real CT image x with metal artifacts is input to the generator G AB to obtain a fake metal-artifact-free image G AB (x) and the artifact features F M extracted during encoding; G AB (x) and F M are input to the other generator G BA to obtain a fake image with metal artifacts G BA (G AB (x), F M ); in the backward cycle, a real metal-artifact-free image y is input to the generator G BA to obtain a fake image with metal artifacts G BA (y, F M ), and G BA (y, F M ) is input to the other generator G AB to obtain a fake metal-artifact-free image G AB (G BA (y, F M )).
5. The Transformer-based CT image metal artifact unsupervised correction method according to claim 4, wherein generator G AB and generator G BA share the same backbone architecture, comprising an encoding module and a decoding module; the encoding module consists in turn of a convolution layer and a 5-layer encoder; the decoding module consists in turn of a 5-layer decoder and a convolution layer; in the encoding module, the number of Transformer blocks contained in each encoder layer from shallow to deep is 1, 2, 2, 2, 4 in sequence, and the corresponding number of attention heads per layer is 2, 2, 2, 2, 4 in sequence;
correspondingly, F M = [f 1 , f 2 , f 3 , f 4 ], where f k (k = 1, 2, 3, 4) is the artifact feature output by the k-th encoder layer;
correspondingly, the process by which generator G BA converts the metal-artifact-free CT image into a CT image with metal artifacts specifically includes: G BA extracts image features from the input metal-artifact-free CT image; the image features are spliced with f 4 and input to the first decoder layer of G BA ; the output of the first decoder layer is spliced with f 3 and input to the second decoder layer; the output of the second decoder layer is spliced with f 2 and input to the third decoder layer; the output of the third decoder layer is spliced with f 1 and processed in turn by the fourth and fifth decoder layers; and the output of the fifth decoder layer is spliced with the input metal-artifact-free CT image and input to a convolution layer to obtain the CT image with metal artifacts.
6. The Transformer-based CT image metal artifact unsupervised correction method according to claim 5, wherein generator G AB or generator G BA uses a pixel-unshuffle operation for downsampling and a pixel-shuffle operation for upsampling.
7. The Transformer-based CT image metal artifact unsupervised correction method according to claim 5, wherein the image processing process of the Transformer block specifically comprises:
channel attention optimization is first performed on the input tensor, specifically: a layer normalization operation is applied; cross-channel pixel-level image features are then aggregated by 1×1 convolutions, and the spatial context is encoded by depth-separable convolutions with a 3×3 kernel, generating the projections of the query matrix Q, key matrix K and value matrix V; the shapes of Q, K and V are then readjusted, and the adjusted Q and K are multiplied to generate a transposed attention matrix whose size depends on the number of channels; finally, the transposed attention matrix is multiplied with the adjusted V, completing the channel attention optimization;
the feature map after channel attention optimization then undergoes feature transformation through a feed-forward network, specifically: the shape of the optimized feature map is readjusted and added pixel by pixel to the tensor input to the Transformer block, yielding new features; a layer normalization operation is applied to the new features, and the spatial features are optimized through an upper and a lower channel; each channel consists in turn of a convolution with a 1×1 kernel and a depth-separable convolution with a 3×3 kernel; in the upper channel, the depth-separable convolution is followed by a GELU nonlinear activation function; the outputs of the two channels are multiplied pixel by pixel, processed by a convolution with a 1×1 kernel, and then added pixel by pixel to the new features to obtain the final output.
8. The Transformer-based CT image metal artifact unsupervised correction method according to claim 1, wherein the relative discriminator comprises 5 layers, specifically: the first to fourth layers each consist of a 4×4 convolution layer and a Leaky ReLU activation function, containing 64, 128, 256 and 512 convolution kernels respectively; the stride of the first to third layers is 2, and the stride of the fourth layer is 1; the fifth layer consists of a 4×4 convolution layer and a sigmoid activation function, contains 1 convolution kernel and has a stride of 1.
9. The Transformer-based CT image metal artifact unsupervised correction method according to claim 3, wherein the total loss function is constructed according to formulas (1) to (5):
wherein L GAN AB and L GAN BA represent the relative adversarial GAN loss functions, L cyc represents the cycle consistency loss function, and L proj represents the projection retention loss function; x and y represent real CT images with and without metal artifacts, respectively; E represents the expectation operator; P data represents the distribution of the real image data; ||·|| 1 represents the L 1 norm; M represents a mask undisturbed by the metal projection; P(·) represents forward projection; and X sino represents the actually acquired projection image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310598218.0A CN116664710A (en) | 2023-05-19 | 2023-05-19 | CT image metal artifact unsupervised correction method based on transducer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116664710A true CN116664710A (en) | 2023-08-29 |
Family
ID=87714663
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310598218.0A Pending CN116664710A (en) | 2023-05-19 | 2023-05-19 | CT image metal artifact unsupervised correction method based on transducer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116664710A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117726706A (en) * | 2023-12-19 | 2024-03-19 | 燕山大学 | CT metal artifact correction and super-resolution method for unsupervised deep dictionary learning |
CN117765118A (en) * | 2024-02-22 | 2024-03-26 | 吉林大学第一医院 | artifact correction method, artifact correction device, electronic equipment and computer readable storage medium |
CN118351204A (en) * | 2024-04-18 | 2024-07-16 | 南方医科大学南方医院 | Spiral CT metal artifact correction method based on unsupervised double-domain neural network |
CN117726706B (en) * | 2023-12-19 | 2024-10-22 | 燕山大学 | CT metal artifact correction and super-resolution method for unsupervised deep dictionary learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||