CN116630466B - Spine CT-to-MR conversion method and system based on a generative adversarial network
- Publication number: CN116630466B
- Application number: CN202310919034.XA
- Authority: CN (China)
- Prior art keywords: image, spine, branch, network, features
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T11/006 — Inverse problem, transformation from projection-space into object-space, e.g. transform methods, back-projection, algebraic methods
- G06T11/005 — Specific pre-processing for tomographic reconstruction, e.g. calibration, source positioning, rebinning, scatter correction, retrospective gating
- G06N3/0455 — Auto-encoder networks; encoder-decoder networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/0475 — Generative networks
- G06N3/094 — Adversarial learning
- G06V10/42 — Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, loops, corners, strokes or intersections
- G06V10/52 — Scale-space analysis, e.g. wavelet analysis
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature-extraction or classification level, of extracted features
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06T2207/10081 — Computed x-ray tomography [CT]
- G06T2207/10088 — Magnetic resonance imaging [MRI]
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30012 — Spine; backbone
- Y02T10/40 — Engine management systems
Abstract
The invention belongs to the technical field of image processing and aims to solve the problem of edge-information loss in the existing spine CT-to-MR image conversion process. It provides a spine CT-to-MR conversion method and system based on a generative adversarial network. The encoder of the constructed network comprises a first branch and a second branch: the first branch extracts multi-scale global features of the spine CT image, while the second branch exploits the high-frequency multi-scale edge features of the spine CT image, improving sensitivity to the edge structure of the image and ensuring that the fine edge structures of the spine CT are not lost. The attention modules of the decoder guide the fusion of global and edge features at corresponding scales, so that the spine CT structure is preserved as far as possible, protecting the bone-tissue edge information of the spine CT image while improving the quality of the spine CT-to-MR conversion.
Description
Technical Field
The invention belongs to the technical field of image conversion, and particularly relates to a spine CT (computed tomography) to MR (magnetic resonance) conversion method and system based on a generative adversarial network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Computed tomography (CT) and magnetic resonance (MR) imaging are the most common medical imaging techniques in orthopedic clinical diagnosis. CT imaging provides excellent visualization of bone tissue, making it an ideal choice for analyzing bone structure, while MR is better suited to soft-tissue imaging. Most diseases require a clinical diagnosis based on both kinds of images. Unfortunately, some patients cannot undergo MR examination for various reasons, such as medical contraindications, emergencies, or economic limitations. In contrast, CT imaging is generally faster, more affordable, and subject to fewer contraindications, making it a more readily available option for many patients. Modality conversion of spinal CT to MR can assist medical professionals in diagnosing spinal disease, enabling more accurate and efficient diagnosis. Furthermore, converting spine CT to MR helps reduce medical costs by providing a more affordable alternative to conventional MR imaging, which is especially beneficial for patients who cannot bear the cost of an MR scan.
Deep-learning-based medical image modality translation methods include deep convolutional neural networks (DCNN) and generative adversarial networks (GAN). DCNN-based methods use multiple convolutions to extract the high-level abstract and complex features of CT and MR respectively, and then fuse the MR modality features into the CT to effect the CT-to-MR transformation. A GAN consists of a generator and a discriminator whose parameters are updated alternately; compared with a DCNN, this distinctive adversarial training yields better performance. GANs can be divided into unsupervised and supervised learning types. A representative unsupervised GAN is CycleGAN, which learns a mapping between two visual domains. However, because of the diversity and complexity of the two domains, the translated image structure may sometimes be incomplete; moreover, when the generator ignores some variability in the input data, the output image may look less realistic. Pix2Pix is a representative supervised GAN; it requires a paired dataset, which may be harder to obtain than the unpaired data used by CycleGAN, but this also enables Pix2Pix to learn a more accurate input-output mapping, achieving higher-quality translated images and better structural consistency. Still, when applying a GAN to the CT-to-MR modality conversion task, some challenges must be overcome. First, the edge information of bone tissue in CT is prone to partial loss during translation. Second, the soft-tissue translation quality of the synthetic MR is low and deformation may occur.
In recent years, alaa et al have proposed a GAN with attention guidance, namely the uagGAN, to effect the conversion from MR to CT. The uagGAN may enhance the resistive loss function by optimizing the non-resistive loss function to obtain high frequency and low frequency information of the output image; in addition, global consistency of the images may also be ensured by the attention mask it generates to generate more accurate images. Karim et al propose a new framework called the medical generation antagonism network MedGAN, and in order to improve the clarity of the output, style delivery loss is proposed to guide the MedGAN to match textures and structures. The generation of the resistive network GAN provided above still suffers from small-edge information loss in the mode conversion from CT to MR.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a spine CT-to-MR conversion system and method based on a generative adversarial network. An encoder extracts the multi-scale global features and multi-scale edge features of a spine CT image separately, and an attention module in the decoder fuses the extracted global and edge features at the corresponding scales, improving the CT-to-MR conversion quality while protecting the bone-tissue edge information of the spine CT image.
To achieve the above object, a first aspect of the present invention provides a spine CT-to-MR conversion system based on a generative adversarial network, comprising:
an acquisition module, used for acquiring a spine CT image;
a spine CT feature extraction module, used for performing feature extraction on the acquired spine CT image with the constructed encoder of the generative adversarial network, wherein the encoder comprises a first branch and a second branch, the first branch extracting multi-scale global features of the spine CT image and the second branch extracting multi-scale edge features of the spine CT image;
a pseudo MR image generation module, used for fusing, in the decoder of the generative adversarial network, the global features extracted by the first branch with the edge features extracted by the second branch at the corresponding scales to obtain a pseudo MR image;
a training module, used for determining the differences among the spine CT image, the real MR image and the pseudo MR image with the discriminator of the generative adversarial network, calculating a loss function, and training the encoder-decoder of the network with that loss function;
an MR image generation module, used for obtaining, from a spine CT image to be converted, the corresponding MR image with the trained encoder-decoder of the generative adversarial network.
A second aspect of the invention provides a spine CT-to-MR conversion method based on a generative adversarial network, comprising:
acquiring a spine CT image;
performing feature extraction on the acquired spine CT image with the constructed encoder of the generative adversarial network, wherein the encoder comprises a first branch and a second branch, the first branch extracting multi-scale global features of the spine CT image and the second branch extracting multi-scale edge features of the spine CT image;
fusing, in the decoder of the generative adversarial network, the global features extracted by the first branch with the edge features extracted by the second branch at the corresponding scales to obtain a pseudo MR image;
determining the differences among the spine CT image, the real MR image and the pseudo MR image with the discriminator of the generative adversarial network, calculating a loss function, and training the encoder-decoder of the network with that loss function;
obtaining, from the spine CT image to be converted, the corresponding MR image with the trained encoder-decoder of the generative adversarial network.
The one or more technical solutions above have the following beneficial effects:
In the invention, the encoder of the constructed generative adversarial network comprises a first branch and a second branch. The first branch extracts multi-scale global features of the spine CT image, while the second branch uses the high-frequency multi-scale edge features of the spine CT image to improve sensitivity to the edge structure of the image, ensuring that the fine edge structures of the spine CT are not lost. The attention modules of the decoder guide the fusion of global and edge features at corresponding scales, so that the spine CT structure is preserved as far as possible, improving the CT-to-MR conversion quality while protecting the bone-tissue edge information of the spine CT image.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flow chart of spine CT-to-MR conversion based on a generative adversarial network in accordance with the first embodiment of the present invention;
FIG. 2 is a schematic diagram of the generative adversarial network according to the first embodiment of the present invention;
FIG. 3 (a) is a schematic diagram of a convolutional structure 1 network structure according to a first embodiment of the present invention;
FIG. 3 (b) is a schematic diagram of a downsampling network structure in accordance with a first embodiment of the present invention;
FIG. 3 (c) is a schematic diagram of an upsampling network structure according to a first embodiment of the present invention;
FIG. 3 (d) is a schematic diagram of a convolutional structure 2 network structure according to a first embodiment of the present invention;
FIG. 4 is a schematic diagram of a network structure of an attention module according to a first embodiment of the present invention;
FIG. 5 is a schematic diagram of a network structure of a channel attention module according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a spatial attention module network according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the network structure of the discriminator according to the first embodiment of the present invention;
FIG. 8 is a diagram of paired data sets according to a first embodiment of the present invention;
FIG. 9 is a diagram showing the experimental comparison of the first embodiment of the present invention;
FIG. 10 is a partial detail view of an experiment in accordance with a first embodiment of the present invention;
FIG. 11 is a diagram of an image generated by two branches according to a first embodiment of the present invention;
FIG. 12 (a) is a graph showing the loss function of the Pix2Pix ablation experiment in accordance with the first embodiment of the present invention;
FIG. 12 (b) is a graph showing the loss function of the ablation experiment Pix2Pix + the second branch of this embodiment;
FIG. 12 (c) is a graph showing the loss function of the ablation experiment Pix2Pix + the second branch of this embodiment + CBAM;
FIG. 12 (d) is a graph showing the loss function of the ablation experiment Pix2Pix + the second branch of this embodiment + the attention module + the joint loss function.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
As shown in fig. 1, the present embodiment discloses a spine CT-to-MR conversion system based on a generative adversarial network, comprising:
an acquisition module, used for acquiring a spine CT image;
a spine CT feature extraction module, used for performing feature extraction on the acquired spine CT image with the constructed encoder of the generative adversarial network, wherein the encoder comprises a first branch and a second branch, the first branch extracting multi-scale global features of the spine CT image and the second branch extracting multi-scale edge features of the spine CT image;
a pseudo MR image generation module, used for fusing, in the decoder of the generative adversarial network, the global features extracted by the first branch with the edge features extracted by the second branch at the corresponding scales to obtain a pseudo MR image;
a training module, used for determining the differences among the spine CT image, the real MR image and the pseudo MR image with the discriminator of the generative adversarial network, calculating a loss function, and training the encoder-decoder of the network with that loss function;
an MR image generation module, used for obtaining, from a spine CT image to be converted, the corresponding MR image with the trained encoder-decoder of the generative adversarial network.
The present embodiment provides an adversarial network, S-P GAN, for spine CT-to-MR modality conversion. As shown in fig. 2, the generator of the S-P GAN is composed of an encoder and a decoder. The encoder has a dual-branch structure used to extract the feature information of the CT and to guarantee that the fine structural information of the CT is not lost: the first branch processes the CT image directly to obtain a global contour map, while the second branch first applies the Sobel operator to the CT to obtain its edge-information map and then extracts features from it. The information of the two branches is subsequently fused at the feature level so that the CT structure is preserved as far as possible.
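As a concrete illustration of the second branch's Sobel pre-processing, the following is a minimal PyTorch sketch (PyTorch and the function name are assumptions of this illustration; the kernel values are the standard Sobel operator):

```python
import torch
import torch.nn.functional as F

def sobel_edges(ct: torch.Tensor) -> torch.Tensor:
    """Return the Sobel gradient-magnitude map of a (N, C, H, W) CT tensor."""
    n, c, h, w = ct.shape
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], dtype=ct.dtype, device=ct.device)
    ky = kx.t()
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)  # one kernel per channel
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(ct, kx, padding=1, groups=c)   # horizontal gradients
    gy = F.conv2d(ct, ky, padding=1, groups=c)   # vertical gradients
    return torch.sqrt(gx * gx + gy * gy + 1e-8)  # edge-information map
```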
Specifically, the first branch of the encoder includes a first set of convolution structures, a first downsampling, a second set of convolution structures, a second downsampling, a third set of convolution structures, a third downsampling, a fourth set of convolution structures, and a fourth downsampling that are sequentially connected.
As shown in fig. 3 (a), the first, second, third and fourth groups of convolution structures are identical; each group comprises two of convolution structure 1, and each convolution structure 1 consists of a convolution layer, a normalization function and a ReLU activation function connected in sequence. The convolution kernel size of the convolution layer in convolution structure 1 is 3, the stride is 1 and the padding is 1.
As shown in fig. 3 (b), the first, second, third and fourth downsamplings have the same structure, comprising a convolution layer, a normalization and a ReLU activation function connected in sequence; the convolution kernel size of the convolution layer in the downsampling structure is 3, the stride is 2 and the padding is 1.
In this embodiment, a CT image of size 256×256×3 is changed to 256×256×64 by the first group of convolution structures and to 128×128×64 by the first downsampling; the outputs after the second, third and fourth groups of convolution structures (each followed by its downsampling) are then 64×64×128, 32×32×256 and 16×16×512 respectively.
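A minimal PyTorch sketch of these first-branch building blocks, assuming batch normalization for the unspecified "normalization"; the channel placement inside each group is illustrative:

```python
import torch
import torch.nn as nn

def conv_structure_1(in_ch: int, out_ch: int) -> nn.Sequential:
    """'Convolution structure 1': 3x3 conv (stride 1, padding 1) + norm + ReLU."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

def conv_group(in_ch: int, out_ch: int) -> nn.Sequential:
    """A group of convolution structures: two convolution structure 1 blocks."""
    return nn.Sequential(conv_structure_1(in_ch, out_ch),
                         conv_structure_1(out_ch, out_ch))

def downsample(ch: int) -> nn.Sequential:
    """Downsampling: 3x3 conv with stride 2 and padding 1, halving H and W."""
    return nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1),
                         nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

# Shape walk-through matching the sizes reported above:
x = torch.randn(1, 3, 256, 256)       # input CT
f1 = conv_group(3, 64)(x)             # 256 x 256 x 64
d1 = downsample(64)(f1)               # 128 x 128 x 64
f2 = conv_group(64, 128)(d1)          # 128 x 128 x 128
d2 = downsample(128)(f2)              # 64 x 64 x 128, as reported above
```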
A high-frequency information branch is introduced as the second branch of the encoder to reduce the interference of low-frequency CT information with the fine edge information of bone tissue. Adding a Sobel operator to the second branch of the encoder forms this high-frequency information branch, enhancing the structural preservation of the CT. The second branch of the encoder comprises a fifth group of convolution structures, a fifth downsampling, a sixth group of convolution structures, a sixth downsampling, a seventh group of convolution structures, a seventh downsampling, an eighth group of convolution structures and an eighth downsampling, connected in sequence.
The fifth, sixth, seventh and eighth groups of convolution structures are identical to one another, and are the same as the first, second, third and fourth groups of convolution structures.
The first through eighth downsamplings all have the same structure.
In this embodiment, concatenating the multi-scale results of the second branch of the encoder with the same-scale results of the first branch greatly enhances the network's sensitivity to edge details and preserves the fine structure of the CT. Specifically, the outputs of the first group of convolution structures are spliced with those of the fifth group, the outputs of the second group with those of the sixth group, the outputs of the third group with those of the seventh group, and the outputs of the fourth group with those of the eighth group.
The decoder in this embodiment comprises a first attention module, a second attention module, a third attention module, a fourth attention module, final convolution structures and a Tanh activation function connected in sequence, and generates through the decoder a pseudo MR of the same size as the real MR.
The first, second, third and fourth attention modules have the same structure, namely a CBAM; each comprises an attention mechanism, a first convolution structure, a second convolution structure and an upsampling connected in sequence. As shown in fig. 3 (c), the upsampling comprises a deconvolution, a normalization and a ReLU activation function connected in sequence; it is a transposed convolution with a kernel size of 3, a stride of 2 and a padding of 1.
The first and second convolution structures are both of convolution structure 1, which comprises a convolution layer, a normalization and a ReLU activation function connected in sequence.
The final convolution structures comprise a third, a fourth and a fifth convolution structure, all of convolution structure 2. As shown in fig. 3 (d), convolution structure 2 comprises a convolution layer, a normalization and an activation function connected in sequence. Convolution structures 1 and 2 differ in how the channel count changes from input to output: convolution structure 1 doubles the number of channels, whereas convolution structure 2 halves it.
In this embodiment, the spliced outputs of the fourth and eighth downsamplings are input to the first attention module; the spliced outputs of the fourth group of convolution structures, the eighth group of convolution structures and the first attention module are input to the second attention module; the spliced outputs of the third group, the seventh group and the second attention module are input to the third attention module; the spliced outputs of the second group, the sixth group and the third attention module are input to the fourth attention module; and the spliced outputs of the first group, the fifth group and the fourth attention module are input to the final convolution structures.
To fuse the feature information at the four scales of the encoder's first branch with that of the second branch, this embodiment guides the decoder through the attention modules, which adaptively modify the CT feature tensors in the spatial and channel dimensions. By assigning channel and spatial weights, the attention module makes the model attend more strongly to the features.
In this embodiment, the spliced input to the first attention module has size 16×16×1024; it becomes 16×16×512 through the convolution structures in the first attention module and is then enlarged by the upsampling operation. After the same splicing and attention operations, the output sizes of the second, third and fourth attention modules are restored to 32×32×256, 64×64×128 and 128×128×64 in turn, and an image of 256×256×3, the same size as the input, is finally output through the final convolution structures and the Tanh activation function.
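The sketch below illustrates one such decoder stage in PyTorch (attention, convolution structures with channel halving, then transposed-convolution upsampling). The attention argument is a placeholder for the CBAM sketched after the attention-mechanism description below, and the output_padding needed for exact size doubling is an implementation assumption:

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """One decoder stage: attention, two channel-halving convolution structures,
    then transposed-convolution upsampling."""
    def __init__(self, in_ch: int, attention: nn.Module = None):
        super().__init__()
        self.attn = attention if attention is not None else nn.Identity()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, in_ch // 2, 3, stride=1, padding=1),
            nn.BatchNorm2d(in_ch // 2), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // 2, in_ch // 2, 3, stride=1, padding=1),
            nn.BatchNorm2d(in_ch // 2), nn.ReLU(inplace=True))
        self.up = nn.Sequential(  # transposed conv: kernel 3, stride 2, padding 1
            nn.ConvTranspose2d(in_ch // 2, in_ch // 2, 3, stride=2,
                               padding=1, output_padding=1),
            nn.BatchNorm2d(in_ch // 2), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.up(self.conv(self.attn(x)))

stage = DecoderStage(1024)
y = stage(torch.randn(2, 1024, 16, 16))  # -> (2, 512, 32, 32)
```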
As shown in figs. 4-6, the attention mechanism comprises a channel attention module and a spatial attention module. The channel attention module comprises, in sequence, a max-pooling layer, an average-pooling layer, a multi-layer perceptron and an activation function. It performs max-pooling and average-pooling on the attention module's input features to obtain a first and a second feature map, passes each through the multi-layer perceptron to obtain a third and a fourth feature map, adds the third and fourth feature maps, and applies a Sigmoid activation function to obtain the channel attention weights.
The spatial attention module comprises a max-pooling layer, an average-pooling layer and a 7×7 convolution layer connected in sequence. Its input passes through the max-pooling and average-pooling layers to obtain a fifth feature map; after splicing along the channel dimension, the spatial weights are obtained through a convolution layer with a 7×7 kernel and a Sigmoid activation function.
The input features of the attention mechanism pass through the channel attention module to obtain the channel weights. These weights may, however, be large or small, and weighting with them directly can cause locally extreme deviations, so the channel weights are normalized into a fixed range to obtain normalized weights. The normalized weights are used to reweight and reconstruct the input features of the attention mechanism; the reconstructed features serve as the input of the spatial attention module, and the output of the spatial attention module is in turn used to reweight the channel-attention-reconstructed features, yielding the refined output features of the attention mechanism.
The channel weights of the feature tensor are obtained in the channel dimension, and the normalized weights are obtained through a Sigmoid function and BatchNorm.
In the spatial attention module, the inputs undergo global average pooling and global max pooling and are concatenated into a 2×H×W feature map; the spatial attention weights are then obtained through the subsequent convolution structure.
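Putting the channel and spatial attention together, the following is a minimal CBAM-style sketch in PyTorch; the MLP reduction ratio of 16 and the use of BatchNorm1d for the weight normalization are assumptions of this illustration:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Max/avg pooled descriptors -> shared MLP -> sum -> BatchNorm + Sigmoid."""
    def __init__(self, ch: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(ch // reduction, ch))
        self.norm = nn.BatchNorm1d(ch)  # normalizes weights into a stable range

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.mlp(x.amax(dim=(2, 3))) + self.mlp(x.mean(dim=(2, 3)))
        w = torch.sigmoid(self.norm(w))      # normalized channel weights
        return x * w.view(n, c, 1, 1)        # reweight and reconstruct

class SpatialAttention(nn.Module):
    """Channel-wise max/avg maps -> concat (2xHxW) -> 7x7 conv -> Sigmoid."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        maps = torch.cat([x.amax(dim=1, keepdim=True),
                          x.mean(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(maps))  # spatially reweighted features

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as described above."""
    def __init__(self, ch: int):
        super().__init__()
        self.ca, self.sa = ChannelAttention(ch), SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```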
In this embodiment, the discriminator, serving as a tool to assist generator training, has a PatchGAN structure, shown in fig. 7. The discriminator has two sets of inputs: the first is the spliced image of the CT and the real MR, and the second is the spliced image of the CT and the pseudo MR. The discriminator builds a patch matrix to measure the authenticity of the synthetic MR.
Specifically, the discriminator comprises a first convolution layer, an activation function, a ninth downsampling, a tenth downsampling, an eleventh downsampling and a second convolution layer which are connected in sequence.
The ninth, tenth and eleventh downsamplings have the same structure, each comprising a third convolution layer, a normalization function and an activation function connected in sequence; through downsampling, an input feature of dimensions (C, H, W) is changed into one of dimensions (C, H/2, W/2), where C, H and W are the number of channels, the height and the width of the image.
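A minimal PyTorch sketch of such a PatchGAN discriminator; the channel widths and the LeakyReLU activation are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """First conv + activation, three downsamplings, then a final conv that
    produces a patch matrix of realness scores."""
    def __init__(self, in_ch: int = 6, base: int = 64):
        super().__init__()
        def down(ci, co):  # ninth/tenth/eleventh downsampling: conv + norm + activation
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(co), nn.LeakyReLU(0.2, inplace=True))
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 3, stride=1, padding=1),   # first convolution layer
            nn.LeakyReLU(0.2, inplace=True),                  # activation function
            down(base, base * 2),
            down(base * 2, base * 4),
            down(base * 4, base * 8),
            nn.Conv2d(base * 8, 1, 3, stride=1, padding=1))   # second convolution layer

    def forward(self, ct: torch.Tensor, mr: torch.Tensor) -> torch.Tensor:
        # splice the CT with a real or pseudo MR along the channel dimension
        return self.net(torch.cat([ct, mr], dim=1))
```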
This embodiment proposes a new joint loss function to guide the CT-to-MR translation. The joint loss consists of three parts: a structural consistency loss $L_{sc}$, a pixel translation loss $L_{pt}$ and an adversarial loss $L_{adv}$. The total loss function of the generator is $L_{G} = \lambda_{sc} L_{sc} + \lambda_{pt} L_{pt} + \lambda_{adv} L_{adv}$, where $\lambda_{sc}$, $\lambda_{pt}$ and $\lambda_{adv}$ are the weights of the structural consistency loss, the pixel translation loss and the adversarial loss respectively. The loss function of the discriminator is $L_{D}$.
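As a small illustration of how the three terms combine, a minimal Python sketch (the weight values are placeholders; the text does not state them):

```python
def generator_loss(l_sc, l_pt, l_adv, w_sc=1.0, w_pt=1.0, w_adv=1.0):
    """L_G = w_sc * L_sc + w_pt * L_pt + w_adv * L_adv (placeholder weights)."""
    return w_sc * l_sc + w_pt * l_pt + w_adv * l_adv
```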
For the structural consistency loss: the desired pseudo MR image $\hat{y}$ should have the same structure as the input CT image $x$. However, the tissue structure of the CT may change during modality translation. To solve this problem, this embodiment proposes a structural consistency loss $L_{sc}$, which uses spatial correlation maps to enforce structural consistency between the CT and the high-frequency CT edge image.
From a feature tensor, an element is randomly extracted and its spatial correlation map is computed. The same operation is performed for the CT image $x$, for the high-frequency CT edge image $x_{h}$ output by the second branch after Sobel processing, and for the pseudo MR image $\hat{y}$, yielding $S_{x}$, $S_{x_{h}}$ and $S_{\hat{y}}$.
The computation can be performed at the feature-tensor level. For the input features $f_{x}$, $f_{x_{h}}$ and $f_{\hat{y}}$, the maps at a pixel $i$ are respectively:

$S_{x}(i) = f_{x}(i)^{T} f_{x}(i_{*})$,  $S_{x_{h}}(i) = f_{x_{h}}(i)^{T} f_{x_{h}}(i_{*})$,  $S_{\hat{y}}(i) = f_{\hat{y}}(i)^{T} f_{\hat{y}}(i_{*})$  (1)

where $f_{x}$, $f_{x_{h}}$ and $f_{\hat{y}}$ are the feature tensors of $x$, $x_{h}$ and $\hat{y}$, each of size C×H×W; $i$ is a randomly chosen pixel of the input feature, $i_{*}$ denotes the remaining points other than $i$, the superscript T denotes transpose, and C, H and W are the number of channels, the height and the width of the image.
Then the cosine distance between $S_{x}$ and $S_{\hat{y}}$ and the cosine distance between $S_{x_{h}}$ and $S_{\hat{y}}$ are calculated, and their average is taken as $L_{sc}$:

$d_{1} = d_{cos}(S_{x}, S_{\hat{y}})$,  $d_{2} = d_{cos}(S_{x_{h}}, S_{\hat{y}})$  (2)

$L_{sc} = \frac{1}{2}(d_{1} + d_{2})$  (3)

where $d_{1}$ is the cosine distance between $S_{x}$ and $S_{\hat{y}}$, and $d_{2}$ is the cosine distance between $S_{x_{h}}$ and $S_{\hat{y}}$.
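A minimal PyTorch sketch of this loss under the reconstruction above; the number of sampled pixels and the feature source are illustrative:

```python
import torch
import torch.nn.functional as F

def correlation_map(feat: torch.Tensor, idx: int) -> torch.Tensor:
    """feat: (C, H, W). Correlation of the feature vector at pixel `idx`
    with the feature vectors at all pixels, per formula (1)."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f[:, idx] @ f  # (H*W,) spatial correlation map

def structural_consistency_loss(f_ct, f_edge, f_fake, n_samples: int = 64):
    """Average, over sampled pixels, of the two cosine distances in (2)-(3)."""
    hw = f_ct.shape[1] * f_ct.shape[2]
    idxs = torch.randint(0, hw, (n_samples,))
    total = f_ct.new_zeros(())
    for i in idxs.tolist():
        s_ct, s_edge, s_fake = (correlation_map(f, i)
                                for f in (f_ct, f_edge, f_fake))
        d1 = 1 - F.cosine_similarity(s_ct, s_fake, dim=0)    # CT vs pseudo MR
        d2 = 1 - F.cosine_similarity(s_edge, s_fake, dim=0)  # edge image vs pseudo MR
        total = total + (d1 + d2) / 2
    return total / n_samples
```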
For the pixel translation loss: the discriminator, as a tool assisting generator training, determines the differences between the real MR and the pseudo MR at the pixel level. Such information is normally used only for the adversarial loss, but it can also guide the quality of the synthetic MR. To generate a more realistic synthetic MR, this embodiment therefore proposes the pixel translation loss, which scales an $L_{1}$ loss so that the network loss decreases more steadily during training:

$L_{pt} = \mathbb{E}_{x,y}\left[\, P \cdot \lVert y - G(x) \rVert_{1} \,\right]$  (4)

where $x$ is the input CT image, $y$ is the real MR, $G(x)$ is the generator's output for $x$, $P$ denotes the prediction on a pixel, and $\mathbb{E}$ denotes expectation.
The output of the discriminator is interpolated back to the size of the input image and used for further prediction at the pixel level:

$P = \mathrm{detach}\!\left( \sigma\big( U( D(x, G(x)) ) \big) \right)$  (5)

where $D(x, G(x))$ is the discriminator's output for the pair, $U$ interpolates the discriminator output back to the input image size, $\sigma$ is the normalizing activation function, and the gradient is detached to avoid interference with the generator.
Connecting the two parts above, a high-quality synthetic MR is closer to the real MR, so the scaling applied to it is smaller than that applied to a low-quality synthetic MR, and the influence of low-quality synthetic MR is thereby greatly reduced.
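A minimal PyTorch sketch of the pixel translation loss under (4)-(5); the orientation of the weighting is an assumption of this illustration:

```python
import torch
import torch.nn.functional as F

def pixel_translation_loss(disc, ct, real_mr, fake_mr):
    """Weighted per-pixel L1 loss per (4)-(5): the discriminator's patch matrix
    is interpolated back to image size (U), squashed by a sigmoid (the
    normalizing activation), and detached so no gradient reaches the
    discriminator through the weight map."""
    patch = disc(ct, fake_mr)                                  # N x 1 x h x w
    weight = torch.sigmoid(
        F.interpolate(patch, size=real_mr.shape[-2:],
                      mode='bilinear', align_corners=False)).detach()
    return (weight * (real_mr - fake_mr).abs()).mean()
```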
For the adversarial loss: ordinarily the generator and the discriminator share the adversarial loss $L_{adv}$, which is the key to adversarial training; it lets the generator and the discriminator improve in a competitive relationship. Here the loss function is modified to make it more robust. During training, the discriminator's output probability for generated images is pushed toward 0:

$L_{adv}^{D} = \mathbb{E}_{y}\left[ (D(y) - 1)^{2} \right] + \mathbb{E}_{x}\left[ D(G(x))^{2} \right]$  (6)

where $x$ is the input CT image, $y$ is the real MR, $D(y)$ is the discriminator's judgment of $y$, and $G(x)$ is the generator's output interpolation for $x$.

At the same time, the generator is trained so that the discriminator judges its output as real, i.e. with the generator-side adversarial loss:

$L_{adv}^{G} = \mathbb{E}_{x}\left[ (D(G(x)) - 1)^{2} \right]$  (7)

where $D(G(x))$ denotes the discriminator's judgment of the interpolation output by the generator $G(x)$.
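A minimal PyTorch sketch of this adversarial pair under the least-squares reconstruction in (6)-(7); the least-squares form and the helper names are assumptions of this illustration:

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()  # least-squares form matching the reconstruction in (6)-(7)

def discriminator_loss(disc, ct, real_mr, fake_mr):
    """(6): push D's output toward 1 for real pairs and toward 0 for generated pairs."""
    real_out = disc(ct, real_mr)
    fake_out = disc(ct, fake_mr.detach())  # detach: do not update the generator here
    return mse(real_out, torch.ones_like(real_out)) + \
           mse(fake_out, torch.zeros_like(fake_out))

def generator_adversarial_loss(disc, ct, fake_mr):
    """(7): the generator tries to make D judge its output as real."""
    fake_out = disc(ct, fake_mr)
    return mse(fake_out, torch.ones_like(fake_out))
```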
In this embodiment, as shown in fig. 8, the paired dataset was acquired from spine patients with actual clinical visits; its acquisition obtained the patients' consent and was legally compliant. Since the number of MR slices is smaller than the number of CT slices, automatic matching is not possible; moreover, training the network in an unsupervised manner on mismatched data cannot directly provide end-to-end feature information. Therefore, under the guidance of a specialist, the CT and MR data were registered with the 3D Slicer software. After registration, each CT slice has its corresponding MR slice, and the network is trained by supervised learning. There are 924 CT-MR pairs in total, comprising 740 training pairs and 184 test pairs.
This example trains on two RTX 3090 graphics cards under the Python framework. The batch size is set to 2, and the input CT image is padded to a fixed size for training, after which the synthetic MR is restored to the original size of the input CT. The adaptive momentum algorithm is used for optimization, with the momentum set to 0.5. The experiment runs for a total of 200 epochs: a learning rate of 0.01 is used for the first 100 epochs, while in the last 100 epochs the learning rate decreases linearly every 10 iterations.
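A minimal Python sketch of the reported optimization schedule; the decay granularity is simplified to a per-epoch ramp, and the Adam call in the trailing comment is an assumption (the text says only "adaptive momentum algorithm" with momentum 0.5):

```python
def learning_rate(epoch: int, base_lr: float = 0.01, total_epochs: int = 200) -> float:
    """Reported schedule: constant 0.01 for the first 100 epochs, then a linear
    decrease over the last 100 (the per-10-iteration granularity is simplified
    here to a per-epoch linear ramp down to zero)."""
    half = total_epochs // 2
    if epoch < half:
        return base_lr
    return base_lr * (total_epochs - epoch) / half

# e.g. opt = torch.optim.Adam(model.parameters(), lr=learning_rate(0), betas=(0.5, 0.999))
```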
The present embodiment uses the 740 pairs of CT and MR data to train the S-P GAN as well as other mainstream modality conversion algorithms. The algorithms used in the comparison experiments were CycleGAN, cGAN, LSeSim, Pix2Pix, CUT and APS.
Qualitative and quantitative comparisons between the S-P GAN and existing methods are reported as mean absolute error (MAE), peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). The three evaluation indexes for each algorithm are shown in Table 1. Overall, the S-P GAN achieved the best results on all three indexes. The SSIM result demonstrates the S-P GAN's high sensitivity to structure and its effective maintenance of structural consistency. For PSNR, the S-P GAN result exceeded 40, indicating that the synthetic MR is close to the ground-truth MR. The MAE result likewise shows that S-P GAN modality translation has the highest accuracy.
Table 1:
In the comparison of the evaluation indexes, APS is closest to the S-P GAN. Figure 9 compares the inference results of APS with those of the S-P GAN. The figure shows the mid-sagittal position of the lumbar spine; from top to bottom are the original CT image, the results of the APS algorithm, the results of the S-P GAN and the true MR, and from left to right are the lumbar vertebral body, the lumbar lamina, the lumbar spinal nerves (cauda equina) and a herniated disc. As shown in fig. 9, example 1 shows the vertebral body of a patient with a lumbar fracture. Compared with APS, the S-P GAN result displays the bone-marrow signal within the vertebral body more clearly and accurately, which helps determine whether a compression fracture is new or old. The circle in example 2 marks sagittal images of the lumbar lamina, the spinous process and part of the ligamentum flavum. The S-P GAN has a clear advantage over APS in depicting the ligamentum flavum and spinous process and can display the ligamentum flavum distinctly, whereas APS fails to separate bony structures from soft-tissue structures such as the ligamentum flavum, leading to blurred contours of low clinical reference value. Example 3 shows the cauda equina in the lumbar spine, a nerve structure that is easily compressed by a herniated disc. The figure shows a patient with spondylolisthesis whose cauda equina is not compressed; the APS result, however, renders the cauda equina, which should appear continuous, as a series of broken, "fracture"-like segments, which would greatly affect the clinical judgment of the severity of the patient's disease. In contrast, the S-P GAN result shows the cauda equina with good continuity and no compression. Example 4 shows sagittal images of an L4-5 spondylolisthesis and a T12-L4 disc herniation. In the S-P GAN result, the shape of the disc herniation is closer to that of the real MR, demonstrating the excellent performance of the S-P GAN in rendering image detail.
To better reflect the advantages of the S-P GAN, local details are shown in fig. 10. Examples 1-4 are sagittal lumbar images showing the algorithm's advantages in generating MR images. In particular, in example 3 there is cartilage endplate degeneration with Modic changes in the upper and lower endplates at the disc margin of the patient's L5-S1 segment, i.e. the L5 inferior endplate and the S1 superior endplate. This is a common imaging change in the clinical diagnosis and treatment of lumbar degenerative disease in middle-aged and elderly patients, and is useful for predicting patients' postoperative symptoms and functional changes. A conventional CT examination cannot determine whether the patient has Modic changes, whereas the synthetic MR image generated by this embodiment's algorithm can display the patient's lumbar Modic changes accurately, which is important and beneficial for clinical preoperative evaluation, and in particular for evaluating patients' postoperative follow-up imaging.
On the other hand, class activation maps (CAM) verify the behavior of the proposed network, as shown in fig. 11. The first branch attends to the global features of the input CT and migrates the whole CT modality toward MR, while the second branch attends to the detail features of the input CT, enhancing the network's sensitivity to CT edge details and preserving the CT edge structure. This demonstrates the effectiveness of the proposed network.
Ablation experiments: the effects of the proposed encoder, decoder and joint loss function were analyzed through three different configurations, and the necessity of each structure was confirmed experimentally. The evaluation indexes of each experiment are shown in Table 2. The improvement of the three values across configurations shows that the high-frequency information branch is effective for high structural sensitivity, that the attention module plays its role in the feature fusion of the dual branches, and that the joint loss function is significant as a training constraint and guide.
Table 2:
The loss curves of the four experiments are shown in figs. 12 (a)-12 (d). The loss curve of the encoder with the high-frequency information branch, shown in fig. 12 (b), has a more pronounced overall downward trend but still exhibits significant jumps; introducing the attention module to guide the fusion of the dual-branch features reduces these jumps. As shown in fig. 12 (d), the constraint of the joint loss function reduces local jumps out of minima, making the descent more stable and thus avoiding overfitting of the model. From this analysis it can be concluded that the proposed encoder, decoder and joint loss function yield a synthetic MR closer to the ground truth.
The results of this example demonstrate that the S-P GAN better accomplishes the spine CT-to-MR modality transformation. The design of the second branch lets the encoder obtain both global and high-frequency features, effectively preserving fine edge-information structures. In addition, the attention modules in the decoder effectively guide the fusion of the dual-branch features, improving translation quality, while the new joint loss function guides and constrains the adversarial training. Experiments show that the S-P GAN performs well on the lumbar lamina, spinous processes and parts of the ligamentum flavum, which matters for surgical selection and planning. The S-P GAN also shows clearer and more accurate bone-marrow signals within the vertebral body. More importantly, the translation results can be used to assess cartilage endplate degeneration and Modic changes, common imaging findings in the clinical diagnosis and treatment of middle-aged and elderly patients with lumbar degenerative disease, with good application in predicting patients' postoperative symptoms and functional changes. The method therefore has great application and economic value in clinical diagnosis.
Example 2
It is an object of the present embodiment to provide a spine CT-to-MR conversion method based on a generative adversarial network, comprising:
acquiring a spine CT image;
performing feature extraction on the acquired spine CT image with the constructed encoder of the generative adversarial network, wherein the encoder comprises a first branch and a second branch, the first branch extracting multi-scale global features of the spine CT image and the second branch extracting multi-scale edge features of the spine CT image;
fusing, in the decoder of the generative adversarial network, the global features extracted by the first branch with the edge features extracted by the second branch at the corresponding scales to obtain a pseudo MR image;
determining the differences among the spine CT image, the real MR image and the pseudo MR image with the discriminator of the generative adversarial network, calculating a loss function, and training the encoder-decoder of the network with that loss function;
obtaining, from the spine CT image to be converted, the corresponding MR image with the trained encoder-decoder of the generative adversarial network.
While the foregoing describes specific embodiments of the present invention in conjunction with the drawings, this description is not intended to limit the scope of the invention, but rather to cover all modifications or variations within the scope of the invention as defined by its claims.
Claims (7)
1. A spine CT-to-MR conversion system based on a generative adversarial network, comprising:
an acquisition module, used for acquiring a spine CT image;
a spine CT feature extraction module, used for performing feature extraction on the acquired spine CT image with the constructed encoder of the generative adversarial network, wherein the encoder comprises a first branch and a second branch, the first branch extracting multi-scale global features of the spine CT image and the second branch extracting multi-scale edge features of the spine CT image;
a pseudo MR image generation module, used for fusing, in the decoder of the generative adversarial network, the global features extracted by the first branch with the edge features extracted by the second branch at the corresponding scales to obtain a pseudo MR image;
a training module, used for determining the differences among the spine CT image, the real MR image and the pseudo MR image with the discriminator of the generative adversarial network, calculating a loss function, and training the encoder-decoder of the network with that loss function;
an MR image generation module, used for obtaining, from a spine CT image to be converted, the corresponding MR image with the trained encoder-decoder of the generative adversarial network;
in the training module: the differences among the spine CT image, the real MR image and the pseudo MR image are judged by the constructed discriminator of the generative adversarial network; the structural consistency loss between the pseudo MR image and the spine CT image, the pixel consistency between the real MR image and the pseudo MR image, and the adversarial loss of the generator formed by the encoder-decoder and the discriminator are calculated; and the generator of the generative adversarial network is trained with a loss function constructed from the structural consistency loss, the pixel consistency and the adversarial loss;
the first branch comprises a plurality of groups of convolution structures connected in sequence; global features of the spine CT image at different scales are extracted in turn through the groups of convolution structures, and a downsampling operation is performed on the extracted global features at each scale;
the second branch comprises a Sobel operator; the edge features of the spine CT image are extracted by the Sobel operator, and multi-scale edge features are then extracted from the Sobel edge features through a plurality of groups of convolution structures connected in sequence, with a downsampling operation performed on the extracted edge features at each scale.
2. The system of claim 1, wherein, in the pseudo MR image generation module, the decoder comprises a plurality of sequentially connected attention modules; the global features output by the first branch and the edge features output by the second branch are fed into the decoder, and each attention module in the decoder fuses the global features and the edge features at the corresponding scale.
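A hedged sketch of such a decoder, assuming the per-scale fusion proceeds deepest-first with residual upsampling; a plain convolutional fusion stands in here for the attention module that claims 3-4 detail (a full attention sketch follows claim 4), and the channel layout matches the encoder sketch above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDecoder(nn.Module):
    """Sketch of claim 2: fuse global and edge features scale by scale."""
    def __init__(self, chs=(128, 64, 32)):    # deepest -> shallowest channels
        super().__init__()
        self.fuse = nn.ModuleList(            # stand-in for the attention modules
            nn.Sequential(nn.Conv2d(2 * c, c, 3, padding=1), nn.ReLU(inplace=True))
            for c in chs)
        self.up = nn.ModuleList(
            nn.Conv2d(cin, cout, 1) for cin, cout in zip(chs[:-1], chs[1:]))
        self.out = nn.Conv2d(chs[-1], 1, kernel_size=1)   # pseudo MR slice

    def forward(self, globals_, edges):       # lists ordered shallow -> deep
        gs, es = list(reversed(globals_)), list(reversed(edges))
        e0 = F.interpolate(es[0], size=gs[0].shape[-2:], mode="bilinear",
                           align_corners=False)
        x = self.fuse[0](torch.cat([gs[0], e0], dim=1))
        for up, fuse, g, e in zip(self.up, self.fuse[1:], gs[1:], es[1:]):
            x = up(F.interpolate(x, size=g.shape[-2:], mode="bilinear",
                                 align_corners=False))
            e = F.interpolate(e, size=g.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = x + fuse(torch.cat([g, e], dim=1))        # fuse at this scale
        return torch.tanh(self.out(x))
```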
3. The spine CT-to-MR conversion system based on a generative adversarial network according to claim 2, wherein, in the pseudo MR image generation module, each attention module comprises a spatial attention module and a channel attention module; the channel attention module computes channel attention weights from the attention module's input features, the channel attention weights and the input features are each normalized and then fused by weighting, and the spatial attention module applies pooling and convolution operations to the weighted-fusion result to obtain spatial attention weights;
the spatial attention weights are then fused by weighting with the result of the weighted fusion of the normalized channel attention weights and the normalized input features, yielding the attention module's output features.
4. The spine CT-to-MR conversion system based on a generative adversarial network according to claim 3, wherein the channel attention module applies max pooling and average pooling to the attention module's input features to obtain a first feature map and a second feature map, passes the first and second feature maps through a multi-layer perceptron to obtain a third and a fourth feature map, and adds the third and fourth feature maps to obtain the channel attention weights;
the spatial attention module applies max pooling and average pooling in turn to the result of the weighted fusion of the normalized channel attention weights and the normalized input features to obtain a fifth feature map, and applies a convolution operation and an activation function to the fifth feature map to obtain the spatial attention weights (see the sketch after this claim).
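A CBAM-style sketch consistent with claims 3-4. Where the claims leave the exact form open, the choices below (sigmoid activations, instance normalization, and a residual-style weighted fusion) are assumptions, and the class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSpatialAttention(nn.Module):
    """CBAM-style sketch of claims 3-4; fusion/normalization details assumed."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(             # shared MLP of claim 4
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.norm = nn.InstanceNorm2d(channels)

    def forward(self, x):                     # x: (B, C, H, W)
        b, c, _, _ = x.shape
        # channel attention: max-/avg-pooled descriptors (first/second feature
        # maps) through the MLP (third/fourth), then added (claim 4)
        mx = self.mlp(F.adaptive_max_pool2d(x, 1).flatten(1))
        av = self.mlp(F.adaptive_avg_pool2d(x, 1).flatten(1))
        ca = torch.sigmoid(mx + av).view(b, c, 1, 1)   # channel attention weights
        # claim 3: normalize the weighted and unweighted features, then fuse
        # (this particular weighted-fusion form is an assumption)
        fused = self.norm(x * ca) + self.norm(x)
        # spatial attention: pooled maps (fifth feature map) -> conv -> activation
        pooled = torch.cat([fused.max(dim=1, keepdim=True).values,
                            fused.mean(dim=1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial_conv(pooled))  # spatial attention weights
        return fused * sa                              # attention module output
```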
5. The spine CT-to-MR conversion system based on a generative adversarial network according to claim 1, wherein, in the training module, the structural consistency loss is computed as follows:
spatial correlation maps are computed for the input spine CT image, for the high-frequency edge image produced by the Sobel operator in the second branch, and for the pseudo MR image;
the cosine distance between the spatial correlation map of the input spine CT image and that of the pseudo MR image is calculated, as is the cosine distance between the spatial correlation map of the high-frequency edge image and that of the pseudo MR image;
and the structural consistency loss is obtained from these cosine distances (see the sketch after this claim).
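Claim 5 leaves the exact form of the "spatial correlation map" to the description. The sketch below assumes it is a patch-wise self-similarity matrix (normalized inner products between non-overlapping patches) and that the two cosine distances are combined by averaging; both choices are assumptions.

```python
import torch
import torch.nn.functional as F

def spatial_correlation_map(img, patch=8):
    """Assumed form: pairwise cosine similarities between non-overlapping
    patches, i.e. a patch-wise self-similarity descriptor of the image."""
    patches = F.unfold(img, kernel_size=patch, stride=patch)   # (B, p*p, N)
    patches = F.normalize(patches, dim=1)
    return patches.transpose(1, 2) @ patches                   # (B, N, N)

def structural_consistency_loss(ct, edge, fake_mr, patch=8):
    """Cosine distances between the pseudo MR's correlation map and those of
    the CT image and its Sobel edge image (claim 5), averaged."""
    m_ct = spatial_correlation_map(ct, patch).flatten(1)
    m_edge = spatial_correlation_map(edge, patch).flatten(1)
    m_mr = spatial_correlation_map(fake_mr, patch).flatten(1)
    d_ct = 1 - F.cosine_similarity(m_ct, m_mr, dim=1)          # cosine distance
    d_edge = 1 - F.cosine_similarity(m_edge, m_mr, dim=1)
    return (d_ct + d_edge).mean()
```

With the encoder sketch above, the edge argument could be obtained as `SobelEdge()(ct)`, so this function can serve as the `struct_fn` term in the earlier training-step sketch.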
6. The spine CT-to-MR conversion system based on a generative adversarial network according to claim 4, wherein, in the training module, the adversarial loss is specifically the sum of the expectation of the discriminator's output on the real MR image and the expectation of the discriminator's output on the pseudo MR image produced by the encoder-decoder from the input spine CT image.
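One standard adversarial objective consistent with claim 6's "sum of expectations" is the classic GAN loss below, with generator $G$ (the encoder-decoder), discriminator $D$, CT input $x$ and real MR image $y$; whether the patent uses this logarithmic form or a least-squares or Wasserstein variant is not recoverable from the translated claim.

$$\mathcal{L}_{\mathrm{adv}} = \mathbb{E}_{y \sim p_{\mathrm{MR}}}\big[\log D(y)\big] + \mathbb{E}_{x \sim p_{\mathrm{CT}}}\big[\log\big(1 - D(G(x))\big)\big]$$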
7. A spine CT-to-MR conversion method based on a generative adversarial network, comprising:
acquiring a spine CT image;
performing feature extraction on the acquired spine CT image with the constructed encoder of the generative adversarial network, wherein the encoder comprises a first branch and a second branch, the first branch extracting multi-scale global features of the spine CT image and the second branch extracting multi-scale edge features of the spine CT image;
fusing, with the decoder of the generative adversarial network, the global features extracted by the first branch and the edge features extracted by the second branch at corresponding scales to obtain a pseudo MR image;
judging the differences among the spine CT image, the real MR image and the pseudo MR image with the discriminator of the generative adversarial network, calculating a loss function, and training the encoder-decoder of the generative adversarial network with the loss function; specifically: the differences among the spine CT image, the real MR image and the pseudo MR image are judged by the constructed discriminator of the generative adversarial network; a structural consistency loss between the pseudo MR image and the spine CT image, a pixel consistency loss between the real MR image and the pseudo MR image, and an adversarial loss of the generator (formed by the encoder-decoder) against the discriminator are calculated; and the loss function used to train the generator is constructed from the structural consistency loss, the pixel consistency loss and the adversarial loss;
obtaining, for a spine CT image to be converted, the corresponding MR image with the trained encoder-decoder of the generative adversarial network;
wherein the first branch comprises a plurality of sequentially connected convolution blocks, global features of the spine CT image at different scales are extracted in turn by the convolution blocks, and a downsampling operation is applied to the global features extracted at each scale; and
the second branch comprises a Sobel operator: the edge features of the spine CT image are first extracted by the Sobel operator, multi-scale edge features are then extracted from this Sobel edge map by a plurality of sequentially connected convolution blocks, and an upsampling operation is applied to the edge features extracted at each scale.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310919034.XA CN116630466B (en) | 2023-07-26 | 2023-07-26 | Spine CT-MR conversion method and system based on generation antagonism network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116630466A (en) | 2023-08-22 |
CN116630466B (en) | 2023-10-24 |
Family
ID=87610315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310919034.XA (granted as CN116630466B, active) | Spine CT-MR conversion method and system based on generation antagonism network | 2023-07-26 | 2023-07-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116630466B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117635451B (en) * | 2023-10-12 | 2024-09-10 | 中国石油大学(华东) | Multi-source multi-scale digital core image fusion method based on attention guidance |
CN118297859A (en) * | 2024-02-27 | 2024-07-05 | 宜昌市中心人民医院(三峡大学第一临床医学院、三峡大学附属中心人民医院) | Method for converting CT image of lumbar fracture into MR image |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210374947A1 (en) * | 2020-05-26 | 2021-12-02 | Nvidia Corporation | Contextual image translation using neural networks |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114072845A (en) * | 2019-06-06 | 2022-02-18 | 医科达有限公司 | SCT image generation using cycleGAN with deformable layers |
CN113538257A (en) * | 2021-06-15 | 2021-10-22 | 复旦大学 | Dual-domain U-net discriminator-based generation confrontation low-dose CT denoising method |
CN113674330A (en) * | 2021-07-12 | 2021-11-19 | 华南理工大学 | Pseudo CT image generation system based on generation countermeasure network |
CN114049408A (en) * | 2021-11-15 | 2022-02-15 | 哈尔滨工业大学(深圳) | Depth network model for accelerating multi-modality MR imaging |
CN114926382A (en) * | 2022-05-18 | 2022-08-19 | 深圳大学 | Generation countermeasure network for fused images, image fusion method and terminal equipment |
CN116452618A (en) * | 2023-03-31 | 2023-07-18 | 哈尔滨理工大学 | Three-input spine CT image segmentation method |
Non-Patent Citations (3)
Title |
---|
Yanmei Luo et al.; Edge-preserving MRI image synthesis via adversarial network with iterative multi-scale fusion; Neurocomputing; Vol. 452; p. 65, Sections 1, 2, 2.1; pp. 66-67, Sections 2.2, 2.4; p. 68, Section 2.5 * |
Su Liu et al.; Image translation based on asymmetric convolution blocks and multi-scale fusion discrimination; Information & Communications (No. 01); full text * |
Xi Qianyi et al.; Pseudo-CT image generation model from brain MR images based on a cycle-consistent generative adversarial network; Chinese Journal of Medical Imaging Technology; Vol. 39 (No. 02); pp. 264-269 * |
Also Published As
Publication number | Publication date |
---|---|
CN116630466A (en) | 2023-08-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116630466B (en) | Spine CT-MR conversion method and system based on generation antagonism network | |
Armanious et al. | MedGAN: Medical image translation using GANs | |
Frid-Adar et al. | Synthetic data augmentation using GAN for improved liver lesion classification | |
Sun et al. | Hierarchical amortized GAN for 3D high resolution medical image synthesis | |
US20200327721A1 (en) | Autonomous level identification of anatomical bony structures on 3d medical imagery | |
JP2023540910A (en) | Connected Machine Learning Model with Collaborative Training for Lesion Detection | |
Kim et al. | Automated vertebral segmentation and measurement of vertebral compression ratio based on deep learning in X-ray images | |
CN112071422B (en) | Lumbar vertebra lesion diagnosis system based on neural network | |
CN113160138A (en) | Brain nuclear magnetic resonance image segmentation method and system | |
Tay et al. | Ensemble-based regression analysis of multimodal medical data for osteopenia diagnosis | |
Selim et al. | CT image harmonization for enhancing radiomics studies | |
CN112633416A (en) | Brain CT image classification method fusing multi-scale superpixels | |
CN115953416A (en) | Automatic knee bone joint nuclear magnetic resonance image segmentation method based on deep learning | |
CN114581459A (en) | Improved 3D U-Net model-based segmentation method for image region of interest of preschool child lung | |
Lim et al. | Motion artifact correction in fetal MRI based on a Generative Adversarial network method | |
US11712192B2 (en) | Biomarker for early detection of alzheimer disease | |
Chan et al. | A Super-Resolution Diffusion Model for Recovering Bone Microstructure from CT Images | |
Hou et al. | An MRI image automatic diagnosis model for lumbar disc herniation using semi-supervised learning | |
CN116385317A (en) | Low-dose CT image recovery method based on self-adaptive convolution and transducer mixed structure | |
Liu | Chronological age estimation of lateral cephalometric radiographs with deep learning | |
Xu et al. | Correlation via synthesis: End-to-end image generation and radiogenomic learning based on generative adversarial network | |
US7359539B2 (en) | Method for herniated inter-vertebral disc diagnosis based on image analysis of transverse sections | |
You et al. | Generative ai enables synthesizing cross-modality brain image via multi-level-latent representation learning | |
US20230289967A1 (en) | Device for diagnosing spine conditions | |
CN115578285B (en) | Mammary gland molybdenum target image detail enhancement method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||