CN116645380A - Automatic segmentation method for esophageal cancer CT image tumor area based on two-stage progressive information fusion - Google Patents
- Publication number
- CN116645380A (Application No. CN202310688086.0A)
- Authority
- CN
- China
- Prior art keywords
- layer
- image
- esophageal cancer
- convolution
- decoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/80—Geometric correction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/20—ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to an automatic segmentation method for tumor regions in esophageal cancer CT images based on two-stage progressive information fusion, which overcomes the difficulty of automatically segmenting esophageal cancer CT images in the prior art. The method comprises the following steps: acquiring and preprocessing esophageal cancer CT images; constructing an esophageal cancer CT image segmentation model; training the esophageal cancer CT image segmentation model; acquiring and preprocessing the esophageal cancer CT image to be segmented; and obtaining the esophageal cancer CT image segmentation result. Considering that esophageal CT images suffer from heavy noise, low resolution and artifacts, the invention extracts features with an image super-resolution reconstruction network and progressively fuses them into the segmentation network, effectively enhancing the quality of the esophageal CT image so that the network can extract richer detail features, enabling effective segmentation and delineation of the esophageal cancer target region and improving segmentation accuracy and efficiency.
Description
Technical Field
The invention relates to the technical field of medical image segmentation, in particular to an automatic segmentation method for esophageal cancer CT image tumor regions based on two-stage progressive information fusion.
Background
Esophageal cancer is a highly invasive malignancy that predominantly affects men and includes esophageal squamous cell carcinoma and esophageal adenocarcinoma, which differ in pathological characteristics and distribution. Worldwide, squamous cell carcinoma remains the most common type. Esophageal resection combined with three-field lymph node dissection has reached the limit of local control and requires further technical development. In addition, esophageal cancer is highly invasive, and lymphatic and hematogenous metastases can occur at an early stage. Because early symptoms are not obvious, most patients already have advanced tumors by the time symptoms such as difficulty swallowing or hoarseness appear, and the window for surgical resection has been missed. For patients who cannot undergo surgery, single-modality treatments such as chemotherapy or targeted therapy alone achieve poor results, and the five-year survival rate is low. By contrast, multimodal comprehensive treatment combining chemotherapy, radiotherapy and endoscopic treatment is becoming the mainstream; it offers long-term survival to some patients with advanced esophageal cancer and plays an important role when surgical resection is unsuitable. One of the main problems in radiotherapy is determining the location of the tumor, which requires tools that assist in localization. Computed tomography is therefore widely used in radiotherapy planning.
Accurate radiotherapy requires accurate determination and delineation of the radiotherapy target region. Currently, target delineation is mainly performed manually by experienced physicians and medical physicists, and its accuracy depends on the physician's level of experience. This work is cumbersome and time-consuming; even an experienced physician may take two days to annotate a single set of images.
Automatically delineating target regions on medical images has therefore become a popular topic in computer vision. For radiotherapy target volume delineation, however, no mature and practical solution currently exists, owing to the complexity of the esophagus and surrounding organs. To improve physicians' working efficiency and realize precise treatment of esophageal cancer, automatic delineation of esophageal cancer tumor target regions has become an urgent problem to be solved.
Esophageal cancer medical image segmentation based on deep learning is a breakthrough technology: it can automatically classify, identify and segment organs and tumor target regions, and can reveal internal image information that is difficult for physicians to detect. In the diagnosis of esophageal cancer, AI-assisted image analysis allows physicians to detect cancers quickly and effectively, saving diagnosis time. Recent studies have also shown that this technique offers satisfactory robustness and potential.
However, automatic delineation of esophageal cancer tumor target volumes or clinical target volumes remains a challenging task. Segmentation of the tumor region depends on the contrast between the tumor and the surrounding tissue in CT images; because of the diverse morphology of esophageal cancer lesions, the variability of their location and the complexity of the surrounding tissue, conventional deep learning algorithms struggle to capture all of the tumor's details and features.
Therefore, how to automatically segment esophageal cancer CT images with complex and variable lesion appearances has become a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to overcome the difficulty of automatically segmenting esophageal cancer CT images in the prior art, and provides an automatic segmentation method for esophageal cancer CT image tumor regions based on two-stage progressive information fusion to solve this problem.
In order to achieve the above object, the technical scheme of the present invention is as follows:
an automatic segmentation method for tumor areas of esophageal cancer CT images based on two-stage progressive information fusion comprises the following steps:
11) Acquisition and preprocessing of esophageal cancer CT images: acquiring CT images in DICOM format, performing data enhancement on the CT image data of the esophageal cervical region and the esophageal abdominal region, and slicing all CT images, i.e., cutting the three-dimensional DICOM CT volumes to obtain two-dimensional CT image slices in jpg format and binarized label images in png format, so as to form an esophageal cancer CT image dataset;
12) Construction of an esophageal cancer CT image segmentation model: constructing an esophageal cancer CT image segmentation model based on the two-stage progressive information fusion technique;
13) Training of the esophageal cancer CT image segmentation model: inputting the esophageal cancer CT image dataset into the esophageal cancer CT image segmentation model for training;
14) Acquisition and preprocessing of the esophageal cancer CT image to be segmented;
15) Acquisition of the esophageal cancer CT image segmentation result: inputting the preprocessed esophageal cancer CT image to be segmented into the trained esophageal cancer CT image segmentation model to obtain the segmented esophageal cancer CT image.
The construction of the esophageal cancer CT image segmentation model comprises the following steps:
21) Setting an esophageal cancer CT image segmentation model comprising a Swin Transformer network model for super-resolution reconstruction and a TransResUNet convolutional neural network model; the feature map output by the super-resolution Swin Transformer and the original image undergo progressive information fusion through a splicing operation and are then input into the TransResUNet convolutional neural network model for segmentation to obtain the final segmentation map;
22) Setting a Swin Transformer network model comprising 6 residual Swin Transformer blocks (RSTB) and one residual connection structure, wherein each RSTB consists of 6 Swin Transformer Layers, one convolution layer and one residual connection;
23) Setting a TransResUNet convolutional neural network model:
231) The TransResUNet convolutional neural network model comprises a downsampling encoder module for feature extraction, a feature pyramid ASPP module for obtaining receptive fields at different scales, and an upsampling decoder module for recovering image resolution;
232) Setting a downsampling encoder module comprising 4 consecutive downsampling structures with residuals;
the downsampling structure with the residual comprises a branch A and a branch B, wherein the branch A is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3, and a batch normalization layer stack which is used as the branch A of the residual structure;
the branch B is a convolution layer with a convolution kernel of 1 multiplied by 1, and a batch of normalization layers are stacked to be used as another branch B of a residual structure;
the two branches are added, and finally pass through a LeakyRelu layer;
233) Setting a feature pyramid ASPP module, which comprises:
a first operation module: a convolution layer with a 1×1 kernel;
a second operation module: a 3×3 convolution layer with a dilation rate of 6;
a third operation module: a 3×3 convolution layer with a dilation rate of 12;
a fourth operation module: a 3×3 convolution layer with a dilation rate of 18;
a fifth operation module: an adaptive average pooling layer, a convolution layer with a 1×1 kernel, and an upsampling operation;
the five operation modules are connected in parallel, the resulting 5 feature maps are spliced, and the spliced features pass through a 1×1 convolution layer;
234) Setting an upsampling decoder module comprising 4 consecutive upsampling structures with residuals and splicing connections with the output branches of the encoder's four residual downsampling structures;
the four residual downsampling structures of the encoder result in four different sized outputs respectively,
the output of the fourth size, input to the feature pyramid ASPP module and spliced to the first input of the decoder,
the output of the third size is spliced to the second input of the decoder,
the output of the second size is spliced to a third input of the decoder,
the output of the first size is spliced to a fourth input of the decoder;
the decoder structure is four consecutive upsampled blocks with residuals,
the upsampled block structure with residual is:
a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a LeakyRelu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure,
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform an addition operation and finally pass through the LeakyRelu layer in one decoder block.
The training of the esophageal cancer CT image segmentation model comprises the following steps of:
31 Inputting the esophageal cancer CT image data set into a Swin Transformer network model of the esophageal cancer CT image segmentation model, and outputting a feature map from the Swin Transformer network model;
inputting the image into the Swin Transformer network model for super-resolution reconstruction: a convolution layer with a 1×1 kernel is executed first, the result passes through 6 consecutive RSTB modules and another 1×1 convolution layer and is combined with the shallow features through a residual connection, and a LeakyReLU layer, an upsampling operation and a final 1×1 convolution layer then yield the feature map;
32 Performing progressive information fusion on the feature map output by the Swin Transformer and the original map through splicing operation to obtain a spliced feature map;
33 Inputting the spliced characteristic diagram into a TransResunet convolutional neural network model;
34 Training the spliced feature map in a downsampling encoder module:
341 Inputting the spliced feature map into a first downsampling structure with residual errors, wherein a branch A of the first downsampling structure with residual errors is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3 and a batch normalization layer stack; the first branch B with the residual downsampling structure is a convolution layer with a convolution kernel of 1 multiplied by 1 and a batch normalization layer stack;
the two branches perform addition operation, and finally a LeakyRelu layer is executed to obtain a first downsampled output;
342 Feeding the first downsampled output into a second downsampled structure with residual errors, adding the branch A and the branch B of the second downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a second downsampled output;
343 Feeding the second downsampled output into a third downsampled structure with residual errors, adding the branch A and the branch B of the third downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a third downsampled output;
344 Feeding the third downsampled output into a fourth downsampled structure with residual errors, adding the branch A and the branch B of the fourth downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a fourth downsampled output;
35 The four downsampled outputs are input to a decoder module to carry out upsampling to recover the resolution of the image;
36 A fourth downsampled output is input to the feature pyramid ASPP module and spliced to the first input of the decoder;
37 For a first input of the decoder, performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a LeakyRelu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform addition operation, and finally pass through a LeakyRelu layer in one decoder module to obtain a second input of the decoder;
38 A third downsampled output is spliced to a second input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a second input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform addition operation, and finally pass through a LeakyRelu layer in one decoder module to obtain a third input of the decoder;
39 A second downsampled output is spliced to a third input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a third input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform addition operation, and finally pass through a LeakyRelu layer in a decoder module to obtain a fourth input of the decoder;
310) The first downsampled output is spliced to a fourth input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a fourth input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches are added, and finally, a final output of the TransResunet is obtained through a LeakyRelu layer in a decoder module, a convolution layer which is subjected to double up-sampling and a convolution kernel of 1 multiplied by 1;
311 Forward propagation to obtain segmentation probability;
312) Using the cross entropy loss function and the Dice loss function as the loss functions of the esophageal cancer CT image segmentation model, the segmentation loss is calculated from the segmentation probability. The expressions are as follows:
CE(p, q) = −∑_{i=1}^{C} p_i·log(q_i)
Dice Loss = 1 − 2|A∩B| / (|A| + |B|)
where C in the cross entropy loss function CE(p, q) denotes the number of classes, p_i is the true value and q_i is the predicted value; A and B in the Dice Loss formula denote the mask matrices corresponding to the real label and the model-predicted label respectively, A∩B is the intersection of A and B, and |A| and |B| denote the numbers of elements in A and B; the numerator carries a coefficient of 2 because the elements common to A and B are counted twice in the denominator;
313) Using the L1 loss function as the loss function of the Swin Transformer network model for super-resolution reconstruction of the esophageal cancer CT image, the expression is as follows:
L1 = (1/N)·∑_{i=1}^{N} |y_i − f(x_i)|
where N denotes the number of samples, y_i is the real label of the i-th sample, and f(x_i) is the model prediction for the i-th sample;
314 Determining gradient vectors through back propagation of loss values, and updating the parameters of the esophageal cancer CT image segmentation model;
315 Judging whether the set training round number is reached, if so, completing the training of the esophageal cancer CT image segmentation model, otherwise, continuing the training.
Advantageous effects
Compared with the prior art, the automatic segmentation method for esophageal cancer CT image tumor regions based on two-stage progressive information fusion addresses the heavy noise, low resolution and artifacts of esophageal CT images: features extracted by an image super-resolution reconstruction network are progressively fused into the segmentation network, which effectively enhances the quality of the esophageal CT image, allows the network to extract richer detail features, enables effective segmentation and delineation of the esophageal cancer target region, and improves segmentation accuracy and efficiency.
Because the position of esophageal tumors varies, their anatomical structure is complex, the boundary of the tumor target region is blurred and individual differences are large, the improved TransResUNet extracts long-range dependency features, thereby improving the segmentation accuracy of the tumor target region and the robustness of the model. The invention adds an image super-resolution reconstruction branch, Transformer Encoder Block modules for long-range dependency feature extraction, and an ASPP module for multi-scale feature fusion, which enhance the feature extraction capability of the network.
Drawings
FIG. 1 is a process sequence diagram of the present invention;
FIG. 2 is a block diagram of an esophageal cancer CT image segmentation model according to the invention;
FIG. 3 is a diagram of a TransResunet convolutional neural network model in accordance with the present invention;
FIG. 4 is a block diagram of a feature pyramid ASPP module according to the present invention;
FIG. 5 is a diagram of the Swin Transformer network model according to the present invention;
FIGS. 6a and 7a are CT images of esophageal cancer in the prior art;
fig. 6b and 7b are label images of the split labels of fig. 6a and 7a, respectively;
FIGS. 6c and 7c are, respectively, automatically segmented images produced by the method of the present invention for FIGS. 6a and 7 a;
fig. 6d, 7d are respectively the automatically segmented images generated using the ResUNet network for fig. 6a, 7 a;
fig. 6e and 7e are respectively the automatically segmented images generated using UNet networks for fig. 6a and 7 a.
Detailed Description
For a further understanding and appreciation of the structural features and advantages achieved by the present invention, the following description is provided in connection with the accompanying drawings and the presently preferred embodiments:
as shown in fig. 1, the automatic segmentation method of the esophageal cancer CT image tumor region based on two-stage progressive information fusion comprises the following steps:
First, acquisition and preprocessing of esophageal cancer CT images: CT images in DICOM format are acquired, data enhancement is applied to the CT image data of the esophageal cervical and abdominal regions, and all CT images are sliced, i.e., the three-dimensional DICOM CT volumes are cut into two-dimensional CT image slices in jpg format and binarized label images in png format, forming the esophageal cancer CT image dataset.
Second, construction of the esophageal cancer CT image segmentation model: an esophageal cancer CT image segmentation model is constructed based on the two-stage progressive information fusion technique.
CT images in DICOM format store rich patient medical image information but are not directly suitable for training deep learning networks. The DICOM data therefore need to be converted: the original images and label information are extracted from them for model training and subsequent data analysis. Deep learning algorithms usually take RGB images as input, so the DICOM medical images must be converted into the common RGB format and organized into a dataset that meets the requirements of deep learning. The conversion involves two key steps. First, the metadata of the original DICOM files is read, each slice of a patient is parsed individually, and the pixel values are extracted and normalized to the range 0 to 1. For convenient storage and subsequent deep learning analysis, the slice data are then mapped to the range 0 to 255 and stored as 512×512 image slices. Second, the tumor segmentation task requires the tumor region labels manually delineated by the physician or medical physicist: the metadata is read carefully to find the contour corresponding to the label name, the pixels inside the contour are set to 1, and the remaining pixels are set to 0.
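A minimal preprocessing sketch of these two steps is given below, assuming the pydicom and Pillow libraries; the helper names, file paths and min-max normalization are illustrative assumptions rather than details taken from the patent.

```python
# Minimal DICOM-to-dataset sketch. Assumptions: pydicom and Pillow are available,
# min-max normalization is used, and the helper/file names are illustrative.
import numpy as np
import pydicom
from PIL import Image, ImageDraw

def dicom_slice_to_jpg(dcm_path, out_path, size=(512, 512)):
    ds = pydicom.dcmread(dcm_path)                 # read DICOM metadata and pixel data
    pixels = ds.pixel_array.astype(np.float32)
    # normalize to [0, 1], then map to [0, 255] for storage as an ordinary image
    pixels = (pixels - pixels.min()) / (pixels.max() - pixels.min() + 1e-8)
    img = Image.fromarray((pixels * 255).astype(np.uint8)).resize(size)
    img.convert("RGB").save(out_path)              # 512x512 RGB jpg slice for the dataset

def contour_to_mask(contour_xy, size=(512, 512)):
    """Rasterize a manually delineated tumor contour into a binary png label
    (pixels inside the contour set to 1, all others to 0)."""
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).polygon([tuple(p) for p in contour_xy], outline=1, fill=1)
    return np.array(mask, dtype=np.uint8)
```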
Because the dataset contains relatively few CT slices of the esophageal cervical and abdominal regions, data enhancement is applied to expand the dataset, which improves the robustness of the network. At the same time, because the esophageal tumor region is small and irregular in shape and the CT images are noisy, of low resolution and contain artifacts, constructing the esophageal cancer CT image segmentation model on the basis of the two-stage progressive information fusion technique effectively mitigates these problems and improves the performance of the model.
(1) As shown in fig. 2, the esophageal cancer CT image segmentation model is set to include a Swin Transformer network model for super-resolution reconstruction and a TransResUNet convolutional neural network model; the feature map output by the super-resolution Swin Transformer and the original image undergo progressive information fusion through a splicing operation and are then input into the TransResUNet convolutional neural network model for segmentation to obtain the final segmentation map.
(2) As shown in fig. 5, a Swin Transformer network model is set, which includes 6 residual Swin Transformer blocks (RSTB) and one residual connection structure, where each RSTB is composed of 6 Swin Transformer Layers, one convolution layer and one residual connection.
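The following is a structural sketch of this super-resolution branch, not the patented network itself: a standard nn.TransformerEncoderLayer stands in for the Swin Transformer Layer (window partitioning and shifted windows are omitted), and the channel width, head count and 2× upscaling factor are assumptions.

```python
# Structural sketch of the super-resolution branch. Assumptions: a plain transformer
# encoder layer replaces the Swin Transformer Layer (attention runs over all H*W tokens
# instead of local windows), and dim/heads/scale are illustrative.
import torch
import torch.nn as nn

class RSTB(nn.Module):
    """Residual block sketch: 6 transformer layers, one convolution, one residual connection."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, num_layers=6)
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        y = self.layers(tokens).transpose(1, 2).reshape(b, c, h, w)
        return x + self.conv(y)                        # residual connection inside the block

class SRBranch(nn.Module):
    """1x1 conv -> 6 RSTBs -> 1x1 conv (+ long residual) -> LeakyReLU -> upsample -> 1x1 conv."""
    def __init__(self, in_ch=1, dim=64, scale=2):
        super().__init__()
        self.shallow = nn.Conv2d(in_ch, dim, 1)
        self.body = nn.Sequential(*[RSTB(dim) for _ in range(6)])
        self.fuse = nn.Conv2d(dim, dim, 1)
        self.act = nn.LeakyReLU(0.2, inplace=True)
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.out = nn.Conv2d(dim, in_ch, 1)

    def forward(self, x):
        s = self.shallow(x)
        f = self.fuse(self.body(s)) + s                # long residual connection
        return self.out(self.up(self.act(f)))          # super-resolved feature map
```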
(3) Setting a TransResunet convolutional neural network model:
a1 As shown in fig. 2, the transresune convolutional neural network model is configured to include a downsampling encoder module for feature extraction, a feature pyramid ASPP module for obtaining different scale receptive fields, and an upsampling decoder module for recovering image resolution;
a2 Setting up a downsampling encoder module comprising 4 consecutive downsampling structures with residuals;
the downsampling structure with the residual comprises a branch A and a branch B, wherein the branch A is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3, and a batch normalization layer stack which is used as the branch A of the residual structure;
the branch B is a convolution layer with a convolution kernel of 1 multiplied by 1, and a batch of normalization layers are stacked to be used as another branch B of a residual structure;
the two branches are added, and finally pass through a LeakyRelu layer;
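A sketch of one such residual downsampling structure follows; the stride-2 convolution used for downsampling, the channel widths and the use of a plain transformer encoder layer in place of the Transformer Encoder Block are assumptions.

```python
# Sketch of one residual downsampling structure. Assumptions: stride-2 convolution
# performs the downsampling, and a plain transformer encoder layer replaces the
# Transformer Encoder Block; out_ch must be divisible by the head count.
import torch
import torch.nn as nn

class ResidualDownBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2, heads=4):
        super().__init__()
        # branch A: 3x3 conv, BN, LeakyReLU, transformer block, 3x3 conv, BN
        self.conv1 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride, 1),
                                   nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2, inplace=True))
        self.attn = nn.TransformerEncoderLayer(d_model=out_ch, nhead=heads, batch_first=True)
        self.conv2 = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        # branch B: 1x1 conv, BN
        self.branch_b = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride), nn.BatchNorm2d(out_ch))
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        a = self.conv1(x)
        b, c, h, w = a.shape
        tokens = a.flatten(2).transpose(1, 2)                      # (B, H*W, C) token sequence
        a = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)  # long-range dependencies
        a = self.conv2(a)
        return self.act(a + self.branch_b(x))                      # add branches, then LeakyReLU
```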
a3 As shown in fig. 4, a feature pyramid ASPP module is set, which includes:
a first operation module: a convolution layer with a 1×1 kernel;
a second operation module: a 3×3 convolution layer with a dilation rate of 6;
a third operation module: a 3×3 convolution layer with a dilation rate of 12;
a fourth operation module: a 3×3 convolution layer with a dilation rate of 18;
a fifth operation module: an adaptive average pooling layer, a convolution layer with a 1×1 kernel, and an upsampling operation;
the five operation modules are connected in parallel, the resulting 5 feature maps are spliced, and the spliced features pass through a 1×1 convolution layer;
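A minimal sketch of such an ASPP module follows; the channel widths are assumptions, and bilinear interpolation is assumed for the upsampling step of the pooling branch.

```python
# Minimal ASPP sketch: 1x1 conv, three dilated 3x3 convs (rates 6/12/18), and a
# pooling branch, concatenated and fused by a 1x1 conv. Channel widths are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, 1)
        self.b2 = nn.Conv2d(in_ch, out_ch, 3, padding=6, dilation=6)
        self.b3 = nn.Conv2d(in_ch, out_ch, 3, padding=12, dilation=12)
        self.b4 = nn.Conv2d(in_ch, out_ch, 3, padding=18, dilation=18)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(5 * out_ch, out_ch, 1)            # fuse the 5 spliced maps

    def forward(self, x):
        h, w = x.shape[-2:]
        p = F.interpolate(self.pool(x), size=(h, w), mode="bilinear", align_corners=False)
        feats = torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x), p], dim=1)
        return self.project(feats)
```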
a4) Setting an upsampling decoder module comprising 4 consecutive upsampling structures with residuals and splicing connections with the output branches of the encoder's four residual downsampling structures;
the four residual downsampling structures of the encoder result in four different sized outputs respectively,
the output of the fourth size, input to the feature pyramid ASPP module and spliced to the first input of the decoder,
the output of the third size is spliced to the second input of the decoder,
the output of the second size is spliced to a third input of the decoder,
the output of the first size is spliced to a fourth input of the decoder;
the decoder structure is four consecutive upsampled blocks with residuals,
the upsampled block structure with residual is:
a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a LeakyRelu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure,
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform an addition operation and finally pass through the LeakyRelu layer in one decoder block.
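Below is a sketch of one such residual upsampling block; bilinear interpolation for the double upsampling and the channel widths are assumptions.

```python
# Sketch of one residual upsampling block. Assumptions: bilinear interpolation
# performs the double upsampling, and channel widths are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualUpBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        cat_ch = in_ch + skip_ch
        # branch A: 3x3 conv, BN, LeakyReLU, 3x3 conv, BN
        self.branch_a = nn.Sequential(
            nn.Conv2d(cat_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        # branch B: 1x1 conv, BN
        self.branch_b = nn.Sequential(nn.Conv2d(cat_ch, out_ch, 1), nn.BatchNorm2d(out_ch))
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)  # double upsample
        x = torch.cat([x, skip], dim=1)                 # splice with the matching encoder output
        return self.act(self.branch_a(x) + self.branch_b(x))
```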
Thirdly, training an esophageal cancer CT image segmentation model: and inputting the esophageal cancer CT image data set into an esophageal cancer CT image segmentation model for training.
During training, the Swin Transformer network model performs super-resolution reconstruction of the original image, the richer features obtained are spliced with the original image to enhance its boundary features, and the result is then input into the TransResUNet convolutional neural network model for image segmentation, so that the two-stage progressive information fusion framework achieves better segmentation accuracy.
The training of the esophageal cancer CT image segmentation model comprises the following steps:
(1) Inputting the esophageal cancer CT image data set into a Swin Transformer network model of an esophageal cancer CT image segmentation model, and outputting a feature map from the Swin Transformer network model;
the method comprises the steps of inputting the residual error into a Swin transform network model for super-resolution reconstruction, executing a convolution layer with a convolution kernel size of 1 multiplied by 1, performing a residual error connection operation through 6 continuous RSTB modules, performing a convolution layer with a convolution kernel size of 1 multiplied by 1, performing an upsampling operation on a LeakyRelu layer, and performing a convolution layer with a convolution kernel size of 1 multiplied by 1 to obtain a characteristic diagram.
(2) And carrying out progressive information fusion on the characteristic map output by the Swin Transformer and the original map through splicing operation to obtain a spliced characteristic map.
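A minimal sketch of this fusion step is shown below; the resizing of the super-resolved feature map back to the original slice size is an assumption, since the patent only specifies a splicing (concatenation) operation.

```python
# Minimal fusion sketch: concatenate the super-resolution feature map with the original
# slice along the channel dimension. Resizing back to the slice size is an assumption.
import torch
import torch.nn.functional as F

def progressive_fusion(original, sr_features):
    if sr_features.shape[-2:] != original.shape[-2:]:
        sr_features = F.interpolate(sr_features, size=original.shape[-2:],
                                    mode="bilinear", align_corners=False)
    return torch.cat([original, sr_features], dim=1)    # spliced feature map for TransResUNet
```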
(3) And inputting the spliced characteristic diagram into a TransResunet convolutional neural network model.
(4) Training the spliced characteristic diagram in a downsampling coder module:
b1 Inputting the spliced feature map into a first downsampling structure with residual errors, wherein a branch A of the first downsampling structure with residual errors is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3 and a batch normalization layer stack; the first branch B with the residual downsampling structure is a convolution layer with a convolution kernel of 1 multiplied by 1 and a batch normalization layer stack;
the two branches perform addition operation, and finally a LeakyRelu layer is executed to obtain a first downsampled output;
b2 Feeding the first downsampled output into a second downsampled structure with residual errors, adding the branch A and the branch B of the second downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a second downsampled output;
b3 Feeding the second downsampled output into a third downsampled structure with residual errors, adding the branch A and the branch B of the third downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a third downsampled output;
b4 Feeding the third downsampled output into a fourth downsampled structure with residual, adding the branch A and the branch B of the fourth downsampled structure with residual, and finally executing a LeakyRelu layer to obtain a fourth downsampled output.
(5) The four downsampled outputs are input to a decoder module for upsampling to recover the resolution of the image.
(6) The fourth downsampled output is input to the feature pyramid ASPP module and spliced to the first input of the decoder.
(7) Performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a first input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches add and finally pass through the LeakyRelu layer in one decoder block to obtain the second input of the decoder.
(8) The third downsampled output is spliced to the second input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a second input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches add and finally pass through the LeakyRelu layer in one decoder block to get the third input of the decoder.
(9) The second downsampled output is spliced to a third input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a third input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches add and finally pass through the LeakyRelu layer in one decoder block to get the fourth input of the decoder.
(10) The first downsampled output is spliced to a fourth input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a fourth input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches are added, and finally pass through a LeakyRelu layer in a decoder module, and then pass through a convolution layer with double up-sampling and a convolution kernel of 1×1 to obtain the final output of TransResunet.
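For orientation, the following sketch assembles the encoder, ASPP and decoder path described in steps (4) to (10), reusing the ResidualDownBlock, ASPP and ResidualUpBlock sketches given earlier; the channel widths, the splicing of the last decoder block with the network input, and the omission of the final extra double upsampling step are assumptions.

```python
# Assembly sketch of the encoder/ASPP/decoder path, reusing the ResidualDownBlock, ASPP
# and ResidualUpBlock sketches above. Channel widths, the last skip connection and the
# omission of the final extra upsampling step are assumptions.
import torch.nn as nn

class TransResUNetSketch(nn.Module):
    def __init__(self, in_ch=2, base=64, n_classes=2):
        super().__init__()
        chs = [base, base * 2, base * 4, base * 8]
        self.enc1 = ResidualDownBlock(in_ch, chs[0])
        self.enc2 = ResidualDownBlock(chs[0], chs[1])
        self.enc3 = ResidualDownBlock(chs[1], chs[2])
        self.enc4 = ResidualDownBlock(chs[2], chs[3])
        self.aspp = ASPP(chs[3], chs[3])
        self.dec1 = ResidualUpBlock(chs[3], chs[2], chs[2])
        self.dec2 = ResidualUpBlock(chs[2], chs[1], chs[1])
        self.dec3 = ResidualUpBlock(chs[1], chs[0], chs[0])
        self.dec4 = ResidualUpBlock(chs[0], in_ch, chs[0])
        self.head = nn.Conv2d(chs[0], n_classes, 1)     # 1x1 conv produces segmentation logits

    def forward(self, x):
        e1 = self.enc1(x)                               # first-size output
        e2 = self.enc2(e1)                              # second-size output
        e3 = self.enc3(e2)                              # third-size output
        e4 = self.enc4(e3)                              # fourth-size output
        d = self.dec1(self.aspp(e4), e3)                # ASPP output is the first decoder input
        d = self.dec2(d, e2)
        d = self.dec3(d, e1)
        d = self.dec4(d, x)                             # last skip uses the fused input (assumption)
        return self.head(d)
```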
(11) Forward propagation to obtain segmentation probability;
(12) Using the cross entropy loss function and the Dice loss function as the loss functions of the esophageal cancer CT image segmentation model, the segmentation loss is calculated from the segmentation probability. The expressions are as follows:
CE(p, q) = −∑_{i=1}^{C} p_i·log(q_i)
Dice Loss = 1 − 2|A∩B| / (|A| + |B|)
where C in the cross entropy loss function CE(p, q) denotes the number of classes, p_i is the true value and q_i is the predicted value; A and B in the Dice Loss formula denote the mask matrices corresponding to the real label and the model-predicted label respectively, A∩B is the intersection of A and B, and |A| and |B| denote the numbers of elements in A and B; the numerator carries a coefficient of 2 because the elements common to A and B are counted twice in the denominator.
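A sketch of this combined segmentation loss is given below for a two-class (background/tumor) setting; the equal weighting of the two terms and the smoothing constant are assumptions.

```python
# Sketch of the combined segmentation loss (cross entropy + Dice) for a two-class task.
# The equal weighting of the two terms and the smoothing constant are assumptions.
import torch
import torch.nn as nn

class SegLoss(nn.Module):
    def __init__(self, eps=1e-6):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.eps = eps

    def forward(self, logits, target):
        # logits: (B, 2, H, W); target: (B, H, W) LongTensor of class indices {0, 1}
        ce = self.ce(logits, target)
        prob = torch.softmax(logits, dim=1)[:, 1]               # tumor-class probability
        tgt = target.float()
        inter = (prob * tgt).sum(dim=(1, 2))
        denom = prob.sum(dim=(1, 2)) + tgt.sum(dim=(1, 2))
        dice = 1 - (2 * inter + self.eps) / (denom + self.eps)  # Dice loss = 1 - 2|A∩B|/(|A|+|B|)
        return ce + dice.mean()
```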
(13) The L1 loss function is used as the loss function of the Swin Transformer network model for super-resolution reconstruction of the esophageal cancer CT image. The expression is as follows:
L1 = (1/N)·∑_{i=1}^{N} |y_i − f(x_i)|
where N denotes the number of samples, y_i is the real label of the i-th sample, and f(x_i) is the model prediction for the i-th sample.
(14) Determining gradient vectors through back propagation of loss values, and updating esophageal cancer CT image segmentation model parameters;
(15) Judging whether the set training round number is reached, if so, finishing the training of the esophageal cancer CT image segmentation model, otherwise, continuing the training.
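The training procedure of steps (11) to (15) can be sketched as follows; the optimizer, learning rate, epoch count and the use of a higher-resolution slice as the super-resolution target are assumptions, and sr_net, seg_net, SegLoss and progressive_fusion refer to the sketches given earlier.

```python
# Training loop sketch for steps (11)-(15). Optimizer, learning rate, epoch count and
# the super-resolution target are assumptions; sr_net, seg_net, SegLoss and
# progressive_fusion refer to the sketches above.
import torch

def train(sr_net, seg_net, loader, epochs=100, lr=1e-4, device="cuda"):
    params = list(sr_net.parameters()) + list(seg_net.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    l1, seg_loss = torch.nn.L1Loss(), SegLoss()
    for epoch in range(epochs):                        # stop once the set number of rounds is reached
        for img, hr_img, label in loader:              # CT slice, SR target, binary tumor mask
            img, hr_img, label = img.to(device), hr_img.to(device), label.to(device)
            sr = sr_net(img)                                    # stage 1: super-resolution features
            logits = seg_net(progressive_fusion(img, sr))       # stage 2: segmentation on fused input
            loss = seg_loss(logits, label) + l1(sr, hr_img)     # segmentation loss + L1 reconstruction loss
            opt.zero_grad()
            loss.backward()                            # back-propagate to obtain gradient vectors
            opt.step()                                 # update the model parameters
```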
Fourth, obtaining and preprocessing the CT image of the esophageal cancer to be segmented.
Fifthly, obtaining esophageal cancer CT image segmentation results: inputting the preprocessed esophageal cancer CT image to be segmented into a trained esophageal cancer CT image segmentation model to obtain a segmented esophageal cancer CT image.
As shown in fig. 6a and 7a, these are CT slice images of two esophageal cancer patients, and fig. 6b and 7b are the corresponding labels. As can be seen from fig. 6c and 7c, compared with the ResUNet network model shown in fig. 6d and 7d and the UNet network model shown in fig. 6e and 7e, the automatic segmentation produced by the method of the present invention has more complete boundary information and agrees well with the labels.
DSC denotes the Dice similarity coefficient, whose value lies in [0,1]; the larger the value, the higher the accuracy. HD denotes the Hausdorff distance; the smaller the value, the better the boundaries coincide. For a fair comparison, all experiments were run with the same initial training parameters. As can be seen from Table 1, the method of the present invention improves on classical UNet by 0.19 in DSC and 7.88 in HD, and on ResUNet by 0.09 in DSC and 7.88 in HD.
TABLE 1 Comparison of the segmentation accuracy of the method of the invention with the classical U-Net and ResUNet networks on the DSC and HD metrics
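A minimal sketch of how these two metrics can be computed from binary masks is shown below, assuming SciPy; here the Hausdorff distance is taken over all foreground pixels rather than extracted boundaries, which is a simplifying assumption.

```python
# Metric sketch: Dice similarity coefficient and Hausdorff distance between binary masks,
# assuming SciPy; the distance is taken over all foreground pixels (boundary extraction omitted).
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dsc(pred, gt):
    """Dice similarity coefficient in [0, 1]; larger means higher accuracy."""
    inter = np.logical_and(pred > 0, gt > 0).sum()
    return 2.0 * inter / ((pred > 0).sum() + (gt > 0).sum() + 1e-8)

def hausdorff(pred, gt):
    """Symmetric Hausdorff distance; smaller means better boundary coincidence."""
    p, g = np.argwhere(pred > 0), np.argwhere(gt > 0)
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
```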
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate its principles, and various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (3)
1. An automatic segmentation method for tumor areas of esophageal cancer CT images based on two-stage progressive information fusion is characterized by comprising the following steps:
11) Acquisition and preprocessing of esophageal cancer CT images: acquiring CT images in DICOM format, performing data enhancement on the CT image data of the esophageal cervical region and the esophageal abdominal region, and slicing all CT images, i.e., cutting the three-dimensional DICOM CT volumes to obtain two-dimensional CT image slices in jpg format and binarized label images in png format, so as to form an esophageal cancer CT image dataset;
12) Construction of an esophageal cancer CT image segmentation model: constructing an esophageal cancer CT image segmentation model based on the two-stage progressive information fusion technique;
13) Training of the esophageal cancer CT image segmentation model: inputting the esophageal cancer CT image dataset into the esophageal cancer CT image segmentation model for training;
14) Acquisition and preprocessing of the esophageal cancer CT image to be segmented;
15) Acquisition of the esophageal cancer CT image segmentation result: inputting the preprocessed esophageal cancer CT image to be segmented into the trained esophageal cancer CT image segmentation model to obtain the segmented esophageal cancer CT image.
2. The automatic segmentation method for the tumor area of the esophageal cancer CT image based on the two-stage progressive information fusion according to claim 1, wherein the construction of the esophageal cancer CT image segmentation model comprises the following steps:
21) Setting an esophageal cancer CT image segmentation model comprising a Swin Transformer network model for super-resolution reconstruction and a TransResUNet convolutional neural network model; the feature map output by the super-resolution Swin Transformer and the original image undergo progressive information fusion through a splicing operation and are then input into the TransResUNet convolutional neural network model for segmentation to obtain the final segmentation map;
22) Setting a Swin Transformer network model comprising 6 residual Swin Transformer blocks (RSTB) and one residual connection structure, wherein each RSTB consists of 6 Swin Transformer Layers, one convolution layer and one residual connection;
23 Setting a TransResUNet convolutional neural network model:
231 Setting a TransResunet convolutional neural network model, wherein the TransResunet convolutional neural network model comprises a downsampling encoder module for feature extraction, a feature pyramid ASPP module for obtaining different scale receptive fields, and an upsampling decoder module for recovering image resolution;
232 Setting up a downsampling encoder module comprising 4 consecutive downsampling structures with residuals;
the downsampling structure with the residual comprises a branch A and a branch B, wherein the branch A is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3, and a batch normalization layer stack which is used as the branch A of the residual structure;
the branch B is a convolution layer with a convolution kernel of 1 multiplied by 1, and a batch of normalization layers are stacked to be used as another branch B of a residual structure;
the two branches are added, and finally pass through a LeakyRelu layer;
233 A set feature pyramid ASPP module comprising:
a first operation module: a convolution layer with a 1×1 kernel;
a second operation module: a 3×3 convolution layer with a dilation rate of 6;
a third operation module: a 3×3 convolution layer with a dilation rate of 12;
a fourth operation module: a 3×3 convolution layer with a dilation rate of 18;
a fifth operation module: an adaptive average pooling layer, a convolution layer with a 1×1 kernel, and an upsampling operation;
the five operation modules are connected in parallel, the resulting 5 feature maps are spliced, and the spliced features pass through a 1×1 convolution layer;
234 Setting up a upsampling decoder module comprising 4 consecutive upsampling structures with residuals and a splice structure with the output branches of the four downsampling structures with residuals of the encoder;
the four residual downsampling structures of the encoder result in four different sized outputs respectively,
the output of the fourth size, input to the feature pyramid ASPP module and spliced to the first input of the decoder,
the output of the third size is spliced to the second input of the decoder,
the output of the second size is spliced to a third input of the decoder,
the output of the first size is spliced to a fourth input of the decoder;
the decoder structure is four consecutive upsampled blocks with residuals,
the upsampled block structure with residual is:
a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a LeakyRelu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure,
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform an addition operation and finally pass through the LeakyRelu layer in one decoder block.
3. The automatic segmentation method for the tumor area of the esophageal cancer CT image based on the two-stage progressive information fusion according to claim 1, wherein the training of the esophageal cancer CT image segmentation model comprises the following steps:
31 Inputting the esophageal cancer CT image data set into a Swin Transformer network model of the esophageal cancer CT image segmentation model, and outputting a feature map from the Swin Transformer network model;
inputting the image into the Swin Transformer network model for super-resolution reconstruction: a convolution layer with a 1×1 kernel is executed first, the result passes through 6 consecutive RSTB modules and another 1×1 convolution layer and is combined with the shallow features through a residual connection, and a LeakyReLU layer, an upsampling operation and a final 1×1 convolution layer then yield the feature map;
32 Performing progressive information fusion on the feature map output by the Swin Transformer and the original map through splicing operation to obtain a spliced feature map;
33 Inputting the spliced characteristic diagram into a TransResunet convolutional neural network model;
34 Training the spliced feature map in a downsampling encoder module:
341 Inputting the spliced feature map into a first downsampling structure with residual errors, wherein a branch A of the first downsampling structure with residual errors is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3 and a batch normalization layer stack; the first branch B with the residual downsampling structure is a convolution layer with a convolution kernel of 1 multiplied by 1 and a batch normalization layer stack;
the two branches perform addition operation, and finally a LeakyRelu layer is executed to obtain a first downsampled output;
342) Feeding the first downsampled output into the second downsampling structure with residual, adding branch A and branch B of the second downsampling structure with residual, and finally applying a LeakyReLU layer to obtain the second downsampled output;
343) Feeding the second downsampled output into the third downsampling structure with residual, adding branch A and branch B of the third downsampling structure with residual, and finally applying a LeakyReLU layer to obtain the third downsampled output;
344) Feeding the third downsampled output into the fourth downsampling structure with residual, adding branch A and branch B of the fourth downsampling structure with residual, and finally applying a LeakyReLU layer to obtain the fourth downsampled output;
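The four downsampling structures in steps 341)–344) share one block layout; a minimal PyTorch sketch is given below. Placing the stride-2 downsampling in the first 3×3 convolution and in the 1×1 shortcut, the number of attention heads, and the feed-forward width of the Transformer encoder layer are assumptions not specified by the claim.

```python
import torch
import torch.nn as nn

class TransResDownBlock(nn.Module):
    """Encoder block: branch A is Conv3x3-BN-LeakyReLU, a Transformer encoder
    layer over the flattened feature map, then Conv3x3-BN; branch B is a
    Conv1x1-BN shortcut; the sum passes through LeakyReLU."""
    def __init__(self, in_ch, out_ch, heads=4):
        super().__init__()
        self.a1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(inplace=True))
        self.attn = nn.TransformerEncoderLayer(
            d_model=out_ch, nhead=heads, dim_feedforward=2 * out_ch,
            batch_first=True)
        self.a2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        self.b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=2), nn.BatchNorm2d(out_ch))
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        a = self.a1(x)
        n, c, h, w = a.shape
        seq = a.flatten(2).transpose(1, 2)        # (N, H*W, C) token sequence
        a = self.attn(seq).transpose(1, 2).reshape(n, c, h, w)
        a = self.a2(a)
        return self.act(a + self.b(x))            # residual addition, then LeakyReLU
```

Note that full self-attention over every spatial position is expensive at the shallowest stage; this is only a sketch of the block structure, not a statement about the claimed implementation's attention scheme.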
35) The four downsampled outputs are input to the decoder module, which performs upsampling to recover the resolution of the image;
36) The fourth downsampled output is input to the feature pyramid ASPP module and spliced to the first input of the decoder;
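Step 36) routes the deepest encoder output through an ASPP module before the decoder; a minimal sketch follows. The dilation (void) rates (1, 6, 12, 18) and the image-pooling branch follow the common DeepLab-style layout and are assumptions, since the claim does not list the specific rates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling over the deepest encoder output:
    parallel dilated 3x3 convolutions plus a global-pooling branch,
    concatenated and projected back with a 1x1 convolution."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.LeakyReLU(inplace=True))
            for r in rates])
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.LeakyReLU(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(inplace=True))

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.pool(x), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))
```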
37) For the first input of the decoder, performing a double upsampling operation and a splicing operation with the corresponding encoder layer, followed by a convolution layer with a 3×3 convolution kernel, a batch normalization layer, a LeakyReLU layer, and a convolution layer with a 3×3 convolution kernel stacked with a batch normalization layer as one branch of the residual structure;
and a convolution layer with a 1×1 convolution kernel stacked with a batch normalization layer as the other branch of the residual structure;
the two branches are added and finally pass through a LeakyReLU layer in one decoder module to obtain the second input of the decoder;
38) The third downsampled output is spliced to the second input of the decoder;
for the second input of the decoder, performing a double upsampling operation and a splicing operation with the corresponding encoder layer, followed by a convolution layer with a 3×3 convolution kernel, a batch normalization layer, a LeakyReLU layer, and a convolution layer with a 3×3 convolution kernel stacked with a batch normalization layer as one branch of the residual structure;
and a convolution layer with a 1×1 convolution kernel stacked with a batch normalization layer as the other branch of the residual structure;
the two branches are added and finally pass through a LeakyReLU layer in one decoder module to obtain the third input of the decoder;
39) The second downsampled output is spliced to the third input of the decoder;
for the third input of the decoder, performing a double upsampling operation and a splicing operation with the corresponding encoder layer, followed by a convolution layer with a 3×3 convolution kernel, a batch normalization layer, a LeakyReLU layer, and a convolution layer with a 3×3 convolution kernel stacked with a batch normalization layer as one branch of the residual structure;
and a convolution layer with a 1×1 convolution kernel stacked with a batch normalization layer as the other branch of the residual structure;
the two branches are added and finally pass through a LeakyReLU layer in one decoder module to obtain the fourth input of the decoder;
310) The first downsampled output is spliced to the fourth input of the decoder;
for the fourth input of the decoder, performing a double upsampling operation and a splicing operation with the corresponding encoder layer, followed by a convolution layer with a 3×3 convolution kernel, a batch normalization layer, a LeakyReLU layer, and a convolution layer with a 3×3 convolution kernel stacked with a batch normalization layer as one branch of the residual structure;
and a convolution layer with a 1×1 convolution kernel stacked with a batch normalization layer as the other branch of the residual structure;
the two branches are added, and the final output of the TransResunet is obtained by passing through a LeakyReLU layer in one decoder module, a double upsampling operation and a convolution layer with a 1×1 convolution kernel;
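Steps 35)–310) can be summarized by the wiring sketch below, which reuses the TransResDownBlock, ASPP and ResidualUpBlock sketches above; the channel widths, the choice of the fused input as the last skip connection, and the output resolution are assumptions not fixed by the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Wiring sketch only: relies on TransResDownBlock, ASPP and ResidualUpBlock
# from the sketches above; all channel widths are assumptions.
class TransResUNetSketch(nn.Module):
    def __init__(self, in_ch=2, n_classes=2, widths=(64, 128, 256, 512)):
        super().__init__()
        w1, w2, w3, w4 = widths
        self.e1, self.e2 = TransResDownBlock(in_ch, w1), TransResDownBlock(w1, w2)
        self.e3, self.e4 = TransResDownBlock(w2, w3), TransResDownBlock(w3, w4)
        self.aspp = ASPP(w4, w4)
        self.d1 = ResidualUpBlock(w4, w3, w3)      # splices the third encoder output
        self.d2 = ResidualUpBlock(w3, w2, w2)      # splices the second encoder output
        self.d3 = ResidualUpBlock(w2, w1, w1)      # splices the first encoder output
        self.d4 = ResidualUpBlock(w1, in_ch, w1)   # splices the fused input (assumption)
        self.head = nn.Conv2d(w1, n_classes, 1)    # final 1x1 convolution

    def forward(self, x):
        f1 = self.e1(x); f2 = self.e2(f1); f3 = self.e3(f2); f4 = self.e4(f3)
        y = self.aspp(f4)                          # deepest features through ASPP
        y = self.d1(y, f3); y = self.d2(y, f2); y = self.d3(y, f1); y = self.d4(y, x)
        y = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(y)                        # final double upsampling + 1x1 conv
```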
311) Performing forward propagation to obtain the segmentation probability;
312) Using the cross entropy loss function and the Dice loss function as the loss function of the esophageal cancer CT image segmentation model, and calculating the segmentation loss from the segmentation probability, where the expressions are as follows:

$CE(p, q) = -\sum_{i=1}^{C} p_i \log(q_i)$

$Dice\ Loss = 1 - \frac{2|A \cap B|}{|A| + |B|}$

wherein C in the cross entropy loss function CE(p, q) represents the number of categories, $p_i$ is the true value and $q_i$ is the predicted value; A and B in the Dice Loss formula respectively represent the mask matrices corresponding to the real label and the model prediction label, $A \cap B$ is the intersection of A and B, and $|A|$ and $|B|$ respectively represent the numbers of elements of A and B; the coefficient 2 in the numerator compensates for the common elements of A and B being counted twice in the denominator;
313) Using the L1 loss function as the loss function of the Swin Transformer network model that performs super-resolution reconstruction of the esophageal cancer CT image, where the expression is as follows:

$L1 = \frac{1}{N}\sum_{i=1}^{N} |y_i - f(x_i)|$

wherein N represents the number of samples, $y_i$ is the real label of the i-th sample, and $f(x_i)$ is the model prediction value of the i-th sample;
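A minimal sketch of the three loss terms from steps 312) and 313) follows; the equal weighting of the cross entropy and Dice terms, the smoothing constant, and the restriction to a binary foreground/background task are assumptions.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss 1 - 2|A∩B| / (|A| + |B|) on the foreground probability;
    `eps` is a smoothing constant added here as an assumption."""
    prob = torch.softmax(logits, dim=1)[:, 1]          # foreground (tumor) probability
    target = target.float()
    inter = (prob * target).sum(dim=(1, 2))
    union = prob.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def segmentation_loss(logits, target):
    """Cross entropy + Dice for the segmentation branch (1:1 weighting assumed)."""
    return F.cross_entropy(logits, target) + dice_loss(logits, target)

def sr_loss(sr_pred, sr_target):
    """L1 loss for the Swin Transformer super-resolution branch."""
    return F.l1_loss(sr_pred, sr_target)
```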
314) Determining the gradient vectors through back-propagation of the loss values, and updating the parameters of the esophageal cancer CT image segmentation model;
315) Judging whether the set number of training rounds has been reached; if so, the training of the esophageal cancer CT image segmentation model is complete, otherwise the training continues.
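Steps 311)–315) amount to a standard supervised training loop. The sketch below assumes an Adam optimizer, a hypothetical data loader yielding (CT slice, label mask, high-resolution reference) triples at compatible resolutions, and a 1:1 weighting of the segmentation and super-resolution losses, none of which are specified by the claim; `SwinSRBranch`, `TransResUNetSketch`, `segmentation_loss` and `sr_loss` refer to the sketches above.

```python
import torch
import torch.nn.functional as F

def train(seg_model, sr_branch, loader, epochs=100, lr=1e-4, device="cuda"):
    seg_model.to(device); sr_branch.to(device)
    optim = torch.optim.Adam(
        list(seg_model.parameters()) + list(sr_branch.parameters()), lr=lr)
    for epoch in range(epochs):                   # step 315): fixed number of rounds
        for ct, mask, hr in loader:               # CT slice, label mask, HR reference
            ct, mask, hr = ct.to(device), mask.to(device), hr.to(device)
            sr = sr_branch(ct)                    # stage 1: super-resolution branch
            orig = F.interpolate(ct, size=sr.shape[-2:],
                                 mode="bilinear", align_corners=False)
            logits = seg_model(torch.cat([sr, orig], dim=1))   # step 311): forward pass
            loss = segmentation_loss(logits, mask) + sr_loss(sr, hr)  # steps 312)-313)
            optim.zero_grad()
            loss.backward()                       # step 314): back-propagation
            optim.step()
```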
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310688086.0A CN116645380A (en) | 2023-06-12 | 2023-06-12 | Automatic segmentation method for esophageal cancer CT image tumor area based on two-stage progressive information fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116645380A true CN116645380A (en) | 2023-08-25 |
Family
ID=87643352
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117788296A (en) * | 2024-02-23 | 2024-03-29 | 北京理工大学 | Infrared remote sensing image super-resolution reconstruction method based on heterogeneous combined depth network |
CN117788296B (en) * | 2024-02-23 | 2024-05-07 | 北京理工大学 | Infrared remote sensing image super-resolution reconstruction method based on heterogeneous combined depth network |
CN117934519A (en) * | 2024-03-21 | 2024-04-26 | 安徽大学 | Self-adaptive segmentation method for esophageal tumor CT image synthesized by unpaired enhancement |
CN117934519B (en) * | 2024-03-21 | 2024-06-07 | 安徽大学 | Self-adaptive segmentation method for esophageal tumor CT image synthesized by unpaired enhancement |
CN118196416A (en) * | 2024-03-26 | 2024-06-14 | 昆明理工大学 | Small target colorectal polyp segmentation method integrating multitasking cooperation and progressive resolution strategy |
Similar Documents

Publication | Title
---|---
CN113870258B | Counterwork learning-based label-free pancreas image automatic segmentation system
CN116309650B | Medical image segmentation method and system based on double-branch embedded attention mechanism
WO2023071531A1 | Liver CT automatic segmentation method based on deep shape learning
CN111354002A | Kidney and kidney tumor segmentation method based on deep neural network
CN116645380A | Automatic segmentation method for esophageal cancer CT image tumor area based on two-stage progressive information fusion
CN109614991A | A kind of segmentation and classification method of the multiple dimensioned dilatancy cardiac muscle based on Attention
CN107492071A | Medical image processing method and equipment
CN109389584A | Multiple dimensioned rhinopharyngeal neoplasm dividing method based on CNN
CN112215844A | MRI (magnetic resonance imaging) multi-mode image segmentation method and system based on ACU-Net
WO2024104035A1 | Long short-term memory self-attention model-based three-dimensional medical image segmentation method and system
CN114494296A | Brain glioma segmentation method and system based on fusion of Unet and Transformer
CN116228690A | Automatic auxiliary diagnosis method for pancreatic cancer and autoimmune pancreatitis based on PET-CT
CN117455906B | Digital pathological pancreatic cancer nerve segmentation method based on multi-scale cross fusion and boundary guidance
JP2024143991A | Image segmentation method and system in a multitask learning network
CN115471512A | Medical image segmentation method based on self-supervision contrast learning
Li et al. | MCRformer: Morphological constraint reticular transformer for 3D medical image segmentation
CN114565601A | Improved liver CT image segmentation algorithm based on DeepLabV3+
Fu et al. | MSA-Net: Multiscale spatial attention network for medical image segmentation
CN114387282A | Accurate automatic segmentation method and system for medical image organs
CN113205496A | Abdominal CT image liver tumor lesion segmentation method based on convolutional neural network
Wang et al. | Multimodal parallel attention network for medical image segmentation
CN116468741A | Pancreatic cancer segmentation method based on 3D physical space domain and spiral decomposition space domain
CN117058163A | Depth separable medical image segmentation algorithm based on multi-scale large convolution kernel
Mani | Deep learning models for semantic multi-modal medical image segmentation
CN117934519B | Self-adaptive segmentation method for esophageal tumor CT image synthesized by unpaired enhancement
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination