CN116645380A - Automatic segmentation method for esophageal cancer CT image tumor area based on two-stage progressive information fusion - Google Patents
- Publication number
- CN116645380A (Application No. CN202310688086.0A)
- Authority
- CN
- China
- Prior art keywords
- layer
- image
- esophageal cancer
- convolution
- decoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/80—Geometric correction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/20—ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to an automatic segmentation method for tumor regions in esophageal cancer CT images based on two-stage progressive information fusion, which overcomes the difficulty of automatically segmenting esophageal cancer CT images in the prior art. The method comprises the following steps: acquiring and preprocessing esophageal cancer CT images; constructing an esophageal cancer CT image segmentation model; training the esophageal cancer CT image segmentation model; acquiring and preprocessing the esophageal cancer CT image to be segmented; and obtaining the esophageal cancer CT image segmentation result. Considering that esophageal CT images suffer from heavy noise, low resolution and artifacts, the invention extracts features with an image super-resolution reconstruction network and progressively fuses them into the segmentation network, effectively enhancing the quality of the esophageal CT image so that the network can extract richer detail features, enabling effective segmentation and delineation of the esophageal cancer target region and improving segmentation accuracy and efficiency.
Description
Technical Field
The invention relates to the technical field of medical image segmentation, in particular to an automatic segmentation method for esophageal cancer CT image tumor regions based on two-stage progressive information fusion.
Background
Esophageal cancer is a highly invasive malignancy that predominantly affects men and includes esophageal squamous cell carcinoma and esophageal adenocarcinoma, which differ in pathological characteristics and distribution. Worldwide, squamous cell carcinoma remains the most common type. Esophageal resection combined with three-field lymph node dissection has reached the limit of local control and requires further technical development. In addition, esophageal cancer is highly invasive, and lymphatic and hematogenous metastases can occur at an early stage. Because early symptoms are not obvious, most patients already have advanced tumors by the time symptoms such as difficulty swallowing or hoarseness appear, and the window for surgical resection has been missed. For patients who cannot undergo surgery, single-modality treatments such as chemotherapy or targeted therapy alone achieve poor results, and the five-year survival rate is low. By contrast, multimodal comprehensive treatment combining chemotherapy, radiotherapy and endoscopic treatment is becoming the mainstream; it offers long-term survival to some patients with advanced esophageal cancer and plays an important role when surgical resection is unsuitable. One of the main problems in radiotherapy is determining the location of the tumor, which requires tools that assist in localization. Computed tomography is therefore widely used in radiotherapy planning.
Accurate radiotherapy requires accurate determination and delineation of the radiotherapy target region. Currently, target delineation is mainly performed manually by experienced physicians and medical physicists, and its accuracy depends on the physician's level of experience. This work is cumbersome and time-consuming; even an experienced physician may take two days to annotate a single set of images.
Automatically delineating target regions on medical images has therefore become a popular topic in computer vision. For radiotherapy target volume delineation, however, no mature and practical solution currently exists, owing to the complexity of the esophagus and surrounding organs. To improve physicians' working efficiency and realize precise treatment of esophageal cancer, automatic delineation of esophageal cancer tumor target regions has become an urgent problem to be solved.
Esophageal cancer medical image segmentation based on deep learning is a breakthrough technology: it can automatically classify, identify and segment organs and tumor target regions, and can reveal internal image information that is difficult for physicians to detect. In the diagnosis of esophageal cancer, AI-assisted image analysis allows physicians to detect cancers quickly and effectively, saving diagnosis time. Recent studies have also shown that this technique offers satisfactory robustness and potential.
However, automatic delineation of esophageal cancer tumor target volumes or clinical target volumes remains a challenging task. Segmentation of the tumor region depends on the contrast between the tumor and the surrounding tissue in CT images; because of the diverse morphology of esophageal cancer lesions, the variability of their location and the complexity of the surrounding tissue, conventional deep learning algorithms struggle to capture all of the tumor's details and features.
Therefore, how to automatically segment esophageal cancer CT images with complex and variable lesion appearances has become a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to overcome the difficulty of automatically segmenting esophageal cancer CT images in the prior art, and provides an automatic segmentation method for esophageal cancer CT image tumor regions based on two-stage progressive information fusion to solve this problem.
In order to achieve the above object, the technical scheme of the present invention is as follows:
an automatic segmentation method for tumor areas of esophageal cancer CT images based on two-stage progressive information fusion comprises the following steps:
11) Acquisition and preprocessing of esophageal cancer CT images: acquiring CT images in DICOM format, performing data enhancement on the CT image data of the esophageal cervical region and the esophageal abdominal region, and slicing all CT images, i.e., cutting the three-dimensional DICOM CT volumes to obtain two-dimensional CT image slices in jpg format and binarized label images in png format, so as to form an esophageal cancer CT image dataset;
12) Construction of an esophageal cancer CT image segmentation model: constructing an esophageal cancer CT image segmentation model based on the two-stage progressive information fusion technique;
13) Training of the esophageal cancer CT image segmentation model: inputting the esophageal cancer CT image dataset into the esophageal cancer CT image segmentation model for training;
14) Acquisition and preprocessing of the esophageal cancer CT image to be segmented;
15) Acquisition of the esophageal cancer CT image segmentation result: inputting the preprocessed esophageal cancer CT image to be segmented into the trained esophageal cancer CT image segmentation model to obtain the segmented esophageal cancer CT image.
The construction of the esophageal cancer CT image segmentation model comprises the following steps:
21) Setting an esophageal cancer CT image segmentation model comprising a Swin Transformer network model for super-resolution reconstruction and a TransResUNet convolutional neural network model; the feature map output by the super-resolution Swin Transformer and the original image undergo progressive information fusion through a splicing operation and are then input into the TransResUNet convolutional neural network model for segmentation to obtain the final segmentation map;
22) Setting a Swin Transformer network model comprising 6 residual Swin Transformer blocks (RSTB) and one residual connection structure, wherein each RSTB consists of 6 Swin Transformer Layers, one convolution layer and one residual connection;
23) Setting a TransResUNet convolutional neural network model:
231) The TransResUNet convolutional neural network model comprises a downsampling encoder module for feature extraction, a feature pyramid ASPP module for obtaining receptive fields at different scales, and an upsampling decoder module for recovering image resolution;
232) Setting a downsampling encoder module comprising 4 consecutive downsampling structures with residuals;
the downsampling structure with the residual comprises a branch A and a branch B, wherein the branch A is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3, and a batch normalization layer stack which is used as the branch A of the residual structure;
the branch B is a convolution layer with a convolution kernel of 1 multiplied by 1, and a batch of normalization layers are stacked to be used as another branch B of a residual structure;
the two branches are added, and finally pass through a LeakyRelu layer;
233) Setting a feature pyramid ASPP module, which comprises:
a first operation module: a convolution layer with a 1×1 kernel;
a second operation module: a 3×3 convolution layer with a dilation rate of 6;
a third operation module: a 3×3 convolution layer with a dilation rate of 12;
a fourth operation module: a 3×3 convolution layer with a dilation rate of 18;
a fifth operation module: an adaptive average pooling layer, a convolution layer with a 1×1 kernel, and an upsampling operation;
the five operation modules are connected in parallel, the resulting 5 feature maps are spliced, and the spliced features pass through a 1×1 convolution layer;
234) Setting an upsampling decoder module comprising 4 consecutive upsampling structures with residuals and splicing connections with the output branches of the encoder's four residual downsampling structures;
the four residual downsampling structures of the encoder result in four different sized outputs respectively,
the output of the fourth size, input to the feature pyramid ASPP module and spliced to the first input of the decoder,
the output of the third size is spliced to the second input of the decoder,
the output of the second size is spliced to a third input of the decoder,
the output of the first size is spliced to a fourth input of the decoder;
the decoder structure is four consecutive upsampled blocks with residuals,
the upsampled block structure with residual is:
a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a LeakyRelu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure,
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform an addition operation and finally pass through the LeakyRelu layer in one decoder block.
The training of the esophageal cancer CT image segmentation model comprises the following steps of:
31 Inputting the esophageal cancer CT image data set into a Swin Transformer network model of the esophageal cancer CT image segmentation model, and outputting a feature map from the Swin Transformer network model;
inputting the image into the Swin Transformer network model for super-resolution reconstruction: a convolution layer with a 1×1 kernel is executed first, the result passes through 6 consecutive RSTB modules and another 1×1 convolution layer and is combined with the shallow features through a residual connection, and a LeakyReLU layer, an upsampling operation and a final 1×1 convolution layer then yield the feature map;
32 Performing progressive information fusion on the feature map output by the Swin Transformer and the original map through splicing operation to obtain a spliced feature map;
33 Inputting the spliced characteristic diagram into a TransResunet convolutional neural network model;
34 Training the spliced feature map in a downsampling encoder module:
341 Inputting the spliced feature map into a first downsampling structure with residual errors, wherein a branch A of the first downsampling structure with residual errors is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3 and a batch normalization layer stack; the first branch B with the residual downsampling structure is a convolution layer with a convolution kernel of 1 multiplied by 1 and a batch normalization layer stack;
the two branches perform addition operation, and finally a LeakyRelu layer is executed to obtain a first downsampled output;
342 Feeding the first downsampled output into a second downsampled structure with residual errors, adding the branch A and the branch B of the second downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a second downsampled output;
343 Feeding the second downsampled output into a third downsampled structure with residual errors, adding the branch A and the branch B of the third downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a third downsampled output;
344 Feeding the third downsampled output into a fourth downsampled structure with residual errors, adding the branch A and the branch B of the fourth downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a fourth downsampled output;
35 The four downsampled outputs are input to a decoder module to carry out upsampling to recover the resolution of the image;
36 A fourth downsampled output is input to the feature pyramid ASPP module and spliced to the first input of the decoder;
37 For a first input of the decoder, performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a LeakyRelu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform addition operation, and finally pass through a LeakyRelu layer in one decoder module to obtain a second input of the decoder;
38 A third downsampled output is spliced to a second input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a second input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform addition operation, and finally pass through a LeakyRelu layer in one decoder module to obtain a third input of the decoder;
39 A second downsampled output is spliced to a third input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a third input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform addition operation, and finally pass through a LeakyRelu layer in a decoder module to obtain a fourth input of the decoder;
310) The first downsampled output is spliced to a fourth input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a fourth input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches are added, and finally, a final output of the TransResunet is obtained through a LeakyRelu layer in a decoder module, a convolution layer which is subjected to double up-sampling and a convolution kernel of 1 multiplied by 1;
311 Forward propagation to obtain segmentation probability;
312) Using the cross entropy loss function and the Dice loss function as the loss functions of the esophageal cancer CT image segmentation model, the segmentation loss is calculated from the segmentation probability. The expressions are as follows:
CE(p, q) = −∑_{i=1}^{C} p_i·log(q_i)
Dice Loss = 1 − 2|A∩B| / (|A| + |B|)
where C in the cross entropy loss function CE(p, q) denotes the number of classes, p_i is the true value and q_i is the predicted value; A and B in the Dice Loss formula denote the mask matrices corresponding to the real label and the model-predicted label respectively, A∩B is the intersection of A and B, and |A| and |B| denote the numbers of elements in A and B; the numerator carries a coefficient of 2 because the elements common to A and B are counted twice in the denominator;
313) Using the L1 loss function as the loss function of the Swin Transformer network model for super-resolution reconstruction of the esophageal cancer CT image, the expression is as follows:
L1 = (1/N)·∑_{i=1}^{N} |y_i − f(x_i)|
where N denotes the number of samples, y_i is the real label of the i-th sample, and f(x_i) is the model prediction for the i-th sample;
314 Determining gradient vectors through back propagation of loss values, and updating the parameters of the esophageal cancer CT image segmentation model;
315 Judging whether the set training round number is reached, if so, completing the training of the esophageal cancer CT image segmentation model, otherwise, continuing the training.
Advantageous effects
Compared with the prior art, the automatic segmentation method for esophageal cancer CT image tumor regions based on two-stage progressive information fusion addresses the heavy noise, low resolution and artifacts of esophageal CT images: features extracted by an image super-resolution reconstruction network are progressively fused into the segmentation network, which effectively enhances the quality of the esophageal CT image, allows the network to extract richer detail features, enables effective segmentation and delineation of the esophageal cancer target region, and improves segmentation accuracy and efficiency.
Because the position of esophageal tumors varies, their anatomical structure is complex, the boundary of the tumor target region is blurred and individual differences are large, the improved TransResUNet extracts long-range dependency features, thereby improving the segmentation accuracy of the tumor target region and the robustness of the model. The invention adds an image super-resolution reconstruction branch, Transformer Encoder Block modules for long-range dependency feature extraction, and an ASPP module for multi-scale feature fusion, which enhance the feature extraction capability of the network.
Drawings
FIG. 1 is a process sequence diagram of the present invention;
FIG. 2 is a block diagram of an esophageal cancer CT image segmentation model according to the invention;
FIG. 3 is a diagram of a TransResunet convolutional neural network model in accordance with the present invention;
FIG. 4 is a block diagram of a feature pyramid ASPP module according to the present invention;
FIG. 5 is a diagram of the Swin Transformer network model according to the present invention;
FIGS. 6a and 7a are CT images of esophageal cancer in the prior art;
fig. 6b and 7b are label images of the split labels of fig. 6a and 7a, respectively;
FIGS. 6c and 7c are, respectively, automatically segmented images produced by the method of the present invention for FIGS. 6a and 7 a;
fig. 6d, 7d are respectively the automatically segmented images generated using the ResUNet network for fig. 6a, 7 a;
fig. 6e and 7e are respectively the automatically segmented images generated using UNet networks for fig. 6a and 7 a.
Detailed Description
For a further understanding and appreciation of the structural features and advantages achieved by the present invention, the following description is provided in connection with the accompanying drawings and the presently preferred embodiments:
as shown in fig. 1, the automatic segmentation method of the esophageal cancer CT image tumor region based on two-stage progressive information fusion comprises the following steps:
First, acquisition and preprocessing of esophageal cancer CT images: CT images in DICOM format are acquired, data enhancement is applied to the CT image data of the esophageal cervical and abdominal regions, and all CT images are sliced, i.e., the three-dimensional DICOM CT volumes are cut into two-dimensional CT image slices in jpg format and binarized label images in png format, forming the esophageal cancer CT image dataset.
Second, construction of the esophageal cancer CT image segmentation model: an esophageal cancer CT image segmentation model is constructed based on the two-stage progressive information fusion technique.
CT images in DICOM format store rich patient medical image information but are not directly suitable for training deep learning networks. The DICOM data therefore need to be converted: the original images and label information are extracted from them for model training and subsequent data analysis. Deep learning algorithms usually take RGB images as input, so the DICOM medical images must be converted into the common RGB format and organized into a dataset that meets the requirements of deep learning. The conversion involves two key steps. First, the metadata of the original DICOM files is read, each slice of a patient is parsed individually, and the pixel values are extracted and normalized to the range 0 to 1. For convenient storage and subsequent deep learning analysis, the slice data are then mapped to the range 0 to 255 and stored as 512×512 image slices. Second, the tumor segmentation task requires the tumor region labels manually delineated by the physician or medical physicist: the metadata is read carefully to find the contour corresponding to the label name, the pixels inside the contour are set to 1, and the remaining pixels are set to 0.
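A minimal preprocessing sketch of these two steps is given below, assuming the pydicom and Pillow libraries; the helper names, file paths and min-max normalization are illustrative assumptions rather than details taken from the patent.

```python
# Minimal DICOM-to-dataset sketch. Assumptions: pydicom and Pillow are available,
# min-max normalization is used, and the helper/file names are illustrative.
import numpy as np
import pydicom
from PIL import Image, ImageDraw

def dicom_slice_to_jpg(dcm_path, out_path, size=(512, 512)):
    ds = pydicom.dcmread(dcm_path)                 # read DICOM metadata and pixel data
    pixels = ds.pixel_array.astype(np.float32)
    # normalize to [0, 1], then map to [0, 255] for storage as an ordinary image
    pixels = (pixels - pixels.min()) / (pixels.max() - pixels.min() + 1e-8)
    img = Image.fromarray((pixels * 255).astype(np.uint8)).resize(size)
    img.convert("RGB").save(out_path)              # 512x512 RGB jpg slice for the dataset

def contour_to_mask(contour_xy, size=(512, 512)):
    """Rasterize a manually delineated tumor contour into a binary png label
    (pixels inside the contour set to 1, all others to 0)."""
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).polygon([tuple(p) for p in contour_xy], outline=1, fill=1)
    return np.array(mask, dtype=np.uint8)
```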
Because the dataset contains relatively few CT slices of the esophageal cervical and abdominal regions, data enhancement is applied to expand the dataset, which improves the robustness of the network. At the same time, because the esophageal tumor region is small and irregular in shape and the CT images are noisy, of low resolution and contain artifacts, constructing the esophageal cancer CT image segmentation model on the basis of the two-stage progressive information fusion technique effectively mitigates these problems and improves the performance of the model.
(1) As shown in fig. 2, the esophageal cancer CT image segmentation model is set to include a Swin Transformer network model for super-resolution reconstruction and a TransResUNet convolutional neural network model; the feature map output by the super-resolution Swin Transformer and the original image undergo progressive information fusion through a splicing operation and are then input into the TransResUNet convolutional neural network model for segmentation to obtain the final segmentation map.
(2) As shown in fig. 5, a Swin Transformer network model is set, which includes 6 residual Swin Transformer blocks (RSTB) and one residual connection structure, where each RSTB is composed of 6 Swin Transformer Layers, one convolution layer and one residual connection.
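The following is a structural sketch of this super-resolution branch, not the patented network itself: a standard nn.TransformerEncoderLayer stands in for the Swin Transformer Layer (window partitioning and shifted windows are omitted), and the channel width, head count and 2× upscaling factor are assumptions.

```python
# Structural sketch of the super-resolution branch. Assumptions: a plain transformer
# encoder layer replaces the Swin Transformer Layer (attention runs over all H*W tokens
# instead of local windows), and dim/heads/scale are illustrative.
import torch
import torch.nn as nn

class RSTB(nn.Module):
    """Residual block sketch: 6 transformer layers, one convolution, one residual connection."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, num_layers=6)
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        y = self.layers(tokens).transpose(1, 2).reshape(b, c, h, w)
        return x + self.conv(y)                        # residual connection inside the block

class SRBranch(nn.Module):
    """1x1 conv -> 6 RSTBs -> 1x1 conv (+ long residual) -> LeakyReLU -> upsample -> 1x1 conv."""
    def __init__(self, in_ch=1, dim=64, scale=2):
        super().__init__()
        self.shallow = nn.Conv2d(in_ch, dim, 1)
        self.body = nn.Sequential(*[RSTB(dim) for _ in range(6)])
        self.fuse = nn.Conv2d(dim, dim, 1)
        self.act = nn.LeakyReLU(0.2, inplace=True)
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.out = nn.Conv2d(dim, in_ch, 1)

    def forward(self, x):
        s = self.shallow(x)
        f = self.fuse(self.body(s)) + s                # long residual connection
        return self.out(self.up(self.act(f)))          # super-resolved feature map
```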
(3) Setting a TransResunet convolutional neural network model:
a1 As shown in fig. 2, the transresune convolutional neural network model is configured to include a downsampling encoder module for feature extraction, a feature pyramid ASPP module for obtaining different scale receptive fields, and an upsampling decoder module for recovering image resolution;
a2 Setting up a downsampling encoder module comprising 4 consecutive downsampling structures with residuals;
the downsampling structure with the residual comprises a branch A and a branch B, wherein the branch A is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3, and a batch normalization layer stack which is used as the branch A of the residual structure;
the branch B is a convolution layer with a convolution kernel of 1 multiplied by 1, and a batch of normalization layers are stacked to be used as another branch B of a residual structure;
the two branches are added, and finally pass through a LeakyRelu layer;
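A sketch of one such residual downsampling structure follows; the stride-2 convolution used for downsampling, the channel widths and the use of a plain transformer encoder layer in place of the Transformer Encoder Block are assumptions.

```python
# Sketch of one residual downsampling structure. Assumptions: stride-2 convolution
# performs the downsampling, and a plain transformer encoder layer replaces the
# Transformer Encoder Block; out_ch must be divisible by the head count.
import torch
import torch.nn as nn

class ResidualDownBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2, heads=4):
        super().__init__()
        # branch A: 3x3 conv, BN, LeakyReLU, transformer block, 3x3 conv, BN
        self.conv1 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride, 1),
                                   nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2, inplace=True))
        self.attn = nn.TransformerEncoderLayer(d_model=out_ch, nhead=heads, batch_first=True)
        self.conv2 = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        # branch B: 1x1 conv, BN
        self.branch_b = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride), nn.BatchNorm2d(out_ch))
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        a = self.conv1(x)
        b, c, h, w = a.shape
        tokens = a.flatten(2).transpose(1, 2)                      # (B, H*W, C) token sequence
        a = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)  # long-range dependencies
        a = self.conv2(a)
        return self.act(a + self.branch_b(x))                      # add branches, then LeakyReLU
```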
a3 As shown in fig. 4, a feature pyramid ASPP module is set, which includes:
a first operation module: a convolution layer with a 1×1 kernel;
a second operation module: a 3×3 convolution layer with a dilation rate of 6;
a third operation module: a 3×3 convolution layer with a dilation rate of 12;
a fourth operation module: a 3×3 convolution layer with a dilation rate of 18;
a fifth operation module: an adaptive average pooling layer, a convolution layer with a 1×1 kernel, and an upsampling operation;
the five operation modules are connected in parallel, the resulting 5 feature maps are spliced, and the spliced features pass through a 1×1 convolution layer;
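A minimal sketch of such an ASPP module follows; the channel widths are assumptions, and bilinear interpolation is assumed for the upsampling step of the pooling branch.

```python
# Minimal ASPP sketch: 1x1 conv, three dilated 3x3 convs (rates 6/12/18), and a
# pooling branch, concatenated and fused by a 1x1 conv. Channel widths are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, 1)
        self.b2 = nn.Conv2d(in_ch, out_ch, 3, padding=6, dilation=6)
        self.b3 = nn.Conv2d(in_ch, out_ch, 3, padding=12, dilation=12)
        self.b4 = nn.Conv2d(in_ch, out_ch, 3, padding=18, dilation=18)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(5 * out_ch, out_ch, 1)            # fuse the 5 spliced maps

    def forward(self, x):
        h, w = x.shape[-2:]
        p = F.interpolate(self.pool(x), size=(h, w), mode="bilinear", align_corners=False)
        feats = torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x), p], dim=1)
        return self.project(feats)
```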
a4) Setting an upsampling decoder module comprising 4 consecutive upsampling structures with residuals and splicing connections with the output branches of the encoder's four residual downsampling structures;
the four residual downsampling structures of the encoder result in four different sized outputs respectively,
the output of the fourth size, input to the feature pyramid ASPP module and spliced to the first input of the decoder,
the output of the third size is spliced to the second input of the decoder,
the output of the second size is spliced to a third input of the decoder,
the output of the first size is spliced to a fourth input of the decoder;
the decoder structure is four consecutive upsampled blocks with residuals,
the upsampled block structure with residual is:
a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a LeakyRelu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure,
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform an addition operation and finally pass through the LeakyRelu layer in one decoder block.
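Below is a sketch of one such residual upsampling block; bilinear interpolation for the double upsampling and the channel widths are assumptions.

```python
# Sketch of one residual upsampling block. Assumptions: bilinear interpolation
# performs the double upsampling, and channel widths are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualUpBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        cat_ch = in_ch + skip_ch
        # branch A: 3x3 conv, BN, LeakyReLU, 3x3 conv, BN
        self.branch_a = nn.Sequential(
            nn.Conv2d(cat_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        # branch B: 1x1 conv, BN
        self.branch_b = nn.Sequential(nn.Conv2d(cat_ch, out_ch, 1), nn.BatchNorm2d(out_ch))
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)  # double upsample
        x = torch.cat([x, skip], dim=1)                 # splice with the matching encoder output
        return self.act(self.branch_a(x) + self.branch_b(x))
```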
Thirdly, training an esophageal cancer CT image segmentation model: and inputting the esophageal cancer CT image data set into an esophageal cancer CT image segmentation model for training.
During training, the Swin Transformer network model performs super-resolution reconstruction of the original image, the richer features obtained are spliced with the original image to enhance its boundary features, and the result is then input into the TransResUNet convolutional neural network model for image segmentation, so that the two-stage progressive information fusion framework achieves better segmentation accuracy.
The training of the esophageal cancer CT image segmentation model comprises the following steps:
(1) Inputting the esophageal cancer CT image data set into a Swin Transformer network model of an esophageal cancer CT image segmentation model, and outputting a feature map from the Swin Transformer network model;
the method comprises the steps of inputting the residual error into a Swin transform network model for super-resolution reconstruction, executing a convolution layer with a convolution kernel size of 1 multiplied by 1, performing a residual error connection operation through 6 continuous RSTB modules, performing a convolution layer with a convolution kernel size of 1 multiplied by 1, performing an upsampling operation on a LeakyRelu layer, and performing a convolution layer with a convolution kernel size of 1 multiplied by 1 to obtain a characteristic diagram.
(2) And carrying out progressive information fusion on the characteristic map output by the Swin Transformer and the original map through splicing operation to obtain a spliced characteristic map.
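A minimal sketch of this fusion step is shown below; the resizing of the super-resolved feature map back to the original slice size is an assumption, since the patent only specifies a splicing (concatenation) operation.

```python
# Minimal fusion sketch: concatenate the super-resolution feature map with the original
# slice along the channel dimension. Resizing back to the slice size is an assumption.
import torch
import torch.nn.functional as F

def progressive_fusion(original, sr_features):
    if sr_features.shape[-2:] != original.shape[-2:]:
        sr_features = F.interpolate(sr_features, size=original.shape[-2:],
                                    mode="bilinear", align_corners=False)
    return torch.cat([original, sr_features], dim=1)    # spliced feature map for TransResUNet
```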
(3) And inputting the spliced characteristic diagram into a TransResunet convolutional neural network model.
(4) Training the spliced characteristic diagram in a downsampling coder module:
b1 Inputting the spliced feature map into a first downsampling structure with residual errors, wherein a branch A of the first downsampling structure with residual errors is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3 and a batch normalization layer stack; the first branch B with the residual downsampling structure is a convolution layer with a convolution kernel of 1 multiplied by 1 and a batch normalization layer stack;
the two branches perform addition operation, and finally a LeakyRelu layer is executed to obtain a first downsampled output;
b2 Feeding the first downsampled output into a second downsampled structure with residual errors, adding the branch A and the branch B of the second downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a second downsampled output;
b3 Feeding the second downsampled output into a third downsampled structure with residual errors, adding the branch A and the branch B of the third downsampled structure with residual errors, and finally executing a LeakyRelu layer to obtain a third downsampled output;
b4 Feeding the third downsampled output into a fourth downsampled structure with residual, adding the branch A and the branch B of the fourth downsampled structure with residual, and finally executing a LeakyRelu layer to obtain a fourth downsampled output.
(5) The four downsampled outputs are input to a decoder module for upsampling to recover the resolution of the image.
(6) The fourth downsampled output is input to the feature pyramid ASPP module and spliced to the first input of the decoder.
(7) Performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a first input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches add and finally pass through the LeakyRelu layer in one decoder block to obtain the second input of the decoder.
(8) The third downsampled output is spliced to the second input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a second input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches add and finally pass through the LeakyRelu layer in one decoder block to get the third input of the decoder.
(9) The second downsampled output is spliced to a third input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a third input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches add and finally pass through the LeakyRelu layer in one decoder block to get the fourth input of the decoder.
(10) The first downsampled output is spliced to a fourth input of the decoder,
performing a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a leak relu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure for a fourth input to the decoder;
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches are added, and finally pass through a LeakyRelu layer in a decoder module, and then pass through a convolution layer with double up-sampling and a convolution kernel of 1×1 to obtain the final output of TransResunet.
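For orientation, the following sketch assembles the encoder, ASPP and decoder path described in steps (4) to (10), reusing the ResidualDownBlock, ASPP and ResidualUpBlock sketches given earlier; the channel widths, the splicing of the last decoder block with the network input, and the omission of the final extra double upsampling step are assumptions.

```python
# Assembly sketch of the encoder/ASPP/decoder path, reusing the ResidualDownBlock, ASPP
# and ResidualUpBlock sketches above. Channel widths, the last skip connection and the
# omission of the final extra upsampling step are assumptions.
import torch.nn as nn

class TransResUNetSketch(nn.Module):
    def __init__(self, in_ch=2, base=64, n_classes=2):
        super().__init__()
        chs = [base, base * 2, base * 4, base * 8]
        self.enc1 = ResidualDownBlock(in_ch, chs[0])
        self.enc2 = ResidualDownBlock(chs[0], chs[1])
        self.enc3 = ResidualDownBlock(chs[1], chs[2])
        self.enc4 = ResidualDownBlock(chs[2], chs[3])
        self.aspp = ASPP(chs[3], chs[3])
        self.dec1 = ResidualUpBlock(chs[3], chs[2], chs[2])
        self.dec2 = ResidualUpBlock(chs[2], chs[1], chs[1])
        self.dec3 = ResidualUpBlock(chs[1], chs[0], chs[0])
        self.dec4 = ResidualUpBlock(chs[0], in_ch, chs[0])
        self.head = nn.Conv2d(chs[0], n_classes, 1)     # 1x1 conv produces segmentation logits

    def forward(self, x):
        e1 = self.enc1(x)                               # first-size output
        e2 = self.enc2(e1)                              # second-size output
        e3 = self.enc3(e2)                              # third-size output
        e4 = self.enc4(e3)                              # fourth-size output
        d = self.dec1(self.aspp(e4), e3)                # ASPP output is the first decoder input
        d = self.dec2(d, e2)
        d = self.dec3(d, e1)
        d = self.dec4(d, x)                             # last skip uses the fused input (assumption)
        return self.head(d)
```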
(11) Forward propagation to obtain segmentation probability;
(12) Using the cross entropy loss function and the Dice loss function as the loss functions of the esophageal cancer CT image segmentation model, the segmentation loss is calculated from the segmentation probability. The expressions are as follows:
CE(p, q) = −∑_{i=1}^{C} p_i·log(q_i)
Dice Loss = 1 − 2|A∩B| / (|A| + |B|)
where C in the cross entropy loss function CE(p, q) denotes the number of classes, p_i is the true value and q_i is the predicted value; A and B in the Dice Loss formula denote the mask matrices corresponding to the real label and the model-predicted label respectively, A∩B is the intersection of A and B, and |A| and |B| denote the numbers of elements in A and B; the numerator carries a coefficient of 2 because the elements common to A and B are counted twice in the denominator.
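A sketch of this combined segmentation loss is given below for a two-class (background/tumor) setting; the equal weighting of the two terms and the smoothing constant are assumptions.

```python
# Sketch of the combined segmentation loss (cross entropy + Dice) for a two-class task.
# The equal weighting of the two terms and the smoothing constant are assumptions.
import torch
import torch.nn as nn

class SegLoss(nn.Module):
    def __init__(self, eps=1e-6):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.eps = eps

    def forward(self, logits, target):
        # logits: (B, 2, H, W); target: (B, H, W) LongTensor of class indices {0, 1}
        ce = self.ce(logits, target)
        prob = torch.softmax(logits, dim=1)[:, 1]               # tumor-class probability
        tgt = target.float()
        inter = (prob * tgt).sum(dim=(1, 2))
        denom = prob.sum(dim=(1, 2)) + tgt.sum(dim=(1, 2))
        dice = 1 - (2 * inter + self.eps) / (denom + self.eps)  # Dice loss = 1 - 2|A∩B|/(|A|+|B|)
        return ce + dice.mean()
```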
(13) The L1 loss function is used as the loss function of the Swin Transformer network model for super-resolution reconstruction of the esophageal cancer CT image. The expression is as follows:
L1 = (1/N)·∑_{i=1}^{N} |y_i − f(x_i)|
where N denotes the number of samples, y_i is the real label of the i-th sample, and f(x_i) is the model prediction for the i-th sample.
(14) Determining gradient vectors through back propagation of loss values, and updating esophageal cancer CT image segmentation model parameters;
(15) Judging whether the set training round number is reached, if so, finishing the training of the esophageal cancer CT image segmentation model, otherwise, continuing the training.
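The training procedure of steps (11) to (15) can be sketched as follows; the optimizer, learning rate, epoch count and the use of a higher-resolution slice as the super-resolution target are assumptions, and sr_net, seg_net, SegLoss and progressive_fusion refer to the sketches given earlier.

```python
# Training loop sketch for steps (11)-(15). Optimizer, learning rate, epoch count and
# the super-resolution target are assumptions; sr_net, seg_net, SegLoss and
# progressive_fusion refer to the sketches above.
import torch

def train(sr_net, seg_net, loader, epochs=100, lr=1e-4, device="cuda"):
    params = list(sr_net.parameters()) + list(seg_net.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    l1, seg_loss = torch.nn.L1Loss(), SegLoss()
    for epoch in range(epochs):                        # stop once the set number of rounds is reached
        for img, hr_img, label in loader:              # CT slice, SR target, binary tumor mask
            img, hr_img, label = img.to(device), hr_img.to(device), label.to(device)
            sr = sr_net(img)                                    # stage 1: super-resolution features
            logits = seg_net(progressive_fusion(img, sr))       # stage 2: segmentation on fused input
            loss = seg_loss(logits, label) + l1(sr, hr_img)     # segmentation loss + L1 reconstruction loss
            opt.zero_grad()
            loss.backward()                            # back-propagate to obtain gradient vectors
            opt.step()                                 # update the model parameters
```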
Fourth, obtaining and preprocessing the CT image of the esophageal cancer to be segmented.
Fifthly, obtaining esophageal cancer CT image segmentation results: inputting the preprocessed esophageal cancer CT image to be segmented into a trained esophageal cancer CT image segmentation model to obtain a segmented esophageal cancer CT image.
As shown in fig. 6a and 7a, these are CT slice images of two esophageal cancer patients, and fig. 6b and 7b are the corresponding labels. As can be seen from fig. 6c and 7c, compared with the ResUNet network model shown in fig. 6d and 7d and the UNet network model shown in fig. 6e and 7e, the automatic segmentation produced by the method of the present invention has more complete boundary information and agrees well with the labels.
DSC denotes the Dice similarity coefficient, whose value lies in [0,1]; the larger the value, the higher the accuracy. HD denotes the Hausdorff distance; the smaller the value, the better the boundaries coincide. For a fair comparison, all experiments were run with the same initial training parameters. As can be seen from Table 1, the method of the present invention improves on classical UNet by 0.19 in DSC and 7.88 in HD, and on ResUNet by 0.09 in DSC and 7.88 in HD.
TABLE 1 Comparison of the segmentation accuracy of the method of the invention with the classical U-Net and ResUNet networks on the DSC and HD metrics
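A minimal sketch of how these two metrics can be computed from binary masks is shown below, assuming SciPy; here the Hausdorff distance is taken over all foreground pixels rather than extracted boundaries, which is a simplifying assumption.

```python
# Metric sketch: Dice similarity coefficient and Hausdorff distance between binary masks,
# assuming SciPy; the distance is taken over all foreground pixels (boundary extraction omitted).
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dsc(pred, gt):
    """Dice similarity coefficient in [0, 1]; larger means higher accuracy."""
    inter = np.logical_and(pred > 0, gt > 0).sum()
    return 2.0 * inter / ((pred > 0).sum() + (gt > 0).sum() + 1e-8)

def hausdorff(pred, gt):
    """Symmetric Hausdorff distance; smaller means better boundary coincidence."""
    p, g = np.argwhere(pred > 0), np.argwhere(gt > 0)
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
```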
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate its principles, and various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (3)
1. An automatic segmentation method for tumor areas of esophageal cancer CT images based on two-stage progressive information fusion is characterized by comprising the following steps:
11) Acquisition and preprocessing of esophageal cancer CT images: acquiring CT images in DICOM format, performing data enhancement on the CT image data of the esophageal cervical region and the esophageal abdominal region, and slicing all CT images, i.e., cutting the three-dimensional DICOM CT volumes to obtain two-dimensional CT image slices in jpg format and binarized label images in png format, so as to form an esophageal cancer CT image dataset;
12) Construction of an esophageal cancer CT image segmentation model: constructing an esophageal cancer CT image segmentation model based on the two-stage progressive information fusion technique;
13) Training of the esophageal cancer CT image segmentation model: inputting the esophageal cancer CT image dataset into the esophageal cancer CT image segmentation model for training;
14) Acquisition and preprocessing of the esophageal cancer CT image to be segmented;
15) Acquisition of the esophageal cancer CT image segmentation result: inputting the preprocessed esophageal cancer CT image to be segmented into the trained esophageal cancer CT image segmentation model to obtain the segmented esophageal cancer CT image.
2. The automatic segmentation method for the tumor area of the esophageal cancer CT image based on the two-stage progressive information fusion according to claim 1, wherein the construction of the esophageal cancer CT image segmentation model comprises the following steps:
21) Setting an esophageal cancer CT image segmentation model comprising a Swin Transformer network model for super-resolution reconstruction and a TransResUNet convolutional neural network model; the feature map output by the super-resolution Swin Transformer and the original image undergo progressive information fusion through a splicing operation and are then input into the TransResUNet convolutional neural network model for segmentation to obtain the final segmentation map;
22) Setting a Swin Transformer network model comprising 6 residual Swin Transformer blocks (RSTB) and one residual connection structure, wherein each RSTB consists of 6 Swin Transformer Layers, one convolution layer and one residual connection;
23 Setting a TransResUNet convolutional neural network model:
231 Setting a TransResunet convolutional neural network model, wherein the TransResunet convolutional neural network model comprises a downsampling encoder module for feature extraction, a feature pyramid ASPP module for obtaining different scale receptive fields, and an upsampling decoder module for recovering image resolution;
232 Setting up a downsampling encoder module comprising 4 consecutive downsampling structures with residuals;
the downsampling structure with the residual comprises a branch A and a branch B, wherein the branch A is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3, and a batch normalization layer stack which is used as the branch A of the residual structure;
the branch B is a convolution layer with a convolution kernel of 1 multiplied by 1, and a batch of normalization layers are stacked to be used as another branch B of a residual structure;
the two branches are added, and finally pass through a LeakyRelu layer;
233 A set feature pyramid ASPP module comprising:
a first operation module: a convolution layer with a 1×1 kernel;
a second operation module: a 3×3 convolution layer with a dilation rate of 6;
a third operation module: a 3×3 convolution layer with a dilation rate of 12;
a fourth operation module: a 3×3 convolution layer with a dilation rate of 18;
a fifth operation module: an adaptive average pooling layer, a convolution layer with a 1×1 kernel, and an upsampling operation;
the five operation modules are connected in parallel, the resulting 5 feature maps are spliced, and the spliced features pass through a 1×1 convolution layer;
234 Setting up a upsampling decoder module comprising 4 consecutive upsampling structures with residuals and a splice structure with the output branches of the four downsampling structures with residuals of the encoder;
the four residual downsampling structures of the encoder result in four different sized outputs respectively,
the output of the fourth size, input to the feature pyramid ASPP module and spliced to the first input of the decoder,
the output of the third size is spliced to the second input of the decoder,
the output of the second size is spliced to a third input of the decoder,
the output of the first size is spliced to a fourth input of the decoder;
the decoder structure is four consecutive upsampled blocks with residuals,
the upsampled block structure with residual is:
a double up-sampling operation, a corresponding encoder layer splicing operation, a convolution layer with a convolution kernel of 3 x 3, a batch normalization layer, a LeakyRelu layer, a convolution layer with a convolution kernel of 3 x 3 stacked with a batch normalization layer as a branch of the residual structure,
and a convolution layer with a convolution kernel of 1×1 is stacked with a batch of normalization layers as another branch of the residual structure;
the two branches perform an addition operation and finally pass through the LeakyRelu layer in one decoder block.
3. The automatic segmentation method for the tumor area of the esophageal cancer CT image based on the two-stage progressive information fusion according to claim 1, wherein the training of the esophageal cancer CT image segmentation model comprises the following steps:
31 Inputting the esophageal cancer CT image data set into a Swin Transformer network model of the esophageal cancer CT image segmentation model, and outputting a feature map from the Swin Transformer network model;
inputting the image into the Swin Transformer network model for super-resolution reconstruction: a convolution layer with a 1×1 kernel is executed first, the result passes through 6 consecutive RSTB modules and another 1×1 convolution layer and is combined with the shallow features through a residual connection, and a LeakyReLU layer, an upsampling operation and a final 1×1 convolution layer then yield the feature map;
32 Performing progressive information fusion on the feature map output by the Swin Transformer and the original map through splicing operation to obtain a spliced feature map;
33 Inputting the spliced characteristic diagram into a TransResunet convolutional neural network model;
34 Training the spliced feature map in a downsampling encoder module:
341 Inputting the spliced feature map into a first downsampling structure with residual errors, wherein a branch A of the first downsampling structure with residual errors is a convolution layer with a convolution kernel of 3 multiplied by 3, a batch normalization layer, a LeakyRelu layer, a Transformer Encoder Block layer, a convolution layer with a convolution kernel of 3 multiplied by 3 and a batch normalization layer stack; the first branch B with the residual downsampling structure is a convolution layer with a convolution kernel of 1 multiplied by 1 and a batch normalization layer stack;
the two branches perform addition operation, and finally a LeakyRelu layer is executed to obtain a first downsampled output;
342) Feeding the first downsampled output into the second downsampling structure with residual, adding branch A and branch B of the second downsampling structure with residual, and finally applying a LeakyReLU layer to obtain the second downsampled output;
343) Feeding the second downsampled output into the third downsampling structure with residual, adding branch A and branch B of the third downsampling structure with residual, and finally applying a LeakyReLU layer to obtain the third downsampled output;
344) Feeding the third downsampled output into the fourth downsampling structure with residual, adding branch A and branch B of the fourth downsampling structure with residual, and finally applying a LeakyReLU layer to obtain the fourth downsampled output;
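The four downsampling structures in steps 341)–344) share one block layout; a minimal PyTorch sketch is given below. Placing the stride-2 downsampling in the first 3×3 convolution and in the 1×1 shortcut, the number of attention heads, and the feed-forward width of the Transformer encoder layer are assumptions not specified by the claim.

```python
import torch
import torch.nn as nn

class TransResDownBlock(nn.Module):
    """Encoder block: branch A is Conv3x3-BN-LeakyReLU, a Transformer encoder
    layer over the flattened feature map, then Conv3x3-BN; branch B is a
    Conv1x1-BN shortcut; the sum passes through LeakyReLU."""
    def __init__(self, in_ch, out_ch, heads=4):
        super().__init__()
        self.a1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(inplace=True))
        self.attn = nn.TransformerEncoderLayer(
            d_model=out_ch, nhead=heads, dim_feedforward=2 * out_ch,
            batch_first=True)
        self.a2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        self.b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=2), nn.BatchNorm2d(out_ch))
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        a = self.a1(x)
        n, c, h, w = a.shape
        seq = a.flatten(2).transpose(1, 2)        # (N, H*W, C) token sequence
        a = self.attn(seq).transpose(1, 2).reshape(n, c, h, w)
        a = self.a2(a)
        return self.act(a + self.b(x))            # residual addition, then LeakyReLU
```

Note that full self-attention over every spatial position is expensive at the shallowest stage; this is only a sketch of the block structure, not a statement about the claimed implementation's attention scheme.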
35) The four downsampled outputs are input to the decoder module, which performs upsampling to recover the resolution of the image;
36) The fourth downsampled output is input to the feature pyramid ASPP module and spliced to the first input of the decoder;
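Step 36) routes the deepest encoder output through an ASPP module before the decoder; a minimal sketch follows. The dilation (void) rates (1, 6, 12, 18) and the image-pooling branch follow the common DeepLab-style layout and are assumptions, since the claim does not list the specific rates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling over the deepest encoder output:
    parallel dilated 3x3 convolutions plus a global-pooling branch,
    concatenated and projected back with a 1x1 convolution."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.LeakyReLU(inplace=True))
            for r in rates])
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.LeakyReLU(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(inplace=True))

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.pool(x), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))
```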
37) For the first input of the decoder, performing a double upsampling operation and a splicing operation with the corresponding encoder layer, followed by a convolution layer with a 3×3 convolution kernel, a batch normalization layer, a LeakyReLU layer, and a convolution layer with a 3×3 convolution kernel stacked with a batch normalization layer as one branch of the residual structure;
and a convolution layer with a 1×1 convolution kernel stacked with a batch normalization layer as the other branch of the residual structure;
the two branches are added and finally pass through a LeakyReLU layer in one decoder module to obtain the second input of the decoder;
38) The third downsampled output is spliced to the second input of the decoder;
for the second input of the decoder, performing a double upsampling operation and a splicing operation with the corresponding encoder layer, followed by a convolution layer with a 3×3 convolution kernel, a batch normalization layer, a LeakyReLU layer, and a convolution layer with a 3×3 convolution kernel stacked with a batch normalization layer as one branch of the residual structure;
and a convolution layer with a 1×1 convolution kernel stacked with a batch normalization layer as the other branch of the residual structure;
the two branches are added and finally pass through a LeakyReLU layer in one decoder module to obtain the third input of the decoder;
39) The second downsampled output is spliced to the third input of the decoder;
for the third input of the decoder, performing a double upsampling operation and a splicing operation with the corresponding encoder layer, followed by a convolution layer with a 3×3 convolution kernel, a batch normalization layer, a LeakyReLU layer, and a convolution layer with a 3×3 convolution kernel stacked with a batch normalization layer as one branch of the residual structure;
and a convolution layer with a 1×1 convolution kernel stacked with a batch normalization layer as the other branch of the residual structure;
the two branches are added and finally pass through a LeakyReLU layer in one decoder module to obtain the fourth input of the decoder;
310) The first downsampled output is spliced to the fourth input of the decoder;
for the fourth input of the decoder, performing a double upsampling operation and a splicing operation with the corresponding encoder layer, followed by a convolution layer with a 3×3 convolution kernel, a batch normalization layer, a LeakyReLU layer, and a convolution layer with a 3×3 convolution kernel stacked with a batch normalization layer as one branch of the residual structure;
and a convolution layer with a 1×1 convolution kernel stacked with a batch normalization layer as the other branch of the residual structure;
the two branches are added, and the final output of the TransResunet is obtained by passing through a LeakyReLU layer in one decoder module, a double upsampling operation and a convolution layer with a 1×1 convolution kernel;
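Steps 35)–310) can be summarized by the wiring sketch below, which reuses the TransResDownBlock, ASPP and ResidualUpBlock sketches above; the channel widths, the choice of the fused input as the last skip connection, and the output resolution are assumptions not fixed by the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Wiring sketch only: relies on TransResDownBlock, ASPP and ResidualUpBlock
# from the sketches above; all channel widths are assumptions.
class TransResUNetSketch(nn.Module):
    def __init__(self, in_ch=2, n_classes=2, widths=(64, 128, 256, 512)):
        super().__init__()
        w1, w2, w3, w4 = widths
        self.e1, self.e2 = TransResDownBlock(in_ch, w1), TransResDownBlock(w1, w2)
        self.e3, self.e4 = TransResDownBlock(w2, w3), TransResDownBlock(w3, w4)
        self.aspp = ASPP(w4, w4)
        self.d1 = ResidualUpBlock(w4, w3, w3)      # splices the third encoder output
        self.d2 = ResidualUpBlock(w3, w2, w2)      # splices the second encoder output
        self.d3 = ResidualUpBlock(w2, w1, w1)      # splices the first encoder output
        self.d4 = ResidualUpBlock(w1, in_ch, w1)   # splices the fused input (assumption)
        self.head = nn.Conv2d(w1, n_classes, 1)    # final 1x1 convolution

    def forward(self, x):
        f1 = self.e1(x); f2 = self.e2(f1); f3 = self.e3(f2); f4 = self.e4(f3)
        y = self.aspp(f4)                          # deepest features through ASPP
        y = self.d1(y, f3); y = self.d2(y, f2); y = self.d3(y, f1); y = self.d4(y, x)
        y = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(y)                        # final double upsampling + 1x1 conv
```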
311) Performing forward propagation to obtain the segmentation probability;
312) Using the cross entropy loss function and the Dice loss function as the loss function of the esophageal cancer CT image segmentation model, and calculating the segmentation loss from the segmentation probability, where the expressions are as follows:

$CE(p, q) = -\sum_{i=1}^{C} p_i \log(q_i)$

$Dice\ Loss = 1 - \frac{2|A \cap B|}{|A| + |B|}$

wherein C in the cross entropy loss function CE(p, q) represents the number of categories, $p_i$ is the true value and $q_i$ is the predicted value; A and B in the Dice Loss formula respectively represent the mask matrices corresponding to the real label and the model prediction label, $A \cap B$ is the intersection of A and B, and $|A|$ and $|B|$ respectively represent the numbers of elements of A and B; the coefficient 2 in the numerator compensates for the common elements of A and B being counted twice in the denominator;
313) Using the L1 loss function as the loss function of the Swin Transformer network model that performs super-resolution reconstruction of the esophageal cancer CT image, where the expression is as follows:

$L1 = \frac{1}{N}\sum_{i=1}^{N} |y_i - f(x_i)|$

wherein N represents the number of samples, $y_i$ is the real label of the i-th sample, and $f(x_i)$ is the model prediction value of the i-th sample;
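A minimal sketch of the three loss terms from steps 312) and 313) follows; the equal weighting of the cross entropy and Dice terms, the smoothing constant, and the restriction to a binary foreground/background task are assumptions.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss 1 - 2|A∩B| / (|A| + |B|) on the foreground probability;
    `eps` is a smoothing constant added here as an assumption."""
    prob = torch.softmax(logits, dim=1)[:, 1]          # foreground (tumor) probability
    target = target.float()
    inter = (prob * target).sum(dim=(1, 2))
    union = prob.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def segmentation_loss(logits, target):
    """Cross entropy + Dice for the segmentation branch (1:1 weighting assumed)."""
    return F.cross_entropy(logits, target) + dice_loss(logits, target)

def sr_loss(sr_pred, sr_target):
    """L1 loss for the Swin Transformer super-resolution branch."""
    return F.l1_loss(sr_pred, sr_target)
```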
314) Determining the gradient vectors through back-propagation of the loss values, and updating the parameters of the esophageal cancer CT image segmentation model;
315) Judging whether the set number of training rounds has been reached; if so, the training of the esophageal cancer CT image segmentation model is complete, otherwise the training continues.
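Steps 311)–315) amount to a standard supervised training loop. The sketch below assumes an Adam optimizer, a hypothetical data loader yielding (CT slice, label mask, high-resolution reference) triples at compatible resolutions, and a 1:1 weighting of the segmentation and super-resolution losses, none of which are specified by the claim; `SwinSRBranch`, `TransResUNetSketch`, `segmentation_loss` and `sr_loss` refer to the sketches above.

```python
import torch
import torch.nn.functional as F

def train(seg_model, sr_branch, loader, epochs=100, lr=1e-4, device="cuda"):
    seg_model.to(device); sr_branch.to(device)
    optim = torch.optim.Adam(
        list(seg_model.parameters()) + list(sr_branch.parameters()), lr=lr)
    for epoch in range(epochs):                   # step 315): fixed number of rounds
        for ct, mask, hr in loader:               # CT slice, label mask, HR reference
            ct, mask, hr = ct.to(device), mask.to(device), hr.to(device)
            sr = sr_branch(ct)                    # stage 1: super-resolution branch
            orig = F.interpolate(ct, size=sr.shape[-2:],
                                 mode="bilinear", align_corners=False)
            logits = seg_model(torch.cat([sr, orig], dim=1))   # step 311): forward pass
            loss = segmentation_loss(logits, mask) + sr_loss(sr, hr)  # steps 312)-313)
            optim.zero_grad()
            loss.backward()                       # step 314): back-propagation
            optim.step()
```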
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310688086.0A CN116645380A (en) | 2023-06-12 | 2023-06-12 | Automatic segmentation method for esophageal cancer CT image tumor area based on two-stage progressive information fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116645380A true CN116645380A (en) | 2023-08-25 |
Family
ID=87643352
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117788296A (en) * | 2024-02-23 | 2024-03-29 | 北京理工大学 | Infrared remote sensing image super-resolution reconstruction method based on heterogeneous combined depth network |
CN117788296B (en) * | 2024-02-23 | 2024-05-07 | 北京理工大学 | Infrared remote sensing image super-resolution reconstruction method based on heterogeneous combined depth network |
CN117934519A (en) * | 2024-03-21 | 2024-04-26 | 安徽大学 | Self-adaptive segmentation method for esophageal tumor CT image synthesized by unpaired enhancement |
CN117934519B (en) * | 2024-03-21 | 2024-06-07 | 安徽大学 | Self-adaptive segmentation method for esophageal tumor CT image synthesized by unpaired enhancement |
CN118196416A (en) * | 2024-03-26 | 2024-06-14 | 昆明理工大学 | Small target colorectal polyp segmentation method integrating multitasking cooperation and progressive resolution strategy |
Similar Documents

Publication | Title
---|---
CN113870258B | Counterwork learning-based label-free pancreas image automatic segmentation system
CN116309650B | Medical image segmentation method and system based on double-branch embedded attention mechanism
WO2023071531A1 | Liver CT automatic segmentation method based on deep shape learning
CN111354002A | Kidney and kidney tumor segmentation method based on deep neural network
CN116645380A | Automatic segmentation method for esophageal cancer CT image tumor area based on two-stage progressive information fusion
CN109614991A | A kind of segmentation and classification method of the multiple dimensioned dilatancy cardiac muscle based on Attention
CN107492071A | Medical image processing method and equipment
CN109389584A | Multiple dimensioned rhinopharyngeal neoplasm dividing method based on CNN
CN112215844A | MRI (magnetic resonance imaging) multi-mode image segmentation method and system based on ACU-Net
WO2024104035A1 | Long short-term memory self-attention model-based three-dimensional medical image segmentation method and system
CN114494296A | Brain glioma segmentation method and system based on fusion of Unet and Transformer
CN116228690A | Automatic auxiliary diagnosis method for pancreatic cancer and autoimmune pancreatitis based on PET-CT
CN117455906B | Digital pathological pancreatic cancer nerve segmentation method based on multi-scale cross fusion and boundary guidance
JP2024143991A | Image segmentation method and system in a multitask learning network
CN115471512A | Medical image segmentation method based on self-supervision contrast learning
Li et al. | MCRformer: Morphological constraint reticular transformer for 3D medical image segmentation
CN114565601A | Improved liver CT image segmentation algorithm based on DeepLabV3+
Fu et al. | MSA-Net: Multiscale spatial attention network for medical image segmentation
CN114387282A | Accurate automatic segmentation method and system for medical image organs
CN113205496A | Abdominal CT image liver tumor lesion segmentation method based on convolutional neural network
Wang et al. | Multimodal parallel attention network for medical image segmentation
CN116468741A | Pancreatic cancer segmentation method based on 3D physical space domain and spiral decomposition space domain
CN117058163A | Depth separable medical image segmentation algorithm based on multi-scale large convolution kernel
Mani | Deep learning models for semantic multi-modal medical image segmentation
CN117934519B | Self-adaptive segmentation method for esophageal tumor CT image synthesized by unpaired enhancement
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination