CN116258732A - Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images - Google Patents

Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images

Info

Publication number
CN116258732A
Authority
CN
China
Prior art keywords
pet
images
segmentation
image
esophageal cancer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310109050.2A
Other languages
Chinese (zh)
Inventor
他得安
岳曜廷
宋少莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN202310109050.2A
Publication of CN116258732A
Legal status: Pending

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 Image analysis
            • G06T 7/10 Segmentation; Edge detection
              • G06T 7/11 Region-based segmentation
            • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
              • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
          • G06T 19/00 Manipulating 3D models or images for computer graphics
            • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 Image acquisition modality
              • G06T 2207/10072 Tomographic images
                • G06T 2207/10081 Computed x-ray tomography [CT]
                • G06T 2207/10104 Positron emission tomography [PET]
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20084 Artificial neural networks [ANN]
            • G06T 2207/30 Subject of image; Context of image processing
              • G06T 2207/30004 Biomedical image processing
                • G06T 2207/30096 Tumor; Lesion
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00 Arrangements for image or video recognition or understanding
            • G06V 10/20 Image preprocessing
              • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
              • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
                • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
            • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
                  • G06V 10/806 Fusion, i.e. combining data from various sources at the feature extraction level, of extracted features
              • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
          • Y02A 90/00 Technologies having an indirect contribution to adaptation to climate change
            • Y02A 90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Architecture (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for segmenting the esophageal cancer tumor target region based on cross-modal feature fusion of PET/CT images. The method uses a Transformer fusing attention progressive semantically-nested network (TransAttPSNN) as the three-dimensional segmentation model of the esophageal cancer tumor target region. The TransAttPSNN network takes the attention progressive semantically-nested network (AttPSNN) as its main structure and comprises two segmentation streams, a PET stream and a CT stream; Transformer cross-modal adaptive feature fusion modules are embedded at the different-scale feature levels of the two streams. Compared with the prior art, the method effectively improves the segmentation accuracy of the esophageal cancer tumor target region and achieves better segmentation performance.

Description

Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images
Technical Field
The invention belongs to the field of intelligent processing of medical images, and particularly relates to an esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images.
Background
Esophageal cancer is asymptomatic in its early stage, so it has usually progressed to an advanced stage before it is diagnosed. For patients with intermediate or advanced esophageal cancer, radiotherapy is the main treatment; it is particularly effective for esophageal squamous cell carcinoma, which is sensitive to radiation. The design of a radiotherapy plan depends on the delineation of the esophageal cancer tumor target region. Accurate delineation helps ensure the tumor receives a sufficient radiation dose during radiotherapy and also prevents normal tissues or organs at risk around the tumor from being damaged by excessive exposure. Currently, the clinical task of delineating the esophageal cancer tumor target region is performed manually by physicians. This is a tedious, time-consuming, and laborious task that occupies a considerable amount of valuable medical resources. In addition, manual delineation relies on the subjective judgment and clinical experience of the physician, so the delineated contour of the same patient's tumor target region varies with the judgment of different physicians, leading to inconsistency. Therefore, developing an effective automatic segmentation algorithm for the esophageal cancer tumor target region with computer-aided technology has become an urgent need.
In actual clinical practice, many esophageal cancer patients scheduled to receive radiotherapy have already undergone PET/CT imaging examinations. Although some existing techniques use deep learning to segment the esophageal cancer tumor target region, they do not operate on PET/CT images, and their segmentation accuracy remains to be improved.
Disclosure of Invention
In order to fully utilize the complementary information of functional metabolic imaging (PET) and anatomical structure imaging (CT), the invention aims to provide a more accurate and effective esophageal cancer tumor target region segmentation method based on PET/CT images.
The technical scheme of the invention is as follows.
A segmentation method of esophageal cancer tumor target regions based on cross-modal feature fusion of PET/CT images specifically comprises the following steps:
S1, collecting PET/CT images of clinical esophageal cancer patients and their corresponding labels to form a data set;
S2, preprocessing the PET/CT image data set;
S3, establishing a three-dimensional segmentation model of the esophageal cancer tumor target region: the Transformer fusing attention progressive semantically-nested network (TransAttPSNN);
the TransAttPSNN takes the attention progressive semantically-nested network (AttPSNN) as its main structure, where AttPSNN introduces a convolutional attention mechanism into the progressive semantically-nested network (PSNN); the TransAttPSNN comprises two segmentation streams, one being the PET stream and the other the CT stream, with identical network structures, and Transformer cross-modal adaptive feature fusion modules are embedded at the 5 different-scale feature levels of the two streams; between the PET stream and the CT stream, the 5 Transformer cross-modal adaptive feature fusion modules connect the 5 pairs of different-scale PET and CT feature images and perform adaptive feature fusion on them, and the fused results are transmitted back to the PET and CT streams, respectively, to participate in the subsequent forward propagation of information; the outputs of the upper and lower AttPSNN decoding paths and the total output of the two paths are connected in a deep-supervision manner, then processed by a convolution layer, and the segmentation prediction is finally obtained through a Sigmoid output layer;
S4, training the established TransAttPSNN segmentation model;
S5, performing segmentation prediction on the PET/CT images of unknown esophageal cancer patients with the trained TransAttPSNN segmentation models, outputting the optimal segmentation accuracy, and visualizing the segmentation results.
In step S1, the label corresponding to each PET/CT image is obtained by importing the PET/CT DICOM files into the ITK-SNAP software, manually delineating the esophageal cancer tumor target region on the CT axial slices with reference to the corresponding PET images, and then reviewing the delineation to determine the final label.
In step S2, the data preprocessing comprises three operations: performing a secondary registration of the PET/CT images to correct the positional deviation between them, performing contrast enhancement on the CT images, and cropping a region of interest from the PET/CT images followed by normalization. Preferably, the secondary registration adopts a multi-modal intensity-based three-dimensional registration algorithm, a registration method based on mutual information, a registration method based on optical flow fields, or a registration method based on deep learning, and the registered PET and CT images have the same size; the contrast enhancement of the CT images is performed by window-width truncation.
In step S3, in the TransAttPSNN network, both the PET stream and the CT stream adopt the AttPSNN network, and each AttPSNN network contains feature images at 5 scales. Specifically, the encoding path comprises 5 convolution levels, the first two of which each consist of 2 convolution modules and the last three of 3 convolution modules; the decoding path comprises 4 convolutional attention levels, the first of which consists of a convolutional attention module + Conv layer and the last three of a convolutional attention module + Conv + trilinear interpolation upsampling layer.
In step S4, the established TransAttPSNN segmentation model is trained with four-fold cross-validation. Specifically, the data set is first divided into four equal parts; each part is then taken in turn as the test set while the remaining three parts are combined as the training set to train a TransAttPSNN model, so that a total of 4 TransAttPSNN segmentation models are obtained.
To address the positional deviation of the PET/CT images, the invention performs a secondary registration of the PET/CT images. To address the poor contrast of the CT images, a reasonable window-width truncation threshold is selected by statistical analysis and window-width truncation is applied to the CT images, which improves their contrast. The invention provides esophageal cancer tumor target region segmentation based on dual-modality PET/CT images: a Transformer model is introduced into the three-dimensional segmentation task of the esophageal cancer tumor target region to realize cross-modal adaptive feature fusion, and on this basis the three-dimensional segmentation model of the esophageal cancer tumor target region, the Transformer fusing attention progressive semantically-nested network TransAttPSNN, is proposed.
Compared with the prior art, the invention has the beneficial effects that:
In the TransAttPSNN segmentation model, the introduction of the convolutional attention mechanism makes the proposed AttPSNN model more effective than the original progressive semantically-nested network (PSNN) model. In addition, the complementary information of the PET and CT modalities is mined with the Transformer model, which further improves the segmentation performance. Therefore, compared with the most advanced methods reported in the literature, the three-dimensional esophageal cancer tumor segmentation model TransAttPSNN designed by the invention achieves better segmentation accuracy.
The secondary registration of the PET/CT images provides a good correction of their positional deviation. Selecting a reasonable window-width truncation threshold by statistical analysis and applying window-width truncation to the CT images improves their contrast. By creatively introducing the Transformer model, the invention provides an esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images, improves the segmentation accuracy of the esophageal cancer tumor target region, and can provide technical support for the automatic segmentation of esophageal cancer tumor target regions.
Drawings
FIG. 1 compares the results before and after the secondary registration of PET/CT images. (a) and (b) show the superimposed visualization of 2 different example PET/CT image pairs. Green represents PET, purple represents CT, the blue outline is the contour of the real label, and the highlighted green area inside the blue outline is the tumor lesion region in PET.
Fig. 2 compares a CT image before and after contrast enhancement.
Fig. 3 is a frame diagram of a three-dimensional segmentation model TransAttPSNN of an esophageal cancer tumor target area designed by the invention.
Fig. 4 illustrates the self-attention weight correlation matrix of the fused PET and CT feature images.
FIG. 5 shows the three-dimensional visualization of the segmentation result obtained by the method of the present invention: (a) the three-dimensional visualization of the esophageal cancer tumor corresponding to the real label, (b) the three-dimensional visualization of the segmentation result obtained by the invention, and (c) the overlay of (a) and (b).
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings and examples, to which the scope of the invention is not limited.
A segmentation method of esophageal cancer tumor target regions based on cross-modal feature fusion of PET/CT images specifically comprises the following steps:
s1, collecting PET/CT images of clinical esophageal cancer patients and corresponding labels thereof to form a data set. Wherein, the PET/CT image of the patient with esophageal cancer is DICOM data of the whole body 18F-FDG PET/CT image examination. The label corresponding to the PET/CT image is firstly agreed by two doctors in routine clinical work, a DICOM file of the PET/CT is imported into ITK-SNAP software (Version 3.6,United States), and the image is formed by manually sketching the tumor target area of the esophageal cancer on a CT axial slice by referring to the corresponding PET image; the delineated labels are then reviewed by a physician to determine the final label.
S2, preprocessing the PET/CT image data set. Specific preprocessing operations include the following 3 aspects:
s2.1, carrying out secondary registration on the PET/CT image. Although PET/CT scanners have hardware registration of PET/CT images, the patient's involuntary respiratory motion, abdominal organ peristalsis, heart beat, etc. lead to PET/CT images in fact in a non-strictly defined configuration during image acquisition. Therefore, we perform a secondary registration of the PET/CT images in the data preprocessing step to correct the positional deviation between the PET/CT images. The registration method used is a multi-modal intensity three-dimensional registration algorithm [1] The PET image and the CT image output after registration have the same size, and are 512×512. And (3) injection: besides the multi-modal intensity three-dimensional registration algorithm, the PET/CT image registration can be performed by adopting methods based on mutual information, optical flow field, deep learning and the like [2-4] . As shown in fig. 1, which is a graph comparing the results of the present invention before and after the second registration of the PET/CT images of two embodiments, it can be observed that the position deviation of the PET/CT images is better corrected after the second registration of the PET/CT images.
S2.2, performing contrast enhancement on the CT images. Specifically, window-width truncation is applied to each CT image: pixel values smaller than -150 and larger than 150 in the CT image matrix are assigned -150 and 150, respectively. Fig. 2 compares a CT image before and after the contrast enhancement; it can be observed that after window-width truncation the contrast between the tumor lesion area and the surrounding tissues in the CT image is improved.
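The window-width truncation of step S2.2 amounts to a simple clipping of the CT intensities; a minimal sketch (the threshold values follow the -150/150 stated above):

```python
# Minimal sketch of the window-width truncation used for CT contrast enhancement:
# values below -150 and above 150 are clipped to -150 and 150, respectively.
import numpy as np

def truncate_ct(ct_volume: np.ndarray, low: float = -150.0, high: float = 150.0) -> np.ndarray:
    """Clip a CT volume to the [low, high] window described in step S2.2."""
    return np.clip(ct_volume, low, high)
```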
S2.3, cropping a region of interest from the PET/CT images and normalizing. Limited by the high computational and storage cost of deep neural network models, and in order to alleviate the problem of highly imbalanced foreground and background data (the foreground being the tumor region and the background the non-tumor region) [5], it is necessary to crop a region of interest from the PET/CT images. Specifically, the PET/CT images of each patient in the data set and the corresponding labels are cropped to a region of interest that contains the esophageal cancer tumor and has a size of at least 64×64, and the cropped PET/CT images are normalized to the [0,1] interval. After cropping the region of interest and normalizing the PET/CT images, the data set required for training the network model is obtained.
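A minimal sketch of the cropping and normalization of step S2.3 is given below; how the region of interest is located is not specified beyond "containing the tumor", so the bounding-box-plus-margin strategy and the margin value are assumptions.

```python
# Sketch of step S2.3: crop a region of interest around the tumor (here, the label's
# bounding box plus a margin, which is an assumption) and normalize to [0, 1].
import numpy as np

def crop_roi(volume: np.ndarray, label: np.ndarray, margin: int = 8):
    """Crop `volume` and `label` to the bounding box of the nonzero label voxels plus a margin."""
    zs, ys, xs = np.nonzero(label)
    z0, z1 = max(zs.min() - margin, 0), min(zs.max() + margin + 1, volume.shape[0])
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin + 1, volume.shape[1])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin + 1, volume.shape[2])
    return volume[z0:z1, y0:y1, x0:x1], label[z0:z1, y0:y1, x0:x1]

def normalize01(volume: np.ndarray) -> np.ndarray:
    """Min-max normalize a (cropped) volume to the [0, 1] interval."""
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin + 1e-8)
```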
S3, establishing a three-dimensional segmentation model of the esophageal cancer tumor target area: transformer fuses attention progressive semantic nested networks, transAttPSNN. Specific operations include the following 2 aspects:
S3.1, taking the progressive semantically-nested network (PSNN), the leading method reported in the literature for three-dimensional segmentation of the esophageal cancer tumor target region [6], as the backbone, and introducing a convolutional attention mechanism into it to obtain the attention progressive semantically-nested network AttPSNN.
S3.2, designing a Transformer cross-modal adaptive feature fusion module and embedding it at the different-scale feature levels of the two AttPSNN segmentation streams (one being the PET segmentation stream and the other the CT segmentation stream), thereby building the final segmentation model TransAttPSNN.
A framework diagram of the TransAttPSNN model is shown in Fig. 3. After the two-channel PET/CT image is input into the TransAttPSNN network, it is divided into 2 paths: the upper PET stream and the lower CT stream. The PET stream and the CT stream have the same network structure; both are the proposed AttPSNN network. Each AttPSNN network contains feature images at 5 scales. Specifically, the encoding path comprises 5 convolution levels: the first two each consist of 2 convolution modules (Conv + BN + ReLU) and the last three each consist of 3 convolution modules; the fifth convolution level plays the role of the middle bridge of a U-shaped structure network. The decoding path comprises 4 convolutional attention levels: the first consists of a convolutional attention module + Conv layer, and the last three consist of a convolutional attention module + Conv + trilinear interpolation upsampling layer. Between the PET stream and the CT stream, 5 Transformer cross-modal adaptive feature fusion modules connect the 5 pairs of different-scale PET and CT feature images and perform adaptive feature fusion on them. The fused results are transmitted back to the PET and CT streams, respectively, to participate in the forward propagation of subsequent information. The outputs of the upper and lower AttPSNN decoding paths and the total output of the two paths are connected in a deep-supervision manner and then processed by a convolution layer; the segmentation prediction is finally obtained through the Sigmoid output layer.
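The following is a simplified structural sketch, not the patent's implementation, of the dual-stream wiring just described: two identical streams, a cross-modal fusion module at each of the 5 feature scales, and a deep-supervision-style combination of the two stream outputs followed by a convolution and a Sigmoid. The channel widths, the placeholder fusion block (the actual Transformer fusion module is treated in Section 3.2.2), and the reduced encoder/decoder are assumptions for illustration.

```python
# Structural sketch of the dual-stream wiring (not the full AttPSNN).
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv3d(cin, cout, 3, padding=1),
                         nn.BatchNorm3d(cout), nn.ReLU(inplace=True))

class FusionPlaceholder(nn.Module):
    """Stands in for the Transformer cross-modal adaptive feature fusion module."""
    def __init__(self, channels):
        super().__init__()
        self.mix = nn.Conv3d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, f_pet, f_ct):
        mixed = self.mix(torch.cat([f_pet, f_ct], dim=1))
        return torch.chunk(mixed, 2, dim=1)          # fused PET / CT features

class DualStreamSketch(nn.Module):
    def __init__(self, widths=(16, 32, 64, 128, 256)):
        super().__init__()
        self.pet_levels, self.ct_levels, self.fusions = nn.ModuleList(), nn.ModuleList(), nn.ModuleList()
        cin = 1
        for w in widths:                              # 5 feature scales per stream
            self.pet_levels.append(conv_block(cin, w))
            self.ct_levels.append(conv_block(cin, w))
            self.fusions.append(FusionPlaceholder(w))
            cin = w
        self.pet_head = nn.Conv3d(widths[-1], 1, kernel_size=1)
        self.ct_head = nn.Conv3d(widths[-1], 1, kernel_size=1)
        self.out_conv = nn.Conv3d(3, 1, kernel_size=1)

    def forward(self, x):                             # x: (B, 2, D, H, W) two-channel PET/CT
        f_pet, f_ct = x[:, 0:1], x[:, 1:2]
        for i, (lev_p, lev_c, fuse) in enumerate(zip(self.pet_levels, self.ct_levels, self.fusions)):
            f_pet, f_ct = lev_p(f_pet), lev_c(f_ct)
            f_pet, f_ct = fuse(f_pet, f_ct)           # cross-modal fusion at every scale
            if i < len(self.fusions) - 1:             # downsample between scales
                f_pet, f_ct = F.max_pool3d(f_pet, 2), F.max_pool3d(f_ct, 2)
        size = x.shape[2:]
        out_pet = F.interpolate(self.pet_head(f_pet), size=size, mode='trilinear', align_corners=False)
        out_ct = F.interpolate(self.ct_head(f_ct), size=size, mode='trilinear', align_corners=False)
        # Deep-supervision-style combination of the two stream outputs and their sum.
        combined = torch.cat([out_pet, out_ct, out_pet + out_ct], dim=1)
        return torch.sigmoid(self.out_conv(combined))
```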
The cross-modal adaptive feature fusion realized with the Transformer model is described as follows.
3.2.1 Theory of the three-dimensional Transformer model

Let the input image be $x \in \mathbb{R}^{H' \times W' \times D' \times C}$, where $H'$, $W'$, $D'$, $C$ denote the height, width, depth, and number of channels of the image, respectively. To avoid memory explosion during computation, a three-dimensional adaptive average pooling function $\mathrm{AdaptiveAvgPool3d}(\cdot)$ is first used to pool the input image $x$ to $x_{pooling} \in \mathbb{R}^{H \times W \times D \times C}$, where $H$, $W$, $D$, $C$ are the height, width, depth, and number of channels of the pooled image:

$$x_{pooling} = \mathrm{AdaptiveAvgPool3d}(x) \qquad (1)$$

Secondly, a window of size $1 \times 1 \times 1 \times C$ is used to flatten $x_{pooling}$ into a series of patches, giving $x_f \in \mathbb{R}^{(HWD) \times C}$, where $HWD$ is the number of generated patches and $C$ is the dimension of each patch. $x_f$ is then fed into a standard Transformer module, where it is first processed by the multi-head self-attention (MHSA) module as follows:

$$q = x_f W_q, \quad k = x_f W_k, \quad v = x_f W_v, \qquad (2)$$

$$q_m = q\, W_q^{(m)}, \quad k_m = k\, W_k^{(m)}, \quad v_m = v\, W_v^{(m)}, \qquad (3)$$

$$z^{(m)} = \sigma\!\left(\frac{q_m k_m^{\top}}{\sqrt{d}}\right) v_m, \qquad (4)$$

$$z = \mathrm{Concat}\big(z^{(1)}; z^{(2)}; \ldots; z^{(M)}\big)\, W_o, \qquad (5)$$

where $W_q, W_k, W_v \in \mathbb{R}^{C \times C}$ are mapping matrices, and $q, k, v \in \mathbb{R}^{(HWD) \times C}$ denote the query, key, and value, respectively; $M$ is the number of parallel self-attention heads in the MHSA. Letting $d = C/M$ be the dimension of each self-attention head, $W_q^{(m)}, W_k^{(m)}, W_v^{(m)} \in \mathbb{R}^{C \times d}$ are the mapping matrices of the $m$-th self-attention head; correspondingly, $q_m, k_m, v_m \in \mathbb{R}^{(HWD) \times d}$ $(m = 1, 2, \ldots, M)$ are the query, key, and value of the $m$-th self-attention head. $\sigma(\cdot)$ denotes the Softmax function, and $z^{(m)} \in \mathbb{R}^{(HWD) \times d}$ is the output of the $m$-th self-attention head. $W_o \in \mathbb{R}^{Md \times C}$ (i.e. $W_o \in \mathbb{R}^{C \times C}$) is a mapping matrix, and $z \in \mathbb{R}^{(HWD) \times C}$ is the final output of the MHSA.

Third, the output of the MHSA is sent to a multi-layer perceptron (MLP) module for processing (the MLP consists of two fully connected linear layers, two Dropout layers, and one GELU activation layer).

Fourth, summarizing the flow and adding a trainable position encoding $P_f$, layer normalization (LN) layers, and residual connections, the processing flow of a three-dimensional Transformer model with $L$ Transformer modules is:

$$z_0 = x_f + P_f, \qquad (6)$$

$$z'_l = \mathrm{MHSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}, \qquad (7)$$

$$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l, \quad l = 1, 2, \ldots, L. \qquad (8)$$

Finally, $z_L \in \mathbb{R}^{(HWD) \times C}$ is reshaped into the form $\mathbb{R}^{H \times W \times D \times C}$ and then upsampled by trilinear interpolation [7,8] to $\mathbb{R}^{H' \times W' \times D' \times C}$, so that the output of the Transformer is restored to the same size as the original input image.
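A hedged PyTorch sketch of the three-dimensional Transformer flow of equations (1)-(8) is given below: adaptive average pooling, flattening into HWD patches of dimension C, a learnable position encoding, L pre-norm blocks of MHSA and MLP with residual connections, reshaping, and trilinear upsampling back to the input size. The pooled size, depth L, head count M, and dropout rate are illustrative assumptions.

```python
# Sketch of a three-dimensional Transformer module following equations (1)-(8).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerBlock3D(nn.Module):
    """One pre-norm block implementing equations (7)-(8)."""
    def __init__(self, dim, heads, mlp_ratio=4, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(), nn.Dropout(dropout),
                                 nn.Linear(mlp_ratio * dim, dim), nn.Dropout(dropout))

    def forward(self, z):
        h = self.norm1(z)
        z = z + self.attn(h, h, h, need_weights=False)[0]   # eq. (7)
        z = z + self.mlp(self.norm2(z))                      # eq. (8)
        return z

class Transformer3D(nn.Module):
    def __init__(self, channels, pooled_size=(8, 8, 8), depth=4, heads=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(pooled_size)        # eq. (1)
        n_patches = pooled_size[0] * pooled_size[1] * pooled_size[2]
        self.pos = nn.Parameter(torch.zeros(1, n_patches, channels))   # P_f in eq. (6)
        self.blocks = nn.ModuleList([TransformerBlock3D(channels, heads) for _ in range(depth)])
        self.pooled_size = pooled_size

    def forward(self, x):                                    # x: (B, C, D, H, W)
        b, c, d0, h0, w0 = x.shape
        tokens = self.pool(x).flatten(2).transpose(1, 2)     # (B, HWD, C) patches
        z = tokens + self.pos                                # eq. (6)
        for blk in self.blocks:
            z = blk(z)
        d, h, w = self.pooled_size
        z = z.transpose(1, 2).reshape(b, c, d, h, w)         # reshape z_L
        # Trilinear upsampling restores the original spatial size.
        return F.interpolate(z, size=(d0, h0, w0), mode='trilinear', align_corners=False)
```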
3.2.2 Cross-modal adaptive feature fusion based on the three-dimensional Transformer model

Based on the three-dimensional Transformer model above, assume the PET and CT feature images to be fused are $x_{PET} \in \mathbb{R}^{H' \times W' \times D' \times C}$ and $x_{CT} \in \mathbb{R}^{H' \times W' \times D' \times C}$, respectively, where $H'$, $W'$, $D'$, $C$ denote the height, width, depth, and number of channels of the images. First, equation (1) is used to pool the input images $x_{PET}$ and $x_{CT}$ to $x_{pooling}^{PET} \in \mathbb{R}^{H \times W \times D \times C}$ and $x_{pooling}^{CT} \in \mathbb{R}^{H \times W \times D \times C}$, where $H$, $W$, $D$, $C$ are the height, width, depth, and number of channels of the pooled images.

Secondly, a window of size $1 \times 1 \times 1 \times C$ is used to flatten $x_{pooling}^{PET}$ and $x_{pooling}^{CT}$ into two series of patches, $x_f^{PET} \in \mathbb{R}^{(HWD) \times C}$ and $x_f^{CT} \in \mathbb{R}^{(HWD) \times C}$, which are then concatenated along the patch dimension to obtain $x_f^{PET,CT} \in \mathbb{R}^{(2HWD) \times C}$.

Third, according to equations (6)-(8), $x_f^{PET,CT}$ is fed into a Transformer model for processing, yielding the output $z_L \in \mathbb{R}^{(2HWD) \times C}$. $z_L$ is reshaped and split into two outputs, $z^{PET} \in \mathbb{R}^{H \times W \times D \times C}$ and $z^{CT} \in \mathbb{R}^{H \times W \times D \times C}$.

Finally, trilinear interpolation is applied to upsample $z^{PET}$ and $z^{CT}$ back to the same size as the original input images, yielding $\hat{x}_{PET} \in \mathbb{R}^{H' \times W' \times D' \times C}$ and $\hat{x}_{CT} \in \mathbb{R}^{H' \times W' \times D' \times C}$, which completes the adaptive fusion of the PET and CT feature images. The fusion process can be explained as follows: after $x_f^{PET,CT}$ is input into the Transformer model, the self-attention weight $W_a$ computed in the MHSA module according to equation (4) can be regarded as the correlation between every pair of patches of the flattened and mapped PET and CT feature images, as shown in Fig. 4, where $w_{ij}$ $(i, j = 1, 2, \ldots, 2HWD)$ denotes the correlation between the patch at position $i$ and the patch at position $j$. Therefore, during training the Transformer model can adaptively model the long-range dependencies within each modality and across the PET and CT modalities, thereby realizing the feature fusion function.
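Along the same lines, the cross-modal fusion of this section can be sketched as follows: the PET and CT feature maps are pooled and flattened, their patch sequences are concatenated into 2HWD tokens, processed jointly by Transformer blocks (so that self-attention spans both modalities), split back into a PET part and a CT part, and upsampled to the original feature-map size. The standard nn.TransformerEncoder is used here for brevity; pooled size, depth, and head count are again assumptions.

```python
# Sketch of Transformer-based cross-modal adaptive feature fusion (Section 3.2.2).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion3D(nn.Module):
    def __init__(self, channels, pooled_size=(8, 8, 8), depth=4, heads=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(pooled_size)
        n = pooled_size[0] * pooled_size[1] * pooled_size[2]
        self.pos = nn.Parameter(torch.zeros(1, 2 * n, channels))       # joint position encoding
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                           dim_feedforward=4 * channels,
                                           activation='gelu', batch_first=True,
                                           norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.pooled_size = pooled_size

    def _tokens(self, x):
        return self.pool(x).flatten(2).transpose(1, 2)                 # (B, HWD, C)

    def forward(self, f_pet, f_ct):                                    # each (B, C, D, H, W)
        b, c, d0, h0, w0 = f_pet.shape
        joint = torch.cat([self._tokens(f_pet), self._tokens(f_ct)], dim=1) + self.pos
        z = self.encoder(joint)                                        # self-attention over 2HWD patches
        z_pet, z_ct = torch.chunk(z, 2, dim=1)                         # split back per modality
        d, h, w = self.pooled_size

        def to_volume(tokens):
            vol = tokens.transpose(1, 2).reshape(b, c, d, h, w)
            return F.interpolate(vol, size=(d0, h0, w0), mode='trilinear', align_corners=False)

        return to_volume(z_pet), to_volume(z_ct)                       # fused PET / CT features
```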
S4, training the established segmentation model TransAttPSNN by adopting a four-fold cross validation mode. Specific operations include the following 4 aspects:
s4.1 divides the dataset into four equal parts.
S4.2, training and configuring a segmentation model TransAttPSNN. Specifically, the method comprises the following 2 aspects:
S4.2.1, randomly extracting 16 training patches of size 64×64 from each region of interest and its corresponding label obtained in the data preprocessing step S2.3, and randomly applying one data enhancement operation to each patch (rotation by 90°, left-right flip, up-down flip, left-right flip followed by rotation by 90°, or no change).
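A sketch of the patch sampling and random augmentation of step S4.2.1 follows; the three-dimensional patch shape and the random-crop strategy are assumptions, while the five augmentation options mirror those listed above.

```python
# Sketch of random patch extraction and augmentation for training.
import numpy as np

def random_patch(volume, label, patch_size=(64, 64, 64)):
    """Randomly crop one training patch (and its label) from a region of interest."""
    starts = [np.random.randint(0, max(s - p, 0) + 1)
              for s, p in zip(volume.shape, patch_size)]
    sl = tuple(slice(st, st + p) for st, p in zip(starts, patch_size))
    return volume[sl], label[sl]

def random_augment(patch, label):
    """Apply one of the five augmentation options in the axial (last two) plane."""
    op = np.random.randint(5)
    if op == 0:                      # rotate 90 degrees
        return np.rot90(patch, axes=(-2, -1)).copy(), np.rot90(label, axes=(-2, -1)).copy()
    if op == 1:                      # flip left-right
        return np.flip(patch, axis=-1).copy(), np.flip(label, axis=-1).copy()
    if op == 2:                      # flip up-down
        return np.flip(patch, axis=-2).copy(), np.flip(label, axis=-2).copy()
    if op == 3:                      # flip left-right, then rotate 90 degrees
        p, l = np.flip(patch, axis=-1), np.flip(label, axis=-1)
        return np.rot90(p, axes=(-2, -1)).copy(), np.rot90(l, axes=(-2, -1)).copy()
    return patch, label              # leave unchanged
```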
S4.2.2, configuring the hyper-parameters of the TransAttPSNN segmentation model established in step S3: number of training epochs epoch = 50, learning rate = 5e-3, mini-batch size = 4, optimizer AdamW with decoupled weight decay parameter 0.01, and the generalized Dice loss (GDL) as the loss function. The GDL is defined as follows:

$$\mathrm{GDL} = 1 - 2\,\frac{\sum_{c} w_c \sum_{n} y_{cn}\, p_{cn}}{\sum_{c} w_c \sum_{n} \left(y_{cn} + p_{cn}\right) + \epsilon}, \qquad w_c = \frac{1}{\left(\sum_{n} y_{cn}\right)^{2}},$$

where $c$ indexes the classes, $y_{cn}$ and $p_{cn}$ denote the true label value and the predicted probability of the $n$-th pixel belonging to class $c$, respectively, and $w_c$ is the inverse of the square of the total number of pixels of the corresponding class (the foreground being the tumor region and the background the non-tumor region). The weight $w_c$ therefore rebalances the contributions of the foreground and the background and alleviates the problem of imbalanced foreground and background data. $\epsilon = 1 \times 10^{-8}$ prevents the denominator from being zero.
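A hedged PyTorch sketch of the GDL defined above, for the binary foreground/background case with Sigmoid outputs, could look as follows (the tensor layout is an assumption):

```python
# Sketch of the generalized Dice loss (GDL) for binary segmentation.
import torch

def generalized_dice_loss(pred, target, eps=1e-8):
    """pred, target: (B, 1, D, H, W); pred holds sigmoid probabilities, target is 0/1."""
    pred_fg = pred.reshape(pred.shape[0], -1)          # foreground probability per pixel
    target_fg = target.reshape(target.shape[0], -1)
    # Treat foreground and background as the two classes c.
    p = torch.stack([pred_fg, 1.0 - pred_fg], dim=1)   # (B, 2, N)
    y = torch.stack([target_fg, 1.0 - target_fg], dim=1)
    w = 1.0 / (y.sum(dim=2) ** 2 + eps)                # w_c = 1 / (sum_n y_cn)^2
    numerator = 2.0 * (w * (y * p).sum(dim=2)).sum(dim=1)
    denominator = (w * (y + p).sum(dim=2)).sum(dim=1) + eps
    return (1.0 - numerator / denominator).mean()
```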
S4.3, taking each of the four equal parts in turn as the test set and combining the remaining three as the training set to train the TransAttPSNN model. When training the network (see Fig. 3), the data are processed in mini-batches of samples (mini-batch = 4); the specific operations include the following 4 aspects:
s4.3.1 a two-channel PET/CT image is input to the established TransAttPSNN model.
S4.3.2 the extracted PET channel images are sent to PET stream for processing while the extracted CT channel images are sent to CT stream for processing.
S4.3.3, for the PET and CT feature images at the same level, a Transformer adaptive feature fusion module fuses the PET and CT feature images, and the fused results are fed back to the PET stream and the CT stream, respectively, to participate in the feed-forward of information.
S4.3.4 the training network is continuously optimized to converge and obtain the three-dimensional segmentation model of the esophageal cancer tumor target region with good performance. It should be noted that, because of the four-fold cross-validation training mode, a total of 4 TransAttPSNN segmentation models will be trained.
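A skeleton of the four-fold cross-validation training of steps S4.1-S4.3 is sketched below; the dataset object, model builder, and loss function are placeholders, while the epoch count, mini-batch size, learning rate, optimizer, and weight decay follow the values stated in S4.2.2.

```python
# Skeleton of four-fold cross-validation training (one model per fold).
import numpy as np
import torch
from torch.utils.data import DataLoader, Subset

def train_four_fold(dataset, build_model, loss_fn, device="cuda"):
    """Train one TransAttPSNN-style model per fold; folds[k] indexes model k's test set."""
    indices = np.arange(len(dataset))
    np.random.shuffle(indices)
    folds = np.array_split(indices, 4)                            # S4.1: four equal parts
    models = []
    for k in range(4):
        train_idx = np.concatenate([folds[i] for i in range(4) if i != k])
        loader = DataLoader(Subset(dataset, train_idx.tolist()),
                            batch_size=4, shuffle=True)           # mini-batch = 4
        model = build_model().to(device)
        optim = torch.optim.AdamW(model.parameters(), lr=5e-3, weight_decay=0.01)
        for epoch in range(50):                                   # epoch = 50
            for pet_ct, label in loader:                          # (B, 2, D, H, W), (B, 1, D, H, W)
                pet_ct, label = pet_ct.to(device), label.to(device)
                loss = loss_fn(model(pet_ct), label)              # e.g. the GDL sketched above
                optim.zero_grad()
                loss.backward()
                optim.step()
        models.append(model)
    return models, folds
```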
S4.4, respectively inputting the four test sets into the TransAttPSNN model obtained by training the corresponding training set, and outputting average segmentation accuracy.
In step S4.4, when testing the network (see Fig. 3), the data are processed one sample at a time; the specific operations include the following 4 aspects:
s4.4.1 to the trained TransAttPSNN model, a two-channel PET/CT image is input.
S4.4.2 the extracted PET channel images are sent to PET stream for processing while the extracted CT channel images are sent to CT stream for processing.
S4.4.3, for the PET and CT feature images at the same level, a Transformer adaptive feature fusion module fuses the PET and CT feature images, and the fused results are fed back to the PET stream and the CT stream, respectively, to participate in the subsequent feed-forward of information.
S4.4.4, the information is processed feed-forward until the corresponding segmentation result is output.
In step S4.4, the segmentation accuracy is measured by three commonly used evaluation indexes: the Dice similarity coefficient (DSC), the Hausdorff distance (HD), and the mean surface distance (MSD). The Hausdorff distance is also referred to as the maximum surface distance. DSC measures the degree of spatial overlap between the predicted and real labels [9,10]. The distance indicators HD and MSD measure, respectively, the maximum and the average distance between the predicted and the real tumor region edges [11]. Denoting the predicted tumor region by P, the real tumor region by G, and their respective edge contours by P_C and G_C, the calculation formulas of DSC, HD, and MSD are defined as follows:

$$\mathrm{DSC} = \frac{2\,|P \cap G|}{|P| + |G|},$$

$$\mathrm{HD} = \max\left\{\max_{p \in P_C}\,\min_{g \in G_C} d(p, g),\; \max_{g \in G_C}\,\min_{p \in P_C} d(p, g)\right\},$$

$$\mathrm{MSD} = \frac{1}{|P_C| + |G_C|}\left(\sum_{p \in P_C}\,\min_{g \in G_C} d(p, g) + \sum_{g \in G_C}\,\min_{p \in P_C} d(p, g)\right),$$

where d(p, g) denotes the Euclidean distance between pixel points p and g; |P| and |G| denote the total numbers of pixels of the predicted and real tumor regions P and G, respectively; similarly, |P_C| and |G_C| denote the total numbers of pixels of the predicted and real tumor edge contours. The DSC value lies in [0,1]; the closer it is to 1, the better the segmentation result. The values of HD and MSD are greater than or equal to 0; the closer they are to 0, the better the segmentation result.
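The three evaluation indexes can be computed as in the following sketch; here the edge contours P_C and G_C are taken as the boundary voxels of the binary masks, and distances are measured in voxel units (voxel spacing handling is omitted), which is an assumption about the exact implementation.

```python
# Sketch of DSC, HD (maximum surface distance), and MSD on binary masks.
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

def dice(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def _surface(mask):
    eroded = ndimage.binary_erosion(mask)
    return np.argwhere(mask & ~eroded)                 # boundary voxel coordinates

def surface_distances(pred, gt):
    p_surf, g_surf = _surface(pred.astype(bool)), _surface(gt.astype(bool))
    d_pg = cKDTree(g_surf).query(p_surf)[0]            # each predicted-edge point to the real edge
    d_gp = cKDTree(p_surf).query(g_surf)[0]            # each real-edge point to the predicted edge
    hd = max(d_pg.max(), d_gp.max())                   # Hausdorff (maximum surface) distance
    msd = (d_pg.sum() + d_gp.sum()) / (len(d_pg) + len(d_gp))   # mean surface distance
    return hd, msd
```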
S5, predicting PET/CT image data of unknown esophageal cancer patients by using a TransAttPSNN segmentation model obtained through training, outputting the optimal segmentation accuracy, and visually displaying the segmentation result. The specific operations include the following 4 aspects:
s5.1, obtaining PET/CT scanning DICOM data of unknown esophageal cancer patients.
S5.2, preprocessing the acquired PET/CT image by using the data preprocessing method in the step S2.
S5.3, inputting the preprocessed PET/CT region-of-interest images into the 4 trained TransAttPSNN segmentation models to obtain the corresponding 4 groups of segmentation accuracies.
S5.4, selecting the group with the largest DSC value among the 4 groups obtained in step S5.3 as the optimal segmentation accuracy, and visualizing the corresponding segmentation result. Referring to Fig. 5, Fig. 5(a) is the three-dimensional visualization of the esophageal cancer tumor corresponding to the real label, Fig. 5(b) is the three-dimensional visualization of the segmentation result obtained by the invention, and Fig. 5(c) is the overlay of (a) and (b). The three-dimensional visualization shows that the esophageal cancer tumor predicted by the invention is smoother than the shape corresponding to the real label, which is closer to the real appearance of the lesion in clinical practice. In addition, the predicted esophageal cancer tumor agrees well with the real tumor.
Table 1. Comparison of the segmentation accuracy of the proposed method with other existing esophageal cancer tumor segmentation methods.
Table 1 compares the segmentation accuracy of the three-dimensional esophageal cancer tumor target region segmentation model TransAttPSNN designed by the invention with other existing esophageal cancer tumor segmentation methods. In Table 1, the convolution-based reference methods are U-Net, DenseUNet, and Two-stream chained PSNN; the convolution-plus-attention methods are Attention U-Net and DDAUNet; and the Transformer-based methods are UNETR, TransBTS, and CoTr. Among them, Two-stream chained PSNN, DenseUNet, and DDAUNet represent the advanced methods in the current literature on three-dimensional GTV segmentation of esophageal cancer. From the evaluation indexes, the segmentation performance of the TransAttPSNN network exceeds that of all the competing networks, achieving the largest DSC value and the smallest HD value; although its MSD is slightly inferior to that of UNETR, the difference is very small. The Transformer-based methods achieve better segmentation performance than the convolutional network methods (with the exception of CoTr). Among the Transformer-based models, the TransAttPSNN designed by the invention performs best.
References
[1] MUTHUKUMARAN D, SIVAKUMAR M. Medical image registration: a MATLAB based approach [J]. Int J Sci Res Comput Sci Eng Inform Technol, 2017, 2(2): 29-34.
[2] PENNEC X, CACHIER P, AYACHE N. Understanding the "Demon's algorithm": 3D non-rigid registration by gradient descent [C]. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 1999, 597-605.
[3] LUO Shuqian, LI Xiang. Multimodal medical image registration based on maximum mutual information [J]. Journal of Image and Graphics, 2000, 5(7): 551-8.
[4] HU Y, MODAT M, GIBSON E, et al. Weakly-supervised convolutional neural networks for multimodal image registration [J]. Med Image Anal, 2018, 49: 1-13.
[5] CRUM W R, CAMARA O, HILL D L. Generalized overlap measures for evaluation and validation in medical image analysis [J]. IEEE T Med Imaging, 2006, 25(11): 1451-61.
[6] JIN D, GUO D, HO T Y, et al. DeepTarget: Gross tumor and clinical target volume segmentation in esophageal cancer radiotherapy [J]. Med Image Anal, 2021, 68: 101909.
[7] RAJON D A, BOLCH W E. Marching cube algorithm: review and trilinear interpolation adaptation for image-based dosimetric models [J]. Comput Med Imag Grap, 2003, 27(5): 411-35.
[8] HILL S. Trilinear interpolation [J]. Graphics Gems, 1994: 521-5.
[9] RAZZAK M I, IMRAN M, XU G. Efficient brain tumor segmentation with multiscale two-pathway-group conventional neural networks [J]. IEEE J Biomed Health, 2019, 23(5): 1911-9.
[10] CHEN G, YIN J, DAI Y, et al. A novel convolutional neural network for kidney ultrasound images segmentation [J]. Comput Meth Prog Bio, 2022, 218: 106712.
[11] FECHTER T, ADEBAHR S, BALTAS D, et al. Esophagus segmentation in CT via 3D fully convolutional neural network and random walk [J]. Med Phys, 2017, 44(12): 6341-52.
[12] ÇIÇEK Ö, ABDULKADIR A, LIENKAMP S S, et al. 3D U-Net: Learning dense volumetric segmentation from sparse annotation [C]. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016, 424-32.
[13] FECHTER T, ADEBAHR S, BALTAS D, et al. A 3D fully convolutional neural network and a random walker to segment the esophagus in CT [J/OL]. 2017, 1-23, arXiv:1704.06544.
[14] OKTAY O, SCHLEMPER J, FOLGOC L L, et al. Attention U-Net: Learning where to look for the pancreas [C]. In Proceedings of the International Conference on Medical Imaging with Deep Learning, 2018, 1-10.
[15] YOUSEFI S, SOKOOTI H, ELMAHDY M S, et al. Esophageal gross tumor volume segmentation using a 3D convolutional neural network [C]. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018, 343-51.
[16] YOUSEFI S, SOKOOTI H, ELMAHDY M S, et al. Esophageal tumor segmentation in CT images using a dilated dense attention Unet (DDAUnet) [J]. IEEE Access, 2021, 9: 99235-48.
[17] HATAMIZADEH A, TANG Y, NATH V, et al. UNETR: Transformers for 3D medical image segmentation [C]. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021, 1-11.
[18] WANG W, CHEN C, DING M, et al. TransBTS: Multimodal brain tumor segmentation using transformer [C]. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2021, 1-11.
[19] XIE Y, ZHANG J, SHEN C, et al. CoTr: Efficiently bridging CNN and transformer for 3D medical image segmentation [J/OL]. 2021, 1-13, arXiv:2103.03024.

Claims (8)

1. The esophageal cancer tumor target region segmentation method based on the cross-modal feature fusion of the PET/CT images is characterized by comprising the following steps of:
s1, collecting PET/CT images and corresponding labels of clinical esophagus cancer patients to form a PET/CT image data set;
s2, preprocessing a PET/CT image data set;
s3, establishing a three-dimensional segmentation model of the esophageal cancer tumor target area: transformer fusion attention progressive semantic nesting network TransAttPSNN;
the TransAttPSNN network takes the attention progressive semantically-nested network AttPSNN, which introduces a convolutional attention mechanism, as its main structure, and comprises two segmentation streams, one being the PET stream and the other the CT stream, with identical network structures; Transformer cross-modal adaptive feature fusion modules are embedded at the 5 different-scale feature levels of the two segmentation streams; between the PET stream and the CT stream, the 5 Transformer cross-modal adaptive feature fusion modules connect the 5 pairs of different-scale PET and CT feature images and perform adaptive feature fusion on them, and the fused results are transmitted back to the PET and CT streams, respectively, to participate in the subsequent forward propagation of information; the outputs of the upper and lower AttPSNN decoding paths and the total output of the two paths are connected in a deep-supervision manner, then processed by a convolution layer, and the segmentation prediction is finally obtained through a Sigmoid output layer;
s4, training the established TransAttPSNN segmentation model;
s5, carrying out segmentation prediction on the PET/CT image of the unknown esophageal cancer patient by using the TransAttPSNN segmentation model obtained through training, outputting the optimal segmentation precision, and carrying out visual display on the segmentation result.
2. The method according to claim 1, wherein in step S1, the label corresponding to the PET/CT image is determined by manually delineating and examining the esophageal cancer tumor target area on a CT axial slice by importing a DICOM file of the PET/CT into ITK-SNAP software and referring to the corresponding PET image.
3. The method according to claim 1, wherein in step S2, the data preprocessing includes three operations of performing a secondary registration on the PET/CT images to correct a positional deviation between the PET/CT images, performing contrast enhancement on the CT images, and cutting out a region of interest from the PET/CT images and normalizing.
4. The esophageal cancer tumor target region segmentation method according to claim 3, wherein the secondary registration of the PET/CT image adopts a multi-modal intensity three-dimensional registration algorithm, a registration method based on mutual information, a registration method based on an optical flow field or a registration method based on deep learning, and the PET image and the CT image which are output after registration have the same size; the CT image is cut off in window width to contrast-enhance the CT image.
5. The method according to claim 1, wherein in step S3 both the PET stream and the CT stream adopt the AttPSNN network, each AttPSNN network contains feature images at 5 scales, the encoding path comprises 5 convolution levels, the first two of which each consist of 2 convolution modules and the last three of 3 convolution modules, and the decoding path comprises 4 convolutional attention levels, the first of which consists of a convolutional attention module + Conv layer and the last three of a convolutional attention module + Conv + trilinear interpolation upsampling layer.
6. The method according to claim 1, wherein in step S4, the established TransAttPSNN segmentation model is trained by four-fold cross-validation.
7. The method for segmenting esophageal cancer tumor target according to claim 1, wherein in step S4, the model training method is as follows:
(1) Inputting a double-channel PET/CT image into the established TransAttPSNN segmentation model;
(2) Extracting PET channel images and sending the PET channel images to a PET stream for processing, and simultaneously extracting CT channel images and sending the CT channel images to a CT stream for processing;
(3) For the PET and CT feature images at the same level, a Transformer adaptive feature fusion module is adopted to fuse them, and the fused results are fed back to the PET stream and the CT stream, respectively, to participate in the feed-forward of information;
(4) The training network is continuously optimized to converge and obtain the esophageal cancer tumor target three-dimensional segmentation model with good performance.
8. The method according to claim 1 or 7, wherein in step S4 the optimizer is AdamW and the loss function is the generalized Dice loss (GDL) during training; the GDL is defined as follows:

$$\mathrm{GDL} = 1 - 2\,\frac{\sum_{c} w_c \sum_{n} y_{cn}\, p_{cn}}{\sum_{c} w_c \sum_{n} \left(y_{cn} + p_{cn}\right) + \epsilon}, \qquad w_c = \frac{1}{\left(\sum_{n} y_{cn}\right)^{2}},$$

wherein $c$ indexes the classes, $y_{cn}$ and $p_{cn}$ denote the true label value and the predicted probability of the $n$-th pixel belonging to class $c$, respectively, $w_c$ denotes the weight, equal to the inverse of the square of the total number of pixels of the corresponding class, the foreground being the tumor region and the background the non-tumor region, and $\epsilon = 1 \times 10^{-8}$ prevents the denominator from being zero.
CN202310109050.2A 2023-02-14 2023-02-14 Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images Pending CN116258732A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310109050.2A CN116258732A (en) 2023-02-14 2023-02-14 Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310109050.2A CN116258732A (en) 2023-02-14 2023-02-14 Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images

Publications (1)

Publication Number Publication Date
CN116258732A 2023-06-13

Family

ID=86687465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310109050.2A Pending CN116258732A (en) 2023-02-14 2023-02-14 Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images

Country Status (1)

Country Link
CN (1) CN116258732A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758048A (en) * 2023-07-06 2023-09-15 河北大学 PET/CT tumor periphery feature extraction system and extraction method based on transducer
CN116758048B (en) * 2023-07-06 2024-02-27 河北大学 PET/CT tumor periphery feature extraction system and extraction method based on transducer
CN117934519A (en) * 2024-03-21 2024-04-26 安徽大学 Self-adaptive segmentation method for esophageal tumor CT image synthesized by unpaired enhancement
CN117934519B (en) * 2024-03-21 2024-06-07 安徽大学 Self-adaptive segmentation method for esophageal tumor CT image synthesized by unpaired enhancement
CN118212238A (en) * 2024-05-21 2024-06-18 宁德时代新能源科技股份有限公司 Solder printing detection method, solder printing detection device, computer equipment and storage medium
CN118351211A (en) * 2024-06-18 2024-07-16 英瑞云医疗科技(烟台)有限公司 Method, system and equipment for generating medical image from lung cancer CT (computed tomography) to PET (positron emission tomography)
CN118351211B (en) * 2024-06-18 2024-08-30 英瑞云医疗科技(烟台)有限公司 Method, system and equipment for generating medical image from lung cancer CT (computed tomography) to PET (positron emission tomography)

Similar Documents

Publication Publication Date Title
Woźniak et al. Deep neural network correlation learning mechanism for CT brain tumor detection
Ren et al. Interleaved 3D‐CNN s for joint segmentation of small‐volume structures in head and neck CT images
Kumar et al. Breast cancer classification of image using convolutional neural network
Chan et al. Texture-map-based branch-collaborative network for oral cancer detection
CN116258732A (en) Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images
Lin et al. BATFormer: Towards boundary-aware lightweight transformer for efficient medical image segmentation
CN109215035B (en) Brain MRI hippocampus three-dimensional segmentation method based on deep learning
US11494908B2 (en) Medical image analysis using navigation processing
KR102442090B1 (en) Point registration method in surgical navigation system
Liu et al. Feature pyramid vision transformer for MedMNIST classification decathlon
Chen et al. Computer-aided diagnosis and decision-making system for medical data analysis: A case study on prostate MR images
Velliangiri et al. Investigation of deep learning schemes in medical application
Liu et al. Automated classification and measurement of fetal ultrasound images with attention feature pyramid network
Cui et al. Automatic Segmentation of Kidney Volume Using Multi-Module Hybrid Based U-Shape in Polycystic Kidney Disease
Lin et al. High-throughput 3dra segmentation of brain vasculature and aneurysms using deep learning
Khor et al. Anatomically constrained and attention-guided deep feature fusion for joint segmentation and deformable medical image registration
Ma et al. LCAUnet: A skin lesion segmentation network with enhanced edge and body fusion
CN114387282A (en) Accurate automatic segmentation method and system for medical image organs
Sengun et al. Automatic liver segmentation from CT images using deep learning algorithms: a comparative study
Zhang et al. SEG-LUS: A novel ultrasound segmentation method for liver and its accessory structures based on muti-head self-attention
CN116309754A (en) Brain medical image registration method and system based on local-global information collaboration
Wang et al. Triplanar convolutional neural network for automatic liver and tumor image segmentation
Bi et al. Hyper-Connected Transformer Network for Multi-Modality PET-CT Segmentation
Liu et al. Pool-UNet: Ischemic Stroke Segmentation from CT Perfusion Scans Using Poolformer UNet
Li et al. Diffusion Probabilistic Learning with Gate-fusion Transformer and Edge-frequency Attention for Retinal Vessel Segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination