CN115512110A - Medical image tumor segmentation method related to cross-modal attention mechanism - Google Patents

Medical image tumor segmentation method related to cross-modal attention mechanism

Info

Publication number
CN115512110A
Authority
CN
China
Prior art keywords
image
cross
modal
pet
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211163664.0A
Other languages
Chinese (zh)
Inventor
周亮
陈顺
陈建新
李昂
魏昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202211163664.0A priority Critical patent/CN115512110A/en
Publication of CN115512110A publication Critical patent/CN115512110A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03: Recognition of patterns in medical or anatomical images

Abstract

The invention discloses a medical image tumor segmentation method involving a cross-modal attention mechanism, which comprises the following steps: acquiring a positron emission tomography (PET) image and a computed tomography (CT) image, and transversely scaling the PET image and the CT image into an image data pair with the same resolution; and inputting the image data pair into a trained cross-modal image segmentation model to generate a target segmentation image. The cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit, and the cross-modal image segmentation model is trained on historical PET and CT images. The method can realize accurate cross-modal segmentation of medical images.

Description

Medical image tumor segmentation method related to cross-modal attention mechanism
Technical Field
The invention relates to a medical image tumor segmentation method involving a cross-modal attention mechanism, and belongs to the technical field of cross-modal image segmentation.
Background
Diffuse large B-cell lymphoma is a high-grade type of lymphoma. Among medical imaging modalities, PET-CT imaging is generally the best choice for clinical analysis of lymphoma structure, and it has also been applied successfully in computer-aided diagnosis and treatment.
PET is a molecular imaging technology that works at the cellular level. It uses a tumor tracer to perform tomographic imaging of the human body in the transverse, coronal and sagittal planes, and reflects the dynamic change of treatment response over time through the metabolic state of organ tissues, giving it high specificity and sensitivity. Compared with other medical images, the PET image resolves lesion areas, particularly malignant tumor tissue, more strongly; however, because of the characteristics of the PET technique and the influence of image reconstruction, PET images have low resolution and contrast, severe artifacts, and partial volume and halo effects. These problems can cause image non-uniformity and distortion, which lead to large errors in the actual lesion position and boundary and seriously affect accuracy and precision during diagnosis and treatment. CT imaging uses the different X-ray absorption of different tissues of the human body to obtain structural information about the lesion and its surroundings; because CT equipment has a short acquisition time and is little affected by external factors, the localization image is generally undistorted and close to the actual shape of the tumor, and CT is therefore a widely applied technique at present. Although a CT image reflects the anatomy of organ tissues well, localizes the lesion accurately and shows its morphology, it is difficult to judge the position and boundary of the lesion when the density of the infiltrated tissue does not differ obviously from that of normal tissue. Combining PET and CT images provides more information about the lesion and its surroundings, and the two kinds of image information are complementary: the CT image can perform attenuation correction on the PET image, compensate for the unclear anatomical structure of the PET image, and localize the physiological, metabolic and functional information reflected by the PET image; in turn, qualitative and quantitative analysis of the PET image provides valuable functional and metabolic information. Therefore, multi-modal lymphoma segmentation combining PET and CT has important value in radiosurgery and radiotherapy planning.
In the processing of lymphoma images, single-modality techniques focus only on extracting and processing the features of a single image and ignore the feature complementarity between cross-modal images. This modality-specific view has limited the development of medical image segmentation techniques. Combined with rapidly developing neural networks, cross-modal image processing offers both end-to-end convenience and very high accuracy. Learning-based methods, particularly those based on convolutional neural networks (CNN), have evolved rapidly in medical image analysis over the past decade. The CNN was originally proposed for image-level classification; applying it to image segmentation turns the task into pixel-level classification, for example by classifying each pixel with a sliding-window approach (as in R-CNN). U-Net was later proposed specifically for biomedical image segmentation, is now the baseline network for a variety of medical image segmentation tasks, and has inspired much subsequent work such as Attention-UNet and DenseUNet; nevertheless, because of the singularity of its features, a traditional single-modality convolutional neural network can hardly achieve a large improvement in segmentation accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a medical image tumor segmentation method related to a cross-modal attention mechanism, so that accurate cross-modal medical image segmentation can be realized.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a medical image tumor segmentation method related to a cross-modal attention mechanism comprises the following steps:
acquiring a positron emission tomography (PET) image and a computed tomography (CT) image, and transversely scaling the PET image and the CT image into an image data pair with the same resolution;
inputting the image data pair to a trained cross-modal image segmentation model to generate a target segmentation image;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively extracting the layered features of the PET image and the CT image in the training set through an extension encoder, fusing through an image feature fusion module to obtain a PET-CT multi-modal fusion feature block, and outputting an attention feature map of a PET modality;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention diagram of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature diagram;
similarity calculation is carried out between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and a weight vector of each segmentation characteristic corresponding to the current image characteristic is obtained through the operation of a sigmoid function;
judging each segmentation characteristic based on the weight vector to obtain a target segmentation image most similar to the current image characteristic;
calculating the joint loss of the image features through an image feature extraction module, wherein the joint loss comprises cross entropy loss obtained by comparing a target segmentation image with tumor segmentation label information and dice value loss, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
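The patent itself gives no code; purely as an illustration of the training procedure listed above, the following is a minimal PyTorch sketch of the loss-and-update cycle, using a stand-in two-input model and random tensors in place of the real network and PET-CT data. All layer sizes, the learning rate, and the equal weighting of the two loss terms are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in two-input segmentation model; a real run would use the cross-modal model described above.
model = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def joint_loss(logits, target, eps=1e-6):
    # Cross-entropy loss plus Dice value loss (1 - Dice), equally weighted by assumption.
    prob = torch.sigmoid(logits)
    ce = F.binary_cross_entropy(prob, target)
    dice = (2 * (prob * target).sum() + eps) / (prob.sum() + target.sum() + eps)
    return ce + (1.0 - dice)

for step in range(3):  # a real run iterates over the PET-CT training pairs until the parameters converge
    pet = torch.rand(4, 1, 64, 64)                    # placeholder PET slices
    ct = torch.rand(4, 1, 64, 64)                     # placeholder CT slices
    label = (torch.rand(4, 1, 64, 64) > 0.5).float()  # placeholder tumor segmentation labels
    loss = joint_loss(model(torch.cat([pet, ct], dim=1)), label)
    optimizer.zero_grad()
    loss.backward()   # standard back-propagation
    optimizer.step()  # gradient-descent update of the model parameters
    print(step, loss.item())
```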
The image feature extraction module comprises a convolutional neural network (CNN); the CNN comprises a plurality of convolutional layers, with a pooling layer connected after every two convolutional layers. The historical positron emission tomography (PET) image and computed tomography (CT) image are input into the layers of the CNN to obtain the PET image features v_i^(p) and the CT image features v_i^(c) respectively, where i denotes the layer index, i = 1, 2, 3, 4.
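As an illustrative reading of this module (not taken from the patent), a minimal PyTorch sketch of one such feature-extraction branch is given below: four stages, each consisting of two convolutional layers followed by a pooling layer, returning the per-layer features v_1 to v_4. The channel widths and input size are assumptions.

```python
import torch
import torch.nn as nn

class EncoderBranch(nn.Module):
    """Single-modality encoder: four stages of (conv, conv, pool), returning v_1..v_4."""
    def __init__(self, in_ch=1, widths=(32, 64, 128, 256)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, w, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(w, w, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            ))
            prev = w

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # v_i for i = 1..4
        return feats

pet_encoder, ct_encoder = EncoderBranch(), EncoderBranch()
pet = torch.randn(1, 1, 256, 256)
ct = torch.randn(1, 1, 256, 256)
v_pet, v_ct = pet_encoder(pet), ct_encoder(ct)  # v_pet[3] is v_4^(p), v_ct[3] is v_4^(c)
print(v_pet[3].shape)  # torch.Size([1, 256, 16, 16])
```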
The expansion encoder comprises a convolutional neural network (CNN); the CNN comprises a plurality of convolutional layers, with a pooling layer connected after every two convolutional layers. The image feature fusion module comprises a feature fusion structure f_p(·), composed of two convolutional layers followed by a pooling layer. The CNN extracts the PET modality information v_4^(p) and the CT modality information v_4^(c) from the PET image and the CT image respectively; these are input into the feature fusion structure f_p(·) for feature splicing and fusion, the fused feature layer is determined, and the output obtained is the PET-CT multi-modal fusion feature block v_4^(pc).
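One possible reading of the fusion structure f_p(·), namely concatenating v_4^(p) and v_4^(c) along the channel dimension and passing the result through two convolutional layers and a pooling layer, is sketched below; the channel counts are assumptions carried over from the encoder sketch above.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """f_p(.): splice the deepest PET and CT features and fuse them into v_4^(pc)."""
    def __init__(self, ch=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, v4_pet, v4_ct):
        return self.fuse(torch.cat([v4_pet, v4_ct], dim=1))  # PET-CT multi-modal fusion block

v4_pet = torch.randn(1, 256, 16, 16)
v4_ct = torch.randn(1, 256, 16, 16)
v4_pc = FeatureFusion()(v4_pet, v4_ct)
print(v4_pc.shape)  # torch.Size([1, 256, 8, 8])
```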
The aggregation attention unit comprises a skip connector f_s(·) and a back-propagator. The PET-CT multi-modal fusion feature block v_4^(pc) is input into the skip connector f_s(·) and connected with the CT modality information v_4^(c); the multi-layer features are re-extracted and aggregated, and the output attention feature map is determined through a sigmoid layer. The back-propagator propagates the attention feature structure back to the feature extraction stage, resamples the features, optimizes them with the attention feature map, and determines the aggregated attention map (the aggregation formula is given as an image in the original and is not reproduced here), where N is the total number of feature layers of the image data in the i-th down-sampling stage.
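Since the patent gives the aggregation formula only as an image, the following PyTorch sketch shows just one plausible shape for such an aggregation attention unit: a skip connection f_s(·) that joins the fusion block with the CT feature, a small re-extraction head, and a sigmoid layer whose output map reweights the encoder feature. The layer choices and the bilinear upsampling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AggregationAttention(nn.Module):
    """Skip-connect the fusion block with a CT feature, then emit a sigmoid attention map."""
    def __init__(self, fuse_ch=256, ct_ch=256):
        super().__init__()
        self.reextract = nn.Sequential(
            nn.Conv2d(fuse_ch + ct_ch, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 1),
        )

    def forward(self, v4_pc, v4_ct):
        # Upsample the fusion block to the CT feature resolution before concatenating (f_s).
        v4_pc = F.interpolate(v4_pc, size=v4_ct.shape[-2:], mode="bilinear", align_corners=False)
        h = self.reextract(torch.cat([v4_pc, v4_ct], dim=1))
        m = torch.sigmoid(h)      # attention map M = sigmoid(H)
        return m, v4_ct * m       # attention map and attention-weighted feature

att = AggregationAttention()
m, weighted = att(torch.randn(1, 256, 8, 8), torch.randn(1, 256, 16, 16))
print(m.shape, weighted.shape)  # torch.Size([1, 1, 16, 16]) torch.Size([1, 256, 16, 16])
```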
The joint loss is calculated through a cross-entropy loss function, and the parameters of the cross-modal image segmentation model are updated from the joint loss with a gradient descent algorithm. The cross-entropy loss is expressed as:

L_CE = -[ y·log(ŷ) + (1 - y)·log(1 - ŷ) ]

where y denotes the tumor segmentation label (1 for the positive class, 0 for the negative class) and ŷ denotes the probability that the output image is predicted as positive. Judging whether the cross-modal image segmentation model has finished training comprises: determining the Dice coefficient of the set similarity measurement function, and if the Dice coefficient is within the Dice coefficient threshold range, determining that the cross-modal image segmentation model has been trained. The Dice coefficient is expressed as Dice(A, B) = 2 × (A ∩ B) / (A ∪ B), where A is the number of tumor pixels in the network output image, B is the number of tumor pixels in the tumor segmentation label information, A ∩ B is the number of pixels that are positive in the tumor segmentation label and also predicted as positive in the network output image, and A ∪ B is the total number of pixels in the tumor regions of the tumor segmentation label and the network output image.
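For illustration, a short NumPy sketch of the Dice coefficient used to decide whether training is finished is given below, interpreting the denominator as the total pixel count of the two tumor regions; the 0.85 threshold is an arbitrary example value, not one taken from the patent.

```python
import numpy as np

def dice_coefficient(pred_mask, label_mask):
    """Dice(A, B) = 2 * |A intersect B| / (|A| + |B|) for two binary tumor masks."""
    a = pred_mask.astype(bool)
    b = label_mask.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

pred = np.random.rand(64, 64) > 0.5    # placeholder network output mask
label = np.random.rand(64, 64) > 0.5   # placeholder tumor segmentation label
d = dice_coefficient(pred, label)
print(d, "training considered complete" if d >= 0.85 else "keep training")
```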
A medical image tumor segmentation apparatus involving a cross-modal attention mechanism, comprising:
the image acquisition module is used for acquiring a positron emission tomography (PET) image and a computed tomography (CT) image;
the image scaling module is used for transversely scaling the PET image and the CT image into an image data pair with the same resolution;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively carrying out layered feature extraction on the PET image and the CT image in the training set through an extension encoder, obtaining a PET-CT multi-mode fusion feature block through fusion of an image feature fusion module, and outputting an attention feature map of a PET mode;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention diagram of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature diagram;
similarity calculation is carried out between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and a weight vector of each segmentation characteristic corresponding to the current image characteristic is obtained through the operation of a sigmoid function;
judging each segmentation feature based on the weight vector to obtain a target segmentation image most similar to the current image feature;
calculating the joint loss of the image features through an image feature extraction module, wherein the joint loss comprises cross entropy loss obtained by comparing a target segmentation image with tumor segmentation label information and dice value loss, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, carrying out the above medical image tumor segmentation method involving a cross-modal attention mechanism.
The invention has the following beneficial effects. The medical image tumor segmentation method involving a cross-modal attention mechanism mines the semantic correlation between the PET modality and the CT modality and constructs a cross-modal common subspace, thereby bridging the gap between heterogeneous data; at the same time, an attention interaction network is designed so that the generated model can be trained on data of both modalities simultaneously, improving the quality and accuracy of the segmented image. The method realizes cross-modal automatic segmentation of tumor medical images, effectively solves the problems of low manual segmentation efficiency and inconsistent labeling quality, and improves medical image segmentation efficiency.
Drawings
FIG. 1 shows a PET image and a CT image used in the present invention;
FIG. 2 is a flow chart of the medical image tumor segmentation method involving a cross-modal attention mechanism according to the present invention;
FIG. 3 is a schematic diagram of the structure of the cross-modal image segmentation model according to the present invention;
FIG. 4 is a schematic diagram of the attention network structure according to the present invention;
FIG. 5 compares the segmentation results of the medical image tumor segmentation method of the present invention with those generated by prior-art methods.
Detailed Description
The invention is further described with reference to the accompanying drawings, and the following examples are only used to illustrate the technical solutions of the invention more clearly, and should not be taken as limiting the scope of the invention.
As shown in fig. 2, the present invention discloses a medical image tumor segmentation method involving a cross-modal attention mechanism, comprising the following steps:
Step one, acquiring historical positron emission tomography (PET) images and computed tomography (CT) images; as shown in FIG. 1, the images of the two modalities include images of high-grade tumor patients and images of low-grade tumor patients. The images of the two modalities are transversely scaled into image data pairs with the same resolution, PET image intensities are converted into standardized uptake values (SUV), and CT images are converted into Hounsfield units. The image pairs of the two modalities are divided into a training set, a validation set and a test set, and the image data in each group of image data pairs carry tumor segmentation label information.
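As an illustration of this preprocessing step only, the sketch below pairs one PET slice and one CT slice at a common in-plane resolution. The SUV and Hounsfield conversions are shown schematically with hypothetical parameters (patient weight, injected dose, rescale slope and intercept); in practice these values come from the DICOM metadata of each study.

```python
import numpy as np
from scipy.ndimage import zoom

def to_suv(pet_activity, weight_kg=70.0, injected_dose_bq=3.7e8):
    # Hypothetical SUV conversion: activity (Bq/ml) * body weight (g) / injected dose (Bq).
    # Real parameters must be read from the DICOM metadata of the study.
    return pet_activity * (weight_kg * 1000.0) / injected_dose_bq

def to_hounsfield(ct_raw, rescale_slope=1.0, rescale_intercept=-1024.0):
    # Hypothetical linear rescale to Hounsfield units; slope/intercept are scanner specific.
    return ct_raw * rescale_slope + rescale_intercept

def make_image_pair(pet, ct, target_shape=(256, 256)):
    """Transversely scale a PET slice and a CT slice to the same resolution."""
    pet = to_suv(pet)
    ct = to_hounsfield(ct)
    pet = zoom(pet, [t / s for t, s in zip(target_shape, pet.shape)], order=1)
    ct = zoom(ct, [t / s for t, s in zip(target_shape, ct.shape)], order=1)
    return pet.astype(np.float32), ct.astype(np.float32)

# Example with random data standing in for real slices.
pet_slice = np.random.rand(128, 128)
ct_slice = np.random.rand(512, 512) * 4096
pet_img, ct_img = make_image_pair(pet_slice, ct_slice)
print(pet_img.shape, ct_img.shape)  # (256, 256) (256, 256)
```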
Step two, establishing a cross-modal image segmentation model based on the medical images. The structure of the cross-modal image segmentation model is shown in FIG. 3 and comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module based on an aggregation attention unit.
The image feature extraction module comprises a convolutional neural network (CNN) and is used to perform feature extraction on the single-channel image data in the training set to obtain single-modality PET image features and CT image features. The feature extraction proceeds as follows: in the down-sampling stage, any group of historical positron emission tomography (PET) and computed tomography (CT) image data P and C in the training set is extracted and input into each layer of the CNN to obtain the image features v_i^(p) and v_i^(c), i = 1, 2, 3, 4, respectively. The CNN comprises a plurality of convolutional layers, with a pooling layer after every two convolutional layers. The neural network model with multi-level feature re-extraction is a two-input, two-output model whose output ends contain the PET and CT image features respectively.
The cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit. The expansion encoders are used to extract the layered features of the PET image and the CT image in the training set respectively; specifically, each expansion encoder comprises a CNN, and the features of one modality are extracted with the CNN structure in the down-sampling stage. The features are then fused to obtain the PET-CT multi-modal fusion feature block; the feature fusion through the image feature fusion module comprises: the PET and CT modality information v_4^(p) and v_4^(c) extracted by the CNN in the down-sampling stage is spliced and fused by the feature fusion structure f_p(·), the fused feature layer is determined, and the output obtained is v_4^(pc). The fusion structure may specifically consist of two convolutional layers followed by a pooling layer.
A schematic diagram of the attention network structure of the invention is shown in FIG. 4. The process of outputting the attention feature map of the PET modality is as follows: in the up-sampling stage, the extracted modality features are input into the skip connector f_s(·) and connected with the multi-layer feature maps v_i^(c), i = 1, 2, 3, 4, of the down-sampling part; multi-layer feature re-extraction and aggregation are performed, and the output attention feature map M is finally determined through a sigmoid layer, which normalizes the up-sampled features H:

M = sigmoid(H)
Based on the PET attention mechanism, the semantic fusion network matches the CT features with the extracted attention features v^(f): each attention feature is used as a query vector to screen out the CT features belonging to the same class, generating fusion features that, together with the attention features, form fused feature-attention pairs, from which the resultant attention feature corresponding to v^(f) is obtained (the individual feature and pairing symbols are given as formula images in the original and are not reproduced here).
The specific steps are as follows: a multi-layer back-propagator is constructed, the attention feature structure is propagated back to the feature extraction stage, cross-modal feature resampling is performed, the features are optimized with the attention map, and the fused attention feature map is determined (given as a formula image in the original).
Then, the PET attention map and the CT image features from the image feature extraction module are input together, through standard back-propagation, into the semantic fusion network based on the aggregation attention unit for the first semantic fusion, yielding the attention feature map of the fusion modality. The attention map of the fusion modality and the extracted PET-CT multi-modal fusion feature block then undergo up-sampling and a second semantic fusion, and the feature structure is recovered to obtain the final attention fusion feature map. Similarity is computed between the final attention fusion feature map and the PET attention map, a weight vector of each segmentation feature corresponding to the current image features is obtained through a sigmoid operation, and each segmentation feature is judged on the basis of the weight vector to obtain the segmentation result most similar to the current image features.
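The exact fusion layers and similarity measure are not spelled out in the patent, so the following PyTorch sketch is only a schematic reading of this two-stage semantic fusion and the sigmoid-weighted similarity step; the layer shapes, the bilinear upsampling, and the channel-averaged product used as the similarity score are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticFusion(nn.Module):
    """Two-stage fusion: (PET attention map, CT features) -> fused attention map,
    then (fused map, PET-CT fusion block) -> final attention-fusion features."""
    def __init__(self, ct_ch=256, fuse_ch=256, out_ch=64):
        super().__init__()
        self.first = nn.Conv2d(ct_ch + 1, fuse_ch, 3, padding=1)     # first semantic fusion
        self.second = nn.Conv2d(2 * fuse_ch, out_ch, 3, padding=1)   # second fusion after upsampling

    def forward(self, pet_att, ct_feat, v4_pc):
        fused_att = torch.relu(self.first(torch.cat([ct_feat, pet_att], dim=1)))
        v4_pc = F.interpolate(v4_pc, size=fused_att.shape[-2:], mode="bilinear", align_corners=False)
        return torch.relu(self.second(torch.cat([fused_att, v4_pc], dim=1)))

def similarity_weights(final_feat, pet_att):
    # Element-wise product averaged over channels as a simple similarity score,
    # squashed by a sigmoid to give a per-pixel weight for each segmentation feature.
    sim = (final_feat * pet_att).mean(dim=1, keepdim=True)
    return torch.sigmoid(sim)

fusion = SemanticFusion()
pet_att = torch.rand(1, 1, 16, 16)       # placeholder PET attention map
ct_feat = torch.randn(1, 256, 16, 16)    # placeholder CT features
v4_pc = torch.randn(1, 256, 8, 8)        # placeholder PET-CT fusion block
final = fusion(pet_att, ct_feat, v4_pc)
w = similarity_weights(final, pet_att)
print(final.shape, w.shape)  # torch.Size([1, 64, 16, 16]) torch.Size([1, 1, 16, 16])
```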
Step three, training the cross-modal medical image segmentation model on the training set. Any group of historical positron emission tomography (PET) and computed tomography (CT) images in the training set is extracted and input into the cross-modal medical segmentation network model, and the segmented tumor image is determined; the method further comprises acquiring the tumor segmentation label information corresponding to the tumor mask.
The tumor image segmented by the cross-modal method is compared with the tumor segmentation label information corresponding to the historical tumor images in the training set, and the joint loss of the image features is calculated by the image feature extraction module; the joint loss comprises the cross-entropy loss of the cross-modal segmentation result compared with the ground-truth image and the Dice value loss. The loss is calculated through a cross-entropy loss function, the cross-modal medical segmentation neural network model is trained continuously with a gradient descent algorithm, and the calculated cross-entropy loss is used to update the parameters of the cross-modal image segmentation model.
The cross-entropy loss can be expressed as:

L_CE = -[ y·log(ŷ) + (1 - y)·log(1 - ŷ) ]

where y denotes the tumor segmentation label (1 for the positive class, 0 for the negative class) and ŷ denotes the probability that the output image is predicted as the positive class.
The cross-modal medical segmentation neural network model is trained continuously with the gradient descent algorithm, and the Dice coefficient of the set similarity measurement function is then determined; if the Dice coefficient is within the Dice coefficient threshold range, the cross-modal supervised-learning medical segmentation neural network model is determined to have finished training. The Dice coefficient is expressed as Dice(A, B) = 2 × (A ∩ B) / (A ∪ B), where A is the number of tumor pixels in the network output image, B is the number of tumor pixels in the tumor segmentation label information, A ∩ B is the number of pixels that are positive in the tumor segmentation label and also predicted as positive in the network output image, and A ∪ B is the total number of pixels in the tumor regions of the tumor segmentation label and the network output image.
Step four, after training is finished, the paired cross-modal medical images of the test set are input into the trained cross-modal image segmentation model, and the output is the target segmentation image.
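Finally, as a sketch of this inference step under the same stand-in assumptions as the training sketch above, a two-input model is applied to one PET-CT test pair and the sigmoid output is thresholded to produce the binary target segmentation image; the 0.5 threshold is an assumption.

```python
import torch
import torch.nn as nn

# Stand-in for a trained cross-modal model; in practice the saved optimal parameters are loaded.
model = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1))

def segment_pair(model, pet_img, ct_img, threshold=0.5):
    """Run a two-input segmentation model on one PET-CT test pair."""
    model.eval()
    with torch.no_grad():
        prob = torch.sigmoid(model(torch.cat([pet_img, ct_img], dim=1)))
    return (prob > threshold).float()  # binary target segmentation image

mask = segment_pair(model, torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
print(mask.shape)
```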
The invention also discloses a medical image tumor segmentation device related to the cross-modal attention mechanism, which comprises:
the image acquisition module is used for acquiring a positron emission tomography (PET) image and a computed tomography (CT) image;
the image scaling module is used for transversely scaling the PET image and the CT image into an image data pair with the same resolution;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively carrying out layered feature extraction on the PET image and the CT image in the training set through an extension encoder, obtaining a PET-CT multi-mode fusion feature block through fusion of an image feature fusion module, and outputting an attention feature map of a PET mode;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention map of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for performing upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature map;
similarity calculation is carried out between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and a weight vector of each segmentation characteristic corresponding to the current image characteristic is obtained through the operation of a sigmoid function;
judging each segmentation characteristic based on the weight vector to obtain a target segmentation image most similar to the current image characteristic;
calculating the joint loss of the image characteristics through an image characteristic extraction module, wherein the joint loss comprises cross entropy loss and dice value loss, the cross entropy loss is obtained by comparing a target segmentation image with tumor segmentation label information, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
The invention also discloses a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of medical image tumor segmentation involving a cross-modality attention mechanism as described above.
The method of the invention is compared with existing image segmentation methods as follows:
(1) Simulation conditions
The simulation experiments of the invention were carried out with Python software on a Windows 10 operating system with an Intel(R) Core(TM) i3-2120 3.30 GHz central processing unit and 6 GB of memory.
(2) Emulation content
Existing method 1: a conventional UNet network segments single-modality PET images.
Existing method 2: a conventional UNet network segments single-modality CT images.
Existing method 3: Attention-UNet, with an added attention mechanism, segments single-modality PET images.
Existing method 4: Attention-UNet, with an added attention mechanism, segments single-modality CT images.
Existing method 5: a UNet network segments the pixel-level fused PET-CT image.
Existing method 6: the CT image is segmented using the single-modality PET image as supervision information.
(3) Simulation result
The simulation experiment reports the Dice indices of the six existing methods and of the invention on a private PET-CT data set; the larger the index, the higher the segmentation accuracy, as shown in Table 1.
Table 1. Dice indices of the images generated by each method on the PET-CT data set (the table is given as an image in the original and is not reproduced here).
Combining the segmentation results in FIG. 5 with the evaluation results in Table 1, it can be observed that the method of the invention generates the segmentation image with the highest accuracy. The model outperforms the other models and has the highest semantic accuracy for the same segmentation target, which also verifies the importance of the cross-modal pairing network in the model.
In summary, the medical image segmentation method involving a cross-modal attention mechanism disclosed by the invention is mainly used to segment PET-CT medical images. By constructing a cross-modal interaction network combined with an attention mechanism, it overcomes the low segmentation precision of traditional generation models that are limited to a single modality, and it uses the commonly used Dice coefficient as the performance evaluation index. The method considers not only the characteristics of the tumor in a single modality of the sample but also the complementarity of features between different modalities, guaranteeing the accuracy with which the corresponding segmentation image is generated from multi-modal PET-CT. The implementation steps are as follows: (1) collecting a cross-modal PET-CT data set; (2) setting up the model; (3) training the model, which includes performing feature extraction on the PET and CT image samples; extracting an attention map based on the PET image; constructing an attention interaction network for the PET attention map and the CT image features, mapping the PET attention features, as supervision information, together with the CT feature information into a cross-modal common subspace, and strongly pairing the common representations of each modality by category and distribution; performing cross-distribution alignment with inter-modal and intra-modal similarity loss functions while keeping the category consistency of each modality; and training the network with an alternating iteration method; and (4) generating the segmentation image. The invention uses the attention-mechanism network to realize strong pairing and semantic fusion of PET-CT modality data according to the semantic correlation between different modality data, thereby generating corresponding segmentation images of better quality and higher precision; it is applicable to multi-modal services such as medical image segmentation and improves the efficiency of image segmentation.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (8)

1. A medical image tumor segmentation method involving a cross-modal attention mechanism, characterized by comprising the following steps:
acquiring a positron emission tomography (PET) image and a computed tomography (CT) image;
transversely scaling the PET image and the CT image into image data pairs with the same resolution;
inputting the image data pair to a trained cross-modal image segmentation model to generate a target segmentation image;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively extracting the layered features of the PET image and the CT image in the training set through an extension encoder, fusing through an image feature fusion module to obtain a PET-CT multi-modal fusion feature block, and outputting an attention feature map of a PET modality;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention map of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for performing upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature map;
similarity calculation is carried out between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and a weight vector of each segmentation characteristic corresponding to the current image characteristic is obtained through the operation of a sigmoid function;
judging each segmentation characteristic based on the weight vector to obtain a target segmentation image most similar to the current image characteristic;
calculating the joint loss of the image features through an image feature extraction module, wherein the joint loss comprises cross entropy loss obtained by comparing a target segmentation image with tumor segmentation label information and dice value loss, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
2. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 1, characterized in that: the image feature extraction module comprises a convolutional neural network (CNN); the CNN comprises a plurality of convolutional layers, with a pooling layer connected after every two convolutional layers; and the historical positron emission tomography (PET) image and computed tomography (CT) image are input into each layer of the CNN to obtain the PET image features v_i^(p) and the CT image features v_i^(c) respectively, where i denotes the layer index, i = 1, 2, 3, 4.
3. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 2, characterized in that: the expansion encoder comprises a convolutional neural network (CNN); the CNN comprises a plurality of convolutional layers, with a pooling layer connected after every two convolutional layers; the image feature fusion module comprises a feature fusion structure f_p(·), composed of two convolutional layers followed by a pooling layer; the CNN extracts the PET modality information v_4^(p) and the CT modality information v_4^(c) from the PET image and the CT image respectively; these are input into the feature fusion structure f_p(·) for feature splicing and fusion, the fused feature layer is determined, and the output obtained is the PET-CT multi-modal fusion feature block v_4^(pc).
4. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 3, characterized in that: the aggregation attention unit comprises a skip connector f_s(·) and a back-propagator; the PET-CT multi-modal fusion feature block v_4^(pc) is input into the skip connector f_s(·) and connected with the CT modality information v_4^(c); the multi-layer features are re-extracted and aggregated, and the output attention feature map is determined through a sigmoid layer; the back-propagator propagates the attention feature structure back to the feature extraction stage, resamples the features, optimizes them with the attention feature map, and determines the aggregated attention map (the aggregation formula is given as an image in the original and is not reproduced here), where N is the total number of feature layers of the image data in the i-th down-sampling stage.
5. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 1, characterized in that: the joint loss is calculated through a cross-entropy loss function, and the parameters of the cross-modal image segmentation model are updated from the joint loss with a gradient descent algorithm; the cross-entropy loss is expressed as:

L_CE = -[ y·log(ŷ) + (1 - y)·log(1 - ŷ) ]

where y denotes the tumor segmentation label (1 for the positive class, 0 for the negative class) and ŷ denotes the probability that the output image is predicted as positive.
6. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 1, characterized in that: judging whether the cross-modal image segmentation model has finished training comprises: determining the Dice coefficient of the set similarity measurement function, and if the Dice coefficient is within the Dice coefficient threshold range, determining that the cross-modal image segmentation model has been trained; the Dice coefficient is expressed as Dice(A, B) = 2 × (A ∩ B) / (A ∪ B), where A is the number of tumor pixels in the network output image, B is the number of tumor pixels in the tumor segmentation label information, A ∩ B is the number of pixels that are positive in the tumor segmentation label and also predicted as positive in the network output image, and A ∪ B is the total number of pixels in the tumor regions of the tumor segmentation label and the network output image.
7. A medical image tumor segmentation apparatus relating to a cross-modality attention mechanism, comprising:
the image acquisition module is used for acquiring a positron emission tomography (PET) image and a computed tomography (CT) image;
the image scaling module is used for transversely scaling the PET image and the CT image into an image data pair with the same resolution;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively carrying out layered feature extraction on the PET image and the CT image in the training set through an extension encoder, obtaining a PET-CT multi-mode fusion feature block through fusion of an image feature fusion module, and outputting an attention feature map of a PET mode;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention diagram of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature diagram;
calculating the similarity between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and obtaining the weight vector of each segmentation characteristic corresponding to the current image characteristic through the operation of a sigmoid function;
judging each segmentation characteristic based on the weight vector to obtain a target segmentation image most similar to the current image characteristic;
calculating the joint loss of the image characteristics through an image characteristic extraction module, wherein the joint loss comprises cross entropy loss and dice value loss, the cross entropy loss is obtained by comparing a target segmentation image with tumor segmentation label information, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of medical image tumor segmentation involving a cross-modality attention mechanism as set forth in any one of claims 1-6.
CN202211163664.0A 2022-09-23 2022-09-23 Medical image tumor segmentation method related to cross-modal attention mechanism Pending CN115512110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211163664.0A CN115512110A (en) 2022-09-23 2022-09-23 Medical image tumor segmentation method related to cross-modal attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211163664.0A CN115512110A (en) 2022-09-23 2022-09-23 Medical image tumor segmentation method related to cross-modal attention mechanism

Publications (1)

Publication Number Publication Date
CN115512110A true CN115512110A (en) 2022-12-23

Family

ID=84506691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211163664.0A Pending CN115512110A (en) 2022-09-23 2022-09-23 Medical image tumor segmentation method related to cross-modal attention mechanism

Country Status (1)

Country Link
CN (1) CN115512110A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830017A (en) * 2023-02-09 2023-03-21 智慧眼科技股份有限公司 Tumor detection system, method, equipment and medium based on image-text multi-mode fusion
CN115984296A (en) * 2023-03-21 2023-04-18 译企科技(成都)有限公司 Medical image segmentation method and system applying multi-attention mechanism
CN115984296B (en) * 2023-03-21 2023-06-13 译企科技(成都)有限公司 Medical image segmentation method and system applying multi-attention mechanism
CN117036830A (en) * 2023-10-07 2023-11-10 之江实验室 Tumor classification model training method and device, storage medium and electronic equipment
CN117036830B (en) * 2023-10-07 2024-01-09 之江实验室 Tumor classification model training method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
Yun et al. Improvement of fully automated airway segmentation on volumetric computed tomographic images using a 2.5 dimensional convolutional neural net
CN109978850B (en) Multi-modal medical image semi-supervised deep learning segmentation system
WO2023071531A1 (en) Liver ct automatic segmentation method based on deep shape learning
CN113674253B (en) Automatic segmentation method for rectal cancer CT image based on U-transducer
CN115512110A (en) Medical image tumor segmentation method related to cross-modal attention mechanism
WO2021203795A1 (en) Pancreas ct automatic segmentation method based on saliency dense connection expansion convolutional network
Li et al. Automated measurement network for accurate segmentation and parameter modification in fetal head ultrasound images
CN110363802B (en) Prostate image registration system and method based on automatic segmentation and pelvis alignment
WO2022121100A1 (en) Darts network-based multi-modal medical image fusion method
CN113223005B (en) Thyroid nodule automatic segmentation and grading intelligent system
Ye et al. Medical image diagnosis of prostate tumor based on PSP-Net+ VGG16 deep learning network
Feng et al. Automatic localization and segmentation of focal cortical dysplasia in FLAIR‐negative patients using a convolutional neural network
Sun et al. COVID-19 CT image segmentation method based on swin transformer
CN112396605B (en) Network training method and device, image recognition method and electronic equipment
Wang et al. Multi-view fusion segmentation for brain glioma on CT images
Xu et al. Deep learning-based automated detection of arterial vessel wall and plaque on magnetic resonance vessel wall images
Kong et al. Data enhancement based on M2-Unet for liver segmentation in Computed Tomography
CN116797609A (en) Global-local feature association fusion lung CT image segmentation method
CN114511602B (en) Medical image registration method based on graph convolution Transformer
Huang et al. AU‐snake based deep learning network for right ventricle segmentation
CN113902738A (en) Heart MRI segmentation method and system
Huang et al. ADDNS: An asymmetric dual deep network with sharing mechanism for medical image fusion of CT and MR-T2
CN114764766A (en) B-mode ultrasonic image denoising method based on FC-VoVNet and WGAN
Che et al. Segmentation of bone metastases based on attention mechanism
Qiao et al. Fuzzy deep medical diagnostic system: gray relation framework and the guiding functionalities for the professional sports club social responsibility

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination