CN115512110A - Medical image tumor segmentation method related to cross-modal attention mechanism - Google Patents

Medical image tumor segmentation method related to cross-modal attention mechanism

Info

Publication number
CN115512110A
Authority
CN
China
Prior art keywords
image
cross
modal
pet
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211163664.0A
Other languages
Chinese (zh)
Inventor
周亮
陈顺
陈建新
李昂
魏昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202211163664.0A priority Critical patent/CN115512110A/en
Publication of CN115512110A publication Critical patent/CN115512110A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03: Recognition of patterns in medical or anatomical images

Abstract

The invention discloses a medical image tumor segmentation method involving a cross-modal attention mechanism, which comprises the following steps: acquiring a positron emission tomography (PET) image and a computed tomography (CT) image, and transversely scaling the PET image and the CT image into an image data pair with the same resolution; and inputting the image data pair into a trained cross-modal image segmentation model to generate a target segmentation image. The cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit, and the cross-modal image segmentation model is trained on historical PET and CT images. The method can realize accurate cross-modal segmentation of medical images.

Description

Medical image tumor segmentation method related to cross-modal attention mechanism
Technical Field
The invention relates to a medical image tumor segmentation method involving a cross-modal attention mechanism, and belongs to the technical field of cross-modal image segmentation.
Background
Diffuse large B-cell lymphoma is a high-grade type of lymphoma. Among medical imaging modalities, PET-CT imaging is generally the best choice for clinical analysis of lymphoma structure, and it has also been applied successfully in computer-aided diagnosis and treatment.
PET is a molecular imaging technology that works at the cellular level. It uses a tumor tracer to perform tomographic imaging of the human body in the transverse, coronal and sagittal planes, and reflects the dynamic change of treatment response over time through the metabolic state of organ tissues, giving it high specificity and sensitivity. Compared with other medical images, the PET image resolves lesion areas, particularly malignant tumor tissue, more strongly; however, because of the characteristics of the PET technique and the influence of image reconstruction, PET images have low resolution and contrast, severe artifacts, and partial volume and halo effects. These problems can cause image non-uniformity and distortion, which lead to large errors in the actual lesion position and boundary and seriously affect accuracy and precision during diagnosis and treatment. CT imaging uses the different X-ray absorption of different tissues of the human body to obtain structural information about the lesion and its surroundings; because CT equipment has a short acquisition time and is little affected by external factors, the localization image is generally undistorted and close to the actual shape of the tumor, and CT is therefore a widely applied technique at present. Although a CT image reflects the anatomy of organ tissues well, localizes the lesion accurately and shows its morphology, it is difficult to judge the position and boundary of the lesion when the density of the infiltrated tissue does not differ obviously from that of normal tissue. Combining PET and CT images provides more information about the lesion and its surroundings, and the two kinds of image information are complementary: the CT image can perform attenuation correction on the PET image, compensate for the unclear anatomical structure of the PET image, and localize the physiological, metabolic and functional information reflected by the PET image; in turn, qualitative and quantitative analysis of the PET image provides valuable functional and metabolic information. Therefore, multi-modal lymphoma segmentation combining PET and CT has important value in radiosurgery and radiotherapy planning.
In the processing of lymphoma images, single-modality techniques focus only on extracting and processing the features of a single image and ignore the feature complementarity between cross-modal images. This modality-specific view has limited the development of medical image segmentation techniques. Combined with rapidly developing neural networks, cross-modal image processing offers both end-to-end convenience and very high accuracy. Learning-based methods, particularly those based on convolutional neural networks (CNN), have evolved rapidly in medical image analysis over the past decade. The CNN was originally proposed for image-level classification; applying it to image segmentation turns the task into pixel-level classification, for example by classifying each pixel with a sliding-window approach (as in R-CNN). U-Net was later proposed specifically for biomedical image segmentation, is now the baseline network for a variety of medical image segmentation tasks, and has inspired much subsequent work such as Attention-UNet and DenseUNet; nevertheless, because of the singularity of its features, a traditional single-modality convolutional neural network can hardly achieve a large improvement in segmentation accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a medical image tumor segmentation method related to a cross-modal attention mechanism, so that accurate cross-modal medical image segmentation can be realized.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a medical image tumor segmentation method related to a cross-modal attention mechanism comprises the following steps:
acquiring a positron emission tomography (PET) image and a computed tomography (CT) image, and transversely scaling the PET image and the CT image into an image data pair with the same resolution;
inputting the image data pair to a trained cross-modal image segmentation model to generate a target segmentation image;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively extracting the layered features of the PET image and the CT image in the training set through an extension encoder, fusing through an image feature fusion module to obtain a PET-CT multi-modal fusion feature block, and outputting an attention feature map of a PET modality;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention diagram of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature diagram;
similarity calculation is carried out between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and a weight vector of each segmentation characteristic corresponding to the current image characteristic is obtained through the operation of a sigmoid function;
judging each segmentation characteristic based on the weight vector to obtain a target segmentation image most similar to the current image characteristic;
calculating the joint loss of the image features through an image feature extraction module, wherein the joint loss comprises cross entropy loss obtained by comparing a target segmentation image with tumor segmentation label information and dice value loss, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
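The patent itself gives no code; purely as an illustration of the training procedure listed above, the following is a minimal PyTorch sketch of the loss-and-update cycle, using a stand-in two-input model and random tensors in place of the real network and PET-CT data. All layer sizes, the learning rate, and the equal weighting of the two loss terms are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in two-input segmentation model; a real run would use the cross-modal model described above.
model = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def joint_loss(logits, target, eps=1e-6):
    # Cross-entropy loss plus Dice value loss (1 - Dice), equally weighted by assumption.
    prob = torch.sigmoid(logits)
    ce = F.binary_cross_entropy(prob, target)
    dice = (2 * (prob * target).sum() + eps) / (prob.sum() + target.sum() + eps)
    return ce + (1.0 - dice)

for step in range(3):  # a real run iterates over the PET-CT training pairs until the parameters converge
    pet = torch.rand(4, 1, 64, 64)                    # placeholder PET slices
    ct = torch.rand(4, 1, 64, 64)                     # placeholder CT slices
    label = (torch.rand(4, 1, 64, 64) > 0.5).float()  # placeholder tumor segmentation labels
    loss = joint_loss(model(torch.cat([pet, ct], dim=1)), label)
    optimizer.zero_grad()
    loss.backward()   # standard back-propagation
    optimizer.step()  # gradient-descent update of the model parameters
    print(step, loss.item())
```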
The image feature extraction module comprises a convolutional neural network (CNN); the CNN comprises a plurality of convolutional layers, with a pooling layer connected after every two convolutional layers. The historical positron emission tomography (PET) image and computed tomography (CT) image are input into the layers of the CNN to obtain the PET image features v_i^(p) and the CT image features v_i^(c) respectively, where i denotes the layer index, i = 1, 2, 3, 4.
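As an illustrative reading of this module (not taken from the patent), a minimal PyTorch sketch of one such feature-extraction branch is given below: four stages, each consisting of two convolutional layers followed by a pooling layer, returning the per-layer features v_1 to v_4. The channel widths and input size are assumptions.

```python
import torch
import torch.nn as nn

class EncoderBranch(nn.Module):
    """Single-modality encoder: four stages of (conv, conv, pool), returning v_1..v_4."""
    def __init__(self, in_ch=1, widths=(32, 64, 128, 256)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, w, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(w, w, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            ))
            prev = w

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # v_i for i = 1..4
        return feats

pet_encoder, ct_encoder = EncoderBranch(), EncoderBranch()
pet = torch.randn(1, 1, 256, 256)
ct = torch.randn(1, 1, 256, 256)
v_pet, v_ct = pet_encoder(pet), ct_encoder(ct)  # v_pet[3] is v_4^(p), v_ct[3] is v_4^(c)
print(v_pet[3].shape)  # torch.Size([1, 256, 16, 16])
```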
The expansion encoder comprises a convolutional neural network (CNN); the CNN comprises a plurality of convolutional layers, with a pooling layer connected after every two convolutional layers. The image feature fusion module comprises a feature fusion structure f_p(·), composed of two convolutional layers followed by a pooling layer. The CNN extracts the PET modality information v_4^(p) and the CT modality information v_4^(c) from the PET image and the CT image respectively; these are input into the feature fusion structure f_p(·) for feature splicing and fusion, the fused feature layer is determined, and the output obtained is the PET-CT multi-modal fusion feature block v_4^(pc).
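One possible reading of the fusion structure f_p(·), namely concatenating v_4^(p) and v_4^(c) along the channel dimension and passing the result through two convolutional layers and a pooling layer, is sketched below; the channel counts are assumptions carried over from the encoder sketch above.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """f_p(.): splice the deepest PET and CT features and fuse them into v_4^(pc)."""
    def __init__(self, ch=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, v4_pet, v4_ct):
        return self.fuse(torch.cat([v4_pet, v4_ct], dim=1))  # PET-CT multi-modal fusion block

v4_pet = torch.randn(1, 256, 16, 16)
v4_ct = torch.randn(1, 256, 16, 16)
v4_pc = FeatureFusion()(v4_pet, v4_ct)
print(v4_pc.shape)  # torch.Size([1, 256, 8, 8])
```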
The aggregation attention unit comprises a skip connector f_s(·) and a back-propagator. The PET-CT multi-modal fusion feature block v_4^(pc) is input into the skip connector f_s(·) and connected with the CT modality information v_4^(c); the multi-layer features are re-extracted and aggregated, and the output attention feature map is determined through a sigmoid layer. The back-propagator propagates the attention feature structure back to the feature extraction stage, resamples the features, optimizes them with the attention feature map, and determines the aggregated attention map (the aggregation formula is given as an image in the original and is not reproduced here), where N is the total number of feature layers of the image data in the i-th down-sampling stage.
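Since the patent gives the aggregation formula only as an image, the following PyTorch sketch shows just one plausible shape for such an aggregation attention unit: a skip connection f_s(·) that joins the fusion block with the CT feature, a small re-extraction head, and a sigmoid layer whose output map reweights the encoder feature. The layer choices and the bilinear upsampling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AggregationAttention(nn.Module):
    """Skip-connect the fusion block with a CT feature, then emit a sigmoid attention map."""
    def __init__(self, fuse_ch=256, ct_ch=256):
        super().__init__()
        self.reextract = nn.Sequential(
            nn.Conv2d(fuse_ch + ct_ch, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 1),
        )

    def forward(self, v4_pc, v4_ct):
        # Upsample the fusion block to the CT feature resolution before concatenating (f_s).
        v4_pc = F.interpolate(v4_pc, size=v4_ct.shape[-2:], mode="bilinear", align_corners=False)
        h = self.reextract(torch.cat([v4_pc, v4_ct], dim=1))
        m = torch.sigmoid(h)      # attention map M = sigmoid(H)
        return m, v4_ct * m       # attention map and attention-weighted feature

att = AggregationAttention()
m, weighted = att(torch.randn(1, 256, 8, 8), torch.randn(1, 256, 16, 16))
print(m.shape, weighted.shape)  # torch.Size([1, 1, 16, 16]) torch.Size([1, 256, 16, 16])
```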
The joint loss is calculated through a cross-entropy loss function, and the parameters of the cross-modal image segmentation model are updated from the joint loss with a gradient descent algorithm. The cross-entropy loss is expressed as:

L_CE = -[ y·log(ŷ) + (1 - y)·log(1 - ŷ) ]

where y denotes the tumor segmentation label (1 for the positive class, 0 for the negative class) and ŷ denotes the probability that the output image is predicted as positive. Judging whether the cross-modal image segmentation model has finished training comprises: determining the Dice coefficient of the set similarity measurement function, and if the Dice coefficient is within the Dice coefficient threshold range, determining that the cross-modal image segmentation model has been trained. The Dice coefficient is expressed as Dice(A, B) = 2 × (A ∩ B) / (A ∪ B), where A is the number of tumor pixels in the network output image, B is the number of tumor pixels in the tumor segmentation label information, A ∩ B is the number of pixels that are positive in the tumor segmentation label and also predicted as positive in the network output image, and A ∪ B is the total number of pixels in the tumor regions of the tumor segmentation label and the network output image.
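For illustration, a short NumPy sketch of the Dice coefficient used to decide whether training is finished is given below, interpreting the denominator as the total pixel count of the two tumor regions; the 0.85 threshold is an arbitrary example value, not one taken from the patent.

```python
import numpy as np

def dice_coefficient(pred_mask, label_mask):
    """Dice(A, B) = 2 * |A intersect B| / (|A| + |B|) for two binary tumor masks."""
    a = pred_mask.astype(bool)
    b = label_mask.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

pred = np.random.rand(64, 64) > 0.5    # placeholder network output mask
label = np.random.rand(64, 64) > 0.5   # placeholder tumor segmentation label
d = dice_coefficient(pred, label)
print(d, "training considered complete" if d >= 0.85 else "keep training")
```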
A medical image tumor segmentation apparatus involving a cross-modal attention mechanism, comprising:
the image acquisition module is used for acquiring a positron emission tomography (PET) image and a computed tomography (CT) image;
the image scaling module is used for transversely scaling the PET image and the CT image into an image data pair with the same resolution;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively carrying out layered feature extraction on the PET image and the CT image in the training set through an extension encoder, obtaining a PET-CT multi-mode fusion feature block through fusion of an image feature fusion module, and outputting an attention feature map of a PET mode;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention diagram of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature diagram;
similarity calculation is carried out between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and a weight vector of each segmentation characteristic corresponding to the current image characteristic is obtained through the operation of a sigmoid function;
judging each segmentation feature based on the weight vector to obtain a target segmentation image most similar to the current image feature;
calculating the joint loss of the image features through an image feature extraction module, wherein the joint loss comprises cross entropy loss obtained by comparing a target segmentation image with tumor segmentation label information and dice value loss, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, carrying out the above medical image tumor segmentation method involving a cross-modal attention mechanism.
The invention has the following beneficial effects. The medical image tumor segmentation method involving a cross-modal attention mechanism mines the semantic correlation between the PET modality and the CT modality and constructs a cross-modal common subspace, thereby bridging the gap between heterogeneous data; at the same time, an attention interaction network is designed so that the generated model can be trained on data of both modalities simultaneously, improving the quality and accuracy of the segmented image. The method realizes cross-modal automatic segmentation of tumor medical images, effectively solves the problems of low manual segmentation efficiency and inconsistent labeling quality, and improves medical image segmentation efficiency.
Drawings
FIG. 1 shows a PET image and a CT image used in the present invention;
FIG. 2 is a flow chart of the medical image tumor segmentation method involving a cross-modal attention mechanism according to the present invention;
FIG. 3 is a schematic diagram of the structure of the cross-modal image segmentation model according to the present invention;
FIG. 4 is a schematic diagram of the attention network structure according to the present invention;
FIG. 5 compares the segmentation results of the medical image tumor segmentation method of the present invention with those generated by prior-art methods.
Detailed Description
The invention is further described with reference to the accompanying drawings, and the following examples are only used to illustrate the technical solutions of the invention more clearly, and should not be taken as limiting the scope of the invention.
As shown in fig. 2, the present invention discloses a medical image tumor segmentation method involving a cross-modal attention mechanism, comprising the following steps:
Step one, acquiring historical positron emission tomography (PET) images and computed tomography (CT) images; as shown in FIG. 1, the images of the two modalities include images of high-grade tumor patients and images of low-grade tumor patients. The images of the two modalities are transversely scaled into image data pairs with the same resolution, PET image intensities are converted into standardized uptake values (SUV), and CT images are converted into Hounsfield units. The image pairs of the two modalities are divided into a training set, a validation set and a test set, and the image data in each group of image data pairs carry tumor segmentation label information.
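As an illustration of this preprocessing step only, the sketch below pairs one PET slice and one CT slice at a common in-plane resolution. The SUV and Hounsfield conversions are shown schematically with hypothetical parameters (patient weight, injected dose, rescale slope and intercept); in practice these values come from the DICOM metadata of each study.

```python
import numpy as np
from scipy.ndimage import zoom

def to_suv(pet_activity, weight_kg=70.0, injected_dose_bq=3.7e8):
    # Hypothetical SUV conversion: activity (Bq/ml) * body weight (g) / injected dose (Bq).
    # Real parameters must be read from the DICOM metadata of the study.
    return pet_activity * (weight_kg * 1000.0) / injected_dose_bq

def to_hounsfield(ct_raw, rescale_slope=1.0, rescale_intercept=-1024.0):
    # Hypothetical linear rescale to Hounsfield units; slope/intercept are scanner specific.
    return ct_raw * rescale_slope + rescale_intercept

def make_image_pair(pet, ct, target_shape=(256, 256)):
    """Transversely scale a PET slice and a CT slice to the same resolution."""
    pet = to_suv(pet)
    ct = to_hounsfield(ct)
    pet = zoom(pet, [t / s for t, s in zip(target_shape, pet.shape)], order=1)
    ct = zoom(ct, [t / s for t, s in zip(target_shape, ct.shape)], order=1)
    return pet.astype(np.float32), ct.astype(np.float32)

# Example with random data standing in for real slices.
pet_slice = np.random.rand(128, 128)
ct_slice = np.random.rand(512, 512) * 4096
pet_img, ct_img = make_image_pair(pet_slice, ct_slice)
print(pet_img.shape, ct_img.shape)  # (256, 256) (256, 256)
```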
Step two, establishing a cross-modal image segmentation model based on the medical images. The structure of the cross-modal image segmentation model is shown in FIG. 3 and comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module based on an aggregation attention unit.
The image feature extraction module comprises a convolutional neural network (CNN) and is used to perform feature extraction on the single-channel image data in the training set to obtain single-modality PET image features and CT image features. The feature extraction proceeds as follows: in the down-sampling stage, any group of historical positron emission tomography (PET) and computed tomography (CT) image data P and C in the training set is extracted and input into each layer of the CNN to obtain the image features v_i^(p) and v_i^(c), i = 1, 2, 3, 4, respectively. The CNN comprises a plurality of convolutional layers, with a pooling layer after every two convolutional layers. The neural network model with multi-level feature re-extraction is a two-input, two-output model whose output ends contain the PET and CT image features respectively.
The cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit. The expansion encoders are used to extract the layered features of the PET image and the CT image in the training set respectively; specifically, each expansion encoder comprises a CNN, and the features of one modality are extracted with the CNN structure in the down-sampling stage. The features are then fused to obtain the PET-CT multi-modal fusion feature block; the feature fusion through the image feature fusion module comprises: the PET and CT modality information v_4^(p) and v_4^(c) extracted by the CNN in the down-sampling stage is spliced and fused by the feature fusion structure f_p(·), the fused feature layer is determined, and the output obtained is v_4^(pc). The fusion structure may specifically consist of two convolutional layers followed by a pooling layer.
A schematic diagram of the attention network structure of the invention is shown in FIG. 4. The process of outputting the attention feature map of the PET modality is as follows: in the up-sampling stage, the extracted modality features are input into the skip connector f_s(·) and connected with the multi-layer feature maps v_i^(c), i = 1, 2, 3, 4, of the down-sampling part; multi-layer feature re-extraction and aggregation are performed, and the output attention feature map M is finally determined through a sigmoid layer, which normalizes the up-sampled features H:

M = sigmoid(H)
Based on the PET attention mechanism, the semantic fusion network matches the CT features with the extracted attention features v^(f): each attention feature is used as a query vector to screen out the CT features belonging to the same class, generating fusion features that, together with the attention features, form fused feature-attention pairs, from which the resultant attention feature corresponding to v^(f) is obtained (the individual feature and pairing symbols are given as formula images in the original and are not reproduced here).
The specific steps are as follows: a multi-layer back-propagator is constructed, the attention feature structure is propagated back to the feature extraction stage, cross-modal feature resampling is performed, the features are optimized with the attention map, and the fused attention feature map is determined (given as a formula image in the original).
Then, the PET attention map and the CT image features from the image feature extraction module are input together, through standard back-propagation, into the semantic fusion network based on the aggregation attention unit for the first semantic fusion, yielding the attention feature map of the fusion modality. The attention map of the fusion modality and the extracted PET-CT multi-modal fusion feature block then undergo up-sampling and a second semantic fusion, and the feature structure is recovered to obtain the final attention fusion feature map. Similarity is computed between the final attention fusion feature map and the PET attention map, a weight vector of each segmentation feature corresponding to the current image features is obtained through a sigmoid operation, and each segmentation feature is judged on the basis of the weight vector to obtain the segmentation result most similar to the current image features.
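The exact fusion layers and similarity measure are not spelled out in the patent, so the following PyTorch sketch is only a schematic reading of this two-stage semantic fusion and the sigmoid-weighted similarity step; the layer shapes, the bilinear upsampling, and the channel-averaged product used as the similarity score are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticFusion(nn.Module):
    """Two-stage fusion: (PET attention map, CT features) -> fused attention map,
    then (fused map, PET-CT fusion block) -> final attention-fusion features."""
    def __init__(self, ct_ch=256, fuse_ch=256, out_ch=64):
        super().__init__()
        self.first = nn.Conv2d(ct_ch + 1, fuse_ch, 3, padding=1)     # first semantic fusion
        self.second = nn.Conv2d(2 * fuse_ch, out_ch, 3, padding=1)   # second fusion after upsampling

    def forward(self, pet_att, ct_feat, v4_pc):
        fused_att = torch.relu(self.first(torch.cat([ct_feat, pet_att], dim=1)))
        v4_pc = F.interpolate(v4_pc, size=fused_att.shape[-2:], mode="bilinear", align_corners=False)
        return torch.relu(self.second(torch.cat([fused_att, v4_pc], dim=1)))

def similarity_weights(final_feat, pet_att):
    # Element-wise product averaged over channels as a simple similarity score,
    # squashed by a sigmoid to give a per-pixel weight for each segmentation feature.
    sim = (final_feat * pet_att).mean(dim=1, keepdim=True)
    return torch.sigmoid(sim)

fusion = SemanticFusion()
pet_att = torch.rand(1, 1, 16, 16)       # placeholder PET attention map
ct_feat = torch.randn(1, 256, 16, 16)    # placeholder CT features
v4_pc = torch.randn(1, 256, 8, 8)        # placeholder PET-CT fusion block
final = fusion(pet_att, ct_feat, v4_pc)
w = similarity_weights(final, pet_att)
print(final.shape, w.shape)  # torch.Size([1, 64, 16, 16]) torch.Size([1, 1, 16, 16])
```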
Step three, training the cross-modal medical image segmentation model on the training set. Any group of historical positron emission tomography (PET) and computed tomography (CT) images in the training set is extracted and input into the cross-modal medical segmentation network model, and the segmented tumor image is determined; the method further comprises acquiring the tumor segmentation label information corresponding to the tumor mask.
The tumor image segmented by the cross-modal method is compared with the tumor segmentation label information corresponding to the historical tumor images in the training set, and the joint loss of the image features is calculated by the image feature extraction module; the joint loss comprises the cross-entropy loss of the cross-modal segmentation result compared with the ground-truth image and the Dice value loss. The loss is calculated through a cross-entropy loss function, the cross-modal medical segmentation neural network model is trained continuously with a gradient descent algorithm, and the calculated cross-entropy loss is used to update the parameters of the cross-modal image segmentation model.
The cross-entropy loss can be expressed as:

L_CE = -[ y·log(ŷ) + (1 - y)·log(1 - ŷ) ]

where y denotes the tumor segmentation label (1 for the positive class, 0 for the negative class) and ŷ denotes the probability that the output image is predicted as the positive class.
The cross-modal medical segmentation neural network model is trained continuously with the gradient descent algorithm, and the Dice coefficient of the set similarity measurement function is then determined; if the Dice coefficient is within the Dice coefficient threshold range, the cross-modal supervised-learning medical segmentation neural network model is determined to have finished training. The Dice coefficient is expressed as Dice(A, B) = 2 × (A ∩ B) / (A ∪ B), where A is the number of tumor pixels in the network output image, B is the number of tumor pixels in the tumor segmentation label information, A ∩ B is the number of pixels that are positive in the tumor segmentation label and also predicted as positive in the network output image, and A ∪ B is the total number of pixels in the tumor regions of the tumor segmentation label and the network output image.
Step four, after training is finished, the paired cross-modal medical images of the test set are input into the trained cross-modal image segmentation model, and the output is the target segmentation image.
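Finally, as a sketch of this inference step under the same stand-in assumptions as the training sketch above, a two-input model is applied to one PET-CT test pair and the sigmoid output is thresholded to produce the binary target segmentation image; the 0.5 threshold is an assumption.

```python
import torch
import torch.nn as nn

# Stand-in for a trained cross-modal model; in practice the saved optimal parameters are loaded.
model = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1))

def segment_pair(model, pet_img, ct_img, threshold=0.5):
    """Run a two-input segmentation model on one PET-CT test pair."""
    model.eval()
    with torch.no_grad():
        prob = torch.sigmoid(model(torch.cat([pet_img, ct_img], dim=1)))
    return (prob > threshold).float()  # binary target segmentation image

mask = segment_pair(model, torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
print(mask.shape)
```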
The invention also discloses a medical image tumor segmentation device related to the cross-modal attention mechanism, which comprises:
the image acquisition module is used for acquiring a positron emission tomography (PET) image and a computed tomography (CT) image;
the image scaling module is used for transversely scaling the PET image and the CT image into an image data pair with the same resolution;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively carrying out layered feature extraction on the PET image and the CT image in the training set through an extension encoder, obtaining a PET-CT multi-mode fusion feature block through fusion of an image feature fusion module, and outputting an attention feature map of a PET mode;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention map of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for performing upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature map;
similarity calculation is carried out between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and a weight vector of each segmentation characteristic corresponding to the current image characteristic is obtained through the operation of a sigmoid function;
judging each segmentation characteristic based on the weight vector to obtain a target segmentation image most similar to the current image characteristic;
calculating the joint loss of the image characteristics through an image characteristic extraction module, wherein the joint loss comprises cross entropy loss and dice value loss, the cross entropy loss is obtained by comparing a target segmentation image with tumor segmentation label information, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
The invention also discloses a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of medical image tumor segmentation involving a cross-modality attention mechanism as described above.
The method of the invention is compared with existing image segmentation methods as follows:
(1) Simulation conditions
The simulation experiments of the invention were carried out with Python software on a Windows 10 operating system with an Intel(R) Core(TM) i3-2120 3.30 GHz central processing unit and 6 GB of memory.
(2) Emulation content
Existing method 1: a conventional UNet network segments single-modality PET images.
Existing method 2: a conventional UNet network segments single-modality CT images.
Existing method 3: Attention-UNet, with an added attention mechanism, segments single-modality PET images.
Existing method 4: Attention-UNet, with an added attention mechanism, segments single-modality CT images.
Existing method 5: a UNet network segments the pixel-level fused PET-CT image.
Existing method 6: the CT image is segmented using the single-modality PET image as supervision information.
(3) Simulation result
The simulation experiment reports the Dice indices of the six existing methods and of the invention on a private PET-CT data set; the larger the index, the higher the segmentation accuracy, as shown in Table 1.
Table 1. Dice indices of the images generated by each method on the PET-CT data set (the table is given as an image in the original and is not reproduced here).
Combining the segmentation results in FIG. 5 with the evaluation results in Table 1, it can be observed that the method of the invention generates the segmentation image with the highest accuracy. The model outperforms the other models and has the highest semantic accuracy for the same segmentation target, which also verifies the importance of the cross-modal pairing network in the model.
In summary, the medical image segmentation method involving a cross-modal attention mechanism disclosed by the invention is mainly used to segment PET-CT medical images. By constructing a cross-modal interaction network combined with an attention mechanism, it overcomes the low segmentation precision of traditional generation models that are limited to a single modality, and it uses the commonly used Dice coefficient as the performance evaluation index. The method considers not only the characteristics of the tumor in a single modality of the sample but also the complementarity of features between different modalities, guaranteeing the accuracy with which the corresponding segmentation image is generated from multi-modal PET-CT. The implementation steps are as follows: (1) collecting a cross-modal PET-CT data set; (2) setting up the model; (3) training the model, which includes performing feature extraction on the PET and CT image samples; extracting an attention map based on the PET image; constructing an attention interaction network for the PET attention map and the CT image features, mapping the PET attention features, as supervision information, together with the CT feature information into a cross-modal common subspace, and strongly pairing the common representations of each modality by category and distribution; performing cross-distribution alignment with inter-modal and intra-modal similarity loss functions while keeping the category consistency of each modality; and training the network with an alternating iteration method; and (4) generating the segmentation image. The invention uses the attention-mechanism network to realize strong pairing and semantic fusion of PET-CT modality data according to the semantic correlation between different modality data, thereby generating corresponding segmentation images of better quality and higher precision; it is applicable to multi-modal services such as medical image segmentation and improves the efficiency of image segmentation.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (8)

1. A medical image tumor segmentation method involving a cross-modal attention mechanism, characterized by comprising the following steps:
acquiring a positron emission tomography (PET) image and a computed tomography (CT) image;
transversely scaling the PET image and the CT image into image data pairs with the same resolution;
inputting the image data pair to a trained cross-modal image segmentation model to generate a target segmentation image;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively extracting the layered features of the PET image and the CT image in the training set through an extension encoder, fusing through an image feature fusion module to obtain a PET-CT multi-modal fusion feature block, and outputting an attention feature map of a PET modality;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention map of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for performing upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature map;
similarity calculation is carried out between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and a weight vector of each segmentation characteristic corresponding to the current image characteristic is obtained through the operation of a sigmoid function;
judging each segmentation characteristic based on the weight vector to obtain a target segmentation image most similar to the current image characteristic;
calculating the joint loss of the image features through an image feature extraction module, wherein the joint loss comprises cross entropy loss obtained by comparing a target segmentation image with tumor segmentation label information and dice value loss, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
2. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 1, characterized in that: the image feature extraction module comprises a convolutional neural network (CNN); the CNN comprises a plurality of convolutional layers, with a pooling layer connected after every two convolutional layers; and the historical positron emission tomography (PET) image and computed tomography (CT) image are input into each layer of the CNN to obtain the PET image features v_i^(p) and the CT image features v_i^(c) respectively, where i denotes the layer index, i = 1, 2, 3, 4.
3. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 2, characterized in that: the expansion encoder comprises a convolutional neural network (CNN); the CNN comprises a plurality of convolutional layers, with a pooling layer connected after every two convolutional layers; the image feature fusion module comprises a feature fusion structure f_p(·), composed of two convolutional layers followed by a pooling layer; the CNN extracts the PET modality information v_4^(p) and the CT modality information v_4^(c) from the PET image and the CT image respectively; these are input into the feature fusion structure f_p(·) for feature splicing and fusion, the fused feature layer is determined, and the output obtained is the PET-CT multi-modal fusion feature block v_4^(pc).
4. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 3, characterized in that: the aggregation attention unit comprises a skip connector f_s(·) and a back-propagator; the PET-CT multi-modal fusion feature block v_4^(pc) is input into the skip connector f_s(·) and connected with the CT modality information v_4^(c); the multi-layer features are re-extracted and aggregated, and the output attention feature map is determined through a sigmoid layer; the back-propagator propagates the attention feature structure back to the feature extraction stage, resamples the features, optimizes them with the attention feature map, and determines the aggregated attention map (the aggregation formula is given as an image in the original and is not reproduced here), where N is the total number of feature layers of the image data in the i-th down-sampling stage.
5. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 1, characterized in that: the joint loss is calculated through a cross-entropy loss function, and the parameters of the cross-modal image segmentation model are updated from the joint loss with a gradient descent algorithm; the cross-entropy loss is expressed as:

L_CE = -[ y·log(ŷ) + (1 - y)·log(1 - ŷ) ]

where y denotes the tumor segmentation label (1 for the positive class, 0 for the negative class) and ŷ denotes the probability that the output image is predicted as positive.
6. The medical image tumor segmentation method involving a cross-modal attention mechanism according to claim 1, characterized in that: judging whether the cross-modal image segmentation model has finished training comprises: determining the Dice coefficient of the set similarity measurement function, and if the Dice coefficient is within the Dice coefficient threshold range, determining that the cross-modal image segmentation model has been trained; the Dice coefficient is expressed as Dice(A, B) = 2 × (A ∩ B) / (A ∪ B), where A is the number of tumor pixels in the network output image, B is the number of tumor pixels in the tumor segmentation label information, A ∩ B is the number of pixels that are positive in the tumor segmentation label and also predicted as positive in the network output image, and A ∪ B is the total number of pixels in the tumor regions of the tumor segmentation label and the network output image.
7. A medical image tumor segmentation apparatus relating to a cross-modality attention mechanism, comprising:
the image acquisition module is used for acquiring a positron emission tomography (PET) image and a computed tomography (CT) image;
the image scaling module is used for transversely scaling the PET image and the CT image into an image data pair with the same resolution;
the cross-modal image segmentation model comprises an image feature extraction module, an image feature fusion module and a cross-modal semantic enhancement module, wherein the cross-modal semantic enhancement module comprises a group of expansion encoders and a semantic fusion network based on an aggregation attention unit;
the training method of the cross-modal image segmentation model comprises the following steps:
acquiring historical positron emission tomography (PET) images and computed tomography (CT) images, transversely scaling the PET and CT images into image data pairs with the same resolution, where the image data in each group of image data pairs carry tumor segmentation label information, and dividing the image data pairs into a training set, a validation set and a test set;
performing feature extraction on single-channel image data in the training set through an image feature extraction module to obtain single-mode PET image features and CT image features;
respectively carrying out layered feature extraction on the PET image and the CT image in the training set through an extension encoder, obtaining a PET-CT multi-mode fusion feature block through fusion of an image feature fusion module, and outputting an attention feature map of a PET mode;
inputting the attention feature map of the PET modality and the CT image features into a semantic fusion network together through standard back propagation to perform semantic fusion for the first time to obtain the attention feature map of the fusion modality;
inputting the attention diagram of the fusion mode and the extracted PET-CT multi-mode fusion feature block into a semantic fusion network for upsampling and second semantic fusion, and recovering a feature structure to obtain a final attention fusion feature diagram;
calculating the similarity between the final attention fusion characteristic diagram and the attention characteristic diagram of the PET mode, and obtaining the weight vector of each segmentation characteristic corresponding to the current image characteristic through the operation of a sigmoid function;
judging each segmentation characteristic based on the weight vector to obtain a target segmentation image most similar to the current image characteristic;
calculating the joint loss of the image characteristics through an image characteristic extraction module, wherein the joint loss comprises cross entropy loss and dice value loss, the cross entropy loss is obtained by comparing a target segmentation image with tumor segmentation label information, the calculated joint loss is used for updating parameters of a cross-modal image segmentation model, and when the parameters of the model are converged, the optimal cross-modal image segmentation model and the parameters thereof are stored.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of medical image tumor segmentation involving a cross-modality attention mechanism as set forth in any one of claims 1-6.
CN202211163664.0A 2022-09-23 2022-09-23 Medical image tumor segmentation method related to cross-modal attention mechanism Pending CN115512110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211163664.0A CN115512110A (en) 2022-09-23 2022-09-23 Medical image tumor segmentation method related to cross-modal attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211163664.0A CN115512110A (en) 2022-09-23 2022-09-23 Medical image tumor segmentation method related to cross-modal attention mechanism

Publications (1)

Publication Number Publication Date
CN115512110A true CN115512110A (en) 2022-12-23

Family

ID=84506691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211163664.0A Pending CN115512110A (en) 2022-09-23 2022-09-23 Medical image tumor segmentation method related to cross-modal attention mechanism

Country Status (1)

Country Link
CN (1) CN115512110A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830017A (en) * 2023-02-09 2023-03-21 智慧眼科技股份有限公司 Tumor detection system, method, equipment and medium based on image-text multi-mode fusion
CN115984296A (en) * 2023-03-21 2023-04-18 译企科技(成都)有限公司 Medical image segmentation method and system applying multi-attention mechanism
CN115984296B (en) * 2023-03-21 2023-06-13 译企科技(成都)有限公司 Medical image segmentation method and system applying multi-attention mechanism
CN117036830A (en) * 2023-10-07 2023-11-10 之江实验室 Tumor classification model training method and device, storage medium and electronic equipment
CN117036830B (en) * 2023-10-07 2024-01-09 之江实验室 Tumor classification model training method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
Yun et al. Improvement of fully automated airway segmentation on volumetric computed tomographic images using a 2.5 dimensional convolutional neural net
CN109978850B (en) Multi-modal medical image semi-supervised deep learning segmentation system
WO2023071531A1 (en) Liver ct automatic segmentation method based on deep shape learning
CN113674253B (en) Automatic segmentation method for rectal cancer CT image based on U-transducer
CN115512110A (en) Medical image tumor segmentation method related to cross-modal attention mechanism
WO2021203795A1 (en) Pancreas ct automatic segmentation method based on saliency dense connection expansion convolutional network
Li et al. Automated measurement network for accurate segmentation and parameter modification in fetal head ultrasound images
CN110363802B (en) Prostate image registration system and method based on automatic segmentation and pelvis alignment
WO2022121100A1 (en) Darts network-based multi-modal medical image fusion method
CN113223005B (en) Thyroid nodule automatic segmentation and grading intelligent system
Ye et al. Medical image diagnosis of prostate tumor based on PSP-Net+ VGG16 deep learning network
Feng et al. Automatic localization and segmentation of focal cortical dysplasia in FLAIR‐negative patients using a convolutional neural network
Sun et al. COVID-19 CT image segmentation method based on swin transformer
CN112396605B (en) Network training method and device, image recognition method and electronic equipment
Wang et al. Multi-view fusion segmentation for brain glioma on CT images
Xu et al. Deep learning-based automated detection of arterial vessel wall and plaque on magnetic resonance vessel wall images
Kong et al. Data enhancement based on M2-Unet for liver segmentation in Computed Tomography
CN116797609A (en) Global-local feature association fusion lung CT image segmentation method
CN114511602B (en) Medical image registration method based on graph convolution Transformer
Huang et al. AU‐snake based deep learning network for right ventricle segmentation
CN113902738A (en) Heart MRI segmentation method and system
Huang et al. ADDNS: An asymmetric dual deep network with sharing mechanism for medical image fusion of CT and MR-T2
CN114764766A (en) B-mode ultrasonic image denoising method based on FC-VoVNet and WGAN
Che et al. Segmentation of bone metastases based on attention mechanism
Qiao et al. Fuzzy deep medical diagnostic system: gray relation framework and the guiding functionalities for the professional sports club social responsibility

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination