WO2024098318A1 - Medical image segmentation method - Google Patents

Medical image segmentation method

Info

Publication number
WO2024098318A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
image
loss
transformer
medical image
Prior art date
Application number
PCT/CN2022/131075
Other languages
French (fr)
Chinese (zh)
Inventor
吴文霞
李志成
梁栋
赵源深
段静娴
Original Assignee
中国科学院深圳先进技术研究院
Priority date
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Priority to PCT/CN2022/131075
Publication of WO2024098318A1

Definitions

  • the invention relates to a medical image segmentation method.
  • Medical image segmentation is the basis of various medical image applications. In clinical auxiliary diagnosis, image-guided surgery and radiotherapy, medical image segmentation technology shows increasingly important clinical value. Traditional medical image segmentation is based on manual segmentation by experienced doctors, but this purely manual segmentation method is often time-consuming and laborious, and is greatly affected by the doctor's subjective influence. With the rapid development of deep learning technology, fully automatic image segmentation based on deep learning has developed rapidly. However, deep learning often relies on a large amount of high-quality labeled data, while medical image data is often scarce, and it is usually difficult to obtain high-quality labeled data.
  • the semi-supervised learning framework can directly learn from limited labeled data and a large amount of unlabeled data to obtain high-quality segmentation results.
  • Current semi-supervised medical image segmentation methods can be divided into three categories: adversarial learning methods, consistency regularization methods, and collaborative training methods.
  • Adversarial learning methods use discriminators to align the distribution of labeled and unlabeled data in the embedding space. The data needs to meet the distribution assumption, and many adversarial learning models are difficult to train.
  • the basic idea of the consistency regularization method is to regularize the model prediction, that is, a robust model should have similar outputs for similar inputs.
  • the difference between each method lies in how to inject noise and how to calculate consistency, but the consistency regularization method relies on a suitable data augmentation strategy, and the wrong pseudo-labels will continue to strengthen during training.
  • the collaborative training method is based on the assumption of low-density separation of data. The disadvantage of this method is that if the generated pseudo-labels are inaccurate, they will lead to self-reinforcement of classification errors.
  • semi-supervised segmentation is generally performed using adversarial learning methods, consistency regularization methods, and collaborative training methods.
  • the above methods all use the consistency of the output space and lack constraints in the feature space. Therefore, in many cases, the model cannot recognize the wrong features, causing this error to continue to accumulate during the training process.
  • the present invention provides a medical image segmentation method, which comprises the following steps: a. collecting nuclear magnetic resonance image data of tumor patients as a data set; b. performing data processing on the image data in the data set, wherein the data processing comprises: performing format conversion, resampling, registration and standardization on the image data in the data set; c. taking multimodal images that meet the requirements in the data set after the data processing as input of a model; d. establishing a multi-branch Transformer neural network as an encoder, and designing a separate Transformer for each modality to extract features; e. designing a modality fusion Transformer to fuse data of multiple modalities; f. establishing a decoder, and gradually reshaping encoder outputs of different scales into the input size to obtain a segmentation result matching the original image; g. constructing a weakly enhanced image and a strongly enhanced image for unlabeled data in the data set; h. selecting positive examples and negative examples according to the output of the encoder for the differently enhanced images, and calculating the contrastive loss; i. calculating the dice loss for labels and segmentation results; j. training the model, selecting the result with better effect as the final model and saving it.
  • the patient's nuclear magnetic resonance image data is a multimodal nuclear magnetic resonance image; the nuclear magnetic resonance image data of each patient includes four commonly used modalities; the four commonly used modalities are T1, T2, T1C, and Flair modalities.
  • the step b specifically includes:
  • the DICOM format is converted into the NIFTI format; then the image is resampled; then the image is registered, and the points corresponding to the same spatial position at multiple time points are matched one by one.
  • the rigid registration mode is used for registration, and the mutual information is used as the image similarity measure; the image data in the data set is standardized using grayscale normalization and histogram equalization methods.
  • the step c specifically includes:
  • the multimodal images that meet the requirements in the dataset are used as the input of the model, and the dataset is divided into a training set and a test set.
  • the magnetic resonance image data with missing modalities, failed registration, or without tumors are excluded to avoid affecting the generalization performance of the model.
  • the dataset is divided into a training set and a test set in a ratio of 4:1.
  • the labeled data and unlabeled data are divided as needed and processed separately.
  • the step d specifically includes:
  • a separate Transformer is designed for each modality to extract features.
  • a multi-branch Transformer is proposed with the same number of branches as the input modalities in order to simultaneously extract independent features of multiple modalities.
  • the three-dimensional whole brain image is divided into K three-dimensional image blocks of fixed size, mapped into a one-dimensional vector of fixed length D, and position encoding is added to retain position information before being input into the visual Transformer model.
  • the step e specifically includes:
  • a fusion Transformer based on the cross-attention mechanism is designed separately: the fusion Transformer based on the cross-attention mechanism is divided into two parts, namely, a partial fusion Transformer and a global fusion Transformer; the partial fusion Transformer uses a single one-dimensional vector of each branch as a query to exchange information with other branches, and inputs the partial fusion result into the global fusion Transformer, and the multimodal information is more thoroughly fused together through the self-attention mechanism therein, thereby utilizing the global context information at the overall semantic structure level of the data.
  • the step f specifically includes:
  • the decoder gradually reshapes the encoder outputs of different scales to the input size to obtain a segmentation result that matches the original image.
  • the decoder takes the encoder output as five channel inputs.
  • the encoder outputs of each layer are fused layer by layer through convolution and deconvolution operations, and the image is restored to the specified size, and the sigmoid function is applied to obtain the final segmentation result.
  • the step g specifically includes:
  • Two enhancement methods are designed for a single unlabeled image.
  • a transformation is randomly selected for each sample in the batch from a predefined range: the first enhancement method is weak enhancement, which is the result of random flipping, moving and random scaling strategies with a probability of 50%; the other enhancement method is strong enhancement, which adds grayscale transformation on the basis of the weakly enhanced image.
  • the step h specifically includes:
  • the unlabeled data loss is divided into two parts, including the output space consistency loss and the contrastive learning loss.
  • the contrastive learning loss is calculated in that the encoder generates features based on the weakly enhanced image and the strongly enhanced image respectively.
  • the features at the same position are regarded as positive examples, and the features at different positions are regarded as negative examples.
  • the sampling method of negative examples adopts the Gumbel sampling strategy, and selects k pixels with the smallest cosine similarity to form negative examples, or selects pixels with a longer distance as negative examples based on anatomical prior knowledge.
  • the InfoNCE loss is combined with the cosine similarity to obtain the pixel contrast loss.
  • the step i specifically includes:
  • the dice loss is calculated with the label as the supervised learning loss; for unlabeled data, the consistency loss is calculated between the results of weakly enhanced images and strongly enhanced images.
  • the step j specifically includes:
  • stochastic gradient descent is used as the optimizer for training, and weight decay is used to prevent overfitting; after the model training is completed, the more accurate model under each proportion of supervised data is selected and saved.
  • This application not only takes into account the consistency of the output space, but also solves to a certain extent the problem of error accumulation caused by the inability to filter out erroneous features that is common in current methods. It also uses Transformer as the main feature extraction network and utilizes the attention mechanism and global receptive field advantages in Transformer to locate tumors faster and more accurately, which improves the accuracy compared to the convolutional neural network method with only a local receptive field.
  • FIG1 is a flow chart of a medical image segmentation method according to the present invention.
  • FIG2 is a schematic diagram of a Transformer neural network provided by an embodiment of the present invention.
  • FIG3 is a schematic diagram of the Transformer neural network segmentation process provided by an embodiment of the present invention.
  • FIG. 1 is a flowchart of a preferred embodiment of the medical image segmentation method of the present invention.
  • Step S1, collecting the MRI image data of tumor patients as a data set. Specifically:
  • nuclear magnetic resonance image data of tumor patients are collected.
  • the nuclear magnetic resonance image data of the patients are multimodal nuclear magnetic resonance images.
  • the nuclear magnetic resonance image data of each patient includes four common modes; the four common modes are T1, T2, T1C, and Flair modes.
  • the patient images obtained in this step come from the patient image datasets jointly collected by the hospital, TCIA (The Cancer Imaging Archive) and TCGA (The Cancer Genome Atlas).
  • This embodiment does not limit the size of the data set.
  • Step S2 performing data processing on the image data in the data set, the data processing comprising: performing format conversion, resampling, registration and standardization on the image data in the data set. Specifically:
  • DICOM (Digital Imaging and Communications in Medicine) refers to the medical digital image transmission protocol, which is a set of universal standard protocols for medical image processing, storage, printing and transmission.
  • the data obtained from the medical device is in DICOM format.
  • the DICOM format is converted into NIFTI (Neuro Imaging Informatics Technology Initiative) format; then the image is resampled to improve the image resolution; then the image is registered, and the points corresponding to the same position in space at multiple time points are matched one by one.
  • the rigid registration mode is used for registration, and mutual information is used as the image similarity metric.
  • after registration and resampling, the spatial resolution of the image is 1 mm; grayscale normalization, histogram equalization and other methods are used to standardize the image data in the data set.
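A minimal preprocessing sketch of this step, assuming the SimpleITK package; `dicom_to_nifti`, `resample_to_1mm`, `rigid_register` and `normalize` are hypothetical helper names and the optimizer settings are illustrative, while the 1 mm spacing, the rigid transform and the mutual-information metric follow the description above.

```python
import SimpleITK as sitk

def dicom_to_nifti(dicom_dir, out_path):
    # Read the DICOM series from the scanner and write it out in NIfTI format.
    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(reader.GetGDCMSeriesFileNames(dicom_dir))
    image = reader.Execute()
    sitk.WriteImage(image, out_path)
    return image

def resample_to_1mm(image):
    # Resample to isotropic 1 mm spacing while keeping the physical extent.
    spacing = [1.0, 1.0, 1.0]
    size = [int(round(sz * sp)) for sz, sp in zip(image.GetSize(), image.GetSpacing())]
    return sitk.Resample(image, size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), spacing, image.GetDirection(), 0.0,
                         image.GetPixelID())

def rigid_register(fixed, moving):
    # Rigid registration with Mattes mutual information as the similarity metric.
    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=200)
    init = sitk.CenteredTransformInitializer(fixed, moving, sitk.Euler3DTransform(),
                                             sitk.CenteredTransformInitializerFilter.GEOMETRY)
    reg.SetInitialTransform(init, inPlace=False)
    transform = reg.Execute(sitk.Cast(fixed, sitk.sitkFloat32),
                            sitk.Cast(moving, sitk.sitkFloat32))
    return sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0, moving.GetPixelID())

def normalize(image):
    # Grayscale normalization (zero mean, unit variance) followed by histogram equalization.
    arr = sitk.GetArrayFromImage(sitk.Cast(image, sitk.sitkFloat32))
    arr = (arr - arr.mean()) / (arr.std() + 1e-8)
    norm = sitk.GetImageFromArray(arr)
    norm.CopyInformation(image)
    return sitk.AdaptiveHistogramEqualization(norm)
```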
  • Step S3 After the data is processed, the multimodal images that meet the requirements in the data set are used as the input of the model, and the data set is divided into a training set and a test set.
  • the training set is divided into labeled data and unlabeled data as needed, and processed separately.
  • the multimodal images that meet the requirements in the dataset are used as the input of the model, and the dataset is divided into a training set and a test set.
  • For the training set, the labeled data and unlabeled data are divided as needed and processed separately. In semi-supervised tasks, the proportion of labeled data strongly affects the segmentation results; therefore, the amount of labeled data in the training set is gradually reduced in 10% steps, and experiments are carried out separately.
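A small sketch of the split described in this step, under the assumption that each case is represented by a dict with a `complete` flag; the 4:1 ratio and the stepwise reduction of the labeled fraction come from the text, while the function name and seeding are illustrative.

```python
import random

def split_dataset(cases, labeled_fraction, seed=0):
    """Split cases 4:1 into train/test, then mark a fraction of the training set as labeled."""
    rng = random.Random(seed)
    # Drop cases with missing modalities, failed registration, or no tumor.
    cases = [c for c in cases if c["complete"]]
    rng.shuffle(cases)
    n_test = len(cases) // 5                      # 4:1 train/test split
    test, train = cases[:n_test], cases[n_test:]
    n_labeled = int(len(train) * labeled_fraction)
    labeled, unlabeled = train[:n_labeled], train[n_labeled:]
    return labeled, unlabeled, test

# Experiments repeated while reducing the labeled fraction in 10% steps, e.g.:
# for frac in (0.5, 0.4, 0.3, 0.2, 0.1):
#     labeled, unlabeled, test = split_dataset(all_cases, frac)
```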
  • Step S4 Establish a multi-branch Transformer neural network as an encoder, and design a separate Transformer for each modality to extract features. Specifically:
  • a multi-branch Transformer neural network is established.
  • the expected segmentation model is an encoder-decoder structure.
  • the encoder extracts appropriate features and the decoder restores the image to the input size.
  • a separate Transformer is designed for each modality to extract features.
  • a multi-branch Transformer is proposed, whose number of branches is equal to the number of input modalities in order to extract independent features of multiple modalities at the same time.
  • the three-dimensional whole brain image is divided into K fixed-size three-dimensional image blocks, mapped into a one-dimensional vector of fixed length D, and position encoding is added to retain position information, and input into the visual Transformer model.
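A compact PyTorch sketch of the per-modality branch just described: the 3D volume is cut into fixed-size patches, each patch is projected to a D-dimensional token, a learnable position encoding is added, and the tokens pass through that modality's own Transformer. `ModalityBranch`, `MultiBranchEncoder` and the patch size, depth and head counts are illustrative choices, not values from the patent.

```python
import torch
import torch.nn as nn

class ModalityBranch(nn.Module):
    """One Transformer branch for a single MRI modality."""
    def __init__(self, img_size=128, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        num_patches = (img_size // patch_size) ** 3        # K patches per volume
        # Conv3d with kernel = stride = patch_size performs the patch split + linear projection.
        self.to_tokens = nn.Conv3d(1, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                                   # x: (B, 1, D, H, W)
        tokens = self.to_tokens(x).flatten(2).transpose(1, 2)   # (B, K, dim)
        return self.encoder(tokens + self.pos_embed)

class MultiBranchEncoder(nn.Module):
    """One branch per input modality (T1, T2, T1C, Flair)."""
    def __init__(self, num_modalities=4, **kw):
        super().__init__()
        self.branches = nn.ModuleList([ModalityBranch(**kw) for _ in range(num_modalities)])

    def forward(self, x):                                   # x: (B, num_modalities, D, H, W)
        return [branch(x[:, i:i + 1]) for i, branch in enumerate(self.branches)]
```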
  • Step S5 Design a modality fusion Transformer to fuse data from multiple modalities. Specifically:
  • this application separately designs a fusion Transformer based on a cross-attention mechanism.
  • the fusion Transformer based on the cross-attention mechanism is divided into two parts, namely a partial fusion Transformer and an overall fusion Transformer.
  • the partial fusion Transformer uses a single one-dimensional vector of each branch as a query to exchange information with other branches.
  • the partial fusion result is input into the overall fusion Transformer, and the multimodal information is more thoroughly integrated through the self-attention mechanism, thereby utilizing the global context information at the overall semantic structure level of the data.
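One way to sketch the two-stage fusion described above in PyTorch: in the partial fusion step a single summary token from each branch queries the tokens of the other branches via cross-attention, and the global fusion step applies self-attention over all branches' tokens. Using mean pooling to obtain the per-branch query token, and the dimensions, are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=8, depth=2):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.global_fusion = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, branch_tokens):        # list of (B, K, dim), one entry per modality
        fused = []
        for i, tokens in enumerate(branch_tokens):
            # Partial fusion: one summary token of this branch queries all other branches.
            query = tokens.mean(dim=1, keepdim=True)                      # (B, 1, dim)
            others = torch.cat([t for j, t in enumerate(branch_tokens) if j != i], dim=1)
            exchanged, _ = self.cross_attn(query, others, others)         # (B, 1, dim)
            fused.append(tokens + exchanged)                              # broadcast back onto the branch
        # Global fusion: self-attention over the concatenated multimodal tokens.
        return self.global_fusion(torch.cat(fused, dim=1))                # (B, num_modalities*K, dim)
```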
  • Step S6 Establish a decoder to gradually reshape the encoder outputs of different scales into the input size to obtain a segmentation result that matches the original image. Specifically:
  • the decoder gradually reshapes the encoder outputs of different scales to the input size to obtain a segmentation result that matches the original image.
  • the decoder takes the encoder output as five channel inputs.
  • the encoder outputs of each layer are fused layer by layer through convolution and deconvolution operations, and the image is restored to the specified size, and the sigmoid function is applied to obtain the final segmentation result.
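A rough PyTorch sketch of the decoder behaviour described here, assuming the five encoder outputs have already been reshaped into 3D feature maps of decreasing resolution; the channel counts and normalization choices are placeholders.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Fuse five encoder feature maps coarse-to-fine and restore the input resolution."""
    def __init__(self, channels=(256, 128, 64, 32, 16)):
        super().__init__()
        self.up = nn.ModuleList()
        self.fuse = nn.ModuleList()
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            # Deconvolution (transposed convolution) doubles the spatial size.
            self.up.append(nn.ConvTranspose3d(c_in, c_out, kernel_size=2, stride=2))
            # Convolutional fusion of the upsampled map with the skip feature.
            self.fuse.append(nn.Sequential(
                nn.Conv3d(2 * c_out, c_out, kernel_size=3, padding=1),
                nn.InstanceNorm3d(c_out), nn.ReLU(inplace=True)))
        self.head = nn.Conv3d(channels[-1], 1, kernel_size=1)

    def forward(self, feats):                 # feats: [f5, f4, f3, f2, f1], coarse to fine
        x = feats[0]
        for up, fuse, skip in zip(self.up, self.fuse, feats[1:]):
            x = fuse(torch.cat([up(x), skip], dim=1))
        return torch.sigmoid(self.head(x))    # final binary segmentation probabilities
```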
  • Step S7 construct weakly enhanced images and strongly enhanced images for unlabeled data. Specifically:
  • Two enhancement methods are designed for a single unlabeled image.
  • a transformation is randomly selected for each sample in the batch from a predefined range.
  • the first enhancement method is weak enhancement, which is the result of a random flip, move, and random scaling strategy with a probability of 50%.
  • the other enhancement method is strong enhancement, which adds grayscale transformation to the weakly enhanced image.
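A sketch of the two augmentation branches, assuming a volume is a PyTorch tensor of shape (C, D, H, W); each weak operation is applied with 50% probability, and the strong view adds a grayscale (gamma) transform on top of the weak view, as described above. The parameter ranges are illustrative.

```python
import random
import torch
import torch.nn.functional as F

def weak_augment(x):                        # x: (C, D, H, W)
    if random.random() < 0.5:               # random flip along one spatial axis
        x = torch.flip(x, dims=[random.choice([1, 2, 3])])
    if random.random() < 0.5:               # random shift (simple circular roll here)
        x = torch.roll(x, shifts=random.randint(-8, 8), dims=random.choice([1, 2, 3]))
    if random.random() < 0.5:               # random scaling
        scale = random.uniform(0.9, 1.1)
        x = F.interpolate(x.unsqueeze(0), scale_factor=scale, mode="trilinear",
                          align_corners=False).squeeze(0)
    return x

def strong_augment(x_weak):
    # Strong view = grayscale (gamma) transform applied on top of the weakly enhanced image.
    gamma = random.uniform(0.7, 1.5)
    x_min, x_max = x_weak.min(), x_weak.max()
    x = (x_weak - x_min) / (x_max - x_min + 1e-8)
    return x.pow(gamma) * (x_max - x_min) + x_min

# x_weak = weak_augment(volume); x_strong = strong_augment(x_weak)
```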
  • Step S8 select positive examples and negative examples according to the output of the encoder for different enhanced images, and calculate the contrast loss. Specifically:
  • the loss of unlabeled data is divided into two parts, including output space consistency loss and contrastive learning loss.
  • the calculation method of contrastive learning loss is that the encoder generates features based on weakly enhanced images and strongly enhanced images respectively. Features at the same position are regarded as positive examples, and features at different positions are regarded as negative examples.
  • the sampling method of negative examples adopts the Gumbel sampling strategy, and selects k pixels with the smallest cosine similarity to form negative examples, or selects pixels with a longer distance as negative examples based on anatomical prior knowledge.
  • the goal of contrastive learning loss is to increase its similarity with positive pixels and reduce its similarity with k negative pixels. To achieve this goal, InfoNCE loss is combined with cosine similarity to obtain pixel contrast loss.
  • the positive example uses all labels as 1 to calculate the cross entropy loss
  • the negative example uses all labels as 0 to calculate the cross entropy loss.
  • the sum of the losses is the contrastive learning loss.
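A simplified sketch of the pixel-level contrastive term as described in this step: features of the weak and strong views at the same position are positives, the k features with the smallest cosine similarity serve as negatives (a stand-in for the Gumbel-based or anatomy-based sampling), and positives and negatives are pushed towards labels 1 and 0 with an InfoNCE-style cross entropy. The temperature and k are illustrative.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(feat_weak, feat_strong, k=16, tau=0.1):
    """feat_weak, feat_strong: (N, C) features at N sampled positions of the two views."""
    w = F.normalize(feat_weak, dim=1)
    s = F.normalize(feat_strong, dim=1)
    sim = (w @ s.t()) / tau                            # (N, N) scaled cosine similarities
    pos = sim.diagonal()                               # same position across views -> positives
    # Negatives: for each anchor, the k positions with the smallest cosine similarity
    # (a simple stand-in for the Gumbel sampling / anatomical-prior selection in the text).
    neg_sim = sim.clone()
    neg_sim.fill_diagonal_(float("inf"))
    neg, _ = neg_sim.topk(k, dim=1, largest=False)     # (N, k)
    # Cross entropy with positives labelled 1 and negatives labelled 0; their sum is the loss.
    pos_loss = F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos))
    neg_loss = F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg))
    return pos_loss + neg_loss
```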
  • Step S9 calculate the dice loss for the label and the segmentation result. Calculate the consistency loss for the output of the two branches of the unlabeled data.
  • the total loss is the sum of the supervised learning loss, the contrastive learning loss and the consistency loss. Specifically:
  • Calculate the total loss For the segmentation results obtained with labeled data, calculate the dice loss with the label as the supervised learning loss. For unlabeled data, calculate the consistency loss between the results of the weakly enhanced image and the strongly enhanced image; the consistency loss is added to the contrastive learning loss as the semi-supervised loss. The total loss is the sum of the semi-supervised loss and the supervised loss.
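A sketch of how the three terms described above might be combined, assuming `pred_labeled`, `pred_weak` and `pred_strong` are sigmoid probability maps; the soft Dice formulation, the mean-squared-error consistency term and the unit weighting are illustrative choices.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-5):
    # Soft Dice loss between the predicted probability map and the binary label.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def total_loss(pred_labeled, label, pred_weak, pred_strong, contrastive, w_semi=1.0):
    supervised = dice_loss(pred_labeled, label)                   # labeled branch
    consistency = F.mse_loss(pred_strong, pred_weak.detach())     # weak/strong output agreement
    semi_supervised = consistency + contrastive                   # semi-supervised part
    return supervised + w_semi * semi_supervised
```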
  • Step S10 train the model, select the result with better effect as the final model and save it. Specifically:
  • data enhancement is performed using methods including but not limited to rotation, translation, scaling, and cropping to improve the generalization ability of the model;
  • stochastic gradient descent is used as the optimizer for training, with weight decay to prevent overfitting; for the input image data, the network output is a binary segmentation result;
  • the results output by the network correspond to the original image, assisting doctors in diagnosing patients.
  • the more accurate model is selected and saved under the supervised data of each proportion.
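A skeleton of the training loop implied by this step: stochastic gradient descent with weight decay, the combined loss from the sketches above, and the best checkpoint kept for each labeled-data proportion. The learning rate, momentum, the `return_features` keyword and the `val_fn` validation callback are assumptions.

```python
import torch

def train(model, labeled_loader, unlabeled_loader, val_fn, epochs=200):
    # Stochastic gradient descent with weight decay, as stated in the method description.
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
    best_dice, best_state = 0.0, None
    for epoch in range(epochs):
        # The unlabeled loader is assumed to yield (weak view, strong view) pairs.
        for (x_l, y_l), (x_w, x_s) in zip(labeled_loader, unlabeled_loader):
            pred_l = model(x_l)
            pred_w, feat_w = model(x_w, return_features=True)   # hypothetical feature-returning call
            pred_s, feat_s = model(x_s, return_features=True)
            # total_loss and pixel_contrastive_loss are the sketches shown earlier.
            loss = total_loss(pred_l, y_l, pred_w, pred_s,
                              pixel_contrastive_loss(feat_w, feat_s))
            opt.zero_grad()
            loss.backward()
            opt.step()
        dice = val_fn(model)                                     # validation Dice for this epoch
        if dice > best_dice:                                     # keep the most accurate checkpoint
            best_dice = dice
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    torch.save(best_state, "best_model.pt")
```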
  • This application uses the ability of contrastive learning to bring similar features closer and push heterogeneous features farther away to impose constraints on the feature space, further improving the effect of semi-supervised learning.
  • the model is constructed using visual Transformer instead of convolutional neural network, and the global receptive field brought by the attention mechanism is used to fuse multimodal information to better locate the tumor position, thereby improving the segmentation effect.

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A medical image segmentation method, comprising: collecting nuclear magnetic resonance image data of tumor patients as a data set (S1); performing data processing on the image data in the data set; after the data processing, using multi-modal images meeting requirements in the data set as input of a model; designing a separate Transformer for each modality to extract features; designing a modality fusion Transformer to fuse data of a plurality of modalities (S5); gradually reshaping encoder outputs of different scales into the input size so as to obtain a segmentation result matching the original image; for unlabeled data in the data set, constructing a weakly enhanced image and a strongly enhanced image; selecting positive and negative examples according to the encoder outputs for the differently enhanced images, and calculating a contrastive loss (S8); calculating a dice loss between labels and the segmentation result; and training the model to obtain and save a final model. The medical image segmentation method can better localize a tumor, thereby improving the segmentation effect.

Description

Medical Image Segmentation Method
Technical Field
The invention relates to a medical image segmentation method.
Background Technique
Medical image segmentation is the basis of various medical image applications. In clinical auxiliary diagnosis, image-guided surgery and radiotherapy, medical image segmentation technology shows increasingly important clinical value. Traditional medical image segmentation is based on manual segmentation by experienced doctors, but this purely manual segmentation method is often time-consuming and laborious, and is greatly affected by the doctor's subjective influence. With the rapid development of deep learning technology, fully automatic image segmentation based on deep learning has developed rapidly. However, deep learning often relies on a large amount of high-quality labeled data, while medical image data is often scarce, and it is usually difficult to obtain high-quality labeled data.
The semi-supervised learning framework can directly learn from limited labeled data and a large amount of unlabeled data to obtain high-quality segmentation results. Current semi-supervised medical image segmentation methods can be divided into three categories: adversarial learning methods, consistency regularization methods, and collaborative training methods. Adversarial learning methods use discriminators to align the distribution of labeled and unlabeled data in the embedding space. The data needs to meet the distribution assumption, and many adversarial learning models are difficult to train. The basic idea of the consistency regularization method is to regularize the model prediction, that is, a robust model should have similar outputs for similar inputs. The difference between each method lies in how to inject noise and how to calculate consistency, but the consistency regularization method relies on a suitable data augmentation strategy, and the wrong pseudo-labels will continue to strengthen during training. The collaborative training method is based on the assumption of low-density separation of data. The disadvantage of this method is that if the generated pseudo-labels are inaccurate, they will lead to self-reinforcement of classification errors.
In order to make full use of unlabeled data, semi-supervised segmentation is generally performed using adversarial learning methods, consistency regularization methods, and collaborative training methods. In general, the above methods all use the consistency of the output space and lack constraints in the feature space. Therefore, in many cases, the model cannot recognize the wrong features, causing this error to continue to accumulate during the training process.
Summary of the Invention
In view of this, it is necessary to provide a medical image segmentation method.
The present invention provides a medical image segmentation method, which comprises the following steps: a. collecting nuclear magnetic resonance image data of tumor patients as a data set; b. performing data processing on the image data in the data set, wherein the data processing comprises: performing format conversion, resampling, registration and standardization on the image data in the data set; c. taking multimodal images that meet the requirements in the data set after the data processing as input of a model; d. establishing a multi-branch Transformer neural network as an encoder, and designing a separate Transformer for each modality to extract features; e. designing a modality fusion Transformer to fuse data of multiple modalities; f. establishing a decoder, and gradually reshaping encoder outputs of different scales into input sizes to obtain a segmentation result matching the original image; g. constructing a weakly enhanced image and a strongly enhanced image for unlabeled data in the data set; h. selecting positive examples and negative examples according to the output of the encoder for different enhanced images, and calculating the contrast loss; i. calculating the dice loss for labels and segmentation results; j. training the model, selecting a result with better effect as the final model and saving it.
Specifically, the patient's nuclear magnetic resonance image data is a multimodal nuclear magnetic resonance image; the nuclear magnetic resonance image data of each patient includes four commonly used modalities; the four commonly used modalities are T1, T2, T1C, and Flair modalities.
Specifically, the step b specifically includes:
First, the DICOM format is converted into the NIFTI format; then the image is resampled; then the image is registered, and the points corresponding to the same spatial position at multiple time points are matched one by one. The rigid registration mode is used for registration, and the mutual information is used as the image similarity measure; the image data in the data set is standardized using grayscale normalization and histogram equalization methods.
Specifically, the step c specifically includes:
The multimodal images that meet the requirements in the dataset are used as the input of the model, and the dataset is divided into a training set and a test set. First, the magnetic resonance image data with missing modalities, failed registration, or without tumors are excluded to avoid affecting the generalization performance of the model. Then, the dataset is divided into a training set and a test set in a ratio of 4:1. For the training set, the labeled data and unlabeled data are divided as needed and processed separately.
Specifically, the step d specifically includes:
A separate Transformer is designed for each modality to extract features. For input with four modalities, a multi-branch Transformer is proposed with the same number of branches as the input modalities in order to simultaneously extract independent features of multiple modalities. The three-dimensional whole brain image is divided into K three-dimensional image blocks of fixed size, mapped into a one-dimensional vector of fixed length D, and position encoding is added to retain position information before being input into the visual Transformer model.
Specifically, the step e specifically includes:
A fusion Transformer based on the cross-attention mechanism is designed separately: the fusion Transformer based on the cross-attention mechanism is divided into two parts, namely, a partial fusion Transformer and a global fusion Transformer; the partial fusion Transformer uses a single one-dimensional vector of each branch as a query to exchange information with other branches, and inputs the partial fusion result into the global fusion Transformer, and the multimodal information is more thoroughly fused together through the self-attention mechanism therein, thereby utilizing the global context information at the overall semantic structure level of the data.
Specifically, the step f specifically includes:
The decoder gradually reshapes the encoder outputs of different scales to the input size to obtain a segmentation result that matches the original image. The decoder takes the encoder output as five channel inputs. The encoder outputs of each layer are fused layer by layer through convolution and deconvolution operations, and the image is restored to the specified size, and the sigmoid function is applied to obtain the final segmentation result.
Specifically, the step g specifically includes:
Two enhancement methods are designed for a single unlabeled image. In each training step, a transformation is randomly selected for each sample in the batch from a predefined range: the first enhancement method is weak enhancement, which is the result of random flipping, moving and random scaling strategies with a probability of 50%; the other enhancement method is strong enhancement, which adds grayscale transformation on the basis of the weakly enhanced image.
Specifically, the step h specifically includes:
The unlabeled data loss is divided into two parts, including the output space consistency loss and the contrastive learning loss. The contrastive learning loss is calculated in that the encoder generates features based on the weakly enhanced image and the strongly enhanced image respectively. The features at the same position are regarded as positive examples, and the features at different positions are regarded as negative examples. The sampling method of negative examples adopts the Gumbel sampling strategy, and selects k pixels with the smallest cosine similarity to form negative examples, or selects pixels with a longer distance as negative examples based on anatomical prior knowledge. The InfoNCE loss is combined with the cosine similarity to obtain the pixel contrast loss.
Specifically, the step i specifically includes:
For the segmentation results obtained with labeled data, the dice loss is calculated with the label as the supervised learning loss; for unlabeled data, the consistency loss is calculated between the results of weakly enhanced images and strongly enhanced images.
Specifically, the step j specifically includes:
Stochastic gradient descent is used as the optimizer for training, with weight decay to prevent overfitting; after the model training is completed, the more accurate model under each proportion of supervised data is selected and saved.
This application not only takes into account the consistency of the output space, but also solves to a certain extent the problem of error accumulation caused by the inability to filter out erroneous features that is common in current methods. It also uses Transformer as the main feature extraction network and utilizes the attention mechanism and global receptive field advantages in Transformer to locate tumors faster and more accurately, which improves the accuracy compared to the convolutional neural network method with only a local receptive field.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow chart of a medical image segmentation method according to the present invention;
FIG. 2 is a schematic diagram of a Transformer neural network provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the Transformer neural network segmentation process provided by an embodiment of the present invention.
Detailed Description of the Embodiments
The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, there is shown a flowchart of a preferred embodiment of the medical image segmentation method of the present invention.
Please refer also to FIGS. 2-3. Step S1, collecting the MRI image data of tumor patients as a data set. Specifically:
In this embodiment, nuclear magnetic resonance image data of tumor patients are collected. The nuclear magnetic resonance image data of the patients are multimodal nuclear magnetic resonance images. The nuclear magnetic resonance image data of each patient includes four commonly used modalities; the four commonly used modalities are T1, T2, T1C, and Flair modalities.
The patient images obtained in this step come from the patient image datasets jointly collected by the hospital, TCIA (The Cancer Imaging Archive) and TCGA (The Cancer Genome Atlas).
This embodiment does not limit the size of the data set. The larger the data set, the stronger the generalization ability.
Step S2, performing data processing on the image data in the data set, the data processing comprising: performing format conversion, resampling, registration and standardization on the image data in the data set. Specifically:
Format conversion, resampling, registration and standardization are performed on the image data in the data set. DICOM (Digital Imaging and Communications in Medicine) refers to the medical digital image transmission protocol, which is a set of universal standard protocols for medical image processing, storage, printing and transmission.
The data obtained from the medical device is in DICOM format. First, the DICOM format is converted into NIFTI (Neuro Imaging Informatics Technology Initiative) format; then the image is resampled to improve the image resolution; then the image is registered, and the points corresponding to the same position in space at multiple time points are matched one by one. The rigid registration mode is used for registration, and mutual information is used as the image similarity metric. After registration and resampling, the spatial resolution of the image is 1 mm. Grayscale normalization, histogram equalization and other methods are used to standardize the image data in the data set.
Step S3, after the data is processed, the multimodal images that meet the requirements in the data set are used as the input of the model, and the data set is divided into a training set and a test set. The training set is divided into labeled data and unlabeled data as needed, and processed separately. Specifically:
The multimodal images that meet the requirements in the dataset are used as the input of the model, and the dataset is divided into a training set and a test set. First, the MRI data with missing modalities, failed registration, or without tumors are excluded to avoid affecting the generalization performance of the model. Then the dataset is divided into a training set and a test set at a ratio of 4:1. For the training set, the labeled data and unlabeled data are divided as needed and processed separately. In semi-supervised tasks, the proportion of labeled data strongly affects the segmentation results; therefore, the amount of labeled data in the training set is gradually reduced in 10% steps, and experiments are carried out separately.
Step S4, establish a multi-branch Transformer neural network as an encoder, and design a separate Transformer for each modality to extract features. Specifically:
A multi-branch Transformer neural network is established. The expected segmentation model is an encoder-decoder structure. The encoder extracts appropriate features and the decoder restores the image to the input size. A separate Transformer is designed for each modality to extract features. For input with four modalities, a multi-branch Transformer is proposed, whose number of branches is equal to the number of input modalities in order to extract independent features of multiple modalities at the same time. The three-dimensional whole brain image is divided into K fixed-size three-dimensional image blocks, mapped into a one-dimensional vector of fixed length D, and position encoding is added to retain position information before being input into the visual Transformer model.
Step S5, design a modality fusion Transformer to fuse data from multiple modalities. Specifically:
In order to fully integrate the features of each modality from multiple angles to produce stronger image features, this application separately designs a fusion Transformer based on a cross-attention mechanism. The fusion Transformer based on the cross-attention mechanism is divided into two parts, namely a partial fusion Transformer and an overall fusion Transformer. The partial fusion Transformer uses a single one-dimensional vector of each branch as a query to exchange information with other branches. The partial fusion result is input into the overall fusion Transformer, and the multimodal information is more thoroughly integrated through the self-attention mechanism, thereby utilizing the global context information at the overall semantic structure level of the data.
Step S6, establish a decoder to gradually reshape the encoder outputs of different scales into the input size to obtain a segmentation result that matches the original image. Specifically:
The decoder gradually reshapes the encoder outputs of different scales to the input size to obtain a segmentation result that matches the original image. The decoder takes the encoder output as five channel inputs. The encoder outputs of each layer are fused layer by layer through convolution and deconvolution operations, and the image is restored to the specified size, and the sigmoid function is applied to obtain the final segmentation result.
Step S7, construct weakly enhanced images and strongly enhanced images for unlabeled data. Specifically:
Two enhancement methods are designed for a single unlabeled image. In each training step, a transformation is randomly selected for each sample in the batch from a predefined range. The first enhancement method is weak enhancement, which is the result of a random flip, move, and random scaling strategy with a probability of 50%. The other enhancement method is strong enhancement, which adds grayscale transformation to the weakly enhanced image.
Step S8, select positive examples and negative examples according to the output of the encoder for different enhanced images, and calculate the contrastive loss. Specifically:
The loss of unlabeled data is divided into two parts, including output space consistency loss and contrastive learning loss. The calculation method of contrastive learning loss is that the encoder generates features based on weakly enhanced images and strongly enhanced images respectively. Features at the same position are regarded as positive examples, and features at different positions are regarded as negative examples. The sampling method of negative examples adopts the Gumbel sampling strategy, and selects k pixels with the smallest cosine similarity to form negative examples, or selects pixels with a longer distance as negative examples based on anatomical prior knowledge. The goal of the contrastive learning loss is to increase the similarity with positive pixels and reduce the similarity with the k negative pixels. To achieve this goal, the InfoNCE loss is combined with cosine similarity to obtain the pixel contrastive loss. Specifically, the cross-entropy loss is calculated for positive examples with all labels set to 1, and for negative examples with all labels set to 0; the sum of the two losses is the contrastive learning loss.
Step S9, calculate the dice loss for the label and the segmentation result, and calculate the consistency loss for the outputs of the two branches of the unlabeled data. The total loss is the sum of the supervised learning loss, the contrastive learning loss and the consistency loss. Specifically:
Calculate the total loss. For the segmentation results obtained with labeled data, calculate the dice loss with the label as the supervised learning loss. For unlabeled data, calculate the consistency loss between the results of the weakly enhanced image and the strongly enhanced image; the consistency loss is added to the contrastive learning loss as the semi-supervised loss. The total loss is the sum of the semi-supervised loss and the supervised loss.
Step S10, train the model, select the result with better effect as the final model and save it. Specifically:
During training, data enhancement is performed using methods including but not limited to rotation, translation, scaling, and cropping to improve the generalization ability of the model;
Stochastic gradient descent is used as the optimizer for training, and weight decay is used to prevent overfitting. For the input image data, the network output is a binary segmentation result;
The results output by the network correspond to the original image, assisting doctors in diagnosing patients.
After the model training is completed, the more accurate model under each proportion of supervised data is selected and saved.
This application uses the ability of contrastive learning to bring similar features closer and push heterogeneous features farther away to impose constraints on the feature space, further improving the effect of semi-supervised learning. The model is constructed using a visual Transformer instead of a convolutional neural network, and the global receptive field brought by the attention mechanism is used to fuse multimodal information to better locate the tumor position, thereby improving the segmentation effect.
The specific implementation methods of the invention are described in detail above, but they are only examples, and the invention is not limited to the specific implementation methods described above. For those skilled in the art, any equivalent modification or substitution of the invention is also within the scope of the invention; therefore, equivalent changes, modifications and improvements made without departing from the spirit and principle of the invention should all be included in the scope of the invention.

Claims (11)

  1. A medical image segmentation method, characterized in that the method comprises the following steps:
    a. Collect magnetic resonance imaging data of tumor patients as a data set;
    b. performing data processing on the image data in the data set, the data processing comprising: format conversion, resampling, registration and standardization on the image data in the data set;
    c. After data processing, the multimodal images in the data set that meet the requirements are used as the input of the model;
    d. Establish a multi-branch Transformer neural network as an encoder and design a separate Transformer for each modality to extract features;
    e. Design a modality fusion Transformer to fuse data from multiple modalities;
    f. Build a decoder to gradually reshape the encoder outputs of different scales to the input size to obtain a segmentation result that matches the original image;
    g. For the unlabeled data in the dataset, construct weakly enhanced images and strongly enhanced images;
    h. Select positive and negative examples based on the encoder's output of different enhanced images and calculate the contrast loss;
    i. Calculate the dice loss for the labels and segmentation results;
    j. Train the model, select the result with better effect as the final model and save it.
  2. The medical image segmentation method as described in claim 1, characterized in that the patient's nuclear magnetic resonance image data is a multimodal nuclear magnetic resonance image; the nuclear magnetic resonance image data of each patient includes four commonly used modalities; the four commonly used modalities are T1, T2, T1C, and Flair modalities.
  3. The medical image segmentation method according to claim 2, wherein the step b specifically comprises:
    First, the DICOM format is converted into the NIFTI format; then the image is resampled; then the image is registered, and the points corresponding to the same spatial position at multiple time points are matched one by one. The rigid registration mode is used for registration, and the mutual information is used as the image similarity measure; the image data in the data set is standardized using grayscale normalization and histogram equalization methods.
  4. The medical image segmentation method according to claim 3, characterized in that said step c specifically comprises:
    The multimodal images that meet the requirements in the dataset are used as the input of the model, and the dataset is divided into a training set and a test set. First, the magnetic resonance image data with missing modalities, failed registration, or without tumors are excluded to avoid affecting the generalization performance of the model. Then, the dataset is divided into a training set and a test set in a ratio of 4:1. For the training set, labeled data and unlabeled data are divided as needed and processed separately.
  5. The medical image segmentation method according to claim 4, characterized in that the step d specifically comprises:
    A separate Transformer is designed for each modality to extract features. For input with four modalities, a multi-branch Transformer is proposed with the same number of branches as the input modalities in order to simultaneously extract independent features of multiple modalities. The three-dimensional whole brain image is divided into K three-dimensional image blocks of fixed size, mapped into a one-dimensional vector of fixed length D, and position encoding is added to retain position information before being input into the visual Transformer model.
  6. The medical image segmentation method according to claim 5, characterized in that the step e specifically comprises:
    A fusion Transformer based on the cross-attention mechanism is designed separately: the fusion Transformer based on the cross-attention mechanism is divided into two parts, namely, a partial fusion Transformer and a global fusion Transformer; the partial fusion Transformer uses a single one-dimensional vector of each branch as a query to exchange information with other branches, and inputs the partial fusion result into the global fusion Transformer, and the multimodal information is more thoroughly fused together through the self-attention mechanism therein, thereby utilizing the global context information at the overall semantic structure level of the data.
  7. The medical image segmentation method according to claim 6, wherein the step f specifically comprises:
    The decoder gradually reshapes the encoder outputs of different scales to the input size to obtain a segmentation result that matches the original image. The decoder takes the encoder output as five channel inputs. The encoder outputs of each layer are fused layer by layer through convolution and deconvolution operations, and the image is restored to the specified size, and the sigmoid function is applied to obtain the final segmentation result.
  8. The medical image segmentation method according to claim 7, characterized in that the step g specifically comprises:
    Two enhancement methods are designed for a single unlabeled image. In each training step, a transformation is randomly selected for each sample in the batch from a predefined range: the first enhancement method is weak enhancement, which is the result of random flipping, moving and random scaling strategies with a probability of 50%; the other enhancement method is strong enhancement, which adds grayscale transformation on the basis of the weakly enhanced image.
  9. The medical image segmentation method according to claim 8, wherein the step h specifically comprises:
    The unlabeled data loss is divided into two parts, including output space consistency loss and contrastive learning loss. The calculation method of contrastive learning loss is that the encoder generates features based on weakly enhanced images and strongly enhanced images respectively. Features at the same position are regarded as positive examples, and features at different positions are regarded as negative examples. The sampling method of negative examples adopts the Gumbel sampling strategy, and selects k pixels with the smallest cosine similarity to form negative examples, or selects pixels with a longer distance as negative examples based on anatomical prior knowledge. The InfoNCE loss is combined with the cosine similarity to obtain the pixel contrast loss.
  10. The medical image segmentation method according to claim 9, wherein the step i specifically comprises:
    For the segmentation results obtained with labeled data, the dice loss is calculated with the label as the supervised learning loss; for unlabeled data, the consistency loss is calculated between the results of weakly enhanced images and strongly enhanced images.
  11. The medical image segmentation method according to claim 10, characterized in that the step j specifically comprises:
    Use stochastic gradient descent as the optimizer for training, and use weight decay to prevent overfitting; after the model training is completed, select the more accurate model under supervised data of various proportions to save.
PCT/CN2022/131075 2022-11-10 2022-11-10 Medical image segmentation method WO2024098318A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/131075 WO2024098318A1 (en) 2022-11-10 2022-11-10 Medical image segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/131075 WO2024098318A1 (en) 2022-11-10 2022-11-10 Medical image segmentation method

Publications (1)

Publication Number Publication Date
WO2024098318A1 true WO2024098318A1 (en) 2024-05-16

Family

ID=91031586

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131075 WO2024098318A1 (en) 2022-11-10 2022-11-10 Medical image segmentation method

Country Status (1)

Country Link
WO (1) WO2024098318A1 (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113180633A (en) * 2021-04-28 2021-07-30 济南大学 MR image liver cancer postoperative recurrence risk prediction method and system based on deep learning
WO2022228958A1 (en) * 2021-04-28 2022-11-03 Bayer Aktiengesellschaft Method and apparatus for processing of multi-modal data
CN115272386A (en) * 2021-04-30 2022-11-01 中国医学科学院基础医学研究所 Multi-branch segmentation system for cerebral hemorrhage and peripheral edema based on automatic generation label
CN113674253A (en) * 2021-08-25 2021-11-19 浙江财经大学 Rectal cancer CT image automatic segmentation method based on U-transducer
CN114494296A (en) * 2022-01-27 2022-05-13 复旦大学 Brain glioma segmentation method and system based on fusion of Unet and Transformer
CN114972756A (en) * 2022-05-30 2022-08-30 湖南大学 Semantic segmentation method and device for medical image
CN115147600A (en) * 2022-06-17 2022-10-04 浙江中医药大学 GBM multi-mode MR image segmentation method based on classifier weight converter

Similar Documents

Publication Publication Date Title
Liu et al. Auto-encoding knowledge graph for unsupervised medical report generation
Zhuang et al. An Effective WSSENet-Based Similarity Retrieval Method of Large Lung CT Image Databases.
Mahapatra et al. Training data independent image registration using generative adversarial networks and domain adaptation
CN115908800A (en) Medical image segmentation method
CN111260705B (en) Prostate MR image multi-task registration method based on deep convolutional neural network
CN114596318A (en) Breast cancer magnetic resonance imaging focus segmentation method based on Transformer
Wang et al. Multiscale transunet++: dense hybrid u-net with transformer for medical image segmentation
Han et al. Multi-scale 3D convolution feature-based broad learning system for Alzheimer’s disease diagnosis via MRI images
CN115512110A (en) Medical image tumor segmentation method related to cross-modal attention mechanism
Albishri et al. AM-UNet: automated mini 3D end-to-end U-net based network for brain claustrum segmentation
Fonov et al. DARQ: Deep learning of quality control for stereotaxic registration of human brain MRI to the T1w MNI-ICBM 152 template
Zhang et al. TW-Net: Transformer weighted network for neonatal brain MRI segmentation
Men et al. Continual improvement of nasopharyngeal carcinoma segmentation with less labeling effort
WO2024098318A1 (en) Medical image segmentation method
Verma et al. Role of deep learning in classification of brain MRI images for prediction of disorders: a survey of emerging trends
Cheng et al. Feature-enhanced adversarial semi-supervised semantic segmentation network for pulmonary embolism annotation
Li et al. IAS‐NET: Joint intraclassly adaptive GAN and segmentation network for unsupervised cross‐domain in neonatal brain MRI segmentation
CN116258685A (en) Multi-organ segmentation method and device for simultaneous extraction and fusion of global and local features
Guan et al. A mutual promotion encoder-decoder method for ultrasonic hydronephrosis diagnosis
CN114359194A (en) Multi-mode stroke infarct area image processing method based on improved U-Net network
Chen et al. WS-MTST: Weakly Supervised Multi-Label Brain Tumor Segmentation With Transformers
Dharwadkar et al. Right ventricle segmentation of magnetic resonance image using the modified convolutional neural network
Liu et al. Joint cranial bone labeling and landmark detection in pediatric CT images using context encoding
Tan et al. SwinUNeLCsT: Global–local spatial representation learning with hybrid CNN–transformer for efficient tuberculosis lung cavity weakly supervised semantic segmentation
Wu et al. RClaNet: An explainable Alzheimer's disease diagnosis framework by joint registration and classification