CN117611605B - Method, system and electronic equipment for segmenting heart medical image - Google Patents


Info

Publication number
CN117611605B
CN117611605B (application CN202311577329.XA)
Authority
CN
China
Prior art keywords
training
encoder
segmentation
image
heart
Prior art date
Legal status
Active
Application number
CN202311577329.XA
Other languages
Chinese (zh)
Other versions
CN117611605A (en)
Inventor
马晓轩
廉焜程
隋栋
化凤芳
Current Assignee
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture
Priority to CN202311577329.XA
Publication of CN117611605A
Application granted
Publication of CN117611605B
Legal status: Active (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30048 Heart; Cardiac


Abstract

The invention discloses a method, a system and an electronic device for segmenting cardiac medical images, relating to the technical field of image segmentation. The method comprises the following steps: acquiring a target image, the target image being the cardiac medical image to be segmented; and inputting the target image into an image segmentation model to obtain a segmented target image. The segmented target image is a cardiac medical image with predicted segmentation regions and corresponding predicted labels; each label is a segmented cardiac structure, the structures comprising the right ventricle, the left ventricle and the myocardium. The image segmentation model is obtained by training a segmentation network with a plurality of scribble-annotated cardiac medical images of the heart at different phases, and the segmentation network comprises a pre-trained encoder and 2 decoders. The pre-trained encoder is trained with unlabeled cardiac medical images of the heart at different phases. The invention reduces labor cost and improves the accuracy of cardiac medical image segmentation.

Description

Method, system and electronic equipment for segmenting heart medical image
Technical Field
The present invention relates to the field of image segmentation technologies, and in particular, to a method, a system, and an electronic device for segmenting a cardiac medical image.
Background
Image segmentation is an important task in computer vision. In clinical practice with medical images in particular, accurate segmentation is essential for diagnosis and treatment planning, making image segmentation one of the core challenges of medical image analysis. Medical images come in different data types depending on the imaging mode, such as X-ray, computed tomography (CT) and magnetic resonance imaging (MRI). Most methods rely on supervised learning to achieve state-of-the-art performance. For medical images, however, accurate labeling must be performed by clinicians with a high level of experience, so medical image annotation tends to consume far more manpower and resources than natural image annotation.
In recent years, contrastive learning has achieved great success in learning image-level features. Its main idea is to select positive and negative sample pairs, and then use a contrastive loss function to pull positive pairs close to each other and push negative pairs apart. Through this contrastive loss, the encoder parameters can be trained as a pre-training step, and the pre-trained encoder parameters are then transferred to a supervised downstream task. Several works indicate that a contrastively pre-trained encoder often performs better than an encoder trained with supervision alone. The key to contrastive learning is to select a correct and reasonable contrastive strategy according to the form of the data. For the downstream task, a weakly supervised method is chosen, such as image-level annotation, scribble annotation or point annotation. These methods are expected to improve the performance of medical image segmentation models while reducing the need for expensive, time-consuming expert segmentation annotations and mitigating insufficient image annotation. Segmenting objects with scribble labels has been studied for many years, and there have been many scribble-based labeling efforts. However, supervising training with scribble labels is problematic: although training with scribble annotations reduces the need for expert segmentation annotations, the imprecise nature of these labels can limit the accuracy of the resulting segmentation model. The limited supervisory signal provided by scribble annotations may hinder the model's ability to learn the visual features required for accurate medical image segmentation. Furthermore, compared with fully supervised methods, medical image labels often suffer from various quality defects, which may adversely affect performance.
Disclosure of Invention
The invention aims to provide a method, a system and an electronic device for segmenting cardiac medical images, which reduce labor cost and improve the accuracy of cardiac medical image segmentation.
In order to achieve the above object, the present invention provides the following solutions:
a method of segmenting a cardiac medical image, comprising:
Acquiring a target image; the target image is the cardiac medical image to be segmented; the cardiac medical image is a 2D slice image of the heart;
inputting the target image into an image segmentation model to obtain a segmented target image; the segmented target image is a cardiac medical image with predicted segmentation regions and corresponding predicted labels, each label being a segmented cardiac structure, the structures comprising the right ventricle, the left ventricle and the myocardium; the image segmentation model is obtained by training a segmentation network with a plurality of scribble-annotated cardiac medical images of the heart at different phases, and the segmentation network comprises a pre-trained encoder and 2 decoders; the pre-trained encoder is obtained by training an encoder with a plurality of unlabeled cardiac medical images of the heart at different phases.
Optionally, the training process of the pre-trained encoder includes:
initializing an encoder to obtain an initial encoder;
acquiring a plurality of unlabeled cardiac medical images of the heart at different phases;
determining any two unlabeled cardiac medical images belonging to the same heart at two phases as a first training sample group;
training the initial encoder with each first training sample group to obtain the pre-trained encoder; wherein the training process for the encoder at any current pre-training iteration includes:
inputting each unlabeled cardiac medical image into the encoder at the current pre-training iteration to obtain the features of the corresponding image at the current iteration;
calculating the cosine similarity of each first training sample group at the current iteration based on the features of the two unlabeled cardiac medical images in the group;
calculating the encoder loss at the current iteration based on the cosine similarities of all first training sample groups;
judging whether a pre-training stop condition is met; the pre-training stop condition is that the preset number of pre-training iterations is reached or the encoder loss at the current iteration is smaller than a first preset loss threshold;
if yes, determining the encoder at the current iteration as the pre-trained encoder;
if not, updating the encoder at the current iteration to the encoder at the next iteration, and returning to the step of inputting each unlabeled cardiac medical image into the encoder at the current iteration to obtain its features.
Optionally, acquiring the unlabeled cardiac medical images of any current heart at any current phase includes:
acquiring an unlabeled volumetric medical image of the current heart at the current phase;
and slicing the volumetric medical image to obtain a corresponding plurality of unlabeled cardiac medical images.
Optionally, the training process of the image segmentation model includes:
initializing 2 decoders to obtain 2 initial decoders;
constructing the segmentation network based on the pre-trained encoder and the 2 initial decoders;
acquiring a plurality of scribble-annotated cardiac medical images of the heart at different phases;
determining any two scribble-annotated cardiac medical images belonging to the same heart at two phases as a second training sample group;
training the segmentation network with each second training sample group to obtain the image segmentation model; wherein the training process for the segmentation network at any current training iteration includes:
inputting each scribble-annotated cardiac medical image into the segmentation network at the current iteration to obtain the 2 predicted labels of that image at the current iteration; the 2 predicted labels are the labels output by the 2 decoders;
calculating the partial loss at the current iteration based on each predicted label and the corresponding scribble annotation;
determining the comprehensive prediction result of each scribble-annotated cardiac medical image at the current iteration based on its 2 predicted labels;
determining the mixed loss at the current iteration based on the 2 predicted labels and the comprehensive prediction result of each scribble-annotated cardiac medical image;
calculating the total loss at the current iteration based on the partial loss and the mixed loss;
judging whether a training stop condition is met; the training stop condition is that the preset number of training iterations is reached or the total loss at the current iteration is smaller than a second preset loss threshold;
if yes, determining the segmentation network at the current iteration as the image segmentation model;
if not, updating the segmentation network at the current iteration to the segmentation network at the next iteration, and returning to the step of inputting each scribble-annotated cardiac medical image into the segmentation network at the current iteration to obtain its 2 predicted labels.
Optionally, the inputs of the 2 decoders in the segmentation network are each connected to the output of the pre-trained encoder.
A segmentation system for cardiac medical images, comprising:
The image acquisition module is used for acquiring a target image; the target image is the cardiac medical image to be segmented; the cardiac medical image is a 2D slice image of the heart;
the segmentation module is used for inputting the target image into an image segmentation model to obtain a segmented target image; the segmented target image is a cardiac medical image with predicted segmentation regions and corresponding predicted labels, each label being a segmented cardiac structure, the structures comprising the right ventricle, the left ventricle and the myocardium; the image segmentation model is obtained by training a segmentation network with a plurality of scribble-annotated cardiac medical images of the heart at different phases, and the segmentation network comprises a pre-trained encoder and 2 decoders; the pre-trained encoder is obtained by training an encoder with a plurality of unlabeled cardiac medical images of the heart at different phases.
An electronic device comprising a memory for storing a computer program and a processor for running the computer program to cause the electronic device to perform the method of segmentation of a cardiac medical image as described above.
Optionally, the memory is a readable storage medium.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention discloses a method, a system and an electronic device for segmenting cardiac medical images. The method first acquires the cardiac medical image to be segmented as a target image, and then inputs the target image into an image segmentation model to obtain a segmented target image. The segmented target image is a cardiac medical image with predicted segmentation regions and corresponding predicted labels; each label is a segmented cardiac structure, the structures comprising the right ventricle, the left ventricle and the myocardium. The image segmentation model is obtained by training a segmentation network with a plurality of scribble-annotated cardiac medical images of the heart at different phases, and the segmentation network comprises a pre-trained encoder and 2 decoders. The pre-trained encoder is obtained by training an encoder with a plurality of unlabeled cardiac medical images of the heart at different phases, which reduces labor cost and improves the accuracy of cardiac medical image segmentation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for segmenting a cardiac medical image according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the SPCL algorithm;
FIG. 3 is a schematic diagram of the encoder and decoder in the image segmentation stage;
FIG. 4 is a qualitative comparison between SPCL and other state-of-the-art methods on the ACDC dataset;
FIG. 5 is a training process and accuracy diagram for different models;
FIG. 6 is a graph of representative ablation experiments demonstrating the improvement in model performance of different modules;
FIG. 7 is a graph of model training process and accuracy with the addition of different modules.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a method, a system and electronic equipment for segmenting a heart medical image, which aim to reduce labor cost and improve accuracy of heart medical image segmentation.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Before proceeding with the description of the embodiments, the following description of the specialized vocabulary is presented.
(1) Contrastive learning: In recent years, several powerful self-supervised learning approaches have emerged and been applied to a variety of computer vision tasks. Their main learning strategy is to apply a contrastive loss in the encoded latent space. A popular loss function is InfoNCE, which pulls positive sample pairs close to each other and pushes negative sample pairs apart. In the field of image segmentation, contrastive learning is mostly used as a self-supervised method to design a powerful feature extractor, which is then transferred to downstream tasks. MoCo combines a queue-based storage method with a momentum encoder; this allows training with a small batch size while keeping the two encoders as similar as possible. SimCLR uses neither a queue nor a momentum encoder, but instead obtains good results with training techniques such as a larger batch size and effective image augmentation. Although these methods achieve good training results, they clearly require substantial hardware resources. Some work also focuses on how to select positive and negative sample pairs for contrastive learning, skillfully distinguishing them by exploiting characteristics of the dataset, or achieving better results through better loss functions. Hritam Basak et al. propose an SSL strategy that uses dense projection-head representations for contrastive learning to learn robust local features, redefines "positive" and "negative" samples during contrastive learning, and extends the InfoNCE loss during pre-training to accommodate dense representations. Zifeng Wang et al. point out the problem of false negative pairs: although a report may not belong to the target patient, it can still describe the same symptoms and findings, so simply treating reports of different patients as negative samples makes the supervision noisy and confuses the model, thereby degrading its performance. Xiangyu Zhao et al. propose a bi-directional voxel contrastive loss and a new confidence-based negative sampling strategy: the two augmented views are encouraged to be similar to each other, and both views should be far from the negative samples in the feature space.
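For reference, the InfoNCE loss mentioned above can be written in its standard SimCLR-style form for a positive pair $(z_i, z_j)$ among $2N$ augmented samples, with cosine similarity $\operatorname{sim}(\cdot,\cdot)$ and temperature $\tau$:

```latex
\mathcal{L}_{i,j} = -\log \frac{\exp\bigl(\operatorname{sim}(z_i, z_j)/\tau\bigr)}
{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\bigl(\operatorname{sim}(z_i, z_k)/\tau\bigr)}
```

Minimizing this loss pulls the positive pair together in the latent space while pushing $z_i$ away from all other samples in the batch.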
(2) Weakly supervised medical image segmentation (WSSS): The traditional difficulty of medical image segmentation is the difficulty of labeling; only doctors with rich clinical experience can annotate accurately. It is therefore particularly important that relatively accurate results can be obtained from simple labeling information using weakly supervised methods. In recent years, most weakly supervised work has relied on class activation maps (CAM), whose approach is to perform global average pooling (GAP) at the last layer of the network to generate a spatial class activation map representing the object region responsible for predicting the object class. However, the training process using CAM is generally divided into three parts: initial CAM generation, pseudo-mask generation, and training of the segmentation model. Such a process is cumbersome and strongly fragmented. For medical images, another common weakly supervised method uses scribble labeling, in which a user manually draws simple lines or scribbles to mark the position of an object in the image. In the field of medical image segmentation, manual annotation data is typically provided in the form of points, lines or regions. The most common approach is to combine existing scribble labels with classical machine learning algorithms such as GraphCut and Random Walker. Zihan Li et al. propose a visual-class-embedded scribble supervision model with a multimodal information enhancement mechanism that introduces class feature information into visual features, and use CNN and Transformer features simultaneously to achieve better visual feature extraction. Ke Zhang et al. integrate the mix augmentation of supervision with consistency regularization and propose CycleMix, a segmentation framework for scribble supervision that can increment and decrement scribbles in mixed training images; they also propose a consistency loss that regularizes the limited scribble supervision by penalizing inconsistent segmentation results at global and local levels. Andrea Leo et al. combine scribble labeling with adversarial attention gates (AAGs), forcing the segmenter to locate objects in the image; driven by the adversarial gradient, AAGs also encourage better training of the deeper layers of the segmenter.
Example 1
Fig. 1 is a flowchart of a method for segmenting a cardiac medical image according to embodiment 1 of the present invention. As shown in fig. 1, the method for segmenting a cardiac medical image in the present embodiment includes:
Step 1: acquiring a target image; the target image is the cardiac medical image to be segmented; the cardiac medical image is a 2D slice image of the heart.
Step 2: inputting the target image into an image segmentation model to obtain a segmented target image.
The segmented target image is a cardiac medical image with predicted segmentation regions and corresponding predicted labels; each label is a segmented cardiac structure, the structures comprising the right ventricle, the left ventricle and the myocardium. The image segmentation model is obtained by training a segmentation network with a plurality of scribble-annotated cardiac medical images of the heart at different phases, and the segmentation network comprises a pre-trained encoder and 2 decoders. The pre-trained encoder is trained with unlabeled cardiac medical images of the heart at different phases.
As an alternative embodiment, the training process of the pre-trained encoder comprises:
initializing the encoder to obtain an initial encoder;
acquiring a plurality of unlabeled cardiac medical images of the heart at different phases;
determining any two unlabeled cardiac medical images belonging to the same heart at two phases as a first training sample group;
training the initial encoder with each first training sample group to obtain the pre-trained encoder. The training process for the encoder at any current pre-training iteration includes:
inputting each unlabeled cardiac medical image into the encoder at the current pre-training iteration to obtain its features at the current iteration;
calculating the cosine similarity of each first training sample group at the current iteration based on the features of the two unlabeled cardiac medical images in the group;
calculating the encoder loss at the current iteration based on the cosine similarities of all first training sample groups;
judging whether the pre-training stop condition is met; the pre-training stop condition is that the preset number of pre-training iterations is reached or the encoder loss at the current iteration is smaller than a first preset loss threshold;
if yes, determining the encoder at the current iteration as the pre-trained encoder;
if not, updating the encoder at the current iteration to the encoder at the next iteration, and returning to the step of inputting each unlabeled cardiac medical image into the encoder to obtain its features.
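The pre-training step above can be sketched as follows. This is a minimal illustration assuming an InfoNCE-style formulation: an encoder plus projection head maps each slice to a feature vector, cosine similarities between paired slices are computed, and a cross-entropy over the similarity matrix rewards matched pairs. The function names, the temperature value, and the exact loss form are assumptions; the patent only specifies that an encoder loss is computed from the cosine similarities of the sample groups.

```python
import torch
import torch.nn.functional as F

def pretraining_step(encoder, projector, slices_a, slices_b, temperature=0.1):
    """One contrastive pre-training step on paired unlabeled slices.

    slices_a / slices_b: batches of 2D cardiac slices from the same hearts
    at two phases (e.g. ED and ES); row i of each batch forms a positive
    pair. Hypothetical sketch: the InfoNCE-style cross-entropy over the
    cosine-similarity matrix is an assumption, not the patent's exact loss.
    """
    za = F.normalize(projector(encoder(slices_a)), dim=1)  # (B, D)
    zb = F.normalize(projector(encoder(slices_b)), dim=1)  # (B, D)
    sim = za @ zb.t() / temperature        # (B, B) scaled cosine similarities
    targets = torch.arange(za.size(0))     # diagonal entries are positives
    return F.cross_entropy(sim, targets)
```

At each iteration this loss would be backpropagated through both the encoder and the projection head; after pre-training, the projection head is discarded and only the encoder parameters are kept.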
As an alternative embodiment, acquiring the unlabeled cardiac medical images of any current heart at any current phase comprises:
acquiring an unlabeled volumetric medical image of the current heart at the current phase;
and slicing the volumetric medical image to obtain a corresponding plurality of unlabeled cardiac medical images.
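The slicing step can be sketched as follows; each 2D slice is paired with its relative z-position p = m/N, which is used later for selecting contrastive sample pairs. The helper name and the (z, y, x) array layout are assumptions for illustration.

```python
import numpy as np

def slice_volume(volume):
    """Split an unlabeled 3D cardiac volume, shaped (z, y, x), into 2D
    xy slices, pairing each slice with its relative z-position p = m / N."""
    n = volume.shape[0]
    return [(volume[m], m / n) for m in range(n)]
```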
As an alternative embodiment, the training process of the image segmentation model comprises:
initializing 2 decoders to obtain 2 initial decoders;
constructing the segmentation network based on the pre-trained encoder and the 2 initial decoders;
acquiring a plurality of scribble-annotated cardiac medical images of the heart at different phases;
determining any two scribble-annotated cardiac medical images belonging to the same heart at two phases as a second training sample group;
training the segmentation network with each second training sample group to obtain the image segmentation model. The training process for the segmentation network at any current training iteration includes:
inputting each scribble-annotated cardiac medical image into the segmentation network at the current iteration to obtain its 2 predicted labels; the 2 predicted labels are the labels output by the 2 decoders;
calculating the partial loss at the current iteration based on each predicted label and the corresponding scribble annotation;
determining the comprehensive prediction result of each scribble-annotated cardiac medical image at the current iteration based on its 2 predicted labels;
determining the mixed loss at the current iteration based on the 2 predicted labels and the comprehensive prediction result of each image;
calculating the total loss at the current iteration based on the partial loss and the mixed loss;
judging whether the training stop condition is met; the training stop condition is that the preset number of training iterations is reached or the total loss at the current iteration is smaller than a second preset loss threshold;
if yes, determining the segmentation network at the current iteration as the image segmentation model;
if not, updating the segmentation network at the current iteration to the segmentation network at the next iteration, and returning to the step of inputting each scribble-annotated cardiac medical image into the segmentation network to obtain its 2 predicted labels.
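The loss computation in this training process can be sketched as below: a partial cross-entropy supervised only on scribbled pixels, plus a mixed loss in which both decoders are supervised by a pseudo-label fused from their two predictions. The equal-weight fusion, the ignore_index convention for unscribbled pixels, and the plain sum for the total loss are assumptions; the patent does not specify the exact formulas.

```python
import torch
import torch.nn.functional as F

def total_loss(logits1, logits2, scribble, ignore_index=4, alpha=0.5):
    """logits1 / logits2: outputs of the two decoders, shaped (B, C, H, W).
    scribble: (B, H, W) sparse annotation; unscribbled pixels carry
    ignore_index. Returns partial loss + mixed pseudo-label loss (sketch)."""
    # partial loss: cross-entropy supervised only on scribbled pixels
    partial = (F.cross_entropy(logits1, scribble, ignore_index=ignore_index)
               + F.cross_entropy(logits2, scribble, ignore_index=ignore_index))
    # fuse the two predictions into a comprehensive result (pseudo-label)
    fused = alpha * logits1.softmax(dim=1) + (1 - alpha) * logits2.softmax(dim=1)
    pseudo = fused.argmax(dim=1).detach()
    # mixed loss: both decoders are supervised by the fused pseudo-label
    mixed = F.cross_entropy(logits1, pseudo) + F.cross_entropy(logits2, pseudo)
    return partial + mixed
```

With 4 classes (background, right ventricle, left ventricle, myocardium), label indices 0 to 3 would be valid and index 4 would mark unscribbled pixels.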
As an alternative embodiment, the inputs of the 2 decoders in the segmentation network are each connected to the output of the pre-trained encoder.
In order to implement the method of embodiment 1, a scribble-supervised contrastive learning segmentation algorithm (SPCL) is also provided. The implementation and principle of the proposed SPCL algorithm are described below in terms of three aspects: the framework overview, the contrastive learning method, and the design of the dual-branch network.
1. Framework overview.
As shown in fig. 2, the training process of the scribble-supervised segmentation framework with contrastive learning is multi-stage, divided into a pre-training stage (Pre-training) and an image segmentation stage (Segmentation). In the pre-training stage, unlabeled volumetric medical images of the end diastole (ED) and end systole (ES) of the same heart are input. 2D slices in the xy plane are sampled from the two volumetric medical images, and these slices are fed into an encoder (Encoder) for feature extraction and scale compression. A projection head (Projection head) composed of a multi-layer perceptron (MLP) then maps the feature maps into a vector space where the contrastive loss is computed, thereby supervising the training of the encoder and training its parameters. After the pre-training stage, the projection head is discarded and the trained encoder parameters are frozen. These parameters serve as the initialization of the segmentation network's encoder, yielding the pre-trained encoder. In the image segmentation stage, one 2D slice of a volumetric medical image is randomly selected and fed into the pre-trained encoder, after which the extracted features are passed to two different decoders. The predictions of the two decoders are dynamically fused into a new pseudo-label, which in turn assists in supervising the training of the network formed by the pre-trained encoder and the two decoders.
2. And performing contrast learning by using the structural information of the medical image.
Medical images have characteristics that natural images lack, and volumetric medical images are observed to have several features: 1) the same organ site of different patients has a similar structure; 2) volumetric medical images have very high resolution along the z-axis, so the 2D slices divided along the z-axis of a volume carry similar semantic information; 3) different phases of the same organ carry similar semantic information, such as the end diastole (ED) and end systole (ES) of the heart. The present invention exploits these three features of volumetric medical images to generate sample pairs for contrast learning.
Specifically, two volumetric medical images are processed at a time, each being a cardiac image of the same patient at a different period. The two volumetric medical images are then separated into sequences of 2D slices. The z-axis of each volumetric medical image is defined as the interval 0 to 1, and p (position) is defined as the relative position of each slice in the volumetric medical image: assuming the volumetric medical image can be cut into N slices, the m-th slice has p = m/N. With position information defined for each slice, positive and negative sample pairs can be selected for contrast learning by fusing the position information with the organ information of different periods. A threshold a is defined: if the absolute difference in p between two slices is less than this threshold, the two slices are regarded as a positive sample pair; if the difference exceeds the threshold, they are regarded as a negative sample pair. This threshold is not constant and is dynamically adjusted according to the number of training rounds; it is believed that this lets the model learn more comprehensively in the latent space, improving the performance of contrast learning.
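The position-based pair selection above can be sketched as follows. The function names are illustrative assumptions; the threshold schedule values follow the experimental section (initial 0.1, +0.01 every 25 epochs).

```python
def slice_position(m, n_slices):
    """Relative position p of the m-th slice in a volume of N slices: p = m/N."""
    return m / n_slices

def is_positive_pair(p_i, p_j, threshold):
    """Two slices form a positive pair when |p_i - p_j| is below the threshold."""
    return abs(p_i - p_j) < threshold

def dynamic_threshold(epoch, base=0.1, step=0.01, every=25):
    """Dynamic threshold: starts at 0.1 and grows by 0.01 every 25 epochs."""
    return base + step * (epoch // every)
```

For example, the 5th of 10 slices sits at p = 0.5, and two slices at p = 0.30 and p = 0.35 form a positive pair under a threshold of 0.1.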
The contrast learning loss function follows SupCon, in order to pull positive samples closer in the feature space and push negative samples apart. First, two similar but not identical image enhancements are applied to the N slices {x_i}, i = 1, ..., N, sampled from different time phases of the same patient, yielding 2N samples (i.e., slices) {x_i}, i = 1, ..., 2N. The contrast-learning loss function (i.e., the encoder loss) is then defined as shown in equation (1) and equation (2).
L_enc = (1/2N) · Σ_{i=1}^{2N} l_i (1)

l_i = −(1 / Σ_{j≠i} M_ij) · Σ_{j≠i} M_ij · log[ exp(Z_ij / T) / Σ_{k≠i} exp(Z_ik / T) ] (2)

Wherein L_enc is the encoder loss; l_i is the loss of sample i; M_ij is a mask determining whether sample i and sample j are considered a "positive pair": if the difference between the position labels of sample i and sample j is less than a certain threshold, then M_ij = 1, otherwise M_ij = 0; Z_ij is the cosine similarity between the features of sample i and the features of sample j, used to measure the similarity between sample i and sample j; T is a temperature parameter that controls the sensitivity of the loss function, with higher temperatures making the similarities smoother; Z_ik is the cosine similarity between the features of sample i and the features of sample k, used to measure the similarity between sample i and sample k.
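The masked contrast loss — with cosine similarities Z_ij, position mask M_ij, and temperature T — can be sketched in PyTorch as follows. This is a minimal SupCon-style sketch under the assumption that each sample's loss averages the log-softmax over its positives; it is not the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(features, positions, threshold=0.1, temperature=0.1):
    """SupCon-style loss: Z_ij is cosine similarity between projected features,
    M_ij marks pairs whose slice positions differ by less than `threshold`."""
    z = F.normalize(features, dim=1)          # unit vectors -> z @ z.T is cosine similarity
    logits = (z @ z.T) / temperature
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    # position mask M_ij = 1 iff |p_i - p_j| < threshold (excluding i == j)
    M = (torch.cdist(positions[:, None], positions[:, None]) < threshold) & ~eye
    logits = logits.masked_fill(eye, float('-inf'))   # exclude k == i from the softmax
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)         # avoid -inf * 0 on the diagonal
    pos_count = M.sum(1).clamp(min=1)
    loss_i = -(log_prob * M.float()).sum(1) / pos_count
    return loss_i.mean()
```

Because log-softmax values are strictly negative, the loss is positive whenever at least one positive pair exists.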
3. A dual-branch network.
The image segmentation stage consists of an encoder E1 initialized by contrast learning and two independent, different decoders D1 and D2. An attention module (Convolutional Block Attention Module, CBAM) and a perturbation are added to the UNet decoder: the CBAM module is inserted in the up-sampling process, and the perturbation is introduced at the feature level using dropout. The layer-jump (skip) connection of one of the decoders is cancelled, as shown in fig. 3. Such a design has several advantages: 1) adding dropout makes the outputs of the two decoders differ through feature perturbation while preventing overfitting; 2) the CBAM module helps the model better capture the characteristics of the input data through the combination of channel and spatial attention, and by strengthening the model's attention to important information it helps reduce the risk of overfitting, making the model more stable and robust; 3) cancelling one branch's skip connections in the dual-branch decoder reduces the model parameters and the amount of computation, and shrinking the model while removing skip connections lets each decoder branch concentrate on a different type of feature learning, which improves the diversity of the model. With the dual-branch structure, two prediction results can be obtained by training only one network. These predictions are obtained by training the decoders directly with the graffiti annotations, by minimizing the partial cross-entropy loss shown in equation (3).
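The dual-branch idea above can be sketched as follows. `TinyDecoder` is a hypothetical stand-in: the actual decoders are UNet decoders with CBAM attention (omitted here), and only the auxiliary branch carries the dropout perturbation and drops skip connections.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy stand-in for one UNet decoder branch (CBAM attention omitted for brevity)."""
    def __init__(self, in_ch, num_classes, use_dropout=False):
        super().__init__()
        layers = [nn.Dropout2d(0.5)] if use_dropout else []  # feature-level perturbation
        layers += [nn.Conv2d(in_ch, num_classes, 1),
                   nn.Upsample(scale_factor=4)]               # crude up-sampling to image size
        self.net = nn.Sequential(*layers)
    def forward(self, f):
        return self.net(f)

# One shared (pre-trained) encoder feature map feeds two different decoders.
feat = torch.randn(2, 32, 16, 16)
d1 = TinyDecoder(32, 4)                    # main branch (would keep skip connections)
d2 = TinyDecoder(32, 4, use_dropout=True)  # auxiliary branch: dropout, no skip connections
p1, p2 = d1(feat), d2(feat)                # two prediction maps from one forward pass
```

Training a single network with this structure yields two predictions p1, p2 per input, which is what enables the dynamic fusion described below equation (3).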
L_pCE = −Σ_c Σ_{q∈ω_s} s_q^c · log p_q^c (3)

Wherein L_pCE is the partial cross-entropy loss; p is the prediction probability; s is the one-hot graffiti annotation; c is the index of the category; ω_s is the set of labeled pixels in s; s_q^c is the one-hot annotation of pixel q for class c; and p_q^c is the predicted probability that pixel q belongs to class c.
The prediction results of the two decoders are dynamically fused to obtain a stronger pseudo label, which in turn supervises the training of the encoder. The mixed prediction result is given by equation (4).
Pred = arg max[α·p1 + (1.0 − α)·p2], α ∈ (0,1) (4).
Wherein Pred is the mixed prediction result; α is an updatable parameter that changes in each iteration within the interval (0, 1). This strategy ensures the diversity of strong-pseudo-label generation and better supervises the decoders in updating their parameters; p1 is the prediction result output by the first decoder; p2 is the prediction result output by the second decoder.
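Equation (4) translates almost directly into code. The assumed tensor layout is (batch, classes, H, W), with the argmax taken over the class dimension.

```python
import torch

def fuse_predictions(p1, p2, alpha):
    """Dynamic fusion of two decoder outputs into a hard pseudo label:
    Pred = argmax[alpha * p1 + (1 - alpha) * p2], alpha in (0, 1)."""
    assert 0.0 < alpha < 1.0
    return torch.argmax(alpha * p1 + (1.0 - alpha) * p2, dim=1)

# Two toy decoder outputs that disagree everywhere: decoder 1 votes class 0,
# decoder 2 votes class 1. The fused label follows whichever weight dominates.
p1 = torch.zeros(1, 3, 2, 2); p1[:, 0] = 1.0
p2 = torch.zeros(1, 3, 2, 2); p2[:, 1] = 1.0
pseudo_a = fuse_predictions(p1, p2, alpha=0.7)  # decoder 1 dominates
pseudo_b = fuse_predictions(p1, p2, alpha=0.3)  # decoder 2 dominates
```

In training, α would be resampled or updated each iteration, which is what keeps the generated strong pseudo labels diverse.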
In this way, the supervision changes from graffiti annotations covering only a few pixels to supervision over the whole image. The new predicted label (i.e., the mixed prediction result) is used to supervise both decoders to aid network training, and the loss function, namely the mixing loss, is defined in equation (5).
L_mix = (1/2) · [L_Dice(p1, Pred) + L_Dice(p2, Pred)] (5)

Wherein L_mix is the mixing loss; L_Dice is the Dice loss, for which other segmentation loss functions, such as cross-entropy loss or Jaccard loss, may be substituted.
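A minimal sketch of the mixing loss, under the assumption that it averages a soft Dice loss of each decoder's class probabilities against the fused pseudo label (the exact weighting is not given in the text):

```python
import torch
import torch.nn.functional as F

def dice_loss(probs, target, num_classes, eps=1e-6):
    """Soft Dice loss between predicted probabilities (B, C, H, W)
    and an integer (pseudo) label map (B, H, W)."""
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + onehot.sum(dim=(2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def mixing_loss(p1, p2, pseudo, num_classes):
    """Assumed form of equation (5): average Dice loss of both decoder
    outputs against the fused pseudo label."""
    return 0.5 * (dice_loss(p1, pseudo, num_classes)
                  + dice_loss(p2, pseudo, num_classes))

# A perfect prediction against its own pseudo label drives the loss to ~0.
target = torch.zeros(1, 4, 4, dtype=torch.long)
probs = torch.zeros(1, 2, 4, 4); probs[:, 0] = 1.0
```

The total objective of equation (6) would then combine `partial_cross_entropy` and `mixing_loss` with the weight λ.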
Finally, the network is trained with graffiti annotations by minimizing the following overall objective function. The total objective function, i.e., the total loss, is shown in equation (6):

L_total = L_pCE + λ·L_mix (6).
Wherein L_total is the total loss, and λ is a weight coefficient used to balance the graffiti-annotation supervision and the new pseudo-label supervision.
4. Evaluation index
To evaluate the segmentation performance of the present model, the performance indices shown in equations (7)–(10) are used.
Dice = 2TP / (2TP + FP + FN) (7)

Precision = TP / (TP + FP) (8)

Hd95 = max(d(h, S), d(S, h)) (9)

ASD = (1/n) · Σ_h d(h, S) (10).
Wherein Dice is the Dice coefficient; TP is a true positive, i.e., judged a positive sample and in fact a positive sample; FP is a false positive, i.e., judged a positive sample but in fact a negative sample; FN is a false negative, i.e., judged a negative sample but in fact a positive sample; Precision is the precision; Hd95 is the 95th-percentile Hausdorff distance; h is a boundary point of the predicted segmentation; S is a boundary point of the real segmentation; d(h, S) is the distance from point h to point S; ASD is the average surface distance; n is the total number of samples.
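The counting-based metrics can be sketched directly from their definitions; the Hausdorff helper below operates on explicit boundary-point arrays and applies a percentile as in HD95, while the boundary extraction itself is left out as an implementation detail.

```python
import numpy as np

def dice_coefficient(tp, fp, fn):
    """Dice = 2TP / (2TP + FP + FN), equation (7)."""
    return 2 * tp / (2 * tp + fp + fn)

def precision(tp, fp):
    """Precision = TP / (TP + FP), equation (8)."""
    return tp / (tp + fp)

def hausdorff95(h_pts, s_pts, percentile=95):
    """Symmetric 95th-percentile Hausdorff distance between two boundary
    point sets, each an (n, d) array of coordinates."""
    d = np.linalg.norm(h_pts[:, None, :] - s_pts[None, :, :], axis=-1)
    d_hs = np.percentile(d.min(axis=1), percentile)  # predicted -> real
    d_sh = np.percentile(d.min(axis=0), percentile)  # real -> predicted
    return max(d_hs, d_sh)

pts = np.array([[0.0, 0.0], [1.0, 0.0]])
```

For example, 8 true positives with 2 false positives and 2 false negatives give Dice = 16/20 = 0.8, and identical boundary point sets give a Hausdorff distance of 0.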
Further, experiments using the SPCL algorithm are also provided. The experimental results are analyzed from the following four aspects.
1. A data set.
The ACDC dataset is a public dataset containing cardiac MRI images of 100 patients; each patient has a 2D image sequence across the entire time dimension, including the end systole (ES) and end diastole (ED) of the heart. Each image sequence contains three segmentation targets: the right ventricle (RV), the left ventricle (LV), and the myocardium (MYO). Recently, Valvano et al. released expert-made graffiti annotations for the ACDC dataset; to evaluate the performance of supervised training with graffiti annotations, the 3D volumetric images are divided into 2D slices.
2. Experimental details.
(1) And (5) comparing the learning stages.
The UNet encoder is pre-trained on the ACDC dataset using the contrast learning approach; this is a self-supervised stage that does not require any annotation information. Although all data of the 100 patients could be used for pre-training, the invention selects only the end-systole (ES) and end-diastole (ED) data of each patient, which has the advantages that 1) the training time is greatly reduced, 2) the pre-trained encoder parameters better suit the downstream task stage, and 3) the selection of positive and negative samples for these two periods is clearer, which is more advantageous for computing the contrast learning loss and training the encoder parameters. The position-information threshold for selecting positive and negative sample pairs is initially 0.1 and is increased by 0.01 every 25 epochs; experiments show the model performs best under this schedule. The data enhancement includes horizontal flipping, random cropping, and spatial transformations such as translation, rotation, and scaling of the input image. These transforms carry a degree of randomness that guarantees a small difference between the enhanced views of each image. Pre-training is done on an NVIDIA GeForce GTX 3060 Ti GPU for 100 epochs, using SGD as the optimizer with an initial learning rate of 0.1 and a cosine learning-rate scheduler; the batch size is set to 16 and the temperature to 0.1.
(2) And an image segmentation stage.
U-Net is selected as the basic segmentation network framework. The encoder part uses the parameters from the contrast-learning pre-training as initialization; the decoder part is changed into a dual-branch structure by embedding an auxiliary decoder, a dropout layer is added before each conversion module of the auxiliary decoder to introduce perturbation, and the skip connections of the auxiliary decoder are cancelled, which reduces model parameters and computation and lets the auxiliary decoder focus on a different type of feature learning, improving the diversity of the model. A channel attention mechanism and a spatial attention mechanism are added to each decoder to improve model performance. The training process is completed on an NVIDIA GeForce GTX 3060 Ti GPU over 60k iterations. For network training, each 2D image sequence is enhanced with random rotation, random flipping, and random noise, and the enhanced images are resized to 256 × 256 as network input. SGD is used as the optimizer with weight decay = 10^-4 and momentum = 0.9; the learning rate is adjusted dynamically during training, and the batch size is set to 8. Prediction is performed 2D slice by 2D slice, and the final results are stacked into the format of a 3D volumetric image. For comparison, the prediction of the main decoder is taken as the final result; all experiments are performed under the same hardware environment, with the Dice coefficient as the evaluation index.
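The text's "multiple learning rate strategy" is read here as a "poly" schedule, which is common in this segmentation literature; that reading is an assumption, not confirmed by the source.

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Poly learning-rate schedule (assumed): lr = base * (1 - iter/max_iter)^power.
    Decays the rate smoothly from base_lr to 0 over max_iter iterations."""
    return base_lr * (1 - cur_iter / max_iter) ** power

# e.g. with the 60k-iteration budget from the experiments:
lr_start = poly_lr(0.1, 0, 60000)      # full base rate at iteration 0
lr_end = poly_lr(0.1, 60000, 60000)    # decayed to 0 at the final iteration
```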
3. Comparison with other models.
To demonstrate the overall segmentation performance of the model in the present invention, SPCL is compared with different weak-supervision segmentation methods of recent years, as shown in Table 1; the segmentation performance of each model on the ACDC dataset can be observed visually in fig. 4.
First, different graffiti-supervision policies on UNet are compared: the Mumford–Shah loss (UNet_mlos), entropy minimization (UNet_em), and the UNet variant network SwinUNet. Second, methods using different data enhancements — MixUp, Cutout, and CutMix — are compared. Finally, high-performance graffiti-labeling methods of the last two years, CycleMix and ScribbleVC, are compared. As shown in Table 1, the SPCL model of the present invention is superior to many UNet-based training strategies, model architectures, and data-enhancement techniques under graffiti supervision. In particular, it exceeds the SOTA method ScribbleVC by 2% (90.4% vs 88.4%) on the ACDC dataset. This demonstrates the effectiveness of incorporating contrast learning into a dual-branch segmentation network, of using graffiti annotations to supervise the segmentation of the dual-branch network, and of generating more efficient pseudo labels for supervision. In this way SPCL, a weakly supervised approach, achieves performance close to that of fully supervised approaches at a far lower annotation cost, showing great potential for medical image segmentation. The training process and accuracy of each model can be observed visually in fig. 5.
Table 1: First performance comparison between SPCL and other state-of-the-art methods on ACDC
To reflect the performance advantages of the models more scientifically and truly, the 95th-percentile Hausdorff distance (HD95) and the average surface distance (ASD) are additionally calculated for several of the models with better Dice-coefficient performance, for a fair performance comparison, as shown in Table 2.
Table 2: Second performance comparison between SPCL and other state-of-the-art methods on ACDC
4. Ablation experiments.
An ablation study is performed on the ACDC dataset to assess the effectiveness of the different components in the framework presented by the present invention; the same cardiac-dataset experimental setup is used in all experiments. Different modules are removed in turn, such as contrast learning (Contrastive Learning, CL), attention, and skip connections (Skip Connection, SC). It is apparent from Table 3 that cancelling the skip connection (#2) reduces the performance of the model, but experiments show that cancelling it increases the training speed and reduces the model parameters. Adding the attention module (#3) and adding the contrast learning module (#4) each markedly improve the performance of the model, and #7, which uses both the contrast learning and attention modules, performs best, proving the effectiveness of these modules.
Table 3: SPCL image segmentation results under different settings
Several representative experiments from the ablation study are selected to compare segmentation effects, as shown in fig. 6 (the first column in fig. 6 is the original image, the second column is the ground truth, and the remaining columns show the segmentation performance after different modules are added). #7 is the best-performing model of the present invention, using contrast-learning pre-training, attention mechanisms, and skip connections. #4 further highlights the improvement in segmentation effect brought by the contrast learning module based on end systole, end diastole, and position information. #2 is the base model without pre-training. The training process and accuracy of the model after each module is added can be observed visually in fig. 7.
The invention provides a framework that fuses self-supervised contrast learning with weakly supervised image segmentation. In the contrast learning stage, the time information and position information unique to medical images are used in the contrast learning method to effectively pre-train the encoder parameters. The image segmentation stage learns from the graffiti annotations by means of a dual-branch network, and the prediction results generated by the two decoders under graffiti supervision are fused as new pseudo labels to supervise the training of the encoder and decoders. SPCL is an effective and lightweight model that can provide high-quality pixel-level segmentation results; experiments on the public cardiac MRI image segmentation dataset (ACDC) demonstrate that the method outperforms recent weakly supervised segmentation methods. By replacing the structure of the segmentation part, the SPCL model structure can also be migrated to other areas, such as medical image registration and medical image detection.
Example 2
A segmentation system of a cardiac medical image in the present embodiment includes:
The image acquisition module is used for acquiring a target image; the target image is a heart medical image to be segmented; the medical image of the heart is a 2D slice image of the heart.
The segmentation module is used for inputting the target image into the image segmentation model to obtain a segmented target image; the segmented target image is a cardiac medical image with a predictive segmentation frame and a corresponding predictive label, the label being a segmented heart structure, the structure comprising: the right ventricle, the left ventricle and the myocardium; the image segmentation model is obtained by training a segmentation network using a plurality of cardiac medical images containing graffiti annotations of the heart at different periods, and the segmentation network comprises: a pre-training encoder and 2 decoders; the pre-training encoder is obtained by training an encoder using a plurality of marker-free cardiac medical images of the heart at different periods.
Example 3
An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the method of segmentation of cardiac medical images in embodiment 1.
As an alternative embodiment, the memory is a readable storage medium.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (8)

1. A method of segmenting a cardiac medical image, the method comprising:
Acquiring a target image; the target image is a heart medical image to be segmented; the cardiac medical image is a 2D slice image of the heart;
Inputting the target image into an image segmentation model to obtain a segmented target image; the segmented target image is a heart medical image with a predictive segmentation frame and a corresponding predictive label, the label is a segmented heart structure, and the structure comprises: right ventricle, left ventricle and myocardium; the image segmentation model is obtained by training a segmentation network by using a plurality of cardiac medical images containing graffiti annotations of the heart at different periods, and the segmentation network comprises: a pre-training encoder and 2 decoders; the pre-training encoder is obtained by training the encoder by using a plurality of cardiac medical images without marks of the heart in different periods;
the method is realized by using a comparison learning doodling supervision and segmentation algorithm, a comparison learning method and a double-branch network;
The training process of the contrast learning doodling supervision and segmentation algorithm framework is multi-section and is divided into a pre-training stage and an image segmentation stage; in the pre-training phase, inputting the volume medical images without marks of the end diastole and the end systole of the same heart; sampling 2D slices of the xy plane from the end diastole and end systole marker-free volumetric medical images, and conveying the slices to an encoder for feature extraction and scale compression to obtain feature maps; the projection head formed by a multi-layer perceptron maps the feature map to a vector space to calculate contrast loss, so that the training of the encoder is supervised, and the aim of training the encoder parameters is fulfilled; discarding the projection head after the pre-training stage is finished, and freezing the trained encoder parameters; taking the parameters of the encoder as the initialization of the segmentation network encoder to obtain a pre-training encoder;
In the contrast learning method, volumetric medical images of end diastole and end systole are divided into a sequence of 2D slices; defining a z-axis of each volumetric medical image as an interval of 0 to 1; defining the relative position of each slice in the volumetric medical image as p; selecting positive and negative samples according to the relative position p to perform contrast learning;
cancelling the layer-jump (skip) connection of one branch in the dual-branch decoder; obtaining two prediction results by using the dual-branch network; the prediction results are obtained by training the decoders directly with graffiti annotations, by minimizing the partial cross-entropy loss.
2. The method of segmentation of cardiac medical images as set forth in claim 1 wherein the training process of the pre-training encoder comprises:
Initializing an encoder to obtain an initial encoder;
acquiring a plurality of marker-free cardiac medical images of the heart at different times;
Determining any two marker-free cardiac medical images belonging to the same heart at two times as a first training sample set;
Training the initial encoder by using each first training sample group to obtain the pre-training encoder; wherein the training process for the encoder at any current pre-training time includes:
Respectively inputting each cardiac medical image without the mark into a current pre-training frequency encoder to obtain the characteristics of the corresponding cardiac medical image without the mark under the current pre-training frequency;
the cosine similarity of the corresponding first training sample group under the current pre-training times is calculated based on the characteristics of the two cardiac medical images without marks in each first training sample group under the current pre-training times;
calculating encoder loss under the current pre-training times based on cosine similarity under the current pre-training times of all the first training sample groups;
judging whether a pre-training stopping condition is met; the pre-training stopping condition is that the encoder loss under the preset pre-training times or the current pre-training times is smaller than a first preset loss threshold value;
If yes, determining the encoder under the current pre-training times as the pre-training encoder;
If not, updating the encoder under the current pre-training times to the encoder under the next pre-training times, and returning to input the cardiac medical images without the marks into the encoder under the current pre-training times respectively to obtain the characteristics of the cardiac medical images without the marks under the current pre-training times.
3. The method of segmentation of cardiac medical images according to claim 2, wherein the marker-free cardiac medical image of any current heart at any current time period comprises:
Acquiring a volume medical image of the current heart in the current period without a mark;
And slicing the volumetric medical image to obtain a plurality of corresponding cardiac medical images without marks.
4. The method of segmentation of cardiac medical images as set forth in claim 1 wherein the training process of the image segmentation model comprises:
initializing 2 decoders to obtain 2 initial decoders;
constructing the partitioning network based on the pre-training encoder and 2 initial decoders;
acquiring medical images of a plurality of hearts containing graffiti annotations at different periods;
determining any two medical images of the heart containing graffiti annotations belonging to the same heart at two times as a second training sample set;
training the segmentation network by using each second training sample group to obtain the image segmentation model; the training process for the segmentation network under any current training times comprises the following steps:
Respectively inputting each cardiac medical image containing the graffiti annotation into a segmentation network under the current training times to obtain 2 prediction labels corresponding to the cardiac medical image containing the graffiti annotation under the current training times; the 2 prediction labels are the prediction labels output by 2 decoders;
calculating the loss of the current training times based on each prediction label and the corresponding graffiti annotation;
Determining a comprehensive prediction result corresponding to the current training times of the heart medical image containing the graffiti annotation based on 2 prediction labels of the current training times of each heart medical image containing the graffiti annotation;
Determining a mixing loss under the current training times based on 2 prediction labels and comprehensive prediction results under the current training times of each cardiac medical image containing the graffiti annotation;
Calculating total loss under the current training times based on the partial loss and the mixed loss under the current training times;
Judging whether a training stop condition is met; the training stopping condition is that the total loss is smaller than a second preset loss threshold value when the preset training times are reached or the total loss under the current training times is smaller than the second preset loss threshold value;
if yes, determining the segmentation network under the current training times as the image segmentation model;
If not, updating the segmentation network under the current training times into the segmentation network under the next training times, and returning to input the heart medical images containing the graffiti annotations into the segmentation network under the current training times respectively to obtain 2 prediction labels corresponding to the heart medical images containing the graffiti annotations under the current training times.
5. The method of segmentation of cardiac medical images as set forth in claim 1 wherein inputs of 2 decoders in the segmentation network are each connected to an output of the pre-training encoder.
6. A segmentation system for cardiac medical images, the system comprising:
The image acquisition module is used for acquiring a target image; the target image is a heart medical image to be segmented; the cardiac medical image is a 2D slice image of the heart;
The segmentation module is used for inputting the target image into an image segmentation model to obtain a segmented target image; the segmented target image is a heart medical image with a predictive segmentation frame and a corresponding predictive label, the label is a segmented heart structure, and the structure comprises: right ventricle, left ventricle and myocardium; the image segmentation model is obtained by training a segmentation network by using a plurality of cardiac medical images containing graffiti annotations of the heart at different periods, and the segmentation network comprises: a pre-training encoder and 2 decoders; the pre-training encoder is obtained by training the encoder by using a plurality of cardiac medical images without marks of the heart in different periods;
the system is realized by using a comparison learning doodling supervision and segmentation algorithm, a comparison learning method and a double-branch network;
The training process of the contrast learning doodling supervision and segmentation algorithm framework is multi-section and is divided into a pre-training stage and an image segmentation stage; in the pre-training phase, inputting the volume medical images without marks of the end diastole and the end systole of the same heart; sampling 2D slices of the xy plane from the end diastole and end systole marker-free volumetric medical images, and conveying the slices to an encoder for feature extraction and scale compression to obtain feature maps; the projection head formed by a multi-layer perceptron maps the feature map to a vector space to calculate contrast loss, so that the training of the encoder is supervised, and the aim of training the encoder parameters is fulfilled; discarding the projection head after the pre-training stage is finished, and freezing the trained encoder parameters; taking the parameters of the encoder as the initialization of the segmentation network encoder to obtain a pre-training encoder;
In the contrast learning method, volumetric medical images of end diastole and end systole are divided into a sequence of 2D slices; defining a z-axis of each volumetric medical image as an interval of 0 to 1; defining the relative position of each slice in the volumetric medical image as p; selecting positive and negative samples according to the relative position p to perform contrast learning;
cancelling the layer-jump (skip) connection of one branch in the dual-branch decoder; obtaining two prediction results by using the dual-branch network; the prediction results are obtained by training the decoders directly with graffiti annotations, by minimizing the partial cross-entropy loss.
7. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the method of segmentation of cardiac medical images according to any one of claims 1 to 5.
8. The electronic device of claim 7, wherein the memory is a readable storage medium.
CN202311577329.XA 2023-11-24 2023-11-24 Method, system and electronic equipment for segmenting heart medical image Active CN117611605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311577329.XA CN117611605B (en) 2023-11-24 2023-11-24 Method, system and electronic equipment for segmenting heart medical image


Publications (2)

Publication Number Publication Date
CN117611605A CN117611605A (en) 2024-02-27
CN117611605B true CN117611605B (en) 2024-04-16

Family

ID=89943635


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821237A (en) * 2022-05-06 2022-07-29 山东大学 Unsupervised ship re-identification method and system based on multi-stage comparison learning
CN115861333A (en) * 2022-11-25 2023-03-28 深圳大学 Medical image segmentation model training method and device based on doodling annotation and terminal
CN116563549A (en) * 2023-05-16 2023-08-08 中国人民解放军国防科技大学 Magnetic resonance image heart segmentation method based on coarse-granularity weak annotation
CN116758282A (en) * 2023-05-08 2023-09-15 复旦大学 Thyroid ultrasonic image nodule weak supervision segmentation system based on mixed labeling



Similar Documents

Publication Publication Date Title
CN111798462B (en) Automatic delineation method of nasopharyngeal carcinoma radiotherapy target area based on CT image
CN110475505A Automatic segmentation using fully convolutional networks
CN108603922A Automated cardiac volume segmentation
WO2021203795A1 (en) Pancreas ct automatic segmentation method based on saliency dense connection expansion convolutional network
CN114283151A (en) Image processing method, device, equipment and storage medium for medical image
CN112766377A (en) Left ventricle magnetic resonance image intelligent classification method, device, equipment and medium
CN114596317A (en) CT image whole heart segmentation method based on deep learning
CN114782384A Heart chamber image segmentation method and device based on a semi-supervised method
CN117611601A (en) Text-assisted semi-supervised 3D medical image segmentation method
CN117611605B (en) Method, system and electronic equipment for segmenting heart medical image
CN116703837B (en) MRI image-based rotator cuff injury intelligent identification method and device
CN115496732B (en) Semi-supervised heart semantic segmentation algorithm
CN113409447B (en) Coronary artery segmentation method and device based on multi-slice combination
Cui et al. Multi-perspectives 2D Spine CT images segmentation of 3D fuse algorithm
Guo et al. Dual network generative adversarial networks for pediatric echocardiography segmentation
EP4198997A1 (en) A computer implemented method, a method and a system
Jain et al. SumNet Convolution Neural network based Automated pulmonary nodule detection system
CN116823829B (en) Medical image analysis method, medical image analysis device, computer equipment and storage medium
Shi et al. The Study of Echocardiography of Left-Ventricle Segmentation Combining Transformer and CNN
CN117953208A (en) Graph-based edge attention gate medical image segmentation method and device
CN117745741A (en) Medical image segmentation method integrating survey area and image area
CN117974683A (en) Medical image segmentation method, device, equipment and medium
Chen et al. Enhanced U-Net for Computer-aided Diagnosis with Cetacean Postmortem CT Scan
Sitek BabyNet: Residual Transformer Module for Birth Weight Prediction on Fetal Ultrasound Video
Mohamed et al. Advancing Cardiac Image Processing: An Innovative Model Utilizing Canny Edge Detection For Enhanced Diagnostics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant