WO2024098379A1

WO2024098379A1 - Fully automatic cardiac magnetic resonance imaging segmentation method based on dilated residual network

Info

Publication number: WO2024098379A1
Application number: PCT/CN2022/131363
Authority: WO
Inventors: 夏泽洋 (Xia Zeyang); 凡在 (Fan Zai); 熊璟 (Xiong Jing)
Original assignee: 深圳先进技术研究院 (Shenzhen Institutes of Advanced Technology)
Priority date: 2022-11-11
Filing date: 2022-11-11
Publication date: 2024-05-16

Abstract

A fully automatic cardiac magnetic resonance imaging segmentation method based on a dilated residual network, the method comprising: acquiring a cardiac magnetic resonance image; and inputting the cardiac magnetic resonance image into a trained segmentation network to segment a right ventricle region, a myocardial region and a left ventricle region. The segmentation network is built on a residual U-Net, and the bottleneck layer of the U-Net uses dilated convolutions with set dilation rates to connect the encoding path and the decoding path. With this method, the right ventricle, the left ventricle, the myocardium and other regions can be accurately segmented from a cardiac magnetic resonance image, the heart image is segmented fully automatically, and segmentation performance for the cardiac regions is improved.

Description

A fully automatic cardiac magnetic resonance imaging segmentation method based on dilated residual network

Technical Field

The present invention relates to the field of biomedical engineering technology, and more specifically, to a fully automatic cardiac magnetic resonance imaging segmentation method based on a dilated residual network.

Background Art

Heart disease is a serious threat to human life. To treat and prevent it effectively, accurate computation, modeling and analysis of the entire cardiac structure are crucial for medical research and applications. During CMRI (cine magnetic resonance imaging), however, the continuous beating of the heart makes it difficult to obtain clear images. This is especially true for patients with cardiovascular disease, who are more prone to arrhythmia and to difficulty holding their breath. As a result, MRI (magnetic resonance imaging) scans contain various artifacts that make image quality difficult to assess. If the image data is not segmented correctly, clinicians may draw incorrect conclusions from it. Manual segmentation is time-consuming, and its accuracy is hard to guarantee. Automatic segmentation of the cardiac region is therefore needed to address practical problems in cardiology.

Cardiac image segmentation divides a cardiac image into multiple anatomically meaningful regions, from which quantitative metrics such as myocardial mass, wall thickness and the volumes of the left ventricle (LV) and right ventricle (RV) can be extracted. An accurate, fully automatic cardiac segmentation algorithm is therefore particularly important. In recent years, deep convolutional neural networks (DCNNs) have been shown to outperform traditional computer vision methods at segmenting the left ventricle, the right ventricle and the myocardium. For example, the U-Net architecture is task-independent and has been applied to a wide range of biomedical segmentation tasks with modifications ranging from minor to substantial. U-Net is the backbone of most of the best-performing ventricle segmentation algorithms.

In the prior art, patent application CN202210321078.8 provides a method and system for heart segmentation in CT images based on artificial-intelligence semantic segmentation. It reduces the influence of noise by optimizing class probabilities and achieves accurate image segmentation. However, it does not address separate segmentation of the left ventricle, right ventricle and myocardium, and cannot produce images of the individual cardiac tissues.

Patent application CN202110391121.3 describes cardiac-MRI-based methods and devices for training a cardiac segmentation model and a pathology classification model, and for performing cardiac segmentation and pathology classification. This technique strongly suppresses background interference and speeds up the convergence of neural network training, but it proposes no way to improve segmentation accuracy or robustness.

Analysis shows that existing work has paid little attention to the bottleneck layer of U-Net. Because the background occupies a much larger area of the image than the foreground mask, the pixel degradation and the loss of spatial and temporal information that accompany deeper network layers leave the network unable to adequately extract the sparse features of the image.

Summary of the invention

The purpose of the present invention is to overcome the defects of the prior art and provide a fully automatic cardiac magnetic resonance imaging segmentation method based on a dilated residual network. The method comprises the following steps:

Acquiring a cardiac magnetic resonance image;

Inputting the cardiac magnetic resonance image into a trained segmentation network to segment the right ventricle region, the myocardium region and the left ventricle region;

The segmentation network is built on a residual U-Net, and the bottleneck layer of the U-Net uses a dilated convolution block with set dilation rates to connect the encoding path and the decoding path.

Compared with the prior art, the present invention provides a fully automatic cardiac MRI (magnetic resonance imaging) segmentation method based on a dilated residual network that accurately segments the right ventricle, left ventricle, myocardium and other regions from cardiac MRI images, achieves fully automatic segmentation of cardiac images, and improves segmentation performance for the cardiac regions.

Further features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

FIG. 1 is a flow chart of a fully automatic cardiac magnetic resonance imaging segmentation method based on a dilated residual network according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the process from raw magnetic resonance image data to image segmentation according to an embodiment of the present invention;

FIG. 3 is a diagram of a U-Net-based automatic image segmentation architecture according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the architecture of a dilated residual block according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of image segmentation results on the ACDC test data set according to an embodiment of the present invention.

In the drawings: Conv, convolution; Norm, normalization; Maxpool, max pooling; UpConv, up-convolution; Deconvolution, deconvolution (transposed convolution); Skip-connection, skip connection; Pixel-wise addition, pixel-by-pixel addition; Kernel, convolution kernel; Concatenation, channel concatenation; Stride, stride; End Systole, end-systole; End Diastole, end-diastole.

Detailed Description of the Embodiments

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangement of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless otherwise specifically stated.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

Technologies, methods and equipment known to persons of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the specification.

In all examples shown and discussed herein, any specific values should be interpreted as merely exemplary and not limiting. Therefore, other examples of the exemplary embodiments may have different values.

It should be noted that like reference numerals and letters refer to similar items in the following figures, and therefore, once an item is defined in one figure, it need not be further discussed in subsequent figures.

The present invention provides a fully automatic method for segmenting the right ventricle (RV), myocardium (MYO) and left ventricle (LV) from short-axis CMRI (cine magnetic resonance imaging) sequence images. The method uses a dilated convolutional residual network (DRN) to capture multi-resolution features in U-Net, thereby retaining considerably more spatial and temporal information while maintaining localization accuracy. In addition, the outputs of the expanding (decoding) path are added pixel by pixel to improve the training response.

As shown in FIG. 1 and FIG. 2, the fully automatic cardiac magnetic resonance imaging segmentation method based on a dilated residual network includes the following steps:

Step S110, preprocessing the data set to obtain training samples.

Taking magnetic resonance imaging as an example, a three-dimensional image has size L×W×H, where L is the length of the image sequence, W is the image width and H is the image height. In the dataset, the label values are mapped to four classes: black background = 0, RV = 1, MYO = 2, LV = 3.

The display size H×W and the intensity range of cine magnetic resonance images vary considerably. In one embodiment, training samples are therefore obtained through a data preprocessing process, taking images from the ACDC (Automated Cardiac Diagnosis Challenge) dataset as an example. First, the input images are resampled. ACDC images have differing voxel spacings, and because a convolutional neural network cannot account for voxel spacing, all images are resampled to the same voxel spacing of 1.52 × 1.52 × 6.35 mm.
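The resampling step can be sketched as follows. This is a minimal illustration, assuming images are stored as nested Python lists with axes ordered (rows, columns, slices); the helper names `resampled_shape` and `resample_nearest_2d` and the example input spacing are hypothetical, and only the 1.52 × 1.52 × 6.35 mm target spacing comes from the text. Nearest-neighbour interpolation is shown because it is safe for label maps; intensity images would normally use a smoother interpolator.

```python
from typing import List, Sequence, Tuple

# Target voxel spacing in mm (in-plane, in-plane, through-plane), from the text.
TARGET_SPACING = (1.52, 1.52, 6.35)


def resampled_shape(shape: Sequence[int], spacing: Sequence[float],
                    target: Sequence[float] = TARGET_SPACING) -> Tuple[int, ...]:
    """Voxel count per axis after resampling to `target` spacing.

    The physical extent (voxels * spacing) is preserved, so a finer target
    spacing gives more voxels and a coarser one gives fewer.
    """
    return tuple(max(1, round(n * s / t)) for n, s, t in zip(shape, spacing, target))


def resample_nearest_2d(img: List[List[int]], new_h: int, new_w: int) -> List[List[int]]:
    """Nearest-neighbour in-plane resampling of one 2D slice.

    Suitable for label maps because it never creates new label values.
    """
    h, w = len(img), len(img[0])
    return [[img[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]


# Hypothetical volume: 216 x 256 pixels, 10 slices, spacing 1.37 x 1.37 x 10 mm.
print(resampled_shape((216, 256, 10), (1.37, 1.37, 10.0)))  # → (195, 231, 16)
```

The coarse through-plane target (6.35 mm) reflects the thick short-axis slices of cine MRI, so the slice count changes much less than it would with an isotropic target.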

The preprocessing takes into account that the voxel spacing directly determines the overall size of the image in voxels, and therefore also the amount of contextual information that the convolutional neural network can extract from an image patch. Conversely, if the voxel spacing is increased too much, the image becomes so small that details are lost. The best performance therefore requires a trade-off between the amount of contextual information covered by the network's patch size and the amount of detail retained in the image data.

In one embodiment, all training images are resampled to the median size of 256×256 pixels. The ACDC magnetic resonance images are multi-slice cine MR images; for example, the 2D MRI (magnetic resonance imaging) slices of each patient and their associated annotations are extracted, and normalization is performed slice by slice for each time frame.
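Slice-wise normalization could look like the sketch below. The text does not say which normalization is used; z-score normalization (zero mean, unit variance per slice) is assumed here, and the `cine[t][z]` indexing (time frame, slice position) and the function names are hypothetical.

```python
import math
from typing import List

Slice2D = List[List[float]]


def normalize_slice(slice_2d: Slice2D, eps: float = 1e-8) -> Slice2D:
    """Z-score one 2D slice: subtract its mean and divide by its standard deviation."""
    values = [v for row in slice_2d for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [[(v - mean) / (std + eps) for v in row] for row in slice_2d]


def normalize_cine(cine: List[List[Slice2D]]) -> List[List[Slice2D]]:
    """cine[t][z] is the slice at time frame t and slice position z.

    Every slice of every time frame is normalized independently.
    """
    return [[normalize_slice(s) for s in frame] for frame in cine]


out = normalize_cine([[[[1.0, 2.0], [3.0, 4.0]]]])
print(out[0][0])  # ≈ [[-1.342, -0.447], [0.447, 1.342]]
```

Normalizing each slice separately removes intensity differences between scanners and between slices, at the cost of discarding absolute intensity information.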

Step S120, expanding the training samples through data enhancement and constructing a training set.

Because the training data is limited, the model may fail to learn the desired invariance and robustness, which leads to overfitting. A variety of data augmentation techniques can therefore be applied to the training data to increase the number of samples, for example basic image transformations such as random rotation, random elastic deformation, scaling, flipping and gamma correction. Applied to the original training images, these techniques effectively generate multiple views of the same image. Expanding the training samples with several augmentation methods mitigates overfitting and class imbalance.
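A minimal augmentation sketch follows. It covers flipping, rotation and gamma correction only; random elastic deformation and scaling need interpolation and are omitted, and rotation is restricted to multiples of 90° as a simplification of the random rotation described above. The probabilities, the gamma range and the function names are hypothetical choices. The key point is that geometric transforms are applied identically to the image and its label mask, whereas gamma correction changes image intensities only.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]


def flip_lr(a: Grid) -> Grid:
    """Mirror left-right."""
    return [row[::-1] for row in a]


def rot90(a: Grid) -> Grid:
    """Rotate 90 degrees clockwise."""
    return [list(r) for r in zip(*a[::-1])]


def gamma_correct(img: Grid, gamma: float) -> Grid:
    """Apply v -> v**gamma on intensities rescaled to [0, 1], then map back."""
    lo = min(min(r) for r in img)
    hi = max(max(r) for r in img)
    span = (hi - lo) or 1.0  # avoid division by zero on constant images
    return [[lo + span * ((v - lo) / span) ** gamma for v in row] for row in img]


def augment(img: Grid, mask: Grid, rng: random.Random) -> Tuple[Grid, Grid]:
    """Return one randomly augmented (image, mask) view."""
    if rng.random() < 0.5:
        img, mask = flip_lr(img), flip_lr(mask)
    for _ in range(rng.randrange(4)):  # 0, 90, 180 or 270 degrees
        img, mask = rot90(img), rot90(mask)
    if rng.random() < 0.3:
        img = gamma_correct(img, rng.uniform(0.7, 1.5))  # intensities only
    return img, mask


rng = random.Random(0)
image = [[0.0, 0.2, 0.4], [0.6, 0.8, 1.0]]
labels = [[0, 1, 1], [2, 3, 3]]
views = [augment(image, labels, rng) for _ in range(4)]  # four views of one sample
```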

Step S130, constructing a segmentation network based on the dilated residual network.

Here, a heart segmentation network based on U-Net is used as an example, as shown in FIG. 3 and FIG. 4, where FIG. 4 shows the dilated residual block used in FIG. 3. From the input image to the final output, the segmentation network follows an encoder-decoder architecture. For example, the contracting path is built from 5 encoding blocks, each consisting of 2 convolutional layers with 3×3 kernels followed by a 2×2 max-pooling operation with a stride of 2. The first block uses 32 convolution kernels, and the number of kernels increases after each max-pooling operation, reaching 320 in the U-Net bottleneck layer. Each downsampling operation halves the spatial dimensions of the feature map. The rectified linear unit (ReLU) is replaced by a leaky ReLU, and instance normalization is used instead of batch normalization (BN).
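The kernel counts and feature-map sizes along the contracting path can be tabulated with the short sketch below. It assumes a 256×256 input, that each of the 5 encoding blocks ends with a 2×2, stride-2 max pooling, and that the number of kernels doubles after each pooling until capped at 320. The text states only the start value (32), the bottleneck value (320) and that the count increases, so the doubling rule, the cap and the function name `encoder_schedule` are assumptions.

```python
from typing import List, Tuple


def encoder_schedule(in_size: int = 256, base: int = 32, cap: int = 320,
                     n_blocks: int = 5) -> List[Tuple[int, int]]:
    """(feature-map side length, number of kernels) per encoding block, then the bottleneck."""
    stages = []
    size, ch = in_size, base
    for _ in range(n_blocks):
        stages.append((size, ch))
        size //= 2              # 2x2 max pooling with stride 2 halves each side
        ch = min(ch * 2, cap)   # assumed: double the kernels, capped at 320
    stages.append((size, ch))   # bottleneck
    return stages


for size, ch in encoder_schedule():
    print(f"{size:>3} x {size:<3} x {ch} kernels")
```

Capping the width keeps the parameter count of the deepest layers bounded while the spatial size keeps shrinking.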

The encoding and decoding paths are joined at the U-Net bottleneck layer by a dilated residual network (DRN), which captures global context and recovers spatial and temporal information without reducing the resolution of the segmentation map. The dilated residual network also allows the convolutional layers to be deepened without degrading network performance. For example, the receptive field in the DRN block is enlarged using dilated convolutions with different dilation rates (d = 1, 3 and 5), and the previously generated features are combined with the current features through residual connections. After each 3×3 convolution in the DRN (dilated residual network) block, dropout with a rate of 0.5 is applied to prevent overfitting. The DRN therefore captures contextual image information, high spatial resolution and multi-texture features.

The decoding path mirrors the encoding path, with the order of operations reversed. The U-Net architecture reuses the feature maps of each encoding block at the corresponding decoder level, where the spatial dimensions match, by concatenating them along the channel dimension. At the last level of the decoding path, a 1×1 kernel projection maps the output channels to the segmentation classes (background, right ventricle, myocardium and left ventricle). Finally, the outputs of all levels of the expanding path are aggregated by upsampling and pixel-wise addition to enhance the training response.
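The final step of aggregating the outputs of all decoder levels by upsampling and pixel-wise addition can be sketched as follows for the score map of a single class. The sketch assumes nearest-neighbour upsampling by a factor of 2 between consecutive decoder levels (the text specifies upsampling and pixel-wise addition but not the interpolation method); the function names are hypothetical. Each coarser output is upsampled and added to the next finer one until full resolution is reached.

```python
from typing import List

Grid = List[List[float]]


def upsample2x(a: Grid) -> Grid:
    """Nearest-neighbour upsampling: every value becomes a 2x2 block."""
    out: Grid = []
    for row in a:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are independent
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Pixel-wise addition of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def aggregate(outputs: List[Grid]) -> Grid:
    """Sum decoder-level outputs, ordered coarsest first.

    Each level must have twice the side length of the previous one.
    """
    acc = outputs[0]
    for finer in outputs[1:]:
        acc = add(upsample2x(acc), finer)
    return acc


print(aggregate([[[1.0]], [[0.0, 0.0], [0.0, 1.0]]]))  # → [[1.0, 1.0], [1.0, 2.0]]
```

Because the coarse outputs contribute directly to the final map, gradients reach the deeper decoder levels through a short path, which is one way to read "enhancing the training response".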

Natural images typically contain many objects whose identities and relative positions are important for understanding the scene. Segmentation becomes harder when the target object is not spatially prominent, for example when it is small compared with the background. If the features of the target object are lost during downsampling, they are difficult to recover during training. If, however, a large amount of spatial and temporal information is retained throughout the network and the output features densely cover the input, backpropagation can learn important features even from small, less prominent objects. The present invention therefore uses a dilated convolutional network to increase the receptive field and extract more spatial information for predicting small, dense image features. The discrete dilated convolution is defined as follows:

$$(F *_l k)(\mathbf{p}) = \sum_{\mathbf{s} + l\mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t})$$

where $F$ is the discrete input function, $k$ is a discrete kernel of size $(2d+1)^2$, and $*_l$ denotes dilated convolution with dilation (scaling) factor $l$. The sum runs over all input positions $\mathbf{s}$ and kernel offsets $\mathbf{t}$ that satisfy $\mathbf{s} + l\mathbf{t} = \mathbf{p}$, where $\mathbf{p}$ is the output position.
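A short worked example of how dilation enlarges the receptive field. A K×K kernel with dilation d spans d(K−1)+1 pixels per side, which for K = 3 is 2d+1, matching the (2d+1)² kernel size above. The sketch assumes the three 3×3 dilated convolutions (d = 1, 3, 5) are stacked sequentially with stride 1, so each layer adds its span minus one to the receptive field; this stacking order and the function names are assumptions, and the result is measured in pixels of the bottleneck feature map.

```python
from typing import Iterable


def effective_kernel(k: int, d: int) -> int:
    """Side length spanned by a k x k kernel with dilation rate d."""
    return d * (k - 1) + 1


def receptive_field(k: int, dilations: Iterable[int]) -> int:
    """Receptive field (side length) of stride-1 k x k layers stacked in sequence."""
    rf = 1
    for d in dilations:
        rf += effective_kernel(k, d) - 1
    return rf


print(receptive_field(3, [1, 3, 5]))  # → 19
print(receptive_field(3, [1, 1, 1]))  # → 7 (same three layers without dilation)
```

Both stacks use the same 27 weights per input/output channel pair (three 3×3 kernels), yet the dilated stack sees a 19×19 window instead of 7×7, without any further downsampling.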

A dilated residual network expands the receptive field more effectively, which improves results and avoids losing image information at the U-Net bottleneck. Dilated convolution adds a parameter called the dilation rate to the convolution layer; it defines the spacing between kernel taps as the kernel processes the data, enlarging the receptive field by inserting holes. The dilated convolution layers extend regular convolution with dilation factors d = 1, 3 and 5. For example, a 1×1 kernel is used for the regular convolution layer and a 3×3 kernel for the dilated convolutions.

$$y_{ij} = \sum_{m=1}^{M}\sum_{n=1}^{N} x_{i+d\cdot m,\; j+d\cdot n}\, w(m, n)$$

where $y_{ij}$ is the output of the dilated convolution applied to the input $x_{ij}$, the convolution kernel has length M and width N, m and n are the kernel indices of the dilated convolution, w(m, n) is the corresponding kernel weight, i is the image length index, j is the image width index, and d is the dilation rate.
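The dilated convolution described above, y_ij = Σ_m Σ_n x_{i+d·m, j+d·n} w(m, n), can be checked with the pure-Python sketch below. It uses a centred K×K kernel (indices −r to r with r = K//2) and zero padding so that the output has the same size as the input, which only shifts the indexing relative to m, n = 1..M, 1..N. `residual_dilated_block` is a hypothetical simplification of the DRN block: it applies the d = 1, 3, 5 convolutions one after another and adds each output back to its input; normalization, leaky ReLU and dropout are omitted.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(x: Grid, w: Grid, d: int) -> Grid:
    """y[i][j] = sum over (m, n) of x[i + d*m][j + d*n] * w[m][n].

    Centred odd-sized kernel, zero padding, output size equal to input size.
    """
    h, wd = len(x), len(x[0])
    r = len(w) // 2  # kernel half-width
    y = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for m in range(-r, r + 1):
                for n in range(-r, r + 1):
                    ii, jj = i + d * m, j + d * n
                    if 0 <= ii < h and 0 <= jj < wd:  # zero padding outside
                        acc += x[ii][jj] * w[m + r][n + r]
            y[i][j] = acc
    return y


def residual_dilated_block(x: Grid, kernels: Sequence[Grid],
                           rates: Sequence[int] = (1, 3, 5)) -> Grid:
    """Hypothetical DRN-style block: out = out + conv_d(out) for each rate d."""
    out = x
    for w, d in zip(kernels, rates):
        y = dilated_conv2d(out, w, d)
        out = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(out, y)]
    return out


# An impulse convolved with a 3x3 all-ones kernel at d = 3 lights up a 3x3 grid
# of pixels spaced 3 apart: the "holes" of the dilated kernel.
impulse = [[1.0 if (i, j) == (5, 5) else 0.0 for j in range(11)] for i in range(11)]
ones = [[1.0] * 3 for _ in range(3)]
y = dilated_conv2d(impulse, ones, d=3)
print(sorted((i, j) for i in range(11) for j in range(11) if y[i][j]))
# → [(2, 2), (2, 5), (2, 8), (5, 2), (5, 5), (5, 8), (8, 2), (8, 5), (8, 8)]
```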

Step S140, training the segmentation network using the set loss function.

The goal of segmentation is to detect the target object and draw a contour around it. The automatically segmented (predicted) contour $C_p$ is compared with the corresponding annotated image to measure the accuracy of the proposed method. The sets of pixels enclosed by the predicted and ground-truth contours are denoted $A_p$ and $A_g$, respectively.

Various loss functions and measures can be used in training the segmentation network, for example the Dice similarity coefficient, the Hausdorff distance, or other types of loss function.

For the Dice similarity coefficient (DSC), the overlap between the predicted region and the ground-truth region, relative to their combined size, gives the DSC score, a value between 0 and 1 that is often expressed as a percentage. A high Dice value indicates a good match:

$$\mathrm{DSC} = \frac{2\,|A_p \cap A_g|}{|A_p| + |A_g|}$$

where $A_p$ denotes the pixels enclosed by the predicted contour and $A_g$ the pixels enclosed by the ground-truth contour.
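The DSC, 2|A_p ∩ A_g| / (|A_p| + |A_g|), can be computed per class from two label maps as in this sketch (function name hypothetical). The pixel sets A_p and A_g are built for one label, using the document's label values RV = 1, MYO = 2, LV = 3; returning 1.0 when both sets are empty is a convention assumed here, since the formula is undefined in that case.

```python
from typing import List

LabelMap = List[List[int]]


def dice(pred: LabelMap, gt: LabelMap, label: int) -> float:
    """DSC = 2|A_p ∩ A_g| / (|A_p| + |A_g|) for the pixels carrying `label`."""
    a_p = {(i, j) for i, row in enumerate(pred) for j, v in enumerate(row) if v == label}
    a_g = {(i, j) for i, row in enumerate(gt) for j, v in enumerate(row) if v == label}
    if not a_p and not a_g:
        return 1.0  # assumed convention: both empty counts as perfect agreement
    return 2 * len(a_p & a_g) / (len(a_p) + len(a_g))


pred = [[0, 1, 1], [0, 1, 0]]
gt = [[0, 1, 0], [0, 1, 1]]
print(dice(pred, gt, label=1))  # 2*2 / (3 + 3) ≈ 0.667
```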

The Hausdorff distance (HD) is a symmetric distance between the predicted and ground-truth contours, expressed in the spatial units (mm) of the cine MR images. The lower the HD value, the better the segmentation match:

$$\mathrm{HD}(C_p, C_g) = \max\Big\{\max_{i \in C_p}\min_{j \in C_g} d(i,j),\; \max_{j \in C_g}\min_{i \in C_p} d(i,j)\Big\}$$

where $C_p$ is the automatically segmented (predicted) contour, $C_g$ is the corresponding ground-truth contour, and d(i, j) is the distance between pixel i of the predicted contour and pixel j of the ground-truth contour.

There is significant class imbalance between the region of interest (ROI) and the background in these images. To address this, different loss functions were tested, including Dice loss and weighted cross-entropy loss.
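The Hausdorff distance between two contours, HD = max(max over i in C_p of min over j in C_g of d(i, j), max over j in C_g of min over i in C_p of d(i, j)), can be sketched as below with each contour given as a list of pixel coordinates. Extracting the contour pixels from a mask is omitted, the function names are hypothetical, and the 1.52 mm in-plane spacing from the preprocessing step is used only as an example of converting pixels to millimetres.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def directed_hd(a: List[Point], b: List[Point]) -> float:
    """Largest distance from a point of `a` to its nearest point of `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(cp: List[Point], cg: List[Point],
              spacing: Sequence[float] = (1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance; pixel indices are scaled by `spacing` (mm)."""
    sy, sx = spacing
    cp_mm = [(i * sy, j * sx) for i, j in cp]
    cg_mm = [(i * sy, j * sx) for i, j in cg]
    return max(directed_hd(cp_mm, cg_mm), directed_hd(cg_mm, cp_mm))


c_pred = [(0, 0), (0, 1)]
c_true = [(0, 0), (0, 3)]
print(hausdorff(c_pred, c_true))                # → 2.0 (pixels)
print(hausdorff(c_pred, c_true, (1.52, 1.52)))  # about 3.04 mm
```

Taking the maximum of both directed distances is what makes HD symmetric: here every predicted point is within 1 pixel of the truth, but the true point (0, 3) is 2 pixels from the prediction.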

In a preferred embodiment, the segmentation network is trained with a dual loss comprising a Dice loss and a cross-entropy loss. The cross-entropy loss is defined as:

$$L_{CE}(W) = -\sum_{i}\log p(Y_i \mid X_i, W) = -\sum_{x}\sum_{c=1}^{C} Y(c, x)\,\log \hat{Y}(c, x)$$

where C is the total number of classes; c is the class indicator; $W = (w_1, w_2, w_3, \dots, w_n)$ is the set of learnable weights, $w_n$ being the weight matrix of the n-th layer; $p(Y_i \mid X_i, W)$ is the predicted probability that pixel $X_i$ is assigned its ground-truth label $Y_i$; Y(c, x) is the target label for input x; and $\hat{Y}(c, x)$ is the activation (softmax) value for class c at input x. The classes c are: black background = 0, RV = 1, MYO = 2, LV = 3.
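A pixel-wise cross-entropy, L = −Σ_x Σ_c Y(c, x) log Ŷ(c, x), can be sketched as follows. Because Y(c, x) is one-hot, only the log-probability of the true class survives the inner sum. Logits are converted to log-probabilities with a numerically stable log-softmax. The optional `class_weights` argument is an assumption that illustrates the weighted cross-entropy mentioned earlier; how such weights are chosen is not specified in the text, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable log of the softmax of one pixel's class scores."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def cross_entropy(logits: List[Sequence[float]], labels: List[int],
                  class_weights: Optional[Sequence[float]] = None) -> float:
    """Mean over pixels of -w[y] * log p(y | x).

    logits[x] holds the C raw scores of pixel x; labels[x] is its class
    (0 = background, 1 = RV, 2 = MYO, 3 = LV).
    """
    w = class_weights if class_weights is not None else [1.0] * len(logits[0])
    total = 0.0
    for z, y in zip(logits, labels):
        total -= w[y] * log_softmax(z)[y]
    return total / len(labels)


print(cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2]))  # uniform scores: log(4) ≈ 1.386
```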

The model was trained for a total of 500 iterations (epochs); in each iteration, 250 images were randomly sampled from the training set, until all image data had been used. To improve generalization, patches were randomly cropped from the training images, and the network was evaluated on the validation set after each iteration. For example, a multi-class variant of the Dice loss was used to train the segmentation network:

$$L_{Dice} = 1 - \frac{1}{|C|}\sum_{c \in C}\frac{2\sum_{i,k} u_{i,k}^{c}\, v_{i,k}^{c} + \epsilon}{\sum_{i,k} u_{i,k}^{c} + \sum_{i,k} v_{i,k}^{c} + \epsilon}$$

where u is the softmax output of the network and v is the one-hot encoding of the image segmentation labels, both indexed by class identifier; i is the image length index and k the image width index; c ∈ C is the class identifier, i.e. the left ventricle, right ventricle, myocardium or background of the heart; and ε is a small constant.

After each pass (epoch), the learning rate lr is recalculated from initial_learning_rate, the initial learning rate; currentepoch, the current iteration number; and totalepoch, the total number of iterations. Finally, the model that achieves the highest validation DSC (Dice similarity coefficient) for RV (right ventricle), MYO (myocardium) and LV (left ventricle) is selected and evaluated on the test set. The network provides consistent and stable performance across all folds.
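A multi-class soft Dice loss of the form 1 − (1/|C|) Σ_c (2 Σ u·v + ε) / (Σ u + Σ v + ε) can be sketched as follows, with u given as per-pixel softmax probabilities and v as per-pixel one-hot labels, both flattened over the (i, k) pixel grid; the ε value is an assumption. The learning-rate formula itself is not reproduced in this text, so `poly_lr` assumes the polynomial decay lr = initial_learning_rate × (1 − currentepoch/totalepoch)^0.9 that is commonly used with U-Net-style training; the exponent 0.9, the initial learning rate in the example and all function names are hypothetical.

```python
from typing import List, Sequence


def soft_dice_loss(u: List[Sequence[float]], v: List[Sequence[int]],
                   eps: float = 1e-5) -> float:
    """1 - mean over classes of (2*sum(u*v) + eps) / (sum(u) + sum(v) + eps).

    u[x][c]: softmax probability of class c at pixel x.
    v[x][c]: one-hot ground truth of class c at pixel x.
    """
    n_classes = len(u[0])
    score = 0.0
    for c in range(n_classes):
        inter = sum(p[c] * g[c] for p, g in zip(u, v))
        denom = sum(p[c] for p in u) + sum(g[c] for g in v)
        score += (2 * inter + eps) / (denom + eps)
    return 1 - score / n_classes


def poly_lr(initial_learning_rate: float, current_epoch: int,
            total_epochs: int, exponent: float = 0.9) -> float:
    """Polynomial learning-rate decay (assumed schedule, see lead-in)."""
    return initial_learning_rate * (1 - current_epoch / total_epochs) ** exponent


onehot = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(soft_dice_loss(onehot, onehot))  # perfect prediction → 0.0
print(poly_lr(0.01, 250, 500))         # hypothetical initial lr 0.01, halfway through 500 epochs
```

Averaging over classes gives the small RV and myocardium regions the same weight as the large background, which is how the Dice term counters class imbalance. In the dual-loss embodiment the Dice and cross-entropy terms would be combined, for example by summing them; the text does not state their relative weighting.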

Step S150, for the acquired target magnetic resonance image, using the trained segmentation network to identify regions such as the right ventricle, myocardium and left ventricle.

After completing the training of the segmentation network, the optimized model parameters can be obtained. Then, using the trained segmentation network, the regions of interest such as the right ventricle, myocardium and left ventricle, as well as the complete heart contour can be accurately distinguished. Then, based on these regions, quantitative measurements can be extracted, such as myocardial mass, the volume of the left and right ventricles, etc.
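Once a label volume has been predicted, the ventricular volumes and the myocardial mass can be derived as in this sketch. It assumes the label values RV = 1, MYO = 2, LV = 3 used above, the 1.52 × 1.52 × 6.35 mm voxel spacing from preprocessing, and a myocardial tissue density of 1.05 g/mL, a value commonly used in cardiac MR analysis but not stated in the text; the function names and the toy volume are hypothetical.

```python
from typing import List, Sequence

Volume = List[List[List[int]]]  # volume[z][i][j] = label value

MYO_DENSITY_G_PER_ML = 1.05  # assumed myocardial tissue density


def structure_volume_ml(labels: Volume, label: int, spacing_mm: Sequence[float]) -> float:
    """Voxel count of `label` times the voxel volume, converted from mm^3 to mL."""
    voxel_ml = spacing_mm[0] * spacing_mm[1] * spacing_mm[2] / 1000.0
    count = sum(v == label for sl in labels for row in sl for v in row)
    return count * voxel_ml


def myocardial_mass_g(labels: Volume, spacing_mm: Sequence[float], myo_label: int = 2) -> float:
    """Myocardial volume (mL) times tissue density (g/mL)."""
    return structure_volume_ml(labels, myo_label, spacing_mm) * MYO_DENSITY_G_PER_ML


spacing = (1.52, 1.52, 6.35)
seg = [[[0, 2, 2], [2, 3, 2]], [[1, 2, 3], [2, 3, 2]]]  # toy 2-slice label volume
print(structure_volume_ml(seg, 3, spacing))  # LV volume in mL
print(myocardial_mass_g(seg, spacing))       # myocardial mass in g
```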

To further verify the effect of the present invention, experiments were carried out on cardiac magnetic resonance images from multiple patients; FIG. 5 shows segmentation results at different cardiac slice positions. The results show that the present invention achieves higher segmentation accuracy and speed for the left ventricle, right ventricle and myocardium, with an overall Dice similarity coefficient of 0.92 ± 0.02 and an average Hausdorff distance of 8.06 ± 0.05 mm. Segmentation is also fast: processing a 2D magnetic resonance image takes 0.28 seconds on average. Furthermore, the network was used to predict the ventricular regions of a single, separate magnetic resonance image and successfully segmented the cardiac image automatically.

In summary, compared with the prior art, the present invention has the following technical effects:

1) The present invention introduces a dilated convolutional residual network that strengthens the U-Net bottleneck layer, achieving fully automatic and accurate segmentation of cardiac MRI (magnetic resonance imaging) images. This overcomes the limitations of the U-Net bottleneck layer, retains considerably more spatial and temporal information, and improves accuracy while maintaining spatial consistency.

2) The present invention designs a dilated residual network (DRN) block to replace the original U-Net bottleneck layer, and trains the model with a combination of loss functions to make better use of cardiac features and improve accuracy when segmenting cardiac images.

3) The present invention offers high computational speed and robustness and can be applied to a variety of cardiac CMRI (cine magnetic resonance imaging) data sets. For example, the processed data may be magnetic resonance images of a patient acquired at two different magnetic field strengths; after processing, the patient's complete heart contour and the left ventricle, right ventricle and myocardium contour images are obtained simultaneously.

The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.

A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. It may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used here, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network can include copper transmission cables, optical fiber transmissions, wireless transmissions, routers, firewalls, switches, gateway computers, and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.

The computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ and Python, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA) or a programmable logic array (PLA), may be personalized using state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions to implement various aspects of the present invention.

Various aspects of the present invention are described herein with reference to the flow charts and/or block diagrams of the methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each box of the flow chart and/or block diagram and the combination of each box in the flow chart and/or block diagram can be implemented by computer-readable program instructions.

These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, thereby producing a machine, so that when these instructions are executed by the processor of the computer or other programmable data processing device, a device that implements the functions/actions specified in one or more boxes in the flowchart and/or block diagram is generated. These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause the computer, programmable data processing device, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions includes a manufactured product, which includes instructions for implementing various aspects of the functions/actions specified in one or more boxes in the flowchart and/or block diagram.

Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device so that a series of operating steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to implement the functions/actions specified in one or more boxes in the flowchart and/or block diagram.

The flowcharts and block diagrams in the accompanying drawings show possible architectures, functions and operations of systems, methods and computer program products according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment or a part of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions marked in the boxes may occur in a different order from that shown in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram and/or flowchart, and any combination of boxes in a block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software and implementation in a combination of software and hardware are equivalent.

Embodiments of the present invention have been described above; the description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the marketplace, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims

1. A fully automatic cardiac magnetic resonance imaging segmentation method based on a dilated residual network, comprising the following steps:

Acquiring a cardiac magnetic resonance image;

Inputting the cardiac magnetic resonance image into a trained segmentation network to segment the right ventricle region, the myocardium region and the left ventricle region;

wherein the segmentation network is constructed based on a residual U-Net, and the bottleneck layer of the U-Net adopts a dilated convolution block with a set dilation rate to connect the encoding path and the decoding path.
2. The method according to claim 1, characterized in that the segmentation network is trained according to the following steps:

Constructing a training set, wherein the training set includes a plurality of sample data, each of which is a magnetic resonance image with a labeled category, and the labeled category is used to distinguish the right ventricular region, the myocardial region, and the left ventricular region;

Performing data augmentation on the training set to generate multiple views of the same magnetic resonance image, wherein the data augmentation includes one or more of random rotation, random elastic deformation, scaling, flipping and gamma correction;

Training the segmentation network on the augmented training set with a set loss function to obtain optimized parameters.
3. The method according to claim 1, characterized in that training the segmentation network with a set loss function comprises:

Within a set number of iterations, the segmentation network is trained using a cross entropy loss function, and a set number of sample images are extracted from the training set in each iteration;

Patches are randomly cropped from the training images, and the segmentation network is evaluated on the validation set after each iteration; the segmentation network is trained using a multi-class variant of the Dice loss;

After each epoch, the learning rate is recalculated;

A segmentation network that meets the set performance requirements is selected as the trained segmentation network.
4. The method according to claim 3, characterized in that the cross entropy loss is expressed as:

$$L_{CE}(W) = -\sum_{i}\log p(Y_i \mid X_i, W) = -\sum_{x}\sum_{c=1}^{C} Y(c, x)\,\log \hat{Y}(c, x)$$

where C represents the total number of labeled categories; c represents the labeled category indicator; $W = (w_1, w_2, w_3, \dots, w_n)$ is a series of weights to be learned, $w_n$ being the weight matrix of the n-th layer; $p(Y_i \mid X_i, W)$ represents the predicted probability that pixel $X_i$ is assigned its ground-truth label $Y_i$; Y(c, x) represents the target label corresponding to input x; and $\hat{Y}(c, x)$ represents the activation function value corresponding to the predicted category c of input x.
5. The method according to claim 3, characterized in that the Dice loss is expressed as:

$$L_{Dice} = 1 - \frac{1}{|C|}\sum_{c \in C}\frac{2\sum_{i,k} u_{i,k}^{c}\, v_{i,k}^{c} + \epsilon}{\sum_{i,k} u_{i,k}^{c} + \sum_{i,k} v_{i,k}^{c} + \epsilon}$$

where u is the output of the softmax activation function and v is the one-hot encoding of the image segmentation label values, for annotation category c ∈ C; i represents the image length index, k represents the image width index, ε is a set constant, and C represents the set of annotation categories, |C| being their total number.
6. The method according to claim 3, characterized in that the learning rate is updated according to the following formula:

where initial_learning_rate is the initial learning rate, currentepoch is the current number of iterations, and totalepoch is the total number of iterations.
7. The method according to claim 1, wherein the last stage of the decoding path of the segmentation network uses a 1×1 kernel projection operation.
8. The method according to claim 1, characterized in that the dilation rate is set to d = 1, 3 or 5.
9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
10. A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.