CN115330807A - Choroidal neovascularization image segmentation method based on hybrid convolutional network - Google Patents

Choroidal neovascularization image segmentation method based on hybrid convolutional network

Info

Publication number
CN115330807A
Authority
CN
China
Prior art keywords
convolution
dimensional
network
feature
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210814858.6A
Other languages
Chinese (zh)
Inventor
叶中玉
刁东宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
Original Assignee
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Technology Co Ltd, NARI Nanjing Control System Co Ltd filed Critical Nari Technology Co Ltd
Priority to CN202210814858.6A
Publication of CN115330807A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/10 — Segmentation; Edge detection
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 — using classification, e.g. of video objects
    • G06V 10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 — Image acquisition modality
    • G06T 2207/10072 — Tomographic images
    • G06T 2207/10101 — Optical tomography; Optical coherence tomography [OCT]
    • G06T 2207/30 — Subject of image; Context of image processing
    • G06T 2207/30004 — Biomedical image processing
    • G06T 2207/30101 — Blood vessel; Artery; Vein; Vascular

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention discloses a choroidal neovascularization image segmentation method based on a hybrid convolutional network. First, fundus OCT scan images are collected and annotated, and the annotated images form a data set. A deep space-time separation hybrid convolutional neural network incorporating an attention mechanism is then constructed: two-dimensional convolution extracts two-dimensional features, which are expanded to three dimensions; three-dimensional attention deep space-time separation convolution is then performed, and the two-dimensional and three-dimensional features are aligned and fused. The constructed hybrid neural network model is trained with the data set, and finally the trained model segments choroidal neovascularization from the fundus OCT scan images to obtain the segmentation result. The space-time attention mechanism extracts local features of the choroidal neovascularization image more effectively, and using deep space-time separation convolution together with a dimension-reduction operation on the input feature map effectively reduces the calculation parameters, thereby reducing the network computation and allowing the channel attention to be calculated more efficiently.

Description

Choroidal neovascularization image segmentation method based on hybrid convolutional network
Technical Field
The invention relates to retinal image segmentation, and in particular to a choroidal neovascularization image segmentation method based on a hybrid convolutional network.
Background
The tissue structure at the center of the retina is called the macula, and its strong photosensitivity largely determines the quality of visual function. Choroidal neovascularization refers to proliferative blood vessels originating from the choroidal capillaries, and it is commonly found in the macula.
Medical image processing methods are varied and complex. Medical imaging principles are not sufficiently refined, individual retinas are highly specific, inter-individual differences are large, and structures differ. Traditional methods estimate the choroidal neovascularization area manually and quantitatively; the whole process is time-consuming, depends on personal experience, and is prone to judgment errors.
With the development of deep learning, image segmentation techniques based on deep learning have become an important component of image segmentation, and deep learning methods have recently achieved success in choroidal neovascularization segmentation. However, choroidal neovascularization segmentation presents greater difficulties and challenges than the segmentation of other human organ images: 1) fundus OCT images contain artifacts and noise from ocular structures and tissue; 2) using only a two-dimensional convolutional network omits features, and inter-slice image information needs to be captured with three-dimensional convolution, but using only three-dimensional convolution has a high computational cost.
Disclosure of Invention
The purpose of the invention is as follows: in view of the above disadvantages, the present invention provides a hybrid convolutional network-based choroidal neovascularization image segmentation method with fine segmentation and low computational cost.
The technical scheme is as follows: in order to solve the above problems, the present invention provides a choroidal neovascularization image segmentation method based on a hybrid convolutional network, comprising the following steps:
(1) Constructing a data set: collecting fundus OCT scan images, labeling the choroidal neovascularization in the images, and forming a data set from the labeled images;
(2) Constructing a deep space-time separation hybrid convolutional neural network incorporating an attention mechanism: two-dimensional convolution is used to extract two-dimensional features, the two-dimensional depthwise convolution is expanded to three dimensions, three-dimensional attention deep space-time separation convolution is performed, and the features generated by the two-dimensional convolution and those generated by the three-dimensional convolution are aligned and fused;
(3) Training the constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism with the data set to obtain a hybrid neural network model;
(4) Segmenting the choroidal neovascularization image from the fundus OCT scan images with the trained hybrid neural network model to obtain the segmentation result.
Further, in the step (2), four consecutive two-dimensional feature images are converted into a three-dimensional feature vector.
Further, in the step (2), a new attention map is formed by subjecting the obtained three-dimensional feature vector to a time-space attention mechanism, and then deep space-time separation convolution is performed.
Further, when the deep space-time separation convolution is performed, the three-dimensional convolution is divided into two separate convolutions, a 1 × Y × Z spatial convolution and an X × 1 × 1 temporal convolution, so that the three-dimensional deep space-time separation convolution DSTS is:

DSTS(F″) = K_P ∗ [ (K_S ∗_r F″) ∪ (K_T ∗_r F″) ]

where K_P denotes the convolution kernel of the point-wise convolution; K_S denotes the convolution kernel of the 1 × Y × Z spatial convolution; K_T denotes the convolution kernel of the X × 1 × 1 temporal convolution; ∪ denotes the splicing (concatenation) of the two feature maps produced by the spatial convolution and the temporal convolution; F″ denotes the final attention feature map produced by the space-time attention mechanism; and r denotes the hole (dilated) convolution operation.
Further, the final attention feature map F″ produced by the spatio-temporal attention mechanism is calculated as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

where F denotes the input feature map fed to the temporal attention module, M_C(F) denotes the output feature map generated by the temporal attention module, F′ denotes the input feature map fed to the spatial attention module, M_S(F′) denotes the output feature map generated by the spatial attention module, and ⊗ denotes element-wise multiplication.
Further, after the three-dimensional feature vector is input into the temporal attention module, average pooling and maximum pooling are performed to obtain the maximum-pooled feature F_max^X and the average-pooled feature F_avg^X; a shared network layer consisting of a multilayer perceptron with one hidden layer then receives the two pooled features and finally generates a channel attention map M_X ∈ R^(X×1×1). After the shared network layer, element-wise summation is used to merge the output feature vectors, and the calculation in the temporal attention module is:

M_X(F) = σ( MLP(AvgPool(F)) + MLP(MaxPool(F)) ) = σ( W_1(W_0(F_avg^X)) + W_1(W_0(F_max^X)) )

where σ denotes the sigmoid function, W_0 and W_1 are the weights of the multilayer perceptron (MLP), AvgPool(F) denotes average pooling of the input feature F, and MaxPool(F) denotes maximum pooling of the input feature F.
Further, after the three-dimensional feature vector is input into the spatial attention module, average pooling and maximum pooling are performed to obtain the maximum-pooled feature F_max^S and the average-pooled feature F_avg^S, and the features are coupled by a standard convolution operation in the convolution layer to generate the final spatial attention map; the calculation in the spatial attention module is:

M_S(F) = σ( f([F_avg^S; F_max^S]) )

where σ denotes the sigmoid function and f denotes a convolution operation.
Further, the aligning and fusing of the features generated by the two-dimensional convolution and the features generated by the three-dimensional convolution in step (2) specifically comprises the following steps:
calculating the feature map and the associated pixel probability scores output from the two-dimensional convolutional network:

X_2d = f_2d(I_2d; θ_2d),  X_2d ∈ R^(4n×256×256×64)
y_2d = f_2dcls(X_2d; θ_2dcls),  y_2d ∈ R^(4n×256×256×3)

where I_2d denotes the samples input to the two-dimensional convolutional network and n denotes the batch size of the input training samples;
aligning the feature map and the probability score of the two-dimensional convolutional network with the score map of the three-dimensional feature map according to:

X′_2d = T(X_2d),  X′_2d ∈ R^(n×256×256×64)
y′_2d = T(y_2d),  y′_2d ∈ R^(n×256×256×3)

where T denotes the transformation that composes adjacent slices into three-dimensional data;
obtaining the context feature y′_2d from the two-dimensional convolutional network through a skip connection, the three-dimensional convolutional network being trained on the context pixels of the probability map generated by the two-dimensional convolutional network, and the probability map generated by the two-dimensional convolutional network feeding back into the training of the three-dimensional convolutional network:

X_3d = f_dsts(I, y′_2d; θ_3d)
Z = X_3d + X′_2d

where X_3d denotes the output feature map of the three-dimensional convolutional network and Z denotes the two-dimensional/three-dimensional mixed feature map, i.e. the sum of the intra-slice and inter-slice features from the two-dimensional and three-dimensional networks.
Further, the two-dimensional/three-dimensional mixed feature Z is jointly learned and optimized according to:

H = f_hff(Z; θ_hff)
y_h = f_hffcls(H; θ_hffcls)

where H denotes the optimized mixed feature and y_h denotes the pixel-level prediction probability of the hybrid feature fusion layer.
Further, when training the constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism, the class imbalance of the three-dimensional choroidal neovascularization segmentation is addressed with a multi-class Dice loss, which is insensitive to class imbalance, combined with a cross-entropy loss; the Dice term takes the form:

L_Dice = 1 − (2/C) · Σ_{c=1}^{C} ( Σ_{i=1}^{V} p_i^c · g_i^c + ε ) / ( Σ_{i=1}^{V} p_i^c + Σ_{i=1}^{V} g_i^c + ε )

where C denotes the number of classes, V denotes the number of voxels, p_i^c denotes the prediction probability that voxel i belongs to class c, ε denotes a smoothing factor, and g_i^c is the ground-truth label indicating that voxel i belongs to class c.
Has the advantages that: compared with the prior art, the invention has obvious advantages. The space-time attention mechanism extracts local features of the choroidal neovascularization image more effectively, and using deep space-time separation convolution together with a dimension-reduction operation on the input feature map effectively reduces the calculation parameters, thereby reducing the network computation and allowing the channel attention to be calculated more efficiently. Average pooling effectively fuses spatial information, while maximum pooling better matches the attention mechanism and can locate the pixel region in the feature map closest to the target features; combining maximum pooling and average pooling refines the feature map more effectively. Spatial attention focuses more on the positional information of the target in the image and effectively complements temporal attention. The spatial attention features are computed efficiently by applying average pooling and maximum pooling along the channel axis and concatenating the results to generate the spatial feature map.
Drawings
FIG. 1 is a schematic diagram of a hybrid convolutional network framework of the present invention;
FIG. 2 is a diagram of the operation of the time attention module in the hybrid convolutional network of the present invention;
FIG. 3 is a diagram of the operation of the spatial attention module in the hybrid convolutional network of the present invention.
Detailed Description
In this embodiment, a choroidal neovascularization image segmentation method based on a hybrid convolutional network includes the following steps:
(1) Constructing a data set: fundus OCT scan images are acquired and the choroidal neovascularization in the images is labeled, with the manually labeled choroidal neovascularization regions in the scan images serving as the reference standard; the labeled images form the data set;
(2) Constructing a deep space-time separation hybrid convolutional neural network incorporating an attention mechanism: two-dimensional convolution is used to extract two-dimensional features, the two-dimensional depthwise convolution is expanded to three dimensions, three-dimensional attention deep space-time separation convolution is performed, and the features generated by the two-dimensional convolution and those generated by the three-dimensional convolution are aligned and fused;
(3) The constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism is trained with the data set to obtain the hybrid neural network model;
(4) The trained hybrid neural network model is used to segment the choroidal neovascularization from the fundus OCT scan images to obtain the segmentation result.
In step (2), the network structure is shown in fig. 1 and comprises two-dimensional convolution, two-dimensional depthwise convolution, three-dimensional point-wise convolution and three-dimensional attention deep space-time separation convolution, where the three-dimensional attention deep space-time separation convolution comprises a three-dimensional spatial pyramid module and a space-time attention module. The hybrid convolutional network adopts an encoder-decoder structure: the bottom is formed by two-dimensional depthwise convolution and the rest by three-dimensional depthwise space-time separation convolution. Spatial pyramid pooling is employed at the end of the encoder, capturing multi-scale information through parallel three-dimensional dilated (hole) space-time separation convolutions of different sizes.
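A minimal PyTorch sketch of such a spatial pyramid module built from parallel dilated three-dimensional convolutions; the channel counts, dilation rates and class name are illustrative assumptions, and the separable form described above is omitted here for brevity:

```python
import torch
import torch.nn as nn

class ASPP3D(nn.Module):
    """Parallel dilated 3D convolutions whose outputs are concatenated and fused."""
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        # one parallel branch per dilation rate; padding keeps D/H/W unchanged
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d, bias=False),
                nn.BatchNorm3d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1x1 convolution fuses the concatenated branch outputs
        self.project = nn.Conv3d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x):                      # x: (N, C, D, H, W)
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

# usage: multi-scale features at the end of the encoder
# y = ASPP3D(64, 64)(torch.randn(1, 64, 4, 32, 32))
```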
In step (3), the labeled input image is first down-sampled by an ordinary 3 × 3 two-dimensional convolution to obtain two-dimensional feature maps; these then enter the two-dimensional depthwise convolution module, which expands the depthwise two-dimensional convolution to three dimensions by converting four consecutive two-dimensional feature maps into one three-dimensional feature vector, and convolving each input channel independently reduces computation and parameters.
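A minimal sketch of this slice-grouping step, assuming a channel-first (N·4, C, H, W) layout as used in PyTorch; the function name is an assumption:

```python
import torch

def slices_to_volume(feat2d: torch.Tensor, depth: int = 4) -> torch.Tensor:
    """Stack groups of `depth` consecutive 2D feature maps into 3D feature volumes."""
    n4, c, h, w = feat2d.shape
    assert n4 % depth == 0, "batch must contain whole groups of consecutive slices"
    n = n4 // depth
    # (N*4, C, H, W) -> (N, 4, C, H, W) -> (N, C, 4, H, W)
    return feat2d.reshape(n, depth, c, h, w).permute(0, 2, 1, 3, 4).contiguous()

# x = torch.randn(8, 64, 256, 256)   # 8 slices = 2 groups of 4
# v = slices_to_volume(x)            # v.shape == (2, 64, 4, 256, 256)
```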
Through the space-time attention mechanism, a feature map F′ of dimension T_F × W_F × H_F × M in the three-dimensional convolution layer is taken as input and a feature map G of dimension T_G × W_G × H_G × N is produced as output, where T, W and H denote the temporal dimension, spatial width and spatial height of the three-dimensional feature map respectively, and M and N denote the numbers of input and output channels respectively. A standard three-dimensional convolution layer is parameterized by a convolution kernel K_S (X × Y × Z × M × N), where X is the temporal dimension of the kernel and Y, Z are its spatial dimensions. The output of the three-dimensional convolution layer can be calculated as:

G_{t,w,h,n} = Σ_{x,y,z,m} K_S^{x,y,z,m,n} · F′_{t+r·x, w+r·y, h+r·z, m}

In the above equation, r represents the hole (dilated) convolution operation.
For a depthwise convolution with an X × Y × Z × M kernel K_D, the output can be calculated as:

Ĝ_{t,w,h,m} = Σ_{x,y,z} K_D^{x,y,z,m} · F′_{t+x, w+y, h+z, m}

A three-dimensional point-wise convolution with a kernel K_P of dimension 1 × 1 × 1 × M × N is then used to combine the outputs of the depthwise convolution and project them into a new channel space, which can be expressed as:

G_{t,w,h,n} = Σ_{m} K_P^{m,n} · Ĝ_{t,w,h,m}
the deep convolution can effectively reduce convolution parameters and calculation complexity, and is a powerful operation mode. For example, a tensor with 3 × 3 × 3 × 3 dimensions, c input channels and c output channels is subjected to convolution operation, and a standard convolution includes 27c 2 But the depth convolution has only 27c of parameters, which is c times less than the standard parameters.
The space-time attention mechanism better extracts local features of the choroidal neovascularization image. In the temporal attention module shown in fig. 2, any channel of a feature map can be regarded as a feature detector, and attention seeks the regional features in the input image that need to be learned. Performing a dimension-reduction operation on the input feature map effectively reduces the calculation parameters, thereby reducing the network computation and allowing the channel attention to be calculated more efficiently. Average pooling effectively fuses spatial information, while maximum pooling better matches the attention mechanism and can locate the pixel regions in the feature map closest to the target features; using maximum pooling and average pooling in combination refines the feature map more effectively. Applying average pooling and maximum pooling to the input features fuses the spatial information of the feature map and yields the attention-region features; the resulting feature maps are the maximum-pooled feature F_max^X and the average-pooled feature F_avg^X. The structure adopts a shared network layer consisting of a multilayer perceptron with one hidden layer to receive the two pooled features, and finally generates a channel attention map M_X ∈ R^(X×1×1). After the shared network layer, element-wise summation is used to merge the output feature vectors, and the temporal attention module can be expressed as:

M_X(F) = σ( MLP(AvgPool(F)) + MLP(MaxPool(F)) ) = σ( W_1(W_0(F_avg^X)) + W_1(W_0(F_max^X)) )

where σ denotes the sigmoid function and W_0 and W_1 are the weights of the MLP.
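A minimal PyTorch sketch of such a module, assuming the attended axis is the channel axis of an (N, C, D, H, W) tensor; the class name and the reduction ratio of the hidden layer are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Shared one-hidden-layer MLP over average- and max-pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool3d(1)
        self.max_pool = nn.AdaptiveMaxPool3d(1)
        # shared MLP with one hidden layer (W0, W1 in the text)
        self.mlp = nn.Sequential(
            nn.Conv3d(channels, channels // reduction, kernel_size=1, bias=False),  # W0
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1, bias=False),  # W1
        )

    def forward(self, x):                         # x: (N, C, D, H, W)
        att = self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x))
        return torch.sigmoid(att)                 # attention map: (N, C, 1, 1, 1)
```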
Unlike the temporal attention module, the spatial attention module, shown in fig. 3, focuses on the positional information of the target in the image and effectively complements the temporal attention. The spatial attention features are computed efficiently by applying average pooling and maximum pooling along the channel axis and concatenating the results to generate a spatial feature map. When the feature maps are concatenated, a convolution layer is used to generate the spatial feature map M_S ∈ R^(H×W). The maximum pooling and average pooling operations aggregate the input feature information effectively and produce two maps, the spatial maximum-pooled feature F_max^S and the spatial average-pooled feature F_avg^S. The features are then coupled by a standard convolution operation in the convolution layer to generate the final spatial attention map. The spatial attention module can be expressed as:

M_S(F) = σ( f([F_avg^S; F_max^S]) )

where σ denotes the sigmoid function and f denotes a convolution operation.
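A minimal PyTorch sketch of such a spatial attention module; the kernel size and class name are assumptions:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-wise average/max pooling, concatenation, convolution, sigmoid."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv3d(2, 1, kernel_size=kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):                          # x: (N, C, D, H, W)
        avg_map = x.mean(dim=1, keepdim=True)      # channel-wise average pooling
        max_map = x.amax(dim=1, keepdim=True)      # channel-wise max pooling
        att = self.conv(torch.cat([avg_map, max_map], dim=1))
        return torch.sigmoid(att)                  # attention map: (N, 1, D, H, W)
```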
By learning the channel attention and the spatial attention as separate processes, the amount of computation is greatly reduced while the spatio-temporal features are still captured and learned effectively. The overall process of computing the final spatio-temporal attention map F″ can be expressed as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

where ⊗ denotes element-wise multiplication.
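A short sketch of how the two attention maps would be applied in sequence; `temporal_att` and `spatial_att` stand for callables such as the modules sketched above:

```python
import torch

def refine(f: torch.Tensor, temporal_att, spatial_att) -> torch.Tensor:
    """F'' = M_S(F') ⊗ F'  with  F' = M_C(F) ⊗ F."""
    f_prime = temporal_att(f) * f           # attention map broadcasts over D, H, W
    return spatial_att(f_prime) * f_prime   # attention map broadcasts over channels
```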
more computation is required for three-dimensional convolution than for two-dimensional convolution. To make model operations more efficient, a three-dimensional volume is integrated into two separate convolutions, one being a 1 × Y × Z spatial convolution and one X × 1 × 1 temporal convolution, for the purpose of temporal-spatial separation (STS). The emphasis of spatial convolution is on the learning of spatial features, and the emphasis of temporal convolution is on the learning of temporal features. The parallel space-time separation computing method is defined as follows:
Figure BDA0003741938260000069
in the above formula
Figure BDA0003741938260000071
Represents the convolution kernel of a 1 xyxz spatial convolution,
Figure BDA0003741938260000072
a convolution kernel representing an X × 1 × 1 time convolution, and u represents splicing two feature maps. In the parallel STS module, the two convolutions are performed in parallel in two branches, and then their outputs are connected, which is also more effective for retinal image anisotropy. To further reduce computational complexity and model parameters, three-dimensional depth space-time separation convolution is employed.
After the three-dimensional space-time separation convolution operation, the output channels are divided into a spatial branch and a temporal branch, which focus on learning spatial and temporal features respectively; in each branch, a spatial/temporal convolution is performed per channel. After this independent feature learning, the outputs of the spatial and temporal branches are connected and fed to a point-wise convolution for feature integration. The three-dimensional deep space-time separation convolution (DSTS) can be expressed as:

DSTS(F″) = K_P ∗ [ ( K_S ∗ F″ ) ∪ ( K_T ∗ F″ ) ]

where K_P is the convolution kernel of the point-wise convolution, K_S and K_T are the convolution kernels of the spatial and temporal depthwise convolutions respectively, and F″ is the final attention map produced by the spatio-temporal attention mechanism. All three-dimensional convolutions are replaced with deep space-time separation convolutions to better save computational cost.
A two-dimensional convolutional network with depthwise convolution can effectively learn high-level in-plane features, but it ignores spatial information along the Z dimension. A three-dimensional convolutional network can make up for this deficiency, but at a higher computational cost. Therefore, the two-dimensional and three-dimensional convolutional networks are combined, fused and jointly optimized, so that the intra-slice and inter-slice features of the CNV are learned better and the choroidal neovascularization image is segmented better.
The feature map and the associated pixel probability scores output from the two-dimensional convolutional network can be expressed as:

X_2d = f_2d(I_2d; θ_2d),  X_2d ∈ R^(4n×256×256×64)
y_2d = f_2dcls(X_2d; θ_2dcls),  y_2d ∈ R^(4n×256×256×3)

where I_2d denotes the samples input to the two-dimensional convolutional network and n denotes the batch size of the input training samples.
To fuse the mixed features from the two-dimensional and three-dimensional convolutional networks, the feature sizes need to be aligned; the feature map and probability score of the two-dimensional convolutional network are aligned with the score map of the three-dimensional feature map according to:

X′_2d = T(X_2d),  X′_2d ∈ R^(n×256×256×64)
y′_2d = T(y_2d),  y′_2d ∈ R^(n×256×256×3)

where T denotes the transformation that composes adjacent slices into three-dimensional data.
The three-dimensional convolutional network part extracts multi-scale features through spatial pyramid pooling and obtains the context feature y′_2d from the two-dimensional convolutional network through a skip connection. The two-dimensional network part generates a feature probability map, and the three-dimensional convolutional network is trained on its context pixels. The probability map generated by the two-dimensional convolutional network feeds back into the training of the three-dimensional convolutional network part, which avoids the excessive computational burden of the three-dimensional network learning and updating the best weights on its own and greatly accelerates the learning and computation of the three-dimensional network part. The learning process of the three-dimensional network part can be described as:

X_3d = f_dsts(I, y′_2d; θ_3d)
Z = X_3d + X′_2d

where X_3d denotes the output feature map of the three-dimensional convolutional network part and Z denotes the two-dimensional/three-dimensional mixed feature, i.e. the sum of the intra-slice and inter-slice features from the two-dimensional and three-dimensional networks. The mixed feature is then jointly learned and optimized in the HFF (hybrid feature fusion) layer:

H = f_hff(Z; θ_hff)
y_h = f_hffcls(H; θ_hffcls)

where H denotes the optimized mixed feature and y_h denotes the pixel-level prediction probability of the hybrid feature fusion layer.
To address the severe class imbalance in three-dimensional choroidal neovascularization segmentation, the hybrid convolutional network uses a multi-class Dice loss, which is insensitive to class imbalance, together with a cross-entropy loss; the Dice term takes the form:

L_Dice = 1 − (2/C) · Σ_{c=1}^{C} ( Σ_{i=1}^{V} p_i^c · g_i^c + ε ) / ( Σ_{i=1}^{V} p_i^c + Σ_{i=1}^{V} g_i^c + ε )

where C denotes the number of classes, V denotes the number of voxels, p_i^c denotes the prediction probability that voxel i belongs to class c, ε denotes a smoothing factor, and g_i^c is the ground-truth label indicating that voxel i belongs to class c.
A gradient constraint G is also introduced to better preserve the CNV boundary; it accumulates a gradient term g(n) over the boundary pixels:

G = Σ_{n∈N} g(n)

where N denotes the set of boundary pixels, g(n) denotes the gradient computed at pixel n, and ∇_x u_n and ∇_y u_n denote the gradients of pixel n in the x-direction and y-direction respectively.
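The exact form of G is not spelled out above, so the following is only an illustrative sketch of one common way to realise a boundary-gradient penalty: it compares finite-difference x/y gradients of the predicted probability map and the label map and penalises their difference.

```python
import torch

def gradient_constraint(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (N, 1, H, W) probability / label maps for one slice."""
    def grads(u):
        gx = u[..., :, 1:] - u[..., :, :-1]   # gradient in the x-direction
        gy = u[..., 1:, :] - u[..., :-1, :]   # gradient in the y-direction
        return gx, gy

    px, py = grads(pred)
    tx, ty = grads(target)
    return (px - tx).abs().mean() + (py - ty).abs().mean()
```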
The final loss function combines the multi-class Dice loss, the cross-entropy loss and the gradient constraint G.

Claims (10)

1. A choroidal neovascularization image segmentation method based on a hybrid convolutional network, characterized by comprising the following steps:
(1) Constructing a data set: collecting fundus OCT scan images, labeling the choroidal neovascularization in the images, and forming a data set from the labeled images;
(2) Constructing a deep space-time separation hybrid convolutional neural network incorporating an attention mechanism: two-dimensional convolution is used to extract two-dimensional features, the two-dimensional depthwise convolution is expanded to three dimensions, three-dimensional attention deep space-time separation convolution is performed, and the features generated by the two-dimensional convolution and those generated by the three-dimensional convolution are aligned and fused;
(3) Training the constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism with the data set to obtain a hybrid neural network model;
(4) Segmenting the choroidal neovascularization image from the fundus OCT scan images with the trained hybrid neural network model to obtain the segmentation result.
2. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 1, wherein in step (2), four consecutive two-dimensional feature images are converted into a three-dimensional feature vector.
3. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 2, wherein in the step (2), a new attention map is formed by subjecting the obtained three-dimensional feature vectors to a time-space attention mechanism, and then deep space-time separation convolution is performed.
4. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 3, wherein when the deep space-time separation convolution is performed, the three-dimensional convolution is divided into two separate convolutions, a 1 × Y × Z spatial convolution and an X × 1 × 1 temporal convolution, so that the three-dimensional deep space-time separation convolution DSTS is:

DSTS(F″) = K_P ∗ [ (K_S ∗_r F″) ∪ (K_T ∗_r F″) ]

wherein K_P denotes the convolution kernel of the point-wise convolution; K_S denotes the convolution kernel of the 1 × Y × Z spatial convolution; K_T denotes the convolution kernel of the X × 1 × 1 temporal convolution; ∪ denotes the splicing of the two feature maps produced by the spatial convolution and the temporal convolution; F″ denotes the final attention feature map produced by the space-time attention mechanism; and r denotes the hole convolution operation.
5. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 4, wherein the final attention feature map F″ produced by the spatio-temporal attention mechanism is calculated as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

wherein F denotes the input feature map fed to the temporal attention module, M_C(F) denotes the output feature map generated by the temporal attention module, F′ denotes the input feature map fed to the spatial attention module, M_S(F′) denotes the output feature map generated by the spatial attention module, and ⊗ denotes the element-wise multiplication of two matrices.
6. The method as claimed in claim 5, wherein after the three-dimensional feature vector is input into the temporal attention module, average pooling and maximum pooling are performed to obtain the maximum-pooled feature F_max^X and the average-pooled feature F_avg^X; a shared network layer consisting of a multilayer perceptron with one hidden layer then receives the two pooled features and finally generates a channel attention map M_X ∈ R^(X×1×1), wherein R^(X×1×1) denotes the set of feature maps with X channels, length 1 and width 1; after the shared network layer, element-wise summation is used to merge the output feature vectors, and the calculation in the temporal attention module is:

M_X(F) = σ( MLP(AvgPool(F)) + MLP(MaxPool(F)) ) = σ( W_1(W_0(F_avg^X)) + W_1(W_0(F_max^X)) )

wherein σ denotes the sigmoid function, W_0 and W_1 are the weights of the multilayer perceptron MLP, AvgPool(F) denotes average pooling of the input feature F, and MaxPool(F) denotes maximum pooling of the input feature F.
7. The method as claimed in claim 6, wherein after the three-dimensional feature vector is input into the spatial attention module, average pooling and maximum pooling are performed to obtain the maximum-pooled feature F_max^S ∈ R^(1×H×W) and the average-pooled feature F_avg^S ∈ R^(1×H×W), wherein R^(1×H×W) denotes the set of feature maps with 1 channel, length H and width W; the features are then coupled by a standard convolution operation in the convolution layer to generate the final spatial attention map, and the calculation in the spatial attention module is:

M_S(F) = σ( f([F_avg^S; F_max^S]) )

wherein σ denotes the sigmoid function and f denotes a convolution operation.
8. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 1, wherein the aligning and fusing of the features generated by the two-dimensional convolution and the features generated by the three-dimensional convolution in step (2) specifically comprises the following steps:
calculating the feature map and the associated pixel probability scores output from the two-dimensional convolutional network:

X_2d = f_2d(I_2d; θ_2d),  X_2d ∈ R^(4n×256×256×64)
y_2d = f_2dcls(X_2d; θ_2dcls),  y_2d ∈ R^(4n×256×256×3)

wherein I_2d denotes the samples input to the two-dimensional convolutional network; n denotes the batch size of the input training samples; θ_2d denotes the pixel probability score in the two-dimensional convolution;
aligning the feature map and the probability score of the two-dimensional convolutional network with the score map of the three-dimensional feature map by first performing a three-dimensional transformation on the two-dimensional convolutional feature map:

X′_2d = T(X_2d),  X′_2d ∈ R^(n×256×256×64)
y′_2d = T(y_2d),  y′_2d ∈ R^(n×256×256×3)

wherein T denotes the transformation that composes adjacent slices into three-dimensional data;
obtaining the context feature y′_2d from the two-dimensional convolutional network through a skip connection, the three-dimensional convolutional network being trained on the context pixels of the probability map generated by the two-dimensional convolutional network, and the probability map generated by the two-dimensional convolutional network feeding back into the training of the three-dimensional convolutional network:

X_3d = f_dsts(I, y′_2d; θ_3d)
Z = X_3d + X′_2d

wherein X_3d denotes the output feature map of the three-dimensional convolutional network, Z denotes the two-dimensional/three-dimensional mixed feature map, i.e. the sum of the intra-slice and inter-slice features from the two-dimensional and three-dimensional networks, and θ_3d denotes the pixel probability score in the three-dimensional convolution.
9. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 8, wherein the two-dimensional/three-dimensional mixed feature Z is jointly learned and optimized according to:

H = f_hff(Z; θ_hff)
y_h = f_hffcls(H; θ_hffcls)

wherein H denotes the optimized mixed feature, θ_hff denotes the pixel probability score in the HFF layer, and y_h denotes the pixel-level prediction probability of the hybrid feature fusion layer.
10. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 9, wherein in training the constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism, the segmentation class imbalance of the three-dimensional choroidal neovascularization is optimized using a multi-class Dice loss, which is insensitive to class imbalance, together with a cross-entropy loss, the loss function further incorporating the gradient constraint G, wherein C denotes the number of classes, V denotes the number of voxels, p_i^c denotes the prediction probability that voxel i belongs to class c, ε denotes a smoothing factor, g_i^c is the ground-truth label indicating that voxel i belongs to class c, G denotes the gradient constraint, and e(·) denotes the activation function in deep learning with respect to the variable μ.
CN202210814858.6A 2022-07-12 2022-07-12 Choroidal neovascularization image segmentation method based on hybrid convolutional network Pending CN115330807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210814858.6A CN115330807A (en) 2022-07-12 2022-07-12 Choroidal neovascularization image segmentation method based on hybrid convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210814858.6A CN115330807A (en) 2022-07-12 2022-07-12 Choroidal neovascularization image segmentation method based on hybrid convolutional network

Publications (1)

Publication Number Publication Date
CN115330807A true CN115330807A (en) 2022-11-11

Family

ID=83916628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210814858.6A Pending CN115330807A (en) 2022-07-12 2022-07-12 Choroidal neovascularization image segmentation method based on hybrid convolutional network

Country Status (1)

Country Link
CN (1) CN115330807A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116559949A (en) * 2023-05-19 2023-08-08 北京宸宇金源科技有限公司 Carbonate reservoir prediction method, system and equipment based on deep learning
CN116485820A (en) * 2023-06-21 2023-07-25 杭州堃博生物科技有限公司 Method and device for extracting artery and vein image and nonvolatile storage medium
CN116485820B (en) * 2023-06-21 2023-09-22 杭州堃博生物科技有限公司 Method and device for extracting artery and vein image and nonvolatile storage medium
CN117635950A (en) * 2024-01-04 2024-03-01 广东欧谱曼迪科技股份有限公司 Method, device, electronic equipment and storage medium for vessel segmentation correction processing
CN117635950B (en) * 2024-01-04 2024-04-09 广东欧谱曼迪科技股份有限公司 Method, device, electronic equipment and storage medium for vessel segmentation correction processing
CN117582185A (en) * 2024-01-19 2024-02-23 季华实验室 Pulse force level prediction method based on CLLSR hybrid model
CN117582185B (en) * 2024-01-19 2024-05-07 季华实验室 Pulse force level prediction method based on CLLSR mixed model

Similar Documents

Publication Publication Date Title
CN115330807A (en) Choroidal neovascularization image segmentation method based on hybrid convolutional network
Yue et al. Auto-detection of Alzheimer's disease using deep convolutional neural networks
CN107609503A (en) Intelligent cancerous tumor cell identifying system and method, cloud platform, server, computer
CN110930416A (en) MRI image prostate segmentation method based on U-shaped network
CN111932529B (en) Image classification and segmentation method, device and system
Oliveira et al. Deep transfer learning for segmentation of anatomical structures in chest radiographs
CN112950644B (en) Neonatal brain image segmentation method and model construction method based on deep learning
CN115147600A (en) GBM multi-mode MR image segmentation method based on classifier weight converter
Chen et al. Skin lesion segmentation using recurrent attentional convolutional networks
Lei et al. Automated detection of retinopathy of prematurity by deep attention network
Zhang et al. Multi-scale neural networks for retinal blood vessels segmentation
Cai et al. Identifying architectural distortion in mammogram images via a se-densenet model and twice transfer learning
Liu et al. Integrated learning approach based on fused segmentation information for skeletal fluorosis diagnosis and severity grading
Yang et al. RADCU-Net: Residual attention and dual-supervision cascaded U-Net for retinal blood vessel segmentation
Banerjee et al. A CADe system for gliomas in brain MRI using convolutional neural networks
Wang et al. Application of artificial intelligence methods in carotid artery segmentation: a review
Kumar et al. Multi-class Brain Tumor Classification and Segmentation using Hybrid Deep Learning Network Model
Babu et al. Machine learning algorithms for MR brain image classification
Wen et al. A-PSPNet: A novel segmentation method of renal ultrasound image
CN111583192A (en) MRI (magnetic resonance imaging) image and deep learning breast cancer image processing method and early screening system
CN115471436A (en) Ocular fundus retinal OCTA image fusion method and system
CN111275720B (en) Full end-to-end small organ image identification method based on deep learning
Taş et al. Detection of retinal diseases from ophthalmological images based on convolutional neural network architecture.
Subramanian et al. Design and Evaluation of a Deep Learning Aided Approach for Kidney Stone Detection in CT scan Images
Vanmore et al. Liver Lesions Classification System using CNN with Improved Accuracy

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination