CN117422871A - Lightweight brain tumor segmentation method and system based on V-Net - Google Patents
Lightweight brain tumor segmentation method and system based on V-Net
Info
- Publication number
- CN117422871A (application CN202311301482.XA)
- Authority
- CN
- China
- Prior art keywords
- network
- net
- segmentation
- image
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10088—Magnetic resonance imaging [MRI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30016—Brain
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
- G06V2201/032—Recognition of patterns in medical or anatomical images of protuberances, polyps nodules, etc.
Abstract
The invention provides a lightweight brain tumor segmentation method based on V-Net, which comprises the following steps. Step 1: preprocess the images to obtain images conforming to the network structure. Step 2: construct a V-Net network. Step 3: improve the V-Net network by changing the batch normalization in the network to group normalization, replacing the ordinary convolutions with depthwise separable convolutions, and adding a Squeeze-and-Excitation (SE) attention mechanism to the encoder part of the network. Step 4: train the improved network with the hybrid loss function BCEDice Loss and select the optimal network model according to the training results. Step 5: input the image data preprocessed in step 1 into the optimal network model to obtain a segmentation result. Step 6: post-process the segmentation result. The method maintains high training accuracy while shortening training time, has good segmentation performance, and is of positive significance for clinicians' diagnosis and patients' treatment.
Description
Technical Field
The invention belongs to the technical fields of deep learning, computer vision, and medical image processing, and particularly relates to a V-Net-based lightweight brain tumor segmentation method and system.
Background
Brain tumors are among the diseases that most seriously endanger patients' lives. A brain tumor is an abnormal cell mass growing inside the cranium; it has an irregular shape and uncertain volume, can appear anywhere in the brain, and can cause serious dysfunction of the human nervous system. Among brain tumors, glioma is the most common craniocerebral tumor, characterized by high morbidity, high recurrence, high mortality, and a low cure rate. Gliomas are classified into high-grade gliomas (HGG) and low-grade gliomas (LGG) according to the extent of invasion and the patient's prognosis: high-grade gliomas have a higher mortality rate, while low-grade gliomas develop more slowly. Early diagnosis and timely treatment of low-grade tumors are therefore of great significance for increasing patients' chances of survival, improving their quality of life, and prolonging their lives.
Magnetic resonance imaging (MRI) acquires and reconstructs information about the human body through the magnetic resonance phenomenon and can obtain brain images with higher contrast than CT imaging. MRI is a non-invasive technique that is harmless to the human body: it provides complete images without craniotomy, offers good soft-tissue contrast, shows tissue structure at higher resolution, and is widely applied in clinical diagnosis and treatment.
With the development of deep learning and the advancement of related hardware, deep learning methods have been successfully applied to medical imaging, and segmentation with deep learning models compensates for the time-consuming, labor-intensive character of manual segmentation. At present, deep-learning brain tumor image segmentation mainly uses two kinds of networks, 2D convolutional networks and 3D convolutional networks; the 2D convolutional networks mainly include FCN, U-Net, U-Net++, and the like. In the 2D method, the 3D MRI volume is decomposed into many 2D slices, each slice is passed into a segmentation model that generates a segmentation per slice, and the two-dimensional slices are then recombined into a segmented three-dimensional volume. The disadvantage of this approach is that significant context information in the 3D image is lost.
In summary, providing an MRI image segmentation method based on a 3D convolutional network is a problem to be solved urgently.
Disclosure of Invention
In view of the above, the invention discloses a lightweight brain tumor segmentation method and system based on V-Net, so as to obtain a network model with higher training accuracy and faster training speed.
The technical scheme of the invention is as follows: a lightweight brain tumor segmentation method based on V-Net comprises the following steps:
step 1: preprocessing an image to obtain an image conforming to a network structure;
step 2: constructing a V-Net network; the V-Net network consists of an encoder and a decoder;
the encoder is used for extracting features from the original input image, and the decoder is used for restoring the extracted features into segmentation results;
wherein the encoder is composed of a plurality of DownTransition modules and the decoder is composed of a plurality of UpTransition modules; the encoder and the decoder are connected through skip connections;
the DownTransition module is used for gradually reducing the size of the feature map and gradually increasing the number of feature channels, so that the encoder can capture information at different levels;
the UpTransition module is used for gradually restoring the feature map to the same size as the input image while reducing the number of feature channels;
the skip connections are used for connecting the feature maps of the encoder with the feature maps of the corresponding decoder layers, so that the network can better transmit information;
step 3: improving the V-Net network;
changing batch normalization in the V-Net network into group normalization; replacing the ordinary convolutions in the network with depthwise separable convolutions;
adding a Squeeze-and-Excitation (SE) attention mechanism into the encoder part of the network;
step 4: training the improved network by adopting a mixed Loss function BCEDice Loss; selecting an optimal network model according to the training result;
step 5: inputting the image data preprocessed in the step 1 into an optimal network model to obtain a segmentation result;
step 6: and carrying out post-processing on the segmentation result.
Specifically, the image preprocessing of step 1 includes: sequentially standardizing, cropping, and blocking the brain tumor MRI data set.
Specifically, the hybrid loss function BCEDice Loss combines the binary cross-entropy loss and the Dice coefficient loss, and the final loss is calculated as their linear combination;
the binary cross-entropy loss is calculated as follows:
$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_{true}(i)\log y_{pred}(i) + \big(1-y_{true}(i)\big)\log\big(1-y_{pred}(i)\big)\Big]$$
where N represents the number of samples, y_pred(i) is the probability predicted by the model that sample i is a positive example, y_true(i) is the actual label of the sample, taking the value 0 or 1, and log denotes the natural logarithm;
the Dice coefficient loss function is calculated as follows:
$$L_{Dice} = 1 - \frac{2\sum_{i=1}^{N} y_{pred}(i)\,y_{true}(i) + \epsilon}{\sum_{i=1}^{N} y_{pred}(i) + \sum_{i=1}^{N} y_{true}(i) + \epsilon}$$
where N, y_pred(i), and y_true(i) are as above, and ε is a small smoothing constant that prevents division by zero;
thus, the hybrid loss function BCEDice Loss is calculated as:
$$L_{BCEDice} = \alpha L_{BCE} + \beta L_{Dice}$$
where α and β both take the value 0.5.
Specifically, the post-processing of step 6 includes removing noise, filling holes, and smoothing the segmentation boundaries.
The invention also provides a lightweight brain tumor segmentation system based on V-Net, which comprises:
an image preprocessing module: the method comprises the steps of preprocessing an image data set to be segmented to obtain an image conforming to a network structure;
and a network construction module: for constructing a V-Net network;
V-Net network improvement module: the method is used for reducing network parameters, improving the network training speed and simultaneously keeping higher precision;
and the network training module: training the improved network by adopting a mixed Loss function BCEDice Loss;
and a post-processing module: for further processing of the segmentation results.
The invention provides a V-Net-based lightweight brain tumor segmentation method and system that greatly reduce the amount of computation while accelerating training; on the basis of a shortened training time they maintain high training accuracy, deliver good segmentation performance, and are of positive significance for clinicians' diagnosis and patients' treatment.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a diagram of the four BraTS modalities provided by an embodiment of the present disclosure;
FIG. 2 is a block diagram of preprocessed data provided by an embodiment of the present disclosure;
FIG. 3 is a block diagram of batch normalization and group normalization provided by an embodiment of the present disclosure;
FIG. 4 is a diagram of a generic convolution process provided by an embodiment of the present disclosure;
FIG. 5 is a diagram of a depth separable convolution process provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart of an SE attention mechanism algorithm provided by an embodiment of the present disclosure;
FIG. 7 is a network block diagram provided by an embodiment of the present disclosure;
fig. 8 is a graph showing comparison of segmentation effects provided by the disclosed embodiments of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of systems consistent with aspects of the invention as detailed in the accompanying claims.
The 3D convolutional network methods mainly include 3D U-Net, V-Net, and the like. In the 3D method, the whole MRI volume is ideally input into the segmentation model to obtain the 3D segmentation result of the entire MRI. Due to limitations of hardware and other resources, however, the MRI image is typically cut into several blocks that are fed into the segmentation model separately; the outputs are finally aggregated to form a segmentation map of the entire volume, which preserves a certain amount of context information.
The embodiment firstly provides a light-weight brain tumor segmentation method based on V-Net, which comprises the following steps:
step 1: preprocessing an image to obtain an image conforming to a network structure;
specifically, brain tumor MRI data were preprocessed, and the data set, the BraTS2018 data set, was normalized, cut and blocked. So that the processed image meets the requirement of the network structure.
In this embodiment: the training set and validation set of the experiments of the present invention were from the BraTs2018 dataset, the training set contained 210 advanced glioma patient samples, 75 lower glioma patient samples, and the validation set contained 66 unlabeled patient samples, wherein each sample in the training set in turn contained brain MRI images of 4 different modalities (T1, T2, T1ce, flair) and corresponding true label images. However, the BraTs only disclose training set data, and have no test set data, and if a part of the data in the training set is split to be used as a test set, the fitting phenomenon can occur due to too little training data, so that the network generalization capability is poor. To solve the problem of less data, we selected as the test set the increased fraction of the BraTs2019 training set over the BraTs2018 training set. Which contained 49 higher glioma patient samples and 1 lower glioma patient sample.
The preprocessing steps mainly comprise:
(1) Manually add 5 black slices. For the four modality images (155,240,240) and the corresponding masks (155,240,240), 3 black slices are added in front and 2 behind, uniformly modifying the image size to (160,240,240).
(2) Standardization. BraTS provides MR images of four sequences — T1, T2, FLAIR, and T1ce — which are images of different modalities with different contrasts, so each modality image is normalized with the z-score: the mean is subtracted from the image, which is then divided by the standard deviation.
(3) Cropping. The grey part of a BraTS MR image is the brain region and the black part is the background. The background occupies a large proportion of the whole image and does not help segmentation; a doctor viewing an MR image automatically filters out this background and focuses on the brain region. It is therefore necessary to remove the background around the brain region; after cropping, the network parameters become smaller and network performance improves. Here the image of original size (160,240,240) is cropped to (160,160,160).
(4) Blocking. Because limited GPU memory prevents the whole image from being input into the network, the image and the corresponding mask need to be divided into blocks. The cropped image and mask size is (160,160,160); in this experiment, 5 blocks of size (32,160,160) are divided along the axial direction.
(5) Merge and save the data. The four standardized and blocked modalities are merged into four channels; the saved shape is (32,160,160,4) with dtype float64. After the corresponding mask is likewise divided into blocks, the three labels are combined into three nested subregions — WT, TC, and ET — which are merged into three channels with values 0 or 1; the saved shape is (32,160,160,3) with dtype uint8.
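As a concrete illustration, the following is a minimal sketch of this preprocessing pipeline in Python with NumPy. The function names, the fixed central crop window (40:200), and the BraTS label values {1, 2, 4} used to build the nested subregions are assumptions for illustration — the patent does not specify them:

```python
import numpy as np

def zscore(volume):
    # z-score standardization: subtract the mean, divide by the standard deviation
    return (volume - volume.mean()) / (volume.std() + 1e-8)

def preprocess_case(modalities, mask):
    """modalities: 4 arrays of shape (155, 240, 240); mask: (155, 240, 240)."""
    channels = []
    for vol in modalities:
        vol = np.pad(vol, ((3, 2), (0, 0), (0, 0)))   # 3 black slices in front, 2 behind
        vol = zscore(vol)
        vol = vol[:, 40:200, 40:200]                  # crop background -> (160, 160, 160)
        channels.append(vol)
    image = np.stack(channels, axis=-1)               # (160, 160, 160, 4)
    mask = np.pad(mask, ((3, 2), (0, 0), (0, 0)))[:, 40:200, 40:200]
    # nested subregions from the BraTS labels: WT, TC, ET channels with values 0 or 1
    wt, tc, et = mask > 0, np.isin(mask, (1, 4)), mask == 4
    target = np.stack([wt, tc, et], axis=-1).astype(np.uint8)
    # 5 axial blocks of shape (32, 160, 160, C)
    return [(image[z:z + 32].astype(np.float64), target[z:z + 32])
            for z in range(0, 160, 32)]
```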
Step 2: construct the V-Net network. V-Net consists of two parts, an encoder and a decoder, which respectively extract features from the original input image and map the extracted features back to a segmentation result. V-Net uses 3D convolution to process three-dimensional volume data. It also introduces skip connections: by connecting the encoder's feature maps with the feature maps of the corresponding decoder layers, the network can better retain and transfer detail information. The encoder of V-Net consists of multiple DownTransition modules, each comprising a downsampling operation (typically a 3D convolution with stride 2), batch normalization, a ReLU activation, one or more convolution operations (typically with 5×5×5 kernels), and an optional Dropout layer. DownTransition progressively reduces the feature-map size and increases the number of feature channels so that the encoder captures information at different levels. The decoder of V-Net consists of multiple UpTransition modules, each comprising an upsampling operation (typically a transposed convolution with stride 2 or bilinear interpolation), normalization, a ReLU activation, and one or more convolution operations; the decoder progressively restores the feature map to the size of the input image while reducing the number of feature channels. The skip connections between encoder and decoder let the network access the encoder's low-level feature information during decoding, thereby helping the network better restore details and boundaries.
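A minimal PyTorch sketch of the two module types described above. The channel counts, the residual addition, and the use of batch normalization here follow the standard V-Net design as assumptions; step 3 below replaces the normalization and convolutions:

```python
import torch
import torch.nn as nn

def conv_block(ch):
    # one 5x5x5 convolution stage: Conv3d -> BatchNorm -> ReLU
    return nn.Sequential(nn.Conv3d(ch, ch, 5, padding=2),
                         nn.BatchNorm3d(ch), nn.ReLU(inplace=True))

class DownTransition(nn.Module):
    """Halves the spatial size, doubles the channels, then stacks convolutions."""
    def __init__(self, in_ch, n_convs=2):
        super().__init__()
        out_ch = in_ch * 2
        self.down = nn.Sequential(nn.Conv3d(in_ch, out_ch, 2, stride=2),
                                  nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True))
        self.convs = nn.Sequential(*[conv_block(out_ch) for _ in range(n_convs)])

    def forward(self, x):
        x = self.down(x)
        return self.convs(x) + x              # residual connection within the stage

class UpTransition(nn.Module):
    """Doubles the spatial size, halves the channels, concatenates the skip features."""
    def __init__(self, in_ch, n_convs=2):
        super().__init__()
        out_ch = in_ch // 2
        self.up = nn.Sequential(nn.ConvTranspose3d(in_ch, out_ch, 2, stride=2),
                                nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True))
        self.convs = nn.Sequential(*[conv_block(out_ch * 2) for _ in range(n_convs)])

    def forward(self, x, skip):
        x = torch.cat([self.up(x), skip], dim=1)   # skip connection from the encoder
        return self.convs(x)
```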
Step 3: improve the V-Net network. First, the batch normalization (BN) in the network is replaced with group normalization (GN). The traditional V-Net network uses BN, which normalizes each channel over N, H, and W independently; this speeds up training and strengthens the network's generalization ability. However, BN's accuracy is strongly affected by changes in the batch size N: the smaller the batch, the less the computed mean and variance represent the global statistics and the higher the error rate, while enlarging the batch raises the computer's memory requirements. Because medical image volumes are large, the batch size is generally set quite small. In this situation the computation of GN is not affected by the batch size, and its accuracy remains stable for any batch value, so GN is better suited to medical image segmentation experiments; therefore, in the experiments of the invention, the batch normalization in the V-Net network is modified to group normalization. The basic idea of GN is as follows: divide the features extracted by a certain layer into G groups, normalize the features within each group, and finally merge the G normalized groups.
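In PyTorch this substitution is a one-line change per normalization layer; a sketch (the group count G=8 is an assumption — the patent does not state G):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 16, 64, 64)                 # (batch, channels, D, H, W)
gn = nn.GroupNorm(num_groups=8, num_channels=32)   # 32 channels split into 8 groups of 4
print(gn(x).shape)                                 # torch.Size([1, 32, 16, 64, 64])
# nn.BatchNorm3d(32) would instead compute statistics across the batch, so its
# accuracy degrades for the small batch sizes typical of 3D medical volumes.
```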
Second, the ordinary convolutions in the network are replaced with depthwise separable convolutions, and a Squeeze-and-Excitation (SE) attention mechanism is added to the encoder part of the network. The network thereby sheds a large number of parameters and trains faster while maintaining high accuracy.
A depthwise separable convolution splits one convolution into two independent convolutions and mainly comprises two steps: a depthwise convolution and a pointwise convolution. The depthwise convolution convolves each channel of the input feature map separately; the pointwise convolution then convolves across all channels using a 1×1 kernel. As shown in the figure, let the input feature map have size $D_K \times D_K \times M$ and the ordinary convolution kernel have size $D_F \times D_F$ with $N$ output channels; the computation of the ordinary convolution over the feature map is $D_F \times D_F \times M \times N \times D_K \times D_K$. The depthwise kernel has size $D_F \times D_F \times 1 \times M$, and its convolution with the feature map costs $D_F \times D_F \times M \times D_K \times D_K$. The pointwise kernel has size $1 \times 1 \times M \times N$, and its convolution costs $M \times N \times D_K \times D_K$. Through these two steps the total computation is $D_F \times D_F \times M \times D_K \times D_K + M \times N \times D_K \times D_K$, whereas the ordinary convolution costs $D_F \times D_F \times M \times N \times D_K \times D_K$; compared with the latter, the computation of the depthwise separable convolution is clearly reduced, so changing the ordinary convolutions in the convolution units into depthwise separable convolutions can greatly increase the computation speed of the network model.
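A sketch of a 3D depthwise separable convolution in PyTorch, with a parameter-count comparison against an ordinary 5×5×5 convolution (the module is a generic illustration under the assumptions above, not the patent's exact convolution unit):

```python
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=5, padding=2):
        super().__init__()
        # depthwise step: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        # pointwise step: 1x1x1 convolution mixes information across channels
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(nn.Conv3d(32, 64, 5, padding=2)))   # 256064 parameters
print(count(DepthwiseSeparableConv3d(32, 64)))  # 6144 parameters
```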
A Squeeze-and-Excitation (SE) attention mechanism is added after the convolutions of each layer of the V-Net encoder.
Depthwise separable convolution reduces the number of parameters and the computation to some extent, but it can sometimes limit the feature representation ability of the network and may not adequately capture complex features of the data, thereby affecting model performance. The SE attention mechanism dynamically adjusts the importance of features across channels so that the network pays more attention to the important features, strengthening feature expression; SE attention can also help the network better capture key information in the image, improving model performance. The basic idea of SE attention is to learn a weight for each channel from globally pooled information, in two main steps: Squeeze and Excitation. Given an input feature map X, the operation Ftr produces a feature map U. In the Squeeze step (Fsq(·)), U undergoes global average pooling to produce a 1×1×C vector in which each channel is represented by a single number. The Excitation step (Fex(·)) is completed by two fully connected layers whose learned weights W model the correlations among the features and generate the required weight information. Finally, in Fscale(·), the generated weight vector S reweights the feature map U channel by channel to obtain the output feature map X', whose size is identical to U — the SE module does not change the feature-map size. Combining Squeeze and Excitation, SE attention rescales each channel of the feature map by the learned weights: channels with higher weights receive more attention while channels with lower weights are suppressed. SE attention thus helps the network focus on features that are meaningful to the task, improving network performance. The SE module's parameter count is small, so the lightweight character of the network is not affected.
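A sketch of an SE block for 3D feature maps (the reduction ratio r=16 follows the original Squeeze-and-Excitation paper and is an assumption here):

```python
import torch
import torch.nn as nn

class SEBlock3d(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool3d(1)    # Fsq: global average pooling
        self.excite = nn.Sequential(              # Fex: two fully connected layers
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c = x.shape[:2]
        s = self.squeeze(x).view(b, c)            # one number per channel
        w = self.excite(s).view(b, c, 1, 1, 1)    # learned channel weights in (0, 1)
        return x * w                              # Fscale: rescale; size of x unchanged
```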
Step 4: training the improved network by adopting a mixed Loss function BCEDice Loss;
since brain tumor segmentation is a very challenging task, it is mainly expressed in that brain tumor segmentation usually involves multiple classes, the distribution of different classes in the image is very different, resulting in a class imbalance problem, which makes the model more prone to predict larger classes and smaller classes may be ignored, affecting the segmentation effect. Furthermore, the shape and size of brain tumors vary greatly from patient to patient and from case to case, making it difficult for the model to capture all shape details and boundaries, especially for small or blurred-edge tumors. And the binary cross entropy Loss and the Dice coefficient Loss are combined together by using the BCEDice Loss, the final Loss is calculated through linear combination, the classification accuracy of the pixel level (through the binary cross entropy) and the overlapping degree of the segmentation result (through the Dice coefficient) can be comprehensively considered, and in the brain tumor segmentation task with unbalanced categories, the BCEDice Loss can better balance the two losses, so that the performance of the model is improved.
The binary cross-entropy loss function (Binary Cross Entropy Loss) is a loss function commonly used for binary classification tasks to measure the difference between the predicted probability and the true label. In brain tumor segmentation, pixel-level segmentation is a classification problem — each pixel belongs either to the tumor region or to the non-tumor region — so the binary cross-entropy loss helps the network learn correct pixel classification. Its calculation formula is:
$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_{true}(i)\log y_{pred}(i) + \big(1-y_{true}(i)\big)\log\big(1-y_{pred}(i)\big)\Big]$$
where N represents the number of samples, y_pred(i) is the probability predicted by the model that sample i is a positive example, y_true(i) is the actual label of the sample, taking the value 0 or 1, and log denotes the natural logarithm.
Dice coefficient loss function: the Dice coefficient is an index for evaluating the similarity of two sets and measures the overlap between the predicted segmentation result and the true segmentation. In brain tumor segmentation tasks, the Dice coefficient is widely used to measure segmentation accuracy and is particularly effective for class-imbalanced tasks. The formula is:
$$L_{Dice} = 1 - \frac{2\sum_{i=1}^{N} y_{pred}(i)\,y_{true}(i) + \epsilon}{\sum_{i=1}^{N} y_{pred}(i) + \sum_{i=1}^{N} y_{true}(i) + \epsilon}$$
where N, y_pred(i), and y_true(i) are as above, and ε is a small smoothing constant that prevents division by zero.
The final loss function is:
$$L_{BCEDice} = \alpha L_{BCE} + \beta L_{Dice}$$
In the invention, α and β both take the value 0.5.
Post-processing the segmentation result:
In brain tumor segmentation tasks the V-Net network typically applies post-processing to further optimize the segmentation result. The goal of post-processing is to perform a number of operations on the segmentation mask output by the model, including removing noise, filling holes, and smoothing the segmentation boundaries, thereby obtaining a more accurate segmentation result.
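A sketch of such post-processing with SciPy and scikit-image (the minimum component size and the closing radius are illustrative assumptions):

```python
import numpy as np
from scipy import ndimage
from skimage import morphology

def postprocess(mask, min_size=500):
    """mask: binary 3D array obtained by thresholding the network output."""
    mask = mask.astype(bool)
    mask = morphology.remove_small_objects(mask, min_size=min_size)  # remove noise
    mask = ndimage.binary_fill_holes(mask)                           # fill holes
    mask = morphology.binary_closing(mask, morphology.ball(2))       # smooth boundaries
    return mask.astype(np.uint8)
```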
An optimal network model is selected according to the training results; Table 1 compares the segmentation performance of various algorithms on the BraTS data set:
the training set is randomly split into five equal parts, and five-fold cross validation is carried out, as shown in a table I, the average Dice value of the algorithm in the whole area, the enhancement area and the core area respectively reaches 90.10%, 90.10% and 89.42%. NVDLMED in the table is the algorithm for the first name of the BraTS2018 race. As can be seen from the table, the algorithm of the invention is respectively reduced by 12.5 times and 32.1 times compared with V-Net in terms of calculated amount and parameter amount, and is respectively reduced by 21.8 times and 28.6 times compared with NVDLMED, and the calculated amount and parameter amount are obviously reduced. However, in terms of computational accuracy, the algorithm of the invention is slightly lower than V-Net in the whole region (WT) by 1.89%, but is respectively higher than V-Net in the enhanced tumor region (ET) and the tumor core region (TC) by 1.71% and 0.52%, respectively, and is respectively higher than NVDLMED by 8.72% and 3.46%. In comprehensive view, the algorithm of the invention can greatly reduce the calculated amount, quicken the training speed, and can keep higher training precision on the basis of shortening the training time, has good segmentation performance, and has positive significance for diagnosis of clinicians and treatment of patients.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.
Claims (5)
1. A lightweight brain tumor segmentation method based on V-Net comprises the following steps:
step 1: preprocessing an image to obtain an image conforming to a network structure;
step 2: constructing a V-Net network; the V-Net network consists of an encoder and a decoder;
the encoder is used for extracting features from the original input image, and the decoder is used for restoring the extracted features into segmentation results;
wherein the encoder is composed of a plurality of DownTransition modules and the decoder is composed of a plurality of UpTransition modules; the feature maps are spliced between the encoder and the decoder by means of skip connections;
the DownTransition module is used for gradually reducing the size of the feature map and gradually increasing the number of feature channels, so that the encoder can capture information at different levels;
the UpTransition module is used for gradually restoring the feature map to the same size as the input image while reducing the number of feature channels;
the skip connections are used for connecting the feature maps of the encoder with the feature maps of the corresponding decoder layers, so that the network can better transmit information;
step 3: improving the V-Net network;
changing batch normalization in the V-Net network into group normalization; replacing the ordinary convolutions in the network with depthwise separable convolutions;
adding a Squeeze-and-Excitation (SE) attention mechanism into the encoder part of the network;
step 4: training the improved network by adopting a mixed Loss function BCEDice Loss; selecting an optimal network model according to the training result;
step 5: inputting the image data preprocessed in the step 1 into an optimal network model to obtain a segmentation result;
step 6: and carrying out post-processing on the segmentation result.
2. The V-Net-based lightweight brain tumor segmentation method according to claim 1, wherein
the image preprocessing of step 1 comprises: sequentially standardizing, cropping, and blocking the brain tumor MRI data set.
3. The V-Net-based lightweight brain tumor segmentation method according to claim 1, wherein the hybrid loss function BCEDice Loss combines the binary cross-entropy loss and the Dice coefficient loss, and the final loss is calculated as their linear combination;
the binary cross-entropy loss is calculated as follows:
$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_{true}(i)\log y_{pred}(i) + \big(1-y_{true}(i)\big)\log\big(1-y_{pred}(i)\big)\Big]$$
where N represents the number of samples, y_pred(i) is the probability predicted by the model that sample i is a positive example, y_true(i) is the actual label of the sample, taking the value 0 or 1, and log denotes the natural logarithm;
the Dice coefficient loss function is calculated as follows:
$$L_{Dice} = 1 - \frac{2\sum_{i=1}^{N} y_{pred}(i)\,y_{true}(i) + \epsilon}{\sum_{i=1}^{N} y_{pred}(i) + \sum_{i=1}^{N} y_{true}(i) + \epsilon}$$
where N, y_pred(i), and y_true(i) are as above, and ε is a small smoothing constant;
thus, the hybrid loss function BCEDice Loss is calculated as:
$$L_{BCEDice} = \alpha L_{BCE} + \beta L_{Dice}$$
where α and β both take the value 0.5.
4. The V-Net-based lightweight brain tumor segmentation method according to claim 1, wherein the post-processing of step 6 comprises removing noise, filling holes, and smoothing the segmentation boundaries.
5. A lightweight brain tumor segmentation system based on V-Net, characterized by comprising:
An image preprocessing module: the method comprises the steps of preprocessing an image data set to be segmented to obtain an image conforming to a network structure;
and a network construction module: for constructing a V-Net network;
V-Net network improvement module: the method is used for reducing network parameters, improving the network training speed and simultaneously keeping higher precision;
and the network training module: training the improved network by adopting a mixed Loss function BCEDice Loss;
and a post-processing module: for further processing of the segmentation results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311301482.XA CN117422871A (en) | 2023-10-10 | 2023-10-10 | Lightweight brain tumor segmentation method and system based on V-Net |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311301482.XA CN117422871A (en) | 2023-10-10 | 2023-10-10 | Lightweight brain tumor segmentation method and system based on V-Net |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117422871A (en) | 2024-01-19
Family
ID=89527528
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311301482.XA Pending CN117422871A (en) | 2023-10-10 | 2023-10-10 | Lightweight brain tumor segmentation method and system based on V-Net |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117422871A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117911797A (en) * | 2024-03-19 | 2024-04-19 | 武汉理工大学 | Crop CT image semiautomatic labeling method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112116605B (en) | Pancreas CT image segmentation method based on integrated depth convolution neural network | |
CN113808146B (en) | Multi-organ segmentation method and system for medical image | |
CN109584244B (en) | Hippocampus segmentation method based on sequence learning | |
Klibisz et al. | Fast, simple calcium imaging segmentation with fully convolutional networks | |
Aranguren et al. | Improving the segmentation of magnetic resonance brain images using the LSHADE optimization algorithm | |
CN110889853A (en) | Tumor segmentation method based on residual error-attention deep neural network | |
CN110706214B (en) | Three-dimensional U-Net brain tumor segmentation method fusing condition randomness and residual error | |
CN111696126B (en) | Multi-view-angle-based multi-task liver tumor image segmentation method | |
CN114782350A (en) | Multi-modal feature fusion MRI brain tumor image segmentation method based on attention mechanism | |
CN110619641A (en) | Automatic segmentation method of three-dimensional breast cancer nuclear magnetic resonance image tumor region based on deep learning | |
Shahsavari et al. | Proposing a novel Cascade Ensemble Super Resolution Generative Adversarial Network (CESR-GAN) method for the reconstruction of super-resolution skin lesion images | |
CN117422871A (en) | Lightweight brain tumor segmentation method and system based on V-Net | |
CN117058307A (en) | Method, system, equipment and storage medium for generating heart three-dimensional nuclear magnetic resonance image | |
CN116091412A (en) | Method for segmenting tumor from PET/CT image | |
CN114926396A (en) | Mental disorder magnetic resonance image preliminary screening model construction method | |
Kumaraswamy et al. | Automatic prostate segmentation of magnetic resonance imaging using Res-Net | |
CN117746042A (en) | Liver tumor CT image segmentation method based on APA-UNet | |
CN111798455B (en) | Thyroid nodule real-time segmentation method based on full convolution dense cavity network | |
CN116934721A (en) | Kidney tumor segmentation method based on multi-scale feature extraction | |
CN116030043A (en) | Multi-mode medical image segmentation method | |
Rao et al. | Weight pruning-UNet: Weight pruning UNet with depth-wise separable convolutions for semantic segmentation of kidney tumors | |
CN112766333B (en) | Medical image processing model training method, medical image processing method and device | |
CN116128890A (en) | Pathological cell image segmentation method and system based on self-adaptive fusion module and cross-stage AU-Net network | |
CN113327221A (en) | Image synthesis method and device fusing ROI (region of interest), electronic equipment and medium | |
CN111932486A (en) | Brain glioma segmentation method based on 3D convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||