WO2021184817A1 - Method for segmenting liver and focus thereof in medical image - Google Patents

Method for segmenting liver and focus thereof in medical image

Info

Publication number
WO2021184817A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
liver
loss
data
network
Prior art date
Application number
PCT/CN2020/131402
Other languages
French (fr)
Chinese (zh)
Inventor
奚雪峰
郑志华
程成
崔志明
胡伏原
付保川
Original Assignee
苏州科技大学
苏州金比特信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州科技大学, 苏州金比特信息科技有限公司
Publication of WO2021184817A1 publication Critical patent/WO2021184817A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration using local operators
    • G06T 5/30 Erosion or dilatation, e.g. thinning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10072 Tomographic images
    • G06T 2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30056 Liver; Hepatic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The present invention relates to a method for segmenting the liver and its lesions in medical images. The method comprises: first screening, integrating and preprocessing abdominal CT image data and dividing the data into several data sets for different purposes; building a new neural network and performing initial training with reduced-size image data; then saving the trained model, performing secondary training with the original-size images and a new data augmentation scheme, applying dilation and erosion to the predicted images, and evaluating the results with medical evaluation metrics; obtaining prediction results from models trained with the DL, GDL and TL loss functions respectively, and adding and averaging the predictions of the three loss models to form a fused feature; and finally modifying the network so that the three loss models are fused into a single network for training and prediction. The method supports end-to-end training and testing, identifies the liver and its lesions simultaneously with high precision and high speed, and effectively helps doctors interpret CT images, reducing the time and effort required and the probability of misdiagnosis.

Description

A method for segmenting the liver and its lesions in medical images

Technical field

The invention relates to a method for segmenting the liver and its lesions in medical images.

Background art

Liver disease is currently among the diseases with the highest morbidity and mortality in the world. If liver disease is detected at an early stage and the lesion can be located in time, the lesion can be controlled and its metastasis avoided, which is of great significance for the treatment of liver disease. The advent of CT imaging has greatly improved doctors' diagnostic capability, but locating a lesion still requires a doctor with a deep professional background and rich clinical experience, and diagnosing a patient's condition is very time-consuming. With the rapid development of computer vision technology, segmentation algorithms based on regions, thresholding and machine learning have emerged, and research on semantic image segmentation has made great progress. Medical image segmentation can accurately determine the location and size of a lesion, but its accuracy still needs to be improved.

FCN uses deconvolution to restore the feature maps shrunk by convolution to the original image size, turning segmentation into pixel-level classification, and it can process images of arbitrary size. However, directly upsampling the features combines deep and shallow information unevenly and loses key feature information. U-Net is an encoder-decoder network: it first extracts features and then upsamples to restore resolution, concatenates features that have the same number of channels at different scales, and fuses feature information across scales through skip connections, so a good model can be trained with relatively little data; it was later widely used for segmenting very large images and medical images. On the basis of U-Net, several problems remain to be solved. U-Net generally adopts a five-layer structure; simple data can be handled by a shallow network and complex data can be optimized by deepening the network, but how deep a network is most suitable has not been resolved; the importance of each layer is not clear, and the required network depth is not specified; and the short connections between layers alone cannot effectively fuse deep and shallow features. U-Net++ modifies the direct forwarding of high-resolution feature maps from the encoder to the decoder network, effectively bridging the semantic gap between encoder and decoder.
Summary of the invention

The purpose of the present invention is to overcome the shortcomings of traditional neural networks used for medical image segmentation with respect to network depth, the importance of different depths and the rationality of skip connections, and to provide a method for segmenting the liver and its lesions in medical images.

The purpose of the present invention is achieved through the following technical solutions:

A method for segmenting the liver and its lesions in medical images, characterized in that:
First, the abdominal CT image data are screened, integrated and preprocessed and divided into several data sets for different purposes; a new neural network is then built and initially trained with reduced-size image data;

After that, the trained model is saved, secondary training is performed with the original-size images and a new data augmentation scheme, the predicted images are processed by dilation and erosion, and the results are evaluated with medical evaluation metrics;

Prediction results are obtained from models trained with the DL, GDL and TL loss functions respectively, and the predictions of the three loss models are added and averaged to form a fused feature; finally, the network is modified so that the three loss models are fused into a single network for training and prediction.
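As an illustration of the fusion step described above, the following minimal Python sketch averages the probability maps predicted by the three loss-specific models and thresholds the result; the function name, argument names and the 0.5 threshold are assumptions made here for clarity rather than details taken from the patent:

```python
import numpy as np

def fuse_predictions(pred_dl, pred_gdl, pred_tl, threshold=0.5):
    """Average the probability maps of the DL-, GDL- and TL-trained models
    and binarize the fused map to obtain the final segmentation mask."""
    fused = (pred_dl + pred_gdl + pred_tl) / 3.0  # element-wise mean of the three predictions
    return (fused >= threshold).astype(np.uint8)
```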
Further, the above method for segmenting the liver and its lesions in medical images specifically comprises the following steps:

a) First, data screening and integration;

Slices that do not contain the liver are removed from the training data set, which is then shuffled into 19,000-20,000 3D slices; a 3D slice is the current slice together with the slice before and the slice after it taken as a whole input. Of these, 17,000-18,000 slices are selected as the training set and the remaining 1,800-1,900 slices as the validation set, while 70 patient sequences are used for testing; the training images are used at sizes of 224×224 and 512×512;

b) Then, a new neural network is built and initial training is performed with the reduced-size image data;

The U-shaped path of U-Net is set as the main path, and the full path plus a ResNet structure forms the encoder-decoder structure; the dense skip connections are replaced, on the basis of DenseNet, with 1×1 convolutional layers. In the transition zone between the liver and the lesion, the information output by the liver branch becomes the input of the lesion branch after convolution, and the outputs of the other liver layers are short-connected to the lesion inputs at the corresponding depths;

The data reduced to 224×224 are trained through the network so that an effective weight distribution can be applied to subsequent model training; the resized images are trained for 40-60 rounds, with 12-16 slices in each round, and during training the images are rotated, enlarged and reduced, combined with random probability;

c) Next, secondary training is performed with the original images and a new data augmentation scheme;

After the model has been trained on the reduced image data, the network structure and weight distribution are retained, the original images are augmented with a random combination of rotation, scaling, flipping and stretching, and secondary training is performed with a new learning rate;

d) Finally, different medical evaluation results are obtained by adjusting the way the loss functions are combined;

Different evaluation results are obtained by using a single optimal loss-function model and a loss model combining weighting and similarity as supervision signals at different layers.
Further, in the above method for segmenting the liver and its lesions in medical images, step c) performs secondary training with the original images and a new data augmentation scheme. The original images are 512×512 in size; they are rotated, scaled, flipped and stretched, combined with random probability, and an exponentially decaying learning rate is used, with the decay adjusted every round according to the following equation:
$$\mathrm{decayed\_learning\_rate} = \mathrm{learning\_rate} \times \mathrm{decay\_rate}^{\,\mathrm{global\_step}/\mathrm{decay\_steps}}$$
In the above formula, the decayed learning rate decayed_learning_rate is obtained by multiplying the initially set learning rate learning_rate by the decay base decay_rate raised to the power global_step/decay_steps. The decay rate is set to 0.8-0.9, global_step is the current training step, and decay_steps is the number of steps needed to iterate over all the sample data once, i.e. one round. The initial learning rate is set to 1e-3 to 3e-3, and to 1e-4 to 3e-4 when training on the original-size images; as a result, the learning rate decays with base 0.8-0.9 according to the number of steps per round.
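For illustration, the schedule above can be expressed as a short Python function; the function name and the example values are chosen here for clarity and are not prescribed by the patent:

```python
def exponential_decay(learning_rate, decay_rate, global_step, decay_steps):
    """Exponentially decayed learning rate.

    learning_rate: initial learning rate (e.g. 1e-3 to 3e-3, or 1e-4 to 3e-4
                   when fine-tuning on the original 512x512 images)
    decay_rate:    base of the decay, e.g. 0.8 to 0.9
    global_step:   current training step
    decay_steps:   number of steps in one round (one pass over all the sample data)
    """
    return learning_rate * decay_rate ** (global_step / decay_steps)

# Example: after 3 rounds with 1200 steps per round, starting from 2e-3 with base 0.9
lr = exponential_decay(2e-3, 0.9, global_step=3 * 1200, decay_steps=1200)
```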
Further, in the above method for segmenting the liver and its lesions in medical images, step d) obtains different medical evaluation results by adjusting the way the loss functions are combined. The loss functions are DL, GDL and TL; the three loss functions used are given by the formulas below, and the loss functions suited to the liver and to the lesion are selected respectively. DL is used to evaluate the similarity between the predicted set and the ground-truth set and is suited to the case of unbalanced samples; its expression is as follows:
$$DL = 1 - \frac{2\sum_{ij} k_{ij}\,t_{ij}}{\sum_{ij} k_{ij}^{2} + \sum_{ij} t_{ij}^{2}}$$
The quantitative calculation of the denominator squares the elements and then sums them, where k and t denote the elements of the predicted region and of the ground-truth region respectively and ij denotes traversal of those elements. The underlying Dice measure is a set-similarity function, usually used to compute the similarity of two samples, with a range of [0, 1]; the factor of 2 in the numerator compensates for the common elements of k and t being counted twice in the denominator, and the loss value is finally obtained from twice the element-wise product for each category divided by the sum of the squares of the respective elements;

GDL (Generalized Dice loss): when the liver lesion has several segmented regions there is one Dice score for each category, whereas GDL integrates the categories and quantifies them with a single index. The formula is as follows:
$$GDL = 1 - 2\,\frac{\sum_{i} w_{i}\sum_{j} k_{ij}\,t_{ij}}{\sum_{i} w_{i}\sum_{j}\left(k_{ij} + t_{ij}\right)}$$
where k_ij is the ground-truth value of category i at the j-th pixel and t_ij is the corresponding predicted probability; compared with DL there is an additional weight w_i for each category, which is used to keep the balance between the lesion regions and the Dice coefficient;

The TL (Tversky loss) formula is as follows:
$$TL = 1 - \frac{\sum_{j} k_{0j}\,t_{0j}}{\sum_{j} k_{0j}\,t_{0j} + \alpha\sum_{j} k_{1j}\,t_{0j} + \beta\sum_{j} k_{0j}\,t_{1j}}$$
where k_ij is the ground-truth value of category i at the j-th pixel and t_ij is the corresponding predicted probability, with index 0 denoting the foreground (lesion) class and index 1 the background;

α and β control the relative weights of false positives and false negatives, respectively.

Further, in the above method for segmenting the liver and its lesions in medical images, when α = β = 0.5 the TL coefficient reduces to the DL coefficient.
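For illustration, the three losses can be written as reference implementations in Python/NumPy as follows. This is a sketch under the notation above; the array shapes, the smoothing constant EPS, the inverse-square-area choice of the GDL weights w_i and the function names are assumptions made here rather than details specified in the patent:

```python
import numpy as np

EPS = 1e-7  # small constant assumed here to avoid division by zero

def dice_loss(pred, truth):
    """DL: pred and truth are arrays of per-pixel probabilities / labels for one class."""
    inter = np.sum(pred * truth)
    denom = np.sum(pred ** 2) + np.sum(truth ** 2)  # squared-element denominator
    return 1.0 - 2.0 * inter / (denom + EPS)

def generalized_dice_loss(pred, truth):
    """GDL: pred and truth have shape (num_classes, num_pixels).
    The per-class weight w_i balances small lesion regions against large ones."""
    w = 1.0 / (np.sum(truth, axis=1) ** 2 + EPS)        # assumed inverse-square-area weighting
    inter = np.sum(w * np.sum(pred * truth, axis=1))
    denom = np.sum(w * np.sum(pred + truth, axis=1))
    return 1.0 - 2.0 * inter / (denom + EPS)

def tversky_loss(pred, truth, alpha=0.5, beta=0.5):
    """TL: alpha weights false positives, beta weights false negatives.
    With alpha = beta = 0.5 this reduces to a Dice-style coefficient."""
    tp = np.sum(pred * truth)
    fp = np.sum(pred * (1.0 - truth))
    fn = np.sum((1.0 - pred) * truth)
    return 1.0 - tp / (tp + alpha * fp + beta * fn + EPS)
```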
Compared with the prior art, the present invention has significant advantages and beneficial effects, embodied in the following aspects:

① Through data preprocessing, invalid liver images are removed; denoising the images to raise their contrast makes it easier for the network to segment blurred edges; using serialized 3D images for fused segmentation preserves contextual semantic information; and applying different combinations of data augmentation to the composed 3D images improves the robustness of the data set and prevents overfitting;

② Both the liver network and the lesion network use an encoder-decoder structure, and a transition zone between liver and lesion segmentation is designed to better bridge the resolution gap between the encoder and the decoder. At the same time, the lesion branch only receives information from the liver branch, which further narrows the region of interest, so the network needs fewer parameters and less time to learn contextual information and converges faster. From the input to the smallest extracted feature map, the original-resolution image is downsampled by a factor of 16 or 32, which not only reduces network inference time but also allows denser feature extraction. In addition, the Dropout and max-pooling operations of U-Net++ are removed in order to collect more low-level feature information;

③ Regarding the loss function, the performance of several losses is compared and the optimal functions for the liver and for the lesion are selected; adding weighted loss functions at networks of different depths improves the discriminative ability of the classifiers at those depths, effectively alleviates the vanishing-gradient problem and provides additional regularization. In addition, a single optimal loss-function model is compared with a loss model combining weighting and similarity for deep supervision: the output of the last residual block is selected, the losses of the other layers are added to the optimizer with a weight of 0.3, and the outputs are then weighted, summed and averaged as the final loss (a sketch of this weighting scheme is given after this list). Joint decision-making across the layers effectively avoids the heavy resource and time cost of multiple models while absorbing the advantages of each model, alleviating over-segmentation and under-segmentation;

④ The method of the present invention supports end-to-end training and testing, recognizes the liver and its lesions simultaneously with high precision and high speed, effectively helps doctors interpret CT images, greatly reduces the time and effort they spend, lowers the probability of misdiagnosis, and therefore has good practical application value.
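A minimal Python sketch of the deep-supervision weighting described in point ③ follows; the list of per-layer losses and the way they are aggregated are illustrative assumptions rather than the patent's exact implementation:

```python
def deep_supervision_loss(layer_losses, aux_weight=0.3):
    """Combine per-depth losses: the last (deepest) output keeps full weight,
    the auxiliary outputs of the other layers are added with weight 0.3,
    and the weighted losses are averaged to give the final training loss."""
    *aux_losses, main_loss = layer_losses
    weighted = [aux_weight * loss for loss in aux_losses] + [main_loss]
    return sum(weighted) / len(weighted)

# Example with loss values from four supervised outputs (deepest last)
total = deep_supervision_loss([0.42, 0.37, 0.31, 0.25])
```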
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by implementing specific embodiments of the present invention. The objectives and other advantages of the present invention can be realized and obtained by the structures particularly pointed out in the written description, the claims and the drawings.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative work.
Figure 1: schematic diagram of the network structure of the present invention;

Figure 2: schematic diagram of the processing flow of the present invention;

Figure 3: schematic diagram of the data augmentation and serialization of the present invention;

Figure 4: liver segmentation results of some of the networks of the present invention;
Detailed description of the embodiments

The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without creative work fall within the protection scope of the present invention.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings. In the description of the present invention, orientation and ordinal terms are used only to distinguish the descriptions and should not be understood as indicating or implying relative importance.
In the method for segmenting the liver and its lesions in medical images of the present invention, first, data screening and integration are performed;

Slices that do not contain the liver are removed from the training data set, which is then shuffled into 19,000-20,000 3D slices; a 3D slice is the current slice together with the slice before and the slice after it taken as a whole input. Of these, 17,000-18,000 slices are selected as the training set and the remaining 1,800-1,900 slices as the validation set, while 70 patient sequences are used for testing; the training images are used at sizes of 224×224 and 512×512;
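As an illustration of how the 3D slices described above can be assembled, the following Python sketch stacks each slice with its neighbours into a three-channel input; the volume layout, the edge handling and the function name are assumptions made for this example:

```python
import numpy as np

def make_3d_slices(volume):
    """volume: array of shape (num_slices, H, W) for one patient sequence.
    Returns an array of shape (num_slices, 3, H, W) where each sample stacks the
    previous, current and next slice; the first and last slices reuse themselves
    as the missing neighbour."""
    padded = np.concatenate([volume[:1], volume, volume[-1:]], axis=0)
    return np.stack([padded[:-2], padded[1:-1], padded[2:]], axis=1)
```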
Then, a new neural network is built and initial training is performed with the reduced-size image data;

The U-shaped path of U-Net is set as the main path, and the full path plus a ResNet structure forms the encoder-decoder structure; the dense skip connections are replaced, on the basis of DenseNet, with 1×1 convolutional layers. In the transition zone between the liver and the lesion, the information output by the liver branch becomes the input of the lesion branch after convolution, and the outputs of the other liver layers are short-connected to the lesion inputs at the corresponding depths;
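One possible reading of the 1×1-convolution dense skip connection described above is sketched below in PyTorch; the module name, the channel handling and the concatenation order are assumptions for illustration, not the patent's exact architecture:

```python
import torch
import torch.nn as nn

class DenseSkipFusion(nn.Module):
    """Fuse several skip feature maps of the same spatial size with a 1x1 convolution,
    so multi-scale information is mixed without adding many parameters."""
    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        self.fuse = nn.Conv2d(sum(in_channels_list), out_channels, kernel_size=1)

    def forward(self, features):
        # features: list of tensors (N, C_i, H, W) coming from earlier layers
        return self.fuse(torch.cat(features, dim=1))

# Example: fuse two skip tensors with 64 and 128 channels into 64 output channels
fusion = DenseSkipFusion([64, 128], 64)
out = fusion([torch.randn(1, 64, 56, 56), torch.randn(1, 128, 56, 56)])
```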
The data reduced to 224×224 are trained through the network so that an effective weight distribution can be applied to subsequent model training; the resized images are trained for 40-60 rounds, with 12-16 slices in each round, and during training the images are rotated, enlarged and reduced, combined with random probability;
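The random combination of rotation and zooming during training can be sketched as follows with NumPy/SciPy; the angle range, zoom range and the 0.5 probabilities are illustrative assumptions and are not specified in the patent:

```python
import numpy as np
from scipy import ndimage

def random_augment(image, rng=None):
    """Apply rotation and zoom-in/zoom-out to a 2D slice, each with random probability."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:                                  # random rotation
        angle = rng.uniform(-15, 15)
        image = ndimage.rotate(image, angle, reshape=False, order=1)
    if rng.random() < 0.5:                                  # random zoom in / out
        factor = rng.uniform(0.9, 1.1)
        zoomed = ndimage.zoom(image, factor, order=1)
        h, w = image.shape
        if factor >= 1.0:                                   # crop back to the original size
            top = (zoomed.shape[0] - h) // 2
            left = (zoomed.shape[1] - w) // 2
            image = zoomed[top:top + h, left:left + w]
        else:                                               # pad back to the original size
            out = np.zeros_like(image)
            top = (h - zoomed.shape[0]) // 2
            left = (w - zoomed.shape[1]) // 2
            out[top:top + zoomed.shape[0], left:left + zoomed.shape[1]] = zoomed
            image = out
    return image
```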
Next, secondary training is performed with the original images and a new data augmentation scheme;

After the model has been trained on the reduced image data, the network structure and weight distribution are retained, the original images are augmented with a random combination of rotation, scaling, flipping and stretching, and secondary training is performed with a new learning rate;

Secondary training uses the original images, which are 512×512 in size, together with the new data augmentation scheme: the images are rotated, scaled, flipped and stretched, combined with random probability, and an exponentially decaying learning rate is used, with the decay adjusted every round according to the following equation:
$$\mathrm{decayed\_learning\_rate} = \mathrm{learning\_rate} \times \mathrm{decay\_rate}^{\,\mathrm{global\_step}/\mathrm{decay\_steps}}$$
In the above formula, the decayed learning rate decayed_learning_rate is obtained by multiplying the initially set learning rate learning_rate by the decay base decay_rate raised to the power global_step/decay_steps. The decay rate is set to 0.8-0.9, global_step is the current training step, and decay_steps is the number of steps needed to iterate over all the sample data once, i.e. one round. The initial learning rate is set to 1e-3 to 3e-3, and to 1e-4 to 3e-4 when training on the original-size images; as a result, the learning rate decays with base 0.8-0.9 according to the number of steps per round.

Finally, different medical evaluation results are obtained by adjusting the way the loss functions are combined;

The loss functions are DL, GDL and TL; the three loss functions used are given by the formulas below, and the loss functions suited to the liver and to the lesion are selected respectively. DL (Dice loss) is used to evaluate the similarity between the predicted set and the ground-truth set and is suited to the case of unbalanced samples; its expression is as follows:
$$DL = 1 - \frac{2\sum_{ij} k_{ij}\,t_{ij}}{\sum_{ij} k_{ij}^{2} + \sum_{ij} t_{ij}^{2}}$$
The quantitative calculation of the denominator squares the elements and then sums them, where k and t denote the elements of the predicted region and of the ground-truth region respectively and ij denotes traversal of those elements. The underlying Dice measure is a set-similarity function, usually used to compute the similarity of two samples, with a range of [0, 1]; the factor of 2 in the numerator compensates for the common elements of k and t being counted twice in the denominator, and the loss value is finally obtained from twice the element-wise product for each category divided by the sum of the squares of the respective elements;

GDL: when the liver lesion has several segmented regions there is one Dice score for each category, whereas GDL integrates the categories and quantifies them with a single index. The formula is as follows:
$$GDL = 1 - 2\,\frac{\sum_{i} w_{i}\sum_{j} k_{ij}\,t_{ij}}{\sum_{i} w_{i}\sum_{j}\left(k_{ij} + t_{ij}\right)}$$
where k_ij is the ground-truth value of category i at the j-th pixel and t_ij is the corresponding predicted probability; compared with DL there is an additional weight w_i for each category, which is used to keep the balance between the lesion regions and the Dice coefficient;

The TL (Tversky loss) formula is as follows:
$$TL = 1 - \frac{\sum_{j} k_{0j}\,t_{0j}}{\sum_{j} k_{0j}\,t_{0j} + \alpha\sum_{j} k_{1j}\,t_{0j} + \beta\sum_{j} k_{0j}\,t_{1j}}$$
where k_ij is the ground-truth value of category i at the j-th pixel and t_ij is the corresponding predicted probability, with index 0 denoting the foreground (lesion) class and index 1 the background;

α and β control the relative weights of false positives and false negatives, respectively;

When α = β = 0.5, the TL coefficient is exactly the DL coefficient.

Data preprocessing is performed first: CT images without the liver are removed, the remaining images are integrated and serialized, and data augmentation is applied in the form of a 3D data stream, improving accuracy by synthesizing new samples. The results are then evaluated with the relevant evaluation metrics, and the predicted images are post-processed by dilation and erosion to obtain the predicted labels;
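The dilation-and-erosion post-processing can be illustrated with SciPy's binary morphology; the default structuring element and the number of iterations are assumptions chosen for this sketch:

```python
import numpy as np
from scipy import ndimage

def postprocess_mask(mask, iterations=2):
    """Smooth the predicted binary mask and close small holes:
    dilate first, then erode (a morphological closing)."""
    mask = mask.astype(bool)
    mask = ndimage.binary_dilation(mask, iterations=iterations)
    mask = ndimage.binary_erosion(mask, iterations=iterations)
    return mask.astype(np.uint8)
```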
The skip connections in Figure 1 are similar to those of the original U-Net: C and N use convolutions, while R and D use residual network structures. In the direction of the arcs, DenseNet structures are used between them, and the outputs of the other liver layers are short-connected to the lesion inputs at the corresponding depths. Regarding the loss function, the performance of several losses is compared and the optimal functions for the liver and for the lesion are selected.

The specific steps are as follows:

a) First, data preprocessing;

The data set of the Liver Tumor Segmentation Challenge (LiTS), hosted by Patrick Christ, is taken from its official website. The original images of the data set are serialized single-channel abdominal grayscale images, and both the original images and the labels are 512×512 in size. The labels distinguish two foreground classes and one background class: 0 denotes the background, 1 the liver and 2 the lesion. The labels are rearranged into a three-channel map of background, liver and lesion, with 1 at the corresponding positions and 0 elsewhere. The data set contains a training set of 131 patient sequences; slices without the liver are removed from the training data, which is then shuffled into 19,000-20,000 3D slices, a 3D slice being the current slice together with the slices before and after it taken as a whole input. Of these, 17,000-18,000 slices are used as the training set, the remaining 1,800-1,900 slices as the validation set, and a further 70 patient sequences are used for testing.

b) Training parameters are set and initial training is performed;

To make it easier to apply an effective weight distribution to model training, the images are initially reduced to 224×224 and the resized images are trained for 40-60 rounds, with 12-16 slices in each round; the model is then fine-tuned on the original-size images for 20-40 rounds to reach its optimum. The specific processing flow is shown in Figure 2. Finally, operations such as rotation, enlargement and reduction are applied to the images and combined with a certain probability for the purpose of data augmentation; the visual effect of the data augmentation is shown in Figure 3. Table 1 shows the performance of networks for liver segmentation:

Table 1 Evaluation metrics for liver segmentation
Below Table 1, the current mainstream semantic segmentation methods are compared with the method of the present invention, with the evaluation metrics across the columns. As can be seen from Table 1, in liver segmentation every metric except accuracy, which is slightly lower than that of the joint decision model of multiple loss functions, is better than that of all the preceding networks. Because of the computer's hard-disk read/write mechanism, joint decision-making over multiple models greatly reduces computation speed; adding parameters to a single model for deep supervision achieves the joint effect, and its speed and use of computing resources are clearly better than those of multiple models. The lesion segmentation results are shown in Table 2:
Table 2 Lesion segmentation results
Below Table 2, the rows represent the different losses and their combinations, with the evaluation metrics across the columns. Table 2 shows that combining weighting-based and similarity-based losses is ineffective, and weighting can even degrade network performance. Compared with the liver segmentation results, DL and GDL outperform TL and GTL respectively in lesion segmentation; therefore, using DL for deep supervision of lesion segmentation gives better results than the individual losses and than joint decision-making.

The first row of Figure 4 shows the ground-truth label images, the vertical direction shows the training iteration results of the different networks, and the last row shows the training output of the proposed network, which is the best compared with the other network structures.

A two-level encoder-decoder semicircular network is used, whose dense skip connections combine deep and shallow semantic information so that it is easier for the optimizer to process. The transition zone between liver and lesion segmentation is designed so that the liver segmentation result is passed effectively to lesion segmentation, which greatly reduces the time needed to segment the original image. Complementary loss functions are selected and combined for deep supervision, which effectively receives the gradient signal during back-propagation and provides additional regularization. Among the weighting-based and similarity-based losses, the loss functions most suitable for segmenting the liver and its lesions are selected, deep supervision is performed with the optimal loss, and the two are also used together for deep supervision. Finally, in liver segmentation, apart from the accuracy being lower than that of the model that makes joint decisions with multiple loss functions, all other evaluation metrics, including all the metrics for the lesion, exceed the multi-model fusion results.

In summary, the method for segmenting the liver and its lesions in medical images of the present invention proposes an end-to-end encoder-decoder network for segmenting the liver and its lesions. A 1×1 convolution kernel serves as the core unit of the dense skip connections, multiple neural units fuse multi-scale features, and information with similar semantics is propagated in a form that is easier for the optimizer to process without introducing too many parameters. A ResNet structure strengthens the backbone network, and concatenation replaces addition to guarantee the depth and width of the network. A transition zone between the liver and the lesion is designed that confines lesion segmentation to the liver, saving computing resources, with results better than segmenting the lesion with a separate network. Based on the weighting strategy and the similarity-based loss model, the loss functions most suitable for liver and lesion segmentation are selected for deep supervision; using different loss functions for deep supervision is better than using a single optimal loss function and meets doctors' actual diagnostic needs.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in its protection scope. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope described in the claims.

It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.

Claims (5)

  1. A method for segmenting the liver and its lesions in medical images, characterized in that:
    first, the abdominal CT image data are screened, integrated, and preprocessed, and divided into several data sets for different purposes; a new neural network is then built and initially trained on the downscaled (small-image) data;
    after that, the trained model is saved and a second round of training is performed with the original-size images and a new data-augmentation scheme; the predicted images are post-processed by dilation and erosion and evaluated with medical evaluation metrics;
    predictions are obtained from models trained separately with the DL, GDL, and TL loss functions; the prediction results of these three loss models are added and averaged to form a fused feature; finally, the network is modified so that the three loss models are merged into a single network for training and prediction.
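As a minimal, non-authoritative sketch of the fusion step in claim 1, the probability maps predicted by the DL-, GDL-, and TL-trained models can be averaged as follows; the function name and the 0.5 threshold are illustrative assumptions.

import numpy as np

def fuse_predictions(pred_dl, pred_gdl, pred_tl, threshold=0.5):
    # Average the three probability maps and binarize the fused result.
    fused = (pred_dl + pred_gdl + pred_tl) / 3.0
    return (fused >= threshold).astype(np.uint8)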
  2. The method for segmenting the liver and its lesions in medical images according to claim 1, characterized in that it specifically comprises the following steps:
    a) first, data screening and integration;
    slices that contain no liver are removed from the training data set, and the remainder is shuffled into 19,000 to 20,000 3d slices, where a 3d slice consists of the current slice together with its preceding and following slices as a single input; 17,000 to 18,000 of these slices are selected as the training set, the remaining 1,800 to 1,900 slices serve as the validation set, and 70 patient sequences are used for testing; the training images come in two sizes, 224×224 and 512×512;
    b) then, a new neural network is built and initially trained on the small-image data;
    the U-shaped path of Unet is set as the main path, and the full path plus a ResNet structure forms an encoder-decoder structure; on the basis of DenseNet, the dense skip connections are replaced with 1×1 convolutional layers; in the transition zone between the liver and the lesions, the information output by the liver branch becomes the input of the lesion branch and is convolved, and the outputs of the other liver layers are short-connected to the lesion inputs at the corresponding depths;
    the network is trained on the data downscaled to 224×224 and the resulting effective weight distribution is applied to subsequent model training; the resized images are trained for 40 to 60 rounds, with 12 to 16 slices per round, and during training the images are rotated, enlarged, and shrunk, combined with random probabilities;
    c) next, secondary training is performed with the original images and a new data-augmentation scheme;
    after the model has been trained on the downscaled image data, the network structure and weight distribution are retained, the original images are augmented with a probabilistic combination of rotation, scaling, flipping, and stretching, and a second round of training is carried out with a new learning rate;
    d) finally, different medical evaluation results are obtained by adjusting how the loss functions are combined;
    different evaluation results are obtained by using, as the supervision signals of different layers, either a single optimal loss-function model or a loss model that combines weighting and similarity.
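The sketch below (NumPy) illustrates the "3d slice" input of step a), in which the current slice and its two neighbours are stacked into a single three-channel input; the function name, array layout, and the edge handling at the first and last slice are assumptions for the example.

import numpy as np

def make_3d_slice(volume, index):
    # volume: [num_slices, H, W] CT array; returns a [3, H, W] input that
    # stacks the previous, current, and next slice (clamped at the borders).
    lo = max(index - 1, 0)
    hi = min(index + 1, volume.shape[0] - 1)
    return np.stack([volume[lo], volume[index], volume[hi]], axis=0)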
  3. The method for segmenting the liver and its lesions in medical images according to claim 2, characterized in that: in step c), secondary training is performed with the original images, which are of size 512×512, and the new data-augmentation scheme; the images are rotated, scaled, flipped, and stretched, combined with random probabilities; an exponentially decaying learning rate is adopted, and the amount of decay per round is adjusted according to the following equation:
    decayed_learning_rate = learning_rate × decay_rate ^ (global_step / decay_steps)
    in the above formula, the decayed learning rate decayed_learning_rate is obtained by multiplying the preset initial learning rate learning_rate by the decay base raised to the power global_step / decay_steps; the decay base is set to 0.8–0.9 and the decay is applied every decay_steps steps, where global_step is the current iteration step and decay_steps is the number of steps needed to iterate over all the sample data once (one round); the initial learning rate is set to 1e-3 to 3e-3, and to 1e-4 to 3e-4 when training on the original images, so that the learning rate decays with base 0.8–0.9 once per round of steps.
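A small sketch of the exponential-decay schedule, written directly from the formula above; the concrete numbers in the usage example (initial rate 2e-3, base 0.85, 1000 steps per round) are merely values inside the claimed ranges, not values fixed by the claim.

def decayed_learning_rate(learning_rate, decay_rate, global_step, decay_steps):
    # learning_rate * decay_rate ** (global_step / decay_steps)
    return learning_rate * decay_rate ** (global_step / decay_steps)

# Example: after 5 rounds of 1000 steps each, starting from 2e-3 with base 0.85.
lr_round_5 = decayed_learning_rate(2e-3, 0.85, global_step=5000, decay_steps=1000)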
  4. The method for segmenting the liver and its lesions in medical images according to claim 2, characterized in that: in step d), different medical evaluation results are obtained by adjusting how the loss functions are combined, the loss functions being DL, GDL, and TL; the three loss functions used are given by the formulas below, and the loss functions suitable for the liver and for the lesions are selected respectively; DL evaluates the similarity between the predicted set and the ground-truth set and is used in cases of sample imbalance, with the following expression:
    DL = 1 − 2·Σ_ij (k_ij · t_ij) / ( Σ_ij k_ij² + Σ_ij t_ij² )
    the quantitative calculation of the denominator squares the elements and then sums them, where k and t denote the elements of the predicted region and of the ground-truth region respectively, and the subscript ij denotes traversal of those elements; DL is a set-similarity measure, usually used to compute the similarity of two samples, with a range of [0, 1]; the coefficient 2 in the numerator compensates for the fact that the denominator counts the elements common to k and t twice, and the loss value is finally obtained from twice the element-wise product of each class divided by the sum of the squares of the respective elements;
    GDL: when the liver lesions have multiple segmented regions, there is one Dice per class, whereas GDL integrates the multiple classes and quantifies them with a single index, according to the following formula:
    GDL = 1 − 2·( Σ_i w_i · Σ_j (k_ij · t_ij) ) / ( Σ_i w_i · Σ_j (k_ij + t_ij) )
    where k_ij is the ground-truth value of class i at pixel j and t_ij is the corresponding predicted probability; compared with DL, an additional weight w_i is introduced for each class, and w_i is used to maintain the balance between the lesion regions and the DL coefficient;
    the TL formula is as follows:
    TL = 1 − Σ_ij (k_ij · t_ij) / ( Σ_ij (k_ij · t_ij) + α·Σ_ij ((1 − k_ij) · t_ij) + β·Σ_ij (k_ij · (1 − t_ij)) )
    where k_ij is the ground-truth value of class i at pixel j and t_ij is the corresponding predicted probability;
    α and β control the weights of the false positives and the false negatives, respectively.
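As a hedged sketch, the three losses of claim 4 can be written as follows for liver/lesion masks, with k the ground-truth map and t the predicted probability map; the smoothing constant EPS and the default α, β values are assumptions added for numerical stability and illustration, not values taken from the claim.

import torch

EPS = 1e-6

def dice_loss(t, k):
    # DL with the squared-sum denominator described above.
    inter = (k * t).sum()
    return 1.0 - 2.0 * inter / (k.pow(2).sum() + t.pow(2).sum() + EPS)

def generalized_dice_loss(t, k, w):
    # GDL over classes; t, k: [C, ...] per-class maps, w: [C] class weights.
    dims = tuple(range(1, k.dim()))
    inter = (w * (k * t).sum(dim=dims)).sum()
    union = (w * (k + t).sum(dim=dims)).sum()
    return 1.0 - 2.0 * inter / (union + EPS)

def tversky_loss(t, k, alpha=0.3, beta=0.7):
    # TL: alpha weights false positives, beta weights false negatives.
    tp = (k * t).sum()
    fp = ((1 - k) * t).sum()
    fn = (k * (1 - t)).sum()
    return 1.0 - tp / (tp + alpha * fp + beta * fn + EPS)

With α = β = 0.5 the Tversky term reduces to the plain Dice form, which for binary masks coincides with the squared-denominator DL above; this is consistent with claim 5 below.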
  5. The method for segmenting the liver and its lesions in medical images according to claim 4, characterized in that: when α = β = 0.5, the TL coefficient is exactly the DL coefficient.
PCT/CN2020/131402 2020-03-16 2020-11-25 Method for segmenting liver and focus thereof in medical image WO2021184817A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010180378.XA CN111402268B (en) 2020-03-16 2020-03-16 Liver in medical image and focus segmentation method thereof
CN202010180378.X 2020-03-16

Publications (1)

Publication Number Publication Date
WO2021184817A1 true WO2021184817A1 (en) 2021-09-23

Family

ID=71428820

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/131402 WO2021184817A1 (en) 2020-03-16 2020-11-25 Method for segmenting liver and focus thereof in medical image

Country Status (2)

Country Link
CN (1) CN111402268B (en)
WO (1) WO2021184817A1 (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902752A (en) * 2021-12-06 2022-01-07 湖南大学 Medical image segmentation method and device and computer readable storage medium
CN114049339A (en) * 2021-11-22 2022-02-15 江苏科技大学 Fetal cerebellum ultrasonic image segmentation method based on convolutional neural network
CN114067092A (en) * 2022-01-17 2022-02-18 山东药品食品职业学院 Fatty liver B-mode ultrasound image classification method based on DenseNet and lightGBM
CN114155195A (en) * 2021-11-01 2022-03-08 中南大学湘雅医院 Brain tumor segmentation quality evaluation method, device and medium based on deep learning
CN114358144A (en) * 2021-12-16 2022-04-15 西南交通大学 Image segmentation quality evaluation method
CN114359194A (en) * 2021-12-27 2022-04-15 浙江大学 Multi-mode stroke infarct area image processing method based on improved U-Net network
CN114511581A (en) * 2022-04-20 2022-05-17 四川大学华西医院 Multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device
CN114511523A (en) * 2022-01-21 2022-05-17 中山大学 Gastric cancer molecular subtype classification method and device based on self-supervision learning
CN114511850A (en) * 2021-12-30 2022-05-17 广西慧云信息技术有限公司 Method for identifying image of fruit size and granule of sunshine rose grape
CN114511728A (en) * 2021-12-24 2022-05-17 之江实验室 Electronic endoscope esophageal focus intelligent detection model establishing method
CN114693689A (en) * 2022-03-02 2022-07-01 西北工业大学 Adaptive neural network segmentation model construction method for medical image
CN114764811A (en) * 2022-03-14 2022-07-19 什维新智医疗科技(上海)有限公司 Dynamic ultrasonic video-based real-time lesion area segmentation device
CN114818838A (en) * 2022-06-30 2022-07-29 中国科学院国家空间科学中心 Low signal-to-noise ratio moving point target detection method based on pixel time domain distribution learning
CN114897922A (en) * 2022-04-03 2022-08-12 西北工业大学 Histopathology image segmentation method based on deep reinforcement learning
CN114937171A (en) * 2022-05-11 2022-08-23 复旦大学 Alzheimer's classification system based on deep learning
CN115019110A (en) * 2022-07-13 2022-09-06 北京深睿博联科技有限责任公司 Focus identification method and device based on chest image
CN115661144A (en) * 2022-12-15 2023-01-31 湖南工商大学 Self-adaptive medical image segmentation method based on deformable U-Net
CN115880262A (en) * 2022-12-20 2023-03-31 桂林电子科技大学 Weakly supervised pathological image tissue segmentation method based on online noise suppression strategy
CN115937423A (en) * 2022-12-13 2023-04-07 西安电子科技大学 Three-dimensional intelligent reconstruction method for liver tumor medical image
CN116563285A (en) * 2023-07-10 2023-08-08 邦世科技(南京)有限公司 Focus characteristic identifying and dividing method and system based on full neural network
CN116630971A (en) * 2023-07-24 2023-08-22 安徽大学 Wheat scab spore segmentation method based on CRF_Resunate++ network
CN116777793A (en) * 2023-07-31 2023-09-19 深圳扬奇医芯智能科技有限公司 Medical image prompt denoising technology based on logarithmic guide contrast learning
CN116894820A (en) * 2023-07-13 2023-10-17 国药(武汉)精准医疗科技有限公司 Pigment skin disease classification detection method, device, equipment and storage medium
CN117409019A (en) * 2023-09-15 2024-01-16 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-mode brain tumor image segmentation method and system based on ensemble learning
CN117524427A (en) * 2024-01-05 2024-02-06 莱凯医疗器械(北京)有限公司 Intelligent medical image analysis method
CN117830332A (en) * 2024-01-09 2024-04-05 四川大学 Medical image segmentation method based on weak supervision

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402268B (en) * 2020-03-16 2023-05-23 苏州科技大学 Liver in medical image and focus segmentation method thereof
CN111950595A (en) * 2020-07-14 2020-11-17 十堰市太和医院(湖北医药学院附属医院) Liver focus image processing method, system, storage medium, program, and terminal
CN111860840B (en) * 2020-07-28 2023-10-17 上海联影医疗科技股份有限公司 Deep learning model training method, device, computer equipment and storage medium
CN112070685B (en) * 2020-08-10 2023-08-18 武汉大学 Method for predicting dynamic soft tissue movement of HIFU treatment system
US11694329B2 (en) 2020-10-30 2023-07-04 International Business Machines Corporation Logistic model to determine 3D z-wise lesion connectivity
US11688517B2 (en) 2020-10-30 2023-06-27 Guerbet Multiple operating point false positive removal for lesion identification
US11688063B2 (en) 2020-10-30 2023-06-27 Guerbet Ensemble machine learning model architecture for lesion detection
US11749401B2 (en) 2020-10-30 2023-09-05 Guerbet Seed relabeling for seed-based segmentation of a medical image
US11436724B2 (en) 2020-10-30 2022-09-06 International Business Machines Corporation Lesion detection artificial intelligence pipeline computing system
US11587236B2 (en) 2020-10-30 2023-02-21 International Business Machines Corporation Refining lesion contours with combined active contour and inpainting
CN112819032B (en) * 2021-01-11 2023-10-27 平安科技(深圳)有限公司 Multi-model-based slice feature classification method, device, equipment and medium
CN113052857A (en) * 2021-03-22 2021-06-29 山西三友和智慧信息技术股份有限公司 Lung lesion image segmentation method based on CovSegNet
CN113159147B (en) * 2021-04-08 2023-09-26 平安科技(深圳)有限公司 Image recognition method and device based on neural network and electronic equipment
CN113112475B (en) * 2021-04-13 2023-04-18 五邑大学 Traditional Chinese medicine ear five-organ region segmentation method and device based on machine learning
CN113240014B (en) * 2021-05-18 2022-05-31 长春理工大学 Application method of class II segmentation loss function in achieving class II segmentation of intervertebral disc tissue image
CN113378984B (en) * 2021-07-05 2023-05-02 国药(武汉)医学实验室有限公司 Medical image classification method, system, terminal and storage medium
CN113902761B (en) * 2021-11-02 2024-04-16 大连理工大学 Knowledge distillation-based unsupervised segmentation method for lung disease focus
CN114066871B (en) * 2021-11-19 2024-06-21 江苏科技大学 Method for training new coronal pneumonia focus area segmentation model
CN114511599B (en) * 2022-01-20 2022-09-20 推想医疗科技股份有限公司 Model training method and device, medical image registration method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108492286A (en) * 2018-03-13 2018-09-04 成都大学 A kind of medical image cutting method based on the U-shaped convolutional neural networks of binary channel
CN110120033A (en) * 2019-04-12 2019-08-13 天津大学 Based on improved U-Net neural network three-dimensional brain tumor image partition method
CN110853038A (en) * 2019-10-15 2020-02-28 哈尔滨工程大学 DN-U-net network method for liver tumor CT image segmentation technology
US20200074637A1 (en) * 2018-08-28 2020-03-05 International Business Machines Corporation 3d segmentation with exponential logarithmic loss for highly unbalanced object sizes
CN111402268A (en) * 2020-03-16 2020-07-10 苏州科技大学 Method for segmenting liver and focus thereof in medical image

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108937934A (en) * 2018-05-07 2018-12-07 武汉科技大学 A kind of detection of Brain MRI hippocampus and dividing method based on deep learning
WO2020028382A1 (en) * 2018-07-30 2020-02-06 Memorial Sloan Kettering Cancer Center Multi-modal, multi-resolution deep learning neural networks for segmentation, outcomes prediction and longitudinal response monitoring to immunotherapy and radiotherapy
CN109410220B (en) * 2018-10-16 2019-12-24 腾讯科技(深圳)有限公司 Image segmentation method and device, computer equipment and storage medium
CN109614921B (en) * 2018-12-07 2022-09-30 安徽大学 Cell segmentation method based on semi-supervised learning of confrontation generation network
CN110047078B (en) * 2019-04-18 2021-11-09 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110232721B (en) * 2019-05-16 2023-04-07 福建自贸试验区厦门片区Manteia数据科技有限公司 Training method and device for automatic sketching model of organs at risk
CN110689543A (en) * 2019-09-19 2020-01-14 天津大学 Improved convolutional neural network brain tumor image segmentation method based on attention mechanism

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108492286A (en) * 2018-03-13 2018-09-04 成都大学 A kind of medical image cutting method based on the U-shaped convolutional neural networks of binary channel
US20200074637A1 (en) * 2018-08-28 2020-03-05 International Business Machines Corporation 3d segmentation with exponential logarithmic loss for highly unbalanced object sizes
CN110120033A (en) * 2019-04-12 2019-08-13 天津大学 Based on improved U-Net neural network three-dimensional brain tumor image partition method
CN110853038A (en) * 2019-10-15 2020-02-28 哈尔滨工程大学 DN-U-net network method for liver tumor CT image segmentation technology
CN111402268A (en) * 2020-03-16 2020-07-10 苏州科技大学 Method for segmenting liver and focus thereof in medical image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ABRAHAM NABILA; KHAN NAIMUL MEFRAZ: "A Novel Focal Tversky Loss Function With Improved Attention U-Net for Lesion Segmentation", 2019 IEEE 16TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2019), 8 April 2019 (2019-04-08), pages 683 - 687, XP033576451, DOI: 10.1109/ISBI.2019.8759329 *

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155195A (en) * 2021-11-01 2022-03-08 中南大学湘雅医院 Brain tumor segmentation quality evaluation method, device and medium based on deep learning
CN114155195B (en) * 2021-11-01 2023-04-07 中南大学湘雅医院 Brain tumor segmentation quality evaluation method, device and medium based on deep learning
CN114049339A (en) * 2021-11-22 2022-02-15 江苏科技大学 Fetal cerebellum ultrasonic image segmentation method based on convolutional neural network
CN114049339B (en) * 2021-11-22 2023-05-12 江苏科技大学 Fetal cerebellum ultrasonic image segmentation method based on convolutional neural network
CN113902752A (en) * 2021-12-06 2022-01-07 湖南大学 Medical image segmentation method and device and computer readable storage medium
CN113902752B (en) * 2021-12-06 2022-02-22 湖南大学 Medical image segmentation method and device and computer readable storage medium
CN114358144B (en) * 2021-12-16 2023-09-26 西南交通大学 Image segmentation quality assessment method
CN114358144A (en) * 2021-12-16 2022-04-15 西南交通大学 Image segmentation quality evaluation method
CN114511728A (en) * 2021-12-24 2022-05-17 之江实验室 Electronic endoscope esophageal focus intelligent detection model establishing method
CN114511728B (en) * 2021-12-24 2024-05-14 之江实验室 Method for establishing intelligent detection model of esophageal lesion of electronic endoscope
CN114359194A (en) * 2021-12-27 2022-04-15 浙江大学 Multi-mode stroke infarct area image processing method based on improved U-Net network
CN114511850B (en) * 2021-12-30 2024-05-14 广西慧云信息技术有限公司 Method for identifying size particle image of sunlight rose grape fruit
CN114511850A (en) * 2021-12-30 2022-05-17 广西慧云信息技术有限公司 Method for identifying image of fruit size and granule of sunshine rose grape
CN114067092A (en) * 2022-01-17 2022-02-18 山东药品食品职业学院 Fatty liver B-mode ultrasound image classification method based on DenseNet and lightGBM
CN114067092B (en) * 2022-01-17 2022-04-19 山东药品食品职业学院 Fatty liver B-mode ultrasound image classification method based on DenseNet and lightGBM
CN114511523A (en) * 2022-01-21 2022-05-17 中山大学 Gastric cancer molecular subtype classification method and device based on self-supervision learning
CN114511523B (en) * 2022-01-21 2024-05-31 中山大学 Gastric cancer molecular subtype classification method and device based on self-supervision learning
CN114693689B (en) * 2022-03-02 2024-03-15 西北工业大学 Self-adaptive neural network segmentation model construction method for medical image
CN114693689A (en) * 2022-03-02 2022-07-01 西北工业大学 Adaptive neural network segmentation model construction method for medical image
CN114764811A (en) * 2022-03-14 2022-07-19 什维新智医疗科技(上海)有限公司 Dynamic ultrasonic video-based real-time lesion area segmentation device
CN114897922A (en) * 2022-04-03 2022-08-12 西北工业大学 Histopathology image segmentation method based on deep reinforcement learning
CN114897922B (en) * 2022-04-03 2024-04-26 西北工业大学 Tissue pathology image segmentation method based on deep reinforcement learning
CN114511581A (en) * 2022-04-20 2022-05-17 四川大学华西医院 Multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device
CN114937171A (en) * 2022-05-11 2022-08-23 复旦大学 Alzheimer's classification system based on deep learning
CN114937171B (en) * 2022-05-11 2023-06-09 复旦大学 Deep learning-based Alzheimer's classification system
CN114818838A (en) * 2022-06-30 2022-07-29 中国科学院国家空间科学中心 Low signal-to-noise ratio moving point target detection method based on pixel time domain distribution learning
CN115019110A (en) * 2022-07-13 2022-09-06 北京深睿博联科技有限责任公司 Focus identification method and device based on chest image
CN115937423B (en) * 2022-12-13 2023-08-15 西安电子科技大学 Three-dimensional intelligent reconstruction method for liver tumor medical image
CN115937423A (en) * 2022-12-13 2023-04-07 西安电子科技大学 Three-dimensional intelligent reconstruction method for liver tumor medical image
CN115661144A (en) * 2022-12-15 2023-01-31 湖南工商大学 Self-adaptive medical image segmentation method based on deformable U-Net
US11935279B1 (en) 2022-12-20 2024-03-19 Guilin University Of Electronic Technology Weakly supervised pathological image tissue segmentation method based on online noise suppression strategy
CN115880262B (en) * 2022-12-20 2023-09-05 桂林电子科技大学 Weak supervision pathological image tissue segmentation method based on online noise suppression strategy
CN115880262A (en) * 2022-12-20 2023-03-31 桂林电子科技大学 Weakly supervised pathological image tissue segmentation method based on online noise suppression strategy
CN116563285A (en) * 2023-07-10 2023-08-08 邦世科技(南京)有限公司 Focus characteristic identifying and dividing method and system based on full neural network
CN116563285B (en) * 2023-07-10 2023-09-19 邦世科技(南京)有限公司 Focus characteristic identifying and dividing method and system based on full neural network
CN116894820B (en) * 2023-07-13 2024-04-19 国药(武汉)精准医疗科技有限公司 Pigment skin disease classification detection method, device, equipment and storage medium
CN116894820A (en) * 2023-07-13 2023-10-17 国药(武汉)精准医疗科技有限公司 Pigment skin disease classification detection method, device, equipment and storage medium
CN116630971A (en) * 2023-07-24 2023-08-22 安徽大学 Wheat scab spore segmentation method based on CRF_Resunate++ network
CN116630971B (en) * 2023-07-24 2023-09-29 安徽大学 Wheat scab spore segmentation method based on CRF_Resunate++ network
CN116777793A (en) * 2023-07-31 2023-09-19 深圳扬奇医芯智能科技有限公司 Medical image prompt denoising technology based on logarithmic guide contrast learning
CN117409019A (en) * 2023-09-15 2024-01-16 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-mode brain tumor image segmentation method and system based on ensemble learning
CN117524427A (en) * 2024-01-05 2024-02-06 莱凯医疗器械(北京)有限公司 Intelligent medical image analysis method
CN117524427B (en) * 2024-01-05 2024-04-02 莱凯医疗器械(北京)有限公司 Intelligent medical image analysis method
CN117830332A (en) * 2024-01-09 2024-04-05 四川大学 Medical image segmentation method based on weak supervision

Also Published As

Publication number Publication date
CN111402268A (en) 2020-07-10
CN111402268B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
WO2021184817A1 (en) Method for segmenting liver and focus thereof in medical image
CN110992382B (en) Fundus image optic cup optic disc segmentation method and system for assisting glaucoma screening
US11580646B2 (en) Medical image segmentation method based on U-Net
Fuhrman et al. A review of explainable and interpretable AI with applications in COVID‐19 imaging
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
CN107341812B (en) A kind of sequence Lung neoplasm image partition method based on super-pixel and Density Clustering
Tian et al. Multi-path convolutional neural network in fundus segmentation of blood vessels
Guo et al. Msanet: multiscale aggregation network integrating spatial and channel information for lung nodule detection
CN115375711A (en) Image segmentation method of global context attention network based on multi-scale fusion
CN115512110A (en) Medical image tumor segmentation method related to cross-modal attention mechanism
CN117036288A (en) Tumor subtype diagnosis method for full-slice pathological image
Sun et al. COVID-19 CT image segmentation method based on swin transformer
CN117611599B (en) Blood vessel segmentation method and system integrating centre line diagram and contrast enhancement network
Peng et al. Cov-TransNet: Dual branch fusion network with transformer for COVID-19 infection segmentation
Nan et al. Automatic fine-grained glomerular lesion recognition in kidney pathology
Pan et al. A multilevel remote relational modeling network for accurate segmentation of fundus blood vessels
Yang et al. NAUNet: lightweight retinal vessel segmentation network with nested connections and efficient attention
Gao et al. A semi-supervised learning framework for micropapillary adenocarcinoma detection
Xu et al. Big Model and Small Model: Remote modeling and local information extraction module for medical image segmentation
CN111768420A (en) Cell image segmentation model
CN116188455A (en) Breast cancer cell image segmentation method based on U-Net
Yan et al. Two and multiple categorization of breast pathological images by transfer learning
Ni et al. FSE-Net: Rethinking the up-sampling operation in encoder-decoder structure for retinal vessel segmentation
CN115131628A (en) Mammary gland image classification method and equipment based on typing auxiliary information
Wu et al. Mscan: Multi-scale channel attention for fundus retinal vessel segmentation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20925775

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20925775

Country of ref document: EP

Kind code of ref document: A1