CN111105423A - Deep learning-based kidney segmentation method in CT image - Google Patents

Deep learning-based kidney segmentation method in CT image

Info

Publication number
CN111105423A
Authority
CN
China
Prior art keywords
image
kidney
images
models
interest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911303210.7A
Other languages
Chinese (zh)
Other versions
CN111105423B (en)
Inventor
杜强
李剑楠
郭雨晨
聂方兴
张兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xbentury Network Technology Co ltd
Original Assignee
Beijing Xbentury Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xbentury Network Technology Co ltd filed Critical Beijing Xbentury Network Technology Co ltd
Priority to CN201911303210.7A
Publication of CN111105423A
Application granted
Publication of CN111105423B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30084Kidney; Renal

Abstract

The invention discloses a deep-learning-based method for segmenting the kidney in a CT image, comprising the following steps: inputting a CT image group; normalizing each image in the group; generating a position-encoding map of the kidney and superimposing it on each image; convolving each image to determine a position region of interest and taking a pixel Hadamard product with each processed image to obtain a segmented image; binarizing the image at the output end; and taking a Hadamard product of the resulting image with the normalized images to determine and output the image of the kidney part. By adding position encoding when the neural network is trained on the images, the method reliably separates the spleen and the kidney, which are otherwise difficult to distinguish; in addition, an attention mechanism makes network fitting converge faster and lets the network attend to the kidney position region, thereby excluding interference from the spleen.

Description

Deep learning-based kidney segmentation method in CT image
Technical Field
The invention relates to the technical field of image segmentation, in particular to a method for segmenting a kidney in a CT image based on deep learning.
Background
With the large-scale growth of image data on the Internet, image segmentation techniques have received wide attention and application. This is especially true in the medical field, where manual segmentation by doctors is costly and inefficient, and segmentation standards are inconsistent.
Existing medical segmentation techniques are generally based on traditional computer vision: CT values in the range (-1000 HU, 1000 HU) are projected onto the RGB space (0-255), features are extracted manually using heuristics drawn from doctors' experience, and the extracted features are then fed to dimension-reduction and machine learning algorithms. Such methods have advantages: the doctor's experience is effectively automated; machine learning algorithms such as random forests and SVMs are interpretable and admit relatively optimal solutions; and the data requirements are modest, so a well-performing model may be obtained from only tens or hundreds of samples. However, these models perform very poorly when predicting on new data, especially under multi-center validation.
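As a concrete illustration of the projection step just described, the following is a minimal sketch that clips CT values to the (-1000 HU, 1000 HU) window named above and rescales them linearly to 0-255; the function name and the linear mapping are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def window_ct_to_rgb(ct_slice: np.ndarray, lo: float = -1000.0, hi: float = 1000.0) -> np.ndarray:
    """Clip a CT slice (in Hounsfield units) to [lo, hi] and rescale to 0-255."""
    clipped = np.clip(ct_slice.astype(np.float64), lo, hi)
    return ((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)
```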
Judging from current research, approaches to these problems have in recent years shifted from the traditional computer vision field to the combination of deep learning and computer vision; the deep learning approach has many advantages, among which high robustness is an important one.
For example, although the neural network Unet achieves a certain segmentation effect, for images with similar shapes and similar CT values it confuses the kidneys and the spleen, as shown in FIG. 1. It is therefore desirable to provide a method for accurately identifying and segmenting the kidneys in CT images.
Disclosure of Invention
In order to solve the above technical problem, the technical solution adopted by the invention is a deep-learning-based method for segmenting the kidney in a CT image, comprising the following steps:
s1, inputting a CT image group;
s2, carrying out normalization processing on each image in the image group;
s3, generating a position code pattern of the kidney, and overlapping the position code pattern with each image in the step S2;
s4, performing convolution on the images in the step S3 to determine a position region of interest, and performing pixel Hadmard Product on the position region of interest and the processed images to obtain a segmented image or a characteristic image;
s5, outputting a binary image;
s6, Hadmard Product is performed on the image obtained in step S5 and each image in step S2, and an image obtained by segmenting the kidney part is determined and output.
In the above method, further comprising the step of: setting a loss function, selecting a plurality of models for training, establishing a model set by the trained models, segmenting the image group by the models in the model set respectively, and fusing the results of the plurality of models in a voting mode by using a model fusion technology to obtain a final segmentation result.
In the above method, the normalization process performed on each image in the image group is calculated as follows:
[formula image in original: I_normal = Normalize(I_origin)]
where I_origin represents the original image and I_normal represents the normalized image.
In the above method, the position code generation specifically includes:
PE(i,j)=cos(β*ei)+sin(β*ej),i∈(0,511),j∈(0,511)
wherein (i, j) represents the coordinates of the pixel points of the kidney, the PE function is a position function related to (i, j), and β is a hyper-parameter used to adjust the frequency at different positions;
adding the position coding pattern and each processed image, wherein the calculation formula is as follows:
I_new(i,j) = I_normal(i,j) + α*PE(i,j)
where α is a scale factor that balances the normalized original image against the position information.
In the above method, the pixel Hadamard product of the region of interest (ROI) with the processed images is calculated as follows:

f1_up = concat(f2_up, f2_down) * f2_attention

f2_up = concat(f3_up, f3_down) * f3_attention

f3_up = concat(f4_up, f4_down) * f4_attention

where f1_up to f4_up are the upsampled feature maps output by convolution layers 1-4; f2_down to f4_down are the downsampled output images of layers 2-4; and f2_attention to f4_attention are feature images that have passed through one layer of convolution and pooling.
In the above method, the binarized image is specifically: the binarized image outlining the kidney is distinguished by 0 and 1 values.
In the method, the CT image group is 3 continuous CT images.
According to the method, positional encoding is added when the neural network Unet is trained on the images, so that the spleen and the kidney, otherwise difficult to distinguish, can be reliably separated. In addition, an attention mechanism is used: during back-propagation the network adjusts the ROI according to the PE information, achieving an attention effect on the region of interest, so that network fitting converges faster and the network attends to the kidney position region, thereby excluding interference from the spleen.
Drawings
FIG. 1 is a diagram illustrating a segmentation result obtained by the conventional segmentation method described in the background art;
FIG. 2 is a process flow of the method provided by the present invention;
FIG. 3 is a schematic diagram of a neural network structure provided by the present invention;
FIG. 4 is a schematic diagram of a neural network framework incorporating position coding and a regional attention mechanism according to the present invention;
FIG. 5 is a diagram illustrating the effect of the CT image segmentation according to the present invention.
Detailed Description
The invention provides a kidney CT image segmentation network method that adds positional encoding and an attention mechanism, so that through back-propagation the neural network does not confuse the spleen or other organs when segmenting the kidney. The present invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 2-3, the present invention provides a method for kidney segmentation in a CT image based on deep learning, comprising the following steps:
s1, inputting a CT image group; in this embodiment, the CT image group may be 3 continuous CT images, the length and width are 512 × 512, and the pixel value range is-1000 to 2000HU, and multiple groups of images may be transmitted for training during training, and the 3 continuous images have information of upper and lower layers, so that the model predicts the mask of the middle layer through the information of the upper and lower layers.
S2, normalizing each image in the image group. Since the actually measured kidney pixel values are generally distributed between 25 and 50 HU while spleen values are generally distributed between 35 and 60 HU, the input CT image group is normalized by the following formula:
[formula image in original: I_normal = Normalize(I_origin)]
where I_origin represents the original image and I_normal represents the normalized image.
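Since the formula itself is only available as an image in the source, the following sketch assumes ordinary min-max normalization; the patent's exact formula may differ.

```python
import numpy as np

def normalize(i_origin: np.ndarray) -> np.ndarray:
    """Map I_origin (HU values, roughly -1000..2000) onto [0, 1] by min-max scaling.
    NOTE: assumed form; the patent's exact normalization is not recoverable."""
    i_origin = i_origin.astype(np.float64)
    lo, hi = i_origin.min(), i_origin.max()
    return (i_origin - lo) / (hi - lo + 1e-8)
```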
S3, generating a positional-encoding map (Positional Encoding) of the kidney and superimposing it on each image from step S2. The implementation is as follows:
As shown in FIG. 4, for the neural network with position encoding and the attention mechanism added, the position code is generated as follows:
PE(i,j)=cos(β*ei)+sin(β*ej),i∈(0,511),j∈(0,511)
wherein (i, j) represents the coordinates of the pixel points of the kidney, the PE function is a position function related to (i, j), and β is a hyper-parameter used to adjust the frequency at different positions.
Adding the position-encoding map to the processed images amounts to expressing the position information as a frequency and superimposing it on those images, with the formula:
I_new(i,j) = I_normal(i,j) + α*PE(i,j)
where α adjusts the ratio of the normalized original image to the position information; to prevent the position encoding from unduly affecting the normalized original image, α is set to 0.1.
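A sketch of the position-encoding map and its superposition onto the normalized image, following the two formulas above. Reading the garbled terms "ei" and "ej" as β·e·i and β·e·j (with e Euler's number) is an assumption forced by the source text, and the β value is illustrative.

```python
import numpy as np

def positional_encoding(size: int = 512, beta: float = 0.01) -> np.ndarray:
    """PE(i, j) = cos(beta*e*i) + sin(beta*e*j) over a size x size grid (assumed reading)."""
    i = np.arange(size, dtype=np.float64).reshape(-1, 1)  # row coordinate
    j = np.arange(size, dtype=np.float64).reshape(1, -1)  # column coordinate
    return np.cos(beta * np.e * i) + np.sin(beta * np.e * j)

def add_position(i_normal: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """I_new = I_normal + alpha * PE, with alpha = 0.1 as stated in the text."""
    return i_normal + alpha * positional_encoding(i_normal.shape[-1])
```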
s4, convolving the images in step S3 to determine a region of interest (ROI), and performing a pixel hadamard Product (hadamard Product) on the region of interest (ROI) and the processed images to obtain a segmented image or feature image, specifically:
in the embodiment, because the PE (provider edge) is added, namely the position information, the position information always participates in the calculation in the convolution process, the network has certain sensitivity to the position information in the convolution process, and the attention effect is given to the sensitive position.
After the convolution operation on each image from step S3, the position region is found and 2 × 2 pooling is performed to locate the position region of interest; the obtained region of interest (ROI) then enters a pixel Hadamard product. During back-propagation, the network adjusts the ROI according to the PE information so as to achieve an attention effect on the region of interest. The pixel Hadamard product of the ROI with each processed image is calculated as:

f1_up = concat(f2_up, f2_down) * f2_attention

f2_up = concat(f3_up, f3_down) * f3_attention

f3_up = concat(f4_up, f4_down) * f4_attention

where f1_up to f4_up are the upsampled feature maps output by convolution layers 1-4; f2_down to f4_down are the downsampled output images of layers 2-4; and f2_attention to f4_attention are feature images that have passed through one layer of convolution and pooling. The feature maps corresponding to f_down and f_up are concatenated and then multiplied with f_attention, finally yielding the image of the ROI position region. In this way the network fitting converges faster and the network attends to the kidney position region, thereby excluding interference from the spleen.
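A hedged PyTorch sketch of one such gated skip connection of the form concat(f_up, f_down) * f_attention. The patent does not give layer sizes; the single-channel sigmoid gate standing in for the conv-and-pool attention branch is an illustrative assumption.

```python
import torch
import torch.nn as nn

class AttentionSkip(nn.Module):
    """One gated skip connection: concat(f_up, f_down) * f_attention."""

    def __init__(self, ch_up: int, ch_down: int):
        super().__init__()
        # stand-in for the conv+pool branch that produces f_attention (assumed form)
        self.attn = nn.Sequential(
            nn.Conv2d(ch_up + ch_down, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, f_up: torch.Tensor, f_down: torch.Tensor) -> torch.Tensor:
        merged = torch.cat([f_up, f_down], dim=1)  # concat(f_up, f_down)
        return merged * self.attn(merged)          # pixel Hadamard product with the gate
```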
S5, binarizing the image at the output end: the binarized image outlines the kidney, distinguished by 0 and 1 values. Downstream tasks can be performed from this step onward.
S6, taking a Hadamard product of the image obtained in step S5 with each image from step S2, and determining and outputting the image of the segmented kidney part; FIG. 5 shows a segmentation result obtained with the present invention.
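Steps S5 and S6 together reduce to a threshold followed by a mask multiplication. A minimal sketch, with the 0.5 threshold being an assumption (the patent states only that the output is binarized to 0/1):

```python
import numpy as np

def extract_kidney(prob_map: np.ndarray, i_normal: np.ndarray) -> np.ndarray:
    """S5: binarize the network output into a 0/1 kidney mask;
    S6: pixel Hadamard product with the normalized image keeps only the kidney."""
    mask = (prob_map > 0.5).astype(i_normal.dtype)  # threshold value assumed
    return mask * i_normal
```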
Preferably, in this embodiment, in order to effectively improve the accuracy of the segmentation result, the method further includes the following steps: setting a loss function, selecting a plurality of models for training, building a model set S from the trained models, segmenting the image groups with each model in S, and using model fusion to merge the results of the plurality of models by voting to obtain the final segmentation result. Specifically:
In this embodiment, after the attention mechanism is added to Unet, different backbones serve as candidate models for Unet, which may include ResNet52, VggNet, DenseNet, SENet, etc. With different hyper-parameter settings, a plurality of models are trained through steps S1-S6 until the loss function no longer changes appreciably, and are then placed into the model set S; the models in S each segment the image group, the results are voted on, and the voting results are weighted-averaged to obtain the final segmentation result. The model training process uses known prior techniques and is not a protection point of this embodiment, so it is not described further.
In this embodiment, the results returned by 2 to 4 of the 4 models are selected for voting, and all voting results are weighted-averaged to obtain the final segmentation result. For example, if the results returned by 3 of the 4 models are selected for voting, a total of C(4,3) = 4 groups of voting results is obtained, and the final result is the weighted average of these 4 groups. Voting specifically means that a pixel of the mask is set to 1 if more than 2 models predict kidney there, and to 0 otherwise; see the sketch below. The advantage is that the error of an individual model can be effectively reduced.
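A minimal sketch of this voting rule: each model in S contributes a 0/1 mask, and a pixel is labelled kidney when more than 2 models agree, as stated above; equal model weights are an illustrative simplification of the weighted average.

```python
import numpy as np

def vote_fusion(masks, min_votes: int = 3) -> np.ndarray:
    """masks: iterable of 0/1 arrays predicted by the models in S.
    A pixel becomes 1 when at least min_votes models (i.e. more than 2) vote kidney."""
    votes = np.sum(np.stack(list(masks), axis=0), axis=0)
    return (votes >= min_votes).astype(np.uint8)
```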
According to the method for segmenting the kidney in a CT image based on deep learning provided by the invention, training was carried out on the KiTS19 data set. Without adding any other optimization, the Dice coefficient (a common measure of segmentation quality, also usable as a loss function measuring the difference between the segmentation result and the label) improved from 0.64 for a single Unet to 0.77 for the present method, an improvement of about 20%. With the position encoding and attention mechanism added, convergence needs only 10-15 iterations instead of the 30-35 required by a single Unet, an increase in convergence speed of about 60%; on kidneys with tumors the method also reaches a Dice value of 0.77.
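The Dice coefficient quoted above has the standard definition 2|P ∩ T| / (|P| + |T|); a sketch of the metric follows (1 - Dice serves as the loss form), standard rather than patent-specific code.

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2*|P & T| / (|P| + |T|) for binary masks; 1 - dice is the loss form."""
    inter = float(np.sum(pred * target))
    return 2.0 * inter / (float(np.sum(pred)) + float(np.sum(target)) + eps)
```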
The invention has the beneficial effects that:
(1) The method uses an Attention mechanism, so the training curve converges faster, features in the image are found more easily, and the features are learned in a more refined way. The convergence speed improves markedly over a single Unet: on this task Unet needs about 30 iterations, while with Positional Encoding and Attention added, 10-15 iterations suffice to reach the same effect.
(2) The method performs better on the kidney segmentation task, and the quantitative measure (Dice Coefficient) reaches a higher level. The reason is that the Specificity of the task is greatly improved: compared with prior networks, the importance of the position information can be adjusted through α, so the network not only learns the shapes of the kidney and spleen but also judges by their positions, reducing false positives (non-kidney regions predicted as kidney).
(3) Compared with traditional machine learning methods, this end-to-end prediction approach lets the parameters learned by the neural network form an input-output model; a single model (taking ResNet52 as the backbone, for example) can segment an image within 1e-2 s, improving segmentation efficiency. Compared with manual segmentation the standard is more unified, providing a strong guarantee for downstream tasks.
The present invention is not limited to the above preferred embodiments; any structural change made under the teaching of the present invention that has a technical solution identical or similar to that of the present invention falls within its protection scope.

Claims (7)

1. A kidney segmentation method in a CT image based on deep learning is characterized by comprising the following steps:
s1, inputting a CT image group;
s2, carrying out normalization processing on each image in the image group;
s3, generating a position code pattern of the kidney, and overlapping the position code pattern with each image in the step S2;
s4, performing convolution on the images in the step S3 to determine a position region of interest, and performing pixel Hadmard Product on the position region of interest and the processed images to obtain a segmented image or a characteristic image;
s5, outputting a binary image;
s6, Hadmard Product is performed on the image obtained in step S5 and each image in step S2, and an image obtained by segmenting the kidney part is determined and output.
2. The method of claim 1, further comprising the step of: setting a loss function, selecting a plurality of models for training, establishing a model set by the trained models, segmenting the image group by the models in the model set respectively, and fusing the results of the plurality of models in a voting mode by using a model fusion technology to obtain a final segmentation result.
3. The method of claim 1, wherein the normalizing for each image in the set of images is calculated as:
[formula image in original: I_normal = Normalize(I_origin)]
where I_origin represents the original image and I_normal represents the normalized image.
4. The method of claim 1, wherein the position code generation is specifically as follows:
PE(i,j)=cos(β*ei)+sin(β*ej),i∈(0,511),j∈(0,511)
wherein (i, j) represents the coordinates of the pixel points of the kidney, the PE function is a position function related to (i, j), and β is a hyper-parameter used to adjust the frequency at different positions;
adding the position coding pattern and each processed image, wherein the calculation formula is as follows:
I_new(i,j) = I_normal(i,j) + α*PE(i,j)
where α is a scale factor that balances the normalized original image against the position information.
5. The method of claim 1, wherein the pixel Hadamard product of the region of interest (ROI) with the processed images is calculated as follows:

f1_up = concat(f2_up, f2_down) * f2_attention

f2_up = concat(f3_up, f3_down) * f3_attention

f3_up = concat(f4_up, f4_down) * f4_attention

where f1_up to f4_up are the upsampled feature maps output by convolution layers 1-4; f2_down to f4_down are the downsampled output images of layers 2-4; and f2_attention to f4_attention are feature images that have passed through one layer of convolution and pooling.
6. The method according to claim 1, characterized in that said binarized image is specifically: the binarized image outlining the kidney is distinguished by 0 and 1 values.
7. The method of claim 1, wherein the set of CT images is 3 consecutive CT images.
CN201911303210.7A 2019-12-17 2019-12-17 Deep learning-based kidney segmentation method in CT image Active CN111105423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911303210.7A CN111105423B (en) 2019-12-17 2019-12-17 Deep learning-based kidney segmentation method in CT image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911303210.7A CN111105423B (en) 2019-12-17 2019-12-17 Deep learning-based kidney segmentation method in CT image

Publications (2)

Publication Number Publication Date
CN111105423A 2020-05-05
CN111105423B CN111105423B (en) 2021-06-29

Family

ID=70422001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911303210.7A Active CN111105423B (en) 2019-12-17 2019-12-17 Deep learning-based kidney segmentation method in CT image

Country Status (1)

Country Link
CN (1) CN111105423B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111445474A (en) * 2020-05-25 2020-07-24 南京信息工程大学 Kidney CT image segmentation method based on bidirectional complex attention depth network
CN116681892A (en) * 2023-06-02 2023-09-01 山东省人工智能研究院 Image precise segmentation method based on multi-center polar mask model improvement

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492232A (en) * 2018-10-22 2019-03-19 内蒙古工业大学 A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer
CN109993809A (en) * 2019-03-18 2019-07-09 杭州电子科技大学 Rapid magnetic resonance imaging method based on residual error U-net convolutional neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492232A (en) * 2018-10-22 2019-03-19 内蒙古工业大学 A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer
CN109993809A (en) * 2019-03-18 2019-07-09 杭州电子科技大学 Rapid magnetic resonance imaging method based on residual error U-net convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ashish Vaswani et al.: "Attention Is All You Need", arXiv:1706.03762v4 *
Olaf Ronneberger et al.: "U-Net: Convolutional Networks for Biomedical Image Segmentation", arXiv:1505.04597v1 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111445474A (en) * 2020-05-25 2020-07-24 南京信息工程大学 Kidney CT image segmentation method based on bidirectional complex attention depth network
CN116681892A (en) * 2023-06-02 2023-09-01 山东省人工智能研究院 Image precise segmentation method based on multi-center polar mask model improvement
CN116681892B (en) * 2023-06-02 2024-01-26 山东省人工智能研究院 Image precise segmentation method based on multi-center polar mask model improvement

Also Published As

Publication number Publication date
CN111105423B (en) 2021-06-29

Similar Documents

Publication Publication Date Title
US11556797B2 (en) Systems and methods for polygon object annotation and a method of training an object annotation system
Yang et al. Automatic pixel‐level crack detection and measurement using fully convolutional network
CN108230339B (en) Stomach cancer pathological section labeling completion method based on pseudo label iterative labeling
CN110298321B (en) Road blocking information extraction method based on deep learning image classification
CN114120102A (en) Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium
US20190057507A1 (en) System and method for semantic segmentation of images
Li et al. A robust instance segmentation framework for underground sewer defect detection
CN113160192A (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN113159120A (en) Contraband detection method based on multi-scale cross-image weak supervision learning
CN111105423B (en) Deep learning-based kidney segmentation method in CT image
CN116645592B (en) Crack detection method based on image processing and storage medium
CN114926511A (en) High-resolution remote sensing image change detection method based on self-supervision learning
CN111985381B (en) Guidance area dense crowd counting method based on flexible convolution neural network
US11348349B2 (en) Training data increment method, electronic apparatus and computer-readable medium
CN114359286A (en) Insulator defect identification method, device and medium based on artificial intelligence
CN115147418A (en) Compression training method and device for defect detection model
Zuo et al. A remote sensing image semantic segmentation method by combining deformable convolution with conditional random fields
CN111667461A (en) Method for detecting abnormal target of power transmission line
Yang et al. Semantic segmentation of bridge point clouds with a synthetic data augmentation strategy and graph-structured deep metric learning
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network
CN112801021B (en) Method and system for detecting lane line based on multi-level semantic information
CN110472640B (en) Target detection model prediction frame processing method and device
CN116385466A (en) Method and system for dividing targets in image based on boundary box weak annotation
CN114022787B (en) Machine library identification method based on large-scale remote sensing image
CN113657225B (en) Target detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant