CN113066576A - Lung cancer screening method based on three-dimensional mask-area convolutional neural network - Google Patents

Lung cancer screening method based on three-dimensional mask-area convolutional neural network

Info

Publication number
CN113066576A
Authority
CN
China
Prior art keywords
lung
data
neural network
convolution
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110515164.8A
Other languages
Chinese (zh)
Inventor
袁戎
袁知东
冯飞
成管讯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Hospital
Original Assignee
Peking University Shenzhen Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Hospital filed Critical Peking University Shenzhen Hospital
Priority to CN202110515164.8A priority Critical patent/CN113066576A/en
Publication of CN113066576A publication Critical patent/CN113066576A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30061 Lung
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion

Abstract

A lung cancer screening method based on a three-dimensional mask-area convolutional neural network is provided. Because lung cancer typically presents as pulmonary nodules on clinical CT, the method comprises two steps: the first step finds regression boxes of suspected lung nodules, and the second step classifies the suspected nodules so as to identify those suspected of being cancerous lesions. The method detects lung nodules based on a three-dimensional mask-region convolutional neural network adapted to CT data. Since not all nodules are cancerous, the imaging signs of the patient must be taken into account to infer the likelihood of cancer. In order to improve classification accuracy, the three-dimensional data are used to obtain the structure surrounding a suspected nodule, and the probability that the suspected nodule is a cancer lesion is judged comprehensively.

Description

Lung cancer screening method based on three-dimensional mask-area convolutional neural network
Technical Field
The invention belongs to the field of artificial-intelligence recognition and auxiliary diagnosis of medical images, and particularly relates to a lung cancer screening method based on a three-dimensional mask-region convolutional neural network.
Background
China has a large number of lung cancer patients, a high incidence rate, and high medical costs. Ranked by number of cases, lung cancer accounts for 20.3 percent of all cancers, making it the most common malignant tumor. According to the 2019 national cancer report issued by the national tumor quality control center, deaths from malignant tumors account for 23.91 percent of all causes of death among residents; the morbidity and mortality of malignant tumors have been rising continuously in recent decades, and the medical costs caused by malignant tumors exceed 220 billion each year. In the clinical diagnosis process, doctors need to screen hundreds of CT images one by one for possible lung cancer lesions, which not only depends heavily on the clinical experience of the doctor but is also extremely time-consuming and labor-intensive. Therefore, there is an urgent need to design a computer-aided lung cancer screening system to simplify the screening work of doctors and shorten the diagnosis time.
With the rapid development of deep learning, lung cancer screening methods based on convolutional neural networks have been widely studied. Traditional lung cancer detection methods detect candidate nodules based on simple prior information, for example that nodules are approximately circular on 2D images and have high CT values; this prior information is summarized into mathematical features, which are then classified by a classifier to obtain the detection result. However, because lung nodules vary greatly in shape, size, and texture, it is difficult to capture and distinguish them through manually designed low-level feature extraction, resulting in poor detection results. Meanwhile, judging the benignity or malignancy of a nodule from a single image alone is limited; although some 2D or 2.5D deep neural networks perform well in reducing false positives, using 3D CT data combined with the three-dimensional spatial information of the lesion can effectively improve detection sensitivity, which few existing methods currently achieve.
Disclosure of Invention
The invention aims to provide a lung cancer screening method based on a three-dimensional mask-region convolutional neural network, which is used for effectively judging whether lung nodules are benign or malignant and for accurately carrying out early screening for lung cancer.
According to an aspect of the present invention, a lung cancer screening method based on a three-dimensional mask-area convolutional neural network comprises the following steps:
the first step: data marking, namely collecting thin-slice CT scan data of the lungs of a plurality of different patients, recording the three-dimensional position of lung nodules in the CT data, and marking the benignity or malignancy of the corresponding lung nodules according to clinical experience and biopsy information;
the second step: data preprocessing, namely interpolating the CT data to equal resolution in the three directions and cropping the data so that only the lung region is retained;
the third step: constructing and training a detection model, using a three-dimensional mask-region convolutional neural network as the lung nodule detection model;
the fourth step: carrying out lung nodule detection on the CT data with the trained model, and outputting a pixel cube for each suspected lung nodule, wherein the pixel cube comprises the center-point coordinates and the size of the lung nodule;
the fifth step: constructing and training a classification model, using a three-dimensional convolutional neural network as the classification model for benign and malignant pulmonary nodules;
the sixth step: outputting the center-point coordinates and the size of the lung nodule with the highest malignancy probability for further clinical reporting.
Preferably, in the second step, the data preprocessing comprises:
2.1) obtaining the spatial resolution of the data from the CT data in DICOM format; taking the minimum resolution as the reference, linear interpolation is used to make the resolution of the data the same in all three directions;
2.2) the human lung is divided into two parts, namely a left lung and a right lung; meanwhile, most of the lung is air, whose CT value is 0, so all cavities with a CT value of 0 and of various sizes in the interpolated data are found through a region growing algorithm, and the volume of each cavity is calculated according to the resolution;
2.3) empirically, the volume of the human lung is between 500 and 10,000 cm³; the two largest cavities within this volume range are considered to be the lungs;
2.4) cavities caused by blood vessels in the lung are filled using morphological filtering, with the filtering range set to 5 pixels;
2.5) the interpolated data are cropped so that only the data of the lung region are retained, reducing subsequent computation time.
The invention has the beneficial effects that a method for detecting and classifying lung nodules in CT images by deep learning is provided; the method detects lung nodules based on a three-dimensional mask-region convolutional neural network (3D Mask R-CNN) and is adapted to CT data. Since not all nodules are cancerous, the imaging signs of the patient must be taken into account to infer the likelihood of cancer. In order to improve classification accuracy, the three-dimensional data are used to obtain the structure surrounding a suspected nodule, and the probability that the suspected nodule is a cancer lesion is judged comprehensively.
Drawings
FIG. 1 is a network structure of a detection model according to the present invention;
FIG. 2 is a schematic diagram of a ResNet-FPN network;
FIG. 3 is a classification model network structure;
FIG. 4 is the FROC curve obtained by the detection model;
FIG. 5 shows regions of suspected lung cancer detected in an actual case;
FIG. 6 shows the classification results for the suspected lung cancer regions in an actual case; the closer a value is to 1, the higher the probability that the lesion is malignant (lung cancer);
FIG. 7 shows the result visualized in a medical image viewing system, with the suspected lung cancer lesion area marked by a red box.
Detailed Description
The invention is further described with reference to the following description and examples.
In FIGS. 1-7, a lung cancer screening method based on a three-dimensional mask-area convolutional neural network includes the following steps. The first step: data marking, namely preparing thin-slice lung CT scan data of a plurality of different patients, recording the three-dimensional position of lung nodules in the CT data, and marking the benignity or malignancy of the corresponding lung nodules according to clinical experience and biopsy information.
The second step: data preprocessing, wherein the CT data are interpolated to equal resolution in the three directions and the data are cropped so that only the lung region is retained.
The third step: constructing and training a detection model, using a three-dimensional mask-area convolutional neural network as the lung nodule detection model.
The fourth step: carrying out lung nodule detection on the CT data with the trained model, and outputting a pixel cube for each suspected lung nodule, wherein the pixel cube comprises the center-point coordinates and the size of the lung nodule.
The fifth step: constructing and training a classification model, using a three-dimensional convolutional neural network as the classification model for benign and malignant pulmonary nodules.
The sixth step: outputting the center-point coordinates and the size of the lung nodule with the highest malignancy probability for further clinical reporting.
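Putting the six steps together, the workflow can be summarized by the short pipeline sketch below. This is a minimal illustration only; the parameter names (preprocess, detector, classifier) and the dictionary keys are hypothetical placeholders and are not part of the original disclosure.

```python
# Minimal sketch of the two-stage screening pipeline described in steps 1-6.
# The preprocess/detector/classifier objects are hypothetical placeholders.
def screen_for_lung_cancer(ct_volume, spacing, preprocess, detector, classifier, top_k=5):
    """Return the top-k suspected nodules ranked by malignancy probability."""
    # Second step: resample to isotropic resolution and crop to the lung region.
    lung_volume = preprocess(ct_volume, spacing)

    # Fourth step: the trained 3D Mask R-CNN style detector proposes nodule cubes,
    # each with center coordinates and a size.
    proposals = detector.detect(lung_volume)  # list of dicts: {"center", "size", "cube"}

    # Fifth/sixth steps: the 3D CNN classifier scores each proposal, and the
    # highest-probability nodules are reported for clinical review.
    for p in proposals:
        p["malignancy"] = classifier.predict_proba(p["cube"])
    proposals.sort(key=lambda p: p["malignancy"], reverse=True)
    return proposals[:top_k]
```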
In some embodiments, in the second step, the data are preprocessed as follows:
1.1 The spatial resolution of the data is obtained from the CT data in DICOM format; taking the minimum resolution as the reference, linear interpolation is used to make the resolution of the data the same in all three directions.
1.2 The human lung is divided into two parts, a left lung and a right lung; meanwhile, most of the lung is air, whose CT value is 0, so all cavities with a CT value of 0 and of various sizes in the interpolated data are found through a region growing algorithm, and the volume of each cavity is calculated according to the resolution.
1.3 Empirically, the volume of the human lung is between 500 and 10,000 cm³; the two largest cavities within this volume range are considered to be the lungs.
1.4 Cavities caused by blood vessels within the lung are filled using morphological filtering, with the filtering range set to 5 pixels.
1.5 The interpolated data are cropped so that only the data of the lung region are retained, reducing subsequent computation time.
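A minimal preprocessing sketch following steps 1.1-1.5 is given below, assuming the CT volume and its voxel spacing have already been read from DICOM (e.g. with pydicom). The function name, the use of connected-component labelling in place of explicit region growing, and the exact handling of the air threshold are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def preprocess_ct(volume, spacing_mm, lung_min_cm3=500, lung_max_cm3=10000):
    # 1.1  Interpolate to the same (smallest) spacing along all three axes.
    target = min(spacing_mm)
    zoom_factors = [s / target for s in spacing_mm]
    iso = ndimage.zoom(volume, zoom_factors, order=1)

    # 1.2  Air voxels (CT value 0 in the text's convention) form cavities;
    #      connected-component labelling stands in for region growing here.
    air = iso <= 0
    labels, n = ndimage.label(air)
    voxel_cm3 = (target / 10.0) ** 3  # spacing in mm -> voxel volume in cm^3
    sizes_cm3 = ndimage.sum(air, labels, index=range(1, n + 1)) * voxel_cm3

    # 1.3  Keep the two largest cavities whose volume is 500-10000 cm^3 (the lungs).
    candidates = [(v, i + 1) for i, v in enumerate(sizes_cm3)
                  if lung_min_cm3 <= v <= lung_max_cm3]
    lung_labels = [i for _, i in sorted(candidates, reverse=True)[:2]]
    if not lung_labels:
        raise ValueError("no lung-sized air cavity found")
    lung_mask = np.isin(labels, lung_labels)

    # 1.4  Fill vessel-induced holes with morphological closing (5-voxel range).
    lung_mask = ndimage.binary_closing(lung_mask, structure=np.ones((5, 5, 5)))

    # 1.5  Crop the interpolated data to the lung bounding box.
    zs, ys, xs = np.where(lung_mask)
    cropped = iso[zs.min():zs.max() + 1, ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return cropped, lung_mask
```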
In some embodiments, in the third step, the construction and training of the detection model:
2.1 The detection model uses a three-dimensional mask-area convolutional neural network; the network structure is shown in FIG. 1 and comprises three parts: the first part is a hierarchical connection structure consisting of a ResNet-FPN network, which comprises an up-sampling structure, a down-sampling structure, and cascade connections between layers; the second part is the region proposal network, which generates region proposals from the extracted feature maps and performs region-of-interest alignment (ROI Align); the third part computes the loss function.
2.2 The ResNet-FPN network includes bottom-up down-sampling, top-down up-sampling, and horizontal (lateral) cascade connections. The bottom-up down-sampling path, consisting of a series of convolutional layers and max-pooling layers as shown in FIG. 2, is used for feature-map extraction. Feature extraction is divided into 5 stages according to the size of the output feature map, and the final convolution outputs of stages 2, 3, 4 and 5 are defined as C2, C3, C4 and C5, respectively. In the down-sampling stage of the FPN, the first layer is a convolutional layer with a 7 × 7 × 7 kernel and a stride of 2, followed by a 3 × 3 × 3 max-pooling layer that performs coarse feature extraction on the image. The stages are connected by different numbers of residual blocks (ResBlock): C2 is connected by 3 residual blocks, C3 by 4, C4 by 6, and C5 by 3. The final output of a residual unit is the sum of the outputs of its convolutional layers and the shortcut input X, which is then activated by a ReLU. This structure adds no trainable parameters or computation, yet it greatly accelerates training and improves the training result; moreover, as the model grows deeper, it effectively alleviates the network degradation problem.
Top-down up-sampling starts from the highest layer and uses nearest-neighbor sampling. Compared with the common deconvolution operation, nearest-neighbor sampling is simpler, reduces the number of training parameters, and shortens training time. Corresponding to the down-sampling, the up-sampling process is also divided into 5 stages, with M2, M3, M4 and M5 as the results. The results of the down-sampling process, i.e., C2, C3, C4 and C5, are each convolved with a 1 × 1 × 1 convolution kernel with the number of output channels uniformly set to 36, and are then summed with the corresponding up-sampled feature maps; this keeps the feature dimensions consistent for the addition. The summed features are then processed with a 3 × 3 × 3 convolution to eliminate the aliasing effect of up-sampling. In this way, targets of different sizes are detected on feature maps of different resolutions, which noticeably improves the detection accuracy for smaller targets.
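To make this structure concrete, the sketch below shows, in PyTorch, a 3D residual block whose output is the sum of the convolution branch and the shortcut input X followed by ReLU, and one FPN level in which a 1 × 1 × 1 lateral convolution (36 output channels, as in the text) is added to the nearest-neighbor-upsampled higher-level map and smoothed with a 3 × 3 × 3 convolution. The use of batch normalization and the default channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock3D(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # residual sum with shortcut X, then ReLU

class FPNLevel3D(nn.Module):
    """Combine a bottom-up feature map C with the top-down map from above."""
    def __init__(self, in_channels, out_channels=36):
        super().__init__()
        self.lateral = nn.Conv3d(in_channels, out_channels, kernel_size=1)
        self.smooth = nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, c, top_down):
        lateral = self.lateral(c)                      # 1x1x1 lateral convolution
        upsampled = F.interpolate(top_down, size=lateral.shape[2:], mode="nearest")
        return self.smooth(lateral + upsampled)        # 3x3x3 conv removes aliasing
```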
2.3 Regression boxes generated by 9 kinds of anchor points are obtained on the feature maps produced by the FPN network, and the RPN network is then used to judge which anchor points are target candidate boxes to be detected. The detection feature map is convolved with a shared-weight 3 × 3 × 3 convolution of 128 channels applied to the 36-channel input, and the last feature map is then compressed by 1 × 1 × 1 convolutions of 6 channels over the 128 channels. Each anchor point may be foreground or background; foreground anchor points are obtained through Softmax classification, and the regression boxes of the detected target candidate regions are extracted preliminarily. The real bounding box of the detection target is denoted by (Gx, Gy, Gz, Gr) and the anchor bounding box by (Ax, Ay, Az, Ar), where the first three elements represent the regression of the center-point coordinates and the last element represents the nodule size. The intersection-over-union (IoU) ratio is used to determine the label of each anchor point: anchors whose IoU with a target is greater than 0.5 are considered positive examples, anchors whose IoU is less than 0.05 are considered negative examples, and the remaining anchors are ignored during training. The translation (dx, dy, dz) between the foreground anchor and the ground-truth annotation and the scale factor dr are as follows:
dx=(Gx–Ax)/Ar,
dy=(Gy–Ay)/Ar,
dz=(Gz–Az)/Ar,
dr=log(Gr/Ar)
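The translation and scale targets above can be computed directly; a small sketch with an illustrative numerical example is given below. The function name is a hypothetical placeholder.

```python
import numpy as np

def encode_anchor_target(anchor, gt):
    """anchor = (Ax, Ay, Az, Ar), gt = (Gx, Gy, Gz, Gr); returns (dx, dy, dz, dr)."""
    ax, ay, az, ar = anchor
    gx, gy, gz, gr = gt
    dx = (gx - ax) / ar
    dy = (gy - ay) / ar
    dz = (gz - az) / ar
    dr = np.log(gr / ar)
    return np.array([dx, dy, dz, dr])

# Example: an anchor of size 10 centred 2 voxels away from a nodule of size 12.
print(encode_anchor_target((30, 40, 50, 10), (32, 40, 50, 12)))  # [0.2, 0.0, 0.0, ~0.182]
```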
2.4 Because the pooling operation compresses the feature-map dimensions and makes the feature locations inaccurate, replacing ROI pooling with ROI alignment (ROI Align) allows the feature locations to be determined more accurately. The region proposals given by the RPN network are generally non-integer, so ROI pooling must quantize them: the boundary region is divided evenly into n units and each unit is rounded. After this quantization, however, the obtained candidate box already deviates from the candidate box originally proposed by the RPN network, which affects the accuracy of the final regression box. In contrast, ROI alignment obtains the coordinates by bilinear interpolation to compute the values of the sampling points, and then performs sampling. For example, if 4 sampling points are taken in the feature map, the cell is divided into four small squares and the center of each small square is a sampling point; a max-pooling operation over the sampling points in each cell then gives the final ROI alignment result.
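The interpolation idea behind ROI Align, extended from bilinear to trilinear for volumetric feature maps, can be illustrated as follows; the function below is a simplified sketch for a single sampling coordinate, not the full ROI Align procedure.

```python
import numpy as np

def trilinear_sample(feature, z, y, x):
    """Sample feature[z, y, x] at real-valued coordinates without rounding."""
    z0, y0, x0 = int(np.floor(z)), int(np.floor(y)), int(np.floor(x))
    z1 = min(z0 + 1, feature.shape[0] - 1)
    y1 = min(y0 + 1, feature.shape[1] - 1)
    x1 = min(x0 + 1, feature.shape[2] - 1)
    dz, dy, dx = z - z0, y - y0, x - x0
    value = 0.0
    # Weighted sum of the eight surrounding voxels.
    for zi, wz in ((z0, 1 - dz), (z1, dz)):
        for yi, wy in ((y0, 1 - dy), (y1, dy)):
            for xi, wx in ((x0, 1 - dx), (x1, dx)):
                value += wz * wy * wx * feature[zi, yi, xi]
    return value
```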
2.5 In the iterative optimization of the training-model parameters, the loss function consists of three parts: a regression loss, a classification loss and a mask loss. The overall loss function is defined as:
Loss = Lcls + Lreg + Lmask
where Lcls is the classification loss, Lreg the regression loss and Lmask the mask loss. In the classification and regression terms, the subscript i indexes the anchor points; pi denotes the predicted probability that the region proposal of anchor i is foreground, and pi* ∈ {0, 1} is the label of anchor i (0 for negative samples, 1 for positive samples); the regression term compares the predicted regression box of each positive anchor with the true regression box relative to that anchor.
In the fifth step, the construction and training of the classification model:
3.1 the output of the detection network is used as the input of the classification network, and in order to better utilize three-dimensional data, the classification model uses a three-dimensional convolution neural network.
3.2 As shown in FIG. 3, the classification network consists of 6 three-dimensional convolutional layers, 3 three-dimensional max-pooling layers, 2 fully connected layers, and a final activation layer. 2 dropout layers are also added to avoid overfitting.
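A PyTorch sketch matching these layer counts is given below; the channel widths, the 32³ input cube size, and the dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NoduleClassifier(nn.Module):
    """Six 3D convolutions, three 3D max-pools, two FC layers, two dropouts, sigmoid."""
    def __init__(self, dropout=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                      # 32 -> 16
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                      # 16 -> 8
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv3d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                      # 8 -> 4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(64 * 4 * 4 * 4, 128), nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 1),
            nn.Sigmoid(),                         # malignancy probability
        )

    def forward(self, x):                         # x: (N, 1, 32, 32, 32) nodule cube
        return self.classifier(self.features(x))

# probs = NoduleClassifier()(torch.randn(2, 1, 32, 32, 32))  # -> shape (2, 1)
```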
In consideration of clinical requirements, the performance evaluation of the screening method mainly uses two indexes, the recall rate and the number of false positives: the recall rate should be as high as possible within an acceptable false-positive range, which indicates that the model can reduce missed diagnoses as much as possible in the clinic. Considering that a patient may have multiple lung nodules that need to be detected, the present invention uses the free-response receiver operating characteristic (FROC) curve to evaluate model performance, with the results shown in FIG. 4.
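For reference, the sketch below shows one simplified way to compute FROC operating points: a score threshold is swept over the detector's candidate outputs and the sensitivity is read off at fixed false-positive-per-scan rates. The specific set of FP rates follows the common LUNA16-style convention and is an assumption, not taken from the text.

```python
import numpy as np

def froc_points(scores, is_true_positive, n_scans, n_nodules,
                fp_rates=(0.125, 0.25, 0.5, 1, 2, 4, 8)):
    """scores/is_true_positive: one entry per detected candidate across all scans."""
    scores = np.asarray(scores)
    tp_flag = np.asarray(is_true_positive, dtype=bool)
    order = np.argsort(-scores)                 # descending confidence
    tp_cum = np.cumsum(tp_flag[order])
    fp_cum = np.cumsum(~tp_flag[order])
    sensitivities = []
    for rate in fp_rates:
        allowed_fp = rate * n_scans             # false positives allowed at this rate
        idx = np.searchsorted(fp_cum, allowed_fp, side="right") - 1
        sensitivities.append(tp_cum[idx] / n_nodules if idx >= 0 else 0.0)
    return dict(zip(fp_rates, sensitivities))
```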
The following table compares the results of the present method with those of other methods; the present method achieved an average FROC score of 0.804, higher than the other methods. In addition, the inventive method achieved the best result at 2 FPs per scan. In clinical practice the 2-FPs operating point is of particular interest, because false positives at this level do not place too great a diagnostic burden on the physician. The method achieves a recall rate of 89.9 percent at 2 FPs.
(The comparison table is provided as an image in the original publication.)
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (8)

1. A lung cancer screening method based on a three-dimensional mask-area convolutional neural network is characterized by comprising the following steps:
the first step: data marking, namely collecting thin-slice CT scan data of the lungs of a plurality of different patients, recording the three-dimensional position of lung nodules in the CT data, and marking the benignity or malignancy of the corresponding lung nodules according to clinical experience and biopsy information;
the second step: data preprocessing, namely interpolating the CT data to equal resolution in the three directions and cropping the data so that only the lung region is retained;
the third step: constructing and training a detection model, using a three-dimensional mask-region convolutional neural network as the lung nodule detection model;
the fourth step: carrying out lung nodule detection on the CT data with the trained model, and outputting a pixel cube for each suspected lung nodule, wherein the pixel cube comprises the center-point coordinates and the size of the lung nodule;
the fifth step: constructing and training a classification model, using a three-dimensional convolutional neural network as the classification model for benign and malignant pulmonary nodules;
the sixth step: outputting the center-point coordinates and the size of the lung nodule with the highest malignancy probability for further clinical reporting.
2. The lung cancer screening method based on the three-dimensional mask-area convolutional neural network as claimed in claim 1, wherein in the second step, in the data preprocessing:
2.1) obtaining the spatial resolution of the data from the CT data in DICOM format; taking the minimum resolution as the reference, linear interpolation is used to make the resolution of the data the same in all three directions;
2.2) the human lung is divided into two parts, namely a left lung and a right lung; meanwhile, most of the lung is air, whose CT value is 0, so all cavities with a CT value of 0 and of various sizes in the interpolated data are found through a region growing algorithm, and the volume of each cavity is calculated according to the resolution;
2.3) the volume of the human lung is 500 to 10,000 cm³; the two largest cavities belonging to this volume range are considered to be the lungs;
2.4) filling the cavities caused by blood vessels in the lung by using morphological filtering, wherein the filtering range is set to five pixels;
2.5) cropping the interpolated data so that only the data of the lung region are retained, reducing subsequent computation time.
3. The lung cancer screening method based on the three-dimensional mask-area convolutional neural network as claimed in claim 1, wherein in the third step, the construction and training of the detection model:
the detection model uses a three-dimensional mask-area convolutional neural network and comprises three parts: the first part is a hierarchical connection structure formed by a ResNet-FPN network, which comprises an up-sampling structure, a down-sampling structure and cascade connections between layers; the second part is the region proposal network, which generates region proposals from the extracted feature maps and performs region-of-interest alignment; the third part computes the loss function.
4. The lung cancer screening method based on the three-dimensional mask-area convolutional neural network as claimed in claim 3, wherein in the third step, the construction and training of the detection model:
the ResNet-FPN network comprises a down-sampling part from bottom to top, an up-sampling part from top to bottom and a transverse cascading part; and the down-sampling path from bottom to top consists of a series of convolution layers and maximum pooling layers and is used for feature map extraction.
5. The method of claim 4 for screening lung cancer based on a three-dimensional mask-area convolutional neural network, wherein: according to the size of the output feature map, feature extraction is divided into five stages, and the last output convolution two, convolution three, convolution four and convolution five in the stage two, stage three, stage four and stage five are respectively defined as C2, C3, C4 and C5; in the downsampling stage of FPN, firstly, a convolution layer is formed, the size of the convolution kernel is 7 multiplied by 7, the step length is 2, then, a maximum pooling layer with the size of 3 multiplied by 3 is formed, and rough feature extraction is carried out on an image; the layers are connected by different numbers of residual blocks, C2 is connected by three residual blocks, C3 is connected by four residual blocks, C4 is connected by six residual blocks, and C5 is connected by three residual blocks; the final output of the residual error unit is added by the output of a plurality of convolution layers and the input and output X, and then activated by the ReLU;
up-sampling proceeds from top to bottom starting from the highest layer and uses nearest-neighbor sampling; corresponding to the down-sampling, the up-sampling process is also divided into five stages, with M2, M3, M4 and M5 as the results; the results of the down-sampling process, i.e., C2, C3, C4 and C5, are each convolved with a 1 × 1 × 1 convolution kernel, with the output channels consistently set to 36, and are then added to the corresponding feature maps; the summed features are then processed with a 3 multiplied by 3 multiplied by 3 convolution so as to eliminate the aliasing effect of the up-sampling.
6. The method of claim 5 for screening lung cancer based on a three-dimensional mask-area convolutional neural network, wherein: obtaining a regression frame generated by nine anchor points in a characteristic diagram obtained by the FPN network, and then judging which anchor points are target candidate frames needing to be detected by using the RPN network; the detection image is convolved by a 3 multiplied by 3 shared weight with 128 channels of which the number is 36, and then the last feature map is compressed by one-dimensional convolution of 6 128 channels; each anchor point can be a foreground or a background, foreground anchor points are obtained through Softmax classification, and a regression frame of a detection target candidate region is preliminarily extracted; representing a real boundary box of a detection target and an anchor point boundary box (Ax, Ay, Az, Ar) by (Gx, Gy, Gz, Gr), wherein the first three elements represent a coordinate regression box of a central point and the last element represents the size of a nodule; using the cross-over ratio to determine the label of each anchor point; anchors with targets greater than 0.5 and IoU less than 0.05 are considered positive and negative examples, respectively, and are ignored during retraining of other examples.
7. The method of claim 4 for screening lung cancer based on a three-dimensional mask-area convolutional neural network, wherein: pooling operation can compress feature map dimensions, cause inaccuracy of feature positions, and the feature positions can be more accurately positioned by replacing ROI pooling through ROI alignment; the region suggestions given by the RPN are generally non-integer numbers, and the region suggestions need to be integer numbers, namely, the boundary region is averagely divided into n units, and rounding processing is carried out on each unit; however, after the integer transformation, the obtained candidate frame has a deviation from the candidate frame proposed by the initial RPN network, thereby affecting the accuracy of the final regression frame; compared with pooling operation, ROI alignment obtains coordinates by using a bilinear interpolation method, the values of pixel points are obtained, and then sampling is carried out.
8. The method of claim 1 for lung cancer screening based on a three-dimensional mask-area convolutional neural network, wherein: in the fifth step, the construction and training of a classification model,
5.1) taking the output result of the detection network as the input of a classification network, wherein in order to better utilize three-dimensional data, a three-dimensional convolution neural network is used as a classification model;
5.2) the classification network comprises six three-dimensional convolution layers, three three-dimensional maximum pooling layers, two fully connected layers and a final activation layer; two dropout layers are added to avoid overfitting.
CN202110515164.8A 2021-05-12 2021-05-12 Lung cancer screening method based on three-dimensional mask-area convolutional neural network Pending CN113066576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110515164.8A CN113066576A (en) 2021-05-12 2021-05-12 Lung cancer screening method based on three-dimensional mask-area convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110515164.8A CN113066576A (en) 2021-05-12 2021-05-12 Lung cancer screening method based on three-dimensional mask-area convolutional neural network

Publications (1)

Publication Number Publication Date
CN113066576A true CN113066576A (en) 2021-07-02

Family

ID=76568739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110515164.8A Pending CN113066576A (en) 2021-05-12 2021-05-12 Lung cancer screening method based on three-dimensional mask-area convolutional neural network

Country Status (1)

Country Link
CN (1) CN113066576A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113990494A (en) * 2021-12-24 2022-01-28 浙江大学 Tic disorder auxiliary screening system based on video data
CN115409815A (en) * 2022-09-02 2022-11-29 山东财经大学 Pulmonary nodule detection method based on three-dimensional deformable transformer

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805209A (en) * 2018-06-14 2018-11-13 清华大学深圳研究生院 A kind of Lung neoplasm screening method based on deep learning
CN112116603A (en) * 2020-09-14 2020-12-22 中国科学院大学宁波华美医院 Pulmonary nodule false positive screening method based on multitask learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805209A (en) * 2018-06-14 2018-11-13 清华大学深圳研究生院 A kind of Lung neoplasm screening method based on deep learning
CN112116603A (en) * 2020-09-14 2020-12-22 中国科学院大学宁波华美医院 Pulmonary nodule false positive screening method based on multitask learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
林华明 et al.: "Pulmonary nodule detection based on feature pyramid network", Journal of China University of Metrology (《中国计量大学学报》), vol. 31, no. 3, 30 September 2020 (2020-09-30), pages 363-369 *
高智勇 et al.: "Pulmonary nodule detection based on feature pyramid network", Journal of Computer Applications (《计算机应用》), vol. 40, no. 09, 30 September 2020 (2020-09-30), pages 2571-2576 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113990494A (en) * 2021-12-24 2022-01-28 浙江大学 Tic disorder auxiliary screening system based on video data
CN113990494B (en) * 2021-12-24 2022-03-25 浙江大学 Tic disorder auxiliary screening system based on video data
CN115409815A (en) * 2022-09-02 2022-11-29 山东财经大学 Pulmonary nodule detection method based on three-dimensional deformable transformer

Similar Documents

Publication Publication Date Title
CN109063710B (en) 3D CNN nasopharyngeal carcinoma segmentation method based on multi-scale feature pyramid
CN111242174B (en) Liver cancer image feature extraction and pathological classification method based on imaging omics
Hepsağ et al. Using deep learning for mammography classification
CN110321920A (en) Image classification method, device, computer readable storage medium and computer equipment
CN107492071A (en) Medical image processing method and equipment
CN109363698A (en) A kind of method and device of breast image sign identification
Fan et al. Lung nodule detection based on 3D convolutional neural networks
US20030161522A1 (en) Method, and corresponding apparatus, for automatic detection of regions of interest in digital images of biological tissue
CN113223005B (en) Thyroid nodule automatic segmentation and grading intelligent system
CN113066576A (en) Lung cancer screening method based on three-dimensional mask-area convolutional neural network
CN113592794B (en) Spine graph segmentation method of 2D convolutional neural network based on mixed attention mechanism
CN110838114B (en) Pulmonary nodule detection method, device and computer storage medium
CN111415728A (en) CT image data automatic classification method and device based on CNN and GAN
CN112819747A (en) Method for automatically diagnosing benign and malignant nodules based on lung tomography image
US20230005140A1 (en) Automated detection of tumors based on image processing
Tan et al. GLCM-CNN: gray level co-occurrence matrix based CNN model for polyp diagnosis
CN111260639A (en) Multi-view information-collaborative breast benign and malignant tumor classification method
CN116030325A (en) Lung nodule CT image recognition method based on deep hybrid learning framework
CN112270667A (en) TI-RADS-based integrated deep learning multi-tag identification method
Weimin et al. Enhancing Liver Segmentation: A Deep Learning Approach with EAS Feature Extraction and Multi-Scale Fusion
CN112785603A (en) Brain tissue segmentation method based on Unet and superpixel
Haiying et al. False-positive reduction of pulmonary nodule detection based on deformable convolutional neural networks
CN111784638A (en) Pulmonary nodule false positive screening method and system based on convolutional neural network
CN112634308A (en) Nasopharyngeal carcinoma target area and endangered organ delineation method based on different receptive fields
CN111724345A (en) Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210702