CN113888519A - Prediction system for predicting lung solid nodule malignancy - Google Patents

Publication number
CN113888519A
Authority
CN
China
Prior art keywords
malignancy
layer
nodule
prediction system
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111198616.0A
Other languages
Chinese (zh)
Inventor
陈勃江
李为民
石峰
隗英
张瑞
任静
周庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
West China Hospital of Sichuan University
Original Assignee
West China Hospital of Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by West China Hospital of Sichuan University
Priority to CN202111198616.0A
Publication of CN113888519A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30061 Lung
    • G06T2207/30064 Lung nodule

Abstract

The invention provides a prediction system for predicting the malignancy of lung solid nodules, belonging to the field of prediction systems. The prediction system effectively predicts the malignancy of lung solid nodules with high accuracy, sensitivity and specificity, reaching an AUC of 0.819. The prediction model provided by the invention is of great value in helping clinicians select the optimal treatment strategy and evaluate prognosis according to the malignancy of a patient's nodules, and has broad application prospects.

Description

Prediction system for predicting lung solid nodule malignancy
Technical Field
The invention belongs to the field of prediction systems, and particularly relates to a prediction system for predicting the malignancy degree of a lung solid nodule.
Background
Lung cancer is a major public health problem worldwide, and in China it has the highest mortality of all cancers in both urban and rural areas. Early detection, early diagnosis and early treatment are key measures for improving the prognosis of lung cancer and reducing mortality. On imaging, early lung cancer usually appears as a nodule in the lung, and with the widespread use of high-resolution CT in hospitals, more and more lung nodules are being detected, so distinguishing benign from malignant lung nodules has become a very common and important clinical problem.
Most lung nodules lack typical clinical symptoms and are often found by chance during physical examination or the work-up of other diseases. One large-scale study found that the detection rates of lung nodules on screening in East Asia, North America and Europe were 35.5%, 23% and 29%, respectively, with 0.54%, 1.7% and 1.2% of lung nodules ultimately diagnosed as lung cancer. Another screening study found that lung nodules were detected in 25.9% of people with a smoking history of more than 30 pack-years, and 1.1% of them were diagnosed with lung cancer. Pulmonary nodules can be solitary or multiple, and can be divided by density into solid nodules, part-solid nodules and ground-glass nodules; by diameter they can be classified into nodules, micro-nodules and miliary nodules. Pulmonary nodules form for many reasons, and the underlying pathological mechanisms vary. Most malignant nodules are lung cancers, with adenocarcinoma the most common and squamous carcinoma second; benign nodules mostly arise from intrapulmonary lymph nodes or granulomatous lesions, which account for about 80%, while benign tumors such as hamartoma account for about 10%.
In recent years, the clinical diagnosis and assessment of lung nodules has become increasingly dependent on auxiliary techniques, typified by imaging, which can be broadly classified as imaging, histopathology, cytology, bronchoscopy, surgery and non-surgical biopsy. Among them, CT, especially high-resolution CT, is currently the first-choice imaging method for detecting and diagnosing lung nodules. It is simple, convenient and non-invasive, has high density resolution and clearly displays the signs and microstructure of lung nodules, but it involves some radiation exposure and its interpretation is highly subjective; low-dose CT (LDCT) reduces the radiation dose, but image noise increases and diagnostic sensitivity and specificity remain to be improved.
Yangkongqiang et al. (differential diagnosis research on benign and malignant pulmonary nodules based on a deep learning algorithm, Zunyi Medical College Master's thesis, 2018) reported a deep-learning model for predicting whether pulmonary nodules are benign or malignant, but the model covers pulmonary nodules in general and cannot accurately predict whether pulmonary solid nodules are benign or malignant. Developing a system that accurately predicts the malignancy of pulmonary solid nodules is therefore of great significance for helping clinicians select the optimal treatment strategy and evaluate prognosis according to the malignancy of a patient's solid nodules.
Disclosure of Invention
The invention aims to provide a prediction system for predicting the malignancy degree of a lung solid nodule.
The invention provides a prediction system for predicting the malignancy of a pulmonary solid nodule, wherein the malignancy of the nodule refers to whether it is malignant or benign, and the system comprises the following parts:
a first part: a data input part, for inputting feature data of a patient, the feature data being clinical features and/or CT images;
a second part: a model training part, in which the feature data of patients with known nodule malignancy are input into a neural network model and the model is trained;
a third part: a prediction part, in which the feature data of a patient whose nodule malignancy is to be predicted are input into the model trained by the second part, and a nodule malignancy prediction result is output.
Further, in the first part, the feature data are clinical features and HRCT images.
HRCT is short for high resolution CT.
Further, the clinical characteristics include age, sex, smoking history, history of malignancy, family history of malignancy, nodule diameter, and nodule location.
Further, the clinical characteristics are age, sex, smoking history, history of malignancy, family history of malignancy, nodule diameter and nodule location.
Further, the HRCT image is a preprocessed HRCT image, and the preprocessing method includes the following steps: normalizing the HRCT image to a gray-level mean of 0 and a variance of 1; then cropping from the normalized image an image block of 64 × 64 × 64 voxels with a resolution of 1 × 1 × 1 mm³.
Further, the neural network model is a convolutional neural network model.
Furthermore, the convolutional neural network model consists of an input module, four down-sampling modules, a pooling module and an output module; the input module is a three-dimensional convolutional layer with a kernel size of 3 and 16 output channels; the four down-sampling modules have 16, 32, 64 and 128 input channels and 32, 64, 128 and 256 output channels, respectively, and each consists of a three-dimensional convolutional layer with a kernel size of 3 and a stride of 2, a Batch Normalization layer and a ReLU layer; the pooling module is an average pooling layer with an output size of 1 × 1 × 1; the output module consists of two fully connected layers, a ReLU layer and a Softmax layer, wherein the output of the first fully connected layer is input into the ReLU layer, the output of the ReLU layer is input into the second fully connected layer, and the output of the second fully connected layer is input into the Softmax layer.
Further, the patient is a pulmonary solid nodule patient.
Further, the prediction system is an optimized prediction system, and the optimization method comprises one or more of the following methods:
Method 1: optimizing the CNN model with a transfer learning strategy;
Method 2: using a Class Activation Mapping (CAM) attention module to guide the network to focus on the nodule region;
Method 3: obtaining 1-dimensional deep features by passing the feature map through a global average pooling layer, then concatenating the deep features with the clinical features and inputting them to a fully connected layer, so that clinical information and image information are fused to improve prediction performance.
The invention provides the application of the prediction system in preparing equipment for predicting the malignancy degree of the solid nodules in the lung of a patient.
The present invention also provides a computer readable storage medium having stored thereon a prediction system as described above.
Malignant nodules and benign nodules can be pathologically diagnosed by sputum cytology examination or chest surgery, bronchoscopy or percutaneous lung biopsy under CT guidance.
Compared with the prior art, the prediction model constructed by the invention has the beneficial effects that:
the prediction model constructed by the invention can effectively predict the malignancy degree of the lung solid nodule, and has high prediction accuracy, high sensitivity and high specificity, and the AUC is as high as 0.819.
The prediction model provided by the invention has very important effects on clinical selection of the optimal treatment strategy and prognosis evaluation of doctors according to the malignant condition of the solid nodules in the lung of patients, and has wide application prospects.
Obviously, many modifications, substitutions, and variations are possible in light of the above teachings of the invention, without departing from the basic technical spirit of the invention, as defined by the following claims.
The present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. All the technologies realized based on the above contents of the present invention belong to the scope of the present invention.
Drawings
FIG. 1 is a patient nodule size distribution.
Fig. 2 is a schematic structural diagram of a neural network model according to embodiment 1.
Fig. 3 is a graph comparing ROC curves of prediction models, in which "CNN + clinical model" represents the model constructed in example 1, "CNN model" represents the model constructed in comparative example 1, "clinical + omics RF model" represents the model constructed in comparative example 2, "clinical RF model" represents the model constructed in comparative example 3, and "omics RF model" represents the model constructed in comparative example 4.
Detailed Description
The raw materials and equipment used in the invention are known products and are obtained by purchasing commercial products.
Patient information:
The patient data used in the examples and comparative examples of the invention are derived from the electronic medical records of pulmonary nodule patients treated at West China Hospital of Sichuan University from January 2010 to July 2017.
Patients were enrolled according to the following criteria: (a) an untreated 5-30 mm non-calcified solid nodule was found on chest CT; (b) the CT slice thickness was ≤ 1 mm (high-resolution CT, HRCT); (c) the nodule was pathologically confirmed. The following patients were excluded: (a) nodules smaller than 5 mm (low risk) or multiple lung nodules; (b) pleural effusion, atelectasis or lymphadenopathy; (c) pathologically unclear diagnosis or metastatic tumors. All benign and malignant nodules were pathologically confirmed by sputum cytology, chest surgery, bronchoscopy or CT-guided percutaneous lung biopsy.
A total of 2821 pathologically confirmed 5-30 mm solitary pulmonary nodules were found, of which 1865 were solid nodules. Further analysis was performed on the 720 patients whose HRCT showed pulmonary solid nodules. The continuous variables age and nodule diameter are expressed as mean ± standard deviation and compared using Student's t-test; the categorical variables are described as number of cases (percentage) and compared using the chi-square test. All statistical tests are two-sided, and differences with P < 0.05 are considered statistically significant.
The analysis found that, among the 720 patients whose HRCT showed lung solid nodules, 348 nodules were benign and 372 were malignant. Malignant nodules are mainly lung adenocarcinoma (92.2%), and stage I accounts for 91.1%; the benign nodules contain 52.0% of inflammatory pseudonodules, 25.0% of benign tumors, 17.8% of nodules and 5.2% of other types.
Table 1 shows the clinical characteristics of the patients: age, sex, smoking status, history of malignant tumors, family history of malignant tumors, and nodule diameter and location. Patients with benign and malignant nodules differ in age distribution, tumor history and nodule diameter (values given as benign vs malignant): patients with malignant nodules are older (51 ± 13 years vs 60 ± 10 years, P < 0.001), more often have a history of malignant tumors (3.2% vs 8.1%, P = 0.005), and have significantly larger nodules (17.6 ± 6.1 mm vs 19.2 ± 5.6 mm, P < 0.001). The two groups show no significant difference in sex, smoking, family history of malignant tumors or nodule location.
The data of the 720 patients whose HRCT showed lung solid nodules were randomly divided into a training set of 517 cases and a test set of 203 cases; the training and test samples show no significant difference in clinical characteristics such as age, sex, smoking status, history of malignant tumors, family history of malignant tumors, nodule diameter and nodule location.
In the examples and comparative examples of the invention, the 517 training cases were used to build the models and the other 203 cases were used as test samples.
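As an illustration of the Student's t-test mentioned above, the following minimal sketch recomputes the pooled two-sample t statistic for age from the rounded summary values in the text. It assumes that the 51 ± 13 group is the 348 benign cases and the 60 ± 10 group is the 372 malignant cases; the function name `pooled_t` is hypothetical, and the original analysis was run on patient-level data in R and Python, so this is only a plausibility check.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t statistic (equal-variance form) from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


# Assumed group assignment: benign (n=348) vs malignant (n=372).
t_age, df_age = pooled_t(51, 13, 348, 60, 10, 372)
print(f"t = {t_age:.2f}, df = {df_age}")
```

With several hundred degrees of freedom, a two-sided P < 0.001 corresponds to |t| above roughly 3.3, so a statistic of this size is consistent with the reported significance.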
TABLE 1 clinical characteristics of patients
Figure BDA0003304039590000041
Note: p < 0.05.
Example 1: construction method of CNN prediction model combined with clinical characteristics for predicting lung solid nodule malignancy
The CNN prediction model combined with clinical characteristics for predicting the malignancy degree of the lung solid nodule is constructed by the following three steps:
the first step is as follows: data acquisition
7 clinical features and raw HRCT images of each patient in the training and test sets were acquired and the raw HRCT images were preprocessed.
1. Collection of 7 clinical features
The clinical features were collected mainly from the electronic medical record system of West China Hospital of Sichuan University. The 7 clinical features are: age, sex, smoking history, history of malignancy, family history of malignancy, nodule diameter and nodule location.
2. Acquiring and preprocessing original HRCT image
(2.1) HRCT image acquisition
HRCT images were acquired before treatment on multi-row spiral CT scanners, with a tube voltage of 100-120 kV, a pixel size of 0.7-0.9 mm and a slice thickness of ≤ 1 mm, and were reconstructed with a standard convolution kernel; both contrast-enhanced and non-enhanced images are included. The size distribution of the nodules is detailed in FIG. 1.
(2.2) nodule segmentation
The target nodule was manually segmented in three dimensions using ITK-SNAP software. When one reader could not determine the nodule boundary, another thoracic expert was consulted; both readers were blinded to the pathological results of the lesion. All images were analyzed in the lung window (window width 1500 Hounsfield units (HU); window level -700 HU) and the mediastinal window (window width 350 HU; window level -40 HU).
(2.3) HRCT image preprocessing
In CT image preprocessing, the CT intensity range is first clipped to -400 to 1500 HU to remove most of the variation caused by bone, and the CT scan is then resampled to a slice thickness of 1.0 mm using trilinear interpolation. The image is then normalized to a gray-level mean of 0 and a variance of 1. Finally, an image block of 64 × 64 × 64 voxels with a resolution of 1 × 1 × 1 mm³ is cropped from the normalized image.
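The preprocessing steps above can be sketched as follows. This is a minimal illustration with NumPy and SciPy, not the implementation used in the patent: the function name, the resampling of all three axes to 1 mm, the cropping of the block around a given nodule centre, and the padding at the volume border are assumptions, and the clipping range is taken from the text as written.

```python
import numpy as np
from scipy.ndimage import zoom


def preprocess_ct(volume_hu, spacing_mm, center_vox, hu_range=(-400.0, 1500.0), patch=64):
    """Clip, resample to 1 mm, normalize, and crop a cubic block around the nodule.

    volume_hu  : 3D array of HU values
    spacing_mm : voxel spacing per array axis, in mm
    center_vox : nodule centre in original voxel indices (assumed to be known)
    """
    vol = np.clip(volume_hu.astype(np.float32), hu_range[0], hu_range[1])
    # Target spacing is 1 mm, so the zoom factor per axis equals the spacing.
    # order=1 is linear interpolation, i.e. trilinear for a 3D volume.
    vol = zoom(vol, zoom=spacing_mm, order=1)
    # Normalize the whole volume to zero mean and unit variance.
    vol = (vol - vol.mean()) / (vol.std() + 1e-8)
    # Nodule centre in resampled coordinates.
    center = [int(round(c * s)) for c, s in zip(center_vox, spacing_mm)]
    half = patch // 2
    # Pad with the minimum value so blocks near the border keep the full size.
    vol = np.pad(vol, half, mode="constant", constant_values=float(vol.min()))
    # After padding, index c in the resampled volume sits at c + half,
    # so the block [c, c + patch) is centred on the nodule.
    block = tuple(slice(c, c + patch) for c in center)
    return vol[block]
```

For example, `preprocess_ct(volume, (1.0, 0.8, 0.7), (20, 25, 30))` returns a 64 × 64 × 64 block for a volume whose spacing is 1.0, 0.8 and 0.7 mm along the three array axes.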
To avoid network overfitting and improve system robustness, the invention uses data augmentation: each image is randomly rotated, scaled and flipped, each with a probability of 50%. Random rotation samples a rotation angle of -10 degrees about the x axis or the y axis; random scaling is in the range of 0.75 to 1.25; random flipping is applied randomly along each axis.
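A minimal sketch of such an augmentation step is given below. Several details are assumptions rather than statements from the text: the rotation angle is drawn from a symmetric range of -10 to 10 degrees (the text only mentions -10 degrees), the array axes are ordered (x, y, z), scaled volumes are centre-cropped or zero-padded back to the input size, and the helper names are hypothetical.

```python
import numpy as np
from scipy.ndimage import rotate, zoom


def fit_to_shape(vol, shape):
    """Centre-crop or zero-pad `vol` so that it has exactly `shape`."""
    out = np.zeros(shape, dtype=vol.dtype)
    src, dst = [], []
    for n_in, n_out in zip(vol.shape, shape):
        n = min(n_in, n_out)
        s0, d0 = (n_in - n) // 2, (n_out - n) // 2
        src.append(slice(s0, s0 + n))
        dst.append(slice(d0, d0 + n))
    out[tuple(dst)] = vol[tuple(src)]
    return out


def augment(block, rng, p=0.5, max_angle=10.0, scale_range=(0.75, 1.25)):
    """Apply rotation, scaling and flipping, each with probability `p`."""
    out = block
    if rng.random() < p:
        angle = rng.uniform(-max_angle, max_angle)  # assumed symmetric range
        # Rotation about x acts in the (y, z) plane; about y in the (x, z) plane.
        plane = (1, 2) if rng.random() < 0.5 else (0, 2)
        out = rotate(out, angle, axes=plane, reshape=False, order=1, mode="nearest")
    if rng.random() < p:
        factor = rng.uniform(*scale_range)
        out = fit_to_shape(zoom(out, factor, order=1), block.shape)
    if rng.random() < p:
        for axis in range(out.ndim):
            if rng.random() < 0.5:  # flip independently along each axis
                out = np.flip(out, axis=axis)
    return np.ascontiguousarray(out)
```

Passing a seeded generator such as `np.random.default_rng(0)` keeps the augmentation reproducible between runs.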
The second step is that: model training
As shown in FIG. 2, the convolutional neural network (CNN) model used in this example consists of an input module, four down-sampling modules, a pooling module and an output module. The input module is a three-dimensional convolutional layer with a kernel size of 3 and 16 output channels. The four down-sampling modules have 16, 32, 64 and 128 input channels and 32, 64, 128 and 256 output channels, respectively, and each consists of a three-dimensional convolutional layer with a kernel size of 3 and a stride of 2, a Batch Normalization layer and a ReLU layer. The pooling module is an average pooling layer with an output size of 1 × 1 × 1. The output module consists of two fully connected layers, a ReLU layer and a Softmax layer: the output of the first fully connected layer is input into the ReLU layer, the output of the ReLU layer is input into the second fully connected layer, the output of the second fully connected layer is input into the Softmax layer, and the prediction result is finally output.
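The architecture described above can be sketched in PyTorch as follows. The layer types, kernel sizes, strides and channel counts follow the text; the single input channel, the padding of 1, the width of the first fully connected layer (`hidden=64`) and the class names are assumptions, since the text does not specify them. This sketch covers the image branch only.

```python
import torch
from torch import nn


class DownBlock(nn.Sequential):
    """3D convolution (kernel 3, stride 2) + Batch Normalization + ReLU."""

    def __init__(self, in_ch, out_ch):
        super().__init__(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )


class NoduleCNN(nn.Module):
    def __init__(self, hidden=64, n_classes=2):
        super().__init__()
        # Input module: 3D convolution, kernel size 3, 16 output channels.
        self.stem = nn.Conv3d(1, 16, kernel_size=3, padding=1)
        # Four down-sampling modules: 16 -> 32 -> 64 -> 128 -> 256 channels.
        self.down = nn.Sequential(
            DownBlock(16, 32), DownBlock(32, 64), DownBlock(64, 128), DownBlock(128, 256)
        )
        # Pooling module: average pooling to an output size of 1 x 1 x 1.
        self.pool = nn.AdaptiveAvgPool3d(1)
        # Output module: FC -> ReLU -> FC -> Softmax.
        self.fc1 = nn.Linear(256, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, image):
        features = self.pool(self.down(self.stem(image))).flatten(1)  # (N, 256)
        logits = self.fc2(torch.relu(self.fc1(features)))
        return torch.softmax(logits, dim=1)
```

With a padding of 1, each stride-2 convolution halves the spatial size, so a 64 × 64 × 64 input block becomes 4 × 4 × 4 with 256 channels before pooling, which matches the 256 deep features mentioned in the text.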
The 7 clinical features and the preprocessed HRCT images of the training-set patients acquired in the first step are input and the CNN model is trained, giving a trained lung solid nodule malignancy prediction model.
During model training, the convolutional neural network extracts low-level features (such as gradients, textures, colors and edges) from the preprocessed HRCT image through three-dimensional convolutional layers and then reduces the number of features through the average pooling layer; in the same way, it goes on to extract high-level features (such as shape and texture information) through three-dimensional convolutional layers, followed by further dimensionality reduction through the average pooling layer. Finally, 256 preset deep learning features are extracted from the preprocessed HRCT image.
The third step: prediction of malignancy of solid nodules in lung
Predicting by using the lung solid nodule malignancy prediction model established in the second step, inputting 7 clinical characteristics of the patient in the test set acquired in the first step and the preprocessed HRCT image, and outputting a prediction result: whether it is a malignant nodule.
Example 2: optimization of CNN prediction model in combination with clinical features for predicting lung solid nodule malignancy
A model for predicting the malignancy of the solid nodules in the lung was constructed and optimized according to the method of example 1. The specific operation is as follows:
the first step is as follows: data acquisition
The same as in example 1.
The second step is that: model training
The same as in example 1.
The third step: model optimization
And sequentially adopting the following three optimization strategies to optimize the lung solid nodule malignancy degree prediction model established in the second step so as to improve the performance of the model:
(1) Optimizing the CNN model with a transfer learning strategy. Transfer learning generally presupposes that the target experimental data set is small and that a suitable pre-trained model is available. The training data of the pre-trained model are usually similar in type to the target experimental data (for example, both are medical images of pulmonary nodule lesions), and the pre-trained model has strong feature extraction capability. Compared with training a randomly initialized model on a small data set, optimizing the model by transfer learning can improve model performance to some extent and makes training more efficient.
A target classification network (namely the solid nodule benign/malignant classification network) is initialized with the weights of a benign/malignant classification model pre-trained on other large-scale data sets; that is, the network parameters of the target task are initialized with the parameters of the feature extraction layers (the convolutional layers) of the pre-trained model. The network weights are then fine-tuned on the present data set, with the initial learning rate set to 10⁻⁴ for the convolutional layers and 10⁻³ for the fully connected layers.
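One way to set up such fine-tuning in PyTorch is sketched below: weights of the feature extraction layers are copied from a pre-trained state dict, and the optimizer uses separate parameter groups for the two learning rates. The function name, the use of the Adam optimizer, and the matching of pre-trained weights by name and shape are assumptions; the text only specifies the two initial learning rates.

```python
import torch
from torch import nn


def build_finetune_optimizer(model, pretrained_state=None, conv_lr=1e-4, fc_lr=1e-3):
    """Initialise feature-extraction weights and return an optimizer with two learning rates."""
    if pretrained_state is not None:
        own = model.state_dict()
        # Names of fully connected layers, whose weights are NOT transferred.
        fc_prefixes = tuple(
            name + "." for name, m in model.named_modules() if isinstance(m, nn.Linear)
        )
        transferable = {
            key: value
            for key, value in pretrained_state.items()
            if key in own and value.shape == own[key].shape and not key.startswith(fc_prefixes)
        }
        model.load_state_dict(transferable, strict=False)

    fc_params, feature_params = [], []
    for module in model.modules():
        # recurse=False collects only the parameters owned directly by this module.
        params = list(module.parameters(recurse=False))
        (fc_params if isinstance(module, nn.Linear) else feature_params).extend(params)

    return torch.optim.Adam(
        [
            {"params": feature_params, "lr": conv_lr},  # convolutional / normalization layers
            {"params": fc_params, "lr": fc_lr},  # fully connected layers
        ]
    )
```

Keeping the lower learning rate on the transferred layers limits how far fine-tuning moves them from the pre-trained features, while the newly initialized fully connected layers adapt faster.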
(2) The class activation mapping (CAM) attention module is used to guide the network to focus more on the nodule region. This method may be poorly suited to studies in which target sizes vary widely: small target regions may become too small or disappear after being downsampled several times by the convolutional layers, so the attention map generated by back-propagating onto the convolutional feature map may not focus well on them. Whether to use the method should therefore be decided according to the size distribution of one's own data.
The core idea of CAM is to back-propagate the weights of the fully connected layer onto the convolutional feature map to generate an attention map, which is up-sampled to the original size and guides the response region of the network toward the target nodule region. Let f be the feature map before the global average pooling layer and ω the feature weight matrix of the fully connected layer. To make attention generation trainable, in this study ω is first assigned to a convolutional layer with a 1 × 1 × 1 kernel, this layer is then used to convolve the feature map, and the attention feature map A is finally obtained through a rectified linear unit (ReLU), as follows:
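$$A = \mathrm{ReLU}(\mathrm{conv}(f, \omega))$$
where $A \in \mathbb{R}^{X \times Y \times Z}$, and X, Y and Z each correspond to a fraction of the size of the original image (formula image BDA0003304039590000071 in the original publication); this fraction can be defined in terms of the number of network downsampling operations m (formula image BDA0003304039590000072 in the original publication).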
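The attention map A = ReLU(conv(f, ω)) can be illustrated with a short NumPy sketch. A 1 × 1 × 1 convolution with one output channel is a weighted sum over the channels of f, so no convolution library is needed here. The function names, the use of a single weight vector (one class) for ω, and nearest-neighbour up-sampling are assumptions made for illustration. Under the further assumption that each of the m stride-2 down-sampling steps halves every dimension, a side of 64 voxels becomes 64 / 2^m, that is 4 for m = 4.

```python
import numpy as np


def cam_attention(f, w):
    """Attention map from a feature map and fully connected weights.

    f : (C, X, Y, Z) feature map before global average pooling
    w : (C,) weights for one output class (assumed form of omega)
    """
    # 1x1x1 convolution with one output channel = weighted sum over channels.
    a = np.tensordot(w, f, axes=([0], [0]))  # shape (X, Y, Z)
    return np.maximum(a, 0.0)  # ReLU


def upsample_nearest(a, factor):
    """Nearest-neighbour up-sampling of a 3D map by an integer factor."""
    return a.repeat(factor, axis=0).repeat(factor, axis=1).repeat(factor, axis=2)


m = 4  # number of stride-2 down-sampling modules in the network
side = 64 // 2 ** m  # assumed feature-map side length for a 64-voxel input
feature_map = np.ones((256, side, side, side))
weights = np.full(256, 0.01)
attention = upsample_nearest(cam_attention(feature_map, weights), 2 ** m)
```

In a trainable version the weighted sum would be a `Conv3d` layer with a 1 × 1 × 1 kernel whose weights are tied to the fully connected layer, as the text describes.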
(3) After the feature map passes through the global average pooling layer, 1-dimensional deep features are obtained; the deep features are concatenated with the clinical features and input to the fully connected layer, so that clinical information and image information are fused to improve prediction performance.
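The fusion step in (3) amounts to concatenating the pooled image features with the clinical feature vector before the fully connected layers. The NumPy sketch below shows the data flow for one patient; the function name, the hidden width and the random weights are placeholders, and it assumes the 7 clinical features have already been encoded as numbers.

```python
import numpy as np


def fuse_and_classify(feature_map, clinical, w1, b1, w2, b2):
    """Global average pooling, concatenation with clinical features, FC-ReLU-FC-Softmax."""
    deep = feature_map.mean(axis=(1, 2, 3))  # (C,) deep features
    fused = np.concatenate([deep, clinical])  # (C + n_clinical,)
    hidden = np.maximum(w1 @ fused + b1, 0.0)  # first FC layer + ReLU
    logits = w2 @ hidden + b2  # second FC layer
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(256, 4, 4, 4))  # output of the last down-sampling module
clinical = rng.normal(size=7)  # 7 encoded clinical features (placeholder values)
w1, b1 = rng.normal(size=(64, 263)) * 0.05, np.zeros(64)  # hidden width 64 is an assumption
w2, b2 = rng.normal(size=(2, 64)) * 0.05, np.zeros(2)
probs = fuse_and_classify(feature_map, clinical, w1, b1, w2, b2)
```

Because the clinical features enter through the same fully connected layers as the deep features, scaling them to a comparable range before concatenation is a common precaution.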
The fourth step: prediction of malignancy of solid nodules in lung
Predicting by using the lung solid nodule malignancy prediction model optimized in the third step, inputting 7 clinical characteristics of the patient in the test set acquired in the first step and the preprocessed HRCT image, and outputting a prediction result: whether it is a malignant nodule.
The following is a method of establishing a comparative prediction model.
Comparative example 1: CNN prediction model establishment method without clinical characteristics
The CNN prediction model without clinical features is established with reference to the method for constructing the CNN prediction model combined with clinical features in example 1, except that only the preprocessed HRCT images are input in the second and third steps, and the 7 clinical features are not input.
Comparative example 2: RF prediction model establishing method combining clinical features and radiomic features
The RF prediction model of this comparative example was constructed as follows:
the first step is as follows: data acquisition
7 clinical features and 146 radiomic features were acquired for each patient in the training and test sets.
1. Collection of 7 clinical features
The clinical features were collected mainly from the electronic medical record system of West China Hospital of Sichuan University. The 7 clinical features are: age, sex, smoking history, history of malignancy, family history of malignancy, nodule diameter and nodule location.
2. Collecting 146 radiomic features
The 146 radiomic features comprise 42 manually defined features (including HU-value histogram bin features, volume, density, mass, etc.) and 104 widely used radiomic features: 18 first-order image intensity statistics, 14 shape features and 72 texture features (from the gray level co-occurrence matrix, gray level run length matrix, gray level size zone matrix, neighbouring gray tone difference matrix and gray level dependence matrix).
The 146 radiomic features were acquired as follows:
After manual segmentation, a number of imaging features (based on intensity, surface, shape and texture) were extracted. Since the volume, density distribution and surface of a nodule play a crucial role in distinguishing benign from malignant nodules, 42 specific handcrafted features were calculated for each nodule, including the mean and standard deviation of HU values, volume, density, mass, 30 histogram features and 7 surface features; a detailed description has been reported (Shi F, Xia L, Shan F, et al. Large-scale screening to distinguish between COVID-19 and community-acquired pneumonia using infection size-aware classification. Phys Med Biol 2021; 66:065031). In addition, 104 widely used radiomic features, including first-order image intensity statistics, shape features and texture features (gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM), neighbouring gray tone difference matrix (NGTDM) and gray level dependence matrix (GLDM)), were automatically extracted using PyRadiomics.
The second step: model training
The 7 clinical features and 146 radiomics features acquired in the first step are input to train a random forest (RF) model, yielding a trained RF prediction model.
During training of the RF model, the hyperparameters are automatically optimized with a Bayesian-optimization-based algorithm so as to obtain the best performance on the training data set. In the RF model, the maximum number of trees is set to 1000 and the maximum tree depth is set to 7.
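The training step can be sketched as follows, assuming scikit-learn and SciPy are available. Several points are assumptions of this sketch rather than statements of the source: the text does not name the library or the optimizer, so a plain random search (`RandomizedSearchCV`) stands in for the Bayesian optimization; the limits of 1000 trees and depth 7 are read here as upper bounds of the search space; and the data are synthetic, with 153 columns standing in for the 7 clinical and 146 radiomics features.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV


def build_rf_search(max_trees=1000, max_depth=7, n_iter=5, seed=0):
    """Random search over an RF hyperparameter space bounded by the stated limits.

    Random search is a stand-in for the Bayesian optimization described in the text.
    """
    space = {
        "n_estimators": randint(10, max_trees + 1),  # upper bound is exclusive
        "max_depth": randint(1, max_depth + 1),
    }
    return RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        param_distributions=space,
        n_iter=n_iter,
        scoring="roc_auc",
        cv=3,
        random_state=seed,
    )


# Synthetic stand-in for 7 clinical + 146 radiomics features (153 columns).
X, y = make_classification(n_samples=200, n_features=153,
                           n_informative=10, random_state=0)

# A smaller tree cap than the 1000 in the text, only so the demo runs quickly.
search = build_rf_search(max_trees=100, n_iter=3)
search.fit(X, y)
best = search.best_params_
```

After fitting, `search.best_estimator_` is the refitted forest, and its `predict_proba` output can serve as the malignancy score used for ROC analysis.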
The third step: prediction of the malignancy of lung solid nodules
Prediction is performed with the RF prediction model established in the second step: the 7 clinical features and 146 radiomics features of the test-set patients, acquired as in the first step, are input, and the prediction result is output, namely whether the nodule is malignant.
Comparative example 3: method for establishing an RF prediction model based on clinical features
The clinical-feature-based RF prediction model is established following the construction method of the RF prediction model combining clinical features and radiomics features in comparative example 2, with the only difference that the second and third steps input the 7 clinical features alone, without the 146 radiomics features.
Comparative example 4: method for establishing an RF prediction model based on radiomics features
The radiomics-feature-based RF prediction model is established following the construction method of the RF prediction model combining clinical features and radiomics features in comparative example 2, with the only difference that the second and third steps input the 146 radiomics features alone, without the 7 clinical features.
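The three RF variants in comparative examples 2 to 4 differ only in which feature columns are passed to training and prediction. A minimal sketch of that selection follows; the column order (clinical features first, then radiomics features) and the variant names are hypothetical choices for this example.

```python
N_CLINICAL = 7
N_RADIOMICS = 146


def select_features(row, variant):
    """Return the part of a 153-value feature row used by one RF variant.

    Assumes (hypothetically) that the 7 clinical values come first,
    followed by the 146 radiomics values.
    """
    if len(row) != N_CLINICAL + N_RADIOMICS:
        raise ValueError("expected 153 feature values")
    if variant == "combined":    # comparative example 2
        return list(row)
    if variant == "clinical":    # comparative example 3
        return list(row[:N_CLINICAL])
    if variant == "radiomics":   # comparative example 4
        return list(row[N_CLINICAL:])
    raise ValueError(f"unknown variant: {variant}")


row = list(range(153))
sizes = {v: len(select_features(row, v))
         for v in ("combined", "clinical", "radiomics")}
```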
The following test examples demonstrate the advantageous effects of the present invention.
Test example 1: predictive performance evaluation of models
1. Test method
The sensitivity, specificity and accuracy of each prediction model constructed in example 1 and comparative examples 1 to 4 were calculated, and model performance was evaluated using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). Differences in AUC values between models were evaluated using the DeLong test. All statistical analyses were performed using R version 3.6.0 and Python version 3.7.0.
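The evaluation metrics named above can be computed as in the following sketch, written in plain Python with malignant nodules coded as 1. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions for illustration only; the DeLong test is not implemented here.

```python
def confusion_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels, positive = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three malignant (1) and three benign (0) nodules.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec, acc = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise formulation of the AUC is quadratic in the number of cases, which is acceptable for a test set of a few hundred nodules; a rank-based formulation gives the same value more efficiently.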
2. Test results
Table 2. Comparison of predictive performance of each model
[Table 2 is presented as an image in the original publication.]
Note: * denotes P < 0.05 compared with the "CNN + clinical features" model.
The predictive performance of each model is shown in Table 2 and Fig. 3. Taking malignant nodules as the positive class, the CNN prediction model combined with clinical features constructed in example 1 had the highest AUC (0.819, 95% CI 0.760-0.877), with a sensitivity of 0.778, a specificity of 0.788, and an accuracy of 0.783; the CNN prediction model without clinical features constructed in comparative example 1 had an AUC of 0.816 (95% CI 0.758-0.875), a sensitivity of 0.758, a specificity of 0.788, and an accuracy of 0.773.
The RF prediction model combining clinical features and radiomics features constructed in comparative example 2 had a sensitivity of 0.616, a specificity of 0.788, an accuracy of 0.704, and an AUC of 0.811; the sensitivity, specificity, accuracy and AUC of the clinical-feature-based RF prediction model constructed in comparative example 3 were 0.535, 0.740, 0.640 and 0.721, respectively; the sensitivity, specificity, accuracy and AUC of the radiomics-feature-based RF prediction model constructed in comparative example 4 were 0.747, 0.606, 0.675 and 0.778, respectively.
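The reported accuracies can be cross-checked against the reported sensitivities and specificities, because accuracy is the prevalence-weighted mean of the two. In the relation below, π denotes the fraction of malignant nodules in the test set. This section does not state that fraction; the values shown are inferred from the rounded figures above and are therefore only approximate, the first one especially, since its denominator is only 0.010.

```latex
\mathrm{Acc} = \pi\,\mathrm{Se} + (1-\pi)\,\mathrm{Sp}
\quad\Longrightarrow\quad
\pi = \frac{\mathrm{Acc}-\mathrm{Sp}}{\mathrm{Se}-\mathrm{Sp}}

% Example 1 (CNN combined with clinical features):
\pi \approx \frac{0.783-0.788}{0.778-0.788} = 0.5

% Comparative example 2 (RF, clinical + radiomics features):
\pi \approx \frac{0.704-0.788}{0.616-0.788} \approx 0.49
```

Both estimates are consistent with a roughly balanced test set, so the reported accuracy values sit close to the midpoint between sensitivity and specificity for every model.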
The experimental results show that the CNN prediction model combined with clinical features established in example 1 of the invention has the highest AUC and gives the best prediction of the malignancy of lung solid nodules.
In summary, the present invention provides a prediction system for predicting the malignancy of a lung solid nodule. The prediction system can effectively predict the malignancy of lung solid nodules, with an accuracy of 0.783, a sensitivity of 0.778, a specificity of 0.788, and an AUC of 0.819. The prediction model provided by the invention is of considerable value in helping doctors select the optimal treatment strategy and evaluate prognosis according to the malignancy of a patient's nodule, and has broad application prospects.

Claims (10)

1. A prediction system for predicting the malignancy of a pulmonary solid nodule, characterized in that: the malignancy of the nodule refers to whether the nodule is malignant or benign, and the system comprises the following parts:
a first part: a data input part, for inputting feature data of a patient, the feature data being clinical features and/or CT images;
a second part: a model training part, for inputting the feature data of patients with known nodule malignancy into a neural network model and performing model training;
a third part: a prediction part, for inputting the feature data of a patient whose nodule malignancy is to be predicted into the model trained by the second part, and outputting a nodule malignancy prediction result.
2. The prediction system of claim 1, wherein: in the first part, the feature data are clinical features and HRCT images.
3. The prediction system of claim 2, wherein: the clinical characteristics include age, gender, smoking history, history of malignancy, family history of malignancy, nodule diameter, and nodule location.
4. The prediction system of claim 3, wherein: the clinical characteristics are age, sex, smoking history, history of malignancy, family history of malignancy, nodule diameter and nodule location.
5. The prediction system of claim 2, wherein: the HRCT image is a preprocessed HRCT image, and the preprocessing method comprises the following steps: standardizing the HRCT image so that the gray-level mean is 0 and the variance is 1; then cropping from the standardized image an image block of 64 × 64 × 64 pixels with a resolution of 1 × 1 × 1 mm³.
6. The prediction system according to any one of claims 1 to 5, wherein: the neural network model is a convolutional neural network model.
7. The prediction system of claim 6, wherein: the convolutional neural network model consists of an input module, four down-sampling modules, a pooling module and an output module; the input module is a three-dimensional convolution layer with a kernel size of 3 and 16 output channels; the input channels of the four down-sampling modules are respectively 16, 32, 64 and 128, the output channels of the four down-sampling modules are respectively 32, 64, 128 and 256, and each down-sampling module consists of a three-dimensional convolution layer with a kernel size of 3 and a stride of 2, a Batch Normalization layer and a ReLU layer; the pooling module is an average pooling layer with an output size of 1 × 1 × 1; the output module consists of two fully connected layers, a ReLU layer and a Softmax layer, wherein the output of the first fully connected layer is input into the ReLU layer, the output of the ReLU layer is input into the second fully connected layer, and the output of the second fully connected layer is input into the Softmax layer.
8. The prediction system according to any one of claims 1 to 7, wherein: the patient is a lung solid nodule patient.
9. Use of the prediction system of any one of claims 1 to 8 in the manufacture of a device for predicting the malignancy of solid nodules in a lung of a patient.
10. A computer readable storage medium having stored thereon a prediction system as claimed in any one of claims 1 to 8.
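The tensor shapes implied by the network of claim 7, applied to the 64 × 64 × 64 block of claim 5, can be traced with the short sketch below. The padding of 1, the stride of 1 for the input module, and the single input channel are assumptions of this example, since the claims do not state them; the function names are likewise hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length along one axis of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(side=64, in_channels=1):
    """Return (module name, channels, side length) after each module of claim 7."""
    shapes = [("input block", in_channels, side)]
    # Input module: 3D convolution, kernel 3, 16 output channels (stride 1 assumed).
    side = conv_out(side, stride=1)
    shapes.append(("input module", 16, side))
    # Four down-sampling modules: conv (kernel 3, stride 2) + BatchNorm + ReLU.
    for i, out_ch in enumerate((32, 64, 128, 256), start=1):
        side = conv_out(side, stride=2)
        shapes.append((f"down-sampling {i}", out_ch, side))
    # Average pooling to an output size of 1 x 1 x 1.
    shapes.append(("average pooling", 256, 1))
    return shapes


shapes = trace_shapes()
for name, channels, side in shapes:
    print(f"{name}: {channels} x {side} x {side} x {side}")
```

Under these assumptions each down-sampling module halves the spatial size, so the pooled output is a 256-element vector that feeds the first fully connected layer of the output module.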
CN202111198616.0A 2021-10-14 2021-10-14 Prediction system for predicting lung solid nodule malignancy Pending CN113888519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111198616.0A CN113888519A (en) 2021-10-14 2021-10-14 Prediction system for predicting lung solid nodule malignancy

Publications (1)

Publication Number Publication Date
CN113888519A true CN113888519A (en) 2022-01-04

Family

ID=79002910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111198616.0A Pending CN113888519A (en) 2021-10-14 2021-10-14 Prediction system for predicting lung solid nodule malignancy

Country Status (1)

Country Link
CN (1) CN113888519A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665017A (en) * 2023-07-28 2023-08-29 神州医疗科技股份有限公司 Prostate cancer prediction system based on radiomics and construction method thereof

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574859A (en) * 2015-12-14 2016-05-11 中国科学院深圳先进技术研究院 Liver tumor segmentation method and device based on CT (Computed Tomography) image
CN107451609A (en) * 2017-07-24 2017-12-08 上海交通大学 Pulmonary nodule image recognition system based on deep convolutional neural networks
CN108038844A (en) * 2017-11-30 2018-05-15 东北大学 Method for predicting benign and malignant pulmonary nodules based on a lightweight CNN
CN108389201A (en) * 2018-03-16 2018-08-10 北京推想科技有限公司 Method for classifying benign and malignant pulmonary nodules based on 3D convolutional neural networks and deep learning
CN108596868A (en) * 2017-07-26 2018-09-28 江西中科九峰智慧医疗科技有限公司 Method and system for recognizing pulmonary nodules in chest DR based on deep learning
CN109523521A (en) * 2018-10-26 2019-03-26 复旦大学 Pulmonary nodule classification and lesion localization method and system based on multi-slice CT images
CN110309860A (en) * 2019-06-06 2019-10-08 昆明理工大学 Method for classifying the malignancy grade of pulmonary nodules based on convolutional neural networks
CN110516688A (en) * 2019-08-30 2019-11-29 北京推想科技有限公司 Method and system for extracting pulmonary nodule attribute feature information
CN111598871A (en) * 2020-05-15 2020-08-28 安徽医学高等专科学校 Multi-feature-fusion-assisted pulmonary ground-glass nodule detection system and medium
CN111915596A (en) * 2020-08-07 2020-11-10 杭州深睿博联科技有限公司 Method and device for predicting benign and malignant pulmonary nodules
CN112215799A (en) * 2020-09-14 2021-01-12 北京航空航天大学 Automatic classification method and system for ground-glass lung nodules
CN112951406A (en) * 2021-01-27 2021-06-11 安徽理工大学 Lung cancer prognosis auxiliary evaluation method and system based on CT (computed tomography) radiomics
CN113033650A (en) * 2021-03-22 2021-06-25 Oppo广东移动通信有限公司 Image classification method, training method and device of classification model and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HASIB ZUNAIR et al.: "Uniformizing Techniques to Process CT scans with 3D CNNs for Tuberculosis Prediction" *
RUI ZHANG et al.: "Developing of risk models for small solid and subsolid pulmonary nodules based on clinical and quantitative radiomics features" *
XI OUYANG et al.: "Dual-Sampling Attention Network for Diagnosis of COVID-19 from Community Acquired Pneumonia" *
王智 et al.: "Three-dimensional convolutional neural network for predicting the attributes of concurrent nodules in lung CT of patients with lung adenocarcinoma" (三维卷积神经网络预测肺腺癌患者肺CT内并发结节属性) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220104