CN116542937A - Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology - Google Patents
- Publication number
- CN116542937A (application CN202310518369.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- lung
- tumor
- histology
- invasiveness
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30061—Lung
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
A method for discriminating lung tumor invasiveness based on clinical, radiomics and deep learning neural network features. Clinical characteristics of lung cancer patients are collected; radiomics extracts texture features and higher-order features from the patients' CT images; and a deep learning neural network extracts high-order nonlinear features. For CT image processing, a doctor labels the tumor region, and the tumor is cropped out to obtain the lung tumor image. For model training, additional lung CT samples are generated by multi-angle rotation (0 to 30 degrees) to improve the generalization ability of the model, and the SMOTE algorithm is used to address the imbalanced class distribution. The LASSO algorithm selects, from all features, those that matter for judging lung tumor invasiveness and computes a weight for each selected feature. For sample classification, a nonlinear SVM (support vector machine) classifier is used because of its better stability. The invention has the following advantages: clinical, neural network and radiomics features are used together, and the features that decide tumor invasiveness are explored in depth, making the result more scientific and interpretable.
Description
Technical Field
The invention belongs to the technical field of medical imaging and artificial intelligence, and in particular relates to a method for judging lung tumor infiltration from clinical lung CT images before the patient undergoes surgery, so as to improve the success rate of the operation.
Background
Determining the invasiveness of lung cancer is of great significance for formulating the treatment plan of a lung cancer patient. Lung cancer in situ is the earliest form of lung cancer and is therefore also called stage 0 cancer: the cancer cells have only just arisen and have not invaded the basement membrane. Once found, lung cancer in situ can be cured by surgical resection alone, without radiotherapy or chemotherapy; serious complications are avoided after cure, so quality of life is not greatly affected and life expectancy is not affected. Minimally invasive lung cancer refers to cancer cells that have broken through the basement membrane and begun to invade surrounding tissue, but the extent of invasion is small and cannot be seen with the naked eye. Minimally invasive lung cancer also requires surgery; provided that a sufficient resection with negative margins can be ensured, surgery alone is generally enough and chemotherapy is not needed. Invasive lung cancer is further divided into early, intermediate and advanced stages. Early-stage invasive cancer generally requires only surgery, without radiotherapy or chemotherapy, whereas intermediate-stage invasive cancer may require chemoradiotherapy in addition to surgery. Advanced cancer has usually already metastasized to distant organs (for example liver, lung or brain metastases); in most cases it cannot be surgically resected and can only be treated with radiotherapy, chemotherapy, targeted therapy, immunotherapy and similar means to relieve suffering. Therefore, determining the invasiveness of a patient's lung cancer before surgery is important for formulating the treatment plan.
At present, clinical diagnosis alone does not reach the expected accuracy, and its accuracy is also unstable. Clinicians judge the nature of a tumor from features such as lesion size, attenuation, spiculation and cystic spaces around the lesion, but these features discriminate invasiveness poorly in atypical lung cancers, and clinical judgment is subjective and dependent on experience. Radiomics also faces challenges in discriminating lung cancer invasiveness, because lung nodules may be small and may resemble other structures in the lung (e.g., blood vessels) or benign processes (e.g., focal organizing pneumonia). In addition, the radiomics feature extraction process is relatively fixed and ignores individual differences between patients. For these reasons, CT-based lung cancer screening has a high false-positive rate, and a more flexible method is needed to further improve the accuracy of the classification model.
In recent years, deep learning has been highly successful in natural-image classification. Deep learning extracts high-level semantic information from images (here, CT scans) that differs from the features extracted by radiomics, so deep learning is expected to improve classical radiomics models for predicting lung cancer invasiveness. However, CT medical images differ from conventional natural images, and the available sample size is limited, so it is difficult for a deep model alone to achieve the expected performance.
Studies have shown that feature-level fusion can significantly improve the accuracy of the final prediction. Feature-level fusion concatenates the feature vectors computed by two models, a radiomics model and a deep convolutional network, into a new feature vector for subsequent classification and prediction. The approach has been applied to tasks such as lung nodule detection and classification, analysis of tumor imaging attributes, and cancer survival prediction.
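As a rough illustration of feature-level fusion, the NumPy sketch below z-scores each feature block (clinical, radiomics, deep learning) column by column and then places the blocks side by side, so each patient ends up with one long row. The per-block standardization, the function names and the block sizes (10 patients with 6 clinical, 100 radiomics and 32 deep features, filled with random numbers) are assumptions for illustration, not values taken from this patent.

```python
import numpy as np

def zscore(block: np.ndarray) -> np.ndarray:
    """Standardise each column to zero mean and unit variance; constant columns become 0."""
    mean = block.mean(axis=0)
    std = block.std(axis=0)
    std[std == 0] = 1.0            # avoid dividing by zero for constant features
    return (block - mean) / std

def fuse_features(clinical, radiomics, deep):
    """Feature-level fusion: z-score each block, then concatenate the blocks column-wise."""
    blocks = [zscore(np.asarray(b, dtype=float)) for b in (clinical, radiomics, deep)]
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("every feature block needs one row per patient")
    return np.concatenate(blocks, axis=1)

# Hypothetical example: random numbers stand in for real patient features.
rng = np.random.default_rng(0)
fused = fuse_features(rng.normal(size=(10, 6)),     # clinical features
                      rng.normal(size=(10, 100)),   # radiomics features
                      rng.normal(size=(10, 32)))    # deep-learning features
print(fused.shape)  # (10, 138)
```

Standardizing each block keeps features with large numeric ranges (volumes in mm³, for example) from dominating the later LASSO and SVM steps; in a real pipeline the means and standard deviations would be estimated on the training split only.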
Inspired by these studies, we use a combined model of clinical diagnosis, radiomics and deep learning to identify lung tumor infiltration. Because of intratumoral heterogeneity, a single model cannot extract the useful features of a tumor efficiently and comprehensively. The proposed method therefore uses an advanced multi-scale feature-extraction network structure to explore imaging phenotypes and predict the invasiveness of lung cancer tumors.
The invention combines clinical medicine, a traditional radiomics method and a deep learning method: it fuses the three types of features, screens them with the LASSO method, and uses a nonlinear SVM classifier to judge the infiltration of lung cancer tumors. Combining traditional radiomics with a neural network greatly improves the accuracy of invasiveness discrimination.
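The LASSO screening step can be sketched with cyclic coordinate descent on the objective (1/(2n)) * ||y - Xw||^2 + lambda * ||w||_1, the same objective scikit-learn's `Lasso` minimizes. Features whose coefficients shrink exactly to zero are discarded, and the non-zero coefficients play the role of the feature weights. The solver, the function names, the toy data and lambda = 0.1 are assumptions of this sketch; the patent does not state how lambda is chosen, and cross-validation is a common choice.

```python
import numpy as np

def soft_threshold(rho: float, lam: float) -> float:
    """Soft-thresholding operator: shrinks rho towards zero by lam, snapping small values to 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n))*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardised and y is centred, so no intercept is fitted.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ w + X[:, j] * w[j]      # residual without feature j's contribution
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, lam) / col_sq[j] if col_sq[j] > 0 else 0.0
    return w

# Hypothetical example: only features 0 and 2 actually drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
y = y - y.mean()
w = lasso_coordinate_descent(X, y, lam=0.1)
selected = np.flatnonzero(w)                      # indices of the retained features
print(selected, np.round(w[selected], 2))
```

In this toy setting the three irrelevant features should be driven exactly to zero, while the two informative ones survive with coefficients slightly shrunk towards zero by lambda.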
Disclosure of Invention
The invention makes full use of the features of lung tumor CT images, extracting them through clinical medicine, radiomics and a neural network. The invention collects clinical data from patients at the First People's Hospital of Changzhou, including the patients' lung CT images and images of the tumor regions labeled by doctors, and also takes into account that the patients' clinical features have an important influence on the final diagnosis. The series of CT slices is fused into a three-dimensional volume so that the feature extractors can readily obtain features. The ROI (region of interest) is then located in the lung CT image using the tumor labeling image, and a fixed-size tumor image is cropped from each patient as the model input. A radiomics feature extractor extracts features from the input lung image, while the three-dimensional convolutional neural network DenseNet extracts features from the same image, and the clinical, radiomics and deep learning features are fused. In a feature-screening step, the LASSO algorithm then removes features that have little or no influence on invasiveness and computes the influence weight of each selected feature on the predicted value. In the collected patient data, the proportions of invasive and non-invasive tumors differ greatly, so the SMOTE algorithm is used to generate equal amounts of training data for the two classes. Finally, a nonlinear SVM classification model judges lung tumor invasiveness.
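SMOTE balances the classes by creating synthetic minority-class samples on the line segment between a minority sample and one of its k nearest minority neighbours. A minimal NumPy version is sketched below, assuming fused feature vectors as input, k = 5 and randomly chosen base samples (one common variant of the algorithm). The function names and toy data are hypothetical; in practice the imbalanced-learn package provides a ready-made `SMOTE` class.

```python
import numpy as np

def smote(X_minority, n_synthetic, k=5, rng=None):
    """Basic SMOTE: interpolate between random minority samples and their k nearest minority neighbours."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)                                   # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_synthetic)
    nb = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))                            # interpolation factor in [0, 1)
    return X[base] + gap * (X[nb] - X[base])

def balance_with_smote(X, y, k=5, rng=None):
    """Oversample the minority class of a binary label array until both classes are equally large."""
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    synth = smote(X[y == minority], n_new, k=k, rng=rng)
    return np.vstack([X, synth]), np.concatenate([y, np.full(n_new, minority)])

# Hypothetical example: 40 samples of class 0 and 10 of class 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 3)), rng.normal(3.0, 1.0, (10, 3))])
y = np.array([0] * 40 + [1] * 10)
X_bal, y_bal = balance_with_smote(X, y, rng=rng)
print(np.bincount(y_bal))  # [40 40]
```

A common precaution, not stated in the patent, is to oversample only the training split, so that synthetic points derived from validation cases cannot leak into training.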
The invention still leaves room for improvement, and several preferred directions exist. First, a Transformer deep learning model based on the self-attention mechanism could be used, which may yield a more interpretable model. Second, the invention uses a feature-fusion method with a nonlinear SVM classifier, chosen for its stability, for the final sample classification; depending on the next stage of research, other advanced nonlinear classifiers may perform better.
Compared with traditional methods, the research of the invention has the following advantages. First, it addresses the problem that heterogeneity and misdiagnosis cannot be avoided when clinical medicine relies only on low-dimensional features, and the method can achieve excellent performance in atypical lung cancer cases. Second, the radiomics feature extraction process is relatively fixed and ignores individual differences between patients; adding deep learning explores more comprehensive features. Third, the statistical method LASSO retains the most important diagnostic features and yields their weight values, addressing the lack of interpretability and scientific grounding of a pure deep learning neural network in medical diagnosis.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 shows a sample case of the present invention displayed in ITK-SNAP software;
FIG. 3 is a diagram of the LASSO model parameter selection process of the present invention;
FIG. 4 is a graph of the features selected by the present invention as influencing lung tumor invasiveness;
FIG. 5 shows the numbers of invasive and non-invasive lung cancer cases balanced by the SMOTE algorithm in the present invention:
FIG. 5 (a) shows the distribution of the number of cases of the two classes before balancing by the SMOTE algorithm;
FIG. 5 (b) shows the distribution of the number of cases of the two classes after balancing by the SMOTE algorithm;
Detailed Description
The first table lists the invasiveness discrimination results obtained with radiomics in the present invention.
The second table lists the invasiveness discrimination results obtained with the deep learning convolutional network in the present invention.
The third table lists the invasiveness discrimination results obtained by combining clinical medicine, radiomics and the neural network in the present invention.
Specific implementation steps
Patient data acquisition:
We collected patients with pulmonary ground-glass nodules treated at the First People's Hospital of Changzhou from June 2019 to June 2020, and the cohort was screened by two experienced cardiothoracic surgeons. We screened 356 patients who underwent surgical resection at this hospital and had a definite histopathological subtype. Sex, age, height, weight, hypertension, diabetes and infiltration status were selected as clinical features (table 1). The inclusion criteria were: 1. the patient underwent lung cancer surgery; 2. the pathology report showed atypical adenomatous hyperplasia, adenocarcinoma in situ, minimally invasive adenocarcinoma or invasive adenocarcinoma; 3. the patient had thin-section CT imaging data (slice thickness of 2 mm or less) from the First People's Hospital of Changzhou; 4. the ground-glass nodule diameter was less than 2 cm; 5. the solid component of the ground-glass nodule was less than 50%. The exclusion criteria were: 1. deceased patients; 2. patients with serious cardiovascular or cerebrovascular disease or abnormal liver or kidney function; 3. chronic obstructive pulmonary disease; 4. large cell lung cancer; 5. small cell lung cancer; 6. central lung cancer; 7. lung cancer combined with other mixed types, etc.
CT scanning parameters:
In this study, the thin-section chest CT scanners were made by GE, Philips and Siemens. The acquisition parameters were: tube voltage 140 kVp (range 100-140 kVp), tube current 740 mA (range 100-752 mA), and slice thickness 1.0 mm (range 0.65-2.0 mm).
Image segmentation and image preprocessing:
The raw thin-section CT data of all patients were copied and loaded into ITK-SNAP software (www.itksnap.org), and ROIs were delineated on the tumor sites in the thin-section CT. All patient information was masked when the data were loaded into ITK-SNAP. Three-dimensional ROIs of the entire tumor were then delineated by two cardiothoracic surgeons and reviewed by other surgeons. All three-dimensional ROIs were drawn in the lung window setting. FIG. 2 shows the lung CT image of a selected patient: (a) and (b) are the side (sagittal) view of the lung and the labeled tumor region, (c) and (d) are the top (axial) view and the labeled tumor region, and (e) and (f) are the front (coronal) view and the labeled tumor region.
The tumor position and size in the lung CT image are determined from the ROI in the labeling image, the coordinates of the tumor center point are obtained, and a 50 × 50 × 50 cube centered on the tumor center is cropped from the original CT image. The images are then normalized and unified, as follows. Because the voxel sizes of CT images acquired from different patients are not consistent, the images are resampled. Unlike natural images, medical images carry the real physical size of the imaged body part (the imaging size) as very important information. A CT image therefore has two indices, voxel spacing (spacing) and voxel count (resolution), related by: imaging size = spacing × resolution.
Because different scanners or acquisition protocols usually produce data sets with different voxel spacings, which a CNN cannot account for on its own, the spacing of all medical images is resampled to a common value, so that resolution directly reflects imaging size.
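The resampling and cropping steps above can be sketched in NumPy as follows. The tumour centre is taken here as the rounded centroid of the labelled voxels, the cube is padded with the volume minimum (air) where it extends past the image border, and resampling uses nearest-neighbour lookup to a 1 mm isotropic grid, performed before cropping so that the 50-voxel cube has a fixed physical size. All of these choices, and the function names, are assumptions of this sketch rather than details given in the patent; production pipelines usually resample with linear or B-spline interpolation, for example via SimpleITK or `scipy.ndimage.zoom`.

```python
import numpy as np

def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Nearest-neighbour resampling of a 3-D volume from `spacing` to `new_spacing` (mm per voxel).

    The physical extent, imaging size = spacing x resolution, is preserved up to rounding.
    """
    spacing = np.asarray(spacing, dtype=float)
    new_spacing = np.asarray(new_spacing, dtype=float)
    old_shape = np.array(volume.shape)
    new_shape = np.maximum(np.round(old_shape * spacing / new_spacing).astype(int), 1)
    # source index of each output voxel centre: floor(physical position / old spacing)
    axes = [np.clip(np.floor((np.arange(n) + 0.5) * ns / s).astype(int), 0, o - 1)
            for n, ns, s, o in zip(new_shape, new_spacing, spacing, old_shape)]
    return volume[np.ix_(*axes)]

def tumor_center(mask):
    """Centre of the labelled ROI, taken as the rounded centroid of its voxels."""
    idx = np.argwhere(mask > 0)
    if idx.size == 0:
        raise ValueError("mask contains no labelled voxels")
    return tuple(int(v) for v in np.round(idx.mean(axis=0)))

def crop_cube(volume, center, size=50, fill=None):
    """Cut a size^3 cube centred on `center`; positions outside the volume are set to `fill`."""
    fill = volume.min() if fill is None else fill
    half = size // 2
    out = np.full((size,) * 3, fill, dtype=volume.dtype)
    src, dst = [], []
    for c, dim in zip(center, volume.shape):
        start = c - half
        lo, hi = max(start, 0), min(start + size, dim)
        src.append(slice(lo, hi))
        dst.append(slice(lo - start, hi - start))
    out[tuple(dst)] = volume[tuple(src)]
    return out

# Hypothetical example: 40 slices of 2.5 mm with 0.7 mm in-plane spacing.
rng = np.random.default_rng(0)
ct = rng.integers(-1000, 400, size=(40, 128, 128)).astype(np.int16)
mask = np.zeros(ct.shape, dtype=np.uint8)
mask[18:22, 60:70, 60:70] = 1                  # labelled tumour ROI
spacing = (2.5, 0.7, 0.7)
ct_iso = resample_nearest(ct, spacing)         # extent of about 100 x 89.6 x 89.6 mm is kept
mask_iso = resample_nearest(mask, spacing)
cube = crop_cube(ct_iso, tumor_center(mask_iso), size=50)
print(ct_iso.shape, cube.shape)  # (100, 90, 90) (50, 50, 50)
```

The mask is resampled with the same function so that the ROI stays aligned with the image on the new grid.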
Extracting radiomics features:
High-throughput features are extracted to quantitatively analyze the essential properties of the ROI. A radiomics feature extractor extracts the following features from the patient's lung image:
1) Shape features: shape and size features describe the shape, size and regularity of the tumor. For example, the maximum diameter, volume and surface area reflect tumor size; sphericity reflects whether the tumor tends toward a spherical shape; and compactness reflects whether the tumor shape and its margins are regular.
2) First-order statistical features: first-order statistics are computed from the gray values of the ROI image and generally include the maximum, minimum, mean, median, range, variance, kurtosis, skewness and entropy. They describe the distribution of gray-level intensities within the tumor and thus reflect intratumoral heterogeneity.
3) Texture features: first-order statistics and shape features capture low-dimensional information that is easily perceived visually (e.g., brightness and shape). Texture features, by contrast, are derived mainly from texture matrices such as the gray level dependence matrix (GLDM), gray level size zone matrix (GLSZM), gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM) and neighbouring gray tone difference matrix (NGTDM). They can quantify information that is hard to perceive by eye, such as texture patterns or tissue distribution inside the tumor.
4) Higher-order and transform-based features: although the three feature types above capture visual information and texture patterns of the tumor at low and high dimensions, the amount of information they carry is limited. To obtain information in different frequency bands, a wavelet transform is also applied: it decomposes the original tumor image into different frequency sub-bands, and the three feature types above are extracted from each wavelet image. The wavelet transform provides multi-band, multi-scale image information; for clinical problems that are hard to describe with simple visual characteristics of the tumor image, these high-dimensional abstract wavelet features can play a different role and capture clinical information that is hard to perceive visually.
These four feature classes are extracted from the CT images one by one with the radiomics feature extractor and serve as the basis for quantitative analysis of the tumor.
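To make the first-order, texture and wavelet feature classes more concrete, here is a NumPy-only sketch of three representative computations: first-order statistics of ROI gray values, a gray-level co-occurrence matrix (GLCM) with its contrast feature, and a single-level 3-D Haar wavelet decomposition whose eight sub-bands are each passed back through the first-order features. It is neither PyRadiomics nor the extractor used in the patent. The function names are hypothetical, the GLCM is computed in 2-D for a single offset (radiomics extractors usually aggregate several directions in 3-D), entropy uses fixed-width bins anchored at the ROI minimum, and Haar is chosen only because it is the simplest orthonormal wavelet.

```python
import numpy as np

def first_order_features(x, bin_width=25.0):
    """First-order statistics of a 1-D array of ROI gray values (kurtosis in non-excess form)."""
    x = np.asarray(x, dtype=float)
    mean, var = x.mean(), x.var()
    c = x - mean
    skew = (c ** 3).mean() / var ** 1.5 if var > 0 else 0.0
    kurt = (c ** 4).mean() / var ** 2 if var > 0 else 0.0
    counts = np.bincount(np.floor((x - x.min()) / bin_width).astype(int))
    p = counts[counts > 0] / x.size                 # histogram of fixed-width gray-level bins
    entropy = float(-(p * np.log2(p)).sum())
    return {"min": x.min(), "max": x.max(), "mean": mean, "median": float(np.median(x)),
            "range": x.max() - x.min(), "variance": var,
            "skewness": skew, "kurtosis": kurt, "entropy": entropy}

def glcm_2d(img, n_levels, offset=(0, 1)):
    """Symmetric, normalised GLCM of an integer image for one offset (dy, dx >= 0)."""
    dy, dx = offset
    h, w = img.shape
    a, b = img[: h - dy, : w - dx], img[dy:, dx:]   # reference pixel and its neighbour
    m = np.zeros((n_levels, n_levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    m = m + m.T                                     # count each pair in both directions
    return m / m.sum()

def glcm_contrast(p):
    """GLCM contrast: sum of (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

def haar_3d(volume):
    """Single-level orthonormal 3-D Haar decomposition into 8 sub-bands named 'LLL' ... 'HHH'."""
    bands = {"": np.asarray(volume, dtype=float)}
    for axis in range(3):
        split = {}
        for name, data in bands.items():
            d = np.moveaxis(data, axis, 0)
            if d.shape[0] % 2:                      # pad odd lengths by repeating the last slice
                d = np.concatenate([d, d[-1:]], axis=0)
            low = (d[0::2] + d[1::2]) / np.sqrt(2.0)
            high = (d[0::2] - d[1::2]) / np.sqrt(2.0)
            split[name + "L"] = np.moveaxis(low, 0, axis)
            split[name + "H"] = np.moveaxis(high, 0, axis)
        bands = split
    return bands

# Hypothetical example: first-order features of a stand-in ROI and of every wavelet sub-band.
rng = np.random.default_rng(0)
roi = rng.normal(-600.0, 150.0, size=(8, 8, 8))
features = {"original": first_order_features(roi.ravel())}
for name, band in haar_3d(roi).items():
    features["wavelet-" + name] = first_order_features(band.ravel())
print(len(features))  # 9
```

The same pattern extends to the texture matrices: each wavelet sub-band would be quantized to integer gray levels and fed to the GLCM (and GLDM, GLSZM, GLRLM, NGTDM) computations.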
Deep learning feature extraction:
CT image features are extracted with a three-dimensional DenseNet neural network, in the following steps.
1) Data augmentation: during training, the CT volumes are augmented by rotation through random angles. Each volume is stored as a rank-3 array, so a dimension of size 1 is appended as a fourth axis to allow 3D convolution. The training and validation data loaders are then defined, and the training volumes are randomly rotated through different angles. For both the training and validation data, the gray values are normalized to the range [0, 1].
2) Building the training and validation sets: the data are randomly split into a training set and a validation set at a ratio of 7:3.
3) Building and training the 3D convolutional neural network: a three-dimensional network is constructed. The model checkpoint is configured with save_best_only=True so that only the best-performing weights are kept. An early-stopping strategy is also specified, which yields the optimal model weights.
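4) Choice of loss function: binary cross-entropy is used:

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + (1-y_i)\log\left(1-\hat{y}_i\right)\right]$$

Here $\hat{y}_i$ is the model's predicted probability for case $i$, $y_i \in \{0, 1\}$ is the true label of that case, and $N$ is the number of cases.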
Choice of optimizer: Adam is used. Adam keeps a running estimate of the first moment of the gradient, i.e. an exponentially weighted average of the past gradients and the current gradient. As a result, each update differs little from the previous one. These smooth, stable updates let the optimizer cope with noisy or non-stationary objective functions.
5) Learning-rate setting: a dynamic learning rate is used. A relatively large learning rate is applied early in training and is gradually reduced as the number of iterations grows. This prevents large fluctuations late in training and lets the model settle closer to the optimal solution.
6) Obtaining the CT features: the model loads the trained weights and outputs a feature vector for each patient.
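The data preparation in item 1) can be sketched in NumPy as below. Gray values are clipped and scaled to [0, 1]. Training volumes are then rotated in-plane by a random angle, and a channel axis of size 1 is appended for 3D convolution. Several details are assumptions rather than facts from the patent: the HU window [-1000, 400], the ±20° angle range, nearest-neighbor interpolation, slice index on the last axis, and all function names.

```python
import numpy as np

def normalize(volume, lo=-1000.0, hi=400.0):
    """Clip to an assumed HU window [lo, hi] and rescale the gray values to [0, 1]."""
    v = np.clip(volume.astype(np.float32), lo, hi)
    return (v - lo) / (hi - lo)

def rotate_axial(volume, angle_deg):
    """Rotate every slice in the plane of axes 0 and 1 about its center.

    Nearest-neighbor sampling; output pixels whose source lies outside the slice are 0.
    """
    h, w = volume.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Inverse mapping: for each output pixel, find the input pixel it comes from.
    ys = np.rint(np.cos(theta) * (yy - cy) + np.sin(theta) * (xx - cx) + cy).astype(int)
    xs = np.rint(-np.sin(theta) * (yy - cy) + np.cos(theta) * (xx - cx) + cx).astype(int)
    valid = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    out = np.zeros_like(volume)
    out[yy[valid], xx[valid]] = volume[ys[valid], xs[valid]]
    return out

def prepare(volume, rng=None, train=True, max_angle=20.0):
    """Normalize, randomly rotate (training data only), and append a channel axis."""
    v = normalize(volume)
    if train:
        rng = np.random.default_rng() if rng is None else rng
        v = rotate_axial(v, rng.uniform(-max_angle, max_angle))
    return v[..., np.newaxis]   # rank-3 (H, W, D) -> rank-4 (H, W, D, 1)
```

Normalizing before rotating means the zero fill introduced at the corners corresponds to air, the lowest value in the window.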
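The Adam update described above can be written out in a few lines. This is a minimal NumPy sketch of the published Adam rule. The default hyperparameters are the commonly used ones, not values stated in the patent. In practice the deep learning framework's built-in Adam optimizer would be used. Besides the first moment mentioned in the text, Adam also tracks a second moment (an average of squared gradients), which scales each parameter's step.

```python
import numpy as np

class Adam:
    """Minimal Adam optimizer for one parameter array (illustrative sketch)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None   # first moment: exponential moving average of gradients
        self.v = None   # second moment: exponential moving average of squared gradients
        self.t = 0      # step counter used for bias correction

    def step(self, params, grad):
        """Return the updated parameters after one Adam step."""
        if self.m is None:
            self.m = np.zeros_like(params, dtype=float)
            self.v = np.zeros_like(params, dtype=float)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # correct the bias toward zero
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

For example, repeatedly calling `step(w, 2 * (w - 3.0))` minimizes $(w-3)^2$. Because of the normalization by $\sqrt{\hat v}$, the first step moves by almost exactly `lr`.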
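The early-stopping and best-weights checkpointing (save_best_only=True) logic of item 3) can be sketched framework-independently, together with the dynamic learning rate of item 5). The decay formula follows the exponential-decay schedule found in common frameworks, such as Keras's `ExponentialDecay`. The concrete values (initial rate 1e-3, decay by 0.96 every 100 steps, patience) and the class and function names are hypothetical.

```python
import copy

def exponential_decay(step, initial_lr=1e-3, decay_steps=100, decay_rate=0.96, staircase=True):
    """lr = initial_lr * decay_rate ** (step / decay_steps); integer division if staircase."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent

class BestCheckpointEarlyStopping:
    """Keep a copy of the best weights and signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_weights = None
        self.wait = 0

    def update(self, val_loss, weights):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_weights = copy.deepcopy(weights)   # "save" only on improvement
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```

After the loop stops, `best_weights` holds the weights from the epoch with the lowest validation loss. These are the weights that are later reloaded to output each patient's features.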
Feature fusion and screening:
Fusion of clinical, radiomics, and neural network features: for each patient, the clinical features, the radiomics features, and the features extracted by the neural network are concatenated into a new feature vector.
This stage selects, from all the features, those that influence invasiveness. The three methods extract feature data covering all aspects of the tumor in the CT images, but not every feature affects invasiveness. The LASSO algorithm is therefore used to select the features with the strongest discriminative ability on the training set.
The LASSO algorithm works as follows. It minimizes the residual sum of squares subject to the constraint that the sum of the absolute values of the regression coefficients stays below a constant. This drives some regression coefficients to exactly 0 and yields an interpretable model. The model function is:

$$\hat{y} = \sum_{i=1}^{m} w_i x_i$$

Here $\hat{y}$ is the prediction result, which in the present invention is invasive or non-invasive lung cancer. $x_i$ is the value of the $i$-th feature, and $w_i$ is the weight of that feature. The cost function is:

$$J(w) = \frac{1}{2n}\sum_{k=1}^{n}\left(y_k - \sum_{i=1}^{m} w_i x_{k,i}\right)^2 + \lambda\sum_{i=1}^{m}\left|w_i\right|$$

Here $m$ is the number of features, $n$ is the number of training cases, $y_k$ is the label of case $k$, $w_i$ is the feature weight, and $\lambda$ is the regularization parameter. The regularization parameter strongly affects the LASSO model. Its optimal value is selected over the range $[10^{-3}, 10^{1}]$, and the weights of the selected features are output. Features with larger weights are used as the input to the classifier. Fig. 3 shows the selection of the optimal regularization parameter; MSE is the mean squared error and Lamda is the LASSO regularization parameter $\lambda$. Features whose weight is 0 are discarded and only a small number of important features are retained, which makes the final prediction more accurate.
Applying LASSO shrinks the coefficients of each patient's many features and sets some of them to zero. This performs feature selection and identifies the features that influence invasiveness. Fig. 4 shows the selected features: the horizontal axis gives the feature names and the vertical axis the feature weights.
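The NumPy sketch below fits LASSO by cyclic coordinate descent, minimizing (1/(2n))·Σ(y − Xw)² + λ·Σ|w|. It also selects λ by k-fold cross-validated MSE over a log-spaced grid on [10^-3, 10^1], in the manner of the MSE-versus-λ curve in Fig. 3. Coordinate descent, the fold count, the grid size, and the function names are assumptions; the patent does not say how the LASSO model was fitted. Features are assumed standardized and labels centered, so no intercept is fitted.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, m = X.shape
    w = np.zeros(m)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()          # residual y - X w (w starts at 0)
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue                           # all-zero column: weight stays 0
            rho = X[:, j] @ r / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * (w_new - w[j])          # keep the residual in sync
            w[j] = w_new
    return w

def select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest k-fold cross-validated MSE, plus the MSE curve."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    folds = np.array_split(rng.permutation(idx), k)
    cv_mse = []
    for lam in lambdas:
        errs = []
        for fold in folds:
            train = np.setdiff1d(idx, fold)
            w = lasso_cd(X[train], y[train], lam)
            errs.append(np.mean((y[fold] - X[fold] @ w) ** 2))
        cv_mse.append(np.mean(errs))
    cv_mse = np.array(cv_mse)
    return lambdas[int(np.argmin(cv_mse))], cv_mse
```

The selected features are the nonzero entries of `lasso_cd(X, y, best_lambda)`, i.e. `np.flatnonzero(w)`.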
Training a classifier:
Invasive tumors predominate among the collected patients: the ratio of invasive to non-invasive (in situ) lung tumors in the whole dataset is 5:3. Invasive lung adenocarcinoma therefore makes up the larger share of the training set, which would bias the final model toward higher sensitivity for invasive lung adenocarcinoma. The SMOTE algorithm is used to balance the class proportions. SMOTE analyzes the minority-class samples, synthesizes new minority samples from them, and adds these to the dataset. After SMOTE, the two classes in the training set are in a 1:1 ratio. Fig. 5(a) shows the original data distribution, and Fig. 5(b) shows the distribution after SMOTE.
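The SMOTE oversampling step can be sketched in NumPy as follows. Each synthetic sample lies at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbors. Enough samples are generated to bring the two classes to 1:1. The choice k = 5 and the function names are assumptions. A maintained implementation is available as `SMOTE` in the imbalanced-learn library. As in the text, oversampling is meant for the training set only.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Synthesize n_new samples by interpolating between minority samples and their neighbors."""
    rng = np.random.default_rng() if rng is None else rng
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                                         # a sample is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]                            # k nearest minority neighbors
    base = rng.integers(0, n, size=n_new)                               # samples to start from
    nb = neighbors[base, rng.integers(0, k, size=n_new)]                # one random neighbor each
    gap = rng.random((n_new, 1))                                        # position along the segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def balance_with_smote(X, y, minority_label, k=5, seed=0):
    """Append synthetic minority samples until both classes have the same count."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    X_syn = smote(X_min, n_new, k, rng)
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(n_new, minority_label)])
```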
In the classification stage, a nonlinear SVM classifier is used to judge tumor invasiveness. An SVM is a maximum-margin linear classifier defined in feature space. With the kernel trick it can also classify nonlinearly; here an RBF kernel provides the nonlinear mapping.
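For reference, the RBF kernel and the resulting SVM decision function are shown below in NumPy. Training, i.e. solving for the dual coefficients, is left to a library such as scikit-learn's `SVC(kernel="rbf")`. Its fitted `support_vectors_`, flattened `dual_coef_`, and `intercept_` correspond to the arguments here. The value gamma = 0.5 and the function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    A = np.atleast_2d(A).astype(float)
    B = np.atleast_2d(B).astype(float)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off

def svm_decision(x, support_vectors, dual_coef, intercept, gamma=0.5):
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; the predicted class follows sign(f(x))."""
    return rbf_kernel(x, support_vectors, gamma) @ dual_coef + intercept
```

The kernel replaces the dot product of a linear SVM. The decision boundary is therefore linear in the implicit RBF feature space but nonlinear in the original feature space.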
Comparison of results
Four main approaches to judging lung tumor invasiveness were compared:
Method one: based on pathology theory, usable features such as microcapillaries are extracted manually from the patient's lung tumor region. According to clinical summaries, this approach judges lung tumor stage, type, and invasiveness with an accuracy of 60%.
Method two: features are extracted from the lung CT images and the labeled images using radiomics, and invasiveness is then judged with an SVM classifier. The results are shown in Table I.
Table I. Results of judging invasiveness with radiomics
Class | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.67 | 0.80 | 0.73 | 50 |
1 | 0.69 | 0.52 | 0.59 | 42 |
accuracy | | | 0.67 | 92 |
macro avg | 0.68 | 0.66 | 0.66 | 92 |
weighted avg | 0.68 | 0.67 | 0.67 | 92 |
Note: 0 denotes a non-invasive lung cancer case; 1 denotes an invasive lung cancer case.
Method three: a deep convolutional network (DenseNet model) is used. The final model weights are obtained after multiple rounds of training and are used to predict lung cancer invasiveness for the test-set cases. The results are shown in Table II.
Table II. Results of judging lung cancer invasiveness with the deep convolutional network
Class | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.53 | 0.64 | 0.58 | 14 |
1 | 0.80 | 0.71 | 0.75 | 28 |
micro avg | 0.69 | 0.69 | 0.69 | 42 |
macro avg | 0.66 | 0.68 | 0.67 | 42 |
weighted avg | 0.71 | 0.69 | 0.70 | 42 |
samples avg | 0.69 | 0.69 | 0.69 | 42 |
Note: 0 denotes a non-invasive lung cancer case; 1 denotes an invasive lung cancer case.
Method four: this method combines the three methods above. Radiomics is used to extract radiomics features, and deep learning is used to extract nonlinear features. The radiomics features, neural network features, and clinicopathological features are concatenated into the final feature vector. Invasiveness is then judged with a nonlinear SVM classifier. The results are shown in Table III.
Table III. Results of judging lung cancer invasiveness by combining clinical medicine, radiomics, and the deep convolutional network
Class | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.81 | 0.88 | 0.84 | 33 |
1 | 0.79 | 0.68 | 0.73 | 22 |
accuracy | | | 0.80 | 55 |
macro avg | 0.80 | 0.78 | 0.79 | 55 |
weighted avg | 0.80 | 0.80 | 0.80 | 55 |
Note: 0 denotes a non-invasive lung cancer case; 1 denotes an invasive lung cancer case.
Notes on the metrics in the results tables:
TP (true positive): the true class is positive and the model predicts positive;
FN (false negative): the true class is positive and the model predicts negative;
FP (false positive): the true class is negative and the model predicts positive;
TN (true negative): the true class is negative and the model predicts negative;
precision = TP / (TP + FP), recall = TP / (TP + FN), and f1-score = 2 × precision × recall / (precision + recall), each computed per class;
macro avg: the unweighted mean of the per-class precision, recall, and F1;
weighted avg: the mean of the per-class metrics weighted by each class's share of the true labels (y_true), i.e. by its support.
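The metric definitions above can be checked with a few lines of NumPy. The patent does not report raw counts. The confusion matrix [[40, 10], [20, 22]] used below was back-calculated from the recall and support values in Table I, and it reproduces every rounded figure in that table. The function name is hypothetical; scikit-learn's `classification_report` produces the same layout from label vectors.

```python
import numpy as np

def report_from_confusion(cm):
    """Per-class and averaged metrics; cm[i, j] counts true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)    # TP / (TP + FP): column sums are predicted counts
    recall = tp / cm.sum(axis=1)       # TP / (TP + FN): row sums are true counts (support)
    f1 = 2 * precision * recall / (precision + recall)
    support = cm.sum(axis=1)
    weights = support / support.sum()  # each class's share of y_true
    return {
        "precision": precision, "recall": recall, "f1": f1, "support": support,
        "accuracy": tp.sum() / cm.sum(),
        "macro avg": np.array([precision.mean(), recall.mean(), f1.mean()]),
        "weighted avg": np.array([precision @ weights, recall @ weights, f1 @ weights]),
    }

# Rows: true class 0 (non-invasive), 1 (invasive); columns: predicted class.
rep = report_from_confusion([[40, 10], [20, 22]])
```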
Discussion of the invention
The invention fully exploits the information in CT images and combines clinical medicine, radiomics, and deep learning. Compared with radiomics or deep learning alone, this markedly improves the accuracy of judging lung tumor invasiveness: the overall accuracy is 0.80 in Table III versus 0.67 in Table I, each measured on its own test set.
Clinical diagnosis alone cannot avoid variability between diagnoses or misdiagnosis. It is strongly affected by subjective judgment, and its accuracy is unstable. Radiomics performs better than clinical diagnosis because it explores higher-dimensional features in CT images. These features are difficult to identify with the human eye alone, yet they play an important role in determining invasiveness.
Deep learning has achieved great success in image analysis and computer vision and has solved many complex image classification tasks. In our experiments, however, deep learning alone did not effectively improve the classification results. The deep learning method (a DenseNet network) reached a sensitivity of only 0.650 and a specificity of 0.730. The low-order features extracted by radiomics can describe the general morphology, location, and texture of the tumor. Deep learning, in contrast, can extract high-order features that yield a more individualized classification model for each patient. Clinical diagnosis also retains guiding value: for example, the patient's basic medical history and genetic testing strongly influence lung cancer invasiveness. The experiments show that combining the three methods achieves the highest accuracy.
The technical reasons behind these results are as follows:
1) Multi-feature fusion serves as the basis for diagnosis. The radiomics features include shape features, first-order statistical features, texture features, and higher-order and transform-based features. Combined with the nonlinear features extracted by deep learning, they form a comprehensive feature vector. This overcomes the insufficient accuracy of diagnosing lung tumor invasiveness from any single type of feature.
2) Clinical diagnosis is included: various clinical data, such as the patient's basic medical history and genetic tests, are collected, which makes the diagnostic result more accurate.
3) A traditional statistical method, the LASSO algorithm, is added. It selects the important features used to judge invasiveness and computes their weights, instead of relying on a black-box deep learning model alone. This makes the final result interpretable and scientifically grounded.
Claims (8)
1. A method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology, characterized by comprising the following steps:
accurately obtaining the tumor region in a lung CT image by using a labeled lung tumor image, and preprocessing the obtained image;
combining clinically collected medical features, features extracted from the CT tumor region by radiomics, and high-dimensional nonlinear features extracted by a deep learning neural network into a comprehensive feature vector that serves as the basis for judging invasiveness;
balancing the numbers of the two classes of cases with the SMOTE algorithm, thereby correcting the imbalance between the numbers of invasive and non-invasive lung cancer samples;
screening the features with the statistical method LASSO to select the features that influence invasiveness, and then classifying the samples with a nonlinear SVM classifier.
2. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein obtaining the accurate lung tumor CT image and preprocessing it comprises:
obtaining the location of the tumor region from the lung tumor CT image labeled by a physician, computing the center coordinates of the tumor, and cropping from the original CT image a 50 × 50 × 50 cubic tumor image centered on those coordinates; the preprocessing includes image resampling, unifying the voxel size, and the like.
3. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein a DenseNet model is trained on a training set of lung CT images, the model weights are saved when the model performs best, and the weights are loaded to obtain the features of all lung images.
4. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein the original lung CT images and the labeled lung tumor region images are loaded in one-to-one correspondence into a radiomics feature extractor to obtain all radiomics features of each image, forming a medical image feature vector; nonlinear features are extracted by a convolutional network to form a neural network feature vector; the three feature vectors are combined; the features that influence the judgment of lung tumor invasiveness are screened out by a statistical method; and the type is then judged.
5. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein the statistical feature screening method is the LASSO method.
6. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein the collected clinical medical features specifically comprise: basic patient information, family history, basic medical history, genomic testing, and the like.
7. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein the SMOTE algorithm balances the numbers of invasive and non-invasive lung cancer patients in the training set, improving model performance.
8. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein five-fold cross-validation is used to address overfitting, and the best-performing model is selected for prediction.
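The cropping step of claim 2 can be sketched in NumPy. The tumor center is taken as the rounded centroid of the physician's label mask, and a 50-voxel cube centered on it is cut from the CT volume. Where the cube extends past the scan, it is padded with air (-1000 HU). Using the centroid as the center, the padding value, and the function names are assumptions. Resampling to a common voxel size would be done before this step.

```python
import numpy as np

def tumor_center(mask):
    """Rounded centroid (voxel indices) of the labeled tumor region."""
    idx = np.argwhere(mask > 0)
    return tuple(int(c) for c in np.rint(idx.mean(axis=0)))

def crop_cube(volume, center, size=50, pad_value=-1000):
    """Cut a size x size x size cube centered on `center`, padding where it leaves the volume."""
    half = size // 2
    out = np.full((size,) * volume.ndim, pad_value, dtype=volume.dtype)
    src, dst = [], []
    for c, dim in zip(center, volume.shape):
        start = int(c) - half                      # cube origin in volume coordinates
        s0, s1 = max(start, 0), min(start + size, dim)
        src.append(slice(s0, s1))                  # part of the volume inside the cube
        dst.append(slice(s0 - start, s1 - start))  # where that part lands in the cube
    out[tuple(dst)] = volume[tuple(src)]
    return out
```

A typical call would be `crop_cube(ct, tumor_center(label_mask))` on a resampled CT volume and its label mask.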
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310518369.0A CN116542937A (en) | 2023-05-09 | 2023-05-09 | Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310518369.0A CN116542937A (en) | 2023-05-09 | 2023-05-09 | Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116542937A true CN116542937A (en) | 2023-08-04 |
Family
ID=87443105
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310518369.0A Pending CN116542937A (en) | 2023-05-09 | 2023-05-09 | Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116542937A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116825363A (en) * | 2023-08-29 | 2023-09-29 | 济南市人民医院 | Early lung adenocarcinoma pathological type prediction system based on fusion deep learning network |
CN117174257A (en) * | 2023-11-03 | 2023-12-05 | 福建自贸试验区厦门片区Manteia数据科技有限公司 | Medical image processing device, electronic apparatus, and computer-readable storage medium |
- 2023-05-09 CN CN202310518369.0A patent/CN116542937A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116825363A (en) * | 2023-08-29 | 2023-09-29 | 济南市人民医院 | Early lung adenocarcinoma pathological type prediction system based on fusion deep learning network |
CN116825363B (en) * | 2023-08-29 | 2023-12-12 | 济南市人民医院 | Early lung adenocarcinoma pathological type prediction system based on fusion deep learning network |
CN117174257A (en) * | 2023-11-03 | 2023-12-05 | 福建自贸试验区厦门片区Manteia数据科技有限公司 | Medical image processing device, electronic apparatus, and computer-readable storage medium |
CN117174257B (en) * | 2023-11-03 | 2024-02-27 | 福建自贸试验区厦门片区Manteia数据科技有限公司 | Medical image processing device, electronic apparatus, and computer-readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bilgin et al. | Cell-graph mining for breast tissue modeling and classification | |
Byra et al. | Early prediction of response to neoadjuvant chemotherapy in breast cancer sonography using Siamese convolutional neural networks | |
CN116542937A (en) | Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology | |
EP2027566B1 (en) | Automatic recognition of preneoplastic anomalies in anatomic structures based on an improved region-growing segmentation, and computer program therefor | |
CN104376147B (en) | The image analysis system of risk score based on image | |
CN109658411A (en) | A kind of correlation analysis based on CT images feature Yu Patients with Non-small-cell Lung prognosis situation | |
El-Baz et al. | Three-dimensional shape analysis using spherical harmonics for early assessment of detected lung nodules | |
JP2003225231A (en) | Method and system for detecting lung disease | |
CN116188423B (en) | Super-pixel sparse and unmixed detection method based on pathological section hyperspectral image | |
KR20180022607A (en) | Determination of result data on the basis of medical measurement data from various measurements | |
KR20120041468A (en) | System for detection of interstitial lung diseases and method therefor | |
Cabral et al. | Fractal analysis of breast masses in mammograms | |
CN115067978B (en) | Method and system for evaluating curative effect of osteosarcoma | |
CN112638262B (en) | Similarity determination device, method, and program | |
CN115937130A (en) | Image processing method for predicting ovarian cancer Ki-67 expression based on dual-energy CT | |
WO2022225794A1 (en) | Systems and methods for detecting and characterizing covid-19 | |
Vivek et al. | Artificial Neural Network Based Effective Detection of Breast Cancer By Using Mammogram Data | |
US20230091506A1 (en) | Systems and Methods for Analyzing Two-Dimensional and Three-Dimensional Image Data | |
Javed et al. | Detection of lung tumor in CE CT images by using weighted support vector machines | |
Zheng et al. | 3D context-aware convolutional neural network for false positive reduction in clustered microcalcifications detection | |
CN116740386A (en) | Image processing method, apparatus, device and computer readable storage medium | |
Li et al. | Gleason grading of prostate cancer based on improved AlexNet | |
CN112329876A (en) | Colorectal cancer prognosis prediction method and device based on image omics | |
Theissen et al. | Learning cellular phenotypes through supervision | |
Lauria | GPCALMA: Implementation in Italian hospitals of a computer aided detection system for breast lesions by mammography examination |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |