CN116542937A - Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology - Google Patents


Info

Publication number
CN116542937A
Authority
CN
China
Prior art keywords
image
lung
tumor
histology
wettability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310518369.0A
Other languages
Chinese (zh)
Inventor
尹诗
冯子康
刘学军
张�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A method for determining the invasiveness of lung tumors based on clinical, radiomics (image histology), and deep learning features. Clinical characteristics of lung cancer patients are collected. Radiomics extracts texture features and higher-order features from the patients' CT images, and a deep learning neural network extracts high-order nonlinear features from the same images. For CT image processing, a doctor marks the tumor region, and the tumor is cropped out to obtain the lung tumor portion. For model training, additional lung CT samples are generated by multi-angle rotation (0 to 30 degrees) to improve the generalization ability of the model, and the SMOTE algorithm is used to address the imbalanced sample distribution. The LASSO algorithm selects the features that matter for determining lung tumor invasiveness from all the features and computes the weight of each selected feature. For sample classification, a nonlinear SVM (support vector machine) classifier is used for its stability. The advantage of the invention is that clinical, neural network, and radiomics features are used together and the features that determine tumor invasiveness are explored in depth, which makes the result more scientific and interpretable.

Description

Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology
Technical Field
The invention belongs to the technical field of medical imaging and artificial intelligence, and in particular relates to a method for determining lung tumor invasiveness from clinical lung CT images before surgery, so as to improve the surgical success rate.
Background
Determining the invasiveness of lung cancer is important for formulating a patient's treatment plan. Lung carcinoma in situ is the earliest form of lung cancer and is therefore also called stage 0 cancer. At this stage the cancer cells have only just emerged and have not invaded the basal layer. Once found, carcinoma in situ can be cured by surgical resection alone; radiotherapy and chemotherapy are not needed, no serious complications follow the cure, and neither quality of life nor life expectancy is greatly affected. Minimally invasive lung cancer refers to cancer cells that have broken through the basement membrane and begun to invade surrounding tissue, but to an extent too small to be seen with the naked eye. It also requires surgery; if a sufficient resection with negative margins can be ensured, surgery alone is generally enough and chemotherapy is not needed. Invasive lung cancer is further divided into early, intermediate, and advanced stages. Early-stage invasive cancer generally requires only surgery, without radiotherapy or chemotherapy, whereas intermediate-stage invasive cancer may require chemoradiotherapy in addition to surgery. Advanced cancer has usually metastasized to distant organs, for example liver, lung, or brain metastasis; in most cases it cannot be surgically resected and can only be managed with radiotherapy, chemotherapy, targeted therapy, immunotherapy, and similar means to relieve suffering. Determining the invasiveness of a patient's lung cancer before surgery is therefore important for planning treatment.
At present, clinical diagnosis alone does not reach the expected accuracy, and its accuracy is not stable. Clinicians diagnose tumor attributes from features such as lesion size, attenuation, spiculation, and cystic airspaces around the lesion; these features discriminate poorly for atypical lung tumors, and clinical judgment is subjective and depends on experience. Radiomics also faces challenges in determining lung cancer invasiveness, because lung nodules may be small and look similar to other structures in the lung (e.g., blood vessels) or to benign processes (e.g., focal organizing pneumonia). In addition, the radiomics feature extraction process is relatively fixed and ignores individual differences between patients. For these reasons, CT-based lung cancer screening has a high false positive rate, and a more flexible method is needed to further improve the accuracy of the classification model.
In recent years, deep learning has been highly successful at classifying natural images. It provides high-level semantic information about images (here, CT scans) that differs from the features extracted by radiomics. Deep learning is therefore expected to improve on classical radiomics models for predicting lung cancer invasiveness. However, because CT images differ from conventional natural images and the amount of sample data is limited, it is difficult for a deep learning model alone to achieve the expected effect.
Studies have shown that feature-level fusion can significantly improve prediction accuracy. Feature-level fusion combines the feature vectors extracted by the radiomics model and by the deep convolutional network into a new feature vector for subsequent classification and prediction. This approach has been applied to tasks such as lung nodule detection and classification, analysis of tumor image attributes, and cancer survival prediction.
Inspired by these studies, we use a combined model of clinical diagnosis, radiomics, and deep learning to identify lung tumor invasiveness. Because of the heterogeneity within tumors, a single model cannot extract useful tumor features efficiently and comprehensively. The proposed method therefore uses an advanced multi-scale feature extraction network to explore imaging phenotypes and predict the invasiveness of lung tumors.
The invention combines clinical medicine, the traditional radiomics method, and a deep learning method: it fuses the three types of features, screens them with the LASSO method, and uses a nonlinear SVM classifier to determine the invasiveness of lung tumors. Combining traditional radiomics with the neural network greatly improves the accuracy of invasiveness discrimination.
Disclosure of Invention
The invention fully exploits the features of lung tumor CT images, extracting them through clinical medicine, radiomics, and a neural network. Clinical data were collected from patients of Changzhou First People's Hospital, including lung CT images and images of the tumor regions marked by doctors; the patients' clinical characteristics are also included because they have an important influence on the final diagnosis. Each series of CT images is fused into a three-dimensional volume so that the feature extractors can work on it. The ROI (region of interest) is then located in the lung CT image from the tumor annotation, and a tumor image of fixed size is cropped from each patient as the model input. A radiomics feature extractor extracts features from the input image, a three-dimensional convolutional neural network (DenseNet) extracts deep learning features from the same image, and the clinical, radiomics, and deep learning features are fused. A feature screening step using the LASSO algorithm then removes features with little or no influence on invasiveness and computes the weight of each selected feature's influence on the predicted value. Because the collected data contain very different numbers of invasive and non-invasive tumors, the SMOTE algorithm is used to generate equal amounts of training data for the two classes. Finally, a nonlinear SVM classification model determines lung tumor invasiveness.
The invention still has room for improvement, and preferred variants exist. First, a Transformer, a deep learning model based on the self-attention mechanism, could be used and may yield a more interpretable model. Second, although the invention uses feature fusion with a stable nonlinear SVM classifier for the final sample classification, other advanced nonlinear classifiers may perform better and are a subject for the next stage of research.
Compared with traditional methods, the invention has the following advantages. First, it addresses the problem that relying only on low-dimensional clinical features cannot cope with tumor heterogeneity and leads to misdiagnosis, and the method can achieve excellent performance on atypical lung cancer cases. Second, whereas the radiomics feature extraction process is relatively fixed and ignores individual differences between patients, the added deep learning method explores more comprehensive features. Third, the statistical LASSO method retains the most important diagnostic features and yields their weights, which addresses the lack of interpretability and scientific grounding of a pure deep learning neural network in medical diagnosis.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a diagram showing a sample case of the present invention in ITK-SNAP software;
FIG. 3 is a diagram of a LASSO model parameter selection process according to the present invention;
FIG. 4 is a graph of the features selected in the present invention as influencing lung tumor invasiveness;
FIG. 5 shows the numbers of invasive and non-invasive lung cancer cases balanced by the SMOTE algorithm in the present invention:
FIG. 5 (a) is a graph showing the distribution of the number of cases before the two cases are equalized by the SMOTE algorithm in the present invention;
FIG. 5 (b) is a distribution chart of the number of cases after the two types of cases are equalized by the SMOTE algorithm in the present invention;
Detailed Description
Table 1 shows the invasiveness determination results of radiomics in the present invention.
Table 2 shows the invasiveness determination results of the deep learning convolutional network in the present invention.
Table 3 shows the invasiveness determination results of clinical medicine and radiomics combined with the neural network in the present invention.
Specific implementation steps
Patient data acquisition:
We collected patients with pulmonary ground-glass nodules treated at Changzhou First People's Hospital from June 2019 to June 2020, and two experienced cardiothoracic surgeons screened them. We enrolled 356 patients who underwent surgical resection at this hospital and had a definite histopathological subtype. Sex, age, height, weight, hypertension, diabetes, and infiltration status were selected as clinical features (Table 1). The inclusion criteria were: 1. the patient underwent lung cancer surgery; 2. the pathology report showed atypical adenomatous hyperplasia, adenocarcinoma in situ, minimally invasive adenocarcinoma, or invasive adenocarcinoma; 3. the patient had thin-slice CT imaging data (below 2 mm) from Changzhou First People's Hospital; 4. the ground-glass nodule was less than 2 cm in diameter; 5. the solid component of the ground-glass nodule was less than 50 percent. The exclusion criteria were: 1. deceased patients; 2. patients with serious cardiovascular or cerebrovascular disease or abnormal liver or kidney function; 3. chronic obstructive pulmonary disease; 4. large cell lung cancer; 5. small cell lung cancer; 6. central lung cancer; 7. other mixed lung cancers.
CT scanning parameters:
In this study, the chest thin-slice CT scanners were manufactured by GE, Philips, and Siemens. The acquisition parameters were set consistently as follows: tube voltage 140 kVp (range 100-140 kVp), tube current 740 mA (range 100-752 mA), and slice thickness 1.0 mm (range 0.65-2.0 mm).
Image segmentation and image preprocessing:
The raw thin-slice CT data of all patients were copied and loaded into the ITK-SNAP software (www.itksnap.org), and the ROI was delineated on the tumor site in each scan. All patient information was masked when the data were loaded. Two cardiothoracic surgeons then delineated a three-dimensional ROI of the entire tumor, and other surgeons reviewed it. All three-dimensional ROI delineation was performed at the lung window level. FIG. 2 shows the lung CT image of one selected patient: (a) and (b) are the side view of the lung and the corresponding tumor annotation, (c) and (d) are the top view and the corresponding tumor annotation, and (e) and (f) are the front view and the corresponding tumor annotation.
The tumor position and size in the lung CT image are determined from the ROI in the annotation image, the coordinates of the tumor center point are computed, and a 50 x 50 cube centered on those coordinates is cropped from the original CT image. The images are then normalized and unified. Specifically, because the voxel size of CT images differs between patients at acquisition, the images are resampled. Unlike natural images, in medical imaging the real size of the imaged body part (the imaging size) is very important information. A CT image therefore has two relevant quantities, the voxel spacing (spacing) and the voxel count (resolution), related by: imaging size = spacing x resolution.
Different scanners and acquisition protocols typically produce datasets with different voxel spacings, which a CNN cannot take into account, so all medical images are resampled to a uniform spacing so that the resolution reflects the imaging size.
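The relation imaging size = spacing x resolution, and the geometry of the cropping step, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 1.0 mm target spacing, and the example scan geometry are all assumptions, and the interpolation of gray values during resampling is left out.

```python
def physical_size_mm(spacing, shape):
    """Imaging size along each axis: spacing (mm per voxel) x voxel count."""
    return tuple(s * n for s, n in zip(spacing, shape))


def resampled_shape(spacing, shape, target_spacing):
    """Voxel counts that cover the same physical extent at target_spacing."""
    return tuple(
        max(1, round(s * n / t))
        for s, n, t in zip(spacing, shape, target_spacing)
    )


def crop_bounds(center, size, shape):
    """(start, stop) indices per axis of a cube of edge `size` centered on
    `center`, shifted where needed so the cube stays inside the volume."""
    bounds = []
    for c, n in zip(center, shape):
        start = min(max(c - size // 2, 0), max(n - size, 0))
        bounds.append((start, min(start + size, n)))
    return bounds


# Hypothetical scan: 0.7 mm in-plane spacing, 1.25 mm slices, 512 x 512 x 240 voxels.
spacing = (0.7, 0.7, 1.25)
shape = (512, 512, 240)
target_spacing = (1.0, 1.0, 1.0)  # assumed uniform spacing, not stated in the text

new_shape = resampled_shape(spacing, shape, target_spacing)
bounds = crop_bounds((30, 200, 290), 50, new_shape)
print(physical_size_mm(spacing, shape))
print(new_shape)
print(bounds)
```

After resampling to a uniform spacing, a cube with a fixed number of voxels covers the same physical volume in every patient, which is the reason resampling precedes the fixed-size crop.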
Extracting image histology characteristics:
High-throughput features are extracted to quantitatively analyze the essential properties of the ROI. The radiomics feature extractor extracts the following features from the patient's lung image:
1) Shape features: shape and size features reflect the shape, size, and regularity of the tumor. For example, the major diameter, volume, and surface area reflect tumor size; the sphericity reflects whether the shape tends toward a sphere; and the compactness reflects whether the tumor shape and edges are regular.
2) First-order statistical features: these are computed from the gray values of the ROI image and generally include the maximum, minimum, mean, median, range, variance, kurtosis, skewness, and entropy. They reflect the distribution of gray-level intensities within the tumor and hence its internal heterogeneity.
3) Texture features: first-order statistics and shape features reflect low-dimensional information in the image that is easily perceived visually (e.g., brightness and shape). Texture features, by contrast, are obtained mainly from several texture matrices, such as the gray level dependence matrix (GLDM), gray level size zone matrix (GLSZM), gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), and neighboring gray tone difference matrix (NGTDM), which quantify information that is difficult to perceive by eye, such as texture patterns or tissue distribution inside the tumor.
4) Higher-order features and transform-based features: although the three types of features above capture visual information and texture patterns of the tumor, the amount of such information is limited. To obtain information from different frequency domains, a wavelet transform is also applied: it decomposes the original tumor image into different frequency bands, and the three types of features above are then extracted from each wavelet image. The wavelet transform yields multi-frequency, multi-scale image information; for clinical problems that are hard to describe with simple visual tumor features, these high-dimensional abstract wavelet features can play a complementary role and capture clinical information that is difficult to perceive visually.
With the radiomics feature extractor, these four types of features are extracted from the CT images one by one and serve as the basis for quantitative analysis of the tumor.
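As an illustration of the simplest of these groups, the first-order statistics can be computed directly from the gray values inside the ROI. The sketch below is an assumption-laden example rather than the extractor used in the patent: the function name, the histogram bin width of 25, and the sample gray values are hypothetical.

```python
import math
from collections import Counter


def first_order_features(voxels, bin_width=25):
    """First-order statistics of the gray values inside an ROI."""
    n = len(voxels)
    ordered = sorted(voxels)
    mean = sum(voxels) / n
    variance = sum((v - mean) ** 2 for v in voxels) / n
    std = math.sqrt(variance)
    # Standardized third and fourth central moments (kurtosis is not excess).
    skewness = sum((v - mean) ** 3 for v in voxels) / (n * std ** 3) if std else 0.0
    kurtosis = sum((v - mean) ** 4 for v in voxels) / (n * std ** 4) if std else 0.0
    # Shannon entropy of the gray-level histogram with fixed-width bins.
    counts = Counter(math.floor(v / bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return {
        "maximum": ordered[-1],
        "minimum": ordered[0],
        "range": ordered[-1] - ordered[0],
        "mean": mean,
        "median": median,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }


# Hypothetical gray values (Hounsfield units) of six voxels in a tumor ROI.
roi = [-700, -650, -650, -600, -300, -100]
features = first_order_features(roi)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

In this toy ROI the few bright voxels pull the distribution to the right, so the skewness is positive; a uniform ROI would give zero variance and zero entropy.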
Deep learning feature extraction:
CT image feature extraction with the three-dimensional neural network DenseNet comprises the following parts.
1) Data augmentation: during training, the CT scans are augmented by rotation through random angles. Because each volume is stored as a rank-3 array, a dimension of size 1 is added at axis 4 so that 3D convolution can be applied. When the training and validation data loaders are defined, the training data are randomly rotated through different angles. For both training and validation data, the gray values are rescaled to the range zero to one.
2) Building the training set and validation set: the data are randomly divided into a training set and a validation set at a ratio of 7:3.
3) Constructing and training the 3D convolutional neural network model: a three-dimensional neural network is built, save_best_only=True is set so that only the best-performing model weights are saved, and an early-stopping strategy is specified, which yields the optimal model weight parameters.
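4) Selecting the model loss function: binary cross-entropy is chosen as the loss function. In its standard form, with $N$ cases and predicted probability $\hat{y}_i$ (both symbols assumed here), it reads
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + (1-y_i)\log\left(1-\hat{y}_i\right)\right]$$
(where $y_i$ is the result for case $i$)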
Selecting the model optimizer: Adam is chosen. Adam tracks the first moment of the gradient, an exponentially weighted moving average of past and current gradients, so that each update does not differ too much from the previous one; the gradient changes smoothly and stably, and the optimizer can adapt to an unstable objective function.
5) Setting the learning rate: a dynamic learning rate is used. A larger learning rate optimizes the model in the early stage of training, and the learning rate is gradually reduced as the number of iterations grows, so that the model does not fluctuate too much late in training and comes closer to an optimal solution.
6) Acquiring CT features: the model loads the pre-trained weight parameters and outputs the features of each patient.
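Three of the training choices listed above are simple enough to sketch without a deep learning framework: the random 7:3 split, the binary cross-entropy loss, and a learning rate that decays with the iteration count. The helper names, the exponential decay form, and all numeric defaults below are assumptions for illustration; the text does not state which decay schedule was used.

```python
import math
import random


def split_7_3(case_ids, seed=0):
    """Randomly divide case identifiers into training and validation sets (7:3)."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    cut = round(0.7 * len(ids))
    return ids[:cut], ids[cut:]


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


def decayed_learning_rate(step, initial_lr=1e-3, decay_rate=0.96, decay_steps=1000):
    """Exponential decay: large steps early in training, smaller steps later."""
    return initial_lr * decay_rate ** (step / decay_steps)


# Hypothetical cohort of 100 cases, labelled 0..99.
train_ids, val_ids = split_7_3(range(100))
loss = binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6])
print(len(train_ids), len(val_ids))
print(round(loss, 4))
print([decayed_learning_rate(s) for s in (0, 1000, 2000)])
```

A confident correct prediction (0.9 for a positive case) contributes little to the loss, while the hesitant 0.6 dominates it, which is the behavior the loss is chosen for.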
Feature fusion and screening:
Fusion of clinical, radiomics, and neural network features: for each patient, the clinical features, the radiomics features, and the features extracted by the neural network are concatenated into a new feature vector.
This stage selects, from all the features, the important ones that influence invasiveness. Although the three methods extract feature data on every aspect of the tumor in the CT images, not all of these features influence invasiveness, so the LASSO algorithm is used to select the features with the strongest classification ability in training.
The LASSO algorithm works as follows. The basic idea of LASSO is to minimize the residual sum of squares under the constraint that the sum of the absolute values of the regression coefficients is less than a constant. This drives some regression coefficients to exactly 0 and so yields an interpretable model. The model function has the standard linear form
$$y_i = \sum_{j=1}^{m} w_j x_{ij}$$
where $y_i$ is the prediction result, which in the present invention is invasive or non-invasive lung cancer, $x_{ij}$ is the value of feature $j$ corresponding to $y_i$, and $w_j$ is the weight of feature $j$. The cost function, in its standard penalized form, is
$$J(w) = \frac{1}{2n}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m} w_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{m}\lvert w_j\rvert$$
where $m$ is the number of features, $w_j$ are the feature weights, $n$ is the number of cases, and $\lambda$ is the regularization parameter (the feature index $j$, the symbol $n$, and the $1/2n$ scaling are conventions assumed here). The optimal regularization parameter, which strongly affects the LASSO model, is selected over the range $[10^{-3}, 10^{1}]$, and the weights of the selected features are output. Features with larger weights are chosen as input to the classifier. The selection of the regularization parameter is shown in FIG. 3, where MSE is the mean square error and Lamda is the LASSO regularization parameter $\lambda$. Features with weight 0 are removed and a small number of important features are retained, which makes the final prediction more accurate.
Applying the LASSO algorithm shrinks the coefficients of the patient's many features, driving some of them to zero, which accomplishes feature selection and screens out the features that influence invasiveness. The selected features are shown in FIG. 4, where the horizontal axis is the name of each selected feature and the vertical axis is its weight.
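How the L1 penalty sets weights exactly to zero can be seen in a small coordinate-descent solver. This is a sketch under stated assumptions, not the solver used in the patent: it minimizes the conventional objective (1/2n) * sum_i (y_i - sum_j w_j x_ij)^2 + lam * sum_j |w_j|, the function names are hypothetical, and the toy data (three mutually orthogonal features, of which only the first matters much) are invented so the result can be checked by hand.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * sum_i (y_i - sum_j w_j x_ij)^2 + lam * sum_j |w_j|."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    for _ in range(n_iter):
        for j in range(m):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(w[k] * X[i][k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (norm / n) if norm else 0.0
    return w


# Toy data: y = 2.0 * x0 + 0.05 * x1 + 0.0 * x2, with orthogonal feature columns.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.05, -1.95, 1.95, -2.05]

weights = lasso_coordinate_descent(X, y, lam=0.1)
print(weights)

# Sweep the regularization parameter over the range 10^-3 .. 10^1 named in the text.
lambdas = [10.0 ** e for e in (-3, -2, -1, 0, 1)]
nonzero_counts = [
    sum(1 for w in lasso_coordinate_descent(X, y, lam) if w != 0.0)
    for lam in lambdas
]
print(nonzero_counts)
```

With lam = 0.1 the weak feature and the irrelevant feature are both set to exactly zero while the strong feature is only shrunk slightly, and larger values of lam remove progressively more features.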
Training a classifier:
Invasive tumors make up the larger share of the collected cases: the ratio of patients with invasive lung tumors to patients with non-invasive (in situ) lung tumors in the whole dataset is 5:3. Invasive lung adenocarcinoma therefore dominates the training set, which would bias the final model toward predicting invasive lung adenocarcinoma, so the SMOTE algorithm is used to balance the class proportions. The basic idea of SMOTE is to analyze the minority-class samples and synthesize new samples from them to add to the dataset. After SMOTE, the ratio of the two classes in the training set is 1:1. The original data distribution is shown in FIG. 5 (a), and the distribution after SMOTE in FIG. 5 (b).
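The core of SMOTE is interpolation between a minority sample and one of its nearest minority-class neighbours. The sketch below shows only that idea on a hypothetical 5:3 toy set with two features per sample; the function name, the value of k, and the data are assumptions, and a maintained library implementation would normally be used in practice.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Synthesize n_new points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical toy set with the 5:3 class ratio described in the text.
majority = [(2.0, 2.0), (2.5, 2.0), (2.0, 2.5), (3.0, 3.0), (2.5, 2.5)]
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(new_points)
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region the minority class already occupies. Oversampling is applied to the training set only, as the text states.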
In the classification stage, a nonlinear SVM classifier determines tumor invasiveness. The basic SVM is a maximum-margin linear classifier defined in the feature space; with the kernel trick it becomes nonlinear, and here the RBF kernel function provides the nonlinear mapping.
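To show why an RBF kernel makes the classifier nonlinear, the sketch below evaluates the kernel SVM decision function on an XOR-like arrangement that no straight line can separate. Nothing here is trained: the support vectors, coefficients, bias, and gamma are hand-picked hypothetical values, and the function names are assumptions.

```python
import math


def rbf_kernel(x, z, gamma=2.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, coefs, bias=0.0, gamma=2.0):
    """Kernel SVM decision value: sum_k coef_k * K(sv_k, x) + bias.
    The sign of the result is the predicted class."""
    return sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coefs)
    ) + bias


# Hand-picked (not trained) support vectors in an XOR layout:
# opposite corners share a class, so the classes are not linearly separable.
support_vectors = [(0, 0), (1, 1), (0, 1), (1, 0)]
coefs = [1.0, 1.0, -1.0, -1.0]

for point in support_vectors:
    print(point, round(svm_decision(point, support_vectors, coefs), 4))
```

The decision value is positive at (0, 0) and (1, 1) and negative at (0, 1) and (1, 0), a boundary that a linear classifier in the original feature space cannot produce.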
Comparison of results
There are four main approaches to judging lung tumor invasiveness:
Method one: According to pathology theory, usable features such as microvessels are manually extracted from the patient's lung tumor region, and the stage, type and invasiveness of the lung tumor are judged from clinical experience; the accuracy reaches 60%.
Method two: Features are extracted from the lung CT images and the annotated images by image histology (radiomics), and invasiveness is then judged with an SVM classifier. The results are shown in Table I.
Table I. Results of judging invasiveness using image histology
Class         Precision  Recall  F1-score  Support
0             0.67       0.80    0.73      50
1             0.69       0.52    0.59      42
accuracy                         0.67      92
macro avg     0.68       0.66    0.66      92
weighted avg  0.68       0.67    0.67      92
Note: 0 denotes a non-invasive lung cancer case; 1 denotes an invasive lung cancer case.
Method three: A deep learning convolutional network (DenseNet model) is used. The final model weights are obtained through multiple rounds of training and are used to predict lung cancer invasiveness for the test-set cases. The results are shown in Table II.
Table II. Results of judging lung cancer invasiveness using the deep learning convolutional network
Class         Precision  Recall  F1-score  Support
0             0.53       0.64    0.58      14
1             0.80       0.71    0.75      28
micro avg     0.69       0.69    0.69      42
macro avg     0.66       0.68    0.67      42
weighted avg  0.71       0.69    0.70      42
samples avg   0.69       0.69    0.69      42
Note: 0 denotes a non-invasive lung cancer case; 1 denotes an invasive lung cancer case.
Method four: This method combines the three methods above. Image histology is used to extract image histology features, and deep learning is used to extract nonlinear features. The image histology features, the neural network features and the clinical pathology features are combined into a final feature vector, and invasiveness is judged with a nonlinear SVM classifier. The results are shown in Table III.
Table III. Results of judging lung cancer invasiveness using clinical medicine, image histology and the deep learning convolutional network
Class         Precision  Recall  F1-score  Support
0             0.81       0.88    0.84      33
1             0.79       0.68    0.73      22
accuracy                         0.80      55
macro avg     0.80       0.78    0.79      55
weighted avg  0.80       0.80    0.80      55
Note: 0 denotes a non-invasive lung cancer case; 1 denotes an invasive lung cancer case.
Notes on the metrics in the result tables:
TP (true positive): the number of cases whose true value is positive and which the model predicts as positive;
FN (false negative): the number of cases whose true value is positive and which the model predicts as negative;
FP (false positive): the number of cases whose true value is negative and which the model predicts as positive;
TN (true negative): the number of cases whose true value is negative and which the model predicts as negative;
macro avg: the unweighted mean of the per-class precision, recall and F1 scores;
weighted avg: the weighted mean of the per-class scores (for example f1_score), where the weight of each class is its proportion of the samples in y_true.
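As a worked example of these definitions, the sketch below computes precision, recall, F1, accuracy and both averages from a 2×2 confusion matrix. The confusion-matrix counts are not reported in the source; they are an assumption, chosen because they reproduce the rounded values of Table I (support 50 and 42). The standard definitions precision = TP/(TP+FP), recall = TP/(TP+FN) and F1 = 2·precision·recall/(precision+recall) are used.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard per-class scores computed from TP, FP and FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# confusion[t][p] = number of cases with true class t predicted as class p.
# Assumed counts (not given in the source), consistent with Table I.
confusion = [[40, 10],
             [20, 22]]

n_classes = len(confusion)
support = [sum(row) for row in confusion]  # cases per true class
total = sum(support)

scores = []
for c in range(n_classes):
    tp = confusion[c][c]
    fn = support[c] - tp
    fp = sum(confusion[t][c] for t in range(n_classes)) - tp
    scores.append(precision_recall_f1(tp, fp, fn))

accuracy = sum(confusion[c][c] for c in range(n_classes)) / total
# macro avg: unweighted mean over classes.
macro_avg = [sum(s[i] for s in scores) / n_classes for i in range(3)]
# weighted avg: mean over classes weighted by each class's share of y_true.
weighted_avg = [
    sum(s[i] * support[c] for c, s in enumerate(scores)) / total
    for i in range(3)
]

for c, s in enumerate(scores):
    print(c, [f"{v:.2f}" for v in s], support[c])
print("accuracy", f"{accuracy:.2f}")
print("macro avg", [f"{v:.2f}" for v in macro_avg])
print("weighted avg", [f"{v:.2f}" for v in weighted_avg])
```

Each class is treated in turn as the positive class, which is how the per-class rows of the result tables are obtained.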
Discussion
The invention fully exploits the features in CT images and combines clinical medicine, image histology and deep learning. Compared with using image histology or deep learning alone, the accuracy of judging lung tumor invasiveness is greatly improved: the combined method reaches an accuracy of 0.80 (Table III), compared with 0.67 for image histology alone (Table I).
Clinical diagnosis alone cannot avoid heterogeneity and misdiagnosis; it is strongly influenced by subjective judgment, and its accuracy is unstable. Image histology performs better than clinical diagnosis because it explores higher-dimensional features in CT images. These features are difficult for the human eye to identify, yet they play an important role in invasiveness.
Deep learning has achieved great success in image analysis and computer vision and has accomplished many complex image classification tasks. However, the experiments showed that deep learning alone does not effectively improve the classification results. The deep learning method (network model: DenseNet) has a sensitivity of only 0.650 and a specificity of 0.730. The low-order features extracted by image histology can describe the general morphology, location and texture of the tumor, while deep learning can extract high-order features to form a more personalized classification model for each patient. Meanwhile, clinical information still has guiding value: for example, the patient's basic medical history and genetic testing have an important influence on lung cancer invasiveness. Experiments show that the combination of the three methods achieves the highest accuracy.
The technical reasons for the results achieved by this technical scheme are as follows:
1) Multi-feature fusion is used as the basis for diagnosis. The features extracted by image histology comprise shape features, first-order statistical features, texture features, higher-order features and features based on model transformation. They are combined with the nonlinear features extracted by deep learning to form a comprehensive feature vector. This overcomes the insufficient accuracy of judging lung tumor invasiveness from a single kind of feature.
2) Clinical diagnosis is added: various clinical data such as the patient's basic medical history and genetic testing are collected, which makes the diagnosis more accurate.
3) A traditional statistical method, the LASSO algorithm, is added. It screens out the important features used as the basis for judging invasiveness and computes their feature weights, instead of relying on a conventional deep learning black box, so that the final result is interpretable and scientifically grounded.
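The multi-feature fusion in point 1) and the weight-based screening in point 3) can be sketched as a simple concatenation followed by a filter on non-zero weights. Everything concrete below is hypothetical: the feature names, the values, the vector lengths and the weights are placeholders for illustration and are not taken from the described experiments.

```python
def fuse_features(clinical, radiomic, deep):
    """Concatenate the three per-patient feature vectors into one vector."""
    return list(clinical) + list(radiomic) + list(deep)


def keep_nonzero(names, values, weights):
    """Keep only the features whose (LASSO) weight is non-zero."""
    return {n: v for n, v, w in zip(names, values, weights) if w != 0.0}


# Hypothetical features for one patient (names and values are placeholders).
clinical = {"age": 63.0, "family_history": 1.0}
radiomic = {"shape_sphericity": 0.71, "firstorder_mean": -412.5, "glcm_contrast": 18.2}
deep = {"densenet_f0": 0.13, "densenet_f1": 2.40}

names = list(clinical) + list(radiomic) + list(deep)
vector = fuse_features(clinical.values(), radiomic.values(), deep.values())

# Hypothetical LASSO weights, one per fused feature; zeros mark discarded features.
weights = [0.0, 0.4, 0.9, 0.0, 0.3, 0.0, 0.6]
selected = keep_nonzero(names, vector, weights)
print(selected)  # the reduced feature set passed on to the classifier
```

Because each retained feature keeps its name and weight, the basis of a prediction can be traced back to specific clinical, image histology or network features.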

Claims (8)

1. A method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology, characterized by comprising the following steps:
accurately acquiring the tumor region in a lung CT image by using a lung tumor annotation image, and preprocessing the acquired image;
combining the clinically collected medical features, the features extracted from the CT tumor region by image histology, and the high-dimensional nonlinear features extracted by a deep learning neural network to form a comprehensive feature vector that serves as the basis for judging invasiveness;
balancing the numbers of the two classes of cases with the SMOTE algorithm, thereby solving the problem of an excessive difference between the number of invasive lung cancer samples and the number of non-invasive lung cancer samples;
screening the features with the statistical method LASSO to select the features that influence invasiveness, and then classifying the samples with a nonlinear SVM classifier.
2. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein acquiring an accurate CT image of the lung tumor and preprocessing it comprises:
obtaining the position of the tumor region from the lung tumor CT image annotated by a doctor, obtaining the center coordinates of the tumor, and cropping from the original CT image a 50 × 50 cube tumor image centered on those coordinates; the preprocessing includes image resampling, unifying the pixel size, and the like.
3. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein a DenseNet model is trained on a training set of lung CT images, the model weights are saved when the model performs best, and the weight parameters are then loaded to obtain the features of all lung images.
4. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein the original lung CT images and the annotated lung tumor region images are loaded in one-to-one correspondence into an image histology feature extractor to obtain all image histology features of each image, forming a medical image feature vector; nonlinear features are extracted by a convolutional network to form a neural network feature vector; the three feature vectors are then combined, the features that influence the judgment of lung tumor invasiveness are screened out by a statistical method, and the type is then judged.
5. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein the statistical feature screening method is the LASSO method.
6. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein the collected clinical medical features specifically comprise: basic patient information, family history, basic medical history, genomic testing, and the like.
7. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein applying the SMOTE algorithm balances the number of invasive lung cancer patients and the number of non-invasive lung cancer patients in the training set, improving model performance.
8. The method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology according to claim 1, wherein five-fold cross-validation is adopted to address overfitting, and the best-performing model is selected for prediction.
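For illustration only, the five-fold cross-validation named in claim 8 can be sketched as follows. The splitting scheme, the seed, the toy labels and the stand-in scoring function `majority_baseline` are assumptions of this sketch; in practice the scoring function would train the classifier on the training indices and return its validation score.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=5, seed=0):
    """Call train_and_score(train_idx, val_idx) once per fold; return the k scores."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return scores


# Toy labels (hypothetical): 1 = invasive, 0 = non-invasive.
labels = [1] * 15 + [0] * 8


def majority_baseline(train_idx, val_idx):
    """Hypothetical stand-in for 'train a model, return validation accuracy'."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in val_idx) / len(val_idx)


scores = cross_validate(len(labels), majority_baseline)
best_fold = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_fold)
```

Every sample is used for validation exactly once, so the spread of the five scores gives an indication of how much the model's performance depends on the particular split.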
CN202310518369.0A 2023-05-09 2023-05-09 Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology Pending CN116542937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310518369.0A CN116542937A (en) 2023-05-09 2023-05-09 Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310518369.0A CN116542937A (en) 2023-05-09 2023-05-09 Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology

Publications (1)

Publication Number Publication Date
CN116542937A true CN116542937A (en) 2023-08-04

Family

ID=87443105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310518369.0A Pending CN116542937A (en) 2023-05-09 2023-05-09 Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology

Country Status (1)

Country Link
CN (1) CN116542937A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116825363A (en) * 2023-08-29 2023-09-29 济南市人民医院 Early lung adenocarcinoma pathological type prediction system based on fusion deep learning network
CN116825363B (en) * 2023-08-29 2023-12-12 济南市人民医院 Early lung adenocarcinoma pathological type prediction system based on fusion deep learning network
CN117174257A (en) * 2023-11-03 2023-12-05 福建自贸试验区厦门片区Manteia数据科技有限公司 Medical image processing device, electronic apparatus, and computer-readable storage medium
CN117174257B (en) * 2023-11-03 2024-02-27 福建自贸试验区厦门片区Manteia数据科技有限公司 Medical image processing device, electronic apparatus, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
Bilgin et al. Cell-graph mining for breast tissue modeling and classification
Byra et al. Early prediction of response to neoadjuvant chemotherapy in breast cancer sonography using Siamese convolutional neural networks
CN116542937A (en) Method for judging lung tumor infiltration based on clinical medicine, deep learning neural network and image histology
EP2027566B1 (en) Automatic recognition of preneoplastic anomalies in anatomic structures based on an improved region-growing segmentation, and computer program therefor
CN104376147B (en) The image analysis system of risk score based on image
CN109658411A (en) A kind of correlation analysis based on CT images feature Yu Patients with Non-small-cell Lung prognosis situation
El-Baz et al. Three-dimensional shape analysis using spherical harmonics for early assessment of detected lung nodules
JP2003225231A (en) Method and system for detecting lung disease
CN116188423B (en) Super-pixel sparse and unmixed detection method based on pathological section hyperspectral image
KR20180022607A (en) Determination of result data on the basis of medical measurement data from various measurements
KR20120041468A (en) System for detection of interstitial lung diseases and method therefor
Cabral et al. Fractal analysis of breast masses in mammograms
CN115067978B (en) Method and system for evaluating curative effect of osteosarcoma
CN112638262B (en) Similarity determination device, method, and program
CN115937130A (en) Image processing method for predicting ovarian cancer Ki-67 expression based on dual-energy CT
WO2022225794A1 (en) Systems and methods for detecting and characterizing covid-19
Vivek et al. Artificial Neural Network Based Effective Detection of Breast Cancer By Using Mammogram Data
US20230091506A1 (en) Systems and Methods for Analyzing Two-Dimensional and Three-Dimensional Image Data
Javed et al. Detection of lung tumor in CE CT images by using weighted support vector machines
Zheng et al. 3D context-aware convolutional neural network for false positive reduction in clustered microcalcifications detection
CN116740386A (en) Image processing method, apparatus, device and computer readable storage medium
Li et al. Gleason grading of prostate cancer based on improved AlexNet
CN112329876A (en) Colorectal cancer prognosis prediction method and device based on image omics
Theissen et al. Learning cellular phenotypes through supervision
Lauria GPCALMA: Implementation in Italian hospitals of a computer aided detection system for breast lesions by mammography examination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination