CN115500851A - Early lung cancer risk layered prediction system based on deep learning - Google Patents

Info

Publication number
CN115500851A
Authority
CN
China
Prior art keywords
module
deep learning
lung cancer
clinical
clinical characteristics
Prior art date
Legal status
Pending
Application number
CN202211253156.1A
Other languages
Chinese (zh)
Inventor
龚静山
江长思
陈亚曦
Current Assignee
Shenzhen Peoples Hospital
Original Assignee
Shenzhen Peoples Hospital
Priority date
Filing date
Publication date
Application filed by Shenzhen Peoples Hospital
Priority to CN202211253156.1A
Publication of CN115500851A

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B6/00 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
    • A61B6/02 Arrangements for diagnosis sequentially in different planes; Stereoscopic radiation diagnosis
    • A61B6/03 Computed tomography [CT]
    • A61B6/032 Transmission computed tomography [CT]
    • A61B6/50 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment specially adapted for specific body parts; specially adapted for specific clinical applications
    • A61B6/52 Devices using data or image processing specially adapted for radiation diagnosis
    • A61B6/5211 Devices using data or image processing specially adapted for radiation diagnosis involving processing of medical diagnostic data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30061 Lung
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Public Health (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Pathology (AREA)
  • Radiology & Medical Imaging (AREA)
  • Veterinary Medicine (AREA)
  • Computing Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Optics & Photonics (AREA)
  • High Energy & Nuclear Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Dentistry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Pulmonology (AREA)
  • Quality & Reliability (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention relates to a deep learning-based risk stratification prediction system for early lung cancer, comprising: a data acquisition module for acquiring a user's thin-layer chest CT image data and clinical characteristics; a feature extraction module for extracting deep learning features from the user's thin-layer chest CT image data; a region candidate module for identifying suspicious regions from the extracted deep learning features to obtain the deep learning features of the region candidates; and a classification prediction module for predicting the user's early lung cancer risk stratification type from the clinical characteristics and the deep learning features of the region candidates, using a pre-trained risk stratification prediction model.

Description

Early lung cancer risk layered prediction system based on deep learning
Technical Field
The invention relates to the technical field of medical treatment, in particular to a deep learning-based early lung cancer risk layered prediction system.
Background
Lung cancer is the second most common malignancy, after prostate cancer in men and breast cancer in women, and is the leading cause of cancer-related death worldwide, accounting for 21% of all cancer-related deaths. In recent years, the widespread use of low-dose CT screening has allowed lung cancer to be detected early, and thoracoscopically assisted minimally invasive treatment has made limited resection a safe surgical procedure that can effectively remove the tumor while preserving as much of the patient's lung function as possible. Even so, the prognosis of lung cancer has still not improved. The main reason is that existing examination techniques cannot accurately diagnose lung cancer before surgery or stratify the risk of early lung cancer, so an individualized treatment plan (for example, choosing an appropriate surgical approach and deciding whether to give postoperative radiotherapy and chemotherapy) cannot be adopted to improve patient survival and quality of life.
Lung adenocarcinoma is the major histological type of lung cancer, accounting for about 60% of primary lung cancers. Research shows that the 5-year disease-free survival rate of adenocarcinoma in situ and minimally invasive adenocarcinoma is 100%, whereas that of invasive lung adenocarcinoma is 67% to 90%. Studies have also shown that the degree of invasion is independently related to the lymph node metastasis rate and the risk of death, so the degree of invasion of lung adenocarcinoma is an important factor in patient prognosis and treatment selection. According to the degree of invasion, lung adenocarcinoma can be classified into adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IA). AIS is a pre-invasive lesion and MIA is an early invasive lesion; for both, limited resection (wedge resection or segmentectomy) better preserves the patient's lung function and shows no obvious difference in prognosis from lobectomy. For IA, lobectomy plus lymph node dissection is the gold standard of treatment. Distinguishing AIS/MIA from IA therefore has important guiding significance for selecting a patient's treatment plan. In addition, spread through air spaces (STAS) of lung adenocarcinoma and hilar and mediastinal lymph node metastasis are also important independent risk factors affecting lung cancer prognosis and the choice of surgical approach. If early lung cancer with STAS and lymph node metastasis is treated by limited resection instead of lobectomy with hilar and mediastinal lymph node dissection, a high local recurrence rate can result, affecting the patient's survival and quality of life.
The degree of invasion of lung adenocarcinoma, the STAS status, and lymph node metastasis can be accurately assessed only from surgical specimens. Although CT and/or PET/CT may help detect enlarged hilar and mediastinal lymph nodes, 15-21% of early lung cancer patients clinically diagnosed as stage IA still have lymph node metastasis, leading to underestimation of the clinical stage. Before surgery, clinicians therefore lack effective biomarkers for evaluating the degree of tumor invasion, the STAS status, and the presence of lymph node metastasis, and so lack guidance for choosing the surgical approach for early lung cancer. By virtue of its excellent spatial and density resolution, CT is the first-choice imaging examination in lung cancer diagnosis and treatment; it is inexpensive, widely used, and readily available clinically. Clinical interpretation of CT images relies primarily on the radiologist's observation of macroscopic features of the pulmonary nodule or mass, such as size, lobulation, spiculation, and enhancement, and on whether the lymph nodes are enlarged. Although some CT signs of lung cancer (such as size and the proportion of solid component) have some value for assessing the degree of invasion and the STAS status, as do the size and enhancement of lymph nodes for assessing nodal metastasis, their accuracy is low and their assessment is highly subjective.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a deep learning-based early lung cancer risk stratification prediction system, which can improve the accuracy of predicting the risk stratification type of early lung cancer.
In order to achieve the purpose, the invention adopts the following technical scheme: in a first aspect, a deep learning-based early lung cancer risk stratification prediction system is provided, which includes:
the data acquisition module is used for acquiring thin-layer chest CT image data and clinical characteristics of a user;
the feature extraction module is used for extracting deep learning features from chest thin-layer CT image data of a user;
the region candidate module is used for identifying suspicious regions from the extracted deep learning features to obtain the deep learning features of the region candidates;
and the classification prediction module is used for predicting the risk classification type of the early lung cancer of the user according to the clinical characteristics and the deep learning characteristics of the region candidates by adopting a pre-trained risk classification prediction model.
Further, the prediction system further comprises:
the pathological feature module is used for carrying out standardization and normalization processing on the acquired clinical features; and the classification prediction module carries out prediction according to the normalized clinical characteristics and the deep learning characteristics of the region candidates.
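As an illustration only, the standardization and normalization step can be sketched as follows. The patent does not state which transforms are used, so the z-score standardization and min-max normalization below, as well as the function names and the sample values, are assumptions made for this example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization (assumed): subtract the mean, divide by the population SD."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Min-max normalization (assumed): rescale the values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical ages of five patients, used only to exercise the functions.
ages = [45, 52, 60, 71, 38]
z_scores = standardize(ages)
scaled = min_max_normalize(z_scores)
```

Applying both transforms in sequence mirrors the wording "standardization and normalization"; in practice the statistics would be computed on the training set and reused for new users.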
Further, the prediction system further comprises:
the model building module is used for building a risk stratification prediction model based on chest thin-layer CT image data and clinical characteristics of a plurality of lung cancer patients of each risk stratification type by adopting a convolutional neural network, and obtaining the trained risk stratification prediction model after training, testing and verifying;
and the labeling module is used for labeling the thin-layer chest CT image data of the user to obtain a three-dimensional volume of interest of the tumor, which is further used for training the classification prediction module.
Further, the model building module is internally provided with:
the data input unit is used for inputting pre-acquired thin-layer chest CT image data and clinical characteristics of a plurality of lung cancer patients of each risk stratification type, and for performing image preprocessing on the thin-layer chest CT image data;
the clinical characteristic selecting unit is used for analyzing the difference of clinical characteristics among the risk stratification types and selecting the clinical characteristics of which the difference meets the preset requirement based on the clinical characteristics after standardization and normalization processing;
the training unit is used for training the feature extraction module, the region candidate module and the classification prediction module to generate a pre-constructed model of the corresponding module;
and the risk stratification prediction model construction unit is used for feeding the clinical characteristics whose differences meet the preset requirement and the deep learning features of the region candidates into the convolutional neural network, and for establishing and training a risk stratification prediction model on the training set to obtain the trained risk stratification prediction model.
Furthermore, when the training unit constructs the corresponding models, the region candidate module and the classification prediction module each propagate their errors back to the feature extraction module using gradient descent; the model of the region candidate module is constructed by regression-type fully supervised learning, and the model of the classification prediction module is constructed by classification-type fully supervised learning.
In a second aspect, a method for constructing a risk stratification prediction model for early lung cancer is provided, which includes:
acquiring thin-layer chest CT image data and clinical characteristics of a plurality of lung cancer patients of each risk stratification type, and performing data processing on the thin-layer chest CT image data and the clinical characteristics;
extracting deep learning features from the processed thin-layer chest CT image data, and identifying suspicious regions from the extracted deep learning features to obtain the deep learning features of the region candidates;
analyzing the difference of the clinical characteristics of the risk stratification type, and selecting the clinical characteristics of which the difference meets the preset requirement based on the clinical characteristics after data processing;
and dividing the clinical characteristics whose differences meet the preset requirement and the deep learning features of the region candidates into a training set and a validation set according to a specific proportion, and training and validating the risk stratification prediction model to obtain the trained risk stratification prediction model.
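The division into a training set and a validation set can be sketched as below. The patent only says "a specific proportion", so the 7:3 ratio, the per-class (stratified) splitting, the function name `stratified_split`, and the toy labels are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into training and validation sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_fraction)
        train.extend(idxs[:cut])
        val.extend(idxs[cut:])
    return sorted(train), sorted(val)


# Hypothetical cohort: 10 AIS/MIA cases followed by 10 IA cases.
labels = ["AIS/MIA"] * 10 + ["IA"] * 10
train_idx, val_idx = stratified_split(labels, train_fraction=0.7, seed=42)
```

Splitting within each risk stratification type keeps the class balance similar in both sets, which is one reasonable reading of the text rather than a stated requirement.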
Further, the acquiring thin layer CT image data and clinical features of the chest of a plurality of lung cancer patients of each risk stratification type and performing data processing on the thin layer CT image data and clinical features of the chest comprise:
acquiring thin-layer chest CT image data and clinical characteristics of a plurality of lung cancer patients of each risk stratification type;
carrying out image preprocessing on the acquired thin-layer CT image data of the chest;
and carrying out standardization and normalization processing on the acquired clinical characteristics.
Further, the analyzing the difference of clinical characteristics between the risk stratification types, and selecting the clinical characteristics meeting the preset requirements based on the clinical characteristics after data processing, includes:
analyzing the differences in clinical characteristics between risk stratification types using the independent two-sample t-test, the Mann-Whitney U rank-sum test, the chi-square (χ²) test, and Fisher's exact test, and selecting, from the standardized and normalized clinical characteristics, those whose differences meet the preset requirement;
and adjusting the number of the input ends of the clinical characteristic selection unit according to the selected clinical characteristics, wherein the number of the input ends is equal to the number of the selected clinical characteristics.
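The relationship between the selected clinical characteristics and the number of input ends can be sketched as follows. How the clinical inputs are joined to the deep learning features is not specified here, so simple concatenation, the function name `build_fused_input`, and the feature names and values are assumptions for illustration.

```python
def build_fused_input(deep_features, clinical_row, selected_names):
    """Append one clinical input per selected characteristic to the deep features."""
    clinical_part = [float(clinical_row[name]) for name in selected_names]
    return list(deep_features) + clinical_part


# Hypothetical selection result and one patient's normalized clinical values.
selected = ["age", "smoking_history"]
patient = {"age": 0.42, "sex": 1.0, "smoking_history": 0.0}
deep = [0.1, 0.7, 0.3, 0.9]  # hypothetical region-candidate deep features

fused = build_fused_input(deep, patient, selected)
num_clinical_inputs = len(fused) - len(deep)
```

Because the clinical part is built from `selected_names`, the number of clinical inputs always equals the number of selected characteristics, and unselected characteristics such as `sex` in this toy case are left out.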
In a third aspect, a processing device is provided, comprising computer program instructions, wherein the computer program instructions, when executed by the processing device, are adapted to implement the steps corresponding to the above-mentioned method for constructing a risk stratification prediction model for early lung cancer.
In a fourth aspect, a computer readable storage medium is provided, which has computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, are used to implement the steps corresponding to the above-mentioned method for constructing a risk stratification prediction model for early lung cancer.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. The prediction system adopts a deep learning method and, through the model building module, establishes a risk stratification prediction model for early lung cancer based on users' thin-layer chest CT image data and clinical characteristics acquired before surgery (including age, sex, family history of lung cancer, family history of tumors other than lung cancer, smoking history and smoking intensity, serum tumor markers, and the like). A user's early lung cancer risk stratification type can then be predicted by the system from that user's thin-layer chest CT image data and clinical characteristics, providing guidance for subsequent diagnosis, selection of the surgical approach, and prognosis assessment. This reduces treatment-related trauma and the loss of survival caused by an inappropriate choice of surgical approach, improves patients' quality of life and survival rate, and yields social and economic benefits.
2. Compared with a deep learning model built only from thin-layer chest CT image data, the classification prediction module of the invention fuses multiple clinical characteristics with the deep learning features to stratify early lung cancer by its independent risk factors, so that it can guide subsequent diagnosis and personalized treatment planning.
In conclusion, the invention can be widely applied to the technical field of medical treatment.
Drawings
Various additional advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Like reference numerals refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic flow chart of multiple modules provided in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of data transfer between multiple units and corresponding modules according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating propagation of operations among modules under the action of a training unit according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the neural network connections in the prediction module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a connection structure of the MB convolutional layer according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a candidate area network according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of a multi-module method that does not include a pathological feature module according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
It is to be understood that the terminology used herein is for the purpose of describing particular example embodiments only, and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "including," and "having" are inclusive and therefore specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order described or illustrated, unless specifically identified as an order of performance. It should also be understood that additional or alternative steps may be used.
In recent years, with the development of artificial intelligence and computer vision algorithms, the high-throughput capability of computers has been used to extract high-dimensional features that cannot be seen by the naked eye from digital medical images, and data-driven models have been built to classify diseases and to predict molecular characteristics and treatment response. The emerging field of radiomics is thus becoming a bridge between imaging medicine and precision medicine. The deep learning-based early lung cancer risk stratification prediction system provided by the embodiments of the invention extracts high-throughput radiomics features from a user's thin-layer chest CT image data and uses an artificial intelligence (deep learning) algorithm to construct a risk stratification prediction model. This addresses the technical problems described above: early lung cancer is risk-stratified, and the prediction result helps the surgeon determine each patient's depth of invasion, STAS status, and hilar and mediastinal lymph node metastasis and select an appropriate surgical approach.
Example 1
As shown in FIG. 1, this embodiment provides a deep learning-based early lung cancer risk stratification prediction system, which includes a data acquisition module, a feature extraction module, a region candidate module, a pathological feature module, and a classification prediction module. The feature extraction module, the region candidate module, and the classification prediction module all use pre-constructed models to complete their respective tasks.
The data acquisition module is used for acquiring thin-layer chest CT image data and clinical characteristics of a user.
The feature extraction module is used for extracting deep learning features from chest thin-layer CT image data of a user and achieving data dimension reduction.
The region candidate module is used for identifying suspicious regions from the extracted deep learning features, and further eliminating the features of irrelevant regions to obtain the deep learning features of the region candidates.
The pathological feature module is used for carrying out standardization and normalization processing on the acquired clinical features.
And the classification prediction module is used for predicting the risk stratification type of the early lung cancer of the user according to the clinical characteristics after standardization and normalization processing and the deep learning characteristics of the region candidates by adopting a pre-trained risk stratification prediction model.
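The data flow among the modules described above can be sketched as a simple pipeline. This is only a structural illustration: the class name `RiskPipeline`, the callable signatures, and the toy stand-in functions (including the arbitrary threshold) are assumptions and have no clinical meaning; the real modules are trained neural network models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class RiskPipeline:
    """Wires the four processing modules together in the order described."""
    extract_features: Callable[[Sequence[float]], List[float]]    # feature extraction module
    propose_regions: Callable[[List[float]], List[float]]         # region candidate module
    normalize_clinical: Callable[[Dict[str, float]], List[float]] # pathological feature module
    classify: Callable[[List[float], List[float]], str]           # classification prediction module

    def predict(self, ct_volume: Sequence[float], clinical: Dict[str, float]) -> str:
        deep = self.extract_features(ct_volume)
        candidates = self.propose_regions(deep)
        clinical_vec = self.normalize_clinical(clinical)
        return self.classify(candidates, clinical_vec)


# Toy stand-ins so the sketch runs end to end; none of them is a real model.
pipeline = RiskPipeline(
    extract_features=lambda vol: [v / 1000.0 for v in vol],
    propose_regions=lambda feats: [f for f in feats if f > 0.0],
    normalize_clinical=lambda c: [c["age"] / 100.0],
    classify=lambda cand, clin: "IA" if sum(cand) + sum(clin) > 1.0 else "AIS/MIA",
)

result = pipeline.predict([-800.0, 40.0, 300.0], {"age": 63.0})
```

Keeping each module behind a plain callable makes it possible to swap in pre-constructed models without changing the order in which data passes through them.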
In a preferred embodiment, the risk stratification type of early lung cancer is determined based on the depth of lung cancer invasion, the lymph node metastasis rate, and spread through air spaces, and includes adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IA).
In a preferred embodiment, the system further comprises a model building module and a labeling module.
The model building module is used for building a risk stratification prediction model based on chest thin-layer CT image data and clinical characteristics of a plurality of lung cancer patients of each risk stratification type by adopting a Convolutional Neural Network (CNN), and obtaining the trained risk stratification prediction model after training, testing and verifying.
The labeling module is used for labeling the thin-layer chest CT image data of the user to obtain a three-dimensional volume of interest (VOI) of the tumor, which is further used for training the classification prediction module. The labeling process requires at least two radiologists to label the selected thin-layer chest CT image data.
Specifically, the model building module contains a data input unit, a clinical characteristic selection unit, a training unit, and a risk stratification prediction model construction unit.
The data input unit is used for inputting the pre-acquired thin-layer chest CT image data and clinical characteristics of a plurality of lung cancer patients of each risk stratification type. The thin-layer chest CT image data are downloaded in DICOM (Digital Imaging and Communications in Medicine) format through a PACS (Picture Archiving and Communication System) and then preprocessed. The data input unit also obtains the corresponding labels of the thin-layer chest CT image data from the labeling module and sends them to the feature extraction module; the data flow is shown in FIG. 2.
The clinical characteristic selection unit is used for analyzing the differences in clinical characteristics between the AIS/MIA group and the IA group using the independent two-sample t-test, the Mann-Whitney U rank-sum test, the chi-square (χ²) test, and Fisher's exact test, and for selecting, from the standardized and normalized clinical characteristics, those whose differences are statistically significant (that is, meet the preset requirement). Specifically, the statistical analysis used SPSS 26.0 software, and differences were considered statistically significant when P < 0.05. The P value is the probability, under the null hypothesis, of obtaining a result at least as extreme as the one actually observed. If the P value is less than the chosen significance level (0.05 or 0.01; 0.05 is used in this example), the null hypothesis is rejected, that is, the difference between the two groups is statistically significant.
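For categorical characteristics, the selection at P < 0.05 can be sketched in plain Python as below. The embodiment used SPSS 26.0, so this is not the authors' implementation; the rule of switching to Fisher's exact test when an expected count is below 5, the function names, and the 2x2 counts are assumptions for illustration. The t-test and Mann-Whitney U test for continuous characteristics are not shown.

```python
import math


def chi_square_2x2(a, b, c, d):
    """P value of the Pearson chi-square test (df = 1, no continuity correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test: sum the tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def p_value_2x2(a, b, c, d):
    """Use Fisher's exact test when any expected count is below 5 (assumed rule)."""
    n = a + b + c + d
    expected = [r * k / n for r in (a + b, c + d) for k in (a + c, b + d)]
    return fisher_exact_2x2(a, b, c, d) if min(expected) < 5 else chi_square_2x2(a, b, c, d)


# Hypothetical counts: (AIS/MIA yes, AIS/MIA no, IA yes, IA no).
tables = {"smoking_history": (5, 25, 18, 12), "sex": (14, 16, 15, 15)}
p_values = {name: p_value_2x2(*t) for name, t in tables.items()}
selected = [name for name, p in p_values.items() if p < 0.05]
```

With these made-up counts only `smoking_history` falls below the 0.05 threshold, so it alone would be passed on as a clinical input.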
The training unit is used for training the feature extraction module, the region candidate module, and the classification prediction module to generate the pre-constructed model of each module. During training, the region candidate module and the classification prediction module each propagate their errors back to the feature extraction module using gradient descent; the model of the region candidate module is built by regression-type fully supervised learning, and the model of the classification prediction module is built by classification-type fully supervised learning. How operations propagate among the modules under the action of the training unit is shown in FIG. 3.
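The idea that both the regression-type head and the classification-type head send gradients back to a shared feature extractor can be sketched with a scalar toy model. The single-weight "modules", the squared-error and cross-entropy losses, the equal weighting of the two losses, and the learning rate are all assumptions for illustration; the actual networks are convolutional.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(params, x, region_target, label, lr=0.05):
    """One gradient descent step on a shared feature with two supervised heads."""
    w, u, v = params["w"], params["u"], params["v"]
    h = w * x              # shared feature (stands in for the feature extraction module)
    r = u * h              # regression head (stands in for the region candidate module)
    p = sigmoid(v * h)     # classification head (stands in for the classification module)

    reg_loss = (r - region_target) ** 2
    cls_loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))

    d_r = 2.0 * (r - region_target)   # d reg_loss / d r
    d_z = p - label                   # d cls_loss / d logit
    grad_h = d_r * u + d_z * v        # both heads contribute to the shared feature
    grad_w, grad_u, grad_v = grad_h * x, d_r * h, d_z * h

    params["w"] = w - lr * grad_w
    params["u"] = u - lr * grad_u
    params["v"] = v - lr * grad_v
    return reg_loss + cls_loss


params = {"w": 0.5, "u": 0.5, "v": 0.5}   # hypothetical initial weights
losses = [train_step(params, x=1.0, region_target=1.0, label=1) for _ in range(50)]
```

The line computing `grad_h` is the point of the sketch: the shared weight `w` is updated with the sum of the gradients arriving from the two heads.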
The risk stratification prediction model construction unit is used for introducing the clinical features with statistically significant differences and the deep learning features of the region candidates into a convolutional neural network and establishing a risk stratification prediction model in the training set. In the training set, the predictive performance of the model is evaluated by the area under the ROC curve, its calibration is evaluated by a calibration curve, its goodness of fit is checked by the Hosmer-Lemeshow goodness-of-fit test, and its net benefit and clinical usefulness are evaluated by a decision curve. The optimal cutoff value is determined by the maximum Youden index of the ROC curve, and the positive predictive value, negative predictive value, sensitivity, specificity and accuracy of the model are calculated in the training set and the validation set, finally yielding the trained risk stratification prediction model.
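As a minimal sketch of the cutoff selection described above, the following code scans every observed score as a candidate threshold, picks the one that maximizes the Youden index (sensitivity + specificity - 1), and reports the metrics at that threshold. The function names and the toy labels and scores in the usage note are hypothetical and are not data from the example.

```python
def metrics_at(y_true, scores, threshold):
    """Confusion-matrix metrics when score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        if s >= threshold:
            tp, fp = tp + (y == 1), fp + (y == 0)
        else:
            tn, fn = tn + (y == 0), fn + (y == 1)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
        "accuracy": (tp + tn) / len(y_true),
    }


def youden_cutoff(y_true, scores):
    """Return the observed score that maximizes sensitivity + specificity - 1."""
    def youden(t):
        m = metrics_at(y_true, scores, t)
        return m["sensitivity"] + m["specificity"] - 1

    # On ties, max() keeps the lowest threshold because candidates are sorted.
    return max(sorted(set(scores)), key=youden)
```

With labels `[0, 0, 0, 1, 0, 1, 1]` and scores `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]`, `youden_cutoff` returns `0.4`, where sensitivity is 1.0 and specificity is 0.75. In practice the cutoff would be chosen on the training set and then applied unchanged to the validation set.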
Specifically, the inclusion criteria for the lung cancer patients of each risk stratification type are: (1) a single lesion in the lung was completely resected surgically and pathologically confirmed as lung adenocarcinoma; (2) a CT examination was performed within three months before surgery, and the patient had not received any treatment before the examination; (3) the CT images are complete and can be read by the PACS system. The exclusion criteria for the lung cancer patients of each risk stratification type are: (1) multiple lesions in the lung; (2) failure to extract the radiological features; (3) no definite pathological result.
In a preferred embodiment, the chest thin-layer CT image data of the user are obtained by a plain (non-contrast) or contrast-enhanced chest CT examination on a Philips Brilliance iCT 256-slice spiral CT scanner and are reconstructed with a standard algorithm after scanning. The slice thickness and slice interval of the chest thin-layer CT image data are 1.5 mm and 1 mm, respectively, the tube voltage is 120 kV, and the tube current is adjusted automatically.
In a preferred embodiment, as shown in fig. 7, the prediction system may not include a pathological feature module, and the classification prediction module directly uses a risk stratification prediction model trained in advance to predict the risk stratification type of the early lung cancer of the user according to the clinical features acquired by the data acquisition module and the deep learning features of the region candidates acquired by the region candidate module.
The deep learning-based early lung cancer risk stratification prediction system of the invention is described in detail below, taking patients with pathologically confirmed primary lung adenocarcinoma (04-2020-2015) as a specific example:
Chest thin-layer CT image data and clinical features were acquired for 680 patients, of whom 16 had AIS, 36 had MIA and 628 had IA. To reduce the effect of the imbalance between the AIS/MIA and IA groups, 123 of the 628 IA patients (about 20%) were randomly selected and combined with the 52 AIS/MIA patients to form a study cohort of 175 patients. The demographic data of the 175 enrolled patients are shown in table 1 below. The differences between the AIS/MIA group and the IA group in gender (P = 0.421), family history (P = 0.629), smoking history (P = 0.166), etc. were not statistically significant, whereas the differences in age (P < 0.001), CA199 (P = 0.003) and CEA (P = 0.014) were statistically significant (P < 0.05):
Table 1: Correlation of clinical characteristics with depth of lung adenocarcinoma invasion
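The cohort arithmetic above can be checked with a short sketch, which also shows one way the random undersampling of the IA group could be done. The counts are those stated in the text; the helper `undersample`, the seed and the use of integer patient indices are assumptions for illustration only.

```python
import random

N_AIS, N_MIA, N_IA = 16, 36, 628   # counts stated in the text
N_IA_SAMPLED = 123                 # IA patients randomly retained

n_minority = N_AIS + N_MIA                 # AIS/MIA patients
n_cohort = n_minority + N_IA_SAMPLED       # final study cohort
ia_fraction = N_IA_SAMPLED / N_IA          # share of IA patients kept


def undersample(patient_ids, k, seed=0):
    """Randomly keep k identifiers without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return sorted(rng.sample(patient_ids, k))


ia_kept = undersample(list(range(N_IA)), N_IA_SAMPLED)
print(n_minority, n_cohort, round(ia_fraction, 3))
```

The sampled fraction works out to roughly 19.6%, which the text rounds to 20%.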
[Table 1 appears as images in the original publication: Figure BDA0003888716320000071, Figure BDA0003888716320000081]
Note: NSE, neuron-specific enolase; CA125, carbohydrate antigen 125; CEA, carcinoembryonic antigen; CA199, carbohydrate antigen 199.
After the last preoperative chest thin-layer CT image data of the 175 patients were downloaded in DICOM format through the PACS system and subjected to image preprocessing, two radiologists, each with more than two years of chest imaging experience, delineated the lung cancer on the chest thin-layer CT image data using SNAP software to obtain the three-dimensional tumor region of interest.
The clinical features with statistically significant differences, the chest thin-layer CT image data and their labels are introduced into a convolutional neural network, and a risk stratification prediction model is established in the training set. In the training set, the predictive performance of the model is evaluated by the area under the ROC curve, its calibration is evaluated by a calibration curve, its goodness of fit is checked by the Hosmer-Lemeshow goodness-of-fit test, and its net benefit and clinical usefulness are evaluated by a decision curve. The optimal cutoff value is determined by the maximum Youden index of the ROC curve, and the positive predictive value, negative predictive value, sensitivity, specificity and accuracy of the model are calculated in the training set and the validation set, finally yielding the trained risk stratification prediction model.
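The net benefit plotted on a decision curve is commonly computed as TP/N - FP/N × pt/(1 - pt), where pt is the threshold probability. The sketch below uses that standard definition, which the text does not spell out; the function names and the toy inputs in the usage note are hypothetical.

```python
def net_benefit(y_true, probs, pt):
    """Net benefit of treating patients whose predicted risk is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference strategy on a decision curve: treat every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

For labels `[1, 1, 0, 0]` and predicted risks `[0.9, 0.6, 0.4, 0.2]` at pt = 0.5, the model's net benefit is 0.5 while the treat-all strategy yields 0.0. A decision curve repeats this over a range of pt values and compares the model against the treat-all and treat-none (net benefit 0) strategies.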
Specifically, the structure of the convolutional neural network comprises the feature extraction module, the region candidate module and the classification prediction module, so that end-to-end training can be realized. The feature extraction module is mainly composed of EfficientNet; the configuration of each layer is shown in table 2, and under this configuration the feature extraction module offers high computational efficiency and high accuracy. The MB convolutional layer is composed of basic neural network layers such as an ordinary convolutional layer, a depthwise convolutional layer, a batch normalization layer, a pooling layer, a squeeze-and-excitation feature layer and a Swish activation function, connected as shown in fig. 5. The Swish activation function is f(x) = x · sigmoid(x). The region candidate module is composed of an RPN (Region Proposal Network), whose composition and connections are shown in fig. 6. The classification prediction module is composed of a fully connected network, and its connection to the other modules is shown in fig. 4. Unlike the classifiers in other neural networks, the input of the fully connected layer in this module is connected to both the deep learning features and the output features of the pathological feature module.
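The Swish expression given above can be written directly in code. The following is a minimal scalar sketch using only the standard library; a real network would apply the same function element-wise to tensors. The numerically stable form of the sigmoid is an implementation choice made here, not something the text specifies.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x):
    """Swish activation: f(x) = x * sigmoid(x)."""
    return x * sigmoid(x)
```

Swish behaves like the identity for large positive inputs and approaches zero for large negative inputs, but unlike ReLU it is smooth and slightly negative just below zero.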
Table 2: structural configuration of feature extraction module
Group number Layer block Input pixel size Number of output channels
1 3x3 convolutional layer 224x224 32
2 3x3 MB convolutional layer 112x112 16
3 3x3 MB winding layer 112x112 24
4 5x5 MB convolutional layer 56x56 40
5 3x3 MB winding layer 28x28 80
6 5x5 MB convolutional layer 14x14 112
7 5x5 MB convolution layer 14x14 192
8 3x3 MB winding layer 7x7 320
9 Full layer of 7x7 1280
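For readers who want to work with the configuration programmatically, the table can be transcribed as plain data. The sketch below does that and derives the spatial downsampling factor between consecutive groups; the tuple layout and the helper name `downsampling_factors` are assumptions for illustration, and the layer names follow the table as read here.

```python
from math import prod

# (group number, layer block, input side length in pixels, output channels)
STAGES = [
    (1, "3x3 convolutional layer", 224, 32),
    (2, "3x3 MB convolutional layer", 112, 16),
    (3, "3x3 MB convolutional layer", 112, 24),
    (4, "5x5 MB convolutional layer", 56, 40),
    (5, "3x3 MB convolutional layer", 28, 80),
    (6, "5x5 MB convolutional layer", 14, 112),
    (7, "5x5 MB convolutional layer", 14, 192),
    (8, "3x3 MB convolutional layer", 7, 320),
    (9, "final layer", 7, 1280),
]


def downsampling_factors(stages):
    """Ratio of input side lengths between each group and the next one."""
    return [prev[2] // cur[2] for prev, cur in zip(stages, stages[1:])]


factors = downsampling_factors(STAGES)
print(factors, prod(factors))
```

The factors multiply to 32, matching the overall reduction from a 224x224 input to a 7x7 feature map.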
The calibration curves in the training set and the validation set show that the model is well calibrated for predicting the degree of invasion of lung adenocarcinoma, and the Hosmer-Lemeshow test shows no significant lack of fit in either the training set (P = 0.955) or the validation set (P = 0.651), indicating that the results predicted by the risk stratification prediction model of the invention do not differ significantly from the actual results.
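A Hosmer-Lemeshow statistic of the kind reported above can be sketched as follows. This illustration assumes the common variant with ten groups of sorted predicted probabilities and groups - 2 degrees of freedom; it is not the software procedure used in the example. The chi-square tail probability uses a closed form that is valid only for an even number of degrees of freedom, so the number of groups must be even. Function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, probs, groups=10):
    """Return (statistic, P value); a large P value suggests adequate fit."""
    pairs = sorted(zip(probs, y_true), key=lambda t: t[0])
    n, stat = len(pairs), 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / len(chunk)
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (len(chunk) * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A P value above 0.05, as in the training and validation sets above, means the test finds no significant difference between predicted and observed event rates.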
Example 2
The embodiment provides a method for constructing a risk stratification prediction model of early lung cancer, which comprises the following steps:
1) Chest thin-layer CT image data and clinical features of a number of lung cancer patients of each risk stratification type are acquired, and the acquired chest thin-layer CT image data and clinical features are processed, specifically:
1.1 Chest thin-layer CT image data and clinical features of a number of lung cancer patients of each risk stratification type are acquired, the risk stratification types including adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IA).
1.2 Image preprocessing is performed on the acquired chest thin-layer CT image data.
1.3 The acquired clinical features are standardized and normalized.
2) The preprocessed chest thin-layer CT image data are annotated by at least two radiologists to obtain the three-dimensional tumor region of interest.
3) Deep learning features are extracted from the preprocessed chest thin-layer CT image data, and suspicious regions are identified from the extracted deep learning features to obtain the deep learning features of the region candidates.
4) The differences in clinical features between the AIS/MIA group and the IA group are analyzed, and the clinical features whose differences are statistically significant are selected from the standardized and normalized clinical features, specifically:
4.1 The differences in clinical features between the AIS/MIA group and the IA group are analyzed using the two-independent-samples t test, the Mann-Whitney U rank-sum test, the χ² test and Fisher's exact test, and the clinical features whose differences are statistically significant are selected from the standardized and normalized clinical features.
4.2 The number of inputs of the clinical feature selection unit is adjusted according to the selected clinical features, so that the number of inputs equals the number of selected clinical features.
5) The clinical features with statistically significant differences and the deep learning features of the region candidates are divided into a training set and a validation set in a specific ratio (e.g., 7:3), and the risk stratification prediction model is trained and validated to obtain the trained risk stratification prediction model, specifically:
5.1 The clinical features with statistically significant differences and the deep learning features of the region candidates are introduced into a convolutional neural network and divided into a training set and a validation set in the specific ratio.
5.2 A risk stratification prediction model is established in the training set. In the training set, the predictive performance of the model is evaluated by the area under the ROC curve, its calibration is evaluated by a calibration curve, its goodness of fit is checked by the Hosmer-Lemeshow goodness-of-fit test, and its net benefit and clinical usefulness are evaluated by a decision curve. The optimal cutoff value is determined by the maximum Youden index of the ROC curve, and the positive predictive value, negative predictive value, sensitivity, specificity and accuracy of the model are calculated in the training set and the validation set, finally yielding the trained risk stratification prediction model.
5.3 The risk stratification prediction model is trained by fully supervised learning with a gradient descent method, and each time the training set has been traversed once, the validation set is used for one round of validation so as to select the most suitable model.
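Steps 5) and 5.3 above can be sketched as a random 7:3 split followed by choosing the epoch with the best validation score. This is an illustrative outline under stated assumptions: the helper names, the seed, the integer rounding of the split and the per-epoch scores in the usage note are all hypothetical, and the actual training loop is omitted.

```python
import random


def split_by_ratio(ids, train_parts=7, total_parts=10, seed=42):
    """Shuffle identifiers and split them train_parts : (total - train_parts)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = len(shuffled) * train_parts // total_parts   # integer arithmetic
    return shuffled[:cut], shuffled[cut:]


def pick_best_epoch(validation_scores):
    """Index of the epoch with the highest validation score (first on ties)."""
    return max(range(len(validation_scores)), key=lambda i: validation_scores[i])


train_ids, val_ids = split_by_ratio(range(175))
print(len(train_ids), len(val_ids))
```

With 175 patients this particular rounding gives 122 training and 53 validation cases; the text does not state the exact counts used. Given made-up per-epoch validation scores such as `[0.71, 0.78, 0.80, 0.79]`, `pick_best_epoch` returns `2`, i.e. the model saved after the third pass over the training set.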
Example 3
The present embodiment provides a processing device corresponding to the method for constructing a risk stratification prediction model for early lung cancer provided in embodiment 2, where the processing device may be a processing device for a client, such as a mobile phone, a laptop, a tablet, a desktop, etc., to execute the method of embodiment 2.
The processing device comprises a processor, a memory, a communication interface and a bus, wherein the processor, the memory and the communication interface are connected through the bus so as to communicate with one another. The memory stores a computer program that can be executed on the processing device, and the processing device executes the method for constructing the risk stratification prediction model for early lung cancer provided in embodiment 2 when executing the computer program.
In some implementations, the Memory may be a high-speed Random Access Memory (RAM), and may also include a non-volatile Memory, such as at least one disk Memory.
In other implementations, the processor may be any type of general-purpose processor such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, and is not limited herein.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and stored in a computer readable storage medium when they are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
Those skilled in the art will appreciate that the above-described configurations of computing devices are merely some of the configurations associated with the present application, and do not constitute a limitation on the computing devices to which the present application may be applied, and that a particular computing device may include more or fewer components, or some components may be combined, or have a different arrangement of components.
Example 4
The present embodiment provides a computer program product corresponding to the method for constructing a risk stratification prediction model for early lung cancer provided in the embodiment 2, and the computer program product may include a computer readable storage medium carrying computer readable program instructions for executing the method for constructing a risk stratification prediction model for early lung cancer described in the embodiment 2.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.
The implementation principle and technical effect of the computer-readable storage medium provided by the above embodiments are similar to those of the above method embodiments, and are not described herein again.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are only used for illustrating the present invention, and the structure, connection mode, manufacturing process, etc. of the components may be changed, and all equivalent changes and modifications performed on the basis of the technical solution of the present invention should not be excluded from the protection scope of the present invention.

Claims (10)

1. A deep learning-based early lung cancer risk stratification prediction system is characterized by comprising:
the data acquisition module is used for acquiring thin layer CT image data and clinical characteristics of the chest of a user;
the feature extraction module is used for extracting deep learning features from chest thin-layer CT image data of a user;
the region candidate module is used for identifying suspicious regions from the extracted deep learning features to obtain the deep learning features of the region candidates;
and the classification prediction module is used for predicting the risk stratification type of the early lung cancer of the user according to the clinical characteristics and the deep learning characteristics of the regional candidates by adopting a pre-trained risk stratification prediction model.
2. The deep learning-based early lung cancer risk stratification prediction system according to claim 1, wherein the prediction system further comprises:
the pathological feature module is used for carrying out standardization and normalization processing on the acquired clinical features; and the classification prediction module carries out prediction according to the normalized clinical characteristics and the deep learning characteristics of the region candidates.
3. The deep learning-based early lung cancer risk stratification prediction system according to claim 2, wherein the prediction system further comprises:
the model building module is used for building a risk stratification prediction model based on chest thin-layer CT image data and clinical characteristics of a plurality of lung cancer patients of each risk stratification type by adopting a convolutional neural network, and obtaining the trained risk stratification prediction model after training, testing and verifying;
and the marking module is used for marking the thin CT image data of the chest of the user to obtain a three-dimensional region of interest of the tumor, and further used for training the classification prediction module.
4. The deep learning-based early lung cancer risk stratification prediction system according to claim 3, wherein the model building module is provided with:
the data input unit is used for inputting the thin CT image data and clinical characteristics of the chest of a plurality of lung cancer patients with each risk stratification type, which are acquired in advance, and carrying out image preprocessing on the thin CT image data of the chest;
the clinical characteristic selecting unit is used for analyzing the difference of clinical characteristics among the risk stratification types and selecting the clinical characteristics of which the difference meets the preset requirement based on the clinical characteristics after standardization and normalization processing;
the training unit is used for training the feature extraction module, the region candidate module and the classification prediction module to generate a pre-constructed model of the corresponding module;
and the risk hierarchical prediction model construction unit is used for introducing clinical characteristics with difference meeting preset requirements and deep learning characteristics of the region candidates into the convolutional neural network, and establishing a risk hierarchical prediction model in the training set for training to obtain a trained risk hierarchical prediction model.
5. The deep learning-based early lung cancer risk stratification prediction system according to claim 4, wherein, for the corresponding models constructed by the training unit, gradients are propagated back from the region candidate module and the classification prediction module respectively to the feature extraction module by a gradient descent method, the model of the region candidate module is constructed by a regression-type fully supervised learning method, and the model of the classification prediction module is constructed by a classification-type fully supervised learning method.
6. A construction method of a risk stratification prediction model of early lung cancer is characterized by comprising the following steps:
acquiring thin CT image data and clinical characteristics of the chest of a plurality of lung cancer patients in each risk stratification type, and performing data processing on the thin CT image data and the clinical characteristics of the chest;
extracting deep learning features from the chest thin-layer CT image data after data processing, and identifying suspicious regions from the extracted deep learning features to obtain the deep learning features of region candidates;
analyzing the difference of the clinical characteristics of the risk stratification type, and selecting the clinical characteristics of which the difference meets the preset requirement based on the clinical characteristics after data processing;
and dividing the clinical characteristics with the difference meeting the preset requirement and the deep learning characteristics of the regional candidates into a training set and a verification set according to a specific proportion, and training and verifying the risk hierarchical prediction model to obtain the trained risk hierarchical prediction model.
7. The method as claimed in claim 6, wherein the step of obtaining thin CT image data and clinical features of the chest of several lung cancer patients of each risk stratification type and processing the thin CT image data and clinical features of the chest comprises:
acquiring chest thin-layer CT image data and clinical characteristics of a plurality of lung cancer patients of each risk stratification type;
carrying out image preprocessing on the acquired thin-layer CT image data of the chest;
and carrying out standardization and normalization processing on the acquired clinical characteristics.
8. The method according to claim 7, wherein the analyzing the difference of clinical features between risk stratification types and selecting the clinical features meeting preset requirements based on the clinical features after data processing comprises:
analyzing the difference of clinical characteristics between risk stratification types by adopting the two-independent-samples t test, the Mann-Whitney U rank-sum test, the χ² test and Fisher's exact test, and selecting, based on the clinical characteristics after standardization and normalization processing, the clinical characteristics of which the difference meets preset requirements;
and adjusting the number of the input ends of the clinical characteristic selection unit according to the selected clinical characteristics, wherein the number of the input ends is equal to the number of the selected clinical characteristics.
9. A processing device comprising computer program instructions, wherein the computer program instructions, when executed by the processing device, are adapted to carry out the steps corresponding to the method of constructing a hierarchical prediction model of risk of early lung cancer according to any one of claims 6 to 8.
10. A computer readable storage medium, having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, are configured to implement the steps corresponding to the method for constructing a risk stratification prediction model for early lung cancer according to any one of claims 6-8.
CN202211253156.1A 2022-10-13 2022-10-13 Early lung cancer risk layered prediction system based on deep learning Pending CN115500851A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211253156.1A CN115500851A (en) 2022-10-13 2022-10-13 Early lung cancer risk layered prediction system based on deep learning

Publications (1)

Publication Number Publication Date
CN115500851A true CN115500851A (en) 2022-12-23

Family

ID=84510286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211253156.1A Pending CN115500851A (en) 2022-10-13 2022-10-13 Early lung cancer risk layered prediction system based on deep learning

Country Status (1)

Country Link
CN (1) CN115500851A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116779179A (en) * 2023-08-22 2023-09-19 聊城市第二人民医院 Kidney cytoma background information analysis system based on support vector machine
CN116779179B (en) * 2023-08-22 2023-11-10 聊城市第二人民医院 Kidney cytoma background information analysis system based on support vector machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination