WO2024065987A1 - A lung cancer prognosis prediction system based on imaging, pathology and genomic multi-omics - Google Patents
- Publication number: WO2024065987A1 (international application PCT/CN2022/132945)
- Authority: WO, WIPO (PCT)
- Prior art keywords: features, lung cancer, omics, imaging, cancer prognosis
Classifications
- G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
- G06N20/00: Machine learning
- G16B20/30: Detection of binding sites or motifs
- G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
- G16H30/20: ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
- G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Abstract
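The present invention provides a multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes. The system is characterized by comprising an acquisition module and a prediction module. The acquisition module acquires and processes the patient's pre-radiotherapy medical images, pathological section scans, genetic test data and radiotherapy radiation dose distribution data as a feature data set. The prediction module trains a prediction model on the feature data in the feature data set using a machine learning algorithm. The trained prediction model is then used for lung cancer prognosis prediction.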
Description
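This application claims priority to Chinese patent application No. 202211186592.1, entitled "A lung cancer prognosis prediction system based on imaging, pathology and genomic multi-omics", filed with the Chinese Patent Office on September 27, 2022. The entire contents of that application are incorporated herein by reference.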
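The present invention belongs to the technical field of lung cancer prognosis prediction. In particular, it relates to a lung cancer prognosis prediction system based on imaging, pathology and genomic multi-omics.
The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.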
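Lung cancer is one of the major malignant diseases that seriously threaten human health, and radiotherapy is an important means of treating it clinically. However, tumors differ in cell subtype, stroma and immune microenvironment, and they also differ in internal spatial heterogeneity. As a result, the clinical efficacy of lung cancer radiotherapy cannot be assessed accurately enough. The radiotherapy plan therefore cannot be dynamically adjusted in reverse on the basis of efficacy predictions.
Radiotherapy failures caused by tumor heterogeneity can have diverse causes. Tumor cell type, gene expression, mitochondrial energy metabolism and other factors may all lead to unsatisfactory radiotherapy results. Suppose differences in the lesion on medical images and pathological sections are not tracked in time during radiotherapy, and the radiotherapy plan is not adjusted accordingly. In that case, the treatment effect will be seriously affected.
Existing schemes usually predict lung cancer prognosis from a single type of data, which leads to inaccurate predictions.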
Summary of the Invention
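To overcome the above shortcomings of the prior art, the present invention provides a lung cancer prognosis prediction system based on imaging, pathology and genomic multi-omics. The system fuses imaging, pathology, genomic and other multi-omics information, and it also incorporates the radiotherapy radiation dose. From these multi-scale, multi-omics data it predicts and analyzes the radiotherapy outcome of lung cancer patients undergoing radiotherapy, thereby improving the accuracy of disease diagnosis and treatment.
To achieve the above object, one or more embodiments of the present invention provide the following technical solution: a multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes, comprising:
Data acquisition module: acquires and processes the patient's pre-radiotherapy medical images, pathological section scans, genetic test data and radiotherapy radiation dose distribution data as a feature data set;
Prediction module: trains a prediction model on the feature data in the feature data set using a machine learning algorithm; the trained prediction model is used for lung cancer prognosis prediction.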
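The above one or more technical solutions have the following beneficial effects:
The present invention provides a deep-learning-based method for predicting and analyzing the radiotherapy outcome of lung cancer patients undergoing radiotherapy. The method fuses imaging, pathology, genomic and other multi-omics information, and it incorporates the radiotherapy radiation dose.
Advantages of additional aspects of the present invention are given in part in the following description. Others will in part become apparent from that description, or will be learned through practice of the present invention.
The accompanying drawings, which form a part of the present invention, provide a further understanding of the invention. The exemplary embodiments and their descriptions explain the present invention and do not constitute an undue limitation on it.
FIG. 1 is a schematic flow chart of Embodiment 1 of the present invention.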
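It should be noted that the following detailed descriptions are all exemplary and are intended to further explain the present invention. Unless otherwise specified, all technical and scientific terms used herein have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs.
It should also be noted that the terms used herein serve only to describe specific embodiments. They are not intended to limit the exemplary embodiments according to the present invention.
Where no conflict arises, the embodiments of the present invention and the features in those embodiments may be combined with each other.
Embodiment 1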
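This embodiment discloses a multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes, comprising:
Data acquisition module: acquires and processes the patient's pre-radiotherapy medical images, pathological section scans, genetic test data and radiotherapy radiation dose distribution data as a historical data set;
Prediction module: trains a prediction model on the data in the historical data set using a machine learning algorithm; the trained prediction model is used for lung cancer prognosis prediction.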
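The text does not spell out how the four data sources are fused into one data set. As an illustration only, the Python sketch below assumes simple early fusion: each patient's imaging, pathology, genomic and dosimetric feature vectors are concatenated into one row of a feature matrix. `PatientRecord` and `build_feature_matrix` are hypothetical names, and NumPy is assumed.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PatientRecord:
    """Hypothetical per-patient container for the four data sources."""
    imaging: np.ndarray      # radiomics features from pre-radiotherapy images
    pathology: np.ndarray    # features from H&E / IHC slide scans
    genomic: np.ndarray      # expression values of the selected genes
    dosimetric: np.ndarray   # DVH metrics taken from the treatment plan


def build_feature_matrix(records):
    """Early fusion (assumed): concatenate each patient's blocks into one row."""
    rows = [
        np.concatenate([r.imaging, r.pathology, r.genomic, r.dosimetric])
        for r in records
    ]
    return np.vstack(rows)  # shape: (n_patients, n_features)
```

Each row of the result can then be passed to the prediction module as one training sample.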
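Once the system has been developed into a desktop software product, it serves as standard desktop application software. It supports the reading, browsing and ROI drawing of medical images, pathological images, radiation dose distributions and genetic data.
It integrates an AI multimodal image-omics analysis model to provide functions such as evaluating the effectiveness of radiotherapy outcomes and predicting survival rate.
It adopts a modular design framework comprising the following modules:
- user interface module;
- file data management module;
- reading and browsing module for radiotherapy-plan-format data sets;
- ROI drawing and management module for radiotherapy-plan-format data;
- pathological image reading and browsing module;
- pathological image ROI drawing and management module;
- genetic data form reading and browsing module;
- efficacy prediction model algorithm integration and result display module;
- structured report generation and preview module.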
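In the data acquisition module, once the enrolled samples have been determined, the patients' clinical characteristics are analyzed first. These include age, sex, histological type, smoking status and overall stage.
Radiomics reflects the spatial heterogeneity inside the tumor at the macroscopic level. Pathomics reflects the size and morphology of tumor cells and the tumor microenvironment at the microscopic level. The radiomics features extracted from the pre-radiotherapy medical images mainly include shape features, first-order statistical features, high-order texture features, filter-transform features and deep imaging features.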
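Tools such as IBEX are used to obtain the image shape features, intensity features, texture features and Laplacian-of-Gaussian (LoG) filter features.
- Shape features describe the shape and 3D size of the ROI. They include volume, mass (the sum of the CT values within the ROI multiplied by the voxel size), surface area, sphericity and roundness.
- Intensity features are first-order statistical descriptors derived directly from the image intensities and the intensity histogram.
- Texture features are high-order statistical descriptors computed from the gray-level co-occurrence matrix (GLCM), the gray-level run-length matrix (GLRLM) and the neighborhood intensity difference matrix (NIDM). They are computed along different 3D directions and then averaged over all directions to give the final feature value, which approximates rotation invariance.
- LoG filter features are intensity and texture features extracted after a medium-scale LoG filter has been applied to the image.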
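To make the feature definitions above concrete, the following Python sketch computes four quantities:

- the ROI volume;
- the "mass" as defined above, i.e. the sum of CT values in the ROI multiplied by the voxel size (taken here as the voxel volume);
- a few first-order statistics;
- a GLCM contrast value averaged over several directions.

This is a simplified illustration, not the IBEX implementation. It assumes NumPy, and it uses a 2-D slice with four in-plane offsets rather than the 3-D directions described in the text. The function names are hypothetical.

```python
import numpy as np


def shape_and_intensity_features(ct, mask, voxel_size_mm):
    """Volume, 'mass' and first-order statistics of an ROI (sketch)."""
    voxel_volume = float(np.prod(voxel_size_mm))  # mm^3 per voxel (assumed meaning of "voxel size")
    values = np.asarray(ct, dtype=float)[np.asarray(mask).astype(bool)]
    return {
        "volume_mm3": values.size * voxel_volume,
        "mass": values.sum() * voxel_volume,  # sum of CT values in ROI x voxel size
        "mean": values.mean(),
        "std": values.std(),
        "min": values.min(),
        "max": values.max(),
    }


def glcm_contrast_mean(image, levels, offsets=((0, 1), (1, 1), (1, 0), (1, -1))):
    """GLCM contrast computed per direction, then averaged (2-D sketch)."""
    img = np.asarray(image, dtype=int)  # already discretized to 0 .. levels-1
    rows, cols = img.shape
    contrasts = []
    for dr, dc in offsets:
        glcm = np.zeros((levels, levels), dtype=float)
        for r in range(rows):
            for c in range(cols):
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    glcm[img[r, c], img[r2, c2]] += 1
        glcm += glcm.T   # count each pair in both directions (symmetric GLCM)
        glcm /= glcm.sum()  # convert counts to joint probabilities
        i, j = np.indices(glcm.shape)
        contrasts.append(float(np.sum(glcm * (i - j) ** 2)))
    return float(np.mean(contrasts))  # direction-averaged value
```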
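Before radiomics feature extraction, two preprocessing steps are applied. The images are resampled to a voxel size of 1.3 × 1.3 × 5 mm, and the gray-level range is re-binned. Both steps reduce differences in radiomics features caused by differing scanning and reconstruction parameters between patients.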
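The following is a minimal sketch of the two preprocessing steps. It makes these assumptions:

- NumPy is available.
- Resampling uses nearest-neighbour interpolation.
- The gray levels are re-binned into a fixed number of equal-width bins.

The text specifies neither the interpolation method nor the bin count, so the default of 64 bins is an arbitrary choice. The function names are hypothetical.

```python
import numpy as np


def resample_nearest(volume, spacing_mm, new_spacing_mm=(1.3, 1.3, 5.0)):
    """Nearest-neighbour resampling of a 3-D volume to a new voxel spacing."""
    vol = np.asarray(volume)
    scale = np.asarray(new_spacing_mm, dtype=float) / np.asarray(spacing_mm, dtype=float)
    new_shape = np.maximum(1, np.round(np.array(vol.shape) / scale).astype(int))
    # Map the centre of each new voxel back to the original voxel that contains it.
    index = [
        np.minimum(np.floor((np.arange(n) + 0.5) * scale[d]).astype(int), vol.shape[d] - 1)
        for d, n in enumerate(new_shape)
    ]
    return vol[np.ix_(*index)]


def discretize(values, n_bins=64):
    """Re-bin intensities into n_bins equal-width gray levels (0 .. n_bins-1)."""
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:  # constant image: a single gray level
        return np.zeros(v.shape, dtype=int)
    bins = np.floor((v - lo) / (hi - lo) * n_bins).astype(int)
    return np.clip(bins, 0, n_bins - 1)  # the maximum value falls in the last bin
```

A smoother interpolation, such as linear or B-spline resampling, would normally be preferred for CT intensities. Nearest-neighbour is used here only to keep the sketch dependency-free.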
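For the pathological section scans, H&E staining and immunohistochemistry (IHC) are performed on the pre-radiotherapy tumor tissue samples. Semi-quantitative pathological image features are then extracted by manual counting, and quantitative features by artificial intelligence algorithms.
For the genetic test data, bulk RNA sequencing is performed on the pre-radiotherapy tumor tissue samples of the enrolled patients. Statistical methods are then applied to obtain a differential gene expression matrix that is statistically significant (p ≤ 0.05); this matrix constitutes the genomic features.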
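The text states only that the differential expression matrix is obtained "based on statistical methods" with p ≤ 0.05, without naming the test. As one possible illustration, the sketch below applies a per-gene two-group permutation test to the difference in mean expression. The choice of test, the 0/1 group labels (for example, good versus poor response) and the function name are all assumptions, and NumPy is assumed.

```python
import numpy as np


def differential_genes(expr, labels, alpha=0.05, n_perm=999, seed=0):
    """Keep genes whose group difference has a permutation p-value <= alpha.

    expr: (n_patients, n_genes) expression matrix; labels: 0/1 group per patient.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels).astype(bool)
    rng = np.random.default_rng(seed)

    def abs_mean_diff(group):
        return np.abs(expr[group].mean(axis=0) - expr[~group].mean(axis=0))

    observed = abs_mean_diff(labels)
    hits = np.zeros(expr.shape[1])
    for _ in range(n_perm):
        # Count permutations at least as extreme as the observed difference.
        hits += abs_mean_diff(rng.permutation(labels)) >= observed - 1e-12
    p_values = (hits + 1) / (n_perm + 1)
    keep = np.flatnonzero(p_values <= alpha)
    return expr[:, keep], keep, p_values  # reduced matrix = genomic features
```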
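Clinical data, together with the genetic, pathological and imaging data, are individualized test data of the patient and are independent of the clinical treatment. The radiation dose distribution, by contrast, is the treatment plan given by the clinician, and it is a potential predictor of lung cancer treatment efficacy. It is obtained from a treatment planning system (TPS) such as Pinnacle or Eclipse, which provides the dose distribution over the contoured volumes calculated with lung heterogeneity correction. Dosimetric factors such as the mean dose are then extracted from the dose-volume histogram (DVH). Specifically, the dose distribution means the patient's dose distribution information collected on the pre-radiotherapy planning CT image. It comprises:
- the dose distribution of the planning target volume (PTV), e.g. Dmin, Dmax and Dmean;
- the lung dose distribution, e.g. mean lung dose, V5 and V20.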
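The PTV and lung metrics listed above can be illustrated with a short NumPy sketch that works directly on a dose grid and binary structure masks. It assumes the following:

- all voxels have equal volume and doses are in Gy;
- V5 and V20 are expressed as the percentage of lung volume receiving at least 5 Gy and 20 Gy respectively;
- the lung mask is already defined as required (for example, whether the tumor is excluded), since the text does not say.

The function name and its inputs are hypothetical. In the described system these values come from the TPS DVH, and the sketch only shows the definitions.

```python
import numpy as np


def dvh_metrics(dose_gy, ptv_mask, lung_mask):
    """PTV and lung dosimetric factors from a dose grid and structure masks."""
    dose = np.asarray(dose_gy, dtype=float)
    ptv = dose[np.asarray(ptv_mask, dtype=bool)]
    lung = dose[np.asarray(lung_mask, dtype=bool)]
    return {
        "PTV_Dmin": ptv.min(),
        "PTV_Dmax": ptv.max(),
        "PTV_Dmean": ptv.mean(),
        "MLD": lung.mean(),                            # mean lung dose
        "lung_V5_pct": 100.0 * np.mean(lung >= 5.0),   # % of lung receiving >= 5 Gy
        "lung_V20_pct": 100.0 * np.mean(lung >= 20.0), # % of lung receiving >= 20 Gy
    }
```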
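In this embodiment, the radiation dose distribution is integrated with the other data, so that the data set covers nearly all of the patient's medical data and clinical indications. Feature selection and prediction methods are used to clean, screen and model this large volume of data. The model is then validated on the multi-center omics data that have already been collected. Specifically, feature dimensionality reduction is performed on the existing multi-center data. The top 20% of predictors with the highest statistical weight are selected as omics features, and low-weight predictors are ignored.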
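The text does not give the statistic behind the "statistical weight", nor the ranking procedure. Assume there is one importance score per predictor, and that its absolute value is used. Keeping the top 20% then reduces to a sort, as in this hypothetical NumPy helper:

```python
import numpy as np


def top_fraction_indices(weights, fraction=0.2):
    """Indices of the predictors whose |weight| ranks in the top `fraction`."""
    w = np.abs(np.asarray(weights, dtype=float))
    k = max(1, int(np.ceil(fraction * w.size - 1e-9)))  # at least one predictor
    top = np.argsort(-w, kind="stable")[:k]            # highest weights first
    return np.sort(top)                                # original column order
```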
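In this embodiment, redundancy among the omics features is removed within 10-fold stratified cross-validation, as follows:
- In each fold, the Spearman correlation coefficient between radiomics features is computed; |r_s| > 0.85 indicates a strong correlation.
- If two features are strongly correlated, the one with the larger p-value for the difference between the two patient groups is removed. If the two p-values are equal, one of the features is removed at random.
- This step uses only the training data of each fold and ignores the test data, so that no bias is introduced.
- Features retained in at least 2 of the 10 folds are used for the subsequent feature selection and classification.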
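The redundancy-removal rule above can be sketched as follows, assuming NumPy. The sketch makes three simplifications:

- Spearman's coefficient is computed as the Pearson correlation of ranks, without averaging tied ranks.
- Each fold's p-values come from a caller-supplied function applied only to that fold's training data.
- Folds are given as arrays of training indices.

All names are hypothetical.

```python
import numpy as np


def spearman_matrix(X):
    """Spearman correlation as Pearson correlation of column ranks (ties not averaged)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def drop_redundant(X, p_values, threshold=0.85, rng=None):
    """Within one training fold, drop the weaker feature of each strongly correlated pair."""
    rng = np.random.default_rng(0) if rng is None else rng
    rho = spearman_matrix(X)
    keep = list(range(X.shape[1]))
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if i in keep and j in keep and abs(rho[i, j]) > threshold:
                if p_values[i] > p_values[j]:
                    keep.remove(i)
                elif p_values[j] > p_values[i]:
                    keep.remove(j)
                else:
                    keep.remove(int(rng.choice([i, j])))  # equal p-values: drop one at random
    return keep


def fold_voted_features(X, y, train_folds, p_value_fn, threshold=0.85, min_votes=2):
    """Keep features that survive redundancy removal in at least `min_votes` folds."""
    votes = np.zeros(X.shape[1], dtype=int)
    for train_idx in train_folds:
        X_tr, y_tr = X[train_idx], y[train_idx]  # the fold's test data is never used
        votes[drop_redundant(X_tr, p_value_fn(X_tr, y_tr), threshold)] += 1
    return np.flatnonzero(votes >= min_votes)
```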
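In the prediction module, a machine learning network is constructed on the basis of multi-objective Bayesian classification. Specifically, a sequential backward elimination support vector machine (SBE-SVM) model is used. It selects the best combination of features for classifying the good-prognosis group and the poor-prognosis group.
All data features obtained in the data acquisition module are normalized to the range -1 to 1 by min-max normalization. This reduces potential bias caused by large differences in the scale of the data.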
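Below is a sketch of min-max scaling to [-1, 1], assuming NumPy. The text does not say where the scaling parameters come from. Fitting the minimum and maximum on the training data and reusing them for the test data is an assumption made here to avoid information leakage. The class name is hypothetical; scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` behaves equivalently.

```python
import numpy as np


class MinMaxSymmetric:
    """Min-max scaling of each feature to the range [-1, 1]."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.lo_ = X.min(axis=0)  # per-feature minimum (training data)
        self.hi_ = X.max(axis=0)  # per-feature maximum (training data)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Avoid division by zero for constant features.
        span = np.where(self.hi_ > self.lo_, self.hi_ - self.lo_, 1.0)
        return 2.0 * (X - self.lo_) / span - 1.0
```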
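All patients are randomly assigned to a training set and a test set in a 5:3 ratio, so that model performance can be estimated reliably.
The training set is used to train and optimize the model within 10-fold stratified cross-validation.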
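The 5:3 random split and the 10-fold stratified folds can be sketched with NumPy alone. The helper names, the seeds and the round-robin dealing of each class into folds are assumptions for illustration. Libraries such as scikit-learn provide equivalent utilities, for example `StratifiedKFold`.

```python
import numpy as np


def split_5_3(n_patients, seed=0):
    """Randomly assign patients to training and test sets in a 5:3 ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_patients)
    n_train = round(n_patients * 5 / 8)
    return np.sort(order[:n_train]), np.sort(order[n_train:])


def stratified_kfold(y, k=10, seed=0):
    """Return (train_idx, test_idx) pairs that keep class proportions in each fold."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == cls))
        fold_of[members] = np.arange(members.size) % k  # deal class members round-robin
    return [(np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)) for f in range(k)]
```

The folds would be built from the labels of the training set only; the test set from `split_5_3` stays untouched until final evaluation.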
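The SBE-SVM model is used to select the best feature combination for classifying two groups: the PR group, which had a good treatment response, and the non-PR group, which had a poor treatment effect.
The SBE-SVM algorithm first trains an SVM classification model using all features and evaluates its classification loss in 10-fold stratified CV.
After all features have been input, SBE-SVM removes each feature from the feature set in turn and checks whether the classification loss decreases or stays unchanged. If it does, the feature is deleted from the final feature set. The best feature combination is obtained in this way.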
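The elimination loop described above can be sketched in Python as a greedy search, under the following assumptions:

- The search tries removing one feature at a time.
- A removal is kept whenever the cross-validated loss does not increase.
- The passes repeat until no further removal is accepted.
- The loss is passed in as a function.

`svm_cv_loss` shows one possible choice of loss. It assumes scikit-learn, a linear-kernel `SVC` and the misclassification rate as the loss; the text fixes none of these.

```python
def sequential_backward_elimination(features, cv_loss):
    """Greedy backward elimination driven by a cross-validated loss."""
    selected = list(features)
    best = cv_loss(selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for f in list(selected):  # iterate over a snapshot; `selected` may shrink
            if len(selected) == 1:
                break
            trial = [g for g in selected if g != f]
            loss = cv_loss(trial)
            if loss <= best:  # loss decreased or did not change: drop the feature
                selected, best, improved = trial, loss, True
    return selected, best


def svm_cv_loss(X, y, n_splits=10, seed=0):
    """One possible loss: 1 - mean accuracy of a linear SVM in stratified CV."""
    # Imported lazily so the elimination loop itself has no scikit-learn dependency.
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)

    def loss(columns):
        scores = cross_val_score(SVC(kernel="linear"), X[:, columns], y,
                                 cv=cv, scoring="accuracy")
        return 1.0 - scores.mean()

    return loss
```

A typical call would be `sequential_backward_elimination(range(X.shape[1]), svm_cv_loss(X_train, y_train))`, with `X_train` given as a NumPy array.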
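SBE-SVM not only reduces feature dimensionality effectively, but is also algorithmically simple and robust. The statistical significance of the model is evaluated with a permutation test of 5000 permutations, and the test set is kept constant across all permutations. The procedure is as follows:
- The class labels of the training-set patients are randomly permuted 5000 times, giving 5000 versions of the training labels.
- For each version, the SBE-SVM classifier is trained and optimized on the training data with the permuted labels. It is then used to predict the labels of the test-set data.
- The P value of the permutation test is the number of classifiers whose accuracy is greater than or equal to the actual accuracy, divided by the number of permutations. Both counts include the classifier trained with the actual labels.
- Models with P < 0.05 are considered statistically significant.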
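This sketch reads the parenthetical above as applying to both counts. The P value is then (1 + number of permuted classifiers with accuracy ≥ the actual accuracy) / (number of permutations + 1). The sketch assumes NumPy. `fit_predict(X_train, y_train, X_test)` is a hypothetical caller-supplied function that stands in for training and applying the SBE-SVM classifier.

```python
import numpy as np


def permutation_test(fit_predict, X_train, y_train, X_test, y_test, n_perm=5000, seed=0):
    """Return (actual accuracy, permutation-test P value)."""
    rng = np.random.default_rng(seed)
    y_test = np.asarray(y_test)
    actual = float(np.mean(fit_predict(X_train, y_train, X_test) == y_test))
    count = 1  # the classifier trained on the actual labels counts as one
    for _ in range(n_perm):
        permuted = rng.permutation(y_train)  # only training labels are shuffled
        acc = float(np.mean(fit_predict(X_train, permuted, X_test) == y_test))
        if acc >= actual:
            count += 1
    return actual, count / (n_perm + 1)
```

The test set is never permuted, which matches the requirement that it stays constant across all permutations.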
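Specific embodiments of the present invention have been described above with reference to the drawings, but they do not limit its scope of protection. Those skilled in the art should understand that various modifications or variations can be made on the basis of the technical solution of the present invention without creative effort. Such modifications or variations still fall within the scope of protection of the present invention.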
Claims (10)
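- A multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes, characterized by comprising: an acquisition module, which acquires and processes a patient's pre-radiotherapy medical imaging features, pathological section scans, genetic test data and radiotherapy radiation dose distribution data as a feature data set; and a prediction module, which trains a prediction model on the feature data in the feature data set using a machine learning algorithm, the trained prediction model being used for lung cancer prognosis prediction.
- The multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes according to claim 1, characterized in that, in the acquisition module, the medical imaging features comprise shape features, first-order statistical features, high-order texture features, filter-transform features and deep imaging features.
- The multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes according to claim 2, characterized in that: the shape features describe the shape and 3D size of the ROI, including volume, mass, surface area, sphericity and roundness; the high-order texture features are high-order statistical descriptors computed using the gray-level co-occurrence matrix, the gray-level run-length matrix and the neighborhood intensity difference matrix; and the filter-transform features are intensity and texture features extracted after applying a LoG filter to the image.
- The multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes according to claim 1, characterized in that, in the acquisition module, based on the pathological section scans, H&E staining and immunohistochemistry are performed on pre-radiotherapy tumor tissue samples, and semi-quantitative and quantitative pathological image features are extracted by manual counting and artificial intelligence algorithms, respectively.
- The multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes according to claim 1, characterized in that, in the acquisition module, the radiotherapy radiation dose distribution comprises the dose distribution of the planning target volume and the dose distribution of the lung.
- The multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes according to claim 1, characterized in that, in the prediction module, an SBE-SVM classifier is used as the prediction model.
- The multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes according to claim 1, characterized in that, in the prediction module, stratified cross-validation is used during training of the prediction model to compute the correlation between different features.
- The multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes according to claim 1, characterized in that, in the acquisition module, the acquired data features are normalized.
- The multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes according to claim 1, characterized in that, in the prediction module, a permutation test is used to evaluate the statistical significance of the prediction model.
- The multi-omics lung cancer prognosis prediction system based on imaging, pathology and genes according to claim 1, characterized in that, in the acquisition module, before the medical imaging features are extracted, the medical images are resampled and their gray-level range is re-binned.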
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211186592.1A CN115497623A (zh) | 2022-09-27 | 2022-09-27 | A lung cancer prognosis prediction system based on imaging, pathology and genomic multi-omics |
CN202211186592.1 | 2022-09-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024065987A1 (zh) | 2024-04-04 |
Family ID: 84473172
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/132945 WO2024065987A1 (zh) | 2022-09-27 | 2022-11-18 | A lung cancer prognosis prediction system based on imaging, pathology and genomic multi-omics |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115497623A (zh) |
WO (1) | WO2024065987A1 (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
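CN112635063A (zh) * | 2020-12-30 | 2021-04-09 | South China University of Technology | Comprehensive lung cancer prognosis prediction model, and construction method and device therefor |
US20210110540A1 (en) * | 2019-10-11 | 2021-04-15 | Case Western Reserve University | Predicting tumor prognoses based on a combination of radiomic and clinico-pathological features |
CN112820403A (zh) * | 2021-02-25 | 2021-05-18 | Sun Yat-sen University | Deep learning method for predicting the prognostic risk of cancer patients based on multi-omics data |
CN112951406A (zh) * | 2021-01-27 | 2021-06-11 | Anhui University of Science and Technology | CT radiomics-based method and system for assisted assessment of lung cancer prognosis |
CN113610845A (zh) * | 2021-09-09 | 2021-11-05 | Cancer Hospital of Shantou University Medical College | Method for constructing a tumor local control prediction model, prediction method, and electronic device |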
- 2022-09-27: CN application CN202211186592.1A filed, published as CN115497623A (status: active, pending)
- 2022-11-18: PCT application PCT/CN2022/132945 filed, published as WO2024065987A1 (status: unknown)
Also Published As
Publication number | Publication date |
---|---|
CN115497623A (zh) | 2022-12-20 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22960584; Country of ref document: EP; Kind code of ref document: A1 |