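CN116449018A - Plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, and use thereof
Plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, and use thereof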
- Publication number
- CN116449018A (application number CN202310110373.3A)
- Authority
- CN
- China
- Prior art keywords
- protein marker
- plasma protein
- model
- adenocarcinoma
- diagnosis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/62—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
- G01N21/63—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
- G01N21/64—Fluorescence; Phosphorescence
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6893—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids related to diseases not provided for elsewhere
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2800/00—Detection or diagnosis of diseases
- G01N2800/70—Mechanisms involved in disease identification
- G01N2800/7023—(Hyper)proliferation
- G01N2800/7028—Cancer
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A50/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
- Y02A50/30—Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
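The invention discloses plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, and use thereof. Based on plasma protein marker expression information from patients with intestinal adenoma and adenocarcinoma, the invention identifies the plasma protein markers associated with intestinal adenoma and adenocarcinoma; the markers comprise GDF-15, CD31, CD106, Galectin-3, CD66e, Ferritin, AFP, CA125, IL-17A and Midkine. The invention also establishes a diagnostic model for patients with intestinal adenoma and adenocarcinoma, which facilitates early screening. The invention helps reduce the cost of gene expression testing and merits wider adoption in clinical practice.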
Description
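Technical Field
The invention relates to the field of bioinformatics, and in particular to plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, and use thereof.
Background
Colorectal cancer (CRC) is the third most common cancer worldwide and the second leading cause of cancer-related death, with an estimated 1.8 million new cases and about 881,000 deaths globally in 2018. Trends in colorectal cancer incidence and mortality in China are consistent with global trends, but because of China's large population, China accounts for a relatively high share of the world's colorectal cancer cases and deaths. Colon adenocarcinoma (COAD) is the most common pathological type of colorectal cancer, and colonic adenoma is a stage in the malignant transformation of the colonic mucosa; most colonic mucosal tissue that becomes malignant may have been adenoma at an early stage. Measures built on early screening, namely early detection, early treatment and early diagnosis, are therefore an important means of preventing the onset and progression of the disease and of improving survival, and establishing an effective early-screening diagnostic model for colorectal cancer is of great significance.
In recent years, research on early screening for colorectal cancer has continued to broaden and deepen, but no effective early-screening method has emerged. Invasive colonoscopy, the highly accurate "gold standard", is not yet fully available across China. The fecal occult blood test, the non-invasive screening method in common use, is simple to perform but has low sensitivity and poor specificity. Most studies using non-invasive protein biomarkers involve only 1 to 5 protein markers. Blood testing, by contrast, is a relatively simple and widely used diagnostic method. Building an early-screening diagnostic model for colorectal adenoma and adenocarcinoma from plasma protein markers may therefore offer a new approach.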
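Summary of the Invention
The object of the invention is to provide plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, and use thereof. The invention provides the optimal plasma protein markers for early colorectal cancer detection and, based on these results, establishes and validates a diagnostic model for early colorectal cancer detection, which facilitates early screening.
Technical solution of the invention: plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, the plasma protein markers comprising GDF-15, CD31, CD106, Galectin-3, CD66e, Ferritin, AFP, CA125, IL-17A and Midkine.
Use of the plasma protein markers in a diagnostic model for intestinal adenoma and adenocarcinoma.
In the above use, the diagnostic model is constructed as follows:
Step 1, obtaining a plasma protein marker expression dataset: blood samples are collected from the subjects' antecubital vein, the plasma protein markers are measured with the Luminex liquid-phase chip high-performance assay, and outliers are detected by cluster analysis and excluded;
Step 2, model prediction: a classifier is trained with the random forest algorithm in an R-language package, and the remaining samples are used to validate the resulting prediction model;
Step 3, model training: feature screening is performed with Boruta, and the defined features are screened by recursive feature elimination with cross-validation (RFECV);
Step 4, feature elimination: using the machine-learning model evaluation metrics in Probatus's ShapRFECV, ten-fold cross-validation is applied to find the number of features required in the training set; within each fold, on the nine-tenths of the training data used for fitting, differences in plasma protein marker levels between the colorectal cancer patient group and the control group are tested at a false discovery rate < 0.05;
Step 5, evaluating model accuracy: samples from patients diagnosed with advanced adenoma are used as an independent test of precancerous lesions.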
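Step 4 tests marker differences at a false discovery rate below 0.05 but does not name the correction procedure. The following is a minimal sketch, assuming the Benjamini-Hochberg procedure is the one intended; the function name `benjamini_hochberg`, the marker names and the p-values are hypothetical illustrations and are not values from the invention.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-marker p-values from a cancer-versus-control comparison.
hypothetical_p = {
    "marker_A": 0.001,
    "marker_B": 0.008,
    "marker_C": 0.039,
    "marker_D": 0.041,
    "marker_E": 0.6,
}
q = benjamini_hochberg(list(hypothetical_p.values()))
# Keep only the markers whose adjusted p-value is below the 0.05 threshold.
selected = [name for name, qv in zip(hypothetical_p, q) if qv < 0.05]
print(selected)
```

In this toy input, two markers with raw p-values just under 0.05 no longer pass once the correction is applied, which is the reason for controlling the false discovery rate when many markers are tested at once.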
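Compared with the prior art, the invention uses data obtained by processing blood samples and applies Luminex liquid-phase chip flow-fluorescence multiplex technology to quantify 91 biomarkers in plasma, namely proteins related to cytokines and gastrointestinal cancers, in order to find the optimal plasma protein markers for early colorectal cancer detection. Based on these results, a machine-learning diagnostic model for intestinal adenoma and adenocarcinoma based on plasma protein markers is built with the random forest algorithm and validated for early colorectal cancer detection. After the miRNA extracted from sample plasma was sequenced and named by database alignment, recursive feature elimination (RFE) and an additive feature attribution method (the SHAP algorithm) were each used to determine the 10 best feature markers, with accuracy generally remaining above 80%. These experimental results demonstrate the feasibility of applying a positive high-expression model of the optimal markers, built with recursive feature elimination (RFE) and the additive feature attribution method (SHAP algorithm), to early colorectal cancer screening. The invention helps reduce the cost of gene expression testing and merits wider adoption in clinical practice.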
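Brief Description of the Drawings
Fig. 1 is a schematic flow chart of the construction of the diagnostic model for intestinal adenoma and adenocarcinoma.
Fig. 2 shows the receiver operating characteristic (ROC) curve for the model with 1 tumor protein marker.
Fig. 3 shows the ROC curve for the model with 2 tumor protein markers.
Fig. 4 shows the ROC curve for the model with 3 tumor protein markers.
Fig. 5 shows the ROC curve for the model with 4 tumor protein markers.
Fig. 6 shows the ROC curve for the model with 5 tumor protein markers.
Fig. 7 shows the ROC curve for the model with 6 tumor protein markers.
Fig. 8 shows the ROC curve for the model with 7 tumor protein markers.
Fig. 9 shows the ROC curve for the model with 10 tumor protein markers.
Fig. 10 shows the ROC curve for the model with 15 tumor protein markers.
Fig. 11 shows the ROC curve for the model with 20 tumor protein markers.
Fig. 12 shows the ROC curve for the model with 25 tumor protein markers.
Fig. 13 shows the ROC curve for the model with 30 tumor protein markers.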
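Detailed Description
The invention is further described below with reference to the drawings and an embodiment, which are not to be taken as limiting the invention.
Embodiment: construction and validation of a machine-learning diagnostic model for intestinal adenoma and adenocarcinoma, based on plasma protein markers, for early colorectal cancer detection. The steps are shown in Fig. 1.
First, the tumor protein classification model is established and the tested samples are tallied:
(1) Establishing the tumor protein classification model: to train the prediction model, samples from patients diagnosed with colorectal cancer at any stage (stages I-IV) were treated as the colorectal (positive) group (stage I/II = 100, stage III/IV = 104) and used in model training. Healthy individuals (n = 99), benign lesions (n = 95) and inflammatory bowel disease/irritable bowel syndrome (n = 42) served as the control group, and advanced adenoma (n = 94) was used for independent validation of the prediction model.
(2) Tally of tested samples: a total of 415 samples were tested. Of these, 176 were negative control samples (healthy individuals = 63; benign lesions/inflammatory bowel disease/irritable bowel syndrome = 113), 36 were advanced adenoma samples, and samples were obtained from 99 patients with early-stage colorectal cancer (stages I and II) and 104 patients with late-stage colorectal cancer (stages III and IV), as shown in Table 1:
Table 1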
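The tumor protein samples of the invention are obtained by using xMAP microsphere technology to isolate and measure tumor proteins in 0.1 mL of plasma, specifically:
Before colonoscopy, blood was drawn from the antecubital vein into two 5 mL ethylenediaminetetraacetic acid (EDTA) vacuum tubes. Plasma was collected after centrifuging the blood (1800 × g for 10 minutes at 4 °C, twice) and stored at -80 °C until testing. For testing, 0.1 mL of plasma was diluted 2-fold with phosphate-buffered saline. A Luminex 200 liquid-phase suspension chip analysis system was used to measure 91 biomarkers in 415 samples, comprising one Human XL Cytokine Discovery LxPAM fixed panel (45-plex) and two custom panels (24-plex and 22-plex).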
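The construction method excludes outliers by cluster analysis once the marker panel has been measured, but the text does not specify the clustering algorithm. The sketch below shows one possible approach, assuming average-linkage hierarchical clustering from SciPy applied to a synthetic 91-marker matrix. The sample count, the artificially shifted sample, the two-cluster cut and the `min_cluster_size` threshold are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)
n_samples, n_markers = 60, 91

# Synthetic, already standardized marker matrix (rows = samples).
X = rng.normal(size=(n_samples, n_markers))
X[0] += 8.0  # one hypothetical aberrant sample, shifted on every marker

# Average-linkage hierarchical clustering on Euclidean distances.
Z = linkage(X, method="average", metric="euclidean")

# Cut the tree into at most two clusters and measure each cluster's size.
labels = fcluster(Z, t=2, criterion="maxclust")
sizes = np.bincount(labels)

# Samples that fall into a very small cluster are flagged as outliers.
min_cluster_size = 3  # assumed threshold, not from the source
outliers = np.flatnonzero(sizes[labels] < min_cluster_size)
clean = np.delete(X, outliers, axis=0)
print(outliers, clean.shape)
```

On real data the number of clusters and the size threshold would need to be chosen by inspecting the dendrogram rather than fixed in advance.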
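The dataset was screened according to inclusion and exclusion criteria. The inclusion criteria for samples were:
Inclusion criteria for the positive control group:
(1) the sample is a serum sample from a patient with stage I/II or stage III/IV colorectal cancer;
(2) recurrence-free survival data are available for the patient;
(3) the detection technology is a gene expression profiling chip.
Datasets meeting all 3 of the above criteria were included in the subsequent analysis.
Inclusion criteria for the negative control group:
(1) the sample is a serum sample from a healthy person, a patient with enteritis, or a patient with a benign intestinal lesion;
(2) recurrence-free survival data are available for the patient;
(3) the detection technology is a gene expression profiling chip.
Datasets meeting all 3 of the above criteria were included in the subsequent analysis.
Inclusion criteria for the independent validation group:
(1) the sample is a serum sample from a patient with advanced intestinal adenoma;
(2) the detection technology is a gene expression profiling chip.
Samples meeting both of the above criteria were included in the subsequent analysis.
The exclusion criteria for samples were:
(1) the sample is not a postoperative tumor tissue sample from a patient with stage II colorectal cancer;
(2) the sample donor underwent complete lesion resection within the past 4 weeks.
Samples failing to satisfy either of the 2 criteria above were excluded.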
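The samples finally included in the analysis comprised 99 normal individuals, 42 patients with enteritis and 95 patients with benign lesions as the negative control group, as shown in Table 2:
Table 2
100 patients with stage I/II intestinal cancer and 104 patients with stage III/IV intestinal cancer served as the positive control, as shown in Table 3:
Table 3
94 patients with advanced adenoma served as the independent validation group, as shown in Table 4.
Independent validation |
---|
Advanced adenoma |
94 |
Table 4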
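(2) Machine learning based on the random forest model:
Model prediction was carried out on the selected samples. From each group, 70% of the samples were randomly selected to train a classifier with the random forest (RF) algorithm from the scikit-learn package, and the remaining samples were used to validate the resulting prediction model. With healthy, benign lesion and enteritis samples as negatives and intestinal cancer samples as positives, the training and test sets were obtained as shown in Table 5:
Training-set positive cases | Training-set negative cases | Test-set positive cases | Test-set negative cases |
---|---|---|---|
142 | 122 | 61 | 53 |
Table 5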
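The 70/30 split and the classifier training described above might look like the following sketch. It assumes scikit-learn's `train_test_split` with stratification and `RandomForestClassifier`. The marker values are synthetic, the class sizes (203 positive, 175 negative) are the totals implied by Table 5, and `n_estimators` and the random seeds are arbitrary choices that the text does not give.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pos, n_neg, n_markers = 203, 175, 10  # totals implied by Table 5

# Synthetic marker levels: positives are shifted upward on every marker.
X = np.vstack([
    rng.normal(0.8, 1.0, size=(n_pos, n_markers)),
    rng.normal(0.0, 1.0, size=(n_neg, n_markers)),
])
y = np.array([1] * n_pos + [0] * n_neg)

# Stratified split so each class contributes about 70% to training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Predicted probability of the positive (cancer) class for held-out samples.
test_scores = clf.predict_proba(X_test)[:, 1]
print(len(y_train), len(y_test))
```

Stratifying on the label keeps the positive-to-negative ratio nearly identical in the training and test sets, which matters when sensitivity and specificity are later read off the test set.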
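A t-test was used to screen features (biomarkers) for differences, and recursive feature elimination with cross-validation (RFECV) was used to select the modeling features, with 10-fold cross-validation to ensure that the selected features were stable in the training set. The model was then built on the selected features with the sklearn.ensemble.RandomForestClassifier function from the Python sklearn module (see the sklearn.ensemble.RandomForestClassifier page of the scikit-learn 1.1.3 documentation). The protein markers used in the diagnostic model for intestinal adenoma and adenocarcinoma are shown in Table 6 below:
Table 6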
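The two-stage selection described above (a t-test screen followed by RFECV with 10-fold cross-validation) can be sketched as follows. This is an illustration under stated assumptions and not the inventors' code: the data are synthetic, Welch's unequal-variance t-test and the p < 0.05 cut-off stand in for the unspecified t-test settings, and the `roc_auc` scoring, tree count and seeds are arbitrary.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(1)
n_per_class, n_features = 100, 15

y = np.array([1] * n_per_class + [0] * n_per_class)
X = rng.normal(size=(2 * n_per_class, n_features))
X[y == 1, :3] += 1.5  # only the first three hypothetical markers differ

# Stage 1: univariate screen, one Welch t-test per marker.
_, p = ttest_ind(X[y == 1], X[y == 0], axis=0, equal_var=False)
kept = np.flatnonzero(p < 0.05)

# Stage 2: recursive feature elimination scored by 10-fold cross-validation.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="roc_auc",
)
selector.fit(X[:, kept], y)

# Map the RFECV mask back to the original marker indices.
selected = kept[selector.support_]
print(kept, selected)
```

The text also mentions Boruta and Probatus's ShapRFECV; those are separate libraries and are not used in this sketch.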
The accuracy of the model's predictions was evaluated on the remaining (test) samples, and receiver operating characteristic (ROC) curves were plotted for the tumor protein prediction model after removing data from patients with familial adenomatous polyposis, as shown in Figs. 2-13. The area under the ROC curve (AUC), specificity and sensitivity for each number of tumor protein markers are as follows:
Figure | Number of tumor protein markers | AUC | Specificity | Sensitivity |
---|---|---|---|---|
Fig. 2 | 1 | 0.937 | 0.868 | 0.869 |
Fig. 3 | 2 | 0.955 | 0.943 | 0.836 |
Fig. 4 | 3 | 0.929 | 0.925 | 0.852 |
Fig. 5 | 4 | 0.899 | 0.830 | 0.902 |
Fig. 6 | 5 | 0.868 | 0.736 | 0.885 |
Fig. 7 | 6 | 0.818 | 0.755 | 0.885 |
Fig. 8 | 7 | 0.969 | 0.943 | 0.902 |
Fig. 9 | 10 | 0.960 | 0.925 | 0.934 |
Fig. 10 | 15 | 0.964 | 0.943 | 0.869 |
Fig. 11 | 20 | 0.963 | 0.887 | 0.967 |
Fig. 12 | 25 | 0.977 | 0.943 | 0.902 |
Fig. 13 | 30 | 0.971 | 0.943 | 0.902 |
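For reference, the reported metrics can be related to case counts with a few lines of arithmetic. The sketch below assumes the test-set sizes from Table 5 (61 positive, 53 negative). The true-positive and true-negative counts for the 10-marker model (57 and 49) are inferred here from the reported rates and are not stated in the text, and the scores passed to `auc_from_scores` are toy values.

```python
def sensitivity(true_pos, n_pos):
    """Fraction of positive (cancer) samples that the model flags."""
    return true_pos / n_pos


def specificity(true_neg, n_neg):
    """Fraction of negative (control) samples that the model clears."""
    return true_neg / n_neg


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Test-set sizes taken from Table 5.
N_POS, N_NEG = 61, 53

# Inferred counts (assumption): 57 of 61 cancers detected and
# 49 of 53 controls cleared reproduce the 10-marker figures.
sens_10 = round(sensitivity(57, N_POS), 3)
spec_10 = round(specificity(49, N_NEG), 3)

# Toy scores only, to show how the pairwise AUC is computed.
toy_auc = auc_from_scores([0.9, 0.8, 0.4], [0.5, 0.3])
print(sens_10, spec_10, toy_auc)
```

With a test set of this size, one additional missed cancer changes sensitivity by about 0.016 (1/61), so small differences between the models are within a few cases of one another.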
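Samples from patients diagnosed with advanced adenoma (n = 35) were used as an independent test of precancerous lesions, to examine the performance of each random forest model in early colorectal cancer detection.
In testing, the predictive performance of the diagnostic model for intestinal adenoma and adenocarcinoma is shown in Table 7:
Table 7
The tumor protein prediction samples after removing inflammatory bowel disease/irritable bowel syndrome are summarized in Table 8:
Table 8
The predictive performance of the diagnostic model for intestinal adenoma and adenocarcinoma after removing inflammatory bowel disease/irritable bowel syndrome is shown in Table 9:
Table 9
Tables 7 and 9 show that the predictive performance of the diagnostic model constructed by the invention generally stays at about 80%-90%, with specificity and sensitivity also at about 90%; after removing inflammatory bowel disease/irritable bowel syndrome, predictive performance is likewise 80-90%, with specificity and sensitivity remaining above 80%. This confirms that the optimal plasma protein markers proposed by the invention can be used for early colorectal cancer detection, and that the diagnostic model established by the invention has high accuracy and sensitivity. These experimental results demonstrate the feasibility of applying a positive high-expression model of the optimal markers, built with recursive feature elimination (RFE) and the additive feature attribution method (SHAP algorithm), to early colorectal cancer screening. The invention helps reduce the cost of gene expression testing and merits wider adoption in clinical practice.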
Claims (3)
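1. Plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, characterized in that the plasma protein markers comprise GDF-15, CD31, CD106, Galectin-3, CD66e, Ferritin, AFP, CA125, IL-17A and Midkine.
2. Use of the plasma protein markers according to claim 1 in a diagnostic model for intestinal adenoma and adenocarcinoma.
3. The use according to claim 2, characterized in that the diagnostic model is constructed as follows:
Step 1, obtaining a plasma protein marker expression dataset: blood samples are collected from the subjects' antecubital vein, the plasma protein markers are measured with the Luminex liquid-phase chip high-performance assay, and outliers are detected by cluster analysis and excluded;
Step 2, model prediction: a classifier is trained with the random forest algorithm in an R-language package, and the remaining samples are used to validate the resulting prediction model;
Step 3, model training: feature screening is performed with Boruta, and the defined features are screened by recursive feature elimination with cross-validation (RFECV);
Step 4, feature elimination: using the machine-learning model evaluation metrics in Probatus's ShapRFECV, ten-fold cross-validation is applied to find the number of features required in the training set; within each fold, on the nine-tenths of the training data used for fitting, differences in plasma protein marker levels between the colorectal cancer patient group and the control group are tested at a false discovery rate < 0.05;
Step 5, evaluating model accuracy: samples from patients diagnosed with advanced adenoma are used as an independent test of precancerous lesions.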
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310110373.3A CN116449018B (zh) | 2023-02-14 | 2023-02-14 | Plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, and use thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310110373.3A CN116449018B (zh) | 2023-02-14 | 2023-02-14 | Plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, and use thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116449018A (zh) | 2023-07-18 |
CN116449018B CN116449018B (zh) | 2023-09-05 |
Family
ID=87132676
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310110373.3A Active CN116449018B (zh) | Plasma protein markers for the diagnosis of intestinal adenoma and adenocarcinoma, and use thereof | 2023-02-14 | 2023-02-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116449018B (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060269972A1 (en) * | 2003-08-14 | 2006-11-30 | Ian Smith | Method of diagnosing colorectal adenomas and cancer using infrared spectroscopy |
US20070184499A1 (en) * | 2004-08-13 | 2007-08-09 | Hartmut Juhl | Use of transthyretin as a biomarker for colorectal adenoma and/or carcinoma; method for detection and test system |
WO2010065968A1 (en) * | 2008-12-05 | 2010-06-10 | Myriad Genetics, Inc. | Cancer detection markers |
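CN109085359A (zh) * | 2017-06-13 | 2018-12-25 | Cancer Hospital, Chinese Academy of Medical Sciences | Use of a serum protein marker combination in the screening, diagnosis and treatment of colorectal cancer |
CN114924073A (zh) * | 2022-03-28 | 2022-08-19 | Wuhan Metware Biotechnology Co., Ltd. | Diagnostic marker combination for advanced colorectal neoplasia and use thereof |
CN115656511A (zh) * | 2022-11-13 | 2023-01-31 | Zhuji People's Hospital | Markers and kit for in vitro diagnosis of digestive system tumors |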
Also Published As
Publication number | Publication date |
---|---|
CN116449018B (zh) | 2023-09-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||