CN111916154B - Diagnostic marker for predicting liver metastasis of intestinal cancer and use thereof - Google Patents
- Publication number
- CN111916154B (application CN202010712472.5A)
- Authority
- CN
- China
- Prior art keywords
- intestinal cancer
- methylation
- liver metastasis
- metastasis
- dmr
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
Abstract
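The invention relates to a methylation marker for liver metastasis of intestinal cancer and its use, and belongs to the technical field of molecular biomedicine. For the first time, liver metastasis of intestinal cancer is studied through the methylation of intestinal cancer, and it is found for the first time that methylation differences associated with liver metastasis are already present in early-stage intestinal cancer. Sites that are differentially methylated between intestinal cancers that do and do not develop liver metastasis are screened, and, using a random forest method, a methylation-based risk prediction model for liver metastasis of intestinal cancer is built from the best five differentially methylated regions (DMRs). The model is suitable for assessing the risk that an early-stage intestinal cancer will later metastasize to the liver.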
Description
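Technical Field
The invention relates to a methylation marker for liver metastasis of intestinal cancer and its use, and belongs to the technical field of molecular biomedicine.
Background Art
Worldwide, colon cancer ranks third in incidence among malignant tumors, with about 1.2 million new cases each year. According to the latest statistics, from 2015, mortality from intestinal cancer in China ranks fifth among malignant tumors, with an overall mortality of 191 per 100,000 (111 per 100,000 in men and 80 per 100,000 in women). Studies report that more than 50% of colorectal cancer patients have distant metastasis at diagnosis, mainly in the liver. Intestinal cancer spreads mainly by the hematogenous, peritoneal and lymphatic routes, and the liver is the organ most often affected: because the mesenteric vessels drain anatomically into the portal vein, colon cancer has a high probability of metastasizing to the liver. Typically 10%-20% of patients already have liver metastasis at the time of the first operation, about 40%-50% develop liver metastasis roughly 2 years after surgery, and the rate of liver metastasis found at autopsy in patients who died of colon cancer can be as high as 80%. The most common treatment after liver metastasis of colon cancer is surgical resection of the hepatic metastases, but only a small proportion of patients (10%-20%) are suitable for resection, and 70% of patients may relapse after surgery. Liver metastasis is one of the major difficulties in the clinical treatment of colon cancer and the main cause of its high mortality, so timely detection and prediction, allowing liver metastasis to be blocked effectively, is an important route to improving the survival of colon cancer patients. Available examinations include routine stool tests, fecal occult blood tests, imaging, B-mode ultrasound, CT scanning and colonoscopy. Commonly used markers are carcinoembryonic antigen (CEA), colorectal cancer antigen (CCA) and CA19-9, but these antigen tests can generally only indicate whether cancer is present or has recurred, and cannot predict whether liver metastasis will occur. Identifying markers associated with liver metastasis that can be used to predict it is therefore particularly important.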
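Summary of the Invention
The object of the invention is to provide a method in which methylation high-throughput sequencing is performed on surgical primary-tumor tissue samples from early-stage intestinal cancer, differential methylation between the liver-metastasis and non-liver-metastasis groups is analyzed on the sequencing results, and a model is built, so that liver metastasis and non-metastasis of intestinal cancer can be predicted accurately.
A first aspect of the invention provides:
A diagnostic marker for liver metastasis of intestinal cancer, comprising five methylated regions whose positions in the genome are as follows:
chr5:63862001-63863000 (RGS7BP gene body); chr17:58236001-58237000 (CA4 gene body); chr2:21856001-21857000 (intergenic); chr2:241626001-241627000 (intergenic); chr2:136279001-136280000 (ZRANB3 gene body).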
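A second aspect of the invention provides:
Use of the above diagnostic marker in the preparation of a reagent for diagnosing liver metastasis of intestinal cancer.
In one embodiment, the use further comprises the following steps:
S1: obtaining an intestinal cancer tissue sample, extracting DNA, constructing a methylation sequencing library, and sequencing it;
S2: aligning the sequencing data to a reference genome to obtain the sequencing data for the markers;
S3: obtaining the methylation rate of the methylated CpG sites in each marker region;
S4: using the methylation rates of the marker regions as independent variables and the occurrence of liver metastasis as the dependent variable, constructing a classifier and training it to obtain a classification model; then using the classification model to predict whether liver metastasis occurs in a sample to be tested.
In one embodiment, in step S3 the methylation rate is calculated by dividing the number of methylated reads at the methylated CpG sites in the marker region by the total number of reads at those CpG sites.
In one embodiment, the reference genome is hg19.
In one embodiment, the classifier is one built with the XGBoost (eXtreme Gradient Boosting) algorithm.
In one embodiment, the classification model outputs the probability of liver metastasis.
In one embodiment, the use is for improving the specificity and sensitivity of liver metastasis prediction.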
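A third aspect of the invention provides:
A system for screening diagnostic markers for liver metastasis of intestinal cancer, comprising:
a DNA extraction module for extracting DNA from the obtained intestinal cancer tissue samples;
a methylation library construction module for subjecting the obtained intestinal cancer tissue samples to methylation treatment and constructing sequencing libraries;
a sequencing module for high-throughput sequencing of the methylation libraries;
an alignment module for aligning the sequencing data to a reference genome, identifying the methylated CpG sites in each methylated region, and obtaining the numbers of methylated and unmethylated reads at those CpG sites;
a methylation rate calculation module for calculating the methylation rate of each methylated region;
a first screening module for selecting the methylated regions that differ significantly between patients with and without liver metastasis of intestinal cancer, as the first screening result;
a second screening module for ranking the first screening result by importance and taking the top-ranked methylated regions as the second screening result;
a third screening module for ranking the second-screening markers by their ability to classify whether liver metastasis of intestinal cancer occurs and taking the regions with good predictive performance as the diagnostic markers for liver metastasis of intestinal cancer.
In one embodiment, the methylation rate of each methylated region is calculated by dividing the number of methylated reads at all methylated CpG sites in the region by the total number of methylated and unmethylated reads.
In one embodiment, the second screening module may run a random forest classifier.
In one embodiment, the third screening module may run an XGBoost (eXtreme Gradient Boosting) classifier.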
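A fourth aspect of the invention provides:
A computer-readable medium storing a computer program that can be run to diagnose liver metastasis of intestinal cancer, the computer program executing the following steps:
obtaining the sequencing data produced by methylation sequencing of intestinal cancer tissue samples;
aligning the sequencing data to a reference genome, identifying the methylated CpG sites in each methylated region, and obtaining the numbers of methylated and unmethylated reads at those CpG sites;
calculating the methylation rate of each methylated region;
selecting the methylated regions that differ significantly between patients with and without liver metastasis of intestinal cancer, as the first screening result;
ranking the first screening result by importance and taking the top-ranked methylated regions as the second screening result;
ranking the second-screening markers by their ability to classify whether liver metastasis of intestinal cancer occurs and taking the regions with good predictive performance as the diagnostic markers for liver metastasis of intestinal cancer.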
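Beneficial Effects
For the first time, based on high-throughput sequencing of surgical primary-tumor tissue samples, the invention provides a diagnostic model relating methylation to liver metastasis of intestinal cancer. The model can assess the likelihood that an early-stage intestinal cancer will progress to liver metastasis, and offers high throughput together with high specificity and sensitivity.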
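Brief Description of the Drawings
Figure 1: Study design and experimental workflow for using differentially methylated regions (DMRs) in intestinal cancer as a prediction model for liver metastasis
Figure 2: Heatmap of differential methylation between primary tumors with liver metastasis and primary tumors without liver metastasis
Figure 3: Screening of the best DMR combination for modeling
Figure 4: Bar chart of methylation differences of the best five-DMR combination between liver metastasis and non-liver metastasis
Figure 5: ROC curve of the 40 training-group results for the five DMRs modeled by leave-one-out combined with XGBoost
Figure 6: ROC curve of the 40 validation-group results for the five DMRs modeled by leave-one-out combined with XGBoost
Figure 7: ROC curve for validation of the model on the 19 samples of the independent validation set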
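Detailed Description
For the first time, based on high-throughput sequencing of surgical primary-tumor tissue samples from early-stage intestinal cancer, the invention provides a diagnostic model relating methylation to liver metastasis of intestinal cancer. The model can assess the likelihood that an early-stage intestinal cancer will progress to liver metastasis, and improves the specificity and sensitivity of liver metastasis prediction.
The experimental steps of the invention are shown in Figure 1.
Study population and samples
From July 2012 to December 2018, primary tumor tissue was collected from 59 intestinal cancer patients, together with adjacent non-tumor tissue and metastatic lesions from some of them. The patients had follow-up monitoring of 5 years or more to determine whether they progressed to liver metastasis after surgical resection of the primary tumor. Ten of the patients had both a primary tumor sample and a sample of a metastasis that appeared later. All enrolled patients signed informed consent. The primary tumor and liver metastasis samples were all living tissue confirmed by pathology. The enrolled patients were divided into a training group and a validation group, as follows: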
Table 1. Clinical information of the training group
Table 2. Clinical information of the validation group
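Sequencing method
In the invention, DNA is extracted from the intestinal cancer tissue sample to be tested and bisulfite-treated, and a methylation sequencing library is constructed. The library is sequenced on an Illumina platform. After the run, bcl2fastq is used to generate the raw FASTQ sequences, and Trimmomatic is used for quality control of the raw data, removing adapters and low-quality bases. The resulting clean data are aligned to the genome (hg19) with Bismark. Alignment yields the methylated CpG sites, and for each such site the number of methylated reads and the number of unmethylated reads at that site are determined. The DMRfinder software is then used to compare the sample groups pairwise to find differentially methylated regions (DMRs). A DMR contains one or more CpG sites, and the methylation rate of a DMR is the sum of methylated reads over all CpG sites in the DMR divided by the sum of methylated and unmethylated reads over all CpG sites in the DMR. These sequencing and data processing steps give the methylation rate of every DMR in every patient sample.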
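The DMR methylation rate described above is a ratio of pooled read counts, which can be sketched in a few lines of Python. The function name and the example counts are illustrative assumptions and do not come from the patent; in practice the per-CpG counts would come from the alignment step.

```python
from typing import Iterable, Tuple


def dmr_methylation_rate(cpg_counts: Iterable[Tuple[int, int]]) -> float:
    """Methylation rate of one region from per-CpG (methylated, unmethylated) read counts.

    The rate is the sum of methylated reads over all CpG sites in the region
    divided by the sum of methylated and unmethylated reads over those sites.
    """
    methylated = 0
    total = 0
    for meth, unmeth in cpg_counts:
        methylated += meth
        total += meth + unmeth
    if total == 0:
        raise ValueError("no reads cover this region")
    return methylated / total


# Hypothetical counts for three CpG sites in one DMR of one sample.
example_counts = [(8, 2), (5, 5), (7, 3)]
rate = dmr_methylation_rate(example_counts)
print(f"{rate:.3f}")
```

Pooling reads before dividing weights each CpG site by its coverage, which is not the same as averaging the per-site rates.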
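In addition, the methylation rates of the 10 cases that had adjacent non-tumor tissue, an early primary tumor and a metastasis that appeared on later disease progression were analyzed. Primary tumors with liver metastasis were compared with primary tumors without liver metastasis for differential methylation, and the significantly different DMRs were plotted as a heatmap (Figure 2). The significant differences between early primary tumors that did and did not go on to liver metastasis showed the same trend in the late metastatic lesions, which indicates that the methylation signal associated with liver metastasis is already present at the early stage.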
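Screening of DMRs for modeling
Of the primary tumors with and without liver metastasis, 40 cases were selected as the training set and the remaining 19 as the validation set. In the training set, methylation values of primary tumors with liver metastasis were compared with those of primary tumors without liver metastasis to determine which DMRs differ significantly in methylation with respect to liver metastasis. This preliminary screen yielded 197 significantly different DMRs.
Next, a random forest method was used to rank the 197 preliminarily screened DMRs by predictive ability (classification of whether liver metastasis occurred). The random forest calculation was repeated 100 times on the training set, with 1,100 trees per forest. DMRs were eliminated stepwise according to the out-of-bag error, and the candidate DMRs were then ranked from first to last by their total importance rank over the 100 repeats, as shown in Table 3.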
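One way to read "total importance rank over the 100 repeats" is as a sum of per-repeat ranks, with the smallest sum ranked first. The sketch below illustrates that reading only; the aggregation rule, the function name, the DMR names and the importance scores are all assumptions, since the text does not spell out the procedure, and the out-of-bag elimination step is not modeled.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_ranks(repeats: List[Dict[str, float]]) -> List[str]:
    """Order features by the sum of their ranks across repeated runs.

    Each element of `repeats` maps a feature name to its importance in one
    run (higher is more important). Rank 1 is the most important feature in
    a run; features with the smallest rank sum come first overall.
    """
    total_rank: Dict[str, int] = defaultdict(int)
    for scores in repeats:
        ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            total_rank[name] += rank
    # Ties in the rank sum are broken by name so the result is deterministic.
    return sorted(total_rank, key=lambda name: (total_rank[name], name))


# Hypothetical importance scores from three repeats for three candidate DMRs.
repeats = [
    {"dmr_a": 0.30, "dmr_b": 0.20, "dmr_c": 0.10},
    {"dmr_a": 0.25, "dmr_b": 0.28, "dmr_c": 0.05},
    {"dmr_a": 0.31, "dmr_b": 0.22, "dmr_c": 0.12},
]
print(aggregate_ranks(repeats))
```

Summing ranks rather than raw importance values keeps a single run with unusually large scores from dominating the final order.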
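Importance was assessed by the mean decrease in Gini index, which was also the basis of the ranking. The mean decrease in Gini index measures the effect of each variable on the heterogeneity of the observations at each node of the classification trees, so that the importance of variables can be compared; the larger the value, the more important the variable.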
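To make the Gini-based importance concrete, the sketch below computes the decrease in Gini impurity produced by a single split of one node. A random forest accumulates such decreases over every node where a variable is used and averages them over the trees; that aggregation is omitted here, and the labels in the example are hypothetical.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of class labels: 1 - sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: Sequence[int], left: Sequence[int], right: Sequence[int]) -> float:
    """Impurity of the parent node minus the size-weighted impurity of its two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node: 1 = liver metastasis, 0 = no liver metastasis.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left = [1, 1, 1, 0]   # samples below a methylation threshold (assumed split)
right = [0, 0, 0, 1]  # samples at or above the threshold
print(gini_decrease(parent, left, right))
```

A split that separates the two classes perfectly would bring the weighted child impurity to zero, so the decrease would equal the parent impurity.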
Table 3. Candidate DMRs ranked from highest to lowest importance for liver metastasis (top 100 DMRs)
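Model construction
Following this ranking, combinations from the top 1 DMR up to the top 197 DMRs of the random forest ranking were each evaluated on the 40 training samples by leave-one-out combined with XGBoost. As shown in Figure 3, the horizontal axis is the top-1 to top-197 DMR combination and the vertical axis is the AUC of the model for each combination. The combination of the top 5 DMRs predicted liver metastasis of intestinal cancer best, with the highest AUC, 0.94.
The five selected DMRs are: chr5:63862001-63863000 (RGS7BP gene body); chr17:58236001-58237000 (CA4 gene body); chr2:21856001-21857000 (intergenic); chr2:241626001-241627000 (intergenic); chr2:136279001-136280000 (ZRANB3 gene body).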
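For downstream processing it can help to hold the five regions in a structured form and to test whether a CpG position falls inside one of them. The sketch below does this with the coordinates listed above. It assumes the intervals are 1-based and inclusive, which makes each region exactly 1,000 bp long; the text does not state the convention, and the names `Region`, `parse_region` and `find_marker` are illustrative.

```python
import re
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed inclusive
    label: str


# The five regions as listed in the text (hg19 coordinates).
MARKER_TEXT = [
    ("chr5:63862001-63863000", "RGS7BP gene body"),
    ("chr17:58236001-58237000", "CA4 gene body"),
    ("chr2:21856001-21857000", "intergenic"),
    ("chr2:241626001-241627000", "intergenic"),
    ("chr2:136279001-136280000", "ZRANB3 gene body"),
]

_REGION_RE = re.compile(r"^(chr[0-9A-Za-z]+):(\d+)-(\d+)$")


def parse_region(text: str, label: str) -> Region:
    """Parse a 'chrN:start-end' string into a Region."""
    match = _REGION_RE.match(text)
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start after end in {text!r}")
    return Region(match.group(1), start, end, label)


MARKERS: List[Region] = [parse_region(text, label) for text, label in MARKER_TEXT]


def find_marker(chrom: str, pos: int) -> Optional[Region]:
    """Return the marker region containing the position, or None."""
    for region in MARKERS:
        if region.chrom == chrom and region.start <= pos <= region.end:
            return region
    return None


hit = find_marker("chr17", 58236500)
print(hit.label if hit else "outside all marker regions")
```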
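The methylation of the five selected DMRs in the liver-metastasis and non-liver-metastasis samples of the training group is shown in Figure 4. With leave-one-out modeling on the five DMRs, the AUC in the training group was 1 (Figure 5). In each run the one held-out sample served as the validation group; over the 40 runs, the AUC of the held-out results reached 0.94 (Figure 6).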
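The leave-one-out procedure and the AUC computed from the held-out scores can be sketched with the standard library alone. The patent trains XGBoost in each round; to keep the example self-contained, a simple nearest-centroid scorer is swapped in here as a stand-in, so the code shows the evaluation loop rather than the patented model. The feature values and function names are hypothetical.

```python
from typing import Callable, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly, ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def centroid_score(train_x: List[List[float]], train_y: List[int], x: List[float]) -> float:
    """Stand-in scorer (not XGBoost): higher when x is closer to the positive-class centroid."""
    def centroid(rows: List[List[float]]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    pos_c = centroid([r for r, y in zip(train_x, train_y) if y == 1])
    neg_c = centroid([r for r, y in zip(train_x, train_y) if y == 0])
    return dist(x, neg_c) - dist(x, pos_c)


def leave_one_out_scores(
    xs: List[List[float]],
    ys: List[int],
    scorer: Callable[[List[List[float]], List[int], List[float]], float],
) -> List[float]:
    """Score each sample with a model fitted on all the other samples."""
    scores = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        scores.append(scorer(train_x, train_y, xs[i]))
    return scores


# Hypothetical methylation rates for two DMRs in six samples (1 = liver metastasis).
xs = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
ys = [1, 1, 1, 0, 0, 0]
held_out = leave_one_out_scores(xs, ys, centroid_score)
print(auc(ys, held_out))
```

Because every score comes from a model that never saw the sample being scored, the held-out AUC is a fairer estimate of performance than the AUC measured on the training samples themselves.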
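Model validation
The methylation rates of the five DMRs in the 19 samples of the independent validation set were input into the constructed model for further validation of its performance. The AUC was 0.87 (Figure 7), and the sensitivity and specificity of the model were 85.7% and 91.7%, respectively.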
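Sensitivity and specificity follow directly from the counts of correct and incorrect calls. The sketch below computes both from true and predicted labels. The example labels are an assumption: the text does not give the confusion matrix, and the counts used here (6 of 7 metastasis cases and 11 of 12 non-metastasis cases called correctly) are simply one split of 19 samples that reproduces the reported percentages.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical validation set of 19 samples (assumed counts, not from the source).
y_true = [1] * 7 + [0] * 12
y_pred = [1] * 6 + [0] * 1 + [0] * 11 + [1] * 1
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```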
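Table 4. Sensitivity and specificity of the model in the validation set
It can be seen that the markers obtained in this scheme predict liver metastasis of intestinal cancer well.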
Claims (1)
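1. Use of a diagnostic marker in the preparation of a reagent for diagnosing liver metastasis of intestinal cancer, characterized in that the diagnostic marker consists of five methylated regions whose positions on the reference genome are as follows: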
chr5:63862001-63863000;
chr17:58236001-58237000;
chr2:21856001-21857000;
chr2:241626001-241627000;
chr2:136279001-136280000;
the reference genome is hg19.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010712472.5A CN111916154B (zh) | 2020-07-22 | 2020-07-22 | Diagnostic marker for predicting liver metastasis of intestinal cancer and use thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010712472.5A CN111916154B (zh) | 2020-07-22 | 2020-07-22 | Diagnostic marker for predicting liver metastasis of intestinal cancer and use thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111916154A (zh) | 2020-11-10 |
CN111916154B (zh) | 2023-12-01 |
Family
ID=73280632
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010712472.5A Active CN111916154B (zh) | Diagnostic marker for predicting liver metastasis of intestinal cancer and use thereof | 2020-07-22 | 2020-07-22 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111916154B (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
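CN113436741B (zh) * | 2021-07-16 | 2023-02-28 | West China Hospital, Sichuan University | Method for predicting lung cancer recurrence based on DNA methylation of tissue-specific enhancer regions |
CN113913333B (zh) * | 2021-10-20 | 2022-09-02 | Nanjing Shihe Gene Biotechnology Co., Ltd. | Lung cancer diagnostic marker and use thereof |
CN115094142B (zh) * | 2022-07-19 | 2024-05-28 | Cancer Hospital, Chinese Academy of Medical Sciences | Methylation markers for diagnosing pulmonary enteric adenocarcinoma |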
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
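CN110157804A (zh) * | 2019-04-04 | 2019-08-23 | Guangzhou Youze Biotechnology Co., Ltd. | Methylation sites, detection primers and kit for lung cancer diagnosis, efficacy prediction or prognosis |
CN110656173A (zh) * | 2019-09-06 | 2020-01-07 | Cancer Hospital, Chinese Academy of Medical Sciences | Breast cancer prognosis assessment model and method for establishing it |
CN111172279A (zh) * | 2019-12-17 | 2020-05-19 | Cancer Hospital, Chinese Academy of Medical Sciences | Model for diagnosing lung cancer by combined detection of peripheral blood methylated genes and IDH1 |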
Also Published As
Publication number | Publication date |
---|---|
CN111916154A (zh) | 2020-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xu et al. | Radiomic analysis of contrast-enhanced CT predicts microvascular invasion and outcome in hepatocellular carcinoma | |
Kuntz et al. | Gastrointestinal cancer classification and prognostication from histology using deep learning: Systematic review | |
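CN111916154B (zh) | Diagnostic marker for predicting liver metastasis of intestinal cancer and use thereof | |
CN114171115B (zh) | Method and device for screening differentially methylated regions | |
TW202410055A (zh) | Convolutional neural network system and method for data classification | |
Pesch et al. | Biomarker research with prospective study designs for the early detection of cancer | |
CN105319364B (zh) | Combined diagnostic marker for predicting recurrence of small hepatocellular carcinoma | |
CN110438228A (zh) | DNA methylation markers for colorectal cancer | |
CN109830264B (zh) | Method for classifying tumor patients based on methylation sites | |
CN108021788B (zh) | Method and device for extracting biomarkers from deep sequencing data of cell-free DNA | |
Kim et al. | Pre-operative prediction of advanced prostatic cancer using clinical decision support systems: accuracy comparison between support vector machine and artificial neural network | |
CN104046624B (zh) | Genes for lung cancer prognosis and uses thereof | |
CN107326065A (zh) | Method for screening gene markers and use thereof | |
Wang et al. | Dual energy CT image prediction on primary tumor of lung cancer for nodal metastasis using deep learning | |
Veerankutty et al. | Artificial Intelligence in hepatology, liver surgery and transplantation: Emerging applications and frontiers of research | |
CN106929567A (zh) | Method for assessing an individual's risk of developing liver cancer and the prognosis of liver cancer | |
Gurbani et al. | Evaluation of radiomics and machine learning in identification of aggressive tumor features in renal cell carcinoma (RCC) | |
Zidane et al. | A review on deep learning applications in highly multiplexed tissue imaging data analysis | |
CN115881312A (zh) | Prognosis prediction method, system, intelligent terminal and computer-readable storage medium for stage II colorectal cancer | |
CN115287353B (zh) | Methylation marker derived from plasma cell-free DNA of liver cancer and use thereof | |
CN116805509A (zh) | Construction method and application of predictive markers for colorectal cancer immunotherapy | |
CN113811621A (zh) | Method for determining RCC subtype | |
Su | An Old Concept with a New Twist | |
Caranfil et al. | Artificial Intelligence and Lung Pathology | |
CN115295126B (zh) | Model for predicting mismatch repair gene deficiency in gastric cancer |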
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |