CN113223714B - Gene combination for predicting pre-eclampsia risk, pre-eclampsia risk prediction model, and method for constructing the model - Google Patents

Gene combination for predicting pre-eclampsia risk, pre-eclampsia risk prediction model, and method for constructing the model

Info

Publication number
CN113223714B
CN113223714B CN202110510509.0A CN202110510509A CN113223714B CN 113223714 B CN113223714 B CN 113223714B CN 202110510509 A CN202110510509 A CN 202110510509A CN 113223714 B CN113223714 B CN 113223714B
Authority
CN
China
Prior art keywords
samples
serum
model
pregnancy
random forest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110510509.0A
Other languages
English (en)
Other versions
CN113223714A (zh)
Inventor
陈颖
左红斌
魏本杰
马玲玉
丛华剑
杜昭励
王合
于沛勇
苏鹤
杨海燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Province Yinfeng Bioengineering Technology Co ltd
Jilin University
Original Assignee
Jilin Province Yinfeng Bioengineering Technology Co ltd
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin Province Yinfeng Bioengineering Technology Co ltd, Jilin University filed Critical Jilin Province Yinfeng Bioengineering Technology Co ltd
Priority to CN202110510509.0A priority Critical patent/CN113223714B/zh
Publication of CN113223714A publication Critical patent/CN113223714A/zh
Application granted granted Critical
Publication of CN113223714B publication Critical patent/CN113223714B/zh
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

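A gene combination for predicting pre-eclampsia risk, a pre-eclampsia risk prediction model, and a method for constructing the model, belonging to the field of biomedicine. The invention uses gene polymorphism testing to screen 499 susceptibility genes, combines them with 46 clinical test items, and uses a computer deep learning method to build a pre-eclampsia risk prediction model that can predict the risk of pre-eclampsia. The model design relies mainly on the random forest algorithm from machine learning: the gene polymorphism results and the clinical test data are converted into the numeric feature vectors needed to build the model, the number of decision trees in the random forest is set to 1000, the training set is built by random sampling with replacement, and the out-of-bag (OOB) samples (the samples not drawn) are used as the test set to calculate the model's error rate.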

Description

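Gene combination for predicting pre-eclampsia risk, pre-eclampsia risk prediction model, and method for constructing the model
Technical Field
The invention belongs to the field of biomedical technology and specifically relates to a gene combination for predicting pre-eclampsia risk, a pre-eclampsia risk prediction model, and a method for constructing the model.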
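Background
Pre-eclampsia is the onset of elevated blood pressure and proteinuria after 20 weeks of gestation, possibly accompanied by headache, blurred vision, nausea, vomiting, and upper abdominal discomfort. Eclampsia is the progression of pre-eclampsia to a more severe condition that causes seizures or coma and can lead to serious maternal and fetal complications. Pre-eclampsia occurs in about 5-10% of pregnant women and is more common in primiparas and in women with hypertension or vascular disease. According to the time of onset, it is classified as early-onset or late-onset. Early-onset pre-eclampsia develops before 34 weeks of gestation; because onset is early and the fetus is still immature, its impact on the mother is especially severe. The cause of pre-eclampsia and eclampsia remains unclear. Multiple factors contribute, chiefly genetics, nutrition, immunity, metabolism, and lifestyle, so it is a disease in which environmental and genetic factors act together. A meta-analysis of more than 32,000 women confirmed that, in women at high risk, oral aspirin in the second trimester reduces the risk of onset by 15%, which makes it an important avenue for prevention and treatment. Aspirin alone, however, clearly cannot fully prevent the disease. Exploring its causes from multiple angles, performing an individualized comprehensive assessment of each patient, and accurately identifying high-risk women before onset have therefore become active research topics for high-risk obstetricians in recent years.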
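Genetic susceptibility to pre-eclampsia has been reported in China and abroad, most commonly in the form of gene polymorphisms. Gene polymorphism, also called genetic polymorphism, means that two or more discontinuous variants, genotypes, or alleles exist simultaneously and regularly in a biological population. It is very common and is usually divided into three categories: DNA fragment length polymorphisms, DNA repeat sequence polymorphisms, and single nucleotide polymorphisms (SNPs). Human gene polymorphisms play an important role in explaining susceptibility and tolerance to diseases and toxins, the diversity of clinical manifestations, and responsiveness to drug therapy. Gene polymorphism is currently regarded as one of the important pathogenic mechanisms of complex diseases, and many SNP loci have been widely reported for pre-eclampsia. However, no single gene has yet been confirmed as a direct causative gene for pre-eclampsia. Environmental factors such as nutrition, metabolism, immunity, BMI, and thyroid dysfunction are also widely involved. These so-called environmental factors have genetic backgrounds of their own, so it is difficult to tell whether they share a common genetic background with pre-eclampsia.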
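In addition, genetic inheritance shows clear ethnic and regional differences, and the effect of a gene differs markedly between ethnic groups. Findings from other populations therefore cannot simply be transferred to China, which must design a gene panel based on research in its own population. "Gene testing panel" is a term that arose with high-throughput genetic testing and sequencing: rather than testing a single locus or a single gene, a panel tests multiple genes and loci at once, selected and combined according to a defined standard. A gene testing panel can therefore be rendered as a gene combination. Compared with single-locus testing, a gene panel covers more genes, and compared with PCR-based testing it covers longer sequences, so it yields relatively more genetic information.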
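Prediction models for pre-eclampsia have long been a research focus in obstetrics. Current approaches include the following:
1. Models combining medical history with physical examination indicators. The input parameters include medical history, family history, height, weight, and pre-pregnancy blood pressure. These models are not individualized, their predictive performance varies as the parameters change, and they are inaccurate, so they are hard to trust and to disseminate.
2. Specific protein assays. Several differentially expressed proteins are measured to predict the disease, usually in the second trimester. A change in the levels of the commonly used proteins indicates that the patient has already developed early pathophysiological changes. In addition, protein levels are affected by many factors, and the assays suffer from difficult sample storage and unstable levels.
3. Gene prediction models. The single-gene prediction models used clinically are mostly kits developed by foreign genetics companies; because gene loci differ between ethnic groups, they cannot be applied in China.
4. Combined screening by maternal factors, uterine artery pulsatility index (UtA-PI), mean arterial pressure (MAP), and serum placental growth factor (PlGF), which can predict 40% of preterm PE and 33% of term PE. This combined screening is the most efficient method currently recommended by the International Society for the Study of Hypertension in Pregnancy, but ultrasound monitoring of the uterine artery requires specialized training and depends heavily on operator skill, so it cannot yet be widely adopted.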
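In summary, the accuracy of existing pre-eclampsia risk prediction models still needs improvement, and all of these models are disconnected from prevention and treatment, which makes it difficult to determine targeted diagnosis and treatment plans. Developing an efficient gene combination and risk prediction model for pre-eclampsia has therefore become the key to a breakthrough in preventing and treating the disease.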
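Summary of the Invention
The invention provides a gene combination for predicting pre-eclampsia risk, a pre-eclampsia risk prediction model, and a method for constructing the model, so as to achieve efficient prediction of pre-eclampsia risk.
The technical solution adopted by the invention to solve the technical problem is as follows: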
The gene combination of the invention for predicting pre-eclampsia risk comprises 499 SNP loci, namely: rs1111875, rs3764650, rs890293, rs4934, rs2230806, rs405509, rs12983082, rs2516049, rs10830963, rs2000813, rs7202116, rs3764261, rs7079, rs1805094, rs8178847, rs1421811, rs715987, rs8096897, rs574957, rs10744835, rs1991391, rs7977406, rs11066188, rs2036914, rs8099917, rs12979860, rs10048158, rs1801689, rs9264942, rs9277535, rs2523393, rs2230204, rs7107152, rs12086634, rs1242229, rs2236242, rs3779512, rs1864163, rs1549758, rs5491, rs7578597, rs7209395, rs6042935, rs121912702, rs155524, rs7640747, rs7926335, rs743395, rs1800206, rs11977526, rs3804099, rs1799750, rs1049337, rs28362491, rs10120688, rs1051740, rs2167270, rs7804372, rs11568821, rs12203592, rs1799724, rs1805192, rs2066847, rs1800451, rs5030737, rs4766578, rs4149056, rs4675378, rs281874770, rs4321325, rs11018628, rs1800794, rs1805010, rs780093, rs3853839, rs7805969, rs7574865, rs8178822, rs333, rs2384550, rs925489, rs57659670, rs853326, rs180223, rs1472565, rs2596623, rs231779, rs4850755, rs6885099, rs1800693, rs2046045, rs4704397, rs1137617, rs10733113, rs653178, rs10774625, rs2517532, rs9910950, rs8178848, rs16950081, rs4925295, rs963167, rs2647528, rs2319125, rs4581, rs2288493, rs79154414, rs2277698, rs2279744, rs2217332, rs8178819, rs6906021, rs2736340, rs4915077, rs1523127, rs1554973, rs17145713, rs1041981, rs11538264, rs204999, rs2476601, rs6457452, rs9277534, rs3099844, rs4149584, rs492899, rs17630235, rs9322331, rs556442, rs3890182, rs10938397, rs4773724, rs1275988, rs1558902, rs1800437, rs17577, rs13385, rs2074311, rs642858, rs11651270, rs2237895, rs231840, rs1799999, rs2292239, rs7163757, rs2670660, rs11066280, rs7178572, rs4721, rs391300, rs4828038, rs7713645, rs3087243, rs17817449, rs455060, rs2075290, rs13702, rs3021094, rs1260326, rs2237892, rs1550805, rs6259, rs2083637, rs2779248, rs1137933, rs3847987, rs77060950, rs7709243, rs7770619, rs1801278, rs7903146, rs4506565, rs17681684, rs17696736, rs1050828, rs231357, rs199972616, rs4416547, rs10509305, rs12762303, rs2280275, rs2014408, rs1532624, rs3746429, rs1926723, rs11105354, rs7439293, rs445925, rs3850641, rs7194256, rs867186, rs3184504, rs5743293, rs825476, rs7580658, rs1801693, rs854560, rs1056917, rs61996318, rs9898, rs13146272, rs4311994, rs1654433, rs11105357, rs4148189, rs4576240, rs11751198, rs11968400, rs11970154, rs12210887, rs13118, rs2075015, rs2285321, rs2394392, rs2508015, rs2523995, rs2524272, rs25527, rs28986465, rs29243, rs3130663, rs3130685, rs3132584, rs64036, rs6457254, rs6924270, rs6933400, rs6937357, rs73055442, rs7749235, rs9261800, rs9267547, rs9468805, rs9500864, rs1441756, rs15285, rs9941065, rs2596574, rs11085421, rs3785617, rs78010183, rs2292354, rs289717, rs247616, rs879922, rs1799945, rs2097055, rs662, rs1532085, rs3757354, rs3813082, rs326, rs662799, rs12597002, rs4131229, rs964184, rs13105517, rs1260333, rs2075291, rs5104, rs670, rs17482753, rs2043085, rs4969168, rs7350481, rs12678919, rs6671879, rs4743771, rs328, rs285, rs6720173, rs10096633, rs301, rs1003723, rs2303790, rs2266788, rs429358, rs10503669, rs1051931, rs7756935, rs10790162, rs5174, rs562556, rs4253728, rs2269702, rs66698963, rs9326246, rs7016880, rs1801394, rs1805087, rs9852991, rs12230074, rs9815354, rs6768438, rs957525, rs9816772, rs11024074, rs10506974, rs3754777, rs1173771, rs13394970, rs17249754, rs176185, rs2070759, rs6410, rs11105378, rs2586886, rs2004776, rs11191548, rs35444, rs6433027, rs3749585, rs16998073, rs1378942, rs2681492, rs6495122, rs12413409, rs7726475, rs5049, rs1401982, rs1105378, rs16948048, rs11014166, rs11568020, rs381815, rs11065987, rs1131882, rs11572325, rs880315, rs4612666, rs651007, rs671, rs10507391, rs3135506, rs10743565, rs1426409, rs193921036, rs200898934, rs201223301, rs374976508, rs7080, rs2237076, rs3765407, rs3820059, rs9831647, rs1862176, rs9939609, rs9638978, rs9393931, rs9381475, rs9380142, rs9340799, rs927332, rs909253, rs846910, rs843010, rs842991, rs836135, rs836132, rs821466, rs7963771, rs7943316, rs763780, rs7579169, rs7571613, rs7564968, rs7412, rs732609, rs699947, rs699, rs698090, rs6802220, rs661348, rs6594013, rs6550005, rs6489992, rs6478974, rs6269, rs6025, rs56124946, rs5442, rs5051, rs4842666, rs4818, rs479200, rs4784744, rs4769613, rs4762, rs4633, rs4289236, rs4150196, rs3918227, rs3905000, rs3819526, rs3812475, rs3803012, rs3801266, rs3783550, rs3773663, rs3773640, rs3761548, rs3735481, rs366510, rs35821928, rs3025039, rs2954033, rs2854371, rs284277, rs2681472, rs266729, rs2638953, rs261334, rs2596622, rs25648, rs2549782, rs233115, rs2322659, rs231775, rs2297518, rs2287848, rs2287845, rs2275913, rs2271037, rs2241766, rs2236852, rs2236711, rs2234693, rs2232365, rs2230820, rs222133, rs2200733, rs2161983, rs2074611, rs2070744, rs2059806, rs2010963, rs1991515, rs193741, rs1884082, rs1805388, rs1805017, rs1801133, rs1801131, rs1800896, rs1800872, rs1800629, rs1800469, rs1800450, rs1799983, rs1799963, rs1799889, rs17783344, rs17686866, rs1710, rs16972197, rs16972194, rs1695, rs16846876, rs1610696, rs1570360, rs1501299, rs1424954, rs1358340, rs13429458, rs1341667, rs13405728, rs13401889, rs1319501, rs12831006, rs12711941, rs12707079, rs12579302, rs12150550, rs12150220, rs1205, rs12035521, rs11895934, rs11792480, rs11646213, rs1155708, rs115015150, rs1143627, rs1130409, rs11209026, rs11190179, rs11129420, rs11105368, rs11105364, rs11105328, rs111033530, rs10898392, rs10889677, rs10811661, rs10739778, rs1063320, rs1014064, rs10121110, rs1010, rs1004467.
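The invention also provides a pre-eclampsia risk prediction model constructed with the above gene combination.
The pre-eclampsia risk prediction model of the invention mainly comprises:
a data preprocessing module, which converts the 499 susceptibility genes and 46 clinical test items obtained into numeric feature vectors, represents each sample by its numeric feature vector, and labels each sample with its disease status;
a model construction module, which builds a model from the samples' numeric feature vectors according to the construction rules of the random forest algorithm, generating a random forest model containing 1000 decision trees;
a model error-rate calculation module: when generating each decision tree, the random forest draws samples randomly and with replacement, so about 1/3 of the samples are not drawn for each tree; these undrawn samples are that tree's out-of-bag (OOB) samples and are used as the test set to calculate the error rate of the random forest model.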
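Preferably, the 46 clinical test items are:
age, height, history of hypertension, artificial insemination, twin pregnancy, pre-pregnancy weight, pre-pregnancy BMI, weight during pregnancy, BMI during pregnancy, gestational weight gain, difference between pregnancy and pre-pregnancy BMI, IUGR, hemoglobin, white blood cells, neutrophils, platelet distribution width, mean platelet volume, thyroid-stimulating hormone, free triiodothyronine (T3), free thyroxine (T4), cholesterol, triglycerides, high-density lipoprotein, low-density lipoprotein, TG/HDL, international normalized ratio, prothrombin activity, prothrombin time, thrombin time, prothrombin time ratio, fibrinogen, activated partial thromboplastin time, blood type, serum potassium, serum calcium, serum sodium, serum chloride, blood glucose (biochemistry), anticardiolipin IgM antibody positivity, anticardiolipin IgG antibody positivity, anti-β2-glycoprotein positivity, antinuclear antibody positivity, anticardiolipin IgA antibody positivity, urine ketones, urine glucose, and urine bilirubin positivity.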
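The method of the invention for constructing a pre-eclampsia risk prediction model mainly comprises the following steps:
Step 1: Data preprocessing
The 499 susceptibility genes and 46 clinical test items obtained are converted into numeric feature vectors, each sample is represented by its numeric feature vector, and each sample is labeled with its disease status;
Step 2: Model construction
A model is built from the samples' numeric feature vectors according to the construction rules of the random forest algorithm, generating a random forest model containing 1000 decision trees;
Step 3: Model error-rate calculation
When generating each decision tree, the random forest draws samples randomly and with replacement, so about 1/3 of the samples are not drawn for each tree. These undrawn samples are that tree's out-of-bag (OOB) samples and are used as the test set to calculate the error rate of the random forest model.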
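The one-third figure in step 3 follows from bootstrap sampling. As a short derivation (an added illustration, not part of the original disclosure): when n samples are drawn n times with replacement, the probability that a given sample is never drawn for one tree is

```latex
P(\text{sample is out-of-bag})
  = \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow[\,n \to \infty\,]{}\; e^{-1} \approx 0.368,
\qquad
\left(1 - \frac{1}{401}\right)^{401} \approx 0.367
\quad \text{for the } n = 401 \text{ samples used here.}
```

Each tree therefore leaves roughly 37% of the samples out of bag, which the text rounds to 1/3.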
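Preferably, in step 1, the 46 clinical test items are:
age, height, history of hypertension, artificial insemination, twin pregnancy, pre-pregnancy weight, pre-pregnancy BMI, weight during pregnancy, BMI during pregnancy, gestational weight gain, difference between pregnancy and pre-pregnancy BMI, IUGR, hemoglobin, white blood cells, neutrophils, platelet distribution width, mean platelet volume, thyroid-stimulating hormone, free triiodothyronine (T3), free thyroxine (T4), cholesterol, triglycerides, high-density lipoprotein, low-density lipoprotein, TG/HDL, international normalized ratio, prothrombin activity, prothrombin time, thrombin time, prothrombin time ratio, fibrinogen, activated partial thromboplastin time, blood type, serum potassium, serum calcium, serum sodium, serum chloride, blood glucose (biochemistry), anticardiolipin IgM antibody positivity, anticardiolipin IgG antibody positivity, anti-β2-glycoprotein positivity, antinuclear antibody positivity, anticardiolipin IgA antibody positivity, urine ketones, urine glucose, and urine bilirubin positivity.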
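Preferably, step 2 proceeds as follows:
The total number of samples is 401: 97 patients with early-onset pre-eclampsia form disease group 1, 107 patients with late-onset pre-eclampsia form disease group 2, and 197 women with normal pregnancies form the control group;
(1) Training
The training set is built by random sampling with replacement: the 401 samples are sampled with replacement 401 times, and the numeric feature vectors of the drawn samples are used to build one decision tree;
(2) Selecting the features at each node of the decision tree
Let M be the number of features of the input samples. When a node is split, m features are first selected from the M features, and the best split point among those m features is used for the split;
(3) Completing the growth of a single decision tree;
(4) Generating the random forest model from multiple decision trees
The generated decision trees are combined into a random forest model containing 1000 decision trees;
(5) Predicting the result
The prediction of every decision tree in the random forest is tallied, and the class that receives the most votes is taken as the final prediction.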
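Preferably, step 3 proceeds as follows:
The roughly 1/3 of samples that were not drawn are classified by the random forest, the predicted classes are compared with the true values, and the model error rate is computed.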
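The beneficial effects of the invention are as follows:
The invention designs a gene combination (gene panel) of genetic susceptibility genes for predicting pre-eclampsia risk and, on that basis, builds a pre-eclampsia risk prediction model that also integrates medical history, physical examination, and clinical test data. Compared with the prior art, the invention has the following advantages:
1. Most existing gene panels are designed around loci associated with pre-eclampsia itself, collect a limited number of SNPs, and have not identified a definite causative locus. The gene panel of the invention builds on previous work: it covers the pre-eclampsia risk genes published in China and abroad over nearly 30 years and also collects SNPs associated with risk factors for pre-eclampsia. The resulting panel design is relatively stable and is the largest gene testing panel achievable with the available funding.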
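2. Because the gene panel covers not only pre-eclampsia but also immunity, infection, tumors, essential hypertension, antiphospholipid syndrome, thyroid dysfunction, thrombosis, kidney disease, infertility, lupus erythematosus, vascular injury, immune disorders, platelet aggregation, and other disease areas, its development and application can be extended to the prediction, diagnosis, and treatment of those diseases, laying a foundation for the future diagnosis and treatment of polygenic diseases in China.
3. The gene panel can be used for the genetic diagnosis and treatment of pre-eclampsia and for the detection and treatment of related protein levels.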
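4. The invention builds the pre-eclampsia risk prediction model on the genetic information of Chinese pregnant women combined with clinical examination and biochemical test results. The model is stable and individualized: a patient's genetic information is stable, so a single test yields genetic data that do not change and remain useful for life. Each specimen yields an individualized report linking genetic and environmental factors, making the approach localized, population-specific, individualized, and specialized while carrying forward the traditional development model. Biochemical indicators and BMI are adjustable indicators; evaluation by computer deep learning helps adjust them in time at different stages of pregnancy and reduce the risk of onset. DNA can be collected from peripheral blood or buccal mucosa cells, so specimens are easy to collect, store, and transport. The model's predictive sensitivity reaches 94-99%, giving high performance and strong reliability.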
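5. The pre-eclampsia risk prediction model can be applied at any time during pregnancy. Clinical test parameters fluctuate as pregnancy progresses, so a patient's risk can be analyzed dynamically at different stages: in early pregnancy for prevention; in mid-pregnancy for adjustment and improvement, closer supervision, and reduction of environmental risk; and in late pregnancy for early warning and closer monitoring to prevent adverse events such as placental abruption, stillbirth, heart failure, HELLP syndrome, and eclampsia, which seriously threaten maternal safety, thereby reducing maternal death and disease. In the future the model may also contribute to treatment and prognosis.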
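Brief Description of the Drawings
Fig. 1 shows the accuracy of the pre-eclampsia risk prediction model of the invention in predicting early-onset pre-eclampsia.
Fig. 2 shows the accuracy of the pre-eclampsia risk prediction model of the invention in predicting late-onset pre-eclampsia.
Fig. 3 shows the accuracy of the pre-eclampsia risk prediction model of the invention in predicting early-onset plus late-onset pre-eclampsia.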
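Detailed Description
The invention covers two main aspects: the design of a gene combination for predicting pre-eclampsia risk, and the construction of a pre-eclampsia risk prediction model.
Starting from the risk factors for pre-eclampsia, the invention collected a large number of SNPs associated with those risk factors and designed a gene combination for predicting pre-eclampsia risk. Relevant information was collected from the PubMed, NCBI, DiseaseDX, Phenolyzer, GVS, and SNPinfo websites to screen single nucleotide polymorphism (SNP) loci that may be associated with the onset of pre-eclampsia. More than 3000 SNP loci were reviewed, and 499 loci associated with pre-eclampsia, lipid metabolism, endocrine disease, hypertension, immunity, tumors, and related conditions were selected to form the gene combination of the invention, which is used mainly to predict pre-eclampsia risk.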
The gene combination of the invention for predicting pre-eclampsia risk comprises 499 SNP loci: the same 499 loci, rs1111875 through rs1004467, that are listed in full earlier in this description.
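The pre-eclampsia risk prediction model of the invention, constructed with the above gene combination, mainly comprises a data preprocessing module, a model construction module, and a model error-rate calculation module. The functions of the modules are as follows:
The data preprocessing module converts the 499 susceptibility genes and 46 clinical test items obtained into numeric feature vectors, represents each sample by its numeric feature vector, and labels each sample with its disease status.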
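The 46 clinical test items are:
Medical history and physical examination: age, height, history of hypertension, artificial insemination, twin pregnancy, pre-pregnancy weight, pre-pregnancy BMI, weight during pregnancy, BMI during pregnancy, gestational weight gain, difference between pregnancy and pre-pregnancy BMI, and IUGR.
Laboratory tests: hemoglobin (HBG), white blood cells (WBC), neutrophils (NE), platelet distribution width (PDW), mean platelet volume (MPV), thyroid-stimulating hormone (TSH), free triiodothyronine (FT3), free thyroxine (FT4), cholesterol (TCH), triglycerides (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), TG/HDL, international normalized ratio (INR), prothrombin activity (PTA), prothrombin time (PT), thrombin time (TT), prothrombin time ratio (PTR), fibrinogen (FBG), activated partial thromboplastin time (APTT), blood type, serum potassium, serum calcium, serum sodium, serum chloride, blood glucose (biochemistry), anticardiolipin IgM antibody positivity, anticardiolipin IgG antibody positivity, anti-β2-glycoprotein positivity, antinuclear antibody (ANA) positivity, anticardiolipin IgA antibody positivity, urine ketones, urine glucose, and urine bilirubin (BIL) positivity.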
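The model construction module builds a model from the samples' numeric feature vectors according to the construction rules of the random forest algorithm, generating a random forest model containing 1000 decision trees.
The model error-rate calculation module works as follows: when generating each decision tree, the random forest draws samples randomly and with replacement, so about 1/3 of the samples are not drawn for each tree. These undrawn samples are that tree's out-of-bag (OOB) samples and are used as the test set to calculate the error rate of the random forest model.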
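The method of the invention for constructing a pre-eclampsia risk prediction model mainly comprises the following steps:
Step 1: Data preprocessing
The 499 susceptibility genes and 46 clinical test items obtained are converted into numeric feature vectors, each sample is represented by its numeric feature vector, and each sample is labeled with its disease status.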
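The 46 clinical test items are:
Medical history and physical examination: age, height, history of hypertension, artificial insemination, twin pregnancy, pre-pregnancy weight, pre-pregnancy BMI, weight during pregnancy, BMI during pregnancy, gestational weight gain, difference between pregnancy and pre-pregnancy BMI, and IUGR.
Laboratory tests: hemoglobin (HBG), white blood cells (WBC), neutrophils (NE), platelet distribution width (PDW), mean platelet volume (MPV), thyroid-stimulating hormone (TSH), free triiodothyronine (FT3), free thyroxine (FT4), cholesterol (TCH), triglycerides (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), TG/HDL, international normalized ratio (INR), prothrombin activity (PTA), prothrombin time (PT), thrombin time (TT), prothrombin time ratio (PTR), fibrinogen (FBG), activated partial thromboplastin time (APTT), blood type, serum potassium, serum calcium, serum sodium, serum chloride, blood glucose (biochemistry), anticardiolipin IgM antibody positivity, anticardiolipin IgG antibody positivity, anti-β2-glycoprotein positivity, antinuclear antibody (ANA) positivity, anticardiolipin IgA antibody positivity, urine ketones, urine glucose, and urine bilirubin (BIL) positivity.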
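The description does not say how genotypes and clinical values are turned into numbers. The sketch below shows one plausible preprocessing step under stated assumptions: each SNP is coded as the count (0, 1, or 2) of a chosen effect allele, a missing call is coded as -1, and clinical items are passed through as floats. The function names, the effect alleles, the feature names, and the label coding are all hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Optional

def encode_genotype(genotype: Optional[str], effect_allele: str) -> float:
    """Count effect alleles in a two-letter genotype such as 'AG'.

    Assumption: additive 0/1/2 coding; a missing or malformed call becomes -1.0.
    """
    if genotype is None or len(genotype) != 2:
        return -1.0
    return float(sum(1 for allele in genotype.upper()
                     if allele == effect_allele.upper()))

def build_feature_vector(
    snp_calls: Dict[str, Optional[str]],
    effect_alleles: Dict[str, str],
    snp_order: List[str],
    clinical: Dict[str, float],
    clinical_order: List[str],
) -> List[float]:
    """Concatenate encoded SNPs and clinical values in a fixed column order."""
    genetic_part = [encode_genotype(snp_calls.get(rs), effect_alleles[rs])
                    for rs in snp_order]
    clinical_part = [float(clinical[name]) for name in clinical_order]
    return genetic_part + clinical_part

# Toy example: two of the 499 loci and two of the 46 clinical items.
snp_order = ["rs1111875", "rs3764650"]
effect_alleles = {"rs1111875": "C", "rs3764650": "G"}  # hypothetical choices
clinical_order = ["age", "pre_pregnancy_bmi"]          # hypothetical names

sample = build_feature_vector(
    {"rs1111875": "CT", "rs3764650": "GG"},
    effect_alleles,
    snp_order,
    {"age": 31, "pre_pregnancy_bmi": 22.4},
    clinical_order,
)
label = 1  # hypothetical coding: 1 = pre-eclampsia, 0 = control
print(sample)  # [1.0, 2.0, 31.0, 22.4]
```

With the full panel, the same function would yield a vector of 499 + 46 = 545 entries per sample. Categorical items such as blood type would need their own coding, which this sketch does not cover.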
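Step 2: Model construction
A model is built from the samples' numeric feature vectors according to the construction rules of the random forest algorithm, generating a random forest model containing 1000 decision trees.
Specifically:
The total number of samples is 401: 97 patients with early-onset pre-eclampsia form disease group 1, 107 patients with late-onset pre-eclampsia form disease group 2, and 197 women with normal pregnancies form the control group.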
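(1) Training
The training set is built by random sampling with replacement: the 401 samples are sampled with replacement 401 times, and the numeric feature vectors of the drawn samples are used to build one decision tree.
(2) Selecting the features at each node of the decision tree
Let M be the number of features of the input samples. When a node is split, m features (m << M) are first selected from the M features, and the best split point among those m features is used for the split.
(3) Completing the growth of a single decision tree
Each decision tree is grown as far as possible, without pruning.
(4) Generating the random forest model from multiple decision trees
The generated decision trees are combined into a random forest model containing 1000 decision trees.
(5) Predicting the result
The random forest algorithm in the invention builds 1000 decision trees in total. The prediction of every decision tree is tallied, and the class that receives the most votes is taken as the final prediction.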
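Sub-steps (1), (2), and (5) above can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the implementation used in the invention: the helper names are hypothetical, tree growth itself is omitted, and m = floor(sqrt(M)) is a common default that the description does not specify (it only requires m << M).

```python
import math
import random
from collections import Counter
from typing import List, Sequence

def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Sub-step (1): draw n_samples indices with replacement for one tree."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def candidate_features(n_features: int, rng: random.Random) -> List[int]:
    """Sub-step (2): pick m << M distinct feature indices for one node split.

    Assumption: m = floor(sqrt(M)), a widely used default for classification.
    """
    m = max(1, math.isqrt(n_features))
    return rng.sample(range(n_features), m)

def majority_vote(tree_predictions: Sequence[int]) -> int:
    """Sub-step (5): return the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)                         # fixed seed for repeatability
in_bag = bootstrap_indices(401, rng)           # 401 draws from 401 samples
features = candidate_features(499 + 46, rng)   # M = 545 features in total

print(len(in_bag), len(features))              # 401 23
print(majority_vote([1, 0, 1, 1, 0]))          # 1
```

In a full forest, `bootstrap_indices` would be called once per tree (1000 times here) and `candidate_features` once per node split, so that the trees differ from one another both in their training samples and in the features they consider.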
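Step 3: Model error-rate calculation
When generating each decision tree, the random forest draws samples randomly and with replacement, so about 1/3 of the samples are not drawn for each tree. These undrawn samples are that tree's OOB (out-of-bag) samples and are used as the test set to calculate the error rate of the random forest model.
Specifically: the roughly 1/3 of samples that were not drawn are classified by the random forest, the predicted classes are compared with the true values, and the model error rate is computed.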
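A minimal sketch of the out-of-bag error calculation follows, assuming the usual convention that each sample is voted on only by the trees that did not see it during training. The function name, its arguments, and the stub predictor are hypothetical; the toy data are invented solely to make the example runnable.

```python
from collections import Counter
from typing import Callable, List, Set

def oob_error(
    labels: List[int],
    in_bag_sets: List[Set[int]],
    tree_predict: Callable[[int, int], int],
) -> float:
    """Estimate the forest's error rate from out-of-bag votes.

    labels[i]         true class of sample i
    in_bag_sets[t]    indices of the samples tree t was trained on
    tree_predict(t,i) class predicted by tree t for sample i
    """
    wrong = 0
    scored = 0
    for i, truth in enumerate(labels):
        # Collect votes only from trees for which sample i is out of bag.
        votes = [tree_predict(t, i)
                 for t, bag in enumerate(in_bag_sets) if i not in bag]
        if not votes:
            continue  # sample was in bag for every tree, so it cannot be scored
        predicted = Counter(votes).most_common(1)[0][0]
        scored += 1
        wrong += int(predicted != truth)
    return wrong / scored if scored else float("nan")

# Toy data: 4 samples, 3 trees, and a stub predictor that ignores the tree.
labels = [0, 1, 1, 0]
in_bag_sets = [{0, 1}, {1, 2}, {0, 3}]

def stub_predict(tree: int, sample: int) -> int:
    return 1 if sample in (1, 2, 3) else 0  # wrong only for sample 3

print(oob_error(labels, in_bag_sets, stub_predict))  # 0.25
```

Because every sample is scored only by trees that never trained on it, this estimate behaves like a held-out test error without requiring a separate validation split.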
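The pre-eclampsia risk prediction model of the invention was applied to 401 clinical samples. According to disease status, 97 patients with early-onset pre-eclampsia formed disease group 1, 107 patients with late-onset pre-eclampsia formed disease group 2, and 197 women with normal pregnancies formed the control group. The results are shown in Figs. 1 to 3.
As shown in Fig. 1, the model predicts early-onset pre-eclampsia with an accuracy of up to 94%. As shown in Fig. 2, it predicts late-onset pre-eclampsia with an accuracy of up to 99%. As shown in Fig. 3, it predicts early-onset plus late-onset pre-eclampsia with an accuracy of up to 94%. In summary, the prediction accuracy of the pre-eclampsia risk prediction model of the invention is 94-99%.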
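Accuracy, as reported for Figs. 1 to 3, and sensitivity, as cited among the advantages, are different quantities. The sketch below is an added illustration that uses made-up toy counts, not any result from this document, to show how each would be computed from a confusion matrix. The function names are hypothetical.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cases that the model flagged (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of controls that the model cleared (true negative rate)."""
    return tn / (tn + fp)

# Toy counts only: 10 cases (8 detected) and 10 controls (9 cleared).
tp, fn, tn, fp = 8, 2, 9, 1

print(accuracy(tp, fp, tn, fn))  # 0.85
print(sensitivity(tp, fn))       # 0.8
print(specificity(tn, fp))       # 0.9
```

Reporting sensitivity and specificity alongside accuracy matters when the groups are unequal in size, as they are in the 97/107/197 split described above.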
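The invention discloses a gene combination for predicting pre-eclampsia risk, a pre-eclampsia risk prediction model, and a method for constructing the model. Those skilled in the art may draw on this disclosure and implement it with appropriately modified process parameters. It should be noted in particular that all similar substitutions and modifications are obvious to those skilled in the art and are deemed to be included in the invention. The product of the invention has been described through preferred embodiments, and those concerned can clearly modify or appropriately change and combine the product described herein, without departing from the content, spirit, and scope of the invention, in order to implement and apply its technology.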

Claims (6)

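1. A device for constructing a pre-eclampsia risk prediction model, characterized by comprising:
a data preprocessing module, which converts the 499 susceptibility genes and 46 clinical test items obtained into numeric feature vectors, represents each sample by its numeric feature vector, and labels each sample with its disease status;
a model construction module, which builds a model from the samples' numeric feature vectors according to the construction rules of the random forest algorithm, generating a random forest model containing 1000 decision trees;
a model error-rate calculation module, wherein, when generating each decision tree, the random forest draws samples randomly and with replacement, 1/3 of the samples are not drawn for each decision tree, these undrawn samples are that tree's out-of-bag (OOB) samples, and they are used as the test set to calculate the error rate of the random forest model;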
wherein the 499 susceptibility genes are: rs1111875, rs3764650, rs890293, rs4934, rs2230806, rs405509, rs12983082, rs2516049, rs10830963, rs2000813, rs7202116, rs3764261, rs7079, rs1805094, rs8178847, rs1421811, rs715987, rs8096897, rs574957, rs10744835, rs1991391, rs7977406, rs11066188, rs2036914, rs8099917, rs12979860, rs10048158, rs1801689, rs9264942, rs9277535, rs2523393, rs2230204, rs7107152, rs12086634, rs1242229, rs2236242, rs3779512, rs1864163, rs1549758, rs5491, rs7578597, rs7209395, rs6042935, rs121912702, rs155524, rs7640747, rs7926335, rs743395, rs1800206, rs11977526, rs3804099, rs1799750, rs1049337, rs28362491, rs10120688, rs1051740, rs2167270, rs7804372, rs11568821, rs12203592, rs1799724, rs1805192, rs2066847, rs1800451, rs5030737, rs4766578, rs4149056, rs4675378, rs281874770, rs4321325, rs11018628, rs1800794, rs1805010, rs780093, rs3853839, rs7805969, rs7574865, rs8178822, rs333, rs2384550, rs925489, rs57659670, rs853326, rs180223, rs1472565, rs2596623, rs231779, rs4850755, rs6885099, rs1800693, rs2046045, rs4704397, rs1137617, rs10733113, rs653178, rs10774625, rs2517532, rs9910950, rs8178848, rs16950081, rs4925295, rs963167, rs2647528, rs2319125, rs4581, rs2288493, rs79154414, rs2277698, rs2279744, rs2217332, rs8178819, rs6906021, rs2736340, rs4915077, rs1523127, rs1554973, rs17145713, rs1041981, rs11538264, rs204999, rs2476601, rs6457452, rs9277534, rs3099844, rs4149584, rs492899, rs17630235, rs9322331, rs556442, rs3890182, rs10938397, rs4773724, rs1275988, rs1558902, rs1800437, rs17577, rs13385, rs2074311, rs642858, rs11651270, rs2237895, rs231840, rs1799999, rs2292239, rs7163757, rs2670660, rs11066280, rs7178572, rs4721, rs391300, rs4828038, rs7713645, rs3087243, rs17817449, rs455060, rs2075290, rs13702, rs3021094, rs1260326, rs2237892, rs1550805, rs6259, rs2083637, rs2779248, rs1137933, rs3847987, rs77060950, rs7709243, rs7770619, rs1801278, rs7903146, rs4506565, rs17681684, rs17696736, rs1050828, rs231357, rs199972616, rs4416547, rs10509305, rs12762303, rs2280275, rs2014408, rs1532624, rs3746429, rs1926723, rs11105354, rs7439293, rs445925, rs3850641, rs7194256, rs867186, rs3184504, rs5743293, rs825476, rs7580658, rs1801693, rs854560, rs1056917, rs61996318, rs9898, rs13146272, rs4311994, rs1654433, rs11105357, rs4148189, rs4576240, rs11751198, rs11968400, rs11970154, rs12210887, rs13118, rs2075015, rs2285321, rs2394392, rs2508015, rs2523995, rs2524272, rs25527, rs28986465, rs29243, rs3130663, rs3130685, rs3132584, rs64036, rs6457254, rs6924270, rs6933400, rs6937357, rs73055442, rs7749235, rs9261800, rs9267547, rs9468805, rs9500864, rs1441756, rs15285, rs9941065, rs2596574, rs11085421, rs3785617, rs78010183, rs2292354, rs289717, rs247616, rs879922, rs1799945, rs2097055, rs662, rs1532085, rs3757354, rs3813082, rs326, rs662799, rs12597002, rs4131229, rs964184, rs13105517, rs1260333, rs2075291, rs5104, rs670, rs17482753, rs2043085, rs4969168, rs7350481, rs12678919, rs6671879, rs4743771, rs328, rs285, rs6720173, rs10096633, rs301, rs1003723, rs2303790, rs2266788, rs429358, rs10503669, rs1051931, rs7756935, rs10790162, rs5174, rs562556, rs4253728, rs2269702, rs66698963, rs9326246, rs7016880, rs1801394, rs1805087, rs9852991, rs12230074, rs9815354, rs6768438, rs957525, rs9816772, rs11024074, rs10506974, rs3754777, rs1173771, rs13394970, rs17249754, rs176185, rs2070759, rs6410, rs11105378, rs2586886, rs2004776, rs11191548, rs35444, rs6433027, rs3749585, rs16998073, rs1378942, rs2681492, rs6495122, rs12413409, rs7726475, rs5049, rs1401982, rs1105378, rs16948048, rs11014166, rs11568020, rs381815, rs11065987, rs1131882, rs11572325, rs880315, rs4612666, rs651007, rs671, rs10507391, rs3135506, rs10743565, rs1426409, rs193921036, rs200898934, rs201223301, rs374976508, rs7080, rs2237076, rs3765407, rs3820059, rs9831647, rs1862176, rs9939609, rs9638978, rs9393931, rs9381475, rs9380142, rs9340799, rs927332, rs909253, rs846910, rs843010, rs842991, rs836135, rs836132, rs821466, rs7963771, rs7943316, rs763780, rs7579169, rs7571613, rs7564968, rs7412, rs732609, rs699947, rs699, rs698090, rs6802220, rs661348, rs6594013, rs6550005, rs6489992, rs6478974, rs6269, rs6025, rs56124946, rs5442, rs5051, rs4842666, rs4818, rs479200, rs4784744, rs4769613, rs4762, rs4633, rs4289236, rs4150196, rs3918227, rs3905000, rs3819526, rs3812475, rs3803012, rs3801266, rs3783550, rs3773663, rs3773640, rs3761548, rs3735481, rs366510, rs35821928, rs3025039, rs2954033, rs2854371, rs284277, rs2681472, rs266729, rs2638953, rs261334, rs2596622, rs25648, rs2549782, rs233115, rs2322659, rs231775, rs2297518, rs2287848, rs2287845, rs2275913, rs2271037, rs2241766, rs2236852, rs2236711, rs2234693, rs2232365, rs2230820, rs222133, rs2200733, rs2161983, rs2074611, rs2070744, rs2059806, rs2010963, rs1991515, rs193741, rs1884082, rs1805388, rs1805017, rs1801133, rs1801131, rs1800896, rs1800872, rs1800629, rs1800469, rs1800450, rs1799983, rs1799963, rs1799889, rs17783344, rs17686866, rs1710, rs16972197, rs16972194, rs1695, rs16846876, rs1610696, rs1570360, rs1501299, rs1424954, rs1358340, rs13429458, rs1341667, rs13405728, rs13401889, rs1319501, rs12831006, rs12711941, rs12707079, rs12579302, rs12150550, rs12150220, rs1205, rs12035521, rs11895934, rs11792480, rs11646213, rs1155708, rs115015150, rs1143627, rs1130409, rs11209026, rs11190179, rs11129420, rs11105368, rs11105364, rs11105328, rs111033530, rs10898392, rs10889677, rs10811661, rs10739778, rs1063320, rs1014064, rs10121110, rs1010, rs1004467.
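2. The device according to claim 1, characterized in that the 46 clinical test items are:
age, height, history of hypertension, artificial insemination, twin pregnancy, pre-pregnancy weight, pre-pregnancy BMI, weight during pregnancy, BMI during pregnancy, gestational weight gain, difference between pregnancy and pre-pregnancy BMI, IUGR, hemoglobin, white blood cells, neutrophils, platelet distribution width, mean platelet volume, thyroid-stimulating hormone, free triiodothyronine (T3), free thyroxine (T4), cholesterol, triglycerides, high-density lipoprotein, low-density lipoprotein, TG/HDL, international normalized ratio, prothrombin activity, prothrombin time, thrombin time, prothrombin time ratio, fibrinogen, activated partial thromboplastin time, blood type, serum potassium, serum calcium, serum sodium, serum chloride, blood glucose (biochemistry), anticardiolipin IgM antibody positivity, anticardiolipin IgG antibody positivity, anti-β2-glycoprotein positivity, antinuclear antibody positivity, anticardiolipin IgA antibody positivity, urine ketones, urine glucose, and urine bilirubin positivity.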
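3. A method for constructing a pre-eclampsia risk prediction model, characterized by comprising the following steps:
Step 1: Data preprocessing
The 499 susceptibility genes and 46 clinical test items obtained are converted into numeric feature vectors, each sample is represented by its numeric feature vector, and each sample is labeled with its disease status;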
wherein the 499 susceptibility genes are the same 499 SNP loci, rs1111875 through rs1004467, recited in full in claim 1;
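Step 2: Model construction
According to the model-construction rules of the random forest algorithm, a model is built from the numeric feature vectors corresponding to the samples, generating a random forest model containing 1000 decision trees;
Step 3: Calculation of the model error rate
When generating each decision tree, the random forest draws samples randomly and with replacement, so that for each decision tree 1/3 of the samples are not drawn; these undrawn samples are that decision tree's out-of-bag samples, and they are used as a test set to calculate the error rate of the random forest model.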
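The 1/3 figure in Step 3 can be checked numerically. The following is a minimal sketch, not part of the claimed method: it assumes each tree draws n samples with replacement from n samples, so a given sample is left out with probability (1 - 1/n)^n, which is roughly 0.367 for n = 401. The function names are hypothetical and chosen for illustration only.

```python
import random

def expected_oob_fraction(n: int) -> float:
    """Probability that a given sample is never picked in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n

def simulate_oob_fraction(n: int, seed: int = 0) -> float:
    """Draw n indices with replacement once and return the share never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return (n - len(drawn)) / n

n_samples = 401  # total sample count stated in the claims
print(round(expected_oob_fraction(n_samples), 4))
print(simulate_oob_fraction(n_samples))
```

The simulated share varies from tree to tree, so "1/3" is best read as an approximation of the expected out-of-bag share rather than an exact count.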
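4. The method according to claim 3, characterized in that, in Step 1, the 46 clinical test data items are:
age, height, history of hypertension, artificial insemination, twin pregnancy, pre-pregnancy weight, pre-pregnancy BMI, weight during pregnancy, BMI during pregnancy, gestational weight gain, difference between pregnancy and pre-pregnancy BMI, IUGR, hemoglobin, white blood cells, neutrophils, platelet distribution width, mean platelet volume, thyroid-stimulating hormone, free T3, free T4, cholesterol, triglycerides, high-density lipoprotein, low-density lipoprotein, TG/HDL, international normalized ratio, prothrombin activity, prothrombin time, thrombin time, prothrombin time ratio, fibrinogen, activated partial thromboplastin time, blood type, serum potassium, serum calcium, serum sodium, serum chloride, blood glucose (biochemical assay), anticardiolipin IgM antibody positivity, anticardiolipin IgG antibody positivity, anti-β2-glycoprotein positivity, antinuclear antibody positivity, anticardiolipin IgA antibody positivity, urine ketones, urine glucose, and urine bilirubin positivity.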
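5. The method according to claim 3, characterized in that the specific procedure of Step 2 is as follows:
The total number of samples is 401: 97 patients with early-onset preeclampsia form disease group 1, 107 patients with late-onset preeclampsia form disease group 2, and 197 women with normal pregnancies form the control group;
(1) Training
The training process builds the training set by random sampling with replacement: the 401 samples are sampled with replacement 401 times, and the numeric feature vectors corresponding to the drawn samples are used to build one decision tree;
(2) Selecting the feature at each node of the decision tree
Let M be the number of features of the input samples; when each node is split, m features are first selected from these M features, and the best split point among the m features is chosen for the split;
(3) Completing the growth of a single decision tree;
(4) Generating the random forest model from multiple decision trees
The generated decision trees are combined to produce a random forest model containing 1000 decision trees;
(5) Result prediction
The prediction of every decision tree in the random forest is tallied, and the best prediction is selected from these predictions by voting as the final prediction.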
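Sub-steps (1) to (5) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the patentees' implementation: the claims do not give the value of m or the split criterion, so the default m = floor(sqrt(M)) and the Gini impurity used here are assumptions, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, rows, feats):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini([y[i] for i in rows])
    for f in feats:
        for t in sorted({X[i][f] for i in rows}):
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow(X, y, rows, m, rng):
    """Sub-steps (2)-(3): each node examines m randomly chosen features out of M."""
    feats = rng.sample(range(len(X[0])), m)
    split = best_split(X, y, rows, feats)
    if split is None:  # no split improves impurity: return a leaf label
        return Counter(y[i] for i in rows).most_common(1)[0][0]
    f, t = split
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (f, t, grow(X, y, left, m, rng), grow(X, y, right, m, rng))

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=1000, m=None, seed=0):
    """Sub-steps (1) and (4): one bootstrap sample of size n per tree."""
    rng = random.Random(seed)
    n, M = len(X), len(X[0])
    m = m or max(1, int(M ** 0.5))  # assumed default; the claims leave m open
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        forest.append(grow(X, y, rows, m, rng))
    return forest

def predict_forest(forest, x):
    """Sub-step (5): majority vote over all trees."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]

# Invented toy data: two features, two classes (not the patent's data).
X = [[1.0, 5.0], [1.2, 4.9], [1.1, 4.8], [3.0, 1.1], [2.9, 1.0], [2.8, 1.2]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y, n_trees=25, seed=1)
print(predict_forest(forest, [0.9, 5.2]), predict_forest(forest, [3.1, 0.9]))
```

In the setting described by the claims, `X` would hold the 401 numeric feature vectors, `y` the three group labels, and `n_trees` would be 1000. A production model would more likely rely on an established library than on a hand-written tree.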
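6. The method according to claim 3, characterized in that the specific procedure of Step 3 is as follows:
The 1/3 of samples that were not drawn are classified by the random forest algorithm, and the predicted classes are then compared with the true values to obtain the model error rate.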
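One common way to carry out this comparison is sketched below. It assumes the usual out-of-bag convention, which the claim does not spell out: each sample is voted on only by the trees that did not draw it, and the error rate is the share of such samples whose majority vote differs from the true label. The function name, the `(predict_fn, in_bag_indices)` representation of a tree, and the toy stumps are hypothetical.

```python
from collections import Counter

def oob_error_rate(trees, X, y):
    """trees: list of (predict_fn, in_bag_indices) pairs; returns the OOB error rate."""
    wrong = scored = 0
    for i, (x, truth) in enumerate(zip(X, y)):
        # Only trees that never saw sample i during training may vote on it.
        votes = Counter(predict(x) for predict, in_bag in trees if i not in in_bag)
        if not votes:
            continue  # sample was in-bag for every tree, so it cannot be scored
        scored += 1
        if votes.most_common(1)[0][0] != truth:
            wrong += 1
    return wrong / scored if scored else float("nan")

# Invented toy example: three hand-made stumps with their in-bag index sets.
X = [[1], [2], [3], [4]]
y = [0, 0, 1, 1]
trees = [
    (lambda x: int(x[0] > 2.5), {0, 1, 2}),  # out-of-bag: sample 3
    (lambda x: int(x[0] > 3.5), {0, 3}),     # out-of-bag: samples 1 and 2
    (lambda x: int(x[0] > 1.5), {1, 2, 3}),  # out-of-bag: sample 0
]
print(oob_error_rate(trees, X, y))  # 0.25: only sample 2 is misclassified
```

Because every sample is scored solely by trees that never trained on it, the resulting rate serves as an internal test-set estimate without holding out separate data.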
CN202110510509.0A 2021-05-11 2021-05-11 Gene combination for predicting preeclampsia risk, preeclampsia risk prediction model, and construction method thereof Active CN113223714B (zh)

Publications (2)

Publication Number Publication Date
CN113223714A CN113223714A (zh) 2021-08-06
CN113223714B CN113223714B (zh) 2022-07-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant