CN114045333B - Method for predicting age by pyrosequencing and random forest regression analysis - Google Patents
- Publication number: CN114045333B
- Application number: CN202111223180.6A
- Authority: CN (China)
- Prior art keywords: CpG, gene, random forest, age, DNA
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6858: Allele-specific amplification
- G16B20/30: Detection of binding sites or motifs
- G16B25/20: Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- C12Q2600/154: Methylation markers
Abstract
The invention provides a method for age prediction that combines pyrosequencing with random forest regression analysis. The random forest regression model is built with the R package randomForest, and the optimal combination of sites is determined by forward selection. The method requires only 0.1 ng of template DNA, so it can be applied to difficult forensic bloodstain samples; the whole process can be completed within 10 hours; to account for sex differences, two independent age prediction models are built, one per sex; and with only 3-4 CpG sites the prediction accuracy reaches MAD < 3 years.
Description
Technical field
The invention belongs to the field of forensic medicine and specifically relates to a method for age prediction using pyrosequencing and random forest regression analysis.
Background
Estimating the age of the donor of an unknown sample is one of the most important tools in forensic investigation. It narrows the pool of suspects and complements the prediction of externally visible characteristics and the inference of biogeographic ancestry. Established age classification methods rely on morphological analysis of skeletal features. When solid tissues such as bones and teeth are available, age can be determined accurately by anthropological methods. In practice, however, such methods are hard to apply because other materials, such as body fluids, are encountered far more often in forensic investigations. More recently, several molecular methods for age estimation have been proposed, including telomere length analysis, age-dependent deletions in mitochondrial DNA, T-cell DNA rearrangements, and protein alterations such as aspartic acid racemization and advanced glycation end products. All of these methods have limitations that restrict their use at crime scenes, in particular low accuracy and strict sample requirements. For example, age prediction based on quantification of signal-joint T-cell receptor excision circles (sjTRECs) has a standard error of ±8.0 years.
One possible alternative is to measure epigenetic modifications such as methylation, which are now known to change with age. To date, forensic age prediction studies have focused mainly on whole blood samples, reporting mean absolute deviations (MAD) of 3-10 years, mostly with multiple linear regression models. A few studies have used machine learning algorithms such as support vector machines (SVM), artificial neural networks (ANN) and random forest regression (RFR) and achieved relatively low prediction errors (3.24-4.7 years); however, these studies were carried out only on fresh body fluids. Moreover, age prediction from stains, which are more common in crime scene investigation, has not yet been studied systematically.
The invention therefore aims to establish a sensitive, fast and reliable age prediction method, based on pyrosequencing and a random forest regression model, that is applicable to a variety of sample types including bloodstains.
Summary of the invention
The invention screens the genome sequence for a set of DNA methylation age prediction sites suited to forensic casework samples, designs primers for each site, measures the methylation level of each site by pyrosequencing, and then uses random forest regression to build separate age prediction models for males and females. The goal is a sensitive, fast and reliable age prediction method that stays highly accurate while using few sites. The method can be used for age prediction from bloodstains and similar samples, and the DNA extraction, primer design and sequencing protocols have all been optimized.
Terms:
RFR: Random Forest Regression.
SVR: Support Vector Regression.
MAD: Mean Absolute Deviation, the mean of the absolute differences between predicted and actual age (also referred to in this document as the mean absolute error).
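As a minimal sketch of how this metric can be computed, assuming MAD here means the mean of |predicted age - actual age| over all samples. The function name and the ages below are illustrative assumptions, not data from the patent.

```python
def mean_absolute_deviation(predicted, actual):
    """Mean of |predicted - actual| over paired age values, in years."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical predicted and true ages (years), for illustration only.
predicted_ages = [24.0, 41.5, 63.0, 18.5]
actual_ages = [22.0, 45.0, 60.0, 19.0]
mad = mean_absolute_deviation(predicted_ages, actual_ages)
```

A model meets the "MAD < 3 years" target of the invention when this value, computed on held-out samples, is below 3.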
In one aspect, the invention provides a method for age prediction.
The method comprises pyrosequencing and random forest regression analysis; the random forest regression model is built with the R package randomForest, and the optimal combination of sites is determined by forward selection.
Parameter settings for building the random forest regression model: mtry equals the number of CpG sites used in each model, the minimum node size is 5, and the number of trees is 1000.
The age-related DNA methylation markers used by the random forest regression model are located in the ELOVL2, C1orf132, TRIM59, KLF14, FHL2 and NPTX2 genes.
When building the age prediction models by random forest regression, seven age-related DNA methylation sites are used in total: three for males (TRIM59.pos7, KLF14.pos2 and ELOVL2.pos7) and four for females (TRIM59.pos8, KLF14.pos3, C1orf132.pos2 and FHL2.pos6).
The volume of PCR product used for pyrosequencing is 12 μL.
In some embodiments, the method comprises the following steps:
(1) DNA extraction;
(2) bisulfite conversion;
(3) PCR;
(4) pyrosequencing;
(5) model prediction.
In another aspect, the invention provides a gene panel for age prediction by random forest regression analysis.
The gene panel comprises ELOVL2, C1orf132, TRIM59, KLF14, FHL2 and NPTX2.
The methylation sites comprise the male-related sites TRIM59.pos7, KLF14.pos2 and ELOVL2.pos7 and the female-related sites TRIM59.pos8, KLF14.pos3, C1orf132.pos2 and FHL2.pos6.
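The sex-specific site panels can be represented as a simple lookup that assembles the model input for one sample. This is only a sketch: the site names come from the text (with the gene spelled C1orf132), while the dictionary layout, the function name and the methylation values are illustrative assumptions.

```python
# Site names as listed in the text; the data layout is an assumption.
SITES_BY_SEX = {
    "male": ["TRIM59.pos7", "KLF14.pos2", "ELOVL2.pos7"],
    "female": ["TRIM59.pos8", "KLF14.pos3", "C1orf132.pos2", "FHL2.pos6"],
}


def feature_vector(sex, methylation):
    """Return methylation values, in panel order, for the given sex.

    `methylation` maps site name -> measured methylation level.
    Raises KeyError if the sex is unknown or a required site is missing.
    """
    return [methylation[site] for site in SITES_BY_SEX[sex]]


# Hypothetical pyrosequencing readout (fraction methylated) for one sample.
sample = {"TRIM59.pos7": 0.31, "KLF14.pos2": 0.05, "ELOVL2.pos7": 0.62}
male_features = feature_vector("male", sample)
```

Keeping the two panels separate mirrors the design choice of one independent model per sex: each model only ever sees the sites selected for it.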
In yet another aspect, the invention provides a set of primers for age prediction by random forest regression analysis.
The primers are used for pyrosequencing.
The primers and their sequencing sites are as follows:
In the primer sequences, F, R and S denote the forward primer, reverse primer and sequencing primer, respectively; "biotin" before a sequence indicates that the primer is biotin-labelled.
In a further aspect, the invention provides the use of the foregoing method and/or gene panel and/or methylation sites and/or primers in the preparation of a kit for predicting age.
In a further aspect, the invention provides a kit for predicting age.
The kit comprises the following primers:
The kit further comprises other reagents for pyrosequencing.
The kit is used together with a random forest regression model.
Beneficial effects of the invention:
(1) Only 0.1 ng of template DNA is required, so the method can be used on difficult forensic bloodstain samples
Many techniques, such as EpiTYPER, SNaPshot, pyrosequencing and massively parallel sequencing (MPS), can measure DNA methylation fairly accurately. A major factor limiting the forensic use of EpiTYPER is that it requires up to 1 μg of genomic DNA; such quantities are rarely obtainable in real crime scene investigations, where body fluid stains are far more common. Compared with EpiTYPER, MPS lowers the template requirement to 10 ng and SNaPshot needs 4 ng. In the present invention, accurate age prediction is possible with 0.1 ng of template DNA. Earlier studies indicated that successful methylation-based age prediction needs 10-20 ng of template DNA. The method of the invention therefore has the highest sensitivity among existing bloodstain assays and has good prospects for forensic application.
(2) The whole process can be completed within 10 hours
The method can be completed within one day, far faster than other available methods. DNA extraction/quantification, bisulfite conversion, PCR and pyrosequencing take 2 h, 2.5 h, 3 h and 2 h, respectively. By contrast, the standard procedures for both EpiTYPER and MPS take more than 2 days. MPS in particular needs specialized equipment and a complex bioinformatics analysis system and is hard to complete within 3 days.
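The "within 10 hours" figure can be checked by adding up the stated step durations. The sketch below assumes the four wet-lab steps run back to back and leaves out model prediction and hands-on transfer time, which the text does not quantify.

```python
# Step durations in hours, as stated in the text.
STEP_HOURS = {
    "DNA extraction/quantification": 2.0,
    "bisulfite conversion": 2.5,
    "PCR": 3.0,
    "pyrosequencing": 2.0,
}

total_hours = sum(STEP_HOURS.values())
within_claim = total_hours <= 10  # the claimed upper bound
```

The steps add up to 9.5 hours, leaving about half an hour of slack under the 10-hour limit.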
(3) Two independent, sex-specific age prediction models are built to account for sex differences
Random forest regression (RFR) was chosen to build the age prediction models, using three sites for males (TRIM59.pos7, KLF14.pos2 and ELOVL2.pos7) and four sites for females (TRIM59.pos8, KLF14.pos3, C1orf132.pos2 and FHL2.pos6), seven sites in total. The final models have a prediction mean absolute deviation (MAD) of 2.8 years (R = 0.99) for males and 2.93 years (R = 0.98) for females.
(4) With only 3-4 CpG sites, the prediction accuracy reaches MAD < 3 years
A meta-analysis of age prediction studies from recent years shows that almost all previously published age prediction models have a MAD above 3 years. Because it uses RFR, our model is the most efficient: MAD < 3 years with only 3-4 CpG sites (3 for male samples, 4 for female samples). An age prediction model that uses few sites yet remains highly accurate is more practical for forensic inference.
Description of the drawings
Figure 1 shows that random forest regression (RFR) outperforms support vector regression (SVR) in age prediction.
Figure 2 shows predicted versus actual age for the RFR test data set.
Figure 3 shows the sensitivity test of the seven methylation markers on trace amounts of DNA.
Figure 4 shows the correlation between methylation level and age at the seven CpG sites.
Figure 5 compares the accuracy of the method with age prediction methods from published studies.
Detailed description
The invention is described in further detail below with reference to specific examples. The examples only illustrate the invention and do not limit it. Unless otherwise stated, experimental methods for which no specific conditions are given follow conventional conditions, and the materials and reagents used are commercially available.
Example 1: DNA extraction and site screening
(1) DNA extraction:
The DNA extraction protocol was optimized to reduce the loss of trace DNA from bloodstains.
The accuracy of methylation analysis depends on extracting high-quality DNA from bloodstains. The QIAamp DNA Investigator Kit is regarded as one of the more reliable methods for extracting DNA from forensic samples and yields high-quality DNA within 2 hours. We optimized the kit protocol further by adding a short incubation at a higher temperature, adding carrier RNA to the lysis buffer, and heating the reagent used to dissolve the DNA.
The original protocol incubates the sample at 56°C for 1 hour. The modified protocol incubates the sample at 85°C for 10 minutes and then at 56°C for 1 hour, which increases the yield of DNA from bloodstains.
Heating the DNA-dissolving reagent also accelerates the release of cells from the bloodstain and improves DNA dissolution, reducing the loss of trace DNA.
(2) Site selection:
Based on the literature, six age-related DNA methylation markers, located in ELOVL2, C1orf132, TRIM59, KLF14, FHL2 and NPTX2, were selected so that the analysis focuses on age-related regions.
(3) Primer design:
Because DNA recovered from bloodstains is scarce and of low quality, the accuracy and sensitivity of the PCR are critical. PCR and sequencing primers were designed with PyroMark Assay Design version 2.0 (Qiagen, Germany). During design, the target sequence was adjusted so that the primers cover as many cytosines (C) as possible, allowing more methylation sites to be measured. SNPs and other polymorphisms in the target region were avoided because they can bias the sequencing reaction. In addition, potential methylation sites were excluded from the primer binding sequences, GC content was kept below 60%, and only highly specific primers (that is, primers that do not form primer dimers) were selected. Where necessary, published protocols were modified (for example, by adding dimethyl sulfoxide (DMSO) to prevent dimer formation). One of the two PCR primers must carry a biotin label at its 5' end so that it binds streptavidin-coated magnetic beads for the subsequent isolation and purification of single-stranded PCR product; the other primer must be unlabelled. Biotin-labelled primer preparations can contain free biotin, which competes with the template for binding to the streptavidin-coated beads and lowers the signal, so HPLC-purified biotin-labelled primers must be used. Amplicon lengths for the target genes range from 105 to 306 bp. The final primers are listed in the table below:
Table 1. PCR primers, pyrosequencing primers and CpG sequences for age-related methylation analysis
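One of the primer design rules, keeping GC content below 60%, is easy to check programmatically. The sketch below is only an illustration of that single rule: the function names are assumptions and the sequence is a made-up example, not one of the patent's primers.

```python
def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_rule(seq, max_gc=0.60):
    """True if GC content is strictly below the threshold (60% in the text)."""
    return gc_fraction(seq) < max_gc


# Hypothetical primer sequence, for illustration only.
example_primer = "AGGTTTGTAGATTTGGAGTTAG"
primer_ok = passes_gc_rule(example_primer)
```

The other criteria described for the primers (no SNPs, no CpG sites in the binding region, no primer dimers) need sequence context and are not covered by this check.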
Example 2: Detection of DNA methylation by pyrosequencing
(1) Bisulfite conversion
The extracted DNA (40 μL) was bisulfite-converted with the EpiTect Fast DNA Bisulfite Kit (Qiagen, Germany). Each DNA sample was mixed with the CT conversion reagent from the bisulfite kit to a final volume of 140 μL, incubated at 95°C for 5 minutes and at 60°C for 20 minutes, and then purified.
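To make clear why this step exposes methylation, the sketch below simulates bisulfite conversion in silico: unmethylated cytosines are deaminated to uracil and read as T after PCR, while methylated cytosines stay C. The function name, the input sequence and the methylated positions are illustrative assumptions.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion followed by PCR on one strand.

    Unmethylated C is read as T; C at a 0-based index listed in
    `methylated_positions` is protected and stays C.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical template with CpG cytosines at indices 1 and 5;
# only the first CpG is treated as methylated.
template = "ACGTCCGA"
converted = bisulfite_convert(template, methylated_positions={1})
```

After conversion, the C/T ratio observed at a CpG position reflects the fraction of template molecules that were methylated there, and that ratio is the quantity the pyrosequencing step reports.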
(2) PCR
Each reaction (25 μL) contained 2 μL of bisulfite-converted DNA, 12.5 μL of PCR master mix (Qiagen, Germany) and 0.1-0.5 mM primers. Primer concentrations were adjusted to obtain specific products free of primer dimers. Thermal cycling conditions were: initial denaturation at 95°C for 10 minutes; 45 cycles of 95°C for 30 seconds, 56°C for 30 seconds (58°C for NPTX2) and 72°C for 30 seconds; and a final extension at 72°C for 5 minutes. PCR products were checked by agarose gel electrophoresis.
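The cycling program can be written down as data, which also gives the programmed hold time. The sketch below uses the temperatures and durations stated in the text. It ignores ramp times between temperatures, which depend on the instrument, and all names are illustrative assumptions.

```python
INITIAL_DENATURATION_S = 10 * 60   # 95 C for 10 min
CYCLES = 45
CYCLE_STEP_SECONDS = [30, 30, 30]  # denature, anneal, extend
FINAL_EXTENSION_S = 5 * 60         # 72 C for 5 min


def annealing_temp_c(gene):
    """Annealing temperature: 58 C for NPTX2, 56 C for the other genes."""
    return 58 if gene == "NPTX2" else 56


def program_minutes():
    """Total programmed hold time in minutes, excluding ramping."""
    total_s = (INITIAL_DENATURATION_S
               + CYCLES * sum(CYCLE_STEP_SECONDS)
               + FINAL_EXTENSION_S)
    return total_s / 60


hold_time_min = program_minutes()
```

The hold time comes to 82.5 minutes, so ramping and reaction setup account for the remainder of the roughly 3 hours quoted for the PCR step.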
(3) Pyrosequencing
Templates prepared from the biotin-labeled PCR products were sequenced on a PyroMark Q48 pyrosequencing instrument (Qiagen, Germany) with the Pyro-Gold kit (Qiagen, Germany). Earlier pyrosequencing protocols used 10 μL of PCR product, which produced unstable signals that could not be clearly distinguished from background. Our method increases the PCR product volume to 12 μL, which effectively avoids unstable signals.
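Pyrosequencing reports methylation at a CpG site as the share of the C signal in the combined C and T signals at that position after bisulfite conversion. The sketch below shows that calculation under stated assumptions: the function name is hypothetical and the signal values are made-up numbers, not measurements from this study.

```python
def methylation_percent(c_signal: float, t_signal: float) -> float:
    """Percent methylation at one CpG: C / (C + T) * 100."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("combined C and T signal must be positive")
    return 100.0 * c_signal / total


# Made-up peak heights for one CpG position.
print(methylation_percent(30.0, 70.0))
```

In practice the instrument software performs this calculation along with its own quality checks; the sketch only makes the definition of the reported percentage concrete.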
Example 3 Construction of the bloodstain age prediction model
(1) Comparing the age prediction accuracy of the SVR and RFR models
Our previous results showed that the SVR model is more accurate than multiple linear regression, multiple nonlinear regression and back-propagation neural networks. We therefore used SVR and RFR models, built on combinations of all 46 CpG sites, to establish the best-fitting age prediction model and calculate its prediction accuracy. The SVR model was built with the R package e1071, with parameters cost=2, gamma=0.8 and epsilon=0.1. The RFR model was built with the R package randomForest: the mtry parameter equaled the number of CpG sites in each model, the minimum node size was 5, and the number of trees was set to 1000.
To speed up computation, forward selection was used to determine the best combination of sites. The 241 bloodstain samples were prepared from whole blood of 241 healthy Han Chinese volunteers aged 10-79 years (128 males and 113 females); all donors gave informed consent, and the study received ethical approval from the Beijing Institute of Genomics, Chinese Academy of Sciences. From these samples, 70% were randomly drawn to form the training dataset, and the remaining 30% served as the test dataset for evaluating the accuracy of the RFR model. Training was repeated 100 times, and the best site (i.e., the one with the smallest MAD) was recorded each time. The site recorded most frequently was selected as a suitable site for the final model. In the two-site model, given the best site, the site that was recorded most frequently and gave the smallest MAD was taken as the second-best site.
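The first step of this selection procedure (repeated random 70/30 splits, with the winning site counted each time) can be sketched as follows. This is a simplified illustration and not the authors' implementation: a one-variable least-squares line stands in for the random forest, the MAD is assumed to be evaluated on the held-out 30% of each split, and the data, site names and function names are all hypothetical.

```python
import random
from collections import Counter


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (a stand-in for the RFR model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def mad(pred, truth):
    """Mean absolute deviation between predicted and true ages."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def best_site_by_frequency(data, ages, repeats=100, train_frac=0.7, seed=0):
    """Count how often each site gives the lowest MAD over random splits."""
    rng = random.Random(seed)
    n = len(ages)
    wins = Counter()
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(train_frac * n)
        train, test = idx[:cut], idx[cut:]
        scores = {}
        for site, values in data.items():
            a, b = fit_line([values[i] for i in train], [ages[i] for i in train])
            pred = [a + b * values[i] for i in test]
            scores[site] = mad(pred, [ages[i] for i in test])
        wins[min(scores, key=scores.get)] += 1
    return wins.most_common(1)[0][0], wins


# Synthetic data: "siteA" tracks age, "siteB" is pure noise (both hypothetical).
rng = random.Random(1)
ages = [rng.uniform(10, 79) for _ in range(241)]
data = {
    "siteA": [0.5 * a + rng.gauss(0, 2) for a in ages],
    "siteB": [rng.uniform(0, 100) for _ in ages],
}
site, wins = best_site_by_frequency(data, ages)
print(site, dict(wins))
```

Choosing the second site would repeat the same loop with the first site fixed in the model, which requires a regressor that accepts several predictors, such as the random forest used in the text.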
In the RFR age prediction models, four sites were used for females (TRIM59.pos8, KLF14.pos3, C1orf132.pos2 and FHL2.pos6) and three for males (TRIM59.pos7, KLF14.pos2 and ELOVL2.pos7), giving MADs < 3 years. With SVR, the MAD leveled off at about 4.5 years even when eight sites were used for both males and females. This result indicates that RFR outperforms SVR for age prediction (Figure 1).
(2) Verifying prediction accuracy on the test dataset
The remaining 30% of the bloodstain samples (38 males and 33 females) served as the test dataset. The seven sites selected for the final models (three for males: TRIM59.pos7, KLF14.pos2 and ELOVL2.pos7; four for females: TRIM59.pos8, KLF14.pos3, C1orf132.pos2 and FHL2.pos6) were validated in the RFR model, giving predicted MADs of 2.8 years (R=0.99) for males and 2.93 years (R=0.98) for females (Figure 2).
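The two accuracy figures reported here can be computed as in the sketch below. It assumes that MAD denotes the mean absolute deviation between predicted and chronological age and that R is the Pearson correlation between the two; the function names are hypothetical and the ages are made-up values, not study data.

```python
import math


def mean_absolute_deviation(pred, truth):
    """Average of |predicted age - true age| over all samples."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


true_ages = [20, 35, 50, 65]   # made-up chronological ages
predicted = [22, 33, 53, 64]   # made-up model predictions
print(mean_absolute_deviation(predicted, true_ages))
print(round(pearson_r(predicted, true_ages), 4))
```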
Example 4 Sensitivity testing
Whole blood samples were collected from 241 healthy Han Chinese volunteers (128 males and 113 females) aged 10-79 years. All donors gave informed consent, and the study received ethical approval from the Beijing Institute of Genomics, Chinese Academy of Sciences.
Bloodstains were prepared by spotting 20 μL aliquots of whole blood onto filter paper and were then stored at room temperature for 1 year. To determine detection sensitivity, DNA extracted from the bloodstains was serially diluted to 100, 50, 10, 5, 2.5, 1.0, 0.50, 0.25 and 0.10 ng. Samples at every DNA amount underwent methylation analysis: bisulfite conversion followed by PCR amplification and pyrosequencing (following the method of Example 1). The methylation percentages obtained from 0.1 ng of DNA were compared with those from higher DNA amounts to assess the sensitivity of the proposed methylation assay for bloodstains.
For the four CpG sites used for age prediction in females (TRIM59.pos8, KLF14.pos3, C1orf132.pos2 and FHL2.pos6) and the three used in males (TRIM59.pos7, KLF14.pos2 and ELOVL2.pos7), we observed no significant difference in methylation percentage between 0.1 ng of DNA and higher DNA amounts (P≥0.05, KS test; Figure 3). For the ELOVL2.pos7 site, 1.0 ng of DNA was needed to reach a similar level.
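The KS comparison rests on the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distribution functions of the two groups. The sketch below computes only that statistic; the P value would normally come from a statistics package. The function name is hypothetical and the methylation percentages are made-up values.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for v in sorted(set(a) | set(b)):
        # Fraction of each sample that is less than or equal to v.
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up methylation percentages at one site for two DNA input amounts.
low_input = [41.0, 43.5, 42.2, 44.1]
high_input = [42.0, 43.0, 42.8, 44.5]
print(ks_statistic(low_input, high_input))
```

A statistic near 0 means the two distributions are nearly indistinguishable, while a value of 1 means the samples do not overlap at all.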
Example 5 Correlation analysis between DNA methylation level and age
Whole blood samples were collected from 241 healthy Han Chinese volunteers (128 males and 113 females) aged 10-79 years, prepared as bloodstain samples and subjected to methylation analysis, and the sites of the present invention were tested for correlation with age. The final bloodstain age prediction model of the present invention comprises 7 CpG sites spanning 3 genes: 3 known sites and 4 new CpG sites. The results showed that 5 of these CpG sites, located in 3 genes (TRIM59, KLF14 and C1orf132), were correlated with age in the bloodstain analysis of Chinese subjects (Figure 4).
Example 6 Comparison with the age prediction accuracy of published studies
A meta-analysis of age prediction studies from past years showed that almost all reported MADs were > 3 years (Figure 5). Compared with previous studies, our model is the most efficient (MAD < 3 years with only 3-4 CpG sites) owing to the use of RFR. In Figure 5, solid dots represent published results, and the mathematical methods used in the different models are indicated by different shapes. The plus-shaped ("十") and asterisk-shaped ("米") symbols represent our age prediction results for females and males, respectively.
Sequence listing
<110> Shanxi Medical University
<120> Method for predicting age by pyrosequencing and random forest regression analysis
<160> 18
<170> SIPOSequenceListing 1.0
<210> 1
<211> 29
<212> DNA
<213> Artificial Sequence
<400> 1
tagtaaatat ataagtgggg gaagaaggg 29
<210> 2
<211> 27
<212> DNA
<213> Artificial Sequence
<400> 2
ttaataaaac caaattctaa aacattc 27
<210> 3
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 3
caccttacca ccaaaccaaa attt 24
<210> 4
<211> 21
<212> DNA
<213> Artificial Sequence
<400> 4
aggggagtag ggtaagtgag g 21
<210> 5
<211> 30
<212> DNA
<213> Artificial Sequence
<400> 5
caaaaccatt tccccctaat atatacttca 30
<210> 6
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 6
gggaggagat ttgtaggttt 20
<210> 7
<211> 21
<212> DNA
<213> Artificial Sequence
<400> 7
gggttttggg agtatagtag t 21
<210> 8
<211> 27
<212> DNA
<213> Artificial Sequence
<400> 8
acacctccta aaacttctcc aatctcc 27
<210> 9
<211> 21
<212> DNA
<213> Artificial Sequence
<400> 9
gttttgggag tatagtagtt a 21
<210> 10
<211> 28
<212> DNA
<213> Artificial Sequence
<400> 10
ggttttaggt taagttatgt ttaatagt 28
<210> 11
<211> 30
<212> DNA
<213> Artificial Sequence
<400> 11
actaaaaaat ttccctctat taccattacc 30
<210> 12
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 12
atagttttag aaattatttt gttt 24
<210> 13
<211> 29
<212> DNA
<213> Artificial Sequence
<400> 13
tagtaaatat ataagtgggg gaagaaggg 29
<210> 14
<211> 28
<212> DNA
<213> Artificial Sequence
<400> 14
atttaataaa accaaattct aaaacatt 28
<210> 15
<211> 25
<212> DNA
<213> Artificial Sequence
<400> 15
ggggttaagt tattaagttt tgaag 25
<210> 16
<211> 21
<212> DNA
<213> Artificial Sequence
<400> 16
tataggtggt ttgggggaga g 21
<210> 17
<211> 27
<212> DNA
<213> Artificial Sequence
<400> 17
aaaaaacact accctccaca acataac 27
<210> 18
<211> 15
<212> DNA
<213> Artificial Sequence
<400> 18
ttgggggaga ggttg 15
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111223180.6A CN114045333B (en) | 2021-10-20 | 2021-10-20 | Method for predicting age by pyrosequencing and random forest regression analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114045333A (en) | 2022-02-15 |
CN114045333B (en) | 2022-10-11 |
Family
ID=80205735
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111223180.6A Active CN114045333B (en) | 2021-10-20 | 2021-10-20 | Method for predicting age by pyrosequencing and random forest regression analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114045333B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115992259B (en) * | 2022-11-23 | 2023-10-10 | 四川大学 | A primer set and kit based on 13 Y chromosome methylation genetic markers |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012162139A1 (en) * | 2011-05-20 | 2012-11-29 | The Regents Of The University Of California | Method to estimate age of individual based on epigenetic markers in biological sample |
CN110257494B (en) * | 2019-07-19 | 2020-08-11 | 华中科技大学 | A method, system and amplification detection system for obtaining individual age of Chinese population |
CN111139292A (en) * | 2019-12-03 | 2020-05-12 | 河南远止生物技术有限公司 | Biological age inference method established based on pyrosequencing |
CN113373236B (en) * | 2021-02-19 | 2021-12-31 | 中国科学院北京基因组研究所(国家生物信息中心) | Method for obtaining individual age of Chinese population |
Also Published As
Publication number | Publication date |
---|---|
CN114045333A (en) | 2022-02-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||