CN117737216A - Method for detecting genome information based on restriction enzyme - Google Patents
- Publication number
- CN117737216A, CN202410122596.6A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method for detecting genome information based on restriction enzymes. The method comprises cutting the genome of a sample with a restriction enzyme to obtain genomic DNA fragments of different lengths, enriching long fragments from the amplified or unamplified DNA, sequencing the enriched long genomic DNA fragments on a long-read sequencing platform, and finally analyzing the sequencing data computationally. The method significantly improves the probability of detecting both alleles simultaneously, significantly reduces the allele dropout rate, can better detect heterozygous tumor mutations, and is of great significance for the early diagnosis and treatment of tumors.
Description
Technical Field
The invention belongs to the technical field of genome detection, and particularly relates to a method for detecting genome information based on restriction enzymes.
Background
Cells are the fundamental building blocks of organisms, and in each cell genetic information is stored in the form of chromosomes. It is generally assumed that all cells of an individual share the same genome, so genomic studies can be performed at the species or individual level. In several cases, however, the genome must be studied at the single-cell scale: (1) the cells are precious and few in number, e.g., human oocytes, embryonic cells, and circulating tumor cells; (2) different cells have their own unique genomes, e.g., sperm cells of the same individual carry different genomes because of meiotic homologous recombination; (3) cell lineage tracing, in which changes in the genome of a single cell over time are used to reflect the evolution of the cell; (4) the genomes of single cells are heterogeneous, as in tumors, the nervous system, the immune system, and chimeras. Single-cell genome sequencing techniques have been developed for these reasons.
A single cell contains only about 6 pg of genomic DNA, far less than the amount required for high-throughput sequencing, so uniform amplification is required before sequencing. The development of whole-genome amplification (WGA) technology has made it possible to amplify enough genomic DNA from a single cell for sequencing and thus to investigate the genetic heterogeneity of cells, including single-nucleotide variations (SNVs), copy-number variations (CNVs) and structural variations (SVs). A variety of single-cell whole-genome amplification techniques have been developed for next-generation sequencing (NGS) platforms, such as degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), multiple annealing and looping-based amplification cycles (MALBAC), linear amplification via transposon insertion (LIANTI), primary template-directed amplification (PTA) and multiplexed end-tagging amplification of complementary strands (META-CS). META-CS, published by Xie Xiaoliang's group, uses the complementarity of the two DNA strands to eliminate almost all false positives in SNV detection: only mutation sites supported by both the forward and the reverse strand are called as SNVs, giving the highest accuracy reported to date.
Because NGS platforms have high sequencing accuracy, these techniques are very powerful for detecting CNVs and SNVs, but they are limited by read length and therefore perform poorly in detecting SVs. SVs, including deletions, insertions, duplications, and translocations, are an important type of variation in many genetic diseases (e.g., cancer). Studying SVs at single-cell resolution is therefore a critical issue.
Based on long-read third-generation sequencing (TGS) platforms, our group developed SMOOTH-seq (single-molecule real-time sequencing of long fragments amplified through transposon insertion), in which single-cell genomic DNA is randomly fragmented with a low concentration of Tn5 transposase to achieve relatively uniform genome amplification. In addition to CNVs and SNVs, SMOOTH-seq can also detect SVs effectively. However, in diploid cells the simultaneous coverage of both alleles is very limited, so SMOOTH-seq has a high false-negative rate when detecting heterozygous SNPs (hetSNPs).
Allele dropout (allele loss) is an important problem for single-cell whole-genome amplification techniques. When a diploid cell carries a heterozygous mutation, failure to amplify and detect either allele leads to allele dropout, which is a major cause of false negatives in SNV detection. Previous single-cell whole-genome methods fragment or amplify the genome either with random primers or by random Tn5 insertion, and such random approaches do not favor the simultaneous capture of both alleles. For example, for a pair of alleles A and B, if the genome coverage is n%, that is, allele A and allele B are each captured with a probability of n%, then the probability of capturing both alleles is n% × n%, i.e., n²/10,000 (n² per ten thousand), assuming the two captures are independent. The probability of capturing both alleles at the same time is therefore very low.
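The arithmetic in the example above can be checked with a short sketch. It assumes, as the example does, that each allele is captured independently with the same probability; the function name and the coverage values are illustrative only and do not come from the invention.

```python
def both_alleles_probability(coverage_percent: float) -> float:
    """Fraction of loci at which both alleles are captured.

    Assumes each allele is captured independently with probability
    coverage_percent / 100 (the simplification used in the text).
    """
    p = coverage_percent / 100.0
    return p * p


# Illustrative coverage values, not measured data.
for n in (10, 30, 50, 90):
    joint = both_alleles_probability(n)
    print(f"coverage {n}% -> both alleles {joint * 100:.2f}% ({n * n} per 10,000)")
```

Under this assumption the joint probability falls quadratically with coverage, which is why random fragmentation or priming at moderate coverage rarely captures both alleles of a locus.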
Because the two alleles in a diploid genome usually share the same restriction enzyme recognition sites, the homologous DNA fragments generated by digestion usually have the same length and are easier to amplify simultaneously than DNA fragments generated by random Tn5 transposition or random-primer amplification. On this basis, the invention develops Refresh-seq (Restriction fragments ligation-based genome amplification and third-generation sequencing), a single-cell long-read whole-genome sequencing technology based on a restriction digestion and ligation strategy, which significantly improves the probability of detecting both alleles simultaneously. Allele dropout causes false negatives in prenatal diagnosis, because one allele may be detected while the other is not: for an abnormal embryo carrying a monogenic pathogenic variant (heterozygous mutation), if only the normal allele is detected and the pathogenic mutant allele is missed because of allele dropout, the embryo may be misjudged as normal. The method significantly reduces the allele dropout rate, reduces such misjudgments, and helps select healthy embryos. In addition, tumor cells carry a high mutation load and the mutations are usually heterozygous, so previous methods tend to underestimate the mutations in tumor genomes because of allele dropout; the method detects heterozygous tumor mutations better and is of great significance for the early diagnosis and treatment of tumors.
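The first point of this paragraph, that two alleles sharing the same recognition sites yield fragments of the same length, can be illustrated with a toy example. The two haplotype sequences below are invented for illustration and differ by one SNP outside the EcoRI site (GAATTC); for simplicity the sketch cuts at the start of each site, which shifts every boundary by a constant and does not affect the comparison.

```python
import re

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def fragment_lengths(seq, site=ECORI_SITE):
    """Fragment lengths after cutting at the start of each site occurrence."""
    # A lookahead is used so that overlapping occurrences are also found.
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Hypothetical haplotypes: identical except for one SNP between the two sites.
hap_a = "ACGT" * 3 + ECORI_SITE + "TTGACCA" * 4 + ECORI_SITE + "CCGTA" * 2
snp_pos = 25
hap_b = hap_a[:snp_pos] + "G" + hap_a[snp_pos + 1:]

print(fragment_lengths(hap_a))
print(fragment_lengths(hap_b))
```

Both haplotypes give the same list of fragment lengths even though their sequences differ, whereas random fragmentation would place breakpoints independently on each copy.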
Disclosure of Invention
The invention aims to provide a method for detecting genome information based on restriction enzymes.
A method for detecting genomic information based on restriction enzymes, comprising the steps of:
(1) Cutting the genome of the sample with a restriction enzyme to obtain genomic DNA fragments of different lengths; at this point, alleles on homologous chromosomes are typically cut into DNA fragments of the same length;
the invention infers the distribution of genome fragments after digestion by performing an in silico digestion simulation on the genome of the target species, and selects a suitable restriction enzyme accordingly (FIG. 3); cell lysis is carried out in a small-volume system to release the genomic DNA;
the restriction enzyme is one that recognizes a specific sequence of 4-10 bp, preferably a specific sequence of 6 bp or 8 bp; more preferably, the restriction enzyme is EcoRI, SacI or AsiSI.
For the human genome, the fragments produced by restriction enzymes with a 6 bp recognition sequence are mostly distributed in the range of 1-8 kb, while those produced by restriction enzymes with an 8 bp recognition sequence are mostly distributed in the range of 15 kb-16 Mb (FIG. 3). Thus, a restriction enzyme with a 6 bp recognition sequence (e.g., EcoRI or SacI) is selected when higher coverage is desired, and an endonuclease with an 8 bp recognition sequence (e.g., AsiSI) is selected when better enrichment is desired. When higher coverage is desired, the digested fragments should be distributed as narrowly as possible, i.e., the cut DNA fragments have similar lengths concentrated between 1 and 3 kb, so that better amplification uniformity is achieved and both genome coverage and the detection rate of the two alleles are maintained.
In the above cell lysis step, the cells may be derived from any of human, animal, plant and microorganism;
(2) Enriching long genomic DNA fragments from the digested genome sample, with or without amplification;
(3) Sequencing the enriched long genomic DNA fragments on a sequencing platform;
(4) Analyzing the sequencing data computationally: the long genomic DNA fragments are mapped back to their genome regions, and the sequence information of the sample in those regions is obtained through alignment and calculation. The sequence information includes genetic and epigenetic information.
The genomic sample is free (cell-free) DNA, DNA released into culture medium by cells (e.g., embryos or eggs), one or more cells or nuclei, a virus, mitochondria, chloroplasts, or the genome of another sample.
The restriction enzyme used in step (1) is selected by performing an in silico digestion simulation on the genome of the target species and inferring the distribution of genome fragments after digestion.
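The digestion simulation used for enzyme selection can be sketched as follows. This is a minimal illustration and not the procedure used in the invention: it digests a random toy sequence rather than a real reference genome, so its fragment distribution will not match the human-genome ranges cited for FIG. 3. The recognition sequences and cut positions are the standard ones for EcoRI, SacI and AsiSI; the function names, the 1-3 kb summary window and the toy genome size are assumptions.

```python
import random
import re
from statistics import median

# Recognition sequence and cut offset within the site (top strand).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "AsiSI": ("GCGATCGC", 5),  # GCGAT^CGC
}


def digest(seq, site, offset):
    """Return fragment lengths after cutting seq at every occurrence of site."""
    # A lookahead is used so that overlapping site occurrences are also found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def summarize(lengths, lo=1000, hi=3000):
    """Fragment count, median length, and fraction of fragments within [lo, hi] bp."""
    in_window = sum(1 for n in lengths if lo <= n <= hi)
    return {
        "fragments": len(lengths),
        "median_bp": median(lengths),
        "fraction_in_window": in_window / len(lengths),
    }


random.seed(0)  # fixed seed so the toy run is repeatable
toy_genome = "".join(random.choices("ACGT", k=500_000))  # stand-in for a reference

for name, (site, offset) in ENZYMES.items():
    print(name, summarize(digest(toy_genome, site, offset)))
```

For a real target species, the toy sequence would be replaced by the chromosomes of the reference genome, and the resulting length distributions compared between candidate enzymes.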
Preferably, in step (2) the genomic DNA fragments are subjected to end repair, A-tailing and adaptor ligation, followed by PCR amplification, and the long genomic DNA fragments are enriched after amplification. The adaptor used may be a non-barcoded adaptor or a barcoded adaptor.
With non-barcoded adaptors, each PCR tube is processed independently through the subsequent purification and library construction steps, and the 5'-end and 3'-end adaptor sequences are both added during PCR amplification. With barcoded adaptors (i.e., the 5'-end adaptor sequence is already attached at the ligation step), sample tubes carrying different barcodes are pooled after ligation, purified and amplified in a single tube, and the 3'-end adaptor sequence is added during amplification.
The long genomic DNA fragment in step (2) refers to a fragment longer than 700 base pairs, preferably longer than 1000 base pairs.
The amplification is a polymerase chain reaction (PCR). Long genomic DNA fragments are enriched by PCR together with fragment size selection, where the size selection is gel-based or magnetic-bead-based.
For samples with a large amount of starting material, genomic fragment size selection is performed directly after restriction digestion. With restriction digestion, a given genomic region always yields fragments of a fixed size, so selecting fragments of a specific size enriches specific regions: sequencing is concentrated in the genomic regions that produce fragments of the selected size. In other words, the sequencing depth of those regions increases, while regions producing fragments of other lengths are sequenced at reduced depth or not detected at all. Allele information in the enriched regions can thus be detected more sensitively.
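The effect of size selection on sequencing depth can be illustrated with a short calculation. This is a sketch under simplifying assumptions that are not stated in the source: selection is treated as a hard length window, and reads are assumed to be spread evenly over the retained bases. The function names and the toy fragment lengths are hypothetical.

```python
def size_select(lengths, low, high):
    """Keep only fragments whose length lies in [low, high] bp."""
    return [n for n in lengths if low <= n <= high]


def depth_enrichment(lengths, low, high):
    """Return (fraction of genome bases retained, fold increase in depth).

    With a fixed sequencing amount, concentrating reads on a fraction f
    of the genome raises the depth over that fraction by 1 / f.
    """
    kept = size_select(lengths, low, high)
    retained = sum(kept) / sum(lengths)
    fold = 1 / retained if retained > 0 else float("inf")
    return retained, fold


if __name__ == "__main__":
    toy_fragments = [500, 1500, 2000, 6000]  # bp, illustrative only
    retained, fold = depth_enrichment(toy_fragments, 1000, 3000)
    print(f"retained fraction {retained:.2f}, depth enrichment {fold:.2f}x")
```

In this toy case 35% of the bases fall in the 1-3 kb window, so the same number of sequenced bases covers the retained regions at roughly 2.9 times the depth.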
For samples with little starting material and for single-cell samples, adaptor ligation and amplification are performed after digestion, followed by gel-based or magnetic-bead-based size selection. Adaptor ligation and PCR amplification themselves also act as a size filter: the ligation efficiency of excessively long DNA fragments is lower, and PCR preferentially amplifies shorter fragments, so excessively long fragments are depleted. Small fragments are then removed by gel-based or bead-based size selection. Size selection is therefore more stringent for samples amplified after adaptor ligation, and the final library fragment lengths are distributed mainly between 1 and 3 kb. In addition, PCR can further improve allele recovery, increasing the probability of detecting both alleles simultaneously, because the two alleles of a region tend to amplify with similar efficiency.
The sequencing platform in step (3) is a long-read sequencing platform; optionally, it is a Nanopore sequencing platform, a PacBio sequencing platform, or another long-read sequencing platform developed later.
Because of sequencing-quality problems caused by amplification errors introduced by PCR during NGS library construction, and because of the limited read length (typically less than 500 bp), NGS technology has difficulty meeting the requirements of some modern biological problems, such as resolving long repetitive sequences in DNA, detecting DNA/RNA methylation, and identifying structural variants. The advent of long-read sequencing has made up for these shortcomings of NGS. Currently there are two main types of platform: single-molecule real-time (SMRT) sequencing from Pacific Biosciences (PacBio) and nanopore sequencing from Oxford Nanopore Technologies (ONT). PacBio SMRT sequencing is based on zero-mode waveguides (ZMWs). A ZMW is a nanophotonic confinement structure consisting of a circular hole in an aluminum coating deposited on a transparent silica substrate. The ZMW holes are about 70 nm in diameter and about 100 nm deep. Because the aperture of the ZMW is so small, the optical field decays exponentially as light enters the hole, so the activity of a DNA polymerase incorporating a single nucleotide can readily be detected within the illuminated ZMW. PacBio SMRT sequencing uses topologically circular DNA molecules as the template library (called SMRTbell). A SMRTbell consists of a double-stranded DNA insert whose two ends are joined to hairpin adaptors, forming a closed single-stranded circular DNA. The length of the inserted DNA fragment may vary from 1 to more than one hundred thousand bases, so long sequencing reads can be generated. After the SMRTbell is assembled, it is bound by a DNA polymerase and loaded onto a SMRT Cell, which contains up to 8 million ZMWs.
In each ZMW, a single polymerase is immobilized at the bottom; it can bind to either hairpin adaptor of the SMRTbell and begin replication. During sequencing by synthesis, the polymerase moves processively along the SMRTbell template, incorporating into the nascent strand four fluorescently labeled deoxynucleoside triphosphates that have distinct emission spectra. The fluorescent label on the nucleotide substrate is excited by the excitation light at the bottom of the well, and the fluorescence signal is recorded by the detection system to obtain the base information. The DNA molecules do not need to be amplified by PCR at any point in the sequencing process, so each DNA molecule is sequenced individually. Currently, PacBio sequencing has two common sequencing modes: continuous long reads (CLR) and circular consensus sequencing (CCS). ONT is a sequencing technology based on electrical signals, the core of which is the protein nanopore. The basic working principle is as follows: two electrolyte chambers are separated by an impermeable synthetic membrane of very high electrical resistance, in which hundreds to thousands of protein nanopores are embedded; the membrane is immersed in electrophysiological solution, and each protein nanopore forms a nanoscale channel through the membrane. When a voltage is applied across the electrolyte chambers, a steady-state ionic current flows through the pores. The passage of a macromolecule through a pore causes transient changes in the ion flux, so monitoring the current through the pore enables molecular sensing. These current fluctuations convey many characteristics of the sample, including the size, concentration and structure of the biomolecules.
By controlling the size of the pores, their surface properties, the applied voltage and the solution conditions, nanopores can be tailored to detect different types of biomolecules. Moreover, nanopore sensing requires no modification, labeling or surface immobilization of the biomolecules, so the technology can be used to detect a wide range of molecules and complexes. ONT technology sequences linear DNA molecules, which are typically one to hundreds of kilobases in length, although some can reach several megabases. In ONT sequencing, double-stranded DNA molecules are first ligated to sequencing adaptors preloaded with a motor protein, which unwinds the double-stranded DNA and, together with the applied current, drives the negatively charged DNA through the pore at a controlled rate. As the DNA passes through the nanopore, it causes characteristic disruptions in the current, which are analyzed in real time to determine the base sequence of the DNA strand. Long-read data can currently be generated on any of three standard ONT platforms: MinION, GridION and PromethION. Another type of read generated by the ONT sequencing platform is the ONT ultra-long read. Such reads were first generated by Josh Quick; they are typically longer than 100 kb and can be several megabases long.
Genomic structural variations (SVs) mainly include variation types such as large deletions, insertions and segmental duplications in the genome. Studies have shown that SVs are associated with a variety of complex genetic diseases, such as cancer, autism and neurodevelopmental disorders, and they have received increasing attention in medicine and genetics in recent years. NGS is greatly limited in SV detection because of its read length. With the advance and popularization of long-read genome sequencing, a large number of structural variations continue to be discovered and studied, and some highly pathogenic ones are increasingly being validated. The present method is based on a long-read sequencing platform; it can detect SVs efficiently and can perform high-precision whole-chromosome haplotype phasing of SVs.
Based on a long-read sequencing platform, linkage information between SNPs or other variants can be detected more reliably. Because NGS reads are short, most reads carry at most one variant, whereas a long read can capture multiple variants on the same read, so the method can be used to study linkage between variants such as SNPs and SVs. Linkage information is critical for judging disease status. For example, consider a recessive genetic disease in which two different sites of a gene are mutated: if the two mutations lie on the same chromosome, i.e. are linked, the gene on that chromosome is disabled but the other chromosome carries a normal copy, so the cell does not show the mutant phenotype; if the two mutations lie on different chromosomes, i.e. both alleles are mutated, the cell may be in a pathogenic state. In addition, linkage information helps to determine whether a disease-causing mutant gene comes from the father or the mother. Genomic imprinting refers to the occurrence of different clinical phenotypes depending on the parental origin (paternal or maternal) of a pathogenic gene: some genes are transcriptionally active only on the paternal copy, with the maternal copy not expressed; conversely, some genes are transcriptionally active only on the maternal copy, with the paternal copy not expressed. In such cases, knowing whether the mutant gene comes from the father or the mother makes it possible to judge whether the pathogenic gene is expressed, and thereby to judge the health state of the embryo.
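The cis/trans distinction described above can be sketched as a simple read-counting rule. This is a hypothetical illustration, not the method's actual phasing algorithm: it assumes each long read spanning both heterozygous sites has already been reduced to a pair of allele calls ("ref" or "alt"), and the function name and the `min_reads` threshold are illustrative choices.

```python
from collections import Counter


def classify_linkage(read_alleles, min_reads=2):
    """Classify two heterozygous variants as 'cis', 'trans' or 'undetermined'.

    read_alleles: list of (allele_at_site_1, allele_at_site_2) tuples, one per
    read spanning both sites, each allele being 'ref' or 'alt'.
    """
    counts = Counter(read_alleles)
    # Cis: both mutations on one chromosome, so reads are alt/alt or ref/ref.
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    # Trans: one mutation on each chromosome, so reads are alt/ref or ref/alt.
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]
    if cis + trans < min_reads or cis == trans:
        return "undetermined"
    return "cis" if cis > trans else "trans"


if __name__ == "__main__":
    reads = [("alt", "alt"), ("ref", "ref"), ("alt", "alt"), ("alt", "ref")]
    print(classify_linkage(reads))  # majority of spanning reads support cis
```

A short-read data set would rarely contain such spanning reads, which is why the classification relies on long reads.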
Long-read sequencing can detect DNA/RNA modifications directly (without antibody or chemical treatment), which is of great significance. Modifications alter the kinetics of nucleotide incorporation, and SMRT sequencing infers modifications at single-nucleotide resolution by detecting differences in the timing with which the fluorescently labeled dNTPs/NTPs bind to the target nucleotide. Modifications also alter the electrical signal of a nucleotide, and Nanopore sequencing infers the modifications carried by detecting the electrical signal as the nucleotide passes through the nanopore. On this principle, when the method is applied to samples with a large amount of starting material, no amplification is required, so the epigenetic modification information is retained and can be read directly by long-read sequencing. The method can therefore detect and compare the epigenetic modifications of alleles in such samples, which is not achievable by NGS. Linkage relationships between different types of genomic variation and epigenetic states can also be explored.
The fragment information in step (4) comprises one or more of the following: 1) fragment length information; 2) fragment abundance information; 3) heterozygous single nucleotide polymorphism information; 4) genomic structural variation information, including one or more of insertions, deletions, duplications, inversions and translocations; 5) repeat sequence information, including one or more of short interspersed elements, long terminal repeat elements, DNA repeat elements, simple repeats, satellites and other repeat elements; 6) genome copy number variation information; 7) allele information; 8) linkage of allele information; 9) epigenetic information, including DNA methylation and DNA hydroxymethylation.
The method can detect genome information from as little as a single cell or a single nucleus, with high sensitivity and a high probability of detecting both alleles simultaneously. The method is a third-generation single-cell whole-genome amplification method based on restriction enzymes and fragment ligation, named Restriction fragments ligation-based genome amplification and third-generation sequencing, hereinafter abbreviated as Refresh-seq. The version using barcoded adaptors is referred to as Refresh-seq (multiplexed).
The terms in the present invention:
Restriction endonuclease: an endonuclease capable of recognizing and cleaving a specific double-stranded DNA sequence in an organism, comprising type I and type II restriction enzymes. Type I restriction enzymes catalyze both methylation of host DNA and hydrolysis of unmethylated DNA, whereas type II restriction enzymes only catalyze the hydrolysis of unmethylated DNA. The name of a restriction enzyme is generally composed of the first letter of the genus name and the first two letters of the species name of the source microorganism, with a fourth letter representing the strain. For example, the restriction enzyme extracted from Bacillus amyloliquefaciens H is called BamH. Several enzymes with different specificities, recognizing different base sequences, obtained from the same strain are distinguished by different numerals, e.g. HindII, HindIII, HpaI and HpaII.
Homologous chromosomes: two chromosomes of the same length and centromere position as seen at metaphase of cell division, or the paired chromosomes seen in meiosis, one from the father and one from the mother; their morphology, size and structure are generally the same.
Allele: one of the genes located at the same position on a pair of homologous chromosomes and controlling different forms of the same trait.
Allele information: allele information referred to in this patent includes all types of variation at alleles on homologous chromosomes, including SNPs, SVs, repeat sequence information (short interspersed elements, long terminal repeat elements, DNA repeat elements, simple repeats, satellites), epigenetic information, and the like.
Epigenetic information: epigenetic modification refers to the regulation of gene expression through chemical modification of DNA and of proteins on chromosomes. Such modifications can act at multiple levels, including gene transcription, splicing, stability, translation, nucleosome assembly and chromatin structure, thereby affecting the physiological and pathological processes of the cell as well as the phenotype of the offspring. Common epigenetic modifications include DNA methylation, histone modification, non-coding RNAs, RNA modification and chromatin remodeling.
DNA methylation: DNA methylation refers to the addition of a methyl group to a DNA molecule, which alters the chemical nature and structure of the DNA and affects gene expression. It typically occurs at CpG dinucleotides and can inhibit gene expression. This modification is catalyzed by DNA methyltransferases (DNMTs). There are three major DNMTs in humans: DNMT1, DNMT3A and DNMT3B. DNMT1 is mainly responsible for maintaining methylation patterns, while DNMT3A and DNMT3B are responsible for de novo methylation.
DNA hydroxymethylation: DNA hydroxymethylation is the oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), catalyzed by TET family enzymes. 5hmC has important biological functions: it participates in chromosome reprogramming and the transcriptional regulation of gene expression, and also plays an important role in DNA demethylation. Studies have also shown that 5hmC is closely related to tumorigenesis.
Adaptor ligation: two DNA molecules, or the two ends of one DNA molecule, are paired after enzymatic digestion and then covalently joined by a ligase.
Magnetic bead fragment screening: under suitable conditions, magnetic beads interact with DNA and adsorb it. In a solution with a relatively high concentration of PEG and NaCl, PEG strips water from the hydration layer surrounding the DNA molecules, so that the hydration layer is disrupted, the DNA molecules aggregate and precipitate, and the negatively charged phosphate groups are exposed. Sodium ions then form a salt bridge between these phosphate groups and the carboxyl groups on the bead surface, so that the DNA is adsorbed onto the beads. The longer the DNA, the more negatively charged phosphate groups are exposed and the more negative charge the molecule carries overall, so it adsorbs to the beads more readily and can be recovered with lower concentrations of PEG and NaCl. The shorter the DNA, the higher the concentrations of PEG and NaCl needed to disrupt the hydration layer thoroughly enough to expose sufficient phosphate groups for adsorption and recovery. Therefore, by controlling the concentrations of PEG and NaCl and the amount of magnetic beads, DNA fragments of different lengths can be selected.
The invention has the following beneficial effects: it combines a restriction digestion and ligation strategy with a third-generation sequencing platform for single-cell genome sequencing for the first time, and develops a long-read genome sequencing technology, Refresh-seq. Compared with SMOOTH-seq, which is based on random cutting, Refresh-seq increases genome coverage and uniformity. It increases the probability of detecting both alleles of a diploid cell simultaneously and gives a considerable detection probability even at very shallow sequencing depth, and thus has great potential for medical applications such as pre-implantation genetic diagnosis. The method can be adjusted by choosing different restriction enzymes to meet different requirements. In general, Refresh-seq with a restriction enzyme that recognizes a 6 bp sequence (e.g. EcoRI and SacI) obtains relatively high genome coverage and is the first choice for single-cell whole-genome sequencing; Refresh-seq with a restriction enzyme that recognizes an 8 bp sequence (e.g. AsiSI) enriches reads in specific genomic regions for the same sequencing amount (FIG. 1), thereby enabling simplified (reduced-representation) genome sequencing. Refresh-seq is based on a third-generation sequencing platform and can effectively detect structural variations and repetitive elements. Refresh-seq also has limitations. Because of the efficiency of the ligation reaction, the amplicon length is only 2-3 kb, much shorter than the approximately 6 kb amplicons of SMOOTH-seq. Refresh-seq therefore cannot capture very long insertion events. Library construction for Refresh-seq can be completed in one day, at a cost of 20 yuan/cell for the single-tube version and 12 yuan/cell for the multiplexed version.
The present invention successfully uses Refresh-seq to study meiosis in single germ cells of male and female B6D2F1 mice. At a sequencing depth of 0.1-0.3×, the average coverage of sperm, PG oocytes and PB2 was about 5%, and the average coverage of oocytes and PB2 was 7.7%. This is consistent with the coverage of MALBAC-amplified sperm and oocytes. The inventors obtained high-resolution genetic maps of male and female meiotic recombination at low sequencing depth and revealed differences between females and males. Refresh-seq features high uniformity and a low allele dropout rate, and has good application prospects in screening for aneuploid sperm and egg cells. Because its reads are longer than those of NGS platforms, it is also advantageous for detecting SVs in highly repetitive or low-complexity genomic regions. Using Refresh-seq data, the inventors successfully performed whole-chromosome phasing of hetSVs in sperm cells and in female haploid germ cells, respectively, and analyzed the repeat-element characteristics of these SVs.
Drawings
FIG. 1 is the library construction flowchart of Example 1.
FIG. 2 shows the test results for enrichment of long genomic DNA fragments by PCR.
FIG. 3 shows the simulated restriction fragment distributions;
in the figure: a, simulated distribution of EcoRI restriction fragments; b, simulated distribution of SacI restriction fragments; c, simulated distribution of AsiSI restriction fragments.
FIG. 4 shows the cross-contamination test for Refresh-seq (multiplexed).
FIG. 5 shows the simultaneous detection of both alleles;
in the figure: a, sequencing amount of each HG002 cell and proportion of heterozygous SNPs detected; b, proportion of heterozygous SNPs detected by the three methods at a sequencing depth of 0.25×; c, allele dropout rate calculated for regions covered by 5 or more reads.
FIG. 6 shows the performance of Refresh-seq on different cell lines using different endonucleases;
in the figure: a, sequencing amount and genome coverage of HG001 cells amplified by Refresh-seq (EcoRI/SacI) and Refresh-seq (multiplexed) (EcoRI/SacI/AsiSI), where the SMOOTH-seq data are from the HG002 cell line; b, sequencing amount and heterozygous SNP detection rate of HG001 cells amplified by Refresh-seq (EcoRI/SacI) and Refresh-seq (multiplexed) (EcoRI/SacI/AsiSI), where the SMOOTH-seq data are from the HG002 cell line; c, sequencing amount and genome coverage of HG002 cells amplified by Refresh-seq (EcoRI/SacI), Refresh-seq (multiplexed) (EcoRI/SacI/AsiSI) and SMOOTH-seq; d, sequencing amount and heterozygous SNP detection rate of HG002 cells amplified by Refresh-seq (EcoRI/SacI), Refresh-seq (multiplexed) (EcoRI/SacI/AsiSI) and SMOOTH-seq; e, sequencing depth of HG002 cells amplified by SMOOTH-seq and by Refresh-seq and Refresh-seq (multiplexed) using different restriction endonucleases (EcoRI/SacI/AsiSI).
FIG. 7 shows the application of Refresh-seq to sperm;
in the figure: a, schematic of meiosis in hybrid mouse sperm and of single-sperm Refresh-seq: mature sperm of B6D2F1 mice (B6 × DBA F1 heterozygotes) that have undergone meiotic homologous recombination are obtained and flow-sorted, and Refresh-seq is performed on each single sperm; b, sequencing data amount and genome coverage of each sperm; sperm with genome coverage above 1% are selected for subsequent analysis, with the cutoff marked by a red dotted line; c, sequencing data amount and genome coverage of each sperm passing quality control, with a fitted linear regression and its 95% confidence interval; d, distribution of the average read length of single sperm amplified by Refresh-seq; e, distribution of the number of hetSNPs covered in each sperm; f, identification of diploid cells by the discontinuity score of each sperm (excluding the most frequent autosomes at this time); the discontinuity score of diploid cells is much higher than that of haploid sperm, and the red dotted line marks the inflection point beyond which cells are labeled as potential diploid cells; g, the numbers of reads on the X and Y chromosomes are used to distinguish X sperm from Y sperm.
FIG. 8 shows the identification of aneuploid sperm by Refresh-seq;
in the figure: a, discontinuity scores of all chromosomes in each sperm; diploid cells are labeled D1-D12 and aneuploid sperm cells A1-A7; b-h, discontinuity scores of specific chromosomes in each sperm; diploid cells have higher discontinuity scores on most chromosomes, whereas aneuploid sperm have higher discontinuity scores only on the individual abnormal chromosomes; i, proportion of hetSNPs on the 19 autosomes of the 7 aneuploid sperm cells compared with the gold standard; blue dots indicate loss of a chromatid, red dots indicate gain of a chromatid, and dot size indicates the deviation from the average ratio; verified aneuploid chromosomes are highlighted with rectangles; sperm A7 is more likely to be a non-uniformly amplified sample (technical error) than a true aneuploid; j, proportion of sites at which both alleles are covered in sperm cells; the heat map shows the heterozygosity of the 19 autosomes in 12 diploid cells (2N), 7 aneuploid sperm cells (1N±m) and several haploid sperm cells (1N).
FIG. 9 shows the identification of structural variations and chromosome-scale phasing by Refresh-seq;
in the figure: a, distribution of true-positive structural variations in each sperm; b, length distribution of the identified true-positive SVs, with local peaks of SV length indicated by orange dashed lines; c, accuracy of SVs (deletions and insertions) detected by Refresh-seq: percentage of true positives among SVs with different numbers of supporting cells; d, accuracy of genome-wide SV phasing at the chromosome scale; e, recall of correctly phased SVs; f, proportions of different element types among phased deletion events; g, proportions of different element types among phased insertion events.
FIG. 10 shows the application of Refresh-seq to egg cells and polar bodies;
in the figure: a, schematic of sampling from hybrid female mice: B6D2F1 MII oocytes that have undergone meiotic homologous recombination are fertilized with sperm from DBA male mice, or parthenogenetically activated, to induce PB2 extrusion; haploid PB2 and parthenogenetically activated egg cells, as well as diploid PB1, MII oocytes and fertilized eggs, are obtained and separated using capillaries; b, number and ploidy of the different cell types; c, sequencing data amount and genome coverage of each cell; d, distribution of the number of crossovers in haploid female germ cells; e, resolution of crossover detection in female haploid cells; f, crossover position density map for all chromosomes of male and female mice, showing the crossover density from centromere to telomere.
Detailed Description
The present invention will be described more fully hereinafter in order to facilitate an understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Example 1
Fragment lengths were simulated based on the human genome sequence and the recognition sequences of the selected restriction enzymes. As shown in FIG. 3, most of the simulated EcoRI and SacI restriction fragments are between 1 and 3 kb in length and are therefore suitable for whole-genome amplification and sequencing. The 8 bp recognition sequence of AsiSI is more sparsely distributed across the genome, so using AsiSI for Refresh-seq library construction enables simplified (reduced-representation) genome sequencing, in which specific regions are sequenced more deeply with the same amount of data.
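A back-of-envelope calculation shows why a longer recognition sequence is sparser. The sketch below assumes an idealized random sequence with equal base frequencies, which is not how the simulation in this example was done; the function names and the rough genome size are illustrative assumptions. Real genomes deviate from this model: for instance, the AsiSI site GCGATCGC contains two CpG dinucleotides, which are depleted in the human genome, so actual AsiSI sites are rarer than the random model predicts.

```python
def expected_site_spacing(site_length, alphabet_size=4):
    """Approximate mean distance (bp) between occurrences of one specific
    site in a uniform random sequence: alphabet_size ** site_length."""
    return alphabet_size ** site_length


def expected_site_count(genome_size, site_length):
    """Approximate expected number of sites in a random genome of this size."""
    return genome_size / expected_site_spacing(site_length)


if __name__ == "__main__":
    genome_size = 3_100_000_000  # rough human genome size in bp (assumption)
    for k in (6, 8):
        spacing = expected_site_spacing(k)
        count = expected_site_count(genome_size, k)
        print(f"{k} bp site: about one per {spacing} bp, ~{count:,.0f} sites")
```

Under this model a 6 bp site occurs about once every 4096 bp and an 8 bp site about once every 65536 bp, a 16-fold difference in site density.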
The specific steps of Refresh-seq are shown in FIG. 1. In this example, two human cell lines (HG002 and HG001) were used. After the cells were washed three times with PBS containing 0.1% BSA, single cells were picked by mouth pipette or sorted by flow cytometry into 8-strip PCR tubes containing 2.5 μL of lysis buffer. Histones were digested at 50 °C for 3 h, and the protease was then inactivated at 70 °C for 30 min. The cell lysis buffer consisted of 10 mM Tris-EDTA (1 M Tris + 0.1 M EDTA), 1 mg/mL Qiagen protease, 0.3% Triton X-100, 20 mM KCl and 15 mM DTT.
After single-cell lysis, the single-cell gDNA was digested by adding 0.5 μL of 10× restriction buffer, 1.9 μL of water and 0.1 μL of restriction enzyme. The reaction program was adjusted according to the restriction enzyme used. For EcoRI (New England BioLabs, cat# R3101L) and SacI (New England BioLabs, cat# R3156S), digestion was carried out at 37 °C for 15 min, followed by inactivation at 65 °C for 20 min. For AsiSI (New England BioLabs, cat# R0630S), digestion was carried out at 37 °C for 1 h, followed by inactivation at 80 °C for 20 min. End repair and A-tailing were then performed (Kapa Biosystems, Kapa HyperPrep Kit, cat# KK8504), dsDNA adaptors (NEBNext Singleplex Oligos for Illumina) were ligated to the A-tailed molecules, and USER enzyme (uracil-specific excision reagent, New England BioLabs, cat# M5505L) was added to cut the loop adaptors into "Y" adaptors. Each sample was purified with 1× AMPure XP (Beckman Coulter, cat# A63882) and then amplified using Barcode-P5 (GCTA-[24 bp P5-Barcode 81-96]-TACACTCTTTCCCTACACGACGCTCTTCCGATCT) and Barcode-P3 (ATCG-[24 bp P3-Barcode 1-24]-GACTGGAGTTCAGACGTGTGCT). The PCR program was 98 °C for 45 s, 98 °C for 15 s, and then 20 cycles of 98 °C for 15 s, 65 °C for 30 s and 72 °C for 5 min. Thereafter, the amplicons were purified twice with 0.7× AMPure XP (amplicons from haploid cells were purified twice with 0.65× AMPure XP). Purified amplicons were quantified using the Equalbit 1× dsDNA HS Assay Kit. Samples were pooled according to sequencing requirements and then sequenced.
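As a quick consistency check on the digestion step above, the component volumes can be summed to confirm that the 10× restriction buffer ends up at 1×. This sketch assumes that the digestion reagents are added directly to the 2.5 μL of lysate and that the volume of the cell itself is negligible; the source does not state the final reaction volume explicitly, and the names below are illustrative.

```python
# Volumes in nanolitres; integers avoid floating-point rounding in the sum.
LYSATE_NL = 2500       # 2.5 uL of lysis buffer containing the lysed cell
BUFFER_10X_NL = 500    # 0.5 uL of 10x restriction buffer
WATER_NL = 1900        # 1.9 uL of water
ENZYME_NL = 100        # 0.1 uL of restriction enzyme


def digestion_mix():
    """Return (total volume in nL, final buffer concentration in x)."""
    total = LYSATE_NL + BUFFER_10X_NL + WATER_NL + ENZYME_NL
    buffer_final_x = 10 * BUFFER_10X_NL / total
    return total, buffer_final_x


if __name__ == "__main__":
    total, buffer_x = digestion_mix()
    print(f"total volume {total / 1000} uL, buffer at {buffer_x}x")
```

Under these assumptions the reaction volume is 5 μL and the buffer is at its 1× working concentration.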
For Refresh-seq (multiplexed), this example uses barcoded adaptors to increase throughput. Paired single-stranded oligonucleotides were first annealed into "Y" adaptors: NEB same-A (5'-phosphorylated GATCGGAAGAGCACACGTCTGAACTCCAGTC) and Barcoded-B (ACACACTCTTTCCCTACACGAC-[24 bp adapter-barcode 31-46]-GCTCTTCCGATC). The oligonucleotide stock solutions, dissolved in water, were mixed 1:1 and annealed by cooling to give barcoded adaptors at a concentration of 50 μM. The restriction-digested genomic DNA was subjected to end repair and A-tailing and then ligated to the barcoded adaptors. Cells ligated to different barcoded adaptors were purified with 1× AMPure XP and amplified using Common-P5 (ACACTCTTTCCCTACACGAC) and Barcode-P3 (ATCG-[24 bp P3-Barcode 1-24]-GACTGGAGTTCAGACGTGTGCT). The amplicons were then purified twice with 0.7× AMPure XP and quantified using the Equalbit 1× dsDNA HS Assay Kit. Samples were pooled according to sequencing requirements and then sequenced (FIG. 2).
TABLE 1 primer sequences involved in Refresh-seq
* represents a phosphorothioate modification.
After the Refresh-seq library was sequenced using ONT nanopore sequencing and the raw sequencing data were obtained, the basic data processing consisted of aligning the reads to the reference genome, as follows:
The raw data generated by ONT sequencing were converted to fastq format. According to the dual-end barcode structure of the Refresh-seq library, this example used nanoplexer v0.1 to demultiplex the reads into single cells in two consecutive rounds, and used Cutadapt v3.4 to remove the adaptor sequences at both ends of the reads and to discard reads shorter than 500 bp. The reads were then aligned to the human reference genome hg38 or the mouse reference genome mm10 with minimap2 v2.24. Reads with mapping quality below 30 were filtered out with samtools v1.14, and PCR duplicates were removed.
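The two numeric filters in the processing steps above (read length of at least 500 bp, mapping quality of at least 30) can be expressed as a small predicate. This is only an illustration of the thresholds, not the actual pipeline, which applies them with Cutadapt and samtools at different stages; the `Alignment` record and function names are hypothetical, and duplicate removal is not modeled.

```python
from dataclasses import dataclass

MIN_READ_LENGTH = 500  # reads shorter than 500 bp are discarded
MIN_MAPQ = 30          # alignments with mapping quality below 30 are discarded


@dataclass
class Alignment:
    """Minimal stand-in for one aligned read (hypothetical structure)."""
    read_name: str
    read_length: int
    mapq: int


def passes_filters(aln):
    """True if the alignment meets both the length and MAPQ thresholds."""
    return aln.read_length >= MIN_READ_LENGTH and aln.mapq >= MIN_MAPQ


def filter_alignments(alignments):
    """Return only the alignments that pass both filters, in input order."""
    return [aln for aln in alignments if passes_filters(aln)]


if __name__ == "__main__":
    toy = [
        Alignment("read1", 2400, 60),
        Alignment("read2", 450, 60),   # too short
        Alignment("read3", 1800, 12),  # low mapping quality
    ]
    print([aln.read_name for aln in filter_alignments(toy)])
```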
Cross-contamination assessment: to evaluate the cross-contamination of Refresh-seq (multiplexed), this example used a mixed-genome mapping strategy with a combined human (hg38) and mouse (mm10) reference; the mixed genome was indexed with minimap2 using the parameter '-I10G'. The number and proportion of reads from each single cell aligned to the mm10 and hg38 genomes were calculated. The species of the reference genome to which most reads (more than 90%) aligned was taken as the species of that single cell. A cell was judged to be cross-contaminated if the proportion of reads aligned to the minor species' genome exceeded 10%. The results are shown in FIG. 4: no cell passing quality control was judged to be a human-mouse mixed cell, indicating that cross-contamination in Refresh-seq (multiplexed) is very low.
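The species-assignment rule above can be written as a short function. The 90% and 10% thresholds come from the text; the function name, the label strings and the handling of the exact boundary case (exactly 90%/10%, which the text does not define) are assumptions made for this sketch.

```python
def classify_cell(human_reads, mouse_reads,
                  major_threshold=0.90, minor_threshold=0.10):
    """Assign a species to a single cell from its read counts on hg38 and mm10."""
    total = human_reads + mouse_reads
    if total == 0:
        return "no_data"
    major = max(human_reads, mouse_reads) / total
    minor = min(human_reads, mouse_reads) / total
    if major > major_threshold:
        return "human" if human_reads > mouse_reads else "mouse"
    if minor > minor_threshold:
        return "cross-contaminated"
    return "undetermined"  # exactly on the boundary, not covered by the text


if __name__ == "__main__":
    for counts in [(990, 10), (5, 995), (600, 400)]:
        print(counts, classify_cell(*counts))
```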
SNP site heterozygosity analysis: this example used WhatsHap v1.5 to calculate the likelihoods of all three genotypes (0/0, 0/1, 1/1) at the given heterozygous SNP sites and to output them to the VCF file together with the genotype predictions. The command was "whatshap genotype --reference ref.fasta -o genotyped.vcf variants.vcf reads.bam". The variants.vcf file was the HG002 or HG001 SNP benchmark set downloaded from GIAB.
The results are shown in FIG. 5. Compared with SMOOTH-seq, which is based on amplification of genomic fragments randomly cleaved by Tn5, EcoRI Refresh-seq has better amplification uniformity, higher genome-wide coverage, and a higher biallelic detection rate at single-nucleotide polymorphism sites. At a sequencing depth of 0.25×, Refresh-seq detected 1.64% of heterozygous SNPs, about 5 times that of SMOOTH-seq (0.33%). Among heterozygous SNP sites covered by more than 5 reads, the average biallelic capture rate of Refresh-seq was 62%, significantly higher than the 10% capture rate of SMOOTH-seq.
FIG. 6 shows that Refresh-seq and Refresh-seq (multiplexed) performed consistently on HG001 cells and HG002 cells. With EcoRI and SacI, Refresh-seq and Refresh-seq (multiplexed) had higher genome coverage than SMOOTH-seq and detected more heterozygous SNPs, whereas Refresh-seq (multiplexed) with AsiSI gave a greater sequencing depth at the same sequencing amount.
The above experiments demonstrate the universality and advantages of the Refresh-seq technique, which has better genome coverage and a higher probability of detecting both alleles simultaneously. In addition, Refresh-seq with a long-recognition-sequence endonuclease can enrich reads, thereby achieving simplified (reduced-representation) genome sequencing.
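The biallelic capture rate quoted above can be illustrated with a short sketch. It assumes, purely for the example, that each known heterozygous SNP site is summarised as a pair of read counts (reference allele, alternate allele); the function name `biallelic_capture_rate` is hypothetical and not taken from the described analysis.

```python
def biallelic_capture_rate(sites, min_reads=6):
    """Fraction of well-covered het SNP sites where both alleles were seen.

    sites: iterable of (ref_count, alt_count) at known heterozygous SNPs.
    min_reads=6 encodes "covered by more than 5 reads".
    Returns None when no site passes the coverage filter.
    """
    covered = [(ref, alt) for ref, alt in sites if ref + alt >= min_reads]
    if not covered:
        return None
    both_alleles = sum(1 for ref, alt in covered if ref > 0 and alt > 0)
    return both_alleles / len(covered)


if __name__ == "__main__":
    example_sites = [(3, 3), (6, 0), (1, 1), (0, 7), (4, 2)]
    # (1, 1) is dropped by the coverage filter; 2 of the 4 remaining
    # sites show both alleles.
    print(biallelic_capture_rate(example_sites))
```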
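The effect of recognition-sequence length on fragment number can be explored with a simple in silico digestion. The sketch below is an illustration under stated assumptions: it scans only the given strand for the palindromic recognition sites of EcoRI (G^AATTC), SacI (GAGCT^C) and AsiSI (GCGAT^CGC), cuts at the top-strand position, and ignores methylation sensitivity and star activity. The function and table names are hypothetical.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
SITES = {
    "EcoRI": ("GAATTC", 1),
    "SacI": ("GAGCTC", 5),
    "AsiSI": ("GCGATCGC", 5),
}


def digest_lengths(sequence, enzyme):
    """Return fragment lengths after cutting `sequence` with `enzyme`."""
    site, offset = SITES[enzyme]
    sequence = sequence.upper()
    # A lookahead pattern reports every site, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def fraction_in_range(lengths, low, high):
    """Share of fragments whose length lies in [low, high]."""
    if not lengths:
        return 0.0
    return sum(low <= n <= high for n in lengths) / len(lengths)


if __name__ == "__main__":
    toy = "AAA" + "GAATTC" + "CCCC" + "GAATTC" + "TT"
    print(digest_lengths(toy, "EcoRI"))
```

On a real genome, comparing `fraction_in_range(lengths, 1000, 3000)` across enzymes gives a rough view of how many fragments fall in a size window suited to long-read sequencing.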
Example 2 application of Refresh-seq technology to single sperm sequencing
In this example, Refresh-seq was performed with EcoRI, and the library construction procedure was identical to that of Example 1, except that the library was purified twice with 0.65× AMPure XP. 676 sperm cells were amplified with Refresh-seq (single-tube version) and 152 sperm cells were amplified with Refresh-seq (multiplexed). Since the two versions showed no difference in the detection of crossover events, they are not distinguished below.
As shown in FIG. 7, Refresh-seq obtained sufficient genome coverage at low sequencing amounts. Of the 828 sperm, 700 passed quality control, with a genome coverage greater than 1% at 0.1-0.3× sequencing depth (FIG. 7b). Genome coverage increased approximately linearly with the amount of sequencing data, with an average coverage of about 5% at 0.1-1 Gb of sequencing data (FIG. 7c). The average read length was 1.9 kb (FIG. 7d), and the average number of reads per sperm was 143,914. On average, as many as 250,000 hetSNPs were detected per sperm (FIG. 7e), with an SNP detection accuracy exceeding 98.9%. By defining a discontinuity score (i.e., the frequency with which consecutive SNPs switch between paternal and maternal origin), Refresh-seq can efficiently screen out contaminating diploid cells and make accurate X-sperm and Y-sperm determinations. Of the 700 sperm cells that passed quality control, 688 were haploid sperm cells and 12 were contaminating diploid cells (FIG. 7f). The diploid cells were labeled D1 to D12 (FIG. 7f), and the authenticity of these 12 diploid cells was verified using their chromosome profiles. X sperm cells and Y sperm cells were then distinguished according to the number and proportion of reads mapped to the X and Y chromosomes (FIG. 7g). A total of 344 X sperm cells and 329 Y sperm cells were identified, and 8 sperm cells could not be assigned (sex chromosome gain or loss); the ratio of X sperm to Y sperm was close to 1:1, consistent with Mendel's law of segregation.
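The discontinuity score defined above lends itself to a compact sketch. The representation used here (one 'P' or 'M' label per detected SNP, ordered by genomic position) and the function name are assumptions for illustration; no calling threshold is given in the text, so none is applied.

```python
def discontinuity_score(origins):
    """Frequency of parental-origin switches between consecutive SNPs.

    origins: sequence of labels such as 'P' (paternal) or 'M' (maternal),
    one per detected SNP, ordered by position along a chromosome.
    """
    if len(origins) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(origins, origins[1:]) if a != b)
    return switches / (len(origins) - 1)


if __name__ == "__main__":
    # A haploid chromosome with one crossover switches origin once.
    print(discontinuity_score("PPPPMMMM"))
    # A diploid cell with random allele dropout switches frequently.
    print(discontinuity_score("PMPMMPMP"))
```

A haploid chromosome changes origin only at crossovers, so its score stays near zero, while a diploid cell or a gained chromosome alternates often and scores much higher.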
Example 3 application of Refresh-seq technology to the identification of aneuploidy
Because Refresh-seq has a greater ability to detect both alleles simultaneously, aneuploidy was first screened by calculating the discontinuity score for each chromosome. The heterozygosity of SNP sites was then used to confirm chromosome gain events: if a chromosome has a copy number gain, both alleles are frequently detected at the same SNP site on that chromosome in a heterozygous offspring, so a chromosome gain is confirmed by a sharp increase in the number of events in which both alleles are detected within a 1 Mb interval. In the case of a chromosome loss, the number of detectable SNP sites drops sharply compared with normal. Chromosome gain or loss can therefore be determined from the increase or decrease in the number of SNPs detected per 1 Mb.
As shown in FIG. 8, the Refresh-seq technique allows aneuploidy to be identified by several methods. The genome of a haploid sperm contains only one of the two parental sets of chromosomes. When a chromosome gain occurs, the gained chromosome carries both parental copies, and ideally both parental genotypes would be detected at every SNP site. However, because allele dropout is ubiquitous, only one genotype is detected at most SNP sites, so the genotypes of the SNP sites within the gained interval alternate randomly between the two parental genotypes. The frequency of this alternation differs between the gained interval and haploid intervals; that is, the discontinuity score increases markedly. Thus, the first method identified chromosome gains in single sperm cells A1, A2, A3, A4 and A6 by calculating the discontinuity score of each chromosome (FIGS. 8a-h). Under random amplification, a chromosome gain means that more DNA fragments are captured and a chromosome loss means that fewer are captured; in the sequencing data, positions with a chromosome gain have more read coverage and more detectable SNPs, whereas positions with a chromosome loss have fewer reads and fewer detectable SNPs. The second method therefore identifies chromosome gain and loss events from the deviation of a chromosome's SNP number from the mean SNP number of all other chromosomes (FIG. 8i). Method three follows the same principle as method one and exploits the ability of Refresh-seq to capture both alleles simultaneously: more SNP sites with both genotypes are detected on chromosomes with a gain, i.e., heterozygosity increases, whereas heterozygosity decreases when a chromosome is lost (FIG. 8j).
The aneuploid chromosomes found by these three methods verified one another and could be further verified by chromosome profiling and CNV analysis. In total, 6 sperm with autosomal aneuploidy were found: A1, A3 and A6 had chromosome gains, A5 had a chromosome loss, and A4 and A6 had simultaneous gain and loss on a chromosome (chr 3). Sperm A7 is more likely to be a non-uniformly amplified sample (a technical artifact) than a true aneuploid.
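The second method, which compares each chromosome's SNP number with the mean of all other chromosomes, can be sketched as follows. Expressing the deviation as a ratio, and feeding in SNP densities (SNPs per Mb) so that chromosomes of different length are comparable, are assumptions made for this illustration; the function name is hypothetical and no decision threshold is implied.

```python
def snp_deviation_ratio(snp_density):
    """Ratio of each chromosome's SNP density to the mean of the others.

    snp_density: dict mapping chromosome name to detected SNPs per Mb
    in one cell. Ratios well above 1 hint at a gain, ratios well below 1
    at a loss; interpretation thresholds are left to the analyst.
    """
    ratios = {}
    for chrom, value in snp_density.items():
        others = [v for c, v in snp_density.items() if c != chrom]
        if not others or sum(others) == 0:
            ratios[chrom] = None  # nothing to compare against
            continue
        ratios[chrom] = value / (sum(others) / len(others))
    return ratios


if __name__ == "__main__":
    cell = {"chr1": 100.0, "chr2": 100.0, "chr3": 200.0, "chr4": 5.0}
    for chrom, ratio in snp_deviation_ratio(cell).items():
        print(chrom, round(ratio, 2))
```

Excluding the chromosome under test from the mean keeps a strong gain or loss from diluting its own baseline.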
Example 4 application of Refresh-seq technology to the identification of structural variations in sperm
In this example, structural variations (SVs) in the long-read data generated by Nanopore sequencing were detected with cuteSV, a highly sensitive and fast SV calling software suitable for third-generation sequencing data. The parameters were set to the Nanopore-specific defaults, and the minimum number of supporting reads was set to 1 to achieve single-cell, single-molecule resolution. To analyze how the accuracy of SV detection depends on the number of supporting cells, the SVs of all cells were first merged using SURVIVOR, and the accuracy of SVs supported by different numbers of cells was calculated according to the formula accuracy = true positives / (true positives + false positives); the reference set was built from third-generation Nanopore sequencing data starting from a large number of sperm cells. For haplotyping of SVs, a 0/1 matrix was first established according to the parental genotype of each reference-set SV in each single sperm, where 0 represents the C57 parental type and 1 represents the DBA parental type. An SV was considered consistent with an SV in the reference set if the two were of similar length and located within ±100 bp of each other; an SV consistent with the reference set was marked with the corresponding genotype, and an SV inconsistent with the reference set was not. The resulting matrix was filtered using the 'hapiFrameSelection' tool in the R package Hapi: SVs supported by fewer than 5 sperm were filtered out, and the 100 cells with the most SVs were then selected as the precursor framework for subsequent typing. To improve typing accuracy, HMM correction was performed on the precursor framework, and if more than half of the cells at a position supported that an error had occurred, the genotype at that position was flipped.
A basic framework was thus formed, and the missing genotypes were filled in iteratively with reference to other cells using the 'imputationFun1' function. Preliminary haplotype phasing was then performed with the 'hapiPhase' function, and after correction with 'hapiBlockMPR', haplotypes with high resolution and high consistency were assembled with 'hapiAssemble'.
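The matching rule and the accuracy formula described above can be sketched as follows. The tuple layout, the requirement that chromosome and SV type agree, and in particular the length-similarity cutoff `min_len_ratio=0.7` are assumptions introduced for this illustration; the text specifies only "similar length" and a ±100 bp position window.

```python
def sv_matches(call, ref, max_dist=100, min_len_ratio=0.7):
    """True if an SV call agrees with a reference-set SV.

    call, ref: (chrom, position, sv_type, length) tuples.
    min_len_ratio is a hypothetical stand-in for "similar length".
    """
    same_locus_type = call[0] == ref[0] and call[2] == ref[2]
    close = abs(call[1] - ref[1]) <= max_dist
    shorter, longer = sorted((abs(call[3]), abs(ref[3])))
    similar = longer > 0 and shorter / longer >= min_len_ratio
    return same_locus_type and close and similar


def sv_accuracy(calls, reference):
    """accuracy = true positives / (true positives + false positives)."""
    if not calls:
        return None
    true_pos = sum(1 for c in calls if any(sv_matches(c, r) for r in reference))
    false_pos = len(calls) - true_pos
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    reference = [("chr1", 1000, "INS", 180), ("chr2", 5000, "DEL", 6500)]
    calls = [
        ("chr1", 1050, "INS", 170),   # within 100 bp, similar length
        ("chr2", 5300, "DEL", 6400),  # too far from the reference SV
        ("chr1", 990, "INS", 50),     # length too different
    ]
    print(sv_accuracy(calls, reference))
```

Running the same calculation separately on SVs supported by one, two, three or more cells gives the kind of accuracy-versus-support curve the analysis describes.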
As shown in FIG. 9, Refresh-seq was able to identify structural variations in sperm, with an average of 973 structural variation events detected per cell (FIG. 9a). In the length distribution of all detected structural variations, two peaks appear at around 180 bp and 6-7 kb, corresponding to the B1 element (equivalent to Alu in humans) and the LINE1 element, respectively (FIG. 9b). The accuracy of the structural variation events detected by Refresh-seq can reach 80% when an event is supported by more than three cells (FIG. 9c), while the accuracy of chromosome-scale haplotyping is as high as 98% (FIG. 9d), and genome elements were successfully annotated for these typed structural variations (FIGS. 9e-g).
Example 5 application of Refresh-seq technology to the identification of egg cells and polar bodies
In this example, the egg cells and polar bodies were obtained by fertilization and parthenogenetic activation, Refresh-seq was performed with EcoRI, and the library construction procedure was consistent with Example 1. A total of 185 second polar bodies, 87 parthenogenetically activated egg cells, 132 second polar bodies, 33 cells in the second meiotic phase and 26 zygote cells were collected. Of these, the second polar bodies and the parthenogenetically activated egg cells are haploid cells, in which an average of 14 crossover events were detected.
The experimental results are shown in FIG. 10. In addition to male sperm cells, Refresh-seq also gave good results in female germ cells. Refresh-seq again achieved adequate genome coverage at low sequencing amounts; diploid cells had higher genome coverage at the same sequencing amount, and genome coverage increased with the amount of sequencing data, indicating that coverage had not reached saturation (FIG. 10c). An average of 14 crossover events were detected in the second polar bodies and parthenogenetically activated egg cells (FIG. 10d), with 6 to 25 crossover events per single cell. The median crossover resolution was 283 kb, so Refresh-seq can also be used to obtain high-resolution crossover data from shallow sequencing of female haploid germ cells. The crossover distribution density plot shows that female mice have relatively few crossovers near the centromeres and more crossovers near the subtelomeres, and that crossovers are less enriched near the subtelomeres in females than in males (FIG. 10f).
The above examples illustrate only a few embodiments of the invention; although they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
Claims (11)
1. A method for detecting genomic information based on restriction enzymes, comprising the steps of:
(1) Cutting the genome of the sample with a restriction enzyme to obtain genomic DNA fragments of different lengths;
(2) Enriching long genomic DNA fragments from the amplified or unamplified genomic sample;
(3) Sequencing the enriched long genomic DNA fragments on a long-read sequencing platform;
(4) Performing computational analysis on the sequencing data, mapping the long genomic DNA fragments back to the genome region, and obtaining the sequence information of the sample in the genome region through alignment and calculation.
2. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein the restriction enzyme is one recognizing a specific sequence of 4-10 bp, preferably 6 bp or 8 bp; more preferably, EcoRI or SacI is selected when higher coverage is the goal, and AsiSI is selected when a better enrichment effect is the goal; when the goal is higher coverage, the cut DNA fragments have similar lengths and are concentrated between 1 and 3 kb.
3. The method of restriction enzyme based detection of genomic information according to claim 1, wherein the genomic sample is episomal DNA, DNA released by cells in culture, one or more cells or nuclei, viruses, mitochondria or chloroplasts.
4. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein in step (2) the genomic DNA fragments are subjected to end repair, A-tailing and adaptor ligation, followed by PCR amplification, and long genomic DNA fragments are enriched after amplification.
5. The method for detecting genomic information based on restriction enzymes according to claim 4, wherein the adaptor used for amplification in step (2) is an adaptor without a barcode or an adaptor with a barcode; when the adaptor without a barcode is used, each PCR tube is processed independently in the subsequent purification and library construction, and the 5'-end and 3'-end adaptors are added during PCR amplification; when the adaptor with a barcode is used, sample tubes with different barcodes are pooled and purified after adaptor ligation and amplified in one tube, and the 3'-end adaptor is added to the amplified product.
6. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein the sequencing platform in step (3) is a long-read sequencing platform, optionally a Nanopore sequencing platform or a PacBio sequencing platform.
7. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein the restriction enzyme in step (1) is selected by simulating enzymatic digestion of the genome of the target species and inferring the distribution of genome fragments after digestion.
8. The method for restriction enzyme based detection of genomic information according to claim 1, wherein the long genomic DNA fragment in step (2) refers to a fragment having a length of more than 700 nucleotide pairs, preferably a fragment having a length of more than 1000 nucleotide pairs.
9. The method of restriction enzyme-based detection of genomic information according to claim 1, wherein the amplification in step (2) is a polymerase chain reaction, and the long genomic DNA fragments are enriched using the polymerase chain reaction and fragment size selection, the fragment size selection being gel-based size selection or magnetic-bead-based size selection.
10. The method for restriction enzyme-based detection of genomic information according to claim 1, wherein the sequence information in step (4) comprises one or more of the following: 1) fragment length information; 2) fragment abundance information; 3) heterozygous single nucleotide polymorphism information; 4) genomic structural variation information, including insertions, deletions, duplications, inversions and translocations; 5) repeat sequence information, including short interspersed elements, long terminal repeat elements, DNA repeat elements, simple repeats and satellites; 6) genome copy number variation information; 7) allele information; 8) allele linkage information; 9) epigenetic information, including DNA methylation and DNA hydroxymethylation.
11. The method for detecting genomic information based on restriction enzymes according to claim 10, wherein the allele information is the variation type at an allele on homologous chromosomes, including SNP, SV, repeat sequence information and epigenetic information on the allele; the repeat sequence information includes short interspersed elements, long terminal repeat elements, DNA repeat elements, simple repeats and satellites; the epigenetic information includes DNA methylation and DNA hydroxymethylation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410122596.6A CN117737216A (en) | 2024-01-30 | 2024-01-30 | Method for detecting genome information based on restriction enzyme |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117737216A true CN117737216A (en) | 2024-03-22 |
Family
ID=90281614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410122596.6A Pending CN117737216A (en) | 2024-01-30 | 2024-01-30 | Method for detecting genome information based on restriction enzyme |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117737216A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||