WO2014013218A1 - Methods and systems for determining haplotypes and phasing of haplotypes - Google Patents
Methods and systems for determining haplotypes and phasing of haplotypes
- Publication number
- WO2014013218A1 (PCT/GB2013/051305)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- modified
- sequence
- fragments
- haplotypes
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01L—CHEMICAL OR PHYSICAL LABORATORY APPARATUS FOR GENERAL USE
- B01L7/00—Heating or cooling apparatus; Heat insulating devices
- B01L7/52—Heating or cooling apparatus; Heat insulating devices with provision for submitting samples to a predetermined sequence of different temperatures, e.g. for treating nucleic acid samples
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/20—Identification of molecular entities, parts thereof or of chemical compositions
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2523/00—Reactions characterised by treatment of reaction samples
- C12Q2523/10—Characterised by chemical treatment
- C12Q2523/125—Bisulfite(s)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2525/00—Reactions involving modified oligonucleotides, nucleic acids, or nucleotides
- C12Q2525/10—Modifications characterised by
- C12Q2525/117—Modifications characterised by incorporating modified base
Definitions
- The efforts of the Human Genome Project opened a broader window onto the human genome.
- The work to further unlock the human genome is ongoing.
- The HapMap (Haplotype Map) Project is a global scientific effort directed at discovering genetic variants that lead to disease by comparing genomic information from people without a particular disease with that of people who have the disease. Alleles, the alternative forms of a DNA sequence for a particular gene, can contain one or more different genetic variants. Identifying haplotypes, that is, combinations of alleles at different locations (loci) on a particular chromosome, is a main focus of the HapMap Project. Haplotypes at which the two groups differ might correlate with the locations of genetic anomalies that cause disease.
- HapMap results will help to describe the common patterns of genetic variation in humans and whether those variations are potentially correlated with disease.
- Research efforts in determining haplotypes will help illuminate the common patterns of genetic variation in humans and whether those variations are potentially correlated with a particular disease.
- Haplotyping a genome will be advantageous, if not essential, for relating genetic variation to phenotype and disease.
- A particular haplotype may be correlated with the success or failure of a treatment regimen, and could therefore help a clinician choose, for a particular individual, the therapeutic regimen with the highest likelihood of eradicating that individual's disease.
- There are many technical challenges associated with genomic haplotyping.
- Next-generation sequencing technologies, while increasing the capacity and accuracy of sequencing efforts, in many cases produce short sequence reads; for example, several commercial platforms currently output reads of fewer than 400 nucleotides per fragment.
- When two or more genetic variants on a chromosome are farther apart than the sequence read length, even if that read length is thousands of base pairs, it may be difficult if not impossible to define a haplotype.
- What are needed are methods and compositions that allow haplotyping, in particular of genetic variants that lie farther apart on a chromosome than the sequenced length of the piece of DNA on which they are found.
- Sequencing technologies associated with next-generation sequencing can produce short sequence reads, making it difficult to determine the haplotype phasing of a genome when the sequences of interest are located so far apart on the chromosome that they fall outside the window provided by the sequence read length.
- Nucleic acid fragments can be modified to convert native nucleotides into synthetic, or artificial, polymorphisms, such as single nucleotide polymorphisms (SNPs) or other genetic anomalies, thereby producing a pattern of engineered polymorphisms in the nucleic acid fragments to be sequenced.
- The pattern of synthetic polymorphisms can be aligned among the fragments, and the haplotype (e.g. haplotype content or phase) can be determined from the alignment. In this manner, a population of modified fragments derived from a genomic sample can be haplotyped even if the alleles for haplotyping lie on different genomic fragments.
- Methods and compositions provided herein for creating artificial polymorphisms in a nucleic acid sequence find particular utility for haplotype determination and characterization and/or haplotype phasing; however, they can also be advantageous for other purposes.
- For example, the methods described herein could also be used to facilitate de novo sequence assembly.
- Nearly identical repeat regions, for example repeated nucleotide regions such as the short tandem repeats and intermediate tandem repeats used in forensic DNA fingerprinting, could be distinguished from one another by their unique patterns of artificially introduced polymorphisms, allowing a more accurate sequence assembly.
- The order of intermixed repeat regions and/or the number of repeats can also be determined using the methods herein when the repeated regions are too long to be fully sequenced in a single read or a paired-end read.
- Haplotype determination and/or haplotype phasing can provide critical information useful, for example, for correlating haplotypes with disease and with therapeutic regimens.
- Haplotypes and their phase determinations may become critical in personalized medicine, where an individual's haplotype may be correlated not only with a disease but also with the success of a treatment regimen for that individual.
- The present disclosure provides methods for determining the sequence of a nucleic acid sample, comprising: providing a first plurality of nucleic acid fragments of a first length modified to comprise a plurality of synthetic polymorphisms; preparing, from said first plurality of nucleic acid fragments comprising a plurality of synthetic polymorphisms, a nucleic acid library comprising a second plurality of nucleic acid fragments of a second length shorter than the first length; sequencing said nucleic acid library; and aligning the plurality of synthetic
- the synthetic polymorphisms are a plurality of modified nucleotides that replace the native nucleotides at a particular location and the modified nucleotides are selected from the group consisting of 8-oxoguanine, dPTP, isocytosine and isoguanine.
- modifications to the nucleic acids comprise partial and incomplete bisulfite conversion of cytosines in said plurality of nucleic acid fragments.
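- the synthetic polymorphism alignment comprises matching (i.e., by a computer-implemented program) a pattern of synthetic polymorphisms in a first nucleic acid fragment sequence with a like pattern of synthetic polymorphisms in a second nucleic acid fragment sequence.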
- a nucleic acid library is sequenced using a method selected from the group consisting of sequencing by synthesis, sequencing by hybridization, sequencing by ligation, single molecule sequencing, nanopore sequencing, pyrosequencing and polymerase chain reaction. In some instances, a sequence is determined by fluorescence detection.
- the determined sequence comprises one or more haplotypes and further comprises determining the phase of two or more haplotypes in the nucleic acid sample.
- the haplotypes for phasing are located on different sequenced fragments. The above disclosed methods could also be used for de novo sequencing.
- the present application discloses a method for characterizing one or more haplotypes of a nucleic acid sample comprising providing a pool of fragmented nucleic acids, introducing a plurality of synthetic polymorphisms such as single nucleotide polymorphisms in the fragmented nucleic acids of said pool to produce fragments comprising a plurality of synthetic polymorphisms, preparing a library of nucleic acid fragments that are shorter in length than the original pool of fragments comprising a plurality of modified nucleic acids, sequencing nucleic acid fragments in the library, aligning the synthetic polymorphisms of the sequenced nucleic acid fragments, and characterizing one or more haplotypes of the nucleic acid sample from the aligned synthetic polymorphisms of the sequenced fragments.
- the plurality of synthetic single nucleotide polymorphisms replaces the native nucleotides at the site of incorporation and comprises a plurality of modified nucleotides.
- the modified nucleotides are selected from the group consisting of 8-oxoguanine, isocytosine, isoguanine and dPTP.
- introduction of the synthetic polymorphisms is accomplished by partial and incomplete bisulfite conversion of cytosines in the nucleic acid fragments.
- the synthetic polymorphisms are aligned by matching (i.e., by a computer implemented program) a pattern of synthetic polymorphisms in a first nucleic acid fragment sequence with a like pattern of synthetic polymorphisms in a second nucleic acid fragment sequence and repeating said matching in a plurality of nucleic acid fragment sequences thereby creating a sequence alignment from the synthetic polymorphisms in the sequenced nucleic acid fragments.
- sequencing is performed by one of sequencing by synthesis, sequencing by hybridization, sequencing by ligation, single molecule sequencing, nanopore sequencing, pyrosequencing and polymerase chain reaction methodologies.
- sequences are determined by fluorescence detection.
- sequences are used to determine the phase of two or more haplotypes in the nucleic acid sample. Oftentimes, the haplotypes for phasing are located on different sequenced fragments. In other instances, the method described above can be used for de novo sequencing.
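The pattern-matching step described above can be illustrated with a minimal sketch. It uses a hypothetical encoding in which each sequenced fragment is reduced to one character per position: `1` where a synthetic polymorphism was called and `0` where none was. The sketch then looks for the longest suffix/prefix overlap whose patterns agree exactly and that contains enough synthetic polymorphisms to be informative. The function name and thresholds are illustrative and are not taken from the disclosure.

```python
def best_overlap(a: str, b: str, min_overlap: int = 5, min_marks: int = 2) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    `a` and `b` are hypothetical per-position patterns ('1' = synthetic
    polymorphism called, '0' = none). An overlap only counts if it is at
    least `min_overlap` positions long and contains at least `min_marks`
    synthetic polymorphisms; otherwise 0 is returned.
    """
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        window = b[:k]
        if a[-k:] == window and window.count("1") >= min_marks:
            return k
    return 0


# Two fragments cut from the same (hypothetical) modified parent pattern.
parent = "00100101001000101"
frag_a = parent[0:12]
frag_b = parent[6:17]
k = best_overlap(frag_a, frag_b)
merged = frag_a + frag_b[k:]
```

Here `k` is 6 and `merged` equals `parent`. The `min_marks` requirement keeps fragments from being joined on a stretch that carries no synthetic polymorphisms, because such a stretch cannot tell one parental molecule from another.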
- the present disclosure describes a method for identifying one or more haplotypes of a nucleic acid sample comprising providing a nucleic acid molecule having a plurality of nucleotides, modifying a plurality of the nucleotides in the nucleic acid molecule, thereby producing a modified nucleic acid molecule comprising natural and modified nucleotides, amplifying the modified nucleic acid molecule to produce a plurality of modified nucleic acid copies of a first length, fragmenting the amplified modified nucleic acid copies under conditions to produce a library of nucleic acid fragments of a second length, wherein individual nucleic acid fragments in the library have a region of sequence overlap with at least one other nucleic acid fragment in the library and wherein the region of sequence overlap comprises at least one modified nucleotide, determining the sequence of nucleic acid fragments of the library, and aligning the sequence of nucleic acid fragments by the locations of the modified nucleotides in the regions of sequence overlap to
- the nucleic acid molecule comprises several different nucleotide types along the length of sequence, and one of the nucleotide types may be modified in the modified nucleic acid, or all of the nucleotides of the one type may be modified in the modified nucleic acid. In some instances, only a subset of the nucleotides of the one type is modified in the modified nucleic acid. In some instances, methods for identifying a haplotype further comprise determining the phase of at least two haplotypes in the nucleic acid molecule.
- the haplotypes for phasing are located on different sequenced fragments.
- the nucleic acid molecule comprises several different nucleotide types along the length of sequence, wherein the at least two haplotypes are bi-allelic for two of the nucleotide types, and wherein a third nucleotide type is modified in the modified nucleic acid.
- at least two haplotypes are bi-allelic for nucleotide types that are selected from the group consisting of A, T and G, and wherein C is modified to U in the modified nucleic acid.
- At least two haplotypes are bi-allelic for T and G, and wherein C is modified to U in the modified nucleic acid.
- at least two haplotypes are bi-allelic for nucleotide types that are selected from the group consisting of A, T and C, and wherein G is modified to 8-oxo-G in the modified nucleic acid.
- at least two haplotypes are bi-allelic for C and T, and further G is modified to 8-oxo-G in the modified nucleic acid.
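A small sketch shows why the natural alleles are restricted to the unmodified nucleotide types. It assumes reads that are already aligned base-for-base to a reference and that come from the converted strand. With C modified to U (read as T after copying), a reference C read as T is attributed to the synthetic modification, and any other mismatch is kept as a candidate natural polymorphism. Under this rule a natural C/T SNP could not be told apart from a synthetic one, which is why the embodiments above require the haplotypes to be bi-allelic for the other nucleotide types. The function and its parameters are illustrative assumptions.

```python
def classify_mismatches(ref: str, read: str, modified: str = "C",
                        product: str = "T") -> tuple[list[int], list[int]]:
    """Split mismatches between an aligned read and its reference into
    (synthetic, natural) position lists.

    A reference `modified` base read as `product` (e.g. C read as T after
    C -> U conversion and copying) is attributed to the synthetic
    modification; every other mismatch is kept as a candidate natural SNP.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must be aligned to the same length")
    synthetic: list[int] = []
    natural: list[int] = []
    for i, (r, q) in enumerate(zip(ref, read)):
        if r == q:
            continue
        if r == modified and q == product:
            synthetic.append(i)
        else:
            natural.append(i)
    return synthetic, natural


syn, nat = classify_mismatches("ACGTCGAT", "ATGTTGCT")
```

Here `syn` is `[1, 4]` and `nat` is `[6]`. For other chemistries, such as G modified to 8-oxo-G, the `modified`/`product` pair would have to be set to whatever substitution that chemistry produces on the strand being read. That mapping is an assumption to check against the actual protocol.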
- Figure 1 shows an embodiment for incorporating the modified nucleotide 8-oxoguanine (8-oxo G) into DNA thereby converting natural nucleotides in a sequence to synthetic polymorphisms in a sequence.
- Figure 2 shows an embodiment for incorporating synthetic polymorphisms into a polynucleotide by partial sodium bisulfite conversion of cytosines to uracils in DNA.
- Figure 3 depicts an embodiment for incorporating synthetic polymorphisms into a polynucleotide by incorporating the modified nucleotides isocytosine and isoguanine into DNA in lieu of the native nucleotides.
- Figure 4 demonstrates an embodiment where the target DNA contains artificial polymorphisms created using sodium bisulfite conversion methodology.
- Figure 5 shows an example of haplotype reconstruction.
- the incorporated artificial SNPs are depicted as vertical lines on the linear DNA fragments Allele 1 and Allele 2.
- the DNA is fragmented, sequenced and the sequencing reads are aligned based on the unique pattern of the incorporated synthetic SNPs (allele 2 from Figure 4 depicted in this Figure).
- the alignment of the artificial SNPs in the overlapping fragments allows the original genomic fragment sequence to be rebuilt, so that the haplotype for allele 2 can be reconstructed.
- Figure 6 shows an example of how the embodiment for a "first strand extension reaction" can be used to incorporate synthetic polymorphisms into a DNA target.
- Figure 7 shows sequencing data for the percent of modified nucleotides (% error rates) incorporated into phiX template DNA extension products for flowcell lanes 1, 2, 3 and 4.
- Figure 8 shows sequencing data for the percent of phiX sequencing reads by cycle having 0, ≤1, ≤2, ≤3 or ≤4 incorporated modified nucleotides (Y axis: % of reads with X errors or fewer, 0-100%; X axis: cycle number, 0-100).
- Figure 9 shows a composite of the types and frequency (error rate) of synthetic polymorphisms that were introduced into the phiX template DNA for each flowcell lane during first strand extension.
- Figure 10 A-D are representative of the distribution or coverage of artificial polymorphisms introduced into a phiX template DNA.
- Figure 11 shows coverage plots representing the sequencing data of three clones: Panel A) Clone A, Panel B) Clone B and Panel C) Clone D.
- the graphs represent the coverage and locations of synthetic and natural heterozygous SNPs incorporated into p53 gene sequences derived from the DNA of a Yoruban male (NA18507).
- Each graph reports the sequence in the approximate same region of the p53 gene for each clone and the stars mark the approximate locations of natural heterozygous SNPs among the randomly distributed introduced synthetic SNPs.
- the top horizontal line with peaks represents the reference calls and the continuous baseline with vertical peaks under the horizontal line represents the non-reference calls.
- haplotyping may help to map human disease genes.
- Disease maps could be used to diagnose, prognose and/or identify disease or risk of disease for a patient as well as determine potential treatment therapies unique to any one person. Such is one of the goals of personalized healthcare.
- sequence knowledge such as haplotyping could also be used to advantage in veterinary and plant sciences.
- determining a haplotype and/or phasing of haplotypes is important from both a biological and clinical point of view. Sequencing a sample provides sequence information with which an investigator can start to unravel and determine such correlations.
- haplotype refers to a haploid genotype, a
- a haplotype can provide a distinctive genetic pattern of an individual.
- a haplotype can be determined for one locus, several loci, over a portion of or for an entire chromosome.
- the term "allele" is used consistent with its meaning in the art of biology.
- An allele is one or more alternative forms of a gene, genetic sequence or single nucleotide (e.g. a single nucleotide polymorphism or SNP) found at a specific location, or locus, on a chromosome.
- locus is used consistent with its meaning in the art of biology.
- a locus refers to a specific location or place on a chromosome identified with a gene, genetic sequence or single nucleotide.
- one or more alleles for a particular gene can be found at a particular locus on a chromosome.
- Different genes can be identified with different loci on a chromosome, wherein each gene, for example, may be associated with one or more different allelic sequences.
- Alleles are not limited to any specific type and may include, for example, normal genetic sequences or variant genetic sequences, such as single nucleotide polymorphisms (SNPs), short tandem repeats (STRs), etc.
- phased alleles refers to the distribution of the particular alleles on a chromosome. Accordingly, the "phase" of two alleles can refer to a characterization or determination of whether the alleles are located on a single chromosome or two separate chromosomes (e.g. a maternally or paternally inherited chromosome).
- next-generation sequencing technology may increase the accuracy of sequencing and may be useful for calling variants
- the technology can be of limited use when phase, or haplotype information, is desired. Phasing information derived from short sequence reads has previously been very difficult to determine unless the two polymorphisms of interest were so close to one another that they were present on the same sequenced fragment of DNA, or, perhaps, where one polymorphism was detected in the first sequence read and the second polymorphism in the second sequence read of the same read pair.
- Instances resulting from the second case are contemplated to be rare since, on average, the human genome has one polymorphism for every 1000 nucleotides. As such, the probability of a particular read containing a polymorphism may be approximately 15% (sequence read
- the combined probability of both reads of a pair each containing one polymorphism is the product of the individual probabilities (15% x 15%). Therefore, it is contemplated that only a small subset of fragment read pairs, for example approximately 2.25% of short fragment read pairs, could contain two variant sequences that form a haplotype. This is further complicated when taking into account that the average insert size distribution of the typical sequencing library, for example a library created for a next generation sequencing technology, can range from approximately 50 bp (e.g., Life Technologies SOLiD mate-paired sequencing) to approximately 400 bp (e.g., 454 Life Sciences GS FLX Titanium sequencing).
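The arithmetic in the two preceding paragraphs can be reproduced with a short calculation. The truncated parenthetical above does not give the read length. A read length of about 150 bases is assumed here because, at one polymorphism per 1000 nucleotides, it reproduces the approximately 15% figure. The sketch also computes the slightly lower exact value obtained when polymorphisms are modeled as independent per-base events, which is an assumption.

```python
def p_read_has_snp(read_len: int, snp_rate: float = 1e-3) -> float:
    """Probability that a read of `read_len` bases covers at least one SNP,
    assuming SNPs occur independently at `snp_rate` per base."""
    return 1.0 - (1.0 - snp_rate) ** read_len


def p_both_mates_have_snp(read_len: int, snp_rate: float = 1e-3) -> float:
    """Probability that both reads of a pair each cover a SNP, treating the
    two reads as independent."""
    p = p_read_has_snp(read_len, snp_rate)
    return p * p


# Linear approximation used in the text, assuming ~150-base reads:
linear_single = 150 * 1e-3         # ~15% per read
linear_pair = linear_single ** 2   # ~2.25% per read pair
```

`p_read_has_snp(150)` is about 0.139 and `p_both_mates_have_snp(150)` is about 0.019. Both are close to the linear estimates of 15% and 2.25%, so the conclusion that very few read pairs span two polymorphisms holds under either model.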
- the present disclosure provides solutions for characterizing genomic haplotypes (e.g. haplotype content or phase) which are particularly useful when dealing with short read length sequence information.
- the present disclosure provides methods and compositions for enabling haplotype characterization from sequence information, in particular when the alleles of interest are located on different sequenced nucleic acid fragments.
- Embodiments herein disclose methods for creating "artificial polymorphisms" or "synthetic polymorphisms", such as artificial or synthetic single nucleotide polymorphisms ("artificial SNPs" or "synthetic SNPs").
- synthetic polymorphisms represent sequences in a nucleic acid sample that are not naturally occurring in the nucleic acid sample, but instead are incorporated by methodological means into the nucleic acid sample.
- the synthetic polymorphism could be inserted into the sequence of a genome, or the synthetic polymorphism could replace a sequence of the nucleic acid sample.
- synthetic polymorphisms include, but are not limited to, single nucleotide polymorphisms (i.e., artificial or synthetic SNPs), dinucleotide polymorphisms, insertions of one or more nucleotides and deletions of one or more nucleotides.
- the artificial sequences for incorporation into a natural nucleic acid or polynucleotide sample comprise modified nucleotides including, but not limited to, 2-thio thymidine triphosphate, 5-(2'-deoxy-D-ribofuranosyl)-3-methyl-2-pyridone-5'-triphosphate, 8-oxoguanine (8-hydroxyguanine, 8-oxo-7,8-dihydroguanine or 2-amino-7,9-dihydro-1H-purine-6,8-dione), 8-oxo-2'-deoxyguanosine-5'-triphosphate, 2'-deoxy-P-nucleoside-5'-triphosphate (dPTP), d5mCTP, for example, m7G(5')ppp(5'); P1-5'-(7-methyl)-guanosine-P3-5'-guanosine triphosphate, methyl5-dCTP, hydroxy
- the artificial or synthetic polymorphisms can be incorporated, for example, at a certain frequency such that they can be aligned and phased even from short sequence reads or pairs of reads.
- polymorphisms in a nucleic acid strand comprises incorporating a plurality of nucleic acid analogs, for example a guanine analog such as 8-oxoguanine (8-oxo G), into a nucleic acid strand.
- the modified nucleotide 8-oxoguanine (8-hydroxyguanine, 8-oxo-7,8-dihydroguanine or 2-amino-7,9-dihydro-1H-purine-6,8-dione (IUPAC)) is found normally in mammalian DNA, and its level increases in DNA that has been damaged, for example by oxidative damage caused by oxygen free radical species and/or ionizing radiation (1992, Cheng et al., J Biol Chem 267: 166-172, incorporated herein by reference in its entirety).
- 8-oxo G can base pair with cytosine (C) or, in the syn conformation via Hoogsteen base pairing, with adenine (A) (LePage et al, Nucl Acids Res, 1998, 26: 1276-1281, incorporated herein by reference in its entirety).
- the 8-oxo G can be introduced, e.g., by incorporation of 8-oxo-2'-deoxyguanosine-5'-triphosphate (8-oxo-dGTP) during an extension reaction.
- 8-oxo G can also be introduced into a polynucleotide by a variety of other means, for example by ionizing radiation or other means of oxidatively stressing the cellular DNA.
- the modified nucleotide can be added to a dNTP mix and, during an extension reaction of one or both strands of a polynucleotide, can be incorporated into the extended DNA strand, replacing the normally incorporated non-modified nucleotide at a certain frequency.
- adenine mispairing can be accomplished during a DNA replication step by pairing of an adenine in the replicating strand opposite the 8-oxo G in the parent strand.
- 8-oxo G can be incorporated into a polynucleotide prior to library preparation for sequencing.
- a genomic DNA sample can be fragmented, the fragment ends repaired, adenines added to the ends via A-tailing and primer adaptors added to the ends for replication and amplification, for example.
- 8-oxo-dGTP can be added along with a canonical dNTP mix (dATP, dTTP, dGTP and dCTP), which would result in the random replacement of a plurality of guanines in the DNA fragment with the guanine analog 8-oxo G.
- the percent of 8-oxo-dGTP can be empirically determined. In some embodiments, the percent of 8-oxo-dGTP is at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or 100% of guanines (e.g., as a replacement for dGTP) available for incorporation during fragment replication.
- the percentage, and therefore ratio, of guanine analog compared to the canonical dGTP can be empirically determined for the amount of replacement desired by the user. It will be understood that similar percentages or ratios can be used for other nucleotides (or modified nucleotides) that are incorporated into nucleic acids using methods and compositions set forth herein, for example, in order to introduce artificial SNPs.
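A toy model sketches how the analog fraction affects the number of synthetic polymorphisms per fragment. It assumes that each guanine position independently receives the analog with an effective probability `fraction`. Modified nucleotides may incorporate less efficiently than canonical ones, so this effective probability would need to be calibrated empirically rather than read directly from the pool composition. The symbol `8` standing for 8-oxo G and the function names are hypothetical.

```python
import random


def expected_marks(seq: str, target: str = "G", fraction: float = 0.1) -> float:
    """Expected number of analog substitutions in `seq` (independent sites)."""
    return seq.count(target) * fraction


def substitute(seq: str, target: str = "G", mark: str = "8",
               fraction: float = 0.1, seed: int = 0) -> str:
    """Replace each `target` base with `mark` independently with probability
    `fraction` (a toy model of random analog incorporation)."""
    rng = random.Random(seed)
    return "".join(
        mark if base == target and rng.random() < fraction else base
        for base in seq
    )
```

Take a 1000 bp fragment with 50% GC content, which has about 250 guanines on one strand. With an effective fraction of 0.1, `expected_marks` gives about 25 substitutions. Fragments from guanine-poor regions receive proportionally fewer, which is consistent with the note below that initial fragment size may need to be tuned to guanine frequency.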
- the genomic fragments containing 8-oxo G can be subsequently isolated from those fragments that lack 8-oxo G. Isolation of the 8-oxo G containing fragment can be by any means.
- a primer used during replication could be complexed with a binding molecule that binds a binding partner for isolation purposes.
- binding partner pairs include, but are not limited to, haptens, small molecules, dyes and antibodies such as for example biotin/streptavidin, biotin/avidin, biotin/neutravidin, DNP/anti-DNP, DIG/anti-DIG, etc.
- 8-oxo G containing DNA can also be isolated by capture with an 8-oxo G specific antibody such as Oxoguanine 8 antibody [2Q2311] (ab64548 from Abcam).
- the 8-oxo G containing DNA can also be eliminated from downstream haplotyping methods by either denaturation and washing or digestion for example with formamidopyrimidine DNA glycosylase (Fpg) (also known as 8-Oxoguanine DNA glycosylase, NEB).
- Figure 1 exemplifies an embodiment using 8-oxo-dGTP in methods for incorporating synthetic polymorphisms into genomic DNA.
- genomic DNA can be randomly fragmented into large fragments.
- the size of the initial large fragments can be at least 500 bp, at least 750 bp, at least 1000 bp, at least 1500 bp, at least 2000 bp, at least 3000 bp, at least 4000 bp or at least 5000 bp.
- the size of the initial fragments can be determined empirically and may vary between different regions of the genome that have different frequencies of guanines which would affect the amount of downstream guanine analog incorporation.
- Fragmentation can be by any means, for example sonication, Hydroshearing, nebulization, mechanical shearing and transposon methodologies, etc.
- the fragments can be end repaired, A-tailed and adaptor ligated.
- the nucleotide 8-oxo G can be incorporated into a strand of the genomic fragment by primer extension using a dNTP mix that includes 8-oxo-dGTP.
- the primer utilized for DNA extension and incorporation of the modified nucleotide can be complexed with biotin which can be subsequently captured by a streptavidin molecule for isolation of the 8-oxo G containing strand.
- the captured 8-oxo G containing templates can be replicated resulting in 8-oxo G mispairs with adenines, thereby creating double stranded DNA molecules wherein the template contains the guanine analogs and the copied strand contains the mispaired adenines.
- the primer used for replication of the second strand can be affixed to a capture moiety such as biotin and capture by streptavidin can be performed.
- the remaining adenine containing polynucleotides can be further amplified and processed to create a library of fragments for sequencing.
- the created synthetic adenine SNPs in the fragments are random and, due to the randomness of the guanine substitutions with 8-oxo G, the pattern of introduced synthetic SNPs can be used to uniquely identify the parental fragments.
- the artificial SNP patterns can be aligned among all the fragments thereby combining the fragment sequences in the original genomic order for haplotype determination, such as determination of haplotype content or phase.
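The final step of combining fragment sequences in their original order can be sketched as a greedy overlap assembly. This technique was chosen for illustration and is not necessarily the method used in the disclosure. In a hypothetical encoding, synthetic SNPs appear as lowercase bases. Two sequences are joined on their longest exact suffix/prefix overlap that contains at least `min_marks` synthetic SNPs, and merging repeats until no qualifying overlap remains. Reads fully contained in another read are not handled.

```python
def overlap_len(a: str, b: str, min_len: int, min_marks: int) -> int:
    """Longest proper suffix of `a` equal to a prefix of `b` that carries at
    least `min_marks` synthetic SNPs (lowercase bases); 0 if none."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        prefix = b[:k]
        if a.endswith(prefix) and sum(c.islower() for c in prefix) >= min_marks:
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 4, min_marks: int = 2) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest qualifying
    overlap and return the remaining contigs."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap_len(a, b, min_len, min_marks)
                if k > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical parental fragment; lowercase marks a synthetic SNP.
parent = "AcGTaCGgTAcCGtAG"
reads = [parent[9:16], parent[0:10], parent[4:14]]
```

With the default thresholds, the three shuffled reads are merged back into `parent`. Reads whose overlaps carry no synthetic SNPs stay unjoined. This reflects the point above that the randomly placed synthetic SNPs are what identify fragments from the same parental molecule.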
- a method for introducing artificial polymorphisms in a genomic DNA for sequencing comprises modifying DNA with bisulfite thereby creating a pattern of artificial polymorphisms.
- applying bisulfite to a nucleic acid sample at low concentration or for a short period of time can modify DNA by partially and incompletely converting a subset of unmethylated cytosine residues to uracils, which are subsequently replaced by thymines, creating artificial thymine polymorphisms at a plurality of locations in the genomic DNA.
- methylated cytosines (e.g., 5-methylcytosine) are protected from bisulfite conversion.
- cytosine residues that are not methylated are converted to uracils. Therefore, by utilizing the methylation status of a genomic DNA sample and treating genomic DNA with bisulfite a pattern of artificial T SNPs (C to U to T) can be created which can be aligned among the fragments after sequencing to reconstruct the genomic DNA chromosomal sequence for subsequent haplotype characterization (e.g. identification of the haplotype content or phase).
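A toy simulation of partial bisulfite conversion shows how the pattern of artificial T SNPs arises. It assumes that each unmethylated cytosine is converted independently with probability `conversion_rate`, that cytosines at positions listed in `methylated` are protected, and that only the converted strand is read. The function name and parameters are hypothetical.

```python
import random


def partial_bisulfite(seq: str, methylated: set[int], conversion_rate: float,
                      seed: int = 0) -> str:
    """Return `seq` as it would read after partial bisulfite conversion and
    copying: each unmethylated C becomes T (via U) with probability
    `conversion_rate`; methylated positions and non-C bases are unchanged."""
    rng = random.Random(seed)
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated and rng.random() < conversion_rate:
            out.append("T")  # C -> U by bisulfite, copied as T
        else:
            out.append(base)
    return "".join(out)
```

A low `conversion_rate` converts only a random subset of cytosines. Independently treated copies of the same region therefore generally carry different T patterns, and this randomness is what lets a pattern identify its parental fragment.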
- partial and incomplete conversion of methylated cytosine residues is preferred when practicing methods disclosed herein for creating a pattern of synthetic polymorphisms in a polynucleotide.
- Examples of natural cytosine sequence configurations which could be targets for partial bisulfite conversion include, but are not limited to CG methylation dinucleotides (1994, Clark et al, Nucl Acids Res 22:2990-2997, incorporated herein by reference in its entirety), CpT and CpA dinucleotide regions (2000, Lyko et al, Nature 408:538-540; 2000, Ramsahoye et al, Proc Nat Acad Sci 97:5237-5242; 2001, Haines et al, Dev Biol 240:585-598, incorporated herein by reference in their entireties) and CHG and CHH in stem cells wherein H can be either an adenine (A), cytosine (C) or thymine (T) (2009, Lister et al, Nature 462:315-322, incorporated herein by reference in its entirety).
- DNA can be modified in vitro to include methylated nucleotides (e.g., modified nucleotides which are non-native methylated nucleotides).
- methylated nucleotides can be incorporated into a plurality of locations in a polynucleotide by amplification, such as amplification of a nucleic acid in the presence of canonical dNTPs wherein one of the dNTPs is replaced in whole, or preferably in part, with a methylated dNTP including, but not limited to, d5mCTP, m7G(5')ppp(5'); P1-5'-(7-methyl)-guanosine-P3-5'-guanosine triphosphate (Roche Applied Science), methyl5-dCTP (Zymo Research), or hydroxymethyl dCTP (Bioline).
- methylated dNTPs can be spiked into an amplification reaction in a background of canonical dNTPs. Partial bisulfite conversion could then be carried out on the in vitro modified DNA as described herein for creating a pattern of synthetic polymorphisms in a nucleic acid sample.
- genomic DNA is fragmented as previously described and the fragment ends are repaired and A-tailed using methods known in the art (for example, see Molecular Cloning: A Laboratory Manual, Eds. Sambrook, Fritsch and Maniatis, Cold Spring Harbor Laboratory Press) as previously exemplified in Figure 1.
- the prepared genomic fragments can be ligated to adaptors for subsequent amplification of the fragments.
- the adaptors for use with the bisulfite conversion method for creating artificial SNPs can be designed so that they are extendable and amplifiable following bisulfite treatment.
- the adaptors can be pre-methylated (i.e., methylated adaptors), or adaptors could be designed which lack cytosine nucleotides where primer binding occurs.
- the adaptor ligated fragments can be amplified and copied using dTTP to replace the uracils prior to library preparation. Following library preparation and sequencing the artificial SNP patterns in the fragmented sequences can be aligned to reconstruct the original genomic DNA which can then be haplotyped.
- the partial conversion of cytosines by bisulfite conversion creates synthetic SNPs in the fragments wherein, due to the randomness of the conversions, the pattern of synthetic SNPs can be used to uniquely identify the parental fragments.
- the partial conversion of cytosines to uracils can be performed prior to genomic DNA fragmentation and/or adaptor ligation, in which case the ligated adapters need not be methylated or otherwise designed to resist bisulfite treatment of cytosines.
- methods for determining haplotype of a genomic sequence comprise the use of modified nucleotides such as isoC and isoG.
- Isocytosine (isoC, iC) and isoguanine (isoG, iG) are modified nucleotides whose amine and ketone groups are inverted compared with those of the standard cytosine and guanine nucleotides; they can be misincorporated into a DNA strand, resulting in the random placement of artificial polymorphisms.
- when isoC and isoG are used, the polymorphisms created can be copied or sequenced in later steps using the correct complementary non-natural partner.
- Figure 3 is exemplary of the use of modified nucleotides in methods for creating artificial polymorphisms in DNA.
- genomic DNA can be fragmented as previously described.
- Adaptors can be ligated to the ends of the random fragments as previously described.
- Exemplary naturally occurring SNPs A and T are depicted on one of the fragments; these SNPs being targeted as an example for haplotyping.
- a modified nucleotide, in this example iC can be incorporated into the extended strand which is further end labeled with a binding moiety affixed to the extension primer, in this example biotin.
- the modified nucleotide deoxyisocytidine triphosphate (diCTP) can be part of the extension dNTP mix in a defined ratio or percentage. Such ratios or percentages can be determined empirically for the amount of synthetic polymorphism incorporation desired by an investigator.
- the strand comprising the modified nucleotide can be captured with the binding partner, in this case streptavidin and subsequent strand duplication can incorporate the mate to the modified nucleotide, in this case iG as described for iC.
- the double stranded fragments, which comprise iC on one strand and iG on the other can be amplified thereby creating multiple fragments comprising both modified nucleotides for use in library preparation.
- synthetic polymorphisms can alternatively be incorporated into genomic library fragments downstream of fragment library preparation.
- a genomic library can be created by any means known to a skilled artisan, for example as discussed herein.
- synthetic polymorphisms can be incorporated in steps between the library preparation and sequencing.
- synthetic polymorphisms can be incorporated during colony formation prior to sequence by synthesis methodologies.
- the DNA library can be hybridized to primers affixed on a substrate and a first strand extension reaction can be utilized to incorporate modified nucleotides into the fragment library. This "first strand extension reaction" format is exemplified in Figure 6.
- two primers which are homologous to primers affixed to the ends of the DNA library fragments are bound to locations on a substrate such as a flowcell (e.g., lanes or wells on a flowcell), wells, plates, and the like.
- the template DNA library fragments can be hybridized to the substrate-bound primers and a complementary DNA strand can be synthesized (e.g., 1st strand extension in Figure 6) in the presence of modified nucleotides.
- Clustering, sequencing and aligning can be performed to align the incorporated artificial polymorphisms to provide a sequence useful for haplotype determination.
- libraries for sequencing can be prepared using a method compatible with the downstream sequencing instrument.
- the sequences of fragments, once determined, can be aligned on the basis of the synthetic SNPs present in the fragments and a haplotype can be constructed and determined based on that alignment, for example when the length of the sequence read is shorter than the distance between the two alleles for haplotype determination.
- the first sequences in Figure 4 A and B show two exemplary alleles (allele 1 and 2) comprising naturally occurring polymorphisms, in this example SNPs, which are separated by more than 400 nucleotides (G-C in allele 1 and T-A in allele 2). As the distance between these SNPs is greater than the average insert size of the library preparation method used for sequencing, phasing or haplotyping of the two SNPs would not be determinable using unmodified nucleotides.
- the second sequences in Figure 4 A and B show the same region from exemplary alleles 1 and 2 after practicing a method of the present disclosure, for example practicing the method of partial bisulfite conversion of the parental genomic fragments prior to sequencing.
- the two modified allelic sequences demonstrate an example of a unique pattern of artificial polymorphisms which could be created by bisulfite conversion as disclosed herein.
- after sequencing, the short sequence reads would be aligned based on the artificial polymorphisms to recreate the unique pattern for each allele, thereby reconstructing the original genomic DNA fragment (Figure 5).
- haplotype reconstruction, illustrated for allele 2 in Figure 5, is determined following fragment alignment based on the synthetic polymorphism patterns.
- incorporating synthetic polymorphisms into a nucleic acid molecule prior to sequencing allows for a unique synthetic pattern which can be subsequently aligned post sequencing among the different sequence fragments, thereby providing a means for bridging the distance between the naturally occurring SNPs to determine their haplotype content or phase.
- methods disclosed herein provide a means for determining the origin of the sequenced fragments. For example, the relative frequency of artificial polymorphism creation and their random nature enable the determination of whether or not two DNA sequencing populations (e.g., two or more DNA clusters, isolated populations of DNA amplicons derived from one template, etc.) are derived from the same original parental DNA molecule. If two or more populations share the same overlapping pattern of artificial polymorphisms, it is contemplated that they are derived from the same chromosome and therefore all of the natural SNPs present in the populations can be haplotyped or phased together.
- the methods of creating artificial polymorphisms in a target genomic sequence which are designed to occur at a much higher frequency (or in closer proximity) in the target genomic DNA compared to the frequency (or proximity) of naturally occurring SNPs can be exploited to link naturally occurring SNPs in a target sequence when it was not previously possible due to the distance of separation between the naturally occurring SNPs in the target relative to the sequence read length.
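A simple calculation estimates why a high density of artificial polymorphisms makes a shared pattern informative. Assume each of `n_sites` eligible positions in an overlap is modified independently with probability `p` in each molecule. Two fragments from different parental molecules then agree at a site with probability p² + (1 - p)², and agree at every site with probability (p² + (1 - p)²)^n. The independence assumption and the helper names are illustrative. Natural SNP differences between chromosomes would reduce chance agreement further.

```python
import math


def chance_pattern_match(n_sites: int, p: float) -> float:
    """Probability that two unrelated fragments show identical synthetic
    polymorphism patterns over `n_sites` eligible positions, if each site is
    modified independently with probability `p` in each fragment."""
    per_site = p * p + (1 - p) * (1 - p)
    return per_site ** n_sites


def sites_needed(p: float, max_false_match: float) -> int:
    """Smallest number of eligible sites for which the chance-match
    probability is at most `max_false_match`."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    per_site = p * p + (1 - p) * (1 - p)
    return math.ceil(math.log(max_false_match) / math.log(per_site))
```

With p = 0.5, about 20 eligible sites bring chance agreement below one in a million. With p = 0.1, about 70 sites are needed. Per-site agreement p² + (1 - p)² is smallest at p = 0.5, so intermediate modification rates carry the most information per site under this model.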
- Embodiments for creating artificial polymorphisms in target genomic DNA as disclosed herein require no prior knowledge of the sequence being haplotyped.
- methods for determining a haplotype of a nucleic acid sample comprise incorporating artificial polymorphisms into the nucleic acid by biased amplification.
- Exemplary methods for performing biased amplification can be found at, for example, WO2011/106368 (incorporated herein by reference in its entirety).
- Biased amplification: amplification (i.e., the process of increasing the number of copies of a polynucleotide) can be linear or exponential.
- dNTP: deoxyribonucleotide triphosphate.
- the methods may use a pool of dNTPs, wherein not all of the dNTPs (i.e., dATP, dTTP, dCTP, dGTP) are present at the same concentration in the pool.
- Pools of nucleotides may also include modified nucleotides such as those previously mentioned, which incorporate less efficiently (or less often) than canonical nucleotides.
- One or more of the dNTPs may be present at a concentration that is less than half the combined concentration of the other nucleotides in a step of a method set forth herein, such as an amplification reaction step.
- The concentration of any one type of dNTP may be, for example, less than 1/4, less than 1/5, or less than 1/10 of the combined concentration of the other nucleotides.
- The concentration of a particular type of dNTP in an amplification reaction may be less than 20 µM, less than 10 µM, or less than 0.2 µM, compared to the concentration of the remaining dNTPs (e.g., 200 µM) present in the amplification reaction.
- the concentration of a particular type of dNTP in a composition or method set forth herein could be at least 5 fold less, at least 10 fold less, at least 20 fold less, at least 50 fold less than the concentration of the remaining dNTPs that are present.
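The concentration criteria above reduce to simple ratios. The sketch below uses the 200 µM example from the text and reports how many fold lower the limiting dNTP is than each remaining dNTP and what fraction it is of the remaining dNTPs combined. Choosing dTTP as the limiting nucleotide and the helper name `limiting_dntp_ratios` are illustrative assumptions.

```python
from typing import Dict, Tuple


def limiting_dntp_ratios(limiting_um: float, others_um: Dict[str, float]) -> Tuple[float, float]:
    """Return (fold lower than each other dNTP, fraction of the others combined).

    Assumes all non-limiting dNTPs share one concentration, as in the 200 uM
    example; concentrations are in micromolar.
    """
    if len(set(others_um.values())) != 1:
        raise ValueError("this sketch assumes equal non-limiting concentrations")
    each = next(iter(others_um.values()))
    combined = sum(others_um.values())
    return each / limiting_um, limiting_um / combined


pool = {"dATP": 200.0, "dCTP": 200.0, "dGTP": 200.0}  # dTTP assumed to be limiting
for dttp in (20.0, 10.0, 0.2):
    fold, frac = limiting_dntp_ratios(dttp, pool)
    print(f"dTTP {dttp} uM: {fold:g}-fold lower, {frac:.5f} of the other dNTPs combined")
```

For example, 20 µM dTTP against 200 µM of each other dNTP is 10-fold lower than each of them and 1/30 of their combined 600 µM, so it meets both the "at least 10 fold less" and the "less than 1/10 of the combined" criteria.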
- one or more adjuvants may be added.
- Concentrations of the one or more adjuvants may be between, for example, 2 to 5M.
- conditions may vary from reaction to reaction; as such some optimization for any particular system is contemplated (for example, amplification reaction conditions can be optimized in accordance with WO2011/106368, which is incorporated herein by reference in its entirety).
- incorporating the synthetic polymorphisms as described herein into target nucleic acids of interest prior to library preparation is advantageous for a variety of reasons.
- The methods for incorporating synthetic polymorphisms into nucleic acids as described herein can be performed in conjunction with any library preparation method regardless of assay instrument (e.g., library preparation protocols for use in sequencing instrumentation including, but not limited to, those of Illumina, Inc., Applied Biosystems®, Ion Torrent®, 454 Life Sciences, Complete Genomics, Pacific Biosciences, Oxford Nanopore Technology, etc.).
- Practicing the methods described herein upstream of library preparation protocols allows the synthetic polymorphisms to be fixed and determinable prior to library preparation.
- Practicing the methods described herein provides for an initial fragmentation of genomic DNA into longer fragments, for example more than 100 bp, more than 300 bp, more than 500 bp, more than 1000 bp, more than 2000 bp, more than 10,000 bp, etc.
- Longer fragments, while typically not advantageous for next-generation sequencing, allow for the incorporation of more synthetic polymorphisms than shorter fragments (e.g., <300 bp) would. They therefore provide a pattern of synthetic polymorphisms that, after the longer fragments are further fragmented into shorter ones, remains readily discernible and alignable after sequencing.
- Another advantage of longer fragments is that they may contain more than one natural SNP, so more SNPs can be identified and aligned using fewer fragments.
- synthetic nucleotides can be incorporated into nucleic acids prior to nucleic acid fragmentation.
- modified nucleotides could be incorporated into cellular nucleic acids during cell culture.
- Modified nucleotides could be incorporated into cellular nucleic acids for example by modifying the culture media to include the modified nucleotides in a concentration sufficient to cause incorporation of the modified nucleotides into cellular DNA.
- Genomic DNA can be rendered into smaller genomic molecules comprising modified nucleotides without the need for mechanical, chemical, or biological fragmentation followed by modified nucleotide incorporation.
- Randomers (e.g., random sequence hexamers) could be hybridized to a genomic DNA template and extended (e.g., by rolling circle amplification), thereby creating long strands of DNA that serve the same purpose as other forms of fragmentation disclosed herein (e.g., creating smaller polynucleotides for library preparation for sequencing).
- extension products resulting from the extension could then be used in bisulfite conversion methods for converting natural nucleotides to synthetic polymorphisms.
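Partial bisulfite conversion can be pictured as each unmethylated C being deaminated to U, which is read as T after amplification, with some probability. The sketch below models that on a plain DNA string. The conversion rate, the random seed, and the assumption that every C is unmethylated and equally reactive are illustrative simplifications, not parameters from the source.

```python
import random


def partial_bisulfite(seq: str, rate: float, seed: int = 0) -> str:
    """Convert each C to T independently with probability `rate`.

    Models the sequence as read after amplification (U is copied as T) and
    assumes no methylated, bisulfite-protected cytosines.
    """
    rng = random.Random(seed)
    return "".join("T" if base == "C" and rng.random() < rate else base for base in seq)


original = "ACGTCCGATCGCATCCA"
converted = partial_bisulfite(original, rate=0.3, seed=7)
sites = [i for i, (a, b) in enumerate(zip(original, converted)) if a != b]
print(converted)
print("synthetic C->T sites:", sites)
```

Each run with a different seed yields a different random set of C-to-T sites, which is the random pattern the text relies on for alignment.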
- Modified nucleotides (e.g., pPTP, 8-oxo-G, isoC, isoG, etc.) could be included in the extension reaction, resulting in extension products that contain the modified nucleotides. This combines into a single step the creation of shorter molecules from genomic DNA and the incorporation of modified nucleotides; the products can then be used for further library preparation methods.
- the resulting polynucleotides comprising the synthetic polymorphisms can be used for downstream assays.
- the modified nucleic acid molecules can be utilized for sequencing.
- the nucleic acid molecules comprising the synthetic polymorphisms find particular utility for determining or characterizing a haplotype of a sample.
- the nucleic acid molecules comprising the synthetic polymorphisms also find particular utility for de novo sequencing where shorter sequence reads can be aligned and assembled to create full length, and sometimes novel, sequences.
- The nucleic acid molecules comprising the synthetic polymorphisms also find particular utility when sequencing genomic regions with a high incidence of repeats, which can be difficult to align due to their repetitive nature.
- The random nature of incorporating synthetic polymorphisms using the methods disclosed herein yields a modified nucleic acid molecule with a random pattern of incorporated polymorphisms. Once determined, that pattern can be aligned and reported for determining a sample haplotype (e.g., haplotype content or phase), a de novo sequence, verification of a sample sequence, the sequence of genomic locations previously deemed difficult to determine, etc.
- Sequences determined by practicing the methods disclosed herein can be used by diagnosticians, clinicians, researchers and other parties, for example to correlate sequences with disease states (e.g., cancers, neurological disorders, degenerative disorders, etc.). This information can in turn be used to diagnose or predict whether an individual has, or has a predisposition to, a particular disease or disorder.
- certain sequences for example a haplotype, may be correlated to preferential treatment regimens for a particular disease or disorder which may be used by health care professionals to determine a treatment regimen specific to any particular individual.
- methods can be used to determine the type and number of repeated regions in a genome, for example for forensic purposes.
- the modified nucleic acid molecules comprising synthetic polymorphisms can find particular utility in sequencing, for example for determining a haplotype, for de novo sequencing, etc.
- the modified nucleic acid molecules comprising synthetic polymorphisms can be sequenced by any means.
- Target nucleic acids include, for example, genomic DNA.
- RNA may be harvested from a sample and cDNA created from the isolated RNA, wherein the cDNA can be used for sequencing.
- The terms "nucleic acid" and "polynucleotide" refer to deoxyribonucleic acid (DNA), ribonucleic acid (RNA), complementary DNA (cDNA) or analogues of DNA, cDNA or RNA.
- the nucleic acids can be single stranded or double stranded molecules.
- the nucleic acids or polynucleotides may have originated in single stranded form, such as ssDNA or RNA, or they may have originated in double stranded form (dsDNA) such as that found in genomic DNA, amplification products, and/or fragments thereof, and the like.
- the nucleic acids or polynucleotides regardless of stranded nature, may derive from any number of sources including, but not limited to, a sample from an entire genomic complement of an organism, a fragment of an entire genomic complement of an organism.
- Nucleic acids may include intronic and exonic sequences or any number of regulatory and/or non-regulatory sequences.
- A sample can be from any source, for example, prokaryote, archaea or eukaryote. Further, a sample can be liquid (e.g., blood, serum, plasma, cerebrospinal fluid, urine, etc.) or solid (e.g., cells, tissues, etc.).
- The term "sample" is used consistent with its meaning in the art of biology and chemistry. In one sense, it is meant to include a nucleic acid or polynucleotide or fragment thereof from a specimen or culture obtained from any source, such as biological and environmental samples.
- Biological samples may be obtained from animals including, but not limited to humans, non-human primates, and non-human animals including, but not limited to, vertebrates such as rodents, ovines, bovines, ruminants, lagomorphs, porcines, caprines, equines, canines, felines, aves, etc.
- Biological samples include, but are not limited to, fluids such as blood products, tissues, cells, and the like.
- Biological samples can further be of plant origin, monocotyledonous or dicotyledonous, deciduous or evergreen, herbaceous or woody, including but not limited to agricultural plants, landscape plants, nursery plants, and the like.
- Environmental samples may be bacterial, viral, fungal, and the like, in origin.
- Preferred samples are eukaryotic in origin. Particularly useful samples are those derived from organisms having more than one set of haploid chromosomes (the set being one or more different chromosomes).
- a sample can be derived from an organism that is diploid, triploid or polyploid. Basically, any organismal nucleic acid sample source of interest to an investigator in determining sequence information is amenable to the present methods.
- a sample can also include a synthetic nucleic acid or fragment thereof. Derivatives or products of nucleic acids such as amplified copies or chemically modified species are also included.
- a sample is derived from a mammal, for example a human.
- nucleic acids can be processed further prior to sequencing, for example following library preparation protocols. Processing may differ depending on which sequencing instrument and technology is being utilized by the investigator. Methods and systems disclosed herein are not necessarily limited to any particular library preparation method or technology. Figures 1-3 exemplify practicing the disclosed methods, for example in some embodiments, prior to practicing library preparation. Even though there are advantages for performing the methods disclosed herein prior to typical library protocols wherein smaller fragments of genomic DNA are desired, the methods can be incorporated into the workflow of a typical library preparation methodology. For example, the methods disclosed herein could also be incorporated into any library preparation step prior to sequencing of the sample.
- the methods for incorporating synthetic polymorphisms into target DNA can be incorporated into a library workflow following library fragmentation of the sample and prior to sequencing the sample DNA.
- The method described herein may be incorporated into, or used in combination with, the sample preparation workflow for the PACBIO RS DNA Template Preparation Kit (Pacific Biosciences, Inc., Menlo Park, CA), which utilizes the SMRTbell™ technology library format, where insert lengths for sequencing can be between 250 and 6000 bp.
- An investigator can utilize PCR related methods for library preparation or can alternatively employ non-PCR based methods for library preparation.
- Genomic DNA, represented as a pair of homologous chromosomes, can be randomly fragmented into long DNA fragments, for example fragments at least 300 bp, at least 500 bp, at least 750 bp, at least 1000 bp, at least 2000 bp, at least 3000 bp, or at least 5000 bp long. Random fragmentation can be accomplished by a variety of means known to a skilled artisan.
- mechanical and/or acoustic shearing can be used to fragment genomic DNA such as by repeatedly forcing a genomic DNA sample through a small bore syringe, by nebulization, by hydroshearing or by sonication.
- Initial fragmentation of nucleic acids can be the same as or different from that used in a variety of library preparation protocols. An example of nebulization-effected fragmentation of DNA is described in the Paired-End Sample preparation kits by
- shearing of DNA is accomplished by acoustic/mechanical means such as that provided by Covaris® adaptive focused acoustics (AFA) processes.
- sonication may also be used for fragmenting genomic DNA for example as exemplified in the workflow of the
- Transposon-based technology can be used for fragmenting DNA, for example as exemplified in the workflow for Nextera™ DNA sample preparation kits (Illumina, Inc.), wherein genomic DNA can be fragmented by an engineered transposome that simultaneously fragments and tags input DNA, thereby creating a population of fragmented nucleic acid molecules which comprise unique adapter sequences at the ends of the fragments.
- Transposon based methodologies are particularly advantageous when long nucleic acid fragments are desired.
- Enzymatic fragmentation can be used to fragment genomic DNA, for example as employed in the workflow of the Ion Plus and Ion Xpress™ Plus fragment library kits (Ion Torrent™, Life Technologies, Carlsbad, CA). As demonstrated, there are myriad methods for fragmenting large nucleic acid molecules such as genomic DNA, and a skilled artisan will understand that the method may be chosen based on the particular assay technology and instrument.
- Once nucleic acids for assay are initially fragmented into long fragments as previously described, further processing of the sample may be performed.
- Additional sequences, such as adapter sequences, may be added to the ends of the fragments.
- Adapter sequences may be used for additional downstream methods such as amplification, polymerase chain reaction, molecule capture methods, and the like.
- Such adapter sequences may be primer sequences which may be the same or different than adapter sequences utilized in downstream library preparation kits and methods.
- Adaptors may be double stranded, single stranded, forked (i.e., a portion of the adaptor being double stranded and a portion of the adaptor being two single strands) or in hairpin configuration (i.e., a portion of the adaptor being double stranded and a portion being a single stranded loop structure).
- Adaptors could also include unique sequences, such as barcodes, useful in identifying a particular target DNA.
- the methods disclosed herein are not necessarily limited to any particular use or sequence of adapters, and a skilled artisan will understand that use of adapters may be chosen based on the assay and instrument being used.
- Figures 1-3 show exemplary embodiments for incorporation of synthetic polymorphisms into nucleic acids. For example, as seen in Figures 1-3, incorporation of a modified nucleotide (e.g., 8-oxo G), bisulfite conversion of C to U, and incorporation of a modified nucleotide (e.g., iC), respectively, can be performed for creating synthetic polymorphisms in nucleic acids.
- the modified nucleotide 8-oxo G can be incorporated into double stranded DNA by exposing the nucleic acid fragments to oxygen free radical species and/or ionizing radiation.
- 8-oxo G can be incorporated into a nucleic acid by annealing and extension of a primer on the nucleic acid in the presence of the canonical nucleotides dATP, dTTP and dCTP and a ratio of dGTP to the analog 8-oxo-dGTP.
- The ratio of dGTP to 8-oxo-dGTP is at least 1:1, 1:2, 1:3, 1:4, 1:5, 1:10, 1:20, 1:30, 1:40, 1:50, 1:75 or 1:99.
- The percentage of 8-oxo-dGTP in a method for incorporating synthetic polymorphisms is 100% (i.e., no dGTP is added to a reaction).
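The dGTP:8-oxo-dGTP ratio sets, roughly, what fraction of G positions receive the analogue. The sketch below estimates that fraction from the two concentrations and a relative incorporation efficiency. Modelling incorporation as simple competition weighted by an efficiency factor is an assumption made here for illustration (the text notes only that modified nucleotides may incorporate less efficiently than canonical ones), and the 10% efficiency in the example is hypothetical.

```python
def expected_analog_fraction(dgtp: float, analog: float, rel_efficiency: float = 1.0) -> float:
    """Fraction of G positions expected to receive 8-oxo-dGTP.

    Competitive model: e*A / (G + e*A), where e is the analogue's incorporation
    efficiency relative to dGTP (1.0 means equally efficient).
    """
    weighted = rel_efficiency * analog
    return weighted / (dgtp + weighted)


# Ratios written as dGTP:8-oxo-dGTP; (0, 1) is the 100% 8-oxo-dGTP case.
for g, a in ((1, 1), (1, 4), (1, 10), (1, 99), (0, 1)):
    print(f"dGTP:8-oxo-dGTP {g}:{a} -> "
          f"equal efficiency {expected_analog_fraction(g, a):.3f}, "
          f"10% efficiency {expected_analog_fraction(g, a, 0.1):.3f}")
```

Multiplying the returned fraction by the number of G positions in a fragment gives a rough expected count of 8-oxo G sites per fragment under the same assumptions.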
- Synthetic polymorphisms can also be created using modified nucleotides such as iC and iG, as exemplified in Figure 3.
- conventional methods for bisulfite conversion known to a skilled artisan can be followed for partial conversion of cytosines to uracils in DNA as exemplified in Figure 2.
- one or more primers utilized to bind to the adapter sequences for incorporation of modified nucleotides by annealing and extension of the primers may be further associated with a binding moiety for effecting capture and purification of the modified nucleic acid strand from the non-modified strands (i.e., nucleic acid strands with no incorporated synthetic polymorphisms).
- The hapten biotin can be associated with a primer for subsequent capture by its binding partner streptavidin, thereby purifying the modified strand away from the non-modified nucleic acids.
- the present methods are not necessarily limited by a particular type or set of binding partners or capture system.
- The modified strand can be duplicated and the synthetic polymorphisms replicated, for example by primer binding to an adapter affixed to the end of a nucleic acid followed by duplication, to create a double stranded nucleic acid molecule with incorporated synthetic polymorphisms.
- Figure 2 demonstrates a method for incorporating synthetic polymorphisms wherein selective capture is not performed. This demonstrates that even though strand selection is advantageous it is not always required.
- The selected strand can be replicated by, for example, primer extension methods, wherein such replication incorporates synthetic polymorphisms opposite the positions of the modified nucleotides in the parent strand.
- duplication of the template nucleic acid strand comprising 8-oxo G results in a complementary strand comprising newly incorporated adenines (A) or occasionally cytosines (C) opposite the location of 8-oxo G nucleotides in the template strand.
- adenines are exemplary of a nucleotide which mispairs with 8-oxo G.
- Cytosines can also pair with the modified nucleotide 8-oxo G.
- When 8-oxo G is utilized as the modified nucleotide for incorporating synthetic polymorphisms, adenines and/or cytosines can be incorporated as synthetic polymorphisms.
- the resulting synthetic polymorphism being incorporated can be a nucleotide which pairs with that specific modified nucleotide.
- Figure 1 demonstrates the removal of the exemplary modified nucleotide 8-oxo G prior to sequencing.
- the nucleotide 8-oxo G can pair with either adenines or cytosines, as such the maintenance of the 8-oxo G in a fragment for sequencing would not be preferential.
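Because A can be placed opposite 8-oxo G, a G in the original strand can become a T once the copied strand is itself replicated and the 8-oxo G-containing strand is no longer present. The sketch below traces two rounds of copying on plain strings. The symbol "O" for 8-oxo G, the probability that A rather than C is placed opposite it, the seed, and the choice to write complements position-by-position (not reversed) are all illustrative assumptions.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def copy_strand(template: str, p_a_opposite_oxog: float, rng: random.Random) -> str:
    """Return the complementary strand, written position-by-position (not reversed).

    'O' marks 8-oxo G in the template; A is placed opposite it with probability
    p_a_opposite_oxog, otherwise C (the conventional partner of G).
    """
    out = []
    for base in template:
        if base == "O":
            out.append("A" if rng.random() < p_a_opposite_oxog else "C")
        else:
            out.append(COMPLEMENT[base])
    return "".join(out)


rng = random.Random(1)
parent = "ATGOCTAGOTTGCA"  # two 8-oxo G residues
first_copy = copy_strand(parent, p_a_opposite_oxog=0.5, rng=rng)
second_copy = copy_strand(first_copy, p_a_opposite_oxog=0.5, rng=rng)  # canonical bases only
changes = [i for i, b in enumerate(parent) if b == "O" and second_copy[i] == "T"]
print(second_copy, "G->T synthetic polymorphisms at", changes)
```

Where C was placed opposite 8-oxo G the second copy simply restores G, so only the A-pairing events leave a detectable synthetic polymorphism.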
- a modified nucleotide is maintained in nucleic acid fragments used for sequencing. For example, the incorporation of isoC
- the nucleic acid fragments comprising the synthetic polymorphisms can be amplified. Such amplification can enrich a library for only those nucleic acid fragments that comprise adapters at both ends as well as to increase the amount of DNA in the fragment pool going into the library preparation process.
- polymerase chain reaction (PCR) amplification can be performed after incorporation of synthetic polymorphisms into nucleic acid fragments using primers that anneal to the adapters ligated to the ends of the nucleic acid fragments.
- Adapters as used herein may serve many functions, one of which is hybridization to homologous sequences affixed to substrates, for example for performing emulsion PCR (emPCR) or clonal generation for use in sequencing-by-synthesis methodologies.
- a library preparation for sequencing can be produced, for example, by performing the methods recommended by a particular sequencing method and instrument. For example, as described in protocols and manuals for use in any number of sequencing systems including, but not limited to, Illumina, Inc.
- a DNA library sample may be further amplified for sequencing by, for example, multiple strand displacement amplification (MDA) techniques.
- A skilled artisan will recognize additional methods and technologies for producing nucleic acid libraries which could also be used in combination with methods described herein for incorporating synthetic polymorphisms into nucleic acid fragments. As such, embodiments described herein are not necessarily limited to any particular method for creating libraries, other than, in particular embodiments, the incorporation or creation of synthetic polymorphisms prior to or within those methods.
- DNA libraries comprising synthetic polymorphisms are advantageous for use in sequencing assays, for example for determining haplotypes, de novo sequence determinations and forensic nucleotide applications (e.g., nucleotide repeat regions), to name a few.
- DNA libraries comprising synthetic polymorphisms can be immobilized on a flowcell.
- the immobilized nucleic acids can be sequenced using single molecule resolution techniques or the immobilized nucleic acids can be amplified, for example via bridge amplification, for ensemble-based detection. Bridge amplification can be performed on the immobilized polynucleotides prior to sequencing, for example for sequence by synthesis methodologies.
- In bridge amplification, an immobilized polynucleotide (e.g., from a DNA library) is hybridized to an immobilized oligonucleotide primer. The 3' end of the immobilized polynucleotide molecule provides the template for a polymerase-catalyzed, template-directed elongation reaction (e.g., primer extension) extending from the immobilized oligonucleotide primer.
- the resulting double-stranded product "bridges" the two primers and both strands are covalently attached to the support.
- both immobilized strands can serve as templates for new primer extension.
- the first and second portions can be amplified to produce a plurality of clusters in a process known as
- Clusters and colonies are used interchangeably and refer to a plurality of copies of a nucleic acid sequence and/or complements thereof attached to a surface.
- the cluster comprises a plurality of copies of a nucleic acid sequence and/or complements thereof, attached via their 5' termini to the surface.
- Exemplary bridge amplification and clustering methodology are described, for example, in PCT Patent Publ. Nos. WO00/18957 and WO98/44151, U.S. Patent No. 5,641,658; U.S. Patent Publ. No. 2002/0055100; U.S. Patent No. 7,115,400; U.S. Patent Publ. No.
- compositions and methods as described herein are particularly useful in sequence by synthesis methodologies utilizing a flowcell comprising clusters.
- Emulsion PCR comprises PCR amplification of an adaptor flanked shotgun DNA library in a water-in-oil emulsion.
- the PCR is multi-template PCR; in particular embodiments only a single primer pair is used.
- One of the PCR primers is tethered to the surface (5' attached) of microscale beads.
- a low template concentration results in most bead-containing emulsion microvesicles having zero or one template molecule present.
- PCR amplicons can be captured to the surface of the bead.
- beads bearing amplification products can be selectively enriched.
- Each clonally amplified bead will bear on its surface PCR products corresponding to amplification of a single molecule from the template library.
- the beads can then be arrayed on a surface of a flow cell for sequencing.
- Various embodiments of emulsion PCR methods are set forth in Dressman et al, Proc. Natl. Acad. Sci. USA 100:8817-8822 (2003), PCT Patent Publ. No. WO 05/010145, U.S. Patent Publ. Nos. 2005/0130173, 2005/0064460, and 2005/0042648, each of which is incorporated herein by reference in its entirety.
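The requirement for a low template concentration, so that most microvesicles hold zero or one template, follows from Poisson loading, a standard model for limiting dilution that the text does not state explicitly. The sketch below computes, for a mean of λ templates per droplet, the fractions of droplets that are empty, hold exactly one template, or hold more than one, and the share of occupied droplets that are clonal.

```python
from math import exp
from typing import Dict


def poisson_loading(mean_templates: float) -> Dict[str, float]:
    """Droplet occupancy under Poisson loading with the given mean per droplet."""
    p0 = exp(-mean_templates)
    p1 = mean_templates * p0
    return {
        "empty": p0,
        "single": p1,
        "multiple": 1.0 - p0 - p1,
        # Of droplets containing any template, the fraction that are clonal.
        "clonal_given_occupied": p1 / (1.0 - p0) if p0 < 1.0 else 1.0,
    }


for lam in (0.1, 0.3, 1.0):
    occ = poisson_loading(lam)
    print(lam, {k: round(v, 3) for k, v in occ.items()})
```

At λ = 0.1 about 90% of droplets are empty, but roughly 95% of the occupied ones carry a single template, which is the trade-off behind keeping the template concentration low.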
- DNA nanoballs can also be used in combination with methods and compositions as described herein.
- Methods for creating and utilizing DNA nanoballs for genomic sequencing can be found at, for example, US patents and publications 7,910,354, 2009/0264299, 2009/0011943, 2009/0005252, 2009/0155781, 2009/0118488 and as described in, for example, Drmanac et al, 2010, Science 327(5961): 78-81; all of which are incorporated herein by reference in their entireties.
- Following genomic library DNA fragmentation, adaptors are ligated to the fragments, the adapter-ligated fragments are circularized by ligation with a circle ligase, and rolling circle amplification is carried out (as described in Lizardi et al., 1998, Nat. Genet. 19:225-232 and US 2007/0099208 A1, each of which is incorporated herein by reference in its entirety).
- The extended concatemeric structure of the amplicons promotes coiling, thereby creating compact DNA nanoballs.
- the DNA nanoballs can be captured on substrates, preferably to create an ordered or patterned array such that distance between each nanoball is maintained thereby allowing sequencing of the separate DNA nanoballs.
- consecutive rounds of adapter ligation, amplification and digestion are carried out prior to circularization to produce head to tail constructs having several genomic DNA fragments separated by adapter sequences.
- Sequencing by synthesis generally comprises sequential addition of one or more nucleotides to a growing polynucleotide chain in the 5' to 3' direction using a polymerase.
- The extended polynucleotide chain is complementary to the nucleic acid template affixed on the substrate (e.g., flowcell, chip, slide, etc.), that is, to the target sequence comprising the synthetic polymorphisms.
- The disclosed methods for determining haplotype, de novo sequence, etc. by incorporating synthetic polymorphisms into a polynucleotide or fragment thereof also find utility in sequencing by ligation, sequencing by hybridization, and other sequencing technologies.
- An exemplary sequencing-by-ligation methodology is di-base encoding (e.g., color space sequencing) utilized by Applied Biosystems' SOLiD™ sequencing system (Voelkerding et al., 2009, Clin Chem 55:641-658; incorporated herein by reference in its entirety).
- Sequencing by hybridization comprises the use of an array of short nucleotide probe sequences to which fragmented, labeled target DNA is added (Drmanac et al., 2002, Adv Biochem Eng Biotechnol 77:75-101; Lizardi et al., 2008, Nat Biotech 26:649-650; US Patent 7,071,324; incorporated herein by reference in their entireties). Further improvements to sequencing by hybridization can be found at, for example, US patent application publications 2007/0178516, 2010/0063264 and 2006/0287833 (incorporated herein by reference in their entireties).
- Sequencing approaches which combine hybridization and ligation biochemistries have been developed and commercialized, such as the genomic sequencing technology practiced by Complete Genomics, Mountain View, CA.
- Combinatorial probe-anchor ligation, or cPAL™ (Drmanac et al., 2010, Science 327(5961):78-81), utilizes ligation biochemistry while exploiting advantages of sequencing by hybridization.
- the methods for haplotyping, de novo sequencing, etc. disclosed herein could be utilized in combinatorial probe-anchor ligation sequencing technologies. It is contemplated that the methods as described herein for use of synthetic polymorphisms to determine haplotype, de novo sequence, etc. are not limited by any particular sequencing methodology. Additional sequencing technologies include, but are not limited to, those practiced by one or more of polony sequencing technology (Dover Systems), sequencing by hybridization fluorescent platforms (Complete Genomics) and sTOP technology (Industrial Technology Research Institute).
- Single molecule sequencing can also be used with methods as disclosed herein.
- non-amplified DNA libraries for sequencing can be prepared as previously described.
- The library fragments can be hybridized and captured on a substrate such as a flow cell and assayed on, for example, a HeliScope™ single molecule sequencer.
- Nucleic acid detection systems such as those provided by Illumina®, Inc. (HiSeq 1000, HiSeq 2000, HiSeq 2500, Genome Analyzers, MiSeq, HiScan, iScan, BeadExpress systems), Applied Biosystems™ Life Technologies (ABI PRISM® Sequence detection systems, SOLiD™ System), Ion Torrent™ Life Technologies (Ion PGM™, Ion Proton™), 454 Life Sciences (GS Junior, GS FLX+), PacBio RS (Pacific Biosciences®), Oxford Nanopore Technologies® (GridION, MinION) or other sequencing instruments, as well as those described in, for example, United States patents and patent applications 5,888,737, 6,175,002, 5,695,934, 6,140,489, 5,863,722, 2007/007991, 2009/0247414, 2010/0111768 and PCT application WO2007/123744, and United States patent application serial nos. 61/431,425, 61/431,440, 61/431,439, 61/431,429, 61/438,486, each of which is incorporated herein by reference in its entirety.
- Output from a sequencing instrument can be of any sort.
- some current technologies utilize a light generating readable output, such as fluorescence or luminescence.
- Other technologies utilize semiconductors which detect ion release and digitally output sequence based on hydrogen ions released during incorporation of nucleotides during sequencing.
- The present methods are not limited to any type of readable output, as long as differences in output signal for a particular sequence of interest are potentially determinable.
- Analysis software examples include, but are not limited to, Pipeline, CASAVA and GenomeStudio data analysis software (Illumina®, Inc.), SOLiD™, DNASTAR® SeqMan® NGen® and Partek® Genomics Suite™ data analysis software (Life Technologies), Feature Extraction and Agilent Genomics Workbench data analysis software (Agilent Technologies), Genotyping Console™ and Chromosome Analysis Suite data analysis software (Affymetrix®).
- one or more software programs for use with methods and compositions disclosed herein will have the capacity to recognize the incorporated synthetic polymorphism patterns present in the fragment sequence data, align the polymorphisms identified in the fragment sequence data and output a sequence based on that alignment.
- the output may comprise a haplotype (e.g. haplotype content or phase) for the target sample.
- the output may comprise de novo sequence information for the target sample.
- The output may comprise forensic nucleotide repeat information, such as repeat type (e.g., sequence of the repeat, location of the repeat, number of short or intermediate tandem repeats, etc.).
- sequence analysis and alignment comprises aligning the sequence reads against a reference genome, or de novo assembly of alignable regions, for example by barcoding introduced into the library fragments for sequencing as known to a skilled artisan.
- Standard alignment software tools could be used. For example, if synthetic SNP density is high, alignment programs could be modified so that alignments are sufficiently permissive to place the sequence reads.
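One way to make alignment permissive for bisulfite-derived synthetic SNPs is to stop penalising the expected conversion. The sketch below scores ungapped read placements against a reference, counting a read T over a reference C as a tolerated conversion rather than a mismatch. This mirrors the general idea behind bisulfite-aware aligners such as Bismark but is not Bismark's implementation; it considers only the converted top strand, ignores indels, and the function names and mismatch threshold are hypothetical.

```python
from typing import Optional, Tuple


def score_placement(read: str, ref: str, offset: int) -> Tuple[int, int]:
    """Score an ungapped placement of `read` at `offset` in `ref`.

    Returns (mismatches, tolerated): a read T over a reference C is tolerated as
    an expected bisulfite conversion; every other difference is a mismatch.
    """
    window = ref[offset:offset + len(read)]
    if len(window) != len(read):
        raise ValueError("read runs past the end of the reference")
    mismatches = tolerated = 0
    for r, g in zip(read, window):
        if r == g:
            continue
        if g == "C" and r == "T":
            tolerated += 1  # synthetic C->T, not penalised
        else:
            mismatches += 1
    return mismatches, tolerated


def best_offset(read: str, ref: str, max_mismatches: int = 1) -> Optional[int]:
    """Return the offset with the fewest true mismatches (ties: leftmost), or None."""
    best: Optional[Tuple[int, int]] = None  # (offset, mismatches)
    for off in range(len(ref) - len(read) + 1):
        mm, _ = score_placement(read, ref, off)
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (off, mm)
    return None if best is None else best[0]


ref = "GGACGTCCATCGAAT"
read = "ACGTTTATTG"  # ref[2:12] with three C->T conversions
print(best_offset(read, ref), score_placement(read, ref, 2))
```

A standard aligner would see three mismatches in this 10 bp read; the permissive scoring sees none, so the read is placed at offset 2.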
- existing modified alignment pipelines for bisulfite sequencing could be used when synthetic SNPs are incorporated by bisulfite conversion methodologies (e.g., as described at www.bioinformatics.babraham.ac.uk/projects/bismark).
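Bisulfite-aware pipelines such as Bismark avoid penalizing conversion events by comparing reads and reference in a reduced "three-letter" alphabet. A minimal sketch of that idea follows. It assumes ungapped placement, forward-strand C→T conversion only, and a toy reference. The function names are hypothetical, and this is not the Bismark implementation, which builds converted genome indexes and delegates alignment to Bowtie.

```python
def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion of the forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def converted_mismatches(read: str, ref: str, offset: int) -> int:
    """Mismatches between a read and the reference window starting at `offset`,
    counted after both sequences are reduced to the A/G/T alphabet."""
    window = ref[offset:offset + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(a != b for a, b in zip(c_to_t(read), c_to_t(window)))


def best_offset(read: str, ref: str) -> tuple:
    """Return (offset, mismatches) for the best ungapped placement of `read`."""
    if len(read) > len(ref):
        raise ValueError("read is longer than the reference")
    mismatches, offset = min(
        (converted_mismatches(read, ref, i), i)
        for i in range(len(ref) - len(read) + 1)
    )
    return offset, mismatches


if __name__ == "__main__":
    reference = "ATCGCGATCCGA"
    read = "TTGTGAT"  # reference[1:8] after two C->T conversions
    print(best_offset(read, reference))  # (1, 0)
```

A plain comparison of this read against `reference[1:8]` would report two mismatches. In the converted alphabet it places with none, which is the permissiveness a bisulfite-derived synthetic SNP read needs.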
- built-in error correction modules of standard short read assemblers can be disabled when assembling sequence derived from practicing the methods disclosed herein (Zerbino and Birney, 2008, Genome Res 18:821-829, incorporated herein by reference in its entirety).
- Algorithms for building haplotype blocks from short sequence reads could be used with methods disclosed herein (Bansal and Bafna, 2008, Bioinformatics 24:i153-i159). Such algorithms may, however, need to be modified to relax the standard assumption of two discrete haplotypes, as would be expected when sequencing a normal diploid human DNA molecule. For example, the introduced synthetic SNPs would result in a larger number of apparent or artificial haplotypes corresponding to each original sequence fragment, and the algorithms would therefore be modified to accommodate this non-standard information.
- the synthetic SNPs could be distinguished from normal nucleotide sequences in a number of ways.
- the original, unmodified sequence could serve as the reference sequence and therefore as the control lacking the synthetic SNPs.
- the polymorphisms that are not present in the original sequence could be identified and correlated with those locations in the modified sequence, thereby identifying the locations in the modified sequence where synthetic SNPs were incorporated. Alignment could then take place using those identified modified nucleotides.
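When the unmodified original sequence serves as the control, candidate synthetic SNP positions are the positions at which a modified read disagrees with that control. The sketch below makes two assumptions: the read has already been placed on the control without gaps (the `offset` parameter), and sequencing errors are negligible. `Variant` and `synthetic_variants` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    position: int  # 0-based coordinate in the unmodified control sequence
    original: str  # base in the control sequence
    observed: str  # base observed in the modified read


def synthetic_variants(control: str, modified_read: str, offset: int = 0) -> List[Variant]:
    """List positions where an ungapped, pre-placed modified read differs from
    the unmodified control sequence; these are candidate synthetic SNPs."""
    if offset < 0 or offset + len(modified_read) > len(control):
        raise ValueError("read does not fit inside the control sequence")
    return [
        Variant(offset + i, control[offset + i], base)
        for i, base in enumerate(modified_read)
        if base != control[offset + i]
    ]


if __name__ == "__main__":
    for v in synthetic_variants("ACGTACGTAC", "ACATACGCAC"):
        print(v)
```

The returned positions are the "identified modified nucleotides" on which alignment of modified fragments can then be based.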
- the synthetic polymorphisms would be expected to be unique to the original sequence fragment into which they were incorporated. As such, by sequencing original fragments at a particular genomic position, the frequency of the polymorphisms across the synthetic haplotypes could be estimated and compared to the expected frequency of polymorphisms in a normal diploid human sample.
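The frequency criterion can be sketched as follows. The sketch assumes a pileup of bases observed at each position, with each read derived from a distinct original fragment. Natural heterozygous SNPs should then appear at an allele frequency near 0.5, and homozygous ones near 1.0. A randomly introduced synthetic SNP, by contrast, is carried by only a few fragments. The 0.3 threshold and the function name are hypothetical choices for illustration, not values from the disclosure.

```python
from collections import Counter
from typing import Dict, List


def classify_positions(
    pileup: Dict[int, List[str]],
    reference: Dict[int, str],
    min_natural_freq: float = 0.3,  # hypothetical cut-off
) -> Dict[int, str]:
    """Label each covered position 'reference', 'natural' or 'synthetic' from
    the frequency of its most common non-reference base."""
    labels = {}
    for pos, bases in pileup.items():
        if not bases:
            continue  # no coverage, nothing to classify
        counts = Counter(bases)
        alt_max = max(
            (n for base, n in counts.items() if base != reference[pos]), default=0
        )
        freq = alt_max / len(bases)
        if freq == 0:
            labels[pos] = "reference"
        elif freq >= min_natural_freq:
            labels[pos] = "natural"
        else:
            labels[pos] = "synthetic"
    return labels
```

The most common non-reference base is used, rather than the total non-reference count, because random synthetic changes at one position may be spread over several alternative bases.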
- the merging of artificial haplotypes can be performed by algorithms modified to identify the synthetic polymorphisms, such as HapCUT or modifications thereof (Bansal and Bafna, 2008).
- the algorithms could be modified to merge SNPs identified as non-synthetic SNPs but derived from different synthetic haplotypes, thereby creating the true underlying haplotype aligned map.
- output from aligned sequences comprising both natural and synthetic polymorphisms could include both the locations of the natural polymorphisms and the locations of the synthetic polymorphisms in the reconstructed haplotype.
- output could include just the natural polymorphisms in the reconstructed haplotypes with the synthetic polymorphisms being screened out.
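One simple way to picture the merging step is to project each artificial haplotype onto the natural SNP positions only, screening out the synthetic ones. Fragments whose natural alleles agree at shared positions are then joined. The greedy sketch below is a hypothetical illustration and is not HapCUT, which uses a max-cut based optimization over all fragments and tolerates errors. The sketch assumes error-free alleles, a known set of natural SNP positions, and fragments supplied in positional order.

```python
from typing import Dict, Iterable, List, Set

Fragment = Dict[int, str]  # position -> base for one artificial haplotype


def project(fragment: Fragment, natural: Set[int]) -> Fragment:
    """Drop synthetic positions, keeping only natural (non-synthetic) SNPs."""
    return {pos: base for pos, base in fragment.items() if pos in natural}


def merge_fragments(fragments: Iterable[Fragment], natural: Set[int]) -> List[Fragment]:
    """Greedily merge artificial haplotypes that agree at shared natural SNPs.

    A fragment joins the first merged haplotype it overlaps without conflict;
    otherwise it seeds a new one. The result depends on input order.
    """
    merged: List[Fragment] = []
    for fragment in fragments:
        alleles = project(fragment, natural)
        if not alleles:
            continue  # only synthetic SNPs: no phasing information
        for hap in merged:
            shared = alleles.keys() & hap.keys()
            if shared and all(hap[pos] == alleles[pos] for pos in shared):
                hap.update(alleles)
                break
        else:
            merged.append(dict(alleles))
    return merged


if __name__ == "__main__":
    natural = {100, 200, 300}
    fragments = [
        {100: "A", 150: "T", 200: "C"},  # 150 is synthetic
        {100: "G", 180: "A", 200: "T"},  # 180 is synthetic
        {200: "C", 250: "G", 300: "G"},  # 250 is synthetic
        {200: "T", 300: "A"},
    ]
    for hap in merge_fragments(fragments, natural):
        print(hap)
```

In this toy input, four artificial haplotypes collapse into two phased haplotypes spanning positions 100 to 300, with the synthetic positions screened out of the output.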
- Visualization can be accomplished in a number of ways; for example, a standard genome browser such as the Integrative Genomics Viewer (IGV) could be utilized (Robinson et al., 2011, Nat Biotech 29:24-26).
- the reconstructed haplotypes could be annotated in the genome browser to highlight the positions of the true, natural polymorphisms and/or the synthetic polymorphisms (e.g., if present in the output).
- other visualization tools may also be used as known to a skilled artisan.
- the present methods are not necessarily limited to the algorithms, methods or systems used for aligning and outputting or visualizing the sequences derived from practicing the methods disclosed herein.
- prior to library preparation, the genomic DNA can be modified to include artificial polymorphisms.
- the genomic DNA can be initially fragmented into large pieces (for example, several kilobases). A larger fragment size increases the likelihood that a single fragment contains two or more artificial SNPs, as well as more heterozygous SNPs.
- Transposon mediated fragmentation of nucleic acids and hydroshearing are examples of methods for generating initial DNA fragments of, for example, between 1,000 and 40,000 bp.
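The benefit of longer initial fragments can be illustrated with a simple model. The model assumes, hypothetically, that each base is modified independently with the same probability. Under that assumption, the number of synthetic SNPs per fragment is binomially distributed. The rate of 0.001 per base used below is an arbitrary illustrative value, not a measured incorporation rate.

```python
import math


def prob_at_least(k: int, length: int, rate: float) -> float:
    """P(X >= k) for X ~ Binomial(length, rate): the chance that a fragment of
    `length` bases carries at least `k` synthetic SNPs, assuming independent,
    uniform modification of each base."""
    below = sum(
        math.comb(length, i) * rate**i * (1 - rate) ** (length - i)
        for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    for length in (1_000, 5_000, 40_000):
        print(length, round(prob_at_least(2, length, 0.001), 4))
```

Under these assumptions, the chance of at least two synthetic SNPs rises from roughly 0.26 for a 1 kb fragment to about 0.96 at 5 kb, and is effectively 1.0 at 40 kb. This is the trend that motivates starting from multi-kilobase fragments.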
- a bacteriophage reference genome, phi X 174 (phiX), was used because phiX has a small, well-defined genomic sequence of 5386 bases.
- a standard paired-end Illumina flow cell was seeded with a standard phiX library at a concentration of 2 pM following the manufacturer's protocols. Following hybridization of the library to the flow cell-bound oligonucleotides, DNA molecules were copied in the flow cell lanes using the first strand extension method, by incubating the flow cell at 40°C for 1 hour in the presence of a DNA polymerase and various nucleotide mixes (natural and unnatural) as listed in Table 1.
- Table 2 shows a summary of a sequencing run for each lane of the flowcell.
- Lane 1 is the control lane and is representative of sequencing output from a normal sequencing run using normal dNTPs.
- Lanes 2-6 show sequencing run output when one or both modified nucleotides are incorporated in combination with, or replacing, normal dNTPs during first strand extension (dNTP concentrations from Table 1). The % Error
- Figure 7 shows graphs of cycle versus error rates for the control (A) lane
- Lane 6 results were basically the same as lane
- Figure 8 shows that incorporating the modified nucleotides into first strand extension resulted in a large number of sequenced fragments containing 1, 2, 3, 4 or more synthetic SNPs relative to the control, which would allow for fragment alignment of synthetic SNPs and hence haplotype determination.
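A tally like the one summarized in Figure 8 can be sketched in two steps. First, count each read's mismatches against the known phiX reference. Then count how many reads share each mismatch number. The sketch assumes ungapped reads with known start offsets. In practice, sequencing errors would also be counted, which is why the modified lanes are compared against the control lane. Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_mismatches(read: str, window: str) -> int:
    """Positions at which an aligned read differs from its reference window."""
    return sum(a != b for a, b in zip(read, window))


def synthetic_snp_histogram(aligned_reads: Iterable[Tuple[int, str]], reference: str) -> Counter:
    """Map 'mismatches per read' -> 'number of reads'.

    `aligned_reads` holds (offset, read) pairs from an ungapped alignment to
    the reference genome (for example phiX)."""
    return Counter(
        count_mismatches(read, reference[offset:offset + len(read)])
        for offset, read in aligned_reads
    )
```

Reads falling in the bins for two or more mismatches are the ones that can link synthetic SNPs within a single fragment.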
- Figure 9 shows a lane-by-lane comparison of the mutations resulting from dPTP incorporation and their prevalence (error rate) in the sequencing reads.
- dPTP can base-pair with both A and G, thereby allowing the following mutations to occur when dPTP is incorporated into the first strand extension product: A→G, G→A, T→C and C→T.
- the G→A mutation dominates over other types of mutations.
- when small amounts of dCTP and dTTP are present during the incorporation reaction (lanes 4, 6 and 8), this mutational dominance is minimal.
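The lane-by-lane mutation comparison can be sketched by tallying reference→read substitutions. The tally then shows what fraction falls into the four classes that dPTP can produce, and which class dominates. This is a minimal sketch assuming ungapped alignments; the function names are hypothetical.

```python
from collections import Counter
from typing import Tuple

# Substitutions expected from dPTP, which pairs with both A and G.
DPTP_SUBSTITUTIONS = {("A", "G"), ("G", "A"), ("T", "C"), ("C", "T")}


def substitution_spectrum(ref: str, read: str) -> Counter:
    """Count (reference base, read base) pairs at mismatched positions."""
    return Counter((r, q) for r, q in zip(ref, read) if r != q)


def dptp_summary(spectrum: Counter) -> Tuple[float, Tuple[str, str]]:
    """Return the fraction of substitutions consistent with dPTP and the most
    frequent substitution type."""
    total = sum(spectrum.values())
    if total == 0:
        raise ValueError("no substitutions to summarize")
    consistent = sum(n for sub, n in spectrum.items() if sub in DPTP_SUBSTITUTIONS)
    dominant = spectrum.most_common(1)[0][0]
    return consistent / total, dominant
```

Applied per lane, a dominant `("G", "A")` entry would correspond to the G→A dominance described above.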
- reaction conditions for Lane 5 were too extreme, resulting in sequencing failure for this lane.
- a region of the p53 gene was further sequenced after incorporation of the modified nucleotide dPTP into the gene.
- a region of the p53 gene was amplified using oligonucleotides TP53 Exonl 3. IF (Tail-
- the PCR mix consisted of 1X Thermopol buffer, 26 U/ml of Taq DNA polymerase and 0.52 µM of each oligonucleotide.
- Reaction 1 contained 200 µM of each natural nucleotide (dATP, dCTP, dGTP, dTTP).
- Reaction 2 contained approximately 200 µM of dATP and dGTP, 198 µM of dCTP and dTTP and 2 µM of dPTP.
- Reaction 3 contained approximately 200 µM of dATP and dGTP, 180 µM of dCTP and dTTP and 20 µM of dPTP. Amplification was carried out using the following conditions: 94°C for 3 minutes followed by 38 cycles of 94°C for 30 seconds, 50°C for 30 seconds and 72°C for 5 minutes. After cycling, samples were incubated at 72°C for 5 minutes and the temperature was lowered to 4°C.
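In Reactions 2 and 3, dPTP replaces an equal amount of dCTP and of dTTP. It therefore makes up about 1% and 10%, respectively, of the nucleotides competing at each pyrimidine position, that is, opposite a template G (with dCTP) or a template A (with dTTP). The short calculation below makes this explicit. Polymerase preference for natural versus modified nucleotides also matters, so these figures describe the reaction mix rather than the resulting mutation rate. The names `REACTIONS` and `dptp_share` are hypothetical.

```python
# Nucleotide concentrations (µM) for the three PCR reactions described above.
REACTIONS = {
    1: {"dATP": 200, "dCTP": 200, "dGTP": 200, "dTTP": 200, "dPTP": 0},
    2: {"dATP": 200, "dCTP": 198, "dGTP": 200, "dTTP": 198, "dPTP": 2},
    3: {"dATP": 200, "dCTP": 180, "dGTP": 200, "dTTP": 180, "dPTP": 20},
}


def dptp_share(mix: dict, competitor: str) -> float:
    """Fraction of dPTP in the pool of dPTP plus one natural pyrimidine
    ('dCTP' or 'dTTP'), i.e. the nucleotides competing at the same position."""
    return mix["dPTP"] / (mix["dPTP"] + mix[competitor])


if __name__ == "__main__":
    for rxn, mix in REACTIONS.items():
        print(rxn, dptp_share(mix, "dCTP"), dptp_share(mix, "dTTP"))
```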
- the p53 target template was an aliquot of a PCR product amplified from sample NA18507 (human 1) using Phusion polymerase in a master mix (1X final concentration). A negative control (no template) was also included.
- PCR reactions 1 and 3 were loaded onto a SYBR® Safe pre-stained 1% agarose gel in TAE, and gel bands of the expected size were excised and purified using the QIAquick Gel Extraction Kit following the manufacturer's protocol.
- DNA was eluted in 30 µl of Elution Buffer.
- a second round of amplification was performed with Phusion polymerase in HiFi buffer with the primers previously described.
- One µl of the previously eluted DNA was used as template for the second PCR reaction (1 ⁇ total volume).
- PCR conditions were as follows: 98°C for 1 minute followed by 38 cycles of 98°C for 10 seconds, 50°C for 30 seconds, 72°C for 5 minutes. After cycling, samples were incubated at 72°C for 5 minutes and stored at 4°C.
- PCR reactions were loaded onto a SYBR® Safe pre-stained 1% agarose gel in TAE and the DNA bands of the expected size were excised using a QIAquick Gel Extraction Kit. DNA was eluted in
- Eluted DNA was A-tailed at 74°C for 30 minutes with dATP and Taq in 1X Thermopol buffer in a total volume of ⁇ per sample following standard protocols.
- a 3.5 µl aliquot of A-tailed DNA was ligated into the pGEM®-T Easy vector (Promega) using Quick Ligase (New England Biolabs). Ligations were transformed into XL10-Gold competent cells (Stratagene). After overnight incubation at 37°C on antibiotic-containing agar plates, single colonies were picked and inoculated into Luria Broth. Plasmid DNA was prepared from approximately 3 ml of bacterial culture from each clone using a QIAprep Spin Miniprep kit (QIAGEN).
- Plasmid DNA was eluted in 50 µl of EB. Clones were screened for the presence of the insert by restriction enzyme digestion with EcoRI. Positive clones (three clones from the PCR with natural dNTPs and six clones from the PCR in the presence of dPTP) were sequenced by capillary sequencing with the SP6 and T7 primers homologous to pGEM®-T Easy vector sequences, and also with an internal primer specific to the p53 sequence inserts to verify modified nucleotide incorporation.
- Figure 11 shows the SBS sequencing results from three random clones A, B and D.
- the sequences represent sequence runs from a region of a p53 gene demonstrating natural SNPs interspersed with incorporated synthetic SNPs.
- the approximate locations of the natural heterozygous SNPs are represented by stars on the graphs.
- the vertical lines represent locations of SNPs and demonstrate the random and spatially distributed nature of the synthetic SNP incorporation.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201380029854.6A CN104508144B (en) | 2012-07-18 | 2013-05-20 | For the method and system for determining haplotype He determining phase haplotype |
EP13727321.5A EP2875150B1 (en) | 2012-07-18 | 2013-05-20 | Methods and systems for determining haplotypes and phasing of haplotypes |
JP2015522158A JP6091613B2 (en) | 2012-07-18 | 2013-05-20 | Method and system for haplotype determination and haplotype fading |
AU2013291816A AU2013291816B2 (en) | 2012-07-18 | 2013-05-20 | Methods and systems for determining haplotypes and phasing of haplotypes |
CN201810250540.3A CN108486236B (en) | 2012-07-18 | 2013-05-20 | Methods and systems for determining haplotypes and phasing haplotypes |
CA2873327A CA2873327C (en) | 2012-07-18 | 2013-05-20 | Methods and systems for determining haplotypes and phasing of haplotypes |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261673052P | 2012-07-18 | 2012-07-18 | |
US61/673,052 | 2012-07-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014013218A1 true WO2014013218A1 (en) | 2014-01-23 |
Family
ID=48577136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2013/051305 WO2014013218A1 (en) | 2012-07-18 | 2013-05-20 | Methods and systems for determining haplotypes and phasing of haplotypes |
Country Status (7)
Country | Link |
---|---|
US (4) | US9977861B2 (en) |
EP (1) | EP2875150B1 (en) |
JP (1) | JP6091613B2 (en) |
CN (2) | CN108486236B (en) |
AU (1) | AU2013291816B2 (en) |
CA (1) | CA2873327C (en) |
WO (1) | WO2014013218A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3204521A4 (en) * | 2014-10-10 | 2018-03-21 | Cold Spring Harbor Laboratories | Random nucleotide mutation for nucleotide template counting and assembly |
US9977861B2 (en) | 2012-07-18 | 2018-05-22 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
WO2020035669A1 (en) * | 2018-08-13 | 2020-02-20 | Longas Technologies Pty Ltd | Sequencing algorithm |
US11421238B2 (en) | 2018-02-20 | 2022-08-23 | Longas Technologies Pty Ltd | Method for introducing mutations |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9238836B2 (en) | 2012-03-30 | 2016-01-19 | Pacific Biosciences Of California, Inc. | Methods and compositions for sequencing modified nucleic acids |
US10691775B2 (en) | 2013-01-17 | 2020-06-23 | Edico Genome, Corp. | Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform |
US10847251B2 (en) | 2013-01-17 | 2020-11-24 | Illumina, Inc. | Genomic infrastructure for on-site or cloud-based DNA and RNA processing and analysis |
US9679104B2 (en) | 2013-01-17 | 2017-06-13 | Edico Genome, Corp. | Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform |
EP2994749A4 (en) | 2013-01-17 | 2017-07-19 | Edico Genome Corp. | Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform |
US9792405B2 (en) | 2013-01-17 | 2017-10-17 | Edico Genome, Corp. | Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform |
US10068054B2 (en) | 2013-01-17 | 2018-09-04 | Edico Genome, Corp. | Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform |
GB201410646D0 (en) * | 2014-06-14 | 2014-07-30 | Illumina Cambridge Ltd | Methods of increasing sequencing accuracy |
KR20200020997A (en) | 2015-02-10 | 2020-02-26 | 일루미나, 인코포레이티드 | The method and the composition for analyzing the cellular constituent |
EP3329491A2 (en) | 2015-03-23 | 2018-06-06 | Edico Genome Corporation | Method and system for genomic visualization |
WO2017040695A1 (en) * | 2015-09-01 | 2017-03-09 | Recombinetics, Inc. | Method of identifying the presence of foreign alleles in a desired haplotype |
US10068183B1 (en) | 2017-02-23 | 2018-09-04 | Edico Genome, Corp. | Bioinformatics systems, apparatuses, and methods executed on a quantum processing platform |
US20170270245A1 (en) | 2016-01-11 | 2017-09-21 | Edico Genome, Corp. | Bioinformatics systems, apparatuses, and methods for performing secondary and/or tertiary processing |
CN108241792B (en) * | 2016-12-23 | 2021-03-23 | 深圳华大基因科技服务有限公司 | Method and device for integrating multi-platform genotyping results |
WO2018232580A1 (en) * | 2017-06-20 | 2018-12-27 | 深圳华大基因研究院 | Method and device for haplotype phasing of diploid genome based on third generation capture sequencing |
CA3094717A1 (en) | 2018-04-02 | 2019-10-10 | Grail, Inc. | Methylation markers and targeted methylation probe panels |
CN109273052B (en) * | 2018-09-13 | 2022-03-18 | 北京百迈客生物科技有限公司 | Genome haploid assembling method and device |
CN113286881A (en) | 2018-09-27 | 2021-08-20 | 格里尔公司 | Methylation signatures and target methylation probe plates |
US11211147B2 (en) | 2020-02-18 | 2021-12-28 | Tempus Labs, Inc. | Estimation of circulating tumor fraction using off-target reads of targeted-panel sequencing |
US11211144B2 (en) | 2020-02-18 | 2021-12-28 | Tempus Labs, Inc. | Methods and systems for refining copy number variation in a liquid biopsy assay |
US11475981B2 (en) | 2020-02-18 | 2022-10-18 | Tempus Labs, Inc. | Methods and systems for dynamic variant thresholding in a liquid biopsy assay |
CN117711488B (en) * | 2023-11-29 | 2024-07-02 | 东莞博奥木华基因科技有限公司 | Gene haplotype detection method based on long-reading long-sequencing and application thereof |
Citations (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5641658A (en) | 1994-08-03 | 1997-06-24 | Mosaic Technologies, Inc. | Method for performing amplification of nucleic acid with two primers bound to a single solid support |
US5695934A (en) | 1994-10-13 | 1997-12-09 | Lynx Therapeutics, Inc. | Massively parallel sequencing of sorted polynucleotides |
WO1998044151A1 (en) | 1997-04-01 | 1998-10-08 | Glaxo Group Limited | Method of nucleic acid amplification |
US5888737A (en) | 1997-04-15 | 1999-03-30 | Lynx Therapeutics, Inc. | Adaptor-based sequence analysis |
WO2000018957A1 (en) | 1998-09-30 | 2000-04-06 | Applied Research Systems Ars Holding N.V. | Methods of nucleic acid amplification and sequencing |
US20020055100A1 (en) | 1997-04-01 | 2002-05-09 | Kawashima Eric H. | Method of nucleic acid sequencing |
US20040002090A1 (en) | 2002-03-05 | 2004-01-01 | Pascal Mayer | Methods for detecting genome-wide sequence variations associated with a phenotype |
US20040096853A1 (en) | 2000-12-08 | 2004-05-20 | Pascal Mayer | Isothermal amplification of nucleic acids on a solid support |
US20040175702A1 (en) * | 2003-03-07 | 2004-09-09 | Illumigen Biosciences, Inc. | Method and apparatus for pattern identification in diploid DNA sequence data |
WO2005010145A2 (en) | 2003-07-05 | 2005-02-03 | The Johns Hopkins University | Method and compositions for detection and enumeration of genetic variations |
US20050042648A1 (en) | 1997-07-07 | 2005-02-24 | Andrew Griffiths | Vitro sorting method |
US20050064460A1 (en) | 2001-11-16 | 2005-03-24 | Medical Research Council | Emulsion compositions |
US20050130173A1 (en) | 2003-01-29 | 2005-06-16 | Leamon John H. | Methods of amplifying and sequencing nucleic acids |
US7071324B2 (en) | 1998-10-13 | 2006-07-04 | Brown University Research Foundation | Systems and methods for sequencing by hybridization |
US20060287833A1 (en) | 2005-06-17 | 2006-12-21 | Zohar Yakhini | Method and system for sequencing nucleic acid molecules using sequencing by hybridization and comparison with decoration patterns |
US20070007991A1 (en) | 2005-06-29 | 2007-01-11 | Altera Corporation | I/O circuitry for reducing ground bounce and VCC sag in integrated circuit devices |
US20070099208A1 (en) | 2005-06-15 | 2007-05-03 | Radoje Drmanac | Single molecule arrays for genetic and chemical analysis |
US20070128624A1 (en) | 2005-11-01 | 2007-06-07 | Gormley Niall A | Method of preparing libraries of template polynucleotides |
US20070178516A1 (en) | 1993-11-01 | 2007-08-02 | Nanogen, Inc. | Self-addressable self-assembling microelectronic integrated systems, component devices, mechanisms, methods, and procedures for molecular biological analysis and diagnostics |
WO2007123744A2 (en) | 2006-03-31 | 2007-11-01 | Solexa, Inc. | Systems and devices for sequence by synthesis analysis |
US20080009420A1 (en) | 2006-03-17 | 2008-01-10 | Schroth Gary P | Isothermal methods for creating clonal single molecule arrays |
US20090005252A1 (en) | 2006-02-24 | 2009-01-01 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20090011943A1 (en) | 2005-06-15 | 2009-01-08 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20090247414A1 (en) | 2005-04-18 | 2009-10-01 | Bojan Obradovic | Method and device for nucleic acid sequencing using a planar waveguide |
US20100063264A1 (en) | 2003-11-17 | 2010-03-11 | Jacobson Joseph M | Nucleotide sequencing via repetitive single molecule hybridization |
US7910354B2 (en) | 2006-10-27 | 2011-03-22 | Complete Genomics, Inc. | Efficient arrays of amplified polynucleotides |
WO2011106368A2 (en) | 2010-02-23 | 2011-09-01 | Illumina, Inc. | Amplification methods to minimise sequence specific bias |
WO2011157846A1 (en) * | 2010-06-18 | 2011-12-22 | Katholieke Universiteit Leuven | Methods for haplotyping single cells |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE10055368A1 (en) | 2000-11-08 | 2002-05-29 | Agrobiogen Gmbh Biotechnologie | Method for labeling samples containing DNA using oligonucleotides |
GB0115194D0 (en) | 2001-06-21 | 2001-08-15 | Leuven K U Res & Dev | Novel technology for genetic mapping |
CA2473125A1 (en) | 2002-01-09 | 2003-07-24 | Jeffrey Christopher Gladnick | Chair lift accessory for accommodating snowboarders and mountain bikers |
US20040005294A1 (en) * | 2002-02-25 | 2004-01-08 | Ho-Young Lee | IGFBP-3 in the diagnosis and treatment of cancer |
WO2004042078A1 (en) | 2002-11-05 | 2004-05-21 | The University Of Queensland | Nucleotide sequence analysis by quantification of mutagenesis |
DE602004021902D1 (en) * | 2003-01-17 | 2009-08-20 | Univ Boston | haplotype analysis |
WO2005090607A1 (en) * | 2004-03-08 | 2005-09-29 | Rubicon Genomics, Inc. | Methods and compositions for generating and amplifying dna libraries for sensitive detection and analysis of dna methylation |
US8148085B2 (en) | 2006-05-15 | 2012-04-03 | Sea Lane Biotechnologies, Llc | Donor specific antibody libraries |
US8852864B2 (en) * | 2008-01-17 | 2014-10-07 | Sequenom Inc. | Methods and compositions for the analysis of nucleic acids |
JP2009215171A (en) | 2008-03-07 | 2009-09-24 | Tokyo Institute Of Technology | Nucleoside triphosphate derivative |
US20150211070A1 (en) * | 2011-09-22 | 2015-07-30 | Immu-Metrix, Llc | Compositions and methods for analyzing heterogeneous samples |
US9977861B2 (en) * | 2012-07-18 | 2018-05-22 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
2013
- 2013-03-11 US US13/793,676 patent/US9977861B2/en active Active
- 2013-05-20 JP JP2015522158A patent/JP6091613B2/en active Active
- 2013-05-20 CA CA2873327A patent/CA2873327C/en active Active
- 2013-05-20 CN CN201810250540.3A patent/CN108486236B/en active Active
- 2013-05-20 CN CN201380029854.6A patent/CN104508144B/en active Active
- 2013-05-20 EP EP13727321.5A patent/EP2875150B1/en active Active
- 2013-05-20 AU AU2013291816A patent/AU2013291816B2/en active Active
- 2013-05-20 WO PCT/GB2013/051305 patent/WO2014013218A1/en active Application Filing

2018
- 2018-05-11 US US15/977,814 patent/US11257568B2/en active Active

2021
- 2021-12-15 US US17/552,326 patent/US20220180969A1/en active Pending

2022
- 2022-02-18 US US17/675,295 patent/US11605446B2/en active Active
Patent Citations (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070178516A1 (en) | 1993-11-01 | 2007-08-02 | Nanogen, Inc. | Self-addressable self-assembling microelectronic integrated systems, component devices, mechanisms, methods, and procedures for molecular biological analysis and diagnostics |
US5641658A (en) | 1994-08-03 | 1997-06-24 | Mosaic Technologies, Inc. | Method for performing amplification of nucleic acid with two primers bound to a single solid support |
US5695934A (en) | 1994-10-13 | 1997-12-09 | Lynx Therapeutics, Inc. | Massively parallel sequencing of sorted polynucleotides |
US5863722A (en) | 1994-10-13 | 1999-01-26 | Lynx Therapeutics, Inc. | Method of sorting polynucleotides |
US6140489A (en) | 1994-10-13 | 2000-10-31 | Lynx Therapeutics, Inc. | Compositions for sorting polynucleotides |
US20020055100A1 (en) | 1997-04-01 | 2002-05-09 | Kawashima Eric H. | Method of nucleic acid sequencing |
WO1998044151A1 (en) | 1997-04-01 | 1998-10-08 | Glaxo Group Limited | Method of nucleic acid amplification |
US20050100900A1 (en) | 1997-04-01 | 2005-05-12 | Manteia Sa | Method of nucleic acid amplification |
US5888737A (en) | 1997-04-15 | 1999-03-30 | Lynx Therapeutics, Inc. | Adaptor-based sequence analysis |
US6175002B1 (en) | 1997-04-15 | 2001-01-16 | Lynx Therapeutics, Inc. | Adaptor-based sequence analysis |
US20050042648A1 (en) | 1997-07-07 | 2005-02-24 | Andrew Griffiths | Vitro sorting method |
WO2000018957A1 (en) | 1998-09-30 | 2000-04-06 | Applied Research Systems Ars Holding N.V. | Methods of nucleic acid amplification and sequencing |
US7115400B1 (en) | 1998-09-30 | 2006-10-03 | Solexa Ltd. | Methods of nucleic acid amplification and sequencing |
US7071324B2 (en) | 1998-10-13 | 2006-07-04 | Brown University Research Foundation | Systems and methods for sequencing by hybridization |
US20040096853A1 (en) | 2000-12-08 | 2004-05-20 | Pascal Mayer | Isothermal amplification of nucleic acids on a solid support |
US20050064460A1 (en) | 2001-11-16 | 2005-03-24 | Medical Research Council | Emulsion compositions |
US20040002090A1 (en) | 2002-03-05 | 2004-01-01 | Pascal Mayer | Methods for detecting genome-wide sequence variations associated with a phenotype |
US20050130173A1 (en) | 2003-01-29 | 2005-06-16 | Leamon John H. | Methods of amplifying and sequencing nucleic acids |
US20040175702A1 (en) * | 2003-03-07 | 2004-09-09 | Illumigen Biosciences, Inc. | Method and apparatus for pattern identification in diploid DNA sequence data |
WO2005010145A2 (en) | 2003-07-05 | 2005-02-03 | The Johns Hopkins University | Method and compositions for detection and enumeration of genetic variations |
US20100063264A1 (en) | 2003-11-17 | 2010-03-11 | Jacobson Joseph M | Nucleotide sequencing via repetitive single molecule hybridization |
US20090247414A1 (en) | 2005-04-18 | 2009-10-01 | Bojan Obradovic | Method and device for nucleic acid sequencing using a planar waveguide |
US20090011943A1 (en) | 2005-06-15 | 2009-01-08 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20070099208A1 (en) | 2005-06-15 | 2007-05-03 | Radoje Drmanac | Single molecule arrays for genetic and chemical analysis |
US20060287833A1 (en) | 2005-06-17 | 2006-12-21 | Zohar Yakhini | Method and system for sequencing nucleic acid molecules using sequencing by hybridization and comparison with decoration patterns |
US20070007991A1 (en) | 2005-06-29 | 2007-01-11 | Altera Corporation | I/O circuitry for reducing ground bounce and VCC sag in integrated circuit devices |
US20070128624A1 (en) | 2005-11-01 | 2007-06-07 | Gormley Niall A | Method of preparing libraries of template polynucleotides |
US20090005252A1 (en) | 2006-02-24 | 2009-01-01 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20090118488A1 (en) | 2006-02-24 | 2009-05-07 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20090155781A1 (en) | 2006-02-24 | 2009-06-18 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20090264299A1 (en) | 2006-02-24 | 2009-10-22 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20080009420A1 (en) | 2006-03-17 | 2008-01-10 | Schroth Gary P | Isothermal methods for creating clonal single molecule arrays |
WO2007123744A2 (en) | 2006-03-31 | 2007-11-01 | Solexa, Inc. | Systems and devices for sequence by synthesis analysis |
US20100111768A1 (en) | 2006-03-31 | 2010-05-06 | Solexa, Inc. | Systems and devices for sequence by synthesis analysis |
US7910354B2 (en) | 2006-10-27 | 2011-03-22 | Complete Genomics, Inc. | Efficient arrays of amplified polynucleotides |
WO2011106368A2 (en) | 2010-02-23 | 2011-09-01 | Illumina, Inc. | Amplification methods to minimise sequence specific bias |
WO2011157846A1 (en) * | 2010-06-18 | 2011-12-22 | Katholieke Universiteit Leuven | Methods for haplotyping single cells |
Non-Patent Citations (26)
Title |
---|
"Current Protocols in Molecular Biology", JOHN WILEY & SONS, INC. |
AUSUBEL ET AL.: "Short Protocols in Molecular Biology", JOHN WILEY & SONS, INC. |
BANSAL; BAFNA, BIOINFORMATICS, vol. 24, 2008, pages I153 - I159 |
CHENG ET AL., J BIOL CHEM, vol. 267, 1992, pages 166 - 172 |
CLARK ET AL., NUCL ACIDS RES, vol. 22, 1994, pages 2990 - 2997 |
CLARK SUSAN J ET AL: "High sensitivity mapping of methylated cytosines", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 22, no. 15, 1 January 1994 (1994-01-01), pages 2990 - 2997, XP002210107, ISSN: 0305-1048 * |
DRESSMAN ET AL., PROC. NATL. ACAD. SCI. USA, vol. 100, 2003, pages 8817 - 8822 |
DRMANAC ET AL., ADV BIOCHEM ENG BIOTECHNOL, vol. 77, 2002, pages 75 - 101 |
DRMANAC ET AL., SCIENCE, vol. 327, no. 5961, 2010, pages 78 - 81 |
HAINES ET AL., DEV BIOL, vol. 240, pages 585 - 598 |
LEPAGE ET AL., NUCL ACIDS RES, vol. 26, 1998, pages 1276 - 1281 |
LISTER ET AL., NATURE, vol. 462, 2009, pages 315 - 322 |
LIZARDI ET AL., NAT BIOTECH, vol. 26, 2008, pages 649 - 650 |
LIZARDI ET AL., NAT. GENET., vol. 19, 1998, pages 225 - 232 |
LYKO ET AL., NATURE, vol. 408, 2000, pages 538 - 540 |
PUCHKAREV ET AL., NAT. BIOTECHNOL., vol. 27, 2009, pages 847 - 52 |
RAMSAHOYE ET AL., PROC NAT ACAD SCI, vol. 97, 2001, pages 5237 - 5242 |
ROBINSON ET AL., NAT BIOTECH, vol. 29, 2011, pages 24 - 26 |
SAMBROOK, FRITSCH AND MANIATIS: "Molecular Cloning: A Laboratory Manual", COLD SPRING HARBOR LABORATORY |
SAMBROOK, FRITSCH AND MANIATIS: "Molecular Cloning: A Laboratory Manual", COLD SPRING HARBOR LABORATORY PRESS |
SHARON R. BROWNING ET AL: "Haplotype phasing: existing methods and new developments", NATURE REVIEWS GENETICS, vol. 12, no. 10, 1 January 2011 (2011-01-01), pages 703 - 714, XP055008581, ISSN: 1471-0056, DOI: 10.1038/nrg3054 * |
SISMOUR; BENNER, NUCL ACIDS RES, vol. 33, 2005, pages 5640 - 5646 |
THOMPSON; STEINMANN, CURR. PROT. MOL. BIOL., 2010 |
VOELKERDING ET AL., CLIN CHEM, vol. 55, 2009, pages 641 - 658 |
ZERBINO; BIRNEY, GENOME RES, vol. 18, 2008, pages 821 - 829 |
ZHANG KUN ET AL: "Long-range polony haplotyping of individual human chromosome molecules", NATURE GENETICS, NATURE PUBLISHING GROUP, NEW YORK, US, vol. 38, no. 3, 1 March 2006 (2006-03-01), pages 382 - 387, XP002588422, ISSN: 1061-4036, [retrieved on 20060219], DOI: 10.1038/NG1741 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9977861B2 (en) | 2012-07-18 | 2018-05-22 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
US11257568B2 (en) | 2012-07-18 | 2022-02-22 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
US11605446B2 (en) | 2012-07-18 | 2023-03-14 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
EP3204521A4 (en) * | 2014-10-10 | 2018-03-21 | Cold Spring Harbor Laboratories | Random nucleotide mutation for nucleotide template counting and assembly |
US11008606B2 (en) | 2014-10-10 | 2021-05-18 | Cold Spring Harbor Laboratory | Random nucleotide mutation for nucleotide template counting and assembly |
EP3957742A1 (en) * | 2014-10-10 | 2022-02-23 | Cold Spring Harbor Laboratory | Random nucleotide mutation for nucleotide template counting and assembly |
US11421238B2 (en) | 2018-02-20 | 2022-08-23 | Longas Technologies Pty Ltd | Method for introducing mutations |
WO2020035669A1 (en) * | 2018-08-13 | 2020-02-20 | Longas Technologies Pty Ltd | Sequencing algorithm |
US20210174905A1 (en) * | 2018-08-13 | 2021-06-10 | Longas Technologies Pty Ltd. | Sequencing Algorithm |
EP4293123A3 (en) * | 2018-08-13 | 2024-01-17 | Illumina Singapore PTE. Ltd. | Sequencing algorithm |
Also Published As
Publication number | Publication date |
---|---|
US9977861B2 (en) | 2018-05-22 |
US20220180970A1 (en) | 2022-06-09 |
US20140024537A1 (en) | 2014-01-23 |
CN104508144B (en) | 2018-04-17 |
US11257568B2 (en) | 2022-02-22 |
AU2013291816A1 (en) | 2014-11-27 |
CN108486236A (en) | 2018-09-04 |
EP2875150A1 (en) | 2015-05-27 |
EP2875150B1 (en) | 2017-05-17 |
US20220180969A1 (en) | 2022-06-09 |
CA2873327A1 (en) | 2014-01-23 |
CN104508144A (en) | 2015-04-08 |
CA2873327C (en) | 2019-07-09 |
AU2013291816B2 (en) | 2019-01-17 |
JP2015522289A (en) | 2015-08-06 |
US20180322243A1 (en) | 2018-11-08 |
JP6091613B2 (en) | 2017-03-08 |
US11605446B2 (en) | 2023-03-14 |
CN108486236B (en) | 2022-05-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13727321; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2873327; Country of ref document: CA; Ref document number: 2015522158; Country of ref document: JP; Kind code of ref document: A |
| | REEP | Request for entry into the european phase | Ref document number: 2013727321; Country of ref document: EP |
| | WWE | Wipo information: entry into national phase | Ref document number: 2013727321; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2013291816; Country of ref document: AU; Date of ref document: 20130520; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |