US20210180123A1 - Methods and systems for sequencing long nucleic acids - Google Patents
- Publication number
- US20210180123A1 (application number US17/005,496)
- Authority
- US
- United States
- Prior art keywords
- sequencing
- extension
- nucleic acid
- sequence
- primer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000012163 sequencing technique Methods 0.000 title claims abstract description 503
- 150000007523 nucleic acids Chemical class 0.000 title claims abstract description 268
- 102000039446 nucleic acids Human genes 0.000 title claims abstract description 252
- 108020004707 nucleic acids Proteins 0.000 title claims abstract description 252
- 238000000034 method Methods 0.000 title claims abstract description 199
- 125000003729 nucleotide group Chemical group 0.000 claims description 268
- 239000002773 nucleotide Substances 0.000 claims description 245
- 239000000758 substrate Substances 0.000 claims description 70
- 230000002441 reversible effect Effects 0.000 claims description 67
- 238000001514 detection method Methods 0.000 claims description 48
- 108060002716 Exonuclease Proteins 0.000 claims description 23
- 102000013165 exonuclease Human genes 0.000 claims description 23
- 102000004190 Enzymes Human genes 0.000 claims description 21
- 108090000790 Enzymes Proteins 0.000 claims description 21
- 230000029087 digestion Effects 0.000 claims description 18
- 238000005406 washing Methods 0.000 claims description 18
- 235000011180 diphosphates Nutrition 0.000 claims description 11
- 230000000593 degrading effect Effects 0.000 claims description 10
- 230000006862 enzymatic digestion Effects 0.000 claims description 5
- XPPKVPWEQAFLFU-UHFFFAOYSA-J diphosphate(4-) Chemical compound [O-]P([O-])(=O)OP([O-])([O-])=O XPPKVPWEQAFLFU-UHFFFAOYSA-J 0.000 claims description 3
- 239000012634 fragment Substances 0.000 abstract description 48
- 239000003153 chemical reaction reagent Substances 0.000 abstract description 23
- 239000000047 product Substances 0.000 description 102
- 108020004414 DNA Proteins 0.000 description 96
- 102000053602 DNA Human genes 0.000 description 96
- 239000000523 sample Substances 0.000 description 95
- 238000006243 chemical reaction Methods 0.000 description 88
- 210000004027 cell Anatomy 0.000 description 76
- 102000040430 polynucleotide Human genes 0.000 description 55
- 108091033319 polynucleotide Proteins 0.000 description 55
- 239000002157 polynucleotide Substances 0.000 description 55
- 230000003321 amplification Effects 0.000 description 48
- 238000003199 nucleic acid amplification method Methods 0.000 description 48
- 238000010348 incorporation Methods 0.000 description 45
- 238000003752 polymerase chain reaction Methods 0.000 description 45
- HEMHJVSKTPXQMS-UHFFFAOYSA-M Sodium hydroxide Chemical compound [OH-].[Na+] HEMHJVSKTPXQMS-UHFFFAOYSA-M 0.000 description 42
- 239000011324 bead Substances 0.000 description 35
- 230000000295 complement effect Effects 0.000 description 32
- 230000008569 process Effects 0.000 description 31
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 30
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 30
- 239000000203 mixture Substances 0.000 description 30
- 229920002477 rna polymer Polymers 0.000 description 29
- 150000002500 ions Chemical class 0.000 description 27
- 238000005516 engineering process Methods 0.000 description 21
- 239000000243 solution Substances 0.000 description 21
- 206010028980 Neoplasm Diseases 0.000 description 20
- 108091034117 Oligonucleotide Proteins 0.000 description 20
- 108090000623 proteins and genes Proteins 0.000 description 20
- 201000011510 cancer Diseases 0.000 description 19
- 102000007347 Apyrase Human genes 0.000 description 18
- 108010007730 Apyrase Proteins 0.000 description 18
- 239000000872 buffer Substances 0.000 description 18
- SUYVUBYJARFZHO-RRKCRQDMSA-N dATP Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-RRKCRQDMSA-N 0.000 description 18
- RGWHQCVHVJXOKC-SHYZEUOFSA-J dCTP(4-) Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)[C@@H](O)C1 RGWHQCVHVJXOKC-SHYZEUOFSA-J 0.000 description 18
- 238000009396 hybridization Methods 0.000 description 18
- SUYVUBYJARFZHO-UHFFFAOYSA-N dATP Natural products C1=NC=2C(N)=NC=NC=2N1C1CC(O)C(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-UHFFFAOYSA-N 0.000 description 17
- NHVNXKFIZYSCEB-XLPZGREQSA-N dTTP Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C1 NHVNXKFIZYSCEB-XLPZGREQSA-N 0.000 description 17
- 238000007792 addition Methods 0.000 description 16
- HAAZLUGHYHWQIW-KVQBGUIXSA-N dGTP Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 HAAZLUGHYHWQIW-KVQBGUIXSA-N 0.000 description 16
- 108091028043 Nucleic acid sequence Proteins 0.000 description 15
- 238000009826 distribution Methods 0.000 description 14
- 239000012099 Alexa Fluor family Substances 0.000 description 13
- 238000004458 analytical method Methods 0.000 description 13
- 238000013459 approach Methods 0.000 description 13
- -1 at least 10 Chemical class 0.000 description 13
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 13
- 230000002068 genetic effect Effects 0.000 description 13
- 244000005700 microbiome Species 0.000 description 12
- 230000035772 mutation Effects 0.000 description 12
- 241000588724 Escherichia coli Species 0.000 description 11
- 230000001419 dependent effect Effects 0.000 description 11
- 239000003814 drug Substances 0.000 description 11
- 239000007787 solid Substances 0.000 description 11
- ZHNUHDYFZUAESO-UHFFFAOYSA-N Formamide Chemical compound NC=O ZHNUHDYFZUAESO-UHFFFAOYSA-N 0.000 description 10
- 201000010099 disease Diseases 0.000 description 10
- 241000894007 species Species 0.000 description 10
- 238000003786 synthesis reaction Methods 0.000 description 10
- 108020004635 Complementary DNA Proteins 0.000 description 9
- 108010014594 Heterogeneous Nuclear Ribonucleoprotein A1 Proteins 0.000 description 9
- 229940079593 drug Drugs 0.000 description 9
- 102000009609 Pyrophosphatases Human genes 0.000 description 8
- 108010009413 Pyrophosphatases Proteins 0.000 description 8
- XSQUKJJJFZCRTK-UHFFFAOYSA-N Urea Chemical compound NC(N)=O XSQUKJJJFZCRTK-UHFFFAOYSA-N 0.000 description 8
- 238000003491 array Methods 0.000 description 8
- 238000010804 cDNA synthesis Methods 0.000 description 8
- 239000002299 complementary DNA Substances 0.000 description 8
- 238000004925 denaturation Methods 0.000 description 8
- 230000036425 denaturation Effects 0.000 description 8
- 238000011065 in-situ storage Methods 0.000 description 8
- 239000010410 layer Substances 0.000 description 8
- 244000052769 pathogen Species 0.000 description 8
- 102000002260 Alkaline Phosphatase Human genes 0.000 description 7
- 108020004774 Alkaline Phosphatase Proteins 0.000 description 7
- 108091093088 Amplicon Proteins 0.000 description 7
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 0.000 description 7
- 238000000137 annealing Methods 0.000 description 7
- 230000015572 biosynthetic process Effects 0.000 description 7
- 230000000875 corresponding effect Effects 0.000 description 7
- XPPKVPWEQAFLFU-UHFFFAOYSA-N diphosphoric acid Chemical compound OP(O)(=O)OP(O)(O)=O XPPKVPWEQAFLFU-UHFFFAOYSA-N 0.000 description 7
- 230000006870 function Effects 0.000 description 7
- 239000011521 glass Substances 0.000 description 7
- 238000012165 high-throughput sequencing Methods 0.000 description 7
- 238000007834 ligase chain reaction Methods 0.000 description 7
- 239000000463 material Substances 0.000 description 7
- 238000012545 processing Methods 0.000 description 7
- 238000000746 purification Methods 0.000 description 7
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 6
- 241000196324 Embryophyta Species 0.000 description 6
- ISWSIDIOOBJBQZ-UHFFFAOYSA-N Phenol Chemical compound OC1=CC=CC=C1 ISWSIDIOOBJBQZ-UHFFFAOYSA-N 0.000 description 6
- 230000008901 benefit Effects 0.000 description 6
- 238000012217 deletion Methods 0.000 description 6
- 230000037430 deletion Effects 0.000 description 6
- 238000000605 extraction Methods 0.000 description 6
- 239000007788 liquid Substances 0.000 description 6
- 239000012528 membrane Substances 0.000 description 6
- 238000007481 next generation sequencing Methods 0.000 description 6
- 210000003463 organelle Anatomy 0.000 description 6
- 230000036961 partial effect Effects 0.000 description 6
- 238000006116 polymerization reaction Methods 0.000 description 6
- 238000002360 preparation method Methods 0.000 description 6
- 238000003908 quality control method Methods 0.000 description 6
- 239000011535 reaction buffer Substances 0.000 description 6
- 238000011160 research Methods 0.000 description 6
- 230000004044 response Effects 0.000 description 6
- 238000005096 rolling process Methods 0.000 description 6
- 238000012360 testing method Methods 0.000 description 6
- 241000894006 Bacteria Species 0.000 description 5
- 241001465754 Metazoa Species 0.000 description 5
- 229910019142 PO4 Inorganic materials 0.000 description 5
- 230000001580 bacterial effect Effects 0.000 description 5
- 230000002596 correlated effect Effects 0.000 description 5
- 238000003745 diagnosis Methods 0.000 description 5
- 230000000694 effects Effects 0.000 description 5
- 239000000839 emulsion Substances 0.000 description 5
- 108010052305 exodeoxyribonuclease III Proteins 0.000 description 5
- 239000000499 gel Substances 0.000 description 5
- 230000005055 memory storage Effects 0.000 description 5
- 108020004999 messenger RNA Proteins 0.000 description 5
- 230000004048 modification Effects 0.000 description 5
- 238000012986 modification Methods 0.000 description 5
- 230000005257 nucleotidylation Effects 0.000 description 5
- 235000021317 phosphate Nutrition 0.000 description 5
- 239000013612 plasmid Substances 0.000 description 5
- 102000004169 proteins and genes Human genes 0.000 description 5
- PYWVYCXTNDRMGF-UHFFFAOYSA-N rhodamine B Chemical compound [Cl-].C=12C=CC(=[N+](CC)CC)C=C2OC2=CC(N(CC)CC)=CC=C2C=1C1=CC=CC=C1C(O)=O PYWVYCXTNDRMGF-UHFFFAOYSA-N 0.000 description 5
- 238000007480 sanger sequencing Methods 0.000 description 5
- MPLHNVLQVRSVEE-UHFFFAOYSA-N texas red Chemical compound [O-]S(=O)(=O)C1=CC(S(Cl)(=O)=O)=CC=C1C(C1=CC=2CCCN3CCCC(C=23)=C1O1)=C2C1=C(CCC1)C3=[N+]1CCCC3=C2 MPLHNVLQVRSVEE-UHFFFAOYSA-N 0.000 description 5
- 108700028369 Alleles Proteins 0.000 description 4
- HEDRZPFGACZZDS-UHFFFAOYSA-N Chloroform Chemical compound ClC(Cl)Cl HEDRZPFGACZZDS-UHFFFAOYSA-N 0.000 description 4
- 108091035707 Consensus sequence Proteins 0.000 description 4
- 102000003960 Ligases Human genes 0.000 description 4
- 108090000364 Ligases Proteins 0.000 description 4
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 4
- 241000700605 Viruses Species 0.000 description 4
- 210000004436 artificial bacterial chromosome Anatomy 0.000 description 4
- 210000001106 artificial yeast chromosome Anatomy 0.000 description 4
- 238000004364 calculation method Methods 0.000 description 4
- 239000004202 carbamide Substances 0.000 description 4
- 210000000349 chromosome Anatomy 0.000 description 4
- 150000001875 compounds Chemical class 0.000 description 4
- 239000003398 denaturant Substances 0.000 description 4
- 239000012530 fluid Substances 0.000 description 4
- 238000003384 imaging method Methods 0.000 description 4
- 238000002955 isolation Methods 0.000 description 4
- 239000003446 ligand Substances 0.000 description 4
- 230000000670 limiting effect Effects 0.000 description 4
- 238000004519 manufacturing process Methods 0.000 description 4
- 230000000813 microbial effect Effects 0.000 description 4
- 238000012544 monitoring process Methods 0.000 description 4
- 230000001717 pathogenic effect Effects 0.000 description 4
- NBIIXXVUZAFLBC-UHFFFAOYSA-K phosphate Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 description 4
- 239000010452 phosphate Substances 0.000 description 4
- 230000000379 polymerizing effect Effects 0.000 description 4
- 238000004393 prognosis Methods 0.000 description 4
- 230000010076 replication Effects 0.000 description 4
- 239000000126 substance Substances 0.000 description 4
- 210000001519 tissue Anatomy 0.000 description 4
- 230000002103 transcriptional effect Effects 0.000 description 4
- 208000035473 Communicable disease Diseases 0.000 description 3
- 238000001712 DNA sequencing Methods 0.000 description 3
- 201000010374 Down Syndrome Diseases 0.000 description 3
- 208000026350 Inborn Genetic disease Diseases 0.000 description 3
- 102000009617 Inorganic Pyrophosphatase Human genes 0.000 description 3
- 108010009595 Inorganic Pyrophosphatase Proteins 0.000 description 3
- 108020005196 Mitochondrial DNA Proteins 0.000 description 3
- 108020005187 Oligonucleotide Probes Proteins 0.000 description 3
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 3
- 241001362551 Samba Species 0.000 description 3
- XUIMIQQOPSSXEZ-UHFFFAOYSA-N Silicon Chemical compound [Si] XUIMIQQOPSSXEZ-UHFFFAOYSA-N 0.000 description 3
- 239000002253 acid Substances 0.000 description 3
- 238000003556 assay Methods 0.000 description 3
- 229960002685 biotin Drugs 0.000 description 3
- 235000020958 biotin Nutrition 0.000 description 3
- 239000011616 biotin Substances 0.000 description 3
- 210000004369 blood Anatomy 0.000 description 3
- 239000008280 blood Substances 0.000 description 3
- 239000006227 byproduct Substances 0.000 description 3
- 238000003776 cleavage reaction Methods 0.000 description 3
- 238000004891 communication Methods 0.000 description 3
- 238000004590 computer program Methods 0.000 description 3
- 238000004624 confocal microscopy Methods 0.000 description 3
- 230000007423 decrease Effects 0.000 description 3
- 206010012601 diabetes mellitus Diseases 0.000 description 3
- 208000035475 disorder Diseases 0.000 description 3
- 239000000975 dye Substances 0.000 description 3
- 238000002474 experimental method Methods 0.000 description 3
- 208000016361 genetic disease Diseases 0.000 description 3
- 102000054766 genetic haplotypes Human genes 0.000 description 3
- 238000012268 genome sequencing Methods 0.000 description 3
- 229920001519 homopolymer Polymers 0.000 description 3
- 238000003780 insertion Methods 0.000 description 3
- 230000037431 insertion Effects 0.000 description 3
- 238000013507 mapping Methods 0.000 description 3
- 238000002493 microarray Methods 0.000 description 3
- 238000002156 mixing Methods 0.000 description 3
- 239000002751 oligonucleotide probe Substances 0.000 description 3
- 230000001766 physiological effect Effects 0.000 description 3
- 229920000642 polymer Polymers 0.000 description 3
- 238000011533 pre-incubation Methods 0.000 description 3
- 102000004196 processed proteins & peptides Human genes 0.000 description 3
- 108090000765 processed proteins & peptides Proteins 0.000 description 3
- 238000003753 real-time PCR Methods 0.000 description 3
- 230000002829 reductive effect Effects 0.000 description 3
- 108091008146 restriction endonucleases Proteins 0.000 description 3
- 238000007894 restriction fragment length polymorphism technique Methods 0.000 description 3
- 238000004574 scanning tunneling microscopy Methods 0.000 description 3
- 230000007017 scission Effects 0.000 description 3
- 238000012216 screening Methods 0.000 description 3
- 238000010845 search algorithm Methods 0.000 description 3
- 229910052710 silicon Inorganic materials 0.000 description 3
- 239000010703 silicon Substances 0.000 description 3
- 238000006467 substitution reaction Methods 0.000 description 3
- RYYWUUFWQRZTIU-UHFFFAOYSA-K thiophosphate Chemical compound [O-]P([O-])([O-])=S RYYWUUFWQRZTIU-UHFFFAOYSA-K 0.000 description 3
- 230000005945 translocation Effects 0.000 description 3
- 230000003612 virological effect Effects 0.000 description 3
- 238000012070 whole genome sequencing analysis Methods 0.000 description 3
- OAKPWEUQDVLTCN-NKWVEPMBSA-N 2',3'-Dideoxyadenosine-5-triphosphate Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@H]1CC[C@@H](CO[P@@](O)(=O)O[P@](O)(=O)OP(O)(O)=O)O1 OAKPWEUQDVLTCN-NKWVEPMBSA-N 0.000 description 2
- VGIRNWJSIRVFRT-UHFFFAOYSA-N 2',7'-difluorofluorescein Chemical compound OC(=O)C1=CC=CC=C1C1=C2C=C(F)C(=O)C=C2OC2=CC(O)=C(F)C=C21 VGIRNWJSIRVFRT-UHFFFAOYSA-N 0.000 description 2
- 206010005949 Bone cancer Diseases 0.000 description 2
- 208000018084 Bone neoplasm Diseases 0.000 description 2
- 208000003174 Brain Neoplasms Diseases 0.000 description 2
- 206010006187 Breast cancer Diseases 0.000 description 2
- 208000026310 Breast neoplasm Diseases 0.000 description 2
- 206010009944 Colon cancer Diseases 0.000 description 2
- 201000003883 Cystic fibrosis Diseases 0.000 description 2
- 102000010719 DNA-(Apurinic or Apyrimidinic Site) Lyase Human genes 0.000 description 2
- 108010063362 DNA-(Apurinic or Apyrimidinic Site) Lyase Proteins 0.000 description 2
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 2
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 2
- AHCYMLUZIRLXAA-SHYZEUOFSA-N Deoxyuridine 5'-triphosphate Chemical compound O1[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C[C@@H]1N1C(=O)NC(=O)C=C1 AHCYMLUZIRLXAA-SHYZEUOFSA-N 0.000 description 2
- RTZKZFJDLAIYFH-UHFFFAOYSA-N Diethyl ether Chemical compound CCOCC RTZKZFJDLAIYFH-UHFFFAOYSA-N 0.000 description 2
- 241000282326 Felis catus Species 0.000 description 2
- 241000233866 Fungi Species 0.000 description 2
- 208000009292 Hemophilia A Diseases 0.000 description 2
- 206010058467 Lung neoplasm malignant Diseases 0.000 description 2
- 241000124008 Mammalia Species 0.000 description 2
- 239000000020 Nitrocellulose Substances 0.000 description 2
- 108020004485 Nonsense Codon Proteins 0.000 description 2
- 239000004677 Nylon Substances 0.000 description 2
- 206010060862 Prostate cancer Diseases 0.000 description 2
- 208000000236 Prostatic Neoplasms Diseases 0.000 description 2
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 2
- 238000012300 Sequence Analysis Methods 0.000 description 2
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 2
- 108010090804 Streptavidin Proteins 0.000 description 2
- 108010006785 Taq Polymerase Proteins 0.000 description 2
- 208000037280 Trisomy Diseases 0.000 description 2
- 108010072685 Uracil-DNA Glycosidase Proteins 0.000 description 2
- 102100037111 Uracil-DNA glycosylase Human genes 0.000 description 2
- HDRRAMINWIWTNU-NTSWFWBYSA-N [[(2s,5r)-5-(2-amino-6-oxo-3h-purin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl] phosphono hydrogen phosphate Chemical compound C1=2NC(N)=NC(=O)C=2N=CN1[C@H]1CC[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 HDRRAMINWIWTNU-NTSWFWBYSA-N 0.000 description 2
- ARLKCWCREKRROD-POYBYMJQSA-N [[(2s,5r)-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl] phosphono hydrogen phosphate Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)CC1 ARLKCWCREKRROD-POYBYMJQSA-N 0.000 description 2
- 150000007513 acids Chemical class 0.000 description 2
- 230000000692 anti-sense effect Effects 0.000 description 2
- 238000004630 atomic force microscopy Methods 0.000 description 2
- 239000012472 biological sample Substances 0.000 description 2
- 238000004422 calculation algorithm Methods 0.000 description 2
- 150000001720 carbohydrates Chemical class 0.000 description 2
- 235000014633 carbohydrates Nutrition 0.000 description 2
- 238000012512 characterization method Methods 0.000 description 2
- 238000000701 chemical imaging Methods 0.000 description 2
- 239000007795 chemical reaction product Substances 0.000 description 2
- 238000010367 cloning Methods 0.000 description 2
- 208000029742 colonic neoplasm Diseases 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 238000011109 contamination Methods 0.000 description 2
- 230000001276 controlling effect Effects 0.000 description 2
- 230000001186 cumulative effect Effects 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- 238000007405 data analysis Methods 0.000 description 2
- URGJWIFLBWJRMF-JGVFFNPUSA-N ddTTP Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)CC1 URGJWIFLBWJRMF-JGVFFNPUSA-N 0.000 description 2
- 238000010511 deprotection reaction Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 239000001177 diphosphate Substances 0.000 description 2
- 230000036267 drug metabolism Effects 0.000 description 2
- 230000007613 environmental effect Effects 0.000 description 2
- 238000006911 enzymatic reaction Methods 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 238000002866 fluorescence resonance energy transfer Methods 0.000 description 2
- 239000007850 fluorescent dye Substances 0.000 description 2
- 231100000221 frame shift mutation induction Toxicity 0.000 description 2
- 230000037433 frameshift Effects 0.000 description 2
- 238000001502 gel electrophoresis Methods 0.000 description 2
- 230000014509 gene expression Effects 0.000 description 2
- 238000013412 genome amplification Methods 0.000 description 2
- KWIUHFFTVRNATP-UHFFFAOYSA-N glycine betaine Chemical compound C[N+](C)(C)CC([O-])=O KWIUHFFTVRNATP-UHFFFAOYSA-N 0.000 description 2
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 2
- 208000014829 head and neck neoplasm Diseases 0.000 description 2
- 208000019622 heart disease Diseases 0.000 description 2
- 238000000126 in silico method Methods 0.000 description 2
- 239000012678 infectious agent Substances 0.000 description 2
- 208000015181 infectious disease Diseases 0.000 description 2
- PHTQWCKDNZKARW-UHFFFAOYSA-N isoamylol Chemical compound CC(C)CCO PHTQWCKDNZKARW-UHFFFAOYSA-N 0.000 description 2
- DRAVOWXCEBXPTN-UHFFFAOYSA-N isoguanine Chemical compound NC1=NC(=O)NC2=C1NC=N2 DRAVOWXCEBXPTN-UHFFFAOYSA-N 0.000 description 2
- 238000002372 labelling Methods 0.000 description 2
- 208000032839 leukemia Diseases 0.000 description 2
- 230000033001 locomotion Effects 0.000 description 2
- 201000005202 lung cancer Diseases 0.000 description 2
- 208000020816 lung neoplasm Diseases 0.000 description 2
- 238000007403 mPCR Methods 0.000 description 2
- 239000006249 magnetic particle Substances 0.000 description 2
- 230000014759 maintenance of location Effects 0.000 description 2
- 230000000873 masking effect Effects 0.000 description 2
- 230000002503 metabolic effect Effects 0.000 description 2
- 229910052751 metal Inorganic materials 0.000 description 2
- 239000002184 metal Substances 0.000 description 2
- 238000000386 microscopy Methods 0.000 description 2
- 239000000178 monomer Substances 0.000 description 2
- 238000004651 near-field scanning optical microscopy Methods 0.000 description 2
- 229920001220 nitrocellulos Polymers 0.000 description 2
- 230000037434 nonsense mutation Effects 0.000 description 2
- 229920001778 nylon Polymers 0.000 description 2
- 238000002515 oligonucleotide synthesis Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 230000001575 pathological effect Effects 0.000 description 2
- 238000000206 photolithography Methods 0.000 description 2
- 239000004033 plastic Substances 0.000 description 2
- 229920003023 plastic Polymers 0.000 description 2
- BASFCYQUMIYNBI-UHFFFAOYSA-N platinum Chemical compound [Pt] BASFCYQUMIYNBI-UHFFFAOYSA-N 0.000 description 2
- 102000054765 polymorphisms of proteins Human genes 0.000 description 2
- 229920001184 polypeptide Polymers 0.000 description 2
- 238000005086 pumping Methods 0.000 description 2
- BBEAQIROQSPTKN-UHFFFAOYSA-N pyrene Chemical compound C1=CC=C2C=CC3=CC=CC4=CC=C1C2=C43 BBEAQIROQSPTKN-UHFFFAOYSA-N 0.000 description 2
- 230000002285 radioactive effect Effects 0.000 description 2
- 239000000376 reactant Substances 0.000 description 2
- 230000006798 recombination Effects 0.000 description 2
- 238000005215 recombination Methods 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 238000010839 reverse transcription Methods 0.000 description 2
- 150000003839 salts Chemical class 0.000 description 2
- 238000004621 scanning probe microscopy Methods 0.000 description 2
- 238000002864 sequence alignment Methods 0.000 description 2
- 238000001228 spectrum Methods 0.000 description 2
- 230000037436 splice-site mutation Effects 0.000 description 2
- WGTODYJZXSJIAG-UHFFFAOYSA-N tetramethylrhodamine chloride Chemical compound [Cl-].C=12C=CC(N(C)C)=CC2=[O+]C2=CC(N(C)C)=CC=C2C=1C1=CC=CC=C1C(O)=O WGTODYJZXSJIAG-UHFFFAOYSA-N 0.000 description 2
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 2
- 238000013518 transcription Methods 0.000 description 2
- 230000035897 transcription Effects 0.000 description 2
- 238000012384 transportation and delivery Methods 0.000 description 2
- 229940035893 uracil Drugs 0.000 description 2
- IOOMXAQUNPWDLL-UHFFFAOYSA-N 2-[6-(diethylamino)-3-(diethyliminiumyl)-3h-xanthen-9-yl]-5-sulfobenzene-1-sulfonate Chemical compound C=12C=CC(=[N+](CC)CC)C=C2OC2=CC(N(CC)CC)=CC=C2C=1C1=CC=C(S(O)(=O)=O)C=C1S([O-])(=O)=O IOOMXAQUNPWDLL-UHFFFAOYSA-N 0.000 description 1
- XQCZBXHVTFVIFE-UHFFFAOYSA-N 2-amino-4-hydroxypyrimidine Chemical compound NC1=NC=CC(O)=N1 XQCZBXHVTFVIFE-UHFFFAOYSA-N 0.000 description 1
- 125000003903 2-propenyl group Chemical group [H]C([*])([H])C([H])=C([H])[H] 0.000 description 1
- 206010000021 21-hydroxylase deficiency Diseases 0.000 description 1
- 108700001666 APC Genes Proteins 0.000 description 1
- 229930024421 Adenine Natural products 0.000 description 1
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 1
- 208000017194 Affective disease Diseases 0.000 description 1
- 208000007848 Alcoholism Diseases 0.000 description 1
- 208000024827 Alzheimer disease Diseases 0.000 description 1
- 241000567030 Ampulloclitocybe clavipes Species 0.000 description 1
- 241000203069 Archaea Species 0.000 description 1
- 208000023275 Autoimmune disease Diseases 0.000 description 1
- 208000010061 Autosomal Dominant Polycystic Kidney Diseases 0.000 description 1
- 108700040618 BRCA1 Genes Proteins 0.000 description 1
- 101150072950 BRCA1 gene Proteins 0.000 description 1
- 102100022548 Beta-hexosaminidase subunit alpha Human genes 0.000 description 1
- 108010078791 Carrier Proteins Proteins 0.000 description 1
- 108020004998 Chloroplast DNA Proteins 0.000 description 1
- 108010077544 Chromatin Proteins 0.000 description 1
- 108020004638 Circular DNA Proteins 0.000 description 1
- 102100026735 Coagulation factor VIII Human genes 0.000 description 1
- 108091026890 Coding region Proteins 0.000 description 1
- 208000002330 Congenital Heart Defects Diseases 0.000 description 1
- 241000195493 Cryptophyta Species 0.000 description 1
- 108010017826 DNA Polymerase I Proteins 0.000 description 1
- 102000004594 DNA Polymerase I Human genes 0.000 description 1
- 230000004544 DNA amplification Effects 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- SHIBSTMRCDJXLN-UHFFFAOYSA-N Digoxigenin Natural products C1CC(C2C(C3(C)CCC(O)CC3CC2)CC2O)(O)C2(C)C1C1=CC(=O)OC1 SHIBSTMRCDJXLN-UHFFFAOYSA-N 0.000 description 1
- 206010013801 Duchenne Muscular Dystrophy Diseases 0.000 description 1
- 101150029707 ERBB2 gene Proteins 0.000 description 1
- 108010067770 Endopeptidase K Proteins 0.000 description 1
- 241000701533 Escherichia virus T4 Species 0.000 description 1
- 229910052693 Europium Inorganic materials 0.000 description 1
- 108700024394 Exon Proteins 0.000 description 1
- 201000003542 Factor VIII deficiency Diseases 0.000 description 1
- 208000001914 Fragile X syndrome Diseases 0.000 description 1
- 230000005526 G1 to G0 transition Effects 0.000 description 1
- 241000287828 Gallus gallus Species 0.000 description 1
- 206010017993 Gastrointestinal neoplasms Diseases 0.000 description 1
- 208000018565 Hemochromatosis Diseases 0.000 description 1
- 208000031220 Hemophilia Diseases 0.000 description 1
- 208000008051 Hereditary Nonpolyposis Colorectal Neoplasms Diseases 0.000 description 1
- 208000017095 Hereditary nonpolyposis colon cancer Diseases 0.000 description 1
- 108010033040 Histones Proteins 0.000 description 1
- 102000006947 Histones Human genes 0.000 description 1
- 101000911390 Homo sapiens Coagulation factor VIII Proteins 0.000 description 1
- 241000722343 Human papillomavirus types Species 0.000 description 1
- 208000023105 Huntington disease Diseases 0.000 description 1
- 208000000563 Hyperlipoproteinemia Type II Diseases 0.000 description 1
- 238000004971 IR microspectroscopy Methods 0.000 description 1
- 208000001019 Inborn Errors Metabolism Diseases 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 102100034343 Integrase Human genes 0.000 description 1
- 208000017924 Klinefelter Syndrome Diseases 0.000 description 1
- 208000000501 Lipidoses Diseases 0.000 description 1
- 206010024585 Lipidosis Diseases 0.000 description 1
- 102100024640 Low-density lipoprotein receptor Human genes 0.000 description 1
- 206010025323 Lymphomas Diseases 0.000 description 1
- 201000005027 Lynch syndrome Diseases 0.000 description 1
- 208000032271 Malignant tumor of penis Diseases 0.000 description 1
- 208000001826 Marfan syndrome Diseases 0.000 description 1
- 206010027476 Metastases Diseases 0.000 description 1
- 108091092878 Microsatellite Proteins 0.000 description 1
- 208000019022 Mood disease Diseases 0.000 description 1
- 206010068871 Myotonic dystrophy Diseases 0.000 description 1
- 208000003019 Neurofibromatosis 1 Diseases 0.000 description 1
- 208000024834 Neurofibromatosis type 1 Diseases 0.000 description 1
- 101710163270 Nuclease Proteins 0.000 description 1
- AWZJFZMWSUBJAJ-UHFFFAOYSA-N OG-514 dye Chemical compound OC(=O)CSC1=C(F)C(F)=C(C(O)=O)C(C2=C3C=C(F)C(=O)C=C3OC3=CC(O)=C(F)C=C32)=C1F AWZJFZMWSUBJAJ-UHFFFAOYSA-N 0.000 description 1
- 208000008589 Obesity Diseases 0.000 description 1
- 102000015636 Oligopeptides Human genes 0.000 description 1
- 108010038807 Oligopeptides Proteins 0.000 description 1
- 108700020796 Oncogene Proteins 0.000 description 1
- 102000043276 Oncogene Human genes 0.000 description 1
- 206010031243 Osteogenesis imperfecta Diseases 0.000 description 1
- 206010033128 Ovarian cancer Diseases 0.000 description 1
- 238000012408 PCR amplification Methods 0.000 description 1
- 241000282373 Panthera pardus Species 0.000 description 1
- 241000282376 Panthera tigris Species 0.000 description 1
- 208000002471 Penile Neoplasms Diseases 0.000 description 1
- 206010034299 Penile cancer Diseases 0.000 description 1
- 108091005804 Peptidases Proteins 0.000 description 1
- 102000035195 Peptidases Human genes 0.000 description 1
- 108091093037 Peptide nucleic acid Proteins 0.000 description 1
- 201000011252 Phenylketonuria Diseases 0.000 description 1
- 102000004160 Phosphoric Monoester Hydrolases Human genes 0.000 description 1
- 108090000608 Phosphoric Monoester Hydrolases Proteins 0.000 description 1
- 108091000080 Phosphotransferase Proteins 0.000 description 1
- 206010035226 Plasma cell myeloma Diseases 0.000 description 1
- 229920001213 Polysorbate 20 Polymers 0.000 description 1
- 239000004793 Polystyrene Substances 0.000 description 1
- 239000004365 Protease Substances 0.000 description 1
- 201000000660 Pyloric Stenosis Diseases 0.000 description 1
- 238000011529 RT qPCR Methods 0.000 description 1
- 208000037340 Rare genetic disease Diseases 0.000 description 1
- 108700005075 Regulator Genes Proteins 0.000 description 1
- 201000000582 Retinoblastoma Diseases 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 206010039491 Sarcoma Diseases 0.000 description 1
- 229920002684 Sepharose Polymers 0.000 description 1
- 101710082933 Single-strand DNA-binding protein Proteins 0.000 description 1
- 208000000453 Skin Neoplasms Diseases 0.000 description 1
- 102000039471 Small Nuclear RNA Human genes 0.000 description 1
- 108020004688 Small Nuclear RNA Proteins 0.000 description 1
- 238000002105 Southern blotting Methods 0.000 description 1
- PJANXHGTPQOBST-VAWYXSNFSA-N Stilbene Natural products C=1C=CC=CC=1/C=C/C1=CC=CC=C1 PJANXHGTPQOBST-VAWYXSNFSA-N 0.000 description 1
- 208000006011 Stroke Diseases 0.000 description 1
- 206010043101 Talipes Diseases 0.000 description 1
- 208000022292 Tay-Sachs disease Diseases 0.000 description 1
- 229910052771 Terbium Inorganic materials 0.000 description 1
- 208000024313 Testicular Neoplasms Diseases 0.000 description 1
- 206010057644 Testis cancer Diseases 0.000 description 1
- 208000002903 Thalassemia Diseases 0.000 description 1
- 208000024770 Thyroid neoplasm Diseases 0.000 description 1
- 206010044688 Trisomy 21 Diseases 0.000 description 1
- 108700025716 Tumor Suppressor Genes Proteins 0.000 description 1
- 102000044209 Tumor Suppressor Genes Human genes 0.000 description 1
- 208000026928 Turner syndrome Diseases 0.000 description 1
- 206010045261 Type IIa hyperlipidaemia Diseases 0.000 description 1
- 229910052770 Uranium Inorganic materials 0.000 description 1
- 208000007097 Urinary Bladder Neoplasms Diseases 0.000 description 1
- 208000006593 Urologic Neoplasms Diseases 0.000 description 1
- 208000006105 Uterine Cervical Neoplasms Diseases 0.000 description 1
- 101000578253 Xenopus laevis Homeobox protein Nkx-3.2 Proteins 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 238000003916 acid precipitation Methods 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 229960000643 adenine Drugs 0.000 description 1
- 239000011543 agarose gel Substances 0.000 description 1
- 230000004931 aggregating effect Effects 0.000 description 1
- 201000007930 alcohol dependence Diseases 0.000 description 1
- 208000006682 alpha 1-Antitrypsin Deficiency Diseases 0.000 description 1
- 150000001408 amides Chemical class 0.000 description 1
- 239000003242 anti bacterial agent Substances 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 108091007433 antigens Proteins 0.000 description 1
- 102000036639 antigens Human genes 0.000 description 1
- 235000013405 beer Nutrition 0.000 description 1
- 229960003237 betaine Drugs 0.000 description 1
- 125000002619 bicyclic group Chemical group 0.000 description 1
- 230000027455 binding Effects 0.000 description 1
- 230000003115 biocidal effect Effects 0.000 description 1
- 238000001574 biopsy Methods 0.000 description 1
- 230000000903 blocking effect Effects 0.000 description 1
- 210000001124 body fluid Anatomy 0.000 description 1
- 239000010839 body fluid Substances 0.000 description 1
- 235000008429 bread Nutrition 0.000 description 1
- 238000009395 breeding Methods 0.000 description 1
- 230000001488 breeding effect Effects 0.000 description 1
- 239000007853 buffer solution Substances 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 238000001444 catalytic combustion detection Methods 0.000 description 1
- 230000003197 catalytic effect Effects 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 239000013592 cell lysate Substances 0.000 description 1
- 230000036755 cellular response Effects 0.000 description 1
- 201000007455 central nervous system cancer Diseases 0.000 description 1
- 208000025997 central nervous system neoplasm Diseases 0.000 description 1
- 239000000919 ceramic Substances 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 235000013351 cheese Nutrition 0.000 description 1
- VYXSBFYARXAAKO-WTKGSRSZSA-N chembl402140 Chemical compound Cl.C1=2C=C(C)C(NCC)=CC=2OC2=C\C(=N/CC)C(C)=CC2=C1C1=CC=CC=C1C(=O)OCC VYXSBFYARXAAKO-WTKGSRSZSA-N 0.000 description 1
- 150000005829 chemical entities Chemical class 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 208000011654 childhood malignant neoplasm Diseases 0.000 description 1
- 210000003763 chloroplast Anatomy 0.000 description 1
- 210000003483 chromatin Anatomy 0.000 description 1
- 230000002759 chromosomal effect Effects 0.000 description 1
- 208000029664 classic familial adenomatous polyposis Diseases 0.000 description 1
- 206010009259 cleft lip Diseases 0.000 description 1
- 201000011228 clubfoot Diseases 0.000 description 1
- 210000001072 colon Anatomy 0.000 description 1
- 239000002322 conducting polymer Substances 0.000 description 1
- 229920001940 conductive polymer Polymers 0.000 description 1
- 208000028831 congenital heart disease Diseases 0.000 description 1
- 229920000547 conjugated polymer Polymers 0.000 description 1
- 239000000356 contaminant Substances 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000009223 counseling Methods 0.000 description 1
- 238000011840 criminal investigation Methods 0.000 description 1
- 238000004132 cross linking Methods 0.000 description 1
- DMSZORWOGDLWGN-UHFFFAOYSA-N ctk1a3526 Chemical compound NP(N)(N)=O DMSZORWOGDLWGN-UHFFFAOYSA-N 0.000 description 1
- 210000004748 cultured cell Anatomy 0.000 description 1
- 238000004163 cytometry Methods 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 125000001295 dansyl group Chemical group [H]C1=C([H])C(N(C([H])([H])[H])C([H])([H])[H])=C2C([H])=C([H])C([H])=C(C2=C1[H])S(*)(=O)=O 0.000 description 1
- 238000013523 data management Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 230000008021 deposition Effects 0.000 description 1
- 238000009792 diffusion process Methods 0.000 description 1
- QONQRTHLHBTMGP-UHFFFAOYSA-N digitoxigenin Natural products CC12CCC(C3(CCC(O)CC3CC3)C)C3C11OC1CC2C1=CC(=O)OC1 QONQRTHLHBTMGP-UHFFFAOYSA-N 0.000 description 1
- SHIBSTMRCDJXLN-KCZCNTNESA-N digoxigenin Chemical compound C1([C@@H]2[C@@]3([C@@](CC2)(O)[C@H]2[C@@H]([C@@]4(C)CC[C@H](O)C[C@H]4CC2)C[C@H]3O)C)=CC(=O)OC1 SHIBSTMRCDJXLN-KCZCNTNESA-N 0.000 description 1
- 239000000539 dimer Substances 0.000 description 1
- NAGJZTKCGNOGPW-UHFFFAOYSA-K dioxido-sulfanylidene-sulfido-$l^{5}-phosphane Chemical compound [O-]P([O-])([S-])=S NAGJZTKCGNOGPW-UHFFFAOYSA-K 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 229940000406 drug candidate Drugs 0.000 description 1
- 238000009509 drug development Methods 0.000 description 1
- 230000005684 electric field Effects 0.000 description 1
- 238000010894 electron beam technology Methods 0.000 description 1
- 238000001493 electron microscopy Methods 0.000 description 1
- 238000001962 electrophoresis Methods 0.000 description 1
- 230000005686 electrostatic field Effects 0.000 description 1
- 210000000750 endocrine system Anatomy 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 238000006872 enzymatic polymerization reaction Methods 0.000 description 1
- 238000001976 enzyme digestion Methods 0.000 description 1
- YQGOJNYOYNNSMM-UHFFFAOYSA-N eosin Chemical compound [Na+].OC(=O)C1=CC=CC=C1C1=C2C=C(Br)C(=O)C(Br)=C2OC2=C(Br)C(O)=C(Br)C=C21 YQGOJNYOYNNSMM-UHFFFAOYSA-N 0.000 description 1
- IINNWAYUJNWZRM-UHFFFAOYSA-L erythrosin B Chemical compound [Na+].[Na+].[O-]C(=O)C1=CC=CC=C1C1=C2C=C(I)C(=O)C(I)=C2OC2=C(I)C([O-])=C(I)C=C21 IINNWAYUJNWZRM-UHFFFAOYSA-L 0.000 description 1
- 238000012869 ethanol precipitation Methods 0.000 description 1
- 210000003527 eukaryotic cell Anatomy 0.000 description 1
- OGPBJKLSAFTDLK-UHFFFAOYSA-N europium atom Chemical compound [Eu] OGPBJKLSAFTDLK-UHFFFAOYSA-N 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 230000029142 excretion Effects 0.000 description 1
- 230000001036 exonucleolytic effect Effects 0.000 description 1
- 201000001386 familial hypercholesterolemia Diseases 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 239000005357 flat glass Substances 0.000 description 1
- GVEPBJHOBDJJJI-UHFFFAOYSA-N fluoranthrene Natural products C1=CC(C2=CC=CC=C22)=C3C2=CC=CC3=C1 GVEPBJHOBDJJJI-UHFFFAOYSA-N 0.000 description 1
- GNBHRKFJIUUOQI-UHFFFAOYSA-N fluorescein Chemical compound O1C(=O)C2=CC=CC=C2C21C1=CC=C(O)C=C1OC1=CC(O)=CC=C21 GNBHRKFJIUUOQI-UHFFFAOYSA-N 0.000 description 1
- 238000000799 fluorescence microscopy Methods 0.000 description 1
- 238000012632 fluorescent imaging Methods 0.000 description 1
- 235000013305 food Nutrition 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 230000002538 fungal effect Effects 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 230000005021 gait Effects 0.000 description 1
- 230000004077 genetic alteration Effects 0.000 description 1
- 238000012252 genetic analysis Methods 0.000 description 1
- 238000003205 genotyping method Methods 0.000 description 1
- PCHJSUWPFVWCPO-UHFFFAOYSA-N gold Chemical compound [Au] PCHJSUWPFVWCPO-UHFFFAOYSA-N 0.000 description 1
- 229910052737 gold Inorganic materials 0.000 description 1
- 239000010931 gold Substances 0.000 description 1
- 201000010536 head and neck cancer Diseases 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 238000010438 heat treatment Methods 0.000 description 1
- 238000004128 high performance liquid chromatography Methods 0.000 description 1
- 238000007849 hot-start PCR Methods 0.000 description 1
- 210000003917 human chromosome Anatomy 0.000 description 1
- 230000007062 hydrolysis Effects 0.000 description 1
- 238000006460 hydrolysis reaction Methods 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 230000003100 immobilizing effect Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 208000016245 inborn errors of metabolism Diseases 0.000 description 1
- 238000011534 incubation Methods 0.000 description 1
- 208000015978 inherited metabolic disease Diseases 0.000 description 1
- 238000007641 inkjet printing Methods 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000000155 isotopic effect Effects 0.000 description 1
- 229910052747 lanthanoid Inorganic materials 0.000 description 1
- 150000002602 lanthanoids Chemical class 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000001459 lithography Methods 0.000 description 1
- 201000007270 liver cancer Diseases 0.000 description 1
- 208000023707 liver extraskeletal osteosarcoma Diseases 0.000 description 1
- 208000014018 liver neoplasm Diseases 0.000 description 1
- 244000144972 livestock Species 0.000 description 1
- 239000012160 loading buffer Substances 0.000 description 1
- 238000011068 loading method Methods 0.000 description 1
- DLBFLQKQABVKGT-UHFFFAOYSA-L lucifer yellow dye Chemical compound [Li+].[Li+].[O-]S(=O)(=O)C1=CC(C(N(C(=O)NN)C2=O)=O)=C3C2=CC(S([O-])(=O)=O)=CC3=C1N DLBFLQKQABVKGT-UHFFFAOYSA-L 0.000 description 1
- 235000019689 luncheon sausage Nutrition 0.000 description 1
- 239000012139 lysis buffer Substances 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 235000013372 meat Nutrition 0.000 description 1
- 230000001404 mediated effect Effects 0.000 description 1
- 201000001441 melanoma Diseases 0.000 description 1
- 230000004060 metabolic process Effects 0.000 description 1
- 239000002207 metabolite Substances 0.000 description 1
- 239000002082 metal nanoparticle Substances 0.000 description 1
- 239000010445 mica Substances 0.000 description 1
- 229910052618 mica group Inorganic materials 0.000 description 1
- 210000003470 mitochondria Anatomy 0.000 description 1
- 239000011259 mixed solution Substances 0.000 description 1
- 238000010369 molecular cloning Methods 0.000 description 1
- 239000003068 molecular probe Substances 0.000 description 1
- 239000010841 municipal wastewater Substances 0.000 description 1
- 201000000050 myeloid neoplasm Diseases 0.000 description 1
- 239000011807 nanoball Substances 0.000 description 1
- 239000002086 nanomaterial Substances 0.000 description 1
- 238000007857 nested PCR Methods 0.000 description 1
- 201000010193 neural tube defect Diseases 0.000 description 1
- 238000001668 nucleic acid synthesis Methods 0.000 description 1
- 239000002777 nucleoside Substances 0.000 description 1
- 150000003833 nucleoside derivatives Chemical class 0.000 description 1
- 235000020824 obesity Nutrition 0.000 description 1
- 238000002966 oligonucleotide array Methods 0.000 description 1
- 231100000590 oncogenic Toxicity 0.000 description 1
- 230000002246 oncogenic effect Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 229920000620 organic polymer Polymers 0.000 description 1
- 238000007500 overflow downdraw method Methods 0.000 description 1
- 108700025694 p53 Genes Proteins 0.000 description 1
- VYNDHICBIRRPFP-UHFFFAOYSA-N pacific blue Chemical compound FC1=C(O)C(F)=C2OC(=O)C(C(=O)O)=CC2=C1 VYNDHICBIRRPFP-UHFFFAOYSA-N 0.000 description 1
- 230000003071 parasitic effect Effects 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 238000009928 pasteurization Methods 0.000 description 1
- 230000002572 peristaltic effect Effects 0.000 description 1
- 238000011170 pharmaceutical development Methods 0.000 description 1
- 239000012071 phase Substances 0.000 description 1
- UEZVMMHDMIWARA-UHFFFAOYSA-M phosphonate Chemical compound [O-]P(=O)=O UEZVMMHDMIWARA-UHFFFAOYSA-M 0.000 description 1
- 150000003013 phosphoric acid derivatives Chemical class 0.000 description 1
- 238000003322 phosphorimaging Methods 0.000 description 1
- 102000020233 phosphotransferase Human genes 0.000 description 1
- 238000006303 photolysis reaction Methods 0.000 description 1
- 230000015843 photosynthesis, light reaction Effects 0.000 description 1
- 229910052697 platinum Inorganic materials 0.000 description 1
- 229920002401 polyacrylamide Polymers 0.000 description 1
- 201000008519 polycystic kidney disease 1 Diseases 0.000 description 1
- 229920000193 polymethacrylate Polymers 0.000 description 1
- 239000000256 polyoxyethylene sorbitan monolaurate Substances 0.000 description 1
- 235000010486 polyoxyethylene sorbitan monolaurate Nutrition 0.000 description 1
- 208000014081 polyp of colon Diseases 0.000 description 1
- 229920005553 polystyrene-acrylate Polymers 0.000 description 1
- 238000001556 precipitation Methods 0.000 description 1
- 230000037452 priming Effects 0.000 description 1
- 238000007639 printing Methods 0.000 description 1
- 230000007425 progressive decline Effects 0.000 description 1
- 230000000135 prohibitive effect Effects 0.000 description 1
- 230000001915 proofreading effect Effects 0.000 description 1
- 125000006239 protecting group Chemical group 0.000 description 1
- 238000012175 pyrosequencing Methods 0.000 description 1
- 239000010453 quartz Substances 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 108700042226 ras Genes Proteins 0.000 description 1
- 239000011541 reaction mixture Substances 0.000 description 1
- 230000008707 rearrangement Effects 0.000 description 1
- 108020003175 receptors Proteins 0.000 description 1
- 102000005962 receptors Human genes 0.000 description 1
- 230000001105 regulatory effect Effects 0.000 description 1
- 230000008439 repair process Effects 0.000 description 1
- XFKVYXCRNATCOO-UHFFFAOYSA-M rhodamine 6G Chemical compound [Cl-].C=12C=C(C)C(NCC)=CC2=[O+]C=2C=C(NCC)C(C)=CC=2C=1C1=CC=CC=C1C(=O)OCC XFKVYXCRNATCOO-UHFFFAOYSA-M 0.000 description 1
- 239000003161 ribonuclease inhibitor Substances 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 125000000548 ribosyl group Chemical group C1([C@H](O)[C@H](O)[C@H](O1)CO)* 0.000 description 1
- 238000005185 salting out Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000004626 scanning electron microscopy Methods 0.000 description 1
- 201000000980 schizophrenia Diseases 0.000 description 1
- 238000013515 script Methods 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 210000002966 serum Anatomy 0.000 description 1
- 208000007056 sickle cell anemia Diseases 0.000 description 1
- 230000011664 signaling Effects 0.000 description 1
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N silicon dioxide Inorganic materials O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 1
- 238000007860 single-cell PCR Methods 0.000 description 1
- 201000000849 skin cancer Diseases 0.000 description 1
- 238000010583 slow cooling Methods 0.000 description 1
- 239000011780 sodium chloride Substances 0.000 description 1
- 239000001509 sodium citrate Substances 0.000 description 1
- 238000002174 soft lithography Methods 0.000 description 1
- 239000007790 solid phase Substances 0.000 description 1
- 230000000392 somatic effect Effects 0.000 description 1
- 238000001179 sorption measurement Methods 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- PJANXHGTPQOBST-UHFFFAOYSA-N stilbene Chemical compound C=1C=CC=CC=1C=CC1=CC=CC=C1 PJANXHGTPQOBST-UHFFFAOYSA-N 0.000 description 1
- 235000021286 stilbenes Nutrition 0.000 description 1
- 235000000346 sugar Nutrition 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 239000002344 surface layer Substances 0.000 description 1
- 208000035581 susceptibility to neural tube defects Diseases 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- GZCRRIHWUXGPOV-UHFFFAOYSA-N terbium atom Chemical compound [Tb] GZCRRIHWUXGPOV-UHFFFAOYSA-N 0.000 description 1
- 230000002381 testicular Effects 0.000 description 1
- 201000003120 testicular cancer Diseases 0.000 description 1
- 238000005382 thermal cycling Methods 0.000 description 1
- 229940113082 thymine Drugs 0.000 description 1
- 201000002510 thyroid cancer Diseases 0.000 description 1
- 229910052723 transition metal Inorganic materials 0.000 description 1
- 150000003624 transition metals Chemical class 0.000 description 1
- 238000004627 transmission electron microscopy Methods 0.000 description 1
- 238000002054 transplantation Methods 0.000 description 1
- 239000013638 trimer Substances 0.000 description 1
- 238000009966 trimming Methods 0.000 description 1
- 239000001226 triphosphate Substances 0.000 description 1
- 235000011178 triphosphate Nutrition 0.000 description 1
- 125000002264 triphosphate group Chemical class [H]OP(=O)(O[H])OP(=O)(O[H])OP(=O)(O[H])O* 0.000 description 1
- 239000000439 tumor marker Substances 0.000 description 1
- 210000002700 urine Anatomy 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
- 235000012431 wafers Nutrition 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
- 238000007704 wet chemistry method Methods 0.000 description 1
- 235000014101 wine Nutrition 0.000 description 1
- 239000002676 xenobiotic agent Substances 0.000 description 1
- 235000013618 yogurt Nutrition 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
- C12Q1/6874—Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
Definitions
- Nucleic acid sequencing is important for biological research, clinical diagnostics, personalized medicine, pharmaceutical development, and many other fields. Cost-effective, accurate, and fast sequencing is needed for many applications, including but not limited to microbial or pathogen detection and identification and genetic identification of subjects.
- applications can include, but are not limited to, paternity testing and forensic science (Reynolds et al., Anal. Chem., 63:2-15 (1991)) and organ-transplant donor-recipient matching (Buyse et al., Tissue Antigens, 41:1-14 (1993) and Gyllensten et al., PCR Meth.
- a variety of DNA hybridization techniques are available for detecting the presence of one or more selected polynucleotide sequences in a sample containing a large number of sequence regions.
- a fragment containing a selected sequence is captured by hybridization to an immobilized probe.
- the captured fragment can be labeled by hybridization to a second probe which contains a detectable reporter moiety.
- Another widely used method is Southern blotting.
- a mixture of DNA fragments in a sample is fractionated by gel electrophoresis, and then fixed on a nitrocellulose filter.
- By reacting the filter with one or more labeled probes under hybridization conditions, the presence of bands containing the probe sequences can be identified.
- the method is especially useful for identifying fragments in a restriction-enzyme DNA digest that contain a given probe sequence and for analyzing restriction-fragment length polymorphisms (“RFLPs”).
- Another approach to detecting the presence of a given sequence or sequences in a polynucleotide sample involves selective amplification of the sequence(s) by polymerase chain reaction (U.S. Pat. No. 4,683,202 and R. K. Saiki, et al., Science 230:1350 (1985)).
- primers complementary to opposite end portions of the selected sequence(s) are used to promote, in conjunction with thermal cycling, successive rounds of primer-initiated replication.
- the amplified sequence(s) may be readily identified by a variety of techniques. This approach is particularly useful for detecting the presence of low-copy sequences in a polynucleotide-containing sample, e.g., for detecting pathogen sequences in a body-fluid sample.
- in the oligonucleotide ligation assay, two probes or probe elements that span a target region of interest are hybridized to the target region. Where the probe elements base-pair with adjacent target bases, the confronting ends of the probe elements can be joined by ligation, e.g., by treatment with ligase. The ligated probe element is then assayed, evidencing the presence of the target sequence.
- the ligated probe elements act as a template for a pair of complementary probe elements.
- the target sequence is amplified linearly, allowing very small amounts of target sequence to be detected and/or amplified.
- This approach is referred to as the ligase detection reaction.
- the process is referred to as the ligase chain reaction, which achieves exponential amplification of target sequences.
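The contrast between linear and exponential amplification can be illustrated with an idealized model. The sketch below is not taken from the source: the function names `ldr_product` and `lcr_product` and the per-cycle `efficiency` parameter are assumptions made for illustration, and the model ignores probe depletion and target-independent ligation.

```python
def ldr_product(targets: float, cycles: int, efficiency: float = 1.0) -> float:
    """Ligated product when only the original targets act as templates.

    Each cycle yields at most one ligation per target, so product
    grows linearly with the number of cycles.
    """
    return targets * efficiency * cycles


def lcr_product(targets: float, cycles: int, efficiency: float = 1.0) -> float:
    """Ligated product when ligated probes also act as templates.

    Every template (original target or earlier product) can yield one
    new product per cycle, so the template pool grows geometrically.
    """
    templates = float(targets)
    for _ in range(cycles):
        templates += templates * efficiency
    return templates - targets  # subtract the original targets


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, ldr_product(1, n), lcr_product(1, n))
```

Under these assumptions a single target gives 10 product molecules after 10 linear cycles and 1023 after 10 exponential cycles.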
- Jou, et al., Human Mutation 5:86-93 (1995) relates to the use of a so-called “gap ligase chain reaction” process to amplify simultaneously selected regions of multiple exons, with the amplified products being read on an immunochromatographic strip having antibodies specific to the different haptens on the probes for each exon.
- Ligation of allele-specific probes generally has used solid-phase capture (U. Landegren et al. Science, 241:1077-1080 (1988); Nickerson et al., Proc. Natl. Acad. Sci. USA, 87:8923-8927 (1990)) or size-dependent separation (D. Y. Wu, et al., Genomics, 4:560-569 (1989) and F. Barany, Proc. Natl. Acad. Sci, 88:189-193 (1991)) to resolve the allelic signals, the latter method being limited in multiplex scale by the narrow size range of ligation probes. Further, in a multiplex format, the ligase detection reaction alone cannot make enough product to detect and quantify small amounts of target sequences.
- the gap ligase chain reaction process requires an additional step—polymerase extension.
- the use of probes with distinctive ratios of charge/translational frictional drag for a more complex multiplex will either require longer electrophoresis times or the use of an alternate form of detection.
- Some embodiments of the invention are particularly suitable for sequencing a large number of target nucleic acids simultaneously.
- nucleic acids are often sequenced using stepwise methods such as polymerase extension based sequencing or ligation sequencing where one or more bases are read for each sequencing step.
- stepwise sequencing methods are often limited by their stepwise inefficiency, e.g., incomplete incorporation, incomplete ligation, and other problems that cause prephasing or dephasing.
- the stepwise inefficiency can accumulate over the length of a read and thereby limit read length.
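How quickly stepwise inefficiency accumulates can be estimated with a simple geometric model. The sketch below is an illustration only, not the source's analysis: it assumes every cycle keeps the same fraction `step_efficiency` of strands in phase, so the in-phase fraction after n cycles is that value raised to the power n. The function names and the `min_in_phase` threshold are hypothetical.

```python
import math


def in_phase_fraction(step_efficiency: float, cycles: int) -> float:
    """Fraction of strands still in phase after `cycles` steps."""
    return step_efficiency ** cycles


def max_read_length(step_efficiency: float, min_in_phase: float) -> int:
    """Largest cycle count whose in-phase fraction is at least `min_in_phase`."""
    if not 0.0 < step_efficiency < 1.0:
        raise ValueError("step_efficiency must be between 0 and 1 (exclusive)")
    if not 0.0 < min_in_phase <= 1.0:
        raise ValueError("min_in_phase must be in (0, 1]")
    return math.floor(math.log(min_in_phase) / math.log(step_efficiency))


if __name__ == "__main__":
    for eff in (0.99, 0.995, 0.999):
        print(eff, round(in_phase_fraction(eff, 100), 3), max_read_length(eff, 0.5))
```

In this model, a per-step efficiency of 0.99 leaves about 37% of strands in phase after 100 cycles, which is why resetting the extension partway along the template can be attractive.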
- methods, kits and computer software products are provided to reset stepwise sequencing partially or completely.
- the method comprises: (a) sequencing one or more bases of a target nucleic acid by extending a first sequencing primer hybridized to the target nucleic acid to generate a first primer extension product, thereby obtaining a first sequence read; (b) releasing the first primer extension product from the target nucleic acid; (c) hybridizing a second sequencing primer to the target nucleic acid, optionally at the same or neighboring regions of the same target nucleic acid; (d) generating a second primer extension product (extended primer) by extending the second sequencing primer through limited or controlled extension; and (e) sequencing one or more bases of the target nucleic acid by further extending the second primer extension product to generate a third primer extension product, thereby obtaining a second sequence read.
- the first sequencing primer and second sequencing primer are the same. In another embodiment, the first sequencing primer and second sequencing primer are different.
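Steps (a) through (e) can be traced on a toy template. The following sketch is a simplified simulation, not the claimed method itself. It assumes the template is written in the order the polymerase reads it, starting just past the primer binding site, that the same primer is used for both reads, and that the limited extension in step (d) is done with a nucleotide set that lacks at least one base. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_read(template: str, start: int, read_length: int) -> str:
    """Bases incorporated opposite template[start:start + read_length]."""
    return "".join(COMPLEMENT[b] for b in template[start:start + read_length])


def limited_extension(template: str, position: int, nucleotides: set) -> int:
    """Extend until the next required base is missing from `nucleotides`."""
    while position < len(template) and COMPLEMENT[template[position]] in nucleotides:
        position += 1
    return position


def reset_and_reread(template: str, read_length: int, nucleotides: set):
    # (a) first sequence read from the primer's 3' end
    first_read = sequence_read(template, 0, read_length)
    # (b) release the extension product; (c) hybridize a fresh primer,
    # which returns the 3' end to position 0 in this toy model
    position = 0
    # (d) limited, unread extension to move the start point downstream
    position = limited_extension(template, position, nucleotides)
    # (e) second sequence read from the new start point
    second_read = sequence_read(template, position, read_length)
    return first_read, position, second_read


if __name__ == "__main__":
    print(reset_and_reread("TTGACCGTAAGC", 6, {"A", "C", "T"}))
```

In this example the second read starts four bases downstream of the first, so the two reads overlap by two bases and together cover ten template positions.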
- the controlled or limited extension can be performed by pulse extension, for example by allowing the extension reaction to last for only a short period of time, such as less than a minute or from approximately half a minute to a minute, e.g., 1-5, 5-10, 10-30, or 30-60 seconds. In some embodiments, the extension is controlled by withholding 1, 2, or 3 of the four nucleotides.
- the pulse extension can be performed by adding nucleotide-degrading enzymes such as alkaline phosphatase or apyrase. In some other embodiments, the pulse extension may be controlled using reversible terminator nucleotides.
- each or some extension steps can be performed by including one or more reversible terminator nucleotides, such as dATP, dCTP, dGTP, dTTP*, where dTTP* is a reversible terminator.
- a step of removing the blocking group in the terminator may be performed before the next extension step.
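The effect of a single reversible terminator in the nucleotide mix can be sketched as follows. This is a toy model under stated assumptions: three natural nucleotides plus one terminator (dTTP* in the example above), complete incorporation at every step, and a template written in the order the polymerase reads it. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_to_terminator(template: str, position: int,
                         terminators: frozenset = frozenset("T")) -> int:
    """Extend until a terminator base has been incorporated (inclusive)."""
    while position < len(template):
        base = COMPLEMENT[template[position]]
        position += 1
        if base in terminators:
            break  # the blocked 3' end stops further extension
    return position


def pulse_positions(template: str,
                    terminators: frozenset = frozenset("T")) -> list:
    """3'-end position after each extension step, deblocking in between."""
    positions = []
    position = 0
    while position < len(template):
        position = extend_to_terminator(template, position, terminators)
        positions.append(position)  # blocking group removed before next step
    return positions


if __name__ == "__main__":
    print(pulse_positions("TTGACCGTAAGC"))
```

Each extension step therefore advances the primer to the next position where the terminator base is required, and the step size depends on the template sequence.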
- controlled extension can be performed by extension and wash cycles. Similar to the pulse extension, the controlled extension may be performed by limiting the availability of nucleotides or by adding reversible terminator nucleotide(s).
- the limited extension can be carried out by using a nucleic acid polymerase and one or more sets of nucleotides.
- the one or more sets generally each comprise no more than three different nucleotides (bases).
- the one or more sets comprise one to four nucleotides and at least one of the nucleotides is a reversible terminator nucleotide.
- the extending can be with more than one set of nucleotides, such as at least 1, 2, 3, or more sets.
- a set of nucleotides can comprise one, two or three different nucleotides.
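- The limited extension with sets of no more than three different nucleotides can be illustrated with a minimal sketch. It assumes an idealized polymerase that incorporates every available complementary nucleotide and stops at the first template base whose complement is absent from the set. The names `extend_once` and `controlled_extension`, and the convention that the template string lists bases in the order the polymerase encounters them, are assumptions made for illustration and are not part of the disclosure.

```python
# Complement of each template base, i.e. the nucleotide the polymerase must add.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_once(template, start, nucleotide_set):
    """Extend from index `start` until the required nucleotide is missing.

    `template` lists template bases in the order the polymerase meets them.
    Returns the index of the first template base that was NOT copied.
    """
    pos = start
    while pos < len(template) and COMPLEMENT[template[pos]] in nucleotide_set:
        pos += 1
    return pos


def controlled_extension(template, nucleotide_sets):
    """Apply several nucleotide sets in turn (with an implied wash between
    sets) and return the primer end position after each step."""
    pos = 0
    positions = []
    for nucleotide_set in nucleotide_sets:
        pos = extend_once(template, pos, nucleotide_set)
        positions.append(pos)
    return positions


if __name__ == "__main__":
    # Three steps, each missing one nucleotide: A, then C, then G.
    steps = [set("CGT"), set("AGT"), set("ACT")]
    print(controlled_extension("ACGTTGCA", steps))
```

- In this sketch each step advances the primer by a template-dependent number of bases, which is why the extension is limited rather than runaway.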
- the method further comprises obtaining one or more additional sequence reads, such as by repeating the steps of releasing a primer extension product from the target nucleic acid; hybridizing an additional seed sequencing primer (or extension primer) (in some embodiments, the additional seed sequencing primer targeting the same or similar regions of the target nucleic acid) to the target nucleic acid; generating an additional primer extension product by extending the additional sequencing primer through controlled extension; and sequencing one or more bases of the target nucleic acid by further extending the additional primer extension product to generate an additional primer extension product, thereby obtaining an additional sequence read.
- the sequence of the target nucleic acid can be determined by assembling the first, second, and optionally, one or more additional sequence reads.
- the sequencing of the target nucleic acid can be by extending the sequencing primer using a labeled reversible terminator, ligation, or any other methods known in the art for reading nucleotide sequences.
- a washing step or nucleotide degradation step can be performed prior to a subsequent addition of a set of nucleotides.
- the target nucleic acid can be attached to a substrate.
- the substrate can be a flat surface or bead, such as a flow cell.
- the substrate can comprise glass, silicon, metal, or plastics that have been surface treated to immobilize template strands or oligonucleotides.
- the target nucleic acid can be attached to the substrate via a capture probe.
- the methods and systems disclosed herein can further comprise analyzing the sequencing results, such as generated by a method disclosed herein, to provide a diagnosis, prognosis, or theranosis for a subject.
- a method disclosed herein can be used to sequence a plurality of target nucleic acids.
- the invention refers to a method for sequencing a target nucleic acid, comprising:
- the invention in a third aspect, relates to a method for sequencing a target nucleic acid, the method comprising generating sequence information of length n from a single template using sequencing by synthesis; wherein the sequence information maintains a quality score of at least 26, 27, 28, 29, 30 or 31; and
- n is greater than 100, 150, 200, 300, 400, 500, 700, 1000, 1500, 2000, or 3000.
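- The quality score thresholds above can be related to per-base error probabilities with a short sketch. The disclosure does not define the scale here, so the sketch assumes the conventional Phred scale, Q = -10 log10(p); the function names are hypothetical.

```python
import math


def phred_to_error_probability(q):
    """Per-base error probability for a Phred-scaled quality score (assumed scale)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred-scaled quality score for a per-base error probability (assumed scale)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (26, 27, 28, 29, 30, 31):
        print(q, phred_to_error_probability(q))
```

- Under this assumption, a score of 30 corresponds to one expected error per 1,000 bases.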
- the invention relates to a system for sequencing a target nucleic acid, the system comprising;
- the invention relates to a method for sequencing a target nucleic acid comprising:
- said extending comprises repeating step (g), wherein before the repeating, said nucleotides are removed.
- said set of nucleotides are different between two subsequent steps.
- said nucleotides are removed by washing.
- said nucleotides are removed by a nucleotide degrading enzyme.
- said set of nucleotides further comprises a reversible terminator nucleotide, wherein before the repeating, incorporated reversible terminator nucleotides are deblocked and made ready for further extension.
- said extension is carried out by pulse extension. In some embodiments, said pulse extension is carried out by allowing an extending reaction to last 30 to 60 seconds.
- the sequence of said target nucleic acid is determined by assembling said first, second, and optionally additional sequence reads.
- said target nucleic acid is attached to a substrate.
- said substrate is a flat surface or bead.
- said substrate is a flow cell.
- said substrate comprises glass.
- said target nucleic acid is attached to said substrate via a capture probe.
- the method further comprises analyzing results of said sequencing providing a diagnosis, prognosis, or theranosis for a subject.
- the method further comprises sequencing a plurality of target nucleic acids.
- said assembling results in sequence information comprising a nucleotide sequence of length greater than 500, 1000, 1500, 2000, or 3000 bases. In some embodiments, the assembling results in sequence information comprising an average quality score of at least 26, 27, 28, 29, 30 or 31. In some embodiments, the assembling results in sequence information comprising a quality score of at least 26, 27, 28, 29, 30 or 31 for any nucleotide position. In some embodiments, the first and second sequence reads start at positions that are at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 175, or 200 bases apart on the template nucleic acid.
- sequence reads from the complement strand of the template nucleic acid are further assembled with the first and second sequence reads.
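- Before reads from the complement strand can be assembled with reads from the template strand, they are typically brought into the same orientation. A minimal sketch, assuming plain DNA strings without ambiguity codes; the function name is hypothetical.

```python
# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT_TABLE)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ACGTT"))
```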
- the polymerase is Klenow exo(−).
- the nucleotide degrading enzyme comprises pyrophosphatase or apyrase.
- the enzymatic digestion of said sequencing product is performed by an enzyme comprising a 5′-3′ exonuclease or 3′-5′ exonuclease activity.
- the invention relates to a method for sequencing a target nucleic acid comprising:
- said first and second sequencings are performed using as a template a polynucleotide from the same strand of the target nucleic acid. In some embodiments, at least one sequencing of said first and second sequencings comprises:
- said extending comprises controlled extension comprising:
- said extending comprises repeating of step 1, wherein before the repeating, said nucleotides are removed.
- said set of nucleotides are different between two subsequent steps.
- said nucleotides are removed by washing.
- said nucleotides are removed by a nucleotide degrading enzyme.
- said set of nucleotides further comprises a reversible terminator nucleotide wherein before the repeating, incorporated reversible terminator nucleotides are deblocked and made ready for further extension.
- said combining is performed in silico by stitching said first and second regions into an assembled sequence for the target nucleic acid.
- the assembled sequence comprises a gap of length n.
- n is less than 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 50, or 100 nucleotides.
- said first and second sequencings are further performed using the same polynucleotide.
- said extending is performed using native nucleotides.
- said extension is carried out by pulse extension.
- said pulse extension is carried out by allowing an extending reaction to last 30 to 60 seconds.
- said target nucleic acid is attached to a substrate.
- said substrate is a flat surface or bead. In some embodiments, said substrate is a flow cell. In some embodiments, said substrate comprises glass. In some embodiments, said target nucleic acid is attached to said substrate via a capture probe. In some embodiments, the method further comprises analyzing results of said sequencing providing a diagnosis, prognosis, or theranosis for a subject. In some embodiments, the method further comprises sequencing a plurality of target nucleic acids. In some embodiments, said combined read comprises sequence information comprising a nucleotide sequence of length greater than 500, 1000, 1500, 2000, or 3000 bases. In some embodiments, said combined read comprises sequence information comprising an average quality score of at least 26, 27, 28, 29, 30 or 31.
- said combined read comprises sequence information comprising a quality score of at least 26, 27, 28, 29, 30 or 31 for any nucleotide position.
- the first and second reads start at positions that are at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 175, or 200 bases apart on the template nucleic acid.
- a sequence read from a complement strand of the template nucleic acid is further combined to produce the combined read.
- the polymerase is Klenow exo(−).
- the nucleotide degrading enzyme comprises pyrophosphatase or apyrase.
- a set of nucleotides for controlled extension is a combination of any number of different types of nucleotides, including native, reversibly terminated, or other modified nucleotides, as long as the combination allows controlled (or designed) extension.
- a set of nucleotides is of any combination of any number of native, reversibly terminated, or otherwise manipulated nucleotides that do not result in runaway extension (unlimited extension).
- a controlled extension nucleotide set is described as containing no more than three different nucleotides.
- nucleotide refers to three different nucleotides, each having a different base (i.e., three of the A, C, G, T bases or three of the A, C, G, U bases; T and U bases can be considered equivalent in some embodiments). If a nucleotide set contains A, C, T, and U, it contains three different nucleotides because T and U are considered equivalent in some embodiments. If the base of a nucleotide is modified, the modified nucleotide can be classified according to its pairing property. For example, if a dATP is modified in the base but, once incorporated, the base of the modified nucleotide still pairs with a T base, the modified dATP still has the A base.
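- The counting rule for different nucleotides can be sketched as follows, assuming each nucleotide is represented by the single-letter base it pairs as (so a base-modified dATP is still written "A"). The function names and the `t_equals_u` flag are hypothetical.

```python
def count_different_nucleotides(bases, t_equals_u=True):
    """Count distinct bases in a set, optionally treating T and U as equivalent."""
    normalized = set()
    for base in bases:
        base = base.upper()
        if t_equals_u and base == "U":
            base = "T"  # T and U pair identically, so count them once
        normalized.add(base)
    return len(normalized)


def is_controlled_extension_set(bases):
    """True if the set contains no more than three different nucleotides."""
    return count_different_nucleotides(bases) <= 3


if __name__ == "__main__":
    print(count_different_nucleotides(["A", "C", "T", "U"]))
    print(is_controlled_extension_set(["A", "C", "G", "T"]))
```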
- FIG. 1 is a schematic illustrating an exemplary process of sequencing a long nucleic acid.
- FIG. 2 is a schematic illustrating an exemplary process of sequencing a long nucleic acid where the resulting read has a gap.
- FIG. 3 is a schematic illustrating an exemplary process of creating an extended sequencing primer for sequencing.
- FIG. 4 is a schematic illustrating an exemplary process of building an extended sequencing primer by removing a sequencing product by peeling off the sequencing product or by digesting the sequencing product.
- FIG. 5 is a schematic illustrating an exemplary process of building an extended sequencing primer by removing a sequencing product by digesting sequencing product.
- FIG. 6 is a schematic illustrating an exemplary process of building an extended sequencing primer by partial digestion of a sequencing primer.
- FIG. 7 depicts that nucleic acid sequence information can be obtained, processed, analyzed and/or assembled via a computer system.
- FIG. 8 depicts an example of a template and triple base extension reactions.
- FIG. 8 discloses SEQ ID NOS 1-11, respectively, in order of appearance.
- FIG. 9 depicts an exemplary embodiment of a dark base (native nucleotide) extension experiment design.
- FIG. 10 depicts results of an exemplary embodiment of the present invention, in which 12 steps of 3-base extension resulted in a 124 base pair (bp) product (extension plus primer), wherein the template was an oligonucleotide.
- FIG. 11 depicts results of an exemplary embodiment of the present invention, in which 12 steps of 3-base extension resulted in a 124 bp product (extension plus primer), wherein the template was a PCR product.
- FIG. 12 depicts the percent base calls per sequencing step for lane 1 of an exemplary embodiment of the present invention, where the last step of the dark base extension was a missing T step, and as expected, 100% of the first sequencing base was “T”.
- FIG. 13 depicts the percent base calls per sequencing step for lane 3 of an exemplary embodiment of the present invention, where the last step of the dark base extension was a missing C step, and as expected, 100% of the first sequencing base was “C”.
- FIG. 14 depicts the distribution of dark base extensions in lane 1 (10 steps) and lane 3 (4 steps).
- FIG. 15 depicts the distribution of dark base extensions in lane 4 (10 steps), lane 5 (16 steps) and lane 6 (20 steps) in another exemplary embodiment of the present invention.
- FIG. 16A shows cluster density of different lanes after +S Extension.
- FIG. 16B shows percentage of cluster pass filter rate.
- FIG. 16C shows the number of pass filter reads for different lanes.
- FIG. 16D shows the predicted quality scores of different lanes.
- FIG. 17A shows the 100 bp standard Illumina sequencing run.
- FIG. 17B shows the additional 100 bp Illumina sequencing run.
- FIG. 17C shows the number of correct bases, calculated to show how the overall number of correct bases changes as the read length increases in FIG. 17A.
- FIG. 17D shows the number of correct bases, calculated to show how the overall number of correct bases changes as the read length increases in FIG. 17B.
- FIG. 18 is a summary of Q-scores changing over read length related to Example 6.
- the x-axis is read length in bp.
- Y-axis is measured or empirical Q-Score.
- Some embodiments of the invention are particularly suitable for sequencing a large number of target nucleic acids simultaneously.
- nucleic acids are often sequenced using stepwise methods such as polymerase extension based sequencing or ligation sequencing, where one or more bases are read for each sequencing step.
- stepwise based sequencing methods are often limited by their stepwise inefficiency, e.g., incomplete incorporation, incomplete ligation and other problems that create prephasing or dephasing.
- the stepwise inefficiency can accumulate over read length and limit read length.
- reversible terminator nucleotide based sequencing is limited by the efficiency of incorporating reversible terminator nucleotides that are modified in the 3′ hydroxyl group or modified otherwise to interrupt further extension by a polymerase. If the sequencing detection is based upon incorporation of modified nucleotides with an added detectable label such as a fluorescent group, the incorporation efficiency could be further reduced. The problem can be partially alleviated by mixing unlabeled and labeled reversible terminator nucleotides. However, even with improved chemistry and efficiency, the stepwise inefficiency can significantly limit read length and read quality at the end of the read.
- each sequencing step has a constant stepwise efficiency of incorporation of about 99% and there are 1,000 template molecules in a cluster.
- 10 sequencing primers are not extended and are capped or otherwise no longer involved in sequencing.
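- The accumulation of stepwise inefficiency in this example can be sketched numerically. The sketch assumes a constant per-step efficiency and that a primer that fails at any step never returns to phase; prephasing is ignored. The function name is hypothetical.

```python
def in_phase_templates(n_templates, step_efficiency, n_steps):
    """Expected number of templates still in phase after `n_steps` steps."""
    return n_templates * step_efficiency ** n_steps


if __name__ == "__main__":
    # 1,000 templates per cluster at 99% stepwise efficiency, as in the example.
    for steps in (1, 50, 100, 250):
        print(steps, round(in_phase_templates(1000, 0.99, steps)))
```

- With these assumptions, 10 of the 1,000 primers are lost in the first step, and only about 37% of the cluster remains in phase after 100 steps, which illustrates why signal quality declines toward the end of a long read.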
- nucleotide limited addition sequencing methods such as pyrophosphate detection based sequencing (commercially available from Roche/454 and described in vendor literature and patent filings and at www.454.com) or pH detection based sequencing (commercially available from Ion Torrent, Inc./Life Technologies, Inc. and described in vendor literature and patent filings)
- the efficiency can be limited by incomplete incorporation, mis-incorporation, or loss of bound polymerase (fall-off).
- Stepwise ligation based sequencing has a similar efficiency problem as stepwise efficiency is limited by, e.g., ligation reaction efficiency and removal of labels.
- FIG. 1 illustrates the process in some embodiments.
- a part ( 102 ) of the target nucleic acid ( 101 ) is sequenced ( FIG. 1A ).
- Another part ( 103 ) of the target nucleic acid ( 101 ) is also sequenced ( FIG. 1B ).
- the process can be repeated ( FIG. 1C ) many times.
- the sequenced parts are overlapping so the sequences can be assembled based upon overlapping sequences and/or other information.
- a large number of target nucleic acids (e.g. at least 10, 100, 1,000, 10,000, 100,000, or 1,000,000) is sequenced simultaneously.
- These target nucleic acids can be DNA, RNA or modified nucleic acids. While they can be sequenced as single molecules, they can also be sequenced as clones or clusters. Each of the clones or clusters (e.g. on beads) is derived from a single nucleic acid molecule. Methods for sequencing a large number of target nucleic acids in single molecule or clonal molecular clusters or beads are well known in the art.
- a target nucleic acid or “an extension primer,” one of skill in the art would appreciate that many of the embodiments can be used to sequence many target nucleic acids simultaneously or sequentially and such sequencing may be performed on copies (more than 10, 100, 1,000, 100,000 copies) of the target nucleic acids.
- a computer software product is generally used to assemble the sequences when the amount of data is quite large.
- the computer software product typically inputs the raw sequences for each of the target nucleic acids and assembles contiguous sequences upon finding overlapping regions and optionally validating the overlapping regions using additional information such as alignment with a reference sequence, information about the starting position of the sequencing run or relative positional difference among sequencing runs.
- the resulting contiguous sequence ( 105 ) can be further validated by, for example, alignment with a reference sequence for the target nucleic acid.
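- The overlap-based assembly described above can be sketched in a few lines. The sketch assumes error-free reads supplied in template order and merges each read with the growing contig at the longest exact suffix-prefix overlap. It is an illustration only, not the software product of the disclosure, and the names and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(reads, min_overlap=3):
    """Merge ordered reads into one contiguous sequence using exact overlaps."""
    contig = reads[0]
    for read in reads[1:]:
        k = overlap_length(contig, read, min_overlap)
        if k == 0:
            raise ValueError("reads do not overlap; the assembly would have a gap")
        contig += read[k:]  # append only the part not already covered
    return contig


if __name__ == "__main__":
    print(assemble(["ACGTACGGA", "CGGATTCA", "TTCAGGC"]))
```

- A practical assembler would also tolerate sequencing errors and could use the additional information mentioned above, such as a reference alignment or the known starting positions of the sequencing runs, to validate each overlap.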
- the sequencing can be performed using, for example, stepwise sequencing methods discussed earlier.
- the assembled contiguous sequence can be significantly longer, for example, greater than 1.5, 2, 3, 4, or 5× the length of the individual sequencing reads ( 102 , 103 , and 104 ).
- the individual sequencing runs can be carried out sequentially. In some embodiments, the order of the sequencing runs is not important. For example, the step in FIG. 1C can be performed before the step in FIG. 1A . If the target nucleic acid is copied to several distinct locations, the sequencing runs using alternative sequencing primers may also be carried out in parallel.
- FIG. 2 illustrates the sequencing of a long nucleic acid by three independent sequencing runs. Sequencing reads 202 and 203 do not overlap and the resulting assembled sequence 205 has a gap.
- the computer software product provided can output the sequence with the gap, but can also estimate the size of the gap based upon alignment to a reference sequence. The positional difference between the sequencing reads can be estimated, for example, based upon different sequencing primer starting positions. The positional difference can be used to estimate the gap size.
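- The gap size estimate from the positional difference between reads can be sketched as follows. The sketch assumes the start position of each read on the template is known or estimated (for example, from the sequencing primer starting positions); the function and parameter names are hypothetical.

```python
def estimate_gap(start_a, length_a, start_b):
    """Estimate the unsequenced gap between read A and a later read B.

    Positions are template coordinates. Returns 0 when the reads abut or overlap.
    """
    end_a = start_a + length_a  # first template position not covered by read A
    return max(0, start_b - end_a)


if __name__ == "__main__":
    print(estimate_gap(start_a=0, length_a=100, start_b=120))
```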
- each sequencing run resets the sequencing start conditions and is not affected or less affected by cumulative inefficiency or errors.
- sequencing methods and chemistries that have inherent length limitations can be used to sequence a target nucleic acid obtaining longer sequence information than the original length limitations of these sequencing methods and chemistries.
- a reversible terminator sequencing chemistry with sequencing length limitation of 250 bases
- a 1,000 base long target nucleic acid can be sequenced contiguously by carrying out the 250 base long reversible terminator sequencing 4 or more times.
- the total read length from a single template can be up to 100, 200, 250, 500, 1000, 2000 bases or more.
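- The number of sequencing runs in this example follows from simple arithmetic, sketched below. The `overlap` parameter, the number of bases shared by consecutive reads to allow assembly, is an assumption added for illustration; the function name is hypothetical.

```python
import math


def runs_needed(target_length, read_length, overlap=0):
    """Minimum number of reads of `read_length` needed to tile `target_length`
    when consecutive reads share `overlap` bases."""
    if overlap >= read_length:
        raise ValueError("overlap must be smaller than the read length")
    if target_length <= read_length:
        return 1
    return math.ceil((target_length - overlap) / (read_length - overlap))


if __name__ == "__main__":
    print(runs_needed(1000, 250))       # reads placed end to end
    print(runs_needed(1000, 250, 50))   # reads overlapping by 50 bases
```

- With no overlap, four 250 base reads cover a 1,000 base target; requiring a 50 base overlap between consecutive reads raises the count to five, consistent with "4 or more times".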
- methods and reagent kits are provided for building sequencing primers.
- the resulting sequencing primers can be of varying length. Different sequencing primers for the same target nucleic acid can be used to sequence different segments of the target nucleic acid.
- an extension primer hybridized to a target nucleic acid is provided.
- the extension primer is extended by controlled extension.
- Controlled extensions can be performed using polymerase extension reactions, stepwise ligation reactions and other methods.
- controlled extension can be performed by, for example, three nucleotide cycles or by reversible terminator reactions. Controlled extension is also described in great detail in a section below and throughout the specification.
- the extended extension primer can be used for sequencing.
- FIG. 3 illustrates some embodiments of this process.
- FIG. 3A shows that a target nucleic acid ( 301 ) is hybridized with an extension primer ( 302 ).
- the extension primer ( 302 ) is then extended by a number of bases using one or more nucleic acid polymerization reactions or by one or more ligation reactions to produce an extended primer ( 302 and 303 , where 303 is the extended portion).
- the extended primer ( 302 , 303 ) is then used as a sequencing primer for sequencing ( FIG. 3C , sequencing product is shown as 304 ).
- a target nucleic acid is hybridized with a sequencing product (such as the product resulting from FIG. 3C ).
- the sequencing product can be the result of reversible terminator sequencing or nucleotide addition sequencing.
- sequencing products of different length may be hybridized with the target nucleic acid copies in the clonal cluster because of the inefficiencies of sequencing reactions which result in, for example, dephased or prephased products.
- FIG. 4 illustrates some embodiments of the process.
- a sequencing template ( 401 ) is hybridized with a sequencing primer ( 402 ) and the sequencing primer is used for sequencing which results in a sequencing product ( 403 ).
- the sequencing primer ( 402 ) and sequencing product ( 403 ) structure is removed by denaturation or by enzymatic digestion ( FIG. 4B ). Methods for removing a strand of nucleic acid from a double strand nucleic acid structure are well known in the art.
- the sequencing structure can be denatured by contacting it with a NaOH solution (e.g., about 0.1 N NaOH) or another denaturation reagent.
- the sequencing product structure can also be removed by exonuclease digestion or other enzymatic treatment. If enzymatic digestion is used, the target nucleic acid strand can be protected using, for example, protecting bases in the 5′ and/or 3′ end.
- the template is immobilized on a substrate so that only one end could be potentially susceptible to nuclease digestion. In some cases, protecting the template is not necessary because certain exonucleases only digest in a particular orientation (5′-3′ or 3′-5′).
- exonuclease III predominately digests recessed 3′ ends of double strand DNA. If the target nucleic acid is immobilized at its 3′ end, it may not be necessary to protect the 5′ end.
- an extension primer can be hybridized and extended ( FIG. 4C ) as described above and detailed in following sections to produce an extended primer, which can serve as a primer for sequencing ( FIG. 4D ).
- a sequencing product structure does not need to be completely removed. It can be partially removed. As shown in FIGS. 5 and 6 , the sequencing product part ( 503 or 603 ) may be completely ( FIG. 6 ) or partially removed ( FIG. 5, 505 is smaller than 503 ).
- the sequencing primer part ( 502 or 602 ) can be the product of earlier extension reactions such as those described in FIGS. 3, 4, 5 and 6 . Partial digestion of nucleic acids may be achieved using exonuclease digestion (such as Exonuclease III digestion). If a synthetic primer was used as 502 , the last base can be a base that cannot be digested by an exonuclease.
- the last base of the 502 part can be connected using a thiol bond which is resistant to certain exonuclease digestion.
- alpha-thiophosphate-containing phosphodiester bonds are resistant to hydrolysis by the 3′-to-5′ exonucleolytic activity of phage T4 DNA polymerase and exonuclease III.
- a thiophosphate containing diester bond can also be produced by incorporating one or more thiotriphosphate nucleotides in the desired position(s). As reported by Yang et al., (2007), “Nucleoside Alpha-Thiotriphosphates.
- FIG. 5B illustrates the partial digestion of sequencing product.
- a nucleotide thiotriphosphate can be incorporated into one or more specific positions.
- the reversible terminator nucleotide can be a nucleotide thiotriphosphate. This position can be used to terminate an exonuclease digestion in the step illustrated in FIG. 5B .
- Partial removal of sequencing products can be useful where the early steps of sequencing do not introduce too much prephasing, dephasing, or other inefficiency. It can reduce the need for the extension steps illustrated in FIG. 5C because the total size of 504 plus 505 is longer than 405 in FIG. 4, and it can extend the next sequencing ( 506 ) further than 406 . However, because part of the sequencing product ( 505 ) is incorporated, if the 504 fragments in a cluster vary too much in length, the process may affect the subsequent sequencing quality.
- the present invention provides a method for sequencing a target nucleic acid molecule or a collection of target nucleic acids.
- By “target nucleic acid molecule”, “target molecule”, “target polynucleotide”, “target polynucleotide molecule” or grammatical equivalents thereof, as used herein, is meant a nucleic acid of interest.
- Target nucleic acid, for example, can be DNA or RNA or any synthetic structure that has properties similar to those of DNA or RNA.
- Sequencing refers to the determination of at least a single base, at least 2 consecutive bases, at least 10 consecutive bases or at least 25 consecutive bases in a target nucleic acid.
- Sequencing accuracy can be at least 65%, 75%, 85%, 95%, 99%, 99.9% or 99.99% overall or per base. Sequencing can be performed directly on a target nucleic acid or on a nucleic acid derived from target nucleic acids. In some applications, a large number of target nucleic acids, such as at least 1,000, 10,000, 100,000 or 1,000,000 target nucleic acids, are simultaneously sequenced.
- a target nucleic acid is genomic DNA derived from the genetic material in the chromosomes of a particular organism and/or in nonchromosomal genetic materials such as mitochondrial DNA.
- a genomic clone library is a collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism.
- a genomic library is a collection of at least 2%, 5%, 10%, 30%, 50%, 70%, 80%, or 90% of the sequence or sequences in the genomic DNA of an organism.
- Target nucleic acids include naturally occurring or genetically altered or synthetically prepared nucleic acids (such as genomic DNA from a mammalian disease model).
- Target nucleic acids can be obtained from virtually any source and can be prepared using methods known in the art.
- target nucleic acids can be directly isolated without amplification using methods known in the art, including without limitation extracting a fragment of genomic DNA from an organism (e.g. a cell or bacteria) to obtain target nucleic acids.
- target nucleic acids can also be isolated by amplification using methods known in the art, including without limitation polymerase chain reaction (PCR), whole genome amplification (WGA), multiple displacement amplification (MDA), rolling circle amplification (RCA), rolling circle replication (RCR) and other amplification methodologies.
- Target nucleic acids may also be obtained through cloning, including cloning into vehicles such as plasmids, yeast, and bacterial artificial chromosomes.
- “Amplification” refers to any process by which the copy number of a target sequence is increased. Amplification can be performed by any means known in the art. Methods for primer-directed amplification of target polynucleotides are known in the art, and include without limitation, methods based on the polymerase chain reaction (PCR).
- PCR techniques include, but are not limited to, quantitative PCR, quantitative fluorescent PCR (QF-PCR), multiplex fluorescent PCR (MF-PCR), real time PCR (RTPCR), single cell PCR, restriction fragment length polymorphism PCR (PCR-RFLP), PCR-RFLP/RT-PCR-RFLP, hot start PCR, nested PCR, in situ polony PCR, in situ rolling circle amplification (RCA), bridge PCR, picotiter PCR and emulsion PCR.
- Conditions favorable to the amplification of target sequences by PCR can be optimized at a variety of steps in the process, and depend on characteristics of elements in the reaction, such as target type, target concentration, sequence length to be amplified, sequence of the target and/or one or more primers, primer length, primer concentration, polymerase used, reaction volume, ratio of one or more elements to one or more other elements, and others, some or all of which can be altered.
- PCR involves the steps of denaturation of the target to be amplified (if double stranded), hybridization of one or more primers to the target, and extension of the primers by a DNA polymerase, with the steps repeated (or “cycled”) in order to amplify the target sequence.
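- The effect of cycling on copy number can be illustrated with the common textbook idealization in which each cycle multiplies the copy number by (1 + efficiency). This model is an assumption for illustration and does not come from the disclosure; real reactions plateau as reagents are depleted. The function name is hypothetical.

```python
def ideal_pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copy number after `cycles` cycles in an idealized exponential model.

    `efficiency` is the fraction of templates copied per cycle (1.0 = perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        print(n, ideal_pcr_copies(1, n))
```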
- Steps in this process can be optimized for various outcomes, such as to enhance yield, decrease the formation of spurious products, and/or increase or decrease specificity of primer annealing.
- Methods of optimization are well known in the art and include adjustments to the type or amount of elements in the amplification reaction and/or to the conditions of a given step in the process, such as temperature at a particular step, duration of a particular step, and/or number of cycles.
- an amplification reaction comprises at least 5, 10, 15, 20, 25, 30, 35, 50, or more cycles.
- an amplification reaction comprises no more than 5, 10, 15, 20, 25, 35, or 50 cycles. Cycles can contain any number of steps, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more steps.
- Steps can comprise any temperature or gradient of temperatures, suitable for achieving the purpose of the given step, including but not limited to, 3′ end extension (e.g. adapter fill-in), primer annealing, primer extension, and strand denaturation. Steps can be of any duration, including but not limited to about, less than about, or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 180, 240, 300, 360, 420, 480, 540, 600, or more seconds, including indefinitely until manually interrupted. Cycles of any number comprising different steps can be combined in any order.
- different cycles comprising different steps are combined such that the total number of cycles in the combination is about, less than about, or more than about 5, 10, 15, 20, 25, 30, 35, 50, or more cycles.
- Other suitable amplification methods include the ligase chain reaction (LCR), transcription amplification, self-sustained sequence replication, selective amplification of target polynucleotide sequences, consensus sequence primed polymerase chain reaction (CP-PCR), arbitrarily primed polymerase chain reaction (AP-PCR), degenerate oligonucleotide-primed PCR (DOP-PCR) and nucleic acid based sequence amplification (NABSA).
- Other amplification methods that can be used herein include those described in U.S. Pat. Nos. 5,242,794; 5,494,810; 4,988,617; and 6,582,938.
- the amplification is performed inside a cell.
- amplification may occur on a support, such as a bead or a surface.
- targets may be amplified from an extract of a single cell.
- Target nucleic acids may also have an exogenous sequence, such as a universal primer sequence or barcode sequence introduced during, for example, library preparation via a ligation or amplification process.
- the term “sequencing template” used herein may refer to the target nucleic acid itself or to a nucleotide sequence that is identical or substantially similar to the nucleotide sequence of a fragment of a target nucleic acid or the complement of a target nucleic acid.
- the target nucleic acid molecule comprises ribonucleic acid (RNA).
- the target polynucleotide is genomic DNA or a portion of the genomic DNA. While one embodiment is for sequencing a whole genome, such as at more than 50% coverage, these embodiments are also suitable for sequencing a targeted region such as genomic regions relating to drug metabolism. In one example, the target polynucleotide is human genomic DNA.
- Target nucleic acid can also refer to nucleic acid structures for sequencing. Such structures typically comprise adaptor sequences on one or both ends of target nucleic acid sequences. For example, a sequence derived from the genomic DNA of a sample, or derived from an RNA molecule of a sample, may be ligated with amplification and/or sequencing adaptor(s). Library construction methods are well known in the art. Nucleic acid sequencing libraries may be amplified in clonal fashion on substrates using bridge amplifications, emulsion PCR amplifications, rolling circle amplifications or other amplification methods. Such processes may be performed manually or using automation equipment such as the cBot (Illumina, Inc.) or OneTouch™ (Ion Torrent).
- “Nucleic acid” or “oligonucleotide” or “polynucleotide” or grammatical equivalents typically refer to at least two nucleotides covalently linked together.
- a nucleic acid of the present invention will generally contain phosphodiester bonds, although in some cases, as outlined below (for example in the construction of primers and probes such as label probes), nucleic acid analogs are included that may have alternate backbones, comprising, for example, phosphoramide (see e.g. Beaucage et al., Tetrahedron 49(10):1925 (1993); Letsinger, J. Org. Chem. 35:3800 (1970); Sblul et al., Eur. J. Biochem.
- Other analog nucleic acids include those with bicyclic structures including locked nucleic acids, also referred to herein as “LNA” (see e.g. Koshkin et al., J. Am. Chem. Soc. 120:13252-3 (1998)); positive backbones (Dempcy et al., Proc. Natl. Acad. Sci. USA 92:6097 (1995)); non-ionic backbones (see e.g. U.S. Pat. Nos. 5,386,023, 5,637,684, 5,602,240, 5,216,141 and 4,469,863; Kiedrowshi et al., Angew. Chem. Intl. Ed.
- nucleic acids containing one or more carbocyclic sugars are also included within the definition of nucleic acids (see e.g. Jenkins et al., Chem. Soc. Rev. (1995) pp. 169-176). Several nucleic acid analogs are described in Rawls, C & E News, Jun. 2, 1997, page 35.
- the target nucleic acids may be single stranded or double stranded, as specified, or contain portions of both double stranded and single stranded sequence.
- the nucleic acids may be DNA (including genomic DNA and cDNA), RNA (including mRNA and rRNA) or a hybrid, where the nucleic acid contains any combination of deoxyribo- and ribo-nucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine, isoguanine, etc.
- the methods of the present invention comprise capture of target polynucleotide.
- the target polynucleotide may be from a known region of the genome.
- oligonucleotide probes can be immobilized on beads and these oligonucleotide beads which are inexpensive and reusable can be used to capture the target genomic polynucleotide.
- microarrays are used to capture target polynucleotide.
- the target polynucleotide may be fragmented to a suitable length or plurality of suitable lengths, such as approximately between 100-200, 200-300, 300-500, 500-1000, 1000-2000 or more bases in length.
- the target polynucleotide is prepared by whole genome amplification (WGA) (see, for example, Hawkins et al.: Whole genome amplification—applications and advances. Curr. Opin. Biotechnol. 2002 February; 13(1): 65-7).
- the target polynucleotide is prepared by whole genome sampling assay (WGSA).
- the WGSA reduces the complexity of a nucleic acid sample by amplifying a subset of the fragments in the sample.
- a nucleic acid sample is fragmented with one or more restriction enzymes and an adapter is ligated to both ends of the fragments.
- a primer that is complementary to the adapter sequence is used to amplify the fragments using PCR.
- PCR fragments of a selected size range are selectively amplified.
- the size range may be, for example, 400-800 or 400 to 2000 base pairs. Fragments that are outside the selected size range are not efficiently amplified.
- the fragments that are amplified by WGSA may be predicted by in silico digestion and restriction enzyme combinations may be selected so that the resulting WGSA amplified fragments may represent the genomic regions of specific interests.
- the resulting library often having desired adaptor sequences (including optional barcode sequences and sequencing primer hybridization site(s)) may be used for sequencing and for hybridizing with a genotyping array. In such embodiments, the library can be used for sequencing and the detected SNPs or indels can be validated by hybridizing the same library with an array.
- WGSA is disclosed in Kennedy et al. (2003), Nat. Biotechnol. Vol., pp. 1233-1237, and U.S. patent application Ser. Nos. 10/316,517, 10/442,021, 10/463,991, 10/316,629 and U.S. Pat. Nos. 6,361,947, 6,548,810, 7,267,966, 7,297,778, and 7,300,788, all of which are herein incorporated by reference.
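The in silico prediction of WGSA fragments described above can be illustrated with a minimal sketch. It cuts a sequence at every occurrence of a recognition site and keeps only fragments within a selected size range. The function names, the toy sequence, and the toy size limits are assumptions made for illustration only; EcoRI (G^AATTC) is used merely as a familiar example enzyme and is not taken from this disclosure.

```python
def in_silico_digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the
    top strand is cut (1 for EcoRI, G^AATTC).
    """
    fragments = []
    prev = 0
    idx = sequence.find(site)
    while idx != -1:
        cut = idx + cut_offset
        fragments.append(sequence[prev:cut])
        prev = cut
        idx = sequence.find(site, idx + 1)
    fragments.append(sequence[prev:])  # remainder after the last cut
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls inside the selected range."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy example: a 21-base sequence with two EcoRI sites.
toy_sequence = "AAAGAATTCCCCCGAATTCTT"
fragments = in_silico_digest(toy_sequence, "GAATTC", cut_offset=1)
selected = size_select(fragments, min_len=5, max_len=8)
print(fragments)
print(selected)
```

For a real genome, the size limits would be set to the selected range (for example 400-800 base pairs), and candidate enzyme combinations could be compared by how well the retained fragments cover the regions of interest.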
- the target polynucleotide or a collection of target polynucleotides is prepared by PCR, such as long-range PCR.
- Long range PCR allows the amplification of PCR products, which are much larger than those achieved with conventional Taq polymerases.
- up to 27 kb fragments from good quality genomic DNA can be prepared, although 10-20 kb fragments are routinely achievable, given the appropriate conditions.
- a fragment greater than 27 kb is obtained.
- the method typically relies on a mixture of thermostable DNA polymerases, usually Taq DNA polymerase for high processivity (i.e. 5′-3′ polymerase activity) and another DNA polymerase with 3′-5′ proofreading abilities (usually Pwo). This combination of features allows longer primer extension than can be achieved with Taq alone.
- the target polynucleotide is prepared by locus-specific multiplex PCR.
- Multiplex locus specific amplification can be used to amplify a plurality of pre-selected target sequences from a complex background of nucleic acids.
- the targets are selected for amplification using splint oligonucleotides that are used to modify the ends of the fragments.
- the fragments have known end sequences and the splints are designed to be complementary to the ends.
- the splint can bring the ends of the fragment together and the ends are joined to form a circle.
- the splint can also be used to add a common priming site to the ends of the target fragments. Specific loci are amplified and can be subsequently analyzed.
- target polynucleotides are produced using multiplex PCR and each of the PCR fragments is labeled with a tag sequence.
- tag sequence can be added as a part of one of the primers used for the PCR. Therefore, each resulting PCR fragment can be uniquely identified.
- Such applications can be useful for the identification of species, such as microbial species.
- amplification methods include but are not limited to the ligase chain reaction (LCR) (e.g., Wu and Wallace, Genomics 4, 560 (1989), Landegren et al., Science 241, 1077 (1988) and Barringer et al. Gene 89:117 (1990)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989) and WO88/10315), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87, 1874 (1990) and WO90/06995), selective amplification of target polynucleotide sequences (U.S. Pat. No.
- Naturally-existing targets can be assayed directly in cell lysates, in nucleic acid extracts, or after partial purification of fractions of nucleic acids so that they are enriched in targets of interest.
- the target polynucleotide is human genomic DNA.
- the polynucleotide target to be detected can be unmodified or modified.
- Useful modifications include, without limitation, radioactive and fluorescent labels as well as anchor ligands such as biotin or digoxigenin.
- the modification(s) can be placed internally or at either the 5′ or 3′ end of the targets.
- Target modification can be carried out post-synthetically, either by chemical or enzymatic reaction such as ligation or polymerase-assisted extension.
- the internal labels and anchor ligands can be incorporated into an amplified target or its complement directly during enzymatic polymerization reactions using small amounts of modified NTPs as substrates.
- the target polynucleotide can be isolated from a subject.
- the subject is not limited to a human being but may also be other organisms including but not limited to mammals, plants, bacteria, virus or fungi.
- the target polynucleotide is genomic DNA extracted from a human.
- the input nucleic acid can be DNA, or complex DNA, for example genomic DNA.
- the input DNA may also be cDNA.
- the cDNA can be generated from RNA, e.g., mRNA.
- the input DNA can be of a specific species, for example, human, rat, mouse, other animals, plants, bacteria, algae, viruses, and the like.
- the input nucleic acid also can be from a mixture of genomes of different species such as host-pathogen, bacterial populations and the like.
- the input DNA can be cDNA made from a mixture of genomes of different species.
- the input nucleic acid can be from a synthetic source.
- the input DNA can be mitochondrial DNA.
- the input DNA can be cell-free DNA.
- the cell-free DNA can be obtained from, e.g., a serum or plasma sample.
- the input DNA can comprise one or more chromosomes.
- the DNA can comprise one or more of chromosome 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y.
- the DNA can be from a linear or circular genome.
- the DNA can be plasmid DNA, cosmid DNA, bacterial artificial chromosome (BAC), or yeast artificial chromosome (YAC).
- the input DNA can be from more than one individual or organism.
- the input DNA can be double stranded or single stranded.
- the input DNA can be part of chromatin.
- the input DNA can be associated with histones.
- the methods described herein can be applied to high molecular weight DNA, such as is isolated from tissues or cell culture, for example, as well as highly degraded DNA, such as cell-free DNA from blood and urine and/or DNA extracted from formalin-fixed, paraffin-embedded tissues, for example.
- the different samples from which the target polynucleotides are derived can comprise multiple samples from the same individual, samples from different individuals, or combinations thereof.
- a sample comprises a plurality of polynucleotides from a single individual.
- a sample comprises a plurality of polynucleotides from two or more individuals.
- An individual is any organism or portion thereof from which target polynucleotides can be derived, non-limiting examples of which include plants, animals, fungi, protists, monerans, viruses, mitochondria, and chloroplasts.
- Sample polynucleotides can be isolated from a subject, such as a cell sample, tissue sample, or organ sample derived therefrom, including, for example, cultured cell lines, biopsy, blood sample, or fluid sample containing a cell.
- the subject may be an animal, including but not limited to, an animal such as a cow, a pig, a mouse, a rat, a chicken, a cat, a dog, etc., and is usually a mammal, such as a human.
- Samples can also be artificially derived, such as by chemical synthesis.
- the samples comprise DNA.
- the samples comprise genomic DNA.
- the samples comprise mitochondrial DNA, chloroplast DNA, plasmid DNA, bacterial artificial chromosomes, yeast artificial chromosomes, oligonucleotide tags, or combinations thereof.
- the samples comprise DNA generated by primer extension reactions using any suitable combination of primers and a DNA polymerase, including but not limited to polymerase chain reaction (PCR), reverse transcription, and combinations thereof.
- Primers useful in primer extension reactions can comprise sequences specific to one or more targets, random sequences, partially random sequences, and combinations thereof. Reaction conditions suitable for primer extension reactions are known in the art.
- sample polynucleotides comprise any polynucleotide present in a sample, which may or may not include target polynucleotides.
- nucleic acids can be purified by organic extraction with phenol, phenol/chloroform/isoamyl alcohol, or similar formulations, including TRIzol and TriReagent.
- extraction techniques include: (1) organic extraction followed by ethanol precipitation, e.g., using a phenol/chloroform organic reagent (Ausubel et al., 1993), with or without the use of an automated nucleic acid extractor, e.g., the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif.); (2) stationary phase adsorption methods (U.S. Pat. No.
- nucleic acid isolation and/or purification includes the use of magnetic particles to which nucleic acids can specifically or non-specifically bind, followed by isolation of the beads using a magnet, and washing and eluting the nucleic acids from the beads (see e.g. U.S. Pat. No. 5,705,628).
- the above isolation methods may be preceded by an enzyme digestion step to help eliminate unwanted protein from the sample, e.g., digestion with proteinase K or other like proteases.
- RNase inhibitors may be added to the lysis buffer.
- Purification methods may be directed to isolate DNA, RNA, or both. When both DNA and RNA are isolated together during or subsequent to an extraction procedure, further steps may be employed to purify one or both separately from the other. Sub-fractions of extracted nucleic acids can also be generated, for example, purification by size, sequence, or other physical or chemical characteristic.
- purification of nucleic acids can be performed after any step in the methods of the invention, such as to remove excess or unwanted reagents, reactants, or products.
- a controlled extension is an increase in the length of an extension primer by a defined length or defined distance.
- defined length refers to a length of extension that is dependent upon the extension conditions and may be dependent upon the template sequence.
- a defined length of the extension may not be known, but can be determined.
- a single step of three nucleotide extension can extend the primer to a position where a missing nucleotide is needed for correct further extension. Such a position is dependent upon the nucleotide combination and the template sequence and is thus defined. But it may not be known if the template sequence is unknown and the extension product has not been measured. Once the template or target nucleic acid sequence is determined, the extension length can be estimated.
- the defined length may be independent of the template sequence.
- the controlled extension is carried out by stepwise ligation reactions, the defined extension length could be independent of the template sequence.
- stepwise ligation to grow a primer.
- a random hexamer (a collection of hexamers with random sequences) is ligated to the 5′ end of the extension primer.
- the random hexamer does not have 5′ phosphate so it cannot be ligated to already extended primer (added hexamer does not provide 5′ phosphate).
- the 5′ phosphate can be added with a kinase reaction and the extended primer is then ready for another extension.
- each extension step adds 6 bases. Similar stepwise ligation can be performed in the 3′ end of the extension primer.
- the controlled extensions are at least 55%, 65%, 70%, 75%, 80%, 85%, 95%, 98%, 99%, 99.9%, or 99.99% synchronized, because at least a majority of the molecules in a cluster are extended by the same length at each step.
- a controlled primer extension is performed using polymerization.
- the extension primer is extended from its 3′ end in the 5′-3′ orientation.
- long nucleic acids are sequenced by incorporating sequence reads that are obtained using one or more of the controlled primer extension reactions.
- controlled primer extension comprises the use of native nucleotides or modified nucleotides.
- a series of sequential reactions is performed such that each reaction of the series extends an extension primer, such as a deoxyribonucleic acid (DNA) primer or a sequencing primer, to a different length to create incremental sequences complementary to a sequencing template (the target nucleic acid or target polynucleotide molecule).
- the extension primer may be the same or similar to other(s) in the series.
- two similar primers may target the same region of the target nucleic acid or target neighboring regions, typically within 10, 20, 50, 100 bases. Two similar primers may target the same region but be different in length.
- the desired region of the target nucleotides may be surrounded by or adjacent to adaptor and/or key(s) sequences.
- a biologically derived sequence may be ligated with an adaptor sequence (such as in sequencing libraries for Illumina HiSeq's reversible terminator sequencing or for Ion Torrent's pH detection sequencing).
- a sequencing primer is often designed to hybridize with the whole or a part of the adaptor sequence and can be designed to hybridize to the last 3′ base of an adaptor sequence so that the first base read is the biological sample derived sequence (Illumina HiSeq library).
- the sequencing primer may be designed to hybridize to a region that is 5′ to the biological sample derived sequence because the first part of the sequence to be read can be a barcode or index run or a key sequence (e.g., in Ion Torrent PGM Sequencing).
- These sequencing primers can also be used as extension primers.
- the extension primer sequences are designed to hybridize to the same or different parts of the adaptor sequences, typically 5′ to the biologically derived sequences.
- the extension primers can be the same or similar.
- extension primer and the extended extension primer can also be used as a sequencing primer.
- the extension of the extension primer or sequencing primer can be with one or more nucleotides and a polymerase, such as native or native performance nucleotide(s) and native or native performance polymerase or a modified polymerase.
- RNA extension can be performed similarly, using an RNA polymerase; various embodiments are illustrated using DNA extensions as examples.
- extended extension primers can be generated or produced by extending the extension primer through controlled extension, such as by pulse extension.
- a series of extended sequencing primers of incremental length are generated.
- sequencing primers of incremental length can be generated or produced by extending the extension primer through extension, such as with an incomplete set of nucleotides, i.e., with a set of nucleotides comprising no more than three different nucleotides.
- Each incomplete set of nucleotides can extend the extension primer until the extension reaches a position where the target nucleic acid (or template) has a base complementary to a nucleotide that is missing from the set.
- the sequencing primer can be extended until it reaches a T base in the template target nucleic acid.
- extension reactions can be performed with at least two different sets of nucleotides. For example, multiple steps of extension can be performed using a first nucleotide set consisting of dATP, dCTP, dGTP and a second nucleotide set consisting of dATP, dCTP, dTTP. Because certain DNA polymerases can incorporate nucleotide diphosphates, if such a DNA polymerase is used for extension, the nucleotides can be diphosphates instead of triphosphates.
- a washing step is used between two extension steps. Because the target nucleic acids or the extension primers are often immobilized on a substrate such as on a glass slide or on beads, washing can be performed relatively easily.
- the washing solution may optionally include nucleotide degrading enzymes such as apyrase and/or alkaline phosphatase.
- Controlled extension can be performed using pulse extension with no washing steps between extension steps when extension is performed with serial addition of various sets of nucleotides, wherein each set comprises one, two or three different nucleotides.
- sets of nucleotides are typically added serially at specified time intervals (such as for 1-10, 10-20, 20-30, 30-60 seconds).
- the nucleotides are typically degraded before the next addition of nucleotides by nucleotide degrading enzymes such as apyrase and/or alkaline phosphatase in the reaction solution.
- Extension with washing and pulse extension steps can be combined. For example, extension can be performed in a pulse mode. After a certain number of pulse extension steps (such as 20-40, 41-60, or 61-100 steps), the reaction mixture can be washed to remove residual nucleotides or byproducts. A new series of pulse extension steps can then be performed.
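A combined pulse-and-wash protocol of this kind can be written out as a simple schedule. The sketch below is illustrative only: the function name, the tuple layout, and the choice of 60 pulses with a wash after every 20 are assumptions, and the two nucleotide sets are the example sets named elsewhere in this description.

```python
from itertools import cycle


def pulse_schedule(nucleotide_sets, total_pulses, pulses_per_wash):
    """Build a list of ("pulse", nucleotide_set) and ("wash", None) events.

    Nucleotide sets are added serially in rotation; a wash is inserted
    after every `pulses_per_wash` pulses, except after the final pulse.
    """
    schedule = []
    sets = cycle(nucleotide_sets)
    for i in range(1, total_pulses + 1):
        schedule.append(("pulse", next(sets)))
        if i % pulses_per_wash == 0 and i < total_pulses:
            schedule.append(("wash", None))
    return schedule


example_sets = [
    ("dATP", "dCTP", "dGTP"),  # dTTP withheld
    ("dATP", "dCTP", "dTTP"),  # dGTP withheld
]
schedule = pulse_schedule(example_sets, total_pulses=60, pulses_per_wash=20)
wash_count = sum(1 for kind, _ in schedule if kind == "wash")
print(len(schedule), wash_count)
```

Pulse timing (for example 30-60 seconds per addition) and the in-solution nucleotide degradation are not modelled here; they would be handled by whatever instrument control consumes such a schedule.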
- controlled extension is performed using unmodified nucleotides.
- Unmodified nucleotides are typically more efficiently incorporated than labeled nucleotides.
- labeled nucleotides can be used as long as their incorporation efficiency is high. Incorporation efficiency can be affected by the polymerase used. Therefore, the selection of nucleotides can be dependent upon the corresponding polymerase used to incorporate the nucleotides.
- Modified nucleotides with a bulky group such as a fluorescent label can significantly reduce the incorporation efficiency and may not be good nucleotides for some embodiments.
- the controlled extension can be performed using a polymerase in a buffer that is suitable for the polymerase to catalyze polymerase reaction.
- nucleotide(s) are also added to the extension reaction.
- a reaction contains a polymerase and a set of nucleotides, wherein the set of nucleotides comprises no more than three different nucleotides.
- the set of nucleotides consists of one to three of the four types of nucleotides (e.g. for DNA polymerase, one, two or three of the four nucleotides dATP, dCTP, dTTP, dGTP).
- a reaction containing three of the different nucleotides stops at the template base that is complementary to the missing nucleotide.
- the extension stops at a base “A” on the template because “A” is complementary to the missing nucleotide dTTP, thereby limiting extension of a primer hybridized to the template.
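The stopping behaviour of an incomplete nucleotide set can be made concrete with a minimal sketch. It assumes idealized chemistry (no mis-incorporation and complete extension), and it assumes the template string is written in the order in which the polymerase traverses it. The function names and the toy template are hypothetical and serve only as an illustration.

```python
# Template base -> nucleotide that must be incorporated opposite it.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_once(template, start, nucleotide_set):
    """Extend from `start` until the template calls for a missing nucleotide.

    Returns the index of the template base at which extension stops
    (or len(template) if the end of the template is reached).
    """
    pos = start
    while pos < len(template) and COMPLEMENT[template[pos]] in nucleotide_set:
        pos += 1
    return pos


def controlled_extension(template, nucleotide_sets, start=0):
    """Apply a series of nucleotide sets and record the stop position after each."""
    positions = []
    pos = start
    for nucleotide_set in nucleotide_sets:
        pos = extend_once(template, pos, nucleotide_set)
        positions.append(pos)
    return positions


no_T = {"A", "C", "G"}  # dATP, dCTP, dGTP: stops at a template "A"
no_G = {"A", "C", "T"}  # dATP, dCTP, dTTP: stops at a template "C"
toy_template = "GCTAGGCATC"
positions = controlled_extension(toy_template, [no_T, no_G, no_T, no_G])
print(positions)
```

Each stop position depends only on the nucleotide sets and the template, which is why the extension length is defined even when the template sequence is not yet known.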
- nucleotide polymers such as dimers, trimers, or longer nucleotide polymers can be used in each set.
- a set may contain GA, GG, GC, GT, AA, AG, AC, AT, CA, CC, CG, and CT.
- Base extension can be performed many times with various nucleotide sets, or with numerous cycles of nucleotide sets.
- the average extension length per single “three nucleotide” extension step is about 4 bases.
- “single nucleotide” extension as used in Ion Torrent's PGM or pyrophosphate sequencing requires a total of 154 extension steps to achieve an approximate average extension length of 96 bases.
- Forty eight three base extension steps can achieve an average extension length of approximately 192 bases.
- Three nucleotide extensions are more than 6 times faster than single nucleotide extensions.
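The figures quoted above can be checked with a short calculation, and the "about 4 bases" per three-nucleotide step can be reproduced by simulation. The sketch below assumes a uniformly random template, idealized extension without mis-incorporation, and alternation between two nucleotide sets (one lacking dTTP, one lacking dGTP). The function name, the random seed, and the template length are arbitrary choices made for illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mean_step_length(template, nucleotide_sets, n_steps):
    """Average number of bases added per step when cycling through the sets."""
    pos = 0
    lengths = []
    for i in range(n_steps):
        allowed = nucleotide_sets[i % len(nucleotide_sets)]
        start = pos
        while pos < len(template) and COMPLEMENT[template[pos]] in allowed:
            pos += 1
        if pos >= len(template):
            break  # ran off the template; discard the truncated step
        lengths.append(pos - start)
    return sum(lengths) / len(lengths)


# Arithmetic from the quoted figures.
single_rate = 96 / 154   # bases per single-nucleotide step
three_rate = 192 / 48    # bases per three-nucleotide step
speedup = three_rate / single_rate

# Simulation on a random template (assumption: all four bases equally likely).
rng = random.Random(0)
random_template = "".join(rng.choice("ACGT") for _ in range(200_000))
sets = [{"A", "C", "G"}, {"A", "C", "T"}]
simulated = mean_step_length(random_template, sets, 20_000)

print(f"three-nucleotide rate: {three_rate:.2f} bases/step")
print(f"single-nucleotide rate: {single_rate:.2f} bases/step")
print(f"speed-up: {speedup:.2f}x")
print(f"simulated mean step length: {simulated:.2f}")
```

Under these assumptions each step after the first incorporates the previously withheld nucleotide at the stop position and then, on average, three further bases before reaching the next stop, which is consistent with an average of about four bases per step.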
- Optimizing conditions for controlled extension is important for many embodiments where it is desirable to minimize dephasing or prephasing.
- DNA polymerases such as Bst DNA polymerase and Klenow DNA polymerase, both of which are suitable for controlled extension, may incorporate wrong bases, particularly if the correct nucleotide is absent. Mis-incorporation tends to happen more slowly than correct incorporation for some enzymes. Therefore, it may be desirable to complete the extension quickly, for example, within 30 sec, 1 min., 2 min. or 5 min. of incorporation time. On the other hand, too short an extension time may cause incomplete incorporation because of the lack of sufficient incorporation time. Many DNA polymerases, however, have very fast incorporation times.
- Nucleotide concentration is another important consideration for controlled extensions. Higher concentrations of nucleotides tend to cause mis-incorporation, while lower concentrations tend to cause incomplete incorporation. In some embodiments, the nucleotide concentration is between 1-100 μM, 2-60 μM, 3-50 μM, 3-25 μM, 3-10 μM, or 5-8 μM.
- the optimal nucleic acid concentrations vary. The optimal nucleotide concentration may be obtained by performing extensions using different nucleotide concentrations and measuring mis-incorporation and/or incomplete extension products versus correct extension products. Various extension products can be detected by gel electrophoresis, HPLC analyses or sequencing. The optimal nucleotide concentration may be dependent upon other conditions for controlled extension.
- DNA polymerases are suitable for controlled extensions in at least some embodiments. Suitable DNA polymerases include Klenow fragment, Bst, and other DNA polymerases known in the art. Bst DNA polymerase is particularly suitable for controlled extensions when there are no reversible terminator nucleotides in the nucleotide mix. If a reversible terminator is included, a modified polymerase may be used to increase the efficiency of incorporation.
- Controlled extension can be performed in a variety of temperature settings. Typically, the polymerase used has a preferred or optimal reaction temperature or temperature range. The GC content of the target nucleic acids may be a consideration for selecting an extension temperature.
- the controlled extension can be performed, for example, at room temperature, about 20° C., about 37° C., about 65° C. or about 70-75° C.
- the reaction buffer can be selected based upon the polymerase used.
- a pyro-phosphatase/inorganic phosphatase can be included to remove extension byproducts.
- the buffer contains apyrase to digest nucleotides so that the polymerase is only exposed to nucleotides in a short period of time.
- the apyrase concentration can be adjusted to affect the nucleotide concentration curve during the incorporation period.
- a single strand DNA binding protein (SSB) is used in extension reactions to reduce the effect of secondary structures.
- Other additives such as GC Melt, betaine and formamide can be added at appropriate amounts.
- a buffer containing a polymerase such as the Bst DNA polymerase can be used to incubate the hybridized extension primer/template (target nucleic acid) complex so that the enzyme has sufficient time to bind with the complex.
- the incubation time can be optimized by measuring extension results. Typically, the extension time is between 30 sec and 10 min.
- additional polymerase can be added at each step or in some steps to improve overall efficiency of multi-step extensions.
- polymerase is not added at extension steps, particularly in pulse mode, where the polymerase remains in the buffer because there are no washing steps.
- one to three types of nucleotides are mixed with a reversible terminator nucleotide (such as dGTP) and can be used to control the extension.
- Many reversible terminator nucleotides are suitable for this method and are discussed in, e.g., Wu et al. (2007), 3′-O-modified nucleotides as reversible terminators for pyrosequencing, PNAS vol. 104 no. 42 16462-16467; and Bentley et al.
- nucleotides that have 3′ phosphates are used as reversible terminators. Treatment with alkaline phosphatase can effectively remove the 3′ phosphate and reverse the chain termination.
- the extension stops at the first base in the template that is complementary to the reversible terminator in the solution (such as a C base in the template and a G base in the reversible terminator). There is generally no particular preference for which base is used as the reversible terminator base except when the target template's base composition is known and is biased towards the use of certain bases.
- it may be preferred to use C or G as the reversible terminator if the goal is to maximize extension length for every step.
- the mixture may contain more than two or three reversible terminators with one or two non-terminator nucleotides.
- the unincorporated nucleotides are washed away and the chain termination is reversed by removing the terminating group in the reversible terminator base.
- the use of reversible terminators in traditional reversible terminator sequencing causes inefficient polymerization and may result in progressive decline in sequencing quality, and further, limit the read length.
- Using reversible terminators in an extension mixture to extend an extension primer will cause less incorporation inefficiency because they are incorporated, on average, once every four or five bases in random sequences instead of at every step as in traditional reversible terminator sequencing. Therefore, a mixture of three non-terminator nucleotides with one reversible terminator can extend a sequencing primer efficiently even when reversible terminators are used.
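Extension with three non-terminator nucleotides plus one reversible terminator differs from extension with a withheld nucleotide in one respect: the terminator is itself incorporated, so each step ends just after the matching template base rather than just before it. The following minimal sketch illustrates this under idealized assumptions (perfect incorporation, complete deblocking between steps, template written in the order the polymerase traverses it). The function name and toy template are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_with_terminator(template, start, terminator_base):
    """Extend until the reversible terminator is incorporated.

    Returns the position of the next unread template base after the step.
    """
    pos = start
    while pos < len(template):
        incoming = COMPLEMENT[template[pos]]
        pos += 1  # the incoming nucleotide is incorporated
        if incoming == terminator_base:
            break  # blocked 3' end halts extension until the block is removed
    return pos


toy_template = "GATCCAGT"
positions = []
pos = 0
for _ in range(3):  # three extend / wash / deblock cycles
    pos = extend_with_terminator(toy_template, pos, "G")
    positions.append(pos)
print(positions)
```

With a G terminator, each step ends after the first template C it encounters, so in a random sequence a terminator would be incorporated about once every four bases.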
- the reversible terminators can be optionally labeled. In such cases, the incorporation can be monitored. In some embodiments, the extension reactions can be monitored by, for example, measuring polymerization byproducts such as pyrophosphate or phosphate or pH changes.
- the extended primers can then be used as sequencing primers to determine the sequence of the template.
- a primer extension product can be extended in the presence of labeled nucleotides to generate a sequence read for the template.
- Sequencing can be performed using, for example, reversible terminator sequencing, ligation based sequencing, pyrophosphate detection based sequencing, proton detection based sequencing, or any suitable sequencing reaction known in the art.
- sequencing a target nucleic acid comprises incremental base extension, compiling data generated from detecting the presence of bases present in each incrementally extended sequence, and determining the sequence of the target nucleic acid through analyzing the collected data.
- a plurality of primer extension products of varying lengths are generated or produced for a target nucleic acid sequence serving as a template.
- the plurality of primer extension products can be used to produce a variety of sequence reads.
- the sequence of the target polynucleotide molecule can be obtained by assembling the variety of sequence reads. The assembly may comprise stitching together overlapping sequence information, for example, originating from a specific target sequence.
- the origin of target sequences may be determined, among other methods, by location, by specific target or barcode sequences or any other suitable method known in the art.
- a barcode specific oligonucleotide can be either used as a seed/extension primer or ligated to a seed/extension primer. The products of the ligation can then be used to prime a sequencing reaction or primer extension reaction.
- the method comprises sequencing one or more bases of a target nucleic acid by using a first sequencing primer hybridized to a target nucleic acid.
- sequencing can be performed using sequencing by synthesis, for example, step-wise reversible terminator sequencing, incorporating labeled nucleotides, pyrophosphate detection based sequencing, ion detection based sequencing, or alternatively, step-wise ligations, or other methods, thereby obtaining a first sequence read.
- the first primer and any extension from the primer from the first sequencing can then be released from the target nucleic acid, for example, by denaturing the target nucleic acid via heating the target nucleic acid, contacting the target nucleic acid with sodium hydroxide solution, urea solution, formamide solution, or any other suitable denaturation solution known in the art.
- the target nucleic acid is then hybridized to a second sequencing primer, which can be the same as the first sequencing primer.
- a primer extension product is generated by extending the second sequencing primer, such as through controlled limited extension to produce an elongated primer
- the elongated sequencing primer can be used to sequence one or more bases of the target nucleic acid by using one of many sequencing methods such as step-wise reversible terminator sequencing from the elongated primer, incorporating labeled nucleotides, pyrophosphate detection based sequencing, ion detection based sequencing, step-wise ligations, or other methods, thereby obtaining a second sequence read.
- the steps of releasing the primer extension product, hybridizing a sequencing primer, extending the sequencing primer to produce an elongated primer, and extending the elongated primer product to obtain a sequence read can be repeated many times.
- controlled extension means extension of a nucleic acid sequence by a specific length.
- the specific length can be known or unknown.
- the extension length can be dependent upon the sequence of the template. Because the template sequence may or may not be known before it is sequenced, the specific extension length may not be known until the template is sequenced or the length is otherwise determined. Nevertheless, the length of extension is generally not random, rather it may be determined by the template sequence.
- a majority of the primer extension molecules (e.g., at least 55%, 70%, 85%, 90%, 95%, 99%, 99.9%, 99.99%, or 99.999%) hybridized to target nucleic acids in the cluster are extended to the same length in a single step of extension.
- Some dephasing or prephasing may occur. Over multiple steps of extension, some dephasing or prephasing in an early step may be overcome by one or more late extension steps.
- Each primer extension may include one or more cycles of extension and may extend the sequencing primer by a varying number of bases.
- the plurality of sequence reads can be assembled, such as through overlapping sequence reads, to generate the sequence of the target nucleic acid.
- For example, using the same initial oligonucleotide as the seed sequencing primer, if the second primer extension product is shorter than the first sequence read (first primer extension), there will be an overlapping sequence between the first sequence read and the second sequence read. If the second primer extension product is longer than the first sequence read, there can be a gap between the first sequence read and the second sequence read. However, additional sequence reads can be obtained with subsequent extension product removal(s) and one or more new rounds of primer extension. Fewer extension steps may be used to obtain more overlapping sequence results between successive sequencing rounds for more templates. Alternatively, more extension steps can be used to obtain more non-overlapping sequences.
- the lengths of the first sequence read and subsequent reads depend on the sequencing technology used, which can generate different lengths for a given accuracy.
- the sequence read is between 25 and 100 bp, 200 bp, 500 bp, 1 kb, or up to 2 kb.
- the order of sequencing may not be significant. For example, long sequences can be obtained by performing extension followed by sequencing first, and then sequencing from the primer without extension.
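The overlap/gap relationship described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: positions are counted in template bases downstream of the original primer end, and the function name `compare_reads` and its parameters are hypothetical, not part of the described method.

```python
def compare_reads(first_read_len, extension_len, second_read_len):
    """Relate read 1 (from the unextended primer) to read 2 (after controlled extension).

    Read 1 is assumed to cover template positions [0, first_read_len).
    Read 2 is assumed to cover [extension_len, extension_len + second_read_len).
    """
    first_end = first_read_len
    second_start = extension_len
    second_end = extension_len + second_read_len

    if second_start < first_end:
        relation = "overlap"
        size = min(first_end, second_end) - second_start
    elif second_start > first_end:
        relation = "gap"
        size = second_start - first_end
    else:
        relation = "adjacent"
        size = 0

    # Furthest template position reached by either read.
    covered_to = max(first_end, second_end)
    return {"relation": relation, "size": size, "covered_to": covered_to}


if __name__ == "__main__":
    # Extension shorter than the first read: the reads overlap.
    print(compare_reads(100, 80, 100))
    # Extension longer than the first read: a gap remains.
    print(compare_reads(100, 120, 100))
```

A gap reported by such a check would suggest an additional round with a shorter controlled extension, consistent with the additional reads mentioned above.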
- a large number of nucleic acid targets are simultaneously sequenced.
- the target nucleic acids are typically immobilized on a substrate.
- At least some target nucleic acids can be spatially separated by forming single molecule clusters that are at least partially non-overlapping.
- Methods for sequencing a large number of single molecule clusters are well known in the art, and kits, instruments and instructions for performing such sequencing have been commercially available from, e.g., Illumina, Inc. (San Diego, Calif.) and Life Technologies, Inc. (Foster City, Calif.). Further, sequencing services are available from Complete Genomics, Inc. (Mountain View, Calif.) and Centrillion Biosciences, Inc. (Mountain View, Calif.).
- the extension distance of one or more steps of controlled extensions is estimated by calculating the difference (Pe − Ps) between the extension start position (Ps) and the extension end position (Pe). If the target nucleic acid sequence is known, for each extension step, the stop position can be found by, for example, finding the positions of a target nucleic acid base that is complementary with the missing base in the extension step. The stop position is one base before the first complementary base position. For example, an extension with a nucleotide combination of A, C, and G is used to extend a primer over a template sequence of TTGCATTG.
- the stop position is base 4 (“C”) because the template base A at position 5 is complementary to the missing base “T.” If a reversible terminator nucleotide is used in the extension step with three other nucleotides (e.g., A, C, G and terminator T), the stop position should be the first complementary base position (position 5 or first “A”).
- the start position of a single extension step in a series can be the start position of the series if it is the first extension step.
- the start position of a single extension step can also be the next complementary target nucleotide to a missing base or one base after the next complementary target nucleotide to a reversible terminator.
- the total extension distance can be calculated by aggregating the extension distance of each step.
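For a known template, the start and stop positions described above can be located programmatically. The following is a minimal sketch, assuming 1-based positions counted along the template in the direction of extension; the function names `stop_position` and `extension_steps` are hypothetical and introduced only for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def stop_position(template, start, missing_base, terminator=False):
    """Return the 1-based stop position (Pe) of one controlled extension step.

    template: template bases listed in the direction of extension.
    start: 1-based position (Ps) of the first template base to be copied.
    missing_base: the nucleotide left out of the set (or supplied as a
        reversible terminator when terminator=True).
    """
    blocking = COMPLEMENT[missing_base]  # template base that halts extension
    for pos in range(start, len(template) + 1):
        if template[pos - 1] == blocking:
            # Without a terminator the step ends one base before the
            # complementary base; with a terminator it ends on that base.
            return pos if terminator else pos - 1
    return len(template)  # no blocking base found: extension runs to the end


def extension_steps(template, missing_bases, terminator=False):
    """Return (Ps, Pe) for each step in a series of controlled extensions."""
    steps = []
    start = 1
    for missing in missing_bases:
        if start > len(template):
            break
        stop = stop_position(template, start, missing, terminator)
        if stop >= start:  # skip steps that add no base
            steps.append((start, stop))
            start = stop + 1
    return steps


if __name__ == "__main__":
    # The example from the text: A, C and G supplied, T missing.
    print(stop_position("TTGCATTG", 1, "T"))
    print(stop_position("TTGCATTG", 1, "T", terminator=True))
    for ps, pe in extension_steps("TTGCATTG", ["T", "G"]):
        print(ps, pe, pe - ps)  # per-step difference Pe - Ps
```

For the template TTGCATTG with T missing, this sketch gives a stop position of 4, and 5 when T is supplied as a reversible terminator, matching the example in the text.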
- the extension distance can be calculated, for example, as described. However, if the target nucleic acid sequence is unknown, the extension distance can still be estimated by, for example, using simulated random sequences.
- each three-nucleotide extension step extends, on average, about 4 bases. If a reversible terminator is used, the average extension distance of a single extension step, after the first extension step, is about 5 bases per step.
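The estimate for three-nucleotide steps without a terminator can be checked with simulated random sequences, as the text suggests. The sketch below assumes uniformly random templates and a missing base that cycles through T, G, C, A from step to step; both choices, and the function name `mean_step_length`, are assumptions made for illustration. The reversible terminator case is not simulated here.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mean_step_length(n_templates=500, template_len=400,
                     missing_cycle="TGCA", seed=0):
    """Estimate the mean bases added per three-nucleotide extension step.

    Consecutive entries of missing_cycle (including the wrap-around) must
    differ, so that every step after the first adds at least one base.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_templates):
        template = [rng.choice("ACGT") for _ in range(template_len)]
        pos = 0   # index of the next template base to copy
        step = 0
        while True:
            blocking = COMPLEMENT[missing_cycle[step % len(missing_cycle)]]
            start = pos
            while pos < template_len and template[pos] != blocking:
                pos += 1
            if pos == template_len:
                break  # step ran off the template end; discard it
            if step > 0:  # the first step starts at an arbitrary base; skip it
                lengths.append(pos - start)
            step += 1
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    print(round(mean_step_length(), 2))
```

Under these assumptions each later step copies the base that blocked the previous step plus, on average, three further bases, so the simulated mean should come out close to 4.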
- each extension is performed in about 20 seconds
- a 1,000 base extension takes on average 250 steps or 1.4 hours.
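The figures above follow from simple arithmetic, shown here as a short sketch. The defaults of 4 bases per step and 20 seconds per step are taken from the text; the function name `extension_time` is hypothetical.

```python
import math


def extension_time(total_bases, bases_per_step=4, seconds_per_step=20):
    """Return (number of steps, total hours) for a controlled extension."""
    steps = math.ceil(total_bases / bases_per_step)
    hours = steps * seconds_per_step / 3600
    return steps, hours


if __name__ == "__main__":
    steps, hours = extension_time(1000)
    print(steps, round(hours, 1))  # 250 steps, about 1.4 hours
```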
- the extension time is less than one hour. If a reversible terminator is used, the single step extension time may be longer to allow time for deblocking and other optional steps.
- controlled extensions are performed in suitable reaction vessels, such as a test tube, a well in a microtiter plate, or a flow cell. While controlled extensions and sequencing can be performed manually, it is more convenient and may be more consistent if some steps are performed with automated equipment.
- controlled extensions are performed using a computer controlled instrument.
- nucleotide sets are delivered to the reaction site, such as a lane in a flow cell or a flow chamber of a chip, using a computer controlled pump or an automated pipette.
- Computer controlled pumps are available from many commercial sources and in many formats and specifications. Syringe pumps and peristaltic pumps are particularly suitable for delivering small volumes of reagents in a very short time.
- Computer software that controls the operation of the pumps can be coded using any suitable language known in the art, such as C/C++, Objective-C, C#, Java, or a variety of scripting languages.
- while each reagent, such as a washing solution or a nucleotide set, can be delivered using its own pump, it is often desirable to use a pump in combination with one or more valves.
- a computer controlled valve can make the system more versatile.
- liquid reagents can be manipulated via pressurized containers creating back pressure onto reagents, rather than using pumps.
- sequencers such as the HiSeq 2000, HiScan sequencers, MiSeq sequencers and Ion Torrent PGM sequencers include computer controlled reagent delivery systems. These systems may be reprogrammed to perform the sequencing methods in some embodiments.
- liquid handling equipment such as the cBot cluster station and MiSeq from Illumina, Inc. and a variety of liquid handling robots, such as the Tecan Freedom Evo and Beckman Coulter's Biomek series liquid handling robots, can be reprogrammed (using scripts) to perform controlled extensions.
- Reagents may be packaged as kits to facilitate automation.
- the controlled extensions can be performed in line in a sequencer with suitable reagent delivery capability.
- a flow cell is sequenced, stripped, extended, and sequenced in a sequencer with the cluster alignment maintained so that the resulting sequence data can be correlated with the correct clusters. Maintaining alignment can be important because a large number of clusters can easily be sequenced simultaneously. Maintaining alignment, however, does not necessarily mean that the flow cell cannot be moved.
- in cluster generation methods such as the Ion Torrent beads-on-chip format, aligning different reads to the same cluster/bead is straightforward since each bead has its own coordinate in a chip.
- in the HiSeq or MiSeq sequencers, each identified cluster has coordinates and can be located as long as alignment has not changed significantly.
- clusters from different sequencing runs may still be correlated by comparing coordinates between two different runs and using overlapping sequences, as well as alignment to reference sequences. If a consistent pattern of pixel shift is uncovered, a large percentage of clusters in different sequencing runs can still be correlated.
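One way to picture correlating clusters under a consistent pixel shift is sketched below. It assumes the shift is a pure translation that is small relative to the spacing between clusters, and it uses only coordinates (not overlapping sequences or reference alignment). The function names and the tolerance parameter are hypothetical; this is not the procedure of any particular sequencer.

```python
import math
from statistics import median


def estimate_shift(run_a, run_b):
    """Estimate a uniform (dx, dy) shift from run_a to run_b.

    Each cluster in run_a is paired with its nearest neighbour in run_b and
    the median offset is taken, which tolerates a minority of wrong pairings.
    """
    dxs, dys = [], []
    for p in run_a:
        q = min(run_b, key=lambda c: math.dist(p, c))
        dxs.append(q[0] - p[0])
        dys.append(q[1] - p[1])
    return median(dxs), median(dys)


def correlate_clusters(run_a, run_b, tolerance=1.0):
    """Map indices of clusters in run_a to indices in run_b after shifting."""
    dx, dy = estimate_shift(run_a, run_b)
    pairs = {}
    for i, (x, y) in enumerate(run_a):
        shifted = (x + dx, y + dy)
        j = min(range(len(run_b)), key=lambda k: math.dist(shifted, run_b[k]))
        if math.dist(shifted, run_b[j]) <= tolerance:
            pairs[i] = j
    return (dx, dy), pairs


if __name__ == "__main__":
    first_run = [(10, 10), (50, 20), (30, 70), (90, 40)]
    # Same clusters, shifted by (2, -1) and listed in a different order.
    second_run = [(52, 19), (12, 9), (92, 39), (32, 69)]
    print(correlate_clusters(first_run, second_run))
```

The brute-force nearest-neighbour search is quadratic in the number of clusters; a spatial index would be needed at realistic cluster counts.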
- Sequencing by extending a sequencing primer or by extending an extension product can be carried out using a variety of methods.
- sequencing can be carried out with a labeled reversible terminator or by ligation with a labeled oligonucleotide.
- Sequencing can be performed using any commercially available method, such as a reversible terminator based sequencing method that is commercially available from companies such as Illumina, Inc. (San Diego, Calif.), Helicos, Inc. (Boston, Mass.), and Azco Biotech, Inc. (San Diego, Calif.).
- Sequencing can be accomplished through classic Sanger sequencing methods, which are well known in the art.
- a long target nucleic acid e.g. at least 1,000, 2,000, 10,000, 50,000 bases in length
- the sequence readout can be carried out using Sanger sequencing which can read about 500-1200 bases per reaction.
- the controlled extension is carried out in a series of extension reactions.
- a 1,800 base long DNA fragment can be sequenced by one Sanger sequence read of 1,000 bases and another Sanger sequence read of 1,000 bases after a controlled extension of about 800 bases.
- the controlled extension takes about 2-5 hours.
- cleavable nucleotides are used during the controlled extension.
- the controlled extension product can be removed from the Sanger sequencing product so that the controlled extension product does not add bases to the Sanger fragment.
- the Sanger readout can be performed using standard Sanger sequencing gels or capillary sequencers.
- the cleavable nucleotide can be a dUTP.
- the uracil from the base U can be released using Uracil-DNA glycosylase (UDG).
- the resulting apurinic/apyrimidinic (AP) site can be cleaved using, e.g., AP lyase, which can break a DNA fragment.
- other suitable cleavable base systems known in the art can also be used.
- Sequencing can also be accomplished using high-throughput systems some of which allow detection of a sequenced nucleotide immediately after or upon its incorporation into a growing strand, i.e., detection of sequence in real time or substantially real time.
- high throughput sequencing generates at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000 or at least 500,000 sequence reads per hour, with each read being at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120 or at least 150 bases.
- high-throughput sequencing involves monitoring pH changes during polymerization.
- high-throughput sequencing involves the use of technology available by Helicos BioSciences Corporation (Cambridge, Mass.) such as the Single Molecule Sequencing by Synthesis (SMSS) method. SMSS is described in part in US Publication Application Nos. 20060024711; 20060024678; 20060012793; 20060012784; and 20050100932.
- high-throughput sequencing involves the use of technology available from 454 Lifesciences, Inc. (Branford, Conn.). Methods for using bead amplification followed by fiber optics detection are described in Margulies, M., et al. “Genome sequencing in microfabricated high-density picolitre reactors”. Nature, doi: 10.1038/nature03959; as well as in US Publication Application Nos. 20020012930; 20030058629; 20030100102; 20030148344; 20040248161; 20050079510; 20050124022; and 20060078909.
- high-throughput sequencing is performed using Clonal Single Molecule Array (Solexa, Inc./Illumina, Inc.) or sequencing-by-synthesis (SBS) utilizing reversible terminator chemistry.
- AnyDot.chips (Genovoxx, Germany)
- AnyDot.chips allow for 10×-50× enhancement of nucleotide fluorescence signal detection.
- AnyDot.chips and methods for using them are described in part in International Publication Application Nos. WO02/088382, WO03020968, WO03/031947, WO2005/044836, PCT/EP05/105657, PCT/EP05/105655; and German Patent Application Nos.
- Sequence can then be deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions.
- a polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site.
- a plurality of labeled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence.
- the growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site.
- the nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified.
- the steps of providing labeled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
- sequencing can be conducted with labeled nucleotides such as dNTPs with labels.
- Bases may be detected by extending the incremental fragments via contacting the hybridization complexes sequentially with one of labeled dATP, dCTP, dGTP and dTTP, in the presence of a polymerase, and detecting the incorporation of the labeled dATP, dCTP, dGTP and dTTP to obtain a sequence read from each reaction.
- a mixture of labeled dATP, dCTP, dGTP and dTTP is used.
- modified dNTPs such as labeled dNTPs
- only the first few bases are extended to generate strong signal.
- the possibility of “run-on” extension is rather low and the signal generated by such “run-on” extension can be filtered out as noise using methods provided herein or known in the art.
- a mixture of labeled ddATP, ddCTP, ddGTP and ddTTP is used, and no “run-on” extension is permitted.
- only one round of interrogation that covers all four possible bases is carried out for each incremental fragment.
- sequential addition with one labeled dNTP in each round of interrogation provides possible addition of one detectable base at a time (i.e. on each substrate). This generally results in short read (such as one base or a few bases) that could be assembled for each round. In another embodiment, a longer read is generated with more than one round of interrogation.
- a mixture of labeled ddATP, ddCTP, ddGTP, ddTTP and a small amount (<10% (e.g., 5, 6, 7, 8, or 9%) or <20% (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19%)) of native dATP, dCTP, dGTP, and dTTP is added.
- the labeled nucleotides are reversible terminators. Multiple bases can be detected by the signal strength or in the case of reversible terminator, base addition detection.
- Nucleotide reversible terminators are nucleotide analogues that are modified with a reversible chemical moiety capping the 3′-OH group to temporarily terminate the polymerase reaction. In this way, generally only one nucleotide is incorporated into the growing DNA strand even in homopolymeric regions.
- the 3′ end can be capped with an amino-2-hydroxypropyl group.
- An allyl or a 2-nitrobenzyl group can also be used as the reversible moiety to cap the 3′-OH of the four nucleotides.
- Examples of reversible terminators include but are not limited to 3′-O-modified nucleotides such as 3′-O-allyl-dNTPs and 3′-O-(2-nitrobenzyl)-dNTPs.
- the 3′-OH of the primer extension products is regenerated through different deprotection methods.
- the capping moiety on the 3′-OH of the DNA extension product can be efficiently removed after detection of a cleavage site by a chemical method, enzymatic reaction or photolysis, i.e. the cap will be cleaved from the cleavage site.
- templates containing homopolymeric regions are immobilized on Sepharose beads, and then extension-signal detection-deprotection cycles are conducted by using the nucleotide reversible terminators on the DNA beads to unambiguously decipher the sequence of DNA templates. In one embodiment, this reversible-terminator-sequencing approach is used in the subject methods to accurately determine DNA sequences.
- the cap may be referred to herein as a “protective group”.
- Polynucleotides of the invention can be labeled.
- a molecule or compound has at least one detectable label (e.g., isotope or chemical compound) attached to enable the detection of the compound.
- labels of use in the present invention include without limitation isotopic labels, which may be radioactive or heavy isotopes, magnetic labels, electrical labels, thermal labels, colored and luminescent dyes, enzymes and magnetic particles as well. Labels can also include metal nanoparticles, such as a heavy element or large atomic number element, which provide high contrast in electron microscopy. Dyes of use in the invention may be chromophores, phosphors or fluorescent dyes, which due to their strong signals provide a good signal-to-noise ratio for decoding.
- labels may include the use of fluorescent labels.
- Suitable dyes for use in the present invention include, but are not limited to, fluorescent lanthanide complexes, including those of Europium and Terbium, fluorescein, rhodamine, tetramethylrhodamine, eosin, erythrosin, coumarin, methyl-coumarins, pyrene, Malachite green, stilbene, Lucifer Yellow, Cascade Blue, Texas Red, and others described in the 11th Edition of the Molecular Probes Handbook by Richard P. Haugland, hereby expressly incorporated by reference in its entirety.
- fluorescent nucleotide analogues readily incorporated into the labeling oligonucleotides include, for example, Cy3-dCTP, Cy3-dUTP, Cy5-dCTP, Cy5-dUTP (GE Healthcare), fluorescein-12-dUTP, tetramethylrhodamine-6-dUTP.
- fluorophores available for post-synthetic attachment include, inter alia, Alexa Fluor® 350, Alexa Fluor® 532, Alexa Fluor® 546, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor® 647, BODIPY 493/503, BODIPY FL, BODIPY R6G, BODIPY 530/550, BODIPY TMR, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665, Cascade Blue, Cascade Yellow, Dansyl, lissamine rhodamine B, Marina Blue, Oregon Green 488, Oregon Green 514, Pacific Blue, rhodamine 6G, rhodamine green, rhodamine red, tetramethylrhodamine, Texas Red (available from Invitrog
- multiplex detection formats are used for base detection or sequencing.
- multiplex formats include, but are not limited to, either labeled/tagged bead sets (e.g., those produced by Luminex), in which each label is assigned to the individual probe-specific primer, or oligonucleotide arrays on slides, in which specific oligonucleotide spot/position is assigned to the individual probe-specific primer.
- the limited sequence complexity of the recovered target-specific probes can provide conditions for easier and higher level multiplexing, especially when used with universal and Zip-code/ID sequence tags.
- the primers can be extended by a nucleotide polymerase.
- the polymerase is selected from an RNA polymerase and a reverse transcriptase.
- the detection phase of the process may involve scanning and identifying target polynucleotide sequences in the test sample.
- Scanning can be carried out by scanning probe microscopy (SPM) including scanning tunneling microscopy (STM) and atomic force microscopy (AFM), scanning electron microscopy, confocal microscopy, charge-coupled device, infrared microscopy, electrical conductance, transmission electron microscopy (TEM), and fluorescent or phosphor imaging, for example fluorescence resonance energy transfer (FRET).
- Optical interrogation/detection techniques include but are not limited to near-field scanning optical microscopy (NSOM), confocal microscopy and evanescent wave excitation.
- More specific versions of these techniques include far-field confocal microscopy, two-photon microscopy, wide-field epi-illumination, and total internal reflection (TIR) microscopy. Many of the above techniques can also be used in a spectroscopic mode.
- the actual detection means include charge coupled device (CCD) cameras and intensified CCDs, photodiodes and photomultiplier tubes. These methods and techniques are well-known in the art. Various detection methods are disclosed in U.S. Patent Application Publication No. US 2004/0248144, which is herein incorporated by reference.
- signals of different wavelengths can be obtained by multiple acquisitions or by simultaneous acquisition by splitting the signal, using RGB detectors or analyzing the whole spectrum (Richard Levenson, Cambridge Healthtech Institutes, Fifth Annual meeting on Advances in Assays, Molecular Labels, Signaling and Detection, May 17-18th, Washington D.C.).
- Several spectral lines can be acquired by the use of a filter wheel or a monochromator.
- Electronic tunable filters such as acousto-optic tunable filters or liquid crystal tunable filters can be used to obtain multispectral imaging (e.g. Oleg Hait, Sergey Smirnov and Chieu D. Tran, 2001, Analytical Chemistry 73: 732-739).
- An alternative method to obtain a spectrum is hyperspectral imaging (Schultz et al., 2001, Cytometry 43:239-247).
- Phred software is used for DNA sequence analysis. Phred reads DNA sequencer trace data, calls bases, assigns quality values to the bases, and writes the base calls and quality values to output files. Phred is a widely used program for base calling DNA sequencing trace files. Phred can read trace data from SCF files and ABI model 373 and 377 DNA sequencer chromat files, automatically detecting the file format. After calling bases, Phred writes the sequences to files in either FASTA format or the format suitable for XBAP.
- Quality values for the bases are written to FASTA format files or PHD files, which can be used by the phrap sequence assembly program in order to increase the accuracy of the assembled sequence.
- the Phred quality values have been thoroughly tested for both accuracy and power to discriminate between correct and incorrect base-calls. Phred can use the quality values to perform sequence trimming.
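As background for how such quality values can drive trimming, a Phred quality value Q is conventionally related to the estimated base-call error probability P by Q = -10 log10(P), so Q20 corresponds to a 1% error estimate. The sketch below converts quality values to error probabilities and trims low-quality ends with a deliberately simple threshold rule; this rule and the function names are illustrative assumptions and are not Phred's own trimming algorithm.

```python
def error_probability(q):
    """Convert a Phred quality value to an estimated error probability."""
    return 10 ** (-q / 10)


def trim_low_quality(bases, quals, min_q=20):
    """Drop leading and trailing bases whose quality is below min_q.

    A simple end-trimming rule used here for illustration only.
    """
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


if __name__ == "__main__":
    print(error_probability(20))  # about 0.01
    print(trim_low_quality("ACGTACGT", [5, 10, 30, 35, 40, 30, 12, 8]))
```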
- DNA polymerase based sequencing reactions generally possess efficiency problems.
- Native nucleotides can be incorporated at a relatively high efficiency, compared to reduced efficiency incorporation of non-native nucleotides, such as labeled nucleotides or reversible terminators.
- the reduced incorporation efficiency accounts for increased error rates and hence decreased sequence information quality along growing strands.
- the resulting sequence information consists of relatively short sequence reads that have been terminated due to unacceptably low correct sequence signal.
- a seed primer can be extended using high incorporation efficiency nucleotides, such as native nucleotides. Accordingly, a large population of templates can be primed further and further downstream to start a sequencing reaction, for example n bases downstream as compared to another sequencing primer. The sequencing reaction at the start position would start with a high overall efficiency and continue s bases, until the quality of the sequencing information drops below an acceptable level. Due to the initial n bases, sequence information can be obtained down to n+s bases on the target template. Sequencing primers of different length can thus provide sequencing information that ends n bases apart. By varying the length n of high efficiency extension reactions prior to sequencing, overlapping sequence information of high quality can be obtained from a single template.
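The n + s relationship above can be turned into a small planning sketch: given a usable read length s and a desired minimum overlap between neighbouring reads, choose the high-efficiency extension lengths n so that the reads tile the template. The function name `plan_extensions`, the fixed read length, and the minimum-overlap parameter are assumptions for illustration; in practice the extension lengths depend on the template sequence.

```python
def plan_extensions(template_len, read_len, min_overlap):
    """Return extension lengths n such that reads [n, n + read_len) tile the template.

    Consecutive reads overlap by min_overlap bases, and the last read
    reaches (or passes) the end of the template.
    """
    if not 0 <= min_overlap < read_len:
        raise ValueError("min_overlap must be in [0, read_len)")
    stride = read_len - min_overlap
    offsets = []
    n = 0
    while True:
        offsets.append(n)
        if n + read_len >= template_len:
            break
        n += stride
    return offsets


if __name__ == "__main__":
    # 500-base template, 100-base reads, 20-base overlaps.
    print(plan_extensions(500, 100, 20))
```

With these example values the reads start 80 bases apart, so six sequencing rounds (n = 0 through n = 400) would reach base 500.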
- a set of sequencing primers are used that start sequencing reactions less than 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200 or more bases apart.
- sequence information for up to 500, 1000, 2000 or more bases is obtained. Methods described herein allow for obtaining sequence information for up to 500, 1000, 2000 or more bases in over 80, 90, 95, 98, 99, 99.5, 99.9%, or more of the templates.
- one detection cycle is performed by adding labeled A, C, G, T sequentially followed by washing and detecting after each addition.
- multiple detection cycles can be performed using nucleotides with removable labels.
- the series of incremental fragments are further extended (thus, serving as sequencing primer) for sequencing reactions to obtain the sequence information of the target molecules.
- the sequence information is a series of fragment sequences that are adjacent on the target molecule, which can be assembled to obtain a long fragment or the full length sequence of the target molecule.
- serial sequencing of a target polynucleotide is converted to parallel sequencing to reduce the time required for sequencing a given number of bases of the target polynucleotide.
- a nucleic acid target is attached to a substrate or immobilized on a substrate.
- the substrate can be a bead, flat substrate, flow cell or other suitable surfaces.
- the substrate comprises glass.
- a target nucleic acid is attached or immobilized to a substrate via a capture probe.
- a capture probe is an oligonucleotide that is attached to the surface of a substrate and is capable of binding to a sequencing template.
- Capture probes can be of various lengths, such as from 18 bases to 100 bases, such as 20 bases to 50 bases.
- the capture probe has a sequence that is complementary to the sequencing template.
- capture probes can be designed to be complementary to known sequences.
- the capture probes are complementary to a “barcode” or “identifier” sequence added to the sequencing templates via, e.g., specific ligation or as a part of the primer for a PCR reaction. In such reactions, a sequencing template-specific primer and a primer comprising a unique barcode are used for the amplification, thus all the target molecules with the same sequences have the same barcode attached.
- the capture probe can be attached to the substrate at either the 5′ end or the 3′ end.
- the capture probe is attached to the substrate at the 5′ end, and the 3′ end of the capture probe can be extended by the incorporation of nucleotides as described herein to generate incremental extension fragments which can in turn be sequenced by further incorporation of labeled nucleotides.
- the capture probe is attached to the substrate at the 3′ end, and the 5′ end of the capture probe cannot be extended by the incorporation of nucleotides.
- a second probe hybridizes to the sequencing template and its 3′ end is extended by the incorporation of nucleotides as described herein to generate an incremental extension fragment which can in turn be sequenced by further incorporation of labeled nucleotides. In this case, the extension is towards the direction of the capture probe.
- the sequencing primer hybridizes to a linker introduced to the end of the sequencing template when generated, either directly from a genomic DNA or from a parent target molecule.
- a seed/sequencing primer that is a “universal primer” can be used to sequence different target molecules.
- sequencing primers specific to the target molecule are used.
- the capture probe is immobilized on a solid support before binding to the sequencing template.
- the 5′ end of a capture probe is attached to a solid surface or substrate.
- a capture probe can be immobilized by various methods known in the art including, without limitation, covalent cross-linking to a surface (e.g., photochemically or chemically), non-covalent attachment to the surface through the interaction of an anchor ligand with a corresponding receptor protein (e.g. biotin-streptavidin or digoxigenin-anti-digoxigenin antibody), or through hybridization to an anchor nucleic acid or nucleic acid analog.
- the anchor nucleic acid or nucleic acid analog has sufficient complementarity to the sequencing template (i.e., the formed duplex has sufficiently high Tm) that the anchor-sequencing template-probe complex will survive stringent washing to remove unbound targets and probes, but does not overlap with the target site that is complementary to the probe antisense sequence.
- a capture template or target nucleic acid is used as a template for bridge amplification.
- two or more different immobilized probes are used.
- single molecule templates are used to generate clusters of nucleic acids on a substrate by bridge amplification.
- each of the clusters of nucleic acids contains substantially the same (>95%) type of nucleic acids because they are derived from a single template nucleic acid. These clusters are typically referred to as single molecule clusters.
- Such substrates with single molecule clusters can be produced using, for example, the method described in Bentley et al., Accurate whole human genome sequencing using reversible terminator chemistry, Nature 456, 53-59 (2008), incorporated herein by reference, or using commercially available kits and instruments from, for example, Illumina, Inc. (San Diego, Calif.).
- the solid substrate can be made of any material to which the molecules can be bound, either directly or indirectly.
- suitable solid substrates include flat glass, quartz, silicon wafers, mica, ceramics and organic polymers such as plastics, including polystyrene and polymethacrylate.
- the surface can be configured to act as an electrode or a thermally conductive substrate (which enhances the hybridization or discrimination process).
- micro and sub-micro electrodes can be formed on the surface of a suitable substrate using lithographic techniques. Smaller nanoelectrodes can be made by electron beam writing/lithography. Electrodes can also be made using conducting polymers, which can be patterned on a substrate by ink-jet printing devices or by soft lithography, or be applied homogeneously by wet chemistry.
- Electrodes can be provided at a density such that each immobilized molecule has its own electrode or at a higher density such that groups of molecules or elements are connected to an individual electrode. Alternatively, one electrode may be provided as a layer below the surface of the array which forms a single electrode.
- the solid substrate may optionally be interfaced with a permeation layer or a buffer layer. It is also possible to use semi-permeable membranes such as nitrocellulose or nylon membranes, which are widely available. The semi-permeable membranes can be mounted on a more robust solid surface such as glass.
- the surface layer may comprise a sol-gel.
- the surfaces may optionally be coated with a layer of metal, such as gold, platinum or other transition metal.
- a particular example of a suitable solid substrate is the commercially available SPR BIAcore™ chip (GE Healthcare). Heaton et al., 2001 (PNAS 98:3701-3704) have applied an electrostatic field to an SPR surface and used the electric field to control hybridization.
- the solid substrate is generally a material having a rigid or semi-rigid surface.
- at least one surface of the substrate is substantially flat, although in some embodiments it may be desirable to physically separate discrete elements with, for example, raised regions or etched trenches.
- the solid substrate may comprise nanovials, i.e., small cavities in a flat surface, e.g., 10 μm in diameter and 10 μm deep.
- Other formats include but are not limited to synthetic or natural beads, membranes or filters, slides including microarray slides, microtiter plates, microcapillaries, and microcentrifuge tubes.
- oligonucleotide capture probes are coated or attached onto beads for capturing the sequencing templates.
- Hybridization between capture probes and sequencing template polynucleotides can be carried out on beads in columns at a controlled temperature and salt concentration. The hybridization products can be eluted from the beads with moderate pressure.
- Loading of nucleic acids onto these substrates can be modulated and/or controlled by the flow and/or electrical forces, including diffusion forces and surface forces exerted by areas of differential charge and/or hydrophobicity.
- the number of nucleic acids applied to the substrate (i.e., with a loading buffer or other solution) can be adjusted to assure maximal occupancy of the linear features with non-overlapping nucleic acid molecules and thus minimize the number of empty linear features on the substrate.
- at least 50% of the linear features of a substrate are occupied by at least one nucleic acid molecule.
- at least 60%, 70%, 80%, 90%, or 95% of the linear features are occupied by one or more nucleic acids.
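- The relationship between the amount of nucleic acid loaded and the fraction of occupied features can be illustrated with a simple model. The sketch below assumes that molecules land on features independently and at random (a Poisson model), which the text does not state; the function names are hypothetical. It shows why raising the load to reduce empty features also lowers the share of features that hold exactly one molecule.

```python
import math


def occupied_fraction(mean_per_feature: float) -> float:
    """Expected fraction of features holding at least one molecule (Poisson assumption)."""
    return 1.0 - math.exp(-mean_per_feature)


def singly_occupied_fraction(mean_per_feature: float) -> float:
    """Expected fraction of features holding exactly one molecule (Poisson assumption)."""
    return mean_per_feature * math.exp(-mean_per_feature)


def loading_for_occupancy(target_fraction: float) -> float:
    """Mean molecules per feature needed to reach the target occupied fraction."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    return -math.log(1.0 - target_fraction)


# Occupancy levels named in the text: 50%, 60%, 70%, 80%, 90%, 95%.
for target in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
    lam = loading_for_occupancy(target)
    print(f"{target:.0%} occupied -> mean {lam:.2f} molecules/feature, "
          f"{singly_occupied_fraction(lam):.0%} singly occupied")
```

- Under this assumed model, the loading chosen in practice is a compromise between few empty features and few multiply occupied features; real substrates may deviate from it because of the flow, charge, and hydrophobicity effects described above.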
- the first approach is in situ oligonucleotide synthesis in which the probes are in known geographic locations in the X-Y coordinate plane.
- the oligonucleotide probe is synthesized on the surface. Examples of technologies that allow on-surface oligo synthesis include but are not limited to photolithography and ink jet.
- the pre-synthesized oligonucleotide probes are spotted onto the surface.
- Various microarray protocols, for example the protocol for Agilent inkjet-deposited pre-synthesized oligo arrays, are known to one skilled in the art.
- Polymers such as nucleic acids or polypeptides can be synthesized in situ using photolithography and other masking techniques whereby molecules are synthesized in a step-wise manner with incorporation of monomers at particular positions being controlled by masking techniques and photolabile reactants.
- U.S. Pat. No. 5,837,832 describes a method for producing DNA arrays immobilized to silicon substrates based on very large scale integration technology.
- U.S. Pat. No. 5,837,832 describes a strategy called “tiling” to synthesize specific sets of probes at spatially-defined locations on a substrate.
- U.S. Pat. No. 5,837,832 also provides references for earlier techniques that can also be used.
- Light directed synthesis can also be carried out by using a Digital Light Micromirror chip (Texas Instruments) as described (Singh-Gasson et al., (1999) Nature Biotechnology 17:974-978).
- conventional deprotecting groups such as dimethoxytrityl can be employed with light directed methods where, for example, a photoacid molecule bearing a chromophore capable of receiving UV radiation is generated in a spatially addressable way which selectively deprotects the DNA monomers (McGall et al., PNAS 1996 93: 13555-13560; Gao et al., J. Am. Chem. Soc. 1998 120: 12698-12699).
- Electrochemical generation of acid is another method that can be used in the subject methods of the present invention.
- the in situ arrays can have about 1 to 10, 10 to 100, 100 to 1000, or 1,000 to 100,000,000 probes.
- the in situ arrays can have more than 100,000,000 array probes. In one embodiment, the in situ array carries approximately 200,000,000 probes.
- Molecules that can be immobilized in the array include nucleic acids such as DNA and analogues and derivatives thereof, such as PNA. Nucleic acids can be obtained from any source, for example genomic DNA or cDNA or synthesized using known techniques such as step-wise synthesis. Nucleic acids can be single or double stranded. DNA nanostructures or other supramolecular structures can also be immobilized. Other molecules include but are not limited to compounds joined by amide linkages such as peptides, oligopeptides, polypeptides, proteins or complexes containing the same; defined chemical entities, such as organic molecules; conjugated polymers and carbohydrates or combinatorial libraries thereof.
- biotinylated beads are used to anchor the target sequence and the sequencing is carried out by performing the base incorporation in the bead system.
- a “chip” is a substrate for immobilizing or attaching a target.
- the geometric design of the chip can vary.
- the chip can be a tube with the usable surface inside. Chips can be in flow cell format to facilitate liquid handling.
- the chips are allele specific sequencing chips as disclosed in PCT/US2010/048526, which is incorporated herein by reference.
- the chip is a membrane multichip.
- a multilayered substrate with holes (e.g., 1 micron to 50 microns) is generated.
- Target molecules are loaded into the holes with some holes containing a single molecule target.
- Targets are amplified within holes.
- the layers are peeled off. Each layer has some molecules attached to the holes.
- the layers are substantially similar in terms of molecules (copies of each other). These layers can be directly used or transferred to a suitable sequencing substrate for sequencing.
- chips include but are not limited to photo cleavable oligo multichip, multilayer substrates with holes, and nanoprinting chip.
- An immobilized or attached target nucleic acid can then be hybridized with a primer (or multiple primers).
- Polymerase in its suitable buffer is then added to make contact with the immobilized or attached template or target nucleic acid.
- the primer can be used directly as a sequencing primer or can be used as a seed primer to generate primer extension products of various lengths. These primer extension products can further be used as sequencing primers in a sequencing reaction. Primer extension reactions are discussed in further detail elsewhere herein.
- a controlled extension reaction may be chosen to generate primer extension products.
- the buffer may contain a set of nucleotides (1-3 nucleotides of the four possible nucleotides) or the set of nucleotides can be added later to start the reaction.
- nucleotide degrading enzymes such as apyrase or alkaline phosphatase are added into the reaction buffer at the end of the reaction and/or in the washing solution to minimize contamination of the next round of extension with nucleotides from the previous extension.
- primer extension is performed using a pulse method, such as described herein.
- the immobilized template is contacted with a multi-enzyme buffer that contains a polymerase (such as Klenow exo(-) for DNA sequencing) and one or several nucleotide degrading enzymes such as apyrase or alkaline phosphatase.
- an inorganic pyrophosphatase is added to degrade pyrophosphate generated by polymerase reaction.
- Sets of nucleotides are successively added to the reaction buffer at intervals of 30-90 seconds (preferably 30 seconds). Nucleotides are utilized by the polymerase for the polymerase reaction and, at the same time, are degraded by apyrase or alkaline phosphatase.
- a large number of different target polynucleotides or their fragments can be immobilized on a substrate. Such a substrate is replicated many times to produce a set of the substrates.
- a plurality of target nucleic acids or templates are immobilized on substrates and each template cluster originates from a single molecule (see for example, Bentley et al., Nature 456, 53-59, (2008) and its supplement, incorporated herein by reference in its entirety). Because the locations of the template clusters are known, a first sequence from the first round of sequencing and a second sequence from a second round of sequencing for the same template can be readily determined.
- parallel sequencing is performed.
- in parallel sequencing, commonly referred to as next generation sequencing, millions or more templates (clusters) are sequenced simultaneously, often with a single primer.
- nucleotide addition is optimized to control primer extension length.
- a fixed sequence of nucleotide addition, such as step one: dATP, dCTP, dGTP; step two: dCTP, dGTP, dTTP; step three: dGTP, dTTP, dATP; step four: dTTP, dATP, dCTP; step five: dATP, dCTP, dGTP, and so forth, is used to control the length of the primer extension. Because template sequences vary, the resulting extended primer length varies.
- multiple targets such as 10,000, 100,000, 1 million, 10 million, or 100 million sequences or targets are sequenced simultaneously.
- there are a plurality of capture sites, with each capture site having different capture probes that recognize different targets (sequencing templates). If the targets are fragments of a longer sequence, contigs can be assembled to obtain the longer sequence, such as the whole genome sequence.
- multiple target sequencing is typically done in chip format, but it can be performed in bead format as well.
- the chip comprises random clusters started with single molecules (such as Illumina flow cells).
- the molecular clones of target molecules can be printed to many substrates to create replicate substrates for sequencing.
- the chips are duplicated by nylon membrane impression and printing or by other methods known in the art.
- the present invention provides a system for sequencing.
- one or more methods of sequencing disclosed herein are performed by a system, such as an automated sequencing system instrument controlled by a user (e.g., as schematically depicted in FIG. 7).
- the user controls a computer which may operate various instrumentation, liquid handling equipment or analysis steps of the invention.
- a computer controlled collection, handling, or analysis system is used to control, activate, initiate, continue or terminate any step or process of the methods as herein described.
- a computer device is used to control, activate, initiate, continue or terminate the handling and/or movement of fluids or reagents into and through the system or device as herein described, the handling or movement of one or more reagents to one or more chambers or plurality of chambers in one or more cartridges, the obtaining or analysis of data, etc.
- chips of the sequencing reaction are placed in one or more chambers/flow cells or plurality of chambers/flow cells in one or more cartridges. The chips may comprise substrates which provide sites for the sequencing reactions.
- the computer is any type of computer platform such as a workstation, a personal computer, a server, or any other present or future computer.
- the computer typically includes known components such as a processor, an operating system, system memory, memory storage devices, and input-output controllers, input-output devices, and display devices.
- display devices include devices that provide visual information; this information typically may be logically and/or physically organized as an array of pixels.
- a graphical user interface (GUI) controller is included that comprises any of a variety of known or future software programs for providing graphical input and output interfaces.
- GUIs provide one or more graphical representations to the user, and are enabled to process the user inputs via the GUIs using means of selection or input known to those of ordinary skill in the related art.
- each execution core may perform as an independent processor that enables parallel execution of multiple threads.
- the processor executes an operating system, which is, for example, a WINDOWS™ type operating system (such as WINDOWS™ XP) from the Microsoft Corporation; the Mac OS X operating system from Apple Computer Corp. (such as Mac OS X v10.4 “Tiger” or Mac OS X v10.5 “Leopard” operating systems); a UNIX™ or Linux-type operating system available from many vendors or what is referred to as an open source; or a combination thereof.
- the operating system interfaces with firmware and hardware in a well-known manner, and facilitates the processor in coordinating and executing the functions of various computer programs that may be written in a variety of programming languages.
- the operating system, typically in cooperation with the processor, coordinates and executes functions of the other components of the computer.
- the operating system also provides scheduling, input-output control, file and data management, memory management, and communication control and related services, all in accordance with known techniques.
- the system memory is of a variety of known or future memory storage devices. Examples include any commonly available random access memory (RAM), magnetic medium such as a resident hard disk or tape, an optical medium such as a read and write compact disc, or other memory storage device.
- Memory storage devices may be any of a variety of known or future devices, including a compact disk drive, a tape drive, a removable hard disk drive, USB or flash drive, or a diskette drive.
- Such types of memory storage devices typically read from, and/or write to, a program storage medium (not shown) such as, respectively, a compact disk, magnetic tape, removable hard disk, USB or flash drive, or floppy diskette.
- a computer program product comprises a computer usable medium having control logic (computer software program, including program code) stored therein.
- the control logic, when executed by a processor, causes the processor to perform functions described herein.
- some functions are implemented primarily in hardware using, for example, a hardware state machine. Implementation of the hardware state machine so as to perform the functions described herein will be apparent to those skilled in the relevant arts.
- input-output controllers include any of a variety of known devices for accepting and processing information from a user, whether a human or a machine, whether local or remote. Such devices include, for example, modem cards, wireless cards, network interface cards, sound cards, or other types of controllers for any of a variety of known input devices.
- Output controllers of input-output controllers could include controllers for any of a variety of known display devices for presenting information to a user, whether a human or a machine, whether local or remote.
- the functional elements of the computer communicate with each other via a system bus. Some of these communications may be accomplished in alternative embodiments using network or other types of remote communications.
- applications communicate with, and receive instruction or information from or control one or more elements or processes of one or more servers, one or more workstations, and/or one or more instruments.
- a server or computer with an implementation of applications stored thereon is located locally or remotely and communicates with one or more additional servers and/or one or more other computers/workstations or instruments.
- applications are capable of data encryption/decryption functionality. For example, it may be desirable to encrypt data, files, information associated with GUIs, or other information that may be transferred over a network to one or more remote computers or servers for data security and confidentiality purposes.
- applications include instrument control features, where the control functions of individual types or specific instruments such as a temperature controlling device, imaging device, or fluid handling system are organized as plug-in type modules to the applications.
- the instrument control features include the control of one or more elements of one or more instruments that, for instance, include elements of a fluid processing instrument, temperature controlling device, or imaging device.
- the instrument control features are capable of receiving information from the one or more instruments that include experiment or instrument status, process steps, or other relevant information.
- the instrument control features are under the control of an element of the interface of the applications.
- a user inputs desired control commands and/or receives the instrument control information via one of the GUIs.
- the automated sequencing system is controlled by a first user, conducts sequencing methods described herein, analyzes the raw data as described herein, assembles sequence reads as described herein, and then sends the sequencing information to a remote second user at a location different from that of the first user.
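- The plug-in organization of instrument control features described above can be sketched as follows. This is an illustrative outline only, not the system's actual software: the class names, the `execute` method, and the command vocabulary are all hypothetical, and a real module would talk to hardware rather than store a value in memory.

```python
from abc import ABC, abstractmethod


class InstrumentPlugin(ABC):
    """Hypothetical interface that every instrument control module implements."""

    name: str

    @abstractmethod
    def execute(self, command: str, **params) -> dict:
        """Run a command and return status information for the application."""


class TemperatureController(InstrumentPlugin):
    """Stand-in for a temperature controlling device (no real hardware access)."""

    name = "temperature"

    def __init__(self) -> None:
        self.setpoint_c = None

    def execute(self, command: str, **params) -> dict:
        if command == "set":
            self.setpoint_c = params["celsius"]
        elif command != "status":
            raise ValueError(f"unknown command: {command}")
        return {"instrument": self.name, "setpoint_c": self.setpoint_c}


class Application:
    """Routes user commands to registered plug-ins and relays their status."""

    def __init__(self) -> None:
        self._plugins: dict[str, InstrumentPlugin] = {}

    def register(self, plugin: InstrumentPlugin) -> None:
        self._plugins[plugin.name] = plugin

    def send(self, instrument: str, command: str, **params) -> dict:
        return self._plugins[instrument].execute(command, **params)


app = Application()
app.register(TemperatureController())
print(app.send("temperature", "set", celsius=37.0))
```

- In such a design, an imaging device or fluid handling system would be added by registering another plug-in class, without changing the application that dispatches the commands.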
- identifying target polynucleotide sequence and integrating sequences to assemble genomic information is carried out with a computer.
- the present invention encompasses computer software or an algorithm designed to analyze and assemble sequence information obtained via the methods of the present invention.
- reads at array features correspond to X-Y coordinates that map to the loci of interest.
- a “read” typically refers to an observed sequence derived from raw data, such as the order of detected signals corresponding to the cyclical addition of individual nucleotides.
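- The derivation of a read from the order of detected signals can be sketched as follows. The sketch assumes that the raw data have already been reduced to one integer per nucleotide addition, giving the number of bases incorporated in that addition; this signal format, the flow order, and the function name are assumptions made for illustration and are not specified in the text.

```python
def signals_to_read(flow_order: str, signals: list[int]) -> str:
    """Convert per-addition incorporation counts into a base sequence.

    flow_order: nucleotides added cyclically, e.g. "ACGT".
    signals:    signals[i] is the number of bases incorporated during the
                i-th addition (0 means no signal was detected).
    """
    bases = []
    for i, count in enumerate(signals):
        if count < 0:
            raise ValueError("incorporation counts cannot be negative")
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * count)
    return "".join(bases)


# Additions A, C, G, T, A, C with counts 1, 0, 2, 1, 0, 1.
print(signals_to_read("ACGT", [1, 0, 2, 1, 0, 1]))  # AGGTC
```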
- the reads are checked against the expected reference genome sequence at the 10-bp loci for quality control.
- a reference sequence enables the use of short read length. Reads that have passed the quality control check are then combined to generate a consensus sequence at each locus. In one example, there are 10 unique probes per locus of interest minus any reads that have failed the quality control checks.
- the reads are at random locations on a surface, e.g., a flow cell.
- the reads are checked against the expected subset of reference genome sequence at the loci of interest for quality control. Reads that have passed the quality control check are mapped to the individual locus of interest. Reads corresponding to each locus are then combined to generate a consensus sequence. In one embodiment, there are more than 3,000 reads per 10-bp locus.
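- The quality control check and consensus step described above can be sketched as follows. The mismatch threshold, the majority-vote rule, and the function names are assumptions chosen for illustration; the text does not specify how the check or the consensus is computed.

```python
from collections import Counter


def passes_qc(read: str, reference: str, max_mismatches: int = 1) -> bool:
    """Accept a read if it has the reference length and few enough mismatches."""
    if len(read) != len(reference):
        return False
    return sum(a != b for a, b in zip(read, reference)) <= max_mismatches


def consensus(reads: list[str]) -> str:
    """Majority vote at each position over equal-length reads."""
    if not reads:
        raise ValueError("no reads passed quality control")
    length = len(reads[0])
    return "".join(
        Counter(read[i] for read in reads).most_common(1)[0][0]
        for i in range(length)
    )


# Hypothetical 10-bp locus with four reads; the last read is unrelated noise.
reference = "ACGTACGTAC"
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "TTTTTTTTTT"]
kept = [r for r in reads if passes_qc(r, reference)]
print(len(kept), consensus(kept))  # 3 ACGTACGTAC
```

- A threshold of zero mismatches would discard every read that carries a true variant, so the tolerance has to be set with the intended variant detection in mind.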
- the present invention provides a method for obtaining the sequence information of the target molecules by assembling the sequence reads from each of the substrates.
- the sequence reads can be obtained by base extension of a series of polynucleotides with different lengths, which result from the different base extension of the same capture probe using the same target molecules, such as described above. As such, they represent contiguous fragments of the target molecule sequence and can be assembled to provide the continuous sequence of the target molecule.
- a computer program can be used to track the sequence reads obtained from the same capture probes on different substrates for the assembly.
- sequencing information originating from a single template is identified using a unique identifier of the template, such as the template location or a tag sequence. Overlapping sequence information can be stitched together to generate longer sequence information from a single template. In some embodiments, a template's complement is also sequenced. In some embodiments, sequence information is stitched together using sequence reads generated both from the template and its complement.
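- Tracking reads by template identifier and stitching overlapping reads can be sketched as follows. The record layout (identifier, sequencing round, sequence), the exact-match overlap rule, the minimum overlap of three bases, and the function names are assumptions made for illustration; a practical assembler would also tolerate sequencing errors within the overlap.

```python
from collections import defaultdict


def merge_overlap(left: str, right: str, min_overlap: int = 3) -> str:
    """Append `right` to `left`, collapsing the longest exact suffix/prefix overlap."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("reads do not overlap by at least min_overlap bases")


def stitch_by_template(reads) -> dict:
    """Group reads by template identifier and stitch them in round order.

    reads: iterable of (template_id, round_index, sequence), where
    template_id may be a cluster location or a tag sequence.
    """
    grouped = defaultdict(list)
    for template_id, round_index, sequence in reads:
        grouped[template_id].append((round_index, sequence))

    stitched = {}
    for template_id, items in grouped.items():
        items.sort()  # order reads by sequencing round
        contig = items[0][1]
        for _, sequence in items[1:]:
            contig = merge_overlap(contig, sequence)
        stitched[template_id] = contig
    return stitched


# Hypothetical reads: one template identified by location, one by tag.
reads = [
    ((10, 20), 2, "GTACCA"),
    ((10, 20), 1, "ACGGTA"),
    ("tagA", 1, "TTTGGG"),
]
print(stitch_by_template(reads))
# {(10, 20): 'ACGGTACCA', 'tagA': 'TTTGGG'}
```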
- the sequencing methods provided herein permit the use of unmodified nucleotides and enzymes, which utilize the natural nucleic acid synthesis chemistry. This not only reduces the cost, but also increases the accuracy because of the high-fidelity chemistry produced by evolution.
- the sequencing method provided by the present invention can be used to sequence DNA/RNA. It can be used to sequence pathogens/microbial genomes to identify species/strains quickly.
- One advantage of the sequencing method provided by the present invention is that it can accommodate low efficiency sequencing chemistry (reversible terminators, ligations, etc.), thus reducing the time to sequence.
- the method can sequence very long fragments (e.g. 100-10000 base pairs or more).
- loci- and allele-specific sequencing templates are SNP capable, and can carry multiple signal-reporting labels or ligands, providing for a higher level of multiplexing of diverse target sequences.
- the present invention can provide low-cost, high-throughput and accurate methods for sequencing target polynucleotides with long reads.
- the long reads are assembled from sequencing reads obtained using available sequencing technologies discussed herein and assembled using the methods, compositions, and systems of the inventions.
- samples can comprise pooled genomes of target and control subject populations respectively.
- Populations can be of any sex, race, gender or age. Populations can also include animal subjects, particularly mammalian subjects such as dog, cat, horse, mouse, rat, etc., screened for veterinary medicine or pharmaceutical drug development purposes.
- the target polynucleotide is DNA, for example DNA composing at least 50% of a genome of an organism.
- Some embodiments further comprise identifying and/or counting a gene sequence of more than one cell, and correlating sequence information from the various cells. Such embodiments find application in medical genetics. Other embodiments compare DNA sequences of normal cells to those of non-normal cells to detect genetic variants. Identification of such variants finds use in diagnostic and/or prognostic applications.
- enumeration may determine changes in gene number, indicating, for example that a gene appears three times instead of two times (as in a trisomy) or a gene fails to appear (such as a homozygous deletion).
- Other types of allelic loss and changes in diploidy may also be determined, including changes related to, for example, a somatic recombination, a translocation, and/or a rearrangement, as well as a sporadic mutation.
- a homozygous deletion may indicate certain forms of cancer. It will be appreciated by those of skill in the art that other diseases, disorders, and/or conditions may also be identified based on recognized changes in diploidy. For example, three copies of chromosome 21 genes can indicate trisomy 21, associated with Down syndrome.
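- Enumeration of gene copies can be sketched as a ratio of molecule counts. The sketch below assumes that counts for the gene of interest are compared against a control gene known to be present in two copies, and that counts scale linearly with copy number; this normalization scheme, the function names, and the example counts are assumptions made for illustration and are not taken from the text.

```python
def estimate_copy_number(target_count: int, control_count: int,
                         control_copies: int = 2) -> int:
    """Estimate gene copies from molecule counts relative to a control gene."""
    if control_count <= 0:
        raise ValueError("control_count must be positive")
    return round(control_copies * target_count / control_count)


def describe(copies: int) -> str:
    """Map an estimated copy number to the cases named in the text."""
    if copies == 0:
        return "gene absent (consistent with a homozygous deletion)"
    if copies == 2:
        return "two copies (expected diploid state)"
    if copies == 3:
        return "three copies (consistent with a trisomy)"
    return f"{copies} copies"


# Hypothetical counts: (target molecules, control molecules).
for target, control in [(1500, 1000), (0, 1000), (1020, 1000)]:
    print(describe(estimate_copy_number(target, control)))
```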
- Methods of the present invention allow rapid analysis of DNA sequences at the single molecule level, lending themselves to applications relying on detailed analysis of individual sequences. Additional aspects of the present invention include such applications.
- certain embodiments provide for SNP detection, by identifying incorporation of a single nucleotide into a complementary strand of a target polynucleotide sequence at the site of a known SNP. Any of the variations, embodiments, and/or aspects of the present invention may be used for such SNP detection. Such methods can also be used to identify other variants due to point mutations, including a substitution, a frameshift mutation, an insertion, a deletion, an inversion, a missense mutation, a nonsense mutation, a promoter mutation, a splice site mutation, a sporadic mutation and the like.
- the invention also features methods of diagnosing a metabolic condition, a pathological condition, a cancer and other disease, disorder or condition (including a response to a drug) by identifying such genetic variants.
- a known wild type versus a known variant can be distinguished using the methods described herein. Whether a target polynucleotide exhibits the wild type or variant sequence can readily be determined by the methods of the present invention.
- the long sequence information originating from single templates can provide haplotyping information that is otherwise difficult to obtain.
- the haplotyping information linking two or more loci can be used in genetic analysis.
- Certain embodiments provide for detection of additional genetic variants, by identifying incorporation of more than one nucleotide into a complementary strand of a target polynucleotide sequence, either at substantially known regions of variation or at substantially unknown regions. Any of the variations, embodiments, and aspects of the present invention may be used for such detection.
- Comparison of sequences from more than one individual allows identification of genetic variants, including substitutions, frameshift mutations, insertions, deletions, inversions, missense mutations, nonsense mutations, promoter mutations, splice site mutations, sporadic mutations, a duplication, variable number tandem repeats, short tandem repeat polymorphisms, and the like.
- the sequencing method provided herein uses single molecule counting for accurate analysis of allele frequencies and/or haplotype frequencies. Since more than a single site on each molecule can be probed, haplotype information can be easily determined.
- the present methods and systems disclosed herein can be used to obtain haplotype frequencies. Such methods can be applicable to association studies, where genotype frequencies (such as SNP frequencies) are correlated with diseases in a population. The expense of single SNP typing reactions can be prohibitive when each study requires the performance of millions of individual reactions; the present invention permits millions of individual reactions to be performed and analyzed on a single array surface.
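- Obtaining haplotype frequencies by single molecule counting can be sketched as follows. The sketch assumes that each sequenced molecule has already been reduced to a mapping from locus name to observed allele; this record format, the locus names, and the function name are hypothetical. Molecules that do not cover every requested locus are skipped, because they cannot link the loci.

```python
from collections import Counter


def haplotype_frequencies(molecules: list[dict], loci: list[str]) -> dict:
    """Count allele combinations observed on single molecules.

    molecules: one dict per molecule mapping locus name -> observed allele.
    loci:      the loci that together define the haplotype.
    """
    counts = Counter(
        tuple(molecule[locus] for locus in loci)
        for molecule in molecules
        if all(locus in molecule for locus in loci)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {haplotype: n / total for haplotype, n in counts.items()}


# Hypothetical molecules; the last one covers only one locus and is ignored.
molecules = [
    {"snp1": "A", "snp2": "G"},
    {"snp1": "A", "snp2": "G"},
    {"snp1": "C", "snp2": "T"},
    {"snp1": "A"},
]
print(haplotype_frequencies(molecules, ["snp1", "snp2"]))
```

- Counting per molecule preserves the linkage between loci, which pooled single-SNP genotyping does not provide.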
- the sequencing methods provided herein are used for identifying high value polymorphisms located in regulatory elements and coding regions for a number of drug metabolizing enzyme and transporter (DMET) genes.
- information on the expression of DMET genes provides information on the absorption, distribution, metabolism, and excretion profiles of a drug.
- the methods of the present invention provide information on the complex transcriptional responses to various drugs; such information, and the subsequent prediction of physiological effects, is important for the development of effective therapeutics.
- the sequencing methods provided herein are used to draw links between gene expression profiles and physiological effects. Physiological effects can include a subject's likely response to a drug candidate.
- a wide variety of diseases can be detected by the process of the present invention.
- the sequencing methods provided herein are used for detecting infectious diseases. Infectious diseases can be caused by a pathogen, such as a bacterial, viral, parasitic, or fungal infectious agent. In one embodiment, resistance of various infectious agents to drugs is determined using the methods of the present invention.
- the sequencing methods provided herein are used to sequence pathogens/microbes. In one embodiment, the sequencing methods provided herein are used to identify species/strains. In one embodiment, the sequencing methods provided herein are used to sequence pathogens/microbes and to identify species/strains.
- the sequencing method provided herein can be used for detecting one or more microbes.
- Detection of a microbe can be by sequencing PCR products from a microbe, such as a virus or bacteria.
- a viral or bacterial PCR product can be hybridized with 5′-3′ chips (direct sequencing) or 3′-5′ chips (requires additional sequencing primer).
- sequencing of approximately 20-50 bases or longer is used to detect a microbe.
- about 10-20 chips are used, wherein a chip with a density of 10 k can produce approximately 200 k to 500 k bases of sequence.
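- The stated output of 200 k to 500 k bases is consistent with simple arithmetic: 10 k features per chip multiplied by reads of 20 to 50 bases. The short calculation below makes that explicit; it assumes one read per feature and reads the stated range as a per-chip figure, which is an interpretation rather than something the text states outright. The function name is hypothetical.

```python
def bases_per_chip(features_per_chip: int, read_length: int) -> int:
    """Total bases from one chip, assuming every feature yields one read."""
    return features_per_chip * read_length


FEATURES = 10_000  # "chip density of 10 k"

low = bases_per_chip(FEATURES, 20)   # shortest read length named in the text
high = bases_per_chip(FEATURES, 50)  # longest read length named in the text
print(low, high)  # 200000 500000
```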
- the invention also provides methods of diagnosing a metabolic condition, a pathological condition, a cancer, and/or other disease, disorder or condition (including a response to a drug) by identifying such genetic variants.
- detection is carried out by prenatal or post-natal screening for chromosomal and genetic aberrations or for genetic diseases.
- an identified sequence variant indicates a disease or carrier status for a genetic condition.
- detectable genetic diseases include, but are not limited to, 21 hydroxylase deficiency, adenomatous polyposis coli, adult polycystic kidney disease, α1-antitrypsin deficiency, cystic fibrosis, familial hypercholesterolemia, Fragile X Syndrome, hemochromatosis, hemophilia A, hereditary nonpolyposis colorectal cancer, Marfan syndrome, myotonic dystrophy, neurofibromatosis type 1, osteogenesis imperfecta, retinoblastoma, Turner Syndrome, Duchenne Muscular Dystrophy, Down Syndrome or other trisomies, heart disease, single gene diseases, HLA typing, phenylketonuria, sickle cell anemia, Tay-Sachs Disease, thalassemia, Klinefelter Syndrome, Huntington Disease, autoimmune diseases, lipidosis, obesity defects, hemophilia, inborn errors of metabolism, diabetes, as well as cleft lip, club foot, congenital heart defects, neural tube defects, pyloric stenosis, alcoholism, Alzheimer disease, bipolar affective disorder, cancer, diabetes type I, diabetes type II, heart disease, stroke, and schizophrenia.
- sequence information from a cancer cell is correlated with information from a non-cancer cell or with another cancer cell in a different stage of cancer.
- sequence information may be obtained, for example, for at least about 10 cells, for at least about 20 cells, for at least about 50 cells, for at least about 70 cells, and for at least about 100 cells.
- Cells in different stages of cancer include, for example, a colon polyp cell vs. a colon cancer cell vs. a colon metastasizing cell from a given patient at various times over the disease course.
- Cancer cells of other types of cancer may also be used, including, for example, a bone cancer, a brain tumor, a breast cancer, an endocrine system cancer, a gastrointestinal cancer, a gynecological cancer, a head and neck cancer, a leukemia, a lung cancer, a lymphoma, a metastasis, a myeloma, a pediatric cancer, a penile cancer, a prostate cancer, a sarcoma, a skin cancer, a testicular cancer, a thyroid cancer, and a urinary tract cancer.
- detection of a cancer involves detection of one or more cancer markers.
- cancer markers include, but are not limited to, oncogenes, tumor suppressor genes, or genes involved in DNA amplification, replication, recombination, or repair. Specific examples include, but are not limited to, BRCA1 gene, p53 gene, APC gene, Her2/Neu amplification, Bcr/Abl, K-ras gene, and human papillomavirus Types 16 and 18.
- the sequencing methods provided herein can be used to identify amplifications, large deletions as well as point mutations and small deletions/insertions or other mutations of genes in the following human cancers: leukemia, colon cancer, breast cancer, lung cancer, prostate cancer, brain tumors, central nervous system tumors, bladder tumors, melanomas, liver cancer, osteosarcoma and other bone cancers, testicular and ovarian carcinomas, head and neck tumors, and cervical neoplasms.
- the genomic DNA from a subject can be prepared as a sequencing template and can be allowed to bind a capture probe fixed to a substrate.
- the arrays, or chips, are then subjected to incremental base extension.
- the capture probes can serve as a primer and specifically bind to a region of the sequencing template near a location that can be used for detecting a relevant distinction indicating a disease.
- the capture probes can bind in close proximity to the expected translocation site.
- Incremental extensions of the bases can reveal whether the sequencing template contains DNA from only one gene in the region of interest or DNA from a translocated gene region. After reading the results from step-wise hybridization events across the multiple chips, and processing the raw data, one can then determine if a subject's DNA has a Bcr/Abl translocation, and therefore detect the presence of a genetic sequence indicative of cancer.
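- The decision made after reading across the expected translocation site can be sketched as a comparison of the read with two candidate downstream sequences. The sequences below are short placeholders and are not real Bcr or Abl sequences; the mismatch tolerance, the decision rule, and the function names are likewise assumptions made for illustration.

```python
def count_mismatches(read: str, expected: str) -> int:
    """Count mismatches over the positions the two sequences share."""
    return sum(a != b for a, b in zip(read, expected))


def classify_junction(read: str, normal_downstream: str,
                      fusion_downstream: str, max_mismatches: int = 1) -> str:
    """Decide which downstream sequence the read past the probe matches.

    normal_downstream: expected continuation within the same gene.
    fusion_downstream: expected continuation into the partner gene.
    """
    normal = count_mismatches(read, normal_downstream)
    fusion = count_mismatches(read, fusion_downstream)
    if normal <= max_mismatches and normal < fusion:
        return "normal"
    if fusion <= max_mismatches and fusion < normal:
        return "translocation"
    return "undetermined"


# Placeholder sequences for illustration only.
NORMAL = "ACGTACGTAC"
FUSION = "TTGCATTGCA"
print(classify_junction("ACGTACGTAC", NORMAL, FUSION))  # normal
print(classify_junction("TTGCATTGCA", NORMAL, FUSION))  # translocation
print(classify_junction("GGGGGGGGGG", NORMAL, FUSION))  # undetermined
```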
- the sequencing methods of the present invention are used for environmental monitoring.
- Environmental monitoring includes but is not limited to detection, identification, and monitoring of pathogenic and indigenous microorganisms in natural and engineered ecosystems and microcosms such as in municipal waste water purification systems and water reservoirs or in polluted areas undergoing bioremediation.
- the methods of the present invention are used to detect plasmids containing genes that can metabolize xenobiotics, to monitor specific target microorganisms in population dynamic studies, or to detect, identify, or monitor genetically modified microorganisms in the environment and in industrial plants.
- the sequencing methods provided herein are used in a variety of forensic areas.
- forensic areas include, but are not limited to, human identification for military personnel and criminal investigation, paternity testing and family relation analysis, HLA compatibility typing, and screening blood, sperm, and transplantation organs for contamination.
- the sequencing methods provided herein are used for identification and characterization of production organisms.
- production organisms include, but are not limited to, yeast for production of beer, wine, cheese, yogurt, and bread.
- the methods of the present invention are used for quality control and certification of products and processes (e.g., livestock, pasteurization, and meat processing) for contaminants.
- the sequencing methods provided herein are used for characterization of plants, bulbs, and seeds for breeding purposes, identification of the presence of plant-specific pathogens, and detection and identification of veterinary infections.
- the target polynucleotide is RNA, and/or cDNA copies corresponding to RNA.
- the RNA includes one or more types of RNA, including, for example, mRNA, tRNA, rRNA, and snRNA.
- the RNA comprises RNA transcripts.
- Some embodiments use a primer that hybridizes to the target polynucleotide whose complementary strand is to be synthesized.
- the primer used comprises a polyT region and optionally, a region of degenerate nucleotides. This facilitates identification and/or counting of random mRNA sequences in eukaryotic cells, as the polyT can hybridize to the polyA region of the mRNA and the degenerate nucleotides can hybridize to corresponding random sequences. Incorporation of degenerate nucleotides into seed primers also avoids sequencing the polyA tail itself while taking advantage of a universal seed primer for primer extension.
- the RNA comprises RNA molecules from a cell, from an organelle, and/or from a microorganism.
- the number of RNA molecules may be about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1,000, about 2,000, about 3,000, about 4,000, about 5,000, about 6,000, about 7,000, about 8,000, about 9,000, about 10,000, up to and including all of the RNA molecules in the cell, organelle, and/or microorganism.
- Some embodiments comprise identifying/sequencing and/or counting RNA molecules from more than one cell, organelle, and/or microorganism.
- a histogram of the copy numbers of various types of RNA molecules identified can be constructed for different cells, organelles and/or microorganisms, and used to compile transcriptional patterns of RNA complements for each analyzed cell.
- the different cells, organelles, and/or microorganisms may be in different states, e.g. a diseased cell vs. a normal cell; or at different stages of development, e.g. a totipotent cell vs. a pluripotent cell vs. a differentiated cell; or subjected to different stimuli, e.g. a bacterial cell vs. a bacterial cell exposed to an antibiotic.
- the methods can detect any statistically significant difference in copy numbers between cells, organelles, and/or microorganisms.
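- As an illustration of the counting step, the following minimal Python sketch tallies the copy numbers of identified transcripts for two cells and reports the per-transcript difference. The transcript names and input lists are hypothetical, and no particular statistical test is implied.

```python
from collections import Counter


def copy_numbers(transcript_ids):
    """Count how many times each transcript was identified in one cell."""
    return Counter(transcript_ids)


def copy_number_differences(counts_a, counts_b):
    """Return {transcript: count_a - count_b} for transcripts seen in either cell."""
    transcripts = set(counts_a) | set(counts_b)
    # Counter returns 0 for transcripts that were never observed.
    return {t: counts_a[t] - counts_b[t] for t in sorted(transcripts)}


# Hypothetical identified transcripts for a normal and a diseased cell.
normal = copy_numbers(["geneA", "geneA", "geneB"])
diseased = copy_numbers(["geneA", "geneB", "geneB", "geneC"])
differences = copy_number_differences(diseased, normal)
```

A real comparison would apply a significance test to such differences across many cells; the sketch only shows how the per-cell histogram data could be organized.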
- the invention also features an approach to annotating genomes based on counting and identifying RNA transcripts.
- the identified transcripts indicate, for example, how sequenced genes are actually transcribed and/or expressed.
- the prediction can be confirmed, modified, or refuted, providing a means to annotate genomes.
- Still another feature of the present invention involves methods of determining phylogenic relationships of various species. Such embodiments provide for compiling transcriptional patterns of cells from different species and analyzing the relationships amongst homologous transcripts. Such information finds use in determining evolutionary relationships amongst species.
- Another feature of the present invention involves a method of determining a microorganism's response to various stimuli, for example, response when exposed to a drug or subjected to other treatment, such as being deprived of certain metabolites.
- transcriptional patterns of a cell of the microorganism, for example a bacterial cell, can be compared before and after administration of the drug or other treatment.
- a sequencing template was immobilized on streptavidin coated beads via its 5′ biotin and was hybridized with a sequencing primer by incubating at 70° C. for 3 min., 55° C. for 15 min and 25° C. for 5 min.
- 8 U Klenow exo(−), 65 mU of apyrase, 10 mU of inorganic pyrophosphatase, and 5 μg of single strand binding protein (SSB) were added.
- the extension reactions were carried out at room temperature.
- successive sets of nucleotides, each at 6.7 μM final concentration, were added to the reaction buffer with mixing.
- Three dark bases were added at each step as depicted in FIG.
- the results of the extension products are depicted in FIG. 10 .
- the largest band is the expected extension product.
- the primary product of the extension was as expected in length. Few smaller bands were detected, which may be products of incomplete incorporation and represented a small portion of the reaction products.
- the Step 9 extension product of 85 base pairs (bp), which corresponds to the extension of 63 bp to the 22 bp primer, the Step 10 extension product of 98 bp, which corresponds to the extension of 76 bp to the 22 bp primer, and the Step 12 extension product of 124 bp, which corresponds to the extension of 102 bp to the 22 bp primer, are depicted in FIG. 11 .
- a PCR product was used as a template in this Example.
- the PCR template was immobilized on streptavidin coated beads via its 5′ biotin and was hybridized with a sequencing primer by incubating at 70° C. for 3 min., 55° C. for 15 min and 25° C. for 5 min.
- 8 U Klenow exo(−), 65 mU of apyrase, 10 mU of inorganic pyrophosphatase, and 5 μg of single strand binding protein (SSB) were added.
- the extension reactions were carried out at room temperature. At one minute intervals, successive sets of nucleotides, each at 6.7 μM final concentration, were added to the reaction buffer with mixing. Three dark bases were added at each step as depicted in FIG. 8 .
- the results of the extension products are depicted in FIG. 11 .
- the largest band is the extension product.
- the primary product of the extension was as expected in length. A few smaller bands were detected, which may be products of incomplete incorporation and represented a small portion of the reaction products.
- the Step 9 extension product of 85 base pairs (bp), which corresponds to the extension by 63 bp of the 22 bp primer, the Step 10 extension product of 98 bp, which corresponds to the extension by 76 bp of the 22 bp primer, and the Step 12 extension product of 124 bp, which corresponds to the extension by 102 bp of the 22 bp primer, are depicted in FIG. 11 .
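- The reported product lengths can be checked with a short calculation. The sketch below simply adds the stated 22 bp primer length to the extension length reported for each step; the variable names are illustrative only.

```python
PRIMER_LENGTH_BP = 22  # primer length stated in the example

# Bases added to the primer, as reported for Steps 9, 10 and 12.
extension_by_step = {9: 63, 10: 76, 12: 102}

# Expected product length = primer length + bases added.
product_length_by_step = {
    step: PRIMER_LENGTH_BP + added for step, added in extension_by_step.items()
}
```

The resulting lengths agree with the 85 bp, 98 bp and 124 bp products described for Steps 9, 10 and 12.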
- the flow cell was then loaded onto an Illumina HiScanSQ sequencer to sequence 25 bases (second sequencing). After the second sequencing, the flow cell lanes were stripped again with 0.1 N NaOH and the stripped nucleic acids were analyzed using a denaturing gel.
- Lane 1 generated about 278 million base reads with about 11 million clusters passing filter.
- Lane 3 generated about 653 million base reads with about 25.6 million clusters passing filter.
- FIG. 12 shows the percent base calls per sequencing step for Lane 1. As expected, 100% of the first bases were called “T”: because the last step of the dark base extension was a “missing T” step, the first base added in the sequencer should be “T”.
- FIG. 13 shows the percent base calls per sequencing step for Lane 3. Also as expected, 100% of the first base called was “C.”
- sequences from the second sequencing were matched with the sequences from the first sequencing as the templates were the same. Because there were alignment changes between the first and second sequencings (the flow cell was removed from the sequencer for dark base extension), a search algorithm was used to match the sequences within a range of 150 units of x, y coordinates from the Illumina qseq files.
- One million passed filter sequences from lane one, second sequencing (25 bases long), were checked and 71.3% of the sequences matched part of the sequences from the first sequencing (100 bases long).
- one million passed filter sequences from lane three, second sequencing (25 bases long), were checked and 76.56% of the sequences matched part of the sequences from the first sequencing (100 bases long).
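- The matching procedure described above can be sketched as follows. This is a minimal illustration, not the search algorithm actually used: it assumes each read is a hypothetical (x, y, sequence) tuple, treats the coordinate range as a per-axis tolerance, and counts a match when the shorter read occurs exactly within the longer read from a nearby cluster.

```python
def match_reads(first_run, second_run, max_offset=150):
    """Pair each second-run read with a first-run read from a nearby cluster.

    Each read is a hypothetical (x, y, sequence) tuple. A pair counts as a
    match when both coordinates differ by at most max_offset and the
    second-run sequence occurs within the first-run sequence.
    """
    matches = []
    for x2, y2, seq2 in second_run:
        for x1, y1, seq1 in first_run:
            close = abs(x1 - x2) <= max_offset and abs(y1 - y2) <= max_offset
            if close and seq2 in seq1:
                matches.append(((x1, y1), (x2, y2)))
                break  # keep the first nearby cluster whose read contains seq2
    return matches


# Toy data: long reads from the first run, short reads from the second run.
first_run = [(1000, 2000, "ACGTACGTTTGACCA"), (5000, 5000, "GGGGCCCCAAAATTT")]
second_run = [(1100, 1950, "TTGACC"), (5100, 5050, "ACGTAC"), (9000, 9000, "GGGG")]
matched = match_reads(first_run, second_run)
fraction_matched = len(matched) / len(second_run)
```

The brute-force double loop is adequate for a toy example; for millions of clusters, a spatial index or coordinate binning would be needed.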
- FIG. 14 shows the distribution of dark base extensions in Lane 1 (10 steps) and Lane 3 (4 steps). These distributions agree with the expected distribution. Both the high exact sequence match and the correct distribution indicate that the sequencing after dark extension worked reasonably well.
- the cBot cluster generation system was reprogrammed to utilize a custom edited protocol to deliver nucleotide combinations at specified time intervals, as well as other reagents. After all lanes were stripped with 0.1N NaOH (120 μl) to remove sequencing extension products, an Illumina sequencing primer (SP2, 95 μL) was introduced into all lanes to hybridize to clusters of ssDNA template on the surface of the flow cell. Hybridization was performed for 15 min at 60° C., followed by slow cooling to 20° C. at a rate of 3° C./min.
- Controlled extension was accomplished by repeated introduction of unlabeled native nucleotide triplets (85 μL for 1 minute), followed by apyrase containing washing solution (120 μL for 2 minutes). Finally, a wash solution of NEB2 (120 μL, 1×) was pumped through the flow cell before proceeding to the following dark base extension step.
- The nucleotide combinations were: Lane 4 (10 steps): missing A, C, G, T, A, C, G, T, A, C; Lane 5 (16 steps): missing A, C, G, T, A, C, G, T, A, C, A, C, G, T, A, C; Lane 6 (20 steps): missing A, C, G, T, A, C, G, T, A, C, A, C
- the flow cell was loaded onto an Illumina HiScanSQ sequencer to sequence 75 bases (second sequencing).
- Lane 4 generated about 1,927 million base reads with about 25.7 million clusters passing filter. Lane 5 generated about 1,324 million base reads with about 17.6 million clusters passing filter. Lane 6 generated about 884 million base reads with about 11.8 million clusters passing filter.
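- As a consistency check on these figures, dividing the reported base reads by the reported clusters passing filter should give roughly the 75 bases sequenced per cluster. A minimal sketch, using only the numbers quoted above:

```python
# (base reads in millions, clusters passing filter in millions) as reported.
lanes = {4: (1927, 25.7), 5: (1324, 17.6), 6: (884, 11.8)}

# Bases read per passing cluster; expected to be close to the 75-base read length.
bases_per_cluster = {
    lane: round(bases / clusters, 1) for lane, (bases, clusters) in lanes.items()
}
```

All three lanes come out within a fraction of a base of 75, which is consistent with a 75-base second sequencing.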
- sequences from the second sequencing were matched with the sequences from the second read of the first sequencing. Because the second sequencing was extended longer than the second read of the first sequencing, the sequences from the second sequencing may or may not overlap with the sequences from the second read of the first sequencing from the same cluster.
- the sequences from both sequencing runs were mapped to the human genome and a search algorithm was used to compare the mapping positions on human chromosomes to determine whether two sequences were from the same cluster. Because there were cluster alignment changes between the first and second sequencings (the flow cell was removed from the sequencer for dark base extension), the search algorithm matched sequences within a range of 600 units of x, y coordinates from the Illumina qseq files.
- FIG. 15 shows the distribution of dark base extensions in Lane 4 (10 steps), Lane 5 (16 steps) and Lane 6 (20 steps). These distributions agree with the expected distribution. Both the high sequence mapping position match and the correct distribution indicate that the sequencing after dark extension worked reasonably well.
- NGS Next-generation sequencing
- SBS DNA sequencing-by-synthesis
- Solid a ligase enzyme
- Although the platforms differ in their engineering configurations and sequencing chemistries, they share a technical paradigm in that bases are read sequentially, through iterative cycles of polymerase-mediated fluorescent-labeled nucleotide extensions or through successive fluorescent-labeled oligonucleotide ligation. Since fluorescently-labeled nucleotides are not native substrates of the polymerase, it is difficult for the reaction to achieve 100% completion.
- +S™ technology, an implementation of some embodiments described above, overcomes this hurdle by resetting the sequencing chemistry using length-controlled extension. Consequently, regions of the DNA template farther away from the sequencing primer can be reached via +S, effectively increasing the read length without the signal loss and quality reduction inherent in current NGS platforms.
- This example demonstrates that +S™ technology, which employs controlled extension in addition to sequencing, greatly improves sequencing quality for long reads.
- Human DNA samples and an E. coli (strain ATCC 11303) DNA sample were sheared using a Covaris protocol (Covaris, Inc., Woburn, Mass., USA) to the desired length distribution. The resulting fragmented human DNA samples were processed according to Agilent SureSelect™ Exome Protocols to prepare human exome libraries for sequencing. The resulting fragmented E. coli DNA was further separated on a 2% agarose gel, and a band ranging from 600 to 700 bp was excised. After DNA extraction, the sample was processed according to the Illumina TruSeq DNA Sample Preparation Guide to generate libraries for sequencing.
- Human Exome and E. coli libraries were quantified by qPCR, diluted to the proper concentration and denatured with 0.1 N NaOH according to the Illumina TruSeq cBot procedure. Denatured human libraries and the 1% E. coli Library were loaded into the cBot along with a TruSeq PE Cluster v3 plate and a v3 Flow Cell. After completion of the cluster generation, the flow cell was loaded into the HiScanSQ sequencer along with the TruSeq SBS Kit v3 and multiplexing reagents. The sequencing run was executed using the 2×100 TruSeq 3 Paired-End protocol and fully completed before any +S related steps were performed.
- lane 2 and lane 3 of the flow cell were treated with 0.1 N NaOH (200 μL) to remove the synthesized strands which are not attached to the flow cell (i.e. the second 100 bp read).
- a sequencing primer mix was prepared by adding Illumina multiplex read2 sequencing primer (PN 1005721) to a final concentration of 0.5 μM in hybridization mix (5× SSC, 0.05% Tween-20). Lanes 2 and 3 were hybridized with the sequencing primer mix according to the standard Illumina cBot protocol. At this point lane 2 was also protected until further sequencing.
- Lane 3 underwent the +S Extension method. In total, twenty-four cycles of three-base +S Extensions were performed on lane 3 at 37° C. Three nucleotides (a triplet format) were added together at each addition step (forming a cycle). For clarity, each tri-nucleotide addition is named “minus the fourth nucleotide mix”. Therefore, the −A mix consists of (dC, dG, dT); the −C mix contains (dA, dG, dT); the −G mix contains (dA, dC, dT); and finally, the −T mix contains (dA, dC, dG).
- the sequence of cycles of tri-nucleotides was the repeating pattern “−A, −C, −G, −T”, for a total of 24 cycles.
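- The effect of withholding one nucleotide per cycle can be illustrated with an idealized simulation. The sketch below assumes complete incorporation and no misincorporation, and takes as input the strand to be synthesized (the complement of the template downstream of the primer); the function name and the toy sequence are hypothetical.

```python
def controlled_extension(nascent_sequence, missing_cycle):
    """Simulate idealized stepwise extension with one nucleotide withheld per step.

    nascent_sequence: the strand to be synthesized (5'->3'), i.e. the
        complement of the template downstream of the primer.
    missing_cycle: the withheld nucleotide at each step, e.g. "ACGT" * 6.
    Returns the cumulative number of bases added after each step.
    """
    position = 0
    lengths = []
    for missing in missing_cycle:
        # Extend until the next required base is the withheld one (or the end).
        while (
            position < len(nascent_sequence)
            and nascent_sequence[position] != missing
        ):
            position += 1
        lengths.append(position)
    return lengths


# Toy strand and one pass through the -A, -C, -G, -T mixes.
steps = controlled_extension("GGTACCATGCA", "ACGT")
```

Because each step stops at the first position that requires the withheld nucleotide, the extension length per step depends on the local sequence, which is why the extended primers in a lane end up with a distribution of lengths rather than a single length.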
- each nucleotide washing solution was prepared with 1× Thermopol, 4 mM DTT and 1 mU/μl apyrase (NEB).
- Prior to +S Extension, lane 3 was filled with 85 μl of the +S extension mix without nucleotides and then incubated for 30 seconds.
- the +S extension cycle was performed by pumping +S extension mix with nucleotides (35 μl), followed by 3 μl of air at a rate of 60 μl/min. Subsequently, wash mix (120 μl) was pumped and incubated for 1 minute, followed by 1× Thermopol wash (120 μl). This order of reagent pumping was repeated for 24 cycles with the designated nucleotide triplet combination in each cycle (i.e. −A, −C, etc.). Finally, after +S Extension, lane 3 was loaded with holding buffer and protected until further sequencing.
- this new single read 1×100 run is re-sequencing the 2nd read of the paired-end protocol that was completed earlier, where lane 1 is reading base positions 102-201 as a continuation of the previous run, lane 2 is re-reading bases 2-101 since it starts with only the sequencing primer, while lane 3 starts at a range of positions due to +S Extension. More precisely, the 24 cycles of +S Extension in lane 3 resulted in sequencing primers being extended by an average of 96 bp.
- E. coli sequencing reads were aligned to the assembled E. coli genome (strain ATCC 11303) using sequence alignment tool BWA.
- the genome of E. coli strain ATCC 11303 was assembled using sequencing reads of the same strain from a standard Illumina sequencing run. Only uniquely aligned reads were used in the quality calculation. In one quality calculation, all bases of each uniquely aligned read were counted regardless of the quality value. For an individual read, bases at each position were recorded as correct or wrong based on the comparison to the reference E. coli genome, then the Phred-style quality score Q at each base position was calculated as the negative logarithm of error rate E at the base position: Q = −10·log10(E).
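- A minimal sketch of this per-position calculation, assuming the standard Phred scaling (a factor of 10 and a base-10 logarithm) and taking E as the fraction of wrong bases among all bases counted at a position. The function name and the example counts are illustrative only.

```python
import math


def phred_quality(correct, wrong):
    """Phred-style score Q = -10 * log10(E), with E = wrong / (correct + wrong)."""
    total = correct + wrong
    if total == 0:
        raise ValueError("no bases observed at this position")
    error_rate = wrong / total
    if error_rate == 0:
        return math.inf  # no observed errors, so Q is unbounded
    return -10 * math.log10(error_rate)


# Hypothetical position with 1 wrong base out of 1,000 aligned bases.
q = phred_quality(correct=999, wrong=1)
```

On this scale, one error in 1,000 bases corresponds to Q30 and one error in 100 bases to Q20.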
- Sequencing quality was also measured using Genome Analysis Tool Kit (GATK, www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit).
- All sequence reads were aligned to the assembled E. coli genome (strain ATCC 11303) using sequence alignment tool BWA.
- the CountCovariates module of GATK was then used to calculate the quality. In this calculation, continuous low quality bases (bases with raw Illumina quality score of 2) at the end of each read were dropped before the average quality was calculated.
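- The trimming step described above can be sketched as follows. This is not the GATK CountCovariates implementation; it only illustrates dropping the continuous run of quality-2 scores at the end of a read before averaging, using hypothetical function names and a toy read.

```python
def trim_trailing_low_quality(qualities, low_quality=2):
    """Drop the continuous run of low-quality scores at the end of a read."""
    end = len(qualities)
    while end > 0 and qualities[end - 1] == low_quality:
        end -= 1
    return qualities[:end]


def mean_quality(qualities):
    """Average quality after trimming; None if nothing remains."""
    trimmed = trim_trailing_low_quality(qualities)
    return sum(trimmed) / len(trimmed) if trimmed else None


# Toy read: the isolated score of 2 in the middle is kept, the trailing run is dropped.
read_qualities = [38, 36, 2, 30, 2, 2, 2]
trimmed = trim_trailing_low_quality(read_qualities)
average = mean_quality(read_qualities)
```

Only the trailing run is removed, so a low-quality base inside the read still contributes to the average.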
- Q-Scores for bases 1 to 100 were taken from the sequencing reads using the standard Illumina protocol for lane 1 (S1) and lane 3 (S3) (i.e. the 2nd read of the 2×100 paired-end protocol).
- bases 101 to 200 Q-Scores were obtained from the continuation sequencing run using the standard Illumina protocol (1×100) without +S extension.
- 24 steps of +S Extension were introduced before reads were sequenced using the standard Illumina sequencing protocol (1×100), which provided the Q-Scores for lane 3.
- This example demonstrates +S technology's ability to increase read length while maintaining read quality using Illumina's HiScanSQ sequencer.
- the standard sequencing primer is extended on average about 100 bp before running the 1×100 Illumina Sequencing (see Methods and Materials).
- the +S Extension in lane 3 is similar in length to the lane 1 condition, which contains the 100 bp read of the original Illumina's SBS.
- the single read 1×100 Illumina Sequencing is reading positions 101-200 in both lanes 1 and 3, with the difference that lane 1 is a continuation of the earlier Illumina sequencing, while lane 3 contains freshly made +S Extension with an average length of 100 bp. In this way, the two lanes could be compared side-by-side to evaluate the effectiveness of +S Extension in increasing read length while maintaining read quality.
- Lane 2 is the control lane for sequencing primer hybridization, cluster retention and flow-cell performance.
- FIG. 16A compares the cluster density of different lanes after +S Extension on lane 3.
- Lane 1 was protected throughout the +S process.
- Lane 2 was treated with NaOH and subsequently re-hybridized with sequencing primer together with Lane 3.
- Neither lane 1 nor lane 2 was extended with +S.
- the similar cluster density in lanes 2 and 3 indicates good cluster retention after +S.
- Lane 1 underwent continuous sequencing of bases 101-200.
- Lane 3 (+S) has a higher density than Lane 1 (standard Illumina sequencing).
- FIG. 16B shows % cluster pass filter rate. After restarting the sequencer, only 10% of clusters passed filter on lane 1. In contrast, 70% of clusters passed filter on lane 3.
- FIG. 16C shows the number of pass filter reads for different lanes.
- Lane 3 (+S) has a much higher pass filter rate than lane 1 and only a slightly lower rate than lane 2, which sequenced bases 1 to 100, whereas lane 3 sequenced, on average, positions 101 to 200.
- the predicted quality scores of different lanes show a similar pattern, where +S sequencing dramatically improved the number of Q30 or above reads vs. lane 1.
- FIGS. 17A and 17B show the empirical (actual Q-Score distribution over read length) Q-Score calculated using GATK.
- FIG. 17A shows the 100 bp standard Illumina sequencing run.
- FIG. 17B shows the additional 100 bp Illumina sequencing run, which followed the 100 bp sequencing run shown in FIG. 17A and an extra 1 bp sequencing run.
- x-axis position 1 to 100 in FIG. 17A was the actual base position 1 to 100 on each DNA fragment sequenced;
- x-axis position 1 to 100 in FIG. 17B was actual base position 102 to 201 on each DNA fragment sequenced.
- For lane 3, x-axis position 1 to 100 in FIG. 17A was the actual base position on each DNA fragment sequenced; the actual base position on each DNA fragment for x-axis position 1 to 100 in FIG. 17B would depend on the actual +S extension size of each individual DNA fragment. Based on the +S extension size distribution, the average extension size on lane 3 is 97 bases. Therefore, the average actual base position on the DNA fragment for x-axis position 1 to 100 in FIG. 17B is 98 (97 plus 1 from the additional 1 bp sequencing run) to 197. Because very few bases were available for lane 1 after x-axis position 94 in FIG. 17B , the empirical quality score was not calculated for lane 1 after x-axis position 94 in FIG. 17B .
- Because the low quality bases at the end of reads were dropped in the GATK empirical quality calculation ( FIGS. 17A and 17B ), the number of correct bases was calculated to show changes in overall correct bases as the read length increases ( FIGS. 17C and 17D ).
- the x-axis in FIG. 17C is the same as that in FIG. 17A and the x-axis in FIG. 17D is the same as that in FIG. 17B .
- Each read was aligned to the assembled reference E. coli genome (strain ATCC 11303). A base on a read was called correct if it was the same as the aligned base on the reference genome.
- the number of correct bases at each x-axis position was calculated as the number of reads that have correct bases at the position for the lane.
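- The per-position count can be sketched as follows. The sketch assumes gapless alignments supplied as hypothetical (read, reference) string pairs of equal length, which is a simplification of what an aligner such as BWA reports.

```python
def correct_bases_per_position(aligned_reads):
    """Count, for each read position, the reads whose base equals the reference.

    aligned_reads: hypothetical list of (read_sequence, reference_sequence)
    pairs of equal length, i.e. gapless alignments.
    """
    counts = []
    for read, reference in aligned_reads:
        for i, (base, ref_base) in enumerate(zip(read, reference)):
            if i == len(counts):
                counts.append(0)  # first read that reaches this position
            if base == ref_base:
                counts[i] += 1
    return counts


# Toy lane: the second read has one wrong base, the third read is shorter.
reads = [("ACGT", "ACGT"), ("ACGA", "ACGT"), ("AC", "AC")]
counts = correct_bases_per_position(reads)
```

Because shorter or trimmed reads contribute nothing at later positions, this count falls as reads lose quality toward their ends, which is the effect the comparison between lanes is meant to expose.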
- the reads from lane 3 in the additional sequencing after +S extension had a much higher number of correct bases.
- a “fusion” PCR construct of 176 bp insert size was designed according to Ion Torrent's guidelines (Ion Amplicon Library Preparation (Fusion Method) p/n 4468326 Rev. B).
- the basic sequence of the PCR construct was from the plasmid pBR322.
- Herculase II DNA Polymerase Agilent #600675
- the amplicons were extracted with Qiagen's Gel Extraction Kit (Qiagen #28704).
- Input DNA was amplified onto Ion Sphere™ Particles (ISPs) using Ion Torrent's Ion Xpress Template 200 kit (Life #4471253).
- Enriched ISPs were hybridized with sequencing primer and DNA polymerase was bound according to protocol (Ion Torrent protocol 4469714 Rev. B). (Polymerase and primer from Ion's Sequencing Kit Life #4468995).
- the Ion Torrent Personal Genome Machine was initialized with reagents from the sequencing kit. After initialization, the primed and polymerase-bound ISPs were loaded into a 314R chip with reagents from the Ion Sequencing 200 kit (Life #4471258) according to the 200 protocol (Life p/n 4471999 Rev. B). ISPs loaded into the chip were sequenced on the PGM with 320 nucleotide flows in Ion Torrent's SAMBA flow order.
- the chip was stored in a fridge in Annealing Buffer with PVP from Ion Torrent's Paired-End Sequencing Demonstrated Protocol (p/n MAN0006191; 900 μl of Annealing Buffer from the sequencing kit was combined with 48 μl of 8% PVP-10).
- the extended sequencing primer was stripped with 0.1N NaOH and ISP-bound templates were hybridized with sequencing primer mixture (5 μl Sequencing Primer in 25 μl Annealing Buffer) at 65° C. for 5 min followed by room temperature for 15 minutes.
- the Personal Genome Machine was again washed and initialized and polymerase was bound onto the ISPs in the chip according to the Paired-End Demonstrated Protocol (1.5 μl of Polymerase from the Sequencing Kit was added to 6 μl of Annealing Buffer with PVP; the mixture was injected into the chip and incubated for 5 minutes).
- each nucleotide was replaced by 20 μl of each of the other three nucleotides provided.
- 20 μl of dATP was replaced with 20 μl of dCTP, 20 μl of dGTP, and 20 μl of dTTP, and the mixture was inserted into the dATP position on the PGM. This was repeated for each nucleotide position on the Personal Genome Machine.
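- The reagent substitution described above amounts to loading, at each nucleotide position, the three nucleotides other than the one normally placed there. A minimal sketch of that mapping, with illustrative names:

```python
NUCLEOTIDES = ("dATP", "dCTP", "dGTP", "dTTP")

# For each reagent position on the instrument, the triplet loaded in its place.
triplet_for_position = {
    position: tuple(n for n in NUCLEOTIDES if n != position)
    for position in NUCLEOTIDES
}
```

With this loading, an unmodified flow order delivers a nucleotide triplet at every flow, so each flow extends the primer until the withheld nucleotide is required.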
- ISPs loaded into the chip were extended on the PGM with 16 nucleotide-triplet flows in Ion Torrent's SAMBA flow order.
- the chip was stored in a fridge in Annealing Buffer with PVP from Ion Torrent's Paired-End Sequencing Demonstrated Protocol. After the PGM was washed and re-initialized according to the v2.0 protocol, the chip was washed 2× with 50 μl of Enzyme Denaturation Solution (from PE Demonstrated Protocol: 1× TE, 50 mM NaCl, 2% SDS), reloaded onto the machine, and incubated with polymerase (see above). The extended chip was sequenced with 320 flows in the SAMBA flow order. Sequence calls were made on a Torrent Server using Torrent Suite v 2.0.1 (Ion Torrent/Life Technologies, Inc.).
- BAM files are automatically generated by Torrent Suite and visualized with IGV (www.broadinstitute.org/iv/).
Abstract
The disclosure provides methods and systems for sequencing long nucleic acid fragments. In one aspect, methods, systems and reagent kits are provided for sequencing nucleic acid target sequences. Some embodiments of the methods, systems and reagent kits are particularly suitable for sequencing a large number of fragments, particularly long fragments.
Description
- This application is a continuation of U.S. patent application Ser. No. 14/009,089, filed Jul. 3, 2014, which is a US National Stage Entry of PCT/US12/00185, filed Apr. 2, 2012, which is a continuation of U.S. application Ser. No. 13/153,218, filed Jun. 3, 2011, now abandoned, which claims the benefit of U.S. Provisional Application Nos. 61/470,497, filed Apr. 1, 2011; 61/477,173, filed Apr. 20, 2011; and 61/489,662, filed May 24, 2011; each of which is incorporated by reference in its entirety.
- The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Aug. 21, 2020, is named 38558-705.303_SL.txt and is 4,096 bytes in size.
- Nucleic acid sequencing is important for biological research, clinical diagnostics, personalized medicine, pharmaceutical development, and many other fields. Cost effective, accurate and fast sequencing is needed for many applications, such as, but not limited to, microbial or pathogen detection and identification, and genetic identification of subjects. For example, applications can include, but are not limited to, paternity testing and forensic science (Reynolds et al., Anal. Chem., 63:2-15 (1991)), organ-transplant donor-recipient matching (Buyse et al., Tissue Antigens, 41:1-14 (1993) and Gyllensten et al., PCR Meth. Appl, 1:91-98 (1991)), genetic disease diagnosis, prognosis, and prenatal counseling (Chamberlain et al., Nucleic Acids Res., 16:11141-11156 (1988) and L. C. Tsui, Human Mutat., 1:197-203 (1992)), and the study of drug metabolism and oncogenic mutations (Hollstein et al., Science, 253:49-53 (1991)). In addition, the cost-effectiveness of nucleic acid analysis, such as for infectious disease diagnosis, varies directly with the multiplex scale in panel testing. Many of these applications depend on the discrimination of single-base differences at a multiplicity of sometimes closely spaced loci.
- A variety of DNA hybridization techniques are available for detecting the presence of one or more selected polynucleotide sequences in a sample containing a large number of sequence regions. In a simple method, which relies on fragment capture and labeling, a fragment containing a selected sequence is captured by hybridization to an immobilized probe. The captured fragment can be labeled by hybridization to a second probe which contains a detectable reporter moiety.
- Another widely used method is Southern blotting. In this method, a mixture of DNA fragments in a sample is fractionated by gel electrophoresis, and then fixed on a nitrocellulose filter. By reacting the filter with one or more labeled probes under hybridization conditions, the presence of bands containing the probe sequences can be identified. The method is especially useful for identifying fragments in a restriction-enzyme DNA digest which contains a given probe sequence and for analyzing restriction-fragment length polymorphisms (“RFLPs”).
- Another approach to detecting the presence of a given sequence or sequences in a polynucleotide sample involves selective amplification of the sequence(s) by polymerase chain reaction, U.S. Pat. No. 4,683,202 and R. K. Saiki, et al., Science 230:1350 (1985). In this method, primers complementary to opposite end portions of the selected sequence(s) are used to promote, in conjunction with thermal cycling, successive rounds of primer-initiated replication. The amplified sequence(s) may be readily identified by a variety of techniques. This approach is particularly useful for detecting the presence of low-copy sequences in a polynucleotide-containing sample, e.g., for detecting pathogen sequences in a body-fluid sample.
- More recently, methods of identifying known target sequences by probe ligation have been reported: U.S. Pat. No. 4,883,750, D. Y. Wu, et al., Genomics 4:560 (1989), U. Landegren, et al., Science 241:1077 (1988), and E. Winn-Deen, et al., Clin. Chem. 37:1522 (1991). In one approach, known as oligonucleotide ligation assay (“OLA”), two probes or probe elements which span a target region of interest are hybridized to the target region. Where the probe elements basepair with adjacent target bases, the confronting ends of the probe elements can be joined by ligation, e.g., by treatment with ligase. The ligated probe element is then assayed, evidencing the presence of the target sequence.
- In a modification of this approach, the ligated probe elements act as a template for a pair of complementary probe elements. With continued cycles of denaturation, hybridization, and ligation in the presence of pairs of probe elements, the target sequence is amplified linearly, allowing very small amounts of target sequence to be detected and/or amplified. This approach is referred to as ligase detection reaction. When two complementary pairs of probe elements are utilized, the process is referred to as the ligase chain reaction which achieves exponential amplification of target sequences. F. Barany, Proc. Nat'l Acad. Sci. USA, 88:189-93 (1991) and F. Barany, PCR Methods and Applications, 1:5-16 (1991).
- Another scheme for multiplex detection of nucleic acid sequence differences is disclosed in U.S. Pat. No. 5,470,705 where sequence-specific probes, having a detectable label and a distinctive ratio of charge/translational frictional drag, can be hybridized to a target and ligated together. This technique was used in Grossman, et al., Nucl. Acids Res. 22(21):4527-34 (1994) for the large scale multiplex analysis of the cystic fibrosis transmembrane regulator gene. Jou, et al., Human Mutation 5:86-93 (1995) relates to the use of a so called “gap ligase chain reaction” process to amplify simultaneously selected regions of multiple exons with the amplified products being read on an immunochromatographic strip having antibodies specific to the different haptens on the probes for each exon.
- Ligation of allele-specific probes generally has used solid-phase capture (U. Landegren et al. Science, 241:1077-1080 (1988); Nickerson et al., Proc. Natl. Acad. Sci. USA, 87:8923-8927 (1990)) or size-dependent separation (D. Y. Wu, et al., Genomics, 4:560-569 (1989) and F. Barany, Proc. Natl. Acad. Sci, 88:189-193 (1991)) to resolve the allelic signals, the latter method being limited in multiplex scale by the narrow size range of ligation probes. Further, in a multiplex format, the ligase detection reaction alone cannot make enough product to detect and quantify small amounts of target sequences. The gap ligase chain reaction process requires an additional step—polymerase extension. The use of probes with distinctive ratios of charge/translational frictional drag for a more complex multiplex will either require longer electrophoresis times or the use of an alternate form of detection.
- Methods for efficiently and accurately sequencing long nucleic acid fragments are needed. There is a great need for rapid, high-throughput, and low-cost sequencing technology, such as for point-of-care applications and field detection of pathogens. The present invention permits sequencing of large portions of a genome using simple chemistry and low-cost equipment, leading to significant cost reduction and increased speed, along with other related advantages.
- Provided herein are methods and systems for sequencing a target nucleic acid. Some embodiments of the invention are particularly suitable for sequencing a large number of target nucleic acids simultaneously.
- In one aspect of the invention, methods, kits, and computer software products are provided for sequencing long nucleic acids. Nucleic acids are often sequenced using stepwise methods such as polymerase extension based sequencing or ligation sequencing, where one or more bases are read for each sequencing step. These stepwise sequencing methods are often limited by their stepwise inefficiency, e.g., incomplete incorporation, incomplete ligation, and other problems that create prephasing or dephasing. The stepwise inefficiency can accumulate over read length and limit read length.
- In some embodiments, methods, kits and computer software products are provided to reset stepwise sequencing partially or completely.
- In a first aspect, the method comprises: (a) sequencing one or more bases of a target nucleic acid by extending a first sequencing primer hybridized to the target nucleic acid to generate a first primer extension product, thereby obtaining a first sequence read; (b) releasing the first primer extension product from the target nucleic acid; (c) hybridizing a second sequencing primer to the target nucleic acid, optionally at the same or neighboring regions of the same target nucleic acid; (d) generating a second primer extension product (extended primer) by extending the second sequencing primer through limited or controlled extension; and (e) sequencing one or more bases of the target nucleic acid by further extending the second primer extension product to generate a third primer extension product, thereby obtaining a second sequence read. In one embodiment, the first sequencing primer and second sequencing primer are the same. In another embodiment, the first sequencing primer and second sequencing primer are different. The controlled or limited extension can be performed by pulse extension, i.e., by allowing the extending reaction to last for a short period of time, such as less than a minute or from approximately half a minute to a minute, e.g., 1-5, 5-10, 10-30, or 30-60 seconds. In some embodiments, the extension is controlled by withholding 1, 2, or 3 of the four nucleotides. The pulse extension can be performed by adding nucleotide degrading enzymes such as alkaline phosphatase or apyrase. In some other embodiments, the pulse extension may be controlled using reversible terminator nucleotides. For example, each or some extension steps can be performed by including one or more reversible terminator nucleotides, such as dATP, dCTP, dGTP, dTTP*, where dTTP* is a reversible terminator. In reversible terminator controlled extension, a step of removing the blocking group in the terminator may be performed before the next extension step.
- In some embodiments, controlled extension can be performed by extension and wash cycles. Similar to the pulse extension, the controlled extension may be performed by limiting the availability of nucleotides or by adding reversible terminator nucleotide(s).
- The limited extension can be carried out by using a nucleic acid polymerase and one or more sets of nucleotides. The one or more sets generally each comprise no more than three different nucleotides (bases). In some embodiments, the one or more sets comprise one to four nucleotides and at least one of the nucleotides is a reversible terminator nucleotide. The extending can be with more than one set of nucleotides, such as at least 1, 2, 3, or more sets. A set of nucleotides can comprise one, two or three different nucleotides.
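- The limited extension described above can be pictured with a small simulation. The following is a minimal sketch, not the method of the invention itself: it assumes an idealized polymerase that incorporates every available complementary nucleotide and stalls at the first template position whose complement was withheld, with a wash (or nucleotide degradation) implied between sets. It models native nucleotides only, not reversible terminators. The function names and the example template are hypothetical.

```python
# Illustrative sketch only: names and the example template are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_once(template, position, nucleotide_set):
    """Extend from `position` until the required nucleotide is unavailable.

    `template` is written in the order the polymerase reads it, so the base
    at index i dictates the nucleotide incorporated at index i. Returns the
    number of template bases copied so far.
    """
    while position < len(template):
        needed = COMPLEMENT[template[position]]
        if needed not in nucleotide_set:
            break  # the polymerase stalls: the required nucleotide was withheld
        position += 1
    return position


def controlled_extension(template, nucleotide_sets):
    """Apply successive incomplete nucleotide sets and record the position
    reached after each step (a wash between steps is implied)."""
    position = 0
    history = []
    for nucleotide_set in nucleotide_sets:
        bases = set(nucleotide_set)
        if len(bases) > 3:
            raise ValueError("each set may hold no more than three different nucleotides")
        position = extend_once(template, position, bases)
        history.append(position)
    return history


if __name__ == "__main__":
    # Sets missing T, then missing C, then missing T again.
    print(controlled_extension("TTGCACCGTA", ["ACG", "AGT", "ACG"]))
```

- In this sketch each step advances the primer to a defined position determined only by the template sequence and the withheld nucleotide, which is why the extension length is controlled rather than runaway.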
- In one embodiment, the method further comprises obtaining one or more additional sequence reads, such as by repeating the steps of releasing a primer extension product from the target nucleic acid; hybridizing an additional seed sequencing primer (or extension primer) (in some embodiments, the additional seed sequencing primer targeting the same or similar regions of the target nucleic acid) to the target nucleic acid; generating an additional primer extension product by extending the additional sequencing primer through controlled extension; and sequencing one or more bases of the target nucleic acid by further extending the additional primer extension product to generate an additional primer extension product, thereby obtaining an additional sequence read. The sequence of the target nucleic acid can be determined by assembling the first, second, and optional, one or more additional sequence reads. The sequencing of the target nucleic acid can be by extending the sequencing primer using a labeled reversible terminator, ligation, or any other methods known in the art for reading nucleotide sequences.
- In another embodiment, a washing step or nucleotide degradation step can be performed prior to a subsequent addition of a set of nucleotides.
- The target nucleic acid can be attached to a substrate. The substrate can be a flat surface or bead, such as a flow cell. In another embodiment, the substrate can comprise glass, silicon, metal, or plastics that have been surface treated to immobilize template strands or oligonucleotides. In another embodiment, the target nucleic acid can be attached to the substrate via a capture probe.
- The methods and systems disclosed herein can further comprise analyzing the sequencing results, such as generated by a method disclosed herein, to provide a diagnosis, prognosis, or theranosis for a subject.
- Furthermore, a method disclosed herein can be used to sequence a plurality of target nucleic acids.
- In a second aspect, the invention refers to a method for sequencing a target nucleic acid, comprising:
-
- (a) obtaining a plurality of sequence reads from a nucleic acid template using a plurality of different sequencing primers, wherein at least one said primer is generated by a template dependent extension reaction; and
- (b) generating sequence information about the target nucleic acid by combining multiple sequence reads from step (a). In some embodiments, the sequence information comprises a nucleotide sequence of length greater than 500, 1000, 1500, 2000, or 3000 bases. In some embodiments, the assembled sequence reads generate sequence information with an average quality score of at least 26, 27, 28, 29, 30 or 31. In some embodiments, the assembled sequence reads generate sequence information with a quality score of at least 26, 27, 28, 29, 30 or 31 for any nucleotide position. In some embodiments, the sequence reads start at positions that are at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 175, or 200 bases apart on the template nucleic acid. In some embodiments, sequence reads from the complement strand of the template nucleic acid are further assembled with the sequence reads.
- In a third aspect, the invention relates to kits for sequencing a target nucleic acid, comprising a primer that is hybridizable to the target nucleic acid, and one or more incomplete sets of nucleotides. In some embodiments, the multiple incomplete sets of nucleotides comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 35, 40, 45, 50, or 60 incomplete sets of nucleotide bases. In some embodiments, the kit further comprises at least one DNA polymerase. In some embodiments, the DNA polymerase is a DNA-dependent DNA polymerase. In some embodiments, the DNA polymerase is an RNA-dependent DNA polymerase. In some embodiments, the DNA polymerase is Klenow exo(−). In some embodiments, the kit further comprises pyrophosphatase. In some embodiments, the kit further comprises apyrase. In some embodiments, the kit further comprises a nucleic acid denaturant. In some embodiments, the denaturant comprises urea, formamide, or sodium hydroxide. In some embodiments, the kit further comprises a single strand binding protein. In some embodiments, an incomplete set of nucleotides comprises 1, 2, or 3 nucleotides. In some embodiments, the kit further comprises an exonuclease. In some embodiments, the exonuclease is a 5′-3′ exonuclease. In some embodiments, the exonuclease is a 3′-5′ exonuclease.
- In a third aspect, the invention relates to a method for sequencing a target nucleic acid, the method comprising generating sequence information of length n from a single template using sequencing by synthesis; wherein the sequence information maintains a quality score of at least 26, 27, 28, 29, 30 or 31; and
- wherein n is greater than 100, 150, 200, 300, 400, 500, 700, 1000, 1500, 2000, or 3000.
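- To make the quality-score thresholds above concrete, the following sketch converts a quality score into a per-base error probability. It assumes the scores are on the conventional Phred scale (Q = -10 log10 of the error probability); the specification does not define the scale in this passage, so that is an assumption, and the function names are hypothetical.

```python
# Sketch under an assumption: quality scores are Phred-scaled.
def error_probability(quality_score):
    """Per-base error probability implied by a Phred-scaled quality score."""
    return 10 ** (-quality_score / 10)


def expected_errors(quality_score, read_length):
    """Expected number of miscalled bases if every position has the same score."""
    return error_probability(quality_score) * read_length


if __name__ == "__main__":
    for q in (26, 27, 28, 29, 30, 31):
        print(q, error_probability(q), expected_errors(q, 1000))
```

- Under this assumption, a score of 30 corresponds to roughly one expected miscall per 1,000 bases, which gives a sense of what maintaining such a score over long sequence information would mean.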
- In a fourth aspect, the invention relates to a system for sequencing a target nucleic acid, the system comprising:
- 
- (a) a sequencer adapted for multiple sequencing by synthesis reactions; and
- (b) a primer that is hybridizable to the target nucleic acid; and
- (c) one or more incomplete sets of nucleotides. In some embodiments, the multiple incomplete sets of nucleotides comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 35, 40, 45, 50, or 60 incomplete sets of nucleotide bases. In some embodiments, the system further comprises at least one DNA polymerase. In some embodiments, the DNA polymerase is a DNA-dependent DNA polymerase. In some embodiments, the DNA polymerase is an RNA-dependent DNA polymerase. In some embodiments, the DNA polymerase is Klenow exo(−). In some embodiments, the system further comprises pyrophosphatase. In some embodiments, the system further comprises apyrase. In some embodiments, the system further comprises a nucleic acid denaturant. In some embodiments, the denaturant comprises urea, formamide, or sodium hydroxide. In some embodiments, the system further comprises a single strand binding protein. In some embodiments, an incomplete set of nucleotides comprises 1, 2, or 3 nucleotides. In some embodiments, the system further comprises an exonuclease. In some embodiments, the exonuclease is a 5′-3′ exonuclease. In some embodiments, the exonuclease is a 3′-5′ exonuclease.
- In a fifth aspect, the invention relates to a method for sequencing a target nucleic acid comprising:
-
- (a) providing a first extension primer hybridized with said target nucleic acid;
- (b) extending said first extension primer to a defined length; and
- (c) sequencing the target nucleic acid from the extended first extension primer generating a first sequence read, thereby further extending the extended first extension primer with a sequencing product. In some embodiments, the method further comprises:
- (d) removing said extended first extension primer and sequencing product;
- (e) hybridizing a second extension primer with said target nucleic acid; and
- (f) repeating steps (b) and (c) with the second extension primer replacing the first extension primer, sequencing a second region of said target nucleic acid generating a second sequence read. In some embodiments, the method further comprises;
- (d) removing at least a part of said sequencing product;
- (e) providing a second extension primer hybridized with said target nucleic acid;
- (f) repeating steps b) and c) with the second extension primer replacing the first extension primer, sequencing a second region of said target nucleic acid generating a second sequence read, wherein said second region is different from said first region. In some embodiments, said removing comprises removing said sequencing product and said first extension primer completely from the target nucleic acid. In some embodiments, said removing comprises denaturing said sequencing product and said first extension primer from said target nucleic acid. In some embodiments, denaturing comprises contacting said sequencing product with NaOH, urea, or formamide. In some embodiments, said removing comprises enzymatic digestion of said sequencing product. In some embodiments, said removing comprises exonuclease digestion and wherein a base that is resistant to exonuclease digestion is incorporated to a position in the sequencing product during said sequencing. In some embodiments, said providing comprises:
- (i) hybridizing a sequencing primer with said target nucleic acid;
- (ii) sequencing a region of the target nucleic acid from the sequencing primer, thereby extending the sequencing primer with a sequencing product; and
- (iii) removing a part of said sequencing product. In some embodiments, said providing comprises:
- (i) hybridizing a sequencing primer with said target nucleic acid;
- (ii) sequencing a region of the target nucleic acid from the sequencing primer, thereby extending the sequencing primer with a sequencing product;
- (iii) removing said sequencing primer and its associated sequencing product; and
- (iv) hybridizing said first extension primer with said target nucleic acid. In some embodiments, said first and second extension primers are the same. In some embodiments, said first and second extension primers are different. In some embodiments, said extending comprises controlled extension comprising:
- (g) contacting said first extension primer with a set of nucleotides comprising no more than three different nucleotides and a polymerase.
- In some embodiments, said extending comprises repeating step (g), wherein before the repeating, said nucleotides are removed. In some embodiments, said set of nucleotides are different between two subsequent steps. In some embodiments, said nucleotides are removed by washing. In some embodiments, said nucleotides are removed by a nucleotide degrading enzyme. In some embodiments, said set of nucleotides further comprises a reversible terminator nucleotide, wherein before the repeating, incorporated reversible terminator nucleotides are deblocked and made ready for further extension. In some embodiments, said extension is carried out by pulse extension. In some embodiments, said pulse extension is carried out by allowing an extending reaction to last 30 to 60 seconds. In some embodiments, the sequence of said target nucleic acid is determined by assembling said first, second, and optionally additional sequence reads. In some embodiments, said target nucleic acid is attached to a substrate. In some embodiments, said substrate is a flat surface or bead. In some embodiments, said substrate is a flow cell. In some embodiments, said substrate comprises glass. In some embodiments, said target nucleic acid is attached to said substrate via a capture probe. In some embodiments, the method further comprises analyzing results of said sequencing providing a diagnosis, prognosis, or theranosis for a subject. In some embodiments, the method further comprises sequencing a plurality of target nucleic acids. In some embodiments, said assembling results in sequence information comprising a nucleotide sequence of length greater than 500, 1000, 1500, 2000, or 3000 bases. In some embodiments, the assembling results in sequence information comprising an average quality score of at least 26, 27, 28, 29, 30 or 31. In some embodiments, the assembling results in sequence information comprising a quality score of at least 26, 27, 28, 29, 30 or 31 for any nucleotide position. 
In some embodiments, the first and second sequence reads start at positions that are at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 175, or 200 bases apart on the template nucleic acid. In some embodiments, sequence reads from the complement strand of the template nucleic acid are further assembled with the first and second sequence reads. In some embodiments, the polymerase is Klenow exo(−). In some embodiments, the nucleotide degrading enzyme comprises pyrophosphatase or apyrase. In some embodiments, the enzymatic digestion of said sequencing product is performed by an enzyme comprising a 5′-3′ exonuclease or 3′-5′ exonuclease activity.
- In a sixth aspect, the invention relates to a method for sequencing a target nucleic acid comprising:
-
- (a) performing a first sequencing of a first region of the target nucleic acid generating a first read;
- (b) performing a second sequencing of a second region of the target nucleic acid generating a second read, wherein said first and second regions are different;
- (c) combining said first and second regions to produce a combined read.
- In some embodiments, said first and second sequencings are performed using as a template a polynucleotide from the same strand of the target nucleic acid. In some embodiments, at least one sequencing of said first and second sequencings comprises:
-
- (i) extending an extension primer to a defined length; and
- (ii) sequencing using the extended primer.
- In some embodiments, said extending comprises controlled extension comprising:
-
- (1) contacting said first extension primer with a set of nucleotides comprising no more than three different nucleotides and a polymerase.
- In some embodiments, said extending comprises repeating step (1), wherein before the repeating, said nucleotides are removed. In some embodiments, said set of nucleotides is different between two subsequent steps. In some embodiments, said nucleotides are removed by washing. In some embodiments, said nucleotides are removed by a nucleotide degrading enzyme. In some embodiments, said set of nucleotides further comprises a reversible terminator nucleotide, wherein before the repeating, incorporated reversible terminator nucleotides are deblocked and made ready for further extension. In some embodiments, said combining is performed in silico by stitching said first and second regions into an assembled sequence for the target nucleic acid. In some embodiments, the assembled sequence comprises a gap of length n. In some embodiments, n is less than 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 50, or 100 nucleotides. In some embodiments, said first and second sequencings are further performed using the same polynucleotide. In some embodiments, said extending is performed using native nucleotides. In some embodiments, said extension is carried out by pulse extension. In some embodiments, said pulse extension is carried out by allowing an extending reaction to last 30 to 60 seconds. In some embodiments, said target nucleic acid is attached to a substrate. In some embodiments, said substrate is a flat surface or bead. In some embodiments, said substrate is a flow cell. In some embodiments, said substrate comprises glass. In some embodiments, said target nucleic acid is attached to said substrate via a capture probe. In some embodiments, the method further comprises analyzing results of said sequencing providing a diagnosis, prognosis, or theranosis for a subject. In some embodiments, the method further comprises sequencing a plurality of target nucleic acids. In some embodiments, said combined read comprises sequence information comprising a nucleotide sequence of length greater than 500, 1000, 1500, 2000, or 3000 bases. In some embodiments, said combined read comprises sequence information comprising an average quality score of at least 26, 27, 28, 29, 30 or 31. In some embodiments, said combined read comprises sequence information comprising a quality score of at least 26, 27, 28, 29, 30 or 31 for any nucleotide position. In some embodiments, the first and second reads start at positions that are at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 175, or 200 bases apart on the template nucleic acid. In some embodiments, a sequence read from a complement strand of the template nucleic acid is further combined, producing the combined read. In some embodiments, the polymerase is Klenow exo(−). In some embodiments, the nucleotide degrading enzyme comprises pyrophosphatase or apyrase.
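- The in silico stitching of two reads into a combined read, possibly with a gap of length n, can be sketched as follows. This is an illustration under stated assumptions rather than the software of the invention: it assumes both reads come from the same strand and that their start coordinates on the template are known, and it marks unsequenced gap positions with "N". The function name and the example reads are hypothetical.

```python
# Illustrative sketch: position-based stitching of two reads from one strand.
def stitch_reads(first_read, first_start, second_read, second_start, gap_symbol="N"):
    """Return (combined_read, gap_length) for two reads with known starts.

    Overlapping reads are merged after checking that the shared bases agree;
    non-overlapping reads are joined with `gap_symbol` filling the gap.
    """
    if second_start < first_start:
        # Order the reads by start coordinate.
        first_read, second_read = second_read, first_read
        first_start, second_start = second_start, first_start

    first_end = first_start + len(first_read)  # exclusive end coordinate
    if second_start >= first_end:
        gap_length = second_start - first_end
        return first_read + gap_symbol * gap_length + second_read, gap_length

    offset = second_start - first_start
    overlap = min(first_end - second_start, len(second_read))
    if first_read[offset:offset + overlap] != second_read[:overlap]:
        raise ValueError("reads disagree in their overlapping region")
    return first_read + second_read[overlap:], 0


if __name__ == "__main__":
    print(stitch_reads("ACGTAC", 0, "GGATTC", 9))  # gapped combined read
    print(stitch_reads("ACGTAC", 0, "TACGGA", 3))  # overlapping combined read
```

- The gapped case corresponds to an assembled sequence comprising a gap of length n; the overlapping case yields a contiguous combined read.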
- A set of nucleotides for controlled extension is a combination of any number of different types of nucleotides, including native, reversibly terminated, or other modified nucleotides, as long as the combination allows controlled (or designed) extension. In other words, a set of nucleotides is any combination of any number of native, reversibly terminated, or otherwise manipulated nucleotides that do not result in runaway extension (unlimited extension). Sometimes, a controlled extension nucleotide set is described as containing no more than three different nucleotides. As used herein, “no more than three different nucleotides” refers to up to three different nucleotides, each having a different base (i.e., three of the A, C, G, T bases or three of the A, C, G, U bases; T and U bases can be considered equivalent in some embodiments). If a nucleotide set contains A, C, T, and U, it contains three different nucleotides because T and U are considered equivalent in some embodiments. If the base of a nucleotide is modified, the modified nucleotide can be classified according to its pairing property. For example, if a dATP is modified in the base, but once incorporated, the base of the modified nucleotide still pairs with a T base, the modified dATP still has the A base.
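- The counting rule in the preceding definition can be expressed as a short check. The sketch below is a hypothetical helper, not part of the specification: it treats T and U as equivalent and expects each modified nucleotide to be supplied as the base it pairs as (for example, a base-modified dATP that still pairs with T is entered as "A"). It does not model reversible terminators.

```python
# Illustrative sketch: count different nucleotides by pairing property.
PAIRING_CLASS = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "T"}  # T and U equivalent


def count_different_nucleotides(nucleotide_set):
    """Number of different nucleotides, classified by pairing property."""
    return len({PAIRING_CLASS[base.upper()] for base in nucleotide_set})


def has_no_more_than_three(nucleotide_set):
    """True if the set contains no more than three different nucleotides."""
    return count_different_nucleotides(nucleotide_set) <= 3


if __name__ == "__main__":
    print(count_different_nucleotides(["A", "C", "T", "U"]))
    print(has_no_more_than_three(["A", "C", "G", "T"]))
```

- With this rule, a set containing A, C, T, and U counts as three different nucleotides, while a set containing A, C, G, and T counts as four.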
- All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
- The novel features of the present invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
- FIG. 1 is a schematic illustrating an exemplary process of sequencing a long nucleic acid.
- FIG. 2 is a schematic illustrating an exemplary process of sequencing a long nucleic acid where the resulting read has a gap.
- FIG. 3 is a schematic illustrating an exemplary process of creating an extended sequencing primer for sequencing.
- FIG. 4 is a schematic illustrating an exemplary process of building an extended sequencing primer by removing a sequencing product by peeling off the sequencing product or by digesting the sequencing product.
- FIG. 5 is a schematic illustrating an exemplary process of building an extended sequencing primer by removing a sequencing product by digesting sequencing product.
- FIG. 6 is a schematic illustrating an exemplary process of building an extended sequencing primer by partial digestion of a sequencing primer.
- FIG. 7 depicts that nucleic acid sequence information can be obtained, processed, analyzed and/or assembled via a computer system.
- FIG. 8 depicts an example of a template and triple base extension reactions. FIG. 8 discloses SEQ ID NOS 1-11, respectively, in order of appearance.
- FIG. 9 depicts an exemplary embodiment of a dark base (native nucleotide) extension experiment design.
- FIG. 10 depicts results of an exemplary embodiment of the present invention, in which 12 steps of 3-base extension resulted in a 124 base pair (bp) product (extension plus primer), wherein the template was an oligonucleotide.
- FIG. 11 depicts results of an exemplary embodiment of the present invention, in which 12 steps of 3-base extension resulted in a 124 bp product (extension plus primer), wherein the template was a PCR product.
- FIG. 12 depicts the percent base calls per sequencing step for lane 1 of an exemplary embodiment of the present invention, where the last step of the dark base extension was a missing T step, and as expected, 100% of the first sequencing base was “T”.
- FIG. 13 depicts the percent base calls per sequencing step for lane 3 of an exemplary embodiment of the present invention, where the last step of the dark base extension was a missing C step, and as expected, 100% of the first sequencing base was “C”.
- FIG. 14 depicts the distribution of dark base extensions in lane 1 (10 steps) and lane 3 (4 steps).
- FIG. 15 depicts the distribution of dark base extensions in lane 4 (10 steps), lane 5 (16 steps) and lane 6 (20 steps) in another exemplary embodiment of the present invention.
- FIG. 16A shows cluster density of different lanes after +S Extension. FIG. 16B shows percentage of cluster pass filter rate. FIG. 16C shows the number of pass filter reads for different lanes. FIG. 16D shows the predicted quality scores of different lanes.
- FIG. 17A shows the 100 bp standard Illumina sequencing run. FIG. 17B shows the additional 100 bp Illumina sequencing run. FIG. 17C shows the number of correct bases was calculated to show changes of overall correct bases as the read length increases in FIG. 17A. FIG. 17D shows the number of correct bases was calculated to show changes of overall correct bases as the read length increases in FIG. 17B.
- FIG. 18 is a summary of Q-scores changing over read length related to Example 6. The x-axis is read length in bp. Y-axis is measured or empirical Q-Score.
- Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this present invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, representative illustrative methods and materials are now described. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells: A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.) Freeman, New York; Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London; Nelson and Cox (2000), Lehninger, (2004) Principles of Biochemistry 4th Ed., W. H. Freeman Pub., New York, N.Y.; and Berg et al. (2006) Biochemistry, 6th Ed., W. H. Freeman Pub., New York, N.Y., all of which are herein incorporated in their entirety by reference for all purposes.
- Provided herein are methods and systems for sequencing a target nucleic acid. Some embodiments of the invention are particularly suitable for sequencing a large number of target nucleic acids simultaneously.
- In one aspect of the invention, methods, kits, computer software products are provided for sequencing long nucleic acids. Nucleic acids are often sequenced using stepwise methods such as polymerase extension based sequencing or ligation sequencing, where one or more bases are read for each sequencing step. These stepwise based sequencing methods are often limited by their stepwise inefficiency, e.g., incomplete incorporation, incomplete ligation and other problems that create prephasing or dephasing. The stepwise inefficiency can accumulate over read length and limits read length.
- For example, reversible terminator nucleotide based sequencing (commercially available from Helicos, Inc., Illumina, Inc., Intelligent Biosystems, Inc./Azco Biotech, Inc. and described in vendor literature and their patent filings and at www.helicosbio.com, www.illumina.com, www.azcobiotech.com) is limited by the efficiency of incorporating reversible terminator nucleotides that are modified in the 3′ hydroxyl group or modified otherwise to interrupt further extension by a polymerase. If the sequencing detection is based upon incorporation of modified nucleotides with an added detectable label such as a fluorescent group, the incorporation efficiency could be further reduced. The problem can be partially alleviated by mixing unlabeled and labeled reversible terminator nucleotides. However, even with improved chemistry and efficiency, the stepwise inefficiency can significantly limit read length and read quality at the end of the read.
- The stepwise efficiency problem can be illustrated with a case where each sequencing step has a constant stepwise efficiency of incorporation of about 99% and there are 1,000 template molecules in a cluster. After the first incorporation step, 10 sequencing primers are not extended and are capped or otherwise no longer involved in sequencing. In such a case, after 100 sequencing steps, only (0.99)^100 = 36.6%, or about 366 molecules, remain in the cluster for additional sequencing. At step 200, only (0.99)^200 = 13.4%, or about 134 molecules, remain in the cluster for additional sequencing. If the efficiency drops to 98%, at step 100 only (0.98)^100 = 13.3% of the molecules are left for additional sequencing reactions, and at step 200 only (0.98)^200 = 1.8% of the molecules can potentially be used for further sequencing.
- For nucleotide limited addition sequencing methods such as pyrophosphate detection based sequencing (commercially available from Roche/454 and described in vendor literature and patent filings and at www.454.com) or pH detection based sequencing (commercially available from Ion Torrent, Inc./Life Technologies, Inc. and described in vendor literature and patent filings), the efficiency can be limited by incomplete incorporation, mis-incorporation, or loss of bound polymerase (fall-off). Stepwise ligation based sequencing has a similar efficiency problem, as stepwise efficiency is limited by, e.g., ligation reaction efficiency and removal of labels.
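- The constant-efficiency illustration above follows a simple geometric decay, which the sketch below recomputes. It assumes, as in the illustration, that a fixed fraction of molecules is lost from sequencing at every step; the function names are hypothetical.

```python
# Illustrative sketch: geometric loss of in-phase molecules per sequencing step.
import math


def fraction_remaining(stepwise_efficiency, steps):
    """Fraction of molecules still available after `steps` sequencing steps."""
    return stepwise_efficiency ** steps


def max_steps(stepwise_efficiency, min_fraction):
    """Largest number of steps that keeps at least `min_fraction` of molecules."""
    return math.floor(math.log(min_fraction) / math.log(stepwise_efficiency))


if __name__ == "__main__":
    cluster_size = 1000
    for efficiency in (0.99, 0.98):
        for steps in (100, 200):
            remaining = fraction_remaining(efficiency, steps)
            print(efficiency, steps, round(remaining * cluster_size))
```

- Because the loss compounds at every step, a one-point drop in stepwise efficiency roughly squares the remaining fraction at a given step, which is why read length is so sensitive to per-step efficiency.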
- In one aspect of the invention, methods, reagents kits, instrument and computer software products are provided to sequence nucleic acids. In some embodiments, two or more segments of a nucleic acid target sequence are obtained sequentially from a template. The segments are then assembled to produce a contiguous sequence or a gapped sequence of the nucleic acid target sequence.
FIG. 1 illustrates the process in some embodiments. A part (102) of the target nucleic acid (101) is sequenced (FIG. 1A). Another part (103) of the target nucleic acid (101) is also sequenced (FIG. 1B). The process can be repeated (FIG. 1C) many times. As shown in FIG. 1, the sequenced parts overlap, so the sequences can be assembled based upon overlapping sequences and/or other information.
- In some embodiments, a large number of target nucleic acids (e.g. at least 10, 100, 1,000, 10,000, 100,000, or 1,000,000) is sequenced simultaneously. These target nucleic acids can be DNA, RNA or modified nucleic acids. While they can be sequenced as single molecules, they can also be sequenced as clones or clusters. Each of the clones or clusters (e.g. on beads) is derived from a single nucleic acid molecule. Methods for sequencing a large number of target nucleic acids in single molecule or clonal molecular clusters or beads are well known in the art. For simplicity of illustration, some embodiments may be described using singular terms such as “a target nucleic acid” or “an extension primer”; one of skill in the art would appreciate that many of the embodiments can be used to sequence many target nucleic acids simultaneously or sequentially and such sequencing may be performed on copies (more than 10, 100, 1,000, 100,000 copies) of the target nucleic acids.
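- The overlap-based assembly of sequenced parts described for FIG. 1 can be sketched as follows. This is a minimal illustration, not the software product of the specification: it assumes error-free reads supplied in template order, exact suffix/prefix matching, and a hypothetical minimum overlap of 3 bases; `merge_by_overlap` and `assemble` are illustrative names.

```python
from typing import List, Optional


def merge_by_overlap(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two reads when a suffix of `left` equals a prefix of `right`.

    The longest exact overlap is preferred. Returns None when no overlap of
    at least `min_overlap` bases exists (min_overlap must be >= 1).
    """
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def assemble(reads: List[str], min_overlap: int = 3) -> str:
    """Chain reads, given in template order, into one contiguous sequence."""
    contig = reads[0]
    for read in reads[1:]:
        merged = merge_by_overlap(contig, read, min_overlap)
        if merged is None:
            raise ValueError("reads do not overlap; a gapped assembly is needed")
        contig = merged
    return contig


if __name__ == "__main__":
    # Three short made-up reads standing in for parts 102, 103 and 104.
    print(assemble(["ACGTTGCA", "TGCAGGAT", "GGATCCTA"]))
```

A practical assembler would also tolerate sequencing errors and validate overlaps against other information, such as a reference sequence or known primer start positions.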
- A computer software product is generally used to assemble the sequences when the amount of data is quite large. The computer software product typically inputs the raw sequences for each of the target nucleic acids and assembles contiguous sequences upon finding overlapping regions and optionally validating the overlapping regions using additional information such as alignment with a reference sequence, information about the starting position of the sequencing run or relative positional difference among sequencing runs. The resulting contiguous sequence (105) can be further validated by, for example, alignment with a reference sequence for the target nucleic acid. The sequencing can be performed using, for example, stepwise sequencing methods discussed earlier. While the individual sequencing runs (such as 102, 103, and 104) have read length limitations based on the underlying sequencing readout technologies, the assembled contiguous sequence can be significantly longer at for example, greater than 1.5, 2, 3, 4, or 5× of the individual sequencing reads (102, 103, and 104). The individual sequencing runs can be carried out sequentially. In some embodiments, the order of the sequencing runs is not important. For example, the step in
FIG. 1C can be performed before the step in FIG. 1A. If the target nucleic acid is copied to several distinct locations, the sequencing runs using alternative sequencing primers may also be carried out in parallel.
- The individual sequencing reads do not have to overlap.
FIG. 2 illustrates the sequencing of a long nucleic acid by three independent sequencing runs. Sequencing reads 202 and 203 do not overlap and the resulting assembled sequence 205 has a gap. In some embodiments, the computer software product provided can output the sequence with the gap, but can also estimate the size of the gap based upon alignment to a reference sequence. The positional difference between the sequencing reads can be estimated, for example, based upon different sequencing primer starting positions. The positional difference can be used to estimate the gap size.
- Because individual sequencing runs can be carried out independently, each sequencing run resets the sequencing start conditions and is not affected or less affected by cumulative inefficiency or errors. By segmenting the sequencing of a target nucleic acid, sequencing methods and chemistries that have inherent length limitations can be used to sequence a target nucleic acid, obtaining longer sequence information than the original length limitations of these sequencing methods and chemistries would allow. For example, for a reversible terminator sequencing chemistry with a sequencing length limitation of 250 bases, a 1,000 base long target nucleic acid can be sequenced contiguously by carrying out the 250 base long
reversible terminator sequencing 4 or more times. In various embodiments, the total read length from a single template can be up to 100, 200, 250, 500, 1000, 2000 bases or more. - In another aspect of the invention, methods and reagent kits are provided for building sequencing primers. The resulting sequencing primers can be of varying length. Different sequencing primers for the same target nucleic acid can be used to sequence different segments of the target nucleic acid.
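- Two pieces of arithmetic from the preceding discussion of segmented sequencing can be sketched in code: the number of runs needed to cover a target with reads of limited length, and the gap between two non-overlapping reads estimated from their starting positions. This is a simplified sketch; the function names and the optional `overlap` parameter are assumptions introduced for illustration.

```python
import math


def runs_needed(target_length: int, read_length: int, overlap: int = 0) -> int:
    """Number of sequencing runs needed to cover `target_length` bases when
    each run reads `read_length` bases and consecutive reads share `overlap` bases."""
    if read_length <= overlap:
        raise ValueError("read_length must exceed overlap")
    if target_length <= read_length:
        return 1
    step = read_length - overlap  # new bases contributed by each additional run
    return 1 + math.ceil((target_length - read_length) / step)


def gap_size(start_a: int, length_a: int, start_b: int) -> int:
    """Bases between the end of read A and the start of read B.
    A negative value means the two reads overlap by that many bases."""
    return start_b - (start_a + length_a)


if __name__ == "__main__":
    print(runs_needed(1000, 250))              # non-overlapping 250-base reads
    print(runs_needed(1000, 250, overlap=25))  # reads sharing 25 bases
    print(gap_size(0, 250, 300))               # second primer starts at base 300
```

With no overlap, four 250-base runs cover a 1,000-base target, matching the example in the text; requiring overlap for assembly increases the number of runs.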
- In some embodiments, an extension primer hybridized to a target nucleic acid is provided. In one embodiment, the extension primer is extended by controlled extension. Controlled extensions can be performed using polymerase extension reactions, stepwise ligation reactions and other methods. For polymerase extension reaction, controlled extension can be performed by, for example, three nucleotide cycles or by reversible terminator reactions. Controlled extension is also described in great detail in a section below and throughout the specification.
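- One way to picture controlled extension by three nucleotide cycles is sketched below. The sketch rests on an assumption that is not spelled out in this passage: in each cycle the polymerase is supplied with three of the four nucleotides, so extension halts at the first template position that requires the missing one. The template is written in the direction the primer travels, and all names are hypothetical.

```python
from typing import Iterable

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_once(template: str, primer_end: int, missing: str) -> int:
    """Advance the primer end along `template` until the next base to be
    incorporated is the `missing` nucleotide. Returns the new primer end index."""
    position = primer_end
    while position < len(template) and COMPLEMENT[template[position]] != missing:
        position += 1
    return position


def controlled_extension(template: str, primer_end: int, missing_per_cycle: Iterable[str]) -> int:
    """Run successive three-nucleotide cycles, each lacking a different nucleotide."""
    for missing in missing_per_cycle:
        primer_end = extend_once(template, primer_end, missing)
    return primer_end


if __name__ == "__main__":
    # Template bases ahead of the primer; cycles omit T, then A, then C.
    print(controlled_extension("CCATGGA", 0, ["T", "A", "C"]))
```

Because every stop position is determined by the template sequence, all copies in a clonal cluster would be expected to halt at the same base, which is what keeps the extended primers in phase.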
- The extended extension primer can be used for sequencing.
FIG. 3 illustrates some embodiments of this process. FIG. 3A shows that a target nucleic acid (301) is hybridized with an extension primer (302). In FIG. 3B, the extension primer (302) is then extended by a number of bases using one or more nucleic acid polymerization reactions or by one or more ligation reactions to produce an extended primer (302 and 303, where 303 is the extended portion). The extended primer (302, 303) is then used as a sequencing primer for sequencing (FIG. 3C, sequencing product is shown as 304).
- In some embodiments, a target nucleic acid is hybridized with a sequencing product (such as the product resulting from
FIG. 3C). The sequencing product can be the result of reversible terminator sequencing or nucleotide addition sequencing. Typically, in a clonal cluster of the target nucleic acids, sequencing products of different length may be hybridized with the target nucleic acid copies in the clonal cluster because of the inefficiencies of sequencing reactions which result in, for example, dephased or prephased products. One of skill in the art would appreciate that, while embodiments of the invention are often described using singular terms, typical sequencing reactions can be carried out using molecular clones, where each of the clones contains a large number of copies of the same molecule with small variations because of errors in bridge amplifications, emulsion PCRs, rolling circle amplifications and other amplification reactions. One of skill in the art would also appreciate that a large number of target nucleic acids and thus a large number of molecular clonal clusters are sequenced simultaneously in a massively parallel fashion.
- Such a sequencing product (or in the case of sequencing clusters, products) can be removed before an extension primer is hybridized to the sequencing template.
FIG. 4 illustrates some embodiments of the process. In FIG. 4A, a sequencing template (401) is hybridized with a sequencing primer (402) and the sequencing primer is used for sequencing, which results in a sequencing product (403). The sequencing primer (402) and sequencing product (403) structure is removed by denaturation or by enzymatic digestion (FIG. 4B). Methods for removing a strand of nucleic acid from a double strand nucleic acid structure are well known in the art. For example, the sequencing structure can be denatured by contacting it with a NaOH solution (e.g., about 0.1 N NaOH) or another denaturation reagent. The sequencing product structure can also be removed by exonuclease digestion or other enzymatic treatment. If enzymatic digestion is used, the target nucleic acid strand can be protected using, for example, protecting bases in the 5′ and/or 3′ end. In many cases, the template is immobilized on a substrate so that only one end could be potentially susceptible to nuclease digestion. In some cases, protecting the template is not necessary because certain exonucleases only digest in a particular orientation (5′-3′ or 3′-5′). For example, exonuclease III predominantly digests recessed 3′ ends of double strand DNA. If the target nucleic acid is immobilized at its 3′ end, it may not be necessary to protect the 5′ end. After the sequencing product is removed, an extension primer can be hybridized and extended (FIG. 4C) as described above and detailed in following sections to produce an extended primer, which can serve as a primer for sequencing (FIG. 4D).
- In some other embodiments, a sequencing product structure does not need to be completely removed. It can be partially removed. As shown in
FIGS. 5 and 6, the sequencing product part (503 or 603) may be completely (FIG. 6) or partially removed (FIG. 5, 505 is smaller than 503). The sequencing primer part (502 or 602) can be the product of earlier extension reactions such as those described in FIGS. 3, 4, 5 and 6. Partial digestion of nucleic acids may be achieved using exonuclease digestion (such as Exonuclease III digestion). If a synthetic primer was used as 502, the last base can be a base that cannot be digested by an exonuclease. For example, if the orientation from 502 to 503 is 5′ to 3′, the last base of the 502 part can be connected using a thiol bond which is resistant to certain exonuclease digestion. It is well known that alpha-thiophosphate-containing phosphodiester bonds are resistant to hydrolysis by the 3′-to-5′ exonucleolytic activity of phage T4 DNA polymerase and exonuclease III. A thiophosphate containing diester bond can also be produced by incorporating one or more thiotriphosphate nucleotides in the desired position(s). As reported by Yang et al., (2007), “Nucleoside Alpha-Thiotriphosphates, Polymerases and the Exonuclease III Analysis of Oligonucleotides Containing Phosphorothioate Linkages”, Nucleic Acids Research, 2007, Vol. 35: 3118-3127, incorporated herein by reference, the pure S-diastereomer form of thiotriphosphate is recommended because the R-diastereomer form may be labile to Exonuclease III digestion. -
FIG. 5B illustrates the partial digestion of sequencing product. For example, during sequencing, a nucleotide thiotriphosphate can be incorporated into one or more specific positions. In reversible terminator sequencing, the reversible terminator nucleotide can be a nucleotide thiotriphosphate. This position can be used to terminate an exonuclease digestion in the step illustrated in FIG. 5B. Partial removal of sequencing products can be useful where the early steps of sequencing do not introduce too many prephasing or dephasing or other inefficiencies. It can reduce the need for extension steps illustrated in FIG. 5C because the total size of 504 plus 505 is longer than 405 in FIG. 4, and it can extend the next sequencing (506) further than 406. However, by incorporating part of the sequencing product (505), if the 504 fragments in a cluster vary too much in length, the process may affect the subsequent sequencing quality.
- In one aspect, the present invention provides a method for sequencing a target nucleic acid molecule or a collection of target nucleic acids. By “target nucleic acid molecule”, “target molecule”, “target polynucleotide”, “target polynucleotide molecule” or grammatical equivalents thereof, as used herein, is meant a nucleic acid of interest. Target nucleic acid, for example, can be DNA or RNA or any synthetic structure that has properties similar to those of DNA or RNA. Sequencing, as used herein, refers to the determination of at least a single base, at least 2 consecutive bases, at least 10 consecutive bases or at least 25 consecutive bases in a target nucleic acid. Sequencing accuracy can be at least 65%, 75%, 85%, 95%, 99%, 99.9% or 99.99% overall or per base. Sequencing can be performed directly on a target nucleic acid or on a nucleic acid derived from target nucleic acids. In some applications, a large number of target nucleic acids, such as at least 1,000, 10,000, 100,000 or 1,000,000 target nucleic acids, are simultaneously sequenced.
- In some embodiments, a target nucleic acid is genomic DNA derived from the genetic material in the chromosomes of a particular organism and/or in nonchromosomal genetic materials such as mitochondrial DNA. A genomic clone library is a collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism. A genomic library is a collection of at least 2%, 5%, 10%, 30%, 50%, 70%, 80%, or 90% of the sequence or sequences in the genomic DNA of an organism.
- Target nucleic acids include naturally occurring or genetically altered or synthetically prepared nucleic acids (such as genomic DNA from a mammalian disease model). Target nucleic acids can be obtained from virtually any source and can be prepared using methods known in the art. For example, target nucleic acids can be directly isolated without amplification using methods known in the art, including without limitation extracting a fragment of genomic DNA from an organism (e.g. a cell or bacteria) to obtain target nucleic acids. In another example, target nucleic acids can also be isolated by amplification using methods known in the art, including without limitation polymerase chain reaction (PCR), whole genome amplification (WGA), multiple displacement amplification (MDA), rolling circle amplification (RCA), rolling circle replication (RCR) and other amplification methodologies. Target nucleic acids may also be obtained through cloning, including cloning into vehicles such as plasmids, yeast, and bacterial artificial chromosomes. “Amplification” refers to any process by which the copy number of a target sequence is increased. Amplification can be performed by any means known in the art. Methods for primer-directed amplification of target polynucleotides are known in the art, and include without limitation, methods based on the polymerase chain reaction (PCR). Examples of PCR techniques that can be used include, but are not limited to, quantitative PCR, quantitative fluorescent PCR (QF-PCR), multiplex fluorescent PCR (MF-PCR), real time PCR (RT-PCR), single cell PCR, restriction fragment length polymorphism PCR (PCR-RFLP), PCR-RFLP/RT-PCR-RFLP, hot start PCR, nested PCR, in situ polony PCR, in situ rolling circle amplification (RCA), bridge PCR, picotiter PCR and emulsion PCR.
Conditions favorable to the amplification of target sequences by PCR are known in the art, can be optimized at a variety of steps in the process, and depend on characteristics of elements in the reaction, such as target type, target concentration, sequence length to be amplified, sequence of the target and/or one or more primers, primer length, primer concentration, polymerase used, reaction volume, ratio of one or more elements to one or more other elements, and others, some or all of which can be altered. In general, PCR involves the steps of denaturation of the target to be amplified (if double stranded), hybridization of one or more primers to the target, and extension of the primers by a DNA polymerase, with the steps repeated (or “cycled”) in order to amplify the target sequence. Steps in this process can be optimized for various outcomes, such as to enhance yield, decrease the formation of spurious products, and/or increase or decrease specificity of primer annealing. Methods of optimization are well known in the art and include adjustments to the type or amount of elements in the amplification reaction and/or to the conditions of a given step in the process, such as temperature at a particular step, duration of a particular step, and/or number of cycles. In some embodiments, an amplification reaction comprises at least 5, 10, 15, 20, 25, 30, 35, 50, or more cycles. In some embodiments, an amplification reaction comprises no more than 5, 10, 15, 20, 25, 35, 50, or more cycles. Cycles can contain any number of steps, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more steps. Steps can comprise any temperature or gradient of temperatures, suitable for achieving the purpose of the given step, including but not limited to, 3′ end extension (e.g. adapter fill-in), primer annealing, primer extension, and strand denaturation. 
Steps can be of any duration, including but not limited to about, less than about, or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 180, 240, 300, 360, 420, 480, 540, 600, or more seconds, including indefinitely until manually interrupted. Cycles of any number comprising different steps can be combined in any order. In some embodiments, different cycles comprising different steps are combined such that the total number of cycles in the combination is about, less than about, or more than about 5, 10, 15, 20, 25, 30, 35, 50, or more cycles. Other suitable amplification methods include the ligase chain reaction (LCR), transcription amplification, self-sustained sequence replication, selective amplification of target polynucleotide sequences, consensus sequence primed polymerase chain reaction (CP-PCR), arbitrarily primed polymerase chain reaction (AP-PCR), degenerate oligonucleotide-primed PCR (DOP-PCR) and nucleic acid based sequence amplification (NABSA). Other amplification methods that can be used herein include those described in U.S. Pat. Nos. 5,242,794; 5,494,810; 4,988,617; and 6,582,938. In some embodiments, the amplification is performed inside a cell.
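- The relationship between PCR cycle number and copy number discussed above can be approximated with the standard idealized model in which each cycle multiplies the template count by (1 + efficiency). The sketch below uses that textbook model, which is not taken from this specification and ignores plateau effects; `pcr_copies` is a hypothetical helper name.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` PCR cycles.

    `efficiency` is the fraction of templates duplicated per cycle
    (1.0 means perfect doubling). Reagent depletion and plateau are ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare perfect doubling with 90% per-cycle efficiency from one template.
    for cycles in (5, 10, 20, 30):
        print(cycles, pcr_copies(1, cycles), round(pcr_copies(1, cycles, efficiency=0.9)))
```

Under this model the yield is very sensitive to cycle number and per-cycle efficiency, which is one reason the number of cycles is a common parameter to adjust.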
- In any of the embodiments, amplification may occur on a support, such as a bead or a surface. In any of the embodiments herein, targets may be amplified from an extract of a single cell.
- Target nucleic acids may also have an exogenous sequence, such as a universal primer sequence or barcode sequence introduced during, for example, library preparation via a ligation or amplification process. The term “sequencing template” used herein may refer to the target nucleic acid itself or to a nucleotide sequence that is identical or substantially similar to the nucleotide sequence of a fragment of a target nucleic acid or the complement of a target nucleic acid. In one embodiment, the target nucleic acid molecule comprises ribonucleic acid (RNA).
- In one embodiment, the target polynucleotide is genomic DNA or a portion of the genomic DNA. While one embodiment is for sequencing a whole genome, such as at more than 50% coverage, these embodiments are also suitable for sequencing a targeted region such as genomic regions relating to drug metabolism. In one example, the target polynucleotide is human genomic DNA.
- Target nucleic acid, as used herein, can also refer to nucleic acid structures for sequencing. Such structures typically comprise adaptor sequences on one or both ends of target nucleic acid sequences. For example, a sequence derived from the genomic DNA of a sample, or derived from an RNA molecule of a sample, may be ligated with amplification and/or sequencing adaptor(s). Library construction methods are well known in the art. Nucleic acid sequencing libraries may be amplified in clonal fashion on substrates using bridge amplifications, emulsion PCR amplifications, rolling circle amplifications or other amplification methods. Such processes may be performed manually or using automation equipment such as the cBot (Illumina, Inc.) or OneTouch™ (Ion Torrent).
- “Nucleic acid” or “oligonucleotide” or “polynucleotide” or grammatical equivalents typically refer to at least two nucleotides covalently linked together. A nucleic acid of the present invention will generally contain phosphodiester bonds, although in some cases, as outlined below (for example in the construction of primers and probes such as label probes), nucleic acid analogs are included that may have alternate backbones, comprising, for example, phosphoramide (see e.g. Beaucage et al., Tetrahedron 49(10):1925 (1993); Letsinger, J. Org. Chem. 35:3800 (1970); Sprinzl et al., Eur. J. Biochem. 81:579 (1977); Letsinger et al., Nucl. Acids Res. 14:3487 (1986); Sawai et al, Chem. Lett. 805 (1984). Letsinger et al., J. Am. Chem. Soc. 110:4470 (1988); and Pauwels et al., Chemica Scripta 26:141 (1986)), phosphorothioate (Mag et al., Nucleic Acids Res. 19:1437 (1991); and U.S. Pat. No. 5,644,048), phosphorodithioate (see e.g. Briu et al., J. Am. Chem. Soc. 111:2321 (1989)), O-methylphophoroamidite linkages (see e.g. Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press), and peptide nucleic acid (also referred to herein as “PNA”) backbones and linkages (see e.g. Egholm, J. Am. Chem. Soc. 114:1895 (1992); Meier et al., Chem. Int. Ed. Engl. 31:1008 (1992); Nielsen, Nature, 365:566 (1993); Carlsson et al., Nature 380:207 (1996)).
- Other analog nucleic acids include those with bicyclic structures including locked nucleic acids, also referred to herein as “LNA”, (see e.g. Koshkin et al., J. Am. Chem. Soc. 120:13252-3 (1998)); positive backbones (Dempcy et al., Proc. Natl. Acad. Sci. USA 92:6097 (1995)); non-ionic backbones (see e.g. U.S. Pat. Nos. 5,386,023, 5,637,684, 5,602,240, 5,216,141 and 4,469,863; Kiedrowski et al., Angew. Chem. Intl. Ed. English 30:423 (1991)); Letsinger et al., J. Am. Chem. Soc. 110:4470 (1988); Ed. Y. S. Sanghvi and P. Dan Cook; Mesmaeker et al., Bioorganic & Medicinal Chem. Lett. 4:395 (1994); Jeffs et al., J. Biomolecular NMR 34:17 (1994); Tetrahedron Lett. 37:743 (1996)) and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and
Chapters - Nucleic acids containing one or more carbocyclic sugars are also included within the definition of nucleic acids (see e.g. Jenkins et al., Chem. Soc. Rev. (1995) pp. 169-176). Several nucleic acid analogs are described in Rawls, C & E News Jun. 2, 1997
page 35.
- The target nucleic acids may be single stranded or double stranded, as specified, or contain portions of both double stranded and single stranded sequence. Depending on the application, the nucleic acids may be DNA (including genomic and cDNA), RNA (including mRNA and rRNA) or a hybrid, where the nucleic acid contains any combination of deoxyribo- and ribo-nucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine, isoguanine, etc.
- In one embodiment, the methods of the present invention comprise capture of a target polynucleotide. The target polynucleotide may be from a known region of the genome. In one embodiment, oligonucleotide probes can be immobilized on beads, and these oligonucleotide beads, which are inexpensive and reusable, can be used to capture the target genomic polynucleotide. In another embodiment, microarrays are used to capture the target polynucleotide.
- In one embodiment, the target polynucleotide may be fragmented to a suitable length or plurality of suitable lengths, such as approximately between 100-200, 200-300, 300-500, 500-1000, 1000-2000 or more bases in length.
- In one embodiment, the target polynucleotide is prepared by whole genome amplification (WGA) (see for example, Hawkins et al.: Whole genome amplification—applications and advances. Curr. Opin. Biotechnol. 2002 February; 13(1): 65-7)). In another embodiment, the target polynucleotide is prepared by whole genome sampling assay (WGSA). Generally, the WGSA reduces the complexity of a nucleic acid sample by amplifying a subset of the fragments in the sample. A nucleic acid sample is fragmented with one or more restriction enzymes and an adapter is ligated to both ends of the fragments. A primer that is complementary to the adapter sequence is used to amplify the fragments using PCR. During PCR fragments of a selected size range are selectively amplified. The size range may be, for example, 400-800 or 400 to 2000 base pairs. Fragments that are outside the selected size range are not efficiently amplified. The fragments that are amplified by WGSA may be predicted by in silico digestion and restriction enzyme combinations may be selected so that the resulting WGSA amplified fragments may represent the genomic regions of specific interests. The resulting library, often having desired adaptor sequences (including optional barcode sequences and sequencing primer hybridization site(s)) may be used for sequencing and for hybridizing with a genotyping array. In such embodiments, the library can be used for sequencing and the detected SNPs or indels can be validated by hybridizing the same library with an array. WGSA is disclosed in Kennedy et al. (2003), Nat. Biotechnol. Vol., pp. 1233-1237, and U.S. patent application Ser. Nos. 10/316,517, 10/442,021, 10/463,991, 10/316,629 and U.S. Pat. Nos. 6,361,947, 6,548,810, 7,267,966, 7,297,778, and 7,300,788, all of which are herein incorporated by reference.
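- The in silico digestion and size selection mentioned above for predicting WGSA fragments can be sketched as follows. This is a simplified illustration: it treats a single linear strand, assumes complete digestion, and uses XbaI (recognition sequence TCTAGA, cut after the first T) only as an example enzyme; the function names and the toy sequence are hypothetical.

```python
from typing import List


def digest(sequence: str, site: str, cut_offset: int) -> List[str]:
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases from the start of the site to the cut.
    Fragments are returned in order and concatenate back to the input.
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


def select_by_size(fragments: List[str], min_length: int, max_length: int) -> List[str]:
    """Keep fragments whose length falls inside the amplifiable size range."""
    return [f for f in fragments if min_length <= len(f) <= max_length]


if __name__ == "__main__":
    # Toy sequence with two XbaI sites; real use would apply a 400-800 bp window.
    toy = "AAAA" + "TCTAGA" + "C" * 10 + "TCTAGA" + "GG"
    pieces = digest(toy, "TCTAGA", 1)
    print([len(p) for p in pieces])
    print(select_by_size(pieces, 10, 20))
```

Running the same prediction with different enzymes or enzyme combinations is one way to choose a digestion that enriches for genomic regions of interest.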
- In one embodiment, the target polynucleotide or a collection of target polynucleotides is prepared by PCR, such as long-range PCR. Long range PCR allows the amplification of PCR products, which are much larger than those achieved with conventional Taq polymerases. Generally, up to 27 kb fragments from good quality genomic DNA can be prepared, although 10-20 kb fragments are routinely achievable, given the appropriate conditions. In some embodiments, a fragment greater than 27 kb is obtained. The method typically relies on a mixture of thermostable DNA polymerases, usually Taq DNA polymerase for high processivity (i.e. 5′-3′ polymerase activity) and another DNA polymerase with 3′-5′ proofreading abilities (usually Pwo). This combination of features allows longer primer extension than can be achieved with Taq alone.
- In one embodiment, the target polynucleotide is prepared by locus-specific multiplex PCR. Multiplex locus specific amplification can be used to amplify a plurality of pre-selected target sequences from a complex background of nucleic acids. The targets are selected for amplification using splint oligonucleotides that are used to modify the ends of the fragments. The fragments have known end sequences and the splints are designed to be complementary to the ends. The splint can bring the ends of the fragment together and the ends are joined to form a circle. The splint can also be used to add a common priming site to the ends of the target fragments. Specific loci are amplified and can be subsequently analyzed.
- In yet another embodiment, target polynucleotides are produced using multiplex PCR and each of the PCR fragments is labeled with a tag sequence. Such tag sequence can be added as a part of one of the primers used for the PCR. Therefore, each resulting PCR fragment can be uniquely identified. Such applications can be useful for the identification of species, such as microbial species.
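- Identifying PCR fragments by their tag sequences, as described above, amounts to grouping reads by tag. The sketch below assumes a fixed-length tag at the start of each read and exact matching; the tag sequences, sample names and the function `demultiplex` are made up for illustration.

```python
from typing import Dict, List, Optional


def demultiplex(reads: List[str], tag_to_sample: Dict[str, str],
                tag_length: int) -> Dict[Optional[str], List[str]]:
    """Group reads by their leading tag and strip the tag from each read.

    Reads whose tag is not in `tag_to_sample` are collected under the key None.
    """
    groups: Dict[Optional[str], List[str]] = {}
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)  # None when the tag is unknown
        groups.setdefault(sample, []).append(insert)
    return groups


if __name__ == "__main__":
    tags = {"ACGT": "species_A", "TTAG": "species_B"}  # hypothetical tags
    reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCGGG", "GGGGTTTTAA"]
    for sample, inserts in demultiplex(reads, tags, 4).items():
        print(sample, inserts)
```

Real tag sets are usually designed so that a single sequencing error cannot convert one tag into another; exact matching is used here only to keep the example short.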
- Other suitable amplification methods include but are not limited to the ligase chain reaction (LCR) (e.g., Wu and Wallace,
Genomics 4, 560 (1989), Landegren et al., Science 241, 1077 (1988) and Barringer et al. Gene 89:117 (1990)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989) and WO88/10315), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87, 1874 (1990) and WO90/06995), selective amplification of target polynucleotide sequences (U.S. Pat. No. 6,410,276), consensus sequence primed polymerase chain reaction (CP-PCR) (U.S. Pat. No. 4,437,975), arbitrarily primed polymerase chain reaction (AP-PCR) (U.S. Pat. Nos. 5,413,909, 5,861,245) and nucleic acid based sequence amplification (NABSA) (see U.S. Pat. Nos. 5,409,818, 5,554,517, and 6,063,603, each of which is incorporated herein by reference). Other amplification methods that may be used are described in U.S. Pat. Nos. 5,242,794, 5,494,810, 4,988,617 and in U.S. Ser. No. 09/854,317, each of which is incorporated herein by reference. Additional methods of sample preparation and techniques for reducing the complexity of a nucleic sample are described in Dong et al., Genome Research 11, 1418 (2001), in U.S. Pat. Nos. 6,361,947, 6,391,592, 6,632,611, 6,872,529, 6,958,225 and U.S. Ser. No. 09/916,135.
- Naturally-existing targets can be assayed directly in cell lysates, in nucleic acid extracts, or after partial purification of fractions of nucleic acids so that they are enriched in targets of interest. In one example, the target polynucleotide is human genomic DNA. The polynucleotide target to be detected can be unmodified or modified. Useful modifications include, without limitation, radioactive and fluorescent labels as well as anchor ligands such as biotin or digoxigenin. The modification(s) can be placed internally or at either the 5′ or 3′ end of the targets. Target modification can be carried out post-synthetically, either by chemical or enzymatic reaction such as ligation or polymerase-assisted extension.
Alternatively, the internal labels and anchor ligands can be incorporated into an amplified target or its complement directly during enzymatic polymerization reactions using small amounts of modified NTPs as substrates.
- The target polynucleotide can be isolated from a subject. The subject is not limited to a human being but may also be other organisms including but not limited to mammals, plants, bacteria, virus or fungi. In one example, the target polynucleotide is genomic DNA extracted from a human.
- The input nucleic acid can be DNA, or complex DNA, for example genomic DNA. The input DNA may also be cDNA. The cDNA can be generated from RNA, e.g., mRNA. The input DNA can be of a specific species, for example, human, rat, mouse, other animals, plants, bacteria, algae, viruses, and the like. The input nucleic acid also can be from a mixture of genomes of different species such as host-pathogen, bacterial populations and the like. The input DNA can be cDNA made from a mixture of genomes of different species. Alternatively, the input nucleic acid can be from a synthetic source. The input DNA can be mitochondrial DNA. The input DNA can be cell-free DNA. The cell-free DNA can be obtained from, e.g., a serum or plasma sample. The input DNA can comprise one or more chromosomes. For example, if the input DNA is from a human, the DNA can comprise one or more of
chromosome - The different samples from which the target polynucleotides are derived can comprise multiple samples from the same individual, samples from different individuals, or combinations thereof. In some embodiments, a sample comprises a plurality of polynucleotides from a single individual. In some embodiments, a sample comprises a plurality of polynucleotides from two or more individuals. An individual is any organism or portion thereof from which target polynucleotides can be derived, non-limiting examples of which include plants, animals, fungi, protists, monerans, viruses, mitochondria, and chloroplasts. Sample polynucleotides can be isolated from a subject, such as a cell sample, tissue sample, or organ sample derived therefrom, including, for example, cultured cell lines, biopsy, blood sample, or fluid sample containing a cell. The subject may be an animal, including but not limited to, an animal such as a cow, a pig, a mouse, a rat, a chicken, a cat, a dog, etc., and is usually a mammal, such as a human. Samples can also be artificially derived, such as by chemical synthesis. In some embodiments, the samples comprise DNA. In some embodiments, the samples comprise genomic DNA. In some embodiments, the samples comprise mitochondrial DNA, chloroplast DNA, plasmid DNA, bacterial artificial chromosomes, yeast artificial chromosomes, oligonucleotide tags, or combinations thereof. In some embodiments, the samples comprise DNA generated by primer extension reactions using any suitable combination of primers and a DNA polymerase, including but not limited to polymerase chain reaction (PCR), reverse transcription, and combinations thereof. Where the template for the primer extension reaction is RNA, the product of reverse transcription is referred to as complementary DNA (cDNA). Primers useful in primer extension reactions can comprise sequences specific to one or more targets, random sequences, partially random sequences, and combinations thereof. 
Reaction conditions suitable for primer extension reactions are known in the art. In general, sample polynucleotides comprise any polynucleotide present in a sample, which may or may not include target polynucleotides.
- Methods for the extraction and purification of nucleic acids are well known in the art. For example, nucleic acids can be purified by organic extraction with phenol, phenol/chloroform/isoamyl alcohol, or similar formulations, including TRIzol and TriReagent. Other non-limiting examples of extraction techniques include: (1) organic extraction followed by ethanol precipitation, e.g., using a phenol/chloroform organic reagent (Ausubel et al., 1993), with or without the use of an automated nucleic acid extractor, e.g., the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif.); (2) stationary phase adsorption methods (U.S. Pat. No. 5,234,809; Walsh et al., 1991); and (3) salt-induced nucleic acid precipitation methods (Miller et al., 1988), such precipitation methods being typically referred to as “salting-out” methods. Another example of nucleic acid isolation and/or purification includes the use of magnetic particles to which nucleic acids can specifically or non-specifically bind, followed by isolation of the beads using a magnet, and washing and eluting the nucleic acids from the beads (see, e.g., U.S. Pat. No. 5,705,628). In some embodiments, the above isolation methods may be preceded by an enzyme digestion step to help eliminate unwanted protein from the sample, e.g., digestion with proteinase K or other like proteases. See, e.g., U.S. Pat. No. 7,001,724. If desired, RNase inhibitors may be added to the lysis buffer. For certain cell or sample types, it may be desirable to add a protein denaturation/digestion step to the protocol. Purification methods may be directed to isolate DNA, RNA, or both. When both DNA and RNA are isolated together during or subsequent to an extraction procedure, further steps may be employed to purify one or both separately from the other. Sub-fractions of extracted nucleic acids can also be generated, for example, purification by size, sequence, or other physical or chemical characteristic. In addition to an initial nucleic acid isolation step, purification of nucleic acids can be performed after any step in the methods of the invention, such as to remove excess or unwanted reagents, reactants, or products.
- Controlled Primer Extension
- A controlled extension is an increase in the length of an extension primer by a defined length or defined distance. As used herein, defined length refers to a length of extension that is dependent upon the extension conditions and may be dependent upon the template sequence. For an extension reaction, a defined length of the extension may not be known, but can be determined. For example, a single step of three nucleotide extension can extend the primer to a position where a missing nucleotide is needed for correct further extension. Such a position is dependent upon the nucleotide combination and the template sequence and is thus defined. But it may not be known if the template sequence is unknown and the extension product has not been measured. Once the template or target nucleic acid sequence is determined, the extension length can be estimated.
- In some other embodiments, however, the defined length may be independent of the template sequence. For example, if the controlled extension is carried out by stepwise ligation reactions, the defined extension length could be independent of the template sequence. There are many ways to carry out stepwise ligation to grow a primer. In one example, a random hexamer (a collection of hexamers with random sequences) is ligated to the 5′ end of the extension primer. The random hexamer does not have a 5′ phosphate, so it cannot be ligated to an already extended primer (the added hexamer does not provide a 5′ phosphate). The 5′ phosphate can be added with a kinase reaction, and the extended primer is then ready for another extension. In this example, each extension step adds 6 bases. Similar stepwise ligation can be performed at the 3′ end of the extension primer.
- For a clonal cluster of molecules for sequencing, the controlled extensions are at least 55%, 65%, 70%, 75%, 80%, 85%, 95%, 98%, 99%, 99.9%, or 99.99% synchronized, because at least a majority of the molecules in a cluster are extended to the same length in each step.
- In some embodiments, a controlled primer extension is performed using polymerization. In such embodiments, the extension primer is extended from its 3′ end in the 5′-3′ orientation. In some embodiments, long nucleic acids are sequenced by incorporating sequence reads that are obtained using one or more of the controlled primer extension reactions. In some embodiments, controlled primer extension comprises the use of native nucleotides or modified nucleotides.
- In one embodiment, a series of sequential reactions is performed such that each reaction of the series extends an extension primer, such as a deoxyribonucleic acid (DNA) primer or a sequencing primer, to a different length to create incremental sequences complementary to a sequencing template (the target nucleic acid or target polynucleotide molecule). For each of the extension reactions (often with incremental number of steps), the extension primer may be the same or similar to other(s) in the series. As used herein, two similar primers may target the same region of the target nucleic acid or target neighboring regions, typically within 10, 20, 50, 100 bases. Two similar primers may target the same region but be different in length. In many sequencing reactions, the desired region of the target nucleotides may be surrounded by or adjacent to adaptor and/or key(s) sequences. In one example, a biologically derived sequence may be ligated with an adaptor sequence (such as in sequencing libraries for Illumina HiSeq's reversible terminator sequencing or for Ion Torrent's pH detection sequencing).
- A sequencing primer is often designed to hybridize with the whole or a part of the adaptor sequence and can be designed to hybridize to the last 3′ base of an adaptor sequence so that the first base read is the biological sample derived sequence (Illumina HiSeq library). However, in some cases, the sequencing primer may be designed to hybridize to a region that is 5′ to the biological sample derived sequence because the first part of the sequence to be read can be a barcode or index run or a key sequence (e.g., in Ion Torrent PGM Sequencing). These sequencing primers can also be used as extension primers.
- In some embodiments, the extension primer sequences are designed to hybridize to the same or different parts of the adaptor sequences, typically 5′ to the biologically derived sequences. The extension primers can be the same or similar.
- An extension primer and the extended extension primer can also be used as a sequencing primer. The extension of the extension primer or sequencing primer can be with one or more nucleotides and a polymerase, such as native or native performance nucleotide(s) and native or native performance polymerase or a modified polymerase. While RNA extension can be performed similarly, using an RNA polymerase, various embodiments are illustrated using DNA extensions as examples.
- These extended extension primers can be generated or produced by extending the extension primer through controlled extension, such as by pulse extension. In some embodiments, a series of extended sequencing primers of incremental length are generated. In another embodiment, sequencing primers of incremental length can be generated or produced by extending the extension primer through extension, such as with an incomplete set of nucleotides, i.e., with a set of nucleotides comprising no more than three different nucleotides. Each incomplete set of nucleotides can extend the extension primer until the extension reaches a position where the target nucleic acid (or template) has the complementary nucleotide base. For example, in an incomplete set of nucleotides comprising C, G, and T, the sequencing primer can be extended until it reaches a T base in the template target nucleic acid.
- Multiple steps of extension can be performed using different incomplete nucleotide sets. The extension reactions can be performed with at least two different sets of nucleotides. For example, multiple steps of extension can be performed using a first nucleotide set consisting of dATP, dCTP, dGTP and a second nucleotide set consisting of dATP, dCTP, dTTP. Because certain DNA polymerases can incorporate nucleotide diphosphates, if such a DNA polymerase is used for extension, the nucleotides can be diphosphates instead of triphosphates.
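- The alternation of two incomplete nucleotide sets described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the template string is hypothetical, it is written in the order in which the polymerase reads it, extension is treated as error-free, and the function names `extend_step` and `run_series` are invented for this sketch.

```python
# Nucleotide a polymerase must add opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_step(template, position, nucleotide_set):
    """Advance from `position` (number of template bases already copied)
    until the required nucleotide is absent from `nucleotide_set`."""
    while (position < len(template)
           and COMPLEMENT[template[position]] in nucleotide_set):
        position += 1
    return position


def run_series(template, nucleotide_sets):
    """Apply each incomplete nucleotide set in turn and return the
    extension length reached after every step."""
    position = 0
    lengths = []
    for nucleotide_set in nucleotide_sets:
        position = extend_step(template, position, nucleotide_set)
        lengths.append(position)
    return lengths


# Hypothetical template; sets mirror (dATP, dCTP, dGTP) and (dATP, dCTP, dTTP).
template = "TTGCATTGACCA"
sets = [{"A", "C", "G"}, {"A", "C", "T"}] * 2
print(run_series(template, sets))  # [4, 9, 11, 12]
```

In this toy run the first set stalls at the first template “A” (dTTP is missing) and the second set stalls at the next template “C” (dGTP is missing), so the primer grows in sequence-defined increments.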
- Between the extension steps, unincorporated nucleotides need to be removed to avoid run-offs. In some embodiments, a washing step is used between two extension steps. Because the target nucleic acids or the extension primers are often immobilized on a substrate such as on a glass slide or on beads, washing can be performed relatively easily. The washing solution may optionally include nucleotide degrading enzymes such as apyrase and/or alkaline phosphatase.
- Controlled extension can be performed using pulse extension with no washing steps between extension steps when extension is performed with serial addition of various sets of nucleotides, wherein each set comprises one, two or three different nucleotides. In a pulse mode, sets of nucleotides are typically added serially at specified time intervals (such as for 1-10, 10-20, 20-30, 30-60 seconds). The nucleotides are typically degraded before the next addition of nucleotides by nucleotide degrading enzymes such as apyrase and/or alkaline phosphatase in the reaction solution.
- Extension with washing and pulse extension steps can be combined. For example, extension can be performed in a pulse mode. After a certain number of pulse extension steps (such as 20-40, 41-60, or 61-100 steps), the reaction mixture can be washed to remove residual nucleotides or byproducts. A new series of pulse extension steps can then be performed.
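- A combined pulse-and-wash protocol of this kind could be laid out as a simple step list before being handed to whatever reagent delivery system is used. The sketch below is hypothetical: the function name `build_schedule`, its parameters, and the tuple format are assumptions made for illustration and do not correspond to any particular instrument's software.

```python
def build_schedule(nucleotide_sets, total_pulses, pulses_per_wash, pulse_seconds):
    """Return (action, detail) tuples for a pulse-extension run in which a
    wash is inserted after every `pulses_per_wash` pulses (hypothetical format)."""
    schedule = []
    for i in range(total_pulses):
        # Cycle through the supplied nucleotide sets in order.
        nucleotide_set = nucleotide_sets[i % len(nucleotide_sets)]
        detail = "+".join(nucleotide_set) + f" for {pulse_seconds} s"
        schedule.append(("pulse", detail))
        is_last = i == total_pulses - 1
        if (i + 1) % pulses_per_wash == 0 and not is_last:
            schedule.append(("wash", "remove residual nucleotides and byproducts"))
    return schedule


sets = [("dATP", "dCTP", "dGTP"), ("dATP", "dCTP", "dTTP")]
for action, detail in build_schedule(sets, total_pulses=6, pulses_per_wash=3,
                                     pulse_seconds=20):
    print(action, detail)
```

The small numbers in the usage lines are chosen only to keep the printout short; the text above contemplates washes after tens of pulses.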
- In some embodiments, controlled extension is performed using unmodified nucleotides. Unmodified nucleotides are typically more efficiently incorporated than labeled nucleotides. However, labeled nucleotides can be used as long as their incorporation efficiency is high. Incorporation efficiency can be affected by the polymerase used. Therefore, the selection of nucleotides can be dependent upon the corresponding polymerase used to incorporate the nucleotides. Modified nucleotides with a bulky group such as a fluorescent label can significantly reduce the incorporation efficiency and may not be good nucleotides for some embodiments.
- In one embodiment, the controlled extension can be performed using a polymerase in a buffer that is suitable for the polymerase to catalyze polymerase reaction. In addition to the polymerase, nucleotide(s) are also added to the extension reaction. In one embodiment, a reaction contains a polymerase and a set of nucleotides, wherein the set of nucleotides comprises no more than three different nucleotides. For example, the set of nucleotides consists of one to three of the four types of nucleotides (e.g. for DNA polymerase, one, two or three of the four nucleotides dATP, dCTP, dTTP, dGTP). In one embodiment, a reaction containing three of the different nucleotides stops at the template base that is complementary to the missing nucleotide. For example, for a reaction that has dATP, dCTP, dGTP, the extension stops at a base “A” on the template because “A” is complementary to the missing nucleotide dTTP, thereby limiting extension of a primer hybridized to the template. Alternatively, nucleotide polymers, such as dimers, trimers, or longer nucleotide polymers can be used in each set. For example, a set may contain GA, GG, GC, GT, AA, AG, AC, AT, CA, CC, CG, and CT.
- Base extension can be performed many times with various nucleotide sets, or with numerous cycles of nucleotide sets. For randomly chosen genomic sequences, the average extension length per single “three nucleotide” extension step is about 4 bases. To extend an average length of approximately 96 bases, a total of 24 extension steps are needed on average. In comparison, “single nucleotide” extension as used in Ion Torrent's PGM or pyrophosphate sequencing requires a total of 154 extension steps to achieve an approximate average extension length of 96 bases. Forty-eight three-base extension steps can achieve an average extension length of approximately 192 bases. Three nucleotide extensions are more than 6 times faster than single nucleotide extensions.
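- The step counts above can be reproduced with a short calculation. The sketch below rests on an idealized model that is an assumption of this example, not a statement from the specification: template bases are independent and equally frequent, the first template base of each step is always copied (because the missing nucleotide changes between steps), and each later base blocks extension with probability 1/4. The figure of 154 single-nucleotide steps is taken from the text as given.

```python
import math


def expected_step_length(p_block=0.25, max_run=200):
    """Expected bases added per three-nucleotide step under the idealized
    model: one guaranteed base plus a geometric run of further bases."""
    expected_run = sum(k * (1 - p_block) ** k * p_block for k in range(max_run))
    return 1 + expected_run


def steps_needed(target_length, bases_per_step):
    """Average number of steps to extend by `target_length` bases."""
    return math.ceil(target_length / bases_per_step)


bases_per_step = round(expected_step_length())   # about 4 bases per step
print(steps_needed(96, bases_per_step))          # 24
print(steps_needed(192, bases_per_step))         # 48
print(round(154 / steps_needed(96, bases_per_step), 1))  # 6.4
```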
- Optimizing conditions for controlled extension is important for many embodiments where it is desirable to minimize dephasing or prephasing. DNA polymerases, such as Bst DNA polymerase and Klenow DNA polymerase, both of which are suitable for controlled extension, may incorporate wrong bases, particularly if the correct nucleotide is absent. Mis-incorporation tends to happen more slowly than correct incorporation for some enzymes. Therefore, it may be desirable to complete the extension quickly, for example, within 30 sec, 1 min., 2 min. or 5 min. of incorporation time. On the other hand, too short an extension time may cause incomplete incorporation. Many DNA polymerases, however, have very fast incorporation time.
- Nucleotide concentration is another important consideration for controlled extensions. Higher concentrations of nucleotides tend to cause mis-incorporation, while lower concentrations tend to cause incomplete incorporation. In some embodiments, the nucleotide concentration is between 1-100 μM, 2-60 μM, 3-50 μM, 3-25 μM, 3-10 μM, or 5-8 μM. One of skill in the art would appreciate that the optimal nucleotide concentrations vary. The optimal nucleotide concentration may be obtained by performing extensions using different nucleotide concentrations and measuring mis-incorporation and/or incomplete extension products versus correct extension products. Various extension products can be detected by gel electrophoresis, HPLC analyses or sequencing. The optimal nucleotide concentration may be dependent upon other conditions for controlled extension.
- Many DNA polymerases are suitable for controlled extensions in at least some embodiments. Suitable DNA polymerases include Klenow fragment, Bst, and other DNA polymerases known in the art. Bst DNA polymerase is particularly suitable for controlled extensions when there are no reversible terminator nucleotides in the nucleotide mix. If a reversible terminator is included, a modified polymerase may be used to increase the efficiency of incorporation.
- Controlled extension can be performed in a variety of temperature settings. Typically, the polymerase used has a preferred or optimal reaction temperature or temperature range. The GC content of the target nucleic acids may be a consideration for selecting an extension temperature. The controlled extension can be performed, for example, at room temperature, about 20° C., about 37° C., about 65° C. or about 70-75° C. The reaction buffer can be selected based upon the polymerase used. Optionally, a pyro-phosphatase/inorganic phosphatase can be included to remove extension byproducts. In some embodiments, the buffer contains apyrase to digest nucleotides so that the polymerase is only exposed to nucleotides in a short period of time. The apyrase concentration can be adjusted to affect the nucleotide concentration curve during the incorporation period. In some embodiments, a single strand DNA binding protein (SSB) is used in extension reactions to reduce the effect of secondary structures. Other additives such as GC Melt, betaine and formamide can be added at appropriate amounts.
- In some embodiments, before the first extension reaction, a buffer containing a polymerase such as the Bst DNA polymerase can be used to incubate the hybridized extension primer/template (target nucleic acid) complex so that the enzyme has sufficient time to bind with the complex. The incubation time can be optimized by measuring extension results. Typically, the extension time is between 30 sec and 10 min.
- In the subsequent extension steps, additional polymerase can be added at each step or in some steps to improve overall efficiency of multi-step extensions. In some embodiments, however, polymerase is not added at extension steps, particularly in pulse mode where the polymerase remains in the buffer when there are no washing steps.
- In some embodiments, instead of missing one or more nucleotides in the extension reaction, one to three types of nucleotides (such as dATP, dCTP, dTTP) are mixed with a reversible terminator nucleotide (such as dGTP) and can be used to control the extension. Many reversible terminator nucleotides are suitable for this method and are discussed in, e.g., Wu et al. (2007), 3′-O-modified nucleotides as reversible terminators for pyrosequencing, PNAS vol. 104 no. 42 16462-16467; and Bentley et al. (2008), Accurate whole human genome sequencing using reversible terminator chemistry, Nature 456, 53-59, all incorporated herein by reference. In one embodiment, nucleotides that have 3′ phosphates are used as reversible terminators. Treatment with alkaline phosphatase can effectively remove the 3′ phosphate and reverse the chain termination. For each step, the extension stops at the first base in the template that is complementary to the reversible terminator in the solution (such as a C base in the template and G base in the reversible terminator). There is generally no particular preference for which base is used as the reversible terminator base except when the target template's base composition is known and is biased towards the use of certain bases. For example, it may be preferred to use C or G as reversible terminator if the goal is to maximize extension length for every step. To avoid situations of slow extension for homopolymers (e.g. GGGGG), it is desirable to alternate two or more reversible terminators, e.g., G, C or G, C, A, or G, C, A, T. In some embodiments, the mixture may contain more than two or three reversible terminators with one or two non-terminator nucleotides.
- After incorporating the reversible terminator base, the unincorporated nucleotides are washed away and the chain termination is reversed by removing the terminating group in the reversible terminator base. The use of reversible terminators in traditional reversible terminator sequencing, particularly when some of the terminators are labeled with fluorescent labels, causes inefficient polymerization and may result in progressive decline in sequencing quality, and further, limit the read length. Using reversible terminators in an extension mixture to extend an extension primer will cause less incorporation inefficiency because these are on average incorporated in every four or five bases in random sequences instead of every step in traditional reversible terminator sequencing. Therefore, a mixture of three no terminator nucleotides with one reversible terminator can extend a sequencing primer efficiently even when reversible terminators are used.
- The reversible terminators can be optionally labeled. In such cases, the incorporation can be monitored. In some embodiments, the extension reactions can be monitored by, for example, measuring polymerization byproducts such as pyrophosphate or phosphate or pH changes.
- The extended primers can then be used as sequencing primers to determine the sequence of the template. For example, a primer extension product can be extended in the presence of labeled nucleotides to generate a sequence read for the template. Sequencing can be performed using, for example, reversible terminator sequencing, ligation based sequencing, pyrophosphate detection based sequencing, proton detection based sequencing, or any suitable sequencing reaction known in the art.
- In one embodiment, sequencing a target nucleic acid comprises incremental base extension, compiling data generated from detecting the presence of bases present in each incrementally extended sequence, and determining the sequence of the target nucleic acid through analyzing the collected data. For example, a plurality of primer extension products of varying lengths are generated or produced for a target nucleic acid sequence serving as a template. The plurality of primer extension products can be used to produce a variety of sequence reads. The sequence of the target polynucleotide molecule can be obtained by assembling the variety of sequence reads. The assembly may comprise stitching together overlapping sequence information, for example, originating from a specific target sequence. The origin of target sequences may be determined, among other methods, by location, by specific target or barcode sequences or any other suitable method known in the art. For example, a barcode specific oligonucleotide can be either used as a seed/extension primer or ligated to a seed/extension primer. The products of the ligation can then be used to prime a sequencing reaction or primer extension reaction.
- In one aspect of the present invention, the method comprises sequencing one or more bases of a target nucleic acid by using a first sequencing primer hybridized to a target nucleic acid. Such sequencing can be performed using sequencing by synthesis, for example, step-wise reversible terminator sequencing, incorporating labeled nucleotides, pyrophosphate detection based sequencing, ion detection based sequencing, or alternatively, step-wise ligations, or other methods, thereby obtaining a first sequence read. The first primer and any extension from the primer from the first sequencing can then be released from the target nucleic acid, for example, by denaturing the target nucleic acid via heating the target nucleic acid, contacting the target nucleic acid with sodium hydroxide solution, urea solution, formamide solution, or any other suitable denaturation solution known in the art. The target nucleic acid is then hybridized to a second sequencing primer, which can be the same as the first sequencing primer. A primer extension product is generated by extending the second sequencing primer, such as through controlled limited extension to produce an elongated primer. The elongated sequencing primer can be used to sequence one or more bases of the target nucleic acid by using one of many sequencing methods such as step-wise reversible terminator sequencing from the elongated primer, incorporating labeled nucleotides, pyrophosphate detection based sequencing, ion detection based sequencing, step-wise ligations, or other methods, thereby obtaining a second sequence read. The steps of releasing the primer extension product, hybridizing a sequencing primer, extending the sequencing primer to produce an elongated primer, and extending the elongated primer product to obtain a sequence read can be repeated many times. When these steps are repeated, the controlled extension length may be different. 
As used herein, “controlled extension” means extension of a nucleic acid sequence by a specific length. The specific length can be known or unknown. For example, in a three base template dependent extension reaction driven by a nucleic acid polymerase, the extension length can be dependent upon the sequence of the template. Because the template sequence may or may not be known before it is sequenced, the specific extension length may not be known until the template is sequenced or the length is otherwise determined. Nevertheless, the length of extension is generally not random; rather, it may be determined by the template sequence. In the case of a cluster of template molecules, such as a cluster generated by bridge amplification from a single template or a bead with molecules copied from a single template nucleic acid molecule via emulsion PCR, a majority of the primer extension molecules (e.g. at least 55%, 70%, 85%, 90%, 95%, 99%, 99.9%, 99.99%, 99.999%) hybridized to target nucleic acids in the cluster are extended to the same length in a single step of extension. Some dephasing or prephasing may occur. Over multiple steps of extension, some dephasing or prephasing in an early step may be overcome by one or more late extension steps.
- Each primer extension may include one or more cycles of extension and may extend the sequencing primer by a varying number of bases. The plurality of sequence reads can be assembled, such as through overlapping sequence reads, to generate the sequence of the target nucleic acid.
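- Assembly through overlapping sequence reads can be illustrated with a minimal suffix/prefix merge. This sketch assumes error-free reads that are already ordered by extension distance, a minimum overlap chosen arbitrarily, and hypothetical read sequences; the function names are invented for illustration, and practical assemblers must also tolerate sequencing errors.

```python
def merge_reads(left, right, min_overlap=3):
    """Merge `right` onto `left` using the longest suffix/prefix overlap of
    at least `min_overlap` bases; return None if no such overlap exists."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


def assemble(reads, min_overlap=3):
    """Stitch together reads that are ordered by increasing extension distance."""
    assembled = reads[0]
    for read in reads[1:]:
        merged = merge_reads(assembled, read, min_overlap)
        if merged is None:
            raise ValueError("no overlap found: there may be a gap between reads")
        assembled = merged
    return assembled


# Hypothetical reads obtained after successively longer controlled extensions.
reads = ["ACGTTGCA", "TGCAGGAT", "GATCCTA"]
print(assemble(reads))  # ACGTTGCAGGATCCTA
```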
- For example, when the same initial oligonucleotide is used as the seed sequencing primer, if the second primer extension product is shorter than the first sequence read (first primer extension), there will be an overlapping sequence between the first sequence read and the second sequence read. If the second primer extension product is longer than the first sequence read, there can be a gap between the first sequence read and the second sequence read. However, additional sequence reads can be obtained with subsequent extension product removal(s) and one or more new rounds of primer extension. Fewer extension steps may be used to have more overlapping sequence results between successive sequencing runs for more templates. Alternatively, more extension steps can be used to have more non-overlapping sequences.
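- The overlap-or-gap relationship between two successive reads can be expressed as a simple comparison of lengths. The sketch below assumes both rounds start from the same primer position and that the lengths are known in bases; the function name `read_relation` and the example numbers are hypothetical.

```python
def read_relation(first_read_length, second_extension_length):
    """Classify how a second read, which starts immediately after a
    controlled extension of `second_extension_length` bases, relates to a
    first read covering the first `first_read_length` bases."""
    difference = first_read_length - second_extension_length
    if difference > 0:
        return ("overlap", difference)
    if difference < 0:
        return ("gap", -difference)
    return ("contiguous", 0)


print(read_relation(100, 80))   # ('overlap', 20)
print(read_relation(100, 120))  # ('gap', 20)
```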
- In general, the length of the first sequence read and subsequent reads depends on the sequencing technology used, which can generate different lengths for a given accuracy. Preferably, the sequence read is between 25 and 100 bp, 200 bp, 500 bp, 1 kb or up to 2 kb. One of skill in the art would appreciate that the order of sequencing may not be significant. For example, long sequences can be obtained by performing extension and sequencing first, and then sequencing from the primer without extension.
- In some embodiments, a large number of nucleic acid targets are simultaneously sequenced. In such embodiments, the target nucleic acids are typically immobilized on a substrate. At least some target nucleic acids can be spatially separated by forming single molecule clusters that are at least partially non-overlapping. Methods for sequencing a large number of single molecule clusters are well known in the art and kits, instruments and instructions for performing such sequencing have been commercially available from, e.g., Illumina, Inc. (San Diego, Calif.) and Life Technologies, Inc. (Foster City, Calif.). Further, sequencing services are available from Complete Genomics, Inc. (Mountain View, Calif.) and Centrillion Biosciences, Inc. (Mountain View, Calif.).
- Predicting Controlled Extension Distance
- In some embodiments, the extension distance of one or more steps of controlled extensions is estimated by calculating the difference (Pe−Ps) between the extension start position (Ps) and the extension end position (Pe). If the target nucleic acid sequence is known, for each extension step, the stop position can be found by, for example, finding the positions of a target nucleic acid base that is complementary to the missing base in the extension step. The stop position is one base before the first complementary base position. For example, an extension with a nucleotide combination of A, C, and G is used to extend a primer over a template sequence of TTGCATTG. The stop position is base 4 (“C”) because the template base “A” at position 5 is complementary to the missing base “T.” If a reversible terminator nucleotide is used in the extension step with three other nucleotides (e.g., A, C, G and terminator T), the stop position should be the first complementary base position (position 5 or first “A”). The start position of a single extension step in a series can be the start position of the series if it is the first extension step. The start position of a single extension step can also be the next complementary target nucleotide to a missing base or one base after the next complementary target nucleotide to a reversible terminator. The total extension distance can be calculated by aggregating the extension distance of each step.
- After a target nucleic acid is sequenced, the extension distance can be calculated, for example, as described. However, if the target nucleic acid sequence is unknown, the extension distance can still be estimated by, for example, using simulated random sequences. After the first extension step, the average extension distance of each three nucleotide extension step is about 4 bases per step. If a reversible terminator is used, the average extension distance of a single extension step, after the first extension step, is about 5 bases per step.
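- The stop-position rule for a known template can be written as a short function. The sketch below uses 1-based positions to match the TTGCATTG example, treats extension as error-free, and uses a hypothetical function name and parameters; it covers both the missing-nucleotide case and the reversible-terminator case.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def stop_position(template, start, controlling_nucleotide, terminator=False):
    """Return the 1-based position of the last template base copied in a
    step that begins copying at 1-based position `start`.

    `controlling_nucleotide` is the nucleotide that is either missing from
    the mix or, if `terminator` is True, supplied as a reversible terminator."""
    blocking_base = COMPLEMENT[controlling_nucleotide]
    for position in range(start, len(template) + 1):
        if template[position - 1] == blocking_base:
            # A terminator is incorporated opposite the blocking base;
            # a missing nucleotide stalls one base before it.
            return position if terminator else position - 1
    return len(template)  # no blocking base: extension runs to the end


template = "TTGCATTG"
print(stop_position(template, 1, "T"))                   # 4
print(stop_position(template, 1, "T", terminator=True))  # 5
```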
- In embodiments where each extension is performed in about 20 seconds, a 1,000 base extension takes on average 250 steps or 1.4 hours. In comparison, in embodiments where each extension is performed in about 10 seconds, the extension time is less than one hour. If a reversible terminator is used, the single step extension time may be longer to allow time for deblocking and other optional steps.
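- The timing figures follow from the average of about 4 bases per step. A minimal calculation, assuming that average holds and ignoring washing, deblocking and fluidics overhead, might look as follows (the function name is hypothetical).

```python
def extension_time_hours(target_bases, bases_per_step, seconds_per_step):
    """Return (number of steps, total hours) for a controlled extension,
    ignoring any overhead between steps."""
    steps = target_bases / bases_per_step
    return steps, steps * seconds_per_step / 3600


steps, hours = extension_time_hours(1000, 4, 20)
print(steps, round(hours, 2))   # 250.0 1.39

steps, hours = extension_time_hours(1000, 4, 10)
print(steps, round(hours, 2))   # 250.0 0.69
```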
- Instrument and Computer Software Products for Controlled Extension Instrument, Automation and Computer Software
- In some embodiments, controlled extensions are performed in suitable reaction vessels, such as a test tube, a well in a microtiter plate, or a flow cell. While controlled extensions and sequencing can be performed manually, it is more convenient and may be more consistent if some steps are performed with automated equipment.
- In some embodiments, controlled extensions are performed using a computer controlled instrument. In one embodiment, nucleotide sets are delivered to the reaction site, such as a lane in a flow cell or a flow chamber of a chip, using a computer controlled pump or an automated pipette. Computer controlled pumps are available from many commercial sources and in many formats and specifications. Syringe pumps and peristaltic pumps are particularly suitable for delivering small volumes of reagents in a very short time. Computer software that controls the operation of the pumps can be coded using any suitable language known in the art, such as C/C++, Objective-C, C#, Java, or a variety of scripting languages.
- While each reagent such as washing solution or a nucleotide set can be delivered using its own pump, it is often desirable to use a pump in combination with one or more valves. A computer controlled valve can make the system more versatile. In some embodiments, such as IonTorrent by Life Technologies, liquid reagents can be manipulated via pressurized containers creating back pressure onto reagents, rather than using pumps.
- Some commercially available sequencers such as the HiSeq 2000, HiScan sequencers, MiSeq sequencers and Ion Torrent PGM sequencers include computer controlled reagent delivery systems. These systems may be reprogrammed to perform the sequencing methods in some embodiments.
- Other liquid handling equipment, such as the cBot cluster station and MiSeq from Illumina, Inc. and a variety of liquid handling robots, such as the Tecan Freedom Evo and Beckman Coulter's Biomek series liquid handling robots, can be reprogrammed (using scripts) to perform controlled extensions.
- Reagents may be packaged as kits to facilitate automation.
- The controlled extensions, including stripping or removing sequencing products, can be performed in line in a sequencer with suitable reagent delivery capability. In some embodiments, a flow cell is sequenced, stripped, extended, and sequenced in a sequencer with the cluster alignment maintained so that the resulting sequence data can be correlated with the correct clusters. Maintaining alignment can be important because a large number of clusters can easily be sequenced simultaneously. Maintaining alignment, however, does not necessarily mean that the flow cell cannot be moved.
- For some cluster generation methods, such as the Ion Torrent beads on chip format, aligning different reads to the same cluster/bead is straightforward since each bead has its own coordinate in a chip. For clusters in the HiSeq or MiSeq sequencers, each identified cluster has coordinates and can be located as long as alignment has not changed significantly.
- In some embodiments, if the cluster alignment is not maintained between different sequencing runs, clusters from different sequencing runs may still be correlated by comparing coordinates between two different runs and by using overlapping sequences as well as alignment to reference sequences. If a consistent pattern of pixel shift is uncovered, a large percentage of clusters in different sequencing runs can still be correlated.
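- The coordinate-based correlation described above can be sketched as follows. This is a minimal illustration, not the method of any particular instrument: it assumes the pixel shift between runs is a constant translation that is small relative to the spacing between clusters, and the function names (`estimate_shift`, `correlate_clusters`) and the `tolerance` parameter are hypothetical. Correlation by overlapping sequences or by alignment to reference sequences is not modeled.

```python
from statistics import median


def estimate_shift(run_a, run_b):
    """Estimate a constant (dx, dy) pixel shift from run_a to run_b.

    Each cluster in run_a is paired with its nearest neighbour in run_b and
    the median offset is taken, so a minority of wrong pairings is tolerated.
    """
    dxs, dys = [], []
    for ax, ay in run_a:
        bx, by = min(run_b, key=lambda b: (b[0] - ax) ** 2 + (b[1] - ay) ** 2)
        dxs.append(bx - ax)
        dys.append(by - ay)
    return median(dxs), median(dys)


def correlate_clusters(run_a, run_b, tolerance=1.0):
    """Map indices of run_a clusters to indices of run_b clusters."""
    dx, dy = estimate_shift(run_a, run_b)
    matches = {}
    for i, (ax, ay) in enumerate(run_a):
        # Predicted position of cluster i in the second run.
        px, py = ax + dx, ay + dy
        j = min(range(len(run_b)),
                key=lambda k: (run_b[k][0] - px) ** 2 + (run_b[k][1] - py) ** 2)
        if (run_b[j][0] - px) ** 2 + (run_b[j][1] - py) ** 2 <= tolerance ** 2:
            matches[i] = j
    return matches


run_1 = [(10.0, 10.0), (50.0, 20.0), (30.0, 80.0), (90.0, 90.0)]
# Same clusters imaged again with a consistent shift of (+3, -2) pixels;
# the last cluster of run_1 was not detected and the order differs.
run_2 = [(53.0, 18.0), (13.0, 8.0), (33.0, 78.0)]
print(correlate_clusters(run_1, run_2))  # {0: 1, 1: 0, 2: 2}
```

Taking the median offset rather than the mean keeps the estimate stable when some clusters have no true counterpart in the other run, as with the fourth cluster here.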
- Sequencing
- Sequencing by extending a sequencing primer or by extending an extension product can be carried out using a variety of methods. For example, sequencing can be carried out with a labeled reversible terminator or by ligation with a labeled oligonucleotide. Sequencing can be performed using any commercially available method, such as a reversible terminator based sequencing method that is commercially available from companies such as Illumina, Inc. (San Diego, Calif.), Helicos, Inc. (Boston, Mass.), and Azco Biotech, Inc. (San Diego, Calif.).
- Sequencing can be accomplished through classic Sanger sequencing methods, which are well known in the art. In some embodiments, a long target nucleic acid (e.g., at least 1,000, 2,000, 10,000, or 50,000 bases in length) can be sequenced using a controlled extension and sequencing approach. The sequence readout can be carried out using Sanger sequencing, which can read about 500-1,200 bases per reaction. In one embodiment, the controlled extension is carried out in a series of extension reactions. A 1,800 base long DNA fragment can be sequenced by one Sanger sequence read of 1,000 bases and another Sanger sequence read of 1,000 bases after a controlled extension of about 800 bases. The controlled extension takes about 2-5 hours. In some embodiments, during the controlled extension, preferably in the last step, cleavable nucleotides are used. After the Sanger sequencing reaction, the controlled extension product can be removed from the Sanger sequencing product so that the controlled extension product does not add bases to the Sanger fragment. By removing the controlled extension product, the Sanger readout can be performed using standard Sanger sequencing gels or capillary sequencers.
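- The arithmetic of the 1,800 base example above can be checked with a short sketch. It assumes idealized, exact read and extension lengths (the text says "about 800 bases") and treats successive controlled extensions as cumulative offsets from the original primer; the helper names `read_windows` and `covered_length` are illustrative only.

```python
def read_windows(read_length, extension_lengths):
    """Return (start, end) template intervals, 0-based and half-open.

    The first read starts at the primer; each further read starts after one
    more controlled extension of the given length (assumed cumulative).
    """
    windows = [(0, read_length)]
    start = 0
    for extension in extension_lengths:
        start += extension
        windows.append((start, start + read_length))
    return windows


def covered_length(windows, template_length):
    """Number of template bases covered contiguously from position 0."""
    reach = 0
    for start, end in sorted(windows):
        if start > reach:  # a gap: later reads cannot be joined to earlier ones
            break
        reach = max(reach, end)
    return min(reach, template_length)


windows = read_windows(read_length=1000, extension_lengths=[800])
print(windows)                        # [(0, 1000), (800, 1800)]
print(covered_length(windows, 1800))  # 1800
```

With these numbers the two reads overlap by 200 bases, which is what allows them to be joined into one 1,800 base sequence.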
- The cleavable nucleotide can be a dUTP. Once incorporated, the uracil from the base U can be released using Uracil-DNA glycosylase (UDG). The resulting apurinic/apyrimidinic (AP) site can be cleaved using, e.g., AP lyase, which can break a DNA fragment. In addition to the dUTP/Glycosylase/AP Lyase system, other suitable cleavable base systems known in the art can also be used.
- Sequencing can also be accomplished using high-throughput systems, some of which allow detection of a sequenced nucleotide immediately after or upon its incorporation into a growing strand, i.e., detection of sequence in real time or substantially real time. In some cases, high throughput sequencing generates at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000 or at least 500,000 sequence reads per hour, with each read being at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120 or at least 150 bases per read.
- In some embodiments, high-throughput sequencing involves monitoring pH changes during polymerization. In some embodiments, high-throughput sequencing involves the use of technology available by Helicos BioSciences Corporation (Cambridge, Mass.) such as the Single Molecule Sequencing by Synthesis (SMSS) method. SMSS is described in part in US Publication Application Nos. 20060024711; 20060024678; 20060012793; 20060012784; and 20050100932.
- In some embodiments, high-throughput sequencing involves the use of technology available from 454 Lifesciences, Inc. (Branford, Conn.). Methods for using bead amplification followed by fiber optics detection are described in Margulies, M., et al. “Genome sequencing in microfabricated high-density picolitre reactors”, Nature, doi: 10.1038/nature03959; as well as in US Publication Application Nos. 20020012930; 20030058629; 20030100102; 20030148344; 20040248161; 20050079510; 20050124022; and 20060078909.
- In some embodiments, high-throughput sequencing is performed using Clonal Single Molecule Array (Solexa, Inc./Illumina, Inc.) or sequencing-by-synthesis (SBS) utilizing reversible terminator chemistry. These technologies are described in part in, e.g., U.S. Pat. Nos. 6,969,488; 6,897,023; 6,833,246; 6,787,308; and US Publication Application Nos. 20040106130; 20030064398; 20030022207; and Constans, A., The Scientist 2003, 17(13):36.
- In some embodiments, high-throughput sequencing of RNA or DNA can take place using AnyDot.chips (Genovoxx, Germany). In particular, the AnyDot.chips allow for 10×-50× enhancement of nucleotide fluorescence signal detection. AnyDot.chips and methods for using them are described in part in International Publication Application Nos. WO02/088382, WO03020968, WO03/031947, WO2005/044836, PCT/EP05/105657, PCT/EP05/105655; and German Patent Application Nos. DE 101 49 786, DE 102 14 395, DE 103 56 837, DE 10 2004 009 704, DE 10 2004 025 696, DE 10 2004 025 746, DE 10 2004 025 694, DE 10 2004 025 695, DE 10 2004 025 744, DE 10 2004 025 745, and DE 10 2005 012 301.
- Other high-throughput sequencing systems include those disclosed in Venter, J., et al.
Science 16 Feb. 2001; Adams, M. et al., Science 24 Mar. 2000; and M. J. Levene, et al. Science 299:682-686, January 2003; as well as US Publication Application Nos. 20030044781 and 2006/0078937. Overall, such systems involve sequencing a target nucleic acid molecule having a plurality of bases by the temporal addition of bases via a polymerization reaction that is measured on a molecule of nucleic acid, i.e., the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. Sequence can then be deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labeled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labeled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
- In one embodiment, sequencing can be conducted with labeled nucleotides such as dNTPs with labels. Bases may be detected by extending the incremental fragments via contacting the hybridization complexes sequentially with one of labeled dATP, dCTP, dGTP and dTTP, in the presence of a polymerase, and detecting the incorporation of the labeled dATP, dCTP, dGTP and dTTP to obtain a sequence read from each reaction.
- In one embodiment, a mixture of labeled dATP, dCTP, dGTP and dTTP is used. Generally, due to the low incorporation efficiency of modified dNTPs, such as labeled dNTPs, only the first few bases are extended to generate a strong signal. The possibility of “run-on” extension is rather low, and the signal generated by such “run-on” extension can be filtered out as noise using methods provided herein or known in the art. In one embodiment, a mixture of labeled ddATP, ddCTP, ddGTP and ddTTP is used, and no “run-on” extension is permitted. In one embodiment, only one round of interrogation that covers all four possible bases is carried out for each incremental fragment. For example, sequential addition with one labeled dNTP in each round of interrogation provides possible addition of one detectable base at a time (i.e., on each substrate). This generally results in a short read (such as one base or a few bases) that can be assembled for each round. In another embodiment, a longer read is generated with more than one round of interrogation.
- In another embodiment, a mixture of labeled ddATP, ddCTP, ddGTP, ddTTP and a small amount (<10% (e.g., 5, 6, 7, 8, or 9%) or <20% (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19%)) of native dATP, dCTP, dGTP, and dTTP is added.
- In one embodiment, the labeled nucleotides are reversible terminators. Multiple bases can be detected by the signal strength or, in the case of reversible terminators, by base addition detection. Nucleotide reversible terminators are nucleotide analogues, which are modified with a reversible chemical moiety capping the 3′-OH group to temporarily terminate the polymerase reaction. In this way, generally only one nucleotide is incorporated into the growing DNA strand even in homopolymeric regions. For example, the 3′ end can be capped with an amino-2-hydroxypropyl group. An allyl or a 2-nitrobenzyl group can also be used as the reversible moiety to cap the 3′-OH of the four nucleotides. Examples of reversible terminators include but are not limited to 3′-O-modified nucleotides such as 3′-O-allyl-dNTPs and 3′-O-(2-nitrobenzyl)-dNTPs.
- In one embodiment, after detection of the cleavage site present on the solution probe, the 3′-OH of the primer extension products is regenerated through different deprotection methods. The capping moiety on the 3′-OH of the DNA extension product can be efficiently removed after detection of a cleavage site by a chemical method, enzymatic reaction or photolysis, i.e. the cap will be cleaved from the cleavage site. To sequence DNA, in one embodiment, templates containing homopolymeric regions are immobilized on Sepharose beads, and then extension-signal detection-deprotection cycles are conducted by using the nucleotide reversible terminators on the DNA beads to unambiguously decipher the sequence of DNA templates. In one embodiment, this reversible-terminator-sequencing approach is used in the subject methods to accurately determine DNA sequences. (The cap may be referred to herein as a “protective group”).
- Polynucleotides of the invention can be labeled. In one embodiment, a molecule or compound has at least one detectable label (e.g., isotope or chemical compound) attached to enable the detection of the compound. In general, labels of use in the present invention include without limitation isotopic labels, which may be radioactive or heavy isotopes, magnetic labels, electrical labels, thermal labels, colored and luminescent dyes, enzymes and magnetic particles as well. Labels can also include metal nanoparticles, such as a heavy element or large atomic number element, which provide high contrast in electron microscopy. Dyes of use in the invention may be chromophores, phosphors or fluorescent dyes, which due to their strong signals provide a good signal-to-noise ratio for decoding.
- In one embodiment, labels may include the use of fluorescent labels. Suitable dyes for use in the present invention include, but are not limited to, fluorescent lanthanide complexes, including those of Europium and Terbium, fluorescein, rhodamine, tetramethylrhodamine, eosin, erythrosin, coumarin, methyl-coumarins, pyrene, Malachite green, stilbene, Lucifer Yellow, Cascade Blue, Texas Red, and others described in the 11th Edition of the Molecular Probes Handbook by Richard P. Haugland, hereby expressly incorporated by reference in its entirety. Commercially available fluorescent nucleotide analogues readily incorporated into the labeling oligonucleotides include, for example, Cy3-dCTP, Cy3-dUTP, Cy5-dCTP, Cy5-dUTP (GE Healthcare), fluorescein-12-dUTP, tetramethylrhodamine-6-dUTP, Texas Red®-5-dUTP, Cascade Blue®-7-dUTP, BODIPY® FL-14-dUTP, BODIPY® R-14-dUTP, BODIPY® TR-14-dUTP, Rhodamine Green™-5-dUTP, Oregon Green® 488-5-dUTP, Texas Red®-12-dUTP, BODIPY® 630/650-14-dUTP, BODIPY® 650/665-14-dUTP, Alexa Fluor® 488-5-dUTP, Alexa Fluor® 532-5-dUTP, Alexa Fluor® 568-5-dUTP, Alexa Fluor® 594-5-dUTP, Alexa Fluor® 546-14-dUTP, fluorescein-12-UTP, tetramethylrhodamine-6-UTP, Texas Red®-5-UTP, Cascade Blue®-7-UTP, BODIPY® FL-14-UTP, BODIPY® TMR-14-UTP, BODIPY® TR-14-UTP, Rhodamine Green™-5-UTP, Alexa Fluor® 488-5-UTP, and Alexa Fluor® 546-14-UTP (Invitrogen). Other fluorophores available for post-synthetic attachment include, inter alia, Alexa Fluor® 350, Alexa Fluor® 532, Alexa Fluor® 546, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor® 647, BODIPY 493/503, BODIPY FL, BODIPY R6G, BODIPY 530/550, BODIPY TMR, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665, Cascade Blue, Cascade Yellow, Dansyl, lissamine rhodamine B, Marina Blue, Oregon Green 488, Oregon Green 514, Pacific Blue, rhodamine 6G, rhodamine green, rhodamine red, tetramethylrhodamine, Texas Red (available from Invitrogen), and Cy2, Cy3.5, Cy5.5, and Cy7 (GE Healthcare).
- In one embodiment, multiplex detection formats are used for base detection or sequencing. Examples of multiplex formats that can be used include, but are not limited to, either labeled/tagged bead sets (e.g., those produced by Luminex), in which each label is assigned to the individual probe-specific primer, or oligonucleotide arrays on slides, in which a specific oligonucleotide spot/position is assigned to the individual probe-specific primer. The limited sequence complexity of the recovered target-specific probes can provide conditions for easier and higher level multiplexing, especially when used with universal and Zip-code/ID sequence tags. After the hybridization of the primers to the target-probe complex, the primers can be extended by a nucleotide polymerase. In certain embodiments, the polymerase is selected from an RNA polymerase and a reverse transcriptase.
- Where an array is utilized, the detection phase of the process may involve scanning and identifying target polynucleotide sequences in the test sample. Scanning can be carried out by scanning probe microscopy (SPM) including scanning tunneling microscopy (STM) and atomic force microscopy (AFM), scanning electron microscopy, confocal microscopy, charge-coupled device, infrared microscopy, electrical conductance, transmission electron microscopy (TEM), and fluorescent or phosphor imaging, for example fluorescence resonance energy transfer (FRET). Optical interrogation/detection techniques include but are not limited to near-field scanning optical microscopy (NSOM), confocal microscopy and evanescent wave excitation. More specific versions of these techniques include far-field confocal microscopy, two-photon microscopy, wide-field epi-illumination, and total internal reflection (TIR) microscopy. Many of the above techniques can also be used in a spectroscopic mode. The actual detection means include charge coupled device (CCD) cameras and intensified CCDs, photodiodes and photomultiplier tubes. These methods and techniques are well-known in the art. Various detection methods are disclosed in U.S. Patent Application Publication No. US 2004/0248144, which is herein incorporated by reference.
- For multicolor imaging, signals of different wavelengths can be obtained by multiple acquisitions or by simultaneous acquisition by splitting the signal, using RGB detectors or analyzing the whole spectrum (Richard Levenson, Cambridge Healthtech Institutes, Fifth Annual meeting on Advances in Assays, Molecular Labels, Signaling and Detection, May 17-18th, Washington D.C.). Several spectral lines can be acquired by the use of a filter wheel or a monochromator. Electronic tunable filters such as acousto-optic tunable filters or liquid crystal tunable filters can be used to obtain multispectral imaging (e.g. Oleg Hait, Sergey Smirnov and Chieu D. Tran, 2001, Analytical Chemistry 73: 732-739). An alternative method to obtain a spectrum is hyperspectral imaging (Schultz et al., 2001, Cytometry 43:239-247).
- Methods and apparatus for signal detection and processing of intensity data are disclosed in, for example, U.S. Pat. Nos. 5,143,854; 5,547,839; 5,578,832; 5,631,734; 5,800,992; 5,834,758; 5,856,092; 5,902,723; 5,936,324; 5,981,956; 6,025,601; 6,090,555; 6,141,096; 6,185,030; 6,201,639; 6,218,803; 6,225,625; and 7,689,022, and in WO99/47964, each of which also is hereby incorporated by reference in its entirety for all purposes. Fluorescence imaging and software programs or algorithms for DNA sequence analysis and read interpretation are known to one of ordinary skill in the art and are disclosed in Harris T D, et al. “Single-Molecule DNA Sequencing of a Viral Genome” Science 4 Apr. 2008: Vol. 320, no. 5872, pp. 106-109, which is herein incorporated by reference in its entirety. In one embodiment, Phred software is used for DNA sequence analysis. Phred reads DNA sequencer trace data, calls bases, assigns quality values to the bases, and writes the base calls and quality values to output files. Phred is a widely-used program for base calling DNA sequencing trace files. Phred can read trace data from SCF files and ABI model 373 and 377 DNA sequencer chromat files, automatically detecting the file format. After calling bases, Phred writes the sequences to files in either FASTA format, the format suitable for XBAP, PHD format, or the SCF format. Quality values for the bases are written to FASTA format files or PHD files, which can be used by the phrap sequence assembly program in order to increase the accuracy of the assembled sequence. The quality value is a log-transformed error probability, specifically Q = −10 log10(Pe), where Q and Pe are respectively the quality value and error probability of a particular base call. The Phred quality values have been thoroughly tested for both accuracy and power to discriminate between correct and incorrect base-calls. Phred can use the quality values to perform sequence trimming.
- DNA polymerase based sequencing reactions generally possess efficiency problems. Native nucleotides can be incorporated at a relatively high efficiency, compared to reduced efficiency incorporation of non-native nucleotides, such as labeled nucleotides or reversible terminators. Thus, in a growing strand of a nucleotide extension reaction, the likelihood of elongation drops as a function of the extended length. Thus, even slight differences in single nucleotide incorporation efficiency can lead to significant differences as the reaction proceeds. The reduced incorporation efficiency accounts for increased error rates and hence decreased sequence information quality along growing strands.
The resulting sequence information consists of relatively short sequence reads that have been terminated due to unacceptably low correct sequence signal. The present invention provides methods and compositions to overcome these problems in sequencing reactions. A seed primer can be extended using high incorporation efficiency nucleotides, such as native nucleotides. Accordingly, a large population of templates can be primed further and further downstream to start a sequencing reaction, for example n bases downstream as compared to another sequencing primer. The sequencing reaction at the start position would start with a high overall efficiency and continue for s bases, until the quality of the sequencing information drops below an acceptable level. Due to the initial n bases, sequence information can be obtained down to n+s bases on the target template. Sequencing primers of different lengths can thus provide sequencing information that ends n bases apart. By varying the length n of high efficiency extension reactions prior to sequencing, overlapping sequence information of high quality can be obtained from a single template. In various embodiments, a set of sequencing primers is used that start sequencing reactions less than 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200 or more bases apart. In some embodiments, sequence information for up to 500, 1000, 2000 or more bases is obtained. Methods described herein allow for obtaining sequence information for up to 500, 1000, 2000 or more bases in over 80, 90, 95, 98, 99, 99.5, 99.9%, or more of the templates.
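- The n+s argument above can be illustrated numerically. The sketch below is a toy model and not a description of any instrument's base caller: it assumes that every sequencing cycle succeeds with the same per-step efficiency, that the error probability at cycle k equals the fraction of strands no longer in phase (1 − efficiency^k), and that a read ends when the Phred-style quality Q = −10 log10(Pe) given earlier falls below a chosen threshold. The function names and the threshold of Q = 10 are illustrative assumptions.

```python
import math


def phred_quality(p_error):
    """Q = -10 * log10(Pe), the log-transformed error probability."""
    return -10.0 * math.log10(p_error)


def usable_read_length(step_efficiency, min_quality):
    """Number of cycles s before the toy per-base quality drops below min_quality.

    Toy assumption: the error probability at cycle k is 1 - step_efficiency**k.
    """
    if not 0.0 < step_efficiency < 1.0:
        raise ValueError("step_efficiency must be strictly between 0 and 1")
    s = 0
    while True:
        in_phase = step_efficiency ** (s + 1)
        if phred_quality(1.0 - in_phase) < min_quality:
            return s
        s += 1


def reach_with_offsets(offsets, s):
    """Furthest template base reached when reads of length s start n bases
    downstream for each n in offsets (n = length of the prior extension)."""
    return max(n + s for n in offsets)


s_low = usable_read_length(0.99, min_quality=10)    # lower per-step efficiency
s_high = usable_read_length(0.999, min_quality=10)  # higher per-step efficiency
print(s_low, s_high)                                # 10 105
print(reach_with_offsets([0, 100, 200], s_high))    # 305
```

In this model a tenfold reduction in the per-step failure rate lengthens the usable read roughly tenfold, and starting reads 100 bases apart with s = 105 leaves a few bases of overlap between neighboring reads.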
- In one embodiment, one detection cycle is performed by adding labeled A, C, G, T sequentially followed by washing and detecting after each addition. In one embodiment, multiple detection cycles can be performed using nucleotides with removable labels.
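- A detection cycle of this kind can be simulated in a few lines. The sketch assumes terminating labeled nucleotides (so that at most one base is added per cycle), perfect incorporation and washing, and a template written in the 3′ to 5′ direction so that the read comes out 5′ to 3′; `detection_cycle` and `sequence_by_cycles` are hypothetical names.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def detection_cycle(template, position, order="ACGT"):
    """Add each labeled nucleotide in turn, wash, and detect.

    Returns a list of (nucleotide, signal) pairs for one cycle, where signal
    is True only for the nucleotide that pairs with template[position].
    """
    expected = COMPLEMENT[template[position]]
    observations = []
    for nucleotide in order:
        signal = nucleotide == expected  # incorporation gives a detectable label
        observations.append((nucleotide, signal))
        # Washing removes unincorporated nucleotide; the toy model keeps no
        # carry-over state between additions.
    return observations


def sequence_by_cycles(template, cycles):
    """Run successive detection cycles and return the deduced read."""
    read = []
    for position in range(min(cycles, len(template))):
        for nucleotide, signal in detection_cycle(template, position):
            if signal:
                read.append(nucleotide)
    return "".join(read)


print(sequence_by_cycles("TACGGA", cycles=4))  # ATGC
```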
- In one embodiment, the series of incremental fragments are further extended (thus serving as sequencing primers) for sequencing reactions to obtain the sequence information of the target molecules. The sequence information is a series of fragment sequences that are adjacent on the target molecule, which can be assembled to obtain a long fragment or the full length sequence of the target molecule.
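- Assembly of adjacent fragment sequences into a longer sequence can be sketched as a simple ordered merge. This illustration assumes error-free reads supplied in template order; reads are joined at the longest exact suffix/prefix overlap of at least `min_overlap` bases, and are appended directly when no such overlap exists (i.e., the reads are treated as immediately adjacent). The function name and the default of 3 bases are assumptions made for the example.

```python
def merge_reads(reads, min_overlap=3):
    """Merge reads, given in template order, into one contiguous sequence."""
    contig = reads[0]
    for read in reads[1:]:
        best = 0
        # Try the longest possible overlap first, down to min_overlap.
        for k in range(min(len(contig), len(read)), min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                best = k
                break
        contig += read[best:]  # append only the part not already present
    return contig


reads = ["ACGTTGCA", "TGCAGGAT", "CCTA"]
print(merge_reads(reads))  # ACGTTGCAGGATCCTA
```

The first two reads share the four bases TGCA and are merged on that overlap; the third read has no overlap and is appended as an adjacent fragment.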
- In one embodiment of the present invention, serial sequencing of a target polynucleotide is converted to parallel sequencing to reduce the time required for sequencing a given number of bases of the target polynucleotide.
- In one embodiment, a nucleic acid target is attached to a substrate or immobilized on a substrate. The substrate can be a bead, flat substrate, flow cell or other suitable surfaces. In one embodiment, the substrate comprises glass.
- In one embodiment, a target nucleic acid is attached or immobilized to a substrate via a capture probe. A capture probe is an oligonucleotide that is attached to the surface of a substrate and is capable of binding to a sequencing template. Capture probes can be of various lengths, such as from 18 bases to 100 bases, such as 20 bases to 50 bases.
- In one embodiment, the capture probe has a sequence that is complementary to the sequencing template. For example, if the present method is used to sequence a genome for which at least a partial sequence is already known, capture probes can be designed to be complementary to the known sequences. In one embodiment, the capture probes are complementary to a “barcode” or “identifier” sequence added to the sequencing templates via, e.g., specific ligation, or as a part of the primer for a PCR reaction. In such reactions, a sequencing template-specific primer and a primer comprising a unique barcode are used for the amplification, thus all the target molecules with the same sequences have the same barcode attached.
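- Designing capture probes that are complementary to barcode sequences amounts to taking reverse complements. The sketch below uses made-up 20-base barcodes and sample names, and models hybridization as an exact, full-length match, which ignores mismatches, melting temperature and secondary structure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_capture_probes(barcodes):
    """Map each sample name to a capture probe complementary to its barcode."""
    return {name: reverse_complement(bc) for name, bc in barcodes.items()}


def captures(probe, template):
    """True if the template contains a site fully complementary to the probe."""
    return reverse_complement(probe) in template


# Hypothetical barcodes, for illustration only.
barcodes = {
    "sample_1": "ACGTTGCAAGGCTTAACCGG",
    "sample_2": "TTGACCGTAGCATCGGATAC",
}
probes = design_capture_probes(barcodes)
template_1 = "GATTACA" + barcodes["sample_1"] + "CCCTTTGGG"
print(captures(probes["sample_1"], template_1))  # True
print(captures(probes["sample_2"], template_1))  # False
```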
- The capture probe can be attached to the substrate at either the 5′ end or the 3′ end. In some embodiments, the capture probe is attached to the substrate at the 5′ end, and the 3′ end of the capture probe can be extended by the incorporation of nucleotides as described herein to generate incremental extension fragments which can in turn be sequenced by further incorporation of labeled nucleotides. In another embodiment, the capture probe is attached to the substrate at the 3′ end, and the 5′ end of the capture probe cannot be extended by the incorporation of nucleotides. A second probe (or sequencing primer) hybridizes to the sequencing template and its 3′ end is extended by the incorporation of nucleotides as described herein to generate an incremental extension fragment which can in turn be sequenced by further incorporation of labeled nucleotides. In this case, the extension is towards the direction of the capture probe. In general, the sequencing primer hybridizes to a linker introduced to the end of the sequencing template when generated, either directly from a genomic DNA or from a parent target molecule. Thus a seed/sequencing primer that is a “universal primer” can be used to sequence different target molecules. In one embodiment, sequencing primers specific to the target molecule are used.
- In one embodiment, the capture probe is immobilized on a solid support before binding to the sequencing template. In one embodiment, the 5′ end of a capture probe is attached to a solid surface or substrate. A capture probe can be immobilized by various methods known in the art including, without limitation, covalent cross-linking to a surface (e.g., photochemically or chemically), non-covalent attachment to the surface through the interaction of an anchor ligand with a corresponding receptor protein (e.g. biotin-streptavidin or digoxigenin-anti-digoxigenin antibody), or through hybridization to an anchor nucleic acid or nucleic acid analog. The anchor nucleic acid or nucleic acid analog has sufficient complementarity to the sequencing template (i.e., the formed duplex has sufficiently high Tm) that the anchor-sequencing template-probe complex will survive stringent washing to remove unbound targets and probes, but it does not overlap with the target site that is complementary to the probe antisense sequence.
- In one embodiment, a capture template or target nucleic acid is used as a template for bridge amplification. In such embodiments, two or more different immobilized probes are used. In some cases, single molecule templates are used to generate clusters of nucleic acids on a substrate by bridge amplification. In one embodiment, each of the clusters of nucleic acids contains substantially the same (>95%) type of nucleic acids because they are derived from a single template nucleic acid. These clusters are typically referred to as single molecule clusters. Such substrates with single molecule clusters can be produced using, for example, the method described in Bentley et al., Accurate whole human genome sequencing using reversible terminator chemistry, Nature 456, 53-59 (2008), incorporated herein by reference, or using commercially available kits and instruments from, for example, Illumina, Inc. (San Diego, Calif.).
- Another method for generating suitable nucleic acids for sequencing is described in Church et al., US Patent Application Publication No. US20090018024 A1, incorporated herein by reference. Additional exemplary methods for generating a suitable template for sequencing include emulsion PCR with DNA capture, with beads that are used to create random arrays (commercially available from, for example, Life Technologies, Inc.) or nanoballs created after rolling circle amplification of constructs that contact target molecules and deposition on patterned arrays (commercial service using the technology is available from, for example, Complete Genomics, Inc.).
- The solid substrate can be made of any material to which the molecules can be bound, either directly or indirectly. Examples of suitable solid substrates include flat glass, quartz, silicon wafers, mica, ceramics and organic polymers such as plastics, including polystyrene and polymethacrylate. The surface can be configured to act as an electrode or a thermally conductive substrate (which enhances the hybridization or discrimination process). For example, micro and sub-micro electrodes can be formed on the surface of a suitable substrate using lithographic techniques. Smaller nanoelectrodes can be made by electron beam writing/lithography. Electrodes can also be made using conducting polymers which can pattern a substrate by ink-jet printing devices by soft lithography or be applied homogenously by wet chemistry. TnO2 coated glass substrates are available. Electrodes can be provided at a density such that each immobilized molecule has its own electrode or at a higher density such that groups of molecules or elements are connected to an individual electrode. Alternatively, one electrode may be provided as a layer below the surface of the array which forms a single electrode. The solid substrate may optionally be interfaced with a permeation layer or a buffer layer. It is also possible to use semi-permeable membranes such as nitrocellulose or nylon membranes, which are widely available. The semi-permeable membranes can be mounted on a more robust solid surface such as glass. The surface layer may comprise a sol-gel. The surfaces may optionally be coated with a layer of metal, such as gold, platinum or other transition metal. A particular example of a suitable solid substrate is the commercially available SPR BIACore™ chip (GE Healthcare). Heaton et al., 2001 (PNAS 98:3701-3704) have applied an electrostatic field to an SPR surface and used the electric field to control hybridization.
- The solid substrate is generally a material having a rigid or semi-rigid surface. In one embodiment, at least one surface of the substrate is substantially flat, although in some embodiments it may be desirable to physically separate discrete elements with, for example, raised regions or etched trenches. For example, the solid substrate may comprise nanovials, i.e., small cavities in a flat surface, e.g., 10 μm in diameter and 10 μm deep. Other formats include but are not limited to synthetic or natural beads, membranes or filters, slides including microarray slides, microtiter plates, microcapillaries, and microcentrifuge tubes.
- In one embodiment, oligonucleotide capture probes are coated or attached onto beads for capturing the sequencing templates. Hybridization between capture probes and sequencing template polynucleotides can be carried out on beads in columns at a controlled temperature and salt concentration. The hybridization products can be eluted from the beads with moderate pressure.
- The use of a solid support with an array of capture oligonucleotides is disclosed in U.S. Pat. No. 6,852,487, which is hereby incorporated by reference.
- Loading of nucleic acids onto these substrates can be modulated and/or controlled by the flow and/or electrical forces, including diffusion forces and surface forces exerted by areas of differential charge and/or hydrophobicity. The number of nucleic acids applied to the substrate (i.e., with a loading buffer or other solution) can be adjusted to assure maximal occupancy of the linear features with non-overlapping nucleic acid molecules and thus minimize the number of empty linear features on the substrate. In an exemplary embodiment, at least 50% of the linear features of a substrate are occupied by at least one nucleic acid molecule. In a further embodiment, at least 60%, 70%, 80%, 90%, and 95% of the linear features are occupied by one or more nucleic acids.
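- The relationship between the amount of nucleic acid applied and the fraction of occupied features can be estimated under a simple statistical assumption. The sketch below assumes that molecules land on features independently, so that the number of molecules per feature follows a Poisson distribution; this assumption, and the function names, are illustrative and are not stated in the description above. Real loading is also influenced by the flow, electrical and surface forces mentioned there.

```python
import math


def occupied_fraction(mean_per_feature):
    """Fraction of features holding at least one molecule (Poisson model)."""
    return 1.0 - math.exp(-mean_per_feature)


def single_molecule_fraction(mean_per_feature):
    """Fraction of features holding exactly one molecule (Poisson model)."""
    return mean_per_feature * math.exp(-mean_per_feature)


def loading_for_occupancy(target_fraction):
    """Mean molecules per feature needed to reach the target occupancy."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    return -math.log(1.0 - target_fraction)


for target in (0.50, 0.80, 0.95):
    lam = loading_for_occupancy(target)
    print(f"occupancy {target:.0%}: mean load {lam:.2f}, "
          f"single-molecule features {single_molecule_fraction(lam):.0%}")
```

Under this model, pushing occupancy toward 95% requires a mean load of about three molecules per feature, at which point most occupied features hold more than one molecule; this is the trade-off between empty features and overlapping molecules.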
- Two exemplary approaches of laying probes are disclosed herein below for illustrative purposes. The first approach is in situ oligonucleotide synthesis in which the probes are in known geographic locations in the X-Y coordinate plane. In one embodiment, the oligonucleotide probe is synthesized on the surface. Examples of technologies that allow on-surface oligo synthesis include but are not limited to photolithography and ink jet. In another embodiment, the pre-synthesized oligonucleotide probes are spotted onto the surface. Various microarray protocols, for example, protocol for Agilent inkjet-deposited pre-synthesized oligo arrays are known to one skilled in the art.
- Polymers such as nucleic acids or polypeptides can be synthesized in situ using photolithography and other masking techniques whereby molecules are synthesized in a step-wise manner with incorporation of monomers at particular positions being controlled by masking techniques and photolabile reactants. For example, U.S. Pat. No. 5,837,832 describes a method for producing DNA arrays immobilized to silicon substrates based on very large scale integration technology. In particular, U.S. Pat. No. 5,837,832 describes a strategy called “tiling” to synthesize specific sets of probes at spatially-defined locations on a substrate. U.S. Pat. No. 5,837,832 also provides references for earlier techniques that can also be used. Light directed synthesis can also be carried out by using a Digital Light Micromirror chip (Texas Instruments) as described (Singh-Gasson et al., (1999) Nature Biotechnology 17:974-978). Instead of using photo-deprotecting groups which are directly processed by light, conventional deprotecting groups such as dimethoxytrityl can be employed with light directed methods where, for example, a photoacid molecule bearing a chromophore capable of receiving UV radiation is generated in a spatially addressable way which selectively deprotects the DNA monomers (McGall et al., PNAS 1996 93: 13555-13560; Gao et al., J. Am. Chem. Soc. 1998 120: 12698-12699). Electrochemical generation of acid is another method that can be used in the subject methods of the present invention.
- The in situ arrays can have about 1 to 10, 10 to 100, 100 to 1000, or 1,000 to 100,000,000 probes. The in situ arrays can have more than 100,000,000 array probes. In one embodiment, the in situ array carries approximately 200,000,000 probes.
- Molecules that can be immobilized in the array include nucleic acids such as DNA and analogues and derivatives thereof, such as PNA. Nucleic acids can be obtained from any source, for example genomic DNA or cDNA or synthesized using known techniques such as step-wise synthesis. Nucleic acids can be single or double stranded. DNA nanostructures or other supramolecular structures can also be immobilized. Other molecules include but are not limited to compounds joined by amide linkages such as peptides, oligopeptides, polypeptides, proteins or complexes containing the same; defined chemical entities, such as organic molecules; conjugated polymers and carbohydrates or combinatorial libraries thereof.
- In one embodiment, the biotinylated beads are used to anchor the target sequence and the sequencing is carried out by performing the base incorporation in the bead system.
- In another embodiment, a “chip” is a substrate for immobilizing or attaching a target. The geometric design of the chip can vary. For example, the chip can be a tube with the usable surface inside. Chips can be in flow cell format to facilitate liquid handling. In one embodiment, the chips are allele specific sequencing chips as disclosed in PCT/US2010/048526, which is incorporated herein by reference.
- In one embodiment, the chip is a membrane multichip. A multilayered substrate with holes (e.g. 1 micron to 50 micron) is generated. Target molecules are loaded into the holes with some holes containing a single molecule target. Targets are amplified within holes. The layers are peeled off. Each layer has some molecules attached to the holes. The layers are substantially similar in terms of molecules (copies of each other). These layers can be directly used or transferred to a suitable sequencing substrate for sequencing.
- Other chips can also be used in the present invention, including but not limited to photo cleavable oligo multichips, multilayer substrates with holes, and nanoprinting chips.
- An immobilized or attached target nucleic acid can then be hybridized with a primer (or multiple primers). Polymerase in its suitable buffer is then added to make contact with the immobilized or attached template or target nucleic acid. The primer can be used directly as a sequencing primer or can be used as a seed primer to generate primer extension products of various lengths. These primer extension products can further be used as sequencing primers in a sequencing reaction. Primer extension reactions are discussed in further detail elsewhere herein. A controlled extension reaction may be chosen to generate primer extension products. The buffer may contain a set of nucleotides (1-3 nucleotides of the four possible nucleotides) or the set of nucleotides can be added later to start the reaction. After a suitable amount of time (such as approximately 5, 10, 15, 20, 25, or 30 to 90 seconds for native bases), the buffer solution is removed and the immobilized template is washed to remove the nucleotides. Optionally, nucleotide degrading enzymes such as apyrase or alkaline phosphatase are added into the reaction buffer at the end of the reaction and/or in the washing solution to minimize contamination of the next round of extension with nucleotides from the previous extension.
- In some embodiments, primer extension is performed using a pulse method, such as described herein. In some embodiments, the immobilized template is contacted with a multi-enzyme buffer that contains a polymerase (such as Klenow exo(−) for DNA sequencing) and one or several nucleotide degrading enzymes such as apyrase or alkaline phosphatase. Optionally, an inorganic pyrophosphatase is added to degrade pyrophosphate generated by the polymerase reaction. Sets of nucleotides are successively added to the reaction buffer at intervals of 30-90 seconds (preferably 30 seconds). Nucleotides are utilized by the polymerase for the polymerase reaction and, at the same time, are degraded by apyrase or alkaline phosphatase.
- For sequencing multiple target polynucleotides (or fragments of a single large polynucleotide target), a large number of different target polynucleotides or their fragments can be immobilized on a substrate. Such a substrate is replicated many times to produce a set of the substrates.
- In one embodiment, a plurality of target nucleic acids or templates are immobilized on substrates and each template cluster originates from a single molecule (see for example, Bentley et al., Nature 456, 53-59, (2008) and its supplement, incorporated herein by reference in its entirety). Because the locations of the template clusters are known, a first sequence from the first round of sequencing and a second sequence from a second round of sequencing for the same template can be readily determined.
- In one embodiment, parallel sequencing is performed. In parallel sequencing, commonly referred to as next generation sequencing, millions or more templates (clusters) are sequenced simultaneously, often with a single primer. In one embodiment, nucleotide addition is optimized to control primer extension length.
- In another embodiment, a fixed sequence of nucleotide addition such as step one: dATP, dCTP, dGTP; step two: dCTP, dGTP, dTTP; step three: dGTP, dTTP, dATP; step four: dTTP, dATP, dCTP; step five: dATP, dCTP, dGTP, and so forth, is used to control the length of the primer extension. Because template sequences vary, the resulting extended primer length varies.
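- The effect of such a fixed addition sequence can be illustrated with a minimal sketch. It is an idealized model, not part of the disclosed system: it assumes complete incorporation, no misincorporation, and that extension in each step stops exactly when the next required base is absent from the supplied set. The function name `controlled_extension` and the representation of the template by the nascent strand to be synthesized are assumptions made for illustration.

```python
from itertools import cycle

# Nucleotide sets from the text (dATP/dCTP/dGTP, dCTP/dGTP/dTTP, ...),
# written as the bases each step supplies. The cycle repeats after step four.
STEP_SETS = ["ACG", "CGT", "GTA", "TAC"]


def controlled_extension(nascent, n_steps, step_sets=STEP_SETS):
    """Return the extended-primer length after each addition step.

    `nascent` is the sequence the polymerase must synthesize, i.e. the
    complement of the template read 5'->3' (an illustrative simplification).
    In each step the primer grows until the next required base is not in
    the supplied set, or until the end of the template is reached.
    """
    pos = 0
    lengths = []
    for _, bases in zip(range(n_steps), cycle(step_sets)):
        while pos < len(nascent) and nascent[pos] in bases:
            pos += 1
        lengths.append(pos)
    return lengths


# Two templates subjected to the same steps end up with different lengths.
print(controlled_extension("ACGTACGT", 4))    # one base gained per later step
print(controlled_extension("AAGGTTCCAA", 3))  # longer jumps per step
```

Under this model the same addition schedule yields different extension lengths for different templates, which is the behavior described above.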
- In one embodiment, multiple targets such as 10,000, 100,000, 1 million, 10 million, or 100 million sequences or targets are sequenced simultaneously. Thus, for each substrate, there are a plurality of capture sites, each capture site having different capture probes that recognize different targets (sequencing templates). If the targets are fragments of a longer sequence, contigs can be assembled to obtain the longer sequence, such as the whole genome sequence. In general, multiple target sequencing is typically done in chip format, but it can be performed in bead format as well.
- In one embodiment, the chip comprises random clusters started with single molecules (such as Illumina flow cells). The molecular clones of target molecules can be printed to many substrates to create replicate substrates for sequencing. In one embodiment, the chips are duplicated by nylon membrane impression and printing or other methods known in the art.
- In another aspect, the present invention provides a system for sequencing. In some embodiments, one or more methods of sequencing disclosed herein are performed by a system, such as an automated sequencing system instrument controlled by a user (e.g., as schematically depicted in FIG. 7). In one embodiment, the user controls a computer which may operate various instrumentation, liquid handling equipment or analysis steps of the invention. In one embodiment, a computer controlled collection, handling, or analysis system is used to control, activate, initiate, continue or terminate any step or process of the methods as herein described. In one embodiment, a computer device is used to control, activate, initiate, continue or terminate the handling and/or movement of fluids or reagents into and through the system or device as herein described, the handling or movement of one or more reagents to one or more chambers or plurality of chambers in one or more cartridges, the obtaining or analysis of data, etc. In one embodiment, chips of the sequencing reaction are placed in one or more chambers/flow cells or plurality of chambers/flow cells in one or more cartridges. The chips may comprise substrates which provide sites for the sequencing reactions.
- In one embodiment, the computer is any type of computer platform such as a workstation, a personal computer, a server, or any other present or future computer. The computer typically includes known components such as a processor, an operating system, system memory, memory storage devices, and input-output controllers, input-output devices, and display devices. Such display devices include display devices that provide visual information; this information typically may be logically and/or physically organized as an array of pixels. In one embodiment, a graphical user interface (GUI) controller is included that comprises any of a variety of known or future software programs for providing graphical input and output interfaces. In one embodiment, GUIs provide one or more graphical representations to the user, and are enabled to process the user inputs via the GUIs using means of selection or input known to those of ordinary skill in the related art.
- It will be understood by those of ordinary skill in the relevant art that there are many possible configurations of the components of a computer and that some components that may typically be included in a computer are not described, such as cache memory, a data backup unit, and many other devices. In the present example each execution core may perform as an independent processor that enables parallel execution of multiple threads.
- In one embodiment, the processor executes an operating system, which is, for example, a WINDOWS™ type operating system (such as WINDOWS™ XP) from the Microsoft Corporation; the Mac OS X operating system from Apple Computer Corp. (such as Mac OS X v10.4 “Tiger” or Mac OS X v10.5 “Leopard” operating systems); a UNIX™ or Linux-type operating system available from many vendors or what is referred to as an open source; or a combination thereof. The operating system interfaces with firmware and hardware in a well-known manner, and assists the processor in coordinating and executing the functions of various computer programs that may be written in a variety of programming languages. The operating system, typically in cooperation with the processor, coordinates and executes functions of the other components of the computer. The operating system also provides scheduling, input-output control, file and data management, memory management, and communication control and related services, all in accordance with known techniques.
- In one embodiment, the system memory is any of a variety of known or future memory storage devices. Examples include any commonly available random access memory (RAM), magnetic medium such as a resident hard disk or tape, an optical medium such as a read and write compact disc, or other memory storage device. Memory storage devices may be any of a variety of known or future devices, including a compact disk drive, a tape drive, a removable hard disk drive, USB or flash drive, or a diskette drive. Such types of memory storage devices typically read from, and/or write to, a program storage medium (not shown) such as, respectively, a compact disk, magnetic tape, removable hard disk, USB or flash drive, or floppy diskette.
- In one embodiment, a computer program product is described comprising a computer usable medium having control logic (computer software program, including program code) stored therein. The control logic, when executed by a processor, causes the processor to perform functions described herein. In other embodiments, some functions are implemented primarily in hardware using, for example, a hardware state machine. Implementation of the hardware state machine so as to perform the functions described herein will be apparent to those skilled in the relevant arts.
- In one embodiment, input-output controllers include any of a variety of known devices for accepting and processing information from a user, whether a human or a machine, whether local or remote. Such devices include, for example, modem cards, wireless cards, network interface cards, sound cards, or other types of controllers for any of a variety of known input devices. Output controllers of input-output controllers could include controllers for any of a variety of known display devices for presenting information to a user, whether a human or a machine, whether local or remote. In one embodiment, the functional elements of the computer communicate with each other via a system bus. Some of these communications may be accomplished in alternative embodiments using network or other types of remote communications.
- In one embodiment, applications communicate with, and receive instruction or information from or control one or more elements or processes of one or more servers, one or more workstations, and/or one or more instruments. In one embodiment, a server or computer with an implementation of applications stored thereon is located locally or remotely and communicates with one or more additional servers and/or one or more other computers/workstations or instruments. In one embodiment, applications are capable of data encryption/decryption functionality. For example, it may be desirable to encrypt data, files, information associated with GUIs or other information that may be transferred over a network to one or more remote computers or servers for data security and confidentiality purposes.
- In one embodiment, applications include instrument control features, where the control functions of individual types or specific instruments such as a temperature controlling device, imaging device, or fluid handling system are organized as plug-in type modules to the applications. In one embodiment, the instrument control features include the control of one or more elements of one or more instruments that, for instance, include elements of a fluid processing instrument, temperature controlling device, or imaging device. In one embodiment, the instrument control features are capable of receiving information from the one or more instruments that include experiment or instrument status, process steps, or other relevant information. In one embodiment, the instrument control features are under the control of an element of the interface of the applications. In one embodiment, a user inputs desired control commands and/or receives the instrument control information via one of the GUIs.
- In one embodiment, the automated sequencing system is controlled by a first user, conducts sequencing methods described herein, analyzes the raw data as described herein, assembles sequence reads as described herein, and then sends the sequencing information to a remote second user at a location different from that of the first user.
- In one embodiment, identifying target polynucleotide sequence and integrating sequences to assemble genomic information is carried out with a computer. In one embodiment, the present invention encompasses computer software or an algorithm designed to analyze and assemble sequence information obtained via the methods of the present invention.
- In terms of sequence read interpretation for the in situ arrays, reads at array features correspond to X-Y coordinates that map to the loci of interest. A “read” typically refers to an observed sequence derived from raw data, such as the order of detected signals corresponding to the cyclical addition of individual nucleotides. In one embodiment, the reads are checked against the expected reference genome sequence at the 10-bp loci for quality control. A reference sequence enables the use of short read length. Reads that have passed the quality control check are then combined to generate a consensus sequence at each locus. In one example, there are 10 unique probes per locus of interest minus any reads that have failed the quality control checks.
- In terms of sequence read interpretation for the “lawn” approach, the reads are at random locations on a surface, e.g., a flow cell. In one embodiment, the reads are checked against the expected subset of reference genome sequence at the loci of interest for quality control. Reads that have passed the quality control check are mapped to the individual locus of interest. Reads corresponding to each locus are then combined to generate a consensus sequence. In one embodiment, there are more than 3,000 reads per 10-bp locus.
- In one embodiment, the present invention provides a method for obtaining the sequence information of the target molecules by assembling the sequence reads from each of the substrates. The sequence reads can be obtained by base extension of a series of polynucleotides with different lengths due to the different base extension of the same capture probe using the same target molecules, such as described above. As such, they represent contiguous fragments of the target molecule sequence and can be assembled to provide the continuous sequence of the target molecule.
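- The quality control and consensus steps described for both approaches can be sketched as follows. This is an illustrative sketch only: the quality control rule (equal length and at most `max_mismatches` differences from the reference locus), the majority-vote consensus, and all function names are assumptions, since no particular rule is specified herein.

```python
from collections import Counter


def passes_qc(read, reference, max_mismatches=1):
    """Hypothetical QC rule: the read must have the same length as the
    reference locus and differ from it at no more than `max_mismatches`
    positions."""
    if len(read) != len(reference):
        return False
    return sum(a != b for a, b in zip(read, reference)) <= max_mismatches


def consensus(reads):
    """Majority base at each position of equal-length reads.
    Ties are resolved in favor of the base seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def call_locus(reads, reference, max_mismatches=1):
    """Discard reads failing QC, then combine the rest into a consensus.
    Returns None when no read passes QC."""
    kept = [r for r in reads if passes_qc(r, reference, max_mismatches)]
    return consensus(kept) if kept else None


reference = "ACGTACGTAC"  # a 10-bp locus, made up for illustration
reads = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTTCGTAC", "TTTTTTTTTT", "ACGT"]
print(call_locus(reads, reference))
```

In this example the last two reads fail the check and are excluded, and the consensus reports the variant base supported by two of the three remaining reads.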
- A computer program can be used to track the sequence reads obtained from the same capture probes on different substrates for the assembly.
- In some embodiments, sequencing information originating from a single template is identified using a unique identifier of the template, such as the template location or a tag sequence. Overlapping sequence information can be stitched together to generate longer sequence information from a single template. In some embodiments, a template's complement is also sequenced. In some embodiments, sequence information is stitched together using sequence reads generated both from the template and its complement.
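- As a minimal sketch of such stitching, reads can be grouped by a template identifier and joined wherever the end of one read exactly matches the start of the next. The sketch assumes error-free reads supplied in extension order and uses a hypothetical minimum overlap of three bases; the names `merge_pair` and `stitch`, and the use of an X-Y location as the identifier, are illustrative assumptions.

```python
def merge_pair(left, right, min_overlap=3):
    """Join `right` onto `left` using the longest exact overlap between the
    end of `left` and the start of `right`. Returns None when the overlap
    is shorter than `min_overlap`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


def stitch(reads_by_template, min_overlap=3):
    """Stitch the ordered, non-empty read list of each template identifier
    into one longer sequence."""
    stitched = {}
    for template_id, reads in reads_by_template.items():
        contig = reads[0]
        for read in reads[1:]:
            merged = merge_pair(contig, read, min_overlap)
            if merged is None:
                raise ValueError(f"no overlap for template {template_id!r}")
            contig = merged
        stitched[template_id] = contig
    return stitched


# Reads keyed by a made-up cluster location on the substrate.
reads = {(3, 7): ["ACGTAC", "TACGGA", "GGATTC"]}
print(stitch(reads))
```

Real reads contain errors, so a practical implementation would tolerate mismatches in the overlap; the exact-match rule here only keeps the example short.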
- The methods of the present invention provide several advantages. In one embodiment, the sequencing methods provided herein permit the use of unmodified nucleotides and enzymes, which utilize the natural nucleic acid synthesis chemistry. This not only reduces the cost, but also increases the accuracy because of the high-fidelity chemistry produced by evolution.
- The sequencing method provided by the present invention can be used to sequence DNA/RNA. It can be used to sequence pathogens/microbial genomes to identify species/strains quickly. One advantage of the sequencing method provided by the present invention is that it can accommodate low efficiency sequencing chemistry (reversible terminators, ligations, etc.), thus reducing the time to sequence. In addition, the method can sequence very long fragments (e.g. 100-10000 base pairs or more).
- Furthermore, when loci- and allele-specific sequencing templates are used, they are SNP capable, and can carry multiple signal-reporting labels or ligands, providing for a higher level of multiplexing of diverse target sequences.
- Thus, the present invention can provide low-cost, high-throughput and accurate methods for sequencing target polynucleotides with long reads. In some embodiments, the long reads are assembled from sequencing reads obtained using available sequencing technologies discussed herein and assembled using the methods, compositions, and systems of the inventions.
- The sequencing methods of the present invention can be multiplexed to a very high degree. In one embodiment, samples can comprise pooled genomes of target and control subject populations respectively. Populations can be of any sex, race, gender or age. Populations can also include animal subjects, particularly mammalian subjects such as dog, cat, horse, mouse, rat, etc., screened for veterinary medicine or pharmaceutical drug development purposes.
- In some embodiments, the target polynucleotide is DNA, for example DNA composing at least 50% of a genome of an organism. Some embodiments further comprise identifying and/or counting a gene sequence of more than one cell, and correlating sequence information from the various cells. Such embodiments find application in medical genetics. Other embodiments compare DNA sequences of normal cells to those of non-normal cells to detect genetic variants. Identification of such variants finds use in diagnostic and/or prognostic applications.
- In some embodiments, enumeration may determine changes in gene number, indicating, for example, that a gene appears three times instead of two times (as in a trisomy) or that a gene fails to appear (such as a homozygous deletion). Other types of allelic loss and changes in diploidy may also be determined, including changes related to, for example, a somatic recombination, a translocation, and/or a rearrangement, as well as a sporadic mutation.
- Such embodiments find use in diagnostic and prognostic applications, also featured in the present invention. For example, a homozygous deletion may indicate certain forms of cancer. It will be appreciated by those of skill in the art that other diseases, disorders, and/or conditions may also be identified based on recognized changes in diploidy. For example, three copies of chromosome 21 genes can indicate trisomy 21, associated with Down syndrome.
- Detection of Genetic Variants
- Methods of the present invention allow rapid analysis of DNA sequences at the single molecule level, lending themselves to applications relying on detailed analysis of individual sequences. Additional aspects of the present invention include such applications.
- For example, certain embodiments provide for SNP detection, by identifying incorporation of a single nucleotide into a complementary strand of a target polynucleotide sequence at the site of a known SNP. Any of the variations, embodiments, and/or aspects of the present invention may be used for such SNP detection. Such methods can also be used to identify other variants due to point mutations, including a substitution, a frameshift mutation, an insertion, a deletion, an inversion, a missense mutation, a nonsense mutation, a promoter mutation, a splice site mutation, a sporadic mutation and the like.
- Moreover, the invention also features methods of diagnosing a metabolic condition, a pathological condition, a cancer and other disease, disorder or condition (including a response to a drug) by identifying such genetic variants. For example, a known wild type versus a known variant can be distinguished using the methods described herein. Whether a target polynucleotide exhibits the wild type or variant sequence can readily be determined by the methods of the present invention. Furthermore, the long sequence information originating from single templates can provide haplotyping information that is otherwise difficult to obtain. The haplotyping information linking two or more loci can be used in genetic analysis.
- Certain embodiments provide for detection of additional genetic variants, by identifying incorporation of more than one nucleotide into a complementary strand of a target polynucleotide sequence, either at substantially known regions of variation or at substantially unknown regions. Any of the variations, embodiments, and aspects of the present invention may be used for such detection.
- Comparison of sequences from more than one individual allows identification of genetic variants, including substitutions, frameshift mutations, insertions, deletions, inversions, missense mutations, nonsense mutations, promoter mutations, splice site mutations, sporadic mutations, a duplication, variable number tandem repeats, short tandem repeat polymorphisms, and the like.
- In another embodiment, the sequencing method provided herein uses single molecule counting for accurate analysis of allele frequencies and/or haplotype frequencies. Since more than a single site on each molecule can be probed, haplotype information can be easily determined. In another embodiment, the methods and systems disclosed herein can be used to obtain haplotype frequencies. Such methods can be applicable to association studies, where genotype frequencies (such as SNP frequencies) are correlated with diseases in a population. The expense of single SNP typing reactions can be prohibitive when each study requires the performance of millions of individual reactions; the present invention permits millions of individual reactions to be performed and analyzed on a single array surface.
- In one embodiment, the sequencing methods provided herein are used for identifying high value polymorphisms located in regulatory elements and coding regions for a number of drug metabolizing enzyme and transporter (DMET) genes. In one embodiment, information on the expression of DMET genes provides information on the absorption, distribution, metabolism, and excretion profiles of a drug. In one embodiment, the methods of the present invention provide information on the complex transcriptional responses to various drugs; such information, together with the subsequent prediction of physiological effects, is important for the development of effective therapeutics. In one embodiment, the sequencing methods provided herein are used to draw links between gene expression profiles and physiological effects. Physiological effects can include a subject's likely response to a drug candidate.
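- Single molecule counting of haplotypes can be sketched as follows. The sketch assumes that each sequenced molecule has already been reduced to the alleles observed at the loci of interest; the data layout, the locus names, and the function name `haplotype_frequencies` are hypothetical and serve only to illustrate the counting.

```python
from collections import Counter


def haplotype_frequencies(molecules, loci):
    """Count haplotypes across single molecules.

    `molecules` is a list of dicts, one per sequenced molecule, mapping a
    locus name to the allele observed on that molecule. Molecules lacking
    a call at any of the requested loci are skipped. Returns a dict mapping
    each haplotype (a tuple of alleles in `loci` order) to its frequency.
    """
    counts = Counter(
        tuple(m[locus] for locus in loci)
        for m in molecules
        if all(locus in m for locus in loci)
    )
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()} if total else {}


# Made-up observations for two linked SNP sites.
molecules = [
    {"snp1": "A", "snp2": "G"},
    {"snp1": "A", "snp2": "G"},
    {"snp1": "C", "snp2": "T"},
    {"snp1": "A", "snp2": "T"},
    {"snp1": "C"},  # second site not covered; excluded from the count
]
print(haplotype_frequencies(molecules, ["snp1", "snp2"]))
```

Because both alleles are read from the same molecule, the phase between the two sites is observed directly rather than inferred.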
- A wide variety of diseases can be detected by the process of the present invention. In one embodiment, the sequencing methods provided herein are used for detecting infectious diseases. Infectious diseases can be caused by a pathogen, such as a bacterial, viral, parasitic, or fungal infectious agent. In one embodiment, resistance of various infectious agents to drugs is determined using the methods of the present invention.
- In one embodiment, the sequencing methods provided herein are used to sequence pathogen/microbial genomes. In one embodiment, the sequencing methods provided herein are used to identify species/strains. In one embodiment, the sequencing methods provided herein are used to sequence pathogen/microbial genomes and to identify species/strains.
- For example, the sequencing method provided herein can be used for detecting one or more microbes. Detection of a microbe can be by sequencing PCR products from a microbe, such as a virus or bacterium. For example, a viral or bacterial PCR product can be hybridized with 5′-3′ chips (direct sequencing) or 3′-5′ chips (requires additional sequencing primer). In one embodiment, sequencing of approximately 20-50 bases or longer is used to detect a microbe. In one embodiment, about 10-20 chips are used, wherein a chip density of 10 k can produce approximately 200 k to 500 k bases of sequence.
- The invention also provides methods of diagnosing a metabolic condition, a pathological condition, a cancer, and/or other disease, disorder or condition (including a response to a drug) by identifying such genetic variants. In one embodiment, detection is carried out by prenatal or post-natal screening for chromosomal and genetic aberrations or for genetic diseases. In some embodiments, an identified sequence variant indicates a disease or carrier status for a genetic condition. Examples of detectable genetic diseases include, but are not limited to, 21 hydroxylase deficiency, adenomatous polyposis coli, adult polycystic kidney disease, α1-antitrypsin deficiency, cystic fibrosis, familial hypercholesterolemia, Fragile X Syndrome, hemochromatosis, hemophilia A, hereditary nonpolyposis colorectal cancer, Marfan syndrome, myotonic dystrophy, neurofibromatosis type 1, osteogenesis imperfecta, retinoblastoma, Turner Syndrome, Duchenne Muscular Dystrophy, Down Syndrome or other trisomies, heart disease, single gene diseases, HLA typing, phenylketonuria, sickle cell anemia, Tay-Sachs Disease, thalassemia, Klinefelter Syndrome, Huntington Disease, autoimmune diseases, lipidosis, obesity defects, hemophilia, inborn errors of metabolism, diabetes, as well as cleft lip, club foot, congenital heart defects, neural tube defects, pyloric stenosis, alcoholism, Alzheimer disease, bipolar affective disorder, cancer, diabetes type I, diabetes type II, heart disease, stroke, and schizophrenia.
- In one embodiment, the sequencing methods provided herein are used to detect a cancer or for performing genetic cancer research, where sequence information from a cancer cell is correlated with information from a non-cancer cell or with another cancer cell in a different stage of cancer. In certain embodiments, sequence information may be obtained, for example, for at least about 10 cells, for at least about 20 cells, for at least about 50 cells, for at least about 70 cells, or for at least about 100 cells. Cells in different stages of cancer, for example, include a colon polyp cell vs. a colon cancer cell vs. a colon metastasizing cell from a given patient at various times over the disease course. Cancer cells of other types of cancer may also be used, including, for example, a bone cancer, a brain tumor, a breast cancer, an endocrine system cancer, a gastrointestinal cancer, a gynecological cancer, a head and neck cancer, a leukemia, a lung cancer, a lymphoma, a metastasis, a myeloma, a pediatric cancer, a penile cancer, a prostate cancer, a sarcoma, a skin cancer, a testicular cancer, a thyroid cancer, and a urinary tract cancer. In one embodiment, detection of a cancer involves detection of one or more cancer markers. Examples of cancer markers include, but are not limited to, oncogenes, tumor suppressor genes, or genes involved in DNA amplification, replication, recombination, or repair. Specific examples include, but are not limited to, BRCA1 gene, p53 gene, APC gene, Her2/Neu amplification, Bcr/Abl, K-ras gene, and human papillomavirus Types 16 and 18. The sequencing methods provided herein can be used to identify amplifications, large deletions as well as point mutations and small deletions/insertions or other mutations of genes in the following human cancers: leukemia, colon cancer, breast cancer, lung cancer, prostate cancer, brain tumors, central nervous system tumors, bladder tumors, melanomas, liver cancer, osteosarcoma and other bone cancers, testicular and ovarian carcinomas, head and neck tumors, and cervical neoplasms.
- For example, to screen for a cancer marker, the genomic DNA from a subject can be prepared as a sequencing template and can be allowed to bind a capture probe fixed to a substrate. In this example there can be multiple substrates each with the same capture probe wherein each substrate can then be exposed to an identical version of the sequencing template. After removal of any unbound sequencing template, the arrays, or chips, are then subjected to incremental base extension. The capture probes can serve as a primer and specifically bind to a region of the sequencing template near a location that can be used for detecting a relevant distinction indicating a disease. In the case of cancer and screening Bcr/Abl, the capture probes can bind in close proximity to the expected translocation site. Incremental extensions of the bases can reveal whether or not the sequencing template contains DNA from only one gene in the region of interest or that from a translocated gene region. After reading the results from step-wise hybridization events across the multiple chips, and processing the raw data, one can then determine if a subject's DNA has a Bcr/Abl translocation, and therefore detect the presence of a genetic sequence indicative of cancer.
- In one embodiment, the sequencing methods of the present invention are used for environmental monitoring. Environmental monitoring includes but is not limited to detection, identification, and monitoring of pathogenic and indigenous microorganisms in natural and engineered ecosystems and microcosms such as in municipal waste water purification systems and water reservoirs or in polluted areas undergoing bioremediation. In one embodiment, the methods of the present invention are used to detect plasmids containing genes that can metabolize xenobiotics, to monitor specific target microorganisms in population dynamic studies, or to detect, identify, or monitor genetically modified microorganisms in the environment and in industrial plants.
- In one embodiment, the sequencing methods provided herein are used in a variety of forensic areas. Examples of forensic areas include, but are not limited to, human identification for military personnel and criminal investigation, paternity testing and family relation analysis, HLA compatibility typing, and screening blood, sperm, and transplantation organs for contamination.
- In the food and feed industry, the present invention has a wide variety of applications. In one embodiment, the sequencing methods provided herein are used for identification and characterization of production organisms. Examples of production organisms include, but are not limited to, yeast for production of beer, wine, cheese, yogurt, and bread. In one embodiment, the methods of the present invention are used for quality control and certification of products and processes (e.g., livestock, pasteurization, and meat processing) for contaminants. In one embodiment, the sequencing methods provided herein are used for characterization of plants, bulbs, and seeds for breeding purposes, identification of the presence of plant-specific pathogens, and detection and identification of veterinary infections.
- In some embodiments, the target polynucleotide is RNA, and/or cDNA copies corresponding to RNA. In some embodiments, the RNA includes one or more types of RNA, including, for example, mRNA, tRNA, rRNA, and snRNA. In some embodiments, the RNA comprises RNA transcripts.
- Some embodiments use a primer that hybridizes to the target polynucleotide whose complementary strand is to be synthesized. In some of those embodiments, the primer used comprises a polyT region and optionally, a region of degenerate nucleotides. This facilitates identification and/or counting of random mRNA sequences in eukaryotic cells, as the polyT can hybridize to the polyA region of the mRNA and the degenerate nucleotides can hybridize to corresponding random sequences. Incorporation of degenerate nucleotides into seed primers also avoids sequencing the polyA tail itself while taking advantage of a universal seed primer for primer extension.
- In some embodiments, the RNA comprises RNA molecules from a cell, from an organelle, and/or from a microorganism. The number of RNA molecules may be about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1,000, about 2,000, about 3,000, about 4,000, about 5,000, about 6,000, about 7,000, about 8,000, about 9,000, about 10,000, up to and including all of the RNA molecules in the cell, organelle, and/or microorganism. Some embodiments comprise identifying/sequencing and/or counting RNA molecules from more than one cell, organelle, and/or microorganism. A histogram of the copy numbers of various types of RNA molecules identified can be constructed for different cells, organelles and/or microorganisms, and used to compile transcriptional patterns of RNA complements for each analyzed cell. The different cells, organelles, and/or microorganisms may be in different states, e.g. a diseased cell vs. a normal cell; or at different stages of development, e.g. a totipotent cell vs. a pluripotent cell vs. a differentiated cell; or subjected to different stimuli, e.g. a bacterial cell vs. a bacterial cell exposed to an antibiotic. In some embodiments, the methods can detect any statistically significant difference in copy numbers between cells, organelles, and/or microorganisms.
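- By way of illustration only, a copy-number histogram of the kind described above can be sketched as a simple tally. The transcript labels, the two example cells, and the function names below are hypothetical, and the sketch reports raw count differences rather than a test of statistical significance.

```python
from collections import Counter


def copy_number_histogram(transcript_ids):
    """Tally how many times each RNA species was identified in one cell."""
    return Counter(transcript_ids)


def copy_number_differences(hist_a, hist_b):
    """Return {transcript: count_a - count_b} for every transcript seen in either cell."""
    return {t: hist_a[t] - hist_b[t] for t in sorted(set(hist_a) | set(hist_b))}


# Hypothetical identified transcripts from two cells (made-up labels).
normal_cell = ["geneA", "geneA", "geneB", "geneC"]
diseased_cell = ["geneA", "geneB", "geneB", "geneB"]

normal_hist = copy_number_histogram(normal_cell)
diseased_hist = copy_number_histogram(diseased_cell)

# Positive values mean more copies in the diseased cell than in the normal cell.
diff = copy_number_differences(diseased_hist, normal_hist)
```

A real comparison would need replicate measurements and an appropriate statistical test before any difference is called significant.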
- The invention also features an approach to annotating genomes based on counting and identifying RNA transcripts. The identified transcripts indicate, for example, how sequenced genes are actually transcribed and/or expressed. By comparing the analyzed sequence of an identified transcript to one or more predicted expressed sequences, the prediction can be confirmed, modified, or refuted, providing a means to annotate genomes.
- Still another feature of the present invention involves methods of determining phylogenic relationships of various species. Such embodiments provide for compiling transcriptional patterns of cells from different species and analyzing the relationships amongst homologous transcripts. Such information finds use in determining evolutionary relationships amongst species.
- Another feature of the present invention involves a method of determining a microorganism's response to various stimuli, for example, response when exposed to a drug or subjected to other treatment, such as being deprived of certain metabolites. In such embodiments, transcriptional patterns of a cell of the microorganism, for example a bacterial cell, can be compared before and after administration of the drug or other treatment.
- While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the present invention described herein may be employed in practicing the present invention. It is intended that the following claims define the scope of the present invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
- A sequencing template was immobilized on streptavidin coated beads via its 5′ biotin and was hybridized with a sequencing primer by incubating at 70° C. for 3 min, 55° C. for 15 min and 25° C. for 5 min. In a 50 μl reaction, 8 U Klenow exo(−), 65 mU of apyrase, 10 mU of inorganic pyrophosphatase, and 5 μg of single strand binding protein (SSB) were added. The extension reactions were carried out at room temperature. At one minute intervals, successive sets of nucleotides, each at 6.7 μM final concentration, were added to the reaction buffer with mixing. Three dark bases (native nucleotides) were added at each step as depicted in FIG. 8. After 5 step nucleotide additions as depicted in FIG. 8, the beads were washed and a fresh reaction buffer with enzymes and SSB was added to the beads. After some nucleotide addition steps, for example, after Steps FIG. 8, in which the results are depicted in FIG. 3, an aliquot of beads was taken out and treated with NaOH to release the extended primer. The extension products were examined using denaturing polyacrylamide gel and the signals were analyzed using ImageJ (available from the National Institutes of Health). A general schematic of the protocol is depicted in FIG. 9.
- The results of the extension products are depicted in FIG. 10. The largest band is the expected extension product. The primary product of the extension was as expected in length. Few smaller bands were detected, which may be products of incomplete incorporation and represented a small portion of the reaction products. The Step 9 extension product of 85 base pairs (bp), which corresponds to the extension of 63 bp to the 22 bp primer, the Step 10 extension product of 98 bp, which corresponds to the extension of 76 bp to the 22 bp primer, and the Step 12 extension product of 124 bp, which corresponds to the extension of 102 bp to the 22 bp primer, are depicted in FIG. 11.
- A PCR product was used as a template in this Example. The PCR template was immobilized on streptavidin coated beads via its 5′ biotin and was hybridized with a sequencing primer by incubating at 70° C. for 3 min, 55° C. for 15 min and 25° C. for 5 min. In a 50 μl reaction, 8 U Klenow exo(−), 65 mU of apyrase, 10 mU of inorganic pyrophosphatase, and 5 μg of single strand binding protein (SSB) were added. The extension reactions were carried out at room temperature. At one minute intervals, successive sets of nucleotides, each at 6.7 μM final concentration, were added to the reaction buffer with mixing. Three dark bases were added at each step as depicted in FIG. 8.
- The results of the extension products are depicted in FIG. 11. The largest band is the extension product. The primary product of the extension was as expected in length. Few smaller bands were detected, which may be products of incomplete incorporation and represented a small portion of the reaction products.
- The Step 9 extension product of 85 base pairs (bp), which corresponds to the extension by 63 bp of the 22 bp primer, the Step 10 extension product of 98 bp, which corresponds to the extension by 76 bp of the 22 bp primer, and the Step 12 extension product of 124 bp, which corresponds to the extension by 102 bp of the 22 bp primer, are depicted in FIG. 11.
- Massively parallel sequencing following dark base +S extension was demonstrated using a sequencing flow cell with 8 lanes (commercially available from Illumina, San Diego, Calif.). Sequencing libraries were prepared from genomic samples (including samples enriched for exon regions) and sequenced for 100 bases according to standard protocols using an Illumina HiScanSQ sequencer.
- All flow cell lanes were then stripped with 0.1N NaOH to remove sequencing extension products that are labeled with fluorescent signals. The resulting flow cell lanes were washed with saline-sodium citrate (SSC) washing solution. A sequencing primer (P1) was hybridized with sequencing templates still in the flow cell lanes for 30 minutes at 60° C. The flow cell lanes/channels were then washed with SSC.
- For Lane 1, pre-incubation buffer with Klenow, NEB2, and pyrophosphatase was loaded and kept for 1 minute. A dark base (+S) triplet solution with 13.4 μM each of dTTP, dGTP, and dCTP in buffer was loaded for one minute, then removed. An apyrase wash solution (1 mU/μl) was loaded into the lane and removed after three minutes. Another cycle of dark base extension was then employed. The sequence of dark base extension in terms of missing nucleotides was A, T, G, C, A, T, G, C, A, and T. A total of ten dark base extension steps were used, with the last missing nucleotide being dTTP.
- For Lane 3, pre-incubation buffer with Klenow, NEB2, pyrophosphatase and apyrase (1 mU/μl) was loaded and kept for 1 minute. A dark base triplet solution was spiked into the pre-incubation solution with 13.4 μM each of dTTP, dGTP, and dCTP. The mixed solution was loaded into the flow cell lane for one minute. Another cycle of dark base addition/extension was then employed. The sequence of dark base extension in terms of missing nucleotides was A, T, G, and C. A total of four dark base extension steps were used, with the last missing nucleotide being dCTP.
- After dark base extension, the flow cell was then loaded into an Illumina HiScanSQ sequencer to sequence 25 bases (second sequencing). After the second sequencing, the flow cell lanes were stripped again with 0.1 N NaOH and the stripped nucleic acids were analyzed using a denaturing gel.
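- By way of illustration only, the stepwise logic of dark base extension can be sketched in a few lines of Python. The sketch assumes an idealized reaction in which extension with a triplet proceeds until the next base required is the missing nucleotide; the function name and the example strand are hypothetical and are not taken from the experiments.

```python
def dark_base_extension(to_synthesize, missing_order):
    """Simulate stepwise extension with nucleotide triplets.

    to_synthesize: bases the polymerase must add downstream of the primer,
                   in 5'->3' order (i.e. the complement of the template).
    missing_order: the nucleotide left out at each step, e.g. "ATGC".
    Returns the cumulative extension length after each step.
    """
    position = 0
    lengths = []
    for missing in missing_order:
        # Extension continues until the next required base is the missing one
        # (or the end of the strand is reached).
        while position < len(to_synthesize) and to_synthesize[position] != missing:
            position += 1
        lengths.append(position)
    return lengths


strand = "GCTAGGCATTCGACTT"  # hypothetical sequence, for illustration only
lengths = dark_base_extension(strand, "ATGC")

# The first base still to be added after the last step is the last missing nucleotide.
next_base = strand[lengths[-1]]
```

Under this idealized model the extension always stalls just before the nucleotide that was last withheld, so the first base read in a subsequent sequencing run is expected to be that nucleotide.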
- Lane 1 generated about 278 million base reads with about 11 million clusters passing filter. Lane 3 generated about 653 million base reads with about 25.6 million clusters passing filter.
- FIG. 12 shows the percent base calls per sequencing step for Lane 1. As expected, 100% of the first base was called “T”: because the last step of the dark base extension was a “missing T” step, the first base added in the sequencer is expected to be “T”.
- FIG. 13 shows the percent base calls per sequencing step for Lane 3. Also as expected, 100% of the first base called was “C.”
- The sequences from the second sequencing were matched with the sequences from the first sequencing, as the templates were the same. Because there were alignment changes between the first and second sequencings (the flow cell was removed from the sequencer for dark base extension), a search algorithm was used to match the sequences within a range of 150 units of x, y coordinates from the Illumina qseq files. One million passed filter sequences from lane one, second sequencing (25 bases long) were checked and 71.3% of the sequences matched part of the sequences from the first sequencing (100 bases long). Similarly, one million passed filter sequences from lane three, second sequencing (25 bases long) were checked and 76.56% of the sequences matched part of the sequences from the first sequencing (100 bases long).
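- By way of illustration only, the coordinate-based matching described above can be sketched as follows. The search algorithm actually used is not detailed here, so this is a minimal brute-force sketch under stated assumptions: the tolerance is applied to x and y separately, each cluster is a hypothetical (x, y, read) tuple already parsed from a qseq file, and a match requires the short read to occur exactly within the long read.

```python
def match_clusters(first_run, second_run, tolerance=150):
    """Match second-run reads to first-run reads from nearby clusters.

    first_run, second_run: lists of (x, y, read) tuples (hypothetical layout).
    Returns a list of (second_index, first_index, offset), where offset is the
    position of the second read inside the first read.
    """
    matches = []
    for j, (x2, y2, read2) in enumerate(second_run):
        for i, (x1, y1, read1) in enumerate(first_run):
            close = abs(x1 - x2) <= tolerance and abs(y1 - y2) <= tolerance
            if close and read2 in read1:
                matches.append((j, i, read1.find(read2)))
                break  # keep the first nearby cluster that matches
    return matches


# Made-up clusters; real reads would be 100 and 25 bases long.
first_run = [(1000, 2000, "ACGTACGTTTGACCA"), (5000, 5000, "GGGGCCCCAAAATTTT")]
second_run = [(1090, 1950, "TTGAC"), (5400, 5000, "AAAA"), (4990, 5010, "ACGT")]

matches = match_clusters(first_run, second_run)
matched_fraction = len(matches) / len(second_run)
```

The offsets returned by such a search are what would be plotted to obtain a distribution of extension lengths. A production implementation would likely bin clusters spatially instead of comparing every pair.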
- The sequence match positions were also analyzed. FIG. 14 shows the distribution of dark base extensions in Lane 1 (10 steps) and Lane 3 (4 steps). These distributions agree with the expected distribution. Both the high exact sequence match and the correct distribution indicate that the sequencing after dark extension worked reasonably well.
- When 8.8 million sequences from Lane 1 were checked to examine whether the actual dark extension matched expectations according to the sequences from the first sequencing, 98.2% of the dark base extension was found as expected. Among the 8.8 million sequences, 8.7 million sequences matched with the 10 step (ATGC cycle) dark base extension. An additional 5,673 sequences from the second sequencing did not have first base calls. Assuming that the first base was “T” as expected for these sequences, they matched with the 10 step dark base extension.
- Massively parallel sequencing following controlled extension was again demonstrated using an Illumina HiScanSQ sequencer. Eight genomic samples enriched for exon regions were used to prepare Illumina paired-end sequencing libraries and sequenced for 75 bases per end (2×75 bases) according to a standard protocol based on Agilent and Illumina reagents and protocols. After the second end sequencing (read 2), lanes 1-6 and 8 were used for controlled extension using a cBot cluster generation system (Illumina), custom programmed by Centrillion Biosciences, Inc. to perform controlled extension with a custom assembled reagent kit.
- The cBot cluster generation system was reprogrammed to utilize a custom edited protocol to deliver nucleotide combinations at specified time intervals, as well as other reagents. After all lanes were stripped with 0.1N NaOH (120 μl) to remove sequencing extension products, an Illumina sequencing primer (SP2, 95 μL) was introduced into all lanes to hybridize to clusters of ssDNA template on the surface of the flow cell. Hybridization was performed for 15 min at 60° C., followed by slow cooling to 20° C. at a rate of 3° C./min.
- Controlled extension was accomplished by repeated introduction of unlabeled native nucleotide triplets (85 μL for 1 minute), followed by apyrase containing washing solution (120 μL for 2 minutes). Finally, a wash solution of NEB2 (120 μL, 1×) was pumped through the flow cell before proceeding to the following dark base extension step. For example, the nucleotide combinations were: Lane 4 (10 steps), missing A, C, G, T, A, C, G, T, A, C; Lane 5 (16 steps), missing A, C, G, T, A, C, G, T, A, C, A, C, G, T, A, C; Lane 6 (20 steps), missing A, C, G, T, A, C, G, T, A, C, A, C, G, T, A, C, G, T, A, C; and Lane 7 (0 steps), control, sequencing primer only (no dark base extension).
- After dark base extension, the flow cell was loaded into an Illumina HiScanSQ sequencer to sequence 75 bases (second sequencing).
- Lane 4 generated about 1,927 million base reads with about 25.7 million clusters passing filter. Lane 5 generated about 1,324 million base reads with about 17.6 million clusters passing filter. Lane 6 generated about 884 million base reads with about 11.8 million clusters passing filter.
- The sequences from the second sequencing were matched with the sequences from the second read of the first sequencing. Because the second sequencing was extended longer than the second read of the first sequencing, the sequences from the second sequencing may or may not overlap with the sequences from the second read of the first sequencing from the same cluster. The sequences from both sequencing runs were mapped to the human genome and a search algorithm was used to compare the mapping positions on human chromosomes to determine if two sequences were from the same cluster based on their mapping positions. Because there were cluster alignment changes between the first and second sequencings (the flow cell was removed from the sequencer for dark base extension), the search algorithm matched sequences within a range of 600 units of x, y coordinates from the Illumina qseq files.
- One million passed filter sequences from lane 4, second sequencing (75 bases long) were checked and 80.4% of the sequences mapped to the positions next to where the sequences from the first sequencing (75 bases long) were mapped. Similarly, one million passed filter sequences from lane 5, second sequencing (75 bases long) were checked and 81.8% of the sequences mapped to the positions next to where the sequences from the first sequencing (75 bases long) were mapped. Similarly, one million passed filter sequences from lane 6, second sequencing (75 bases long) were checked and 82% of the sequences mapped to the positions next to where the sequences from the first sequencing (75 bases long) were mapped.
- The sequence match positions were also analyzed. FIG. 15 shows the distribution of dark base extensions in Lane 4 (10 steps), Lane 5 (16 steps) and Lane 6 (20 steps). These distributions agree with the expected distribution. Both the high sequence mapping position match and the correct distribution indicate that the sequencing after dark extension worked reasonably well.
- Complete genome sequencing offers a truly unbiased view of the genome. It allows the entire genetic code of an individual to be deduced all at once and reveals comprehensive genetic information in personal health care. For a rare genetic disease for which the underlying mutation is currently unknown, whole-genome sequencing may be the only feasible way to identify the causative variant. However, the high cost of whole genome sequencing still prohibits routine genetic screens in large populations of individuals.
- Next-generation sequencing (NGS) technologies represent major improvements in accuracy, read length and cost. DNA sequencing-by-synthesis (SBS) technologies using a polymerase (Illumina, 454, Ion Torrent) or a ligase enzyme (SOLiD) have already been incorporated in several commercially available NGS platforms with significant success. Although the platforms differ in their engineering configurations and sequencing chemistries, they share a technical paradigm in that bases are read sequentially, through iterative cycles of polymerase-mediated fluorescent-labeled nucleotide extensions or through successive fluorescent-labeled oligonucleotide ligation. Since fluorescently-labeled nucleotides are not native substrates of the polymerase, it is difficult for the reaction to achieve 100% completion. The cumulative effect of incomplete extensions at each step leads to dephasing that ultimately contributes to significant decreases in signal intensity in long reads. In addition, incomplete removal of terminating groups on labeled nucleotides can lead to further signal loss. In order to optimize the enzyme-substrate system, current NGS platforms rely extensively on expensive proprietary enzymes, along with fluorescent nucleotides, optics, and instrumentation.
- These fundamental system requirements limit current platforms' ability to increase read length while maintaining high read quality. +S™ technology, an implementation of some embodiments described above, overcomes this hurdle by resetting the sequencing chemistry using length-controlled extension. Consequently, regions of DNA template farther away from the sequencing primer can be reached via +S, effectively increasing the read length without the signal loss and quality reduction inherent in current NGS platforms. This example demonstrates that +S™ technology, which employs controlled extension in addition to sequencing, greatly improves sequencing quality for long reads.
- Library Preparation:
- Human DNA samples and an E. coli (strain ATCC 11303) DNA sample were sheared using a Covaris protocol (Covaris, Inc., Woburn, Mass., USA) to the desired length distribution. The resulting fragmented human DNA samples were processed according to Agilent SureSelect™ Exome Protocols to prepare human exome libraries for sequencing. The resulting fragmented E. coli DNA was further separated using a 2% agarose gel and a band ranging from 600 to 700 bp was excised. After DNA extraction, the sample was processed according to the Illumina TruSeq DNA Sample Preparation Guide to generate libraries for sequencing.
- Standard Illumina Cluster Generation and Pair-End Sequencing:
- Human Exome and E. coli libraries were quantified by qPCR, diluted to proper concentration and denatured with 0.1 N NaOH according to the Illumina TruSeq cBot procedure. Denatured human libraries and the 1% E. coli library were loaded into the cBot along with a TruSeq PE Cluster v3 plate and a v3 Flow Cell. After completion of the cluster generation, the flow cell was loaded into the HiScanSQ sequencer along with TruSeq SBS Kit v3 and multiplexing reagents. The sequencing run was executed using the 2×100 TruSeq 3 Paired-End protocol and fully completed before any +S related steps were performed.
- Flow Cell Preparation for +S:
- After the completion of the second 100 bp read of standard Illumina pair-end sequencing, lane 1 was immediately protected, and did not go through further processing (no +S steps). This lane preserved the conditions at the end of the second read, and would serve as a control representing continuation of Illumina sequencing beyond the 100 bp length.
- On the other hand, lane 2 and lane 3 of the flow cell were treated with 0.1 N NaOH (200 μL) to remove the synthesized strands which are not attached to the flow cell (i.e. the second 100 bp read). Thus, only single stranded template molecules attached to the flow cell remained.
- A sequencing primer mix was prepared by adding Illumina multiplex read2 sequencing primer (PN 1005721) to a final concentration of 0.5 μM in hybridization mix (5×SSC, 0.05% Tween-20). Lanes point lane 2 was also protected until further sequencing.
- +S Extension:
- Lane 3 underwent the +S Extension method. In total, twenty-four cycles of three base +S Extensions were performed on lane 3 at 37° C. Three nucleotides (a triplet format) were added together at each addition step (forming a cycle). For clarity, we named each tri-nucleotide addition the “minus the fourth nucleotide” mix. Therefore, the −A mix consists of (dC, dG, dT); the −C mix contains (dA, dG, dT); the −G mix contains (dA, dC, dT); and finally, the −T mix contains (dA, dC, dG). During the +S Extension, the sequence of cycles of tri-nucleotides (triplets) was “−A, −C, −G, −T, −A, −C, −G, −T, −A, −C, −G, −T, −A, −C, −G, −T, −A, −C, −G, −T, −A, −C, −G, −T”, for a total of 24 cycles. The +S Extension mix included: 1× Thermopol buffer (NEB), 0.5 M GC-Melt (Clontech), 4 mM DTT (Sigma), 1 mg/ml BSA (NEB), 0.2 mg/ml PVP-10 (Sigma), 0.8 μg/μl SSB (Epicentre), 2 mU/μl Pyrophosphatase (NEB) and 1.6 U/μl Bst Polymerase (NEB).
- Appropriate nucleotide combinations were added to the +S extension mix to a final concentration of 5 μM (each nucleotide). Washing solution was prepared with 1× Thermopol, 4 mM DTT and 1 mU/μl apyrase (NEB).
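- By way of illustration only, the 24-cycle triplet schedule and a rough estimate of the resulting extension length can be sketched in Python. The estimate rests on simplifying assumptions that are not part of the protocol: a uniformly random template, complete extension at every cycle, and consecutive cycles that never withhold the same nucleotide. Under those assumptions the first cycle adds 3 bases on average and each later cycle adds 4 (the previously stalled base plus an average of 3 more). The function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def triplet_mix(missing):
    """Return the three dNTPs present in the 'minus <missing>' mix."""
    return [f"d{n}" for n in NUCLEOTIDES if n != missing]


def schedule(n_cycles, order="ACGT"):
    """Missing nucleotide for each cycle, repeating the given order."""
    return [order[i % len(order)] for i in range(n_cycles)]


def expected_extension(n_cycles):
    """Expected total extension in bases under the random-template assumption."""
    if n_cycles <= 0:
        return 0
    # First cycle: mean of 3 bases before the missing base is required.
    # Each later cycle: 1 stalled base + mean of 3 further bases = 4.
    return 3 + 4 * (n_cycles - 1)


plan = schedule(24)                      # -A, -C, -G, -T repeated six times
mixes = [triplet_mix(m) for m in plan]   # dNTPs delivered in each cycle
estimate = expected_extension(24)        # 3 + 4 * 23
```

For 24 cycles this idealized model gives 95 bases, which is in the same range as the average extension reported for lane 3 in this example. Real templates are not uniformly random, so the figure is only a rough guide.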
- Prior to +S Extension, lane 3 was filled with 85 μl of the +S extension mix without nucleotides and then incubated for 30 seconds. The +S extension cycle was performed by pumping +S extension mix with nucleotides (35 μl), followed by 3 μl of air at a rate of 60 μl/min. Subsequently, wash mix (120 μl) was pumped and incubated for 1 minute, followed by 1× Thermopol wash (120 μl). This order of reagent pumping was repeated for 24 cycles with the designated nucleotide triplet combination in each cycle (i.e. −A, −C, etc.). Finally, after +S Extension, lane 3 was loaded with holding buffer and protected until further sequencing.
- Re-Run of Standard Illumina Sequencing (Single-Read):
- With all the lanes (1, 2, 3) prepared, the flow cell was loaded into the HiScanSQ sequencer along with TruSeq SBS Kit v3. In order to focus effectively with the HiScanSQ after the +S process, 1 cycle of TruSeq v3 was performed for all the lanes (1, 2, 3). The new sequencing run was executed using the single read 1×100 TruSeq v3 protocol as if starting from a new flow cell. In effect, this new single read 1×100 run is re-sequencing the 2nd read of the pair-end protocol that was completed earlier, where lane 1 is reading base positions 102-201 as a continuation of the previous run, lane 2 is re-reading bases 2-101 since it starts with only the sequencing primer, while lane 3 starts at a range of positions due to +S Extension. More precisely, the 24 cycles of +S Extension in lane 3 resulted in sequencing primers being extended by an average of 96 bp.
- Data Analysis:
- E. coli sequencing reads were aligned to the assembled E. coli genome (strain ATCC 11303) using the sequence alignment tool BWA. The genome of E. coli strain ATCC 11303 was assembled using sequencing reads of the same strain from a standard Illumina sequencing run. Only uniquely aligned reads were used in the quality calculation. In one quality calculation, all bases of each uniquely aligned read were counted regardless of the quality value. For an individual read, bases at each position were recorded as correct or wrong based on the comparison to the reference E. coli genome, then the Phred-style quality score Q at each base position was calculated from the negative logarithm of the error rate E at the base position:
- Q = −10 × log₁₀(E)
- where E = (number of bases recorded as wrong)/(number of bases recorded as correct + number of bases recorded as wrong)
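- By way of illustration only, the quality calculation above can be sketched in Python. The function name and the example counts are hypothetical; treating a position with no observed errors as having infinite quality is an assumption of this sketch, since the formula is undefined when E is zero.

```python
import math


def phred_quality(n_correct, n_wrong):
    """Phred-style quality Q = -10 * log10(E) for one base position.

    E is the fraction of bases at that position recorded as wrong.
    """
    total = n_correct + n_wrong
    if total == 0:
        raise ValueError("no bases observed at this position")
    error_rate = n_wrong / total
    if error_rate == 0:
        return math.inf  # assumption: no observed errors -> unbounded quality
    return -10 * math.log10(error_rate)


# Hypothetical counts: 1 wrong base out of 1,000 gives a Q of approximately 30.
q_example = phred_quality(999, 1)
# 1 wrong base out of 10 gives a Q of approximately 10.
q_low = phred_quality(9, 1)
```

Applying the function once per base position yields a per-position quality profile along the read.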
- Sequencing quality was also measured using Genome Analysis Tool Kit (GATK, www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit). First, all sequence reads were aligned to the assembled E. coli genome (strain ATCC 11303) using sequence alignment tool BWA. The CountCovariates module of GATK was then used to calculate the quality. In this calculation, continuous low quality bases (bases with raw Illumina quality score of 2) at the end of each read were dropped before the average quality was calculated.
- In FIG. 16, Q-Scores for bases 1 to 100 were taken from the sequencing reads using standard Illumina protocol for lane 1 (S1) and lane 3 (S3), (i.e. the 2nd read of the 2×100 pair-end protocol). For lane 1, bases 101 to 200 Q-Scores were obtained from the continuation sequencing run using standard Illumina protocol (1×100) without +S extension. For lane lane 3.
- Results
- +S Technology on Illumina Sequencing Platform
- This example demonstrates +S technology's ability to increase read length while maintaining read quality using Illumina's HiScanSQ sequencer. After 24-cycle +S extension on lane 3, the standard sequencing primer is extended on average about 100 bp before running the 1×100 Illumina Sequencing (see Methods and Materials). The +S Extension in lane 3 is similar in length to the lane 1 condition, which contains the 100 bp read of the original Illumina SBS. Therefore, the single read 1×100 Illumina Sequencing is reading positions 101-200 in both lanes: lane 1 is a continuation of earlier Illumina sequencing, while lane 3 contains freshly made +S Extension of an average length of 100 bp. In this way, the two lanes could be compared side-by-side to evaluate the effectiveness of +S Extension in increasing read length while maintaining read quality. Finally, Lane 2 is the control lane for sequencing primer hybridization, cluster retention and flow-cell performance.
- FIG. 16A compares the cluster density of different lanes after +S Extension on lane 3. Lane 1 is protected throughout the +S process. Lane 2 was treated with NaOH and subsequently re-hybridized with sequencing primer together with Lane 3. Neither lane 1 nor lane 2 was extended with +S. The similar cluster density in lanes
- FIG. 16B shows % cluster pass filter rate. After restarting the sequencer, only 10% of clusters passed filter on lane 1. In contrast, 70% of clusters passed filter on lane 3.
- FIG. 16C shows the number of pass filter reads for different lanes. Lane 3 (+S) has a much higher pass filter rate than lane 1 and is only slightly lower than lane 2, which was sequencing the bases from 1 to 100, whereas lane 3 sequenced on average positions 101 to 200. Similarly, the predicted quality scores of different lanes (FIG. 16D) show a similar pattern, where +S sequencing dramatically improved the number of Q30 or above reads vs. lane 1.
- We also performed +S Extension then standard Illumina sequencing on another lane (lane 8). The results of lane 8 show similar patterns to those of lane 3 (data not shown here).
- FIGS. 17A and 17B show the empirical (actual Q-Score distribution over read length) Q-Score calculated using GATK. FIG. 17A shows the 100 bp standard Illumina sequencing run. FIG. 17B shows the additional 100 bp Illumina sequencing run, which was after the 100 bp sequencing run shown in FIG. 17A and an extra 1 bp sequencing run. For lane 1, x-axis position 1 to 100 in FIG. 17A was the actual base position 1 to 100 on each DNA fragment sequenced; x-axis position 1 to 100 in FIG. 17B was actual base position 102 to 201 on each DNA fragment sequenced. For lane 3, x-axis position 1 to 100 in FIG. 17A was the actual base position on each DNA fragment sequenced; the actual base position on each DNA fragment for x-axis position 1 to 100 in FIG. 17B would depend on the actual +S extension size of each individual DNA fragment. Based on the +S extension size distribution, the average extension size on lane 3 is 97 bases. Therefore, the average of actual base position on DNA fragment for x-axis position 1 to 100 in FIG. 17B is 98 (97 plus 1 from the additional 1 bp sequencing run) to 197. Because very few bases were available for lane 1 after x-axis position 94 in FIG. 17B, the empirical quality score was not calculated for lane 1 after x-axis position 94 in FIG. 17B. Clearly, even though the low quality bases at the end of reads had been dropped, the quality of actual base positions 102 to 193 of Illumina continuation sequencing (lane 1) was much worse than +S sequencing (lane 3). The several sudden dips in lane 3 Q-Scores were due to bubbles in the flow cell which prevented proper imaging of the clusters at those base positions.
- Because the low quality bases at the end of reads were dropped in the GATK empirical quality (FIGS. 17A and 17B) calculation, the number of correct bases was calculated to show changes of overall correct bases as the read length increases (FIGS. 17C and 17D). The x-axis in FIG. 17C is the same as that in FIG. 17A and the x-axis in FIG. 17D is the same as that in FIG. 17B. Each read was aligned to the assembled reference E. coli genome (strain ATCC 11303). A base on a read was called correct if it was the same as the aligned base on the reference genome. In FIGS. 17C and 17D, the number of correct bases at each x-axis position was calculated as the number of reads that have correct bases at the position for the lane. Clearly, the reads from lane 3 in the additional sequencing after +S extension had a much higher number of correct bases.
- Overall, the output and quality of +S Sequencing at bases 101-200 in lane 3 were much better than without +S Extension Steps (lane 1 at bases 101-200). We also performed +S Sequencing on an additional lane (Lane 8). The results of lane 8 showed similar patterns to those of lane 3 (data not shown here).
- This example demonstrates that three nucleotide controlled extension can be performed using an Ion Torrent PGM. It also demonstrates that the commercial implementation of the controlled extension sequencing process, +S Sequencing, can be performed using Ion Torrent as a readout device.
- Materials and Methods
- A “fusion” PCR construct of 176 bp insert size was designed according to Ion Torrent's guidelines (Ion Amplicon Library Preparation (Fusion Method) p/n 4468326 Rev. B). The basic sequence of the PCR construct was from the plasmid pBR322. After 25 cycles of amplification with Herculase II DNA Polymerase (Agilent #600675), the amplicons were extracted with Qiagen's Gel Extraction Kit (Qiagen #28704). Input DNA was amplified onto Ion Sphere™ Particles (ISPs) using Ion Torrent's Ion Xpress Template 200 kit (Life #4471253). Enriched ISPs were hybridized with sequencing primer and DNA polymerase was bound according to protocol (Ion Torrent protocol 4469714 Rev. B), using polymerase and primer from Ion's Sequencing Kit (Life #4468995).
- The Ion Torrent Personal Genome Machine was initialized with reagents from the sequencing kit. After initialization, the primed and polymerase-bound ISPs were loaded into a 314R chip with reagents from the Ion Sequencing 200 kit (Life #4471258) according to the 200 protocol (Life p/n 4471999 Rev. B). ISPs loaded into the chip were sequenced on the PGM with 320 nucleotide flows in Ion Torrent's SAMBA flow order. After extension, the chip was stored in a fridge in Annealing Buffer with PVP from Ion Torrent's Paired-End Sequencing Demonstrated Protocol (p/n MAN0006191; 900 μl of Annealing Buffer from the sequencing kit was combined with 48 μl of 8% PVP-10).
- After sequencing on the PGM, the extended sequencing primer was stripped with 0.1 N NaOH, and ISP-bound templates were hybridized with sequencing primer mixture (5 μl Sequencing Primer in 25 μl Annealing Buffer) at 65° C. for 5 minutes followed by room temperature for 15 minutes. The Personal Genome Machine was again washed and initialized, and polymerase was bound onto the ISPs in the chip according to the Paired-End Demonstrated Protocol (1.5 μl of Polymerase from the Sequencing Kit was added to 6 μl of Annealing Buffer with PVP; the mixture was injected into the chip and incubated for 5 minutes). During the PGM's initialization, 20 μl of each nucleotide was replaced by 20 μl of each of the other three nucleotides provided. For example, 20 μl of dATP was replaced with 20 μl of dCTP, 20 μl of dGTP, and 20 μl of dTTP, and the mixture was inserted into the dATP position on the PGM. This was repeated for each nucleotide position on the Personal Genome Machine. ISPs loaded into the chip were extended on the PGM with 16 nucleotide-triplet flows in Ion Torrent's SAMBA flow order.
- After +S extension, the chip was stored in a fridge in Annealing Buffer with PVP from Ion Torrent's Paired-End Sequencing Demonstrated Protocol. After the PGM was washed and re-initialized according to the v2.0 protocol, the chip was washed 2× with 50 μl of Enzyme Denaturation Solution (from PE Demonstrated Protocol: 1×TE, 50 mM NaCl, 2% SDS), reloaded onto the machine, and incubated with polymerase (see above). The extended chip was sequenced with 320 flows in the SAMBA flow order. Sequence calls were made on a Torrent Server using Torrent Suite v 2.0.1 (Ion Torrent/Life Technologies, Inc.). To make calls for sequencing after +S extension, a different key corresponding to the sequencing starting position of the 176mer was used. For the first sequencing, amplicons were sorted by barcode using the Torrent Suite software (all molecules of one amplicon type have the same barcode, which differed from those of the other amplicons included in the experiment). After +S extension, each amplicon calls a different sequence key; thus, the reads generated by Torrent Suite represented only the population of amplicons that called that key. FastQ files were visually inspected for quality and read length using Prinseq online (edwards.sdsu.edu/prinseq_beta/#).
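- The nucleotide-triplet flows described above, in which each nucleotide position on the instrument holds the other three nucleotides, can be illustrated with an idealized simulation. The sketch below assumes error-free, complete extension in every flow, and its four-flow order is purely illustrative (it is not Ion Torrent's SAMBA flow order); the function names and the toy sequence are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def triplet_mix(omitted):
    """Return the three nucleotides loaded in place of `omitted`.

    Mirrors the swap described above: the dATP position, for example,
    receives dCTP, dGTP, and dTTP.
    """
    if omitted not in NUCLEOTIDES:
        raise ValueError(f"unknown nucleotide: {omitted!r}")
    return "".join(n for n in NUCLEOTIDES if n != omitted)


def controlled_extension(nascent_strand, flow_order, start=0):
    """Simulate idealized three-nucleotide controlled extension.

    nascent_strand: bases the polymerase would add downstream of the
        primer, 5'->3' (the complement of the template).
    flow_order: one omitted nucleotide per flow (illustrative order).
    Returns the number of bases extended after each flow.
    """
    pos = start
    history = []
    for omitted in flow_order:
        mix = triplet_mix(omitted)
        # Extension continues until the next required base is the one
        # missing from the mix, or the end of the template is reached.
        while pos < len(nascent_strand) and nascent_strand[pos] in mix:
            pos += 1
        history.append(pos)
    return history


print(triplet_mix("A"))
print(controlled_extension("GGCATTCAGT", "TACG"))
```

- In this model every copy of a template pauses at the same position after each flow, because extension stops only where the omitted nucleotide is required. This is the property that lets the extended product serve as a sequencing primer with a defined start.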
- In FIG. 18, BAM files automatically generated by Torrent Suite are visualized with IGV (www.broadinstitute.org/iv/). The alignment result clearly shows that reads after +S extension start at a uniform position for one construct, indicating minimal dephasing.
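- The uniformity of read start positions noted for FIG. 18 could be quantified with a short sketch such as the following. It assumes the alignment start coordinate of each read has already been extracted from the BAM file into a plain list; the function name and the example coordinates are hypothetical and are not data from the experiment.

```python
from collections import Counter


def start_position_summary(start_positions):
    """Return (modal_start, fraction_of_reads_starting_there).

    start_positions: list of alignment start coordinates, one per read
    (assumed to have been extracted from the alignments beforehand).
    A fraction near 1.0 means nearly all reads begin at the same base,
    which is what minimal dephasing during controlled extension predicts.
    """
    if not start_positions:
        raise ValueError("at least one start position is required")
    counts = Counter(start_positions)
    modal_start, n_reads = counts.most_common(1)[0]
    return modal_start, n_reads / len(start_positions)


# Toy coordinates: four of five reads start at the same position.
print(start_position_summary([57, 57, 57, 58, 57]))
```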
Claims (21)
1-123. (canceled)
124. A method for sequencing a target nucleic acid, comprising the steps of:
(a) hybridizing a first extension primer with the target nucleic acid;
(b) extending the first extension primer, the extending in step (b) comprising one or more first controlled extensions to a first defined length, wherein each of the one or more first controlled extensions comprises contacting the target nucleic acid with a set of nucleotides comprising three different unmodified nucleotides, wherein the extending in step (b) produces a first extended product comprising the first extension primer and a first extended portion, wherein sequence of the first extended portion is unknown; and
(c) using the first extended product obtained in step (b) as a first sequencing primer from which to generate a first sequence read, and sequencing a first region of said target nucleic acid,
with the proviso that the sequencing in (c) is not pyrophosphate detection based sequencing.
125. The method of claim 124 , further comprising:
(d) removing the first sequence read and the first sequencing primer;
(e) hybridizing a second extension primer with the target nucleic acid;
(f) extending the second extension primer, the extending in step (f) comprising one or more second controlled extensions to a second defined length, wherein each of the one or more second controlled extensions comprising contacting the second extension primer with a second set of nucleotides comprising another three different unmodified nucleotides, wherein the extending in step (f) produces a second extended product comprising the second extension primer and a second extended portion; and
(g) using the second extended product obtained in step (f) as a second sequencing primer from which to generate a second sequence read, and sequencing a second region of said target nucleic acid.
126. The method of claim 125 , wherein the removing in (d) comprises enzymatic digestion of the first sequence read.
127. The method of claim 125 , wherein the removing in (d) comprises exonuclease digestion.
128. The method of claim 125 , wherein the first and second extended products are the same.
129. The method of claim 125 , wherein the first and second extended products are different.
130. The method of claim 124 , wherein the one or more first controlled extensions comprises (i) contacting the target nucleic acid with a first set of nucleotides comprising three different unmodified nucleotides; and (ii) contacting the target nucleic acid with a second set of nucleotides comprising three different unmodified nucleotides.
131. The method of claim 130 , further comprising: between (i) and (ii), removing the first set of nucleotides by washing, or by a nucleotide degrading enzyme.
132. The method of claim 130 , wherein the first set of nucleotides is different from the second set of nucleotides.
133. The method of claim 130 , wherein the second set of nucleotides further comprises one reversible terminator nucleotide, thereby the first extended product comprising an incorporated reversible terminator nucleotide.
134. The method of claim 133 , further comprising, in (c), deblocking the incorporated reversible terminator nucleotide before generating the first sequence read.
135. The method of claim 125 , wherein the sequence of the target nucleic acid is determined by assembling the first, second, and optionally additional sequence reads.
136. The method of claim 124 , wherein the target nucleic acid is attached to a substrate.
137. The method of claim 125 , wherein the first and second sequence reads start at positions that are at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 175, or 200 bases apart on the target nucleic acid.
138. A method for sequencing a target nucleic acid, comprising the steps of:
(a) hybridizing a first extension primer with the target nucleic acid;
(b) extending the first extension primer, the extending in step (b) comprising one or more first controlled extensions to a first defined length, wherein each of the one or more first controlled extensions comprises contacting the target nucleic acid with a set of nucleotides comprising three different unmodified nucleotides, wherein the extending in step (b) produces a first extended product comprising the first extension primer and a first extended portion, wherein sequence of the first extended portion is unknown; and
(c) using the first extended product obtained in step (b) as a first sequencing primer from which to generate a first sequence read, and sequencing a first region of said target nucleic acid,
wherein a base that is resistant to exonuclease digestion is incorporated to a position in the first sequence read, with the proviso that the sequencing in (c) is not pyrophosphate detection based sequencing.
139. The method of claim 138 , further comprising:
(d) removing at least a part of said sequencing product.
140. The method of claim 139 , wherein the removing in (d) comprises the exonuclease digestion.
141. The method of claim 140 , further comprising:
(e) hybridizing a second extension primer with the target nucleic acid;
(f) extending the second extension primer, the extending in step (f) comprising one or more second controlled extensions to a second defined length, wherein each of the one or more second controlled extension comprises contacting the second extension primer with a second set of nucleotides comprising another three different unmodified nucleotides, wherein the extending in step (f) produces a second extended product comprising the second extension primer and a second extended portion; and
(g) using the second extended product obtained in step (f) as a second sequencing primer from which to generate a second sequence read, and sequencing a second region of said target nucleic acid.
142. The method of claim 141 , wherein the first and second sequence reads start at positions that are at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 175, or 200 bases apart on the target nucleic acid.
143. The method of claim 141 , wherein the sequence of the target nucleic acid is determined by assembling the first, second, and optionally additional sequence reads.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/005,496 US20210180123A1 (en) | 2011-04-01 | 2020-08-28 | Methods and systems for sequencing long nucleic acids |
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161470497P | 2011-04-01 | 2011-04-01 | |
US201161477173P | 2011-04-20 | 2011-04-20 | |
US201161489662P | 2011-05-24 | 2011-05-24 | |
US13/153,218 US20120252682A1 (en) | 2011-04-01 | 2011-06-03 | Methods and systems for sequencing nucleic acids |
PCT/US2012/000185 WO2012134602A2 (en) | 2011-04-01 | 2012-04-02 | Methods and systems for sequencing long nucleic acids |
US201414009089A | 2014-07-03 | 2014-07-03 | |
US17/005,496 US20210180123A1 (en) | 2011-04-01 | 2020-08-28 | Methods and systems for sequencing long nucleic acids |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2012/000185 Continuation WO2012134602A2 (en) | 2011-04-01 | 2012-04-02 | Methods and systems for sequencing long nucleic acids |
US14/009,089 Continuation US10801062B2 (en) | 2011-04-01 | 2012-04-02 | Methods and systems for sequencing long nucleic acids |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210180123A1 (en) | 2021-06-17 |
Family
ID=46928004
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/153,218 Abandoned US20120252682A1 (en) | 2011-04-01 | 2011-06-03 | Methods and systems for sequencing nucleic acids |
US14/009,089 Active 2035-06-23 US10801062B2 (en) | 2011-04-01 | 2012-04-02 | Methods and systems for sequencing long nucleic acids |
US13/970,321 Active US9689032B2 (en) | 2011-04-01 | 2013-08-19 | Methods and systems for sequencing long nucleic acids |
US17/005,496 Pending US20210180123A1 (en) | 2011-04-01 | 2020-08-28 | Methods and systems for sequencing long nucleic acids |
Family Applications Before (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/153,218 Abandoned US20120252682A1 (en) | 2011-04-01 | 2011-06-03 | Methods and systems for sequencing nucleic acids |
US14/009,089 Active 2035-06-23 US10801062B2 (en) | 2011-04-01 | 2012-04-02 | Methods and systems for sequencing long nucleic acids |
US13/970,321 Active US9689032B2 (en) | 2011-04-01 | 2013-08-19 | Methods and systems for sequencing long nucleic acids |
Country Status (5)
Country | Link |
---|---|
US (4) | US20120252682A1 (en) |
EP (1) | EP2694679A4 (en) |
CN (1) | CN103917654B (en) |
HK (1) | HK1200492A1 (en) |
WO (1) | WO2012134602A2 (en) |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102858995B (en) | 2009-09-10 | 2016-10-26 | 森特瑞隆技术控股公司 | Targeting sequence measurement |
US10174368B2 (en) | 2009-09-10 | 2019-01-08 | Centrillion Technology Holdings Corporation | Methods and systems for sequencing long nucleic acids |
US20120252682A1 (en) | 2011-04-01 | 2012-10-04 | Maples Corporate Services Limited | Methods and systems for sequencing nucleic acids |
US20150011396A1 (en) | 2012-07-09 | 2015-01-08 | Benjamin G. Schroeder | Methods for creating directional bisulfite-converted nucleic acid libraries for next generation sequencing |
US20140024542A1 (en) * | 2012-07-17 | 2014-01-23 | Counsyl, Inc. | Methods and compositions for enrichment of target polynucleotides |
EP2971130A4 (en) | 2013-03-15 | 2016-10-05 | Nugen Technologies Inc | Sequential sequencing |
US10036013B2 (en) * | 2013-08-19 | 2018-07-31 | Abbott Molecular Inc. | Next-generation sequencing libraries |
GB201319779D0 (en) * | 2013-11-08 | 2013-12-25 | Cartagenia N V | Genetic analysis method |
US10597715B2 (en) | 2013-12-05 | 2020-03-24 | Centrillion Technology Holdings | Methods for sequencing nucleic acids |
CN111118121A (en) | 2013-12-05 | 2020-05-08 | 生捷科技控股公司 | Preparation of patterned arrays |
WO2015085268A1 (en) | 2013-12-05 | 2015-06-11 | Centrillion Technology Holdings Corporation | Modified surfaces |
JP2017506500A (en) * | 2013-12-10 | 2017-03-09 | コネクシオ ゲノミクス ピーティーワイ リミテッド | Methods and probes for identifying gene alleles |
US10537889B2 (en) | 2013-12-31 | 2020-01-21 | Illumina, Inc. | Addressable flow cell using patterned electrodes |
US11060139B2 (en) | 2014-03-28 | 2021-07-13 | Centrillion Technology Holdings Corporation | Methods for sequencing nucleic acids |
GB201410420D0 (en) * | 2014-06-11 | 2014-07-23 | Illumina Cambridge Ltd | Methods for estimating cluster numbers |
US9909167B2 (en) | 2014-06-23 | 2018-03-06 | The Board Of Trustees Of The Leland Stanford Junior University | On-slide staining by primer extension |
GB201419731D0 (en) * | 2014-11-05 | 2014-12-17 | Illumina Cambridge Ltd | Sequencing from multiple primers to increase data rate and density |
CN104762405A (en) * | 2015-04-22 | 2015-07-08 | 北京嘉宝仁和医疗科技有限公司 | Method and kit for quality appraisal for amplification products after single cell genome amplification |
EP3103885B1 (en) | 2015-06-09 | 2019-01-30 | Centrillion Technology Holdings Corporation | Methods for sequencing nucleic acids |
US10584378B2 (en) | 2015-08-13 | 2020-03-10 | Centrillion Technology Holdings Corporation | Methods for synchronizing nucleic acid molecules |
CN106702497B (en) * | 2015-11-17 | 2020-01-10 | 安诺优达基因科技(北京)有限公司 | Kit for detecting free DNA in peripheral blood of pregnant woman and library building method |
CN106702498B (en) * | 2015-11-17 | 2020-03-24 | 安诺优达基因科技(北京)有限公司 | Method for constructing DNA library for sequencing |
CN106811510A (en) * | 2015-12-01 | 2017-06-09 | 上海市质量监督检验技术研究院 | Animal derived components discrimination method and its application based on high-flux sequence |
US20190048413A1 (en) * | 2016-02-23 | 2019-02-14 | Novozymes A/S | Improved next-generation sequencing |
CA3031586A1 (en) | 2016-07-27 | 2018-02-01 | The Board Of Trustees Of The Leland Stanford Junior University | Highly-multiplexed fluorescent imaging |
CN108629157B (en) * | 2017-03-22 | 2021-08-31 | 深圳华大基因科技服务有限公司 | Method for compressing and encrypting nucleic acid sequencing data |
CN108728430B (en) * | 2017-04-21 | 2022-04-05 | 胤安国际(辽宁)基因科技股份有限公司 | Method for preparing long DNA probe containing multiple repeating units |
US20200109446A1 (en) * | 2017-06-14 | 2020-04-09 | Board Of Regents, The University Of Texas System | Chip hybridized association-mapping platform and methods of use |
TWI695890B (en) * | 2017-12-29 | 2020-06-11 | 行動基因生技股份有限公司 | Method and system for sequence alignment and variant calling |
CN112805394B (en) * | 2018-12-07 | 2024-03-19 | 深圳华大生命科学研究院 | Method for sequencing long fragment nucleic acid |
AU2020269377B2 (en) | 2019-05-03 | 2024-06-13 | Ultima Genomics, Inc. | Fast-forward sequencing by synthesis methods |
CN111020022B (en) * | 2019-08-01 | 2020-12-29 | 温州医科大学 | Method and kit for detecting chromosomal rearrangements |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5756285A (en) * | 1991-09-27 | 1998-05-26 | Amersham Life Science, Inc. | DNA cycle sequencing |
US6833246B2 (en) * | 1999-09-29 | 2004-12-21 | Solexa, Ltd. | Polynucleotide sequencing |
US7790869B2 (en) * | 2000-10-06 | 2010-09-07 | The Trustees Of Columbia University In The City Of New York | Massive parallel method for decoding DNA and RNA |
US20110009276A1 (en) * | 2006-02-08 | 2011-01-13 | Eric Hans Vermaas | Method for Sequencing a Polynucleotide Template |
US20120252682A1 (en) * | 2011-04-01 | 2012-10-04 | Maples Corporate Services Limited | Methods and systems for sequencing nucleic acids |
US20190360034A1 (en) * | 2011-04-01 | 2019-11-28 | Centrillion Technology Holdings Corporation | Methods and systems for sequencing nucleic acids |
Family Cites Families (131)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4437975A (en) | 1977-07-20 | 1984-03-20 | Mobil Oil Corporation | Manufacture of lube base stock oil |
US4458066A (en) | 1980-02-29 | 1984-07-03 | University Patents, Inc. | Process for preparing polynucleotides |
US4469863A (en) | 1980-11-12 | 1984-09-04 | Ts O Paul O P | Nonionic nucleic acid alkyl and aryl phosphonates and processes for manufacture and use thereof |
US4883750A (en) | 1984-12-13 | 1989-11-28 | Applied Biosystems, Inc. | Detection of specific sequences in nucleic acids |
US5242794A (en) | 1984-12-13 | 1993-09-07 | Applied Biosystems, Inc. | Detection of specific sequences in nucleic acids |
US5034506A (en) | 1985-03-15 | 1991-07-23 | Anti-Gene Development Group | Uncharged morpholino-based polymers having achiral intersubunit linkages |
US5235033A (en) | 1985-03-15 | 1993-08-10 | Anti-Gene Development Group | Alpha-morpholino ribonucleoside derivatives and polymers thereof |
US4683195A (en) | 1986-01-30 | 1987-07-28 | Cetus Corporation | Process for amplifying, detecting, and/or-cloning nucleic acid sequences |
US4683202A (en) | 1985-03-28 | 1987-07-28 | Cetus Corporation | Process for amplifying nucleic acid sequences |
IL86724A (en) | 1987-06-19 | 1995-01-24 | Siska Diagnostics Inc | Method and kits for the amplification and detection of nucleic acid sequences |
WO1989001050A1 (en) | 1987-07-31 | 1989-02-09 | The Board Of Trustees Of The Leland Stanford Junior University | Selective amplification of target polynucleotide sequences |
CA1340807C (en) | 1988-02-24 | 1999-11-02 | Lawrence T. Malek | Nucleic acid amplification process |
JP2650159B2 (en) | 1988-02-24 | 1997-09-03 | アクゾ・ノベル・エヌ・ベー | Nucleic acid amplification method |
US4988617A (en) | 1988-03-25 | 1991-01-29 | California Institute Of Technology | Method of detecting a nucleotide change in nucleic acids |
US5216141A (en) | 1988-06-06 | 1993-06-01 | Benner Steven A | Oligonucleotide analogs containing sulfur linkages |
ZA899593B (en) | 1988-12-16 | 1990-09-26 | Siska Diagnostics Inc | Self-sustained,sequence replication system |
US5856092A (en) | 1989-02-13 | 1999-01-05 | Geneco Pty Ltd | Detection of a nucleic acid sequence or a change therein |
US5234809A (en) | 1989-03-23 | 1993-08-10 | Akzo N.V. | Process for isolating nucleic acid |
US5800992A (en) | 1989-06-07 | 1998-09-01 | Fodor; Stephen P.A. | Method of detecting nucleic acids |
US5871928A (en) | 1989-06-07 | 1999-02-16 | Fodor; Stephen P. A. | Methods for nucleic acid analysis |
US5143854A (en) | 1989-06-07 | 1992-09-01 | Affymax Technologies N.V. | Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof |
US5547839A (en) | 1989-06-07 | 1996-08-20 | Affymax Technologies N.V. | Sequencing of surface immobilized polymers utilizing microflourescence detection |
US5494810A (en) | 1990-05-03 | 1996-02-27 | Cornell Research Foundation, Inc. | Thermostable ligase-mediated DNA amplifications system for the detection of genetic disease |
US5386023A (en) | 1990-07-27 | 1995-01-31 | Isis Pharmaceuticals | Backbone modified oligonucleotide analogs and preparation thereof through reductive coupling |
US5602240A (en) | 1990-07-27 | 1997-02-11 | Ciba Geigy Ag. | Backbone modified oligonucleotide analogs |
DE69128545D1 (en) | 1990-08-24 | 1998-02-05 | Univ Tennessee Res Corp | GENETIC FINGERPRINT TECHNIQUE WITH DNA REPLACEMENT |
WO1992007095A1 (en) | 1990-10-15 | 1992-04-30 | Stratagene | Arbitrarily primed polymerase chain reaction method for fingerprinting genomes |
US5644048A (en) | 1992-01-10 | 1997-07-01 | Isis Pharmaceuticals, Inc. | Process for preparing phosphorothioate oligonucleotides |
US5470705A (en) | 1992-04-03 | 1995-11-28 | Applied Biosystems, Inc. | Probe composition containing a binding domain and polymer chain and methods of use |
US5837832A (en) | 1993-06-25 | 1998-11-17 | Affymetrix, Inc. | Arrays of nucleic acid probes on biological chips |
US6045996A (en) | 1993-10-26 | 2000-04-04 | Affymetrix, Inc. | Hybridization assays on oligonucleotide arrays |
US6090555A (en) | 1997-12-11 | 2000-07-18 | Affymetrix, Inc. | Scanned image alignment systems and methods |
US5578832A (en) | 1994-09-02 | 1996-11-26 | Affymetrix, Inc. | Method and apparatus for imaging a sample on a device |
US5631734A (en) | 1994-02-10 | 1997-05-20 | Affymetrix, Inc. | Method and apparatus for detection of fluorescently labeled materials |
US5637684A (en) | 1994-02-23 | 1997-06-10 | Isis Pharmaceuticals, Inc. | Phosphoramidate and phosphorothioamidate oligomeric compounds |
US6287850B1 (en) | 1995-06-07 | 2001-09-11 | Affymetrix, Inc. | Bioarray chip reaction apparatus and its manufacture |
US5705628A (en) | 1994-09-20 | 1998-01-06 | Whitehead Institute For Biomedical Research | DNA purification and isolation using magnetic particles |
US6362002B1 (en) | 1995-03-17 | 2002-03-26 | President And Fellows Of Harvard College | Characterization of individual polymer molecules based on monomer-interface interactions |
US5545531A (en) | 1995-06-07 | 1996-08-13 | Affymax Technologies N.V. | Methods for making a device for concurrently processing multiple biological chip assays |
US5882867A (en) * | 1995-06-07 | 1999-03-16 | Dade Behring Marburg Gmbh | Detection of nucleic acids by formation of template-dependent product |
US6518189B1 (en) | 1995-11-15 | 2003-02-11 | Regents Of The University Of Minnesota | Method and apparatus for high density nanostructures |
JP4054379B2 (en) | 1995-12-01 | 2008-02-27 | イノジェネティックス・ナムローゼ・フェンノートシャップ | Impedance type detection system and manufacturing method thereof |
US6852487B1 (en) | 1996-02-09 | 2005-02-08 | Cornell Research Foundation, Inc. | Detection of nucleic acid sequence differences using the ligase detection reaction with addressable arrays |
US6114122A (en) | 1996-03-26 | 2000-09-05 | Affymetrix, Inc. | Fluidics station with a mounting system and method of using |
US5867266A (en) | 1996-04-17 | 1999-02-02 | Cornell Research Foundation, Inc. | Multiple optical channels for chemical analysis |
WO1997043611A1 (en) | 1996-05-16 | 1997-11-20 | Affymetrix, Inc. | Systems and methods for detection of labeled materials |
US6969488B2 (en) | 1998-05-22 | 2005-11-29 | Solexa, Inc. | System and apparatus for sequential processing of analytes |
US6201639B1 (en) | 1998-03-20 | 2001-03-13 | James W. Overbeck | Wide field of view and high speed scanning microscopy |
US6185030B1 (en) | 1998-03-20 | 2001-02-06 | James W. Overbeck | Wide field of view and high speed scanning microscopy |
US5936324A (en) | 1998-03-30 | 1999-08-10 | Genetic Microsystems Inc. | Moving magnet scanner |
US6287821B1 (en) * | 1998-06-11 | 2001-09-11 | Orchid Biosciences, Inc. | Nucleotide analogues with 3'-pro-fluorescent fluorophores in nucleic acid sequence analysis |
US6787308B2 (en) | 1998-07-30 | 2004-09-07 | Solexa Ltd. | Arrayed biomolecules and their use in sequencing |
US20030022207A1 (en) | 1998-10-16 | 2003-01-30 | Solexa, Ltd. | Arrayed polynucleotides and their use in genome analysis |
US6361947B1 (en) | 1998-10-27 | 2002-03-26 | Affymetrix, Inc. | Complexity management and analysis of genomic DNA |
US6267872B1 (en) | 1998-11-06 | 2001-07-31 | The Regents Of The University Of California | Miniature support for thin films containing single channels or nanopores and methods for using same |
US20060275782A1 (en) * | 1999-04-20 | 2006-12-07 | Illumina, Inc. | Detection of nucleic acid reactions on bead arrays |
US7056661B2 (en) | 1999-05-19 | 2006-06-06 | Cornell Research Foundation, Inc. | Method for sequencing nucleic acid molecules |
US6218803B1 (en) | 1999-06-04 | 2001-04-17 | Genetic Microsystems, Inc. | Position sensing with variable capacitance transducers |
US7258838B2 (en) | 1999-06-22 | 2007-08-21 | President And Fellows Of Harvard College | Solid state molecular probe device |
US6627067B1 (en) | 1999-06-22 | 2003-09-30 | President And Fellows Of Harvard College | Molecular and atomic scale evaluation of biopolymers |
US6464842B1 (en) | 1999-06-22 | 2002-10-15 | President And Fellows Of Harvard College | Control of solid state dimensional features |
US7211390B2 (en) | 1999-09-16 | 2007-05-01 | 454 Life Sciences Corporation | Method of sequencing a nucleic acid |
US7244559B2 (en) | 1999-09-16 | 2007-07-17 | 454 Life Sciences Corporation | Method of sequencing a nucleic acid |
US6958225B2 (en) | 1999-10-27 | 2005-10-25 | Affymetrix, Inc. | Complexity management of genomic DNA |
WO2001032930A1 (en) | 1999-11-04 | 2001-05-10 | California Institute Of Technology | Methods and apparatuses for analyzing polynucleotide sequences |
US6582938B1 (en) | 2001-05-11 | 2003-06-24 | Affymetrix, Inc. | Amplification of nucleic acids |
GB0002389D0 (en) | 2000-02-02 | 2000-03-22 | Solexa Ltd | Molecular arrays |
US6386749B1 (en) | 2000-06-26 | 2002-05-14 | Affymetrix, Inc. | Systems and methods for heating and mixing fluids |
US6897023B2 (en) | 2000-09-27 | 2005-05-24 | The Molecular Sciences Institute, Inc. | Method for determining relative abundance of nucleic acid sequences |
US7001724B1 (en) | 2000-11-28 | 2006-02-21 | Applera Corporation | Compositions, methods, and kits for isolating nucleic acids using surfactants and proteases |
US6391592B1 (en) | 2000-12-14 | 2002-05-21 | Affymetrix, Inc. | Blocker-aided target amplification of nucleic acids |
AU2002251946A1 (en) | 2001-02-14 | 2002-08-28 | Science And Technology Corporation @ Unm | Nanostructured devices for separation and analysis |
EP2801624B1 (en) | 2001-03-16 | 2019-03-06 | Singular Bio, Inc | Arrays and methods of use |
DE10120797B4 (en) | 2001-04-27 | 2005-12-22 | Genovoxx Gmbh | Method for analyzing nucleic acid chains |
US6777187B2 (en) | 2001-05-02 | 2004-08-17 | Rubicon Genomics, Inc. | Genome walking by selective amplification of nick-translate DNA library and amplification from complex mixtures of templates |
US6632611B2 (en) | 2001-07-20 | 2003-10-14 | Affymetrix, Inc. | Method of target enrichment and amplification |
US7297778B2 (en) | 2001-07-25 | 2007-11-20 | Affymetrix, Inc. | Complexity management of genomic DNA |
US6872529B2 (en) | 2001-07-25 | 2005-03-29 | Affymetrix, Inc. | Complexity management of genomic DNA |
US6548810B2 (en) | 2001-08-01 | 2003-04-15 | The University Of Chicago | Scanning confocal electron microscope |
DE10239504A1 (en) | 2001-08-29 | 2003-04-24 | Genovoxx Gmbh | Parallel sequencing of nucleic acid fragments, useful e.g. for detecting mutations, comprises sequential single-base extension of immobilized fragment-primer complex |
JP2003101204A (en) | 2001-09-25 | 2003-04-04 | Nec Kansai Ltd | Wiring substrate, method of manufacturing the same, and electronic component |
DE10246005A1 (en) | 2001-10-04 | 2003-04-30 | Genovoxx Gmbh | Automated nucleic acid sequencer, useful e.g. for analyzing gene expression, based on parallel incorporation of fluorescently labeled terminating nucleotides |
DE10149786B4 (en) | 2001-10-09 | 2013-04-25 | Dmitry Cherkasov | Surface for studies of populations of single molecules |
US6902921B2 (en) | 2001-10-30 | 2005-06-07 | 454 Corporation | Sulfurylase-luciferase fusion proteins and thermostable sulfurylase |
US20050124022A1 (en) | 2001-10-30 | 2005-06-09 | Maithreyan Srinivasan | Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase |
JP2004003989A (en) | 2002-03-15 | 2004-01-08 | Affymetrix Inc | System, method, and product for scanning biological material |
US20030186280A1 (en) | 2002-03-28 | 2003-10-02 | Affymetrix, Inc. | Methods for detecting genomic regions of biological significance |
US20030186279A1 (en) | 2002-03-28 | 2003-10-02 | Affymetrix, Inc. | Large scale genotyping methods |
DE10214395A1 (en) | 2002-03-30 | 2003-10-23 | Dmitri Tcherkassov | Parallel sequencing of nucleic acid segments, useful for detecting single-nucleotide polymorphisms, by single-base extensions with labeled nucleotide |
JP4799861B2 (en) | 2002-04-16 | 2011-10-26 | プリンストン ユニバーシティ | Gradient structure for interface between microfluidic and nanofluid, and its manufacturing and use |
US20070065816A1 (en) | 2002-05-17 | 2007-03-22 | Affymetrix, Inc. | Methods for genotyping |
US20040072217A1 (en) | 2002-06-17 | 2004-04-15 | Affymetrix, Inc. | Methods of analysis of linkage disequilibrium |
US7300788B2 (en) | 2002-10-08 | 2007-11-27 | Affymetrix, Inc. | Method for genotyping polymorphisms in humans |
EP1590477B1 (en) | 2003-01-29 | 2009-07-29 | 454 Corporation | Methods of amplifying and sequencing nucleic acids |
GB2398383B (en) | 2003-02-12 | 2005-03-09 | Global Genomics Ab | Method and means for nucleic acid sequencing |
WO2005044836A2 (en) | 2003-11-05 | 2005-05-19 | Genovoxx Gmbh | Macromolecular nucleotide compounds and methods for using the same |
DE10356837A1 (en) | 2003-12-05 | 2005-06-30 | Dmitry Cherkasov | New conjugates useful for modifying nucleic acid chains comprise nucleotide or nucleoside molecules coupled to a label through water-soluble polymer linkers |
US7169560B2 (en) | 2003-11-12 | 2007-01-30 | Helicos Biosciences Corporation | Short cycle methods for sequencing polynucleotides |
US20050186576A1 (en) | 2004-02-19 | 2005-08-25 | Intel Corporation | Polymer sequencing using selectively labeled monomers and data integration |
DE102004009704A1 (en) | 2004-02-27 | 2005-09-15 | Dmitry Cherkasov | New conjugates useful for labeling nucleic acids comprise a label coupled to nucleotide or nucleoside molecules through polymer linkers |
US7238485B2 (en) | 2004-03-23 | 2007-07-03 | President And Fellows Of Harvard College | Methods and apparatus for characterizing polynucleotides |
BR122016013290B1 (en) | 2004-05-13 | 2021-04-20 | Anita Goel | nucleic acid molecule amplification device, microfluidic device for applying tension to nucleic acid molecules and kits for nucleic acid molecule processing |
DE102004025746A1 (en) | 2004-05-26 | 2005-12-15 | Dmitry Cherkasov | Parallel sequencing of nucleic acids by optical methods, by cyclic primer-matrix extension, using a solid phase with reduced non-specific binding of labeled components |
DE102004025745A1 (en) | 2004-05-26 | 2005-12-15 | Cherkasov, Dmitry | Surface of solid phase, useful for parallel, optical analysis of many nucleic acids, has reduced non-specific binding of labeled components |
DE102004025695A1 (en) | 2004-05-26 | 2006-02-23 | Dmitry Cherkasov | Optical fluorescent parallel process to analyse nucleic acid chains in which a sample solid is bound with a primer-matrix complex |
DE102004025694A1 (en) | 2004-05-26 | 2006-02-23 | Dmitry Cherkasov | Optical fluorescent ultra-high parallel process to analyse nucleic acid chains in which a sample solid is bound with a primer-matrix complex |
DE102004025744A1 (en) | 2004-05-26 | 2005-12-29 | Dmitry Cherkasov | Surface of a solid support, useful for multiple parallel analysis of nucleic acids by optical methods, having low non-specific binding of labeled components |
DE102004025696A1 (en) | 2004-05-26 | 2006-02-23 | Dmitry Cherkasov | Ultra-high parallel analysis process to analyse nucleic acid chains in which a sample solid is bound and substrate material |
US20060024711A1 (en) | 2004-07-02 | 2006-02-02 | Helicos Biosciences Corporation | Methods for nucleic acid amplification and sequence determination |
US7276720B2 (en) | 2004-07-19 | 2007-10-02 | Helicos Biosciences Corporation | Apparatus and methods for analyzing samples |
US20060012793A1 (en) | 2004-07-19 | 2006-01-19 | Helicos Biosciences Corporation | Apparatus and methods for analyzing samples |
US20060024678A1 (en) | 2004-07-28 | 2006-02-02 | Helicos Biosciences Corporation | Use of single-stranded nucleic acid binding proteins in sequencing |
AU2006336262B2 (en) | 2005-04-06 | 2011-10-13 | President And Fellows Of Harvard College | Molecular characterization with carbon nanotube control |
WO2007120208A2 (en) | 2005-11-14 | 2007-10-25 | President And Fellows Of Harvard College | Nanogrid rolling circle dna sequencing |
EP1963525B2 (en) * | 2005-12-02 | 2017-10-11 | Synthetic Genomics, Inc. | Synthesis of error-minimized nucleic acid molecules |
US7754429B2 (en) | 2006-10-06 | 2010-07-13 | Illumina Cambridge Limited | Method for pair-wise sequencing a plurity of target polynucleotides |
US20080242560A1 (en) * | 2006-11-21 | 2008-10-02 | Gunderson Kevin L | Methods for generating amplified nucleic acid arrays |
KR100777230B1 (en) | 2006-11-30 | 2007-11-28 | 한국해양연구원 | Mutant dna polymerases and their genes from themococcus |
WO2009097626A2 (en) * | 2008-02-03 | 2009-08-06 | Helicos Biosciences Corporation | Paired-end reads in sequencing by synthesis |
US20100029498A1 (en) | 2008-02-04 | 2010-02-04 | Andreas Gnirke | Selection of nucleic acids by solution hybridization to oligonucleotide baits |
US8034568B2 (en) * | 2008-02-12 | 2011-10-11 | Nugen Technologies, Inc. | Isothermal nucleic acid amplification methods and compositions |
US20090291475A1 (en) | 2008-04-23 | 2009-11-26 | Kai Qin Lao | Sequence amplification with linear primers |
US8993230B2 (en) | 2008-12-04 | 2015-03-31 | Pacific Biosciences of California, Inc. | Asynchronous sequencing of biological polymers |
WO2010075188A2 (en) * | 2008-12-23 | 2010-07-01 | Illumina Inc. | Multibase delivery for long reads in sequencing by synthesis protocols |
WO2010141390A2 (en) * | 2009-06-05 | 2010-12-09 | Life Technologies Corporation | Nucleotide transient binding for sequencing methods |
CN102858995B (en) | 2009-09-10 | 2016-10-26 | 森特瑞隆技术控股公司 | Targeting sequence measurement |
US8674086B2 (en) * | 2010-06-25 | 2014-03-18 | Intel Corporation | Nucleotides and oligonucleotides for nucleic acid sequencing |
WO2012040624A1 (en) | 2010-09-23 | 2012-03-29 | Centrillion Technology Holding Corporation | Native-extension parallel sequencing |
CA2826131C (en) | 2011-02-02 | 2019-11-05 | Jay Ashok Shendure | Massively parallel continguity mapping |
US9328382B2 (en) | 2013-03-15 | 2016-05-03 | Complete Genomics, Inc. | Multiple tagging of individual long DNA fragments |
WO2015017759A1 (en) | 2013-08-02 | 2015-02-05 | Stc.Unm | Dna sequencing and epigenome analysis |

Date | Country | Application | Publication | Status |
---|---|---|---|---|
2011-06-03 | US | US13/153,218 | US20120252682A1 (en) | Not active (Abandoned) |
2012-04-02 | US | US14/009,089 | US10801062B2 (en) | Active |
2012-04-02 | EP | EP12762821.2A | EP2694679A4 (en) | Not active (Withdrawn) |
2012-04-02 | WO | PCT/US2012/000185 | WO2012134602A2 (en) | Active (Application Filing) |
2012-04-02 | CN | CN201280027272.XA | CN103917654B (en) | Active |
2013-08-19 | US | US13/970,321 | US9689032B2 (en) | Active |
2015-01-09 | HK | HK15100240.1A | HK1200492A1 (en) | Unknown |
2020-08-28 | US | US17/005,496 | US20210180123A1 (en) | Active (Pending) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5756285A (en) * | 1991-09-27 | 1998-05-26 | Amersham Life Science, Inc. | DNA cycle sequencing |
US6833246B2 (en) * | 1999-09-29 | 2004-12-21 | Solexa, Ltd. | Polynucleotide sequencing |
US7790869B2 (en) * | 2000-10-06 | 2010-09-07 | The Trustees Of Columbia University In The City Of New York | Massive parallel method for decoding DNA and RNA |
US20110009276A1 (en) * | 2006-02-08 | 2011-01-13 | Eric Hans Vermaas | Method for Sequencing a Polynucleotide Template |
US20120252682A1 (en) * | 2011-04-01 | 2012-10-04 | Maples Corporate Services Limited | Methods and systems for sequencing nucleic acids |
US20140065604A1 (en) * | 2011-04-01 | 2014-03-06 | Wei Zhou | Methods and systems for sequencing long nucleic acids |
US20190360034A1 (en) * | 2011-04-01 | 2019-11-28 | Centrillion Technology Holdings Corporation | Methods and systems for sequencing nucleic acids |
US10801062B2 (en) * | 2011-04-01 | 2020-10-13 | Centrillion Technology Holdings Corporation | Methods and systems for sequencing long nucleic acids |
Also Published As
Publication number | Publication date |
---|---|
WO2012134602A3 (en) | 2013-12-27 |
US10801062B2 (en) | 2020-10-13 |
US20120252682A1 (en) | 2012-10-04 |
HK1200492A1 (en) | 2015-08-07 |
CN103917654A (en) | 2014-07-09 |
WO2012134602A2 (en) | 2012-10-04 |
US9689032B2 (en) | 2017-06-27 |
EP2694679A4 (en) | 2014-10-22 |
US20140315724A1 (en) | 2014-10-23 |
EP2694679A2 (en) | 2014-02-12 |
CN103917654B (en) | 2017-10-27 |
US20140065604A1 (en) | 2014-03-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210180123A1 (en) | | Methods and systems for sequencing long nucleic acids |
US20190360034A1 (en) | | Methods and systems for sequencing nucleic acids |
AU2018266377B2 (en) | | Universal short adapters for indexing of polynucleotide samples |
AU2018214075B2 (en) | | Systems and methods for prenatal genetic analysis |
Van Dijk et al. | | Ten years of next-generation sequencing technology |
CN106434873B (en) | | Method for synchronizing nucleic acid molecules |
US10072287B2 (en) | | Methods of targeted sequencing |
US20210024996A1 (en) | | Method for verifying bioassay samples |
CA3220983A1 (en) | | Optimal index sequences for multiplex massively parallel sequencing |
US10174368B2 (en) | | Methods and systems for sequencing long nucleic acids |
US20120083417A1 (en) | | Native-extension parallel sequencing |
Myllykangas et al. | | Targeted deep resequencing of the human cancer genome using next-generation technologies |
Masoudi-Nejad et al. | | Emergence of Next-Generation Sequencing |
WO2022204685A1 (en) | | Methods for sequencing nucleic acid molecules with sequential barcodes |
Legal Events
Code | Title | Description |
---|---|---|
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |