CN117321194A - Preparation method of nucleic acid sequencing library - Google Patents
- Publication number
- CN117321194A (application CN202280035174.4A)
- Authority
- CN
- China
- Prior art keywords
- dna
- target
- dsdna
- protein
- composition
- Legal status
- Pending
Concepts
- 150000007523 nucleic acids Chemical class 0.000 title claims abstract description 280
- 102000039446 nucleic acids Human genes 0.000 title claims abstract description 264
- 108020004707 nucleic acids Proteins 0.000 title claims abstract description 264
- 238000012163 sequencing technique Methods 0.000 title claims abstract description 113
- 238000002360 preparation method Methods 0.000 title description 34
- 108020004414 DNA Proteins 0.000 claims abstract description 453
- 238000000034 method Methods 0.000 claims abstract description 224
- 102000053602 DNA Human genes 0.000 claims abstract description 155
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 125
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 117
- 230000027455 binding Effects 0.000 claims abstract description 105
- 108010020764 Transposases Proteins 0.000 claims abstract description 80
- 102000008579 Transposases Human genes 0.000 claims abstract description 80
- 239000000203 mixture Substances 0.000 claims abstract description 76
- 230000004568 DNA-binding Effects 0.000 claims abstract description 57
- 102000007474 Multiprotein Complexes Human genes 0.000 claims abstract description 35
- 108010085220 Multiprotein Complexes Proteins 0.000 claims abstract description 35
- 239000000523 sample Substances 0.000 claims description 236
- 108091032973 (ribonucleotides)n+m Proteins 0.000 claims description 204
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 130
- 230000003321 amplification Effects 0.000 claims description 129
- 235000018102 proteins Nutrition 0.000 claims description 115
- 125000003729 nucleotide group Chemical group 0.000 claims description 100
- 239000012634 fragment Substances 0.000 claims description 99
- 239000002773 nucleotide Substances 0.000 claims description 97
- 108090000765 processed proteins & peptides Proteins 0.000 claims description 62
- 238000003752 polymerase chain reaction Methods 0.000 claims description 48
- 239000011541 reaction mixture Substances 0.000 claims description 48
- 108020001507 fusion proteins Proteins 0.000 claims description 47
- 102000037865 fusion proteins Human genes 0.000 claims description 47
- 238000006243 chemical reaction Methods 0.000 claims description 46
- 210000001519 tissue Anatomy 0.000 claims description 34
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 claims description 32
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 claims description 29
- 108010012306 Tn5 transposase Proteins 0.000 claims description 27
- 230000002068 genetic effect Effects 0.000 claims description 26
- 239000000126 substance Substances 0.000 claims description 26
- 108020005004 Guide RNA Proteins 0.000 claims description 25
- 239000012472 biological sample Substances 0.000 claims description 25
- 230000035772 mutation Effects 0.000 claims description 25
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims description 23
- 101710163270 Nuclease Proteins 0.000 claims description 22
- 201000010099 disease Diseases 0.000 claims description 22
- 102100031780 Endonuclease Human genes 0.000 claims description 21
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 21
- 108010042407 Endonucleases Proteins 0.000 claims description 20
- 238000010459 TALEN Methods 0.000 claims description 17
- 230000002950 deficient Effects 0.000 claims description 17
- 102000004190 Enzymes Human genes 0.000 claims description 15
- 108090000790 Enzymes Proteins 0.000 claims description 15
- 210000002966 serum Anatomy 0.000 claims description 15
- 239000004471 Glycine Substances 0.000 claims description 14
- 210000002381 plasma Anatomy 0.000 claims description 14
- 235000004252 protein component Nutrition 0.000 claims description 13
- 238000006073 displacement reaction Methods 0.000 claims description 12
- 239000013612 plasmid Substances 0.000 claims description 12
- 102000008682 Argonaute Proteins Human genes 0.000 claims description 11
- 108010088141 Argonaute Proteins Proteins 0.000 claims description 11
- 206010028980 Neoplasm Diseases 0.000 claims description 11
- 108010017070 Zinc Finger Nucleases Proteins 0.000 claims description 11
- 208000026350 Inborn Genetic disease Diseases 0.000 claims description 10
- 208000016361 genetic disease Diseases 0.000 claims description 10
- 238000002372 labelling Methods 0.000 claims description 10
- 238000003753 real-time PCR Methods 0.000 claims description 10
- 238000011529 RT qPCR Methods 0.000 claims description 9
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 claims description 9
- 201000011510 cancer Diseases 0.000 claims description 9
- 239000007850 fluorescent dye Substances 0.000 claims description 9
- 238000007397 LAMP assay Methods 0.000 claims description 8
- 108060004795 Methyltransferase Proteins 0.000 claims description 8
- 206010036790 Productive cough Diseases 0.000 claims description 8
- 102000018120 Recombinases Human genes 0.000 claims description 8
- 108010091086 Recombinases Proteins 0.000 claims description 8
- 238000005516 engineering process Methods 0.000 claims description 8
- 230000007613 environmental effect Effects 0.000 claims description 8
- 230000001404 mediated effect Effects 0.000 claims description 8
- 230000010076 replication Effects 0.000 claims description 8
- 210000003802 sputum Anatomy 0.000 claims description 8
- 208000024794 sputum Diseases 0.000 claims description 8
- 108020000946 Bacterial DNA Proteins 0.000 claims description 7
- 108020000949 Fungal DNA Proteins 0.000 claims description 7
- 108020005196 Mitochondrial DNA Proteins 0.000 claims description 7
- 108020003633 Protozoan DNA Proteins 0.000 claims description 7
- 108020005202 Viral DNA Proteins 0.000 claims description 7
- 230000003115 biocidal effect Effects 0.000 claims description 7
- 238000003780 insertion Methods 0.000 claims description 7
- 230000037431 insertion Effects 0.000 claims description 7
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 claims description 6
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 claims description 6
- 238000012217 deletion Methods 0.000 claims description 6
- 230000037430 deletion Effects 0.000 claims description 6
- 230000001419 dependent effect Effects 0.000 claims description 6
- 244000052769 pathogen Species 0.000 claims description 6
- 230000001717 pathogenic effect Effects 0.000 claims description 6
- 238000006467 substitution reaction Methods 0.000 claims description 6
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-Amino-3-hydroxypropanoic acid Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 claims description 5
- 201000003883 Cystic fibrosis Diseases 0.000 claims description 5
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 claims description 5
- 240000007019 Oxalis corniculata Species 0.000 claims description 5
- 235000004279 alanine Nutrition 0.000 claims description 5
- XMQFTWRPUQYINF-UHFFFAOYSA-N bensulfuron-methyl Chemical compound COC(=O)C1=CC=CC=C1CS(=O)(=O)NC(=O)NC1=NC(OC)=CC(OC)=N1 XMQFTWRPUQYINF-UHFFFAOYSA-N 0.000 claims description 5
- 210000001124 body fluid Anatomy 0.000 claims description 5
- 210000000416 exudates and transudate Anatomy 0.000 claims description 5
- 230000007935 neutral effect Effects 0.000 claims description 5
- 210000005259 peripheral blood Anatomy 0.000 claims description 5
- 239000011886 peripheral blood Substances 0.000 claims description 5
- 230000000241 respiratory effect Effects 0.000 claims description 5
- 238000013518 transcription Methods 0.000 claims description 5
- 230000035897 transcription Effects 0.000 claims description 5
- 108010040467 CRISPR-Associated Proteins Proteins 0.000 claims description 4
- AYFVYJQAPQTCCC-GBXIJSLDSA-N L-threonine Chemical compound C[C@@H](O)[C@H](N)C(O)=O AYFVYJQAPQTCCC-GBXIJSLDSA-N 0.000 claims description 4
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 claims description 4
- 239000004472 Lysine Substances 0.000 claims description 4
- 125000000129 anionic group Chemical group 0.000 claims description 4
- 210000001165 lymph node Anatomy 0.000 claims description 4
- 238000005096 rolling process Methods 0.000 claims description 4
- 206010013801 Duchenne Muscular Dystrophy Diseases 0.000 claims description 3
- 208000001914 Fragile X syndrome Diseases 0.000 claims description 3
- 208000018565 Hemochromatosis Diseases 0.000 claims description 3
- 208000031220 Hemophilia Diseases 0.000 claims description 3
- 208000009292 Hemophilia A Diseases 0.000 claims description 3
- 208000023105 Huntington disease Diseases 0.000 claims description 3
- 208000000563 Hyperlipoproteinemia Type II Diseases 0.000 claims description 3
- ZDXPYRJPNDTMRX-VKHMYHEASA-N L-glutamine Chemical compound OC(=O)[C@@H](N)CCC(N)=O ZDXPYRJPNDTMRX-VKHMYHEASA-N 0.000 claims description 3
- KDXKERNSBIXSRK-YFKPBYRVSA-N L-lysine Chemical compound NCCCC[C@H](N)C(O)=O KDXKERNSBIXSRK-YFKPBYRVSA-N 0.000 claims description 3
- 102100024640 Low-density lipoprotein receptor Human genes 0.000 claims description 3
- 208000003221 Lysosomal acid lipase deficiency Diseases 0.000 claims description 3
- 208000024556 Mendelian disease Diseases 0.000 claims description 3
- 208000002678 Mucopolysaccharidoses Diseases 0.000 claims description 3
- 208000003019 Neurofibromatosis 1 Diseases 0.000 claims description 3
- 208000024834 Neurofibromatosis type 1 Diseases 0.000 claims description 3
- 201000011252 Phenylketonuria Diseases 0.000 claims description 3
- 208000002903 Thalassemia Diseases 0.000 claims description 3
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 claims description 3
- 239000004473 Threonine Substances 0.000 claims description 3
- 206010045261 Type IIa hyperlipidaemia Diseases 0.000 claims description 3
- 239000010839 body fluid Substances 0.000 claims description 3
- 125000002091 cationic group Chemical group 0.000 claims description 3
- 201000001386 familial hypercholesterolemia Diseases 0.000 claims description 3
- ZDXPYRJPNDTMRX-UHFFFAOYSA-N glutamine Natural products OC(=O)C(N)CCC(N)=O ZDXPYRJPNDTMRX-UHFFFAOYSA-N 0.000 claims description 3
- 208000007345 glycogen storage disease Diseases 0.000 claims description 3
- 238000011901 isothermal amplification Methods 0.000 claims description 3
- 206010028093 mucopolysaccharidosis Diseases 0.000 claims description 3
- 208000030761 polycystic kidney disease Diseases 0.000 claims description 3
- 208000007056 sickle cell anemia Diseases 0.000 claims description 3
- 208000011580 syndromic disease Diseases 0.000 claims description 3
- 230000005945 translocation Effects 0.000 claims description 3
- 102000040650 (ribonucleotides)n+m Human genes 0.000 claims description 2
- 239000013615 primer Substances 0.000 description 88
- 125000005647 linker group Chemical group 0.000 description 83
- 238000001514 detection method Methods 0.000 description 70
- 238000003776 cleavage reaction Methods 0.000 description 53
- 210000004027 cell Anatomy 0.000 description 51
- 108091033409 CRISPR Proteins 0.000 description 49
- 230000000295 complement effect Effects 0.000 description 49
- 238000004458 analytical method Methods 0.000 description 46
- 230000007017 scission Effects 0.000 description 45
- 239000002585 base Substances 0.000 description 40
- 239000000047 product Substances 0.000 description 35
- 102000004196 processed proteins & peptides Human genes 0.000 description 34
- -1 physical Chemical class 0.000 description 33
- 102000040430 polynucleotide Human genes 0.000 description 32
- 108091033319 polynucleotide Proteins 0.000 description 32
- 239000002157 polynucleotide Substances 0.000 description 32
- 229920001184 polypeptide Polymers 0.000 description 31
- 150000001413 amino acids Chemical class 0.000 description 30
- 102100034343 Integrase Human genes 0.000 description 28
- 108091027544 Subgenomic mRNA Proteins 0.000 description 26
- 235000001014 amino acid Nutrition 0.000 description 26
- 238000006062 fragmentation reaction Methods 0.000 description 26
- 239000012139 lysis buffer Substances 0.000 description 26
- 108010073062 Transcription Activator-Like Effectors Proteins 0.000 description 25
- 238000013467 fragmentation Methods 0.000 description 25
- 230000000694 effects Effects 0.000 description 24
- 230000002255 enzymatic effect Effects 0.000 description 22
- TVEXGJYMHHTVKP-UHFFFAOYSA-N 6-oxabicyclo[3.2.1]oct-3-en-7-one Chemical compound C1C2C(=O)OC1C=CC2 TVEXGJYMHHTVKP-UHFFFAOYSA-N 0.000 description 21
- 210000004369 blood Anatomy 0.000 description 21
- 239000008280 blood Substances 0.000 description 21
- 238000009396 hybridization Methods 0.000 description 21
- 230000000670 limiting effect Effects 0.000 description 20
- 239000000178 monomer Substances 0.000 description 20
- 108091034117 Oligonucleotide Proteins 0.000 description 18
- 241000700605 Viruses Species 0.000 description 18
- 210000004899 c-terminal region Anatomy 0.000 description 17
- 239000013592 cell lysate Substances 0.000 description 17
- 241001138501 Salmonella enterica Species 0.000 description 16
- 238000011534 incubation Methods 0.000 description 16
- 241000589877 Campylobacter coli Species 0.000 description 15
- SOEGEPHNZOISMT-BYPYZUCNSA-N Gly-Ser-Gly Chemical compound NCC(=O)N[C@@H](CO)C(=O)NCC(O)=O SOEGEPHNZOISMT-BYPYZUCNSA-N 0.000 description 15
- 239000012530 fluid Substances 0.000 description 15
- 239000003550 marker Substances 0.000 description 15
- 229940088598 enzyme Drugs 0.000 description 14
- 238000010362 genome editing Methods 0.000 description 13
- 230000008569 process Effects 0.000 description 13
- 238000001574 biopsy Methods 0.000 description 12
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 12
- 108091092584 GDNA Proteins 0.000 description 11
- 239000003599 detergent Substances 0.000 description 11
- 229920000642 polymer Polymers 0.000 description 11
- 241000894007 species Species 0.000 description 11
- 238000012408 PCR amplification Methods 0.000 description 10
- 239000003153 chemical reaction reagent Substances 0.000 description 10
- 150000001875 compounds Chemical class 0.000 description 10
- 239000000975 dye Substances 0.000 description 10
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 10
- 150000002500 ions Chemical class 0.000 description 10
- 238000012986 modification Methods 0.000 description 10
- 238000003556 assay Methods 0.000 description 9
- 230000009089 cytolysis Effects 0.000 description 9
- 238000002866 fluorescence resonance energy transfer Methods 0.000 description 9
- 230000004048 modification Effects 0.000 description 9
- 125000004573 morpholin-4-yl group Chemical group N1(CCOCC1)* 0.000 description 9
- 238000011002 quantification Methods 0.000 description 9
- 150000003839 salts Chemical class 0.000 description 9
- 238000002415 sodium dodecyl sulfate polyacrylamide gel electrophoresis Methods 0.000 description 9
- 238000012360 testing method Methods 0.000 description 9
- 108091093037 Peptide nucleic acid Proteins 0.000 description 8
- UIGMAMGZOJVTDN-WHFBIAKZSA-N Ser-Gly-Ser Chemical compound OC[C@H](N)C(=O)NCC(=O)N[C@@H](CO)C(O)=O UIGMAMGZOJVTDN-WHFBIAKZSA-N 0.000 description 8
- 230000001413 cellular effect Effects 0.000 description 8
- 230000006870 function Effects 0.000 description 8
- 230000004927 fusion Effects 0.000 description 8
- 125000000623 heterocyclic group Chemical group 0.000 description 8
- 230000032965 negative regulation of cell volume Effects 0.000 description 8
- 239000007787 solid Substances 0.000 description 8
- 125000006850 spacer group Chemical group 0.000 description 8
- 108091005804 Peptidases Proteins 0.000 description 7
- 239000004365 Protease Substances 0.000 description 7
- 102100037486 Reverse transcriptase/ribonuclease H Human genes 0.000 description 7
- DBMJMQXJHONAFJ-UHFFFAOYSA-M Sodium laurylsulphate Chemical compound [Na+].CCCCCCCCCCCCOS([O-])(=O)=O DBMJMQXJHONAFJ-UHFFFAOYSA-M 0.000 description 7
- JLCPHMBAVCMARE-UHFFFAOYSA-N DNA oligonucleotide (machine-generated IUPAC name and SMILES omitted) Polymers JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 7
- 238000007792 addition Methods 0.000 description 7
- 210000001175 cerebrospinal fluid Anatomy 0.000 description 7
- 239000000539 dimer Substances 0.000 description 7
- XKUKSGPZAADMRA-UHFFFAOYSA-N glycyl-glycyl-glycine Chemical compound NCC(=O)NCC(=O)NCC(O)=O XKUKSGPZAADMRA-UHFFFAOYSA-N 0.000 description 7
- 239000007788 liquid Substances 0.000 description 7
- 239000002777 nucleoside Substances 0.000 description 7
- 210000000056 organ Anatomy 0.000 description 7
- 235000019419 proteases Nutrition 0.000 description 7
- 238000000746 purification Methods 0.000 description 7
- 230000002441 reversible effect Effects 0.000 description 7
- 239000000243 solution Substances 0.000 description 7
- 235000000346 sugar Nutrition 0.000 description 7
- 230000008685 targeting Effects 0.000 description 7
- 241000701161 unidentified adenovirus Species 0.000 description 7
- UHDGCWIWMRVCDJ-UHFFFAOYSA-N 1-beta-D-Xylofuranosyl-NH-Cytosine Natural products O=C1N=C(N)C=CN1C1C(O)C(O)C(CO)O1 UHDGCWIWMRVCDJ-UHFFFAOYSA-N 0.000 description 6
- QKNYBSVHEMOAJP-UHFFFAOYSA-N 2-amino-2-(hydroxymethyl)propane-1,3-diol;hydron;chloride Chemical compound Cl.OCC(N)(CO)CO QKNYBSVHEMOAJP-UHFFFAOYSA-N 0.000 description 6
- 241000894006 Bacteria Species 0.000 description 6
- UHDGCWIWMRVCDJ-PSQAKQOGSA-N Cytidine Natural products O=C1N=C(N)C=CN1[C@@H]1[C@@H](O)[C@@H](O)[C@H](CO)O1 UHDGCWIWMRVCDJ-PSQAKQOGSA-N 0.000 description 6
- YWAQATDNEKZFFK-BYPYZUCNSA-N Gly-Gly-Ser Chemical compound NCC(=O)NCC(=O)N[C@@H](CO)C(O)=O YWAQATDNEKZFFK-BYPYZUCNSA-N 0.000 description 6
- 125000000217 alkyl group Chemical group 0.000 description 6
- 239000000090 biomarker Substances 0.000 description 6
- 239000000872 buffer Substances 0.000 description 6
- 230000006037 cell lysis Effects 0.000 description 6
- UHDGCWIWMRVCDJ-ZAKLUEHWSA-N cytidine Chemical compound O=C1N=C(N)C=CN1[C@H]1[C@H](O)[C@@H](O)[C@H](CO)O1 UHDGCWIWMRVCDJ-ZAKLUEHWSA-N 0.000 description 6
- 229940104302 cytosine Drugs 0.000 description 6
- 108010050663 endodeoxyribonuclease CreI Proteins 0.000 description 6
- 239000000499 gel Substances 0.000 description 6
- 238000007481 next generation sequencing Methods 0.000 description 6
- 210000003296 saliva Anatomy 0.000 description 6
- 230000035945 sensitivity Effects 0.000 description 6
- 229910052717 sulfur Inorganic materials 0.000 description 6
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 6
- HKZAAJSTFUZYTO-LURJTMIESA-N (2s)-2-[[2-[[2-[[2-[(2-aminoacetyl)amino]acetyl]amino]acetyl]amino]acetyl]amino]-3-hydroxypropanoic acid Chemical compound NCC(=O)NCC(=O)NCC(=O)NCC(=O)N[C@@H](CO)C(O)=O HKZAAJSTFUZYTO-LURJTMIESA-N 0.000 description 5
- VOUUHEHYSHWUHG-UWVGGRQHSA-N (2s)-2-[[2-[[2-[[2-[[(2s)-2-[[2-[[2-[(2-aminoacetyl)amino]acetyl]amino]acetyl]amino]-3-hydroxypropanoyl]amino]acetyl]amino]acetyl]amino]acetyl]amino]-3-hydroxypropanoic acid Chemical compound NCC(=O)NCC(=O)NCC(=O)N[C@@H](CO)C(=O)NCC(=O)NCC(=O)NCC(=O)N[C@@H](CO)C(O)=O VOUUHEHYSHWUHG-UWVGGRQHSA-N 0.000 description 5
- 229930024421 Adenine Natural products 0.000 description 5
- 241000701022 Cytomegalovirus Species 0.000 description 5
- 230000004544 DNA amplification Effects 0.000 description 5
- 238000001712 DNA sequencing Methods 0.000 description 5
- 241000196324 Embryophyta Species 0.000 description 5
- 241000701085 Human alphaherpesvirus 3 Species 0.000 description 5
- 241000701044 Human gammaherpesvirus 4 Species 0.000 description 5
- 241000700584 Simplexvirus Species 0.000 description 5
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 5
- 229960000643 adenine Drugs 0.000 description 5
- 150000001408 amides Chemical group 0.000 description 5
- 230000015572 biosynthetic process Effects 0.000 description 5
- 238000012937 correction Methods 0.000 description 5
- 238000001976 enzyme digestion Methods 0.000 description 5
- 238000002474 experimental method Methods 0.000 description 5
- 239000000284 extract Substances 0.000 description 5
- 150000002243 furanoses Chemical group 0.000 description 5
- 208000015181 infectious disease Diseases 0.000 description 5
- 230000017730 intein-mediated protein splicing Effects 0.000 description 5
- 238000012544 monitoring process Methods 0.000 description 5
- 150000003833 nucleoside derivatives Chemical class 0.000 description 5
- 244000045947 parasite Species 0.000 description 5
- 239000002987 primer (paints) Substances 0.000 description 5
- 241001529453 unidentified herpesvirus Species 0.000 description 5
- 210000002700 urine Anatomy 0.000 description 5
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 4
- KDCGOANMDULRCW-UHFFFAOYSA-N 7H-purine Chemical compound N1=CNC2=NC=NC2=C1 KDCGOANMDULRCW-UHFFFAOYSA-N 0.000 description 4
- 241000283690 Bos taurus Species 0.000 description 4
- 238000010354 CRISPR gene editing Methods 0.000 description 4
- 241000606153 Chlamydia trachomatis Species 0.000 description 4
- 201000007336 Cryptococcosis Diseases 0.000 description 4
- 241000221204 Cryptococcus neoformans Species 0.000 description 4
- 102220605874 Cytosolic arginine sensor for mTORC1 subunit 2_D10A_mutation Human genes 0.000 description 4
- WSFSSNUMVMOOMR-UHFFFAOYSA-N Formaldehyde Chemical compound O=C WSFSSNUMVMOOMR-UHFFFAOYSA-N 0.000 description 4
- BCCRXDTUTZHDEU-VKHMYHEASA-N Gly-Ser Chemical compound NCC(=O)N[C@@H](CO)C(O)=O BCCRXDTUTZHDEU-VKHMYHEASA-N 0.000 description 4
- 241000700721 Hepatitis B virus Species 0.000 description 4
- 241000228404 Histoplasma capsulatum Species 0.000 description 4
- 241000701806 Human papillomavirus Species 0.000 description 4
- 241000187479 Mycobacterium tuberculosis Species 0.000 description 4
- 241000588650 Neisseria meningitidis Species 0.000 description 4
- 102000007079 Peptide Fragments Human genes 0.000 description 4
- 108010033276 Peptide Fragments Proteins 0.000 description 4
- 241000223960 Plasmodium falciparum Species 0.000 description 4
- YMTLKLXDFCSCNX-BYPYZUCNSA-N Ser-Gly-Gly Chemical compound OC[C@H](N)C(=O)NCC(=O)NCC(O)=O YMTLKLXDFCSCNX-BYPYZUCNSA-N 0.000 description 4
- 241000193996 Streptococcus pyogenes Species 0.000 description 4
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 4
- 238000002835 absorbance Methods 0.000 description 4
- 238000000137 annealing Methods 0.000 description 4
- 229910052799 carbon Inorganic materials 0.000 description 4
- 239000002738 chelating agent Substances 0.000 description 4
- 239000003638 chemical reducing agent Substances 0.000 description 4
- 239000003795 chemical substances by application Substances 0.000 description 4
- 229940038705 chlamydia trachomatis Drugs 0.000 description 4
- 238000013461 design Methods 0.000 description 4
- 238000003745 diagnosis Methods 0.000 description 4
- 230000029087 digestion Effects 0.000 description 4
- 238000001962 electrophoresis Methods 0.000 description 4
- 230000002550 fecal effect Effects 0.000 description 4
- 238000002875 fluorescence polarization Methods 0.000 description 4
- 108010067216 glycyl-glycyl-glycine Proteins 0.000 description 4
- 229910052739 hydrogen Inorganic materials 0.000 description 4
- 239000001257 hydrogen Substances 0.000 description 4
- 230000003993 interaction Effects 0.000 description 4
- 238000007834 ligase chain reaction Methods 0.000 description 4
- YFVGRULMIQXYNE-UHFFFAOYSA-M lithium;dodecyl sulfate Chemical compound [Li+].CCCCCCCCCCCCOS([O-])(=O)=O YFVGRULMIQXYNE-UHFFFAOYSA-M 0.000 description 4
- 238000002844 melting Methods 0.000 description 4
- 230000008018 melting Effects 0.000 description 4
- 108020004999 messenger RNA Proteins 0.000 description 4
- 238000002493 microarray Methods 0.000 description 4
- 230000003287 optical effect Effects 0.000 description 4
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 4
- 239000002243 precursor Substances 0.000 description 4
- 238000012545 processing Methods 0.000 description 4
- 230000009870 specific binding Effects 0.000 description 4
- 108010068698 spleen exonuclease Proteins 0.000 description 4
- 239000000758 substrate Substances 0.000 description 4
- 238000003786 synthesis reaction Methods 0.000 description 4
- 229940035893 uracil Drugs 0.000 description 4
- 239000011701 zinc Substances 0.000 description 4
- 229910052725 zinc Inorganic materials 0.000 description 4
- CSCPPACGZOOCGX-UHFFFAOYSA-N Acetone Chemical compound CC(C)=O CSCPPACGZOOCGX-UHFFFAOYSA-N 0.000 description 3
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 3
- 241000193830 Bacillus <bacterium> Species 0.000 description 3
- 241000193755 Bacillus cereus Species 0.000 description 3
- 230000007018 DNA scission Effects 0.000 description 3
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 3
- BPLNJYHNAJVLRT-ACZMJKKPSA-N Glu-Ser-Ala Chemical compound [H]N[C@@H](CCC(O)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(O)=O BPLNJYHNAJVLRT-ACZMJKKPSA-N 0.000 description 3
- UMZHHILWZBFPGL-LOKLDPHHSA-N Glu-Thr-Pro Chemical compound C[C@H]([C@@H](C(=O)N1CCC[C@@H]1C(=O)O)NC(=O)[C@H](CCC(=O)O)N)O UMZHHILWZBFPGL-LOKLDPHHSA-N 0.000 description 3
- FFALDIDGPLUDKV-ZDLURKLDSA-N Gly-Thr-Ser Chemical compound [H]NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CO)C(O)=O FFALDIDGPLUDKV-ZDLURKLDSA-N 0.000 description 3
- 108091029499 Group II intron Proteins 0.000 description 3
- 241001465754 Metazoa Species 0.000 description 3
- OKKJLVBELUTLKV-UHFFFAOYSA-N Methanol Chemical compound OC OKKJLVBELUTLKV-UHFFFAOYSA-N 0.000 description 3
- 241000204031 Mycoplasma Species 0.000 description 3
- 241000588653 Neisseria Species 0.000 description 3
- 241000244206 Nematoda Species 0.000 description 3
- 108091007494 Nucleic acid-binding domains Proteins 0.000 description 3
- 241000223810 Plasmodium vivax Species 0.000 description 3
- 241000125945 Protoparvovirus Species 0.000 description 3
- HEMHJVSKTPXQMS-UHFFFAOYSA-M Sodium hydroxide Chemical compound [OH-].[Na+] HEMHJVSKTPXQMS-UHFFFAOYSA-M 0.000 description 3
- 241000191967 Staphylococcus aureus Species 0.000 description 3
- 241000244155 Taenia Species 0.000 description 3
- 241000223997 Toxoplasma gondii Species 0.000 description 3
- 241000223109 Trypanosoma cruzi Species 0.000 description 3
- 241000607618 Vibrio harveyi Species 0.000 description 3
- 208000000260 Warts Diseases 0.000 description 3
- 239000013060 biological fluid Substances 0.000 description 3
- 210000001185 bone marrow Anatomy 0.000 description 3
- 150000001720 carbohydrates Chemical class 0.000 description 3
- 101150038500 cas9 gene Proteins 0.000 description 3
- 150000001768 cations Chemical class 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 239000002299 complementary DNA Substances 0.000 description 3
- 230000021615 conjugation Effects 0.000 description 3
- 230000034994 death Effects 0.000 description 3
- 238000004925 denaturation Methods 0.000 description 3
- 230000036425 denaturation Effects 0.000 description 3
- 230000007062 hydrolysis Effects 0.000 description 3
- 238000006460 hydrolysis reaction Methods 0.000 description 3
- 230000000415 inactivating effect Effects 0.000 description 3
- 238000010348 incorporation Methods 0.000 description 3
- 238000009830 intercalation Methods 0.000 description 3
- 150000002632 lipids Chemical group 0.000 description 3
- 238000007403 mPCR Methods 0.000 description 3
- 239000000463 material Substances 0.000 description 3
- 238000005259 measurement Methods 0.000 description 3
- 239000002105 nanoparticle Substances 0.000 description 3
- 229910052760 oxygen Inorganic materials 0.000 description 3
- 230000036961 partial effect Effects 0.000 description 3
- 239000002245 particle Substances 0.000 description 3
- 150000004713 phosphodiesters Chemical class 0.000 description 3
- 125000004437 phosphorous atom Chemical group 0.000 description 3
- 108010029020 prolylglycine Proteins 0.000 description 3
- 238000011897 real-time detection Methods 0.000 description 3
- 238000007894 restriction fragment length polymorphism technique Methods 0.000 description 3
- 238000010839 reverse transcription Methods 0.000 description 3
- 238000000926 separation method Methods 0.000 description 3
- 201000010153 skin papilloma Diseases 0.000 description 3
- 235000019333 sodium laurylsulphate Nutrition 0.000 description 3
- 238000010561 standard procedure Methods 0.000 description 3
- 230000001225 therapeutic effect Effects 0.000 description 3
- 150000003573 thiols Chemical class 0.000 description 3
- RYYWUUFWQRZTIU-UHFFFAOYSA-K thiophosphate Chemical compound [O-]P([O-])([O-])=S RYYWUUFWQRZTIU-UHFFFAOYSA-K 0.000 description 3
- 229940113082 thymine Drugs 0.000 description 3
- 238000012546 transfer Methods 0.000 description 3
- 238000012795 verification Methods 0.000 description 3
- ALBODLTZUXKBGZ-JUUVMNCLSA-N (2s)-2-amino-3-phenylpropanoic acid;(2s)-2,6-diaminohexanoic acid Chemical compound NCCCC[C@H](N)C(O)=O.OC(=O)[C@@H](N)CC1=CC=CC=C1 ALBODLTZUXKBGZ-JUUVMNCLSA-N 0.000 description 2
- UFSCXDAOCAIFOG-UHFFFAOYSA-N 1,10-dihydropyrimido[5,4-b][1,4]benzothiazin-2-one Chemical compound S1C2=CC=CC=C2N=C2C1=CNC(=O)N2 UFSCXDAOCAIFOG-UHFFFAOYSA-N 0.000 description 2
- FGRBYDKOBBBPOI-UHFFFAOYSA-N 10,10-dioxo-2-[4-(N-phenylanilino)phenyl]thioxanthen-9-one Chemical compound O=C1c2ccccc2S(=O)(=O)c2ccc(cc12)-c1ccc(cc1)N(c1ccccc1)c1ccccc1 FGRBYDKOBBBPOI-UHFFFAOYSA-N 0.000 description 2
- WJFKNYWRSNBZNX-UHFFFAOYSA-N 10H-phenothiazine Chemical compound C1=CC=C2NC3=CC=CC=C3SC2=C1 WJFKNYWRSNBZNX-UHFFFAOYSA-N 0.000 description 2
- PIINGYXNCHTJTF-UHFFFAOYSA-N 2-(2-azaniumylethylamino)acetate Chemical group NCCNCC(O)=O PIINGYXNCHTJTF-UHFFFAOYSA-N 0.000 description 2
- FZWGECJQACGGTI-UHFFFAOYSA-N 2-amino-7-methyl-1,7-dihydro-6H-purin-6-one Chemical compound NC1=NC(O)=C2N(C)C=NC2=N1 FZWGECJQACGGTI-UHFFFAOYSA-N 0.000 description 2
- PDBUTMYDZLUVCP-UHFFFAOYSA-N 3,4-dihydro-1,4-benzoxazin-2-one Chemical compound C1=CC=C2OC(=O)CNC2=C1 PDBUTMYDZLUVCP-UHFFFAOYSA-N 0.000 description 2
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 2
- OVONXEQGWXGFJD-UHFFFAOYSA-N 4-sulfanylidene-1h-pyrimidin-2-one Chemical compound SC=1C=CNC(=O)N=1 OVONXEQGWXGFJD-UHFFFAOYSA-N 0.000 description 2
- RYVNIFSIEDRLSJ-UHFFFAOYSA-N 5-(hydroxymethyl)cytosine Chemical compound NC=1NC(=O)N=CC=1CO RYVNIFSIEDRLSJ-UHFFFAOYSA-N 0.000 description 2
- HCGHYQLFMPXSDU-UHFFFAOYSA-N 7-methyladenine Chemical compound C1=NC(N)=C2N(C)C=NC2=N1 HCGHYQLFMPXSDU-UHFFFAOYSA-N 0.000 description 2
- UJOBWOGCFQCDNV-UHFFFAOYSA-N 9H-carbazole Chemical compound C1=CC=C2C3=CC=CC=C3NC2=C1 UJOBWOGCFQCDNV-UHFFFAOYSA-N 0.000 description 2
- MSSXOMSJDRHRMC-UHFFFAOYSA-N 9H-purine-2,6-diamine Chemical compound NC1=NC(N)=C2NC=NC2=N1 MSSXOMSJDRHRMC-UHFFFAOYSA-N 0.000 description 2
- LRFVTYWOQMYALW-UHFFFAOYSA-N 9H-xanthine Chemical compound O=C1NC(=O)NC2=C1NC=N2 LRFVTYWOQMYALW-UHFFFAOYSA-N 0.000 description 2
- 241000606748 Actinobacillus pleuropneumoniae Species 0.000 description 2
- 241000948980 Actinobacillus succinogenes Species 0.000 description 2
- 241000606731 Actinobacillus suis Species 0.000 description 2
- 241000251468 Actinopterygii Species 0.000 description 2
- 108700028369 Alleles Proteins 0.000 description 2
- 108091093088 Amplicon Proteins 0.000 description 2
- 241000193399 Bacillus smithii Species 0.000 description 2
- 241000193388 Bacillus thuringiensis Species 0.000 description 2
- 241000589567 Brucella abortus Species 0.000 description 2
- 241000589876 Campylobacter Species 0.000 description 2
- 241000222122 Candida albicans Species 0.000 description 2
- 241000223205 Coccidioides immitis Species 0.000 description 2
- 241000195493 Cryptophyta Species 0.000 description 2
- 239000003155 DNA primer Substances 0.000 description 2
- 241000450599 DNA viruses Species 0.000 description 2
- 241000702421 Dependoparvovirus Species 0.000 description 2
- 241000223932 Eimeria tenella Species 0.000 description 2
- 108010067770 Endopeptidase K Proteins 0.000 description 2
- 241000588724 Escherichia coli Species 0.000 description 2
- 241000233866 Fungi Species 0.000 description 2
- 241000702463 Geminiviridae Species 0.000 description 2
- 241000193385 Geobacillus stearothermophilus Species 0.000 description 2
- 241000224466 Giardia Species 0.000 description 2
- 241000606768 Haemophilus influenzae Species 0.000 description 2
- 241000711549 Hepacivirus C Species 0.000 description 2
- 241000700739 Hepadnaviridae Species 0.000 description 2
- 241000046923 Human bocavirus Species 0.000 description 2
- 241001502974 Human gammaherpesvirus 8 Species 0.000 description 2
- 241000702617 Human parvovirus B19 Species 0.000 description 2
- 241001651351 Ichtadenovirus Species 0.000 description 2
- 108010061833 Integrases Proteins 0.000 description 2
- 241000589242 Legionella pneumophila Species 0.000 description 2
- 241000186779 Listeria monocytogenes Species 0.000 description 2
- 241001112727 Listeriaceae Species 0.000 description 2
- 241000192041 Micrococcus Species 0.000 description 2
- 241000700627 Monkeypox virus Species 0.000 description 2
- 241000202938 Mycoplasma hyorhinis Species 0.000 description 2
- 241001336717 Nanoviridae Species 0.000 description 2
- 241000588654 Neisseria cinerea Species 0.000 description 2
- 241000588652 Neisseria gonorrhoeae Species 0.000 description 2
- 108010006232 Neuraminidase Proteins 0.000 description 2
- 102000005348 Neuraminidase Human genes 0.000 description 2
- 241000700635 Orf virus Species 0.000 description 2
- 241000606856 Pasteurella multocida Species 0.000 description 2
- 108090000284 Pepsin A Proteins 0.000 description 2
- 102000057297 Pepsin A Human genes 0.000 description 2
- 241001505332 Polyomavirus sp. Species 0.000 description 2
- 241000589517 Pseudomonas aeruginosa Species 0.000 description 2
- CZPWVGJYEJSRLH-UHFFFAOYSA-N Pyrimidine Chemical compound C1=CN=CN=C1 CZPWVGJYEJSRLH-UHFFFAOYSA-N 0.000 description 2
- 208000035977 Rare disease Diseases 0.000 description 2
- 241000190950 Rhodopseudomonas palustris Species 0.000 description 2
- 241000242677 Schistosoma japonicum Species 0.000 description 2
- 241000242680 Schistosoma mansoni Species 0.000 description 2
- 108010052160 Site-specific recombinase Proteins 0.000 description 2
- 241000194017 Streptococcus Species 0.000 description 2
- 241000193985 Streptococcus agalactiae Species 0.000 description 2
- 244000057717 Streptococcus lactis Species 0.000 description 2
- 235000014897 Streptococcus lactis Nutrition 0.000 description 2
- 241000244154 Taenia ovis Species 0.000 description 2
- 241000404000 Tanapox virus Species 0.000 description 2
- 108010006785 Taq Polymerase Proteins 0.000 description 2
- RYYWUUFWQRZTIU-UHFFFAOYSA-N Thiophosphoric acid Chemical class OP(O)(S)=O RYYWUUFWQRZTIU-UHFFFAOYSA-N 0.000 description 2
- XKWABWFMQXMUMT-HJGDQZAQSA-N Thr-Pro-Glu Chemical compound [H]N[C@@H]([C@@H](C)O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(O)=O XKWABWFMQXMUMT-HJGDQZAQSA-N 0.000 description 2
- 241000223996 Toxoplasma Species 0.000 description 2
- 108091028113 Trans-activating crRNA Proteins 0.000 description 2
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 2
- 241000589884 Treponema pallidum Species 0.000 description 2
- 241000224526 Trichomonas Species 0.000 description 2
- 241000223105 Trypanosoma brucei Species 0.000 description 2
- 108090000631 Trypsin Proteins 0.000 description 2
- 102000004142 Trypsin Human genes 0.000 description 2
- 241000700618 Vaccinia virus Species 0.000 description 2
- 241000700647 Variola virus Species 0.000 description 2
- 241000710886 West Nile virus Species 0.000 description 2
- 241000589634 Xanthomonas Species 0.000 description 2
- 241001536558 Yaba monkey tumor virus Species 0.000 description 2
- 238000009825 accumulation Methods 0.000 description 2
- 125000000304 alkynyl group Chemical group 0.000 description 2
- 125000000539 amino acid group Chemical group 0.000 description 2
- 238000003491 array Methods 0.000 description 2
- 125000004429 atom Chemical group 0.000 description 2
- 229940097012 bacillus thuringiensis Drugs 0.000 description 2
- 239000011324 bead Substances 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 210000000941 bile Anatomy 0.000 description 2
- 239000011616 biotin Substances 0.000 description 2
- 229960002685 biotin Drugs 0.000 description 2
- 235000020958 biotin Nutrition 0.000 description 2
- NXVYSVARUKNFNF-UHFFFAOYSA-N bis(2,5-dioxopyrrolidin-1-yl) 2,3-dihydroxybutanedioate Chemical compound O=C1CCC(=O)N1OC(=O)C(O)C(O)C(=O)ON1C(=O)CCC1=O NXVYSVARUKNFNF-UHFFFAOYSA-N 0.000 description 2
- LNQHREYHFRFJAU-UHFFFAOYSA-N bis(2,5-dioxopyrrolidin-1-yl) pentanedioate Chemical compound O=C1CCC(=O)N1OC(=O)CCCC(=O)ON1C(=O)CCC1=O LNQHREYHFRFJAU-UHFFFAOYSA-N 0.000 description 2
- 230000000903 blocking effect Effects 0.000 description 2
- 239000010836 blood and blood product Substances 0.000 description 2
- 229940125691 blood product Drugs 0.000 description 2
- 235000010633 broth Nutrition 0.000 description 2
- 229940056450 brucella abortus Drugs 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 229940095731 candida albicans Drugs 0.000 description 2
- 238000005251 capillary electrophoresis Methods 0.000 description 2
- 235000014633 carbohydrates Nutrition 0.000 description 2
- 150000001721 carbon Chemical group 0.000 description 2
- 239000008004 cell lysis buffer Substances 0.000 description 2
- 239000002771 cell marker Substances 0.000 description 2
- YTRQFSDWAXHJCC-UHFFFAOYSA-N chloroform;phenol Chemical compound ClC(Cl)Cl.OC1=CC=CC=C1 YTRQFSDWAXHJCC-UHFFFAOYSA-N 0.000 description 2
- HVYWMOMLDIMFJA-DPAQBDIFSA-N cholesterol Chemical compound C1C=C2C[C@@H](O)CC[C@]2(C)[C@@H]2[C@@H]1[C@@H]1CC[C@H]([C@H](C)CCCC(C)C)[C@@]1(C)CC2 HVYWMOMLDIMFJA-DPAQBDIFSA-N 0.000 description 2
- 238000004737 colorimetric analysis Methods 0.000 description 2
- 230000003750 conditioning effect Effects 0.000 description 2
- 238000004132 cross linking Methods 0.000 description 2
- 125000000753 cycloalkyl group Chemical group 0.000 description 2
- 230000003247 decreasing effect Effects 0.000 description 2
- 238000005238 degreasing Methods 0.000 description 2
- 238000007435 diagnostic evaluation Methods 0.000 description 2
- 102000038379 digestive enzymes Human genes 0.000 description 2
- 108091007734 digestive enzymes Proteins 0.000 description 2
- 230000005782 double-strand break Effects 0.000 description 2
- ZMMJGEGLRURXTF-UHFFFAOYSA-N ethidium bromide Chemical compound [Br-].C12=CC(N)=CC=C2C2=CC=C(N)C=C2[N+](CC)=C1C1=CC=CC=C1 ZMMJGEGLRURXTF-UHFFFAOYSA-N 0.000 description 2
- 229960005542 ethidium bromide Drugs 0.000 description 2
- 210000003527 eukaryotic cell Anatomy 0.000 description 2
- 230000005284 excitation Effects 0.000 description 2
- 238000000855 fermentation Methods 0.000 description 2
- 230000004151 fermentation Effects 0.000 description 2
- MHMNJMPURVTYEJ-UHFFFAOYSA-N fluorescein-5-isothiocyanate Chemical compound O1C(=O)C2=CC(N=C=S)=CC=C2C21C1=CC=C(O)C=C1OC1=CC(O)=CC=C21 MHMNJMPURVTYEJ-UHFFFAOYSA-N 0.000 description 2
- 238000007672 fourth generation sequencing Methods 0.000 description 2
- 238000001502 gel electrophoresis Methods 0.000 description 2
- 229930182470 glycoside Natural products 0.000 description 2
- 150000002338 glycosides Chemical class 0.000 description 2
- 108010001064 glycyl-glycyl-glycyl-glycine Proteins 0.000 description 2
- 229910052736 halogen Inorganic materials 0.000 description 2
- 244000000013 helminth Species 0.000 description 2
- 125000005842 heteroatom Chemical group 0.000 description 2
- 210000005260 human cell Anatomy 0.000 description 2
- FDGQSTZJBFJUBT-UHFFFAOYSA-N hypoxanthine Chemical compound O=C1NC=NC2=C1NC=N2 FDGQSTZJBFJUBT-UHFFFAOYSA-N 0.000 description 2
- 230000002458 infectious effect Effects 0.000 description 2
- 210000003734 kidney Anatomy 0.000 description 2
- 229940115932 legionella pneumophila Drugs 0.000 description 2
- KWGKDLIKAYFUFQ-UHFFFAOYSA-M lithium chloride Chemical compound [Li+].[Cl-] KWGKDLIKAYFUFQ-UHFFFAOYSA-M 0.000 description 2
- 238000011068 loading method Methods 0.000 description 2
- 210000002751 lymph Anatomy 0.000 description 2
- 239000006166 lysate Substances 0.000 description 2
- 230000002934 lysing effect Effects 0.000 description 2
- 201000004792 malaria Diseases 0.000 description 2
- 210000004962 mammalian cell Anatomy 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 238000004949 mass spectrometry Methods 0.000 description 2
- 238000007855 methylation-specific PCR Methods 0.000 description 2
- 125000001570 methylene group Chemical group [H]C([H])([*:1])[*:2] 0.000 description 2
- 238000010369 molecular cloning Methods 0.000 description 2
- 239000003147 molecular marker Substances 0.000 description 2
- 210000003097 mucus Anatomy 0.000 description 2
- 238000004848 nephelometry Methods 0.000 description 2
- 125000003835 nucleoside group Chemical group 0.000 description 2
- 239000002751 oligonucleotide probe Substances 0.000 description 2
- 239000012188 paraffin wax Substances 0.000 description 2
- 229940051027 pasteurella multocida Drugs 0.000 description 2
- HMFHBZSHGGEWLO-UHFFFAOYSA-N pentofuranose Chemical compound OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 2
- 229940111202 pepsin Drugs 0.000 description 2
- 229950000688 phenothiazine Drugs 0.000 description 2
- 150000002991 phenoxazines Chemical class 0.000 description 2
- PTMHPRAIXMAOOB-UHFFFAOYSA-L phosphoramidate Chemical compound NP([O-])([O-])=O PTMHPRAIXMAOOB-UHFFFAOYSA-L 0.000 description 2
- 150000008300 phosphoramidites Chemical class 0.000 description 2
- 206010035114 pityriasis rosea Diseases 0.000 description 2
- XJMOSONTPMZWPB-UHFFFAOYSA-M propidium iodide Chemical compound [I-].[I-].C12=CC(N)=CC=C2C2=CC=C(N)C=C2[N+](CCC[N+](C)(CC)CC)=C1C1=CC=CC=C1 XJMOSONTPMZWPB-UHFFFAOYSA-M 0.000 description 2
- 238000003908 quality control method Methods 0.000 description 2
- 230000002829 reductive effect Effects 0.000 description 2
- 230000000717 retained effect Effects 0.000 description 2
- 238000012340 reverse transcriptase PCR Methods 0.000 description 2
- 238000012552 review Methods 0.000 description 2
- 210000000582 semen Anatomy 0.000 description 2
- 238000004904 shortening Methods 0.000 description 2
- 239000000344 soap Substances 0.000 description 2
- 239000002689 soil Substances 0.000 description 2
- 239000007790 solid phase Substances 0.000 description 2
- 239000006228 supernatant Substances 0.000 description 2
- 210000001179 synovial fluid Anatomy 0.000 description 2
- 210000001138 tear Anatomy 0.000 description 2
- 238000002560 therapeutic procedure Methods 0.000 description 2
- 210000001541 thymus gland Anatomy 0.000 description 2
- 238000007862 touchdown PCR Methods 0.000 description 2
- 239000012588 trypsin Substances 0.000 description 2
- 241000712461 unidentified influenza virus Species 0.000 description 2
- 238000005406 washing Methods 0.000 description 2
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 2
- AGGWFDNPHKLBBV-YUMQZZPRSA-N (2s)-2-[[(2s)-2-amino-3-methylbutanoyl]amino]-5-(carbamoylamino)pentanoic acid Chemical compound CC(C)[C@H](N)C(=O)N[C@H](C(O)=O)CCCNC(N)=O AGGWFDNPHKLBBV-YUMQZZPRSA-N 0.000 description 1
- KUHSEZKIEJYEHN-BXRBKJIMSA-N (2s)-2-amino-3-hydroxypropanoic acid;(2s)-2-aminopropanoic acid Chemical compound C[C@H](N)C(O)=O.OC[C@H](N)C(O)=O KUHSEZKIEJYEHN-BXRBKJIMSA-N 0.000 description 1
- PTFYZDMJTFMPQW-UHFFFAOYSA-N 1,10-dihydropyrimido[5,4-b][1,4]benzoxazin-2-one Chemical compound O1C2=CC=CC=C2N=C2C1=CNC(=O)N2 PTFYZDMJTFMPQW-UHFFFAOYSA-N 0.000 description 1
- TZMSYXZUNZXBOL-UHFFFAOYSA-N 10H-phenoxazine Chemical compound C1=CC=C2NC3=CC=CC=C3OC2=C1 TZMSYXZUNZXBOL-UHFFFAOYSA-N 0.000 description 1
- UHUHBFMZVCOEOV-UHFFFAOYSA-N 1h-imidazo[4,5-c]pyridin-4-amine Chemical compound NC1=NC=CC2=C1N=CN2 UHUHBFMZVCOEOV-UHFFFAOYSA-N 0.000 description 1
- QWTLUPDHBKBULE-UHFFFAOYSA-N 2-[[2-[[2-[[2-[[2-[[2-[[2-[[2-[[2-[(2-aminoacetyl)amino]acetyl]amino]acetyl]amino]acetyl]amino]acetyl]amino]acetyl]amino]acetyl]amino]acetyl]amino]acetyl]amino]acetic acid Chemical compound NCC(=O)NCC(=O)NCC(=O)NCC(=O)NCC(=O)NCC(=O)NCC(=O)NCC(=O)NCC(=O)NCC(O)=O QWTLUPDHBKBULE-UHFFFAOYSA-N 0.000 description 1
- WKMPTBDYDNUJLF-UHFFFAOYSA-N 2-fluoroadenine Chemical compound NC1=NC(F)=NC2=C1N=CN2 WKMPTBDYDNUJLF-UHFFFAOYSA-N 0.000 description 1
- ZLOIGESWDJYCTF-XVFCMESISA-N 4-thiouridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=S)C=C1 ZLOIGESWDJYCTF-XVFCMESISA-N 0.000 description 1
- JWBWJOKTZVXSRT-DWQAGKKUSA-N 5-[(3as,4s,6ar)-2-oxo-1,3,3a,4,6,6a-hexahydrothieno[3,4-d]imidazol-4-yl]-2-aminopentanoic acid Chemical compound N1C(=O)N[C@@H]2[C@H](CCCC(N)C(O)=O)SC[C@@H]21 JWBWJOKTZVXSRT-DWQAGKKUSA-N 0.000 description 1
- LQLQRFGHAALLLE-UHFFFAOYSA-N 5-bromouracil Chemical compound BrC1=CNC(=O)NC1=O LQLQRFGHAALLLE-UHFFFAOYSA-N 0.000 description 1
- ZLAQATDNGLKIEV-UHFFFAOYSA-N 5-methyl-2-sulfanylidene-1h-pyrimidin-4-one Chemical compound CC1=CNC(=S)NC1=O ZLAQATDNGLKIEV-UHFFFAOYSA-N 0.000 description 1
- LRSASMSXMSNRBT-UHFFFAOYSA-N 5-methylcytosine Chemical compound CC1=CNC(=O)N=C1N LRSASMSXMSNRBT-UHFFFAOYSA-N 0.000 description 1
- DCPSTSVLRXOYGS-UHFFFAOYSA-N 6-amino-1h-pyrimidine-2-thione Chemical compound NC1=CC=NC(S)=N1 DCPSTSVLRXOYGS-UHFFFAOYSA-N 0.000 description 1
- LOSIULRWFAEMFL-UHFFFAOYSA-N 7-deazaguanine Chemical compound O=C1NC(N)=NC2=C1CC=N2 LOSIULRWFAEMFL-UHFFFAOYSA-N 0.000 description 1
- HRYKDUPGBWLLHO-UHFFFAOYSA-N 8-azaadenine Chemical compound NC1=NC=NC2=NNN=C12 HRYKDUPGBWLLHO-UHFFFAOYSA-N 0.000 description 1
- LPXQRXLUHJKZIE-UHFFFAOYSA-N 8-azaguanine Chemical compound NC1=NC(O)=C2NN=NC2=N1 LPXQRXLUHJKZIE-UHFFFAOYSA-N 0.000 description 1
- 229960005508 8-azaguanine Drugs 0.000 description 1
- FJNCXZZQNBKEJT-UHFFFAOYSA-N 8beta-hydroxymarrubiin Natural products O1C(=O)C2(C)CCCC3(C)C2C1CC(C)(O)C3(O)CCC=1C=COC=1 FJNCXZZQNBKEJT-UHFFFAOYSA-N 0.000 description 1
- 241000203022 Acholeplasma laidlawii Species 0.000 description 1
- 241001600124 Acidovorax avenae Species 0.000 description 1
- 241000186046 Actinomyces Species 0.000 description 1
- 241001147825 Actinomyces sp. Species 0.000 description 1
- 101100385358 Alicyclobacillus acidoterrestris (strain ATCC 49025 / DSM 3922 / CIP 106132 / NCIMB 13137 / GD3B) cas12b gene Proteins 0.000 description 1
- 102000002260 Alkaline Phosphatase Human genes 0.000 description 1
- 108020004774 Alkaline Phosphatase Proteins 0.000 description 1
- 208000008710 Amebic Dysentery Diseases 0.000 description 1
- 241001621927 Aminomonas Species 0.000 description 1
- 241001621924 Aminomonas paucivorans Species 0.000 description 1
- 206010001986 Amoebic dysentery Diseases 0.000 description 1
- 241000024188 Andala Species 0.000 description 1
- 235000002198 Annona diversifolia Nutrition 0.000 description 1
- 206010053555 Arthritis bacterial Diseases 0.000 description 1
- 206010003445 Ascites Diseases 0.000 description 1
- 241000271566 Aves Species 0.000 description 1
- 108090001008 Avidin Proteins 0.000 description 1
- 102100032481 B-cell CLL/lymphoma 9 protein Human genes 0.000 description 1
- 102100032424 B-cell CLL/lymphoma 9-like protein Human genes 0.000 description 1
- 241000223836 Babesia Species 0.000 description 1
- 241000223838 Babesia bovis Species 0.000 description 1
- 241000193752 Bacillus circulans Species 0.000 description 1
- 241000606125 Bacteroides Species 0.000 description 1
- 241000228405 Blastomyces dermatitidis Species 0.000 description 1
- 241000589957 Blastopirellula marina Species 0.000 description 1
- 241000120506 Bluetongue virus Species 0.000 description 1
- 241000589171 Bradyrhizobium sp. Species 0.000 description 1
- 244000304217 Brassica oleracea var. gongylodes Species 0.000 description 1
- 241000193417 Brevibacillus laterosporus Species 0.000 description 1
- 241000186146 Brevibacterium Species 0.000 description 1
- 241000269417 Bufo Species 0.000 description 1
- 241001678559 COVID-19 virus Species 0.000 description 1
- 101150017047 CSM3 gene Proteins 0.000 description 1
- 101150069031 CSN2 gene Proteins 0.000 description 1
- 101150078885 CSY3 gene Proteins 0.000 description 1
- 101100167280 Caenorhabditis elegans cin-4 gene Proteins 0.000 description 1
- 241000282832 Camelidae Species 0.000 description 1
- 241000589875 Campylobacter jejuni Species 0.000 description 1
- 101100485230 Campylobacter jejuni subsp. jejuni serotype O:2 (strain ATCC 700819 / NCTC 11168) xerH gene Proteins 0.000 description 1
- 241000282472 Canis lupus familiaris Species 0.000 description 1
- 241000283707 Capra Species 0.000 description 1
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical group [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 1
- 208000024172 Cardiovascular disease Diseases 0.000 description 1
- 241000282693 Cercopithecidae Species 0.000 description 1
- 241000242722 Cestoda Species 0.000 description 1
- 241000283153 Cetacea Species 0.000 description 1
- 229920002101 Chitin Polymers 0.000 description 1
- 241000251730 Chondrichthyes Species 0.000 description 1
- 208000005443 Circulating Neoplastic Cells Diseases 0.000 description 1
- 244000241235 Citrullus lanatus Species 0.000 description 1
- 235000012828 Citrullus lanatus var citroides Nutrition 0.000 description 1
- 241000193468 Clostridium perfringens Species 0.000 description 1
- 208000003495 Coccidiosis Diseases 0.000 description 1
- 208000035473 Communicable disease Diseases 0.000 description 1
- 108020004635 Complementary DNA Proteins 0.000 description 1
- 101100329224 Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003) cpf1 gene Proteins 0.000 description 1
- KQLDDLUWUFBQHP-UHFFFAOYSA-N Cordycepin Natural products C1=NC=2C(N)=NC=NC=2N1C1OCC(CO)C1O KQLDDLUWUFBQHP-UHFFFAOYSA-N 0.000 description 1
- 241000186216 Corynebacterium Species 0.000 description 1
- 241000186227 Corynebacterium diphtheriae Species 0.000 description 1
- 241001125840 Coryphaenidae Species 0.000 description 1
- 108091029523 CpG island Proteins 0.000 description 1
- 241000938605 Crocodylia Species 0.000 description 1
- 201000003808 Cystic echinococcosis Diseases 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- 238000007399 DNA isolation Methods 0.000 description 1
- 208000001490 Dengue Diseases 0.000 description 1
- 206010012310 Dengue fever Diseases 0.000 description 1
- 241000725619 Dengue virus Species 0.000 description 1
- LTMHDMANZUZIPE-AMTYYWEZSA-N Digoxin Natural products O([C@H]1[C@H](C)O[C@H](O[C@@H]2C[C@@H]3[C@@](C)([C@@H]4[C@H]([C@]5(O)[C@](C)([C@H](O)C4)[C@H](C4=CC(=O)OC4)CC5)CC3)CC2)C[C@@H]1O)[C@H]1O[C@H](C)[C@@H](O[C@H]2O[C@@H](C)[C@H](O)[C@@H](O)C2)[C@@H](O)C1 LTMHDMANZUZIPE-AMTYYWEZSA-N 0.000 description 1
- 108010016626 Dipeptides Proteins 0.000 description 1
- 238000002965 ELISA Methods 0.000 description 1
- 241000244160 Echinococcus Species 0.000 description 1
- 241000244170 Echinococcus granulosus Species 0.000 description 1
- 101100275895 Emericella nidulans (strain FGSC A4 / ATCC 38163 / CBS 112.46 / NRRL 194 / M139) csnB gene Proteins 0.000 description 1
- 102000004533 Endonucleases Human genes 0.000 description 1
- 241000991587 Enterovirus C Species 0.000 description 1
- 241000283086 Equidae Species 0.000 description 1
- 241000283073 Equus caballus Species 0.000 description 1
- 101100007788 Escherichia coli (strain K12) casA gene Proteins 0.000 description 1
- 101100007792 Escherichia coli (strain K12) casB gene Proteins 0.000 description 1
- 101100219622 Escherichia coli (strain K12) casC gene Proteins 0.000 description 1
- 101100382541 Escherichia coli (strain K12) casD gene Proteins 0.000 description 1
- 101100046554 Escherichia coli (strain K12) tnpX gene Proteins 0.000 description 1
- 101100326871 Escherichia coli (strain K12) ygbF gene Proteins 0.000 description 1
- 101100005249 Escherichia coli (strain K12) ygcB gene Proteins 0.000 description 1
- 241000206602 Eukaryota Species 0.000 description 1
- 108060002716 Exonuclease Proteins 0.000 description 1
- 101150096839 Fcmr gene Proteins 0.000 description 1
- 241000714165 Feline leukemia virus Species 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- 208000005577 Gastroenteritis Diseases 0.000 description 1
- 206010017943 Gastrointestinal conditions Diseases 0.000 description 1
- 208000018522 Gastrointestinal disease Diseases 0.000 description 1
- WCORRBXVISTKQL-WHFBIAKZSA-N Gly-Ser-Ser Chemical compound NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(O)=O WCORRBXVISTKQL-WHFBIAKZSA-N 0.000 description 1
- 241000282575 Gorilla Species 0.000 description 1
- 201000005569 Gout Diseases 0.000 description 1
- 108060003760 HNH nuclease Proteins 0.000 description 1
- 102000029812 HNH nuclease Human genes 0.000 description 1
- 241000168525 Haematococcus Species 0.000 description 1
- 241000606790 Haemophilus Species 0.000 description 1
- 241000709721 Hepatovirus A Species 0.000 description 1
- 241000224421 Heterolobosea Species 0.000 description 1
- 108050008836 Holliday junction resolvase Hjc Proteins 0.000 description 1
- 102100030307 Homeobox protein Hox-A13 Human genes 0.000 description 1
- 102100039545 Homeobox protein Hox-D11 Human genes 0.000 description 1
- 241001272567 Hominoidea Species 0.000 description 1
- 101000798495 Homo sapiens B-cell CLL/lymphoma 9 protein Proteins 0.000 description 1
- 101000798491 Homo sapiens B-cell CLL/lymphoma 9-like protein Proteins 0.000 description 1
- 101000918311 Homo sapiens Exostosin-1 Proteins 0.000 description 1
- 101000962591 Homo sapiens Homeobox protein Hox-D11 Proteins 0.000 description 1
- 101001053270 Homo sapiens Insulin gene enhancer protein ISL-2 Proteins 0.000 description 1
- 241000701074 Human alphaherpesvirus 2 Species 0.000 description 1
- 241000404944 Human parvovirus 4 G1 Species 0.000 description 1
- UGQMRVRMYYASKQ-UHFFFAOYSA-N Hypoxanthine nucleoside Natural products OC1C(O)C(CO)OC1N1C(NC=NC2=O)=C2N=C1 UGQMRVRMYYASKQ-UHFFFAOYSA-N 0.000 description 1
- 108010000178 IGF-I-IGFBP-3 complex Proteins 0.000 description 1
- 108010042653 IgA receptor Proteins 0.000 description 1
- 206010061598 Immunodeficiency Diseases 0.000 description 1
- 208000029462 Immunodeficiency disease Diseases 0.000 description 1
- 208000004575 Infectious Arthritis Diseases 0.000 description 1
- 206010061218 Inflammation Diseases 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 102100024390 Insulin gene enhancer protein ISL-2 Human genes 0.000 description 1
- 102000012330 Integrases Human genes 0.000 description 1
- 108091092195 Intron Proteins 0.000 description 1
- 206010023076 Isosporiasis Diseases 0.000 description 1
- FADYJNXDPBKVCA-UHFFFAOYSA-N L-Phenylalanyl-L-lysin Natural products NCCCCC(C(O)=O)NC(=O)C(N)CC1=CC=CC=C1 FADYJNXDPBKVCA-UHFFFAOYSA-N 0.000 description 1
- 241000186660 Lactobacillus Species 0.000 description 1
- 240000001046 Lactobacillus acidophilus Species 0.000 description 1
- 235000013956 Lactobacillus acidophilus Nutrition 0.000 description 1
- 241000282838 Lama Species 0.000 description 1
- 241000272168 Laridae Species 0.000 description 1
- 241000222722 Leishmania <genus> Species 0.000 description 1
- 241000222736 Leishmania tropica Species 0.000 description 1
- 241000712899 Lymphocytic choriomeningitis mammarenavirus Species 0.000 description 1
- 241000829100 Macaca mulatta polyomavirus 1 Species 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 241001357706 Marinitoga piezophila Species 0.000 description 1
- 241000712079 Measles morbillivirus Species 0.000 description 1
- 241001068914 Melicope knudsenii Species 0.000 description 1
- 241000002163 Mesapamea fractilinea Species 0.000 description 1
- 241000520674 Mesocestoides corti Species 0.000 description 1
- RJQXTJLFIWVMTO-TYNCELHUSA-N Methicillin Chemical compound COC1=CC=CC(OC)=C1C(=O)N[C@@H]1C(=O)N2[C@@H](C(O)=O)C(C)(C)S[C@@H]21 RJQXTJLFIWVMTO-TYNCELHUSA-N 0.000 description 1
- 241000589351 Methylosinus trichosporium Species 0.000 description 1
- 241000203732 Mobiluncus mulieris Species 0.000 description 1
- 241000713333 Mouse mammary tumor virus Species 0.000 description 1
- 241000711386 Mumps virus Species 0.000 description 1
- 241000714177 Murine leukemia virus Species 0.000 description 1
- 241000711408 Murine respirovirus Species 0.000 description 1
- 241000699670 Mus sp. Species 0.000 description 1
- 208000023178 Musculoskeletal disease Diseases 0.000 description 1
- 241000186359 Mycobacterium Species 0.000 description 1
- 241000186362 Mycobacterium leprae Species 0.000 description 1
- 241000202956 Mycoplasma arthritidis Species 0.000 description 1
- 241000204045 Mycoplasma hyopneumoniae Species 0.000 description 1
- 241000202894 Mycoplasma orale Species 0.000 description 1
- 241000202934 Mycoplasma pneumoniae Species 0.000 description 1
- 241000202889 Mycoplasma salivarium Species 0.000 description 1
- 101100387128 Myxococcus xanthus (strain DK1622) devR gene Proteins 0.000 description 1
- 101100387131 Myxococcus xanthus (strain DK1622) devS gene Proteins 0.000 description 1
- 241000169176 Natronobacterium gregoryi Species 0.000 description 1
- 241000109432 Neisseria bacilliformis Species 0.000 description 1
- 241000588651 Neisseria flavescens Species 0.000 description 1
- 241000588649 Neisseria lactamica Species 0.000 description 1
- 241000086765 Neisseria wadsworthii Species 0.000 description 1
- 241000143395 Nitrosomonas sp. Species 0.000 description 1
- 102100030569 Nuclear receptor corepressor 2 Human genes 0.000 description 1
- 101710153660 Nuclear receptor corepressor 2 Proteins 0.000 description 1
- 108020005187 Oligonucleotide Probes Proteins 0.000 description 1
- 241000243985 Onchocerca volvulus Species 0.000 description 1
- 229910019142 PO4 Inorganic materials 0.000 description 1
- 241000282579 Pan Species 0.000 description 1
- 241001631646 Papillomaviridae Species 0.000 description 1
- 241001386755 Parvibaculum lavamentivorans Species 0.000 description 1
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The disclosure herein includes methods, compositions, and kits suitable for generating libraries for nucleic acid sequencing. In some embodiments, a plurality of protein complexes is provided. Each protein complex may comprise a transposome and a programmable DNA binding unit capable of specifically binding to a user-selected binding site on a target double-stranded DNA (dsDNA). The binding sites of the protein complexes may differ from one another. The transposome may comprise a transposase, a first adaptor, and a second adaptor. The first adaptor, the second adaptor, or both may be sequencing adaptors.
Description
RELATED APPLICATIONS
The present application claims the benefit under 35 U.S.C. § 119(e) of U.S. patent application Ser. No. 63/189,032, filed May 14, 2021, and U.S. patent application Ser. No. 63/243,443, filed September 13, 2021, the contents of which are incorporated herein by reference in their entirety for all purposes.
Reference to Sequence Listing
The present application is filed with a Sequence Listing in electronic format. The Sequence Listing is provided as a file titled 68eb_317326_wo_sequence_listing, created on May 12, 2022, which is 56.0 kilobytes in size. The information in the electronic format of the Sequence Listing is incorporated herein by reference in its entirety.
Background
FIELD
The present disclosure relates generally to the field of molecular biology, for example, to tagging nucleic acids to generate customized, locus-specific sequencing libraries.
Description of the Related Art
Conventional library preparation methods for nucleic acid sequencing can take several hours, and they produce random libraries. These libraries are random because the methods used to fragment the nucleic acids (including physical, enzymatic, and chemical fragmentation methods) cut at random positions. As a result, which regions appear in the DNA sequencing output cannot be controlled. Two approaches are currently used for targeted sequencing. The first is amplicon sequencing, which relies on primers to amplify a region of interest. This additional amplification step further increases the cost, time, and resources of standard library preparation. The second is target capture, which relies on probes or probe pools that hybridize to particular nucleic acid targets. Hybridizing the probes to their targets and separating those targets is time-consuming and may take days; in addition, the probes used in this approach are expensive to synthesize. There is therefore a need for compositions, methods, systems, and kits for custom, locus-specific library preparation, in particular ones that enable rapid targeted sequencing (and thus rapid sequencing-based diagnostics, e.g., in less than 2 hours) as well as theranostics, which can provide a diagnosis and determine an appropriate therapy at the same time.
SUMMARY
The disclosure herein includes compositions. In some embodiments, the composition comprises a plurality of protein complexes. In some embodiments, each of the protein complexes comprises a transposome and a programmable DNA binding unit capable of specifically binding to a binding site on a target double-stranded DNA (dsDNA). In some embodiments, the transposome comprises a transposase, a first adaptor, and a second adaptor. In some embodiments, the binding sites of the protein complexes are different from each other.
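To make the structure of the composition concrete, it can be modeled as a small data structure. The Python sketch below is illustrative only: the class names, the fields `target_id`, `position`, and `strand`, and the `validate_composition` helper are hypothetical conveniences for reasoning about the embodiment, not part of the disclosure. The check enforces only two properties stated above: there is more than one protein complex, and no two complexes share a binding site.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transposome:
    transposase: str      # label only, e.g. "Tn5"
    first_adaptor: str    # adaptor sequence, 5'->3'
    second_adaptor: str   # adaptor sequence, 5'->3'


@dataclass(frozen=True)
class BindingSite:
    target_id: str  # hypothetical identifier of the target dsDNA molecule
    position: int   # hypothetical 0-based coordinate on that target
    strand: str     # "+" or "-"


@dataclass(frozen=True)
class ProteinComplex:
    transposome: Transposome
    binding_site: BindingSite  # site recognized by the programmable DNA binding unit


def validate_composition(complexes: List[ProteinComplex]) -> None:
    """Raise ValueError unless there are >= 2 complexes with pairwise-distinct sites."""
    if len(complexes) < 2:
        raise ValueError("the composition needs more than one protein complex")
    sites = [c.binding_site for c in complexes]
    # Frozen dataclasses are hashable, so duplicate sites collapse in a set.
    if len(set(sites)) != len(sites):
        raise ValueError("binding sites must differ between protein complexes")
```

Because the dataclasses are frozen, two complexes that share the same transposome (as in some embodiments) can reuse one `Transposome` object without risk of accidental mutation.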
In some embodiments, at least two of the protein complexes comprise the same transposome. In some embodiments, all of the protein complexes comprise the same transposome. In some embodiments, all of the protein complexes comprise the same transposase. In some embodiments, the first adaptor and the second adaptor in the same transposome are identical. In some embodiments, the first adaptor, the second adaptor, or both differ between transposomes. In some embodiments, the first adaptor, the second adaptor, or both are dsDNA or an RNA/DNA duplex. In some embodiments, each adaptor is about 3-200 base pairs in length. In some embodiments, the first adaptor, the second adaptor, or both are sequencing adaptors. In some embodiments, the sequencing adaptor comprises a P5 or P7 primer sequence.
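The adaptor constraints above (a duplex of about 3-200 base pairs that may carry a primer sequence) can be checked in silico before oligonucleotides are ordered. The sketch below is a simplified illustration under stated assumptions: it treats a dsDNA adaptor as two fully complementary strands, although real transposome adaptors are often only partially double-stranded, and the primer site passed to `carries_primer_site` is a hypothetical placeholder. The actual P5/P7 sequences are defined by the sequencing platform and are not reproduced here.

```python
# Complement table for DNA; handles upper- and lowercase bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA strand."""
    return seq.translate(COMPLEMENT)[::-1]


def check_dsdna_adaptor(top: str, bottom: str, min_bp: int = 3, max_bp: int = 200) -> bool:
    """True if `top` and `bottom` (both 5'->3') form a fully paired duplex
    of min_bp..max_bp base pairs. Assumes no overhangs or mismatches."""
    if len(top) != len(bottom):
        return False
    if not (min_bp <= len(top) <= max_bp):
        return False
    return reverse_complement(top).upper() == bottom.upper()


def carries_primer_site(adaptor: str, primer_site: str) -> bool:
    """True if the (hypothetical) primer site occurs in the adaptor strand."""
    return primer_site.upper() in adaptor.upper()
```

Checking full complementarity is the strictest possible test; a design that uses partially double-stranded adaptors would need to compare only the intended duplex region.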
In some embodiments, the binding sites of at least two of the protein complexes are located on the same target dsDNA. In some embodiments, the binding sites of at least two of the protein complexes are about 1-50,000 nucleotides apart on the same target dsDNA. In some embodiments, the distance between the binding sites of one pair of protein complexes is substantially the same as the distance between the binding sites of another pair of protein complexes. In some embodiments, the distance between the binding sites of one pair of protein complexes is different from the distance between the binding sites of another pair. In some embodiments, the binding sites of at least two of the protein complexes are located on different strands of the target dsDNA. In some embodiments, at least two of the protein complexes are capable of specifically binding to different target dsDNAs.
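Because the spacing between binding sites determines which stretch of a target ends up in the library, a design can be sanity-checked by listing the intervals between neighbouring sites. The sketch below makes a simplifying assumption that is not stated in the disclosure: each complex is treated as inserting exactly at its binding-site coordinate, so the DNA between two adjacent sites on the same target becomes one fragment. The function name and the input format (a mapping from a hypothetical target identifier to site coordinates) are illustrative.

```python
from typing import Dict, List, Tuple


def predicted_fragments(
    sites: Dict[str, List[int]], min_gap: int = 1, max_gap: int = 50_000
) -> Dict[str, List[Tuple[int, int]]]:
    """For each target dsDNA, return (start, end) intervals between adjacent sites.

    Simplifying assumption: insertion happens exactly at each binding-site
    coordinate. Site pairs whose spacing lies outside [min_gap, max_gap]
    nucleotides are skipped.
    """
    out: Dict[str, List[Tuple[int, int]]] = {}
    for target, positions in sites.items():
        ordered = sorted(set(positions))  # drop duplicate coordinates, sort
        frags: List[Tuple[int, int]] = []
        for left, right in zip(ordered, ordered[1:]):
            if min_gap <= right - left <= max_gap:
                frags.append((left, right))
        out[target] = frags
    return out
```

For example, sites at 100, 600, and 1100 on one target give two fragments of equal length (the equal-spacing embodiment), while a further site at 90,000 lies more than 50,000 nucleotides from its neighbour and yields no fragment under the default window.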
In some embodiments, the plurality of protein complexes is capable of specifically binding to about 2-5,000 target dsDNAs. In some embodiments, the transposase is a Tn5 transposase, a Tn7 transposase, a mariner Tc1-like transposase, a Himar1C9 transposase, or a Sleeping Beauty transposase. In some embodiments, the transposase is a hyperactive transposase. In some embodiments, the programmable DNA binding unit comprises a nuclease-deficient CRISPR-associated protein (dCas protein) and a guide RNA (gRNA) capable of specifically binding to a binding site on the target dsDNA. In some embodiments, the transposome is associated with the programmable DNA binding unit by a linker linking the transposase and the dCas protein. In some embodiments, the linker comprises a peptide linker, a chemical linker, or both. In some embodiments, the transposase is present as a fusion protein comprising the dCas protein. In some embodiments, the dCas protein is dCas9, dCas12, dCas13, dCas14, or SpRY dCas. In some embodiments, the dCas13 protein is dCas13a, dCas13b, dCas13c, or dCas13d.
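Choosing a binding site for a dCas9-based binding unit starts with finding sequences that the gRNA can target. The sketch below assumes the commonly used *Streptococcus pyogenes* Cas9 convention: a 20-nucleotide spacer immediately 5' of an NGG protospacer-adjacent motif (PAM). Other dCas variants listed above have different PAM requirements (SpRY, for instance, is nearly PAM-less), so this scan applies only to the NGG case, and it performs no off-target scoring. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Complement table for uppercase DNA; other characters (e.g. N) are left unchanged.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def find_ngg_sites(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str]]:
    """List (strand, start, spacer) for each spacer immediately 5' of an NGG PAM.

    `start` is the leftmost 0-based + strand coordinate of the spacer for hits
    on either strand. A zero-width lookahead reports overlapping sites.
    """
    seq = seq.upper()
    n = len(seq)
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})[ACGT]GG)")
    hits: List[Tuple[str, int, str]] = []
    for m in pattern.finditer(seq):
        hits.append(("+", m.start(), m.group(1)))
    for m in pattern.finditer(revcomp(seq)):
        # A spacer at index i of the reverse complement covers + strand
        # positions n - i - spacer_len .. n - i - 1.
        hits.append(("-", n - m.start() - spacer_len, m.group(1)))
    return hits
```

Scanning both strands matters here because the disclosure contemplates binding sites on different strands of the same target dsDNA.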
In some embodiments, the programmable DNA binding unit comprises a protein component capable of specifically binding to a binding site on the target dsDNA. In some embodiments, the protein component comprises an endonuclease-deficient zinc finger nuclease (ZFN), an endonuclease-deficient transcription activator-like effector nuclease (TALEN), an Argonaute protein, an endonuclease-deficient meganuclease, a recombinase, or a combination thereof. In some embodiments, the transposome is associated with the programmable DNA binding unit through a linker linking the transposase and the protein component. In some embodiments, the linker comprises a peptide linker, a chemical linker, or both. In some embodiments, the peptide linker comprises a plurality of glycine, serine, threonine, alanine, lysine, or glutamine residues, or a combination thereof. In some embodiments, the peptide linker comprises a GS linker. In some embodiments, the peptide linker is an XTEN linker. In some embodiments, the protein component is present as a fusion protein comprising the transposase.
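To make the linker options concrete, the sketch below builds a (GGGGS)n flexible GS linker and tallies what fraction of a linker consists of the residues listed above (G, S, T, A, K, Q). The 16-residue XTEN sequence shown is the one commonly used in Cas9 fusion proteins in the literature; it is an assumption here, and the xTen linker in the constructs described herein may differ.

```python
from collections import Counter

# Residues named above for peptide linkers: Gly, Ser, Thr, Ala, Lys, Gln (one-letter codes).
LISTED_RESIDUES = frozenset("GSTAKQ")

# 16-residue XTEN linker as commonly used in Cas9 fusions (assumption; may differ from
# the xTen linker in the constructs described herein).
XTEN16 = "SGSETPGTSESATPES"


def gs_linker(repeats, unit="GGGGS"):
    """Flexible GS linker made of `repeats` copies of `unit`, e.g. (GGGGS)3."""
    if repeats < 1:
        raise ValueError("need at least one repeat")
    return unit * repeats


def linker_summary(seq):
    """Length, per-residue counts and fraction of residues drawn from LISTED_RESIDUES."""
    counts = Counter(seq)
    listed = sum(count for residue, count in counts.items() if residue in LISTED_RESIDUES)
    return {"length": len(seq), "listed_fraction": listed / len(seq), "counts": dict(counts)}


if __name__ == "__main__":
    for name, linker in (("(GGGGS)3", gs_linker(3)), ("XTEN16", XTEN16)):
        summary = linker_summary(linker)
        print(f"{name}: {linker} length={summary['length']} "
              f"listed_fraction={summary['listed_fraction']:.2f}")
```

The (GGGGS)3 linker is made entirely of listed residues, whereas the XTEN sequence also contains glutamate and proline residues.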
The disclosure herein includes reaction mixtures. In some embodiments, the reaction mixture comprises a composition disclosed herein and sample nucleic acids suspected of comprising one or more target dsDNAs. The reaction mixture may comprise a DNA polymerase, dNTPs, or a combination thereof. In some embodiments, the adaptors are covalently attached to the target dsDNA or a fragment thereof. The reaction mixture may comprise a plurality of dsDNA fragments, each fragment carrying, at its two ends respectively, the first adaptor and the second adaptor of one of the plurality of protein complexes. In some embodiments, the sample nucleic acids comprise eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof. In some embodiments, the target dsDNA is genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. In some embodiments, the sample nucleic acids are from a biological sample, a clinical sample, an environmental sample, or a combination thereof. In some embodiments, the biological sample comprises stool, sputum, peripheral blood, plasma, serum, lymph nodes, respiratory tissue, exudates, bodily fluids, or combinations thereof.
The disclosure herein includes methods for tagging nucleic acids. In some embodiments, the method comprises contacting a composition disclosed herein with a sample suspected of containing a plurality of target dsDNAs to form a reaction mixture, and incubating the reaction mixture to generate a plurality of dsDNA fragments, each fragment carrying, at its two ends respectively, the first adaptor and the second adaptor of one of the plurality of protein complexes.
The disclosure herein includes methods for generating sequencing libraries. In some embodiments, the method comprises contacting a composition disclosed herein with a sample suspected of containing a plurality of target dsDNAs to form a reaction mixture. The method may include incubating the reaction mixture to generate a plurality of dsDNA fragments, each fragment carrying, at its two ends respectively, the first adaptor and the second adaptor of one of the plurality of protein complexes. The method may include amplifying the plurality of dsDNA fragments with primers capable of binding to the adaptors at the ends of the dsDNA fragments to generate a sequencing library.
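When a transposase is loaded with two different adaptors (compare FIGS. 40-41), the adaptor found at each fragment end is, to a first approximation, a random draw. The sketch below assumes that each end independently carries adaptor A with probability p_A, which gives expected fractions of p_A², 2·p_A·p_B, and p_B² for A-A, A-B, and B-B ends, and checks these with a small Monte Carlo simulation. The independence assumption is a simplification; which end types a given primer pair amplifies depends on the adaptor and primer design and is not modeled.

```python
import random


def expected_end_fractions(p_a):
    """Expected A-A / A-B / B-B fractions if each end carries adaptor A with probability p_a."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be between 0 and 1")
    p_b = 1.0 - p_a
    return {"A-A": p_a * p_a, "A-B": 2 * p_a * p_b, "B-B": p_b * p_b}


def simulate_end_fractions(n_fragments, p_a, seed=0):
    """Monte Carlo estimate of the same fractions under the same independence assumption."""
    rng = random.Random(seed)
    counts = {"A-A": 0, "A-B": 0, "B-B": 0}
    for _ in range(n_fragments):
        left = "A" if rng.random() < p_a else "B"
        right = "A" if rng.random() < p_a else "B"
        key = "A-B" if left != right else f"{left}-{right}"
        counts[key] += 1
    return {key: value / n_fragments for key, value in counts.items()}


if __name__ == "__main__":
    # 0.0 ~ only adaptor B loaded; 0.5 ~ equal loading of adaptors A and B.
    for p_a in (0.0, 0.5):
        print(p_a, expected_end_fractions(p_a), simulate_end_fractions(100_000, p_a))
```

Under this model, equal loading yields A-B ends on at most half of the fragments, while single-adaptor loading yields none.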
In some embodiments, each primer is about 5-80 nucleotides in length. In some embodiments, the plurality of dsDNA fragments is amplified with the primers using the polymerase chain reaction (PCR). In some embodiments, the amplification is performed by loop-mediated isothermal amplification (LAMP), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription-mediated amplification (TMA), nicking enzyme amplification reaction (NEAR), rolling circle amplification (RCA), multiple displacement amplification (MDA), ramification amplification (RAM), circular helicase-dependent amplification (cHDA), single primer isothermal amplification (SPIA), signal-mediated amplification of RNA technology (SMART), self-sustained sequence replication (3SR), genome exponential amplification reaction (GEAR), or isothermal multiple displacement amplification (IMDA). In some embodiments, the PCR is real-time PCR or quantitative real-time PCR (qRT-PCR). In some embodiments, the sample comprises eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof.
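A quick pre-check against the primer length range stated above can be scripted. The sketch below validates a primer against the 5-80 nt window and reports its GC content and a Wallace-rule melting temperature, Tm ≈ 2(A+T) + 4(G+C) ℃. The Wallace rule is only a rough estimate intended for short oligonucleotides (about 14-20 nt); longer primers call for a nearest-neighbor model. The example primer is hypothetical.

```python
def primer_stats(primer, min_len=5, max_len=80):
    """Validate a DNA primer and return its length, GC fraction and Wallace-rule Tm (degC)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    if not min_len <= len(seq) <= max_len:
        raise ValueError(f"primer length {len(seq)} nt is outside {min_len}-{max_len} nt")
    gc = seq.count("G") + seq.count("C")
    at = len(seq) - gc
    return {
        "length": len(seq),
        "gc_fraction": gc / len(seq),
        # Wallace rule: rough estimate, reasonable only for short oligos (~14-20 nt).
        "wallace_tm_c": 2 * at + 4 * gc,
    }


if __name__ == "__main__":
    print(primer_stats("ACGTACGTACGTACGTACGT"))  # hypothetical 20-nt primer
```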
In some embodiments, the plurality of target dsDNAs comprises genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. In some embodiments, the sample is, or is derived from, a biological sample, a clinical sample, an environmental sample, or a combination thereof. In some embodiments, the plurality of target dsDNAs comprises DNA from at least 2 different organisms. In some embodiments, the plurality of target dsDNAs comprises DNA from at least 2 different genes. The method may include producing the plurality of target dsDNAs from a plurality of target RNAs using a reverse transcriptase. In some embodiments, the plurality of target dsDNAs comprises a target dsDNA produced from a target RNA with a reverse transcriptase.
In some embodiments, the plurality of target dsDNAs comprises a genetic signature of interest. In some embodiments, the genetic signature of interest comprises one or more mutations of interest. In some embodiments, the one or more mutations of interest include point mutations, inversions, deletions, insertions, translocations, duplications, copy number variations, or combinations thereof. In some embodiments, the one or more mutations of interest include nucleotide substitutions, deletions, insertions, or combinations thereof. In some embodiments, the genetic signature of interest is indicative of antibiotic resistance or antibiotic susceptibility of the organism from which the target dsDNA is derived. In some embodiments, the genetic signature of interest is indicative of the cancer status of the organism from which the target dsDNA is derived. In some embodiments, the genetic signature of interest is indicative of the status of a genetic disease in the organism from which the target dsDNA is derived. In some embodiments, the genetic disease is a monogenic disorder. In some embodiments, the genetic disease is cystic fibrosis, Huntington's disease, sickle cell anemia, hemophilia, Duchenne muscular dystrophy, thalassemia, fragile X syndrome, familial hypercholesterolemia, polycystic kidney disease, neurofibromatosis type 1, hereditary spherocytosis, Marfan syndrome, Tay-Sachs disease, phenylketonuria, mucopolysaccharidosis, lysosomal acid lipase deficiency, glycogen storage disease, galactosemia, or hemochromatosis.
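The small-scale mutation classes listed above can be illustrated with a toy classifier for a single reference/alternate allele pair. This is a simplified sketch assuming VCF-style, left-anchored alleles; it is not a variant caller, and translocations and copy number variations require read-level or coverage-level analyses that are not shown. The duplication rule (inserted bases that repeat the reference bases immediately before them) is a heuristic assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_variant(ref, alt):
    """Rough class of a left-anchored REF/ALT allele pair (simplified, illustrative)."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == len(alt):
        mismatches = sum(r != a for r, a in zip(ref, alt))
        if mismatches == 0:
            return "no change"
        if mismatches == 1:
            return "substitution"  # point mutation
        if alt == revcomp(ref):
            return "inversion"
        return "multi-nucleotide substitution"
    if len(alt) > len(ref) and alt.startswith(ref):
        inserted = alt[len(ref):]
        # Heuristic: inserted bases repeating the preceding reference bases = duplication.
        return "duplication" if ref.endswith(inserted) else "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "complex"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("ACGTT", "AACGT"), ("CAG", "CAGAG"),
                     ("C", "CTTA"), ("CTTA", "C"), ("AC", "GTT")]:
        print(f"{ref} -> {alt}: {classify_variant(ref, alt)}")
```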
In some embodiments, contacting the plurality of target dsDNAs with the plurality of protein complex pairs is performed at about 25 ℃ to about 80 ℃. In some embodiments, incubating the reaction mixture comprises incubating the reaction mixture at about 37 ℃ to about 55 ℃. In some embodiments, the plurality of protein complex pairs and the plurality of target dsDNAs are present in the reaction mixture at a molar ratio of about 2:1 to about 2000:1. In some embodiments, the plurality of protein complex pairs and the plurality of target dsDNAs are present in the reaction mixture at a molar ratio of about 2:1 to about 200:1.
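Setting up a complex-to-DNA molar ratio requires converting a DNA mass into a number of molecules. The sketch below uses the common approximation of 650 g/mol per base pair of dsDNA (an assumption; the exact value depends on sequence), computes the pmol of complex needed for a chosen ratio within the 2:1 to 2000:1 range stated above, and converts that amount into a volume of a stock of given nM concentration. The amounts and concentrations in the example are hypothetical.

```python
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one dsDNA base pair (assumption)


def dsdna_pmol(mass_ng, length_bp):
    """pmol of dsDNA molecules in `mass_ng` nanograms of a `length_bp` target."""
    return (mass_ng * 1e-9) / (length_bp * BP_MASS_G_PER_MOL) * 1e12


def complex_pmol_for_ratio(mass_ng, length_bp, ratio, min_ratio=2, max_ratio=2000):
    """pmol of protein complex needed for a complex:target molar ratio of `ratio`:1."""
    if not min_ratio <= ratio <= max_ratio:
        raise ValueError(f"ratio {ratio}:1 is outside {min_ratio}:1 to {max_ratio}:1")
    return ratio * dsdna_pmol(mass_ng, length_bp)


def stock_volume_ul(pmol_needed, stock_nM):
    """Volume (uL) of a stock at `stock_nM` holding `pmol_needed` (1 nM = 0.001 pmol/uL)."""
    return pmol_needed / (stock_nM * 1e-3)


if __name__ == "__main__":
    # Hypothetical setup: 10 ng of a 10,000 bp target, 200:1 ratio, 100 nM complex stock.
    target_pmol = dsdna_pmol(10, 10_000)
    complex_pmol = complex_pmol_for_ratio(10, 10_000, 200)
    print(f"target: {target_pmol * 1000:.2f} fmol")
    print(f"complex: {complex_pmol:.3f} pmol -> "
          f"{stock_volume_ul(complex_pmol, 100):.2f} uL of stock")
```

For this hypothetical setup, roughly 1.5 fmol of target calls for about 0.31 pmol of complex, or roughly 3.1 µL of a 100 nM stock.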
The method may include labeling one or both ends of one or more of the plurality of dsDNA fragments. The method may include labeling the two ends of one or more of the plurality of dsDNA fragments differently. In some embodiments, labeling includes labeling with an anionic label, a cationic label, a neutral label, an electrochemical label, a protein label, a fluorescent label, a magnetic label, or a combination thereof. The method may include enriching, capturing, isolating, and/or visualizing the labeled dsDNA fragments.
Brief Description of Drawings
FIG. 1 depicts a non-limiting exemplary conventional library preparation method for next generation sequencing. The ligation-based library preparation shown is reproduced from www.idtdna.com/pages/technology/next-generation-sequencing/library-preparation/ligation-based-library-prep.
FIG. 2 depicts a non-limiting exemplary conventional sequencing library preparation by tagmentation, reproduced from Illumina, Nextera XT Library Prep: Tips and Troubleshooting (2015).
FIG. 3 depicts a non-limiting exemplary schematic of the custom locus specific library preparation (CLLP) disclosed herein.
FIG. 4 depicts a non-limiting exemplary embodiment of targeted sequencing using a genome editing tool (Cas9 and a guide RNA).
FIGS. 5A-5F depict non-limiting exemplary embodiments of the custom locus specific library preparation (CLLP) disclosed herein.
FIG. 6 depicts a non-limiting exemplary workflow of the Oxford Nanopore Technologies (ONT) Rapid Sequencing Kit, which is based on tagmentation. The workflow shown is reproduced from the Nanopore Rapid Sequencing Kit.
FIGS. 7A-7H depict non-limiting exemplary embodiments of genome editing tagmentation (GET) for generating sequencing libraries for existing sequencing platforms (e.g., sequencing platforms from Oxford Nanopore).
FIG. 8 depicts a non-limiting exemplary schematic of a plasmid construct (3xFlag-Cas9-Fl26-Tn5; SEQ ID NO: 1) for use in generating the protein complexes provided herein.
FIG. 9 depicts a non-limiting exemplary schematic of a plasmid construct (3xFlag-Cas9-xTen-Tn5; SEQ ID NO: 2) for use in generating the protein complexes provided herein.
FIG. 10 depicts a non-limiting exemplary schematic of a plasmid construct (pET-Tn5-xTen-dCas9; SEQ ID NO: 3) for use in generating the protein complexes provided herein.
FIG. 11 depicts the relative binding sites of exemplary sgRNAs for the Salmonella enterica (S. enterica) invA gene.
FIG. 12 depicts the relative binding sites of exemplary sgRNAs for the Salmonella enterica fliC gene.
FIG. 13 shows exemplary Bioanalyzer data indicating that cleavage of genomic DNA yields fragments of the expected size, demonstrating that the guide RNAs for Salmonella enterica are functional. See also Table 2.
FIG. 14 depicts a TapeStation analysis of the amplification of fragments generated by Tn5, using adapter A as the primer pair. The result suggests that adaptors were added to the 5' and 3' ends of the cleaved molecules.
FIG. 15 depicts a TapeStation analysis of the amplification of fragments generated by Tn5, using adapter B as the primer pair. The result suggests that adaptors were added to the 5' and 3' ends of the cleaved molecules.
FIG. 16 depicts an exemplary SDS-PAGE gel analysis of recombinantly expressed and purified dCas9-Fl26-Tn5 fusion protein. Arrows point to the fusion protein bands.
FIG. 17 depicts an exemplary Bioanalyzer electrophoresis analysis of recombinantly expressed and purified dCas9-Fl26-Tn5 fusion protein.
FIG. 18 depicts an exemplary SDS-PAGE gel analysis of recombinantly expressed and purified dCas9-xTen-Tn5 fusion protein. Arrows point to the fusion protein bands.
FIG. 19 depicts exemplary Bioanalyzer electrophoresis data for recombinantly expressed and purified dCas9-xTen-Tn5 fusion protein.
FIG. 20 depicts an exemplary SDS-PAGE gel analysis of recombinantly expressed and purified Tn5-Fl26-dCas9 fusion protein. Arrows point to the fusion protein bands.
FIG. 21 depicts an exemplary SDS-PAGE gel analysis of recombinantly expressed and purified Tn5-xTen-dCas9 fusion protein. Arrows point to the fusion protein bands.
FIG. 22 depicts a TapeStation analysis of an amplification reaction using only catalytically active Cas9 (no fusion protein). No amplification was observed, indicating that Cas9 by itself cannot add adaptors to the 5' and 3' ends of the cleaved fragments. The visible signal is from samples incubated with Cas9 but not subjected to PCR. The lower peak is a 100 bp size marker, and the upper peak is genomic DNA.
FIG. 23 depicts a TapeStation analysis of the amplification reaction after a tagmentation reaction with dCas9-Fl26-Tn5. Arrows indicate signals from reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The inclusion of gRNA in this reaction resulted in broad peaks, indicating random tagmentation. The lower peak is a 100 bp size marker, and the upper peak is genomic DNA.
FIG. 24 depicts an exemplary TapeStation analysis of the amplification reaction after a tagmentation reaction with dCas9-xTen-Tn5. Arrows indicate signals from reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The inclusion of gRNA in this reaction resulted in broad peaks, indicating random tagmentation. The lower peak is a 100 bp size marker, and the upper peak is genomic DNA.
FIG. 25 depicts a TapeStation analysis of the amplification reaction after tagmentation with 100 nM dCas9-Fl26-Tn5 fusion protein. Arrows indicate signals from reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The lower peak is a 100 bp size marker, and the upper peak is genomic DNA.
FIG. 26 depicts a TapeStation analysis of the amplification reaction after tagmentation with 1 nM dCas9-Fl26-Tn5 fusion protein. Arrows indicate signals from reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The lower peak is a 100 bp size marker, and the upper peak is genomic DNA.
FIG. 27 depicts a TapeStation analysis of the amplification reaction after tagmentation with 100 pM dCas9-Fl26-Tn5 fusion protein. Arrows indicate signals from reactions subjected to PCR conditions after incubation with the Cas9-Tn5 fusion protein. The lower peak is a 100 bp size marker, and the upper peak is genomic DNA.
FIG. 28 is an enlarged view of FIG. 27, showing the TapeStation analysis of the amplification reaction after tagmentation with 100 pM dCas9-Fl26-Tn5 fusion protein.
FIG. 29 depicts a TapeStation analysis of the amplification reaction after tagmentation with 100 pM dCas9-xTen-Tn5 fusion protein. The lower peak is a 100 bp marker.
FIG. 30 depicts a TapeStation analysis of the amplification reaction after tagmentation with 10 pM dCas9-xTen-Tn5 fusion protein. The lower peak is a 100 bp marker.
FIG. 31 depicts a TapeStation analysis of the amplification reaction after tagmentation with 1 pM dCas9-xTen-Tn5 fusion protein.
FIG. 32 depicts a Bioanalyzer analysis of the amplification of libraries prepared by tagmentation with Tn5 alone, loaded with only one adapter (adapter B).
FIG. 33 depicts a Bioanalyzer analysis of the amplification of libraries prepared by dCas9-Fl26-Tn5-directed tagmentation, with the transposase loaded with only one adapter (adapter B). In this experiment, a shorter incubation protocol was used.
FIG. 34 depicts a Bioanalyzer analysis of the amplification of libraries prepared by dCas9-Fl26-Tn5-directed tagmentation, with the transposase loaded with only one adapter (adapter B). In this experiment, a longer incubation protocol was used.
FIG. 35 depicts an exemplary Bioanalyzer analysis of the amplification of libraries prepared by dCas9-Fl26-Tn5-directed tagmentation, with the transposase loaded with both adaptors A and B. In this experiment, a longer incubation protocol was used.
FIG. 36 depicts an exemplary Bioanalyzer analysis of the amplification of libraries prepared by dCas9-Fl26-Tn5-directed tagmentation, with the transposase loaded with both adaptors A and B. In this experiment, a shorter incubation protocol was used.
FIG. 37 depicts an exemplary embodiment of labeling DNA fragments with NGS sequencing adaptors using the CasTn-NEBNext ligation-based library method disclosed herein.
FIG. 38 depicts an exemplary TapeStation analysis of PCR amplification of Salmonella enterica genomic DNA samples incubated with dCas9-xTen-Tn5 loaded with Salmonella enterica sgRNA. The lower peak is a 100 bp marker.
FIG. 39 depicts an exemplary TapeStation analysis of PCR amplification of Salmonella enterica samples incubated with dCas9-xTen-Tn5 without sgRNA. The lower peak is a 100 bp marker.
FIG. 40 shows a schematic representation of fragments produced by dCas9-Tn5 using a single adapter (e.g., adapter B).
FIG. 41 depicts a schematic representation of dCas9-Tn5 fragments resulting from a reaction in which Tn5 is loaded with two different adaptors (e.g., adaptor A and adaptor B).
FIGS. 42A-42B illustrate the preparation of NEBNext ligation-based libraries for next generation sequencing. The symbols shown in the legend mark portions of the adaptor and primer sequences. Preparation of NEBNext libraries from fragments generated by dCas9-Tn5 tagmentation is shown in FIG. 37.
FIG. 43 depicts a schematic representation of the preparation of a tagmentation-based Nextera library for next generation sequencing.
FIG. 44 depicts the preparation of a tagmentation-based library using dCas9-Tn5-directed tagmentation.
Detailed description of the preferred embodiments
The following detailed description references the accompanying drawings, which form a part hereof. In the drawings, like reference numerals generally identify like elements unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated and form part of the disclosure herein.
Regarding the related art, all patents, published patent applications, other publications, and sequences from GenBank and other databases mentioned herein are incorporated by reference in their entirety.
The disclosure herein includes compositions. In some embodiments, the composition comprises a plurality of protein complexes. In some embodiments, each of the plurality of protein complexes comprises a transposome and a programmable DNA binding unit capable of specifically binding to a binding site on a target double-stranded DNA (dsDNA). In some embodiments, the transposome comprises a transposase, a first adaptor, and a second adaptor. In some embodiments, the binding sites of the plurality of protein complexes are different from each other.
The disclosure herein includes reaction mixtures. In some embodiments, the reaction mixture comprises a composition disclosed herein and sample nucleic acids suspected of comprising one or more target dsDNAs. The reaction mixture may comprise a DNA polymerase, dNTPs, or a combination thereof. In some embodiments, the adaptors are covalently attached to the target dsDNA or a fragment thereof. The reaction mixture may comprise a plurality of dsDNA fragments, each fragment carrying, at its two ends respectively, the first adaptor and the second adaptor of one of the plurality of protein complexes. In some embodiments, the sample nucleic acids comprise eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof. In some embodiments, the target dsDNA is genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. In some embodiments, the sample nucleic acids are from a biological sample, a clinical sample, an environmental sample, or a combination thereof. In some embodiments, the biological sample comprises stool, sputum, peripheral blood, plasma, serum, lymph nodes, respiratory tissue, exudates, bodily fluids, or combinations thereof.
The disclosure herein includes methods for tagging nucleic acids. In some embodiments, the method comprises contacting a composition disclosed herein with a sample suspected of containing a plurality of target dsDNAs to form a reaction mixture, and incubating the reaction mixture to generate a plurality of dsDNA fragments, each fragment carrying, at its two ends respectively, the first adaptor and the second adaptor of one of the plurality of protein complexes.
The disclosure herein includes methods for generating sequencing libraries. In some embodiments, the method comprises contacting a composition disclosed herein with a sample suspected of containing a plurality of target double-stranded DNAs (dsDNAs) to form a reaction mixture. The method may include incubating the reaction mixture to generate a plurality of dsDNA fragments, each fragment carrying, at its two ends respectively, the first adaptor and the second adaptor of one of the plurality of protein complexes. The method may include amplifying the plurality of dsDNA fragments with primers capable of binding to the adaptors at the ends of the dsDNA fragments to generate a sequencing library.
Definitions
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. See, e.g., Singleton et al., Dictionary of Microbiology and Molecular Biology, 2nd ed., J. Wiley & Sons (New York, NY 1994); Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press (Cold Spring Harbor, NY 1989). For the purposes of this disclosure, the following terms are defined below.
As used herein, the term "adapter" may mean a sequence capable of facilitating amplification or sequencing of an associated nucleic acid. The associated nucleic acid may include a target nucleic acid. The associated nucleic acid may include one or more of a spatial label, a target label, a sample label, an index label, or a barcode sequence (e.g., a molecular label). The adapters may be linear. The adapter may be a pre-adenylated adapter. The adapters may be double-stranded or single-stranded. One or more adapters may be located at the 5' end or the 3' end of the nucleic acid. When the adapters comprise known sequences at the 5' and 3' ends, the known sequences may be the same or different. Adapters located at the 5' end and/or 3' end of a polynucleotide may be capable of hybridizing to one or more oligonucleotides immobilized on a surface. In some embodiments, the adapter may comprise a universal sequence. A universal sequence may be a region of nucleotide sequence that is common to two or more nucleic acid molecules, which may also have regions of differing sequence. Thus, for example, the 5' adapters may comprise identical and/or universal nucleic acid sequences, and the 3' adapters may comprise identical and/or universal sequences. A universal sequence present in different members of a plurality of nucleic acid molecules may allow the replication or amplification of multiple different sequences using a single universal primer that is complementary to the universal sequence. Similarly, at least one, two (e.g., a pair), or more universal sequences present in different members of a collection of nucleic acid molecules may allow the replication or amplification of multiple different sequences using at least one, two (e.g., a pair), or more single universal primers that are complementary to the universal sequences.
Thus, the universal primers comprise sequences that can hybridize to such universal sequences. Molecules having target nucleic acid sequences can be modified to attach adapters (e.g., non-target nucleic acid sequences) to one or both ends of the different target nucleic acid sequences. The one or more adapters attached to the target nucleic acid may provide sites for hybridization of the universal primers. The one or more adapters attached to the target nucleic acid may be the same as or different from each other.
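The universal-sequence idea can be shown with a short sketch: attach the same hypothetical 5' and 3' adapter sequences to several unrelated inserts, derive one forward and one reverse primer from the adapters, and confirm that this single primer pair has a binding site on every adapter-flanked molecule. The adapter sequences are made up for illustration, and primer binding is reduced to exact string matching.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical universal adapter sequences (made up for illustration).
ADAPTER_5 = "GATCGTACGTTAGCCATGCA"
ADAPTER_3 = "TTGCAGCTAGCTAAGGCTCA"


def add_adapters(insert):
    """Top strand (5'->3') of an insert flanked by the universal adapters."""
    return ADAPTER_5 + insert + ADAPTER_3


def universal_primers(adapter_5=ADAPTER_5, adapter_3=ADAPTER_3):
    """Forward primer = 5' adapter; reverse primer = reverse complement of the 3' adapter."""
    return adapter_5, revcomp(adapter_3)


def has_both_primer_sites(top_strand, forward, reverse):
    """True if the forward primer matches the top strand's 5' end (so it anneals to the
    bottom strand) and the reverse primer matches the bottom strand's 5' end."""
    return top_strand.startswith(forward) and revcomp(top_strand).startswith(reverse)


if __name__ == "__main__":
    forward, reverse = universal_primers()
    inserts = ["ACGTACGTTTGA", "GGGCCCAAATTT", "GATTACA"]  # unrelated hypothetical inserts
    for insert in inserts:
        print(insert, has_both_primer_sites(add_adapters(insert), forward, reverse))
    print("no adapters:", has_both_primer_sites(inserts[0], forward, reverse))
```

Because the primers are derived only from the adapters, the insert sequence never enters the check, which is the point of a universal sequence.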
As used herein, the term "associated" or "associated with" may mean that two or more substances may be identified as co-located at a point in time. Association may mean that two or more substances are or were in similar containers. The association may be an informatics association. For example, digital information about two or more substances may be stored and may be used to determine that one or more substances are co-located at a point in time. The association may also be a physical association. In some embodiments, two or more associated substances are "tethered", "attached" or "immobilized" to each other or to a common solid or semi-solid surface. Association may refer to covalent or non-covalent means for attaching the label to a solid or semi-solid support, such as a bead. The association may be a covalent bond between the target and the label. Association may include hybridization between two molecules, such as a target molecule and a label.
As used herein, the term "complementary" may refer to the capacity for precise pairing between two nucleotides. For example, if a nucleotide at a given position of a nucleic acid is capable of hydrogen bonding with a nucleotide of another nucleic acid, the two nucleic acids are considered to be complementary to one another at that position. Complementarity between two single-stranded nucleic acid molecules may be "partial," in which only some of the nucleotides bind, or it may be complete, when total complementarity exists between the single-stranded molecules. A first nucleotide sequence may be referred to as the "complement" of a second sequence if the first nucleotide sequence is complementary to the second nucleotide sequence. A first nucleotide sequence may be referred to as the "reverse complement" of a second sequence if the first nucleotide sequence is complementary to a sequence that is the reverse (i.e., the order of the nucleotides is reversed) of the second sequence. As used herein, a "complementary" sequence may refer to the "complement" or the "reverse complement" of a sequence. It is understood from this disclosure that if a molecule can hybridize to another molecule, it may be complementary or partially complementary to the molecule to which it hybridizes.
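The difference between a complement and a reverse complement, and the notion of partial complementarity, can be made concrete in a few lines. The sketch below handles only A, C, G, and T, writes both strands 5'→3', and scores Watson-Crick pairing position by position when the second strand is laid antiparallel to the first; it ignores wobble pairs, bulges, and gapped alignments.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, same order as the input."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement read in reverse order, i.e. the partner strand written 5'->3'."""
    return complement(seq)[::-1]


def fraction_complementary(strand_1, strand_2):
    """Fraction of positions that Watson-Crick pair when strand_2 (5'->3') is laid
    antiparallel under strand_1 (5'->3'). 1.0 = complete; between 0 and 1 = partial."""
    if len(strand_1) != len(strand_2):
        raise ValueError("strands must have equal length")
    pairs = zip(strand_1.upper(), reversed(strand_2.upper()))
    return sum(pair in WATSON_CRICK for pair in pairs) / len(strand_1)


if __name__ == "__main__":
    print(complement("ATGC"), reverse_complement("ATGC"))
    print(fraction_complementary("ATGC", "GCAT"), fraction_complementary("ATGC", "GCAA"))
```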
As used herein, the terms "label" and "labels" may refer to nucleic acid codes associated with a target within a sample. A label may be, for example, a nucleic acid label. A label may be an entirely or partially amplifiable label. A label may be an entirely or partially sequenceable label. A label may be a portion of a native nucleic acid that is identifiable as distinct. A label may be a known sequence. A label may comprise a junction of nucleic acid sequences, for example, a junction of a native and a non-native sequence. As used herein, the term "label" may be used interchangeably with the terms "index," "tag," or "label-tag." Labels may convey information. For example, in various embodiments, labels may be used to determine the identity of a sample, the source of a sample, the identity of a cell, and/or a target.
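As an example of a label conveying sample identity, the sketch below assigns reads to samples using a sample label (barcode) at the start of each read, tolerating a set number of mismatches. The label sequences, the assumption that the label sits at the 5' end of the read, and the 1-mismatch threshold are all hypothetical choices; reads that match no label closely enough, or that match two labels equally well, are left unassigned.

```python
# Hypothetical sample labels (barcodes); every pair differs at all 6 positions.
SAMPLE_LABELS = {"sample_1": "ACGTAC", "sample_2": "TGCATG", "sample_3": "GATCGA"}


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, labels=SAMPLE_LABELS, max_mismatches=1):
    """Return the sample whose label best matches the read's 5' end, or None if no label
    is within `max_mismatches` or the best match is tied (ambiguous)."""
    scored = sorted(
        (hamming(read[:len(label)], label), sample)
        for sample, label in labels.items()
        if len(read) >= len(label)
    )
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1]


if __name__ == "__main__":
    for read in ("ACGTACTTGCA", "ACGTAGTTGCA", "CCCCCCTTGCA"):
        print(read, assign_sample(read))
```

Keeping the labels far apart in Hamming distance is what makes a 1-mismatch tolerance unambiguous in this example.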
As used herein, the term "nucleic acid" refers to a polynucleotide sequence or a fragment thereof. A nucleic acid may comprise nucleotides. A nucleic acid may be exogenous or endogenous to a cell. A nucleic acid may exist in a cell-free environment. A nucleic acid may be a gene or a fragment thereof. A nucleic acid may be DNA. A nucleic acid may be RNA. A nucleic acid may comprise one or more analogs (e.g., altered backbones, sugars, or nucleobases). Some non-limiting examples of analogs include: 5-bromouracil, peptide nucleic acids, xeno nucleic acids, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. "Nucleic acid," "polynucleotide," "target polynucleotide," and "target nucleic acid" are used interchangeably.
A nucleic acid may comprise one or more modifications (e.g., base modifications, backbone modifications) that provide the nucleic acid with new or enhanced features (e.g., improved stability). A nucleic acid may comprise a nucleic acid affinity tag. A nucleoside may be a base-sugar combination. The base portion of the nucleoside may be a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines. A nucleotide may be a nucleoside that further includes a phosphate group covalently linked to the sugar portion of the nucleoside. For nucleosides that include a pentofuranosyl sugar, the phosphate group may be linked to the 2', 3', or 5' hydroxyl moiety of the sugar. In forming nucleic acids, phosphate groups may covalently link adjacent nucleosides to one another to form a linear polymeric compound. In turn, the respective ends of this linear polymeric compound may be further joined to form a circular compound; however, linear compounds are generally suitable. In addition, linear compounds may have internal nucleotide base complementarity and may therefore fold in a manner that produces a fully or partially double-stranded compound. Within nucleic acids, the phosphate groups may commonly be referred to as forming the internucleoside backbone of the nucleic acid. The linkage or backbone may be a 3' to 5' phosphodiester linkage.
The nucleic acid may include a modified backbone and/or modified internucleoside linkages. Modified backbones may include those that retain phosphorus atoms in the backbone and those that do not have phosphorus atoms in the backbone. Suitable modified nucleic acid backbones in which phosphorus atoms are present may include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkylphosphonates such as 3' -alkylphosphonate, 5' -alkylphosphonate, chiral phosphonate, phosphonite, phosphoramidate (including 3' -phosphoramidate and phosphoramidate, phosphodiamidate, phosphorothioate), phosphorothioate, phosphoroselenate and borophosphate, analogs with normal 3' -5' linkages, 2' -5' linkages, and analogs with reversed polarity (where one or more internucleotide linkages are 3' to 3', 5' to 5' or 2' to 2' linkages).
The nucleic acid may comprise a polynucleotide backbone formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatoms, and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatom or heterocyclic internucleoside linkages. These may include those having morpholino linkages (formed in part from the sugar portion of the nucleoside); a siloxane backbone; sulfide, sulfoxide, and sulfone backbones; methylacetyl and thiomethylacetyl backbones; methylene methylacetyl and thiomethylacetyl backbones; a ribose acetyl backbone; an olefin-containing backbone; a sulfamate backbone; methylene imino and methylene hydrazino backbones; sulfonate and sulfonamide backbones; an amide backbone; and N, O, S and CH with mixing 2 Other ones of the component parts.
The nucleic acid may comprise a nucleic acid mimetic. The term "mimetic" may be intended to include polynucleotides in which only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups, and the replacement of only the furanose ring may also be referred to as a sugar substitute. The heterocyclic base moiety or modified heterocyclic base moiety can be maintained to hybridize to an appropriate target nucleic acid. One such nucleic acid may be a Peptide Nucleic Acid (PNA). In PNA, the sugar backbone of the polynucleotide may be replaced by an amide containing backbone, in particular by an aminoethylglycine backbone. The nucleotide may be retained and bound directly or indirectly to the nitrogen heteroatom of the amide portion of the backbone. The backbone in the PNA compound may comprise two or more linked aminoethylglycine units, which results in PNA having an amide containing backbone. The heterocyclic base moiety may be directly or indirectly bound to the aza nitrogen atom of the amide moiety of the backbone.
The nucleic acid may include a morpholino backbone structure. For example, the nucleic acid may comprise a 6-membered morpholino ring in place of the ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage replaces the phosphodiester linkage.
The nucleic acid can include linked morpholino units (e.g., a morpholino nucleic acid) in which a heterocyclic base is attached to the morpholino ring. Linking groups join the morpholino monomer units of the morpholino nucleic acid, and various compounds within the morpholino class may be joined using different linking groups. Morpholino-based polynucleotides may be nonionic mimics of nucleic acids, and nonionic morpholino-based oligomeric compounds can have fewer undesired interactions with cellular proteins.
A further class of polynucleotide mimics is cyclohexenyl nucleic acids (CeNA), in which the furanose ring normally present in a nucleic acid molecule is replaced by a cyclohexenyl ring. CeNA DMT-protected phosphoramidite monomers can be prepared and used for oligomeric compound synthesis by phosphoramidite chemistry. Incorporating CeNA monomers into a nucleic acid strand can increase the stability of DNA/RNA hybrids, and CeNA oligoadenylates can form complexes with complementary nucleic acids with stability similar to that of the native complexes.
Additional modifications include locked nucleic acids (LNA), in which the 2'-hydroxyl group is linked to the 4' carbon atom of the sugar ring, forming a 2'-C,4'-C-oxymethylene linkage and thereby a bicyclic sugar moiety. The linkage may be a methylene (-CH2-)n group bridging the 2' oxygen atom and the 4' carbon atom, wherein n is 1 or 2. LNA and LNA analogs can exhibit very high duplex thermal stability with complementary nucleic acids (ΔTm = +3 to +10 ℃), stability toward 3'-exonucleolytic degradation, and good solubility.
Nucleic acids may also include nucleobase (often simply referred to as "base") modifications or substitutions. As used herein, "unmodified" or "natural" nucleobases include the purine bases adenine (A) and guanine (G) and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified nucleobases may include other synthetic and natural nucleobases, such as 5-methylcytosine (5-me-C), 5-hydroxymethylcytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (-C≡C-CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thio, 8-thioalkyl, 8-hydroxy and other 8-substituted adenines and guanines, 5-halo (particularly 5-bromo), 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-aminoadenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 3-deazaadenine. Modified nucleobases may also include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido[5,4-b][1,4]benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido[5,4-b][1,4]benzothiazin-2(3H)-one), G-clamps such as substituted phenoxazine cytidine (e.g., 9-(2-aminoethoxy)-H-pyrimido[5,4-b][1,4]benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido[4,5-b]indol-2-one) and pyridoindole cytidine (H-pyrido[3',2':4,5]pyrrolo[2,3-d]pyrimidin-2-one).
As used herein, the term "target" may refer to a nucleic acid of interest (e.g., target dsDNA). In some embodiments, the target may be associated with an adapter and/or a barcode. Exemplary suitable targets for analysis by the disclosed methods, devices, and systems include oligonucleotides, DNA, RNA, mRNA, microRNAs, tRNAs, and the like. The target may be single-stranded or double-stranded. In some embodiments, the target may be a protein, peptide, or polypeptide. In some embodiments, the target is a lipid. As used herein, "target" may be used interchangeably with "species".
As used herein, the term "reverse transcriptase" may refer to a group of enzymes having reverse transcriptase activity (i.e., catalyzing the synthesis of DNA from an RNA template). Such enzymes include, but are not limited to, retroviral reverse transcriptases, retrotransposon reverse transcriptases, retroplasmid reverse transcriptases, retron reverse transcriptases, bacterial reverse transcriptases, group II intron-derived reverse transcriptases, and mutants, variants or derivatives thereof. Non-retroviral reverse transcriptases include non-LTR retrotransposon reverse transcriptases, retroplasmid reverse transcriptases, retron reverse transcriptases and group II intron reverse transcriptases. Examples of group II intron reverse transcriptases include the Lactococcus lactis Ll.LtrB intron reverse transcriptase, the Thermosynechococcus elongatus TeI4c intron reverse transcriptase and the Geobacillus stearothermophilus GsI-IIC intron reverse transcriptase. Other classes of reverse transcriptases may include many types of non-retroviral reverse transcriptases (in particular, retrons, group II introns and diversity-generating retroelements).
As used herein, the term "isolated nucleic acid" may refer to nucleic acid that has been purified away from one or more cellular components. Those skilled in the art will appreciate that a sample processed to "isolate nucleic acids" may still contain components and impurities other than nucleic acids. Samples comprising isolated nucleic acid may be prepared using any acceptable method known in the art. For example, cells may be lysed using known lysing agents, and the nucleic acids may be purified or partially purified from other cellular components. Suitable reagents and protocols for DNA and RNA extraction can be found, for example, in U.S. patent application publication nos. US2010-0009351 and US 2009-013650, respectively (each of which is incorporated herein by reference in its entirety).
As used herein, a "template" may refer to all or a portion of a polynucleotide comprising at least one target nucleotide sequence.
As used herein, a "primer" may refer to a polynucleotide that may be used to initiate a nucleic acid chain extension reaction. The length of the primer may vary, for example, from about 5 to about 100 nucleotides, from about 10 to about 50 nucleotides, from about 15 to about 40 nucleotides, or from about 20 to about 30 nucleotides. The length of the primer may be about 10 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 50 nucleotides, about 75 nucleotides, about 100 nucleotides, or a range between any two of these values. In some embodiments, the primer has a length of 10 to about 50 nucleotides, i.e., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides. In some embodiments, the primer has a length of 18 to 32 nucleotides.
As used herein, a "probe" may refer to a polynucleotide capable of hybridizing (e.g., specifically) to a target sequence in a nucleic acid under conditions that allow hybridization, thereby allowing detection of the target sequence or amplified nucleic acid. The "target" of a probe generally refers to a sequence within an amplified nucleic acid sequence, or a subset of amplified nucleic acid sequences, that specifically hybridizes to at least a portion of the probe oligomer by standard hydrogen bonding (i.e., base pairing). Probes may comprise target-specific sequences and other sequences that contribute to the three-dimensional conformation of the probe. Sequences are "substantially complementary" if they allow stable hybridization of the probe oligomer, under appropriate hybridization conditions, to a target sequence that is not fully complementary to the probe's target-specific sequence. The length of the probe may vary, for example, from about 5 to about 100 nucleotides, from about 10 to about 50 nucleotides, from about 15 to about 40 nucleotides, or from about 20 to about 30 nucleotides. The length of the probe may be about 10 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 50 nucleotides, about 100 nucleotides, or a range between any two of these values. In some embodiments, the probe has a length of 10 to about 50 nucleotides. For example, the primer and/or probe may be at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides. In some embodiments, the probe may be non-sequence specific.
Preferably, the primers and/or probes may be between 8 and 45 nucleotides in length. For example, the primer and/or probe may be at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 or more nucleotides in length. Primers and probes may be modified to contain additional nucleotides at the 5 'end or the 3' end or both. Those skilled in the art will appreciate that the additional bases at the 3' end of the amplification primer (not necessarily the probe) are typically complementary to the template sequence. Primer and probe sequences may also be modified to remove nucleotides at the 5 'end or the 3' end. Those skilled in the art will appreciate that in order to function for amplification, the primer or probe will have a minimum length and annealing temperature as disclosed herein.
Primers and probes can bind to their targets at temperatures below their melting temperature (Tm). As used herein, "Tm" and "melting temperature" are interchangeable terms referring to the temperature at which 50% of a population of double-stranded polynucleotide molecules has dissociated into single strands. Methods for calculating the Tm of a polynucleotide are well known in the art. For example, Tm may be calculated by the equation Tm = 69.3 + 0.41 × (G+C)% - 650/L, where L is the length of the probe in nucleotides. The Tm of hybridized polynucleotides can also be estimated using a formula adopted from hybridization assays in 1 M salt, which is commonly used to calculate the Tm of PCR primers: Tm = [(number of A+T) × 2 ℃ + (number of G+C) × 4 ℃]. See, e.g., C. R. Newton et al., PCR, 2nd edition, Springer-Verlag (New York: 1997), page 24 (incorporated herein by reference in its entirety). Other, more complex calculations in the art take structural and sequence features into account when calculating Tm. The melting temperature of an oligonucleotide may depend on the complementarity between the oligonucleotide primer or probe and its binding sequence, as well as on salt conditions.
In some embodiments, the oligonucleotide primers or probes provided herein have a Tm of less than about 90 ℃ in 50 mM KCl, 10 mM Tris-HCl buffer, for example, about 89 ℃, 88 ℃, 87 ℃, 86 ℃, 85 ℃, 84 ℃, 83 ℃, 82 ℃, 81 ℃, 80 ℃, 79 ℃, 78 ℃, 77 ℃, 76 ℃, 75 ℃, 74 ℃, 73 ℃, 72 ℃, 71 ℃, 70 ℃, 69 ℃, 68 ℃, 67 ℃, 66 ℃, 65 ℃, 64 ℃, 63 ℃, 62 ℃, 61 ℃, 60 ℃, 59 ℃, 58 ℃, 57 ℃, 56 ℃, 55 ℃, 54 ℃, 53 ℃, 52 ℃, 51 ℃, 50 ℃, 49 ℃, 48 ℃, 47 ℃, 46 ℃, 45 ℃, 44 ℃, 43 ℃, 42 ℃, 41 ℃, 40 ℃, 39 ℃ or less, including ranges between any two of the listed values.
In some embodiments, the primers disclosed herein, e.g., amplification primers, can be provided as an amplification primer pair comprising a forward primer and a reverse primer (a first amplification primer and a second amplification primer). Preferably, the forward and reverse primers have Tm values that differ by no more than 10 ℃, e.g., by less than 10 ℃, less than 9 ℃, less than 8 ℃, less than 7 ℃, less than 6 ℃, less than 5 ℃, less than 4 ℃, less than 3 ℃, less than 2 ℃, or less than 1 ℃.
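The two Tm estimates described above, together with the forward/reverse Tm-matching criterion, can be sketched in Python. This is a minimal illustration rather than a primer-design tool: the function names are hypothetical, the G+C/length formula is taken as Tm = 69.3 + 0.41 × (G+C)% - 650/L, and neither estimate accounts for salt concentration, mismatches or nearest-neighbor effects.

```python
def gc_percent(seq: str) -> float:
    """G+C content of a DNA sequence, as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def tm_gc_length(seq: str) -> float:
    """Tm (degC) = 69.3 + 0.41 * (G+C)% - 650 / L, with L the length in nucleotides."""
    return 69.3 + 0.41 * gc_percent(seq) - 650.0 / len(seq)


def tm_wallace(seq: str) -> float:
    """Tm (degC) = 2 degC per A or T + 4 degC per G or C (short-primer rule)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def tm_matched(forward: str, reverse: str, max_delta_c: float = 10.0, tm=tm_wallace) -> bool:
    """True if the two primers' estimated Tm values differ by no more than max_delta_c."""
    return abs(tm(forward) - tm(reverse)) <= max_delta_c


if __name__ == "__main__":
    fwd = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer, 50% G+C
    rev = "AGCTAGCTAGCTAGCTAGCT"  # hypothetical 20-mer, 50% G+C
    print(round(tm_gc_length(fwd), 1))  # 57.3
    print(tm_wallace(fwd))              # 60.0
    print(tm_matched(fwd, rev))         # True
```

Passing `tm=tm_gc_length` to `tm_matched` applies the same 10 ℃ criterion with the length-corrected formula instead of the 2/4 ℃ rule.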
The primer and probe sequences can be modified by nucleotide substitutions (relative to the target sequence) within the oligonucleotide sequence, provided that the oligonucleotide retains sufficient complementarity to hybridize specifically to the target nucleic acid sequence. In this way, at least 1, 2, 3, 4 or up to about 5 nucleotides may be substituted. As used herein, the term "complementary" may refer to sequence complementarity between regions of two polynucleotide strands or between two regions of the same polynucleotide strand. A first region of a polynucleotide and a second region of the same or a different polynucleotide are complementary if, when the two regions are aligned in an antiparallel manner, at least one nucleotide of the first region is capable of base pairing with a base of the second region. Thus, two complementary polynucleotides need not base pair at every nucleotide position. "Fully complementary" may refer to a first polynucleotide that is 100% ("fully") complementary to a second polynucleotide and thus forms base pairs at every nucleotide position. "Partially complementary" may refer to a first polynucleotide that is not 100% complementary (e.g., 90%, 80% or 70% complementary) and contains mismatched nucleotides at one or more nucleotide positions. In some embodiments, the oligonucleotide comprises a universal base.
As used herein, the term "substantially complementary" may refer to a contiguous sequence of nucleic acid bases that is capable of hybridizing to another base sequence through hydrogen bonding between a series of complementary bases. The complementary base sequence may be complementary at every position of the oligomer sequence by standard base pairing (e.g., G:C, A:T or A:U), or it may contain one or more non-complementary residues (including abasic positions) while the complementary base sequence as a whole remains capable of specifically hybridizing to the other base sequence under appropriate hybridization conditions. The contiguous bases may be at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99%, or 100% complementary to the sequence to which the oligomer is intended to hybridize. A substantially complementary sequence may refer to a sequence having a percent identity of 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 75, 70 or less, or any number in between, compared with a reference sequence. One skilled in the art can readily select appropriate hybridization conditions, which can be predicted from the base sequence composition or determined by routine testing (see, e.g., Green and Sambrook, Molecular Cloning: A Laboratory Manual, 4th edition (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2012)).
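The complementarity definitions above can be illustrated with a short sketch that aligns two strands antiparallel and counts base-paired positions. It assumes both strands are written 5' to 3' and have equal length, and it counts only standard pairs (G:C, A:T, A:U); universal bases, abasic positions, wobble pairs and gapped alignments are not handled. The function names are hypothetical, and the 80% default is simply one of the example thresholds given in the text.

```python
# Standard base pairs recognized by this sketch (A:U covers RNA strands).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementary(first: str, second: str) -> float:
    """Percent of positions that base-pair when the strands are aligned antiparallel.

    Both strands are given 5'->3' and must have the same length; reading `second`
    backwards lines its 3' end up against the 5' end of `first`.
    """
    first, second = first.upper(), second.upper()
    if not first or len(first) != len(second):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum((a, b) in PAIRS for a, b in zip(first, reversed(second)))
    return 100.0 * paired / len(first)


def is_fully_complementary(first: str, second: str) -> bool:
    """Base pairs form at every nucleotide position."""
    return percent_complementary(first, second) == 100.0


def is_substantially_complementary(first: str, second: str, min_percent: float = 80.0) -> bool:
    """At least `min_percent` of positions base-pair."""
    return percent_complementary(first, second) >= min_percent


if __name__ == "__main__":
    probe = "ACCGTTAG"
    print(percent_complementary(probe, "CTAACGGT"))  # 100.0
    print(percent_complementary(probe, "CTAACGGA"))  # 87.5 (one mismatch)
    print(percent_complementary(probe, "CUAACGGU"))  # 100.0 (RNA complement)
```

In this sketch an 87.5% match is partially but not fully complementary, and it passes the 80% "substantially complementary" threshold.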
As used herein, the term "multiplex PCR" refers to a type of PCR in which more than one set of primers are contained in a reaction, allowing for amplification of a single target or two or more different targets in a single reaction vessel (e.g., tube). Multiplex PCR can be, for example, real-time PCR.
The disclosure herein includes methods, compositions, kits, and systems that enable rapid targeted sequencing (and thus rapid sequencing-based diagnostics, e.g., in less than 2 hours) and theranostics (therapeutic diagnostics), in which diagnosis and the selection of a suitable therapy are carried out together. In some embodiments, applications of the rapid targeted sequencing method include rapid pathogen diagnosis, rapid cancer diagnosis, and rare disease diagnosis (e.g., for cystic fibrosis).
The disclosure herein includes methods that use genome editing tools (e.g., Cas proteins, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and Argonaute proteins) to direct enzymes (e.g., transposases) to cleave nucleic acids at user-defined loci, thereby preparing custom locus-specific libraries for DNA and RNA sequencing. The enzymes (e.g., transposases) can add adaptors at these sites for sequencing, for example by next- or third-generation sequencing technologies (including, but not limited to, Illumina, PacBio, Roche, Thermo Fisher, and Oxford Nanopore sequencing technologies).
Conventional library preparation methods for nucleic acid sequencing can take several hours, and they produce randomly generated libraries (fig. 1). These libraries are random because the fragmentation methods used (physical, enzymatic and chemical) break nucleic acids at random positions, so which sequences appear in the sequencing output cannot be controlled.
When specific loci in the genome are of interest, millions of bases must be sequenced in the hope that enough sequence information will be obtained at those loci. After all of these data are obtained, bioinformatics methods must be used to extract the information about the loci of interest. This process can be bioinformatically and computationally intensive, because most of the DNA that has been prepared and sequenced is unrelated to the regions of interest. Furthermore, because library preparation is random, there is a risk that coverage of these regions will be insufficient. In that case, another library must be prepared and sequenced to obtain adequate coverage of these regions, which wastes time and resources.
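The sequencing burden described above can be made concrete with simple arithmetic. The sketch assumes uniformly random fragmentation (every base of the genome equally likely to be sequenced) and an idealized targeted library that yields only on-target reads; real libraries deviate from both assumptions. The function names are hypothetical, and the genome and panel sizes in the usage lines are illustrative only.

```python
def bases_required(region_bp: float, depth: float) -> float:
    """Sequenced bases needed to cover `region_bp` at a mean depth of `depth`."""
    return region_bp * depth


def random_vs_targeted(genome_bp: float, target_bp: float, depth: float):
    """Return (random-library bases, targeted-library bases, on-target fraction).

    Random library: reads are assumed uniform over the genome, so the whole genome
    must be sequenced to `depth` to reach that depth on the target.
    Targeted library: idealized as producing on-target reads only.
    """
    if not 0 < target_bp <= genome_bp:
        raise ValueError("target must be a non-empty part of the genome")
    random_bases = bases_required(genome_bp, depth)
    targeted_bases = bases_required(target_bp, depth)
    return random_bases, targeted_bases, target_bp / genome_bp


if __name__ == "__main__":
    # Illustrative: a 50 kb panel in a genome of about 3 Gb, at 30x mean depth.
    rnd, tgt, frac = random_vs_targeted(3e9, 50_000, 30)
    print(f"{rnd:.1e} vs {tgt:.1e} bases; {frac:.4%} of random-library bases on target")
```

Under these assumptions the random library needs about 9 × 10^10 sequenced bases to deliver what an ideal targeted library delivers with about 1.5 × 10^6, because only about 0.0017% of its bases fall on the panel.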
The rapid targeted library preparation method disclosed herein, the custom locus-specific library preparation (CLLP) method, takes only a few minutes to prepare a library, rather than the several hours required by conventional library preparation methods. In addition, libraries made by the CLLP methods disclosed herein are not random. In some embodiments, only the selected loci are sequenced and the rest of the genome is largely excluded, which saves cost, time and resources and improves accuracy. Furthermore, because only the regions of interest are sequenced, the bioinformatics resources and analysis required are minimal compared to standard methods. The CLLP methods disclosed herein enable DNA sequencing to be used as a rapid and affordable method for diagnostics and/or theranostics.
In some embodiments of CLLP, genome editing tools and transposases (e.g., hyperactive transposases) are used to achieve targeted fragmentation. Any genome editing tool that can make a user-defined double-strand break in DNA or single-strand break in RNA can be used, including, but not limited to, Cas proteins, ZFNs, TALENs, Argonaute proteins, or any combination thereof. In some embodiments, the genome editing tool is used to control and direct fragmentation of nucleic acids to specific, precisely selectable regions of the genome. The cleavage site of the genome editing tool can serve as the start site for a sequencing adapter, which strongly biases sequencing toward the selected genomic regions. The programmable fragmentation process disclosed herein can therefore result in targeted sequencing. The method may be used with any sequencing technology, including but not limited to Illumina, PacBio, Oxford Nanopore, Roche and Thermo Fisher sequencing technologies. Figs. 5A-5F depict non-limiting exemplary embodiments of the custom locus-specific library preparation (CLLP) disclosed herein.
In some embodiments, library preparation for DNA sequencing uses enzymatic fragmentation with a hyperactive transposase. In enzymatic fragmentation, the transposase cleaves double-stranded DNA and attaches DNA adaptors at the cleavage sites (fig. 2). This is a very rapid process that prepares the library in a relatively short time compared to standard library preparation methods. However, the transposase cleaves the genome in a random, unbiased manner.
In some embodiments, to speed up targeted library preparation, the methods disclosed herein use a transposase linked to a genome editing tool. For example, the dCas9 protein can be used as the genome editing tool. dCas9 is a Cas9 protein that has been mutated so that it loses its nuclease activity but retains its target specificity, and it can bind guide RNAs programmed to specific regions of the genome. After dCas9 binds its target, the transposase attached to the dCas9 protein cleaves the DNA and attaches sequencing adapters at the cleavage site. The end result is a targeted DNA fragment ready for sequencing, shortening the targeted library preparation process to a few minutes instead of a few hours (fig. 3).
Non-limiting advantages that can be achieved by the methods, compositions, kits, and systems of the present disclosure include: shorter time to results; use of fewer laboratory, bioinformatics and computational resources than existing methods; rapid detection and quantification of rare and low-frequency variants; analysis of more samples than whole-genome sequencing allows; use as a rapid diagnostic tool that can detect a customizable number of targets simultaneously; simpler and clearer data analysis; and any combination thereof.
Two methods are currently used for targeted sequencing. The first is amplicon sequencing, which relies on primers to amplify the region of interest by DNA amplification; this additional amplification step adds cost, time and resources to the standard library preparation workflow. The second is target capture, which relies on probes or probe pools that hybridize to specific nucleic acid targets. Hybridizing the probes to their targets and isolating those targets is time-consuming and may take days. In addition, the probes used in this method are expensive to synthesize.
In some embodiments, a hyperactive Tn5 transposase linked to a dCas9 protein can be used. dCas9 is a catalytically dead form of Cas9 that has been mutated so that it loses nuclease activity but retains programmable DNA-binding activity. The N-terminus of the dCas9 protein is attached to the C-terminus of the Tn5 transposase via a linker (e.g., XTEN), a SNAP-tag or a CLIP-tag, although many other methods may be used to attach the two proteins. The Tn5 transposase is loaded with sequencing adaptors specific to the sequencing platform, and the dCas9 protein is loaded with single guide RNAs (sgRNAs) specific for user-defined loci. To select more than one locus, more than one sgRNA, each targeting a different locus, can be loaded separately onto dCas9 proteins.
After the sgRNA-loaded dCas9 finds a DNA sequence complementary to the sgRNA, the attached Tn5 transposase can cleave the DNA at the designated site and attach sequencing adapters at the cleavage site. The end result is a targeted DNA fragment ready for sequencing, shortening the targeted library preparation process to a few minutes instead of a few hours (fig. 3).
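To illustrate how the choice of sgRNAs determines where a dCas9-guided transposome can act, the sketch below scans a sequence for protospacer matches on either strand and lists the intervals between consecutive sites. It assumes an SpCas9-derived dCas9 with a 5'-NGG-3' PAM and requires exact spacer matches; the text does not specify the Cas9 ortholog. Each insertion is modeled at the start of the matched protospacer, which is a simplification: where Tn5 actually inserts relative to the bound dCas9 depends on linker geometry and is not stated in the text, so the fragment boundaries are hypothetical. All names are illustrative.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(target: str, spacer: str) -> list:
    """0-based (+ strand) start coordinates of exact protospacer matches with an NGG PAM.

    + strand: 5'-spacer-NGG-3'. - strand: seen on the + strand as CCN followed by the
    reverse complement of the spacer; the reported coordinate is where that starts.
    Lookaheads allow overlapping matches.
    """
    target = target.upper()
    fwd = re.escape(spacer.upper())
    rev = re.escape(revcomp(spacer))
    hits = [m.start() for m in re.finditer(f"(?={fwd}[ACGT]GG)", target)]
    hits += [m.start(1) for m in re.finditer(f"(?=CC[ACGT]({rev}))", target)]
    return sorted(hits)


def predicted_fragments(target: str, spacers: list) -> list:
    """(start, end) intervals between consecutive sites, assuming insertion at each site."""
    sites = sorted({pos for s in spacers for pos in find_sites(target, s)})
    return list(zip(sites, sites[1:]))


if __name__ == "__main__":
    s1 = "GATTACAGATTACAGATTAC"  # hypothetical 20-nt spacers
    s2 = "TTGACCATGGCAAGTCTAGC"
    toy = "TTTT" + s1 + "TGG" + "A" * 30 + s2 + "AGG" + "TTTT"
    print(predicted_fragments(toy, [s1, s2]))
```

For a real design, off-target matches with mismatches would also need to be considered; this exact-match scan only shows how the programmed binding sites bound the fragments.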
In some embodiments, the hyperactive transposase used instead of a Tn5 transposase may be a mariner/Tc1-like transposase, a Himar1C9 transposase, a Sleeping Beauty transposase, a Tn7 transposase, or a combination thereof. In some embodiments, alternatives to the dCas9 protein can be used for programmable DNA-binding activity. For example, zinc finger proteins not fused to the FokI nuclease can be used. Similarly, TALEN molecules without the FokI nuclease can be used. In some embodiments, a recombinase combined with sequence-specific primers can serve as the programmable DNA-binding molecule. In some embodiments, locus-specific libraries can alternatively be prepared using only genome editing tools (e.g., Cas proteins, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), Argonaute proteins) without a transposase. This results in a programmable nucleic acid fragmentation method (fig. 4) that can be further used to prepare locus-specific sequencing libraries.
Disclosed herein is the use of genome editing tools as programmable tools to target specific regions of the genome, together with transposases to cut the DNA and paste in the adaptors required to create a sequencing library.
Some embodiments provide a disease panel (e.g., a sepsis panel) configured to identify a pathogen or disease cause (e.g., a genetic mutation) and simultaneously determine antibiotic susceptibility. In some embodiments, a cancer panel may include identification of more than one mutation in cancer cells. In some embodiments, a rare disease panel may include sequencing of specific loci associated with mutations that can cause a genetic disease (e.g., cystic fibrosis).
Each of the following patent application publications is hereby incorporated by reference in its entirety: WO2016028843A2, WO2018175872A1, US20190144920A1 and CA3026206A1.
The disclosure herein includes compositions. In some embodiments, the composition comprises more than one protein complex. In some embodiments, each protein complex comprises a transposome and a programmable DNA-binding unit capable of specifically binding to a binding site on a target dsDNA. In some embodiments, the transposome comprises a transposase, a first adaptor and a second adaptor. In some embodiments, the binding sites of the protein complexes are different from one another.
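The composition described above can be modeled as a small data structure. The class and field names below are hypothetical, the adaptor and binding-site strings are placeholders, and the only rule enforced is the one stated in the text: there is more than one protein complex and each binds a different site. Several complexes may share one `Transposome` object, matching embodiments in which complexes comprise the same transposome.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Transposome:
    """A transposase loaded with a first and a second adaptor (placeholder sequences)."""
    transposase: str
    first_adaptor: str
    second_adaptor: str


@dataclass(frozen=True)
class ProteinComplex:
    """A transposome joined to a programmable DNA-binding unit.

    `binding_site` stands for the sequence the binding unit is programmed to
    recognize (for example, the target of a guide RNA); here it is just a string.
    """
    transposome: Transposome
    binding_site: str


def validate_composition(complexes: Sequence[ProteinComplex]) -> None:
    """Raise ValueError unless there is more than one complex and all binding sites differ."""
    if len(complexes) < 2:
        raise ValueError("the composition needs more than one protein complex")
    sites = [c.binding_site.upper() for c in complexes]
    if len(set(sites)) != len(sites):
        raise ValueError("each protein complex must bind a different site")


if __name__ == "__main__":
    tn5 = Transposome("Tn5", "ADAPTOR_1", "ADAPTOR_2")  # hypothetical placeholders
    panel = [ProteinComplex(tn5, "GATTACAGATTACAGATTAC"),
             ProteinComplex(tn5, "TTGACCATGGCAAGTCTAGC")]
    validate_composition(panel)  # passes: two complexes, distinct sites
```

Freezing the dataclasses keeps a complex's binding site fixed once defined, which suits a panel that is designed once and reused.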
The disclosure herein includes reaction mixtures. In some embodiments, the reaction mixture comprises a composition disclosed herein and sample nucleic acids suspected of comprising one or more target dsDNAs. The reaction mixture may comprise a DNA polymerase, dNTPs, or a combination thereof. The adaptors may be covalently attached to the target dsDNA or to fragments thereof. The reaction mixture may comprise more than one dsDNA fragment, each fragment carrying the first adaptor and the second adaptor of one of the protein complexes at its two ends, respectively.
The disclosure herein includes methods for tagging nucleic acids. In some embodiments, the method comprises: contacting a composition disclosed herein with a sample suspected of containing more than one target dsDNA to form a reaction mixture; and incubating the reaction mixture to generate more than one dsDNA fragment, each fragment carrying the first adaptor and the second adaptor of one of the protein complexes at its two ends, respectively.
The disclosure herein includes methods for generating sequencing libraries. In some embodiments, the method comprises contacting a composition disclosed herein with a sample suspected of containing more than one target dsDNA to form a reaction mixture. The method may include incubating the reaction mixture to generate more than one dsDNA fragment, each fragment carrying the first adaptor and the second adaptor of one of the protein complexes at its two ends, respectively. The contacting of the target dsDNAs with the protein complexes may be performed at about 25 ℃ to about 85 ℃ (e.g., about 25 ℃, 26 ℃, 27 ℃, 28 ℃, 29 ℃, 30 ℃, 31 ℃, 32 ℃, 33 ℃, 34 ℃, 35 ℃, 36 ℃, 37 ℃, 38 ℃, 39 ℃, 40 ℃, 41 ℃, 42 ℃, 45 ℃, 50 ℃, 55 ℃, 60 ℃, 65 ℃, 70 ℃, 75 ℃, 80 ℃, 85 ℃, or a number or range between any two of these values). Incubating the reaction mixture may include incubating the reaction mixture at about 37 ℃ to about 55 ℃ (e.g., about 37 ℃, 38 ℃, 39 ℃, 40 ℃, 41 ℃, 42 ℃, 43 ℃, 44 ℃, 45 ℃, 46 ℃, 47 ℃, 48 ℃, 49 ℃, 50 ℃, 51 ℃, 52 ℃, 53 ℃, 54 ℃, 55 ℃, or a value or range between any two of these values).
The protein complexes and the target dsDNAs may be present in the reaction mixture at a molecular ratio of about 2:1 to about 2000:1 (e.g., 2:1, 2.5:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 11:1, 12:1, 13:1, 14:1, 15:1, 16:1, 17:1, 18:1, 19:1, 20:1, 21:1, 22:1, 23:1, 24:1, 25:1, 26:1, 27:1, 28:1, 29:1, 30:1, 31:1, 32:1, 33:1, 34:1, 35:1, 36:1, 37:1, 38:1, 39:1, 40:1, 41:1, 42:1, 43:1, 44:1, 45:1, 46:1, 47:1, 48:1, 49:1, 50:1, 51:1, 52:1, 53:1, 54:1, 55:1, 56:1, 57:1, 58:1, 59:1, 60:1, 61:1, 62:1, 63:1, 64:1, 65:1, 66:1, 67:1, 68:1, 69:1, 70:1, 71:1, 72:1, 73:1, 74:1, 75:1, 76:1, 77:1, 78:1, 79:1, 80:1, 81:1, 82:1, 83:1, 84:1, 85:1, 86:1, 87:1, 88:1, 89:1, 90:1, 91:1, 92:1, 93:1, 94:1, 95:1, 96:1, 97:1, 98:1, 99:1, 100:1, 200:1, 300:1, 400:1, 500:1, 600:1, 700:1, 800:1, 900:1, 1000:1, 2000:1, or a number or range between any two of these values). In some embodiments, the protein complexes and the target dsDNAs are present in the reaction mixture at a molecular ratio of about 2:1 to about 200:1 (e.g., 2:1, 2.5:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 11:1, 12:1, 13:1, 14:1, 15:1, 16:1, 17:1, 18:1, 19:1, 20:1, 21:1, 22:1, 23:1, 24:1, 25:1, 26:1, 27:1, 28:1, 29:1, 30:1, 31:1, 32:1, 33:1, 34:1, 35:1, 36:1, 37:1, 38:1, 39:1, 40:1, 41:1, 42:1, 43:1, 44:1, 45:1, 46:1, 47:1, 48:1, 49:1, 50:1, 51:1, 52:1, 53:1, 54:1, 55:1, 56:1, 57:1, 58:1, 59:1, 60:1, 61:1, 62:1, 63:1, 64:1, 65:1, 66:1, 67:1, 68:1, 69:1, 70:1, 71:1, 72:1, 73:1, 74:1, 75:1, 76:1, 77:1, 78:1, 79:1, 80:1, 81:1, 82:1, 83:1, 84:1, 85:1, 86:1, 87:1, 88:1, 89:1, 90:1, 91:1, 92:1, 93:1, 94:1, 95:1, 96:1, 97:1, 98:1, 99:1, 100:1, 200:1, or a number or range between any two of these values).
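Setting up a given complex-to-target ratio requires converting a DNA mass into an amount of molecules. The sketch below assumes an average mass of 650 g/mol per base pair of dsDNA (a common approximation; 660 g/mol is also used), assumes all target molecules have the same length, and interprets the ratio as protein complexes per target dsDNA molecule. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
BP_MASS = 650.0           # assumed average g/mol per base pair of dsDNA


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Picomoles of dsDNA molecules in `mass_ng` ng of DNA that is `length_bp` long."""
    grams = mass_ng * 1e-9
    return grams / (length_bp * BP_MASS) * 1e12


def complexes_needed_pmol(mass_ng: float, length_bp: int, ratio: float) -> float:
    """Picomoles of protein complex for a complex:target molecular ratio of `ratio`:1."""
    if not 2 <= ratio <= 2000:
        raise ValueError("ratio outside the 2:1 to 2000:1 range")
    return dsdna_pmol(mass_ng, length_bp) * ratio


def pmol_to_molecules(pmol: float) -> float:
    """Convert picomoles to a number of molecules."""
    return pmol * 1e-12 * AVOGADRO


if __name__ == "__main__":
    # Illustrative: 650 ng of a 1,000 bp target at a 10:1 ratio.
    print(dsdna_pmol(650, 1000), complexes_needed_pmol(650, 1000, 10))
```

With these assumptions, 650 ng of a 1,000 bp target is about 1 pmol of molecules, so a 10:1 ratio calls for about 10 pmol of protein complexes.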
The binding sites of at least two of the protein complexes may be on the same target dsDNA. The binding sites of at least two of the protein complexes may be about 1-50000 nucleotides apart on the same target dsDNA. In some embodiments, the binding sites of at least two of the protein complexes may be, or may be about, the following number of nucleotides apart on the same target dsDNA: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, or a number or range between any two of these values. In some embodiments, the binding sites of at least two of the protein complexes may be at least, or at most, the following number of nucleotides apart on the same target dsDNA: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, or 100000. The distance between the binding sites of one pair of protein complexes may be substantially the same as the distance between the binding sites of another pair of protein complexes.
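The spacing constraints above can be checked for a proposed set of binding sites on one target dsDNA. The sketch assumes sites are given as integer coordinates on the same target, interprets "pairs" of complexes as neighbouring sites, and uses the 1-50000 nt range from the text as its default; the `tolerance` used to decide whether two distances are "substantially the same" is an assumption, as are the function names.

```python
def adjacent_spacings(positions):
    """Distances (nt) between neighbouring binding sites on one target dsDNA."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def spacings_in_range(positions, lo=1, hi=50_000):
    """True if every neighbouring pair of sites is lo..hi nucleotides apart."""
    return all(lo <= d <= hi for d in adjacent_spacings(positions))


def evenly_spaced(positions, tolerance=0):
    """True if all neighbouring distances agree to within `tolerance` nucleotides."""
    spacings = adjacent_spacings(positions)
    if not spacings:
        raise ValueError("need at least two binding sites")
    return max(spacings) - min(spacings) <= tolerance


if __name__ == "__main__":
    sites = [1100, 100, 600]  # hypothetical coordinates on one target
    print(adjacent_spacings(sites), spacings_in_range(sites), evenly_spaced(sites))
```

Duplicate coordinates produce a spacing of 0 and therefore fail the 1 nt lower bound, which flags two complexes programmed to the same site.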
The distance between the binding sites of one pair of protein complexes may be different from the distance between the binding sites of another pair of protein complexes. The binding sites of at least two of the protein complexes may be located on different strands of the target dsDNA. At least two of the protein complexes may be capable of specifically binding to different target dsDNAs. The protein complexes may be capable of specifically binding between about 2 and 5000 target dsDNAs. In some embodiments, the protein complexes are capable of specifically binding about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 128, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500 or 10000 target dsDNAs, or a number or range between any two of these values.
Transposome
In some embodiments, the transposome comprises a transposase, a first adaptor, and a second adaptor. At least two of the plurality of protein complexes may comprise the same transposome. All of the plurality of protein complexes may comprise the same transposome. All of the plurality of protein complexes may comprise the same transposase. The transposase may be a Tn5 transposase, a Tn7 transposase, a mariner Tc1-like transposase, a Himar1C9 transposase, or a Sleeping Beauty transposase. The transposase may be a hyperactive transposase.
The transposase may be a Tn5, Tn7, MuA, or Vibrio harveyi transposase, or an active mutant thereof. In some embodiments, the transposase is a Tn5 transposase or a mutant thereof. In some embodiments, the Tn5 transposase is a hyperactive Tn5 transposase or an active mutant thereof. In some embodiments, the Tn5 transposase is a Tn5 transposase as described in WO2015/160895, which is incorporated herein by reference. In some embodiments, the Tn5 transposase is a hyperactive Tn5 having mutations at positions 54, 56, 372, 212, 214, 251, and 338 relative to the wild-type Tn5 transposase. In some embodiments, the Tn5 transposase is a hyperactive Tn5 having the following mutations relative to the wild-type Tn5 transposase: E54K, M56A, L372P, K212R, P214R, G251R, and A338V. In some embodiments, the Tn5 transposase is a fusion protein. In some embodiments, the Tn5 transposase fusion protein comprises a fused elongation factor Ts (Tsf) tag. In some embodiments, the Tn5 transposase is a mutant hyperactive Tn5 transposase comprising mutations at amino acids 54, 56, and 372 relative to the wild-type sequence. In some embodiments, the hyperactive Tn5 transposase is a fusion protein. In some embodiments, the recognition site is a Tn5 transposase recognition site.
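Mutation lists such as E54K, M56A, L372P, K212R, P214R, G251R, and A338V use the usual notation of wild-type residue, 1-based position, and substituted residue. The minimal Python sketch below, assuming that notation and 1-based numbering, parses such a list and applies it to a protein sequence while checking each wild-type residue. The Tn5 sequence itself is not given in this text, so the demonstration uses a hypothetical 5-residue toy sequence; the helper names are illustrative only.

```python
import re

# Point-mutation notation: wild-type residue, 1-based position, new residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Substitutions listed in the text for one hyperactive Tn5 embodiment.
TN5_HYPERACTIVE_MUTATIONS = ["E54K", "M56A", "L372P", "K212R", "P214R", "G251R", "A338V"]


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split e.g. 'E54K' into ('E', 54, 'K')."""
    match = _MUTATION.match(code)
    if match is None:
        raise ValueError(f"unrecognised mutation code: {code!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def apply_point_mutations(sequence: str, codes: list[str]) -> str:
    """Return `sequence` with each substitution applied, checking the wild-type residue."""
    residues = list(sequence)
    for code in codes:
        wild_type, position, mutant = parse_mutation(code)
        index = position - 1  # convert 1-based protein numbering to a list index
        if not 0 <= index < len(residues):
            raise ValueError(f"{code}: position outside sequence of length {len(residues)}")
        if residues[index] != wild_type:
            raise ValueError(f"{code}: expected {wild_type}, found {residues[index]}")
        residues[index] = mutant
    return "".join(residues)


# Toy 5-residue sequence (not Tn5): substitute C2W and F5Y.
print(apply_point_mutations("ACDEF", ["C2W", "F5Y"]))  # AWDEY
```

Checking the wild-type residue before substituting catches numbering offsets, for example when a tag has been fused to the N-terminus and shifted every position.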
The transposase may consist of a single protein or comprise more than one protein subunit. The transposase may be an enzyme capable of forming a functional complex with a transposon end or a transposon end sequence. In some embodiments, the transposome complex comprises a transposase (e.g., Tn5 transposase) dimer comprising first and second monomers. In some embodiments, the transposome complex comprises a dimer of two transposase molecules.
Transposases and/or transposomes may vary depending on the embodiment. The transposase may comprise a Tn5 transposase. The transposase may be a Tn transposase (e.g., Tn3, Tn5, Tn7, Tn10, Tn552, Tn903), a MuA transposase, a Vibhar transposase (e.g., from Vibrio harveyi), Ac-Ds, Ascot-1, Bs1, Cin4, Copia, En/Spm, F element, hobo, Hsmar1, Hsmar2, IN (HIV), IS1, IS2, IS3, IS4, IS5, IS6, IS10, IS21, IS30, IS50, IS51, IS150, IS256, IS407, IS427, IS630, IS903, IS911, IS982, IS1031, ISL2, L1, mariner, P element, Tam3, Tc1, Tc3, Tel, THE-1, Tn/O, TnA, Tn3, Tn5, Tn7, Tn10, Tn552, Tol1, Tol2, Tn10, Ty1, any other transposase listed herein, or a transposase derived from any of the organisms thereof. In some embodiments, a transposase related to and/or derived from a parent transposase may comprise a peptide fragment having at least about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% amino acid sequence homology to the corresponding peptide fragment of the parent transposase. The peptide fragment may be at least about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 60, about 70, about 80, about 90, about 100, about 150, about 200, about 250, about 300, about 400, or about 500 amino acids in length. For example, a Tn5-derived transposase may comprise a peptide fragment that is 50 amino acids in length and about 80% homologous to the corresponding fragment of the parent Tn5 transposase. In some cases, insertion may be facilitated and/or triggered by the addition of one or more cations. The cation may be a divalent cation, such as Ca2+, Mg2+, or Mn2+.
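The homology thresholds above reduce to simple arithmetic once two fragments are aligned. The sketch below is a minimal illustration that treats "homology" as positional identity over a gap-free alignment of equal-length fragments; that simplification is an assumption, and real comparisons would normally align the sequences first (e.g., with a Needleman-Wunsch or Smith-Waterman aligner). It also works the example from the text: a 50-amino-acid fragment at about 80% homology shares at least 40 identical residues with the parent fragment.

```python
import math


def percent_identity(fragment: str, parent_fragment: str) -> float:
    """Percent of identical residues between two pre-aligned, equal-length fragments."""
    if not fragment or len(fragment) != len(parent_fragment):
        raise ValueError("fragments must be non-empty and of equal (aligned) length")
    identical = sum(a == b for a, b in zip(fragment, parent_fragment))
    return 100.0 * identical / len(fragment)


def min_identical_residues(length: int, percent: float) -> int:
    """Smallest number of identical residues that reaches `percent` over `length`."""
    return math.ceil(length * percent / 100)


# Worked example from the text: a 50-aa fragment at about 80% homology.
print(min_identical_residues(50, 80))  # 40
```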
Adapter
The first adaptor and the second adaptor in the same transposome may be the same. The first adapter, the second adapter, or both in different transposomes may be different. The first adaptor, the second adaptor, or both may be dsDNA or an RNA/DNA duplex. The length of the adapter can be about 3-200 base pairs (e.g., about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, or 200 base pairs, or a number or range between any two of these values). In some embodiments, the length of the adapter can be 3-500 base pairs (e.g., about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 300, 400, or 500 base pairs, or a number or range between any two of these values). The first adapter, the second adapter, or both may be sequencing adapters. A sequencing adaptor may comprise one or more components employed in a given sequencing scheme, such as sequencing platform adaptor constructs, indexing domains, clustering domains, and the like. The sequencing adapter may comprise a P5 or P7 primer sequence. In some embodiments, the first adapter and/or the second adapter comprises a barcode (e.g., a random barcode). In some embodiments, the first adapter and/or the second adapter comprises a universal sequence. In some embodiments, the first adaptor and/or the second adaptor comprises a single-stranded portion and/or a double-stranded portion. In some embodiments, the adapter comprises a transposon end sequence that binds to a transposase. The transposon end sequences may be double-stranded. In some embodiments, the transposon end sequence is a mosaic end (ME) sequence. In particular embodiments, the transposon end is a mosaic end or a hyperactive form of a transposon end. The adaptor sequence may be attached to one of the two transposon end sequences. 
In some embodiments, the transposon end sequence of the first adaptor is an ME sequence and the transposon end sequence of the second adaptor is an ME' sequence.
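The ME/ME' pairing can be made concrete with a reverse-complement helper. This text does not give the ME sequence; the sketch below assumes the 19-bp Tn5 mosaic end commonly used in Tn5 library kits and takes ME' to be its reverse complement (the strand that pairs with ME in the double-stranded transposon end). The `adaptor_strand` helper and its placement of the payload 5' of the ME are hypothetical and only illustrate how an adaptor sequence could be attached to one transposon end strand.

```python
# Complement table for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# 19-bp Tn5 mosaic end (ME), as commonly used in Tn5-based library kits.
TN5_ME = "AGATGTGTATAAGAGACAG"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def adaptor_strand(payload: str, transposon_end: str = TN5_ME) -> str:
    """Hypothetical layout: payload (e.g., primer site or barcode) placed 5' of the ME."""
    return payload + transposon_end


me_prime = reverse_complement(TN5_ME)
print(me_prime)  # CTGTCTCTTATACACATCT
```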
The first adapter and/or the second adapter may comprise one or more modified or otherwise non-naturally occurring nucleotides (or analogs thereof). For example, the first adapter and/or the second adapter can include one or more nucleotide analogs (e.g., LNA, FANA, 2'-O-Me RNA, 2'-fluoro RNA, etc.), linkage modifications (e.g., phosphorothioate, and 3'-3' and 5'-5' inverted linkages), 5' and/or 3' terminal modifications (e.g., 5' and/or 3' amino, biotin, DIG, phosphate, thiol, dye, quencher, etc.), one or more fluorescently labeled nucleotides, or any other feature that provides a desired function.
The first adapter and/or the second adapter may comprise all or a component of a sequencing platform adapter construct. A "sequencing platform adapter construct" refers to a nucleic acid construct that includes at least a portion of a nucleic acid domain used by a sequencing platform of interest (e.g., a sequencing platform adapter nucleic acid sequence), such as sequencing platforms provided by Illumina® (e.g., the HiSeq™, MiSeq™, and/or Genome Analyzer™ sequencing systems); Ion Torrent™ (e.g., the Ion PGM™ and/or Ion Proton™ sequencing systems); Pacific Biosciences (e.g., the PACBIO RS II sequencing system); Life Technologies™ (e.g., the SOLiD sequencing system); Roche (e.g., the 454 GS FLX+ and/or GS Junior sequencing systems); or any other sequencing platform of interest. The first adaptor and/or the second adaptor may comprise one or more nucleic acid domains selected from the group consisting of: a domain (e.g., a "capture site" or "capture sequence") that specifically binds a surface-attached sequencing platform oligonucleotide (e.g., a P5 or P7 oligonucleotide attached to the surface of a flow cell in an Illumina® sequencing system); a sequencing primer binding domain (e.g., a domain to which the Read 1 or Read 2 primer of the Illumina® platform can bind); a barcode domain (e.g., a domain that uniquely identifies the sample source of the nucleic acid being sequenced, enabling sample multiplexing by labeling each molecule from a given sample with a specific barcode or "tag"); a barcode sequencing primer binding domain (a domain to which a primer used for barcode sequencing binds); a molecular identification domain (e.g., a molecular index tag, such as a randomized tag of 4, 6, or another number of nucleotides) for uniquely labeling a molecule of interest, so that expression levels can be determined from the number of times each unique tag is sequenced; or any combination of such domains. 
In some embodiments, the barcode domain (e.g., sample index tag) and the molecular identification domain (e.g., molecular index tag) may be contained in the same nucleic acid.
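The domain list above can be pictured as segments concatenated into one adapter strand. The sketch below is a hypothetical model: the `AdaptorDomains` class, the 5'->3' order of its fields, and all sequences are placeholders chosen for illustration, not real platform adaptor sequences or layouts, which should be taken from the sequencing manufacturer's documentation. It shows how the sample index and the molecular index can sit in the same nucleic acid and how their coordinates can be recovered.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AdaptorDomains:
    """Hypothetical adaptor built from the domain types named in the text.

    The 5'->3' order of the fields is an assumption for illustration only.
    """
    capture: str               # e.g., flow-cell capture site (P5/P7-type)
    sample_index: str          # barcode domain identifying the sample
    primer_binding: str        # sequencing primer binding domain
    molecular_index: str = ""  # optional random molecular index tag

    def assemble(self) -> str:
        """Concatenate the domains in field order (5'->3')."""
        return "".join(getattr(self, f.name) for f in fields(self))

    def layout(self) -> dict[str, tuple[int, int]]:
        """Half-open [start, end) coordinates of each domain in the assembled strand."""
        spans, start = {}, 0
        for f in fields(self):
            end = start + len(getattr(self, f.name))
            spans[f.name] = (start, end)
            start = end
        return spans


# Placeholder sequences only; not real platform adaptor sequences.
adaptor = AdaptorDomains("AAAACCCC", "GTCA", "TTTTGGGG", "ACGT")
print(adaptor.assemble())
print(adaptor.layout())
```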
When present in the first adapter and/or the second adapter, the sequencing platform adapter domain may include one or more nucleic acid domains of any length and sequence suitable for the sequencing platform of interest. The nucleic acid domain may have a length and sequence that enable a polynucleotide (e.g., an oligonucleotide) employed by the sequencing platform of interest to specifically bind to the nucleic acid domain, for example for solid-phase amplification and/or sequencing by synthesis of a cDNA insert flanked by the nucleic acid domains. Exemplary nucleic acid domains include the P5, P7, Read 1 primer, and Read 2 primer domains used on Illumina®-based sequencing platforms. Other exemplary nucleic acid domains include the A-adaptor and P1-adaptor domains employed on Ion Torrent™-based sequencing platforms.
The nucleotide sequences of the nucleic acid domains used for sequencing on a given sequencing platform can change over time. Adaptor sequences are typically provided by the manufacturer of the sequencing platform (e.g., in technical documentation provided with the sequencing system and/or available on the manufacturer's website). Based on this information, the adaptors provided herein can be designed to include all or a portion of one or more nucleic acid domains configured for sequencing the target dsDNA on the platform of interest.
The first adaptor and/or the second adaptor may comprise adaptors for Ion Torrent™ sequencing platforms (e.g., the Ion PGM™ and/or Ion Proton™ sequencing systems). The first adapter and/or the second adapter may comprise a P1 adapter, an A adapter, an Ion Xpress™ Barcode adapter, an Ion P1 adapter, and/or an Ion Xpress™ Barcode X adapter.
The first adapter and/or the second adapter may comprise a hairpin. The first adapter and/or the second adapter may be configured to generate a SMRTbell™ library. The methods provided herein can result in ligation of hairpin adaptors to the ends of double-stranded fragments to produce a circular template molecule having a central double-stranded portion and single-stranded hairpin loops at its ends (see, e.g., SMRTbell™ from Pacific Biosciences). Methods of preparing and using circular templates, such as SMRTbell™ templates, are described, for example, in U.S. Pat. No. 8,003,330, entitled "Error-free amplification of DNA for clonal sequencing," and in US2009/0280538, entitled "Methods and compositions for nucleic acid sample preparation," the entire disclosures of which are hereby incorporated by reference for all purposes.
The first adapter and/or the second adapter may be configured for downstream use of the tagged nucleic acid on an Oxford Nanopore Technologies (ONT) instrument (e.g., SmidgION, MinION, GridION, PromethION). Fig. 6 depicts a non-limiting exemplary embodiment of an ONT rapid sequencing kit based on enzymatic fragmentation. The first adapter and/or the second adapter may comprise (i) a spacer; (ii) a motor protein that is stalled on the spacer, wherein the active site of the motor protein is occupied by the spacer; and/or (iii) a blocking moiety bound to the adapter, wherein the blocking moiety prevents the motor protein from moving off the spacer. The first adapter and/or the second adapter may comprise a hairpin loop adapter. A hairpin loop adapter may be an adapter comprising a single polynucleotide strand whose ends are capable of hybridizing, or are hybridized, to each other and whose middle portion forms a loop. Suitable hairpin loop adapters can be designed using methods known in the art. The first adapter and/or the second adapter may comprise a linear adapter. The first adapter and/or the second adapter may be a Y adapter. Y adapters are typically polynucleotide adapters. A Y adapter is typically double-stranded and includes (a) a region at one end where the two strands hybridize together and (b) a region at the other end where the two strands are not complementary. The non-complementary portions of the strands typically form overhangs. The non-complementary region gives the adapter its Y shape because, unlike in the double-stranded portion, the two strands typically do not hybridize to each other there. The two single-stranded portions of the Y adapter may be of the same length or of different lengths. The motor protein may bind to an overhanging portion of an adapter, such as a Y adapter. 
In some embodiments, the motor protein may bind to a double-stranded region. In some embodiments, the motor protein may bind to single- and/or double-stranded regions of the adapter. In some embodiments, a first motor protein may bind to a single-stranded region of such an adapter and a second motor protein may bind to a double-stranded region of the adapter. The first adapter and/or the second adapter may comprise additional bound components that facilitate the nanopore sequencing reaction, such as bound enzymes (e.g., helicases, polymerases, or other motor proteins), membrane-binding moieties (e.g., cholesterol), and the like. Typically, the motor protein is a helicase, a polymerase, an exonuclease, a topoisomerase, or a variant thereof. In some embodiments, the motor protein on the spacer of the polynucleotide adapter is modified to prevent it from detaching from the spacer other than by removal of the ends of the spacer. The motor protein may be modified in any suitable manner. Figs. 7A-7H depict non-limiting exemplary embodiments of genome editing enzyme-mediated fragmentation (GET) for generating sequencing libraries for existing sequencing platforms (e.g., sequencing platforms from Oxford Nanopore).
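The Y-adapter geometry described above, a duplex at one end and two non-complementary arms at the other, can be checked with a short walk along the two strands. The sketch below is a simplified model under stated assumptions: both strands are written 5'->3', only perfect Watson-Crick pairing is counted, and mismatches, wobble pairs, and secondary structure are ignored. The sequences are hypothetical and chosen so that the two arms have different lengths.

```python
# Watson-Crick pairs between two antiparallel DNA strands.
_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def duplex_stem_length(top: str, bottom: str) -> int:
    """Length of the duplex formed by top's 3' end and bottom's 5' end.

    Both strands are written 5'->3'. Because the strands are antiparallel,
    top[-1 - i] is compared with bottom[i]; counting stops at the first
    non-complementary pair, where the single-stranded arms begin.
    """
    n = 0
    while n < min(len(top), len(bottom)) and (top[-1 - n], bottom[n]) in _PAIRS:
        n += 1
    return n


def single_stranded_arms(top: str, bottom: str) -> tuple[int, int]:
    """Lengths of the unpaired arms on each strand of a Y adapter."""
    stem = duplex_stem_length(top, bottom)
    return len(top) - stem, len(bottom) - stem


# Hypothetical Y adapter: 6-bp stem, 5-nt and 7-nt non-complementary arms.
top = "TTTTT" + "ACGTAC"
bottom = "GTACGT" + "CCCCCCC"
print(duplex_stem_length(top, bottom), single_stranded_arms(top, bottom))  # 6 (5, 7)
```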
The adaptors provided herein (e.g., first adaptors and/or second adaptors) can include barcodes, such as random barcodes, and can include one or more labels. Barcoding, such as random barcoding, has been described in, for example, Fu et al., Proc Natl Acad Sci U.S.A., 2011 May 31;108(22):9026-31; US2011/0160078; Fan et al., Science, 2015, 347(6222):1258367; US2015/0299784; and WO2015/031691; the content of each of these, including any supporting or supplemental information or material, is incorporated herein by reference in its entirety. In some embodiments, the barcodes disclosed herein may be random barcodes, which are polynucleotide sequences that can be used to randomly label (e.g., barcode or tag) a target. A barcode may be referred to as a random barcode if the ratio of the number of different barcode sequences of the random barcode to the number of occurrences of any target to be labeled is, or is about, 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 11:1, 12:1, 13:1, 14:1, 15:1, 16:1, 17:1, 18:1, 19:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, or 100:1, or a number or range between any two of these values. The target may be an mRNA species comprising mRNA molecules having the same or nearly the same sequence. A barcode may be referred to as a random barcode if the ratio of the number of different barcode sequences of the random barcode to the number of occurrences of any target to be labeled is at least, or at most, 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 11:1, 12:1, 13:1, 14:1, 15:1, 16:1, 17:1, 18:1, 19:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, or 100:1. The barcode sequence of a random barcode may be referred to as a molecular label.
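The diversity ratio that defines a random barcode is easy to compute, and a standard occupancy estimate shows why a ratio well above 1:1 is useful. The sketch below assumes a fully random barcode (4 possible bases at each position) and assumes that each copy of a target draws its barcode uniformly and independently; real barcode pools can be biased, so treat the numbers as idealized. With a hypothetical 6-nt barcode (4096 sequences) labeling 100 copies of one target, the ratio is about 41:1 and roughly 98.8 distinct barcodes are expected, so occasionally two copies share a barcode.

```python
def barcode_diversity(length: int) -> int:
    """Number of distinct sequences for a fully random barcode of `length` nt."""
    return 4 ** length


def diversity_ratio(length: int, target_copies: int) -> float:
    """Ratio of distinct barcode sequences to occurrences of one target."""
    return barcode_diversity(length) / target_copies


def expected_unique_labels(diversity: int, target_copies: int) -> float:
    """Expected number of distinct barcodes observed when each copy of a
    target draws one barcode uniformly at random (standard occupancy model)."""
    return diversity * (1.0 - (1.0 - 1.0 / diversity) ** target_copies)


# Hypothetical case: a 6-nt random barcode labelling 100 copies of one target.
print(diversity_ratio(6, 100))  # 40.96, i.e. about 41:1
print(expected_unique_labels(4 ** 6, 100))
```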
The adapter and/or barcode may include one or more universal labels. In some embodiments, the one or more universal labels may be the same for all barcodes and/or adaptors. In some embodiments, the universal label may include a nucleic acid sequence capable of hybridizing to a sequencing primer. Sequencing primers can be used to sequence barcodes comprising universal labels. Sequencing primers (e.g., universal sequencing primers) can include sequencing primers associated with high-throughput sequencing platforms. In some embodiments, the universal label may comprise a nucleic acid sequence capable of hybridizing to a PCR primer. In some embodiments, the universal label may include a nucleic acid sequence capable of hybridizing to both a sequencing primer and a PCR primer. The nucleic acid sequence of a universal label that is capable of hybridizing to a sequencing primer or PCR primer may be referred to as a primer binding site. A universal label may include a sequence that can be used to initiate transcription of the barcode. The universal label may include a sequence that can be used to extend the barcode or a region within the barcode. The length of the universal label may be, or may be about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides, or a number or range of nucleotides between any two of these values. For example, a universal label may comprise at least about 10 nucleotides. The length of the universal label may be at least, or at most, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 200, or 300 nucleotides.
The barcode (e.g., a random barcode) may include one or more labels. Exemplary labels may include universal labels, cell labels, barcode sequences (e.g., molecular labels), sample labels, plate labels, spatial labels, and/or pre-spatial labels. The barcode may comprise a universal label, a dimension label, a spatial label, a cell label, and/or a molecular label. The order of the different labels in the barcode (including but not limited to universal labels, dimension labels, spatial labels, cell labels, and molecular labels) may vary. For example, the universal label may be the 5'-most label and the molecular label may be the 3'-most label. The spatial label, the dimension label, and the cell label may be in any order. In some embodiments, the universal label, the spatial label, the dimension label, the cell label, and the molecular label are in any order. In some embodiments, the labels of the barcode (e.g., universal labels, dimension labels, spatial labels, cell labels, and barcode sequences) may be separated by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides.
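When the label order and lengths are fixed, a barcode read can be split into its labels by position. The layout below is hypothetical: an 8-nt universal label at the 5' end, a 2-nt spacer separating labels, a 6-nt cell label, and an 8-nt molecular label at the 3' end, consistent with the example ordering in the text but otherwise chosen only for illustration. Real layouts, and reads containing insertions or deletions, would need a more tolerant parser.

```python
# Hypothetical label layout, 5'->3': (label name, length in nt).
# Universal label 5'-most, molecular label 3'-most, with a 2-nt spacer.
LAYOUT = (("universal", 8), ("spacer", 2), ("cell", 6), ("molecular", 8))


def split_barcode(read: str, layout=LAYOUT) -> dict[str, str]:
    """Cut a barcode read into its labels according to a fixed-length layout."""
    expected = sum(length for _, length in layout)
    if len(read) < expected:
        raise ValueError(f"read has {len(read)} nt, layout needs {expected}")
    labels, pos = {}, 0
    for name, length in layout:
        labels[name] = read[pos:pos + length]
        pos += length
    return labels


read = "ACGTACGT" + "TT" + "GGCCAA" + "ATATATAT"
print(split_barcode(read))
```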
A label (e.g., a cell label) may comprise a unique set of nucleic acid subsequences of defined length, e.g., seven nucleotides each (corresponding to the number of bits used in some Hamming error-correcting codes), which may be designed to provide error-correction capability. A set of error-correcting sequences comprising seven-nucleotide sequences may be designed such that any pairwise combination of sequences in the set exhibits a defined "genetic distance" (or number of mismatched bases); for example, a set of error-correcting sequences may be designed to exhibit a genetic distance of three nucleotides. In this case, reviewing the error-correcting sequences in the sequence data set of the labeled target nucleic acid molecules (described in more detail below) may allow one to detect or correct amplification errors or sequencing errors. In some embodiments, the nucleic acid subsequences used to generate the error-correcting code may vary in length; for example, they may be, or may be about, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 31, 40, or 50 nucleotides, or a number or range of nucleotides between any two of these values. In some embodiments, nucleic acid subsequences of other lengths may be used to generate error-correcting codes.
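A minimum pairwise "genetic distance" of three is a Hamming distance of three: such a set detects up to two substitutions and corrects one (in general, distance d detects d - 1 and corrects floor((d - 1)/2) substitutions). The sketch below builds a 7-nt set by a simple lexicographic greedy search, which is one easy construction assumed here for illustration and not necessarily how the sets in this disclosure are designed, then verifies the distance and corrects a read with one substitution. It models substitutions only, not insertions or deletions; the function names are hypothetical.

```python
from itertools import combinations, product
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def greedy_barcode_set(length: int, min_distance: int, limit: int) -> list[str]:
    """Greedily pick up to `limit` sequences (in lexicographic order) whose
    pairwise Hamming distance is at least `min_distance`."""
    chosen: list[str] = []
    for letters in product("ACGT", repeat=length):
        candidate = "".join(letters)
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
            if len(chosen) == limit:
                break
    return chosen


def min_pairwise_distance(codes: list[str]) -> int:
    return min(hamming(a, b) for a, b in combinations(codes, 2))


def correct(observed: str, codes: list[str], min_distance: int) -> Optional[str]:
    """Return the nearest barcode if it lies within the correctable radius."""
    radius = (min_distance - 1) // 2  # distance 3 -> one substitution correctable
    best = min(codes, key=lambda c: hamming(observed, c))
    return best if hamming(observed, best) <= radius else None


codes = greedy_barcode_set(length=7, min_distance=3, limit=48)
print(len(codes), min_pairwise_distance(codes))  # 48 3
print(correct("AAAAAAC", codes, min_distance=3))  # AAAAAAA
```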
CRISPR-associated proteins
The programmable DNA binding unit can comprise a nuclease-deficient CRISPR-associated protein (dCas protein) and a guide RNA (gRNA) capable of specifically binding to a binding site of a target dsDNA. The dCas protein may be dCas9, dCas12, dCas13, dCas14, or SpRY dCas9. The dCas13 protein may be dCas13a, dCas13b, dCas13c, or dCas13d.
In some embodiments, the Cas9 protein has an inactivated DNA cleavage domain. A nuclease-inactivated Cas9 protein may be interchangeably referred to as a "dCas9" protein (for nuclease-"dead" Cas9). Methods for producing Cas9 proteins (or fragments thereof) with inactive DNA cleavage domains are known (see, e.g., Jinek et al., Science 337:816-821 (2012); Qi et al., Cell 152(5):1173-83 (2013), each of which is incorporated herein by reference in its entirety). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, while the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of Streptococcus pyogenes (S. pyogenes) Cas9 (Jinek et al.; Qi et al.).
The programmable DNA-binding unit can comprise any suitable nuclease-deficient Cas protein that can still bind the guide RNA. The programmable DNA-binding unit can comprise a class 2, type II Cas protein. The class 2, type II Cas protein may be a mutated Cas protein compared to its wild-type counterpart. The mutated Cas protein may be nuclease-deficient. The mutated Cas protein may be a mutated Cas9. The mutated Cas9 may be Cas9 D10A. Other examples of mutations in Cas9 include H820A, D839A, H840A, N863A, or any combination thereof, e.g., D10A/H820A, D10A/H840A, D10A/D839A/H840A, and D10A/D839A/H840A/N863A. The mutations described herein are numbered with reference to SpCas9 and also include analogous mutations in CRISPR proteins other than SpCas9. The programmable DNA-binding unit can comprise Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cpf1, C2c1, C2c3, Cas12a, Cas12b, Cas12c, Cas12d, Cas13b, Cas13c, or any combination thereof. Cas9 molecules of a variety of species may be used in the methods and compositions described herein. Although Streptococcus pyogenes and Staphylococcus aureus Cas9 molecules are the subject of much of the disclosure herein, Cas9 molecules derived from or based on the Cas9 proteins of other species listed herein may also be used. 
These include, for example, Cas9 molecules from: Acidovorax avenae, Actinobacillus pleuropneumoniae, Actinobacillus succinogenes, Actinobacillus suis, Actinomyces sp., Cycliphilus denitrificans, Aminomonas paucivorans, Bacillus cereus, Bacillus smithii, Bacillus thuringiensis, Bacteroides sp., Blastopirellula marina, Bradyrhizobium sp., Brevibacillus laterosporus, Campylobacter coli, Campylobacter jejuni, Campylobacter lari, Clostridium cellulolyticum, Clostridium perfringens, Corynebacterium accolens, Corynebacterium diphtheriae, Corynebacterium matruchotii, Eubacterium dolichum, gamma proteobacterium, Haemophilus sp., Lactobacillus acidophilus, Listeria monocytogenes, Listeriaceae bacterium, Methylocystis sp., Methylosinus trichosporium, Mobiluncus mulieris, Neisseria bacilliformis, Neisseria cinerea, Neisseria flavescens, Neisseria lactamica, Neisseria meningitidis, Neisseria sp., Neisseria wadsworthii, Nitrosomonas sp., Parvibaculum lavamentivorans, Pasteurella multocida, Phascolarctobacterium succinatutens, Ralstonia syzygii, Rhodopseudomonas palustris, Rhodovulum sp., Simonsiella muelleri, Sphingomonas sp., Sporolactobacillus vineae, Staphylococcus lugdunensis, Streptococcus sp., Subdoligranulum sp., Treponema sp., or Verminephrobacter eiseniae. Methods for making catalytically inactivating mutations and assessing the nuclease activity of the mutants are known to those skilled in the art.
The programmable DNA binding unit may comprise a guide molecule. The guide RNA molecule may consist of two separate molecules: a target-specific crRNA and a tracrRNA that binds the Cas molecule. In some embodiments, the crRNA and tracrRNA are provided as separate molecules and are annealed to form a functional guide RNA. As used herein, the terms "guide sequence" and "guide molecule" in the context of a CRISPR-Cas system include any polynucleotide sequence having sufficient complementarity to a selected binding site to hybridize to that binding site and to direct the specific binding of a programmable DNA binding unit to the sequence of the selected binding site. A gRNA molecule can refer to a nucleic acid that promotes specific targeting or homing of the gRNA molecule/Cas9 molecule complex to a target binding site. gRNA molecules can be unimolecular (having a single RNA molecule, e.g., chimeric) or modular (comprising more than one, and typically two, separate RNA molecules). The guide sequences prepared using the methods disclosed herein can be full-length guide sequences, truncated guide sequences, full-length sgRNA sequences, truncated sgRNA sequences, or E+F sgRNA sequences. In some embodiments, the degree of complementarity of the guide sequence to a given binding site is about or greater than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more when optimally aligned using a suitable alignment algorithm. In certain exemplary embodiments, the guide molecule comprises a guide sequence designed to have at least one mismatch with the binding site, such that an RNA duplex is formed between the guide sequence and the binding site. Accordingly, the degree of complementarity is preferably less than 99%. For example, when the guide sequence consists of 24 nucleotides, the degree of complementarity is more specifically about 96% or less. 
In certain embodiments, the guide sequence is designed with segments of two or more adjacent mismatched nucleotides, such that the degree of complementarity over the entire guide sequence is further reduced. For example, when the guide sequence consists of 24 nucleotides, the degree of complementarity is more specifically about 96% or less, about 92% or less, about 88% or less, about 84% or less, about 80% or less, about 76% or less, or about 72% or less, depending on whether the segment of mismatched nucleotides comprises 2, 3, 4, 5, 6, or 7 nucleotides, and so on. In some embodiments, apart from the one or more segments of mismatched nucleotides, the degree of complementarity is about or greater than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more when optimally aligned using a suitable alignment algorithm. The optimal alignment may be determined using any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler transform (e.g., the Burrows-Wheeler Aligner), ClustalW, ClustalX, Clustal Omega, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). The ability of a guide sequence (within a nucleic acid-targeting guide RNA) to direct sequence-specific binding of the programmable DNA binding unit to the selected binding site can be assessed by any suitable assay. In some embodiments, the guide sequence is an RNA sequence between 10 and 50 nt in length, more specifically about 20-30 nt, advantageously about 20 nt, 23-25 nt, or 24 nt. The guide sequence may be selected to ensure hybridization with the selected binding site.
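The complementarity percentages for a 24-nt guide follow from counting paired positions. The sketch below assumes a gap-free alignment of equal-length sequences (the alignment tools listed above would be needed when gaps are possible) and models only Watson-Crick RNA:DNA pairs. Because the guide and its target strand are antiparallel, guide position i is paired with the target strand read from its 3' end. The mismatch table gives 95.8%, 91.7%, 87.5%, 83.3%, 79.2%, 75.0%, and 70.8% for 1 to 7 mismatches, each at or just below the approximate 96%, 92%, 88%, 84%, 80%, 76%, and 72% figures in the text.

```python
# RNA guide base -> DNA target-strand base it pairs with (Watson-Crick only).
_RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_rna: str, target_strand: str) -> float:
    """Percent of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3' and assumed to be aligned without gaps;
    because the duplex is antiparallel, guide[i] pairs with target_strand[-1 - i].
    """
    if len(guide_rna) != len(target_strand):
        raise ValueError("guide and target strand must have equal length")
    paired = sum((g, t) in _RNA_DNA_PAIRS
                 for g, t in zip(guide_rna, reversed(target_strand)))
    return 100.0 * paired / len(guide_rna)


def complementarity_after_mismatches(length: int, max_mismatches: int) -> list[float]:
    """Complementarity (%) of a `length`-nt guide with 1..max_mismatches mismatches."""
    return [round(100 * (length - k) / length, 1) for k in range(1, max_mismatches + 1)]


print(percent_complementarity("ACGU", "ACGT"))  # 100.0
print(complementarity_after_mismatches(24, 7))
```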
Dead guide sequences
The programmable DNA binding unit can comprise a CRISPR-associated protein (Cas protein) and a guide RNA (gRNA) capable of specifically binding to a binding site of a target dsDNA. In some embodiments, the guide sequence is modified in a manner that allows formation of a CRISPR-Cas complex and successful binding to the binding site, while not allowing successful nuclease activity. Such modified guide sequences are referred to as "dead guides" or "dead guide sequences." With respect to nuclease activity, these dead guides or dead guide sequences may be considered catalytically inactive or conformationally inactive. The programmable DNA-binding unit can comprise a functional Cas protein and a gRNA or crRNA comprising a dead guide sequence, whereby the gRNA is capable of hybridizing to the selected binding site such that the Cas protein is directed to the selected binding site without detectable cleavage activity of the non-mutant Cas protein. The ability of a dead guide sequence to direct sequence-specific binding of the CRISPR complex to the binding site can be assessed by any suitable assay. A dead guide sequence may generally be shorter than the corresponding guide sequence that results in active cleavage. In particular embodiments, the dead guide is 5%, 10%, 20%, 30%, 40%, or 50% shorter than the corresponding guide directed to the same sequence.
Protein component
The programmable DNA binding unit may comprise a protein component capable of specifically binding to a binding site on the target dsDNA. The protein component may include an endonuclease-deficient zinc finger nuclease (ZFN), an endonuclease-deficient transcription activator-like effector nuclease (TALEN), an Argonaute protein, an endonuclease-deficient meganuclease, a recombinase, or a combination thereof. In some embodiments, the programmable DNA binding unit does not have a nuclease domain. In some embodiments, the programmable DNA binding unit has a nuclease domain that has been rendered catalytically inactive by one or more mutations. Methods for making catalytically inactivating mutations and assessing the nuclease activity of the mutants are known to those skilled in the art.
Transcription activator-like effector (TALE)
The programmable DNA binding unit may comprise an endonuclease-deficient transcription activator-like effector nuclease (TALEN), a functional fragment thereof, or a variant thereof. Transcription activator-like effectors (TALEs) can be engineered to bind to virtually any desired DNA sequence. Exemplary methods of targeting using the TALEN system can be found, for example, in Cermak T, Doyle EL, Christian M, Wang L, Zhang Y, Schmidt C, et al. Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 2011;39:e82; Zhang F, Cong L, Lodato S, Kosuri S, Church GM, Arlotta P. Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription. Nat Biotechnol. 2011;29:149-153; and U.S. Pat. Nos. 8,450,471, 8,440,431, and 8,440,432, all of which are specifically incorporated by reference.
The programmable DNA binding unit may comprise a TALE polypeptide. TALEs are transcription factors from plant pathogenic bacteria of the genus Xanthomonas that can be easily engineered to bind new DNA targets. In some embodiments provided herein, TALEs are not linked to the catalytic domain of an endonuclease (e.g., FokI). In some embodiments provided herein, the programmable DNA binding unit may comprise a TALEN wherein the endonuclease domain is catalytically inactive. TALE polypeptides comprise a nucleic acid binding domain consisting of tandem repeats of highly conserved monomeric polypeptides, which are predominantly 33, 34, or 35 amino acids in length and differ from each other mainly at amino acid positions 12 and 13. As used herein, the term "polypeptide monomer" or "TALE monomer" refers to a highly conserved repeat polypeptide sequence within the TALE nucleic acid binding domain, and the term "repeat variable diresidue" or "RVD" refers to the highly variable amino acids at positions 12 and 13 of the polypeptide monomer. The nucleotide binding affinity of a TALE monomer is determined by the identity of the amino acids in its RVD. For example, a polypeptide monomer with the RVD NI preferentially binds adenine (A), a polypeptide monomer with the RVD NG preferentially binds thymine (T), a polypeptide monomer with the RVD HD preferentially binds cytosine (C), and a polypeptide monomer with the RVD NN preferentially binds adenine (A) and guanine (G). In another embodiment provided herein, a polypeptide monomer with the RVD IG preferentially binds T. Thus, the number and order of the polypeptide monomer repeats in the nucleic acid binding domain of a TALE determine its nucleic acid target specificity. In some embodiments, a polypeptide monomer with the RVD NS recognizes all four base pairs and can bind A, T, G, or C.
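Because each RVD specifies one base, the RVD code described above can be expressed as a lookup table. The following Python sketch is a simplified, hypothetical illustration: it assigns one RVD per target base and checks whether an array is compatible with a given site using only the RVD preferences listed in the preceding paragraph. Other practical TALE design considerations are not modeled, and the function names are illustrative.

```python
from __future__ import annotations

# RVD -> bases it preferentially binds, as listed in the paragraph above.
BASES_FOR_RVD = {
    "NI": {"A"},
    "HD": {"C"},
    "NG": {"T"},
    "NN": {"A", "G"},
    "IG": {"T"},
    "NS": {"A", "C", "G", "T"},
}

# One RVD chosen per base when designing a new array (a design choice).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(target: str) -> list[str]:
    """Return one RVD per base of `target`, read 5' to 3'."""
    try:
        return [RVD_FOR_BASE[base] for base in target.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported base {err.args[0]!r}") from None


def rvds_match(rvds: list[str], site: str) -> bool:
    """True if every RVD in the array can bind the corresponding base of `site`."""
    site = site.upper()
    return len(rvds) == len(site) and all(
        base in BASES_FOR_RVD[rvd] for rvd, base in zip(rvds, site)
    )


array = design_rvds("GACT")
print(array)                      # ['NN', 'NI', 'HD', 'NG']
print(rvds_match(array, "AACT"))  # True: NN also accepts A
print(rvds_match(array, "GATT"))  # False: HD does not accept T
```

NI is chosen for A because NN is not A-specific, and NN is used for G because it is the only RVD listed above that binds G.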
TALEs have the structure and function described, for example, in Moscou et al., Science 326:1501 (2009); Boch et al., Science 326:1509-1512 (2009); and Zhang et al., Nature Biotechnology 29:149-153 (2011), each of which is incorporated herein by reference in its entirety. The programmable DNA binding unit may comprise polypeptide monomer repeats designed to target a particular nucleic acid sequence.
As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), the binding efficiency of a TALE polypeptide can be increased by incorporating, at the N-terminal or C-terminal position of the engineered TALE DNA binding region, amino acid sequences from the "capping regions" that lie directly N-terminal or C-terminal to the DNA binding region of a naturally occurring TALE. Thus, in certain embodiments, a TALE polypeptide described herein further comprises an N-terminal capping region and/or a C-terminal capping region.
As used herein, the predetermined N-terminal-to-C-terminal order of the N-terminal capping region, the DNA binding domain comprising the repeat TALE monomers, and the C-terminal capping region provides a structural basis for the organization of the different domains in a TALE or polypeptide provided herein.
The complete N-terminal and/or C-terminal capping regions are not necessary to enhance the binding activity of the DNA binding region. Thus, in certain embodiments, fragments of the N-terminal and/or C-terminal capping regions are included in the TALE polypeptides described herein.
In certain embodiments, a TALE polypeptide described herein comprises an N-terminal capping region fragment comprising at least 10, 20, 30, 40, 50, 54, 60, 70, 80, 87, 90, 94, 100, 102, 110, 117, 120, 130, 140, 147, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, or 270 amino acids of an N-terminal capping region. In certain embodiments, the amino acids of the N-terminal capping region fragment are from the C-terminal end of the N-terminal capping region (proximal to the DNA binding region). As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), N-terminal capping region fragments comprising the C-terminal 240 amino acids enhance binding activity as effectively as the full-length capping region, while fragments comprising the C-terminal 147 amino acids retain greater than 80% of the activity of the full-length capping region, and fragments comprising the C-terminal 117 amino acids retain greater than 50% of that activity.
In some embodiments, a TALE polypeptide described herein comprises a C-terminal capping region fragment comprising at least 6, 10, 20, 30, 37, 40, 50, 60, 68, 70, 80, 90, 100, 110, 120, 127, 130, 140, 150, 155, 160, 170, or 180 amino acids of the C-terminal capping region. In certain embodiments, the amino acids of the C-terminal capping region fragment are from the N-terminal end of the C-terminal capping region (proximal to the DNA binding region). As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), C-terminal capping region fragments comprising the C-terminal 68 amino acids enhance binding activity as effectively as the full-length capping region, while fragments comprising the C-terminal 20 amino acids retain greater than 50% of the activity of the full-length capping region.
Zinc Finger (ZF) proteins
The programmable DNA binding unit may comprise a zinc finger (ZF) nuclease, a functional fragment thereof, or a variant thereof. The programmable DNA binding unit may comprise an endonuclease-deficient ZF nuclease, a functional fragment thereof, or a variant thereof, wherein the endonuclease domain (e.g., FokI) is catalytically inactive or absent. The programmable DNA binding unit may comprise a ZF protein (ZFP). A ZFP may be engineered to bind to a selected target site. See, for example, Beerli et al. (2002) Nature Biotechnol. 20:135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nature Biotechnol. 19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; U.S. Pat. Nos. 6,453,242; 6,534,261; 6,599,692; 6,503,717; 6,689,558; 7,030,215; 6,794,136; 7,067,317; 7,262,054; 7,070,934; 7,361,635; 7,253,273; and U.S. Patent Publication Nos. 2005/0064474; 2007/0218528; and 2005/0267061. A ZFP may comprise an array of ZF modules that targets the desired DNA binding site. Each finger module in the ZF array can target three DNA bases. Custom arrays of individual zinc finger domains can be assembled into ZFPs.
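Because each finger module targets three bases, a site recognized by an array of n fingers spans 3n bp. The following Python sketch, a hypothetical illustration, splits a target into triplets and numbers them by finger, assuming the antiparallel arrangement commonly described for zinc finger arrays (the N-terminal finger contacting the 3'-most triplet). Choosing a finger module for each triplet is not modeled, and `finger_triplets` is an illustrative name.

```python
from __future__ import annotations


def finger_triplets(target: str) -> list[tuple[int, str]]:
    """Split a 5'->3' target into 3-bp subsites and pair each with a finger number.

    Finger 1 is the N-terminal finger; it is assigned the 3'-most triplet,
    reflecting the antiparallel binding arrangement commonly described for
    zinc finger arrays.
    """
    target = target.upper()
    if not target or len(target) % 3:
        raise ValueError("target length must be a positive multiple of 3")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    return list(enumerate(reversed(triplets), start=1))


for finger, triplet in finger_triplets("AAACCCGGG"):  # hypothetical 9-bp site
    print(f"finger {finger} (from N-terminus) -> 5'-{triplet}-3'")
```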
Meganuclease
The programmable DNA binding unit may be an endonuclease-deficient meganuclease, a functional fragment thereof, or a variant thereof. The DNA binding domain of a meganuclease may recognize a double-stranded DNA target sequence of 12 to 45 bp. In some embodiments, the meganuclease is a dimeric enzyme, in which each meganuclease domain is located on a separate monomer, or a monomeric enzyme comprising two domains on a single polypeptide. In addition to wild-type meganucleases, protein engineering has produced various meganuclease variants covering a myriad of unique sequence combinations. In some embodiments, chimeric meganucleases having a recognition site composed of a half-site of meganuclease A and a half-site of protein B can also be used. Specific examples of such chimeric meganucleases include those comprising the protein domains of I-DmoI and I-CreI. Examples of meganucleases include homing endonucleases from the LAGLIDADG family. "LAGLIDADG meganuclease" refers to a homing endonuclease from the LAGLIDADG family as defined by Stoddard (Stoddard, 2005), or to an engineered variant comprising a polypeptide having at least 80%, 85%, 90%, 95%, 97.5%, 99% or more identity or similarity to said native homing endonuclease. Such engineered LAGLIDADG meganucleases can be derived from monomeric or dimeric meganucleases. When derived from a dimeric meganuclease, such an engineered LAGLIDADG meganuclease may be a single-chain or a dimeric endonuclease. Meganucleases can be targeted to specific sequences by modifying their recognition sequences using techniques well known to those skilled in the art. See, e.g., Epinat et al., 2003, Nucleic Acids Res. 31(11): 2952-62, and Stoddard, 2005, Quarterly Review of Biophysics, pp. 1-47.
The LAGLIDADG meganuclease can be I-SceI, I-ChuI, I-CreI, I-CsmI, PI-SceI, PI-TliI, PI-MtuI, I-CeuI, I-SceII, I-SceIII, HO, PI-CivI, PI-CtrI, PI-AaeI, PI-BsiI, PI-DhaI, PI-DraI, PI-MavI, PI-MchI, PI-MfuI, PI-MflI, PI-MgaI, PI-MgoI, PI-MinI, PI-MKAI, PI-MKEI, PI-MKHI, PI-MsmI, PI-MthI, PI-MtuI, PI-MxeI, PI-NpuI, PI-PfuI, PI-RmaI, PI-SpbI, PI-SspI, PI-FacI, PI-MjaI, PI-PhoI, PI-TagI, PI-ThyI, PI-TkoI, PI, or I-MsoI, or may be a functional mutant or variant thereof, whether homodimeric, heterodimeric, or monomeric. In some embodiments, the LAGLIDADG meganuclease is an I-CreI derivative. In some embodiments, the LAGLIDADG meganuclease has at least 80% similarity to the native I-CreI LAGLIDADG meganuclease. In some embodiments, the LAGLIDADG meganuclease has at least 80% similarity to residues 1-152 of the native I-CreI LAGLIDADG meganuclease. In some embodiments, the LAGLIDADG meganuclease may be composed of two monomers, each having at least 80% similarity to residues 1-152 of the native I-CreI LAGLIDADG meganuclease, linked together with or without a linker peptide.
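Thresholds such as "at least 80% similarity" presuppose a way to score two sequences against each other. The following Python sketch computes percent identity (not similarity, which would additionally credit conservative substitutions) for two sequences that are assumed to be already aligned; the denominator convention used here (ignoring columns where both sequences are gapped) is one of several in use. The sequences shown are hypothetical toy strings, not meganuclease sequences, and the function names are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a column counts as
    identical only when both residues are the same non-gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment contains no residue columns")
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if percent identity is at least `threshold`."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-residue toy sequences, not real meganuclease sequences.
reference = "ACDEFGHIKL"
print(percent_identity(reference, "ACDEFGHIKV"))  # 90.0
print(meets_threshold(reference, "ACDEWGHIRV"))   # False (70% identity)
```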
Argonaute protein
In some embodiments, the programmable DNA binding unit comprises an Argonaute protein without nuclease activity. In some embodiments, the programmable DNA binding unit comprises the Argonaute protein from Natronobacterium gregoryi (NgAgo), a functional fragment thereof, or a variant thereof. NgAgo is an ssDNA-guided endonuclease. NgAgo binds a 5'-phosphorylated ssDNA guide (gDNA) of about 24 nucleotides, is directed by it to its target site, and introduces a double-strand break in the DNA at the gDNA site. In some embodiments, the programmable DNA binding unit comprises NgAgo without nuclease activity (dNgAgo). Characterization and use of NgAgo have been described in Gao et al., Nat. Biotechnol., Epub 2016 May 2, PubMed PMID: 27136078; Swarts et al., Nature 507(7491) (2014): 258-61; and Swarts et al., Nucleic Acids Res. 43(10) (2015): 5120-9, each of which is incorporated herein by reference. An NgAgo-based programmable DNA binding unit may comprise at least one guide DNA element, or a nucleic acid comprising a sequence encoding a guide DNA element, and achieves specific targeting or recognition of the binding site by base pairing directly with the DNA of the binding site. Prokaryotic homologs of the Argonaute protein are known and have been described, for example, in Makarova K. et al., "Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements", Biol. Direct 2009 Aug. 25; 4:29, doi:10.1186/1745-6150-4-29, which is incorporated herein by reference. In some embodiments, the programmable DNA binding unit is a Marinitoga piezophila Argonaute (MpAgo) protein, a functional fragment thereof, or a variant thereof.
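Because the guide DNA recognizes the binding site by direct base pairing, its sequence is the reverse complement of the strand segment it is meant to pair with. The following Python sketch is a hypothetical illustration assuming a 24-nt guide (matching the approximate length given above); the 5' phosphate is recorded only as a flag, and `GuideDNA` and `design_guide` are illustrative names rather than an established tool.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class GuideDNA:
    sequence: str                   # written 5' -> 3'
    phosphorylated_5p: bool = True  # 5'-phosphate, recorded as a flag only


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_guide(target_strand: str, start: int, length: int = 24) -> GuideDNA:
    """Guide DNA that base-pairs with target_strand[start:start + length] (5'->3')."""
    site = target_strand.upper()[start:start + length]
    if start < 0 or len(site) != length:
        raise ValueError("binding site must lie entirely within the target strand")
    if set(site) - set("ACGT"):
        raise ValueError("target strand may contain only A, C, G and T")
    return GuideDNA(sequence=reverse_complement(site))


strand = "GG" + "ACGTTTGCAAGT" * 2 + "CC"  # hypothetical 28-nt target strand
guide = design_guide(strand, start=2)
print(guide.sequence, len(guide.sequence))
```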
Recombinase
In some embodiments, the programmable DNA binding unit comprises a recombinase configured to bind to a binding site on the target dsDNA. Site-specific recombinases are well known in the art and may generally be referred to as invertases, resolvases, or integrases. Non-limiting examples of site-specific recombinases include lambda integrase, Cre, Int, IHF, Xis, Flp, Fis, Hin, Gin, phiC31, Cin, Tn3 resolvase, TndX, XerC, XerD, TnpX, Hjc, SpCCE1, and ParA.
Linker
The transposomes may be bound to the programmable DNA binding unit by a linker linking the transposase and the dCas protein. The linker may comprise a peptide linker, a chemical linker, or both. The transposase may be present as a fusion protein comprising a dCas protein. The transposomes may be bound to the programmable DNA binding unit by a linker linking the transposase and the protein component. The peptide linker may comprise more than one glycine, serine, threonine, alanine, lysine, or glutamine residue, or a combination thereof. The peptide linker may include a GS linker. The peptide linker can be an XTEN linker. The protein component may be present as a fusion protein comprising a transposase. The term "linker" as used herein refers to a molecule that facilitates interactions between molecules or molecular moieties. In some embodiments, the linker is a polypeptide linker. In some embodiments, the linker is a chemical linker. The term "peptide linker" or "polypeptide linker" as used herein refers to a peptide or polypeptide comprising two or more amino acid residues joined by peptide bonds. Such peptide or polypeptide linkers are well known in the art. The linker may include naturally occurring and/or non-naturally occurring peptides or polypeptides. The linker may be attached to the C-terminus and/or the N-terminus of the transposase and/or the programmable DNA binding unit.
The linker may be a chemical linker or a peptide linker. Thus, embodiments relate to polypeptides conjugated to other molecules via peptide bonds and polypeptides conjugated to other molecules via chemical conjugation.
Peptide linkers with a certain degree of flexibility may be used. The peptide linker may have virtually any amino acid sequence, bearing in mind that a suitable peptide linker will have a sequence that results in a generally flexible peptide. Small amino acids such as glycine and alanine can be used to produce flexible peptides. The creation of such sequences is routine to those skilled in the art.
Suitable linkers can be readily selected and can have any suitable length, for example from 1 amino acid (e.g., gly) to 50 amino acids, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 amino acids, or a number or range between any two of these values (or any derivable range therein).
Preferred peptide linker sequences adopt a flexible, extended conformation and do not exhibit a tendency to form ordered secondary structure. In certain embodiments, the linker may be a chemical moiety, which may be a monomer, dimer, multimer, or polymer. Preferably, the linker comprises amino acids. Typical amino acids in flexible linkers include Gly, Asn, and Ser. Thus, in particular embodiments, the linker comprises a combination of one or more of Gly, Asn, and Ser amino acids. Other near-neutral amino acids, such as Thr and Ala, may also be used in the linker sequence. Exemplary flexible linkers include glycine polymers (G)n (SEQ ID NO: 32) and glycine-serine polymers (including, for example, (GS)n (SEQ ID NO: 33), (GSGGS)n (SEQ ID NO: 34), (G4S)n (SEQ ID NO: 35), and (GGGS)n (SEQ ID NO: 36)), where n is an integer of at least 1. In some embodiments, n is at least, up to, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 (or any derivable range therein). Glycine-alanine polymers, alanine-serine polymers, and other flexible linkers are known in the art. Glycine and glycine-serine polymers may be used; Gly and Ser are relatively unstructured and may therefore serve as neutral tethers between components. Glycine polymers may be used because glycine accesses significantly more phi-psi space than even alanine and is much less restricted than residues with longer side chains. Exemplary spacers may contain amino acid sequences including, but not limited to, GGG (SEQ ID NO: 37) and the sequence shown in SEQ ID NO: 40. The sequence of the linker may vary without significantly affecting the function or activity of the fusion protein (see, e.g., U.S. Patent No. 6,087,329).
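The repeat-unit linkers listed above can be generated programmatically. The following Python sketch is illustrative only: it builds (unit)n linkers from the units named in the text and checks whether a linker uses only the Gly, Asn, Ser, Thr, and Ala residues described as suitable for flexible linkers. The dictionary keys and function names are hypothetical.

```python
# Repeat units of the flexible linkers listed above; key names are illustrative.
LINKER_UNITS = {
    "(G)n": "G",
    "(GS)n": "GS",
    "(GSGGS)n": "GSGGS",
    "(G4S)n": "GGGGS",
    "(GGGS)n": "GGGS",
}

# Gly, Asn, Ser plus the near-neutral Thr and Ala named in the text.
FLEXIBLE_RESIDUES = frozenset("GNSTA")


def build_linker(unit_name: str, n: int) -> str:
    """Return the linker (unit)n as a one-letter amino acid string."""
    if n < 1:
        raise ValueError("n must be an integer of at least 1")
    return LINKER_UNITS[unit_name] * n


def uses_only_flexible_residues(linker: str) -> bool:
    """True if every residue is one of the flexible-linker residues above."""
    return all(residue in FLEXIBLE_RESIDUES for residue in linker.upper())


for name in LINKER_UNITS:
    seq = build_linker(name, 3)
    print(f"{name} with n=3: {seq} ({len(seq)} aa)")
```

For example, (G4S)n with n = 3 gives the 15-residue linker GGGGSGGGGSGGGGS.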
In some embodiments, the linker may be at least, up to, or exactly 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 amino acid residues (or any range derivable therein).
In some embodiments, the polypeptide linker is an XTEN linker. In some embodiments, the linker is an XTEN linker or a variant of an XTEN linker, e.g., SGSETPGTSESA (SEQ ID NO: 43), SGSETPGTSESATPES (SEQ ID NO: 44) or SGSETPGTSESATPEGGSGGS (SEQ ID NO: 45). XTEN linkers are described, for example, in Schellenberger et al (2009), nature Biotechnology 27:1186-1190, the entire contents of which are incorporated herein by reference.
Suitable linkers for use in the methods provided herein are well known to those skilled in the art and include, but are not limited to, straight or branched chain carbon linkers, heterocyclic carbon linkers, or peptide linkers. However, as used herein, the linker may also be a covalent bond (carbon-carbon bond or carbon-heteroatom bond). In certain embodiments, the linker is used to separate the transposome and the programmable DNA binding unit by a distance sufficient to ensure that each protein retains its desired functional properties.
The linker can be used to fuse two protein partners to form a fusion protein. A "linker" may be a chemical group or molecule that connects two molecules or moieties, e.g., two domains of a fusion protein. Typically, a linker is positioned between, or flanked by, two groups, molecules, domains, or other moieties and is attached to each by a covalent bond, thereby linking the two. In some embodiments, the linker is an amino acid or more than one amino acid (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, a group, a polymer (e.g., a non-natural polymer, a non-peptide polymer), or a chemical moiety. In some embodiments, the linker comprises a direct bond or an atom, such as oxygen (O) or sulfur (S); a unit such as -NR-, wherein R is hydrogen or alkyl, -C(O)-, -C(O)O-, -C(O)NH-, -SO2-, or -SO2NH-; or a chain of atoms, e.g., substituted or unsubstituted alkyl, substituted or unsubstituted alkenyl, substituted or unsubstituted alkynyl, arylalkyl, or heteroarylalkyl. In some embodiments, one or more methylene groups in the chain of atoms may be replaced by one or more of O, S, S(O), SO2, -SO2NH-, -NR-, -NR2-, -C(O)-, -C(O)O-, -C(O)NH-, a cleavable linking group, a substituted or unsubstituted aryl, a substituted or unsubstituted heteroaryl, or a substituted or unsubstituted heterocycle. Examples of linkers may also include chemical moieties and conjugation agents, such as sulfosuccinimidyl derivatives (sulfo-SMCC, sulfo-SMPB), disuccinimidyl suberate (DSS), disuccinimidyl glutarate (DSG), and disuccinimidyl tartrate (DST). Examples of linkers also include linear carbon chains such as Cn (where n = 1-100 carbon atoms). In some embodiments, the linker may be a dipeptide linker, such as a valine-citrulline (val-cit) or phenylalanine-lysine (phe-lys) linker, or a maleimidocaproyl-valine-citrulline-p-aminobenzylcarbonyl (vc) linker. In some embodiments, the linker is sulfosuccinimidyl 4-[N-maleimidomethyl]cyclohexane-1-carboxylate (sulfo-SMCC).
Sulfo-SMCC conjugation occurs through its maleimide group, which reacts with sulfhydryl (thiol, -SH) groups, while its sulfo-NHS ester reacts with primary amines (as found in lysine and at the N-terminus of proteins or peptides). Furthermore, the linker may be maleimidocaproyl (mc). In some embodiments, covalent linkage can be achieved by using Traut's reagent (2-iminothiolane).
FIGS. 8-10 depict non-limiting exemplary schematic diagrams of plasmid constructs 3XFlag-Cas9-Fl26-Tn5 (SEQ ID NO: 1), 3XFlag-Cas9-xTen-Tn5 (SEQ ID NO: 2), and pET-Tn5-xTen-dCas9 (SEQ ID NO: 3), respectively, for use in generating the protein complexes provided herein. The protein complexes, adaptors, programmable DNA binding units and/or transposases disclosed herein can be encoded by nucleotide sequences that are at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical (or within a range between any two of these values) to the sequences encoding the protein complexes, adaptors, programmable DNA binding units and/or transposases in SEQ ID NOs: 1-3.
Amplification
The method may include amplifying more than one dsDNA fragment with primers capable of binding to adaptors at the ends of the dsDNA fragments. Amplification may yield nucleic acid amplification products. The nucleic acid amplification products may constitute a library (e.g., a sequencing library). Each primer can be about 5-80 nucleotides in length (e.g., about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, or 80 nucleotides in length, or a number or range between any two of these values). More than one dsDNA fragment can be amplified using polymerase chain reaction (PCR) primers. The amplification can be PCR, or can be loop-mediated isothermal amplification (LAMP), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription-mediated amplification (TMA), nicking enzyme amplification reaction (NEAR), rolling circle amplification (RCA), multiple displacement amplification (MDA), ramification amplification (RAM), circular helicase-dependent amplification (cHDA), single primer isothermal amplification (SPIA), signal-mediated amplification of RNA technology (SMART), self-sustained sequence replication (3SR), genome exponential amplification reaction (GEAR), or isothermal multiple displacement amplification (IMDA). The PCR may be real-time PCR or quantitative real-time PCR (QRT-PCR).
As used herein, nucleic acid amplification can refer to any known procedure that uses sequence-specific methods to obtain more than one copy of a target nucleic acid sequence, its complement, or a fragment thereof. Examples of known amplification methods include, but are not limited to, polymerase chain reaction (PCR), ligase chain reaction (LCR), loop-mediated isothermal amplification (LAMP), strand displacement amplification (SDA) (e.g., multiple displacement amplification (MDA)), replicase-mediated amplification, immuno-amplification, nucleic acid sequence-based amplification (NASBA), self-sustained sequence replication (3SR), rolling circle amplification, and transcription-mediated amplification (TMA). In some embodiments, two or more of the above-described nucleic acid amplification methods may be performed, for example, sequentially.
For example, LCR amplification uses at least four separate oligonucleotides to amplify a target and its complementary strand through multiple cycles of hybridization, ligation, and denaturation. SDA amplifies by using primers that contain a recognition site for a restriction endonuclease that nicks one strand of a hemimodified DNA duplex comprising the target sequence, followed by amplification in a series of primer extension and strand displacement steps.
PCR is a method well known in the art for nucleic acid amplification. PCR involves amplifying a target sequence using two or more extendible sequence-specific oligonucleotide primers flanking the target sequence. In the presence of the primers, a thermostable DNA polymerase (e.g., Taq polymerase), and the various dNTPs, a nucleic acid comprising the target sequence of interest is subjected to multiple thermal cycles (denaturation, annealing, and extension), resulting in amplification of the target sequence. PCR uses multiple rounds of primer extension reactions in which complementary strands of a defined region of a DNA molecule are simultaneously synthesized by a thermostable DNA polymerase. At the end of each cycle, each newly synthesized DNA molecule acts as a template for the next cycle. During repeated rounds of these reactions, the number of newly synthesized DNA strands increases exponentially, so that after 20 to 30 reaction cycles the original template DNA will have been replicated thousands or millions of times.
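The exponential increase described above can be estimated with the idealized relation N = N0(1 + E)^n, where N0 is the starting copy number, n is the number of cycles, and E is the per-cycle efficiency (E = 1 for perfect doubling). This model and the efficiency values used below are assumptions for illustration; the model ignores the plateau phase, so real yields are lower. `pcr_copies` is an illustrative name.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` of PCR: N = N0 * (1 + E) ** cycles.

    E = 1.0 means perfect doubling every cycle; plateau effects are ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


for n in (20, 30):
    ideal = pcr_copies(1, n)
    realistic = pcr_copies(1, n, efficiency=0.9)
    print(f"{n} cycles: ideal {ideal:.3g}-fold, at 90% efficiency {realistic:.3g}-fold")
```

With perfect doubling, 20 cycles correspond to 2^20 (about 1.05 × 10^6) copies per starting template; at an assumed 90% efficiency the same 20 cycles give roughly 3.8 × 10^5.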
PCR can produce double stranded amplification products suitable for post amplification processing. If desired, the amplified product may be detected by agarose gel electrophoresis visualization, by an enzyme immunoassay format using probe-based colorimetric detection, by fluorescence emission techniques, or by other detection means known to those skilled in the art.
Examples of PCR methods include, but are not limited to, real-time PCR, end-point PCR, amplified fragment length polymorphism PCR (AFLP-PCR), Alu-PCR, asymmetric PCR, colony PCR, DD-PCR, degenerate PCR, hot-start PCR, in situ PCR, inverse PCR, long PCR (Long-PCR), multiplex PCR, nested PCR, PCR-ELISA, PCR-RFLP, PCR-single strand conformation polymorphism (PCR-SSCP), quantitative competitive PCR (QC-PCR), rapid amplification of cDNA ends PCR (RACE-PCR), random amplification of polymorphic DNA PCR (RAPD-PCR), repetitive extragenic palindromic PCR (Rep-PCR), reverse transcriptase PCR (RT-PCR), TAIL-PCR, touchdown PCR, and vectorette PCR.
Real-time PCR, also known as real-time quantitative polymerase chain reaction (QRT-PCR), can be used to simultaneously amplify and quantify a specific portion of a given nucleic acid molecule. It can be used to determine whether a particular sequence is present in a sample and, if it is present, the copy number of that sequence. The term "real-time" may refer to periodic monitoring during PCR. Certain systems, such as the ABI 7700 and 7900HT Sequence Detection Systems (Applied Biosystems, Foster City, CA), monitor at predetermined or user-defined points during each thermal cycle. Real-time PCR analysis with fluorescence resonance energy transfer (FRET) probes measures the change in fluorescent dye signal from cycle to cycle, preferably with any internal control signal subtracted. Real-time procedures follow the general pattern of PCR, but the nucleic acid is quantified after each round of amplification. Two examples of quantification methods are the use of fluorescent dyes (e.g., SYBR Green) that intercalate into double-stranded DNA and the use of modified DNA oligonucleotide probes that fluoresce when hybridized to complementary DNA. Intercalators have relatively low fluorescence when unbound and relatively high fluorescence when bound to double-stranded nucleic acids. Thus, intercalators can be used to monitor the accumulation of double-stranded nucleic acid during a nucleic acid amplification reaction. Examples of such non-specific dyes that may be used in the embodiments disclosed herein include intercalators such as SYBR Green I (Molecular Probes), propidium iodide, ethidium bromide, and the like.
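One common way to obtain copy numbers from real-time PCR data is a standard curve relating the threshold cycle (Ct) to the logarithm of known starting copy numbers. The following Python sketch assumes that approach: it fits the curve by ordinary least squares, derives the amplification efficiency from the slope using E = 10^(-1/slope) - 1 (valid for a log10-based curve), and inverts the curve for an unknown. The dilution series and Ct values are hypothetical, and the function names are illustrative.

```python
from __future__ import annotations

import math


def fit_standard_curve(copies: list[float], cts: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    if len(copies) != len(cts) or len(copies) < 2:
        raise ValueError("need at least two paired standards")
    xs = [math.log10(c) for c in copies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cts) / len(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """E = 10 ** (-1 / slope) - 1; a slope near -3.32 corresponds to E close to 1."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate starting copies for an unknown."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series (10^2 to 10^6 copies) and illustrative Ct values.
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
ct_values = [31.36, 28.04, 24.72, 21.40, 18.08]
slope, intercept = fit_standard_curve(standards, ct_values)
print(f"slope={slope:.2f} intercept={intercept:.2f} "
      f"efficiency={amplification_efficiency(slope):.1%}")
print(f"unknown at Ct 26.0 = {copies_from_ct(26.0, slope, intercept):.3g} copies")
```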
Labeling
The methods described herein may include: one or both ends of one or more of the more than one dsDNA fragments are labeled (e.g., with a detectable label). The method may include: two ends of one or more of the more than one dsDNA fragments are differentially labeled. The labeling can include labeling with a detectable label (e.g., anionic label, cationic label, neutral label, electrochemical label, protein label, fluorescent label, magnetic label, or a combination thereof). The method may include: enriching for the labeled dsDNA fragments, capturing the labeled dsDNA fragments, isolating the labeled dsDNA fragments, and/or visualizing the labeled dsDNA fragments. The method may include monitoring (e.g., chemical monitoring) of the detectable label.
In some embodiments, the detectable moiety (e.g., a detectable label) comprises an optical moiety, a luminescent moiety, an electrochemically active moiety, a nanoparticle, or a combination thereof. In some embodiments, the luminescent moiety comprises a chemiluminescent moiety, an electroluminescent moiety, a photoluminescent moiety, or a combination thereof. In some embodiments, the photoluminescent moiety comprises a fluorescent moiety, a phosphorescent moiety, or a combination thereof. In some embodiments, the fluorescent moiety comprises a fluorescent dye. In some embodiments, the nanoparticle comprises a quantum dot. In some embodiments, the method comprises performing a reaction to convert a precursor of the detectable moiety to the detectable moiety. In some embodiments, performing the reaction to convert the precursor of the detectable moiety to the detectable moiety comprises contacting the precursor of the detectable moiety with a substrate. In some such embodiments, contacting the precursor of the detectable moiety with the substrate produces a detectable by-product of the reaction between the two molecules.
Detection and quantification of amplification products
Some methods provided herein comprise amplifying more than one dsDNA fragment to produce a nucleic acid amplification product. The methods described herein may also include detecting and/or quantifying a nucleic acid amplification product or a product thereof. The amplification product or products thereof may be detected and/or quantified by any suitable detection and/or quantification method, including, for example, any of the detection or quantification methods described herein. Non-limiting examples of detection and/or quantification methods include molecular beacons (e.g., real-time, end-point), lateral flow, fluorescence resonance energy transfer (FRET), fluorescence polarization (FP), surface capture, 5'-to-3' exonuclease hydrolysis probes (e.g., TAQMAN), intercalating/binding dyes, absorbance methods (e.g., colorimetry, nephelometry), electrophoresis (e.g., gel electrophoresis, capillary electrophoresis), mass spectrometry, nucleic acid sequencing, digital amplification, primer extension methods (e.g., iPLEX™), molecular inversion probe (MIP) technology from Affymetrix, restriction fragment length polymorphism (RFLP) analysis, allele-specific oligonucleotide (ASO) analysis, methylation-specific PCR (MSPCR), pyrosequencing analysis, AcycloPrime analysis, reverse dot blot, GeneChip microarrays, dynamic allele-specific hybridization (DASH), peptide nucleic acid (PNA) and locked nucleic acid (LNA) probes, AlphaScreen, SNPstream, genetic bit analysis (GBA), multiplex minisequencing, SNaPshot, GOOD assay, microarray miniseq, arrayed primer extension (APEX), microarray primer extension, tag arrays, coded microspheres, template-directed incorporation (TDI), colorimetric oligonucleotide ligation assay (OLA), sequence-coded OLA, microarray ligation, ligase chain reaction, padlock probes, Invader assay, hybridization using at least one probe, hybridization using at least one fluorescently labeled probe, cloning and sequencing, use of hybridization probes and quantitative real-time polymerase chain reaction (QRT-PCR), nanopores, sequencing chips, and combinations thereof.
Detecting nucleic acid amplification products may include using real-time detection methods (i.e., detecting and/or continuously monitoring the products during the amplification process), endpoint detection methods (i.e., detecting the products after completion or cessation of the amplification process), or both. The nucleic acid detection method may also use labeled nucleotides that are incorporated directly into the target sequence or into probes containing a sequence complementary to the target. Such labels may be radioactive and/or fluorescent in nature and may be resolved in any of the ways discussed herein. In some embodiments, quantification of nucleic acid amplification products is achieved using one or more of the following detection methods. The detection method may be used in combination with measurement of signal intensity and/or generation of (or reference to) a standard curve and/or look-up table for quantification of nucleic acid amplification products.
Detecting the nucleic acid amplification product may include using molecular beacon technology. The term molecular beacon generally refers to a detectable molecule whose detectable property becomes detectable under certain conditions, enabling the molecule to function as a specific and informative signal. Non-limiting examples of detectable properties include optical properties (e.g., fluorescence), electrical properties, magnetic properties, chemical properties, and time or speed of passage through an opening of known size. A molecular beacon for detecting a nucleic acid molecule may be, for example, a hairpin oligonucleotide that contains a fluorophore at one end and a quenching dye at the other end. The loop of the hairpin may comprise a probe sequence complementary to the target sequence, while the stem is formed by annealing of complementary arm sequences located on either side of the probe sequence. The fluorophore and the quencher molecule may be covalently linked at the opposite ends of the arms. Under conditions that prevent hybridization of the oligonucleotide to its complementary target, or when the molecular beacon is free in solution, the fluorescent molecule and the quencher molecule are in proximity to each other, so that fluorescence is quenched. When a molecular beacon encounters a target molecule (e.g., a nucleic acid amplification product), hybridization can occur and the loop structure is converted to a stable, more rigid conformation, separating the fluorophore and quencher molecules and thereby producing fluorescence. Due to the specificity of the probe, the fluorescence generated is typically attributable exclusively to synthesis of the intended amplification product. In some embodiments, the molecular beacon probe sequence hybridizes to a sequence in the amplification product that is identical or complementary to a sequence in the target nucleic acid.
In some embodiments, the molecular beacon probe sequence hybridizes to a sequence in the amplification product that is not identical or complementary to a sequence in the target nucleic acid (e.g., it hybridizes to a sequence added to the amplification product by a tailed amplification primer or by ligation). Molecular beacons can be synthesized with different colored fluorophores and different target sequences, enabling simultaneous detection of several products in the same reaction (e.g., in multiplex reactions). For quantitative amplification processes, molecular beacons can bind specifically to the amplified targets after each amplification cycle. Because unhybridized molecular beacons are dark, probe-target hybrids need not be isolated to quantify the amplified product. The signal generated is proportional to the amount of amplified product. Detection using molecular beacons may be performed in real time or as an endpoint method.
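The hairpin architecture described above has two sequence-level requirements. First, the 3' arm must be the reverse complement of the 5' arm so that the stem can form. Second, the target must contain the reverse complement of the loop so that the probe can hybridize. A minimal sketch of that check is shown below, assuming plain A/C/G/T strings. The sequences and function names are hypothetical. The sketch ignores melting temperature and secondary structure, which a real beacon design would also need to consider.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def check_beacon(beacon: str, arm_len: int, target: str) -> dict:
    """Check that the stem arms anneal and that the loop can bind the target.

    The fluorophore and quencher are chemical modifications on the two ends
    and are not part of the sequence string.
    """
    beacon = beacon.upper()
    if arm_len <= 0 or 2 * arm_len >= len(beacon):
        raise ValueError("arms must leave a non-empty loop")
    five_arm = beacon[:arm_len]
    three_arm = beacon[-arm_len:]
    loop = beacon[arm_len:-arm_len]
    return {
        # Stem: the 3' arm must be the reverse complement of the 5' arm.
        "stem_forms": three_arm == reverse_complement(five_arm),
        # Loop (probe) binds a target strand containing its reverse complement.
        "loop_binds_target": reverse_complement(loop) in target.upper(),
    }

# Hypothetical sequences for illustration only: arm + loop + arm.
beacon = "CGCTC" + "ACCTGGAACT" + "GAGCG"
print(check_beacon(beacon, 5, "GGGAGTTCCAGGTCCC"))
```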
Detecting nucleic acid amplification products can include the use of lateral flow, which generally involves a solid-phase, fluid-permeable flow path through which fluid moves by capillary action. Example devices include, but are not limited to, dipstick assays and thin-layer chromatography plates with various suitable coatings. Various binding reagents for the sample, binding partners, or conjugates involving binding partners for the sample and the signal-generating system are immobilized on the flow path. Detection can be achieved in several ways, including, for example, enzymatic, nanoparticle, colorimetric, and fluorescent detection.
Detecting the nucleic acid amplification product may include using FRET (Förster resonance energy transfer), an energy transfer mechanism between two chromophores: a donor molecule and an acceptor molecule. Briefly, a donor fluorophore is excited at a specific excitation wavelength. As the excited donor returns to the ground state, its excitation energy can be transferred non-radiatively to the acceptor through long-range dipole-dipole interactions. The emission intensity of the acceptor can be monitored. It varies with the distance between donor and acceptor, the overlap between the donor emission spectrum and the acceptor absorption spectrum, and the relative orientation of the donor emission dipole moment and the acceptor absorption dipole moment. FRET can be used to quantify molecular dynamics, for example in the DNA-DNA interactions described for molecular beacons. To monitor production of a particular product, a probe may be labeled with a donor at one end and an acceptor at the other end. Probe-target hybridization changes the distance or orientation between donor and acceptor, and a change in FRET is observed.
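The distance dependence described above is usually written in the standard Förster form, E = 1 / (1 + (r/R0)^6). Here R0 is the Förster distance, and it absorbs the spectral-overlap and dipole-orientation factors named in the paragraph. A minimal sketch follows. The 5 nm R0 is an assumed value for a hypothetical dye pair, not a figure from this document.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6).

    r and r0 must share units (e.g., nm). R0, the Förster distance, bundles
    the spectral-overlap and dipole-orientation factors.
    """
    if r < 0 or r0 <= 0:
        raise ValueError("distances must be non-negative and R0 positive")
    return 1.0 / (1.0 + (r / r0) ** 6)

R0_NM = 5.0  # assumed Förster distance for a hypothetical donor/acceptor pair
for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm  E = {fret_efficiency(r, R0_NM):.3f}")
```

At r = R0 the efficiency is exactly one half, and at r = 2 R0 it falls to 1/65 (about 1.5%). This steep sixth-power falloff is why a hybridization-induced change of a few nanometres in donor-acceptor spacing gives a readily observable FRET change.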
Detecting nucleic acid amplification products may include using fluorescence polarization (FP). FP is generally based on the principle that a fluorescently labeled compound excited by linearly polarized light emits fluorescence whose degree of polarization is inversely related to the compound's rate of rotation. Thus, when a fluorescently labeled molecule such as a tracer-nucleic acid conjugate is excited by linearly polarized light, the emitted light remains highly polarized because the fluorophore's rotation is restricted between absorption and emission. When a free tracer compound (i.e., one not bound to a nucleic acid) is excited by linearly polarized light, it rotates much faster than the corresponding tracer-nucleic acid conjugate. The molecules are therefore more randomly oriented, and the emitted light is depolarized. Fluorescence polarization thus provides a quantitative method for measuring the amount of tracer-nucleic acid conjugate produced in an amplification reaction.
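The polarization readout described above is commonly computed from the emission intensities measured parallel and perpendicular to the excitation polarization. The standard expressions are for polarization P and anisotropy r, with an instrument correction factor G. A minimal sketch follows. The two-state `fraction_bound` estimate is an added simplification that assumes free and bound tracer are equally bright, and all names are hypothetical.

```python
def polarization(i_parallel: float, i_perpendicular: float, g: float = 1.0) -> float:
    """Fluorescence polarization P = (I_par - G*I_perp) / (I_par + G*I_perp)."""
    perp = g * i_perpendicular
    return (i_parallel - perp) / (i_parallel + perp)

def anisotropy(i_parallel: float, i_perpendicular: float, g: float = 1.0) -> float:
    """Fluorescence anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    perp = g * i_perpendicular
    return (i_parallel - perp) / (i_parallel + 2 * perp)

def fraction_bound(r_obs: float, r_free: float, r_bound: float) -> float:
    """Two-state estimate of the fraction of tracer in the conjugate form.

    Assumes free and nucleic-acid-bound tracer have equal brightness.
    """
    return (r_obs - r_free) / (r_bound - r_free)

# Hypothetical intensities and reference anisotropies.
print(polarization(200.0, 100.0), anisotropy(200.0, 100.0))
print(fraction_bound(0.15, r_free=0.05, r_bound=0.25))
```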
Detecting nucleic acid amplification products may include using surface capture, which can be achieved by immobilizing specific oligonucleotides on a surface to produce a highly sensitive and selective biosensor.
Detecting the nucleic acid amplification product may include using a 5'-to-3' exonuclease hydrolysis probe (e.g., TAQMAN). For example, TAQMAN probes are hydrolysis probes that can increase the specificity of quantitative amplification methods (e.g., quantitative PCR). The TAQMAN probe principle relies on two elements: 1) the 5'-to-3' exonuclease activity of Taq polymerase, which cleaves the dual-labeled probe while it is hybridized to the complementary target sequence, and 2) fluorophore-based detection. The resulting fluorescent signal allows quantitative measurement of amplified product accumulation during the exponential phase of amplification.
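A common way to summarize the exponential-phase signal described above is the threshold cycle: the fractional cycle at which fluorescence first rises through a fixed threshold. A minimal sketch using linear interpolation between adjacent readings is shown below. It assumes baseline-subtracted readings with one value per cycle, starting at cycle 1. The function name and data are hypothetical. Instrument software may instead interpolate in log space or fit the whole curve.

```python
from typing import Optional, Sequence

def threshold_cycle(fluorescence: Sequence[float], threshold: float) -> Optional[float]:
    """Fractional cycle at which the signal first rises through `threshold`.

    fluorescence[0] is the reading after cycle 1. Returns None if the
    signal never crosses the threshold from below.
    """
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # prev was read at cycle i and curr at cycle i + 1; interpolate linearly.
            return i + (threshold - prev) / (curr - prev)
    return None

# Hypothetical baseline-subtracted readings that double each cycle.
readings = [1, 1, 1, 2, 4, 8, 16, 32]
print(threshold_cycle(readings, 6.0))
```

Ct values obtained this way are what a standard curve of Ct versus log starting quantity is built from.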
Detecting nucleic acid amplification products may include using intercalating and/or binding dyes, e.g., dyes capable of specifically staining nucleic acids. For example, intercalating dyes exhibit enhanced fluorescence upon binding to DNA or RNA. Non-limiting examples of dyes include acridine orange, ethidium bromide, Hoechst dyes, propidium iodide, asymmetric cyanine dyes, TOTO (thiazole orange dimer), and YOYO (oxazole yellow dimer).
Detecting the nucleic acid amplification product may include using absorbance methods (e.g., colorimetry, nephelometry). For example, nucleic acids can be detected and/or quantified by converting absorbance directly into concentration (e.g., from UV absorbance measured at 260 nm). Direct measurements of nucleic acids can be converted to concentration using the Beer-Lambert law, which relates absorbance to concentration through the measured path length and the extinction coefficient.
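The Beer-Lambert conversion described above (A = ε·l·c, so c = A / (ε·l)) can be sketched with the widely used A260 conversion factors. An absorbance of 1.0 over a 1 cm path corresponds to roughly 50 µg/mL for double-stranded DNA, 33 µg/mL for single-stranded DNA and 40 µg/mL for RNA. In this sketch, the mass extinction coefficient is taken as the reciprocal of that factor. The function name and the dilution handling are illustrative assumptions.

```python
# Conventional A260 conversion factors (ug/mL per absorbance unit, 1 cm path).
A260_FACTOR_UG_PER_ML = {"dsDNA": 50.0, "ssDNA": 33.0, "RNA": 40.0}

def concentration_ug_per_ml(a260, kind="dsDNA", path_length_cm=1.0, dilution=1.0):
    """Beer-Lambert: A = epsilon * l * c, so c = A / (epsilon * l).

    The mass extinction coefficient epsilon (mL ug^-1 cm^-1) is taken as the
    reciprocal of the conventional conversion factor for the nucleic acid type.
    `dilution` rescales a diluted reading back to the original sample.
    """
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    epsilon = 1.0 / A260_FACTOR_UG_PER_ML[kind]
    return a260 / (epsilon * path_length_cm) * dilution

print(concentration_ug_per_ml(0.5))                   # dsDNA, 1 cm cuvette
print(concentration_ug_per_ml(0.25, dilution=10))     # 1:10 diluted reading
print(concentration_ug_per_ml(0.05, path_length_cm=0.1))
print(concentration_ug_per_ml(1.0, kind="RNA"))
```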
Detecting the nucleic acid amplification product may include using electrophoresis (e.g., gel electrophoresis, capillary electrophoresis), mass spectrometry, nucleic acid sequencing, digital amplification (e.g., digital PCR), or any combination thereof.
Genetic features of interest
More than one target dsDNA may include a genetic feature of interest (e.g., a biomarker feature). The genetic feature of interest may include one or more mutations (e.g., biomarkers) of interest. The one or more mutations of interest may include point mutations, inversions, deletions, insertions, translocations, duplications, copy number variations, or combinations thereof. The one or more mutations of interest may include nucleotide substitutions, deletions, insertions, or combinations thereof. The genetic feature of interest may be indicative of antibiotic resistance or antibiotic susceptibility of the organism from which the target dsDNA is derived. The genetic feature of interest may be indicative of the cancer status of the organism from which the target dsDNA originates. The genetic feature of interest may be indicative of the status of a genetic disease of the organism from which the target dsDNA originates. The genetic disease may be a monogenic disorder. The genetic disease may be cystic fibrosis, Huntington's disease, sickle cell anemia, hemophilia, Duchenne muscular dystrophy, thalassemia, fragile X syndrome, familial hypercholesterolemia, polycystic kidney disease, neurofibromatosis type I, hereditary spherocytosis, Marfan syndrome, Tay-Sachs disease, phenylketonuria, mucopolysaccharidosis, lysosomal acid lipase deficiency, glycogen storage disease, galactosemia, or hemochromatosis. Genetic features of interest (e.g., biomarker features) can be detected using the methods and compositions provided herein. Diagnostic evaluation can be performed using the methods and compositions provided herein.
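Several of the small-scale mutation classes listed above (point mutations, substitutions, insertions, deletions, short inversions) can be distinguished from a reference allele and an alternate allele alone. A minimal sketch follows. It assumes VCF-style, left-anchored alleles for indels, meaning the inserted or deleted bases follow a shared first base. The function names are hypothetical. Translocations, duplications and copy number variations span far more sequence than a single allele pair and are deliberately not handled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def _revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def classify_variant(ref: str, alt: str) -> str:
    """Coarse class of a simple variant from its reference and alternate alleles."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no_change"
    if len(ref) == len(alt):
        if len(ref) == 1:
            return "point_mutation"
        if alt == _revcomp(ref):
            return "inversion"        # same span, reverse-complemented
        return "substitution"         # multi-nucleotide substitution
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"            # anchored: extra bases follow ref
    if len(alt) < len(ref) and ref.startswith(alt):
        return "deletion"             # anchored: bases after alt removed
    return "complex"

for ref, alt in [("A", "G"), ("A", "ATTG"), ("ACGT", "A"), ("AACG", "CGTT")]:
    print(ref, ">", alt, classify_variant(ref, alt))
```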
Diagnostic evaluation may be performed based on biomarker features (e.g., genetic features of interest), alone or in combination with other evaluations or factors, as described herein. Provided herein are compositions and methods for assessing the risk of developing a disease or condition, prognosing the disease or condition, monitoring its progression or regression, assessing the efficacy of a treatment, or identifying compounds capable of ameliorating or treating the disease or condition based on a biomarker signature (e.g., a genetic signature of interest).
Diseases and conditions
The methods provided herein can be applied to a variety of diseases or conditions based on the biomarker signatures (e.g., genetic signatures of interest) associated with them. Exemplary diseases or conditions having genetic features of interest according to the disclosed compositions and methods include cardiovascular diseases or conditions, kidney-related diseases or conditions, prenatal or pregnancy-related diseases or conditions, neurological or neuropsychiatric diseases or conditions, autoimmune or immune-related diseases or conditions, cancer, infectious diseases or conditions, pediatric diseases, disorders, or conditions, mitochondrial diseases, respiratory or gastrointestinal diseases or conditions, reproductive diseases or conditions, ophthalmic diseases or conditions, musculoskeletal diseases or conditions, and dermatological diseases or conditions.
Samples
The sample may comprise eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof. The more than one target dsDNA may comprise genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. The sample may be, or be derived from, a biological sample, a clinical sample, an environmental sample, or a combination thereof. The more than one target dsDNA may comprise DNA from at least 2 different organisms. The more than one target dsDNA may comprise DNA from at least 2 different genes. The method may include producing the more than one target dsDNA from more than one target RNA using a reverse transcriptase. The more than one target dsDNA may comprise target dsDNA produced from a target RNA with a reverse transcriptase. The sample nucleic acid may include eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof. The target dsDNA may be genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof. The sample nucleic acid may be from a biological sample, a clinical sample, an environmental sample, or a combination thereof. The biological sample may include stool, sputum, peripheral blood, plasma, serum, lymph nodes, respiratory tissue, exudates, body fluids, or combinations thereof.
The nucleic acids used in the methods described herein can be obtained from any suitable biological specimen or sample and are typically isolated from a sample obtained from a subject. The subject may be any living or non-living organism, including but not limited to a human, a non-human animal, a plant, a bacterium, a fungus, a virus, or a protozoan. Any human or non-human animal may be selected, including but not limited to mammals, reptiles, birds, amphibians, fish, ungulates, ruminants, bovines (e.g., cattle), equines (e.g., horses), caprines and ovines (e.g., goats, sheep), porcines (e.g., pigs), camelids (e.g., camels, llamas, alpacas), monkeys, apes (e.g., gorillas, chimpanzees), ursids (e.g., bears), poultry, dogs, cats, mice, rats, dolphins, whales, and sharks. The subject may be male or female and may be of any age (e.g., embryo, fetus, infant, child, adult).
The sample or test sample may be any specimen isolated or obtained from a subject or a portion thereof. Non-limiting examples of samples include fluids or tissues from a subject, including but not limited to blood or blood products (e.g., serum, plasma), umbilical cord blood, bone marrow, chorionic villi, amniotic fluid, cerebrospinal fluid, spinal fluid, lavage fluid (e.g., bronchoalveolar, gastric, peritoneal, ductal, ear, arthroscopic), biopsy samples, laparoscopy samples, cells (e.g., blood cells) or portions thereof (e.g., mitochondria, nuclei, extracts), washes of the female reproductive tract, urine, stool, sputum, saliva, nasal mucus, prostatic fluid, semen, lymph, bile, tears, sweat, breast milk, breast fluid, solid tissue (e.g., liver, spleen, kidney, lung, or ovary), and the like, or combinations thereof. The term blood includes whole blood, blood products, or any fraction of blood, such as serum, plasma, buffy coat, or the like as conventionally defined. Plasma refers to the fraction of whole blood obtained by centrifuging blood treated with an anticoagulant. Serum refers to the aqueous portion of fluid that remains after a blood sample has coagulated. Fluid or tissue samples are typically collected according to standard protocols commonly followed by hospitals or clinics. For blood, an appropriate amount of peripheral blood (e.g., between 3 and 40 milliliters) is typically collected and may be stored according to standard procedures before or after preparation.
The sample or test sample may contain spores, viruses, cells, nucleic acids from prokaryotes or eukaryotes, or any free nucleic acid. For example, the methods described herein can be used to detect nucleic acids on the outside of spores (e.g., without lysis). The sample may be isolated from any material suspected of containing the target sequence, for example from a subject as described above. In some embodiments, the target sequence is present in air, plants, soil, or other material suspected of containing a biological organism.
Nucleic acids may be derived (e.g., isolated, extracted, purified) from one or more sources by methods known in the art. Any suitable method may be used to isolate, extract, and/or purify nucleic acids from biological samples. Non-limiting examples include DNA preparation methods known in the art and various commercially available reagents or kits, such as the QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), the GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.), the GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, NJ), and the like, or combinations thereof.
In some embodiments, a cell lysis procedure is performed. Cell lysis may be performed before the reactions provided herein are initiated. Cell lysis procedures and reagents are known in the art. Lysis can generally be carried out by chemical methods (e.g., detergents, hypotonic solutions, enzymatic procedures, or combinations thereof), physical methods (e.g., French press, sonication), or electrolytic methods. Any suitable lysis procedure may be used. For example, chemical methods typically use lysing agents to disrupt cells and extract nucleic acids, followed by treatment with chaotropic salts. In some embodiments, cell lysis includes the use of detergents (e.g., ionic, nonionic, anionic, zwitterionic). In some embodiments, cell lysis includes the use of an ionic detergent (e.g., sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), deoxycholate, cholate, sarkosyl). Physical methods such as freeze/thaw followed by grinding, the use of cell presses, and the like may also be useful. High-salt lysis procedures may also be used. For example, an alkaline lysis procedure may be used. This procedure traditionally involves phenol-chloroform solutions, but an alternative phenol-chloroform-free procedure involving three solutions may be used. In that procedure, the first solution may contain 15 mM Tris, pH 8.0; 10 mM EDTA; and 100 µg/mL RNase. The second solution may contain 0.2 N NaOH and 1% SDS. The third solution may contain, for example, 3 M KOAc, pH 5.5. In some embodiments, a cell lysis buffer is used in combination with the methods and components described herein.
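The solution compositions above can be prepared from concentrated stocks using the dilution relation C1·V1 = C2·V2. A minimal sketch follows that computes stock volumes and the water needed to reach the final volume. The stock concentrations (1 M Tris pH 8.0, 0.5 M EDTA, 10 mg/mL RNase, 10 N NaOH, 10% SDS) and the batch volumes are hypothetical choices for illustration, not values from this document.

```python
def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """C1*V1 = C2*V2 solved for V1. Both concentrations must share units."""
    if stock_conc <= 0 or final_conc < 0:
        raise ValueError("stock must be positive and target non-negative")
    if final_conc > stock_conc:
        raise ValueError("stock is more dilute than the target concentration")
    return final_conc * final_volume_ml / stock_conc

def recipe(final_volume_ml, components):
    """components maps name -> (stock_conc, final_conc); water makes up the rest."""
    volumes = {
        name: stock_volume_ml(stock, final, final_volume_ml)
        for name, (stock, final) in components.items()
    }
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes

# First solution, 100 mL batch, from hypothetical stocks (units in the keys).
solution_1 = recipe(100.0, {
    "Tris pH 8.0 (mM)": (1000.0, 15.0),
    "EDTA (mM)": (500.0, 10.0),
    "RNase (ug/mL)": (10000.0, 100.0),
})
# Second solution, 10 mL batch.
solution_2 = recipe(10.0, {"NaOH (N)": (10.0, 0.2), "SDS (%)": (10.0, 1.0)})
print(solution_1)
print(solution_2)
```

The third solution (3 M KOAc, pH 5.5) is usually made up directly from solid salt and acid rather than diluted from a stock, so it is not included here.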
Nucleic acids may be provided for performing the methods described herein without processing the sample containing them. For example, in some embodiments, nucleic acids are provided for performing the amplification methods described herein without prior nucleic acid purification. In some embodiments, the target sequence is amplified directly from the sample (e.g., without any nucleic acid extraction, isolation, purification, and/or partial purification steps). In some embodiments, the nucleic acids are provided for performing the methods described herein after the sample containing them has been processed. For example, nucleic acids may be extracted, isolated, purified, or partially purified from a sample. The term "isolated" generally refers to a nucleic acid that has been removed from its original environment (e.g., the natural environment if it is naturally occurring, or the host cell if it is exogenously expressed) and has thus been altered from its original environment by human intervention (e.g., "by the hand of man"). The term "isolated nucleic acid" may refer to a nucleic acid that has been removed from a subject (e.g., a human subject). An isolated nucleic acid may contain fewer non-nucleic acid components (e.g., proteins, lipids, carbohydrates) than are present in the source sample. Compositions comprising isolated nucleic acids may be free of about 50% to greater than 99% of non-nucleic acid components. Compositions comprising isolated nucleic acids may be free of about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater than 99% of non-nucleic acid components. The term "purified" generally refers to a nucleic acid that contains fewer non-nucleic acid components (e.g., proteins, lipids, carbohydrates) than were present before the nucleic acid was subjected to a purification procedure.
A composition comprising purified nucleic acid may be free of about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater than 99% of other non-nucleic acid components.
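The "free of X% of non-nucleic acid components" language above can be read as the share of the source sample's non-nucleic-acid material that has been removed. A minimal sketch of that arithmetic follows. Both the interpretation and the example amounts (200 µg of protein in a lysate, 4 µg remaining after purification) are assumptions for illustration, and the function names are hypothetical.

```python
def percent_free(before_amount, after_amount):
    """Percent of the source sample's non-nucleic-acid material that was removed.

    Amounts (e.g., ug of protein) must be in the same units.
    """
    if before_amount <= 0:
        raise ValueError("source amount must be positive")
    if after_amount < 0:
        raise ValueError("amount cannot be negative")
    return 100.0 * (1.0 - after_amount / before_amount)

def meets_threshold(before_amount, after_amount, threshold_pct):
    """True if the preparation is at least `threshold_pct` percent free."""
    return percent_free(before_amount, after_amount) >= threshold_pct

# Hypothetical: 200 ug protein in the lysate, 4 ug after purification.
print(percent_free(200.0, 4.0))
print(meets_threshold(200.0, 4.0, 95.0), meets_threshold(200.0, 4.0, 99.0))
```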
Nucleic acids may be provided for performing the methods described herein without modifying the nucleic acids. Modifications may include, for example, denaturation, digestion, nicking, melting, incorporation and/or ligation of heterologous sequences, addition of epigenetic modifications, and addition of labels, and the like. Labels may include radiolabels such as ³²P, ³³P, ¹²⁵I, or ³⁵S; enzyme labels such as alkaline phosphatase; fluorescent labels such as fluorescein isothiocyanate (FITC); or other labels such as biotin, avidin, digoxigenin, antigens, haptens, or fluorescent dyes. Thus, in some embodiments, unmodified nucleic acids are amplified.
The methods of the present disclosure for detecting a target nucleic acid sequence (single-or double-stranded DNA and/or RNA) in a sample can detect the target nucleic acid sequence (e.g., DNA or RNA) with high sensitivity. In some embodiments, the methods of the present disclosure can be used to detect target RNA/DNA present in a sample comprising more than one RNA/DNA (including target RNA/DNA and more than one non-target RNA/DNA), wherein the target RNA/DNA is present at every 10 7 One or more copies of non-target RNA/DNA (e.g., every 10 6 One or more copies of each 10 non-target RNA/DNA 5 One or more copies of each 10 non-target RNA/DNA 4 One or more copies of each 10 non-target RNA/DNA 3 One or more copies of each 10 non-target RNA/DNA 2 One or more copies of non-target RNA/DNA, one or more copies of non-target RNA/DNA per 50 copies of non-target RNA/DNA, one or more copies of non-target RNA/DNA per 20 copies of non-target RNA/DNA, one or more copies of non-target RNA/DNA per 10 copies of non-target RNA/DNA, or one or more copies of non-target RNA/DNA per 5 copies). 
In some embodiments, the methods of the present disclosure can be used to detect target RNA/DNA present in a sample comprising more than one RNA/DNA (including target RNA/DNA and more than one non-target RNA/DNA), wherein the target RNA/DNA is present at every 10 18 One or more copies of non-target RNA/DNA (e.g., every 10 15 One or more copies of each 1 of non-target RNA/DNA0 12 One or more copies of each 10 non-target RNA/DNA 9 One or more copies of each 10 non-target RNA/DNA 6 One or more copies of each 10 non-target RNA/DNA 5 One or more copies of each 10 non-target RNA/DNA 4 One or more copies of each 10 non-target RNA/DNA 3 One or more copies of each 10 non-target RNA/DNA 2 One or more copies of non-target RNA/DNA, one or more copies of non-target RNA/DNA per 50 copies of non-target RNA/DNA, one or more copies of non-target RNA/DNA per 20 copies of non-target RNA/DNA, one or more copies of non-target RNA/DNA per 10 copies of non-target RNA/DNA, or one or more copies of non-target RNA/DNA per 5 copies). As used herein, the terms "RNA/DNA" and "RNAs/DNAs" shall be given their ordinary meaning and shall also refer to DNA, or RNA, or a combination of DNA and RNA.
In some embodiments, the methods of the present disclosure can detect target RNA/DNA present in a sample, wherein the target RNA/DNA is present at every 10 7 One copy of each non-target RNA/DNA to every 10 copies of non-target RNA/DNA (e.g., every 10 copies of non-target RNA/DNA 7 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 3 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 4 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 5 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 6 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 6 One copy of each non-target RNA/DNA to every 10 copies of each non-target RNA/DNA, every 10 copies 6 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 6 One copy of non-target RNA/DNA to every 10 3 One copy of non-target RNA/DNA per 10 6 One copy of non-target RNA/DNA to every 10 4 One copy of non-target RNA/DNA per 10 6 One copy of non-target RNA/DNA to every 10 5 One copy of the non-target RNA/DNA,every 10 5 One copy of each non-target RNA/DNA to every 10 copies of each non-target RNA/DNA, every 10 copies 5 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 5 One copy of non-target RNA/DNA to every 10 3 One copy of non-target RNA/DNA, or every 10 5 One copy of non-target RNA/DNA to every 10 4 One copy of non-target RNA/DNA).
In some embodiments, the methods of the present disclosure can detect target RNA/DNA present in a sample, wherein the target RNA/DNA is present at every 10 18 One copy of each non-target RNA/DNA to every 10 copies of non-target RNA/DNA (e.g., every 10 copies of non-target RNA/DNA 18 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 15 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 12 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 9 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 3 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 4 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 5 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 6 One copy of non-target RNA/DNA per 10 6 One copy of each non-target RNA/DNA to every 10 copies of each non-target RNA/DNA, every 10 copies 6 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 6 One copy of non-target RNA/DNA to every 10 3 One copy of non-target RNA/DNA per 10 6 One copy of non-target RNA/DNA to every 10 4 One copy of non-target RNA/DNA per 10 6 One copy of non-target RNA/DNA to every 10 5 One copy of non-target RNA/DNA per 10 5 One copy of each non-target RNA/DNA to one copy of each 10 non-target RNA/DNA, each10 5 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 5 One copy of non-target RNA/DNA to every 10 3 One copy of non-target RNA/DNA, or every 10 5 One copy of non-target RNA/DNA to every 10 4 One copy of non-target RNA/DNA).
In some embodiments, the methods of the present disclosure can detect target RNA/DNA present in a sample, wherein the target RNA/DNA is present at every 10 7 One copy of each non-target RNA/DNA to every 100 non-target RNA/DNA copies (e.g., every 10 copies 7 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 3 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 4 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 5 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 6 One copy of non-target RNA/DNA per 10 7 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 6 One copy of each non-target RNA/DNA to every 100 copies of each non-target RNA/DNA, every 10 copies 6 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 6 One copy of non-target RNA/DNA to every 10 3 One copy of non-target RNA/DNA per 10 6 One copy of non-target RNA/DNA to every 10 4 One copy of non-target RNA/DNA per 10 6 One copy of non-target RNA/DNA to every 10 5 One copy of non-target RNA/DNA per 10 5 One copy of each non-target RNA/DNA to every 100 copies of each non-target RNA/DNA, every 10 copies 5 One copy of non-target RNA/DNA to every 10 2 One copy of non-target RNA/DNA per 10 5 One copy of non-target RNA/DNA to every 10 3 One copy of non-target RNA/DNA, or every 10 5 One copy of non-target RNA/DNA to every 10 4 One copy of non-target RNA/DNA).
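The relative-abundance ranges above are all stated as "one target copy per N non-target copies." A minimal sketch of computing N from copy counts, and of checking it against such a range, is shown below. The function names and the example counts (30 target copies among 3,000,000 non-target copies) are hypothetical.

```python
def one_copy_per(target_copies, non_target_copies):
    """Relative abundance expressed as 'one target copy per N non-target copies'."""
    if target_copies <= 0:
        raise ValueError("target must be present")
    return non_target_copies / target_copies

def in_range(target_copies, non_target_copies, rarest_n, most_abundant_n):
    """True if N lies within a range such as 'one copy per 10**7 to one copy per 10'.

    rarest_n is the larger N (rarer target), most_abundant_n the smaller N.
    """
    n = one_copy_per(target_copies, non_target_copies)
    return most_abundant_n <= n <= rarest_n

# Hypothetical sample: 30 target copies among 3,000,000 non-target copies.
print(one_copy_per(30, 3_000_000))
print(in_range(30, 3_000_000, 10**7, 10))
print(in_range(30, 3_000_000, 10**7, 10**6))
```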
In some embodiments, for the methods of the invention for detecting target RNA/DNA in a sample, the detection threshold is 10nM or less. The term "detection threshold" as used herein describes the minimum amount of target RNA/DNA that must be present in a sample in order for detection to occur. Thus, as an illustrative example, when the detection threshold is 10nM, then a signal can be detected when the target RNA/DNA is present in the sample at a concentration of 10nM or higher. In some embodiments, the methods of the present disclosure have a detection threshold of 5nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 1nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.5nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.1nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.05nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.01nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.005nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.001nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.0005nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.0001nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.00005nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 0.00001nM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 10pM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 1pM or less. 
In some embodiments, the methods of the present disclosure have a detection threshold of 500fM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 250fM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 100fM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 50fM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 500aM (attomole/liter) or less. In some embodiments, the methods of the present disclosure have a detection threshold of 250aM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 100aM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 50aM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 10aM or less. In some embodiments, the methods of the present disclosure have a detection threshold of 1aM or less.
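The molar detection thresholds above can be made more concrete by converting them to molecule counts: copies/µL = C (mol/L) × N_A × 10⁻⁶ L/µL, where N_A is the Avogadro constant. A minimal sketch follows. The unit table covers only the prefixes used in this section, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # 1/mol, exact SI value
PREFIX = {"nM": 1e-9, "pM": 1e-12, "fM": 1e-15, "aM": 1e-18}

def copies_per_microliter(value, unit):
    """Convert a molar concentration (e.g., 500 fM) to molecules per uL."""
    molar = value * PREFIX[unit]          # mol per L
    return molar * AVOGADRO * 1e-6        # molecules per uL (1 uL = 1e-6 L)

def copies_in_reaction(value, unit, volume_ul):
    """Expected number of target molecules in a reaction of `volume_ul` uL."""
    return copies_per_microliter(value, unit) * volume_ul

for value, unit in [(10, "nM"), (1, "pM"), (500, "fM"), (1, "aM")]:
    print(value, unit, f"{copies_per_microliter(value, unit):.3g} copies/uL")
```

At 1 aM there are only about 0.6 molecules per microliter, so a hypothetical 10 µL reaction contains on average about 6 target molecules. Attomolar thresholds therefore approach single-molecule detection in typical reaction volumes.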
In some embodiments, the detection threshold (for detection of target RNA and/or DNA in the methods of the invention) is in the range of 500fM to 1nM (e.g., 500fM to 500pM, 500fM to 200pM, 500fM to 100pM, 500fM to 10pM, 500fM to 1pM, 800fM to 1nM, 800fM to 500pM, 800fM to 200pM, 800fM to 100pM, 800fM to 10pM, 800fM to 1pM, 1pM to 1nM, 1pM to 500pM, 1pM to 200pM, 1pM to 100pM, or 1pM to 10 pM) (where concentration refers to a threshold concentration of target RNA/DNA that can detect target RNA/DNA). In some embodiments, the methods of the present disclosure have a detection threshold ranging from 800fM to 100 pM. In some embodiments, the methods of the present disclosure have detection thresholds ranging from 1pM to 10 pM. In some embodiments, the methods of the present disclosure have a detection threshold ranging from 10fM to 500fM, for example, 10fM to 50fM, 50fM to 100fM, 100fM to 250fM, or 250fM to 500fM.
In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range from 500fM to 1nM (e.g., 500fM to 500pM, 500fM to 200pM, 500fM to 100pM, 500fM to 10pM, 500fM to 1pM, 800fM to 1nM, 800fM to 500pM, 800fM to 200pM, 800fM to 100pM, 800fM to 10pM, 800fM to 1pM, 1pM to 1nM, 1pM to 500pM, 1pM to 200pM, 1pM to 100pM, or 1pM to 10 pM). In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 800fM to 100 pM. In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 1pM to 10 pM.
In some embodiments, the detection threshold (for detection of target RNA/DNA in the methods of the invention) is in the range from 1aM to 1nM (e.g., 1aM to 500pM, 1aM to 200pM, 1aM to 100pM, 1aM to 10pM, 1aM to 1pM, 100aM to 1nM, 100aM to 500pM, 100aM to 200pM, 100aM to 100pM, 100aM to 10pM, 100aM to 1pM, 250aM to 1nM, 250aM to 500pM, 250aM to 200pM, 250aM to 100pM, 250aM to 10pM, 250aM to 1pM, 500aM to 1nM, 500aM to 500pM, 500aM to 200pM, 500aM to 100pM, 750aM to 1nM, 750aM to 500pM, 750aM to 200pM, 750aM to 100pM, 750aM to 10pM, 750aM to 1pM, 1fM to 1nM, 1fM to 500pM, 1fM to 200pM, 1fM to 100pM, 1fM to 10pM, 1fM to 1pM, 500fM to 500pM, 500fM to 200pM, 500fM to 100pM, 500fM to 10pM, 500fM to 1pM, 800fM to 1nM, 800fM to 500pM, 800fM to 200pM, 800fM to 100pM, 800fM to 10pM, 800fM to 1pM, 1pM to 1nM, 1pM to 500pM, 1pM to 200pM, 1pM to 100pM, or 1pM to 10 pM) (where concentration refers to the threshold concentration of target RNA/DNA at which target RNA/DNA can be detected). In some embodiments, the methods of the present disclosure have a detection threshold ranging from 1aM to 800 aM. In some embodiments, the methods of the present disclosure have a detection threshold ranging from 50aM to 1 pM. In some embodiments, the methods of the present disclosure have a detection threshold ranging from 50aM to 500 fM.
In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range from 1aM to 1nM (e.g., 1aM to 500pM, 1aM to 200pM, 1aM to 100pM, 1aM to 10pM, 1aM to 1pM, 100aM to 1nM, 100aM to 500pM, 100aM to 200pM, 100aM to 100pM, 100aM to 10pM, 100aM to 1pM, 250aM to 1nM, 250aM to 500pM, 250aM to 200pM, 250aM to 100pM, 250aM to 10pM, 250aM to 1pM, 500aM to 1nM, 500aM to 500pM, 500aM to 200pM, 500aM to 100pM, 500aM to 10pM, 500aM to 1pM, 750aM to 1nM, 750aM to 500pM, 750aM to 200pM, 750aM to 100pM, 750aM to 10pM, 750aM to 1pM, 1fM to 1nM, 1fM to 500pM, 1fM to 200pM, 1fM to 100pM, 1fM to 10pM, 1fM to 1pM, 500fM to 500pM, 500fM to 200pM, 500fM to 100pM, 500fM to 10pM, 500fM to 1pM, 800fM to 1nM, 800fM to 500pM, 800fM to 200pM, 800fM to 100pM, 800fM to 10pM, 800fM to 1pM, 1pM to 1nM, 1pM to 500pM, 1pM to 200pM, 1pM to 100pM, or 1pM to 10 pM). In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 1aM to 500 pM. In some embodiments, the minimum concentration at which target RNA/DNA can be detected in the sample is in the range of 100aM to 500 pM.
In some embodiments, the disclosed compositions or methods exhibit attomolar (aM) detection sensitivity. In some embodiments, the disclosed compositions or methods exhibit femtomolar (fM) detection sensitivity. In some embodiments, the disclosed compositions or methods exhibit picomolar (pM) detection sensitivity. In some embodiments, the disclosed compositions or methods exhibit nanomolar (nM) detection sensitivity.
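To put the attomolar-to-nanomolar thresholds above in perspective, the following Python sketch converts a molar concentration into an approximate number of molecules per microliter using Avogadro's constant. This is generic unit arithmetic offered as an illustration, not a procedure from the disclosure; the function and dictionary names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

# SI prefixes for the molar units used in the text
PREFIX = {"aM": 1e-18, "fM": 1e-15, "pM": 1e-12, "nM": 1e-9}

def copies_per_microliter(value: float, unit: str) -> float:
    """Convert a molar concentration (e.g. 1 aM) into molecules per microliter."""
    molar = value * PREFIX[unit]     # mol/L
    per_liter = molar * AVOGADRO     # molecules/L
    return per_liter * 1e-6          # 1 µL = 1e-6 L

for v, u in [(1, "aM"), (100, "aM"), (1, "fM"), (1, "pM"), (1, "nM")]:
    print(f"{v} {u} ≈ {copies_per_microliter(v, u):.3g} copies/µL")
```

By this arithmetic, 1aM corresponds to roughly 0.6 copies per microliter, so at the lower end of the stated ranges a typical microliter-scale reaction may contain only a handful of target molecules.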
The disclosed samples include sample nucleic acids (e.g., more than one sample nucleic acid). The term "more than one" is used herein to mean two or more. Thus, in some embodiments, the sample comprises two or more (e.g., 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more) sample nucleic acids (e.g., RNAs). The disclosed methods can be used as very sensitive methods for detecting the presence of a target nucleic acid in a sample (e.g., in a complex mixture of nucleic acids such as RNA). In some embodiments, the sample comprises 5 or more DNAs (e.g., 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more DNAs) that differ in sequence from one another. In some embodiments, the sample comprises 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 10³ or more, 5×10³ or more, 10⁴ or more, 5×10⁴ or more, 10⁵ or more, 5×10⁵ or more, 10⁶ or more, 5×10⁶ or more, or 10⁷ or more DNAs. In some embodiments, the sample comprises from 10 to 20, from 20 to 50, from 50 to 100, from 100 to 500, from 500 to 10³, from 10³ to 5×10³, from 5×10³ to 10⁴, from 10⁴ to 5×10⁴, from 5×10⁴ to 10⁵, from 10⁵ to 5×10⁵, from 5×10⁵ to 10⁶, from 10⁶ to 5×10⁶, or from 5×10⁶ to 10⁷, or more than 10⁷, DNAs. In some embodiments, the sample comprises from 5 to 10⁷ RNAs (e.g., RNAs that differ from each other in sequence) (e.g., from 5 to 10⁶, from 5 to 10⁵, from 5 to 50,000, from 5 to 30,000, from 10 to 10⁶, from 10 to 10⁵, from 10 to 50,000, from 10 to 30,000, from 20 to 10⁶, from 20 to 10⁵, from 20 to 50,000, or from 20 to 30,000 RNAs). In some embodiments, the sample comprises 20 or more RNAs that differ in sequence from one another.
In some embodiments, the sample comprises RNA from a cell lysate (e.g., eukaryotic cell lysate, mammalian cell lysate, human cell lysate, prokaryotic cell lysate, plant cell lysate, etc.). For example, in some embodiments, the sample comprises DNA from a cell, such as a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell.
The term "sample" as used herein shall have its ordinary meaning and shall include any sample comprising RNA and/or DNA (e.g., to determine whether a target DNA and/or target RNA is present in a population of RNA and/or DNA). The sample may be derived from any source; e.g., the sample may be a synthetic combination of purified DNA and/or RNA, a cell lysate, a DNA/RNA-enriched cell lysate, or DNA/RNA isolated and/or purified from a cell lysate. The sample may be from a patient (e.g., for diagnostic purposes). The sample may be from permeabilized cells. The sample may be from crosslinked cells. The sample may be a tissue slice. The sample may be from a tissue prepared by cross-linking followed by delipidation and treatment to achieve a uniform refractive index.
Suitable samples include, but are not limited to, saliva, blood, serum, plasma, urine, aspirate, and biopsy samples. When the sample is from a patient, the term "sample" encompasses blood and other liquid samples of biological origin, and solid tissue samples such as biopsy samples or tissue cultures or cells derived therefrom and their progeny. The definition also includes samples that have been manipulated in any way after they are obtained, such as by treatment with reagents, washing, or enrichment for certain cell populations, such as cancer cells. The definition also includes samples that have been enriched for a particular type of molecule (e.g., RNA). The term "sample" encompasses biological samples, such as clinical samples (e.g., blood, plasma, serum, aspirate, and cerebrospinal fluid (CSF)), and also includes tissue obtained by surgical excision, tissue obtained by biopsy, cells in culture, cell supernatants, cell lysates, tissue samples, organs, bone marrow, and the like. A "biological sample" includes biological fluids derived therefrom (e.g., from cancerous cells, infected cells, etc.), such as RNA-containing samples obtained from such cells (e.g., cell lysates or other cell extracts containing RNA).
In some embodiments, the source of the sample is (or is suspected of being) a diseased cell, fluid, tissue, or organ. In some embodiments, the source of the sample is normal (non-diseased) cells, fluids, tissues, or organs. In some embodiments, the source of the sample is (or is suspected of being) a pathogen-infected cell, tissue, or organ. For example, the source of the sample may be an individual who may or may not be infected, and the sample may be any biological sample collected from the individual (e.g., blood, saliva, biopsy, plasma, serum, bronchoalveolar lavage, sputum, fecal specimen, cerebrospinal fluid, fine needle aspirate, swab sample (e.g., oral swab, cervical swab, nasal swab), interstitial fluid, synovial fluid, nasal discharge, tears, buffy coat, mucosal sample, epithelial cell sample (e.g., epithelial scraping), etc.). In some embodiments, the sample is a cell-free liquid sample. In some embodiments, the sample is a liquid sample that may comprise cells. Pathogens include viruses, fungi, helminths, protozoa, malaria parasites, Plasmodium parasites, Toxoplasma parasites, Schistosoma parasites, and the like. "Helminths" include roundworms, heartworms, and phytophagous nematodes (Nematoda), trematodes (Trematoda), acanthocephalans (Acanthocephala), and cestodes (Cestoda). Protozoal infections include Giardia spp. infections, Trichomonas spp. infections, trypanosomiasis, amebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria, and toxoplasmosis. Examples of pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, Plasmodium vivax, Trypanosoma cruzi, and Toxoplasma gondii.
Fungal pathogens include, but are not limited to: Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, and Candida albicans. Pathogenic viruses include, for example, immunodeficiency viruses (e.g., HIV); influenza virus; dengue virus; West Nile virus; herpes virus; yellow fever virus; hepatitis C virus; hepatitis A virus; hepatitis B virus; papillomavirus; and the like. Pathogenic viruses may include DNA viruses, for example: papovaviruses (e.g., human papillomavirus (HPV), polyomavirus); Hepadnaviridae (e.g., hepatitis B virus (HBV)); herpesviruses (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis rosea, Kaposi sarcoma-associated herpesvirus); adenoviruses (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); poxviruses (e.g., smallpox, vaccinia virus, monkeypox virus, orf virus, pseudocowpox, bovine papular stomatitis virus, tanapox virus, Yaba monkey tumor virus, molluscum contagiosum virus (MCV)); parvoviruses (e.g., adeno-associated virus (AAV), parvovirus B19, human bocavirus, bufavirus, human parvovirus 4 G1); Geminiviridae; Nanoviridae; Phycodnaviridae; and the like.
Pathogens may include, for example, DNA viruses [e.g.: papovaviruses (e.g., human papillomavirus (HPV), polyomavirus); Hepadnaviridae (e.g., hepatitis B virus (HBV)); herpesviruses (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis rosea, Kaposi sarcoma-associated herpesvirus); adenoviruses (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); poxviruses (e.g., smallpox, vaccinia virus, monkeypox virus, orf virus, pseudocowpox, bovine papular stomatitis virus, tanapox virus, Yaba monkey tumor virus, molluscum contagiosum virus (MCV)); parvoviruses (e.g., adeno-associated virus (AAV), parvovirus B19, human bocavirus, bufavirus, human parvovirus 4 G1); Geminiviridae; Nanoviridae; Phycodnaviridae; and the like], Mycobacterium tuberculosis, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Legionella pneumophila, Streptococcus pyogenes, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Pneumococcus, Cryptococcus neoformans, Histoplasma capsulatum, Haemophilus influenzae type B, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvovirus, respiratory syncytial virus, varicella zoster virus, hepatitis B virus, hepatitis C virus, measles virus, adenovirus, T-cell leukemia virus, Epstein-Barr virus, murine leukemia virus, mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, bluetongue virus, Sendai virus, feline leukemia virus, reovirus, poliovirus, simian virus 40, mouse mammary tumor virus, dengue virus, rubella virus, West Nile virus, Plasmodium falciparum, Plasmodium vivax, Toxoplasma gondii, Trypanosoma rangeli, Trypanosoma cruzi, Trypanosoma rhodesiense, Trypanosoma brucei, Schistosoma mansoni, Schistosoma japonicum, Babesia bovis, Eimeria tenella, Onchocerca volvulus, Leishmania tropica, Mycobacterium tuberculosis, Trichinella spiralis, Theileria parva, Taenia hydatigena, Taenia ovis, Taenia saginata, Echinococcus granulosus, Mesocestoides corti, Mycoplasma arthritidis, M. hyorhinis, M. orale, M. arginini, Acholeplasma laidlawii, M. salivarium, and M. pneumoniae. Pathogenic viruses may include one or more of SARS-CoV-2, influenza A, influenza B, and/or influenza C.
The sample may be a biological sample, such as a clinical sample. In some embodiments, the sample is taken from a biological source, such as the vagina, urethra, penis, anus, throat, or cervix, or from fermentation broths, cell cultures, and the like. The sample may include, for example, fluid and cells from a fecal sample. The biological sample may be used (1) as obtained from a subject or source or (2) after pretreatment to modify the characteristics of the sample. Thus, the test sample may be pretreated prior to use, for example, by disrupting cells or virus particles, preparing a liquid from a solid material, diluting a viscous fluid, filtering a liquid, concentrating a liquid, inactivating interfering components, adding reagents, purifying nucleic acids, and the like. Accordingly, a "biological sample" as used herein includes nucleic acids (DNA, RNA, or total nucleic acids) extracted from a clinical or biological sample. Sample preparation may also include the use of solutions containing buffers, salts, detergents, and/or the like to prepare the sample for analysis. In some embodiments, the sample is processed prior to molecular testing. In some embodiments, the sample is analyzed directly, without pretreatment prior to testing. The sample may be, for example, a fecal sample. In some embodiments, the sample is a fecal sample from a patient with clinical symptoms of acute gastroenteritis.
In some embodiments, the sample to be tested is treated prior to performing the methods disclosed herein. For example, in some embodiments, the sample may be isolated, concentrated, or subjected to various other processing steps prior to performing the methods disclosed herein. For example, in some embodiments, the sample may be treated to isolate nucleic acids from the sample prior to contacting the sample with the oligonucleotides, as disclosed herein. In some embodiments, the methods disclosed herein are performed on a sample without in vitro culturing the sample. In some embodiments, the sample is subjected to the methods disclosed herein without isolating nucleic acids from the sample prior to contacting the sample with the oligonucleotides disclosed herein.
The sample may comprise one or more nucleic acids (e.g., more than one nucleic acid). The term "more than one" as used herein may refer to two or more. Thus, in some embodiments, a sample comprises two or more (e.g., 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more) nucleic acids (e.g., gDNA, mRNA). The disclosed methods can be used as very sensitive methods for detecting the presence of a target nucleic acid in a sample (e.g., in a complex mixture of nucleic acids such as gDNA). In some embodiments, the sample comprises 5 or more nucleic acids (e.g., 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more nucleic acids) that differ in sequence from one another. In some embodiments, the sample comprises 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 10³ or more, 5×10³ or more, 10⁴ or more, 5×10⁴ or more, 10⁵ or more, 5×10⁵ or more, 10⁶ or more, 5×10⁶ or more, or 10⁷ or more nucleic acids.
In some embodiments, the sample comprises from 10 to 20, from 20 to 50, from 50 to 100, from 100 to 500, from 500 to 10³, from 10³ to 5×10³, from 5×10³ to 10⁴, from 10⁴ to 5×10⁴, from 5×10⁴ to 10⁵, from 10⁵ to 5×10⁵, from 5×10⁵ to 10⁶, from 10⁶ to 5×10⁶, or from 5×10⁶ to 10⁷, or more than 10⁷, nucleic acids. In some embodiments, the sample comprises from 5 to 10⁷ nucleic acids (e.g., nucleic acids that differ from each other in sequence) (e.g., from 5 to 10⁶, from 5 to 10⁵, from 5 to 50,000, from 5 to 30,000, from 10 to 10⁶, from 10 to 10⁵, from 10 to 50,000, from 10 to 30,000, from 20 to 10⁶, from 20 to 10⁵, from 20 to 50,000, or from 20 to 30,000 nucleic acids, or a number or range between any two of these values). In some embodiments, the sample comprises 20 or more nucleic acids that differ in sequence from one another.
The sample may be any sample comprising nucleic acid (e.g., to determine whether a target nucleic acid is present in a population of nucleic acids). The sample may be derived from any source; e.g., the sample may be a synthetic combination of purified nucleic acids, a cell lysate, a DNA-enriched cell lysate, or nucleic acids isolated and/or purified from a cell lysate. The sample may be from a patient (e.g., for diagnostic purposes). The sample may be from permeabilized cells. The sample may be from crosslinked cells. The sample may be a tissue slice. The sample may be from a tissue prepared by cross-linking followed by delipidation and treatment to achieve a uniform refractive index.
The sample may comprise a target nucleic acid and more than one non-target nucleic acid. In some embodiments, the target nucleic acid is present in the sample at one copy per 10 non-target nucleic acids, one copy per 20 non-target nucleic acids, one copy per 25 non-target nucleic acids, one copy per 50 non-target nucleic acids, one copy per 100 non-target nucleic acids, one copy per 500 non-target nucleic acids, one copy per 10³ non-target nucleic acids, one copy per 5×10³ non-target nucleic acids, one copy per 10⁴ non-target nucleic acids, one copy per 5×10⁴ non-target nucleic acids, one copy per 10⁵ non-target nucleic acids, one copy per 5×10⁵ non-target nucleic acids, one copy per 10⁶ non-target nucleic acids, or fewer than one copy per 10⁶ non-target nucleic acids, or a number or range between any two of these values. In some embodiments, the target nucleic acid is present in the sample at from one copy per 10 to one copy per 20 non-target nucleic acids, from one copy per 20 to one copy per 50 non-target nucleic acids, from one copy per 50 to one copy per 100 non-target nucleic acids, from one copy per 100 to one copy per 500 non-target nucleic acids, from one copy per 500 to one copy per 10³ non-target nucleic acids, from one copy per 10³ to one copy per 5×10³ non-target nucleic acids, from one copy per 5×10³ to one copy per 10⁴ non-target nucleic acids, from one copy per 10⁴ to one copy per 10⁵ non-target nucleic acids, from one copy per 10⁵ to one copy per 10⁶ non-target nucleic acids, or from one copy per 10⁶ to one copy per 10⁷ non-target nucleic acids, or a number or range between any two of these values.
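The abundance ratios above can be turned into expected target counts with simple arithmetic. The Python sketch below assumes that target and non-target molecules are well mixed, so that the number of target copies in a randomly drawn portion of the sample is Poisson-distributed; that sampling model and all function names are illustrative assumptions, not part of the disclosure.

```python
import math

def target_fraction(non_target_per_copy: float) -> float:
    """Fraction of molecules that are target at 1 target copy per N non-target molecules."""
    return 1.0 / (non_target_per_copy + 1.0)

def expected_target_copies(total_molecules: float, non_target_per_copy: float) -> float:
    """Mean number of target copies in a portion containing total_molecules molecules."""
    return total_molecules * target_fraction(non_target_per_copy)

def prob_at_least_one(mean_copies: float) -> float:
    """Poisson probability (assumed model) that a random portion holds >= 1 target copy."""
    return 1.0 - math.exp(-mean_copies)

# Example: 1 target copy per 10^6 non-target molecules, portion of 10^6 molecules
lam = expected_target_copies(1e6, 1e6)
print(round(lam, 3), round(prob_at_least_one(lam), 3))  # 1.0 0.632
```

Under this model, a portion containing on average one target copy still has about a 37% chance of containing none, which is one reason sensitivity at very low abundance depends on how much input material is used.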
Suitable samples include, but are not limited to, saliva, blood, serum, plasma, urine, aspirate, and biopsy samples. Thus, the term "sample" in relation to a patient encompasses blood and other liquid samples of biological origin, and solid tissue samples such as biopsy samples or tissue cultures or cells derived therefrom and their progeny. The definition also includes samples that have been manipulated in any way after they are obtained, such as by treatment with reagents, washing, or enrichment for certain cell populations, such as cancer cells. The definition also includes samples that have been enriched for a particular type of molecule (e.g., nucleic acid). The term "sample" encompasses biological samples, such as clinical samples (e.g., blood, plasma, serum, aspirate, and cerebrospinal fluid (CSF)), and also includes tissue obtained by surgical excision, tissue obtained by biopsy, cells in culture, cell supernatants, cell lysates, tissue samples, organs, bone marrow, and the like. A "biological sample" includes biological fluids derived therefrom (e.g., from cancerous cells, infected cells, etc.), such as nucleic acid-containing samples obtained from such cells (e.g., cell lysates or other cell extracts containing nucleic acids).
Suitable samples for use in the methods disclosed herein include any conventional biological sample obtained from an organism or portion thereof (such as plants, animals, bacteria, etc.). In certain embodiments, the biological sample is obtained from an animal subject, such as a human subject. A biological sample is any solid or fluid sample obtained from, excreted by, or secreted by any living organism, including, but not limited to, single-celled organisms such as bacteria, yeasts, protozoa, and amoebas, and multicellular organisms such as plants or animals, including samples from healthy or apparently healthy human subjects or from human patients affected by a condition or disease to be diagnosed or studied, such as infection by a pathogenic microorganism (e.g., a pathogenic bacterium or virus). For example, the biological sample may be a biological fluid such as blood, plasma, serum, urine, stool, sputum, mucus, lymph, synovial fluid, bile, ascites, pleural effusion, seroma, saliva, cerebrospinal fluid, aqueous humor, or vitreous humor; any bodily secretion, transudate, or exudate (e.g., fluid obtained from an abscess or any other site of infection or inflammation); fluid obtained from a joint (e.g., a normal joint, or a joint affected by a disease such as rheumatoid arthritis, osteoarthritis, gout, or septic arthritis); or a swab of a skin or mucosal surface.
The sample may also be a sample obtained from any organ or tissue (including biopsy or autopsy samples, such as tumor biopsies), or may include cells (whether primary or cultured) or media conditioned by any cell, tissue, or organ. Exemplary samples include, but are not limited to, cells, cell lysates, blood smears, cytocentrifuge preparations, cytology smears, bodily fluids (e.g., blood, plasma, serum, saliva, sputum, urine, bronchoalveolar lavage, semen, etc.), tissue biopsies (e.g., tumor biopsies), fine needle aspirates, and/or tissue sections (e.g., cryostat tissue sections and/or paraffin-embedded tissue sections). In other examples, the sample comprises circulating tumor cells (which can be identified by cell surface markers). In particular examples, the sample is used directly (e.g., fresh or frozen) or is manipulated prior to use, for example, by fixation (e.g., using formalin) and/or embedding in wax (such as formalin-fixed paraffin-embedded (FFPE) tissue samples). It will be appreciated that any method of obtaining tissue from a subject may be used, and that the choice of method will depend on a variety of factors, such as the type of tissue, the age of the subject, or the procedures available to the practitioner. Standard techniques for obtaining such samples are available in the art.
The sample may be an environmental sample, such as water, soil, or a surface, such as an industrial or medical surface.
Due to the increased sensitivity of the embodiments disclosed herein, in certain example embodiments, assays and methods may be run on crude samples or samples in which the target molecules to be detected are not further fractionated or purified from the sample.
Cells can be lysed to release target molecules (e.g., target dsDNA). Cell lysis may be accomplished by any of a variety of means, such as chemical or biochemical means, osmotic shock, or thermal, mechanical, or optical lysis. Cells can be lysed by adding a cell lysis buffer comprising a detergent (e.g., SDS, lithium dodecyl sulfate, Triton X-100, Tween-20, or NP-40), an organic solvent (e.g., methanol or acetone), or a digestive enzyme (e.g., proteinase K, pepsin, or trypsin), or any combination thereof. To increase association of the target with the barcode, the diffusion rate of the target molecule may be altered by, for example, reducing the temperature of the lysate and/or increasing the viscosity of the lysate.
In some embodiments, filter paper may be used to lyse the sample. The filter paper may be soaked with lysis buffer applied on top of it. The filter paper may be pressed against the sample, which may facilitate lysis of the sample and hybridization of targets from the sample to the substrate.
In some embodiments, lysis may be performed by mechanical lysis, thermal lysis, optical lysis, and/or chemical lysis. Chemical lysis may include the use of digestive enzymes such as proteinase K, pepsin, and trypsin. Lysis may be performed by adding a lysis buffer to the substrate. The lysis buffer may comprise Tris HCl. The lysis buffer may comprise at least about 0.01M, 0.05M, 0.1M, 0.5M, or 1M or more Tris HCl. The lysis buffer may comprise up to about 0.01M, 0.05M, 0.1M, 0.5M, or 1M or more Tris HCl. The lysis buffer may comprise about 0.1M Tris HCl. The pH of the lysis buffer may be at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or higher. The pH of the lysis buffer may be up to about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or higher. In some embodiments, the pH of the lysis buffer is about 7.5. The lysis buffer may comprise a salt (e.g., LiCl). The salt concentration in the lysis buffer may be at least about 0.1M, 0.5M, or 1M or higher. The salt concentration in the lysis buffer may be up to about 0.1M, 0.5M, or 1M or higher. In some embodiments, the concentration of salt in the lysis buffer is about 0.5M. The lysis buffer may comprise a detergent (e.g., SDS, lithium dodecyl sulfate, Triton X, Tween, NP-40). The detergent concentration in the lysis buffer may be at least about 0.0001%, 0.0005%, 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, or 7% or more. The detergent concentration in the lysis buffer may be up to about 0.0001%, 0.0005%, 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, or 7% or more. In some embodiments, the detergent concentration in the lysis buffer is about 1% lithium dodecyl sulfate. The time used for lysis may depend on the amount of detergent used. In some embodiments, the more detergent used, the less time is required for lysis. The lysis buffer may comprise a chelating agent (e.g., EDTA, EGTA).
The chelating agent concentration in the lysis buffer may be at least about 1mM, 5mM, 10mM, 15mM, 20mM, 25mM, or 30mM or more. The chelating agent concentration in the lysis buffer may be up to about 1mM, 5mM, 10mM, 15mM, 20mM, 25mM, or 30mM or more. In some embodiments, the concentration of chelating agent in the lysis buffer is about 10mM. The lysis buffer may contain a reducing agent (e.g., beta-mercaptoethanol, DTT). The concentration of reducing agent in the lysis buffer may be at least about 1mM, 5mM, 10mM, 15mM, or 20mM or more. The concentration of reducing agent in the lysis buffer may be up to about 1mM, 5mM, 10mM, 15mM, or 20mM or more. In some embodiments, the concentration of reducing agent in the lysis buffer is about 5mM. In some embodiments, the lysis buffer may comprise about 0.1M Tris HCl (about pH 7.5), about 0.5M LiCl, about 1% lithium dodecyl sulfate, about 10mM EDTA, and about 5mM DTT.
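Preparing the example lysis buffer from concentrated stocks is a C1V1 = C2V2 calculation. The Python sketch below uses the final concentrations stated above; the stock concentrations (1M Tris HCl, 8M LiCl, 10% lithium dodecyl sulfate, 0.5M EDTA, 1M DTT) and the 50 mL batch size are assumptions chosen for illustration, as are the class and function names.

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    stock: float  # stock concentration (assumed)
    final: float  # final concentration from the text, same unit as stock
    unit: str

def stock_volumes(components: list[Component], total_ml: float) -> dict[str, float]:
    """Volume (mL) of each stock from C1*V1 = C2*V2, then water to the final volume."""
    volumes = {}
    for c in components:
        if c.final > c.stock:
            raise ValueError(f"{c.name}: final concentration exceeds stock")
        volumes[c.name] = c.final * total_ml / c.stock
    volumes["water"] = total_ml - sum(volumes.values())
    return volumes

# Final concentrations follow the example buffer; stock concentrations are assumptions.
recipe = [
    Component("Tris-HCl, pH 7.5", stock=1.0, final=0.1, unit="M"),
    Component("LiCl", stock=8.0, final=0.5, unit="M"),
    Component("lithium dodecyl sulfate", stock=10.0, final=1.0, unit="% (assumed w/v)"),
    Component("EDTA", stock=500.0, final=10.0, unit="mM"),
    Component("DTT", stock=1000.0, final=5.0, unit="mM"),
]

for name, ml in stock_volumes(recipe, total_ml=50.0).items():
    print(f"{name}: {ml:.3f} mL")
```

With these assumed stocks, 50 mL of buffer needs 5.000 mL Tris HCl, 3.125 mL LiCl, 5.000 mL detergent, 1.000 mL EDTA, 0.250 mL DTT, and 35.625 mL water.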
Lysis may be carried out at a temperature of about 4 ℃, 10 ℃, 15 ℃, 20 ℃, 25 ℃, or 30 ℃. Lysis may be performed for about 1 minute, 5 minutes, 10 minutes, 15 minutes, or 20 minutes or more. Lysed cells may include at least about 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, or 700,000 or more target nucleic acid molecules. Lysed cells may include up to about 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, or 700,000 or more target nucleic acid molecules.
Kit for detecting a substance in a sample
Kits described herein may comprise more than one protein complex. In some embodiments, each of the more than one protein complexes comprises a transposome and a programmable DNA binding unit capable of specifically binding to a binding site on a target double-stranded DNA (dsDNA). In some embodiments, the transposome comprises a transposase, a first adaptor, and a second adaptor. In some embodiments, the binding sites of the more than one protein complexes are different from each other. In some embodiments, the kit comprises at least one component that provides real-time detection activity for nucleic acid amplification products. Real-time detection activity may be provided by molecular beacons. The dried composition may comprise reverse transcriptase and/or reverse transcription primers.
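As a hedged illustration of the kit composition described above, the Python sketch below models each protein complex as a transposome (a transposase plus first and second adaptors) paired with a programmable DNA binding unit, and checks that no two complexes in a kit recognize the same binding site. The class names, field names, and placeholder strings are hypothetical and serve only to make the structure concrete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transposome:
    transposase: str   # e.g. "Tn5"
    adaptor_1: str     # first adaptor (placeholder label here)
    adaptor_2: str     # second adaptor (placeholder label here)

@dataclass(frozen=True)
class ProteinComplex:
    transposome: Transposome
    binding_unit: str  # programmable DNA binding unit, e.g. a dCas9:sgRNA pair (illustrative)
    binding_site: str  # site on the target dsDNA recognized by the binding unit

def binding_sites_distinct(complexes: list[ProteinComplex]) -> bool:
    """True if every complex in the kit recognizes a different binding site."""
    sites = [c.binding_site.upper() for c in complexes]
    return len(sites) == len(set(sites))

# Hypothetical kit: two complexes sharing one transposome design, different sites
tn5 = Transposome("Tn5", "ADAPTOR_A", "ADAPTOR_B")
kit = [
    ProteinComplex(tn5, "dCas9:sgRNA-1", "SITE_ONE"),
    ProteinComplex(tn5, "dCas9:sgRNA-2", "SITE_TWO"),
]
print(binding_sites_distinct(kit))  # True
```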
The kit may comprise, for example, one or more polymerases and one or more primers, and optionally one or more reverse transcriptases and/or reverse transcription primers, as described herein. When a target is amplified, a pair of primers (forward and reverse) may be included in the kit. In the case of amplifying more than one target sequence, more than one primer pair may be included in the kit. The kit may comprise control polynucleotides and in the case of amplifying more than one target sequence, more than one control polynucleotide may be included in the kit.
The kit may also contain one or more components in any number of separate vessels, chambers, containers, packets, tubes, vials, microtiter plates, etc., or the components may be combined in various combinations in such a container. For example, the components of the kit may be present in one or more containers. In some embodiments, all components are provided in one container. In some embodiments, the enzyme (e.g., polymerase and/or reverse transcriptase) may be provided in a separate container from the primer. The components may be lyophilized, heat dried, freeze dried or in a stabilizing buffer, for example. In some embodiments, the polymerase and/or reverse transcriptase are present in lyophilized or heat-dried form in a single container, and the primer is lyophilized, heat-dried, freeze-dried, or present in a buffer in a different container. In some embodiments, the polymerase and/or reverse transcriptase and the primer are in a single container in lyophilized form or in heat dried form.
The kit may also comprise, for example, dNTPs used in the reaction, or modified nucleotides, vessels, cuvettes, or other containers for the reaction, or vials of water or buffer for rehydrating lyophilized or heat-dried components. For example, the buffers used may be suitable for both polymerase activity and primer annealing.
The kit may also comprise instructions for performing one or more methods described herein and/or descriptions of one or more components described herein. The instructions and/or descriptions may be in printed form and may be contained in a kit insert. The kit may also contain a written description of the internet location providing such instructions or descriptions.
The kit may further comprise reagents for detection methods, such as reagents for FRET, lateral flow devices, test strips, fluorescent dyes, colloidal gold particles, latex particles, molecular beacons or polystyrene beads.
FIGS. 1, 3, 4, 5A-5F, and 7A-7H of the present disclosure were created with BioRender.
Examples
Some aspects of the embodiments discussed above are disclosed in more detail in the following examples, which are not intended to limit the scope of the disclosure in any way.
Example 1
Design and validation of fusion proteins and guide RNAs (sgRNAs)
Four constructs were designed for the production of fusion proteins: dCAS9-Fl26-Tn5, dCAS9-xTen-Tn5, Tn5-Fl26-dCAS9, and Tn5-xTen-dCAS9 (see, e.g., FIGS. 8-10). These constructs have either the dCas9 or the Tn5 sequence at the N-terminus of the fusion protein, with the two domains separated by an Fl26 linker or an xTen linker. In some embodiments, the plasmid design is based on the following: Chen, S.P. & Wang, H.H. (2019). An Engineered Cas-Transposon System for Programmable and Site-Directed DNA Transpositions. The CRISPR Journal 2(6). DOI: 10.1089/crispr.2019.0030; and Picelli, S., Björklund, Å.K., Reinius, B., Sagasser, S., Winberg, G., & Sandberg, R. (2014). Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Research 24:2033-2040.
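The four constructs are the full combination of two choices: which domain sits at the N-terminus, and which linker joins the two domains. The short Python sketch below enumerates that 2 × 2 design space; the names mirror the construct labels in the text, while the function itself is only an illustration.

```python
from itertools import product

DOMAINS = ("dCAS9", "Tn5")   # the two fused proteins
LINKERS = ("Fl26", "xTen")   # the two linkers named in the text

def fusion_constructs() -> list[str]:
    """Enumerate every N-terminal domain / linker combination as 'N-linker-C'."""
    names = []
    for n_term, linker in product(DOMAINS, LINKERS):
        # The C-terminal domain is whichever domain is not at the N-terminus
        c_term = DOMAINS[1] if n_term == DOMAINS[0] else DOMAINS[0]
        names.append(f"{n_term}-{linker}-{c_term}")
    return names

print(fusion_constructs())
# ['dCAS9-Fl26-Tn5', 'dCAS9-xTen-Tn5', 'Tn5-Fl26-dCAS9', 'Tn5-xTen-dCAS9']
```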
sgRNA design
sgRNAs targeting the Salmonella enterica InvA and FliC genes were designed, using sequences from Salmonella enterica strain ATCC 13311 and a design tool from Integrated DNA Technologies (IDT) (Table 1). The relative positions of the sgRNAs within the InvA and FliC genes are shown in FIGS. 11 and 12, respectively.
Table 1: Salmonella enterica sgRNAs
Fragments of 264bp, 8bp, 148bp, 292bp, 458bp and 195bp were predicted for InvA. Fragments of about 130bp, 82bp and 232bp are expected for FliC.
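Predicted fragment sizes like those above follow from where Cas9 cuts within the target region. The Python sketch below assumes SpCas9 with an NGG PAM, which cuts 3 bp upstream of the PAM and leaves a blunt end, and for simplicity scans only the forward strand (a complete tool would also search the reverse complement). The sequence and 20-nt protospacer are toy placeholders, not the Table 1 guides, and the function names are hypothetical.

```python
import re

def find_cut_sites(seq: str, protospacer: str) -> list[int]:
    """Blunt-cut coordinates for a protospacer followed by an NGG PAM (forward strand only)."""
    seq, protospacer = seq.upper(), protospacer.upper()
    sites = []
    # Zero-width lookahead so that overlapping matches are also reported
    for m in re.finditer(f"(?={re.escape(protospacer)}[ACGT]GG)", seq):
        # SpCas9 cuts 3 bp upstream (5') of the PAM
        sites.append(m.start() + len(protospacer) - 3)
    return sites

def fragment_lengths(seq_len: int, cuts: list[int]) -> list[int]:
    """Lengths of the pieces produced by cutting a linear sequence at the given coordinates."""
    bounds = [0] + sorted(set(cuts)) + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Toy 100-bp target: 30 bp, a 20-nt protospacer, a TGG PAM, then 47 bp
guide = "GATTACAGATTACAGATTAC"
seq = "A" * 30 + guide + "TGG" + "C" * 47
cuts = find_cut_sites(seq, guide)
print(cuts, fragment_lengths(len(seq), cuts))  # [47] [47, 53]
```

With several guides, the cut coordinates from each guide are pooled before calling `fragment_lengths`, which is how a multi-guide design yields a set of predicted fragments.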
Verification of Salmonella enterica sgRNA
To verify the specificity of the sgRNAs, genomic samples were cleaved with Cas9. Adaptors were ligated to the Cas9-cleaved DNA, and the PCR-amplified fragments were analyzed on a Bioanalyzer.
Table 2: Bioanalyzer analysis of sgRNA activity
FIG. 13 and Table 2 show that cleavage of gDNA is specific, yielding fragments of the expected sizes (compare the "Bioanalyzer expected size [bp]" column with the "Actual [bp]" column in Table 2), thus demonstrating that the guide RNAs for Salmonella enterica are functional.
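Comparing expected and observed sizes, as done with Table 2, can be expressed as matching each expected fragment to the nearest observed peak. In the Python sketch below, the 10% relative sizing tolerance and the `offset` for bases added by adaptor ligation and PCR are assumptions, and the peak list is hypothetical rather than data from Table 2.

```python
def match_peaks(expected, observed, rel_tol=0.10, offset=0):
    """Pair each expected fragment size with the closest observed peak.

    offset: bases added by adaptor ligation/PCR (assumed known).
    rel_tol: assumed relative sizing tolerance.
    Returns {expected_size: matched_peak or None}.
    """
    matches = {}
    for size in expected:
        target = size + offset
        closest = min(observed, key=lambda peak: abs(peak - target), default=None)
        if closest is not None and abs(closest - target) <= rel_tol * target:
            matches[size] = closest
        else:
            matches[size] = None
    return matches

# Hypothetical peak list, not data from Table 2
print(match_peaks([130, 82, 232], [128, 240, 500]))  # {130: 128, 82: None, 232: 240}
```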
Next, sgRNAs targeting human genes EXT1, BCL9, HOXA13, HOXD11 and OLIG2 were designed for a total of 10 sgRNAs (tables 3A-3C). sgrnas were designed using GenScript tools.
Table 3A: Human sgRNA targets
Table 3B: Human sgRNA targets
Table 3C: Human sgRNA targets
sgRNAs were also designed to target the Chlamydia trachomatis polymorphic membrane protein A gene (pmpA) (Table 4). A total of 5 sgRNAs were designed using the IDT tool.
Table 4: Chlamydia trachomatis sgRNA targets
Verification of Tn5 transposase
FIGS. 14-15 show that Tn5 loaded with the designed adaptor A (5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3', SEQ ID NO: 27) and adaptor B (5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3', SEQ ID NO: 28) attached these adaptors to DNA fragments that could then be PCR-amplified, demonstrating that the enzyme is functional. First, Salmonella enterica gDNA was fragmented and tagged (tagmented) using Tn5 loaded with the custom adaptors. The tagged fragments were then amplified by PCR. The data in FIGS. 14-15 indicate that the Tn5 transposase was loaded with the custom adaptors.
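Adaptors A and B differ at their 5' ends but end in the same 3' sequence, the 19-bp Tn5 mosaic end (ME) that the transposase recognizes. The sketch below checks this for SEQ ID NOs: 27 and 28 and derives the complementary ME strand. In the widely used protocol of Picelli et al. (cited above), each adaptor is annealed to such a complementary ME oligonucleotide before Tn5 loading; whether that was done here is an assumption, not something this example states. Helper names are illustrative.

```python
# Minimal sketch: confirm the shared 3' mosaic end (ME) of adaptors A and B and
# build its reverse complement (the strand assumed to be annealed before loading).
ADAPTER_A = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"   # SEQ ID NO: 27
ADAPTER_B = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"  # SEQ ID NO: 28

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_suffix(a: str, b: str) -> str:
    """Longest common 3' end of two sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


mosaic_end = shared_suffix(ADAPTER_A, ADAPTER_B)
me_bottom = reverse_complement(mosaic_end)
print(mosaic_end, len(mosaic_end), me_bottom)
```

The shared suffix is AGATGTGTATAAGAGACAG (19 nt), so the 14-nt and 15-nt 5' portions of adaptors A and B are the only parts that distinguish the two ends of a tagmented fragment.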
Verification of fusion proteins
dCAS9-Fl26-Tn5, dCAS9-xTen-Tn5, Tn5-Fl26-dCAS9 and Tn5-xTen-dCAS9 were recombinantly expressed and then purified. In some embodiments, the recombinant protein is separated from a cleavable intein moiety on a chitin column. The purified fusion proteins were analyzed for predicted size and purity on SDS-PAGE gels (FIGS. 16-21).
SDS-PAGE analysis of dCAS9-Fl26-Tn5 is shown in FIG. 16. The sample was observed to have a purity of >80%. In some embodiments, the fusion protein may also comprise an intein domain. Bioanalyzer analysis (FIG. 17) showed that a portion of the protein produced (peak at 44.91) was the correct size (without intein).
SDS-PAGE analysis of dCAS9-xTen-Tn5 is shown in FIG. 18. The sample was observed to have a purity of >70%. In some embodiments, the fusion protein may also comprise an intein domain, resulting in a larger-than-expected size. Bioanalyzer analysis (FIG. 19) showed that a portion of the protein produced (peak at 44.62) was the correct size (without intein).
FIG. 20 depicts SDS-PAGE analysis of recombinantly expressed and purified Tn5-Fl26-dCAS9. FIG. 21 depicts SDS-PAGE analysis of recombinantly expressed and purified Tn5-xTen-dCAS9. The samples were observed to have a purity of >65%.
Testing the functionality of fusion proteins
dCAS9-Fl26-Tn5 and dCAS9-xTen-Tn5 were tested for functionality. The workflow was as follows: (1) loading the sgRNAs and adaptors into the fusion proteins (using human sgRNAs unless otherwise indicated), (2) guided tagmentation, (3) purification, (4) PCR amplification, (5) quality control (QC), and (6) analysis of the results.
Loading of sgRNAs and adaptors into fusion proteins
The fusion protein was loaded at a 1:1:2 molar ratio (1 dCAS9-Tn5 molecule to 1 sgRNA to 2 adaptors). The mixture was incubated at 24 ℃ for 30 min.
Guided tagmentation
100 nM dCAS9-Tn5 (6.02e10 molecules) and 500 ng human gDNA (1.52e5 genome copies) were combined, giving a gDNA to dCAS9-Tn5 ratio of 1 to 3.95e5. The mixture was incubated at 37 ℃ for 60 minutes and then at 55 ℃ for 60 minutes to produce tagged fragments. Several incubation conditions were tried; in some embodiments, dCas9 may function in the range of 25 ℃ to 42 ℃ and Tn5 may function in the range of 37 ℃ to 60 ℃. The PCR amplification procedure is shown in Table 5.
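The 1:1:2 loading ratio translates directly into pipetting volumes once stock concentrations are known. The sketch below assumes hypothetical stock concentrations and a hypothetical amount of fusion protein, and uses the identity 1 µM = 1 pmol/µL; `loading_volumes` is an illustrative name, not part of the described method.

```python
# Minimal sketch: volumes for a 1:1:2 (dCas9-Tn5 : sgRNA : adaptor) loading mix.
# The protein amount and stock concentrations are hypothetical inputs.
LOADING_RATIO = {"dCas9-Tn5": 1, "sgRNA": 1, "adaptor": 2}


def loading_volumes(protein_pmol: float, stocks_uM: dict[str, float]) -> dict[str, float]:
    """Return the volume (uL) of each stock needed; 1 uM equals 1 pmol/uL."""
    volumes = {}
    for component, parts in LOADING_RATIO.items():
        pmol_needed = protein_pmol * parts / LOADING_RATIO["dCas9-Tn5"]
        volumes[component] = pmol_needed / stocks_uM[component]
    return volumes


print(loading_volumes(10, {"dCas9-Tn5": 5.0, "sgRNA": 10.0, "adaptor": 20.0}))
```

With 10 pmol of fusion protein and these example stocks, the mix needs 2 µL of protein, 1 µL of sgRNA and 1 µL of adaptor.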
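The molecule counts above can be reproduced with a short calculation. The sketch assumes a 100 nM fusion protein concentration (the value listed in the ratio test below) in a 1 µL volume, which is the volume that yields 6.02e10 molecules; the actual reaction volume is not stated. It also assumes about 3.3 pg per haploid human genome. Function names are illustrative.

```python
AVOGADRO = 6.022e23          # molecules per mole
HAPLOID_GENOME_PG = 3.3      # assumed mass of one haploid human genome (pg)


def molecules(conc_nM: float, volume_uL: float) -> float:
    """Number of molecules in `volume_uL` of a `conc_nM` solution."""
    return conc_nM * 1e-9 * volume_uL * 1e-6 * AVOGADRO


def genome_copies(mass_ng: float, genome_pg: float = HAPLOID_GENOME_PG) -> float:
    """Approximate haploid genome copies in `mass_ng` of genomic DNA."""
    return mass_ng * 1e3 / genome_pg


fusion = molecules(100, 1.0)   # 1 uL volume is an assumption
copies = genome_copies(500)
print(f"{fusion:.3g} fusion molecules, {copies:.3g} genome copies, "
      f"{fusion / copies:.3g} fusion molecules per copy")
```

This gives about 6.02e10 molecules and 1.52e5 genome copies, or roughly 3.97e5 fusion molecules per copy, within rounding of the 1 to 3.95e5 ratio stated above.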
Table 5: PCR amplification
FIG. 22 depicts data from the Cas9-only control reactions. The visible trace shows the TapeStation analysis of Cas9-digested DNA. Analysis of the sample after the PCR amplification reaction showed no signal. These data suggest that Cas9 by itself cannot add an adaptor to the 5' or 3' end of a DNA fragment.
FIGS. 23-24 show the PCR amplification results after tagmentation (fragmentation and adaptor addition) with dCAS9-Fl26-Tn5 or dCAS9-xTen-Tn5, respectively. The arrows in the figures point to the signal of the post-PCR samples. Because PCR amplification can only be detected if the fusion protein performs transposition (e.g., adds adaptors (adaptor B) to both the 5' and 3' ends of the DNA molecules), the signal indicates that both fusion proteins (dCAS9-Fl26-Tn5 and dCAS9-xTen-Tn5) are capable of transposition.
Results
The results indicate that Tn5 is able to add custom adaptors to human gDNA. The Cas9-only control showed that Tn5 is required for amplification in this process. These results show the functionality of Tn5 fused to dCas9.
Fusion protein-to-DNA ratio test
Next, the effect of reducing the ratio of dCas9-Tn5 to gDNA was tested. The DNA concentration was kept constant while the dCas9-Tn5 fusion protein concentration was reduced: 100 nM (194,071 dCAS9-Tn5 molecules per DNA genome copy), 1 nM (1,940:1), 100 pM (194:1), 10 pM (19.4:1) and 1 pM (1.94:1). The results are shown in FIGS. 25-31. FIG. 25 depicts the results of PCR amplification after a guided tagmentation reaction using dCAS9-Tn5 at a ratio of 194,071:1 and shows broad peaks after PCR, indicating non-specific tagmentation. Decreasing the amount of dCas9-Tn5 (FIGS. 26-31) resulted in a detectable peak from the PCR reaction, indicating that decreasing the ratio of fusion protein to DNA increased the specificity of tagmentation.
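Because the DNA amount is held constant, the fusion-to-genome ratio in this titration scales linearly with protein concentration. The sketch below recomputes the series from the 100 nM reference ratio of 194,071:1 given in the text; the linear-scaling assumption and the names used are illustrative.

```python
REFERENCE_CONC_PM = 100_000   # 100 nM expressed in pM
REFERENCE_RATIO = 194_071     # fusion molecules per genome copy at 100 nM (from the text)

SERIES_PM = {"100 nM": 100_000, "1 nM": 1_000, "100 pM": 100, "10 pM": 10, "1 pM": 1}


def ratio_at(conc_pM: float) -> float:
    """Fusion-to-genome ratio, assuming it scales linearly with protein concentration."""
    return REFERENCE_RATIO * conc_pM / REFERENCE_CONC_PM


for label, conc in SERIES_PM.items():
    print(f"{label:>6}: {ratio_at(conc):,.2f} : 1")
```

Each 10-fold dilution lowers the ratio 10-fold, reproducing the 1,940:1, 194:1, 19.4:1 and 1.94:1 values listed above.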
Results
The results indicate that Tn5 is able to add custom adaptors to human gDNA. The Cas9-only control showed that Tn5 is required to amplify DNA in this process. Tn5 proved to be functional, and there is evidence of guided transposition. Thus, there is evidence that the fusion protein has both dCas9 and Tn5 activity.
Fusion proteins and sgRNAs for Salmonella enterica
FIGS. 38-39 depict guided tagmentation using dCAS9-xTen-Tn5 with Salmonella enterica sgRNAs. The data indicate that the addition of sgRNAs increases specificity: FIG. 39 shows that tagmentation without sgRNA is random, whereas FIG. 38 shows that the addition of sgRNA confers specificity.
Example 2
Sample library preparation
Library preparation by guided tagmentation
Described herein are methods and compositions for generating libraries for sequencing on Illumina NextSeq.
Three libraries were prepared using ligation-based methods (FIGS. 37, 40 and 42A-42B), in which a single adaptor (e.g., adaptor B, with either Tn5 alone or the dCas9-Tn5 fusion) was used and the NEBNext sequencing adaptors were added after the tagmentation step, and two libraries were prepared using guided tagmentation-based methods (FIGS. 41 and 43-44), in which the sequences required for NGS were included on adaptors A and B in the guided tagmentation step. All libraries were prepared using human sgRNAs. The dCAS9-Fl26-Tn5 fusion protein was used for guided tagmentation. In these experiments, the DNA was incubated with dCas9-Tn5 using either a long or a short incubation protocol. For the short protocol, the reaction was incubated at 30 ℃ for 30 minutes and then at 37 ℃ for 30 minutes. For the long protocol, the reaction was incubated at 30 ℃ for 30 minutes, then at 38 ℃ for 60 minutes, and then at 55 ℃ for 60 minutes.
FIG. 32 shows highly multiplexed single-primer DNA amplification using Tn5 alone. Bioanalyzer analysis showed non-specific DNA amplification by PCR, indicating that DNA could be amplified using only 1 primer (adaptor B).
FIG. 33 (short incubation protocol) and FIG. 34 (long incubation protocol) show evidence supporting highly multiplexed single-primer DNA amplification using the dCas9-Tn5 fusion protein. Bioanalyzer analysis of the PCR products showed that several DNA fragments were specifically amplified simultaneously using only 1 primer (adaptor B).
FIG. 35 (long incubation protocol) and FIG. 36 (short incubation protocol) show evidence supporting custom locus-specific sequencing library preparation. Bioanalyzer analysis indicated that a sequencing library could be created. The addition of adaptors A and B, which are required for sequencing on the Illumina platform, indicated that sequencing libraries could be created using guided tagmentation.
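The two incubation protocols can be compared against the temperature ranges given in Example 1 for some embodiments (dCas9 about 25-42 ℃, Tn5 about 37-60 ℃). The sketch below treats those ranges as hard bounds, which is a simplifying assumption, and reports the total time and which enzyme is nominally active at each step. All names are illustrative.

```python
# Each step is (temperature in degrees C, minutes), as listed for the two protocols.
PROTOCOLS = {
    "short": [(30, 30), (37, 30)],
    "long": [(30, 30), (38, 60), (55, 60)],
}
# Ranges stated for some embodiments; treated here as hard bounds (assumption).
ACTIVE_RANGES = {"dCas9": (25, 42), "Tn5": (37, 60)}


def active_enzymes(temp_c: float) -> list[str]:
    """Enzymes whose stated working range includes `temp_c`."""
    return [name for name, (low, high) in ACTIVE_RANGES.items() if low <= temp_c <= high]


def summarize(protocol: list[tuple[int, int]]) -> tuple[int, list]:
    """Total minutes and, for each step, the enzymes expected to be active."""
    total = sum(minutes for _, minutes in protocol)
    steps = [(temp, minutes, active_enzymes(temp)) for temp, minutes in protocol]
    return total, steps


for name, protocol in PROTOCOLS.items():
    print(name, summarize(protocol))
```

Under these assumptions, both protocols begin with a step where only dCas9 is within range (30 ℃), which would allow sgRNA-guided binding before both enzymes are active; the long protocol (150 min versus 60 min) ends with a Tn5-only step at 55 ℃.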
In at least some of the previously described embodiments, one or more elements used in one embodiment may be used interchangeably in another embodiment unless such substitution is technically not feasible. Those skilled in the art will appreciate that various other omissions, additions and modifications may be made to the methods and structures described above without departing from the scope of the claimed subject matter. All such modifications and changes are intended to fall within the scope of the subject matter defined by the appended claims.
With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. For clarity, various singular/plural permutations may be explicitly set forth herein. As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Any reference herein to "or" is intended to encompass "and/or" unless otherwise specified.
Those skilled in the art will understand that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims), are generally intended as "open" terms (e.g., the term "including" should be interpreted as "including but not limited to," the term "having" should be interpreted as "having at least," the term "includes" should be interpreted as "includes but is not limited to," etc.). Those skilled in the art will further understand that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., "a" and/or "an" should be interpreted to mean "at least one" or "one or more"); the same holds true for the use of definite articles to introduce claim recitations. Furthermore, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations).
Further, in those instances where a convention analogous to "at least one of A, B and C, etc." is used, such a syntactic structure is generally intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). In those instances where a convention analogous to "at least one of A, B or C, etc." is used, such a syntactic structure is generally intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Those skilled in the art will further appreciate that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the specification, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms.
Further, when features or aspects of the present disclosure are described in terms of Markush groups, those skilled in the art will appreciate that the present disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.
As will be understood by those of skill in the art, for any and all purposes, such as in providing a written description, all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. Any listed range can be readily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each of the ranges discussed herein can be readily broken down into a lower third, a middle third, an upper third, and the like. As will also be understood by those skilled in the art, all language such as "up to", "at least", "greater than", "less than" and the like include the stated numbers and refer to ranges that may be subsequently broken down into subranges as discussed above. Finally, as will be appreciated by those skilled in the art, a range includes each individual member. Thus, for example, a group of 1-3 items refers to a group of 1, 2, or 3 items. Similarly, a group of 1-5 items refers to a group of 1, 2, 3, 4, or 5 items, and so forth.
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
Sequence listing
<110> Becton, Dickinson and Company
Alva Luoge Di Neiss
<120> preparation method of nucleic acid sequencing library
<130> 68EB-317326-WO
<150> US 63/189,032
<151> 2021-05-14
<150> US 63/243,443
<151> 2021-09-13
<160> 45
<170> PatentIn version 3.5
<210> 1
<211> 12339
<212> DNA
<213> Artificial Sequence
<220>
<223> 3XFlag-Cas9-Fl26-Tn5
<400> 1
atacactccg ctatcgctac gtgactgggt catggctgcg ccccgacacc cgccaacacc 60
cgctgacgcg ccctgacggg cttgtctgct cccggcatcc gcttacagac aagctgtgac 120
cgtctccggg agctgcatgt gtcagaggtt ttcaccgtca tcaccgaaac gcgcgaggca 180
gctgcggtaa agctcatcag cgtggtcgtg cagcgattca cagatgtctg cctgttcatc 240
cgcgtccagc tcgttgagtt tctccagaag cgttaatgtc tggcttctga taaagcgggc 300
catgttaagg gcggtttttt cctgtttggt cactgatgcc tccgtgtaag ggggatttct 360
gttcatgggg gtaatgatac cgatgaaacg agagaggatg ctcacgatac gggttactga 420
tgatgaacat gcccggttac tggaacgttg tgagggtaaa caactggcgg tatggatgcg 480
gcgggaccag agaaaaatca ctcagggtca atgccagccg aacgccagca agacgtagcc 540
cagcgcgtcg gccgccatgc cggcgataat ggcctgcttc tcgccgaaac gtttggtggc 600
gggaccagtg acgaaggctt gagcgagggc gtgcaagatt ccgaataccg caagcgacag 660
gccgatcatc gtcgcgctcc agcgaaagcg gtcctcgccg aaaatgaccc agagcgctgc 720
cggcacctgt cctacgagtt gcatgataaa gaagacagtc ataagtgcgg cgacgatagt 780
catgccccgc gcccaccgga aggagctgac tgggttgaag gctctcaagg gcatcggtcg 840
agatcccggt gcctaatgag tgagctaact tacattaatt gcgttgcgct cactgcccgc 900
tttccagtcg ggaaacctgt cgtgccagct gcattaatga atcggccaac gcgcggggag 960
aggcggtttg cgtattgggc gccagggtgg tttttctttt caccagtgag acgggcaaca 1020
gctgattgcc cttcaccgcc tggccctgag agagttgcag caagcggtcc acgctggttt 1080
gccccagcag gcgaaaatcc tgtttgatgg tggttaacgg cgggatataa catgagctgt 1140
cttcggtatc gtcgtatccc actaccgaga tatccgcacc aacgcgcagc ccggactcgg 1200
taatggcgcg cattgcgccc agcgccatct gatcgttggc aaccagcatc gcagtgggaa 1260
cgatgccctc attcagcatt tgcatggttt gttgaaaacc ggacatggca ctccagtcgc 1320
cttcccgttc cgctatcggc tgaatttgat tgcgagtgag atatttatgc cagccagcca 1380
gacgcagacg cgccgagaca gaacttaatg ggcccgctaa cagcgcgatt tgctggtgac 1440
ccaatgcgac cagatgctcc acgcccagtc gcgtaccgtc ttcatgggag aaaataatac 1500
tgttgatggg tgtctggtca gagacatcaa gaaataacgc cggaacatta gtgcaggcag 1560
cttccacagc aatggcatcc tggtcatcca gcggatagtt aatgatcagc ccactgacgc 1620
gttgcgcgag aagattgtgc accgccgctt tacaggcttc gacgccgctt cgttctacca 1680
tcgacaccac cacgctggca cccagttgat cggcgcgaga tttaatcgcc gcgacaattt 1740
gcgacggcgc gtgcagggcc agactggagg tggcaacgcc aatcagcaac gactgtttgc 1800
ccgccagttg ttgtgccacg cggttgggaa tgtaattcag ctccgccatc gccgcttcca 1860
ctttttcccg cgttttcgca gaaacgtggc tggcctggtt caccacgcgg gaaacggtct 1920
gataagagac accggcatac tctgcgacat cgtataacgt tactggtttc acattcacca 1980
ccctgaattg actctcttcc gggcgctatc atgccatacc gcgaaaggtt ttgcgccatt 2040
cgatggtgtc cgggatctcg acgctctccc ttatgcgact cctgcattag gaagcagccc 2100
agtagtaggt tgaggccgtt gagcaccgcc gccgcaagga atggtgcatg ccggcatgcc 2160
gccctttcgt cttcaagaat taattcccaa ttccccaggc atcaaataaa acgaaaggct 2220
cagtcgaaag actgggcctt tcgttttatc tgttgtttgt cggtgaacgc tctcctgagt 2280
aggacaaatc cgccgggagc ggatttgaac gttgcgaagc aacggcccgg agggtggcgg 2340
gcaggacgcc cgccataaac tgccaggaat taattcccca ggcatcaaat aaaacgaaag 2400
gctcagtcga aagactgggc ctttcgtttt atctgttgtt tgtcggtgaa cgctctcctg 2460
agtaggacaa atccgccggg agcggatttg aacgttgcga agcaacggcc cggagggtgg 2520
cgggcaggac gcccgccata aactgccagg aattaattcc ccaggcatca aataaaacga 2580
aaggctcagt cgaaagactg ggcctttcgt tttatctgtt gtttgtcggt gaacgctctc 2640
ctgagtagga caaatccgcc gggagcggat ttgaacgttg cgaagcaacg gcccggaggg 2700
tggcgggcag gacgcccgcc ataaactgcc aggaattaat tccccaggca tcaaataaaa 2760
cgaaaggctc agtcgaaaga ctgggccttt cgttttatct gttgtttgtc ggtgaacgct 2820
ctcctgagta ggacaaatcc gccgggagcg gatttgaacg ttgcgaagca acggcccgga 2880
gggtggcggg caggacgccc gccataaact gccaggaatt aattccccag gcatcaaata 2940
aaacgaaagg ctcagtcgaa agactgggcc tttcgtttta tctgttgttt gtcggtgaac 3000
gctctcctga gtaggacaaa tccgccggga gcggatttga acgttgcgaa gcaacggccc 3060
ggagggtggc gggcaggacg cccgccataa actgccagga attggggatc ggaattaatt 3120
cccggtttaa accggggatc tcgatcccgc gaaattaata cgactcacta taggggaatt 3180
gtgagcggat aacaattccc ctctagaaat aattttgttt aactttaaga aggagatata 3240
ccatgggtga ttacaaggat cacgatggcg attacaagga tcacgatatc gattacaagg 3300
atgatgatga taagatggat aaaaagtatt ctattggttt agctatcggc acaaatagcg 3360
tcggatgggc ggtgatcact gatgaatata aggttccgtc taaaaagttc aaggttctgg 3420
gaaatacaga ccgccacagt atcaaaaaaa atcttatagg ggctctttta tttgacagtg 3480
gagagacagc ggaagcgact cgtctcaaac ggacagctcg tagaaggtat acacgtcgga 3540
agaatcgtat ttgttatcta caggagattt tttcaaatga gatggcgaaa gtagatgata 3600
gtttctttca tcgacttgaa gagtcttttt tggtggaaga agacaagaag catgaacgtc 3660
atcctatttt tggaaatata gtagatgaag ttgcttatca tgagaaatat ccaactatct 3720
atcatctgcg aaaaaaattg gtagattcta ctgataaagc ggatttgcgc ttaatctatt 3780
tggccttagc gcatatgatt aagtttcgtg gtcatttttt gattgaggga gatttaaatc 3840
ctgataatag tgatgtggac aaactattta tccagttggt acaaacctac aatcaattat 3900
ttgaagaaaa ccctattaac gcaagtggag tagatgctaa agcgattctt tctgcacgat 3960
tgagtaaatc aagacgatta gaaaatctca ttgctcagct ccccggtgag aagaaaaatg 4020
gcttatttgg gaatctcatt gctttgtcat tgggtttgac ccctaatttt aaatcaaatt 4080
ttgatttggc agaagatgct aaattacagc tttcaaaaga tacttacgat gatgatttag 4140
ataatttatt ggcgcaaatt ggagatcaat atgctgattt gtttttggca gctaagaatt 4200
tatcagatgc tattttactt tcagatatcc taagagtaaa tactgaaata actaaggctc 4260
ccctatcagc atcaatgatt aaacgctacg atgaacatca tcaagacttg actcttttaa 4320
aagctttagt tcgacaacaa cttccagaaa agtataaaga aatctttttt gatcaatcaa 4380
aaaacggata tgcaggttat attgatgggg gagctagcca agaagaattt tataaattta 4440
tcaaaccaat tttagaaaaa atggatggta ctgaggaatt attggtgaaa ctaaatcgtg 4500
aagatttgct gcgcaagcaa cggacctttg acaacggctc tattccccat caaattcact 4560
tgggtgagct gcatgctatt ttgagaagac aagaagactt ttatccattt ttaaaagaca 4620
atcgtgagaa gattgaaaaa atcttgactt ttcgaattcc ttattatgtt ggtccattgg 4680
cgcgtggcaa tagtcgtttt gcatggatga ctcggaagtc tgaagaaaca attaccccat 4740
ggaattttga agaagttgtc gataaaggtg cttcagctca atcatttatt gaacgcatga 4800
caaactttga taaaaatctt ccaaatgaaa aagtactacc aaaacatagt ttgctttatg 4860
agtattttac ggtttataac gaattgacaa aggtcaaata tgttactgaa ggaatgcgaa 4920
aaccagcatt tctttcaggt gaacagaaga aagccattgt tgatttactc ttcaaaacaa 4980
atcgaaaagt aaccgttaag caattaaaag aagattattt caaaaaaata gaatgttttg 5040
atagtgttga aatttcagga gttgaagata gatttaatgc ttcattaggt acctaccatg 5100
atttgctaaa aattattaaa gataaagatt ttttggataa tgaagaaaat gaagatatct 5160
tagaggatat tgttttaaca ttgaccttat ttgaagatag ggagatgatt gaggaaagac 5220
ttaaaacata tgctcacctc tttgatgata aggtgatgaa acagcttaaa cgtcgccgtt 5280
atactggttg gggacgtttg tctcgaaaat tgattaatgg tattagggat aagcaatctg 5340
gcaaaacaat attagatttt ttgaaatcag atggttttgc caatcgcaat tttatgcagc 5400
tgatccatga tgatagtttg acatttaaag aagacattca aaaagcacaa gtgtctggac 5460
aaggcgatag tttacatgaa catattgcaa atttagctgg tagccctgct attaaaaaag 5520
gtattttaca gactgtaaaa gttgttgatg aattggtcaa agtaatgggg cggcataagc 5580
cagaaaatat cgttattgaa atggcacgtg aaaatcagac aactcaaaag ggccagaaaa 5640
attcgcgaga gcgtatgaaa cgaatcgaag aaggtatcaa agaattagga agtcagattc 5700
ttaaagagca tcctgttgaa aatactcaat tgcaaaatga aaagctctat ctctattatc 5760
tccaaaatgg aagagacatg tatgtggacc aagaattaga tattaatcgt ttaagtgatt 5820
atgatgtcga tgccattgtt ccacaaagtt tccttaaaga cgattcaata gacaataagg 5880
tcttaacgcg ttctgataaa aatcgtggta aatcggataa cgttccaagt gaagaagtag 5940
tcaaaaagat gaaaaactat tggagacaac ttctaaacgc caagttaatc actcaacgta 6000
agtttgataa tttaacgaaa gctgaacgtg gaggtttgag tgaacttgat aaagctggtt 6060
ttatcaaacg ccaattggtt gaaactcgcc aaatcactaa gcatgtggca caaattttgg 6120
atagtcgcat gaatactaaa tacgatgaaa atgataaact tattcgagag gttaaagtga 6180
ttaccttaaa atctaaatta gtttctgact tccgaaaaga tttccaattc tataaagtac 6240
gtgagattaa caattaccat catgcccatg atgcgtatct aaatgccgtc gttggaactg 6300
ctttgattaa gaaatatcca aaacttgaat cggagtttgt ctatggtgat tataaagttt 6360
atgatgttcg taaaatgatt gctaagtctg agcaagaaat aggcaaagca accgcaaaat 6420
atttctttta ctctaatatc atgaacttct tcaaaacaga aattacactt gcaaatggag 6480
agattcgcaa acgccctcta atcgaaacta atggggaaac tggagaaatt gtctgggata 6540
aagggcgaga ttttgccaca gtgcgcaaag tattgtccat gccccaagtc aatattgtca 6600
agaaaacaga agtacagaca ggcggattct ccaaggagtc aattttacca aaaagaaatt 6660
cggacaagct tattgctcgt aaaaaagact gggatccaaa aaaatatggt ggttttgata 6720
gtccaacggt agcttattca gtcctagtgg ttgctaaggt ggaaaaaggg aaatcgaaga 6780
agttaaaatc cgttaaagag ttactaggga tcacaattat ggaaagaagt tcctttgaaa 6840
aaaatccgat tgacttttta gaagctaaag gatataagga agttaaaaaa gacttaatca 6900
ttaaactacc taaatatagt ctttttgagt tagaaaacgg tcgtaaacgg atgctggcta 6960
gtgccggaga attacaaaaa ggaaatgagc tggctctgcc aagcaaatat gtgaattttt 7020
tatatttagc tagtcattat gaaaagttga agggtagtcc agaagataac gaacaaaaac 7080
aattgtttgt ggagcagcat aagcattatt tagatgagat tattgagcaa atcagtgaat 7140
tttctaagcg tgttatttta gcagatgcca atttagataa agttcttagt gcatataaca 7200
aacatagaga caaaccaata cgtgaacaag cagaaaatat tattcattta tttacgttga 7260
cgaatcttgg agctcccgct gcttttaaat attttgatac aacaattgat cgtaaacgat 7320
atacgtctac aaaagaagtt ttagatgcca ctcttatcca tcaatccatc actggtcttt 7380
atgaaacacg cattgatttg agtcagctag gaggtgacga tgacgataaa gaattcggtg 7440
gcggtggctc tggcggtggt gggagtggag gtgggggatc aggaggaggc ggttcccata 7500
tgattaccag tgcactgcat cgtgcggcgg attgggcgaa aagcgtgttt tctagtgctg 7560
cgctgggtga tccgcgtcgt accgcgcgtc tggtgaatgt tgcggcgcaa ctggccaaat 7620
atagcggcaa aagcattacc attagcagcg aaggcagcaa agccatgcag gaaggcgcgt 7680
atcgttttat tcgtaatccg aacgtgagcg cggaagcgat tcgtaaagcg ggtgccatgc 7740
agaccgtgaa actggcccag gaatttccgg aactgctggc aattgaagat accacctctc 7800
tgagctatcg tcatcaggtg gcggaagaac tgggcaaact gggtagcatt caggataaaa 7860
gccgtggttg gtgggtgcat agcgtgctgc tgctggaagc gaccaccttt cgtaccgtgg 7920
gcctgctgca tcaagaatgg tggatgcgtc cggatgatcc ggcggatgcg gatgaaaaag 7980
aaagcggcaa atggctggcc gctgctgcaa cttcgcgtct gagaatgggc agcatgatga 8040
gcaacgtgat tgcggtgtgc gatcgtgaag cggatattca tgcgtatctg caagataaac 8100
tggcccataa cgaacgtttt gtggtgcgta gcaaacatcc gcgtaaagat gtggaaagcg 8160
gcctgtatct gtatgatcac ctgaaaaacc agccggaact gggcggctat cagattagca 8220
ttccgcagaa aggcgtggtg gataaacgtg gcaaacgtaa aaaccgtccg gcgcgtaaag 8280
cgagcctgag cctgcgtagc ggccgtatta ccctgaaaca gggcaacatt accctgaacg 8340
cggtgctggc cgaagaaatt aatccgccga aaggcgaaac cccgctgaaa tggctgctgc 8400
tgaccagcga gccggtggaa agtctggccc aagcgctgcg tgtgattgat atttataccc 8460
atcgttggcg cattgaagaa tttcacaaag cgtggaaaac gggtgcgggt gcggaacgtc 8520
agcgtatgga agaaccggat aacctggaac gtatggtgag cattctgagc tttgtggcgg 8580
tgcgtctgct gcaactgcgt gaatctttta ctccgccgca agcactgcgt gcgcagggcc 8640
tgctgaaaga agcggaacac gttgaaagcc agagcgcgga aaccgtgctg accccggatg 8700
aatgccaact gctgggctat ctggataaag gcaaacgcaa acgcaaagaa aaagcgggca 8760
gcctgcaatg ggcgtatatg gcgattgcgc gtctgggcgg ctttatggat agcaaacgta 8820
ccggcattgc gagctggggt gcgctgtggg aaggttggga agcgctgcaa agcaaactgg 8880
atggctttct ggccgcgaaa gacctgatgg cgcagggcat taaaatctgc atcacgggag 8940
atgcactagt tgccctaccc gagggcgagt cggtacgcat cgccgacatc gtgccgggtg 9000
cgcggcccaa cagtgacaac gccatcgacc tgaaagtcct tgaccggcat ggcaatcccg 9060
tgctcgccga ccggctgttc cactccggcg agcatccggt gtacacggtg cgtacggtcg 9120
aaggtctgcg tgtgacgggc accgcgaacc acccgttgtt gtgtttggtc gacgtcgccg 9180
gggtgccgac cctgctgtgg aagctgatcg acgaaatcaa gccgggcgat tacgcggtga 9240
ttcaacgcag cgcattcagc gtcgactgtg caggttttgc ccgcgggaaa cccgaatttg 9300
cgcccacaac ctacacagtc ggcgtccctg gactggtgcg tttcttggaa gcacaccacc 9360
gagacccgga cgcccaagct atcgccgacg agctgaccga cgggcggttc tactacgcga 9420
aagtcgccag tgtcaccgac gccggcgtgc agccggtgta tagccttcgt gtcgacacgg 9480
cagaccacgc gtttatcacg aacgggttcg tcagccacgc tactggcctc accggtctga 9540
actcaggcct cacgacaaat cctggtgtat ccgcttggca ggtcaacaca gcttatactg 9600
cgggacaatt ggtcacatat aacggcaaga cgtataaatg tttgcagccc cacacctcct 9660
tggcaggatg ggaaccatcc aacgttcctg ccttgtggca gcttcaatga ctgcaggaag 9720
gggatccggc tgctaacaaa gcccgaaagg aagctgagtt ggctgctgcc accgctgagc 9780
aataactagc ataacccctt ggggcctcta aacgggtctt gaggggtttt ttgctgaaag 9840
gaggaactat atccggataa ctacgtcagg tggcactttt cggggaaatg tgcgcggaac 9900
ccctatttgt ttatttttct aaatacattc aaatatgtat ccgctcatga gacaataacc 9960
ctgataaatg cttcaataat attgaaaaag gaagagtatg agtattcaac atttccgtgt 10020
cgcccttatt cccttttttg cggcattttg ccttcctgtt tttgctcacc cagaaacgct 10080
ggtgaaagta aaagatgctg aagatcagtt gggtgcacga gtgggttaca tcgaactgga 10140
tctcaacagc ggtaagatcc ttgagagttt tcgccccgaa gaacgtttcc caatgatgag 10200
cacttttaaa gttctgctat gtggcgcggt attatcccgt gttgacgccg ggcaagagca 10260
actcggtcgc cgcatacact attctcagaa tgacttggtt gagtactcac cagtcacaga 10320
aaagcatctt acggatggca tgacagtaag agaattatgc agtgctgcca taaccatgag 10380
tgataacact gcggccaact tacttctgac aacgatcgga ggaccgaagg agctaaccgc 10440
ttttttgcac aacatggggg atcatgtaac tcgccttgat cgttgggaac cggagctgaa 10500
tgaagccata ccaaacgacg agcgtgacac cacgatgcct gtagcaatgg caacaacgtt 10560
gcgcaaacta ttaactggcg aactacttac tctagcttcc cggcaacaat taatagactg 10620
gatggaggcg gataaagttg caggaccact tctgcgctcg gcccttccgg ctggctggtt 10680
tattgctgat aaatctggag ccggtgagcg tgggtctcgc ggtatcattg cagcactggg 10740
gccagatggt aagccctccc gtatcgtagt tatctacacg acggggagtc aggcaactat 10800
ggatgaacga aatagacaga tcgctgagat aggtgcctca ctgattaagc attggtaact 10860
gtcagaccaa gtttactcat atatacttta gattgattta ccccggttga taatcagaaa 10920
agccccaaaa acaggaagat tgtataagca aatatttaaa ttgtaaacgt taatattttg 10980
ttaaaattcg cgttaaattt ttgttaaatc agctcatttt ttaaccaata ggccgaaatc 11040
ggcaaaatcc cttataaatc aaaagaatag cccgagatag ggttgagtgt tgttccagtt 11100
tggaacaaga gtccactatt aaagaacgtg gactccaacg tcaaagggcg aaaaaccgtc 11160
tatcagggcg atggcccact acgtgaacca tcacccaaat caagtttttt ggggtcgagg 11220
tgccgtaaag cactaaatcg gaaccctaaa gggagccccc gatttagagc ttgacgggga 11280
aagccggcga acgtggcgag aaaggaaggg aagaaagcga aaggagcggg cgctagggcg 11340
ctggcaagtg tagcggtcac gctgcgcgta accaccacac ccgccgcgct taatgcgccg 11400
ctacagggcg cgtaaaagga tctaggtgaa gatccttttt gataatctca tgaccaaaat 11460
cccttaacgt gagttttcgt tccactgagc gtcagacccc gtagaaaaga tcaaaggatc 11520
ttcttgagat cctttttttc tgcgcgtaat ctgctgcttg caaacaaaaa aaccaccgct 11580
accagcggtg gtttgtttgc cggatcaaga gctaccaact ctttttccga aggtaactgg 11640
cttcagcaga gcgcagatac caaatactgt ccttctagtg tagccgtagt taggccacca 11700
cttcaagaac tctgtagcac cgcctacata cctcgctctg ctaatcctgt taccagtggc 11760
tgctgccagt ggcgataagt cgtgtcttac cgggttggac tcaagacgat agttaccgga 11820
taaggcgcag cggtcgggct gaacgggggg ttcgtgcaca cagcccagct tggagcgaac 11880
gacctacacc gaactgagat acctacagcg tgagctatga gaaagcgcca cgcttcccga 11940
agggagaaag gcggacaggt atccggtaag cggcagggtc ggaacaggag agcgcacgag 12000
ggagcttcca gggggaaacg cctggtatct ttatagtcct gtcgggtttc gccacctctg 12060
acttgagcgt cgatttttgt gatgctcgtc aggggggcgg agcctatgga aaaacgccag 12120
caacgcggcc tttttacggt tcctggcctt ttgctggcct tttgctcaca tgttctttcc 12180
tgcgttatcc cctgattctg tggataaccg tattaccgcc tttgagtgag ctgataccgc 12240
tcgccgcagc cgaacgaccg agcgcagcga gtcagtgagc gaggaagcta tggtgcactc 12300
tcagtacaat ctgctctgat gccgcatagt taagccagt 12339
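Each entry in the sequence listing writes the sequence in blocks of ten with a running position count at the end of each line. As a sketch of how an entry can be checked, the code below strips the counters from two lines of SEQ ID NO: 1 (those ending at positions 3300 and 3360, copied verbatim) and translates from the first ATG with the standard genetic code. Treating this ATG as the start of the fusion open reading frame is an assumption based on the "3XFlag-Cas9" label in field <223>; the helper names are illustrative.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Two lines of SEQ ID NO: 1 (positions 3241-3360), including their position counters.
LISTING_EXCERPT = """
ccatgggtga ttacaaggat cacgatggcg attacaagga tcacgatatc gattacaagg 3300
atgatgatga taagatggat aaaaagtatt ctattggttt agctatcggc acaaatagcg 3360
"""


def listing_to_sequence(text: str) -> str:
    """Drop spaces, newlines and position counters; keep nucleotide letters."""
    return "".join(ch for ch in text if ch.isalpha()).upper()


def translate(seq: str) -> str:
    """Translate complete codons in frame 0, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = listing_to_sequence(LISTING_EXCERPT)
print(translate(dna[dna.find("ATG"):]))
```

The translation begins Met-Gly, followed by the 3xFLAG epitope DYKDHDGDYKDHDIDYKDDDDK and then the N-terminus of the Cas9 sequence (MDKKYSI...), consistent with the 3XFlag-Cas9 label.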
<210> 2
<211> 12306
<212> DNA
<213> Artificial Sequence
<220>
<223> 3XFlag-Cas9-xTen-Tn5
<400> 2
atacactccg ctatcgctac gtgactgggt catggctgcg ccccgacacc cgccaacacc 60
cgctgacgcg ccctgacggg cttgtctgct cccggcatcc gcttacagac aagctgtgac 120
cgtctccggg agctgcatgt gtcagaggtt ttcaccgtca tcaccgaaac gcgcgaggca 180
gctgcggtaa agctcatcag cgtggtcgtg cagcgattca cagatgtctg cctgttcatc 240
cgcgtccagc tcgttgagtt tctccagaag cgttaatgtc tggcttctga taaagcgggc 300
catgttaagg gcggtttttt cctgtttggt cactgatgcc tccgtgtaag ggggatttct 360
gttcatgggg gtaatgatac cgatgaaacg agagaggatg ctcacgatac gggttactga 420
tgatgaacat gcccggttac tggaacgttg tgagggtaaa caactggcgg tatggatgcg 480
gcgggaccag agaaaaatca ctcagggtca atgccagccg aacgccagca agacgtagcc 540
cagcgcgtcg gccgccatgc cggcgataat ggcctgcttc tcgccgaaac gtttggtggc 600
gggaccagtg acgaaggctt gagcgagggc gtgcaagatt ccgaataccg caagcgacag 660
gccgatcatc gtcgcgctcc agcgaaagcg gtcctcgccg aaaatgaccc agagcgctgc 720
cggcacctgt cctacgagtt gcatgataaa gaagacagtc ataagtgcgg cgacgatagt 780
catgccccgc gcccaccgga aggagctgac tgggttgaag gctctcaagg gcatcggtcg 840
agatcccggt gcctaatgag tgagctaact tacattaatt gcgttgcgct cactgcccgc 900
tttccagtcg ggaaacctgt cgtgccagct gcattaatga atcggccaac gcgcggggag 960
aggcggtttg cgtattgggc gccagggtgg tttttctttt caccagtgag acgggcaaca 1020
gctgattgcc cttcaccgcc tggccctgag agagttgcag caagcggtcc acgctggttt 1080
gccccagcag gcgaaaatcc tgtttgatgg tggttaacgg cgggatataa catgagctgt 1140
cttcggtatc gtcgtatccc actaccgaga tatccgcacc aacgcgcagc ccggactcgg 1200
taatggcgcg cattgcgccc agcgccatct gatcgttggc aaccagcatc gcagtgggaa 1260
cgatgccctc attcagcatt tgcatggttt gttgaaaacc ggacatggca ctccagtcgc 1320
cttcccgttc cgctatcggc tgaatttgat tgcgagtgag atatttatgc cagccagcca 1380
gacgcagacg cgccgagaca gaacttaatg ggcccgctaa cagcgcgatt tgctggtgac 1440
ccaatgcgac cagatgctcc acgcccagtc gcgtaccgtc ttcatgggag aaaataatac 1500
tgttgatggg tgtctggtca gagacatcaa gaaataacgc cggaacatta gtgcaggcag 1560
cttccacagc aatggcatcc tggtcatcca gcggatagtt aatgatcagc ccactgacgc 1620
gttgcgcgag aagattgtgc accgccgctt tacaggcttc gacgccgctt cgttctacca 1680
tcgacaccac cacgctggca cccagttgat cggcgcgaga tttaatcgcc gcgacaattt 1740
gcgacggcgc gtgcagggcc agactggagg tggcaacgcc aatcagcaac gactgtttgc 1800
ccgccagttg ttgtgccacg cggttgggaa tgtaattcag ctccgccatc gccgcttcca 1860
ctttttcccg cgttttcgca gaaacgtggc tggcctggtt caccacgcgg gaaacggtct 1920
gataagagac accggcatac tctgcgacat cgtataacgt tactggtttc acattcacca 1980
ccctgaattg actctcttcc gggcgctatc atgccatacc gcgaaaggtt ttgcgccatt 2040
cgatggtgtc cgggatctcg acgctctccc ttatgcgact cctgcattag gaagcagccc 2100
agtagtaggt tgaggccgtt gagcaccgcc gccgcaagga atggtgcatg ccggcatgcc 2160
gccctttcgt cttcaagaat taattcccaa ttccccaggc atcaaataaa acgaaaggct 2220
cagtcgaaag actgggcctt tcgttttatc tgttgtttgt cggtgaacgc tctcctgagt 2280
aggacaaatc cgccgggagc ggatttgaac gttgcgaagc aacggcccgg agggtggcgg 2340
gcaggacgcc cgccataaac tgccaggaat taattcccca ggcatcaaat aaaacgaaag 2400
gctcagtcga aagactgggc ctttcgtttt atctgttgtt tgtcggtgaa cgctctcctg 2460
agtaggacaa atccgccggg agcggatttg aacgttgcga agcaacggcc cggagggtgg 2520
cgggcaggac gcccgccata aactgccagg aattaattcc ccaggcatca aataaaacga 2580
aaggctcagt cgaaagactg ggcctttcgt tttatctgtt gtttgtcggt gaacgctctc 2640
ctgagtagga caaatccgcc gggagcggat ttgaacgttg cgaagcaacg gcccggaggg 2700
tggcgggcag gacgcccgcc ataaactgcc aggaattaat tccccaggca tcaaataaaa 2760
cgaaaggctc agtcgaaaga ctgggccttt cgttttatct gttgtttgtc ggtgaacgct 2820
ctcctgagta ggacaaatcc gccgggagcg gatttgaacg ttgcgaagca acggcccgga 2880
gggtggcggg caggacgccc gccataaact gccaggaatt aattccccag gcatcaaata 2940
aaacgaaagg ctcagtcgaa agactgggcc tttcgtttta tctgttgttt gtcggtgaac 3000
gctctcctga gtaggacaaa tccgccggga gcggatttga acgttgcgaa gcaacggccc 3060
ggagggtggc gggcaggacg cccgccataa actgccagga attggggatc ggaattaatt 3120
cccggtttaa accggggatc tcgatcccgc gaaattaata cgactcacta taggggaatt 3180
gtgagcggat aacaattccc ctctagaaat aattttgttt aactttaaga aggagatata 3240
ccatgggtga ttacaaggat cacgatggcg attacaagga tcacgatatc gattacaagg 3300
atgatgatga taagatggat aaaaagtatt ctattggttt agctatcggc acaaatagcg 3360
tcggatgggc ggtgatcact gatgaatata aggttccgtc taaaaagttc aaggttctgg 3420
gaaatacaga ccgccacagt atcaaaaaaa atcttatagg ggctctttta tttgacagtg 3480
gagagacagc ggaagcgact cgtctcaaac ggacagctcg tagaaggtat acacgtcgga 3540
agaatcgtat ttgttatcta caggagattt tttcaaatga gatggcgaaa gtagatgata 3600
gtttctttca tcgacttgaa gagtcttttt tggtggaaga agacaagaag catgaacgtc 3660
atcctatttt tggaaatata gtagatgaag ttgcttatca tgagaaatat ccaactatct 3720
atcatctgcg aaaaaaattg gtagattcta ctgataaagc ggatttgcgc ttaatctatt 3780
tggccttagc gcatatgatt aagtttcgtg gtcatttttt gattgaggga gatttaaatc 3840
ctgataatag tgatgtggac aaactattta tccagttggt acaaacctac aatcaattat 3900
ttgaagaaaa ccctattaac gcaagtggag tagatgctaa agcgattctt tctgcacgat 3960
tgagtaaatc aagacgatta gaaaatctca ttgctcagct ccccggtgag aagaaaaatg 4020
gcttatttgg gaatctcatt gctttgtcat tgggtttgac ccctaatttt aaatcaaatt 4080
ttgatttggc agaagatgct aaattacagc tttcaaaaga tacttacgat gatgatttag 4140
ataatttatt ggcgcaaatt ggagatcaat atgctgattt gtttttggca gctaagaatt 4200
tatcagatgc tattttactt tcagatatcc taagagtaaa tactgaaata actaaggctc 4260
ccctatcagc atcaatgatt aaacgctacg atgaacatca tcaagacttg actcttttaa 4320
aagctttagt tcgacaacaa cttccagaaa agtataaaga aatctttttt gatcaatcaa 4380
aaaacggata tgcaggttat attgatgggg gagctagcca agaagaattt tataaattta 4440
tcaaaccaat tttagaaaaa atggatggta ctgaggaatt attggtgaaa ctaaatcgtg 4500
aagatttgct gcgcaagcaa cggacctttg acaacggctc tattccccat caaattcact 4560
tgggtgagct gcatgctatt ttgagaagac aagaagactt ttatccattt ttaaaagaca 4620
atcgtgagaa gattgaaaaa atcttgactt ttcgaattcc ttattatgtt ggtccattgg 4680
cgcgtggcaa tagtcgtttt gcatggatga ctcggaagtc tgaagaaaca attaccccat 4740
ggaattttga agaagttgtc gataaaggtg cttcagctca atcatttatt gaacgcatga 4800
caaactttga taaaaatctt ccaaatgaaa aagtactacc aaaacatagt ttgctttatg 4860
agtattttac ggtttataac gaattgacaa aggtcaaata tgttactgaa ggaatgcgaa 4920
aaccagcatt tctttcaggt gaacagaaga aagccattgt tgatttactc ttcaaaacaa 4980
atcgaaaagt aaccgttaag caattaaaag aagattattt caaaaaaata gaatgttttg 5040
atagtgttga aatttcagga gttgaagata gatttaatgc ttcattaggt acctaccatg 5100
atttgctaaa aattattaaa gataaagatt ttttggataa tgaagaaaat gaagatatct 5160
tagaggatat tgttttaaca ttgaccttat ttgaagatag ggagatgatt gaggaaagac 5220
ttaaaacata tgctcacctc tttgatgata aggtgatgaa acagcttaaa cgtcgccgtt 5280
atactggttg gggacgtttg tctcgaaaat tgattaatgg tattagggat aagcaatctg 5340
gcaaaacaat attagatttt ttgaaatcag atggttttgc caatcgcaat tttatgcagc 5400
tgatccatga tgatagtttg acatttaaag aagacattca aaaagcacaa gtgtctggac 5460
aaggcgatag tttacatgaa catattgcaa atttagctgg tagccctgct attaaaaaag 5520
gtattttaca gactgtaaaa gttgttgatg aattggtcaa agtaatgggg cggcataagc 5580
cagaaaatat cgttattgaa atggcacgtg aaaatcagac aactcaaaag ggccagaaaa 5640
attcgcgaga gcgtatgaaa cgaatcgaag aaggtatcaa agaattagga agtcagattc 5700
ttaaagagca tcctgttgaa aatactcaat tgcaaaatga aaagctctat ctctattatc 5760
tccaaaatgg aagagacatg tatgtggacc aagaattaga tattaatcgt ttaagtgatt 5820
atgatgtcga tgccattgtt ccacaaagtt tccttaaaga cgattcaata gacaataagg 5880
tcttaacgcg ttctgataaa aatcgtggta aatcggataa cgttccaagt gaagaagtag 5940
tcaaaaagat gaaaaactat tggagacaac ttctaaacgc caagttaatc actcaacgta 6000
agtttgataa tttaacgaaa gctgaacgtg gaggtttgag tgaacttgat aaagctggtt 6060
ttatcaaacg ccaattggtt gaaactcgcc aaatcactaa gcatgtggca caaattttgg 6120
atagtcgcat gaatactaaa tacgatgaaa atgataaact tattcgagag gttaaagtga 6180
ttaccttaaa atctaaatta gtttctgact tccgaaaaga tttccaattc tataaagtac 6240
gtgagattaa caattaccat catgcccatg atgcgtatct aaatgccgtc gttggaactg 6300
ctttgattaa gaaatatcca aaacttgaat cggagtttgt ctatggtgat tataaagttt 6360
atgatgttcg taaaatgatt gctaagtctg agcaagaaat aggcaaagca accgcaaaat 6420
atttctttta ctctaatatc atgaacttct tcaaaacaga aattacactt gcaaatggag 6480
agattcgcaa acgccctcta atcgaaacta atggggaaac tggagaaatt gtctgggata 6540
aagggcgaga ttttgccaca gtgcgcaaag tattgtccat gccccaagtc aatattgtca 6600
agaaaacaga agtacagaca ggcggattct ccaaggagtc aattttacca aaaagaaatt 6660
cggacaagct tattgctcgt aaaaaagact gggatccaaa aaaatatggt ggttttgata 6720
gtccaacggt agcttattca gtcctagtgg ttgctaaggt ggaaaaaggg aaatcgaaga 6780
agttaaaatc cgttaaagag ttactaggga tcacaattat ggaaagaagt tcctttgaaa 6840
aaaatccgat tgacttttta gaagctaaag gatataagga agttaaaaaa gacttaatca 6900
ttaaactacc taaatatagt ctttttgagt tagaaaacgg tcgtaaacgg atgctggcta 6960
gtgccggaga attacaaaaa ggaaatgagc tggctctgcc aagcaaatat gtgaattttt 7020
tatatttagc tagtcattat gaaaagttga agggtagtcc agaagataac gaacaaaaac 7080
aattgtttgt ggagcagcat aagcattatt tagatgagat tattgagcaa atcagtgaat 7140
tttctaagcg tgttatttta gcagatgcca atttagataa agttcttagt gcatataaca 7200
aacatagaga caaaccaata cgtgaacaag cagaaaatat tattcattta tttacgttga 7260
cgaatcttgg agctcccgct gcttttaaat attttgatac aacaattgat cgtaaacgat 7320
atacgtctac aaaagaagtt ttagatgcca ctcttatcca tcaatccatc actggtcttt 7380
atgaaacacg cattgatttg agtcagctag gaggtgacag cggttccgaa actcccggta 7440
catcagaaag cgcgaccccc gaaagcatga ttaccagtgc actgcatcgt gcggcggatt 7500
gggcgaaaag cgtgttttct agtgctgcgc tgggtgatcc gcgtcgtacc gcgcgtctgg 7560
tgaatgttgc ggcgcaactg gccaaatata gcggcaaaag cattaccatt agcagcgaag 7620
gcagcaaagc catgcaggaa ggcgcgtatc gttttattcg taatccgaac gtgagcgcgg 7680
aagcgattcg taaagcgggt gccatgcaga ccgtgaaact ggcccaggaa tttccggaac 7740
tgctggcaat tgaagatacc acctctctga gctatcgtca tcaggtggcg gaagaactgg 7800
gcaaactggg tagcattcag gataaaagcc gtggttggtg ggtgcatagc gtgctgctgc 7860
tggaagcgac cacctttcgt accgtgggcc tgctgcatca agaatggtgg atgcgtccgg 7920
atgatccggc ggatgcggat gaaaaagaaa gcggcaaatg gctggccgct gctgcaactt 7980
cgcgtctgag aatgggcagc atgatgagca acgtgattgc ggtgtgcgat cgtgaagcgg 8040
atattcatgc gtatctgcaa gataaactgg cccataacga acgttttgtg gtgcgtagca 8100
aacatccgcg taaagatgtg gaaagcggcc tgtatctgta tgatcacctg aaaaaccagc 8160
cggaactggg cggctatcag attagcattc cgcagaaagg cgtggtggat aaacgtggca 8220
aacgtaaaaa ccgtccggcg cgtaaagcga gcctgagcct gcgtagcggc cgtattaccc 8280
tgaaacaggg caacattacc ctgaacgcgg tgctggccga agaaattaat ccgccgaaag 8340
gcgaaacccc gctgaaatgg ctgctgctga ccagcgagcc ggtggaaagt ctggcccaag 8400
cgctgcgtgt gattgatatt tatacccatc gttggcgcat tgaagaattt cacaaagcgt 8460
ggaaaacggg tgcgggtgcg gaacgtcagc gtatggaaga accggataac ctggaacgta 8520
tggtgagcat tctgagcttt gtggcggtgc gtctgctgca actgcgtgaa tcttttactc 8580
cgccgcaagc actgcgtgcg cagggcctgc tgaaagaagc ggaacacgtt gaaagccaga 8640
gcgcggaaac cgtgctgacc ccggatgaat gccaactgct gggctatctg gataaaggca 8700
aacgcaaacg caaagaaaaa gcgggcagcc tgcaatgggc gtatatggcg attgcgcgtc 8760
tgggcggctt tatggatagc aaacgtaccg gcattgcgag ctggggtgcg ctgtgggaag 8820
gttgggaagc gctgcaaagc aaactggatg gctttctggc cgcgaaagac ctgatggcgc 8880
agggcattaa aatctgcatc acgggagatg cactagttgc cctacccgag ggcgagtcgg 8940
tacgcatcgc cgacatcgtg ccgggtgcgc ggcccaacag tgacaacgcc atcgacctga 9000
aagtccttga ccggcatggc aatcccgtgc tcgccgaccg gctgttccac tccggcgagc 9060
atccggtgta cacggtgcgt acggtcgaag gtctgcgtgt gacgggcacc gcgaaccacc 9120
cgttgttgtg tttggtcgac gtcgccgggg tgccgaccct gctgtggaag ctgatcgacg 9180
aaatcaagcc gggcgattac gcggtgattc aacgcagcgc attcagcgtc gactgtgcag 9240
gttttgcccg cgggaaaccc gaatttgcgc ccacaaccta cacagtcggc gtccctggac 9300
tggtgcgttt cttggaagca caccaccgag acccggacgc ccaagctatc gccgacgagc 9360
tgaccgacgg gcggttctac tacgcgaaag tcgccagtgt caccgacgcc ggcgtgcagc 9420
cggtgtatag ccttcgtgtc gacacggcag accacgcgtt tatcacgaac gggttcgtca 9480
gccacgctac tggcctcacc ggtctgaact caggcctcac gacaaatcct ggtgtatccg 9540
cttggcaggt caacacagct tatactgcgg gacaattggt cacatataac ggcaagacgt 9600
ataaatgttt gcagccccac acctccttgg caggatggga accatccaac gttcctgcct 9660
tgtggcagct tcaatgactg caggaagggg atccggctgc taacaaagcc cgaaaggaag 9720
ctgagttggc tgctgccacc gctgagcaat aactagcata accccttggg gcctctaaac 9780
gggtcttgag gggttttttg ctgaaaggag gaactatatc cggataacta cgtcaggtgg 9840
cacttttcgg ggaaatgtgc gcggaacccc tatttgttta tttttctaaa tacattcaaa 9900
tatgtatccg ctcatgagac aataaccctg ataaatgctt caataatatt gaaaaaggaa 9960
gagtatgagt attcaacatt tccgtgtcgc ccttattccc ttttttgcgg cattttgcct 10020
tcctgttttt gctcacccag aaacgctggt gaaagtaaaa gatgctgaag atcagttggg 10080
tgcacgagtg ggttacatcg aactggatct caacagcggt aagatccttg agagttttcg 10140
ccccgaagaa cgtttcccaa tgatgagcac ttttaaagtt ctgctatgtg gcgcggtatt 10200
atcccgtgtt gacgccgggc aagagcaact cggtcgccgc atacactatt ctcagaatga 10260
cttggttgag tactcaccag tcacagaaaa gcatcttacg gatggcatga cagtaagaga 10320
attatgcagt gctgccataa ccatgagtga taacactgcg gccaacttac ttctgacaac 10380
gatcggagga ccgaaggagc taaccgcttt tttgcacaac atgggggatc atgtaactcg 10440
ccttgatcgt tgggaaccgg agctgaatga agccatacca aacgacgagc gtgacaccac 10500
gatgcctgta gcaatggcaa caacgttgcg caaactatta actggcgaac tacttactct 10560
agcttcccgg caacaattaa tagactggat ggaggcggat aaagttgcag gaccacttct 10620
gcgctcggcc cttccggctg gctggtttat tgctgataaa tctggagccg gtgagcgtgg 10680
gtctcgcggt atcattgcag cactggggcc agatggtaag ccctcccgta tcgtagttat 10740
ctacacgacg gggagtcagg caactatgga tgaacgaaat agacagatcg ctgagatagg 10800
tgcctcactg attaagcatt ggtaactgtc agaccaagtt tactcatata tactttagat 10860
tgatttaccc cggttgataa tcagaaaagc cccaaaaaca ggaagattgt ataagcaaat 10920
atttaaattg taaacgttaa tattttgtta aaattcgcgt taaatttttg ttaaatcagc 10980
tcatttttta accaataggc cgaaatcggc aaaatccctt ataaatcaaa agaatagccc 11040
gagatagggt tgagtgttgt tccagtttgg aacaagagtc cactattaaa gaacgtggac 11100
tccaacgtca aagggcgaaa aaccgtctat cagggcgatg gcccactacg tgaaccatca 11160
cccaaatcaa gttttttggg gtcgaggtgc cgtaaagcac taaatcggaa ccctaaaggg 11220
agcccccgat ttagagcttg acggggaaag ccggcgaacg tggcgagaaa ggaagggaag 11280
aaagcgaaag gagcgggcgc tagggcgctg gcaagtgtag cggtcacgct gcgcgtaacc 11340
accacacccg ccgcgcttaa tgcgccgcta cagggcgcgt aaaaggatct aggtgaagat 11400
cctttttgat aatctcatga ccaaaatccc ttaacgtgag ttttcgttcc actgagcgtc 11460
agaccccgta gaaaagatca aaggatcttc ttgagatcct ttttttctgc gcgtaatctg 11520
ctgcttgcaa acaaaaaaac caccgctacc agcggtggtt tgtttgccgg atcaagagct 11580
accaactctt tttccgaagg taactggctt cagcagagcg cagataccaa atactgtcct 11640
tctagtgtag ccgtagttag gccaccactt caagaactct gtagcaccgc ctacatacct 11700
cgctctgcta atcctgttac cagtggctgc tgccagtggc gataagtcgt gtcttaccgg 11760
gttggactca agacgatagt taccggataa ggcgcagcgg tcgggctgaa cggggggttc 11820
gtgcacacag cccagcttgg agcgaacgac ctacaccgaa ctgagatacc tacagcgtga 11880
gctatgagaa agcgccacgc ttcccgaagg gagaaaggcg gacaggtatc cggtaagcgg 11940
cagggtcgga acaggagagc gcacgaggga gcttccaggg ggaaacgcct ggtatcttta 12000
tagtcctgtc gggtttcgcc acctctgact tgagcgtcga tttttgtgat gctcgtcagg 12060
ggggcggagc ctatggaaaa acgccagcaa cgcggccttt ttacggttcc tggccttttg 12120
ctggcctttt gctcacatgt tctttcctgc gttatcccct gattctgtgg ataaccgtat 12180
taccgccttt gagtgagctg ataccgctcg ccgcagccga acgaccgagc gcagcgagtc 12240
agtgagcgag gaagctatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa 12300
gccagt 12306
<210> 3
<211> 11245
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> pET-Tn5-xTen-dCas9
<400> 3
caaggagatg gcgcccaaca gtcccccggc cacggggcct gccaccatac ccacgccgaa 60
acaagcgctc atgagcccga agtggcgagc ccgatcttcc ccatcggtga tgtcggcgat 120
ataggcgcca gcaaccgcac ctgtggcgcc ggtgatgccg gccacgatgc gtccggcgta 180
gaggatcgag atctcgatcc cgcgaaatta atacgactca ctatagggga attgtgagcg 240
gataacaatt cccctctaga aataattttg tttaacttta agaaggagat ataccatgat 300
taccagtgca ctgcatcgtg cggcggattg ggcgaaaagc gtgttttcta gtgctgcgct 360
gggtgatccg cgtcgtaccg cgcgtctggt gaatgttgcg gcgcaactgg ccaaatatag 420
cggcaaaagc attaccatta gcagcgaagg cagcaaagcc atgcaggaag gcgcgtatcg 480
ttttattcgt aatccgaacg tgagcgcgga agcgattcgt aaagcgggtg ccatgcagac 540
cgtgaaactg gcccaggaat ttccggaact gctggcaatt gaagatacca cctctctgag 600
ctatcgtcat caggtggcgg aagaactggg caaactgggt agcattcagg ataaaagccg 660
tggttggtgg gtgcatagcg tgctgctgct ggaagcgacc acctttcgta ccgtgggcct 720
gctgcatcaa gaatggtgga tgcgtccgga tgatccggcg gatgcggatg aaaaagaaag 780
cggcaaatgg ctggccgctg ctgcaacttc gcgtctgaga atgggcagca tgatgagcaa 840
cgtgattgcg gtgtgcgatc gtgaagcgga tattcatgcg tatctgcaag ataaactggc 900
ccataacgaa cgttttgtgg tgcgtagcaa acatccgcgt aaagatgtgg aaagcggcct 960
gtatctgtat gatcacctga aaaaccagcc ggaactgggc ggctatcaga ttagcattcc 1020
gcagaaaggc gtggtggata aacgtggcaa acgtaaaaac cgtccggcgc gtaaagcgag 1080
cctgagcctg cgtagcggcc gtattaccct gaaacagggc aacattaccc tgaacgcggt 1140
gctggccgaa gaaattaatc cgccgaaagg cgaaaccccg ctgaaatggc tgctgctgac 1200
cagcgagccg gtggaaagtc tggcccaagc gctgcgtgtg attgatattt atacccatcg 1260
ttggcgcatt gaagaatttc acaaagcgtg gaaaacgggt gcgggtgcgg aacgtcagcg 1320
tatggaagaa ccggataacc tggaacgtat ggtgagcatt ctgagctttg tggcggtgcg 1380
tctgctgcaa ctgcgtgaat cttttactcc gccgcaagca ctgcgtgcgc agggcctgct 1440
gaaagaagcg gaacacgttg aaagccagag cgcggaaacc gtgctgaccc cggatgaatg 1500
ccaactgctg ggctatctgg ataaaggcaa acgcaaacgc aaagaaaaag cgggcagcct 1560
gcaatgggcg tatatggcga ttgcgcgtct gggcggcttt atggatagca aacgtaccgg 1620
cattgcgagc tggggtgcgc tgtgggaagg ttgggaagcg ctgcaaagca aactggatgg 1680
ctttctggcc gcgaaagacc tgatggcgca gggcattaaa atcagcggtt ccgaaactcc 1740
cggtacatca gaaagcgcga cccccgaaag catggataaa aagtattcta ttggtttagc 1800
tatcggcaca aatagcgtcg gatgggcggt gatcactgat gaatataagg ttccgtctaa 1860
aaagttcaag gttctgggaa atacagaccg ccacagtatc aaaaaaaatc ttataggggc 1920
tcttttattt gacagtggag agacagcgga agcgactcgt ctcaaacgga cagctcgtag 1980
aaggtataca cgtcggaaga atcgtatttg ttatctacag gagatttttt caaatgagat 2040
ggcgaaagta gatgatagtt tctttcatcg acttgaagag tcttttttgg tggaagaaga 2100
caagaagcat gaacgtcatc ctatttttgg aaatatagta gatgaagttg cttatcatga 2160
gaaatatcca actatctatc atctgcgaaa aaaattggta gattctactg ataaagcgga 2220
tttgcgctta atctatttgg ccttagcgca tatgattaag tttcgtggtc attttttgat 2280
tgagggagat ttaaatcctg ataatagtga tgtggacaaa ctatttatcc agttggtaca 2340
aacctacaat caattatttg aagaaaaccc tattaacgca agtggagtag atgctaaagc 2400
gattctttct gcacgattga gtaaatcaag acgattagaa aatctcattg ctcagctccc 2460
cggtgagaag aaaaatggct tatttgggaa tctcattgct ttgtcattgg gtttgacccc 2520
taattttaaa tcaaattttg atttggcaga agatgctaaa ttacagcttt caaaagatac 2580
ttacgatgat gatttagata atttattggc gcaaattgga gatcaatatg ctgatttgtt 2640
tttggcagct aagaatttat cagatgctat tttactttca gatatcctaa gagtaaatac 2700
tgaaataact aaggctcccc tatcagcatc aatgattaaa cgctacgatg aacatcatca 2760
agacttgact cttttaaaag ctttagttcg acaacaactt ccagaaaagt ataaagaaat 2820
cttttttgat caatcaaaaa acggatatgc aggttatatt gatgggggag ctagccaaga 2880
agaattttat aaatttatca aaccaatttt agaaaaaatg gatggtactg aggaattatt 2940
ggtgaaacta aatcgtgaag atttgctgcg caagcaacgg acctttgaca acggctctat 3000
tccccatcaa attcacttgg gtgagctgca tgctattttg agaagacaag aagactttta 3060
tccattttta aaagacaatc gtgagaagat tgaaaaaatc ttgacttttc gaattcctta 3120
ttatgttggt ccattggcgc gtggcaatag tcgttttgca tggatgactc ggaagtctga 3180
agaaacaatt accccatgga attttgaaga agttgtcgat aaaggtgctt cagctcaatc 3240
atttattgaa cgcatgacaa actttgataa aaatcttcca aatgaaaaag tactaccaaa 3300
acatagtttg ctttatgagt attttacggt ttataacgaa ttgacaaagg tcaaatatgt 3360
tactgaagga atgcgaaaac cagcatttct ttcaggtgaa cagaagaaag ccattgttga 3420
tttactcttc aaaacaaatc gaaaagtaac cgttaagcaa ttaaaagaag attatttcaa 3480
aaaaatagaa tgttttgata gtgttgaaat ttcaggagtt gaagatagat ttaatgcttc 3540
attaggtacc taccatgatt tgctaaaaat tattaaagat aaagattttt tggataatga 3600
agaaaatgaa gatatcttag aggatattgt tttaacattg accttatttg aagataggga 3660
gatgattgag gaaagactta aaacatatgc tcacctcttt gatgataagg tgatgaaaca 3720
gcttaaacgt cgccgttata ctggttgggg acgtttgtct cgaaaattga ttaatggtat 3780
tagggataag caatctggca aaacaatatt agattttttg aaatcagatg gttttgccaa 3840
tcgcaatttt atgcagctga tccatgatga tagtttgaca tttaaagaag acattcaaaa 3900
agcacaagtg tctggacaag gcgatagttt acatgaacat attgcaaatt tagctggtag 3960
ccctgctatt aaaaaaggta ttttacagac tgtaaaagtt gttgatgaat tggtcaaagt 4020
aatggggcgg cataagccag aaaatatcgt tattgaaatg gcacgtgaaa atcagacaac 4080
tcaaaagggc cagaaaaatt cgcgagagcg tatgaaacga atcgaagaag gtatcaaaga 4140
attaggaagt cagattctta aagagcatcc tgttgaaaat actcaattgc aaaatgaaaa 4200
gctctatctc tattatctcc aaaatggaag agacatgtat gtggaccaag aattagatat 4260
taatcgttta agtgattatg atgtcgatgc cattgttcca caaagtttcc ttaaagacga 4320
ttcaatagac aataaggtct taacgcgttc tgataaaaat cgtggtaaat cggataacgt 4380
tccaagtgaa gaagtagtca aaaagatgaa aaactattgg agacaacttc taaacgccaa 4440
gttaatcact caacgtaagt ttgataattt aacgaaagct gaacgtggag gtttgagtga 4500
acttgataaa gctggtttta tcaaacgcca attggttgaa actcgccaaa tcactaagca 4560
tgtggcacaa attttggata gtcgcatgaa tactaaatac gatgaaaatg ataaacttat 4620
tcgagaggtt aaagtgatta ccttaaaatc taaattagtt tctgacttcc gaaaagattt 4680
ccaattctat aaagtacgtg agattaacaa ttaccatcat gcccatgatg cgtatctaaa 4740
tgccgtcgtt ggaactgctt tgattaagaa atatccaaaa cttgaatcgg agtttgtcta 4800
tggtgattat aaagtttatg atgttcgtaa aatgattgct aagtctgagc aagaaatagg 4860
caaagcaacc gcaaaatatt tcttttactc taatatcatg aacttcttca aaacagaaat 4920
tacacttgca aatggagaga ttcgcaaacg ccctctaatc gaaactaatg gggaaactgg 4980
agaaattgtc tgggataaag ggcgagattt tgccacagtg cgcaaagtat tgtccatgcc 5040
ccaagtcaat attgtcaaga aaacagaagt acagacaggc ggattctcca aggagtcaat 5100
tttaccaaaa agaaattcgg acaagcttat tgctcgtaaa aaagactggg atccaaaaaa 5160
atatggtggt tttgatagtc caacggtagc ttattcagtc ctagtggttg ctaaggtgga 5220
aaaagggaaa tcgaagaagt taaaatccgt taaagagtta ctagggatca caattatgga 5280
aagaagttcc tttgaaaaaa atccgattga ctttttagaa gctaaaggat ataaggaagt 5340
taaaaaagac ttaatcatta aactacctaa atatagtctt tttgagttag aaaacggtcg 5400
taaacggatg ctggctagtg ccggagaatt acaaaaagga aatgagctgg ctctgccaag 5460
caaatatgtg aattttttat atttagctag tcattatgaa aagttgaagg gtagtccaga 5520
agataacgaa caaaaacaat tgtttgtgga gcagcataag cattatttag atgagattat 5580
tgagcaaatc agtgaatttt ctaagcgtgt tattttagca gatgccaatt tagataaagt 5640
tcttagtgca tataacaaac atagagacaa accaatacgt gaacaagcag aaaatattat 5700
tcatttattt acgttgacga atcttggagc tcccgctgct tttaaatatt ttgatacaac 5760
aattgatcgt aaacgatata cgtctacaaa agaagtttta gatgccactc ttatccatca 5820
atccatcact ggtctttatg aaacacgcat tgatttgagt cagctaggag gtgaccacca 5880
ccaccaccac cactgagatc cggctgctaa caaagcccga aaggaagctg agttggctgc 5940
tgccaccgct gagcaataac tagcataacc ccttggggcc tctaaacggg tcttgagggg 6000
ttttttgctg aaaggaggaa ctatatccgg atatcccgca agaggcccgg cagtaccggc 6060
ataaccaagc ctatgcctac agcatccagg gtgacggtgc cgaggatgac gatgagcgca 6120
ttgttagatt tcatacacgg tgcctgactg cgttagcaat ttaactgtga taaactaccg 6180
cattaaagct agcttatcga tgataagctg tcaaacatga gaattaattc ttgaagacga 6240
aagggcctcg tgatacgcct atttttatag gttaatgtca tgataataat ggtttcttag 6300
acgtcaggtg gcacttttcg gggaaatgtg cgcggaaccc ctatttgttt atttttctaa 6360
atacattcaa atatgtatcc gctcatgaga caataaccct gataaatgct tcaataatat 6420
tgaaaaagga agagtatgag tattcaacat ttccgtgtcg cccttattcc cttttttgcg 6480
gcattttgcc ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa agatgctgaa 6540
gatcagttgg gtgcacgagt gggttacatc gaactggatc tcaacagcgg taagatcctt 6600
gagagttttc gccccgaaga acgttttcca atgatgagca cttttaaagt tctgctatgt 6660
ggcgcggtat tatcccgtgt tgacgccggg caagagcaac tcggtcgccg catacactat 6720
tctcagaatg acttggttga gtactcacca gtcacagaaa agcatcttac ggatggcatg 6780
acagtaagag aattatgcag tgctgccata accatgagtg ataacactgc ggccaactta 6840
cttctgacaa cgatcggagg accgaaggag ctaaccgctt ttttgcacaa catgggggat 6900
catgtaactc gccttgatcg ttgggaaccg gagctgaatg aagccatacc aaacgacgag 6960
cgtgacacca cgatgcctgc agcaatggca acaacgttgc gcaaactatt aactggcgaa 7020
ctacttactc tagcttcccg gcaacaatta atagactgga tggaggcgga taaagttgca 7080
ggaccacttc tgcgctcggc ccttccggct ggctggttta ttgctgataa atctggagcc 7140
ggtgagcgtg ggtctcgcgg tatcattgca gcactggggc cagatggtaa gccctcccgt 7200
atcgtagtta tctacacgac ggggagtcag gcaactatgg atgaacgaaa tagacagatc 7260
gctgagatag gtgcctcact gattaagcat tggtaactgt cagaccaagt ttactcatat 7320
atactttaga ttgatttaaa acttcatttt taatttaaaa ggatctaggt gaagatcctt 7380
tttgataatc tcatgaccaa aatcccttaa cgtgagtttt cgttccactg agcgtcagac 7440
cccgtagaaa agatcaaagg atcttcttga gatccttttt ttctgcgcgt aatctgctgc 7500
ttgcaaacaa aaaaaccacc gctaccagcg gtggtttgtt tgccggatca agagctacca 7560
actctttttc cgaaggtaac tggcttcagc agagcgcaga taccaaatac tgtccttcta 7620
gtgtagccgt agttaggcca ccacttcaag aactctgtag caccgcctac atacctcgct 7680
ctgctaatcc tgttaccagt ggctgctgcc agtggcgata agtcgtgtct taccgggttg 7740
gactcaagac gatagttacc ggataaggcg cagcggtcgg gctgaacggg gggttcgtgc 7800
acacagccca gcttggagcg aacgacctac accgaactga gatacctaca gcgtgagcta 7860
tgagaaagcg ccacgcttcc cgaagggaga aaggcggaca ggtatccggt aagcggcagg 7920
gtcggaacag gagagcgcac gagggagctt ccagggggaa acgcctggta tctttatagt 7980
cctgtcgggt ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc gtcagggggg 8040
cggagcctat ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc cttttgctgg 8100
ccttttgctc acatgttctt tcctgcgtta tcccctgatt ctgtggataa ccgtattacc 8160
gcctttgagt gagctgatac cgctcgccgc agccgaacga ccgagcgcag cgagtcagtg 8220
agcgaggaag cggaagagcg cctgatgcgg tattttctcc ttacgcatct gtgcggtatt 8280
tcacaccgca atggtgcact ctcagtacaa tctgctctga tgccgcatag ttaagccagt 8340
atacactccg ctatcgctac gtgactgggt catggctgcg ccccgacacc caccaacacc 8400
cgctgacgcg ccctgacggg cttgtctgct cccggcatcc gcttacagac aagctgtgac 8460
cgtctccggg agctgcatgt gtcagaggtt ttcaccgtca tcaccgaaac gcgcgaggca 8520
gctgcggtaa agctcatcag cgtggtcgtg aagcgattca cagatgtctg cctgttcatc 8580
cgcgtccagc tcgttgagtt tctccagaag cgttaatgtc tggcttctga taaagcgggc 8640
catgttaagg gcggtttttt cctgtttggt cactgatgcc tccgtgtaag ggggatttct 8700
gttcatgggg gtaatgatac cgatgaaacg agagaggatg ctcacgatac gggttactga 8760
tgatgaacat gcccggttac tggaacgttg tgagggtaaa caactggcgg tatggatgcg 8820
gcgggaccag agaaaaatca ctcagggtca atgccagcgc ttcgttaata cagatgtagg 8880
tgttccacag ggtagccagc agcatcctgc gatgcagatc cggaacataa tggtgcaggg 8940
cgctgacttc cgcgtttcca gactttacga aacacggaaa ccgaagacca ttcatgttgt 9000
tgctcaggtc gcagacgttt tgcagcagca gtcgcttcac gttcgctcgc gtatcggtga 9060
ttcattctgc taaccagtaa ggcaaccccg ccagcctagc cgggtcctca acgacaggag 9120
cacgatcatg cgcacccgtg gccaggaccc aacgctgccc gagatgcgcc gcgtgcggct 9180
gctggagatg gcggacgcga tggatatgtt ctgccaaggg ttggtttgcg cattcacagt 9240
tctccgcaag aattgattgg ctccaattct tggagtggtg aatccgttag cgaggtgccg 9300
ccggcttcca ttcaggtcga ggtggcccgg ctccatgcac cgcgacgcaa cgcggggagg 9360
cagacaaggt atagggcggc gcctacaatc catgccaacc cgttccatgt gctcgccgag 9420
gcggcataaa tcgccgtgac gatcagcggt ccaatgatcg aagttaggct ggtaagagcc 9480
gcgagcgatc cttgaagctg tccctgatgg tcgtcatcta cctgcctgga cagcatggcc 9540
tgcaacgcgg gcatcccgat gccgccggaa gcgagaagaa tcataatggg gaaggccatc 9600
cagcctcgcg tcgcgaacgc cagcaagacg tagcccagcg cgtcggccgc catgccggcg 9660
ataatggcct gcttctcgcc gaaacgtttg gtggcgggac cagtgacgaa ggcttgagcg 9720
agggcgtgca agattccgaa taccgcaagc gacaggccga tcatcgtcgc gctccagcga 9780
aagcggtcct cgccgaaaat gacccagagc gctgccggca cctgtcctac gagttgcatg 9840
ataaagaaga cagtcataag tgcggcgacg atagtcatgc cccgcgccca ccggaaggag 9900
ctgactgggt tgaaggctct caagggcatc ggtcgagatc ccggtgccta atgagtgagc 9960
taacttacat taattgcgtt gcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc 10020
cagctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat tgggcgccag 10080
ggtggttttt cttttcacca gtgagacggg caacagctga ttgcccttca ccgcctggcc 10140
ctgagagagt tgcagcaagc ggtccacgct ggtttgcccc agcaggcgaa aatcctgttt 10200
gatggtggtt aacggcggga tataacatga gctgtcttcg gtatcgtcgt atcccactac 10260
cgagatatcc gcaccaacgc gcagcccgga ctcggtaatg gcgcgcattg cgcccagcgc 10320
catctgatcg ttggcaacca gcatcgcagt gggaacgatg ccctcattca gcatttgcat 10380
ggtttgttga aaaccggaca tggcactcca gtcgccttcc cgttccgcta tcggctgaat 10440
ttgattgcga gtgagatatt tatgccagcc agccagacgc agacgcgccg agacagaact 10500
taatgggccc gctaacagcg cgatttgctg gtgacccaat gcgaccagat gctccacgcc 10560
cagtcgcgta ccgtcttcat gggagaaaat aatactgttg atgggtgtct ggtcagagac 10620
atcaagaaat aacgccggaa cattagtgca ggcagcttcc acagcaatgg catcctggtc 10680
atccagcgga tagttaatga tcagcccact gacgcgttgc gcgagaagat tgtgcaccgc 10740
cgctttacag gcttcgacgc cgcttcgttc taccatcgac accaccacgc tggcacccag 10800
ttgatcggcg cgagatttaa tcgccgcgac aatttgcgac ggcgcgtgca gggccagact 10860
ggaggtggca acgccaatca gcaacgactg tttgcccgcc agttgttgtg ccacgcggtt 10920
gggaatgtaa ttcagctccg ccatcgccgc ttccactttt tcccgcgttt tcgcagaaac 10980
gtggctggcc tggttcacca cgcgggaaac ggtctgataa gagacaccgg catactctgc 11040
gacatcgtat aacgttactg gtttcacatt caccaccctg aattgactct cttccgggcg 11100
ctatcatgcc ataccgcgaa aggttttgcg ccattcgatg gtgtccggga tctcgacgct 11160
ctcccttatg cgactcctgc attaggaagc agcccagtag taggttgagg ccgttgagca 11220
ccgccgccgc aaggaatggt gcatg 11245
<210> 4
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.CVSJ0588.AF
<400> 4
gaaattaatg gtttaagctt 20
<210> 5
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.CVSJ0588.AA
<400> 5
aggtgagcaa gatttccatt 20
<210> 6
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.CVSJ0588.AC
<400> 6
tcaaggacat attctcctgt 20
<210> 7
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.CVSJ0588.AJ
<400> 7
aatgtgctcc ataaggaatt 20
<210> 8
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.CVSJ0588.AG
<400> 8
aactttcttc ttctgaggag 20
<210> 9
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.CVSJ0588.AB
<400> 9
aagatcacac ctatgggaaa 20
<210> 10
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.RNXS0617.AA
<400> 10
gctattttga ccatttcaat 20
<210> 11
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.RNXS0617.AB
<400> 11
cggaggacaa atccatacca 20
<210> 12
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.RNXS0617.AC
<400> 12
cagtttatcg ttattaccaa 20
<210> 13
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.RNXS0617.AD
<400> 13
acttatacca tgctgaccat 20
<210> 14
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.RZZJ8230.AA
<400> 14
ggacaacacc ctgaccatcc 20
<210> 15
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.RZZJ8230.AC
<400> 15
gtctgacctc gactccatcc 20
<210> 16
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> CD.Cas9.RZZJ8230.AD
<400> 16
gaacatcaaa ggtctgactc 20
<210> 17
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> EXT1-1
<400> 17
atatcacgtc cataacgggg 20
<210> 18
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> EXT1-4
<400> 18
cacttggcct gactacaccg 20
<210> 19
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> BCL9-2
<400> 19
gggttggcat cggaaccacg 20
<210> 20
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> BCL9-4
<400> 20
gatgccctct ccaaatgccg 20
<210> 21
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> HOXA13-1
<400> 21
gtagccatag ggcagcgccg 20
<210> 22
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> HOXA13-4
<400> 22
tttctctacg acaacggcgg 20
<210> 23
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> HOXD11-1
<400> 23
gggcttcgac cagttctacg 20
<210> 24
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> HOXD11-3
<400> 24
gggctacgct ccctactacg 20
<210> 25
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> OLIG2-1
<400> 25
actggtgagc gagatctacg 20
<210> 26
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> OLIG2-2
<400> 26
gcacgccgca catcaccccg 20
<210> 27
<211> 33
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> adapter A
<400> 27
tcgtcggcag cgtcagatgt gtataagaga cag 33
<210> 28
<211> 34
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> adapter B
<400> 28
gtctcgtggg ctcggagatg tgtataagag acag 34
<210> 29
<211> 19
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> mosaic end
<400> 29
agatgtgtat aagagacag 19
<210> 30
<211> 15
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> Illumina sequencing primer site SP2
<400> 30
gtctcgtggg ctcgg 15
<210> 31
<211> 14
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<220>
<223> Illumina sequencing primer site SP1
<400> 31
tcgtcggcag cgtc 14
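The adapter entries can be cross-checked against one another. Adapter A (SEQ ID NO: 27) appears to be the SP1 sequence (SEQ ID NO: 31) followed by the 19-nt mosaic end (SEQ ID NO: 29), and adapter B (SEQ ID NO: 28) appears to be SP2 (SEQ ID NO: 30) followed by the same mosaic end. The minimal Python sketch below checks this by plain string comparison. The helper name `split_adapter` is hypothetical and is not part of the source.

```python
# Sequences copied from SEQ ID NOs: 27-31 of the listing above.
ADAPTER_A = "tcgtcggcagcgtcagatgtgtataagagacag"   # SEQ ID NO: 27 (33 nt)
ADAPTER_B = "gtctcgtgggctcggagatgtgtataagagacag"  # SEQ ID NO: 28 (34 nt)
MOSAIC_END = "agatgtgtataagagacag"                # SEQ ID NO: 29 (19 nt)
SP2 = "gtctcgtgggctcgg"                           # SEQ ID NO: 30 (15 nt)
SP1 = "tcgtcggcagcgtc"                            # SEQ ID NO: 31 (14 nt)


def split_adapter(adapter: str, mosaic_end: str = MOSAIC_END) -> str:
    """Return the 5' portion of an adapter that precedes its 3' mosaic end.

    Raises ValueError if the adapter does not end with the mosaic end.
    """
    if not adapter.endswith(mosaic_end):
        raise ValueError("adapter does not end with the mosaic end")
    return adapter[: len(adapter) - len(mosaic_end)]


if __name__ == "__main__":
    for name, adapter, primer_site in (
        ("adapter A", ADAPTER_A, SP1),
        ("adapter B", ADAPTER_B, SP2),
    ):
        prefix = split_adapter(adapter)
        print(name, len(adapter), "nt; 5' part equals primer site:", prefix == primer_site)
```

This only confirms how the listed sequences relate to each other; it says nothing about how the adapters are loaded onto the transposase.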
<210> 32
<211> 10
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> (G)n
<220>
<221> Variant
<222> (2)..(10)
<223> may or may not be present
<400> 32
Gly Gly Gly Gly Gly Gly Gly Gly Gly Gly
1 5 10
<210> 33
<211> 20
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> (GS)n
<220>
<221> Variant
<222> (3)..(20)
<223> may or may not be present
<400> 33
Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser Gly Ser
1 5 10 15
Gly Ser Gly Ser
20
<210> 34
<211> 50
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> (GSGGS)n
<220>
<221> Variant
<222> (6)..(50)
<223> may or may not be present
<400> 34
Gly Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly
1 5 10 15
Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser
20 25 30
Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser Gly Gly Ser Gly Ser Gly
35 40 45
Gly Ser
50
<210> 35
<211> 50
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> (G4S)n
<220>
<221> Variant
<222> (6)..(50)
<223> may or may not be present
<400> 35
Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly
1 5 10 15
Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly
20 25 30
Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly
35 40 45
Gly Ser
50
<210> 36
<211> 40
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> (GGGS)n
<220>
<221> Variant
<222> (5)..(40)
<223> may or may not be present
<400> 36
Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser
1 5 10 15
Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser
20 25 30
Gly Gly Gly Ser Gly Gly Gly Ser
35 40
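The variable linkers in SEQ ID NOs: 32-36 share one pattern: the first repeat unit is always present, and the residues from the second unit onward "may or may not be present". The minimal Python sketch below enumerates the resulting linkers, assuming that optional residues are included or dropped only as whole repeat units (the listing marks optionality by residue range and does not state this explicitly). Under that assumption every entry allows n = 1 to 10 repeats. The names `LINKERS` and `linker_variants` are hypothetical.

```python
# Repeat unit and full listed length for each variable linker, from SEQ ID NOs: 32-36.
LINKERS = {
    "(G)n": ("G", 10),          # SEQ ID NO: 32
    "(GS)n": ("GS", 20),        # SEQ ID NO: 33
    "(GSGGS)n": ("GSGGS", 50),  # SEQ ID NO: 34
    "(G4S)n": ("GGGGS", 50),    # SEQ ID NO: 35
    "(GGGS)n": ("GGGS", 40),    # SEQ ID NO: 36
}


def linker_variants(unit: str, full_length: int) -> list[str]:
    """Return linkers built from n = 1 .. full_length // len(unit) whole repeat units.

    Assumption: residues marked "may or may not be present" are omitted only
    as complete repeat units, so the first unit is always kept.
    """
    max_repeats = full_length // len(unit)
    return [unit * n for n in range(1, max_repeats + 1)]


if __name__ == "__main__":
    for name, (unit, full_length) in LINKERS.items():
        variants = linker_variants(unit, full_length)
        lengths = [len(v) for v in variants]
        print(f"{name}: {len(variants)} variants, {lengths[0]}-{lengths[-1]} aa")
```

Because each first optional position equals the unit length plus one (2, 3, 6, 6 and 5), the shortest variant in every case is a single repeat unit.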
<210> 37
<211> 4
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> Glycine/serine spacer 1
<400> 37
Gly Gly Ser Gly
1
<210> 38
<211> 5
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> Glycine/serine spacer 2
<400> 38
Gly Gly Ser Gly Gly
1 5
<210> 39
<211> 5
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> Glycine/serine spacer 3
<400> 39
Gly Ser Gly Ser Gly
1 5
<210> 40
<211> 5
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> Glycine/serine spacer 4
<400> 40
Gly Ser Gly Gly Gly
1 5
<210> 41
<211> 5
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> Glycine/serine spacer 5
<400> 41
Gly Gly Gly Ser Gly
1 5
<210> 42
<211> 5
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> Glycine/serine spacer 6
<400> 42
Gly Ser Ser Ser Gly
1 5
<210> 43
<211> 12
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> XTEN linker 1
<400> 43
Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala
1 5 10
<210> 44
<211> 16
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> XTEN linker 2
<400> 44
Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser
1 5 10 15
<210> 45
<211> 21
<212> PRT
<213> Artificial sequence (Artificial Sequence)
<220>
<223> XTEN linker 3
<400> 45
Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Gly
1 5 10 15
Gly Ser Gly Gly Ser
20
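The Tn5-dCas9 fusion junction in SEQ ID NO: 3 (pET-Tn5-xTen-dCas9) can be checked by translation. The minimal Python sketch below assumes the standard genetic code and reads positions 1724-1819 in the same frame as the upstream Tn5 coding sequence. These 96 nt appear to encode the 16-residue XTEN linker 2 (SEQ ID NO: 44) followed directly by the Cas9 N-terminus. That N-terminus carries alanine at residue 10, where wild-type SpCas9 has aspartate, which is consistent with the D10A substitution used in nuclease-dead Cas9. The codon table and the `translate` helper are written here for illustration and are not part of the source.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Positions 1724-1819 of SEQ ID NO: 3, split into two 48-nt halves for readability.
JUNCTION = (
    "agcggttccgaaactcccggtacatcagaaagcgcgacccccgaaagc"  # putative XTEN linker
    "atggataaaaagtattctattggtttagctatcggcacaaatagcgtc"  # putative dCas9 start
)

XTEN_LINKER_2 = "SGSETPGTSESATPES"  # SEQ ID NO: 44


def translate(dna: str) -> str:
    """Translate in-frame DNA; trailing bases that do not fill a codon are ignored."""
    dna = dna.lower()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    protein = translate(JUNCTION)
    linker, cas9_start = protein[:16], protein[16:]
    print("linker matches SEQ ID NO: 44:", linker == XTEN_LINKER_2)
    print("Cas9 N-terminus:", cas9_start)
    print("residue 10:", cas9_start[9])  # 'A' here; wild-type SpCas9 has 'D'
```

The same 48-nt linker sequence also appears in SEQ ID NO: 2, at the junction between the Cas9 coding sequence and the downstream Tn5 sequence, so the helper can be reused there by changing `JUNCTION`.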
Claims (70)
1. A composition comprising more than one protein complex, wherein each of the more than one protein complexes comprises a transposome and a programmable DNA binding unit capable of specifically binding to a binding site on a target double-stranded DNA (dsDNA), wherein the transposome comprises a transposase, a first adapter, and a second adapter, and wherein the binding sites of the more than one protein complexes are different from one another.
2. The composition of claim 1, wherein at least two of the plurality of protein complexes comprise the same transposome.
3. The composition of claim 1, wherein all of the plurality of protein complexes comprise the same transposome.
4. The composition of any one of claims 1-3, wherein all of the plurality of protein complexes comprise the same transposase.
5. The composition of any one of claims 1-4, wherein the first adapter and the second adapter in the same transposome are the same.
6. The composition of any one of claims 1-5, wherein the first adapter, the second adapter, or both in different transposomes are different.
7. The composition of any one of claims 1-6, wherein the first adapter, the second adapter, or both are dsDNA or an RNA/DNA duplex.
8. The composition of any one of claims 1-7, wherein the adapter is about 3-200 base pairs in length.
9. The composition of any one of claims 1-8, wherein the first adapter, the second adapter, or both are sequencing adapters.
10. The composition of claim 9, wherein the sequencing adapter comprises a P5 or P7 primer sequence.
11. The composition of any one of claims 1-10, wherein the binding sites of at least two of the plurality of protein complexes are located on the same target dsDNA.
12. The composition of claim 11, wherein the binding sites of at least two of the plurality of protein complexes are about 1-50,000 nucleotides apart on the same target dsDNA.
13. The composition of claim 11, wherein the distance between the binding sites of one pair of the plurality of protein complexes is substantially the same as the distance between the binding sites of another pair of the plurality of protein complexes.
14. The composition of claim 11, wherein the distance between the binding sites of one pair of the plurality of protein complexes is different from the distance between the binding sites of another pair of the plurality of protein complexes.
15. The composition of any one of claims 1-11, wherein the binding sites of at least two of the plurality of protein complexes are located on different strands of the target dsDNA.
16. The composition of any one of claims 1-15, wherein at least two of the plurality of protein complexes are capable of specifically binding to different target dsDNAs.
17. The composition of any one of claims 1-15, wherein the plurality of protein complexes is capable of specifically binding to about 2-5,000 target dsDNAs.
18. The composition of any one of claims 1-17, wherein the transposase is a Tn5 transposase, a Tn7 transposase, a mariner/Tc1-like transposase, a Himar1C9 transposase, or a Sleeping Beauty transposase.
19. The composition of any one of claims 1-18, wherein the transposase is a hyperactive transposase.
20. The composition of any one of claims 1-19, wherein the programmable DNA binding unit comprises a nuclease-deficient CRISPR-associated protein (dCAS protein) and a guide RNA capable of specifically binding to a binding site of the target dsDNA.
21. The composition of claim 20, wherein the transposome is associated with the programmable DNA binding unit through a linker that connects the transposase and the dCAS protein.
22. The composition of claim 21, wherein the linker comprises a peptide linker, a chemical linker, or both.
23. The composition of claim 20, wherein the transposase is present as a fusion protein comprising the dCAS protein.
24. The composition of any one of claims 20-23, wherein the dCAS protein is dCAS9, dCAS12, dCAS13, dCAS14, or SpRY dCAS.
25. The composition of claim 24, wherein the dCAS13 protein is dCAS13a, dCAS13b, dCAS13c, or dCAS13d.
26. The composition of any one of claims 1-19, wherein the programmable DNA binding unit comprises a protein component capable of specifically binding to a binding site on the target dsDNA, wherein the protein component comprises an endonuclease-deficient Zinc Finger Nuclease (ZFN), an endonuclease-deficient transcription activator-like effector nuclease (TALEN), an Argonaute protein, an endonuclease-deficient meganuclease, a recombinase, or a combination thereof.
27. The composition of claim 26, wherein the transposome is associated with the programmable DNA binding unit through a linker connecting the transposase and the protein component.
28. The composition of claim 27, wherein the linker comprises a peptide linker, a chemical linker, or both.
29. The composition of claim 28, wherein the peptide linker comprises a plurality of glycine, serine, threonine, alanine, lysine, or glutamine residues, or a combination thereof.
30. The composition of claim 29, wherein the peptide linker comprises a GS linker.
31. The composition of claim 28, wherein the peptide linker is an XTEN linker.
32. The composition of claim 26, wherein the protein component is present as a fusion protein comprising the transposase.
33. A reaction mixture comprising:
the composition of any one of claims 1-32; and
a sample nucleic acid suspected of comprising one or more target dsDNA.
34. The reaction mixture of claim 33, further comprising a DNA polymerase, dNTPs, or a combination thereof.
35. The reaction mixture of any one of claims 33-34, wherein the adapter is covalently attached to the target dsDNA or fragment thereof.
36. The reaction mixture of any one of claims 33-35, comprising a plurality of dsDNA fragments, each fragment comprising the first adapter and the second adapter of one of the plurality of protein complexes at its two termini, respectively.
37. The reaction mixture of any one of claims 33-36, wherein the sample nucleic acid comprises eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof.
38. The reaction mixture of any one of claims 33-37, wherein the target dsDNA is genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof.
39. The reaction mixture of any one of claims 33-38, wherein the sample nucleic acid is from a biological sample, a clinical sample, an environmental sample, or a combination thereof.
40. The reaction mixture of claim 39, wherein the biological sample comprises stool, sputum, peripheral blood, plasma, serum, lymph nodes, respiratory tissue, exudates, body fluids, or combinations thereof.
41. A method of tagging nucleic acids, comprising:
contacting the composition of any one of claims 1-32 with a sample suspected of containing a plurality of target double-stranded DNAs (dsDNAs) to form a reaction mixture; and
incubating the reaction mixture to generate a plurality of dsDNA fragments, each fragment comprising the first adapter and the second adapter of one of the plurality of protein complexes at its two ends, respectively.
42. A method for generating a sequencing library, comprising:
contacting the composition of any one of claims 1-32 with a sample suspected of containing a plurality of target double-stranded DNAs (dsDNAs) to form a reaction mixture;
incubating the reaction mixture to generate a plurality of dsDNA fragments, each fragment comprising the first adapter and the second adapter of one of the plurality of protein complexes at its two termini, respectively; and
amplifying the plurality of dsDNA fragments with primers capable of binding to the adapters at the ends of the dsDNA fragments to generate a sequencing library.
43. The method of claim 42, wherein each of the primers is about 5-80 nucleotides in length.
44. The method of any one of claims 42-43, wherein amplifying the plurality of dsDNA fragments with the primers is performed using polymerase chain reaction (PCR).
45. The method of claim 44, wherein the PCR is loop-mediated isothermal amplification (LAMP), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription-mediated amplification (TMA), nicking enzyme amplification reaction (NEAR), rolling circle amplification (RCA), multiple displacement amplification (MDA), ramification amplification method (RAM), circular helicase-dependent amplification (cHDA), single primer isothermal amplification (SPIA), signal-mediated amplification of RNA technology (SMART), self-sustained sequence replication (3SR), genome exponential amplification reaction (GEAR), or isothermal multiple displacement amplification (IMDA).
46. The method of claim 44, wherein the PCR is real-time PCR or quantitative real-time PCR (QRT-PCR).
47. The method of any one of claims 41-46, wherein the sample comprises eukaryotic DNA, bacterial DNA, viral DNA, fungal DNA, protozoan DNA, or a combination thereof.
48. The method of any one of claims 41-47, wherein the plurality of target dsDNAs comprises genomic DNA, mitochondrial DNA, plasmid DNA, or a combination thereof.
49. The method of any one of claims 41-48, wherein the sample is or is derived from a biological sample, a clinical sample, an environmental sample, or a combination thereof.
50. The method of any one of claims 41-49, wherein the plurality of target dsDNAs comprises DNA from at least 2 different organisms.
51. The method of any one of claims 41-50, wherein the plurality of target dsDNAs comprises DNA from at least 2 different genes.
52. The method of any one of claims 41-51, further comprising producing the plurality of target dsDNAs from a plurality of target RNAs with a reverse transcriptase.
53. The method of any one of claims 41-51, wherein the plurality of target dsDNAs comprises a target dsDNA produced from a target RNA with a reverse transcriptase.
54. The method of any one of claims 41-53, wherein the plurality of target dsDNAs comprises a genetic feature of interest.
55. The method of claim 54, wherein the genetic feature of interest comprises one or more mutations of interest.
56. The method of claim 55, wherein the one or more mutations of interest comprise a point mutation, an inversion, a deletion, an insertion, a translocation, a duplication, a copy number variation, or a combination thereof.
57. The method of claim 55, wherein the one or more mutations of interest comprise nucleotide substitutions, deletions, insertions, or combinations thereof.
58. The method of any one of claims 54-57, wherein the genetic feature of interest is indicative of pathogen identification, antibiotic resistance, or antibiotic susceptibility of the organism from which the target dsDNA is derived.
59. The method of any one of claims 54-57, wherein the genetic feature of interest is indicative of a cancer status of the organism from which the target dsDNA is derived.
60. The method of any one of claims 54-57, wherein the genetic feature of interest is indicative of a genetic disease status of the organism from which the target dsDNA is derived.
61. The method of claim 60, wherein the genetic disease is a monogenic disorder.
62. The method of claim 60, wherein the genetic disease is cystic fibrosis, Huntington's disease, sickle cell anemia, hemophilia, Duchenne muscular dystrophy, thalassemia, fragile X syndrome, familial hypercholesterolemia, polycystic kidney disease, type I neurofibromatosis, hereditary spherocytosis, Marfan syndrome, Tay-Sachs disease, phenylketonuria, mucopolysaccharidosis, lysosomal acid lipase deficiency, glycogen storage disease, galactosemia, or hemochromatosis.
63. The method of any one of claims 41-62, wherein contacting the plurality of target dsDNAs with the plurality of protein complex pairs occurs at about 25 ℃ to about 80 ℃.
64. The method of any one of claims 41-63, wherein incubating the reaction mixture comprises incubating the reaction mixture at about 37 ℃ to about 55 ℃.
65. The method of any one of claims 41-64, wherein the plurality of protein complex pairs and the plurality of target dsDNAs are present in the reaction mixture at a molar ratio of about 2:1 to about 2,000:1.
66. The method of any one of claims 41-64, wherein the plurality of protein complex pairs and the plurality of target dsDNAs are present in the reaction mixture at a molar ratio of about 2:1 to about 200:1.
67. The method of any one of claims 41-66, further comprising labeling one or both ends of one or more of the plurality of dsDNA fragments.
68. The method of any one of claims 41-66, comprising labeling the two ends of one or more of the plurality of dsDNA fragments with different labels.
69. The method of any one of claims 67-68, wherein said labeling comprises labeling with an anionic label, a cationic label, a neutral label, an electrochemical label, a protein label, a fluorescent label, a magnetic label, or a combination thereof.
70. The method of any one of claims 67-69, further comprising enriching for the labeled dsDNA fragments, capturing the labeled dsDNA fragments, isolating the labeled dsDNA fragments, and/or visualizing the labeled dsDNA fragments.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US63/189,032 | 2021-05-14 | ||
US202163243443P | 2021-09-13 | 2021-09-13 | |
US63/243,443 | 2021-09-13 | ||
PCT/US2022/029057 WO2022241158A1 (en) | 2021-05-14 | 2022-05-12 | Methods for making libraries for nucleic acid sequencing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117321194A true CN117321194A (en) | 2023-12-29 |
Family
ID=89285296
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202280035174.4A Pending CN117321194A (en) | 2021-05-14 | 2022-05-12 | Preparation method of nucleic acid sequencing library |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117321194A (en) |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |