WO2024038168A1 - Novel RNA-guided nucleases and nucleic acid targeting systems comprising such
- Publication number
- WO2024038168A1 (PCT/EP2023/072745)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- nucleic acid
- rgn
- target
- targeting system
- Prior art date
- 150000007523 nucleic acids Chemical class 0.000 title claims abstract description 142
- 102000039446 nucleic acids Human genes 0.000 title claims abstract description 122
- 108020004707 nucleic acids Proteins 0.000 title claims abstract description 122
- 101710163270 Nuclease Proteins 0.000 title claims abstract description 76
- 230000008685 targeting Effects 0.000 title claims abstract description 53
- 125000003729 nucleotide group Chemical group 0.000 claims abstract description 172
- 239000002773 nucleotide Substances 0.000 claims abstract description 164
- 108090000623 proteins and genes Proteins 0.000 claims description 165
- 108090000765 processed proteins & peptides Proteins 0.000 claims description 128
- 210000004027 cell Anatomy 0.000 claims description 123
- 102000004196 processed proteins & peptides Human genes 0.000 claims description 122
- 229920001184 polypeptide Polymers 0.000 claims description 117
- 108020004414 DNA Proteins 0.000 claims description 107
- 102000004169 proteins and genes Human genes 0.000 claims description 106
- 102000040430 polynucleotide Human genes 0.000 claims description 99
- 108091033319 polynucleotide Proteins 0.000 claims description 99
- 239000002157 polynucleotide Substances 0.000 claims description 99
- 238000000034 method Methods 0.000 claims description 90
- 230000014509 gene expression Effects 0.000 claims description 61
- 108091032973 (ribonucleotides)n+m Proteins 0.000 claims description 49
- 239000013598 vector Substances 0.000 claims description 46
- 125000006850 spacer group Chemical group 0.000 claims description 45
- 230000027455 binding Effects 0.000 claims description 39
- 108091033409 CRISPR Proteins 0.000 claims description 33
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 33
- 230000000295 complement effect Effects 0.000 claims description 31
- 102000053602 DNA Human genes 0.000 claims description 26
- 150000001413 amino acids Chemical class 0.000 claims description 21
- 230000004048 modification Effects 0.000 claims description 18
- 238000012986 modification Methods 0.000 claims description 18
- 230000035772 mutation Effects 0.000 claims description 18
- 238000012217 deletion Methods 0.000 claims description 13
- 230000037430 deletion Effects 0.000 claims description 13
- 239000000203 mixture Substances 0.000 claims description 13
- 238000003780 insertion Methods 0.000 claims description 12
- 230000037431 insertion Effects 0.000 claims description 12
- 108020004705 Codon Proteins 0.000 claims description 9
- 210000003527 eukaryotic cell Anatomy 0.000 claims description 9
- 239000008194 pharmaceutical composition Substances 0.000 claims description 9
- 108091036078 conserved sequence Proteins 0.000 claims description 7
- 239000013607 AAV vector Substances 0.000 claims description 6
- 238000012258 culturing Methods 0.000 claims description 3
- 238000010354 CRISPR gene editing Methods 0.000 claims 13
- 102100035102 E3 ubiquitin-protein ligase MYCBP2 Human genes 0.000 claims 2
- 230000009149 molecular binding Effects 0.000 claims 1
- 235000018102 proteins Nutrition 0.000 description 103
- 230000000694 effects Effects 0.000 description 54
- 108091079001 CRISPR RNA Proteins 0.000 description 52
- 238000003776 cleavage reaction Methods 0.000 description 46
- 230000007017 scission Effects 0.000 description 46
- 239000012636 effector Substances 0.000 description 35
- 239000012634 fragment Substances 0.000 description 31
- 102000004389 Ribonucleoproteins Human genes 0.000 description 28
- 108010081734 Ribonucleoproteins Proteins 0.000 description 28
- 108020001507 fusion proteins Proteins 0.000 description 28
- 102000037865 fusion proteins Human genes 0.000 description 28
- 108020005004 Guide RNA Proteins 0.000 description 26
- 238000000338 in vitro Methods 0.000 description 24
- 230000003612 virological effect Effects 0.000 description 23
- 239000013612 plasmid Substances 0.000 description 22
- 238000006467 substitution reaction Methods 0.000 description 22
- 125000003275 alpha amino acid group Chemical group 0.000 description 21
- 108010077850 Nuclear Localization Signals Proteins 0.000 description 19
- 210000003463 organelle Anatomy 0.000 description 17
- 210000001161 mammalian embryo Anatomy 0.000 description 16
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 14
- 108020004682 Single-Stranded DNA Proteins 0.000 description 13
- 241000700605 Viruses Species 0.000 description 12
- 238000006243 chemical reaction Methods 0.000 description 12
- 239000003795 chemical substances by application Substances 0.000 description 12
- 230000002759 chromosomal effect Effects 0.000 description 12
- 230000004927 fusion Effects 0.000 description 12
- 238000003556 assay Methods 0.000 description 11
- 210000004899 c-terminal region Anatomy 0.000 description 11
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 11
- 230000001105 regulatory effect Effects 0.000 description 11
- 238000001890 transfection Methods 0.000 description 11
- 230000004568 DNA-binding Effects 0.000 description 10
- 125000000539 amino acid group Chemical group 0.000 description 10
- 239000002502 liposome Substances 0.000 description 10
- 239000000047 product Substances 0.000 description 10
- 210000001519 tissue Anatomy 0.000 description 10
- 238000013518 transcription Methods 0.000 description 10
- 230000035897 transcription Effects 0.000 description 10
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical group NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 9
- 201000010099 disease Diseases 0.000 description 9
- 230000006870 function Effects 0.000 description 9
- 230000006780 non-homologous end joining Effects 0.000 description 9
- 239000000523 sample Substances 0.000 description 9
- 241000894006 Bacteria Species 0.000 description 8
- 102000004190 Enzymes Human genes 0.000 description 8
- 108090000790 Enzymes Proteins 0.000 description 8
- 239000000872 buffer Substances 0.000 description 8
- 238000009396 hybridization Methods 0.000 description 8
- 230000001404 mediated effect Effects 0.000 description 8
- 208000024891 symptom Diseases 0.000 description 8
- 230000002103 transcriptional effect Effects 0.000 description 8
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 7
- 108091026890 Coding region Proteins 0.000 description 7
- 241000701022 Cytomegalovirus Species 0.000 description 7
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 7
- 230000001580 bacterial effect Effects 0.000 description 7
- 239000012472 biological sample Substances 0.000 description 7
- 238000001514 detection method Methods 0.000 description 7
- 238000001727 in vivo Methods 0.000 description 7
- 230000001965 increasing effect Effects 0.000 description 7
- 210000004962 mammalian cell Anatomy 0.000 description 7
- 238000003752 polymerase chain reaction Methods 0.000 description 7
- 238000000746 purification Methods 0.000 description 7
- 230000001177 retroviral effect Effects 0.000 description 7
- 230000009870 specific binding Effects 0.000 description 7
- 229940035893 uracil Drugs 0.000 description 7
- 239000013603 viral vector Substances 0.000 description 7
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 6
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 6
- 241000702421 Dependoparvovirus Species 0.000 description 6
- 108091092195 Intron Proteins 0.000 description 6
- 239000003623 enhancer Substances 0.000 description 6
- 238000001638 lipofection Methods 0.000 description 6
- 238000004806 packaging method and process Methods 0.000 description 6
- 230000036961 partial effect Effects 0.000 description 6
- 229920002401 polyacrylamide Polymers 0.000 description 6
- 230000008439 repair process Effects 0.000 description 6
- 230000009466 transformation Effects 0.000 description 6
- 102100033934 DNA repair protein RAD51 homolog 2 Human genes 0.000 description 5
- 101001132307 Homo sapiens DNA repair protein RAD51 homolog 2 Proteins 0.000 description 5
- 239000012190 activator Substances 0.000 description 5
- 238000007792 addition Methods 0.000 description 5
- 238000003491 array Methods 0.000 description 5
- 230000015572 biosynthetic process Effects 0.000 description 5
- 230000003197 catalytic effect Effects 0.000 description 5
- 239000003153 chemical reaction reagent Substances 0.000 description 5
- 230000004049 epigenetic modification Effects 0.000 description 5
- 239000013604 expression vector Substances 0.000 description 5
- 108091006047 fluorescent proteins Proteins 0.000 description 5
- 102000034287 fluorescent proteins Human genes 0.000 description 5
- 230000010354 integration Effects 0.000 description 5
- 150000002632 lipids Chemical class 0.000 description 5
- 239000003550 marker Substances 0.000 description 5
- -1 nucleoplasmin Proteins 0.000 description 5
- 230000008569 process Effects 0.000 description 5
- 238000012163 sequencing technique Methods 0.000 description 5
- 241000894007 species Species 0.000 description 5
- 239000000126 substance Substances 0.000 description 5
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 4
- YYGNTYWPHWGJRM-UHFFFAOYSA-N (6E,10E,14E,18E)-2,6,10,15,19,23-hexamethyltetracosa-2,6,10,14,18,22-hexaene Chemical compound CC(C)=CCCC(C)=CCCC(C)=CCCC=C(C)CCC=C(C)CCC=C(C)C YYGNTYWPHWGJRM-UHFFFAOYSA-N 0.000 description 4
- 101710169336 5'-deoxyadenosine deaminase Proteins 0.000 description 4
- 102000055025 Adenosine deaminases Human genes 0.000 description 4
- 102100026846 Cytidine deaminase Human genes 0.000 description 4
- 108010031325 Cytidine deaminase Proteins 0.000 description 4
- 230000007018 DNA scission Effects 0.000 description 4
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 4
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 4
- 102000004533 Endonucleases Human genes 0.000 description 4
- 108010042407 Endonucleases Proteins 0.000 description 4
- 102000005720 Glutathione transferase Human genes 0.000 description 4
- 108010070675 Glutathione transferase Proteins 0.000 description 4
- 241000238631 Hexapoda Species 0.000 description 4
- 102100025169 Max-binding protein MNT Human genes 0.000 description 4
- 101100494762 Mus musculus Nedd9 gene Proteins 0.000 description 4
- 208000009869 Neu-Laxova syndrome Diseases 0.000 description 4
- 108700026244 Open Reading Frames Proteins 0.000 description 4
- 108010076504 Protein Sorting Signals Proteins 0.000 description 4
- 102000018120 Recombinases Human genes 0.000 description 4
- 108010091086 Recombinases Proteins 0.000 description 4
- 241000700584 Simplexvirus Species 0.000 description 4
- BHEOSNUKNHRBNM-UHFFFAOYSA-N Tetramethylsqualene Natural products CC(=C)C(C)CCC(=C)C(C)CCC(C)=CCCC=C(C)CCC(C)C(=C)CCC(C)C(C)=C BHEOSNUKNHRBNM-UHFFFAOYSA-N 0.000 description 4
- 108020005202 Viral DNA Proteins 0.000 description 4
- 229940104302 cytosine Drugs 0.000 description 4
- 238000013461 design Methods 0.000 description 4
- PRAKJMSDJKAYCZ-UHFFFAOYSA-N dodecahydrosqualene Natural products CC(C)CCCC(C)CCCC(C)CCCCC(C)CCCC(C)CCCC(C)C PRAKJMSDJKAYCZ-UHFFFAOYSA-N 0.000 description 4
- 238000001415 gene therapy Methods 0.000 description 4
- 230000002068 genetic effect Effects 0.000 description 4
- 239000001963 growth medium Substances 0.000 description 4
- 230000000670 limiting effect Effects 0.000 description 4
- 230000004807 localization Effects 0.000 description 4
- 230000000149 penetrating effect Effects 0.000 description 4
- 229920000447 polyanionic polymer Polymers 0.000 description 4
- 229920002643 polyglutamic acid Polymers 0.000 description 4
- 229920000642 polymer Polymers 0.000 description 4
- 230000029279 positive regulation of transcription, DNA-dependent Effects 0.000 description 4
- 239000011535 reaction buffer Substances 0.000 description 4
- 230000022532 regulation of transcription, DNA-dependent Effects 0.000 description 4
- 230000010076 replication Effects 0.000 description 4
- 229940031439 squalene Drugs 0.000 description 4
- TUHBEKDERLKLEC-UHFFFAOYSA-N squalene Natural products CC(=CCCC(=CCCC(=CCCC=C(/C)CCC=C(/C)CC=C(C)C)C)C)C TUHBEKDERLKLEC-UHFFFAOYSA-N 0.000 description 4
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 4
- 108091006107 transcriptional repressors Proteins 0.000 description 4
- 238000012546 transfer Methods 0.000 description 4
- 241000701161 unidentified adenovirus Species 0.000 description 4
- JKMHFZQWWAIEOD-UHFFFAOYSA-N 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid Chemical compound OCC[NH+]1CCN(CCS([O-])(=O)=O)CC1 JKMHFZQWWAIEOD-UHFFFAOYSA-N 0.000 description 3
- 108700010070 Codon Usage Proteins 0.000 description 3
- IAZDPXIOMUYVGZ-UHFFFAOYSA-N Dimethylsulphoxide Chemical compound CS(C)=O IAZDPXIOMUYVGZ-UHFFFAOYSA-N 0.000 description 3
- 239000006144 Dulbecco’s modified Eagle's medium Substances 0.000 description 3
- 241000588724 Escherichia coli Species 0.000 description 3
- 239000007995 HEPES buffer Substances 0.000 description 3
- 102100021244 Integral membrane protein GPR180 Human genes 0.000 description 3
- 102100034349 Integrase Human genes 0.000 description 3
- 101710175625 Maltose/maltodextrin-binding periplasmic protein Proteins 0.000 description 3
- 241000124008 Mammalia Species 0.000 description 3
- 108060004795 Methyltransferase Proteins 0.000 description 3
- 108091034117 Oligonucleotide Proteins 0.000 description 3
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 3
- 241000589499 Thermus thermophilus Species 0.000 description 3
- 108091023040 Transcription factor Proteins 0.000 description 3
- 102000040945 Transcription factor Human genes 0.000 description 3
- 108091005764 adaptor proteins Proteins 0.000 description 3
- 102000035181 adaptor proteins Human genes 0.000 description 3
- 239000002671 adjuvant Substances 0.000 description 3
- 230000004075 alteration Effects 0.000 description 3
- 230000003115 biocidal effect Effects 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 238000012350 deep sequencing Methods 0.000 description 3
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 description 3
- 230000003828 downregulation Effects 0.000 description 3
- 238000004520 electroporation Methods 0.000 description 3
- 239000000499 gel Substances 0.000 description 3
- 238000010362 genome editing Methods 0.000 description 3
- 230000012010 growth Effects 0.000 description 3
- 230000002779 inactivation Effects 0.000 description 3
- 208000015181 infectious disease Diseases 0.000 description 3
- 230000000977 initiatory effect Effects 0.000 description 3
- 239000002105 nanoparticle Substances 0.000 description 3
- 210000004940 nucleus Anatomy 0.000 description 3
- 239000002245 particle Substances 0.000 description 3
- 239000000546 pharmaceutical excipient Substances 0.000 description 3
- 230000002829 reductive effect Effects 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 150000003384 small molecules Chemical class 0.000 description 3
- 239000011780 sodium chloride Substances 0.000 description 3
- 238000003786 synthesis reaction Methods 0.000 description 3
- 238000013519 translation Methods 0.000 description 3
- 230000032258 transport Effects 0.000 description 3
- 241001515965 unidentified phage Species 0.000 description 3
- 238000011144 upstream manufacturing Methods 0.000 description 3
- KIUKXJAPPMFGSW-DNGZLQJQSA-N (2S,3S,4S,5R,6R)-6-[(2S,3R,4R,5S,6R)-3-Acetamido-2-[(2S,3S,4R,5R,6R)-6-[(2R,3R,4R,5S,6R)-3-acetamido-2,5-dihydroxy-6-(hydroxymethyl)oxan-4-yl]oxy-2-carboxy-4,5-dihydroxyoxan-3-yl]oxy-5-hydroxy-6-(hydroxymethyl)oxan-4-yl]oxy-3,4,5-trihydroxyoxane-2-carboxylic acid Chemical compound CC(=O)N[C@H]1[C@H](O)O[C@H](CO)[C@@H](O)[C@@H]1O[C@H]1[C@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H]([C@@H](O[C@H]3[C@@H]([C@@H](O)[C@H](O)[C@H](O3)C(O)=O)O)[C@H](O)[C@@H](CO)O2)NC(C)=O)[C@@H](C(O)=O)O1 KIUKXJAPPMFGSW-DNGZLQJQSA-N 0.000 description 2
- 208000035657 Abasia Diseases 0.000 description 2
- 108091093088 Amplicon Proteins 0.000 description 2
- 241000203069 Archaea Species 0.000 description 2
- 101710201279 Biotin carboxyl carrier protein Proteins 0.000 description 2
- 108091003079 Bovine Serum Albumin Proteins 0.000 description 2
- 238000010446 CRISPR interference Methods 0.000 description 2
- BHPQYMZQTOCNFJ-UHFFFAOYSA-N Calcium cation Chemical compound [Ca+2] BHPQYMZQTOCNFJ-UHFFFAOYSA-N 0.000 description 2
- 101000709520 Chlamydia trachomatis serovar L2 (strain 434/Bu / ATCC VR-902B) Atypical response regulator protein ChxR Proteins 0.000 description 2
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 2
- 108010067770 Endopeptidase K Proteins 0.000 description 2
- 108060002716 Exonuclease Proteins 0.000 description 2
- 241000713813 Gibbon ape leukemia virus Species 0.000 description 2
- 102100039869 Histone H2B type F-S Human genes 0.000 description 2
- 101001035372 Homo sapiens Histone H2B type F-S Proteins 0.000 description 2
- 241000701085 Human alphaherpesvirus 3 Species 0.000 description 2
- 241000701044 Human gammaherpesvirus 4 Species 0.000 description 2
- 241000725303 Human immunodeficiency virus Species 0.000 description 2
- 241000701806 Human papillomavirus Species 0.000 description 2
- 108010025815 Kanamycin Kinase Proteins 0.000 description 2
- 241000713666 Lentivirus Species 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 2
- 241000700560 Molluscum contagiosum virus Species 0.000 description 2
- 241000714177 Murine leukemia virus Species 0.000 description 2
- 102100038895 Myc proto-oncogene protein Human genes 0.000 description 2
- 101710135898 Myc proto-oncogene protein Proteins 0.000 description 2
- 102220526240 NACHT, LRR and PYD domains-containing protein 3_D82R_mutation Human genes 0.000 description 2
- 238000012408 PCR amplification Methods 0.000 description 2
- 108020002230 Pancreatic Ribonuclease Proteins 0.000 description 2
- 102000005891 Pancreatic ribonuclease Human genes 0.000 description 2
- 108091093037 Peptide nucleic acid Proteins 0.000 description 2
- 230000004570 RNA-binding Effects 0.000 description 2
- 238000003559 RNA-seq method Methods 0.000 description 2
- 108091028664 Ribonucleotide Proteins 0.000 description 2
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicium dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 2
- 241000713311 Simian immunodeficiency virus Species 0.000 description 2
- CDBYLPFSWZWCQE-UHFFFAOYSA-L Sodium Carbonate Chemical compound [Na+].[Na+].[O-]C([O-])=O CDBYLPFSWZWCQE-UHFFFAOYSA-L 0.000 description 2
- UIIMBOGNXHQVGW-UHFFFAOYSA-M Sodium bicarbonate Chemical compound [Na+].OC([O-])=O UIIMBOGNXHQVGW-UHFFFAOYSA-M 0.000 description 2
- 238000010459 TALEN Methods 0.000 description 2
- 102000002933 Thioredoxin Human genes 0.000 description 2
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 2
- 101710150448 Transcriptional regulator Myc Proteins 0.000 description 2
- 108700019146 Transgenes Proteins 0.000 description 2
- 102000008579 Transposases Human genes 0.000 description 2
- 108010020764 Transposases Proteins 0.000 description 2
- 108010067390 Viral Proteins Proteins 0.000 description 2
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 2
- 230000021736 acetylation Effects 0.000 description 2
- 238000006640 acetylation reaction Methods 0.000 description 2
- 230000003213 activating effect Effects 0.000 description 2
- 230000004913 activation Effects 0.000 description 2
- OIRDTQYFTABQOQ-KQYNXXCUSA-N adenosine Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O OIRDTQYFTABQOQ-KQYNXXCUSA-N 0.000 description 2
- 238000001042 affinity chromatography Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 230000008512 biological response Effects 0.000 description 2
- 229960002685 biotin Drugs 0.000 description 2
- 235000020958 biotin Nutrition 0.000 description 2
- 239000011616 biotin Substances 0.000 description 2
- 210000001124 body fluid Anatomy 0.000 description 2
- 239000011575 calcium Substances 0.000 description 2
- 229910001424 calcium ion Inorganic materials 0.000 description 2
- 229960003669 carbenicillin Drugs 0.000 description 2
- FPPNZSSZRUTDAP-UWFZAAFLSA-N carbenicillin Chemical compound N([C@H]1[C@H]2SC([C@@H](N2C1=O)C(O)=O)(C)C)C(=O)C(C(O)=O)C1=CC=CC=C1 FPPNZSSZRUTDAP-UWFZAAFLSA-N 0.000 description 2
- 101150038500 cas9 gene Proteins 0.000 description 2
- 230000015556 catabolic process Effects 0.000 description 2
- 239000006143 cell culture medium Substances 0.000 description 2
- 239000013592 cell lysate Substances 0.000 description 2
- 230000001413 cellular effect Effects 0.000 description 2
- 239000012707 chemical precursor Substances 0.000 description 2
- 102000021178 chitin binding proteins Human genes 0.000 description 2
- 108091011157 chitin binding proteins Proteins 0.000 description 2
- 230000021615 conjugation Effects 0.000 description 2
- 238000006731 degradation reaction Methods 0.000 description 2
- 238000006471 dimerization reaction Methods 0.000 description 2
- 208000035475 disorder Diseases 0.000 description 2
- 230000005782 double-strand break Effects 0.000 description 2
- 239000003814 drug Substances 0.000 description 2
- 230000009977 dual effect Effects 0.000 description 2
- 210000002257 embryonic structure Anatomy 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 230000002255 enzymatic effect Effects 0.000 description 2
- 102000013165 exonuclease Human genes 0.000 description 2
- 239000013613 expression plasmid Substances 0.000 description 2
- 239000012091 fetal bovine serum Substances 0.000 description 2
- 238000009472 formulation Methods 0.000 description 2
- 229920002674 hyaluronan Polymers 0.000 description 2
- 229960003160 hyaluronic acid Drugs 0.000 description 2
- 230000002209 hydrophobic effect Effects 0.000 description 2
- 108010002685 hygromycin-B kinase Proteins 0.000 description 2
- 238000001114 immunoprecipitation Methods 0.000 description 2
- 230000001939 inductive effect Effects 0.000 description 2
- 229960000318 kanamycin Drugs 0.000 description 2
- 229930027917 kanamycin Natural products 0.000 description 2
- SBUJHOSQTJFQJX-NOAMYHISSA-N kanamycin Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CN)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O[C@@H]2[C@@H]([C@@H](N)[C@H](O)[C@@H](CO)O2)O)[C@H](N)C[C@@H]1N SBUJHOSQTJFQJX-NOAMYHISSA-N 0.000 description 2
- 229930182823 kanamycin A Natural products 0.000 description 2
- 238000002372 labelling Methods 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 108020004999 messenger RNA Proteins 0.000 description 2
- 230000011987 methylation Effects 0.000 description 2
- 238000007069 methylation reaction Methods 0.000 description 2
- 108091070501 miRNA Proteins 0.000 description 2
- 239000002679 microRNA Substances 0.000 description 2
- 238000000520 microinjection Methods 0.000 description 2
- 230000025608 mitochondrion localization Effects 0.000 description 2
- 238000010369 molecular cloning Methods 0.000 description 2
- 229940035032 monophosphoryl lipid a Drugs 0.000 description 2
- 125000001446 muramyl group Chemical group N[C@@H](C=O)[C@@H](O[C@@H](C(=O)*)C)[C@H](O)[C@H](O)CO 0.000 description 2
- 238000002703 mutagenesis Methods 0.000 description 2
- 231100000350 mutagenesis Toxicity 0.000 description 2
- 108091027963 non-coding RNA Proteins 0.000 description 2
- 102000042567 non-coding RNA Human genes 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 230000026731 phosphorylation Effects 0.000 description 2
- 238000006366 phosphorylation reaction Methods 0.000 description 2
- 210000002706 plastid Anatomy 0.000 description 2
- 230000025540 plastid localization Effects 0.000 description 2
- BASFCYQUMIYNBI-UHFFFAOYSA-N platinum Chemical compound [Pt] BASFCYQUMIYNBI-UHFFFAOYSA-N 0.000 description 2
- 230000008488 polyadenylation Effects 0.000 description 2
- 238000002360 preparation method Methods 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 210000001236 prokaryotic cell Anatomy 0.000 description 2
- 235000004252 protein component Nutrition 0.000 description 2
- 239000013636 protein dimer Substances 0.000 description 2
- 108020001580 protein domains Proteins 0.000 description 2
- 230000007026 protein scission Effects 0.000 description 2
- 230000004850 protein–protein interaction Effects 0.000 description 2
- 150000004053 quinones Chemical class 0.000 description 2
- 230000006798 recombination Effects 0.000 description 2
- 238000005215 recombination Methods 0.000 description 2
- 230000002441 reversible effect Effects 0.000 description 2
- 239000002336 ribonucleotide Substances 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 238000002864 sequence alignment Methods 0.000 description 2
- 238000002741 site-directed mutagenesis Methods 0.000 description 2
- 238000001542 size-exclusion chromatography Methods 0.000 description 2
- 229960005322 streptomycin Drugs 0.000 description 2
- UCSJYZPVAKXKNQ-HZYVHMACSA-N streptomycin Chemical compound CN[C@H]1[C@H](O)[C@@H](O)[C@H](CO)O[C@H]1O[C@@H]1[C@](C=O)(O)[C@H](C)O[C@H]1O[C@@H]1[C@@H](NC(N)=N)[C@H](O)[C@@H](NC(N)=N)[C@H](O)[C@H]1O UCSJYZPVAKXKNQ-HZYVHMACSA-N 0.000 description 2
- 239000000758 substrate Substances 0.000 description 2
- 239000004094 surface-active agent Substances 0.000 description 2
- 238000010381 tandem affinity purification Methods 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 230000001225 therapeutic effect Effects 0.000 description 2
- 108060008226 thioredoxin Proteins 0.000 description 2
- 229940094937 thioredoxin Drugs 0.000 description 2
- 229940113082 thymine Drugs 0.000 description 2
- 108091006106 transcriptional activators Proteins 0.000 description 2
- 238000010361 transduction Methods 0.000 description 2
- 230000026683 transduction Effects 0.000 description 2
- 238000011426 transformation method Methods 0.000 description 2
- 230000001052 transient effect Effects 0.000 description 2
- 230000010474 transient expression Effects 0.000 description 2
- 241001430294 unidentified retrovirus Species 0.000 description 2
- 230000003827 upregulation Effects 0.000 description 2
- 239000003981 vehicle Substances 0.000 description 2
- 108091005957 yellow fluorescent proteins Proteins 0.000 description 2
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-Amino-3-hydroxypropanoic acid Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 description 1
- UHDGCWIWMRVCDJ-UHFFFAOYSA-N 1-beta-D-Xylofuranosyl-NH-Cytosine Natural products O=C1N=C(N)C=CN1C1C(O)C(O)C(CO)O1 UHDGCWIWMRVCDJ-UHFFFAOYSA-N 0.000 description 1
- IIZPXYDJLKNOIY-JXPKJXOSSA-N 1-palmitoyl-2-arachidonoyl-sn-glycero-3-phosphocholine Chemical compound CCCCCCCCCCCCCCCC(=O)OC[C@H](COP([O-])(=O)OCC[N+](C)(C)C)OC(=O)CCC\C=C/C\C=C/C\C=C/C\C=C/CCCCC IIZPXYDJLKNOIY-JXPKJXOSSA-N 0.000 description 1
- OVSKIKFHRZPJSS-UHFFFAOYSA-N 2,4-D Chemical compound OC(=O)COC1=CC=C(Cl)C=C1Cl OVSKIKFHRZPJSS-UHFFFAOYSA-N 0.000 description 1
- 239000005631 2,4-Dichlorophenoxyacetic acid Substances 0.000 description 1
- 229940087195 2,4-dichlorophenoxyacetate Drugs 0.000 description 1
- GOJUJUVQIVIZAV-UHFFFAOYSA-N 2-amino-4,6-dichloropyrimidine-5-carbaldehyde Chemical group NC1=NC(Cl)=C(C=O)C(Cl)=N1 GOJUJUVQIVIZAV-UHFFFAOYSA-N 0.000 description 1
- IAJOBQBIJHVGMQ-UHFFFAOYSA-N 2-amino-4-[hydroxy(methyl)phosphoryl]butanoic acid Chemical compound CP(O)(=O)CCC(N)C(O)=O IAJOBQBIJHVGMQ-UHFFFAOYSA-N 0.000 description 1
- BFSVOASYOCHEOV-UHFFFAOYSA-N 2-diethylaminoethanol Chemical compound CCN(CC)CCO BFSVOASYOCHEOV-UHFFFAOYSA-N 0.000 description 1
- UPMXNNIRAGDFEH-UHFFFAOYSA-N 3,5-dibromo-4-hydroxybenzonitrile Chemical compound OC1=C(Br)C=C(C#N)C=C1Br UPMXNNIRAGDFEH-UHFFFAOYSA-N 0.000 description 1
- CAAMSDWKXXPUJR-UHFFFAOYSA-N 3,5-dihydro-4H-imidazol-4-one Chemical class O=C1CNC=N1 CAAMSDWKXXPUJR-UHFFFAOYSA-N 0.000 description 1
- 241000251468 Actinopterygii Species 0.000 description 1
- 102100027211 Albumin Human genes 0.000 description 1
- 108010088751 Albumins Proteins 0.000 description 1
- 108700028369 Alleles Proteins 0.000 description 1
- GUBGYTABKSRVRQ-XLOQQCSPSA-N Alpha-Lactose Chemical compound O[C@@H]1[C@@H](O)[C@@H](O)[C@@H](CO)O[C@H]1O[C@@H]1[C@@H](CO)O[C@H](O)[C@H](O)[C@H]1O GUBGYTABKSRVRQ-XLOQQCSPSA-N 0.000 description 1
- 241000224489 Amoeba Species 0.000 description 1
- 244000303258 Annona diversifolia Species 0.000 description 1
- 235000002198 Annona diversifolia Nutrition 0.000 description 1
- 101100123845 Aphanizomenon flos-aquae (strain 2012/KM1/D3) hepT gene Proteins 0.000 description 1
- 241001443586 Atadenovirus Species 0.000 description 1
- 241000271566 Aves Species 0.000 description 1
- 241000701802 Aviadenovirus Species 0.000 description 1
- 108020000946 Bacterial DNA Proteins 0.000 description 1
- 108091032955 Bacterial small RNA Proteins 0.000 description 1
- BTBUEUYNUDRHOZ-UHFFFAOYSA-N Borate Chemical compound [O-]B([O-])[O-] BTBUEUYNUDRHOZ-UHFFFAOYSA-N 0.000 description 1
- 241000283690 Bos taurus Species 0.000 description 1
- 241000621124 Bovine papular stomatitis virus Species 0.000 description 1
- 239000005489 Bromoxynil Substances 0.000 description 1
- 239000002126 C01EB10 - Adenosine Substances 0.000 description 1
- QCMYYKRYFNMIEC-UHFFFAOYSA-N COP(O)=O Chemical class COP(O)=O QCMYYKRYFNMIEC-UHFFFAOYSA-N 0.000 description 1
- 101100285688 Caenorhabditis elegans hrg-7 gene Proteins 0.000 description 1
- 101000909256 Caldicellulosiruptor bescii (strain ATCC BAA-1888 / DSM 6725 / Z-1320) DNA polymerase I Proteins 0.000 description 1
- 102000000584 Calmodulin Human genes 0.000 description 1
- 108010041952 Calmodulin Proteins 0.000 description 1
- 241000282836 Camelus dromedarius Species 0.000 description 1
- 241000283707 Capra Species 0.000 description 1
- 229920000049 Carbon (fiber) Polymers 0.000 description 1
- 108700004991 Cas12a Proteins 0.000 description 1
- 241000700199 Cavia porcellus Species 0.000 description 1
- 241000282693 Cercopithecidae Species 0.000 description 1
- ZAMOUSCENKQFHK-UHFFFAOYSA-N Chlorine atom Chemical compound [Cl] ZAMOUSCENKQFHK-UHFFFAOYSA-N 0.000 description 1
- 108091035707 Consensus sequence Proteins 0.000 description 1
- 241000700626 Cowpox virus Species 0.000 description 1
- 241000699800 Cricetinae Species 0.000 description 1
- 241000938605 Crocodylia Species 0.000 description 1
- 241000195493 Cryptophyta Species 0.000 description 1
- UHDGCWIWMRVCDJ-PSQAKQOGSA-N Cytidine Natural products O=C1N=C(N)C=CN1[C@@H]1[C@@H](O)[C@@H](O)[C@H](CO)O1 UHDGCWIWMRVCDJ-PSQAKQOGSA-N 0.000 description 1
- FBPFZTCFMRRESA-FSIIMWSLSA-N D-Glucitol Natural products OC[C@H](O)[C@H](O)[C@@H](O)[C@H](O)CO FBPFZTCFMRRESA-FSIIMWSLSA-N 0.000 description 1
- FBPFZTCFMRRESA-KVTDHHQDSA-N D-Mannitol Chemical compound OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)CO FBPFZTCFMRRESA-KVTDHHQDSA-N 0.000 description 1
- FBPFZTCFMRRESA-JGWLITMVSA-N D-glucitol Chemical compound OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO FBPFZTCFMRRESA-JGWLITMVSA-N 0.000 description 1
- 102100040263 DNA dC->dU-editing enzyme APOBEC-3A Human genes 0.000 description 1
- 238000007399 DNA isolation Methods 0.000 description 1
- 230000008301 DNA looping mechanism Effects 0.000 description 1
- 230000007067 DNA methylation Effects 0.000 description 1
- 101710177611 DNA polymerase II large subunit Proteins 0.000 description 1
- 101710184669 DNA polymerase II small subunit Proteins 0.000 description 1
- 230000006820 DNA synthesis Effects 0.000 description 1
- 241000450599 DNA viruses Species 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- 241000192091 Deinococcus radiodurans Species 0.000 description 1
- 108010053770 Deoxyribonucleases Proteins 0.000 description 1
- 102000016911 Deoxyribonucleases Human genes 0.000 description 1
- 229920002307 Dextran Polymers 0.000 description 1
- 241000196324 Embryophyta Species 0.000 description 1
- 101710091045 Envelope protein Proteins 0.000 description 1
- 241000283073 Equus caballus Species 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- 241000233866 Fungi Species 0.000 description 1
- 108010010803 Gelatin Proteins 0.000 description 1
- 241000702463 Geminiviridae Species 0.000 description 1
- 108700028146 Genetic Enhancer Elements Proteins 0.000 description 1
- KOSRFJWDECSPRO-WDSKDSINSA-N Glu-Glu Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(O)=O KOSRFJWDECSPRO-WDSKDSINSA-N 0.000 description 1
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 1
- 108091093094 Glycol nucleic acid Proteins 0.000 description 1
- 108010043121 Green Fluorescent Proteins Proteins 0.000 description 1
- 102000004144 Green Fluorescent Proteins Human genes 0.000 description 1
- 101100412102 Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd) rec2 gene Proteins 0.000 description 1
- 241000700721 Hepatitis B virus Species 0.000 description 1
- 102000008157 Histone Demethylases Human genes 0.000 description 1
- 108010074870 Histone Demethylases Proteins 0.000 description 1
- 102000011787 Histone Methyltransferases Human genes 0.000 description 1
- 108010036115 Histone Methyltransferases Proteins 0.000 description 1
- 102000003893 Histone acetyltransferases Human genes 0.000 description 1
- 108090000246 Histone acetyltransferases Proteins 0.000 description 1
- 102000043851 Histone deacetylase domains Human genes 0.000 description 1
- 108700038236 Histone deacetylase domains Proteins 0.000 description 1
- 108010033040 Histones Proteins 0.000 description 1
- 108091064358 Holliday junction Proteins 0.000 description 1
- 102000039011 Holliday junction Human genes 0.000 description 1
- 241000282412 Homo Species 0.000 description 1
- 101000964378 Homo sapiens DNA dC->dU-editing enzyme APOBEC-3A Proteins 0.000 description 1
- 101000615488 Homo sapiens Methyl-CpG-binding domain protein 2 Proteins 0.000 description 1
- 241000046923 Human bocavirus Species 0.000 description 1
- 241001502974 Human gammaherpesvirus 8 Species 0.000 description 1
- 241000713772 Human immunodeficiency virus 1 Species 0.000 description 1
- 241000702617 Human parvovirus B19 Species 0.000 description 1
- XQFRJNBWHJMXHO-RRKCRQDMSA-N IDUR Chemical compound C1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C(I)=C1 XQFRJNBWHJMXHO-RRKCRQDMSA-N 0.000 description 1
- 241001651351 Ichtadenovirus Species 0.000 description 1
- 101710203526 Integrase Proteins 0.000 description 1
- 108010061833 Integrases Proteins 0.000 description 1
- AYFVYJQAPQTCCC-GBXIJSLDSA-N L-threonine Chemical compound C[C@@H](O)[C@H](N)C(O)=O AYFVYJQAPQTCCC-GBXIJSLDSA-N 0.000 description 1
- GUBGYTABKSRVRQ-QKKXKWKRSA-N Lactose Natural products OC[C@H]1O[C@@H](O[C@H]2[C@H](O)[C@@H](O)C(O)O[C@@H]2CO)[C@H](O)[C@@H](O)[C@H]1O GUBGYTABKSRVRQ-QKKXKWKRSA-N 0.000 description 1
- 101710128836 Large T antigen Proteins 0.000 description 1
- 239000006142 Luria-Bertani Agar Substances 0.000 description 1
- 239000006137 Luria-Bertani broth Substances 0.000 description 1
- 239000004472 Lysine Substances 0.000 description 1
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 description 1
- 239000007993 MOPS buffer Substances 0.000 description 1
- 241000282560 Macaca mulatta Species 0.000 description 1
- 229930195725 Mannitol Natural products 0.000 description 1
- 241000701244 Mastadenovirus Species 0.000 description 1
- 102100021299 Methyl-CpG-binding domain protein 2 Human genes 0.000 description 1
- 102000016397 Methyltransferase Human genes 0.000 description 1
- 241000700627 Monkeypox virus Species 0.000 description 1
- 241000699666 Mus <mouse, genus> Species 0.000 description 1
- 108091061960 Naked DNA Proteins 0.000 description 1
- 241001336717 Nanoviridae Species 0.000 description 1
- 108091092724 Noncoding DNA Proteins 0.000 description 1
- 108020004485 Nonsense Codon Proteins 0.000 description 1
- 241000700635 Orf virus Species 0.000 description 1
- 241000283973 Oryctolagus cuniculus Species 0.000 description 1
- 241000282577 Pan troglodytes Species 0.000 description 1
- 206010033976 Paravaccinia Diseases 0.000 description 1
- 241000701945 Parvoviridae Species 0.000 description 1
- 241001494479 Pecora Species 0.000 description 1
- 229930182555 Penicillin Natural products 0.000 description 1
- JGSARLDLIJGVTE-MBNYWOFBSA-N Penicillin G Chemical compound N([C@H]1[C@H]2SC([C@@H](N2C1=O)C(O)=O)(C)C)C(=O)CC1=CC=CC=C1 JGSARLDLIJGVTE-MBNYWOFBSA-N 0.000 description 1
- 241000009328 Perro Species 0.000 description 1
- 241000701253 Phycodnaviridae Species 0.000 description 1
- 241001505332 Polyomavirus sp. Species 0.000 description 1
- 241000288906 Primates Species 0.000 description 1
- 101710188315 Protein X Proteins 0.000 description 1
- 241000125945 Protoparvovirus Species 0.000 description 1
- 101000902592 Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1) DNA polymerase Proteins 0.000 description 1
- 102000014450 RNA Polymerase III Human genes 0.000 description 1
- 108010078067 RNA Polymerase III Proteins 0.000 description 1
- 238000012228 RNA interference-mediated gene silencing Methods 0.000 description 1
- 230000007022 RNA scission Effects 0.000 description 1
- 241000700159 Rattus Species 0.000 description 1
- 102000006382 Ribonucleases Human genes 0.000 description 1
- 108010083644 Ribonucleases Proteins 0.000 description 1
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 1
- 241000620568 Siadenovirus Species 0.000 description 1
- 238000012167 Small RNA sequencing Methods 0.000 description 1
- 108020004459 Small interfering RNA Proteins 0.000 description 1
- 108091081024 Start codon Proteins 0.000 description 1
- 108010090804 Streptavidin Proteins 0.000 description 1
- 241000193996 Streptococcus pyogenes Species 0.000 description 1
- 108091027544 Subgenomic mRNA Proteins 0.000 description 1
- 241000282898 Sus scrofa Species 0.000 description 1
- 241000404000 Tanapox virus Species 0.000 description 1
- RYYWUUFWQRZTIU-UHFFFAOYSA-N Thiophosphoric acid Chemical class OP(O)(S)=O RYYWUUFWQRZTIU-UHFFFAOYSA-N 0.000 description 1
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 description 1
- 239000004473 Threonine Substances 0.000 description 1
- 108091028113 Trans-activating crRNA Proteins 0.000 description 1
- 108020004566 Transfer RNA Proteins 0.000 description 1
- 239000007983 Tris buffer Substances 0.000 description 1
- 241000700618 Vaccinia virus Species 0.000 description 1
- 241000700647 Variola virus Species 0.000 description 1
- 206010047139 Vasoconstriction Diseases 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 241001536558 Yaba monkey tumor virus Species 0.000 description 1
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 239000013543 active substance Substances 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 229960005305 adenosine Drugs 0.000 description 1
- 239000011543 agarose gel Substances 0.000 description 1
- KOSRFJWDECSPRO-UHFFFAOYSA-N alpha-L-glutamyl-L-glutamic acid Natural products OC(=O)CCC(N)C(=O)NC(CCC(O)=O)C(O)=O KOSRFJWDECSPRO-UHFFFAOYSA-N 0.000 description 1
- 101150073130 ampR gene Proteins 0.000 description 1
- 230000003321 amplification Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 230000006217 arginine-methylation Effects 0.000 description 1
- 210000004436 artificial bacterial chromosome Anatomy 0.000 description 1
- 210000001106 artificial yeast chromosome Anatomy 0.000 description 1
- 230000008970 bacterial immunity Effects 0.000 description 1
- 244000052616 bacterial pathogen Species 0.000 description 1
- 239000011324 bead Substances 0.000 description 1
- WQZGKKKJIJFFOK-VFUOTHLCSA-N beta-D-glucose Chemical compound OC[C@H]1O[C@@H](O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-VFUOTHLCSA-N 0.000 description 1
- 230000008436 biogenesis Effects 0.000 description 1
- 230000008236 biological pathway Effects 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 238000006664 bond formation reaction Methods 0.000 description 1
- 239000007975 buffered saline Substances 0.000 description 1
- 239000001506 calcium phosphate Substances 0.000 description 1
- 229910000389 calcium phosphate Inorganic materials 0.000 description 1
- 235000011010 calcium phosphates Nutrition 0.000 description 1
- 229910052799 carbon Inorganic materials 0.000 description 1
- 239000004917 carbon fiber Substances 0.000 description 1
- 239000000969 carrier Substances 0.000 description 1
- 230000032823 cell division Effects 0.000 description 1
- 230000033077 cellular process Effects 0.000 description 1
- 230000004700 cellular uptake Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 125000003636 chemical group Chemical group 0.000 description 1
- 229960005091 chloramphenicol Drugs 0.000 description 1
- WIIZWVCIJKGZOK-RKDXNWHRSA-N chloramphenicol Chemical compound ClC(Cl)C(=O)N[C@H](CO)[C@H](O)C1=CC=C([N+]([O-])=O)C=C1 WIIZWVCIJKGZOK-RKDXNWHRSA-N 0.000 description 1
- 238000002487 chromatin immunoprecipitation Methods 0.000 description 1
- 238000004587 chromatography analysis Methods 0.000 description 1
- 239000013611 chromosomal DNA Substances 0.000 description 1
- 210000000349 chromosome Anatomy 0.000 description 1
- 238000000975 co-precipitation Methods 0.000 description 1
- 230000009918 complex formation Effects 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 239000012141 concentrate Substances 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000011109 contamination Methods 0.000 description 1
- 239000013068 control sample Substances 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 238000004132 cross linking Methods 0.000 description 1
- 210000004748 cultured cell Anatomy 0.000 description 1
- UHDGCWIWMRVCDJ-ZAKLUEHWSA-N cytidine Chemical compound O=C1N=C(N)C=CN1[C@H]1[C@H](O)[C@@H](O)[C@H](CO)O1 UHDGCWIWMRVCDJ-ZAKLUEHWSA-N 0.000 description 1
- 230000009615 deamination Effects 0.000 description 1
- 238000006481 deamination reaction Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 239000007857 degradation product Substances 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 230000001687 destabilization Effects 0.000 description 1
- 239000008121 dextrose Substances 0.000 description 1
- 238000003745 diagnosis Methods 0.000 description 1
- 229910003460 diamond Inorganic materials 0.000 description 1
- 239000010432 diamond Substances 0.000 description 1
- 239000003085 diluting agent Substances 0.000 description 1
- LOKCTEFSRHRXRJ-UHFFFAOYSA-I dipotassium trisodium dihydrogen phosphate hydrogen phosphate dichloride Chemical compound P(=O)(O)(O)[O-].[K+].P(=O)(O)([O-])[O-].[Na+].[Na+].[Cl-].[K+].[Cl-].[Na+] LOKCTEFSRHRXRJ-UHFFFAOYSA-I 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 238000007877 drug screening Methods 0.000 description 1
- 241001493065 dsRNA viruses Species 0.000 description 1
- 108010048367 enhanced green fluorescent protein Proteins 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 230000007608 epigenetic mechanism Effects 0.000 description 1
- 230000006846 excision repair Effects 0.000 description 1
- 230000002538 fungal effect Effects 0.000 description 1
- 238000001502 gel electrophoresis Methods 0.000 description 1
- 239000008273 gelatin Substances 0.000 description 1
- 229920000159 gelatin Polymers 0.000 description 1
- 235000019322 gelatine Nutrition 0.000 description 1
- 235000011852 gelatine desserts Nutrition 0.000 description 1
- 230000009368 gene silencing by RNA Effects 0.000 description 1
- 108010055341 glutamyl-glutamic acid Proteins 0.000 description 1
- 230000013595 glycosylation Effects 0.000 description 1
- 238000006206 glycosylation reaction Methods 0.000 description 1
- 108010033706 glycylserine Proteins 0.000 description 1
- 238000010438 heat treatment Methods 0.000 description 1
- 230000002363 herbicidal effect Effects 0.000 description 1
- 238000004128 high performance liquid chromatography Methods 0.000 description 1
- 238000002744 homologous recombination Methods 0.000 description 1
- 230000006801 homologous recombination Effects 0.000 description 1
- 229910052739 hydrogen Inorganic materials 0.000 description 1
- 239000001257 hydrogen Substances 0.000 description 1
- 238000007031 hydroxymethylation reaction Methods 0.000 description 1
- 238000001597 immobilized metal affinity chromatography Methods 0.000 description 1
- 230000028993 immune response Effects 0.000 description 1
- 210000000987 immune system Anatomy 0.000 description 1
- 230000001976 improved effect Effects 0.000 description 1
- 238000000099 in vitro assay Methods 0.000 description 1
- 230000000415 inactivating effect Effects 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 238000004255 ion exchange chromatography Methods 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- BPHPUYQFMNQIOC-NXRLNHOXSA-N isopropyl beta-D-thiogalactopyranoside Chemical compound CC(C)S[C@@H]1O[C@H](CO)[C@H](O)[C@H](O)[C@H]1O BPHPUYQFMNQIOC-NXRLNHOXSA-N 0.000 description 1
- 239000000644 isotonic solution Substances 0.000 description 1
- 238000005304 joining Methods 0.000 description 1
- 239000008101 lactose Substances 0.000 description 1
- 239000000787 lecithin Substances 0.000 description 1
- 229940067606 lecithin Drugs 0.000 description 1
- 235000010445 lecithin Nutrition 0.000 description 1
- 239000003446 ligand Substances 0.000 description 1
- 230000029226 lipidation Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 125000003588 lysine group Chemical group [H]N([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])(N([H])[H])C(*)=O 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 239000000594 mannitol Substances 0.000 description 1
- 235000010355 mannitol Nutrition 0.000 description 1
- 239000002609 medium Substances 0.000 description 1
- 229910052751 metal Inorganic materials 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 229910021645 metal ion Inorganic materials 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 210000003470 mitochondria Anatomy 0.000 description 1
- 230000002438 mitochondrial effect Effects 0.000 description 1
- 238000001823 molecular biology technique Methods 0.000 description 1
- 125000004573 morpholin-4-yl group Chemical group N1(CCOCC1)* 0.000 description 1
- 231100000219 mutagenic Toxicity 0.000 description 1
- 230000003505 mutagenic effect Effects 0.000 description 1
- 210000004165 myocardium Anatomy 0.000 description 1
- 210000004897 n-terminal region Anatomy 0.000 description 1
- 239000013642 negative control Substances 0.000 description 1
- 230000037434 nonsense mutation Effects 0.000 description 1
- 230000009871 nonspecific binding Effects 0.000 description 1
- 238000003199 nucleic acid amplification method Methods 0.000 description 1
- 230000001293 nucleolytic effect Effects 0.000 description 1
- 239000002777 nucleoside Substances 0.000 description 1
- 125000003835 nucleoside group Chemical group 0.000 description 1
- 230000030648 nucleus localization Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000002018 overexpression Effects 0.000 description 1
- 244000045947 parasite Species 0.000 description 1
- 239000008188 pellet Substances 0.000 description 1
- 229940049954 penicillin Drugs 0.000 description 1
- 229940021222 peritoneal dialysis isotonic solution Drugs 0.000 description 1
- 239000002953 phosphate buffered saline Substances 0.000 description 1
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 1
- 150000008298 phosphoramidates Chemical class 0.000 description 1
- 206010035114 pityriasis rosea Diseases 0.000 description 1
- 239000013600 plasmid vector Substances 0.000 description 1
- 229910052697 platinum Inorganic materials 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- 238000004393 prognosis Methods 0.000 description 1
- 238000000159 protein binding assay Methods 0.000 description 1
- 238000010379 pull-down assay Methods 0.000 description 1
- 239000002510 pyrogen Substances 0.000 description 1
- 238000010188 recombinant method Methods 0.000 description 1
- 230000008263 repair mechanism Effects 0.000 description 1
- 230000003362 replicative effect Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 238000004366 reverse phase liquid chromatography Methods 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 108020004418 ribosomal RNA Proteins 0.000 description 1
- 210000003705 ribosome Anatomy 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 238000013207 serial dilution Methods 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 230000005783 single-strand break Effects 0.000 description 1
- 210000002027 skeletal muscle Anatomy 0.000 description 1
- 239000004055 small Interfering RNA Substances 0.000 description 1
- 229910000030 sodium bicarbonate Inorganic materials 0.000 description 1
- 235000017557 sodium bicarbonate Nutrition 0.000 description 1
- 229910000029 sodium carbonate Inorganic materials 0.000 description 1
- 239000002689 soil Substances 0.000 description 1
- 239000000243 solution Substances 0.000 description 1
- 239000000600 sorbitol Substances 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 239000003381 stabilizer Substances 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 125000001424 substituent group Chemical group 0.000 description 1
- 230000010741 sumoylation Effects 0.000 description 1
- 230000009261 transgenic effect Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
- QORWJWZARLRLPR-UHFFFAOYSA-H tricalcium bis(phosphate) Chemical compound [Ca+2].[Ca+2].[Ca+2].[O-]P([O-])([O-])=O.[O-]P([O-])([O-])=O QORWJWZARLRLPR-UHFFFAOYSA-H 0.000 description 1
- 230000001960 triggered effect Effects 0.000 description 1
- 238000009966 trimming Methods 0.000 description 1
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 1
- 230000010415 tropism Effects 0.000 description 1
- 238000001419 two-dimensional polyacrylamide gel electrophoresis Methods 0.000 description 1
- 230000034512 ubiquitination Effects 0.000 description 1
- 238000010798 ubiquitination Methods 0.000 description 1
- 238000002604 ultrasonography Methods 0.000 description 1
- 241000202362 uncultured archaeon Species 0.000 description 1
- 241001529453 unidentified herpesvirus Species 0.000 description 1
- 230000025033 vasoconstriction Effects 0.000 description 1
- 210000002845 virion Anatomy 0.000 description 1
- 239000000277 virosome Substances 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
- 210000005253 yeast cell Anatomy 0.000 description 1
- 229910052725 zinc Inorganic materials 0.000 description 1
- 239000011701 zinc Substances 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K38/00—Medicinal preparations containing peptides
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
Definitions
- the present invention relates to novel RNA-guided nucleases (RGN) and nucleic acid targeting systems comprising such.
- RGN RNA-guided nucleases
- RNA-guided nucleases such as the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) proteins allow for the targeting of specific sequences by using a short RNA sequence that specifically hybridizes with a particular target sequence.
- CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
- Such CRISPR systems became popular and gained multiple uses in research, diagnostics and therapeutics due to the ease of producing target-specific short RNA sequences and using them with the same RGN protein.
- RGNs can be used to edit genomes through the introduction of a sequence-specific, double-stranded break that is either repaired in a way that introduces a mutation or repaired by introducing a stretch of heterologous DNA.
- Inactive versions of RGNs have also been widely used to target specific DNA or RNA regions and, in combination with other proteins, have made it possible to study and modulate multiple cellular processes, providing a useful tool for studying gene function and modulating gene activity.
- the present invention provides novel RNA-guided nuclease (RGN) polypeptides, and long monomeric nucleic acid targeting RNAs (lmntRNAs), lmntRNA nucleic acid targeting systems comprising those, nucleic acid molecules encoding the same, and vectors and host cells comprising such nucleic acid molecules.
- RGN RNA-guided nuclease
- lmntRNAs long monomeric nucleic acid targeting RNAs
- nucleic-acid targeting systems for binding a target nucleic acid sequence of interest, wherein the system comprises a RGN polypeptide and one or more RNA sequences targeting the nucleic acid of interest.
- methods disclosed herein are drawn to binding a target sequence of interest, and in some embodiments, cleaving or modifying the target sequence of interest.
- the target sequence of interest can be modified, for example, as a result of non-homologous end joining or homology-directed repair with an introduced donor sequence.
- FIG. 1 CRISPR locus of Type V-M systems. Cas1, 2, 4, or other Cas proteins are not found in any instances of Type V-M systems. The effector Cas12m protein is essential, carries out the DNA interference activity of the systems, and consists of a Rec domain and a tri-split RuvC domain.
- the CRISPR array is at most 225 bases from the end of the Cas12m protein and the lmntRNA is always found within this region. The lmntRNA starts no more than 75 bases from the end of the effector protein and contains an antirepeat before the CRISPR array. Expression continues through the CRISPR array, but may be truncated to any number of spacer sequences.
- FIG. 1 CRISPR locus of Type V-F1 systems. Cas1, 2, 4, or other Cas proteins are found in some instances of Type V-F1 systems. The effector Cas12f1 protein is essential, carries out the DNA interference activity of the systems, and consists of a Rec1 domain, occasionally containing a Zn-finger domain, and a tri-split RuvC domain.
- the CRISPR array can be up to 5000 bases or longer from the Cas12f1 protein.
- the systems are targeted by a dual RNA system, consisting of a separately expressed tracrRNA and a crRNA from the CRISPR array. The position of the tracrRNA cannot be identified based on the position of the CRISPR array and the effector protein.
- FIG. 1 Phylogenetic tree of various selected Cas12f1, Cas12f2, Cas12f3, and Cas12m effector proteins. Cas12m proteins are indicated with darker lines.
- FIG. 5 Example of consensus lmntRNA with four hairpins.
- SEQ ID NO: 83 The final hairpin (5) is formed on one side by an antirepeat (AR) and on the other side by sequence directly from the CRISPR array (REPEAT), followed by a short leader sequence before the reprogrammable sequence (SPACER) for retargeting.
- AR antirepeat
- REPEAT sequence directly from the CRISPR array
- SPACER reprogrammable sequence for retargeting
- FIG. A schematic representation of lmntRNA structure with anti-repeat sequence (sequence partially complementary to the CRISPR repeat sequence), CRISPR repeat sequence and reprogrammable targeting sequence (spacer).
- FIG. 7 The lmntRNA of EGS0091 (SEQ ID NO: 38) with various truncation spots highlighted to make engineered lmntRNA (elmntRNA) designs.
- the elmntRNA designs were tested with removal of hairpin 1 (diamond), with partial truncation of hairpin 2 to retain only 9-11 bp of the original stem loop structure (oval), with partial truncation of hairpin 5 to replace the extra sequence past the repeat-antirepeat with a GAAA tetraloop (rectangle), and with partial truncation of hairpin 5 to the first mismatch in the repeat-antirepeat with a GAAA tetraloop (hexagon).
- FIG. 8-11 Bacterial Plasmid Interference activity results showing active CRISPR interference for EGS0091-94
- Figures 12-15 Small RNAseq data showing boundaries of lmntRNA expression for EGS0091-94
- Figure 17 In vitro cleavage by elmntRNA with EGS0091 D93R. Shows that hairpin 1 is essential for activity, but that hairpin 2 and hairpin 5 can be truncated.
- FIG. 18 Eukaryotic editing with DNA binding affinity mutation for EGS0091. Increased non-homologous end joining rates with the DNA binding affinity mutation D93R in EGS0091 compared to wildtype.
- FIG. 19 Eukaryotic editing with lmntRNA designs.
- Figure 20 Trans-activated DNA cleavage by Cas12m protein at 30 min.
- AAV adeno-associated virus
- a biological sample may contain whole cells and/or live cells and/or cell debris.
- the biological sample may contain (or be derived from) a “bodily fluid”. Bodily fluids may be obtained from a mammalian organism, for example by puncture or other collecting or sampling procedures.
- Cas12f1 refers to a type of RGN that cleaves nucleic acid, is encoded by CRISPR loci and is part of the Type V-F1 CRISPR system.
- the Cas12f1 protein commonly used is from an uncultured archaeon (Un1).
- the Cas12f1 protein may be mutated so that the nuclease activity is partly or completely inactivated.
- Cas12f1 RGNs are described in Harrington et al. (2018) Science, 362(6416), 839-842 and Karvelis et al. (2020) Nucleic Acids Research, 48(9), 5016-5023.
- Cas12m refers to a type of RGN that cleaves nucleic acid, is encoded by CRISPR loci and is part of the Type V-M CRISPR system.
- the Cas12m protein consists of a Rec1 domain and a tri-split RuvC domain and may be mutated so that the nuclease activity is partly or completely inactivated.
- Cas9 refers to a type of RGN that cleaves nucleic acid, is encoded by CRISPR loci, and is part of the Type II CRISPR system.
- the Cas9 protein commonly used is from bacterial species Streptococcus pyogenes.
- the Cas9 protein may be mutated so that the nuclease activity is partly or completely inactivated.
- complement or “complementary” as used herein can mean Watson-Crick or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules.
- complementarity refers to a property shared between two nucleic acid sequences, such that when they are aligned antiparallel to each other, the nucleotide bases at each position will be complementary.
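As a rough illustration of the antiparallel alignment described in the definition of complementarity, a minimal Python sketch is given here. It assumes Watson-Crick pairing only (Hoogsteen pairing and nucleotide analogs are not modelled), and the name `is_complementary` is hypothetical rather than taken from this disclosure.

```python
# Watson-Crick pairs only; Hoogsteen pairing and analogs are outside this sketch.
WC_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def is_complementary(seq_a: str, seq_b: str) -> bool:
    """Return True if every base of seq_a pairs with seq_b read antiparallel.

    Both sequences are written 5' to 3', so seq_b is reversed before
    the position-by-position comparison.
    """
    if len(seq_a) != len(seq_b):
        return False
    return all(
        (a, b) in WC_PAIRS
        for a, b in zip(seq_a.upper(), reversed(seq_b.upper()))
    )


if __name__ == "__main__":
    print(is_complementary("ATGC", "GCAT"))
    print(is_complementary("ATGC", "ATGC"))
```

Partial complementarity, which is what hybridization typically relies on in practice, would need a per-position count instead of the all-or-nothing test used here.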
- CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
- CRISPR sequences are found in the genomes of prokaryotic organisms such as bacteria and archaea. These sequences are derived from DNA fragments of bacteriophages that had previously infected the prokaryote. They are used to detect and destroy DNA from similar bacteriophages during subsequent infections.
- CRISPR system refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated ("Cas") proteins, including sequences encoding a Cas protein, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (containing a "direct repeat" and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to herein as a "spacer" in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus.
- an effective amount refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response.
- an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease.
- an effective amount of a recombinase may refer to the amount of the recombinase that is sufficient to induce recombination at a target site specifically bound and recombined by the recombinase.
- an agent e.g., a nuclease, a recombinase, a hybrid protein, a fusion protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide
- Enhancer refers to non-coding DNA sequences containing multiple activator and repressor binding sites. Enhancers range from 200 bp to 1 kb in length and may be either proximal, 5' upstream to the promoter or within the first intron of the regulated gene, or distal, in introns of neighboring genes or intergenic regions far away from the locus. Through DNA looping, active enhancers contact the promoter dependently of the core DNA binding motif promoter specificity. 4 to 5 enhancers may interact with a promoter.
- fusion protein refers to a chimeric protein created through the covalent or non-covalent joining of two or more genes, directly or indirectly, that originally coded for separate proteins.
- the translation of the fusion gene results in a single polypeptide with functional properties derived from each of the original proteins.
- gRNA also used interchangeably herein with chimeric single guide RNA (“sgRNA”), refers to a nucleic acid which is a fusion of two noncoding RNAs: a crRNA and a tracrRNA. “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas complex to the target); and (2) a domain that binds a Cas protein.
- an "isolated” or “purified” polypeptide, or biologically active portion thereof is substantially or essentially free from components that normally accompany or interact with the polypeptide as found in its naturally occurring environment.
- an isolated or purified polypeptide is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.
- a protein that is substantially free of cellular material includes preparations of protein having less than 30%, 20%, 10%, 5%, or 1% (by dry weight) of contaminating protein.
- optimally culture medium represents less than 30%, 20%, 10%, 5%, or 1% (by dry weight) of chemical precursors or non-protein-of-interest chemicals.
- linker refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two.
- the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein).
- the linker is an organic molecule, group, polymer, or chemical moiety.
- the linker is a polypeptide of 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
- lmntRNA or “long monomeric nucleic acid targeting RNA” herein refers to a wildtype or chimeric long monomeric nucleic acid targeting RNA having sufficient complementarity with a target nucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of the associated RNA-guided nuclease described herein to the target nucleotide sequence.
- lmntRNA comprises sequences and secondary structures that are essential for its binding to an RGN and the target sequence of interest.
- modification in reference to a nucleic acid molecule refers to a change in the nucleotide sequence of the nucleic acid molecule, which can be a deletion, insertion, or substitution of one or more nucleotides, or a combination thereof.
- mutation refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
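The substitution notation described in the definition of mutation (original residue, position, new residue, as in the D93R label used in the figure descriptions) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it handles single-letter amino acid substitutions only, positions are 1-based, and the names `Substitution`, `parse_substitution` and `apply_substitution` are hypothetical helpers, not part of the disclosure.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str     # residue present in the reference sequence
    position: int     # 1-based position within the sequence
    replacement: str  # newly substituted residue


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'D93R' into its three components."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Return a copy of sequence with the substitution applied.

    Raises ValueError if the position is out of range or the residue
    found there does not match the stated original residue.
    """
    index = sub.position - 1
    if not 0 <= index < len(sequence):
        raise ValueError("position outside the sequence")
    if sequence[index] != sub.original:
        raise ValueError("original residue does not match the sequence")
    return sequence[:index] + sub.replacement + sequence[index + 1:]


if __name__ == "__main__":
    print(parse_substitution("D93R"))
    # Toy three-residue sequence, not a sequence from this disclosure.
    print(apply_substitution("MKD", parse_substitution("D3R")))
```

Checking the original residue before substituting guards against off-by-one errors when a label is applied to a differently numbered sequence.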
- nucleic acid As used herein, the terms "nucleic acid,” “nucleic acid sequence,” “nucleotide sequence,” “oligonucleotide,” and “polynucleotide” are interchangeable and refer to a polymeric form of nucleotides.
- the nucleotides may be deoxyribonucleotides (DNA), ribonucleotides (RNA), analogs thereof, or combinations thereof, and may be of any length.
- Polynucleotides may perform any function and may have any secondary and tertiary structures.
- the terms encompass known analogs of natural nucleotides and nucleotides that are modified in the base, sugar and/or phosphate moieties.
- a polynucleotide may comprise one modified nucleotide or multiple modified nucleotides. Examples of modified nucleotides include fluorinated nucleotides, methylated nucleotides, and nucleotide analogs. Nucleotide structure may be modified before or after a polymer is assembled. Following polymerization, polynucleotides may be additionally modified via, for example, conjugation with a labeling component or target binding component. A nucleotide sequence may incorporate non-nucleotide components.
- nucleic acids comprising modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, and have similar binding properties as a reference polynucleotide (e.g., DNA or RNA).
- analogs include, but are not limited to, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), Locked Nucleic Acid (LNA™) (Exiqon, Inc., Woburn, MA) nucleosides, glycol nucleic acid, bridged nucleic acids, and morpholino structures.
- Polynucleotide sequences are displayed herein in the conventional 5' to 3' orientation unless otherwise indicated.
- operably linked means that expression of a gene is under the control of a promoter with which it is spatially connected.
- a promoter may be positioned 5' (upstream) or 3' (downstream) of a gene under its control.
- the distance between the promoter and a gene may be approximately the same as the distance between that promoter and the gene it controls in the gene from which the promoter is derived. As is known in the art, variation in this distance may be accommodated without loss of promoter function.
- polypeptide As used herein, the terms “peptide,” “polypeptide,” and “protein” are interchangeable and refer to polymers of amino acids.
- a polypeptide may be of any length. It may be branched or linear, it may be interrupted by non-amino acids, and it may comprise modified amino acids.
- the terms may be used to refer to an amino acid polymer that has been modified through, for example, acetylation, disulfide bond formation, glycosylation, lipidation, phosphorylation, cross-linking, and/or conjugation (e.g., with a labeling component or ligand).
- Polypeptide sequences are displayed herein in the conventional N-terminal to C-terminal orientation.
- Polypeptides and polynucleotides can be made using routine techniques in the field of molecular biology (see, e.g., standard texts set forth above). Further, essentially any polypeptide or polynucleotide can be custom ordered from commercial sources.
- percentage of sequence identity means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
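The calculation in the definition of percentage of sequence identity can be illustrated with a short Python sketch. It assumes the two sequences have already been optimally aligned by some external tool and are supplied as equal-length strings with `-` marking gaps; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical positions over the comparison window.

    Every aligned column, including gap columns, counts towards the
    total number of positions; a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Four matched positions out of six aligned columns.
    print(round(percent_identity("ACGT-A", "ACCTTA"), 2))
```

Whether gap columns belong in the denominator differs between tools; this sketch follows the wording of the definition, which divides by the total number of positions in the window.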
- promoter means a synthetic or naturally-derived molecule which is capable of conferring, activating or enhancing expression of a nucleic acid in a cell.
- a promoter may comprise one or more specific transcriptional regulatory sequences to further enhance expression and/or to alter the spatial expression and/or temporal expression of same.
- a promoter may also comprise distal enhancer or repressor elements, which may be located as much as several thousand base pairs from the start site of transcription.
- a promoter may be derived from sources including viral, bacterial, fungal, plants, insects, and animals.
- RNA-guided endonuclease and “RGN” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNAs that are not a target for cleavage.
- sequence identity or “identity” in the context of two polynucleotides or polypeptide sequences makes reference to the residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window.
- When percentage of sequence identity is used in reference to proteins, it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule.
- When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution.
- Sequences that differ by such conservative substitutions are said to have "sequence similarity" or "similarity". Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, California).
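The partial-score adjustment for conservative substitutions can be sketched in Python as follows. This is not the PC/GENE algorithm: the residue groups in `CONSERVATIVE_GROUPS` and the partial score of 0.5 are illustrative assumptions chosen only to show a score between zero and 1, and all names are hypothetical.

```python
# Illustrative groupings by broad chemical property (an assumption,
# not a table taken from this disclosure or from PC/GENE).
CONSERVATIVE_GROUPS = [
    set("DE"),     # acidic
    set("KRH"),    # basic
    set("ST"),     # small hydroxyl
    set("NQ"),     # amide
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FYW"),    # aromatic
]


def position_score(a: str, b: str, partial: float = 0.5) -> float:
    """1 for identity, `partial` for a conservative substitution, else 0."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in CONSERVATIVE_GROUPS):
        return partial
    return 0.0


def percent_similarity(aligned_a: str, aligned_b: str,
                       partial: float = 0.5, gap: str = "-") -> float:
    """Identity percentage adjusted upwards for conservative substitutions."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    total = sum(
        0.0 if gap in (a, b) else position_score(a, b, partial)
        for a, b in zip(aligned_a, aligned_b)
    )
    return 100.0 * total / len(aligned_a)


if __name__ == "__main__":
    # D/E scores 0.5, K/K scores 1, A/G scores 0.
    print(percent_similarity("DKA", "EKG"))
```

With `partial` set to 0 the function reduces to plain percentage of sequence identity, which makes the size of the upward adjustment easy to inspect.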
- spacer sequence refers to a part of the lmntRNA nucleotide sequence that directly hybridizes with the target nucleotide sequence of interest.
- subject and patient as used herein interchangeably refer to any vertebrate, including, but not limited to, a mammal (e.g., cow, pig, camel, llama, horse, goat, rabbit, sheep, hamster, guinea pig, cat, dog, rat, and mouse, a non-human primate (for example, a monkey, such as a cynomolgus or rhesus monkey, chimpanzee, etc.) and a human).
- the subject may be a human or a non-human.
- the subject or patient may be undergoing
- target region refers to the region of the target gene to which the CRISPR-based system targets.
- TnpB refers to a type of RGN that cleaves nucleic acid and is encoded by the IS200/IS605 transposase family.
- the TnpB protein commonly used is from Deinococcus radiodurans ISDra2.
- the TnpB protein may be mutated so that the nuclease activity is partly or completely inactivated.
- TnpB RGNs are described in Karvelis, et al. (2021) Nature 599, 692-696
- treatment refers to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein.
- treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed.
- treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease.
- treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence
- Type II CRISPR system refers to an effector system that carries out a targeted DNA double-strand break in four sequential steps, using a single effector enzyme, Cas9, to cleave dsDNA.
- the Type II effector system may function in alternative contexts such as eukaryotic cells.
- the Type II effector system consists of a long pre-crRNA, which is transcribed from the spacer-containing CRISPR locus, the Cas9 protein, and a tracrRNA, which is involved in pre-crRNA processing.
- Type VM refers to a novel type of CRISPR system provided in this disclosure comprising an effector protein, such as an RGN, with its translation termination located within 225 bp of a CRISPR repeat spacer array. No other common CRISPR proteins are found nearby. Additionally, the system comprises long monomeric nucleic acid targeting RNAs (lmntRNAs), which can be found between the effector protein and the CRISPR array and start within 75 bp from the end of the effector protein.
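The positional criteria in the Type VM definition (effector termination within 225 bp of the CRISPR array, lmntRNA starting within 75 bp of the end of the effector and lying before the array) can be expressed as a simple check. The Python sketch here assumes all coordinates are on the same strand and increase from the effector towards the array; the `Locus` fields and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    effector_end: int   # last base of the effector coding sequence
    lmntrna_start: int  # first base of the putative lmntRNA
    array_start: int    # first base of the first CRISPR repeat


def matches_type_vm_layout(locus: Locus,
                           max_array_gap: int = 225,
                           max_lmntrna_gap: int = 75) -> bool:
    """Check the distance criteria stated for a Type VM locus."""
    array_gap = locus.array_start - locus.effector_end
    lmntrna_gap = locus.lmntrna_start - locus.effector_end
    return (
        0 <= array_gap <= max_array_gap
        and 0 <= lmntrna_gap <= max_lmntrna_gap
        and locus.lmntrna_start < locus.array_start
    )


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    print(matches_type_vm_layout(Locus(1000, 1040, 1200)))
    print(matches_type_vm_layout(Locus(1000, 1040, 1300)))
```

The absence of other common CRISPR proteins nearby is a separate criterion that this geometric check does not cover.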
- vector means a nucleic acid sequence containing an origin of replication.
- a vector may be a viral vector, bacteriophage, bacterial artificial chromosome or yeast artificial chromosome.
- a vector may be a DNA or RNA vector.
- a vector may be a self-replicating extrachromosomal vector, or a DNA plasmid.
- the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) genomic locus is found in the genomes of many prokaryotes. CRISPR loci provide resistance to viruses and phages in prokaryotes. In this way, the CRISPR loci function as a type of immune system to help defend prokaryotes against foreign invaders. In such a system, the response to foreign invaders starts by cleaving the genome of invading viruses and plasmids and integrating segments (termed protospacers) of the genomic DNA into the CRISPR locus of the host organism.
- the segments that are integrated into the host genome are known as “spacers”, which mediate protection from subsequent attack by the same (or sufficiently related) virus or plasmid.
- Expression involves transcription of the CRISPR locus and subsequent enzymatic processing to produce short mature CRISPR RNAs (crRNA), each containing a single spacer sequence. Interference is induced after the CRISPR RNAs associate with Cas proteins to form effector complexes, which are then targeted to complementary protospacers in foreign genetic elements to induce nucleic acid degradation.
- CRISPR systems are divided into Class 1 and Class 2 based upon the genes encoding the effector component.
- Class 1 systems have a multi-subunit crRNA-effector complex.
- Class 2 systems have a single effector protein.
- Typical examples of Class 2 effector proteins are Cas9 and Cpf1 (Cas12a).
- Types I-VI of CRISPR systems have been described (for an overview see Makarova et al., Nature Reviews Microbiology (2015) 13:1-15).
- Class 1 systems comprise Type I, Type III and Type IV systems.
- Class 2 systems comprise Type II, Type V and Type VI systems.
- CRISPR loci include several short repeating sequences referred to as "repeats.”
- the repeats can form hairpin structures and/or the repeats can be single-stranded sequences.
- the repeats occur in clusters. Repeats frequently diverge between species. Repeats are regularly interspaced with unique intervening sequences, referred to as "spacers,” resulting in a repeat-spacer-repeat locus architecture. Spacers are sequences usually identical to or homologous to foreign invader sequences (such as viral sequences).
- a spacer-repeat unit encodes a crisprRNA (crRNA).
- a crRNA refers to the mature form of the spacer-repeat unit.
- a crRNA contains a spacer sequence that is involved in targeting a target nucleic acid.
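The repeat-spacer-repeat architecture described for CRISPR loci can be illustrated by pulling the spacers out of an array. This minimal Python sketch assumes the repeat is known and occurs as an exact, identical copy throughout the array, which is a simplification since repeats can diverge; the function name `extract_spacers` and the toy sequences are hypothetical.

```python
from typing import List


def extract_spacers(array_seq: str, repeat: str) -> List[str]:
    """Return the sequences lying between consecutive exact repeat copies.

    Anything before the first repeat or after the last repeat is treated
    as flanking sequence and discarded.
    """
    if not repeat:
        raise ValueError("repeat must be non-empty")
    pieces = array_seq.split(repeat)
    # pieces[0] precedes the first repeat and pieces[-1] follows the last.
    return [piece for piece in pieces[1:-1] if piece]


if __name__ == "__main__":
    repeat = "GTTTC"  # toy repeat, not from this disclosure
    array = "AA" + repeat + "ACGTAC" + repeat + "TTGGCA" + repeat + "CC"
    print(extract_spacers(array, repeat))
```

Real arrays would normally be parsed with a tolerance for mismatches in the repeat, which exact string splitting does not provide.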
- crRNA has a region of complementarity to a potential DNA or RNA target sequence and in some cases, e.g., in currently characterized Type II systems, a second region that forms base-pair hydrogen bonds with a transactivating CRISPR RNA (tracrRNA) to form a secondary structure, typically to form at least a stem structure.
- Complex formation between tracrRNA/crRNA and a Cas protein results in conformational change of the Cas protein that facilitates binding to DNA, nuclease activities of the Cas protein, and crRNA-guided site-specific DNA cleavage by the nuclease.
- the DNA target sequence is adjacent to a cognate protospacer adjacent motif (PAM).
- A CRISPR locus comprises polynucleotide sequences encoding CRISPR-associated (cas) genes.
- Cas genes are involved in the biogenesis and/or the interference stages of crRNA function. Cas genes display extreme sequence diversity between different species and homologs. Some Cas proteins comprise a specific set of domain structures.
- Mature crRNAs are processed from a longer polycistronic CRISPR locus transcript, also referred to as pre-crRNA array.
- a pre-crRNA array comprises a plurality of crRNAs. The repeats in the pre-crRNA array are recognized by Cas proteins. Cas proteins bind to the repeats and cleave the repeats. This action can liberate the plurality of crRNAs. crRNAs can be subjected to further events to produce the mature crRNA form such as trimming (e.g., with an exonuclease).
- a crRNA may comprise all, or some, of the CRISPR repeat sequences.
- Interference refers to the stage in the CRISPR system that is functionally responsible for combating infection by a foreign invader.
- CRISPR interference follows a similar mechanism to RNA interference, which results in target RNA degradation and/or destabilization.
- Currently characterized CRISPR systems perform interference of a target nucleic acid by coupling crRNAs and Cas proteins, thereby forming CRISPR ribonucleoproteins (RNPs).
- crRNA of the RNP guides the RNP to foreign invader nucleic acid (e.g., by recognizing the foreign invader nucleic acid through hybridization).
- Hybridized target foreign invader nucleic acid-crRNA units are subjected to cleavage by Cas proteins.
- Target nucleic acid interference typically requires a protospacer adjacent motif (PAM) in a target nucleic acid.
- Class 1 is characterized by multi-unit effector molecules, while class 2 contains a single effector molecule.
- Class 1 systems comprise Type I, Type III, and Type IV systems.
- Class 2 systems comprise Type II, Type V, and Type VI systems.
- Type II system is commonly represented by cas9 genes. There are two strands of RNA in Type II systems: a crRNA and a tracrRNA. The duplex formed by the tracrRNA and crRNA is recognized by, and associates with Cas9, encoded by the cas9 gene, which combines the functions of the crRNA-effector complex with target DNA cleavage. Cas9 is directed to a target nucleic acid by a sequence of the crRNA that is complementary to, and hybridizes with, a sequence in the target nucleic acid.
- nucleic acid target sequence binding involves a Cas12 protein and the crRNA, as does the nucleic acid target sequence cleavage.
- the RuvC-like nuclease domain of Cas12 protein cleaves both strands of the nucleic acid target sequence in a sequential fashion (Swarts, et al., Mol. Cell (2017) 66:221-233), producing 5' overhangs, which differs from the fragments generated by Cas9 protein.
- There have been multiple subtypes of Type V systems identified so far (Type V-A/B/C/D/E/F/G/H/I/K/L and CRISPR-Cas12j). They differ in the length of the Cas protein, the PAM sequence, and whether they require tracrRNA for their functionality.
- Type V-A is represented by Cas12a protein.
- the Cas12a protein cleavage activity of Type V-A systems does not require hybridization of crRNA to tracrRNA to form a duplex; rather Type V-A systems use a single crRNA that has a stem-loop structure forming an internal duplex.
- Cas12a protein binds the crRNA in a sequence- and structure-specific manner by recognizing the stem loop and sequences adjacent to the stem loop, most notably the nucleotides 5' of the spacer sequence, which hybridizes to the nucleic acid target sequence.
- This stem-loop structure is typically in the range of 15 to 19 nucleotides in length. Substitutions that disrupt this stem-loop duplex abolish cleavage activity, whereas other substitutions that do not disrupt the stem-loop duplex do not abolish cleavage activity.
- nucleic acid target sequence binding involves Cas12a and the crRNA, as does the nucleic acid target sequence cleavage.
- the RuvC-like nuclease domain of Cas12a cleaves one strand of the double-stranded nucleic acid target sequence
- a putative nuclease domain cleaves the other strand of the double-stranded nucleic acid target sequence in a staggered configuration, producing 5' overhangs, which is different from the blunt ends generated by Cas9 cleavage. These 5' overhangs may facilitate insertion of DNA.
- the Cas12a cleavage activity of Type V systems also does not require hybridization of crRNA to tracrRNA to form a duplex; rather, these systems use a single crRNA that has a stem-loop structure forming an internal duplex.
- Cas12a binds the crRNA in a sequence- and structure-specific manner that recognizes the stem loop and sequences adjacent to the stem loop, most notably the nucleotides 5' of the spacer sequence that hybridizes to the nucleic acid target sequence.
- This stem-loop structure is typically in the range of 15 to 19 nucleotides in length.
- the crRNA forms a stem-loop structure at the 5' end, and the sequence at the 3' end is complementary to a sequence in a nucleic acid target sequence.
- Type V-F1 is represented by Cas12f1 protein.
- the Cas12f1 protein cleavage activity of Type V-F1 systems does require hybridization of crRNA to tracrRNA to form a duplex.
- Cas12f1 protein binds the tracrRNA/crRNA in a sequence- and structure-specific manner by recognizing the stem loops and sequences adjacent to the stem loops, most notably the nucleotides 5' of the spacer sequence, which hybridizes to the nucleic acid target sequence.
- These stem-loop structures are typically in the range of 150 to 170 nucleotides in length for the tracrRNA and 28-34 nucleotides in length for the crRNA. Substitutions that disrupt these stem-loop duplexes abolish cleavage activity, whereas other substitutions that do not disrupt the stem-loop duplexes do not abolish cleavage activity.
- nucleic acid target sequence binding involves Cas12f1 and the tracrRNA/crRNA, as does the nucleic acid target sequence cleavage.
- the RuvC-like nuclease domain of Cas12f1 cleaves one strand of the double-stranded nucleic acid target sequence
- a putative nuclease domain cleaves the other strand of the double-stranded nucleic acid target sequence in a staggered configuration, producing 5' overhangs, which is different from the blunt ends generated by Cas9 cleavage. These 5' overhangs may facilitate insertion of DNA.
- the Cas12f1 cleavage activity of Type V systems also does require hybridization of crRNA to tracrRNA to form a duplex.
- Cas12f1 binds the tracrRNA/crRNA in a sequence- and structure-specific manner that recognizes the stem loops and sequences adjacent to the stem loops, most notably the nucleotides 5' of the spacer sequence that hybridizes to the nucleic acid target sequence.
- These stem-loop structures are typically in the range of 150 to 170 nucleotides in length for the tracrRNA and 28-34 nucleotides in length for the crRNA.
- the tracrRNA/crRNA forms stem-loop structures at the 5' end, and the sequence at the 3' end is complementary to a sequence in a nucleic acid target sequence.
- Other proteins associated with Type V crRNA and nucleic acid target sequence binding and cleavage include Cas12b, Cas12c, Cas12d, and Cas12e, which are similar in length to Cas12a proteins, ranging from approximately 1000-1500 amino acids, but also require an additional RNA (either a tracrRNA or a scoutRNA) (see for example Harrington et al, Molecular Cell, Volume 79, Issue 3, 2020, Pages 416-424).
- Still other proteins associated with Type V crRNA and nucleic acid target sequence binding and cleavage include Cas12f1, Cas12f2, Cas12f3, and Cas12g, which are smaller than Cas12a proteins, ranging from approximately 300-900 amino acids, but also require a tracrRNA.
- Type VI systems include the Cas13a protein (also known as Class 2 candidate 2 protein, or C2c2) which does not share sequence similarity with other CRISPR effector proteins (see Abudayyeh, et al, Science (2016) 353:aaf5573). Cas13a proteins have two HEPN domains and possess single-stranded RNA cleavage activity. Cas13a proteins are similar to Cas12a proteins in requiring a crRNA for nucleic acid target sequence binding and cleavage, but not requiring tracrRNA.
- the present disclosure provides methods for identifying a class of novel RGNs that belong to a novel class of CRISPR-based systems.
- a method of identifying novel RGNs and the lmntRNAs they interact with comprising: a) identifying sequences in a genomic or metagenomic database encoding a CRISPR array; b) identifying one or more Open Reading Frames (ORFs) in said selected sequences within 10 kb of the CRISPR array; c) identifying putative novel RGN; d) identifying putative lmntRNA, comprising part of the sequence between the Cas operon and the CRISPR array as well as part of the first CRISPR repeat; and e) selecting RGN sequences that have the corresponding lmntRNA identified in step (d).
- the RGN is a class 2 CRISPR RGN.
- step (a) comprises comparing sequences in a genomic database to at least one seed sequence that encodes a CRISPR array and extracting sequences that comprise said seed sequence.
- said ORF in step (b) encodes a protein of at least 300 amino acids, preferably between 300 and 600 amino acids.
- step (c) comprises identifying sequences comprising RuvC domains. In some embodiments step (c) comprises identifying sequences comprising a tri-split RuvC domain. In some embodiments step (c) comprises identifying sequences that do not comprise an HNH domain. In some embodiments step (c) comprises identifying sequences that comprise a tri-split RuvC domain and do not comprise an HNH domain.
- step (d) comprises identifying the lmntRNA sequences that form 4 or 5 hairpins, wherein the final hairpin comprises the antirepeat-repeat sequence from the CRISPR array.
- step (d) comprises identifying CRISPR arrays within 225 bases of the end of putative novel RGNs and identifying intervening sequences that contain a MGGGYGN4-8CRYCCK motif within 95 bases of the end of the effector protein.
- step (d) comprises identifying CRISPR arrays within 225 bases of the end of putative novel RGNs and identifying intervening sequences that contain a RYCGAGWRAGURYN9-33RKAMWCUCGRY motif within 225 bases of the end of the effector protein.
- step (d) comprises identifying CRISPR arrays within 225 bases of the end of putative novel RGNs and identifying intervening sequences that contain an antirepeat sequence to the CRISPR repeat of at least 6 nucleotides within 40 bases upstream of the first CRISPR repeat.
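The degenerate motifs named in step (d), such as MGGGYGN4-8CRYCCK, use IUPAC nucleotide codes with a variable-length run of N. One possible way to search for such a motif is to translate it into a regular expression, as in this Python sketch. The assumptions are that T and U are treated as equivalent, that the motif must lie entirely within the stated window after the end of the effector, and that the motif is supplied as a list of IUPAC strings and `(min, max)` tuples; the names `compile_motif` and `motif_within` are hypothetical.

```python
import re
from typing import List, Pattern, Tuple, Union

# IUPAC nucleotide codes; T and U are treated as interchangeable here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "[TU]", "U": "[TU]",
    "R": "[AG]", "Y": "[CTU]", "M": "[AC]", "K": "[GTU]",
    "W": "[ATU]", "S": "[CG]", "N": "[ACGTU]",
}

MotifPart = Union[str, Tuple[int, int]]


def compile_motif(parts: List[MotifPart]) -> Pattern[str]:
    """Build a regex from IUPAC strings and (min, max) runs of any base."""
    pieces = []
    for part in parts:
        if isinstance(part, tuple):
            low, high = part
            pieces.append("[ACGTU]{%d,%d}" % (low, high))
        else:
            pieces.append("".join(IUPAC[base] for base in part))
    return re.compile("".join(pieces))


def motif_within(intervening: str, motif: Pattern[str], window: int) -> bool:
    """True if the motif occurs fully inside the first `window` bases."""
    return motif.search(intervening.upper()[:window]) is not None


if __name__ == "__main__":
    motif = compile_motif(["MGGGYG", (4, 8), "CRYCCK"])
    # Toy intervening sequence, not a sequence from this disclosure.
    toy = "TT" + "AGGGCG" + "ACGTA" + "CACCCG" + "TTTT"
    print(motif_within(toy, motif, 95))
```

The same helper could represent the second motif as `["RYCGAGWRAGURY", (9, 33), "RKAMWCUCGRY"]` with a window of 225, assuming the garbled subscript in that motif denotes a run of 9 to 33 bases.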
- the method includes a step of verifying that no Cas1 or Cas2 or Cas4 sequences are present within 10 kb of the CRISPR array.
- said genomic and metagenomic sequences are obtained from a sequence database such as Ensembl or NCBI genome databases.
- An RGN provided herein binds to a target nucleotide sequence and hybridizes with the RNA molecule (ImntRNA) specific to the RNA-guided nuclease.
- the target sequence can then be subsequently cleaved by the RGN if the RGN polypeptide possesses nuclease activity.
- the presently disclosed RGNs can cleave nucleotides within a polynucleotide, functioning as an endonuclease.
- the disclosed RGNs can cleave nucleotides of a target nucleotide sequence within any position of a polynucleotide and thus function as both an endonuclease and exonuclease.
- the presently disclosed RGNs can be wild-type sequences derived from bacterial or archaeal species. Alternatively, the RGNs can be variants or fragments of wild-type polypeptides.
- the wild-type RGN can be modified to alter nuclease activity or alter PAM specificity, for example. In some embodiments, the RGN is not naturally occurring. Such RGNs have a single functioning nuclease domain.
- the RGN lacks nuclease activity altogether or exhibits reduced nuclease activity and is referred to herein as a nuclease-dead RGN.
- Any method known in the art for introducing mutations into an amino acid sequence, such as PCR-mediated mutagenesis and site-directed mutagenesis, can be used for generating nuclease-dead RGNs (e.g., US 9,790,490).
- nuclease-dead RGNs can be targeted to particular genomic locations to alter the expression of a desired sequence.
- the binding of a nuclease-dead RNA-guided nuclease to a target sequence results in the repression of expression of the target sequence or a gene under transcriptional control by the target sequence by interfering with the binding of RNA polymerase or transcription factors within the targeted genomic region.
- the RGN (e.g., a nuclease-dead RGN) or its complexed ImntRNA further comprises an expression modulator that, upon binding to a target sequence, serves to either repress or activate the expression of the target sequence or a gene under transcriptional control by the target sequence.
- the expression modulator modulates the expression of the target sequence or regulated gene through epigenetic mechanisms.
- one or more of the nuclease-dead RGNs disclosed herein can be targeted to particular genomic locations to modify the sequence of a target polynucleotide through fusion to a base editing polypeptide, for example a deaminase polypeptide or active variant or fragment thereof that deaminates a nucleotide base, resulting in conversion from one nucleotide base to another.
- the base-editing polypeptide can be fused to the RGN at its N-terminal or C-terminal end. Additionally, the base-editing polypeptide may be fused to the RGN via a peptide linker.
- a non-limiting example of a deaminase polypeptide that is useful for such compositions and methods includes a cytidine deaminase or the adenosine deaminase base editor described in Gaudelli et al. (2017) Nature 551:464-471, and WO2018/027078.
- the RGN proteins of the present disclosure employ multiple domains distributed in a recognition lobe (REC) and a nuclease lobe (NUC) for substrate recognition and cleavage.
- the RGN comprises an amino-terminal domain (NTD) and a carboxy-terminal domain (CTD), which are connected by a linker loop.
- the NTD consists of two domains: the wedge (WED) and recognition (REC) domains.
- the CTD consists of the tri split RuvC domain, which is split by a second REC domain and a target nucleic acid-binding (TNB) domain.
- the RGN polypeptides of the present disclosure do not contain an HNH domain.
- an RGN polypeptide of the disclosure comprises, from the N- to C-terminus, a Rec1 domain, a wedge domain, a RuvC-I subdomain, a Rec2 domain, a RuvC-II subdomain, a TNB domain, a RuvC-III subdomain, and a C-terminal domain.
- the RGNs of the present disclosure may comprise one or more additional domains, e.g., one or more Rec domains.
- the RGN polypeptides provided herein are between 300 and 600 amino acids in size, between 400 and 550 amino acids in size, or between 400 and 500 amino acids in size. Size variation may be dependent on the particular domain architecture of the RGN polypeptides provided herein.
- the RuvC domain may comprise multiple subdomains: RuvC-I, RuvC-II and RuvC-III.
- the subdomains may be separated by other sequences on the amino acid sequence of the protein.
- RuvC domains include any polypeptides having a structural similarity and/or sequence similarity to a RuvC domain described in the art.
- the RuvC domain may share a structural similarity and/or sequence similarity to a RuvC of Cas9.
- the RuvC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with RuvC domains.
- the RuvC domain comprises a RuvC-I polypeptide, a RuvC-II polypeptide, and a RuvC-III polypeptide.
- Examples of the RuvC-I, RuvC-II, and RuvC-III domains also include any polypeptides having a structural similarity and/or sequence similarity to the RuvC-I, II, and III domains described in the art, such as the corresponding domains of Cas9.
- the RuvC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with a RuvC domain of Cas9.
- the RuvC domain of Cas9 consists of a six-stranded mixed beta-sheet flanked by α-helices and two additional two-stranded antiparallel beta-sheets (see e.g., Nishimasu et al. Cell, 2014).
- the RuvC domain of Cas9 shares structural similarity with the retroviral integrase superfamily members characterized by an RNase H fold, such as Escherichia coli RuvC (PDB code 1HJR, 14% identity, root-mean-square deviation (rmsd) of 3.6 Å for 126 equivalent Cα atoms) and Thermus thermophilus RuvC (PDB code 4LD0, 12% identity, rmsd of 3.4 Å for 131 equivalent Cα atoms).
- E. coli RuvC is a 3-layer alpha-beta sandwich containing a 5-stranded beta-sheet sandwiched between 5 alpha-helices.
- RuvC nucleases have four catalytic residues (e.g., Asp7, Glu70, His143 and Asp146 in T. thermophilus RuvC), and cleave Holliday junctions (or structurally analogous cruciform junctions) through a two-metal mechanism. Asp10 (Ala), Glu762, His983 and Asp986 of the Cas9 RuvC domain are located at positions similar to those of the catalytic residues of T. thermophilus RuvC.
- the REC domain may comprise multiple subdomains: REC1 and REC2.
- the subdomains may be separated by other sequences on the amino acid sequence of the protein.
- Examples of REC domains include any polypeptides having a structural similarity and/or sequence similarity to a REC domain described in the art.
- the REC domain may share a structural similarity and/or sequence similarity to a REC of Cas12a.
- the REC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with REC domains.
- the REC domain comprises a REC1 domain and a REC2 domain.
- Examples of the REC1 and REC2 domains also include any polypeptides having a structural similarity and/or sequence similarity to the REC1 and REC2 domains described in the art, such as the corresponding domains of Cas12a.
- the REC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with a REC domain of Cas12a.
- the REC domain of Cas12a consists of the REC1 and REC2 domains where REC1 comprises 13 alpha helices, and REC2 comprises ten alpha helices and two beta strands that form a small antiparallel sheet (see e.g., Yamano et al. (2016), Cell, 165, 4, Pages 949-962).
- the RGNs may comprise one or more modifications.
- the modified RGNs may be catalytically inactive (also referred to as dead).
- a catalytically inactive or dead nuclease may have reduced or no nuclease activity compared to a wildtype counterpart nuclease.
- a catalytically inactive or dead nuclease may have nickase activity.
- a catalytically inactive or dead nuclease may not have nickase activity.
- Such a catalytically inactive or dead RGN may not make either a double-strand or a single-strand break on a target polynucleotide but may still bind or otherwise form a complex with the target polynucleotide.
- the RGN polypeptide comprises a mutation of the catalytic RuvC residue corresponding to D240A, E339A or D416A (catalytic residues of RuvC-I, II, and III, which are well known in the prior art) of SEQ ID NO: 11 (mutated EGS0091) or equivalent residues of other RGN sequences provided herein (see for example Kleinstiver, et al. (2019) Nat Biotechnol 37, 276-282).
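A substitution written in the usual notation (wild-type residue, 1-based position, new residue) can be applied to a protein sequence as sketched below. The helper name is hypothetical, and because SEQ ID NO: 11 is not reproduced in this section the demonstration uses a placeholder sequence with an Asp at position 240.

```python
import re

def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a point substitution such as 'D240A' (1-based position).

    Raises ValueError if the notation is malformed, the position is out of
    range, or the residue found at that position is not the stated wild type.
    """
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognized mutation notation: {mutation}")
    wild_type, position, replacement = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[position - 1]}"
        )
    return protein[:position - 1] + replacement + protein[position:]

# Placeholder sequence only: 'M', 238 glycines, Asp at position 240, 10 glycines.
placeholder = "M" + "G" * 238 + "D" + "G" * 10
mutant = apply_substitution(placeholder, "D240A")
print(mutant[239])
```

Checking the wild-type residue before substituting guards against numbering that is offset relative to the reference sequence, which matters when mapping the mutation onto equivalent residues of other RGNs.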
- the modifications of the RGN polypeptide may or may not cause an altered functionality. Modifications that do not result in an altered functionality include, for instance, codon optimization for expression in a particular host, or providing the nuclease with a particular marker. Modifications which may result in altered functionality may also include mutations, including point mutations, insertions, deletions, truncations (including split nucleases), etc., as well as chimeric RGNs (e.g., comprising domains from different orthologues or homologues) or fusion proteins.
- the RGN polypeptide comprises mutations in the DNA binding pocket to increase affinity for DNA leading to enhanced binding activity. Such enhanced binding activity can lead to increased cleavage activity or can lead to increased activity of the fusion domain.
- the DNA binding affinity mutation corresponds to D93R of SEQ ID NO: 1 (EGS0091) or equivalent residues of other RGN sequences provided herein (example mutant sequences being SEQ ID NOs: 10-14) (Figure 18 and Figure 19).
- Fusion proteins may include, for example, fusions with heterologous domains or functional domains (e.g., localization signals, enzymes).
- various different modifications may be combined (e.g., a mutated nuclease which is catalytically inactive and which further is fused to a functional domain, such as for instance to induce DNA methylation or another nucleic acid modification, such as, for example, a mutation, a deletion, an insertion, a replacement).
- the RGNs can comprise at least one nuclear localization signal (NLS) to enhance transport of the RGN to the nucleus of a cell.
- Nuclear localization signals are known in the art and generally comprise a stretch of basic amino acids (see, e.g., Lange et al., J. Biol. Chem. (2007) 282:5101-5105).
- the RGN comprises 2, 3, or more nuclear localization signals.
- the nuclear localization signal(s) can be a heterologous NLS.
- Non-limiting examples of nuclear localization signals useful for the presently disclosed RGNs are the nuclear localization signals of SV40 Large T-antigen, nucleoplasmin, and c-Myc (see, e.g., Ray et al.
- the RGN comprises the NLS sequence comprising the sequence of SEQ ID NO: 78 or 80.
- the RGN may comprise one or more NLS sequences at its N-terminus, C-terminus, or both the N-terminus and C-terminus.
- the RGN may comprise two NLS sequences at the N-terminal region and four NLS sequences at the C-terminal region.
- localization signal sequences known in the art that localize polypeptides to particular subcellular location(s) can also be used to target the RGNs, including, but not limited to, plastid localization sequences, mitochondrial localization sequences, and dual-targeting signal sequences that target to both the plastid and mitochondria (see, e.g., Nassoury and Morse (2005) Biochim Biophys Acta 1743:5-19; Herrmann and Neupert (2003) IUBMB Life 55:219-225; Soll (2002) Curr Opin Plant Biol 5:529-535; Carrie and Small (2013) Biochim Biophys Acta 1833:253-259).
- the RGNs comprise at least one cell-penetrating domain that facilitates cellular uptake of the RGN.
- Cell-penetrating domains are known in the art and generally comprise stretches of positively charged amino acid residues (i.e., polycationic cell-penetrating domains), alternating polar amino acid residues and non-polar amino acid residues (i.e., amphipathic cell-penetrating domains), or hydrophobic amino acid residues (i.e., hydrophobic cell-penetrating domains) (see, e.g., Milletti F. (2012) Drug Discov Today 17:850-860).
- a non-limiting example of a cell-penetrating domain is the trans-activating transcriptional activator (TAT) from the human immunodeficiency virus 1.
- the nuclear localization signal, plastid localization signal, mitochondrial localization signal, dual targeting localization signal, and/or cell-penetrating domain can be located at the amino-terminus (N-terminus), the carboxyl-terminus (C-terminus), or in an internal location of the RNA-guided nuclease.
- the presently disclosed RGN polypeptides may comprise a detectable label or a purification tag.
- the detectable label or purification tag can be located at the N-terminus, the C-terminus, or an internal location of the RNA-guided nuclease, either directly or indirectly via a linker peptide.
- the RGN component of the fusion protein is a nuclease-dead RGN.
- the RGN component of the fusion protein is a RGN with nickase activity.
- RGNs that lack nuclease activity can be used to deliver a fused polypeptide, polynucleotide, or small molecule payload to a particular genomic location.
- the RGN polypeptide or guide RNA can be fused to a detectable label to allow for detection of a particular sequence.
- a nuclease-dead RGN can be fused to a detectable label (e.g., fluorescent protein) and targeted to a particular sequence associated with a disease to allow for detection of the disease-associated sequence.
- a detectable label is a molecule that can be visualized or otherwise observed.
- the detectable label may be fused to the RGN as a fusion protein (e.g., fluorescent protein) or may be a small molecule conjugated to the RGN polypeptide that can be detected visually or by other means.
- Detectable labels that can be fused to the presently disclosed RGNs as a fusion protein include any detectable protein domain, including but not limited to, a fluorescent protein or a protein domain that can be detected with a specific antibody.
- Non-limiting examples of fluorescent proteins include green fluorescent proteins (e.g., GFP, EGFP, ZsGreen) and yellow fluorescent proteins (e.g., YFP, EYFP, ZsYellow).
- RGN polypeptides can also comprise a purification tag, which is any molecule that can be utilized to isolate a protein or fused protein from a mixture (e.g., biological sample, culture medium).
- Non-limiting examples of purification tags include biotin, myc, maltose binding protein (MBP), and glutathione-S-transferase (GST).
- the presently disclosed RGNs can be fused to an effector domain (a fusion protein of an RGN and an effector domain), such as a cleavage domain, a deaminase domain, or an expression modulator domain, either directly or indirectly via a linker.
- Such effector domain can be located at the N-terminus, the C-terminus, or an internal location of the RNA-guided nuclease.
- the RGN component of the fusion protein is a nuclease-dead RGN.
- RGNs that are fused to a polypeptide or domain can be separated or joined by a linker.
- a linker joins a ImntRNA binding domain of an RNA guided nuclease and a base-editing polypeptide, such as a deaminase.
- the RGN fusion protein comprises a cleavage domain, which is any domain that is capable of cleaving a polynucleotide (i.e., RNA, DNA) and includes, but is not limited to, restriction endonucleases and homing endonucleases (see, e.g., Linn et al. (eds.) Nucleases, Cold Spring Harbor Laboratory Press, 1993).
- the RGN fusion protein comprises a deaminase domain that deaminates a nucleotide base, resulting in conversion from one nucleotide base to another, and includes, but is not limited to, a cytidine deaminase or an adenosine deaminase base editor.
- the effector domain of the fusion protein can be an expression modulator domain, which is a domain that either serves to upregulate or downregulate transcription.
- the expression modulator domain can be an epigenetic modification domain, a transcriptional repressor domain or a transcriptional activation domain.
- the expression modulator of the RGN fusion protein comprises an epigenetic modification domain that covalently modifies DNA or histone proteins to alter histone structure and/or chromosomal structure without altering the DNA sequence, leading to changes in gene expression (i.e., upregulation or downregulation).
- epigenetic modifications include acetylation or methylation of lysine residues, arginine methylation, serine and threonine phosphorylation, and lysine ubiquitination and sumoylation of histone proteins, and methylation and hydroxymethylation of cytosine residues in DNA.
- Non-limiting examples of epigenetic modification domains include histone acetyltransferase domains, histone deacetylase domains, histone methyltransferase domains, histone demethylase domains, DNA methyltransferase domains, and DNA demethylase domains.
- the expression modulator of the fusion protein comprises a transcriptional repressor domain, which interacts with transcriptional control elements and/or transcriptional regulatory proteins, such as RNA polymerases and transcription factors, to reduce or terminate transcription of at least one gene.
- Transcriptional repressor domains are known in the art and include, but are not limited to, IκB and Kruppel associated box (KRAB) domains.
- the expression modulator of the fusion protein comprises a transcriptional activation domain, which interacts with transcriptional control elements and/or transcriptional regulatory proteins, such as RNA polymerases and transcription factors, to increase or activate transcription of at least one gene.
- Transcriptional activation domains are known in the art and include, but are not limited to, a VP16 activation domain and an NFAT activation domain.
- the nucleic acid-targeting effector protein-guide RNA complex as a whole may be associated with two or more functional domains.
- there may be two or more functional domains associated with the nucleic acid-targeting effector protein or there may be two or more functional domains associated with the guide RNA (via one or more adaptor proteins), or there may be one or more functional domains associated with the nucleic acid-targeting effector protein and one or more functional domains associated with the guide RNA (via one or more adaptor proteins).
- the fusion between the adaptor protein and the activator or repressor may include a linker.
- For example, GlySer linkers (GGGS) can be used. They can be used in repeats of 3, 6, 9, or even 12 or more, to provide suitable lengths, as required. Linkers can be used between the guide RNAs and the functional domain (activator or repressor), or between the nucleic acid-targeting effector protein and the functional domain (activator or repressor).
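As a small illustration of the repeat counts mentioned above, a GlySer linker of the desired length can be generated and placed between two protein parts. The function names and the short peptide fragments used in the example are placeholders, not sequences from the disclosure.

```python
def glyser_linker(repeats: int, unit: str = "GGGS") -> str:
    """Return a GlySer linker made of `repeats` copies of `unit`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats

def fuse(n_terminal_part: str, c_terminal_part: str, repeats: int) -> str:
    """Join two protein sequences through a GlySer linker (N- to C-terminus)."""
    return n_terminal_part + glyser_linker(repeats) + c_terminal_part

# Linker lengths in residues for the repeat counts named in the text.
for n in (3, 6, 9, 12):
    print(n, len(glyser_linker(n)))

# Placeholder fragments standing in for an effector protein and a functional domain.
print(fuse("MKRT", "PLVE", repeats=3))
```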
- a guide RNA comprises a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA).
- Native guide RNAs that comprise both a crRNA and a tracrRNA generally comprise two separate RNA molecules that hybridize to each other through the repeat sequence of the crRNA and the anti-repeat sequence of the tracrRNA.
- the present disclosure provides RGNs that can bind to a different type of an RNA, long monomeric nucleic acid targeting RNAs (ImntRNAs).
- ImntRNAs can be modified and/or engineered by truncating, inserting, and/or replacing some parts of the sequence to enhance and/or modify its activity.
- ImntRNA guides the RGN to a specific target nucleic acid sequence.
- a guide for an RGN is one or more RNA molecules (generally, one or two) that can bind to the RGN and guide the RGN to bind to a particular target nucleotide sequence and, in those instances wherein the RGN has nickase or nuclease activity, also cleave the target nucleotide sequence.
- ImntRNA is an engineered (or chimeric) ImntRNA.
- such ImntRNA molecule is an engineered naturally-occurring sequence that does not necessarily possess all the elements and/or sequences of the naturally-expressed RNA from the corresponding CRISPR genes.
- ImntRNA scaffold comprises in 5’ to 3’ orientation: a) a sequence interacting with the corresponding RGN, b) a sequence partially complementary to the nucleotides of the CRISPR repeat array sequence, c) at least 7 nucleotides from the 3’ end of the CRISPR repeat array sequence, d) directly followed by a spacer sequence complementary to the target nucleic acid sequence of interest.
- the sequence interacting with the RGN has a length of 30-70 nucleotides. In some embodiments the sequence interacting with the RGN has a length of at least 30 nucleotides. In some embodiments the sequence interacting with the RGN could be at least 70-80, 80-90, 90-100, 100-110, 110-120, 120-130, 130-140, 140-150, 150-160, 160-170, or 170-180 nucleotides.
- In some embodiments, the sequence partially complementary to the nucleotides of the CRISPR repeat array sequence is not complementary to up to 6 nucleotides from the 3’ end of the CRISPR repeat array sequence.
- the sequence that is partially complementary to the nucleotides of the CRISPR repeat array sequence is not complementary to up to 6 nucleotides from the 3’ end of the CRISPR repeat array sequence and is at least 80, 85, 90, 95, 99, or 100% complementary to the remaining nucleotides of the CRISPR repeat array sequence present in the scaffold.
- the sequence partially complementary to the nucleotides of the CRISPR repeat array sequence comprises at least 2 nucleotides at least partially complementary to the CRISPR array repeat sequence present in the scaffold.
- at least some nucleotides from the 3’ end of the CRISPR repeat array sequence and the sequence partially complementary to the CRISPR repeat array sequence form a hairpin structure.
- the ImntRNA scaffold comprises all the nucleotides of the CRISPR repeat sequence. In some embodiments the at least 7 nucleotides of CRISPR repeat array sequence are obtained by truncating nucleotides from the 5’ end of the full CRISPR array repeat sequence. In other embodiments the CRISPR repeat array sequence has a length of 30-35 nucleotides.
- the ImntRNA may comprise additional CRISPR array repeat and spacer sequences.
- the spacer sequences may be replaced with desired target sequences.
- the ImntRNA scaffold comprises a conserved sequence on or near a 5’ end of the scaffold. In some aspects, such conserved sequence forms a hairpin structure. In embodiments, the conserved nucleotide sequence is on a 5’ end of the scaffold. In some embodiments, the conserved sequence is MGGGYGN4-8CRYCCK (SEQ ID NO: 73).
- the scaffold comprises a stretch of nucleotides capable of forming 1 or more hairpin structures between the conserved sequence forming a hairpin structure on or near a 5’ end of the scaffold and the sequence partially complementary to the nucleotides of the CRISPR repeat array sequence.
- the scaffold comprises the sequence RYCGAGWRAGURYN9-33RKAMWCUCGRY (SEQ ID NO: 74).
- the part of the sequence that forms the loop may be truncated.
- Truncations may include, but are not limited to, altering the second hairpin sequence, altering the final hairpin sequence to consist of just the anti-repeat-repeat sequence with a small linker connecting them, or altering the final hairpin to consist of just 4, 5, 6, 7, 8, 9, 10, 11 or more base pairs of the anti-repeat-repeat sequence.
- a loop of ImntRNA is provided.
- the loop may be a stem loop or a tetra loop.
- loop forming sequences include MGGGYGN4-8CRYCCK (SEQ ID NO: 73), RYCGAGWRAGURYN9-33RKAMWCUCGRY (SEQ ID NO: 74), or RYCGAGWRAGMWCUCGRY (SEQ ID NO: 75).
- a ImntRNA comprises a spacer sequence, which can be re-programmed to direct site-specific binding to a target sequence of a target polynucleotide.
- the spacer may also be referred to herein as part of the ImntRNA scaffold and may comprise an engineered heterologous sequence.
- the spacer sequence is engineered to be fully or partially complementary to the target sequence of interest.
- the spacer sequence can comprise from 8 nucleotides to 30 nucleotides, or more.
- the spacer sequence can be 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length.
- the spacer sequence is 10 to 26 nucleotides in length, or 12 to 30 nucleotides in length. In particular embodiments, the spacer sequence is 30 nucleotides in length.
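The spacer constraints above, together with the 5’ to 3’ scaffold-then-spacer order, can be sketched as follows. This is illustrative only: it assumes the protospacer is supplied as the non-target DNA strand written 5’ to 3’ (so the RNA spacer has the same sequence with U in place of T and is complementary to the target strand), it enforces only the 8 to 30 nucleotide range, and the scaffold string in the example is a placeholder rather than a sequence from Table 5.

```python
def spacer_from_protospacer(protospacer_dna: str, min_len: int = 8, max_len: int = 30) -> str:
    """Convert a protospacer (non-target strand, 5'->3') into an RNA spacer."""
    dna = protospacer_dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("protospacer must contain only A, C, G, T")
    if not min_len <= len(dna) <= max_len:
        raise ValueError(f"spacer length {len(dna)} is outside {min_len}-{max_len} nt")
    return dna.replace("T", "U")

def assemble_guide(scaffold_rna: str, spacer_rna: str) -> str:
    """Place the spacer directly 3' of the scaffold."""
    return scaffold_rna.upper() + spacer_rna

# Placeholder scaffold (hypothetical) and a 20-nt toy protospacer.
spacer = spacer_from_protospacer("ACGTACGTACGTACGTACGT")
print(assemble_guide("GGAUCC", spacer))
```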
- the ImntRNA comprises a spacer sequence linked to a conserved nucleotide sequence, wherein the conserved nucleotide sequence may comprise one or more stem loops or optimized secondary structures.
- the conserved nucleotide sequence has a minimum length of 80 nts and at least 3 stem loops.
- the spacer sequence may be linked to all or part of the natural conserved nucleotide sequence.
- certain aspects of the RNA architecture can be modified, for example by addition, subtraction, or substitution of features, whereas certain other aspects of RNA architecture are maintained.
- the RGN binds to a ImntRNA sequence comprising at least 6 nucleotides of the CRISPR repeat sequence set forth in Table 5, or an active variant or fragment thereof. In some embodiments, the RGN binds to a ImntRNA sequence comprising a truncated ImntRNA sequence as set forth in Table 5, or an active variant or fragment thereof.
- the ImntRNA can be synthesized chemically or via in vitro transcription.
- Assays for determining sequence-specific binding between a RGN and a guide RNA are known in the art and include, but are not limited to, in vitro binding assays between an expressed RGN and the guide RNA, which can be tagged with a detectable label (e.g., biotin) and used in a pull-down detection assay in which the lmntRNA:RGN complex is captured via the detectable label (e.g., with streptavidin beads).
- a control guide RNA with an unrelated sequence or structure to the guide RNA can be used as a negative control for non-specific binding of the RGN to RNA.
- the ImntRNA can be introduced into a target cell or an organ as an RNA molecule.
- the ImntRNA can be transcribed in vitro or chemically synthesized.
- a nucleotide sequence encoding the ImntRNA is introduced into the cell or an organ.
- the nucleotide sequence encoding the ImntRNA is operably linked to a suitable promoter.
- the promoter can be a native promoter or heterologous to the ImntRNA-encoding nucleotide sequence.
- the ImntRNA can be introduced into a target cell as a ribonucleoprotein complex, as described herein, wherein the ImntRNA is bound to an RNA-guided nuclease polypeptide.
- the ImntRNA directs an associated RGN to a particular target nucleotide sequence of interest through hybridization of the ImntRNA to the target nucleotide sequence.
- a target nucleotide sequence can comprise DNA, RNA, or a combination of both and can be single-stranded or double-stranded.
- a target nucleotide sequence can be genomic DNA (i.e., chromosomal DNA), plasmid DNA, or an RNA molecule ( e.g.
- the target nucleotide sequence can be bound (and in some embodiments, cleaved) by an RNA-guided nuclease in vitro or in a cell.
- the chromosomal sequence targeted by the RGN can be a nuclear, plastid or mitochondrial chromosomal sequence. In some embodiments, the target nucleotide sequence is unique in the target genome.
- the present disclosure also provides methods for binding and/or modifying a target nucleotide sequence of interest.
- the methods include delivering a system comprising at least one ImntRNA or a polynucleotide encoding the same, and at least one fusion polypeptide comprising an RGN of the invention and a base-editing polypeptide, for example a cytidine deaminase or an adenosine deaminase, or a polynucleotide encoding the fusion polypeptide, to the target sequence or a cell, organelle, or embryo comprising the target sequence.
- methods comprise the use of a single RGN polypeptide in combination with multiple, distinct ImntRNAs, which can target multiple, distinct sequences within a single gene and/or multiple genes. Also encompassed herein are methods wherein multiple, distinct ImntRNAs are introduced in combination with multiple, distinct RGN polypeptides. These ImntRNAs and ImntRNA/RGN polypeptide systems can target multiple, distinct sequences within a single gene and/or multiple genes.
- the target nucleotide sequence of the RGNs is adjacent to a sequence called protospacer adjacent motif (PAM).
- a protospacer adjacent motif can be within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides from the target nucleotide sequence.
- the PAM is 5' of the target sequence for the presently disclosed RGNs.
- the PAM is a consensus sequence of 2-4 nucleotides, but in particular embodiments, can be 2, 3, 4, 5, 6, 7, 8, 9, or more nucleotides in length.
- the RGN or an active variant or fragment thereof binds respectively a target nucleotide sequence adjacent to a PAM sequence as set forth in Table 6.
- PAM sequence specificity for a given nuclease enzyme is affected by enzyme concentration (see, e.g., Karvelis et al. (2015) Genome Biol 16:253), which may be modified by altering the promoter used to express the RGN, or the amount of ribonucleoprotein complex delivered to the cell, organelle, or embryo.
- the RGN can cleave the target nucleotide sequence at a specific cleavage site.
- a cleavage site is made up of the two particular nucleotides within a target nucleotide sequence between which the nucleotide sequence is cleaved by an RGN.
- the cleavage site can comprise the 1st and 2nd, 2nd and 3rd, 3rd and 4th, 4th and 5th, 5th and 6th, 7th and 8th, or 8th and 9th nucleotides from the PAM in the 3’ direction.
- the cleavage site may be over 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more nucleotides from the PAM in the 3’ direction.
- the cleavage site is 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides from the PAM in the 3’ direction.
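The geometry described above (PAM 5’ of the target, cleavage site counted from the PAM in the 3’ direction) can be sketched as a simple scan. The PAM `TTN` used in the example is purely hypothetical, since the PAMs of Table 6 are not reproduced in this section. The function names, the restriction to the given strand, and the convention that a cut "n nucleotides from the PAM" falls after the n-th target nucleotide are likewise assumptions.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "W": "[AT]", "S": "[CG]", "N": "[ACGT]",
}

def find_targets(seq: str, pam: str, spacer_len: int = 20):
    """Return (pam_start, target) pairs for every PAM on the given strand that
    is followed on its 3' side by a full-length target sequence."""
    seq = seq.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping PAM occurrences are all reported.
    pattern = re.compile(f"(?=({pam_regex}))")
    hits = []
    for match in pattern.finditer(seq):
        start = match.start() + len(pam)
        end = start + spacer_len
        if end <= len(seq):
            hits.append((match.start(), seq[start:end]))
    return hits

def cut_index(pam_start: int, pam_len: int, n_from_pam: int) -> int:
    """0-based index in `seq` immediately after the n-th target nucleotide
    counted from the PAM in the 3' direction."""
    return pam_start + pam_len + n_from_pam

toy = "GGGTTAACGTACGTAGG"
print(find_targets(toy, "TTN", spacer_len=8))
print(cut_index(3, 3, 4))
```

A complete search would also scan the reverse complement of the sequence; that is omitted here to keep the sketch short.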
- the target polynucleotide of an RGN system can be any polynucleotide endogenous or exogenous to the eukaryotic cell.
- the target polynucleotide can be a polynucleotide residing in the nucleus of the eukaryotic cell.
- the target polynucleotide can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regions or introns).
- the target sequence is generally associated with a PAM (protospacer adjacent motif). The precise sequence and length requirements for the PAM differ depending on the RGN used, but PAMs are typically 2-5 base pair sequences adjacent to the protospacer (that is, the target sequence).
- a target nucleic acid can be single stranded DNA (ssDNA) or double stranded DNA (dsDNA).
- a PAM is usually present adjacent to the target sequence of the target DNA (e.g., see discussion of the PAM elsewhere herein).
- the source of the target DNA can be the same as the source of the sample, e.g., as described below.
- the source of the target DNA can be any source.
- the target DNA is a viral DNA (e.g., a genomic DNA of a DNA virus).
- a subject method can be for detecting the presence of a viral DNA amongst a population of nucleic acids (e.g., in a sample).
- a subject method can also be used for the cleavage of non-target ssDNAs in the presence of a target DNA.
- a subject method can be used to promiscuously cleave non-target ssDNAs in the cell (ssDNAs that do not hybridize with the guide sequence of the guide RNA) when a particular target DNA is present in the cell (e.g., when the cell is infected with a virus and viral target DNA is detected).
- the target polynucleotide of an RGN/RNA complex may be a disease-associated gene or polynucleotide, or a gene or polynucleotide associated with a biological pathway.
- target DNAs include, but are not limited to, viral DNAs such as: a papovavirus (e.g., human papillomavirus (HPV), polyoma virus); a hepadnavirus (e.g., Hepatitis B Virus (HBV)); a herpesvirus (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis Rosea, Kaposi's sarcoma-associated herpesvirus); an adenovirus (e.g., atadenovirus, aviadenovirus, ichta
- RGNs can be complexed to a ImntRNA (ImntRNA/RGN complex) in order to deliver Cas in proximity with a target nucleic acid sequence.
- the ImntRNA is a polynucleotide that site-specifically guides a Cas nuclease, or a deactivated Cas nuclease, to a target nucleic acid region.
- the binding specificity is determined jointly by the complementary region on the cognate guide and a short DNA motif (protospacer adjacent motif or PAM) juxtaposed to the complementary region.
- the spacer present in the ImntRNA specifically hybridizes to a target nucleic acid sequence and determines the location of a Cas protein's site-specific binding and nucleolytic cleavage.
- RNA/Cas complexes can be produced using methods well known in the art.
- the RNA of the complexes can be produced in vitro and RGN polypeptides can be recombinantly produced and then the RNA and RGN proteins can be complexed together using methods known in the art.
- cell lines constitutively expressing RGN proteins can be developed and can be transfected with the ImntRNA components, and complexes can be purified from the cells using standard purification techniques, such as but not limited to affinity, ion exchange and size exclusion chromatography. See, e.g., Jinek M., et al., "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity," Science (2012) 337:816-821.
- the components, i.e., the ImntRNA and RGN polynucleotides, may be provided separately to a cell, e.g., using separate constructs, or together, in a single construct, or in any combination, and complexes can be purified as above.
- RNA-guided nucleases comprising at least 50, 100, 150, 200, 250, 300, 350, 400, 450 or more contiguous amino acid residues of the amino acid sequence as provided above in Table 4.
- RNA-guided nucleases provided herein can comprise at least one nuclease domain (e.g., DNase, RNase domain) and at least one RNA recognition and/or RNA binding domain to interact with ImntRNAs.
- Further domains that can be found in RNA-guided nucleases provided herein include, but are not limited to, DNA binding domains, helicase domains, protein-protein interaction domains, and dimerization domains.
- the RNA-guided nucleases provided herein can comprise at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% sequence identity to one or more of the DNA binding domains, helicase domains, protein-protein interaction domains, and dimerization domains.
- While the activity of a variant or fragment may be altered compared to the polynucleotide or polypeptide of interest, the variant and fragment should retain the functionality of the polynucleotide or polypeptide of interest. For example, a variant or fragment may have increased activity, decreased activity, different spectrum of activity or any other alteration in activity when compared to the polynucleotide or polypeptide of interest.
- fragments and variants of naturally-occurring RGN polypeptides will retain sequence-specific, RNA-guided DNA-binding activity.
- fragments and variants of naturally-occurring RGN polypeptides, such as those disclosed herein, will retain nuclease activity (single-stranded or double-stranded).
- a biologically active variant of an RGN polypeptide of the invention may differ by as few as 1-15 amino acid residues, as few as 1-10, such as 6-10, as few as 5, as few as 4, as few as 3, as few as 2, or as few as 1 amino acid residue.
- the polypeptides can comprise an N-terminal or a C-terminal truncation, which can comprise at least a deletion of 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350 amino acids or more from either the N or C terminus of the polypeptide.
- a biologically active variant of an RGN polypeptide of the invention may differ by as few as 1 or 2 amino acids.
- Fragments and variants of naturally-occurring ImntRNAs will retain the ability, when part of a ImntRNA, to guide an RGN (complexed with the ImntRNA) to a target nucleotide sequence in a sequence-specific manner.
- Fragments and variants of naturally-occurring CRISPR repeats such as those disclosed herein, will retain the ability, when part of a ImntRNA, to bind to and guide an RNA-guided nuclease (complexed with the ImntRNA) to a target nucleotide sequence in a sequence-specific manner.
- the ImntRNA comprises the nucleotide sequence as set out in Table 5, or an active variant or fragment thereof that is capable of directing the sequence-specific binding of an associated RGN provided herein to a target sequence of interest.
- an active ImntRNA sequence variant of a wild-type sequence comprises a nucleotide sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the ImntRNA nucleotide sequence set forth in Table 5.
- an active ImntRNA sequence fragment of a wild-type sequence comprises at least 6, or more contiguous nucleotides of the corresponding CRISPR repeat nucleotide sequence set forth in Table 5.
- the anti-repeat region of the ImntRNA that is partially complementary to the CRISPR repeat sequence comprises from 6 nucleotides to 30 nucleotides, or more.
- the region of base pairing between the anti-repeat sequence and the CRISPR repeat sequence can be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length.
- the anti-repeat region of the ImntRNA that is fully or partially complementary to a CRISPR repeat sequence is at least 6 nucleotides in length.
- the degree of complementarity between a CRISPR repeat sequence and its corresponding anti-repeat sequence when optimally aligned using a suitable alignment algorithm, is more than 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more.
- Biologically active variants of a CRISPR repeat may differ by as few as 1-15 nucleotides, as few as 1-10, such as 6-10, as few as 5, as few as 4, as few as 3, as few as 2, or as few as 1 nucleotide.
- the polynucleotides can comprise a 5' or 3' truncation, which can comprise at least a deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, or 25 nucleotides or more from either the 5' or 3' end of the polynucleotide.
- the degree of complementarity between a spacer sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is more than 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more.
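The degree of complementarity described above can be illustrated with a minimal Python sketch. It assumes an ungapped, full-length comparison of an RNA spacer against an equally long DNA target strand, which is a simplification of the "suitable alignment algorithm" the text refers to; `percent_complementarity` is a hypothetical helper name, and only Watson-Crick RNA:DNA pairs are counted.

```python
# Watson-Crick pairs between an RNA base (spacer) and a DNA base (target strand).
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(spacer_rna, target_strand_dna):
    """Percent of spacer positions that base-pair with the target strand.

    Both sequences are written 5'->3'. Hybridization is antiparallel, so the
    spacer is compared against the target strand read in reverse.
    Assumption: equal lengths and no gaps (no real alignment is performed).
    """
    if len(spacer_rna) != len(target_strand_dna) or not spacer_rna:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((r, d) in PAIRS
                 for r, d in zip(spacer_rna, reversed(target_strand_dna)))
    return 100.0 * paired / len(spacer_rna)


if __name__ == "__main__":
    print(percent_complementarity("ACGUACGUAC", "GTACGTACGT"))  # fully paired
    print(percent_complementarity("ACGUACGUAC", "ATACGTACGT"))  # one mismatch
```

A result could then be compared against whichever threshold (for example, more than 80%) is of interest.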
- the spacer sequence is free of secondary structure, which can be predicted using any suitable polynucleotide folding algorithm known in the art, including but not limited to mFold (see, e.g., Zuker and Stiegler (1981) Nucleic Acids Res. 9:133-148) and RNAfold (see, e.g., Gruber et al. (2008) Cell 106(1):23-24).
- RGN proteins can have varying sensitivity to mismatches between a spacer sequence in a ImntRNA and its target sequence that affects the efficiency of cleavage.
- the present disclosure provides polynucleotides comprising the presently disclosed ImntRNAs and polynucleotides comprising a nucleotide sequence encoding the presently disclosed RNA-guided nucleases and/or ImntRNAs.
- Presently disclosed polynucleotides include those comprising or encoding a CRISPR repeat sequence comprising the nucleotide sequence set forth in Table 5, or an active variant or fragment thereof that when comprised within a ImntRNA is capable of directing the sequence-specific binding of an associated RNA-guided nuclease to a target sequence of interest.
- polynucleotides comprising or encoding an ImntRNA comprising the nucleotide sequence set forth in Table 5, or an active variant or fragment thereof that when comprised within a ImntRNA is capable of directing the sequence-specific binding of an associated RGN to a target sequence of interest.
- Polynucleotides are also provided that encode an RGN comprising the amino acid sequence set forth in Table 7 and active fragments or variants thereof that retain the ability to bind to a target nucleotide sequence in an RNA-guided sequence-specific manner.
- the expression cassette will include, in the 5'-3' direction of transcription, a transcriptional initiation region (i.e., a promoter) for an RGN- and/or an ImntRNA-encoding polynucleotide and a transcriptional termination region (i.e., termination region) functional in the organism of interest.
- the promoters in the context of the coding sequences mentioned above are capable of driving expression of a coding sequence in a host cell.
- the regulatory regions e.g., promoters, transcriptional regulatory regions, and translational termination regions
- the RGN- and ImntRNA-encoding polynucleotides are under the control of different transcriptional initiation regions (e.g., promoters) optimal for their individual expression.
- Additional regulatory signals may include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, and termination signals.
- the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
- adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like.
- in vitro mutagenesis, primer repair, restriction, annealing, and resubstitutions (e.g., transitions and transversions) may be involved.
- a number of promoters can be used in the practice of the invention.
- the promoters can be selected based on the desired outcome.
- the nucleic acids can be combined with constitutive, inducible, growth stage-specific, cell type-specific, tissue-preferred, tissue-specific, or other promoters for expression in the organism of interest.
- the nucleotide comprises a tissue-preferred promoter.
- the nucleic acid molecules encoding a RGN, and/or ImntRNA comprise a cell type-specific promoter.
- the nucleic acid sequences encoding the RGNs and/or ImntRNAs can be operably linked to a promoter sequence that is recognized by a phage RNA polymerase, for example for in vitro mRNA synthesis.
- the promoter sequence can be a pol I, pol II, pol III, T7, T3, U6, CMV or SP6 promoter sequence or a variation of a T7, T3, U6, CMV or SP6 promoter sequence.
- the expressed protein and/or RNAs can be purified for use in the methods of genome modification described herein. Any Pol II promoter or terminator could express the RGN.
- the choice of a promoter depends on how strongly the RGN needs to be expressed and in what tissue type. In a preferred embodiment the RGN is expressed using the CMV promoter.
- the ImntRNA can be expressed by Pol III promoters (e.g. U6 promoter) or Pol II promoters.
- the polynucleotide encoding the RGN also can be linked to a polyadenylation signal (e.g., SV40 polyA signal, or SV40 polyA with rmG terminator) and/or at least one transcriptional termination sequence. Additionally, the sequence encoding the RGN also can be linked to sequence(s) encoding at least one nuclear localization signal, at least one cell-penetrating domain, and/or at least one signal peptide capable of trafficking proteins to particular subcellular locations.
- the polynucleotide encoding ImntRNA can be linked to a stretch of A's for termination of expression.
- conservative variants include those sequences that, because of the degeneracy of the genetic code, encode the native amino acid sequence of the gene of interest.
- Naturally occurring allelic variants such as these can be identified with the use of well-known molecular biology techniques, as, for example, with polymerase chain reaction (PCR) and hybridization techniques as outlined below.
- Variant polynucleotides also include synthetically derived polynucleotides, such as those generated, for example, by using site-directed mutagenesis but which still encode the polypeptide or the polynucleotide of interest.
- variants of a particular polynucleotide disclosed herein will have at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to that particular polynucleotide as determined by sequence alignment programs and parameters described elsewhere herein.
- Variants of a particular polynucleotide disclosed herein can also be evaluated by comparison of the percent sequence identity between the polypeptide encoded by a variant polynucleotide and the polypeptide encoded by the reference polynucleotide. Percent sequence identity between any two polypeptides can be calculated using sequence alignment programs and parameters described elsewhere herein.
- the percent sequence identity between the two encoded polypeptides is at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity.
- the presently disclosed polynucleotides encode an RGN polypeptide comprising an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater identity to an amino acid sequence set forth in Table 4.
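To make the percent-identity thresholds above concrete, here is a minimal Python sketch of one common way to compute percent identity: a Needleman-Wunsch global alignment followed by counting identical columns. The scoring values (match +1, mismatch -1, linear gap -1) and the choice of alignment length as the denominator are assumptions made for illustration; the disclosure refers to alignment programs and parameters described elsewhere, which may differ. `percent_identity` is a hypothetical helper name.

```python
def percent_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity over a global (Needleman-Wunsch) alignment of a and b.

    Identity = identical columns / total alignment columns (gaps included).
    Scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


if __name__ == "__main__":
    print(percent_identity("MKTAYIAK", "MKTAHIAK"))
```

Other conventions (for example, dividing by the shorter sequence length) give different numbers, so any threshold should be read together with the alignment method used.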
- Variant polynucleotides and proteins also encompass sequences and proteins derived from a mutagenic and recombinogenic procedure such as DNA shuffling. With such a procedure, one or more different RGN proteins disclosed herein is manipulated to create a new RGN protein possessing the desired properties. In this manner, libraries of recombinant polynucleotides are generated from a population of related sequence polynucleotides comprising sequence regions that have substantial sequence identity and can be homologously recombined in vitro or in vivo.
- sequence motifs encoding a domain of interest may be shuffled between the RGN sequences provided herein and other known RGN genes to obtain a new gene coding for a protein with an improved property of interest, such as an increased Km in the case of an enzyme.
- Strategies for such DNA shuffling are known in the art. See, for example, Stemmer (1994) Proc. Natl. Acad. Sci. USA
- the nucleic acid molecules encoding RGNs and/or ImntRNA can be codon optimized for expression in a target cell or tissue of interest.
- Such polynucleotide coding sequence normally has its frequency of codon usage designed to mimic the frequency of preferred codon usage or transcription conditions of a particular host cell. Expression in the particular host cell or organism is enhanced as a result of the alteration of one or more codons at the nucleic acid level such that the translated amino acid sequence is not changed.
- Nucleic acid molecules can be codon optimized, either wholly or in part. Codon tables and other references providing preference information for a wide range of organisms are available in the art.
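The idea of changing codons without changing the translated amino acid sequence can be sketched in Python as follows. The standard genetic code table is built from its usual compact representation; the `PREFERRED` mapping is a hypothetical, partial preference table invented for this example and is not taken from any organism's codon usage data or from this disclosure. `codon_optimize` and `translate` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical preferred codons for a few amino acids (illustrative only).
PREFERRED = {"L": "CTG", "K": "AAG", "A": "GCC", "G": "GGC"}


def translate(cds):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds, preferred=PREFERRED):
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TO_AA[codon]
        new_codon = preferred.get(amino_acid, codon)
        # Guard against a preference table that would change the protein.
        if CODON_TO_AA[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        out.append(new_codon)
    return "".join(out)


if __name__ == "__main__":
    original = "ATGTTAAAAGCAGGTTAA"
    optimized = codon_optimize(original)
    print(optimized, translate(optimized) == translate(original))
```

Because only some amino acids appear in the preference table, this corresponds to optimizing "in part"; a complete table would optimize the sequence wholly.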
- the polynucleotide encoding the RGN, and/or ImntRNA can be present in a vector or multiple vectors.
- Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/mini-chromosomes, transposons, and viral vectors (e.g., lentiviral vectors, adeno-associated viral vectors, adenoviral vectors).
- the vector may comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcriptional termination sequences), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like, (see e.g.
- the vector may also comprise a selectable marker gene for the selection of transformed cells.
- Selectable marker genes are utilized for the selection of transformed cells or tissues.
- Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT), as well as genes conferring resistance to herbicidal compounds, such as glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D).
- the present disclosure provides a system for binding a target sequence of interest, wherein the system comprises at least one ImntRNA or a nucleotide sequence encoding the same, and at least one RGN or a nucleotide sequence encoding the same, as described above.
- the ImntRNA hybridizes to the target sequence of interest and also binds to the RGN polypeptide, thereby directing the RGN polypeptide to the target sequence.
- the RGN comprises an amino acid sequence set forth in Table 4 or an active variant or fragment thereof.
- the ImntRNA comprises 6 or more nucleotides of the CRISPR repeat sequence comprising the nucleotide sequence set forth in Table 5 or an active variant or fragment thereof.
- the ImntRNA comprises an RNA sequence comprising a nucleotide sequence set forth in Table 5, or an active variant or fragment thereof.
- the system comprises an RGN and at least one ImntRNA, wherein the RGN and ImntRNA are not naturally complexed.
- the system comprises an ImntRNA and an RGN as described above. The rules for identifying RGN and ImntRNA scaffold sequences are provided above.
- the system for binding a target sequence of interest can be a ribonucleoprotein complex, which is at least one molecule of an RNA bound to at least one protein.
- the ribonucleoprotein complexes provided herein comprise at least one ImntRNA as the RNA component and an RGN as the protein component.
- Such ribonucleoprotein complexes can be purified from a cell or organism that naturally expresses an RGN polypeptide and has been engineered to express a particular ImntRNA that is specific for a target sequence of interest.
- the ribonucleoprotein complex can be purified from a cell or organism that has been transformed with polynucleotides that encode an RGN polypeptide and a ImntRNA and cultured under conditions to allow for the expression of the RGN polypeptide and guide RNA.
- methods are provided for making an RGN polypeptide or an RGN ribonucleoprotein complex. Such methods comprise culturing a cell comprising a nucleotide sequence encoding an RGN polypeptide under conditions in which the RGN polypeptide is expressed. In some embodiments the cell further comprises a nucleotide sequence encoding a ImntRNA. The RGN polypeptide or RGN ribonucleoprotein can then be purified from the cultured cells.
- the RGN polypeptide can be recombinantly produced and comprises a purification tag to aid in its purification, including but not limited to, glutathione-S-transferase (GST), chitin binding protein (CBP), maltose binding protein, thioredoxin (TRX), poly(NANP), tandem affinity purification (TAP) tag, myc, AcV5, AU1, AU5, E, ECS, E2, FLAG, HA, nus, Softag 1, Softag 3, Strep, SBP, Glu-Glu, HSV, KT3, S, S1, T7, V5, VSV-G, 6xHis, 10xHis, biotin carboxyl carrier protein (BCCP), and calmodulin.
- the tagged RGN polypeptide or RGN ribonucleoprotein complex is purified using immobilized metal affinity chromatography. It will be appreciated that other similar methods known in the art may be used, including other forms of chromatography or for example immunoprecipitation, either alone or in combination.
- Some methods provided herein for binding and/or cleaving a target sequence of interest involve the use of an in vitro assembled RGN ribonucleoprotein complex.
- In vitro assembly of an RGN ribonucleoprotein complex can be performed using any method known in the art in which an RGN polypeptide is contacted with a guide RNA under conditions to allow for binding of the RGN polypeptide to the ImntRNA.
- the RGN polypeptide can be purified from a biological sample, cell lysate, or culture medium, produced via in vitro translation, or chemically synthesized.
- the ImntRNA can be purified from a biological sample, cell lysate, or culture medium, transcribed in vitro, or chemically synthesized.
- the RGN polypeptide and ImntRNA can be brought into contact in solution (e.g., buffered saline solution) to allow for in vitro assembly of the RGN ribonucleoprotein complex.
- components of the present invention are delivered using nanoscale delivery systems, such as nanoparticles.
- liposomes and other particulate delivery systems can be used.
- vectors including the components of the present methods can be packaged in liposomes prior to delivery.
- expression constructs comprising nucleotide sequences encoding the RGNs, and/or ImntRNA can be used to transform organisms of interest.
- Methods for transformation involve introducing a nucleotide construct into an organism of interest.
- the methods of the invention do not require a particular method for introducing a nucleotide construct to a host organism, only that the nucleotide construct gains access to the interior of a target cell.
- the host cell can be a eukaryotic or prokaryotic cell.
- the eukaryotic host cell is a plant cell, a mammalian cell, or an insect cell.
- Methods for introducing nucleotide constructs into host cells are known in the art including, but not limited to, stable transformation methods, transient transformation methods, and virus-mediated methods.
- Transformation of a host cell may be performed by infection, transfection, microinjection, electroporation, microprojection, biolistics or particle bombardment, silica/carbon fibers, ultrasound mediated, PEG mediated, calcium phosphate coprecipitation, polycation DMSO technique, DEAE dextran procedure, and viral mediated, liposome mediated and other similar methods.
- Viral-mediated introduction of a polynucleotide encoding an RGN and/or ImntRNA includes retroviral, lentiviral, adenoviral, and adeno-associated viral mediated introduction and expression.
- Transformation may result in stable or transient incorporation of the nucleic acid into the cell.
- the cells that have been transformed may be grown into a transgenic organism using well-known methods. Alternatively, cells that have been transformed may be introduced into an organism. These cells could have originated from the organism, wherein the cells are transformed in an ex vivo approach.
- the polynucleotides encoding the RGNs, and/or ImntRNAs can also be used to transform any prokaryotic cells, including but not limited to, archaea and bacteria.
- the polynucleotides encoding the RGNs, and/or ImntRNAs can be used to transform any eukaryotic cells, including but not limited to animal (e.g., mammals, insects, fish, birds, and reptiles), fungi, amoeba, algae, and yeast cells.
- Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a nucleic acid described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome.
- Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
- Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid: nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
- Lipofection is described in, e.g., US 5,049,386 and lipofection reagents are widely available commercially. Delivery can be to cells (e.g. in vitro or ex vivo administration) or target tissues (e.g. in vivo administration).
- viral based systems for the delivery of nucleic acids allow targeting a virus to specific cells and trafficking the viral payload to the nucleus.
- Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo).
- Conventional viral based systems could include retroviral, lentivirus, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
- Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis- acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression.
- Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224
- adenoviral based systems may be used.
- Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system.
- Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids. Construction of recombinant AAV vectors is described in a number of publications, including U.S. 5,173,414. Packaging cells are typically used to form virus particles that are capable of infecting a host cell.
- Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle.
- the vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed.
- the missing viral functions are typically supplied in trans by the packaging cell line.
- AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome.
- Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences.
- the cell line may also be infected with adenovirus as a helper.
- the helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid.
- the helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US20030087817
- the disclosure provides methods of modifying a target polynucleotide in a eukaryotic cell, which may be performed in vivo, ex vivo or in vitro.
- the method comprises sampling a cell or population of cells from a human or non-human animal or plant (including microalgae) and modifying the cell or cells. Culturing may occur at any stage ex vivo. The cell or cells may even be re-introduced into the non-human animal or plant (including microalgae).
- the present disclosure provides methods for binding, cleaving, and/or modifying a target nucleotide sequence of interest.
- the methods include delivering a system comprising at least one ImntRNA or a polynucleotide encoding the same, and at least one RGN polypeptide or a polynucleotide encoding the same to the target sequence or a cell, organelle, or embryo comprising the target sequence.
- the RGN comprises the amino acid sequence as disclosed above, or an active variant or fragment thereof.
- the ImntRNA comprises a CRISPR repeat sequence comprising the nucleotide sequence as provided above, or an active variant or fragment thereof.
- the ImntRNA comprises the nucleotide sequence as provided above, or an active variant or fragment thereof.
- the RGN of the system may be nuclease dead RGN, or may be a fusion polypeptide.
- the fusion polypeptide comprises a base-editing polypeptide, for example a cytidine deaminase or an adenosine deaminase.
- the RGN and/or ImntRNA is heterologous to the cell, organelle, or embryo to which the RGN and/or ImntRNA (or polynucleotide(s) encoding at least one of the RGN and ImntRNA) are introduced.
- the cell or embryo can then be cultured under conditions in which the ImntRNA and/or RGN polypeptide are expressed.
- the method comprises contacting a target sequence with an RGN ribonucleoprotein complex.
- the RGN ribonucleoprotein complex may comprise an RGN that is nuclease dead or has nickase activity.
- the RGN of the ribonucleoprotein complex is a fusion polypeptide comprising a base-editing polypeptide.
- the method comprises introducing into a cell, organelle, or embryo comprising a target sequence an RGN ribonucleoprotein complex.
- the RGN ribonucleoprotein complex can be one that has been purified from a biological sample, recombinantly produced and subsequently purified, or in vitro-assembled as described herein.
- the method can further comprise the in vitro assembly of the complex prior to contact with the target sequence, cell, organelle, or embryo.
- a purified or in vitro assembled RGN ribonucleoprotein complex can be introduced into a cell, organelle, or embryo using any method known in the art, including, but not limited to electroporation.
- an RGN polypeptide and/or polynucleotide encoding or comprising the ImntRNA can be introduced into a cell, organelle, or embryo using any method known in the art.
- the ImntRNA directs the RGN to bind to the target sequence in a sequence-specific manner.
- the RGN polypeptide cleaves the target sequence of interest upon binding.
- the target sequence can subsequently be modified via endogenous repair mechanisms, such as non-homologous end joining, or homology-directed repair with a provided donor polynucleotide.
- Methods to measure binding of an RGN polypeptide to a target sequence include chromatin immunoprecipitation assays, gel mobility shift assays, DNA pull-down assays, reporter assays, microplate capture and detection assays.
- methods to measure cleavage or modification of a target sequence include in vitro or in vivo cleavage assays wherein cleavage is confirmed using PCR, sequencing, or gel electrophoresis, with or without the attachment of an appropriate label (e.g., radioisotope, fluorescent substance) to the target sequence to facilitate detection of degradation products.
- NTEXPAR nicking triggered exponential amplification reaction
- the methods involve the use of a single type of RGN complexed with more than one lmntRNA.
- the multiple guide RNAs can target different regions of a single gene or can target multiple genes.
- a double-stranded break introduced by an RGN polypeptide can be repaired by a non-homologous end-joining (NHEJ) repair process. Due to the error-prone nature of NHEJ, repair of the double-stranded break can result in a modification to the target sequence. Modification of the target sequence can result in the expression of an altered protein product or inactivation of a coding sequence.
- the donor sequence in the donor polynucleotide can be integrated into or exchanged with the target nucleotide sequence during the course of repair of the introduced double-stranded break, resulting in the introduction of the exogenous donor sequence.
- a donor polynucleotide thus comprises a donor sequence that is desired to be introduced into a target sequence of interest.
- the donor sequence alters the original target nucleotide sequence such that the newly integrated donor sequence will not be recognized and cleaved by the RGN.
- the donor polynucleotide can comprise a donor sequence flanked by compatible overhangs, allowing for direct ligation of the donor sequence to the cleaved target nucleotide sequence comprising overhangs by a non-homologous repair process during repair of the double-stranded break.
- a method for binding a target nucleotide sequence and detecting the target sequence, wherein the method comprises introducing into a cell, organelle, or embryo at least one guide RNA or a polynucleotide encoding the same, and at least one RGN polypeptide or a polynucleotide encoding the same, expressing the guide RNA and/or RGN polypeptide (if coding sequences are introduced), wherein the RGN polypeptide is a nuclease-dead RGN and further comprises a detectable label, and the method further comprises detecting the detectable label.
- the detectable label may be fused to the RGN as a fusion protein (e.g., fluorescent protein) or may be a small molecule conjugated to or incorporated within the RGN polypeptide that can be detected visually or by other means.
- the methods comprise introducing into a cell, organelle, or embryo at least one lmntRNA or a polynucleotide encoding the same, and at least one RGN polypeptide or a polynucleotide encoding the same, expressing the lmntRNA and/or RGN polypeptide (if coding sequences are introduced), wherein the RGN polypeptide is a nuclease-dead RGN.
- the nuclease-dead RGN is a fusion protein comprising an expression modulator domain (i.e., epigenetic modification domain, transcriptional activation domain or a transcriptional repressor domain) as described herein.
- An RGN polypeptide of the present disclosure, once activated by detection of a target DNA (double-stranded or single-stranded), can cleave non-targeted single-stranded DNA (ssDNA).
- once an RGN polypeptide is activated by an lmntRNA, which occurs after hybridization of the lmntRNA with a target sequence of a target DNA, the protein becomes a nuclease that promiscuously cleaves ssDNAs.
- when the target DNA is present in the sample, the result is cleavage of ssDNAs in the sample, which can be detected using any common detection method (such as using a labeled single-stranded DNA).
- the present disclosure provides systems and methods for detecting a target DNA (double stranded or single stranded) in a sample.
- a detector DNA is used that is single stranded (ssDNA) and does not hybridize with the lmntRNA (i.e., the detector ssDNA is a non-target ssDNA).
- Such methods comprise steps of: (a) contacting the sample with: (i) an RGN polypeptide; (ii) an lmntRNA comprising: a region that binds to the RGN polypeptide, and a spacer sequence that hybridizes with the target DNA; and (iii) a detector DNA that is single stranded and does not hybridize with the spacer sequence; and (b) measuring a detectable signal produced by cleavage of the single stranded detector DNA by the RGN polypeptide, thereby detecting the target DNA.
- the contacting step of a subject method can be carried out in a composition comprising divalent metal ions.
- the contacting step can be carried out outside of a cell.
- the contacting step can be carried out inside a cell.
- the contacting step can be carried out in a cell in vitro.
- the contacting step can also be carried out in a cell ex vivo.
- the contacting step can be carried out in a cell in vivo.
- the sample is contacted for 2 hours or less (e.g., 1.5 hours or less, 1 hour or less, 40 minutes or less, 30 minutes or less, 20 minutes or less, 10 minutes or less, or 5 minutes or less, or 1 minute or less), under conditions that provide for trans cleavage of the detector DNA.
- Conditions that provide for trans cleavage of the detector DNA include temperature conditions such as from 17°C to 39°C (e.g., 37°C).
- the invention provides kits containing any one or more of the elements disclosed in the above methods and compositions.
- the kit comprises a vector system and instructions for using the kit.
- the vector system comprises (a) a first regulatory element operably linked to an lmntRNA sequence and one or more insertion sites for inserting a guide sequence downstream of the lmntRNA sequence, wherein when expressed, the lmntRNA directs sequence-specific binding of a CRISPR complex to a target sequence in a eukaryotic cell, wherein the CRISPR complex comprises a CRISPR enzyme complexed with the lmntRNA sequence that is hybridized to the target sequence; and (b) a second regulatory element operably linked to an enzyme-coding sequence encoding said CRISPR enzyme comprising a nuclear localization sequence.
- Elements may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, or a tube.
- the kit includes instructions in one or more languages.
- a kit comprises one or more reagents for use in a process utilizing one or more of the elements described herein.
- Reagents may be provided in any suitable container.
- a kit may provide one or more reaction or storage buffers.
- Reagents may be provided in a form that is usable in a particular assay, or in a form that requires addition of one or more other components before use (e.g., in concentrate or lyophilized form).
- a buffer can be any buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof.
- the buffer is alkaline.
- the buffer has a pH from 7 to 10.
- the kit comprises one or more oligonucleotides corresponding to a guide sequence for insertion into a vector so as to operably link the guide sequence and a regulatory element.
- the kit comprises a homologous recombination template polynucleotide.
- the invention provides methods for using one or more elements of a CRISPR system.
- the CRISPR complex of the invention provides an effective means for modifying a target polynucleotide.
- the CRISPR complex of the disclosure has a wide variety of utility including modifying (e.g., deleting, inserting, translocating, inactivating, activating) a target polynucleotide in a multiplicity of cell types.
- An exemplary CRISPR complex comprises a CRISPR enzyme complexed with a guide sequence hybridized to a target sequence within the target polynucleotide.
- cells and organisms comprising a target sequence of interest that has been modified using a process or a system based on an RGN and/or lmntRNA as described herein. Also provided are cells and organisms comprising the system for binding a target sequence of interest comprising an RGN and/or lmntRNA as described herein.
- the RGN comprises the amino acid sequence as disclosed above, or an active variant or fragment thereof.
- the lmntRNA comprises a CRISPR repeat sequence comprising the nucleotide sequence as disclosed above, or an active variant or fragment thereof.
- the lmntRNA comprises the nucleotide sequence as disclosed above, or an active variant or fragment thereof.
- the modified cells can be eukaryotic (e.g., mammalian, plant, insect cell) or prokaryotic.
- organelles and embryos comprising at least one nucleotide sequence that has been modified by a process utilizing an RGN and/or lmntRNA as described herein.
- the genetically modified cells, organisms, organelles, and embryos can be heterozygous or homozygous for the modified nucleotide sequence.
- the chromosomal modification of the cell, organism, organelle, or embryo can result in altered expression (up-regulation or down-regulation), inactivation, or the expression of an altered protein product or an integrated sequence.
- the genetically modified cell, organism, organelle, or embryo is referred to as a “knock-out”.
- the knock-out phenotype can be the result of a deletion mutation (i.e., deletion of at least one nucleotide), an insertion mutation (i.e., insertion of at least one nucleotide), or a nonsense mutation (i.e., substitution of at least one nucleotide such that a stop codon is introduced).
- the chromosomal modification of a cell, organism, organelle, or embryo can produce a “knock-in”, which results from the chromosomal integration of a nucleotide sequence that encodes a protein.
- the coding sequence is integrated into the chromosome such that the chromosomal sequence encoding the wild-type protein is inactivated, but the exogenously introduced protein is expressed.
- the chromosomal modification results in the production of a variant protein product.
- the expressed variant protein product can have at least one amino acid substitution and/or the addition or deletion of at least one amino acid.
- the variant protein product encoded by the altered chromosomal sequence can exhibit modified characteristics or activities when compared to the wild-type protein, including but not limited to altered enzymatic activity or substrate specificity.
- the chromosomal modification can result in an altered expression pattern of a protein.
- chromosomal alterations in the regulatory regions controlling the expression of a protein product can result in the overexpression or downregulation of the protein product or an altered tissue or temporal expression pattern.
- the polypeptides, nucleic acids and vectors of the present disclosure may be in a form of a pharmaceutical composition.
- the pharmaceutical composition may comprise 1 ng to 10 mg of DNA encoding the RGN/lmntRNA-based system or RGN/lmntRNA-based system protein component, i.e., the fusion protein.
- the pharmaceutical composition may comprise 1 ng to 10 mg of the DNA of the modified lentiviral vector.
- the pharmaceutical composition may comprise 1 ng to 10 mg of the DNA of the modified AAV vector and a nucleotide sequence encoding the site-specific nuclease.
- the pharmaceutical compositions according to the present invention can be formulated according to the mode of administration to be used.
- where the compositions are injectable pharmaceutical compositions, they are sterile, pyrogen-free and particulate-free.
- An isotonic formulation is preferably used.
- additives for isotonicity may include sodium chloride, dextrose, mannitol, sorbitol and lactose.
- isotonic solutions such as phosphate buffered saline are preferred.
- Stabilizers include gelatin and albumin.
- a vasoconstriction agent is added to the formulation.
- the composition may further comprise a pharmaceutically acceptable excipient.
- the pharmaceutically acceptable excipient may be functional molecules such as vehicles, adjuvants, carriers, or diluents.
- the pharmaceutically acceptable excipient may be a transfection facilitating agent, which may include surface active agents, such as immune-stimulating complexes (ISCOMS), Freund's incomplete adjuvant, LPS analog including monophosphoryl lipid A, muramyl peptides, quinone analogs, vesicles such as squalene and squalane, hyaluronic acid, lipids, liposomes, calcium ions, viral proteins, polyanions, polycations, or nanoparticles, or other known transfection facilitating agents.
- the transfection facilitating agent can be a polyanion, polycation, including poly-L-glutamate (LGS), or lipid.
- the transfection facilitating agent is poly-L-glutamate, and more preferably, the poly-L-glutamate is present in the composition for genome editing in skeletal muscle or cardiac muscle at a concentration less than 6 mg/ml.
- the transfection facilitating agent may also include surface active agents such as immune-stimulating complexes (ISCOMS), Freund's incomplete adjuvant, LPS analog including monophosphoryl lipid A, muramyl peptides, quinone analogs and vesicles such as squalene and squalane; hyaluronic acid may also be administered in conjunction with the genetic construct.
- the DNA vector encoding the composition may also include a transfection facilitating agent such as lipids, liposomes, including lecithin liposomes or other liposomes known in the art, as a DNA-liposome mixture (see for example WO9324640), calcium ions, viral proteins, polyanions, polycations, or nanoparticles, or other known transfection facilitating agents.
- the transfection facilitating agent is a polyanion, polycation, including poly-L-glutamate (LGS), or lipid.
- The identification of RGNs was performed based on the methods described, for example, in Russel et al. (2020) The CRISPR Journal 3(6):462-469. Metagenomic samples were searched for open reading frames (ORFs), and those that were predicted to be genes were selected. A hidden Markov model (HMM) was used to compare the putative genes to profiles of known Cas proteins. The identified Cas genes were grouped into operons, and the operon type was determined based on the presence of known signature genes. For each genome, the CRISPR arrays were identified based on the presence of regularly spaced repeats. The subtype of each CRISPR array was predicted using machine learning. Cas operons were linked to CRISPR arrays if they were less than 10 kilobases apart.
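The final linking step above can be sketched in Python as follows. This is only an illustrative sketch: the `Feature` record, the half-open coordinate convention, and measuring distance as the gap between the nearest ends of two features on the same contig are assumptions made here, not details given in the disclosure.

```python
from dataclasses import dataclass

MAX_GAP_BP = 10_000  # "less than 10 kilobases apart", per the text


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for a Cas operon or a CRISPR array."""
    name: str
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def gap(a: Feature, b: Feature) -> int:
    """Distance in bp between the nearest ends of two features (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def link_operons_to_arrays(operons, arrays, max_gap=MAX_GAP_BP):
    """Return (operon name, array name) pairs on the same contig closer than max_gap."""
    links = []
    for op in operons:
        for arr in arrays:
            if op.contig == arr.contig and gap(op, arr) < max_gap:
                links.append((op.name, arr.name))
    return links
```

With this convention, an array starting 200 bp past the end of an operon is linked, whereas one starting 16 kb away, or one located on a different contig, is not.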
- PAM requirements for each RGN were determined using a bacterial PAM depletion assay essentially adapted from Kleinstiver et al. (2015) Nature 523:481-485, Zetsche et al. (2015) Cell 163:759-771, and Karvelis et al. (2020) Nucleic Acids Res. 48(9):5016-5023. Briefly, two plasmid libraries (C2 and T2) were generated in a pUC18 backbone (ampR), each containing a distinct 23 bp protospacer (target) sequence flanked by 8 random nucleotides (i.e., the PAM region). The target sequence and flanking PAM region of library T2 and library C2 for each RGN are set forth in Table 13.
- the libraries were separately electroporated into T7 Express E. coli (NEB) cells harboring pET28b expression vectors containing the minimal CRISPR operon, with the repeat-spacer array modified to contain three copies of the intended library target sequence at the average spacer length of the CRISPR repeat.
- Sufficient library plasmid was used in the transformation reaction to obtain >10^8 cfu.
- the modified minimal CRISPR operon in the pET28b backbone was under the control of T7 promoters.
- the transformation reaction was allowed to recover for 1 hr, after which it was diluted into LB media containing carbenicillin and kanamycin and grown overnight.
- the PAM and protospacer regions of uncleaved plasmids were PCR-amplified and prepared for sequencing following published protocols (16S metagenomic library prep guide 15044223B, Illumina, San Diego, CA). Deep sequencing (55 bp paired-end reads) was performed on a NextSeq (Illumina). Typically, 1-4M reads were obtained per amplicon. PAM regions were extracted, counted, and normalized to total reads for each sample. PAMs that led to plasmid cleavage were identified by being underrepresented when compared to controls (i.e., when the library is transformed into E. coli containing the RGN but lacking an appropriate lmntRNA).
- an enrichment value was computed for each kmer as the difference between the library size-normalized read counts in the control sample and in the targeting sample. This value was rounded to the nearest integer for positive numbers and set to zero for negative numbers. Enrichment values were then summed across all kmers to yield a position frequency matrix, which was represented visually as a sequence logo using the command line utility weblogo. Those RGNs with consistency among the most enriched kmers (sequence logo information content > 0.2 when including the top 100 enriched kmers) and with qualitatively consistent PAMs across plasmid libraries (T2 and C2) were deemed to have bona fide PAMs.
- the final PAM for each of these RGNs was obtained by summing counts across both plasmid libraries, normalizing counts, computing kmer enrichment values, summing across kmers to yield a position frequency matrix, then visually representing the PAM as a sequence logo using the command line utility weblogo.
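The enrichment calculation described above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the pipeline actually used: scaling each library to a common total of one million reads, the function names, and the per-position Shannon information content (in bits, without small-sample correction) are choices made here for illustration. How the 0.2 information-content threshold was aggregated across positions is not specified in the text, so no threshold is applied.

```python
from math import log2

BASES = "ACGT"


def normalize(counts, scale=1_000_000):
    """Scale raw kmer counts to a common library size (assumed: counts per million)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {kmer: n * scale / total for kmer, n in counts.items()}


def enrichment(control, target, scale=1_000_000):
    """Control minus targeting normalized counts; rounded if positive, else zero."""
    ctrl = normalize(control, scale)
    targ = normalize(target, scale)
    values = {}
    for kmer in set(ctrl) | set(targ):
        diff = ctrl.get(kmer, 0.0) - targ.get(kmer, 0.0)
        values[kmer] = round(diff) if diff > 0 else 0
    return values


def position_frequency_matrix(enrich, top_n=None):
    """Sum enrichment values per base and position, optionally over the top_n kmers."""
    ranked = sorted(enrich.items(), key=lambda kv: kv[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    if not ranked:
        return []
    pfm = [dict.fromkeys(BASES, 0) for _ in range(len(ranked[0][0]))]
    for kmer, weight in ranked:
        for pos, base in enumerate(kmer):
            pfm[pos][base] += weight
    return pfm


def information_content(column):
    """Shannon information content of one matrix column, in bits (maximum 2 for DNA)."""
    total = sum(column.values())
    if total == 0:
        return 0.0
    probs = [n / total for n in column.values() if n > 0]
    return 2.0 + sum(p * log2(p) for p in probs)
```

Passing `top_n=100` to `position_frequency_matrix` restricts the matrix to the 100 most enriched kmers, mirroring the consistency check described above; the resulting matrix could then be rendered as a sequence logo with a separate tool.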
- An RNAseq pipeline was written to detect the expressed small non-coding RNA transcripts. Processed boundaries were determined by sequence coverage of the native locus. RNA sequencing depth confirmed the boundaries of the lmntRNA by identifying the transcript containing the motif of MGGGY GN4- sCRYCCK (Figs. 12-15). Manual curation of RNAs was performed using secondary structure prediction by NUPACK, an RNA folding software. lmntRNA cassettes were prepared by DNA synthesis and were generally designed as follows (5'->3'): processed lmntRNA operably linked at its 3' end to a 20-30 bp spacer sequence.
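The general cassette layout described above (processed lmntRNA followed at its 3' end by a 20-30 nt spacer) can be illustrated with a short Python sketch. The function name and the placeholder sequences are hypothetical and are not sequences of the disclosure; the function simply returns the expected RNA transcript for a given scaffold and spacer.

```python
VALID_RNA_BASES = set("ACGU")


def build_lmntrna(processed_scaffold: str, spacer: str) -> str:
    """Join a processed lmntRNA scaffold to a 3' spacer, returning the RNA (5'->3')."""
    # Accept DNA or RNA input and report the transcript in RNA letters.
    scaffold = processed_scaffold.upper().replace("T", "U")
    spacer_rna = spacer.upper().replace("T", "U")
    if not 20 <= len(spacer_rna) <= 30:
        raise ValueError("spacer must be 20-30 nt long")
    for label, seq in (("scaffold", scaffold), ("spacer", spacer_rna)):
        unexpected = set(seq) - VALID_RNA_BASES
        if unexpected:
            raise ValueError(f"{label} contains unexpected characters: {sorted(unexpected)}")
    return scaffold + spacer_rna


# Placeholder inputs for illustration only.
example = build_lmntrna("GGAUCC", "ACGT" * 5)
```

Retargeting then amounts to calling the function again with the same scaffold and a different spacer.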
- lmntRNAs were synthesized by in vitro transcription of the lmntRNA cassettes with a GeneArt™ Precision gRNA Synthesis Kit (ThermoFisher). Activity was confirmed by combining the purified RGN with the lmntRNA in 20 mM HEPES, pH 7.5 at 37°C, 25 mM NaCl, 1 mM DTT and 5 mM MgCl2 (Reaction Buffer) for 30 min at 37°C. The ribonucleoprotein (RNP) complex was then added in excess to linear dsDNA in the reaction buffer and incubated at various temperatures for 30 min. The reaction was then inactivated with EDTA, Proteinase K, and RNase A before being run on a denaturing PAGE gel. Cleavage was visually confirmed (Fig 15) and quantified (Fig 16).
- lmntRNAs and elmntRNAs were synthesized by in vitro transcription of the lmntRNA cassettes with a GeneArt™ Precision gRNA Synthesis Kit (ThermoFisher). Activity was confirmed by combining the purified RGN with the lmntRNA in 20 mM HEPES, pH 7.5 at 37°C, 25 mM NaCl, 1 mM DTT and 5 mM MgCl2 (Reaction Buffer) for 30 min at 37°C.
- the ribonucleoprotein (RNP) complex was then added in excess to linear dsDNA or ssDNA that matched the target sequence of the lmntRNA, or to no target DNA, along with M13 ssDNA in the reaction buffer and incubated at 37°C for a time course.
- the reaction was then inactivated with EDTA, Proteinase K, and RNase A before being run on an agarose gel. Trans-activated cleavage of the M13 ssDNA was visually confirmed (Fig. 20).
- PAM requirements for each RGN were determined using a bacterial PAM depletion assay essentially adapted from Karvelis et al. (2020) Nucleic Acids Res. 48(9):5016-5023. Briefly, plasmids containing the C2 library sequence or the wild-type spacer sequences of two different spacers with the appropriate PAM sequence were synthesized in pTwist Amp High Copy or pTwist Chlor High Copy (Twist Biosciences) and transformed into T7 Express E. coli (NEB) cells harboring pET28b expression vectors containing the minimal CRISPR operon, with the repeat-spacer array truncated to contain only three spacer sequences, including the C2 library sequence in the distal-most repeat.
- the modified minimal CRISPR operon in the pET28b backbone was under the control of T7 promoters.
- the transformation reaction was allowed to recover for 1 hr, after which it was plated in a 1:10 serial dilution onto LB agar containing IPTG, kanamycin, and the specific antibiotic for the targeting plasmid, either carbenicillin or chloramphenicol, and grown overnight at 37°C.
- the plates were compared to non-target control sequences that were on the same backbone, but did not contain a matching spacer/PAM sequence. Active interference was defined as a greater growth density on negative plasmids compared to target plasmids (Fig 7-10).
- the RGN was codon optimized for human expression and cloned into expression cassettes with an N-terminal SV40 NLS, and a C-terminal FLAG tag and c-myc NLS, under control of a CMV promoter for mammalian expression.
- the sequences are set forth in Table 14.
- lmntRNA expression constructs, each encoding a single lmntRNA under the control of a human RNA polymerase III U6 promoter, were produced and introduced into an expression vector containing GFP under control of a CMV promoter. Guides were designed to target regions of selected genes with the appropriate PAM for the system. The constructs described were introduced into mammalian cells.
- HEK293T cells Sigma
- DMEM Dulbecco
- Penicillin-Streptomycin Gibco
- Example 7 Demonstration of base editing activity on endogenous targets in mammalian cells
- the coding sequence of the identified RGN is codon-optimized for expression in mammalian cells and introduced into the expression cassette, which produces a fusion protein that includes a NLS tag at its N-terminal end operably linked to a codon optimized known eukaryotic deaminase sequence (APOBEC3A) at its C-terminal end.
- the deaminase is operably linked at its C-terminal end to a flexible amino acid linker, and the amino acid linker is operably linked at its C-terminal end to the RNA-guided nuclease, which has been mutated to have an inactive RuvC domain (dEGS0091_D93R_D240A_D416A); that is, it has been mutated into an RGN that is catalytically dead.
- the RNA-guided DNA binding polypeptide is operably linked at its C-terminal end to a flexible amino acid linker, and the amino acid linker is operably linked to a uracil protecting peptide (UPP; developed in house).
- the uracil protecting peptide is operably linked at its C-terminal end to a flexible amino acid linker, and the amino acid linker is operably linked at its C-terminal end to a second NLS.
- Each of these expression cassettes is introduced into a vector capable of driving the expression of the fusion protein in mammalian cells.
- a vector capable of expressing lmntRNA to target the deaminase-RGN-UPP fusion protein to the determined genomic location was also produced.
- These guide RNAs can guide the deaminase-RGN-UPP fusion protein to the target genome sequence for base editing.
- Using liposome transfection, vectors capable of expressing the deaminase-RGN-UPP fusion protein and guide RNAs were transfected into HEK293T cells. The day before transfection, the cells were distributed in a 24-well plate in growth medium (DMEM + 10% fetal bovine serum + 1% penicillin/streptomycin) at 1.3 × 10^5 cells/well. Lipofectamine® 3000 reagent (Thermo Fisher Scientific) was used according to the manufacturer's instructions to transfect 500 ng of deaminase-RGN fusion expression vector and 500 ng of guide RNA expression vector.
- genomic DNA is harvested from the transfected cells, and the DNA is sequenced and analyzed for the presence of targeted cytosine base editing mutations using CRISPResso2 (Clement K, et al Nat Biotechnol. 2019 Mar; 37(3):224-226. doi: 10.1038/s41587-019-0032-3. PubMed PMID: 30809026).
- Tables 16 and 17 show the editing rate of cytidine bases for each deaminase-RGN-UPP fusion protein and the rate for targeted cytosine deamination for the deaminase-RGN-UPP targeted to the same region as the catalytically dead RGN-UPP.
- Active cytosine base editing was defined as a greater than 5x increase of OD SNP base editing along the targeted window of the deaminase-RGN-UPP under investigation, and a >4x increase of OT SNP base editing at highly mutated cytosines.
- with a catalytically dead nuclease domain, the RGN will not generate detectable INDEL formation by itself. When it is fused with an active deaminase that acts on the opposite strand, a cytosine will be turned into a uracil.
- the uracil is rapidly removed from the DNA, leaving an abasic site, and eventually a gap, on the strand opposite the strand bound by the lmntRNA. This can result in a double-stranded break, which is repaired through non-homologous end joining (NHEJ) with detectable INDEL formation. However, in the presence of an active UPP, the converted uracil is protected from removal, the abasic site is never formed, and NHEJ does not occur.
Abstract
The present invention provides novel RNA-guided nuclease proteins and nucleic acid targeting systems comprising such for cleaving and/or modifying a target nucleotide sequence of interest.
Description
NOVEL RNA-GUIDED NUCLEASES AND NUCLEIC ACID TARGETING SYSTEMS COMPRISING SUCH
[001] The present invention relates to novel RNA-guided nucleases (RGN) and nucleic acid targeting systems comprising such.
BACKGROUND
[002] Targeted genome editing or modification has undergone many changes in recent years with the discovery of novel technologies and systems. The first systems relied on meganucleases, zinc finger fusion proteins or transcription activator-like effector nucleases (TALENs), requiring the generation of chimeric nucleases with engineered, sequence-specific DNA-binding domains for each particular target sequence. RNA-guided nucleases (RGNs), such as the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) proteins, allow for the targeting of specific sequences by using a short RNA sequence that specifically hybridizes with a particular target sequence. Such CRISPR systems became popular and gained multiple uses in research, diagnostics and therapeutics due to the ease of producing target-specific short RNA sequences and using them with the same RGN protein. Such RGNs can be used to edit genomes through the introduction of a sequence-specific, double-stranded break that is either repaired in a way that introduces a mutation or repaired by introducing a stretch of heterologous DNA. Inactive versions of RGNs have also been widely used to target specific DNA or RNA regions and, in combination with other proteins, have made it possible to study and modulate multiple cellular processes, providing a useful tool for studying gene function and modulating gene activity.
SUMMARY OF THE INVENTION
[003] The present invention provides novel RNA-guided nuclease (RGN) polypeptides, long monomeric nucleic acid targeting RNAs (lmntRNAs), lmntRNA nucleic acid targeting systems comprising those, nucleic acid molecules encoding the same, and vectors and host cells comprising such nucleic acid molecules.
[004] Also provided are nucleic acid targeting systems for binding a target nucleic acid sequence of interest, wherein the system comprises an RGN polypeptide and one or more RNA sequences targeting the nucleic acid of interest.
[005] Thus, methods disclosed herein are drawn to binding a target sequence of interest, and in some embodiments, cleaving or modifying the target sequence of interest. The target sequence of interest can be modified, for example, as a result of non-homologous end joining or homology-directed repair with an introduced donor sequence.
BRIEF DESCRIPTION OF THE DRAWINGS
[006] The present invention is described below by reference to the following figures.
[007] Figure 1. CRISPR locus of Type VM systems. Cas1, 2, 4, or other Cas proteins are not found in any instances of Type VM systems. The effector Cas12m protein is absolutely essential, carries out the DNA interference activity of the systems, and consists of a Rec domain and a tri-split RuvC domain. The CRISPR array is at most 225 bases from the end of the Cas12m protein, and the lmntRNA is always found within this region. The lmntRNA starts no more than 75 bases from the end of the effector protein and contains an antirepeat before the CRISPR array. Expression continues through the CRISPR array, but may be truncated to any number of spacer sequences.
[008] Figure 2. CRISPR locus of Type VF1 systems. Cas1, 2, 4, or other Cas proteins are found in some instances of Type VF1 systems. The effector Cas12f1 protein is absolutely essential, carries out the DNA interference activity of the systems, and consists of a Rec1 domain, occasionally containing a Zn-finger domain, and a tri-split RuvC domain. The CRISPR array can be up to 5000 bases or more from the Cas12f1 protein. The systems are targeted by a dual RNA system, consisting of separately expressed tracrRNA and crRNA from the CRISPR array. The position of the tracrRNA cannot be identified based on the position of the CRISPR array and the effector protein.
[009] Figure 3. Phylogenetic tree of various Cas12 effector proteins. Cas12m proteins are indicated with darker lines.
[010] Figure 4. Phylogenetic tree of various selected Cas12f1, Cas12f2, Cas12f3, and Cas12m effector proteins. Cas12m proteins are indicated with darker lines.
[011] Figure 5. Example of consensus lmntRNA with four hairpins (SEQ ID NO: 83). The final hairpin (5) is formed on one side by an antirepeat (AR) and on the other side by sequence directly from the CRISPR array (REPEAT), followed by a short leader sequence before the reprogrammable sequence (SPACER) for retargeting.
[012] Figure 6. A schematic representation of lmntRNA structure with anti-repeat sequence (sequence partially complementary to the CRISPR repeat sequence), CRISPR repeat sequence and reprogrammable targeting sequence (spacer).
[013] Figure 7. The lmntRNA of EGS0091 (SEQ ID NO: 38) with various truncation spots highlighted to make engineered lmntRNA (elmntRNA) designs. The elmntRNA designs were tested with removal of hairpin 1 (diamond), with partial truncation of hairpin 2 to contain only 9-11 bp of the original stem loop structure (oval), with partial truncation of hairpin 5 to replace extra sequence past the repeat-antirepeat with a GAAA tetraloop (rectangle), and with partial truncation of hairpin 5 to the first mismatch in the repeat-antirepeat with a GAAA tetraloop (hexagon).
[014] Figures 8-11. Bacterial plasmid interference activity results showing active CRISPR interference for EGS0091-94.
[015] Figures 12-15. Small RNAseq data showing boundaries of lmntRNA expression for EGS0091-94.
[016] Figure 16. Temperature dependence of catalytic activity for EGS0091 D93R and EGS0094 D82R. EGS0091 D93R shows activity across a broad range of temperatures. EGS0094 D82R only shows activity above 37°C, with increasing activity at higher temperatures.
[017] Figure 17. In vitro cleavage by elmntRNA with EGS0091 D93R. Shows that hairpin 1 is essential for activity, but that hairpin 2 and hairpin 5 can be truncated.
[018] Figure 18. Eukaryotic editing with DNA binding affinity mutation for EGS0091. Increased non-homologous end joining rates with the DNA binding affinity mutation D93R in EGS0091 compared to wildtype.
[019] Figure 19. Eukaryotic editing with lmntRNA designs.
[020] Figure 20. Trans-activated DNA cleavage by Cas12m protein at 30 min.
DETAILED DESCRIPTION OF THE INVENTION
Definitions
[024] The following definitions are used throughout the description.
[025] The term "adeno-associated virus" or "AAV" as used interchangeably herein refers to a small virus belonging to the genus Dependovirus of the Parvoviridae family that infects humans and some other primate species. AAV is not currently known to cause disease and consequently the virus causes a very mild immune response.
[026] As used herein, a “biological sample” may contain whole cells and/or live cells and/or cell debris. The biological sample may contain (or be derived from) a “bodily fluid”. Bodily fluids may be obtained from a mammalian organism, for example by puncture, or other collecting or sampling procedures.
[027] The term “Cas12f1” refers to a type of RGN that cleaves nucleic acid, is encoded by CRISPR loci, and is part of the Type V-F1 CRISPR system. The Cas12f1 protein commonly used is from an uncultured archaeon (Un1). The Cas12f1 protein may be mutated so that the nuclease activity is partly or completely inactivated. Cas12f1 RGNs are described in Harrington et al. (2018) Science, 362(6416), 839-842 and Karvelis et al. (2020) Nucleic Acids Research, 48(9), 5016-5023.
[028] The term “Cas12m” refers to a type of RGN that cleaves nucleic acid, is encoded by CRISPR loci, and is part of the Type VM CRISPR system. The Cas12m protein consists of a Rec1 domain and a tri-split RuvC domain and may be mutated so that the nuclease activity is partly or completely inactivated.
[029] The term “Cas9” refers to a type of RGN that cleaves nucleic acid, is encoded by CRISPR loci, and is part of the Type II CRISPR system. The Cas9 protein commonly used is from the bacterial species Streptococcus pyogenes. The Cas9 protein may be mutated so that the nuclease activity is partly or completely inactivated.
[030] The term "complement" or "complementary" as used herein can mean Watson-Crick or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules. The term "complementarity" refers to a property shared between two nucleic acid sequences, such that when they are aligned antiparallel to each other, the nucleotide bases at each position will be complementary.
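The antiparallel alignment described in this definition can be illustrated with a minimal Python sketch. It is not part of the disclosure: the function names are hypothetical, only Watson-Crick DNA pairs are modelled (RNA bases, nucleotide analogs and Hoogsteen pairing are omitted), and the two sequences are assumed to be of equal length with no gaps.

```python
# Illustrative sketch only; names are hypothetical and only Watson-Crick
# DNA pairs are considered (no U, no analogs, no Hoogsteen pairing).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq` when aligned antiparallel."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def fraction_complementary(a: str, b: str) -> float:
    """Fraction of positions that pair when `b` is aligned antiparallel to `a`.

    Both sequences are written 5' to 3', so `b` is reversed before comparison.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    b_antiparallel = b.upper()[::-1]
    paired = sum(PAIRS.get(x) == y for x, y in zip(a.upper(), b_antiparallel))
    return paired / len(a)
```

Under these assumptions, two fully complementary sequences give a fraction of 1.0, and each mismatched position lowers the fraction proportionally.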
[031] The term “CRISPR” (Clustered Regularly Interspaced Short Palindromic Repeats) refers to a family of DNA sequences found in the genomes of prokaryotic organisms such as bacteria and archaea. These sequences are derived from DNA fragments of bacteriophages that had previously infected the prokaryote. They are used to detect and destroy DNA from similar bacteriophages during subsequent infections.
[032] The term "CRISPR system" refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated ("Cas") proteins, including sequences encoding a Cas protein, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (containing a "direct repeat" and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to herein as a "spacer" in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus.
[033] The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease. In some embodiments, an effective amount of a recombinase may refer to the amount of the recombinase that is sufficient to induce recombination at a target site specifically bound and recombined by the recombinase. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a nuclease, a recombinase, a hybrid protein, a fusion protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors as, for example, on the desired biological response, the specific allele, genome, target site, cell, or tissue being targeted, and the agent being used.
[034] The term "enhancer" as used herein refers to non-coding DNA sequences containing multiple activator and repressor binding sites. Enhancers range from 200 bp to 1 kb in length and may be either proximal, 5' upstream to the promoter or within the first intron of the regulated gene, or distal, in introns of neighboring genes or intergenic regions far away from the locus. Through DNA looping, active enhancers contact the promoter dependently of the core DNA binding motif promoter specificity. 4 to 5 enhancers may interact with a promoter.
[035] As used herein, the term "fusion protein" refers to a chimeric protein created through the covalent or non-covalent joining of two or more genes, directly or indirectly, that originally coded for separate proteins. In some embodiments, the translation of the fusion gene results in a single polypeptide with functional properties derived from each of the original proteins.
[036] The term “gRNA”, also used interchangeably herein as a chimeric single guide RNA (“sgRNA”), refers to nucleic acid which is a fusion of two noncoding RNAs: a crRNA and a tracrRNA. “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas complex to the target); and (2) a domain that binds a Cas protein.
[037] An "isolated" or "purified" polypeptide, or biologically active portion thereof, is substantially or essentially free from components that normally accompany or interact with the polypeptide as found in its naturally occurring environment. Thus, an isolated or purified polypeptide is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of
chemical precursors or other chemicals when chemically synthesized. A protein that is substantially free of cellular material includes preparations of protein having less than 30%, 20%, 10%, 5%, or 1% (by dry weight) of contaminating protein. When a protein or biologically active portion thereof is recombinantly produced, optimally culture medium represents less than 30%, 20%, 10%, 5%, or 1% (by dry weight) of chemical precursors or non-protein-of-interest chemicals.
[038] The term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is a polypeptide of 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150- 200 amino acids in length. Longer or shorter linkers are also contemplated.
[039] The term “lmntRNA” or “long monomeric nucleic acid targeting RNA” herein refers to a wildtype or chimeric long monomeric nucleic acid targeting RNA having sufficient complementarity with a target nucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of the associated RNA-guided nuclease described herein to the target nucleotide sequence. An lmntRNA comprises sequences and secondary structures that are essential for its binding to an RGN and the target sequence of interest.
[040] The term “modification” in reference to a nucleic acid molecule refers to a change in the nucleotide sequence of the nucleic acid molecule, which can be a deletion, insertion, or substitution of one or more nucleotides, or a combination thereof.
[041] The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
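The naming convention described above (original residue, position, new residue, as in the D93R label used in the figure descriptions) can be sketched in Python. This is an illustrative sketch only: the class and function names are hypothetical, positions are assumed to be 1-based, and only single-residue substitutions written with one-letter codes are handled.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """Hypothetical container for a label such as 'D93R'."""
    original: str
    position: int  # 1-based position within the sequence (assumption)
    replacement: str


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split 'D93R' into original residue, position and new residue."""
    match = _LABEL.match(label)
    if match is None or int(match.group(2)) < 1:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Return `sequence` with the substitution applied, checking the original residue."""
    index = sub.position - 1
    if index >= len(sequence) or sequence[index] != sub.original:
        raise ValueError("sequence does not carry the stated original residue")
    return sequence[:index] + sub.replacement + sequence[index + 1:]
```

Checking the original residue before substituting guards against applying a label to the wrong sequence or with an off-by-one position.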
[042] As used herein, the terms "nucleic acid," "nucleic acid sequence," "nucleotide sequence," "oligonucleotide," and "polynucleotide" are interchangeable and refer to a polymeric form of nucleotides.
The nucleotides may be deoxyribonucleotides (DNA), ribonucleotides (RNA), analogs thereof, or combinations thereof, and may be of any length. Polynucleotides may perform any function and may have any secondary and tertiary structures. The terms encompass known analogs of natural nucleotides and nucleotides that are modified in the base, sugar and/or phosphate moieties. Analogs of a particular nucleotide have the same base-pairing specificity (e.g., an analog of A base pairs with T). A polynucleotide may comprise one modified nucleotide or multiple modified nucleotides. Examples of modified nucleotides include fluorinated nucleotides, methylated nucleotides, and nucleotide analogs. Nucleotide structure may be modified before or after a polymer is assembled. Following polymerization, polynucleotides may be additionally modified via, for example, conjugation with a labeling component or target binding component. A nucleotide sequence may incorporate non-nucleotide components. The terms also encompass nucleic acids comprising modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, and have similar binding properties as a reference polynucleotide (e.g., DNA or RNA). Examples of such analogs include, but are not limited to, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), Locked Nucleic Acid (LNA™) (Exiqon, Inc., Woburn, MA) nucleosides, glycol nucleic acid, bridged nucleic acids, and morpholino structures. Polynucleotide sequences are displayed herein in the conventional 5' to 3' orientation unless otherwise indicated.
[043] The term “operably linked” as used herein means that expression of a gene is under the control of a promoter with which it is spatially connected. A promoter may be positioned 5' (upstream) or 3' (downstream) of a gene under its control. The distance between the promoter and a gene may be approximately the same as the distance between that promoter and the gene it controls in the gene from which the promoter is derived. As is known in the art, variation in this distance may be accommodated without loss of promoter function.
[044] The term “optional” or “optionally” means that the subsequent described event, circumstance or substituent may or may not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.
[045] As used herein, the terms "peptide," "polypeptide," and "protein" are interchangeable and refer to polymers of amino acids. A polypeptide may be of any length. It may be branched or linear, it may be interrupted by non-amino acids, and it may comprise modified amino acids. The terms may be used to refer to an amino acid polymer that has been modified through, for example, acetylation, disulfide bond formation, glycosylation, lipidation, phosphorylation, cross-linking, and/or conjugation (e.g., with a labeling component or ligand). Polypeptide sequences are displayed herein in the conventional N-terminal to C-terminal orientation. Polypeptides and polynucleotides can be made using routine techniques in the field of molecular biology (see, e.g., standard texts set forth above). Further, essentially any polypeptide or polynucleotide can be custom ordered from commercial sources.
[046] As used herein, "percentage of sequence identity" means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
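The calculation just described can be written as a short Python sketch. It assumes the two sequences have already been optimally aligned by some external tool and are supplied as equal-length strings covering the comparison window, with "-" marking a gap; the function name and gap symbol are illustrative choices and are not taken from the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Matched positions / total positions in the window, times 100.

    Inputs are assumed to be pre-aligned strings of equal length; a gap
    aligned to a residue never counts as a match.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matched = sum(
        x == y and x != gap for x, y in zip(aligned_a, aligned_b)
    )
    return 100.0 * matched / len(aligned_a)
```

For example, an alignment of six columns in which five columns carry the same residue gives 5/6 × 100, or about 83.3% identity.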
[047] The term "promoter" as used herein means a synthetic or naturally-derived molecule which is capable of conferring, activating or enhancing expression of a nucleic acid in a cell. A promoter may comprise one or more specific transcriptional regulatory sequences to further enhance expression and/or to alter the spatial expression and/or temporal expression of same. A promoter may also comprise distal enhancer or repressor elements, which may be located as much as several thousand base pairs from the start site of transcription. A promoter may be derived from sources including viral, bacterial, fungal, plants, insects, and animals.
[048] The terms “RNA-guided endonuclease” and “RGN” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNAs that are not a target for cleavage.
[049] As used herein, "sequence identity" or "identity" in the context of two polynucleotides or polypeptide sequences makes reference to the residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution.
[050] Sequences that differ by such conservative substitutions are said to have "sequence similarity" or "similarity". Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, California).
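The partial-mismatch scoring described above can be illustrated with a small Python sketch. It is not the PC/GENE implementation: the amino acid groupings and the score of 0.5 for a conservative substitution are assumptions chosen only for illustration, and the inputs are assumed to be aligned, equal-length, ungapped protein sequences.

```python
# Hypothetical grouping of residues with similar chemical properties;
# real scoring schemes differ in both the groups and the partial score.
CONSERVATIVE_GROUPS = [
    set("DE"),    # acidic
    set("KRH"),   # basic
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("ST"),    # small hydroxyl
    set("NQ"),    # amide
]


def position_score(x: str, y: str, conservative_score: float = 0.5) -> float:
    """1 for identity, a partial score for a conservative substitution, else 0."""
    if x == y:
        return 1.0
    if any(x in group and y in group for group in CONSERVATIVE_GROUPS):
        return conservative_score
    return 0.0


def adjusted_percent_identity(a: str, b: str, conservative_score: float = 0.5) -> float:
    """Percent identity adjusted upwards for conservative substitutions."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(position_score(x, y, conservative_score) for x, y in zip(a, b))
    return 100.0 * total / len(a)
```

With these assumed groups, aligning DKL against EKA scores 0.5 + 1 + 0 over three positions, giving an adjusted value of 50% against an unadjusted identity of about 33%.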
[051] As used herein, the term “spacer sequence” or “spacer” refers to the part of the lmntRNA nucleotide sequence that directly hybridizes with the target nucleotide sequence of interest.
[052] The terms "subject" and "patient" as used herein interchangeably refer to any vertebrate, including, but not limited to, a mammal (e.g., cow, pig, camel, llama, horse, goat, rabbit, sheep, hamsters, guinea pig, cat, dog, rat, mouse, a non-human primate (for example, a monkey, such as a cynomolgus or rhesus monkey, chimpanzee, etc.), and a human). In some embodiments, the subject may be a human or a non-human. The subject or patient may be undergoing other forms of treatment.
[053] The terms “target region”, “target sequence” and “protospacer” as used interchangeably herein refer to the region of the target gene that the CRISPR-based system targets.
[054] The term “TnpB” refers to a type of RGN that cleaves nucleic acid and is encoded by the IS200/IS605 transposase family. The TnpB protein commonly used is from Deinococcus radiodurans ISDra2. The TnpB protein may be mutated so that the nuclease activity is partly or completely inactivated. TnpB RGNs are described in Karvelis et al. (2021) Nature 599, 692-696.
[055] As used herein, the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.
[056] The term “Type II CRISPR system” refers to an effector system that carries out targeted DNA double-strand breaks in four sequential steps, using a single effector enzyme, Cas9, to cleave dsDNA. Compared to the Type I and Type III effector systems, which require multiple distinct effectors acting as a complex, the Type II effector system may function in alternative contexts such as eukaryotic cells. The Type II effector system consists of a long pre-crRNA, which is transcribed from the spacer-containing CRISPR locus, the Cas9 protein, and a tracrRNA, which is involved in pre-crRNA processing.
[057] The term “Type VM” refers to a novel type of CRISPR system provided in this disclosure comprising an effector protein, such as an RGN, with its translation termination located within 225 bp of a CRISPR repeat-spacer array. No other common CRISPR proteins are found nearby. Additionally, the system comprises a long monomeric nucleic acid targeting RNA (lmntRNA), which can be found between the effector protein and the CRISPR array and starts within 75 bp from the end of the effector protein.
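The two distance criteria in this definition can be expressed as a simple coordinate check. The Python sketch below is illustrative only: the `Locus` record and its field names are hypothetical, all coordinates are assumed to lie on the same strand with the lmntRNA and the CRISPR array downstream of the effector gene, and the requirement that no other common CRISPR proteins are nearby is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    """Hypothetical annotation of one candidate locus (same-strand coordinates)."""
    effector_stop: int   # last base of the effector stop codon
    lmntrna_start: int   # first base of the putative lmntRNA
    array_start: int     # first base of the CRISPR repeat-spacer array


def meets_type_vm_distances(
    locus: Locus, max_array_gap: int = 225, max_lmntrna_gap: int = 75
) -> bool:
    """Check the 225 bp (array) and 75 bp (lmntRNA start) distance criteria."""
    array_gap = locus.array_start - locus.effector_stop
    lmntrna_gap = locus.lmntrna_start - locus.effector_stop
    return (
        0 <= lmntrna_gap <= max_lmntrna_gap
        and 0 <= array_gap <= max_array_gap
        and locus.lmntrna_start <= locus.array_start
    )
```

The thresholds are exposed as parameters so that the same check can be rerun with looser or stricter cutoffs.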
[058] The term "vector" as used herein means a nucleic acid sequence containing an origin of replication. A vector may be a viral vector, bacteriophage, bacterial artificial chromosome or yeast artificial chromosome. A vector may be a DNA or RNA vector. A vector may be a self-replicating extrachromosomal vector, or a DNA plasmid.
[059] Unless otherwise defined herein, scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art.
CRISPR systems
[060] The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) genomic locus is found in the genomes of many prokaryotes. CRISPR loci provide resistance to viruses and phages in prokaryotes. In this way, the CRISPR loci function as a type of immune system to help defend prokaryotes against foreign invaders. In such a system, the response to foreign invaders starts by cleaving the genome of invading viruses and plasmids and integrating segments (termed protospacers) of the genomic DNA into the CRISPR locus of the host organism. The segments that are integrated into the host genome are known as “spacers”, which mediate protection from subsequent attack by the same (or sufficiently related) virus or plasmid. Expression involves transcription of the CRISPR locus and subsequent enzymatic processing to produce short mature CRISPR RNAs (crRNA), each containing a single spacer sequence. Interference is induced after the CRISPR RNAs associate with Cas proteins to form effector complexes, which are then targeted to complementary protospacers in foreign genetic elements to induce nucleic acid degradation.
[061] Currently, two classes of CRISPR systems have been described, Class 1 and Class 2, based upon the genes encoding the effector component. Class 1 systems have a multi-subunit crRNA-effector complex, whereas Class 2 systems have a single effector protein. Typical examples of Class 2 effector proteins are Cas9 and Cpf1 (Cas12a).
[062] To date six types (Types I-VI) of CRISPR systems have been described (for an overview see Makarova et al., Nature Reviews Microbiology (2015) 13:1-15). Class 1 systems comprise Type I, Type III and Type IV systems. Class 2 systems comprise Type II, Type V and Type VI systems.
[063] CRISPR loci include several short repeating sequences referred to as "repeats." The repeats can form hairpin structures and/or the repeats can be single-stranded sequences. The repeats occur in clusters. Repeats frequently diverge between species. Repeats are regularly interspaced with unique intervening sequences, referred to as "spacers," resulting in a repeat-spacer-repeat locus architecture. Spacers are sequences usually identical to or homologous to foreign invader sequences (such as viral sequences).
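The repeat-spacer-repeat architecture described above can be illustrated with a short Python sketch that recovers the spacers from an array sequence. It is a simplification under stated assumptions: the repeat is supplied by the caller and is assumed to occur as exact, identical copies, whereas real arrays often contain degenerate repeats that call for approximate matching. The function name is hypothetical.

```python
def extract_spacers(array_sequence: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    Sequence before the first repeat and after the last repeat is flanking
    sequence, not a spacer, so it is discarded.
    """
    if not repeat:
        raise ValueError("repeat must be a non-empty sequence")
    pieces = array_sequence.upper().split(repeat.upper())
    if len(pieces) < 3:  # fewer than two repeats: no complete spacer
        return []
    return pieces[1:-1]
```

For instance, an array of three repeats separated by two unique intervening sequences yields exactly those two sequences, in order.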
[064] In some cases, a spacer-repeat unit encodes a crisprRNA (crRNA). A crRNA refers to the mature form of the spacer-repeat unit. A crRNA contains a spacer sequence that is involved in targeting a target nucleic acid. crRNA has a region of complementarity to a potential DNA or RNA target sequence and in some cases, e.g., in currently characterized Type II systems, a second region that forms base-pair hydrogen bonds with a transactivating CRISPR RNA (tracrRNA) to form a secondary structure, typically to form at least a stem structure. Complex formation between tracrRNA/crRNA and a Cas protein results in conformational change of the Cas protein that facilitates binding to DNA, nuclease activities of the Cas protein, and crRNA-guided site-specific DNA cleavage by the nuclease. For a Cas protein/tracrRNA/crRNA complex to cleave a DNA target sequence, the DNA target sequence is adjacent to a cognate protospacer adjacent motif (PAM).
[065] Usually, a CRISPR locus comprises polynucleotide sequences encoding CRISPR-associated (cas) genes. Cas genes are involved in the biogenesis and/or the interference stages of crRNA function. Cas genes display extreme sequence diversity between different species and homologs. Some Cas proteins comprise a specific set of domain structures.
[066] Mature crRNAs are processed from a longer polycistronic CRISPR locus transcript, also referred to as a pre-crRNA array. A pre-crRNA array comprises a plurality of crRNAs. The repeats in the pre-crRNA array are recognized by Cas proteins, which bind to the repeats and cleave them. This action can liberate the plurality of crRNAs. crRNAs can be subjected to further events to produce the mature crRNA form such as trimming (e.g., with an exonuclease). A crRNA may comprise all, or some, of the CRISPR repeat sequences.
[067] Interference refers to the stage in the CRISPR system that is functionally responsible for combating infection by a foreign invader. CRISPR interference follows a similar mechanism to RNA interference, which results in target RNA degradation and/or destabilization. Currently characterized CRISPR systems perform interference of a target nucleic acid by coupling crRNAs and Cas proteins, thereby forming CRISPR ribonucleoproteins (RNPs). The crRNA of the RNP guides the RNP to foreign invader nucleic acid (e.g., by recognizing the foreign invader nucleic acid through hybridization). Hybridized target foreign invader nucleic acid-crRNA units are subjected to cleavage by Cas proteins. Target nucleic acid interference typically requires a protospacer adjacent motif (PAM) in a target nucleic acid.
[068] Currently CRISPR-Cas systems are divided into two main classes based on their effector molecules: class 1 and class 2. Class 1 is characterized by multi-unit effector molecules, while class 2 contains a single effector molecule. Class 1 systems comprise Type I, Type III, and Type IV systems. Class 2 systems comprise Type II, Type V, and Type VI systems.
[069] Type II system is commonly represented by cas9 genes. There are two strands of RNA in Type II systems: a crRNA and a tracrRNA. The duplex formed by the tracrRNA and crRNA is recognized by, and associates with Cas9, encoded by the cas9 gene, which combines the functions of the crRNA-effector complex with target DNA cleavage. Cas9 is directed to a target nucleic acid by a sequence of the crRNA that is complementary to, and hybridizes with, a sequence in the target nucleic acid.
[070] In Type V systems, nucleic acid target sequence binding involves a Cas12 protein and the crRNA, as does the nucleic acid target sequence cleavage. In Type V systems, the RuvC-like nuclease domain of the Cas12 protein cleaves both strands of the nucleic acid target sequence in a sequential fashion (Swarts et al., Mol. Cell (2017) 66:221-233), producing 5' overhangs, which differs from the fragments generated by Cas9 protein. Multiple subtypes of Type V systems have been identified so far (Type V-A/B/C/D/E/F/G/H/I/K/L and CRISPR-Cas12j). They differ in the length of the Cas protein, the PAM sequence and whether they require a tracrRNA for their functionality.
[071] Type V-A is represented by the Cas12a protein. The Cas12a protein cleavage activity of Type V-A systems does not require hybridization of crRNA to tracrRNA to form a duplex; rather, Type V-A systems use a single crRNA that has a stem-loop structure forming an internal duplex. Cas12a protein binds the crRNA in a sequence- and structure-specific manner by recognizing the stem loop and sequences adjacent to the stem loop, most notably the nucleotides 5' of the spacer sequence, which hybridizes to the nucleic acid target sequence. This stem-loop structure is typically in the range of 15 to 19 nucleotides in length. Substitutions that disrupt this stem-loop duplex abolish cleavage activity, whereas other substitutions that do not disrupt the stem-loop duplex do not abolish cleavage activity.
[072] In Type V-A systems, nucleic acid target sequence binding involves Cas12a and the crRNA, as does the nucleic acid target sequence cleavage. In Type V-A systems, the RuvC-like nuclease domain of Cas12a cleaves one strand of the double-stranded nucleic acid target sequence, and a putative nuclease domain cleaves the other strand of the double-stranded nucleic acid target sequence in a staggered configuration, producing 5' overhangs, which is different from the blunt ends generated by Cas9 cleavage. These 5' overhangs may facilitate insertion of DNA.
[073] The Cas12a cleavage activity of Type V systems also does not require hybridization of crRNA to tracrRNA to form a duplex; rather, the Type V system uses a single crRNA that has a stem-loop structure forming an internal duplex. Cas12a binds the crRNA in a sequence- and structure-specific manner that recognizes the stem loop and sequences adjacent to the stem loop, most notably the nucleotides 5' of the spacer sequence that hybridizes to the nucleic acid target sequence. This stem-loop structure is typically in the range of 15 to 19 nucleotides in length. Substitutions that disrupt this stem-loop duplex abolish cleavage activity, whereas other substitutions that do not disrupt the stem-loop duplex do not abolish cleavage activity. In Type V systems, the crRNA forms a stem-loop structure at the 5' end, and the sequence at the 3' end is complementary to a sequence in a nucleic acid target sequence.
[074] Type V-F1 is represented by the Cas12f1 protein. The Cas12f1 protein cleavage activity of Type V-F1 systems does require hybridization of crRNA to tracrRNA to form a duplex. Cas12f1 protein binds the tracrRNA/crRNA in a sequence- and structure-specific manner by recognizing the stem loops and sequences adjacent to the stem loops, most notably the nucleotides 5' of the spacer sequence, which hybridizes to the nucleic acid target sequence. These stem-loop structures are typically in the range of 150 to 170 nucleotides in length for the tracrRNA and 28-34 nucleotides in length for the crRNA. Substitutions that disrupt these stem-loop duplexes abolish cleavage activity, whereas other substitutions that do not disrupt the stem-loop duplexes do not abolish cleavage activity.
[075] In Type V-F1 systems, nucleic acid target sequence binding involves Cas12f1 and the tracrRNA/crRNA, as does the nucleic acid target sequence cleavage. In Type V-F1 systems, the RuvC-like nuclease domain of Cas12f1 cleaves one strand of the double-stranded nucleic acid target sequence, and a putative nuclease domain cleaves the other strand of the double-stranded nucleic acid target sequence in a staggered configuration, producing 5' overhangs, which is different from the blunt ends generated by Cas9 cleavage. These 5' overhangs may facilitate insertion of DNA.
[076] The Cas12f1 cleavage activity of Type V systems also does require hybridization of crRNA to tracrRNA to form a duplex. Cas12f1 binds the tracrRNA/crRNA in a sequence- and structure-specific manner that recognizes the stem loops and sequences adjacent to the stem loops, most notably the nucleotides 5' of the spacer sequence that hybridizes to the nucleic acid target sequence. These stem-loop structures are typically in the range of 150 to 170 nucleotides in length for the tracrRNA and 28-34 nucleotides in length for the crRNA. Substitutions that disrupt these stem-loop duplexes abolish cleavage activity, whereas other substitutions that do not disrupt the stem-loop duplexes do not abolish cleavage activity. In Type V systems, the tracrRNA/crRNA forms stem-loop structures at the 5' end, and the sequence at the 3' end is complementary to a sequence in a nucleic acid target sequence.
[077] Other proteins associated with Type V crRNA and nucleic acid target sequence binding and cleavage include Cas12b, Cas12c, Cas12d, and Cas12e, which are similar in length to Cas12a proteins, ranging from approximately 1000-1500 amino acids, but also require an additional RNA (either a tracrRNA or a scoutRNA) (see for example Harrington et al., Molecular Cell, Volume 79, Issue 3, 2020, Pages 416-424). Still other proteins associated with Type V crRNA and nucleic acid target sequence binding and cleavage include Cas12f1, Cas12f2, Cas12f3, and Cas12g, which are shorter than Cas12a proteins, ranging from approximately 300-900 amino acids, but also require a tracrRNA.
[078] Type VI systems include the Cas13a protein (also known as Class 2 candidate 2 protein, or C2c2), which does not share sequence similarity with other CRISPR effector proteins (see Abudayyeh et al., Science (2016) 353:aaf5573). Cas13a proteins have two HEPN domains and possess single-stranded RNA cleavage activity. Cas13a proteins are similar to Cas12a proteins in requiring a crRNA for nucleic acid target sequence binding and cleavage, but not requiring tracrRNA.
[079] While many Type V systems have been identified, the discovery and characterization of CRISPR systems is ongoing.
Method of identifying novel RGN polypeptides and RGN-based systems
[080] The present disclosure provides methods for identifying a class of novel RGNs that belong to a novel class of CRISPR-based systems. In particular, a method of identifying novel RGNs and the lmntRNA with which they interact is provided, comprising: a) identifying sequences in a genomic or metagenomic database encoding a CRISPR array; b) identifying one or more Open Reading Frames (ORFs) in said selected sequences within 10 kb of the CRISPR array; c) identifying putative novel RGN; d) identifying putative lmntRNA, comprising part of the sequence between the Cas operon and the CRISPR array as well as part of the first CRISPR repeat; and e) selecting RGN sequences that have the corresponding lmntRNA identified in step (d).
[081] In some embodiments, the RGN is a class 2 CRISPR RGN.
[082] In some embodiments, step (a) comprises comparing sequences in a genomic database to at least one seed sequence that encodes a CRISPR array and extracting sequences that comprise said seed sequence.
[083] In some embodiments, said ORF in step (b) encodes a protein of at least 300 amino acids, preferably between 300 and 600 amino acids.
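Steps (b) and the size criterion of paragraph [083] can be illustrated with a minimal sketch. It assumes that ORFs and the CRISPR array have already been located on a contig by some upstream tool, and that "within 10 kb" is measured as the gap between the nearest ends of the ORF and the array; the `Orf` record and the function names are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    start: int        # 0-based start on the contig, in nucleotides
    end: int          # exclusive end on the contig
    protein_len: int  # length of the encoded protein, in amino acids


def distance_to_array(orf, array_start, array_end):
    """Gap in bases between an ORF and the CRISPR array (0 if they overlap)."""
    if orf.end <= array_start:
        return array_start - orf.end
    if orf.start >= array_end:
        return orf.start - array_end
    return 0


def candidate_orfs(orfs, array_start, array_end,
                   max_dist=10_000, min_aa=300, max_aa=600):
    """Keep ORFs near the array whose protein length is in the preferred range."""
    return [
        orf for orf in orfs
        if distance_to_array(orf, array_start, array_end) <= max_dist
        and min_aa <= orf.protein_len <= max_aa
    ]
```

Setting `max_aa` to a very large value reproduces the broader "at least 300 amino acids" embodiment.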
[084] In some embodiments step (c) comprises identifying sequences comprising RuvC domains. In some embodiments step (c) comprises identifying sequences comprising a tri split RuvC domain. In some embodiments step (c) comprises identifying sequences that do not comprise an HNH domain. In some embodiments step (c) comprises identifying sequences that comprise a tri split RuvC domain and do not comprise an HNH domain.
[085] In some embodiments step (d) comprises identifying the lmntRNA sequences that form 4 or 5 hairpins, wherein the final hairpin comprises the antirepeat-repeat sequence from the CRISPR array.
[086] In some embodiments step (d) comprises identifying CRISPR arrays within 225 bases of the end of putative novel RGNs and identifying intervening sequences that contain a MGGGYGN4-8CRYCCK motif within 95 bases of the end of the effector protein.
[087] In some embodiments step (d) comprises identifying CRISPR arrays within 225 bases of the end of putative novel RGNs and identifying intervening sequences that contain a RYCGAGWRAGURYN9-33RKAMWCUCGRY motif within 225 bases of the end of the effector protein.
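The motif searches of paragraphs [086] and [087] can be sketched as a regular-expression scan. This is an illustration only: it assumes the letters in the motifs are standard IUPAC nucleotide codes, that the subscripted ranges (N4-8, N9-33) mean a run of that many arbitrary nucleotides, and that "within X bases" means the motif starts less than X bases from the start of the intervening sequence. The function and constant names are hypothetical.

```python
import re

# IUPAC nucleotide codes written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "M": "[AC]", "K": "[GU]",
    "W": "[AU]", "S": "[CG]", "N": "[ACGU]",
}

# Motifs from the text, with the N runs written as regex repeat counts.
MOTIF_5P_HAIRPIN = "MGGGYGN{4,8}CRYCCK"
MOTIF_INTERNAL_HAIRPIN = "RYCGAGWRAGURYN{9,33}RKAMWCUCGRY"


def motif_to_regex(motif):
    """Expand IUPAC letters to character classes; pass {m,n} through unchanged."""
    return re.compile("".join(IUPAC_RNA.get(ch, ch) for ch in motif))


def motif_within(intervening_dna, motif, window):
    """True if the motif starts fewer than `window` bases into the sequence."""
    rna = intervening_dna.upper().replace("T", "U")
    match = motif_to_regex(motif).search(rna)
    return match is not None and match.start() < window
```

For example, `motif_within(seq, MOTIF_5P_HAIRPIN, 95)` corresponds to the criterion of [086], and `motif_within(seq, MOTIF_INTERNAL_HAIRPIN, 225)` to that of [087], where `seq` is the sequence between the end of the putative RGN and the CRISPR array.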
[088] In some embodiments step (d) comprises identifying CRISPR arrays within 225 bases of the end of putative novel RGNs and identifying intervening sequences that contain an antirepeat sequence to the CRISPR repeat of at least 6 nucleotides within 40 bases upstream of the first CRISPR repeat.
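The antirepeat criterion of paragraph [088] can be sketched as a search for a stretch that is reverse-complementary to part of the CRISPR repeat. The sketch assumes strict Watson-Crick pairing on DNA (G-U wobble pairs, which can occur in RNA duplexes, are ignored) and takes "within 40 bases upstream" to mean the last 40 bases before the first repeat; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_antirepeat(upstream, repeat, min_len=6, window=40):
    """Longest stretch in the last `window` bases of `upstream` that is
    reverse-complementary to part of `repeat`; None if shorter than `min_len`."""
    region = upstream.upper()[-window:]
    target = revcomp(repeat.upper())
    best = ""
    for i in range(len(region)):
        for j in range(i + min_len, len(region) + 1):
            candidate = region[i:j]
            if candidate not in target:
                break  # longer extensions from this start cannot match either
            if len(candidate) > len(best):
                best = candidate
    return best or None
```

A hit returned by `find_antirepeat` satisfies `revcomp(hit) in repeat`, so it can base-pair with that part of the repeat.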
[089] In some embodiments the method includes a step of verifying that no Cas1, Cas2 or Cas4 sequences are present within 10 kb of the CRISPR array.
[090] In some embodiments, said genomic and metagenomic sequences are obtained from a sequence database such as Ensembl or NCBI genome databases.
RNA-guided nucleases (RGNs)
[091] The present disclosure provides a novel class of RNA-guided nucleases (RGN) as defined by their amino acid sequences in Table 4 and which are also referred to herein as Cas12m.
[093] An RGN provided herein binds to a target nucleotide sequence and hybridizes with the RNA molecule (lmntRNA) specific to the RNA-guided nuclease. The target sequence can then be subsequently cleaved by the RGN if the RGN polypeptide possesses nuclease activity. The presently disclosed RGNs can cleave nucleotides within a polynucleotide, functioning as an endonuclease. In some embodiments, the disclosed RGNs can cleave nucleotides of a target nucleotide sequence within any position of a polynucleotide and thus function as both an endonuclease and exonuclease.
[094] The presently disclosed RGNs can be wild-type sequences derived from bacterial or archaeal species. Alternatively, the RGNs can be variants or fragments of wild-type polypeptides. The wild-type RGN can be modified to alter nuclease activity or alter PAM specificity, for example. In some embodiments, the RGN is not naturally-occurring. Such RGNs have a single functioning nuclease domain.
[095] In other embodiments, the RGN lacks nuclease activity altogether or exhibits reduced nuclease activity and is referred to herein as a nuclease-dead RGN. Any method known in the art for introducing mutations into an amino acid sequence, such as PCR-mediated mutagenesis and site-directed mutagenesis, can be used for generating nuclease-dead RGNs (see, e.g., US9,790,490).
[096] Alternatively, nuclease-dead RGNs can be targeted to particular genomic locations to alter the expression of a desired sequence. In some embodiments, the binding of a nuclease-dead RNA-guided nuclease to a target sequence results in the repression of expression of the target sequence or a gene under transcriptional control by the target sequence by interfering with the binding of RNA polymerase or transcription factors within the targeted genomic region. In other embodiments, the RGN (e.g., a nuclease-dead RGN) or its complexed lmntRNA further comprises an expression modulator that, upon binding to a target sequence, serves to either repress or activate the expression of the target sequence or a gene under transcriptional control by the target sequence. In some of these embodiments, the expression modulator modulates the expression of the target sequence or regulated gene through epigenetic mechanisms.
[097] In other embodiments, one or more of the nuclease-dead RGNs disclosed herein can be targeted to particular genomic locations to modify the sequence of a target polynucleotide through fusion to a base-editing polypeptide, for example a deaminase polypeptide or active variant or fragment thereof that deaminates a nucleotide base, resulting in conversion from one nucleotide base to another. The base-editing polypeptide can be fused to the RGN at its N-terminal or C-terminal end. Additionally, the base-editing polypeptide may be fused to the RGN via a peptide linker. A non-limiting example of a deaminase polypeptide that is useful for such compositions and methods includes cytidine deaminase or the adenosine deaminase base editor described in Gaudelli et al. (2017) Nature 551:464-471, and WO2018/027078.
Structural elements of the RGN peptides
[098] The RGN proteins of the present disclosure employ multiple domains distributed in a recognition lobe (REC) and a nuclease lobe (NUC) for substrate recognition and cleavage. In one embodiment, the RGN comprises an amino-terminal domain (NTD) and a carboxy-terminal domain (CTD), which are connected by a linker loop. The NTD consists of two domains: the wedge (WED) and recognition (REC) domains. The CTD consists of the tri split RuvC domain, which is split by a second REC domain and a target nucleic acid-binding (TNB) domain. However, unlike Cas9, the RGN polypeptides of the present disclosure do not contain an HNH domain.
[099] In one example embodiment, an RGN polypeptide of the disclosure comprises, from the N- to C-terminus, a Rec1 domain, a wedge domain, a RuvC-I subdomain, a Rec2 domain, a RuvC-II subdomain, a TNB domain, a RuvC-III subdomain, and a C-terminal domain.
[100] The RGNs of the present disclosure may comprise one or more additional domains, e.g., one or more Rec domains.
[101] In certain embodiments, the RGN polypeptides provided herein are between 300 and 600 amino acids in size, between 400 and 550 amino acids in size, or between 400 and 500 amino acids in size. Size variation may be dependent on the particular domain architecture of the RGN polypeptides provided herein.
RuvC domain
[102] The RuvC domain may comprise multiple subdomains: RuvC-I, RuvC-II and RuvC-III. The subdomains may be separated by other sequences on the amino acid sequence of the protein.
[103] Examples of RuvC domains include any polypeptides having a structural similarity and/or sequence similarity to a RuvC domain described in the art. For example, the RuvC domain may share a structural similarity and/or sequence similarity to a RuvC domain of Cas9. In some examples, the RuvC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with RuvC domains.
[104] In some examples, the RuvC domain comprises a RuvC-I polypeptide, a RuvC-II polypeptide, and a RuvC-III polypeptide. Examples of the RuvC-I, RuvC-II and RuvC-III domains also include any polypeptides having a structural similarity and/or sequence similarity to the RuvC-I, II, and III domains described in the art, such as the corresponding domains of Cas9. The RuvC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with a RuvC domain of Cas9.
[105] The RuvC domain of Cas9 consists of a six-stranded mixed beta-sheet flanked by α-helices and two additional two-stranded antiparallel beta-sheets (see e.g., Nishimasu et al. Cell, 2014). The RuvC domain of Cas9 shares structural similarity with the retroviral integrase superfamily members characterized by an RNase H fold, such as Escherichia coli RuvC (PDB code 1HJR, 14% identity, root-mean-square deviation (rmsd) of 3.6 Å for 126 equivalent Cα atoms) and Thermus thermophilus RuvC (PDB code 4LD0, 12% identity, rmsd of 3.4 Å for 131 equivalent Cα atoms). E. coli RuvC is a 3-layer alpha-beta sandwich containing a 5-stranded beta-sheet sandwiched between 5 alpha-helices. RuvC nucleases have four catalytic residues (e.g., Asp7, Glu70, His143 and Asp146 in T. thermophilus RuvC), and cleave Holliday junctions (or structurally analogous cruciform junctions) through a two-metal mechanism. Asp10 (Ala), Glu762, His983 and Asp986 of the Cas9 RuvC domain are located at positions similar to those of the catalytic residues of T. thermophilus RuvC.
REC domain and a target nucleic acid-binding (TNB) domain
[106] The REC domain may comprise multiple subdomains: REC1 and REC2. The subdomains may be separated by other sequences on the amino acid sequence of the protein.
[107] Examples of REC domains include any polypeptides having a structural similarity and/or sequence similarity to a REC domain described in the art. For example, the REC domain may share a structural similarity and/or sequence similarity to a REC domain of Cas12a. In some examples, the REC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with REC domains.
[108] In some examples, the REC domain comprises a REC1 domain and a REC2 domain. Examples of the REC1 and REC2 domains also include any polypeptides having a structural similarity and/or sequence similarity to the REC1 and REC2 domains described in the art, such as the corresponding domains of Cas12a. The REC domain may have an amino acid sequence that shares at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or 100% sequence identity with a REC domain of Cas12a. The REC domain of Cas12a consists of the REC1 and REC2 domains, where REC1 comprises 13 alpha helices, and REC2 comprises ten alpha helices and two beta strands that form a small antiparallel sheet (see e.g., Yamano et al. (2016), Cell, 165, 4, Pages 949-962).
Modified RGN peptides
[109] The RGNs may comprise one or more modifications. The modified RGNs may be catalytically inactive (also referred to as dead). A catalytically inactive or dead nuclease may have reduced or no nuclease activity compared to a wildtype counterpart nuclease. In some cases, a catalytically inactive or dead nuclease may have nickase activity. In some cases, a catalytically inactive or dead nuclease may not have nickase activity. Such a catalytically inactive or dead RGN may not make either a double-strand or a single-strand break on a target polynucleotide but may still bind or otherwise form a complex with the target polynucleotide.
[110] In an embodiment, the RGN polypeptide comprises a mutation of a catalytic RuvC residue corresponding to D240A, E339A or D416A (catalytic residues of RuvC-I, II, and III, which are well known in the prior art) of SEQ ID NO: 11 (mutated EGS0091) or equivalent residues of other RGN sequences provided herein (see for example Kleinstiver, et al. (2019) Nat Biotechnol 37, 276-282).
[111] In one embodiment, the modifications of the RGN polypeptide may or may not cause an altered functionality. Modifications that do not result in an altered functionality include, for instance, codon optimization for expression in a particular host, or providing the nuclease with a particular marker. Modifications which may result in altered functionality may also include mutations, including point mutations, insertions, deletions, truncations (including split nucleases), etc., as well as chimeric RGNs (e.g., comprising domains from different orthologues or homologues) or fusion proteins.
[112] In some embodiments, the RGN polypeptide comprises mutations in the DNA binding pocket to increase affinity for DNA leading to enhanced binding activity. Such enhanced binding activity can lead to increased cleavage activity or can lead to increased activity of the fusion domain. In an embodiment, the DNA binding affinity mutation corresponds to D93R of SEQ ID NO: 1 (EGS0091) or equivalent residues of other RGN sequences provided herein (example mutant sequences being SEQ ID NO: 10-14) (Figure 18 and Figure 19).
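As an illustration of how a point mutation written in the D240A notation is applied to a polypeptide sequence, the following minimal sketch parses the notation, checks that the stated wild-type residue is actually present at the 1-based position, and returns the mutated sequence. The function name is hypothetical, and any sequence passed to it in an example is a placeholder rather than one of the SEQ ID NOs of the disclosure.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a substitution such as 'D240A' (wild type, 1-based position, new residue)."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_type, position, new = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[position - 1]}"
        )
    return protein[:position - 1] + new + protein[position:]
```

Checking the wild-type residue guards against applying a mutation to the wrong orthologue or to a sequence with a different numbering, which matters when "equivalent residues of other RGN sequences" are located by alignment.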
[113] Fusion proteins may include, for example, fusions with heterologous domains or functional domains (e.g., localization signals, enzymes). In an embodiment, various different modifications may be combined (e.g., a mutated nuclease which is catalytically inactive and which further is fused to a functional domain, such as for instance to induce DNA methylation or another nucleic acid modification, such as, for example, a mutation, a deletion, an insertion, a replacement).
Localization signal sequences
[114] The RGNs can comprise at least one nuclear localization signal (NLS) to enhance transport of the RGN to the nucleus of a cell. Nuclear localization signals are known in the art and generally comprise a stretch of basic amino acids (see, e.g., Lange et al., J. Biol. Chem. (2007) 282:5101-5105). In embodiments, the RGN comprises 2, 3, or more nuclear localization signals. The nuclear localization signal(s) can be a heterologous NLS. Non-limiting examples of nuclear localization signals useful for the presently disclosed RGNs are the nuclear localization signals of SV40 Large T-antigen, nucleoplasmin, and c-Myc (see, e.g., Ray et al. (2015) Bioconjug Chem 26(6):1004-7). In particular embodiments, the RGN comprises an NLS comprising the sequence of SEQ ID NO: 78 or 80. The RGN may comprise one or more NLS sequences at its N-terminus, C-terminus, or both the N-terminus and C-terminus. For example, the RGN may comprise two NLS sequences at the N-terminal region and four NLS sequences at the C-terminal region.
[115] Other localization signal sequences known in the art that localize polypeptides to particular subcellular location(s) can also be used to target the RGNs, including, but not limited to, plastid localization sequences, mitochondrial localization sequences, and dual-targeting signal sequences that target to both the plastid and mitochondria (see, e.g., Nassoury and Morse (2005) Biochim Biophys Acta 1743:5-19; Herrmann and Neupert (2003) IUBMB Life 55:219-225; Soll (2002) Curr Opin Plant Biol 5:529-535; Carrie and Small (2013) Biochim Biophys Acta 1833:253-259).
[116] In certain embodiments, the RGNs comprise at least one cell-penetrating domain that facilitates cellular uptake of the RGN. Cell-penetrating domains are known in the art and generally comprise stretches of positively charged amino acid residues (i.e., polycationic cell-penetrating domains), alternating polar amino acid residues and non-polar amino acid residues (i.e., amphipathic cell-penetrating domains), or hydrophobic amino acid residues (i.e., hydrophobic cell-penetrating domains) (see, e.g., Milletti F. (2012) Drug Discov Today 17:850-860). A non-limiting example of a cell-penetrating domain is the trans-activating transcriptional activator (TAT) from the human immunodeficiency virus 1.
[117] The nuclear localization signal, plastid localization signal, mitochondrial localization signal, dual targeting localization signal, and/or cell-penetrating domain can be located at the amino-terminus (N-terminus), the carboxyl-terminus (C-terminus), or in an internal location of the RNA-guided nuclease.
Additional tags and labels
[118] The presently disclosed RGN polypeptides may comprise a detectable label or a purification tag. The detectable label or purification tag can be located at the N-terminus, the C-terminus, or an internal location of the RNA-guided nuclease, either directly or indirectly via a linker peptide. In some of these embodiments, the RGN component of the fusion protein is a nuclease-dead RGN. In other embodiments, the RGN component of the fusion protein is a RGN with nickase activity.
[119] RGNs that lack nuclease activity can be used to deliver a fused polypeptide, polynucleotide, or small molecule payload to a particular genomic location. In some of these embodiments, the RGN polypeptide or guide RNA can be fused to a detectable label to allow for detection of a particular sequence. As a non-limiting example, a nuclease-dead RGN can be fused to a detectable label (e.g., fluorescent protein) and targeted to a particular sequence associated with a disease to allow for detection of the disease-associated sequence.
[120] A detectable label is a molecule that can be visualized or otherwise observed. The detectable label may be fused to the RGN as a fusion protein (e.g., fluorescent protein) or may be a small molecule conjugated to the RGN polypeptide that can be detected visually or by other means. Detectable labels that can be fused to the presently disclosed RGNs as a fusion protein include any detectable protein domain, including but not limited to, a fluorescent protein or a protein domain that can be detected with a specific antibody. Non-limiting examples of fluorescent proteins include green fluorescent proteins (e.g., GFP, EGFP, ZsGreen) and yellow fluorescent proteins (e.g., YFP, EYFP, ZsYellow).
[121] RGN polypeptides can also comprise a purification tag, which is any molecule that can be utilized to isolate a protein or fused protein from a mixture (e.g., biological sample, culture medium). Non-limiting examples of purification tags include biotin, myc, maltose binding protein (MBP), and glutathione-S-transferase (GST).
Fusion proteins comprising the RGNs
[122] The presently disclosed RGNs can be fused to an effector domain (a fusion protein of an RGN and an effector domain), such as a cleavage domain, a deaminase domain, or an expression modulator domain, either directly or indirectly via a linker. Such effector domain can be located at the N-terminus, the C-terminus, or an internal location of the RNA-guided nuclease. In some embodiments, the RGN component of the fusion protein is a nuclease-dead RGN.
[123] RGNs that are fused to a polypeptide or domain can be separated or joined by a linker. In some embodiments, a linker joins an lmntRNA-binding domain of an RNA-guided nuclease and a base-editing polypeptide, such as a deaminase.
[124] In some embodiments, the RGN fusion protein comprises a cleavage domain, which is any domain that is capable of cleaving a polynucleotide (i.e., RNA, DNA) and includes, but is not limited to, restriction endonucleases and homing endonucleases (see, e.g., Linn et al. (eds.) Nucleases, Cold Spring Harbor Laboratory Press, 1993).
[125] In some embodiments, the RGN fusion protein comprises a deaminase domain that deaminates a nucleotide base, resulting in conversion from one nucleotide base to another, and includes, but is not limited to, a cytidine deaminase or an adenosine deaminase base editor.
[126] In some embodiments, the effector domain of the fusion protein can be an expression modulator domain, which is a domain that either serves to upregulate or downregulate transcription. The expression modulator domain can be an epigenetic modification domain, a transcriptional repressor domain or a transcriptional activation domain.
[127] In some of these embodiments, the expression modulator of the RGN fusion protein comprises an epigenetic modification domain that covalently modifies DNA or histone proteins to alter histone structure and/or chromosomal structure without altering the DNA sequence, leading to changes in gene expression (i.e., upregulation or downregulation). Non-limiting examples of epigenetic modifications include acetylation or methylation of lysine residues, arginine methylation, serine and threonine phosphorylation, and lysine ubiquitination and sumoylation of histone proteins, and methylation and hydroxymethylation of cytosine residues in DNA. Non-limiting examples of epigenetic modification domains include histone acetyltransferase domains, histone deacetylase domains, histone methyltransferase domains, histone demethylase domains, DNA methyltransferase domains, and DNA demethylase domains.
[128] In other embodiments, the expression modulator of the fusion protein comprises a transcriptional repressor domain, which interacts with transcriptional control elements and/or transcriptional regulatory proteins, such as RNA polymerases and transcription factors, to reduce or terminate transcription of at least one gene. Transcriptional repressor domains are known in the art and include, but are not limited to, IκB and Krüppel associated box (KRAB) domains.
[129] In yet other embodiments, the expression modulator of the fusion protein comprises a transcriptional activation domain, which interacts with transcriptional control elements and/or transcriptional regulatory proteins, such as RNA polymerases and transcription factors, to increase or activate transcription of at least one gene. Transcriptional activation domains are known in the art and include, but are not limited to, a VP16 activation domain and an NFAT activation domain.
[130] It is also envisaged that the nucleic acid-targeting effector protein-guide RNA complex as a whole may be associated with two or more functional domains. For example, there may be two or more functional domains associated with the nucleic acid-targeting effector protein, or there may be two or more functional domains associated with the guide RNA (via one or more adaptor proteins), or there may be one or more functional domains associated with the nucleic acid-targeting effector protein and one or more functional domains associated with the guide RNA (via one or more adaptor proteins).
[131] The fusion between the adaptor protein and the activator or repressor may include a linker. For example, GlySer linkers (GGGS) can be used. They can be used in repeats of 3, 6, 9 or even 12 or more, to provide suitable lengths, as required. Linkers can be used between the guide RNAs and the functional domain (activator or repressor), or between the nucleic acid-targeting effector protein and the functional domain (activator or repressor).
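The repeated GlySer linker of paragraph [131] can be illustrated at the level of amino acid strings. This is a sketch only: the function names are hypothetical, and the protein fragments passed in are placeholders for whatever adaptor, effector, or functional domain sequences are actually fused.

```python
def glyser_linker(repeats, unit="GGGS"):
    """Return `repeats` copies of the GlySer unit, e.g. 3 -> GGGSGGGSGGGS."""
    if repeats < 1:
        raise ValueError("a linker needs at least one repeat")
    return unit * repeats


def fuse(n_terminal_part, c_terminal_part, repeats=3):
    """Join two polypeptide sequences (one-letter code) through a GlySer linker."""
    return n_terminal_part + glyser_linker(repeats) + c_terminal_part
```

With the four-residue GGGS unit, 3, 6, 9 and 12 repeats give linkers of 12, 24, 36 and 48 residues respectively.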
lmntRNA
[132] In general, a guide RNA comprises a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA). Native guide RNAs that comprise both a crRNA and a tracrRNA generally comprise two separate RNA molecules that hybridize to each other through the repeat sequence of the crRNA and the anti-repeat sequence of the tracrRNA.
[133] The present disclosure, however, provides RGNs that can bind to a different type of RNA, long monomeric nucleic acid targeting RNAs (lmntRNAs). Such lmntRNA molecules can be modified and/or engineered by truncating, inserting, and/or replacing some parts of the sequence to enhance and/or modify their activity. Such lmntRNA guides the RGN to a specific target nucleic acid sequence. Thus, a RGN’s respective lmntRNA is one or more RNA molecules (generally, one or two), that can bind to the RGN and guide the RGN to bind to a particular target nucleotide sequence, and in those instances wherein the RGN has nickase or nuclease activity, also cleave the target nucleotide sequence. In various embodiments the lmntRNA is an engineered (or chimeric) lmntRNA. In many instances such an lmntRNA molecule is an engineered naturally-occurring sequence that does not necessarily possess all the elements and/or sequences of the naturally-expressed RNA from the corresponding CRISPR genes.
[134] In one embodiment, the lmntRNA scaffold comprises in 5' to 3' orientation: a) a sequence interacting with the corresponding RGN, b) a sequence partially complementary to the nucleotides of the CRISPR repeat array sequence, c) at least 7 nucleotides from the 3' end of the CRISPR repeat array sequence, d) directly followed by a spacer sequence complementary to the target nucleic acid sequence of interest.
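The 5' to 3' order of elements a) to d) in paragraph [134] can be sketched as a simple string assembly. The function and parameter names are hypothetical, the sequences in any example are placeholders, and the sketch only enforces the stated minimum of 7 repeat nucleotides taken from the 3' end of the repeat; it does not check folding or complementarity.

```python
def assemble_lmntrna(rgn_binding, anti_repeat, repeat, spacer, repeat_3p_len=7):
    """Concatenate lmntRNA elements in 5'->3' order:
    a) RGN-interacting sequence, b) sequence partially complementary to the repeat,
    c) the last `repeat_3p_len` nucleotides of the CRISPR repeat, d) the spacer.
    """
    if repeat_3p_len < 7:
        raise ValueError("at least 7 nucleotides of the repeat 3' end are required")
    if repeat_3p_len > len(repeat):
        raise ValueError("cannot take more nucleotides than the repeat contains")
    repeat_part = repeat[-repeat_3p_len:]  # truncation removes 5' repeat nucleotides
    return rgn_binding + anti_repeat + repeat_part + spacer
```

Passing `repeat_3p_len=len(repeat)` corresponds to a scaffold that keeps all nucleotides of the CRISPR repeat sequence.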
[135] In some embodiments the sequence interacting with the RGN has a length of 30-70 nucleotides. In some embodiments the sequence interacting with the RGN has the length of at least 30 nucleotides. In some embodiments the sequence interacting with the RGN could be at least 70-80, 80-90, 90-100, 100- 110, 110-120, 120-130, 130-140, 140-150, 150-160, 160-170, or 170-180 nucleotides.
[136] In some embodiments, the sequence partially complementary to the nucleotides of the CRISPR repeat array sequence is not complementary to up to 6 nucleotides from the 3' end of the CRISPR repeat array sequence. In some embodiments, the sequence that is partially complementary to the nucleotides of the CRISPR repeat array sequence is not complementary to up to 6 nucleotides from the 3' end of the CRISPR repeat array sequence and is at least 80, 85, 90, 95, 99, or 100% complementary to the remaining nucleotides of the CRISPR repeat array sequence present in the scaffold. In some embodiments, the sequence partially complementary to the nucleotides of the CRISPR repeat array sequence comprises at least 2 nucleotides at least partially complementary to the CRISPR array repeat sequence present in the scaffold. In some embodiments at least some nucleotides from the 3' end of the CRISPR repeat array sequence and the sequence partially complementary to the CRISPR repeat array sequence form a hairpin structure. In some embodiments the lmntRNA scaffold comprises all the nucleotides of the CRISPR repeat sequence. In some embodiments the at least 7 nucleotides of CRISPR repeat array sequence are obtained by truncating nucleotides from the 5' end of the full CRISPR array repeat sequence. In other embodiments the CRISPR repeat array sequence has a length of 30-35 nucleotides.
[137] The lmntRNA may comprise additional CRISPR array repeat and spacer sequences. The spacer sequences may be replaced with desired target sequences.
[138] In one aspect, the lmntRNA scaffold comprises a conserved sequence on or near a 5' end of the scaffold. In some aspects, such conserved sequence forms a hairpin structure. In embodiments, the conserved nucleotide sequence is on a 5' end of the scaffold. In some embodiments, the conserved sequence is MGGGYGN4-8CRYCCK (SEQ ID NO: 73).
[139] In some embodiments, the scaffold comprises a stretch of nucleotides capable of forming 1 or more hairpin structures between the conserved sequence forming a hairpin structure on or near a 5' end of the scaffold and the sequence partially complementary to the nucleotides of the CRISPR repeat array sequence.
[140] In some embodiments, the scaffold comprises the sequence RYCGAGWRAGURYN9-33RKAMWCUCGRY (SEQ ID NO: 74). The part of the sequence that forms the loop might be truncated.
[141] Some parts of lmntRNA sequences can be truncated and some of such truncations may enhance the activity. Truncations may include, but are not limited to, altering the second hairpin sequence, altering the final hairpin sequence to consist of just the anti-repeat-repeat sequence with a small linker connecting them, or altering the final hairpin to consist of just 4, 5, 6, 7, 8, 9, 10, 11 or more base pairs of the anti-repeat-repeat sequence.
[142] In some embodiments, a loop of lmntRNA is provided. The loop may be a stem loop or a tetra loop. Examples of loop forming sequences include MGGGYGN4-8CRYCCK (SEQ ID NO: 73), RYCGAGWRAGURYN9-33RKAMWCUCGRY (SEQ ID NO: 74), or RYCGAGWRAGMWCUCGRY (SEQ ID NO: 75).
[143] In an aspect, an lmntRNA comprises a spacer sequence, which can be re-programmed to direct site-specific binding to a target sequence of a target polynucleotide. The spacer may also be referred to herein as part of the lmntRNA scaffold and may comprise an engineered heterologous sequence.
[144] The spacer sequence is engineered to be fully or partially complementary to the target sequence of interest. In various embodiments, the spacer sequence can comprise from 8 nucleotides to 30 nucleotides, or more. For example, the spacer sequence can be 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length. In some embodiments, the spacer sequence is 10 to 26 nucleotides in length, or 12 to 30 nucleotides in length. In particular embodiments, the spacer sequence is 30 nucleotides in length.
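The relationship between a spacer and its target in paragraph [144] can be sketched as follows. The sketch assumes the target is given as a DNA strand written 5' to 3', derives the fully complementary RNA spacer as its reverse complement, enforces only the lower bound of 8 nucleotides stated above, and scores partial complementarity position by position using Watson-Crick pairs only. All function names are hypothetical.

```python
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def spacer_for_target(target_dna):
    """Fully complementary RNA spacer (5'->3') for a DNA target strand (5'->3')."""
    target = target_dna.upper()
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G and T")
    if len(target) < 8:
        raise ValueError("spacers of fewer than 8 nucleotides are not described")
    return target.translate(DNA_TO_RNA_COMPLEMENT)[::-1]


def fraction_complementary(spacer_rna, target_dna):
    """Fraction of spacer positions that pair with the target (no wobble pairs)."""
    perfect = spacer_for_target(target_dna)
    if len(perfect) != len(spacer_rna):
        raise ValueError("spacer and target must have the same length")
    matches = sum(a == b for a, b in zip(spacer_rna.upper(), perfect))
    return matches / len(perfect)
```

A result of 1.0 from `fraction_complementary` corresponds to a fully complementary spacer; lower values correspond to the partially complementary case.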
[145] In a particular embodiment, the lmntRNA comprises a spacer sequence linked to a conserved nucleotide sequence, wherein the conserved nucleotide sequence may comprise one or more stem loops or optimized secondary structures. In an embodiment, the conserved nucleotide sequence has a minimum length of 80 nts and at least 3 stem loops. In one embodiment, the spacer sequence may be linked to all or part of the natural conserved nucleotide sequence. In one embodiment, certain aspects of the RNA architecture can be modified, for example by addition, subtraction, or substitution of features, whereas certain other aspects of RNA architecture are maintained.
[146] In some embodiments, the RGN binds to an lmntRNA sequence comprising at least 6 nucleotides of the CRISPR repeat sequence set forth in Table 5, or an active variant or fragment thereof. In some embodiments, the RGN binds to an lmntRNA sequence comprising a truncated lmntRNA sequence as set forth in Table 5, or an active variant or fragment thereof.
[148] The lmntRNA can be synthesized chemically or via in vitro transcription. Assays for determining sequence-specific binding between a RGN and a guide RNA are known in the art and include, but are not limited to, in vitro binding assays between an expressed RGN and the guide RNA, which can be tagged with a detectable label (e.g., biotin) and used in a pull-down detection assay in which the lmntRNA:RGN complex is captured via the detectable label (e.g., with streptavidin beads). A control guide RNA with an unrelated sequence or structure to the guide RNA can be used as a negative control for non-specific binding of the RGN to RNA.
[149] In certain embodiments, the lmntRNA can be introduced into a target cell or an organ as an RNA molecule. The lmntRNA can be transcribed in vitro or chemically synthesized. In other embodiments, a nucleotide sequence encoding the lmntRNA is introduced into the cell or an organ. In some of these embodiments, the nucleotide sequence encoding the lmntRNA is operably linked to a suitable promoter. The promoter can be a native promoter or heterologous to the lmntRNA-encoding nucleotide sequence.
[150] In various embodiments, the lmntRNA can be introduced into a target cell as a ribonucleoprotein complex, as described herein, wherein the lmntRNA is bound to an RNA-guided nuclease polypeptide. The lmntRNA directs an associated RGN to a particular target nucleotide sequence of interest through hybridization of the lmntRNA to the target nucleotide sequence. A target nucleotide sequence can comprise DNA, RNA, or a combination of both and can be single-stranded or double-stranded. A target nucleotide sequence can be genomic DNA (i.e., chromosomal DNA), plasmid DNA, or an RNA molecule (e.g., messenger RNA, ribosomal RNA, transfer RNA, micro RNA, small interfering RNA). The target nucleotide sequence can be bound (and in some embodiments, cleaved) by an RNA-guided nuclease in vitro or in a cell. The chromosomal sequence targeted by the RGN can be a nuclear, plastid or mitochondrial chromosomal sequence. In some embodiments, the target nucleotide sequence is unique in the target genome.
Multiple ImntRNA molecules
[151] The present disclosure also provides methods for binding and/or modifying a target nucleotide sequence of interest. The methods include delivering a system comprising at least one ImntRNA or a polynucleotide encoding the same, and at least one fusion polypeptide comprising an RGN of the invention and a base-editing polypeptide, for example a cytidine deaminase or an adenosine deaminase, or a polynucleotide encoding the fusion polypeptide, to the target sequence or a cell, organelle, or embryo comprising the target sequence.
[152] One of ordinary skill in the art will appreciate that any of the presently disclosed methods can be used to target a single target sequence or multiple target sequences. Thus, methods comprise the use of a single RGN polypeptide in combination with multiple, distinct ImntRNAs, which can target multiple, distinct sequences within a single gene and/or multiple genes. Also encompassed herein are methods wherein multiple, distinct ImntRNAs are introduced in combination with multiple, distinct RGN polypeptides. These ImntRNAs and ImntRNA/RGN polypeptide systems can target multiple, distinct sequences within a single gene and/or multiple genes.
Protospacer adjacent motif (PAM) sequences
[153] In the context of the RGNs disclosed herein, the target nucleotide sequence of the RGNs is adjacent to a sequence called a protospacer adjacent motif (PAM). A protospacer adjacent motif is generally within 1 to 30 nucleotides from the target nucleotide sequence. A protospacer adjacent motif can be within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides from the target nucleotide sequence. In some embodiments, the PAM is 5' of the target sequence for the presently disclosed RGNs. Generally, the PAM is a consensus sequence of 2-4 nucleotides, but in particular embodiments, can be 2, 3, 4, 5, 6, 7, 8, 9, or more nucleotides in length.
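The adjacency between a 5' PAM and its target sequence can be illustrated with a minimal sketch that scans one strand of a DNA sequence for PAM matches and reports the nucleotides lying immediately 3' of each match. The PAM pattern, the target length, and the function name used here are illustrative assumptions only; they are not the PAM sequences of Table 6, and the sketch assumes the PAM directly abuts the target with no intervening nucleotides.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_targets_with_5prime_pam(seq, pam, target_len=20):
    """Return (pam_start, pam_match, target) for each PAM match on the given
    strand that is immediately followed (3') by target_len nucleotides.

    Hypothetical helper: the name and default length are assumptions.
    """
    seq = seq.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping PAM matches are all reported.
    pattern = re.compile(r"(?=(%s)([ACGT]{%d}))" % (pam_regex, target_len))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Example with an assumed "TTN" PAM and a short 4-nt target for readability.
sites = find_targets_with_5prime_pam("ACTTGAAACCCGG", "TTN", target_len=4)
```

Only the strand supplied is scanned; a fuller search would also scan the reverse complement.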
[155] In particular embodiments, the RGN or an active variant or fragment thereof binds respectively a target nucleotide sequence adjacent to a PAM sequence as set forth in Table 6.
[156] It is well-known in the art that PAM sequence specificity for a given nuclease enzyme is affected by enzyme concentration (see, e.g., Karvelis et al. (2015) Genome Biol 16:253), which may be modified by altering the promoter used to express the RGN, or the amount of ribonucleoprotein complex delivered to the cell, organelle, or embryo.
[157] Upon recognizing its corresponding PAM sequence, the RGN can cleave the target nucleotide sequence at a specific cleavage site. A cleavage site is made up of the two particular nucleotides within a target nucleotide sequence between which the nucleotide sequence is cleaved by an RGN. The cleavage site can comprise the 1st and 2nd, 2nd and 3rd, 3rd and 4th, 4th and 5th, 5th and 6th, 7th and 8th, or 8th and 9th nucleotides from the PAM in the 3' direction. In some embodiments, the cleavage site may be over 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more nucleotides from the PAM in the 3' direction. Preferably, the cleavage site is 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides from the PAM in the 3' direction.
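The definition of a cleavage site as the two nucleotides flanking the cut, counted from the PAM in the 3' direction, can be expressed as simple index arithmetic. The following is a minimal sketch under stated assumptions: the PAM lies 5' of the target, positions are 0-based indices on a single strand, and the function names are hypothetical. It does not model staggered cuts on the two strands of a duplex.

```python
def cleavage_site(pam_start, pam_len, n):
    """Return the 0-based indices of the n-th and (n+1)-th nucleotides
    counted from the PAM in the 3' direction (n = 1 is the first nucleotide
    after the PAM). The cut falls between these two nucleotides."""
    if n < 1:
        raise ValueError("n must be at least 1")
    first_after_pam = pam_start + pam_len
    left = first_after_pam + n - 1
    return left, left + 1


def cut(seq, pam_start, pam_len, n):
    """Split seq between the n-th and (n+1)-th nucleotides 3' of the PAM."""
    left, right = cleavage_site(pam_start, pam_len, n)
    if right >= len(seq):
        raise ValueError("cleavage site lies beyond the end of the sequence")
    return seq[:right], seq[right:]


# Example: a 3-nt PAM at index 0 and a cut between the 2nd and 3rd nucleotides.
five_prime, three_prime = cut("TTGAAACCC", pam_start=0, pam_len=3, n=2)
```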
Target nucleotide sequence
[158] The target polynucleotide of an RGN system can be any polynucleotide endogenous or exogenous to the eukaryotic cell. For example, the target polynucleotide can be a polynucleotide residing in the nucleus of the eukaryotic cell. The target polynucleotide can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., regions or introns). The target sequence is generally associated with a PAM (protospacer adjacent motif). The precise sequence and length requirements for the PAM differ depending on the RGN used, but PAMs are typically 2-5 base pair sequences adjacent to the protospacer (that is, the target sequence).
[159] A target nucleic acid can be single stranded DNA (ssDNA) or double stranded DNA (dsDNA). When the target DNA is single stranded, there is no preference or requirement for a PAM sequence in the target DNA. However, when the target DNA is dsDNA, a PAM is usually present adjacent to the target sequence of the target DNA (e.g., see discussion of the PAM elsewhere herein). The source of the target DNA can be the same as the source of the sample, e.g., as described below.
[160] The source of the target DNA can be any source. In some cases, the target DNA is a viral DNA (e.g., a genomic DNA of a DNA virus). As such, a subject method can be for detecting the presence of a viral DNA amongst a population of nucleic acids (e.g., in a sample). A subject method can also be used for the cleavage of non-target ssDNAs in the presence of a target DNA. For example, if a method takes place in a cell, a subject method can be used to promiscuously cleave non-target ssDNAs in the cell (ssDNAs that do not hybridize with the guide sequence of the guide RNA) when a particular target DNA is present in the cell (e.g., when the cell is infected with a virus and viral target DNA is detected).
[161] The target polynucleotide of an RGN/RNA complex may be a disease-associated gene or polynucleotide, or a gene or polynucleotide associated with a biological pathway.
[162] Examples of possible target DNAs include, but are not limited to, viral DNAs such as: a papovavirus (e.g., human papillomavirus (HPV), polyoma virus); a hepadnavirus (e.g., Hepatitis B Virus (HBV)); a herpesvirus (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis Rosea, Kaposi's sarcoma-associated herpesvirus); an adenovirus (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); a poxvirus (e.g., smallpox, vaccinia virus, cowpox virus, monkeypox virus, orf virus, pseudocowpox, bovine papular stomatitis virus; tanapox virus, yaba monkey tumor virus; molluscum contagiosum virus (MCV)); a parvovirus (e.g., adeno-associated virus (AAV), Parvovirus B19, human bocavirus, bufavirus, human parv4 G1); Geminiviridae; Nanoviridae; Phycodnaviridae; and the like. In some cases, the target DNA is parasite DNA. In some cases, the target DNA is bacterial DNA, e.g., DNA of a pathogenic bacterium.
RGN complexes with RNA
[163] RGNs can be complexed to a ImntRNA (ImntRNA/RGN complex) in order to deliver Cas in proximity with a target nucleic acid sequence. The ImntRNA is a polynucleotide that site-specifically guides a Cas nuclease, or a deactivated Cas nuclease, to a target nucleic acid region. The binding specificity is determined jointly by the complementary region on the cognate guide and a short DNA motif (protospacer adjacent motif or PAM) juxtaposed to the complementary region. The spacer present in the ImntRNA specifically hybridizes to a target nucleic acid sequence and determines the location of a Cas protein's site-specific binding and nucleolytic cleavage.
[164] RNA/Cas complexes can be produced using methods well known in the art. For example, the RNA of the complexes can be produced in vitro and RGN polypeptides can be recombinantly produced and then the RNA and RGN proteins can be complexed together using methods known in the art. Additionally, cell lines constitutively expressing RGN proteins can be developed and can be transfected with the ImntRNA components, and complexes can be purified from the cells using standard purification techniques, such as but not limited to affinity, ion exchange and size exclusion chromatography. See, e.g., Jinek M., et al., "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity," Science (2012) 337:816-821.
[165] Alternatively, the components, i.e., the ImntRNA and RGN polynucleotides may be provided separately to a cell, e.g., using separate constructs, or together, in a single construct, or in any combination, and complexes can be purified as above.
Variants of RGNs
[166] The present disclosure provides RGNs comprising at least 50, 100, 150, 200, 250, 300, 350, 400, 450 or more contiguous amino acid residues of the amino acid sequence provided above in Table 4. RNA-guided nucleases provided herein can comprise at least one nuclease domain (e.g., DNase, RNase domain) and at least one RNA recognition and/or RNA binding domain to interact with ImntRNAs. Further domains that can be found in RNA-guided nucleases provided herein include, but are not limited to, DNA binding domains, helicase domains, protein-protein interaction domains, and dimerization domains. In specific embodiments, the RNA-guided nucleases provided herein can comprise at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% sequence identity to one or more of the DNA binding domains, helicase domains, protein-protein interaction domains, and dimerization domains.
[167] While the activity of a variant or fragment may be altered compared to the polynucleotide or polypeptide of interest, the variant and fragment should retain the functionality of the polynucleotide or polypeptide of interest. For example, a variant or fragment may have increased activity, decreased activity, different spectrum of activity or any other alteration in activity when compared to the polynucleotide or polypeptide of interest.
[168] Fragments and variants of naturally-occurring RGN polypeptides, such as those disclosed herein, will retain sequence-specific, RNA-guided DNA-binding activity. In particular embodiments, fragments and variants of naturally-occurring RGN polypeptides, such as those disclosed herein, will retain nuclease activity (single-stranded or double-stranded).
[169] A biologically active variant of an RGN polypeptide of the invention may differ by as few as 1-15 amino acid residues, as few as 1-10, such as 6-10, as few as 5, as few as 4, as few as 3, as few as 2, or as few as 1 amino acid residue. In specific embodiments, the polypeptides can comprise an N-terminal or a C-terminal truncation, which can comprise at least a deletion of 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350 amino acids or more from either the N or C terminus of the polypeptide.
[170] A biologically active variant of an RGN polypeptide of the invention may differ by as few as 1 or 2 amino acids.
Variants of ImntRNA
[171] Fragments and variants of naturally-occurring ImntRNAs, such as those disclosed herein, will retain the ability, when part of a ImntRNA, to guide an RGN (complexed with the ImntRNA) to a target nucleotide sequence in a sequence-specific manner.
[172] Fragments and variants of naturally-occurring CRISPR repeats, such as those disclosed herein, will retain the ability, when part of a ImntRNA, to bind to and guide an RNA-guided nuclease (complexed with the ImntRNA) to a target nucleotide sequence in a sequence-specific manner.
[173] In particular embodiments, the ImntRNA comprises the nucleotide sequence as set out in Table 5, or an active variant or fragment thereof that is capable of directing the sequence-specific binding of an associated RGN provided herein to a target sequence of interest. In certain embodiments, an active ImntRNA sequence variant of a wild-type sequence comprises a nucleotide sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the ImntRNA nucleotide sequence set forth in Table 5. In certain embodiments, an active ImntRNA sequence fragment of a wild-type sequence comprises at least 6, or more contiguous nucleotides of the corresponding CRISPR repeat nucleotide sequence set forth in Table 5.
[174] In various embodiments, the anti-repeat region of the ImntRNA that is partially complementary to the CRISPR repeat sequence comprises from 6 nucleotides to 30 nucleotides, or more. For example, the region of base pairing between the anti-repeat sequence and the CRISPR repeat sequence can be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length. In particular embodiments, the anti-repeat region of the ImntRNA that is fully or partially complementary to a CRISPR repeat sequence is at least 6 nucleotides in length. In some embodiments, the degree of complementarity between a CRISPR repeat sequence and its corresponding anti-repeat sequence, when optimally aligned using a suitable alignment algorithm, is more than 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more.
[175] Biologically active variants of a CRISPR repeat may differ by as few as 1-15 nucleotides, as few as 1-10, such as 6-10, as few as 5, as few as 4, as few as 3, as few as 2, or as few as 1 nucleotide. In specific embodiments, the polynucleotides can comprise a 5' or 3' truncation, which can comprise at least a deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, or 25 nucleotides or more from either the 5' or 3' end of the polynucleotide.
Variants of spacer sequences
[176] In some embodiments, the degree of complementarity between a spacer sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is more than 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more. In particular embodiments, the spacer sequence is free of secondary structure, which can be predicted using any suitable polynucleotide folding algorithm known in the art, including but not limited to mFold (see, e.g., Zuker and Stiegler (1981) Nucleic Acids Res. 9:133-148) and RNAfold (see, e.g., Gruber et al. (2008) Cell 106(1):23-24).
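As an illustration of the degree of complementarity between a spacer and its target, the following minimal sketch counts Watson-Crick pairs between an RNA spacer and a DNA target strand of equal length. It is a simplification under stated assumptions: it performs an ungapped, antiparallel comparison rather than an optimal alignment, it does not count G-U wobble pairs, and the function name and example sequences are hypothetical.

```python
# Watson-Crick DNA partner for each RNA base of the spacer.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(spacer_rna, target_dna):
    """Percent of spacer positions (read 5'->3') that pair with the target
    strand (read 3'->5'). Both sequences are written 5'->3' on input."""
    spacer = spacer_rna.upper()
    target = target_dna.upper()
    if not spacer or len(spacer) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Antiparallel pairing: the spacer's first base pairs with the target's last.
    paired = sum(
        1
        for s, t in zip(spacer, reversed(target))
        if RNA_TO_DNA_PARTNER.get(s) == t
    )
    return 100.0 * paired / len(spacer)


# A fully complementary pair and a pair with one mismatch.
full = percent_complementarity("AUGC", "GCAT")
one_mismatch = percent_complementarity("AUGC", "GCAA")
```

A threshold such as "more than 90%" can then be checked by comparing the returned value against 90.0.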
[177] RGN proteins can have varying sensitivity to mismatches between a spacer sequence in a ImntRNA and its target sequence, which affects the efficiency of cleavage.
Nucleotides encoding RNA-guided nucleases, and/or ImntRNA
[178] The present disclosure provides polynucleotides comprising the presently disclosed ImntRNAs and polynucleotides comprising a nucleotide sequence encoding the presently disclosed RNA-guided nucleases and/or ImntRNAs. Presently disclosed polynucleotides include those comprising or encoding a CRISPR repeat sequence comprising the nucleotide sequence set forth in Table 5, or an active variant or fragment thereof that when comprised within a ImntRNA is capable of directing the sequence-specific binding of an associated RNA-guided nuclease to a target sequence of interest.
[179] Also disclosed are polynucleotides comprising or encoding an ImntRNA comprising the nucleotide sequence set forth in Table 5, or an active variant or fragment thereof that when comprised within a ImntRNA is capable of directing the sequence-specific binding of an associated RGN to a target sequence of interest. Polynucleotides are also provided that encode an RGN comprising the amino acid sequence set forth in Table 7 and active fragments or variants thereof that retain the ability to bind to a target nucleotide sequence in an RNA-guided sequence-specific manner.
[181] The expression cassette will include, in the 5'-3' direction of transcription, a transcriptional initiation region (i.e., a promoter) for an RGN- and/or an ImntRNA-encoding polynucleotide and a transcriptional termination region (i.e., termination region) functional in the organism of interest. The promoters in the context of the coding sequences mentioned above are capable of driving expression of a coding sequence in a host cell. The regulatory regions (e.g., promoters, transcriptional regulatory regions, and translational termination regions) may be endogenous or heterologous to the host cell or to each other. Preferably, the RGN-encoding and ImntRNA-encoding polynucleotides are under the control of different transcriptional initiation regions (e.g., promoters), each optimal for its individual expression.
[182] Additional regulatory signals may include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, and termination signals.
[183] In preparing the expression cassette, the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
[184] A number of promoters can be used in the practice of the invention. The promoters can be selected based on the desired outcome. The nucleic acids can be combined with constitutive, inducible, growth stage-specific, cell type-specific, tissue-preferred, tissue-specific, or other promoters for expression in the organism of interest.
[185] In some embodiments, the nucleic acid molecule comprises a tissue-preferred promoter. In some embodiments, the nucleic acid molecules encoding an RGN and/or ImntRNA comprise a cell type-specific promoter.
[186] The nucleic acid sequences encoding the RGNs and/or ImntRNAs can be operably linked to a promoter sequence that is recognized by a phage RNA polymerase, for example, for in vitro mRNA synthesis. For example, the promoter sequence can be a pol I, pol II, pol III, T7, T3, U6, CMV or SP6 promoter sequence or a variation of a T7, T3, U6, CMV or SP6 promoter sequence. In such embodiments, the expressed protein and/or RNAs can be purified for use in the methods of genome modification described herein. Any Pol II promoter and terminator can be used to express the RGN. The choice of a promoter depends on how strongly the RGN needs to be expressed and in what tissue type. In a preferred embodiment, the RGN is expressed using the CMV promoter. The ImntRNA can be expressed by Pol III promoters (e.g., U6 promoter) or Pol II promoters.
[187] In certain embodiments, the polynucleotide encoding the RGN also can be linked to a polyadenylation signal (e.g., SV40 polyA signal, or sv40 polyA with rmG terminator) and/or at least one transcriptional termination sequence. Additionally, the sequence encoding the RGN also can be linked to sequence(s) encoding at least one nuclear localization signal, at least one cell-penetrating domain, and/or at least one signal peptide capable of trafficking proteins to particular subcellular locations.
[188] In certain embodiments, the polynucleotide encoding ImntRNA can be linked to a stretch of A's for termination of expression.
Variants of polynucleotides
[189] For polynucleotides, conservative variants include those sequences that, because of the degeneracy of the genetic code, encode the native amino acid sequence of the gene of interest. Naturally occurring allelic variants such as these can be identified with the use of well-known molecular biology techniques, as, for example, with polymerase chain reaction (PCR) and hybridization techniques as outlined below. Variant polynucleotides also include synthetically derived polynucleotides, such as those generated, for example, by using site-directed mutagenesis but which still encode the polypeptide or the polynucleotide of interest. Generally, variants of a particular polynucleotide disclosed herein will have at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to that particular polynucleotide as determined by sequence alignment programs and parameters described elsewhere herein.
[190] Variants of a particular polynucleotide disclosed herein (i.e., the reference polynucleotide) can also be evaluated by comparison of the percent sequence identity between the polypeptide encoded by a variant polynucleotide and the polypeptide encoded by the reference polynucleotide. Percent sequence identity between any two polypeptides can be calculated using sequence alignment programs and parameters described elsewhere herein. Where any given pair of polynucleotides disclosed herein is evaluated by comparison of the percent sequence identity shared by the two polypeptides they encode, the percent sequence identity between the two encoded polypeptides is at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more.
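Once two sequences have been aligned, percent sequence identity reduces to counting identical columns. The sketch below assumes the alignment has already been produced by an external alignment program and is supplied as two equal-length strings with "-" marking gaps. The choice of denominator (all alignment columns that are not gap-only) is an assumption of this sketch; alignment programs differ on this point, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Columns in which both sequences carry a gap are ignored; a residue
    aligned to a gap counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold):
    """True if the aligned pair shares at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Example: one gapped column out of six gives 5/6 identity.
identity = percent_identity("MKT-LV", "MKTALV")
```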
[191] In particular embodiments, the presently disclosed polynucleotides encode an RGN polypeptide comprising an amino acid sequence having at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater identity to an amino acid sequence set forth in Table 4.
[192] Variant polynucleotides and proteins also encompass sequences and proteins derived from a mutagenic and recombinogenic procedure such as DNA shuffling. With such a procedure, one or more different RGN proteins disclosed herein is manipulated to create a new RGN protein possessing the desired properties. In this manner, libraries of recombinant polynucleotides are generated from a population of related sequence polynucleotides comprising sequence regions that have substantial sequence identity and can be homologously recombined in vitro or in vivo. For example, using this approach, sequence motifs encoding a domain of interest may be shuffled between the RGN sequences provided herein and other known RGN genes to obtain a new gene coding for a protein with an improved property of interest, such as an increased Km in the case of an enzyme. Strategies for such DNA shuffling are known in the art. See, for example, Stemmer (1994) Proc. Natl. Acad. Sci. USA
Codon-optimized sequences
[193] The nucleic acid molecules encoding RGNs and/or ImntRNA can be codon optimized for expression in a target cell or tissue of interest. Such polynucleotide coding sequence normally has its frequency of codon usage designed to mimic the frequency of preferred codon usage or transcription conditions of a particular host cell. Expression in the particular host cell or organism is enhanced as a result of the alteration of one or more codons at the nucleic acid level such that the translated amino acid sequence is not changed. Nucleic acid molecules can be codon optimized, either wholly or in part. Codon tables and other references providing preference information for a wide range of organisms are available in the art.
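The principle that codon optimization alters codons without changing the translated amino acid sequence can be sketched as follows. The standard genetic code is built in full, but the table of preferred codons is a hypothetical, partial example and does not represent measured codon usage of any particular host. The function names are likewise illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical preferred codons for an unspecified host (assumption).
PREFERRED = {"L": "CTG", "K": "AAG", "S": "AGC"}


def translate(cds):
    """Translate a DNA coding sequence whose length is a multiple of three."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds, preferred):
    """Replace each codon with the preferred synonymous codon, if one is given.

    Codons for amino acids absent from `preferred` are left unchanged, so the
    sequence may be optimized wholly or in part.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TO_AA[codon]
        new_codon = preferred.get(amino_acid, codon)
        if CODON_TO_AA[new_codon] != amino_acid:
            raise ValueError("preferred codon is not synonymous")
        optimized.append(new_codon)
    return "".join(optimized)


original = "ATGTTAAAATCATAA"
optimized = codon_optimize(original, PREFERRED)
```

Comparing `translate(original)` with `translate(optimized)` confirms that the encoded amino acid sequence is unchanged.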
Vectors
[194] The polynucleotide encoding the RGN and/or ImntRNA can be present in a vector or multiple vectors. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/mini-chromosomes, transposons, and viral vectors (e.g., lentiviral vectors, adeno-associated viral vectors, adenoviral vectors). The vector may comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcriptional termination sequences), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like (see, e.g., "Current Protocols in Molecular Biology" Ausubel et al., John Wiley & Sons, New York, 2003 or "Molecular Cloning: A Laboratory Manual" Sambrook & Russell, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 3rd edition, 2001).
[195] The vector may also comprise a selectable marker gene for the selection of transformed cells. Selectable marker genes are utilized for the selection of transformed cells or tissues. Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT), as well as genes conferring resistance to herbicidal compounds, such as glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D).
Systems for binding to the target nucleotide sequence of interest
[196] The present disclosure provides a system for binding a target sequence of interest, wherein the system comprises at least one ImntRNA or a nucleotide sequence encoding the same, and at least one RGN or a nucleotide sequence encoding the same, as described above. The ImntRNA hybridizes to the target sequence of interest and also binds to the RGN polypeptide, thereby directing the RGN polypeptide to the target sequence. In some of these embodiments, the RGN comprises an amino acid sequence set forth in Table 4 or an active variant or fragment thereof. In various embodiments, the ImntRNA comprises 6 or more nucleotides of the CRISPR repeat sequence comprising the nucleotide sequence set forth in Table 5 or an active variant or fragment thereof. In some embodiments, the ImntRNA comprises an RNA sequence comprising a nucleotide sequence set forth in Table 5, or an active variant or fragment thereof. In particular embodiments, the system comprises an RGN and at least one ImntRNA, wherein the RGN and ImntRNA are not complexed in nature. In some embodiments, the system comprises an ImntRNA and an RGN as described above. The rules for identifying RGN and ImntRNA scaffold sequences are provided above.
[197] The system for binding a target sequence of interest provided herein can be a ribonucleoprotein complex, which is at least one molecule of an RNA bound to at least one protein. The ribonucleoprotein complexes provided herein comprise at least one ImntRNA as the RNA component and an RGN as the protein component. Such ribonucleoprotein complexes can be purified from a cell or organism that naturally expresses an RGN polypeptide and has been engineered to express a particular ImntRNA that is specific for a target sequence of interest.
[198] Alternatively, the ribonucleoprotein complex can be purified from a cell or organism that has been transformed with polynucleotides that encode an RGN polypeptide and a ImntRNA and cultured under conditions to allow for the expression of the RGN polypeptide and guide RNA. Thus, methods are provided for making an RGN polypeptide or an RGN ribonucleoprotein complex. Such methods comprise culturing a cell comprising a nucleotide sequence encoding an RGN polypeptide under conditions in which the RGN polypeptide is expressed. In some embodiments the cell further comprises a nucleotide sequence encoding a ImntRNA. The RGN polypeptide or RGN ribonucleoprotein can then be purified from the cultured cells.
[199] Methods for purifying an RGN polypeptide or RGN ribonucleoprotein complex from a biological sample are known in the art (e.g., size exclusion and/or affinity chromatography, 2D-PAGE, HPLC, reversed-phase chromatography, immunoprecipitation). In particular, the RGN polypeptide can be recombinantly produced and comprises a purification tag to aid in its purification, including but not limited to, glutathione-S-transferase (GST), chitin binding protein (CBP), maltose binding protein, thioredoxin (TRX), poly(NANP), tandem affinity purification (TAP) tag, myc, AcV5, AU1, AU5, E, ECS, E2, FLAG, HA, nus, Softag 1, Softag 3, Strep, SBP, Glu-Glu, HSV, KT3, S, S1, T7, V5, VSV-G, 6xHis, 10xHis, biotin carboxyl carrier protein (BCCP), and calmodulin.
[200] Generally, the tagged RGN polypeptide or RGN ribonucleoprotein complex is purified using immobilized metal affinity chromatography. It will be appreciated that other similar methods known in the art may be used, including other forms of chromatography or for example immunoprecipitation, either alone or in combination.
[201] Some methods provided herein for binding and/or cleaving a target sequence of interest involve the use of an in vitro assembled RGN ribonucleoprotein complex. In vitro assembly of an RGN ribonucleoprotein complex can be performed using any method known in the art in which an RGN polypeptide is contacted with a guide RNA under conditions to allow for binding of the RGN polypeptide to the ImntRNA. The RGN polypeptide can be purified from a biological sample, cell lysate, or culture medium, produced via in vitro translation, or chemically synthesized. The ImntRNA can be purified from a biological sample, cell lysate, or culture medium, transcribed in vitro, or chemically synthesized. The RGN polypeptide and ImntRNA can be brought into contact in solution (e.g., buffered saline solution) to allow for in vitro assembly of the RGN ribonucleoprotein complex.
Delivery of the components to the target cells
[202] In some aspects, components of the present invention are delivered using nanoscale delivery systems, such as nanoparticles. Additionally, liposomes and other particulate delivery systems can be used. For example, vectors including the components of the present methods can be packaged in liposomes prior to delivery.
[203] As indicated, expression constructs comprising nucleotide sequences encoding the RGNs, and/or ImntRNA can be used to transform organisms of interest. Methods for transformation involve introducing a nucleotide construct into an organism of interest.
[204] The methods of the invention do not require a particular method for introducing a nucleotide construct to a host organism, only that the nucleotide construct gains access to the interior of a target cell. The host cell can be a eukaryotic or prokaryotic cell. In a particular embodiment, the eukaryotic host cell is a plant cell, a mammalian cell, or an insect cell. Methods for introducing nucleotide constructs into host cells are known in the art including, but not limited to, stable transformation methods, transient transformation methods, and virus-mediated methods.
[205] It is recognized that other exogenous or endogenous nucleic acid sequences or DNA fragments may also be incorporated into the host cell. Transformation of a host cell may be performed by infection, transfection, microinjection, electroporation, microprojection, biolistics or particle bombardment, silica/carbon fibers, ultrasound mediated, PEG mediated, calcium phosphate coprecipitation, polycation DMSO technique, DEAE dextran procedure, and viral mediated, liposome mediated and other similar methods. Viral-mediated introduction of a polynucleotide encoding an RGN and/or ImntRNA includes retroviral, lentiviral, adenoviral, and adeno-associated viral mediated introduction and expression.
[206] Transformation may result in stable or transient incorporation of the nucleic acid into the cell.
[207] The cells that have been transformed may be grown into a transgenic organism using well-known methods. Alternatively, cells that have been transformed may be introduced into an organism. These cells could have originated from the organism, wherein the cells are transformed in an ex vivo approach.
[208] The polynucleotides encoding the RGNs, and/or ImntRNAs can also be used to transform any prokaryotic cells, including but not limited to, archaea and bacteria.
[209] The polynucleotides encoding the RGNs, and/or ImntRNAs can be used to transform any eukaryotic cells, including but not limited to animal (e.g., mammals, insects, fish, birds, and reptiles), fungi, amoeba, algae, and yeast cells.
[210] Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a CRISPR system to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a nucleic acid described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
[211] Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., US 5,049,386 and lipofection reagents are widely available commercially. Delivery can be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).
Viral delivery for therapeutic applications
[212] The use of viral based systems for the delivery of nucleic acids allows targeting a virus to specific cells and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional viral based systems could include retroviral, lentiviral, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
[213] The tropism of a retrovirus can be altered by incorporating foreign envelope proteins, expanding the potential population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are composed of cis-acting long terminal repeats (LTRs) with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224
Transient expression and gene therapy applications
[214] In applications where transient expression is preferred, adenoviral based systems may be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus ("AAV") vectors may also be used to transduce cells with target nucleic acids. Construction of recombinant AAV vectors is described in a number of publications, including U.S. 5,173,414. Packaging cells are typically used to form virus particles that are capable of infecting a host cell.
[215] Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences.
[216] The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment, to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US20030087817.
Methods of modifying target nucleotide sequence
[217] In one aspect, the disclosure provides methods of modifying a target polynucleotide in a eukaryotic cell, which may be performed in vivo, ex vivo or in vitro. In some embodiments, the method comprises sampling a cell or population of cells from a human or non-human animal or plant (including microalgae) and modifying the cell or cells. Culturing may occur at any stage ex vivo. The cell or cells may even be re-introduced into the non-human animal or plant (including microalgae).
[218] The present disclosure provides methods for binding, cleaving, and/or modifying a target nucleotide sequence of interest. The methods include delivering a system comprising at least one ImntRNA or a polynucleotide encoding the same, and at least one RGN polypeptide or a polynucleotide encoding the same to the target sequence or a cell, organelle, or embryo comprising the target sequence. In some of these embodiments, the RGN comprises the amino acid sequence as disclosed above, or an active variant or fragment thereof. In various embodiments, the ImntRNA comprises a CRISPR repeat sequence comprising the nucleotide sequence as provided above, or an active variant or fragment thereof. In a particular embodiment, the ImntRNA comprises the nucleotide sequence as provided above, or an active variant or fragment thereof. The RGN of the system may be a nuclease-dead RGN, or may be a fusion polypeptide. In some embodiments, the fusion polypeptide comprises a base-editing polypeptide, for example a cytidine deaminase or an adenosine deaminase. In particular embodiments, the RGN and/or ImntRNA is heterologous to the cell, organelle, or embryo to which the RGN and/or ImntRNA (or polynucleotide(s) encoding at least one of the RGN and ImntRNA) are introduced.
[219] In those embodiments wherein the method comprises delivering a polynucleotide encoding a ImntRNA and/or an RGN polypeptide, the cell or embryo can then be cultured under conditions in which the ImntRNA and/or RGN polypeptide are expressed. In various embodiments, the method comprises contacting a target sequence with an RGN ribonucleoprotein complex. The RGN ribonucleoprotein complex may comprise an RGN that is nuclease dead or has nickase activity. In some embodiments, the RGN of the ribonucleoprotein complex is a fusion polypeptide comprising a base-editing polypeptide.
[220] In certain embodiments, the method comprises introducing into a cell, organelle, or embryo comprising a target sequence an RGN ribonucleoprotein complex. The RGN ribonucleoprotein complex can be one that has been purified from a biological sample, recombinantly produced and subsequently purified, or in vitro-assembled as described herein. In those embodiments wherein the RGN ribonucleoprotein complex that is contacted with the target sequence or a cell, organelle, or embryo has been assembled in vitro, the method can further comprise the in vitro assembly of the complex prior to contact with the target sequence, cell, organelle, or embryo.
[221] A purified or in vitro assembled RGN ribonucleoprotein complex can be introduced into a cell, organelle, or embryo using any method known in the art, including, but not limited to electroporation. Alternatively, an RGN polypeptide and/or polynucleotide encoding or comprising the ImntRNA can be introduced into a cell, organelle, or embryo using any method known in the art.
[222] Upon delivery to or contact with the target sequence or cell, organelle, or embryo comprising the target sequence, the ImntRNA directs the RGN to bind to the target sequence in a sequence-specific manner. In those embodiments wherein the RGN has nuclease activity, the RGN polypeptide cleaves the target sequence of interest upon binding. The target sequence can subsequently be modified via endogenous repair mechanisms, such as non-homologous end joining, or homology-directed repair with a provided donor polynucleotide.
[223] Methods to measure binding of an RGN polypeptide to a target sequence are known in the art and include chromatin immunoprecipitation assays, gel mobility shift assays, DNA pull-down assays, reporter assays, and microplate capture and detection assays. Likewise, methods to measure cleavage or modification of a target sequence are known in the art and include in vitro or in vivo cleavage assays wherein cleavage is confirmed using PCR, sequencing, or gel electrophoresis, with or without the attachment of an appropriate label (e.g., radioisotope, fluorescent substance) to the target sequence to facilitate detection of degradation products. Alternatively, the nicking triggered exponential amplification reaction (NTEXPAR) assay can be used (see, e.g., Zhang et al. (2016) Chem. Sci. 7:4951-4957). In vivo cleavage can be evaluated using the Surveyor assay (Guschin et al. (2010) Methods Mol Biol 649:247-256).
[224] In some embodiments, the methods involve the use of a single type of RGN complexed with more than one ImntRNA. The more than one guide RNA can target different regions of a single gene or can target multiple genes.
[225] In those embodiments wherein a donor polynucleotide is not provided, a double-stranded break introduced by an RGN polypeptide can be repaired by a non-homologous end-joining (NHEJ) repair process. Due to the error-prone nature of NHEJ, repair of the double-stranded break can result in a modification to the target sequence. Modification of the target sequence can result in the expression of an altered protein product or inactivation of a coding sequence.
[226] In those embodiments wherein a donor polynucleotide is present, the donor sequence in the donor polynucleotide can be integrated into or exchanged with the target nucleotide sequence during the course of repair of the introduced double-stranded break, resulting in the introduction of the exogenous donor sequence. A donor polynucleotide thus comprises a donor sequence that is desired to be introduced into a target sequence of interest. In some embodiments, the donor sequence alters the original target nucleotide sequence such that the newly integrated donor sequence will not be recognized and cleaved by the RGN. Integration of the donor sequence can be enhanced by the inclusion within the donor polynucleotide of flanking sequences that have substantial sequence identity with the sequences flanking the target nucleotide sequence, allowing for a homology-directed repair process. In those embodiments wherein the RGN polypeptide introduces double-stranded staggered breaks, the donor polynucleotide can comprise a donor sequence flanked by compatible overhangs, allowing for direct ligation of the donor sequence to the cleaved target nucleotide sequence comprising overhangs by a non-homologous repair process during repair of the double-stranded break.
[227] In various embodiments, a method is provided for binding a target nucleotide sequence and detecting the target sequence, wherein the method comprises introducing into a cell, organelle, or embryo at least one guide RNA or a polynucleotide encoding the same, and at least one RGN polypeptide or a polynucleotide encoding the same, expressing the guide RNA and/or RGN polypeptide (if coding sequences are introduced), wherein the RGN polypeptide is a nuclease-dead RGN and further comprises a detectable label, and the method further comprises detecting the detectable label. The detectable label may be fused to the RGN as a fusion protein (e.g., fluorescent protein) or may be a small molecule conjugated to or incorporated within the RGN polypeptide that can be detected visually or by other means.
Methods of modulating gene expression
[228] Also provided herein are methods for modulating the expression of a target sequence or a gene of interest under the regulation of a target sequence. The methods comprise introducing into a cell, organelle, or embryo at least one ImntRNA or a polynucleotide encoding the same, and at least one RGN polypeptide or a polynucleotide encoding the same, expressing the ImntRNA and/or RGN polypeptide (if coding sequences are introduced), wherein the RGN polypeptide is a nuclease-dead RGN. In some of these embodiments, the nuclease-dead RGN is a fusion protein comprising an expression modulator domain (i.e., epigenetic modification domain, transcriptional activation domain or a transcriptional repressor domain) as described herein.
Methods for detecting ssDNA
[229] An RGN polypeptide of the present disclosure, once activated by detection of a target DNA (double or single stranded), can cleave non-targeted single stranded DNA (ssDNA). Once an RGN polypeptide is activated by a ImntRNA, after hybridization of ImntRNA with a target sequence of a target DNA, the protein becomes a nuclease that promiscuously cleaves ssDNAs. Thus, when the target DNA is present in the sample, the result is cleavage of ssDNAs in the sample, which can be detected using any common detection method (such as using a labeled single stranded DNA).
[230] Hence, the present disclosure provides systems and methods for detecting a target DNA (double stranded or single stranded) in a sample. In some cases, a detector DNA is used that is single stranded (ssDNA) and does not hybridize with the ImntRNA (i.e., the detector ssDNA is a non-target ssDNA). Such methods comprise steps of: (a) contacting the sample with: (i) an RGN polypeptide; (ii) a ImntRNA comprising: a region that binds to the RGN polypeptide, and a spacer sequence that hybridizes with the target DNA; and (iii) a detector DNA that is single stranded and does not hybridize with the spacer sequence; and (b) measuring a detectable signal produced by cleavage of the single stranded detector DNA by the RGN polypeptide, thereby detecting the target DNA.
[231] The contacting step of a subject method can be carried out in a composition comprising divalent metal ions. The contacting step can be carried out outside of a cell. The contacting step can be carried out inside a cell. The contacting step can be carried out in a cell in vitro. The contacting step can be also carried out in a cell ex vivo. The contacting step can be carried out in a cell in vivo.
[232] In some embodiments the sample is contacted for 2 hours or less (e.g., 1.5 hours or less, 1 hour or less, 40 minutes or less, 30 minutes or less, 20 minutes or less, 10 minutes or less, or 5 minutes or less, or 1 minute or less), under conditions that provide for trans cleavage of the detector DNA. Conditions that provide for trans cleavage of the detector DNA include temperature conditions such as from 17°C to 39°C (e.g., 37°C).
[233] Methods for detecting ssDNA have been described for example in Chen, et al (2018). Science 360(6387), 436-439 or Kaminski, et al. Nat Biomed Eng 5, 643-656 (2021).
Kits
[234] In one aspect, the invention provides kits containing any one or more of the elements disclosed in the above methods and compositions. In some embodiments, the kit comprises a vector system and instructions for using the kit. In some embodiments, the vector system comprises (a) a first regulatory element operably linked to a ImntRNA sequence and one or more insertion sites for inserting a guide sequence downstream of the ImntRNA sequence, wherein when expressed, the ImntRNA directs sequence-specific binding of a CRISPR complex to a target sequence in a eukaryotic cell, wherein the CRISPR complex comprises a CRISPR enzyme complexed with (1) the ImntRNA sequence that is hybridized to the target sequence, and (2) a second regulatory element operably linked to an enzyme coding sequence encoding said CRISPR enzyme comprising a nuclear localization sequence. Elements may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, or a tube.
[235] In some embodiments, the kit includes instructions in one or more languages. In some embodiments, a kit comprises one or more reagents for use in a process utilizing one or more of the elements described herein. Reagents may be provided in any suitable container. For example, a kit may provide one or more reaction or storage buffers. Reagents may be provided in a form that is usable in a particular assay, or in a form that requires addition of one or more other components before use (e.g., in concentrate or lyophilized form). A buffer can be any buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof. In some embodiments, the buffer is alkaline. In some embodiments, the buffer has a pH from 7 to 10.
[236] In some embodiments, the kit comprises one or more oligonucleotides corresponding to a guide sequence for insertion into a vector so as to operably link the guide sequence and a regulatory element. In some embodiments, the kit comprises a homologous recombination template polynucleotide. In one aspect, the invention provides methods for using one or more elements of a CRISPR system. The CRISPR complex of the invention provides an effective means for modifying a target polynucleotide. The CRISPR complex of the disclosure has a wide variety of utility including modifying (e.g., deleting, inserting, translocating, inactivating, activating) a target polynucleotide in a multiplicity of cell types. As such the CRISPR complex of the invention has a broad spectrum of applications in, e.g., gene therapy, drug screening, disease diagnosis, and prognosis. An exemplary CRISPR complex comprises a CRISPR enzyme complexed with a guide sequence hybridized to a target sequence within the target polynucleotide.
Cells comprising the RGN systems
[237] Provided herein are cells and organisms comprising a target sequence of interest that has been modified using a process or the system based on an RGN and/or ImntRNA as described herein. Also provided are cells and organisms comprising the system for binding a target sequence of interest comprising an RGN and/or ImntRNA as described herein.
[238] In some of these embodiments, the RGN comprises the amino acid sequence as disclosed above, or an active variant or fragment thereof. In various embodiments, the ImntRNA comprises a CRISPR repeat sequence comprising the nucleotide sequence as disclosed above, or an active variant or fragment thereof. In particular embodiments, the ImntRNA comprises the nucleotide sequence as disclosed above, or an active variant or fragment thereof. The modified cells can be eukaryotic (e.g., mammalian, plant, insect cell) or prokaryotic. Also provided are organelles and embryos comprising at least one nucleotide sequence that has been modified by a process utilizing an RGN and/or ImntRNA as described herein. The genetically modified cells, organisms, organelles, and embryos can be heterozygous or homozygous for the modified nucleotide sequence.
[239] The chromosomal modification of the cell, organism, organelle, or embryo can result in altered expression (up-regulation or down-regulation), inactivation, or the expression of an altered protein product or an integrated sequence. In those instances wherein the chromosomal modification results in either the inactivation of a gene or the expression of a non-functional protein product, the genetically modified cell, organism, organelle, or embryo is referred to as a “knock-out”. The knock-out phenotype can be the result of a deletion mutation (i.e., deletion of at least one nucleotide), an insertion mutation (i.e., insertion of at least one nucleotide), or a nonsense mutation (i.e., substitution of at least one nucleotide such that a stop codon is introduced).
[240] Alternatively, the chromosomal modification of a cell, organism, organelle, or embryo can produce a “knock-in”, which results from the chromosomal integration of a nucleotide sequence that encodes a protein. In some of these embodiments, the coding sequence is integrated into the chromosome such that the chromosomal sequence encoding the wild-type protein is inactivated, but the exogenously introduced protein is expressed.
[241] In other embodiments, the chromosomal modification results in the production of a variant protein product. The expressed variant protein product can have at least one amino acid substitution and/or the addition or deletion of at least one amino acid. The variant protein product encoded by the altered chromosomal sequence can exhibit modified characteristics or activities when compared to the wild-type protein, including but not limited to altered enzymatic activity or substrate specificity.
[242] In yet other embodiments, the chromosomal modification can result in an altered expression pattern of a protein. As a non-limiting example, chromosomal alterations in the regulatory regions controlling the expression of a protein product can result in the overexpression or downregulation of the protein product or an altered tissue or temporal expression pattern.
Pharmaceutical compositions
[243] The polypeptides, nucleic acids and vectors of the present disclosure may be in a form of a pharmaceutical composition. The pharmaceutical composition may comprise 1 ng to 10 mg of DNA encoding the RGN/lmntRNA-based system or RGN/lmntRNA-based system protein component, i.e., the fusion protein. The pharmaceutical composition may comprise 1 ng to 10 mg of the DNA of the modified lentiviral vector. The pharmaceutical composition may comprise 1 ng to 10 mg of the DNA of the modified AAV vector and a nucleotide sequence encoding the site-specific nuclease. The pharmaceutical compositions according to the present invention can be formulated according to the mode of administration to be used. In cases where pharmaceutical compositions are injectable pharmaceutical compositions, they are sterile, pyrogen free and particulate free. An isotonic formulation is preferably used. Generally, additives for isotonicity may include sodium chloride, dextrose, mannitol, sorbitol and lactose. In some cases, isotonic solutions such as phosphate buffered saline are preferred. Stabilizers include gelatin and albumin. In some embodiments, a vasoconstriction agent is added to the formulation.
[244] The composition may further comprise a pharmaceutically acceptable excipient. The pharmaceutically acceptable excipient may be functional molecules such as vehicles, adjuvants, carriers, or diluents. The pharmaceutically acceptable excipient may be a transfection facilitating agent, which may include surface active agents, such as immune-stimulating complexes (ISCOMS), Freund's incomplete adjuvant, LPS analog including monophosphoryl lipid A, muramyl peptides, quinone analogs, vesicles such as squalene and squalene, hyaluronic acid, lipids, liposomes, calcium ions, viral proteins, polyanions, polycations, or nanoparticles, or other known transfection facilitating agents.
[245] The transfection facilitating agent can be a polyanion, polycation, including poly-L-glutamate (LGS), or lipid. Preferably, the transfection facilitating agent is poly-L-glutamate, and more preferably, the poly-L-glutamate is present in the composition for genome editing in skeletal muscle or cardiac muscle at a concentration less than 6 mg/ml. The transfection facilitating agent may also include surface active agents such as immune-stimulating complexes (ISCOMS), Freund's incomplete adjuvant, LPS analog including monophosphoryl lipid A, muramyl peptides, quinone analogs, and vesicles such as squalene and squalene; hyaluronic acid may also be administered in conjunction with the genetic construct. In some embodiments, the DNA vector encoding the composition may also include a transfection facilitating agent such as lipids, liposomes, including lecithin liposomes or other liposomes known in the art, as a DNA-liposome mixture (see, for example, WO9324640), calcium ions, viral proteins, polyanions, polycations, or nanoparticles, or other known transfection facilitating agents. Preferably, the transfection facilitating agent is a polyanion, polycation, including poly-L-glutamate (LGS), or lipid.
[246] The sequences included in the present invention are shown in Tables 8-12:
EXAMPLES
Example 1: Identification of novel RNA-Guided Nucleases
[252] The identification of RGNs was performed based on the methods described, for example, in Russel et al. (2020) The CRISPR Journal 3(6):462-469. Metagenomic samples were searched for open reading frames (ORFs) and those that were predicted to be genes were selected. A hidden Markov model (HMM) was used to compare the putative genes to profiles of known Cas proteins. The identified Cas genes were grouped into operons, and the operon type was determined based on the presence of known signature genes. For each genome, the CRISPR arrays were identified based on the presence of regularly spaced repeats. The subtype of each CRISPR array was predicted using machine learning. Cas operons were linked to CRISPR arrays if they were less than 10 kilobases apart.
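The final linking step of Example 1 (pairing a Cas operon with a CRISPR array when the two lie less than 10 kilobases apart) can be illustrated with a minimal Python sketch. This is not the pipeline of Russel et al.; the `Feature` record, the 0-based half-open coordinates, the same-contig requirement, and the function names are assumptions introduced only for illustration.

```python
from dataclasses import dataclass

MAX_GAP_BP = 10_000  # "less than 10 kilobases apart", taken from the text


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for a Cas operon or a CRISPR array."""
    name: str
    contig: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)


def gap_bp(a: Feature, b: Feature) -> int:
    """Distance in bp between two features; 0 if they touch or overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def link_operons_to_arrays(operons, arrays, max_gap=MAX_GAP_BP):
    """Return (operon, array) name pairs on the same contig closer than max_gap."""
    links = []
    for op in operons:
        for arr in arrays:
            if op.contig == arr.contig and gap_bp(op, arr) < max_gap:
                links.append((op.name, arr.name))
    return links


# Toy coordinates, not data from the disclosure.
operons = [Feature("operon1", "contig1", 1_000, 5_000)]
arrays = [
    Feature("arrayA", "contig1", 9_000, 9_500),    # 4 kb away: linked
    Feature("arrayB", "contig1", 20_000, 20_500),  # 15 kb away: not linked
    Feature("arrayC", "contig2", 5_100, 5_600),    # other contig: not linked
]
links = link_operons_to_arrays(operons, arrays)
```

With the toy coordinates above, only `operon1` and `arrayA` are paired.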
Example 2: Determination of PAM requirements for each RGN through Bacterial PAM Depletion
[253] PAM requirements for each RGN were determined using a bacterial PAM depletion assay essentially adapted from Kleinstiver et al. (2015) Nature 523:481-485, Zetsche et al. (2015) Cell 163:759-771, and Karvelis et al. (2020) Nucleic Acids Res. 48(9):5016-5023. Briefly, two plasmid libraries (C2 and T2) were generated in a pUC18 backbone (ampR), with each containing a distinct 23 bp protospacer (target) sequence flanked by 8 random nucleotides (i.e., the PAM region). The target sequence and flanking PAM region of library T2 and library C2 for each RGN are set forth in Table 13.
[255] The libraries were separately electroporated into T7 Express E. coli (NEB) cells harboring pET28b expression vectors containing the minimal CRISPR operon, with the repeat-spacer array modified to contain three copies of the intended library target sequence at the average spacer length of the CRISPR repeat. Sufficient library plasmid was used in the transformation reaction to obtain >10^8 cfu. The modified minimal CRISPR operon in the pET28b backbone was under the control of T7 promoters. The transformation reaction was allowed to recover for 1 h, after which it was diluted into LB media containing carbenicillin and kanamycin and grown overnight. The following day the mixture was diluted into self-inducing Overnight Express™ Instant TB Medium (Millipore Sigma) to allow expression of the RGN and ImntRNA, grown for an additional 4 h at 37°C, and then shifted to 30°C for an additional 16 h, after which the cells were spun down and plasmid DNA was isolated with a Mini-prep kit (Qiagen, Germantown, MD). In the presence of the appropriate ImntRNA, plasmids containing a PAM that is recognizable by the RGN will be cleaved, resulting in their removal from the population. Plasmids containing PAMs that are not recognizable by the RGN, or that are transformed into bacteria not containing an appropriate ImntRNA, will survive and replicate. The PAM and protospacer regions of uncleaved plasmids were PCR-amplified and prepared for sequencing following published protocols (16S metagenomic library prep guide 15044223B, Illumina, San Diego, CA). Deep sequencing (55 bp paired-end reads) was performed on a NextSeq (Illumina). Typically, 1-4M reads were obtained per amplicon. PAM regions were extracted, counted, and normalized to total reads for each sample. PAMs that lead to plasmid cleavage were identified by being underrepresented when compared to controls (i.e., when the library is transformed into E. coli containing the RGN but lacking an appropriate ImntRNA). To identify the PAM requirements for a novel RGN, an enrichment value was computed for each kmer as the difference between the library size-normalized read counts in the control sample and in the targeting sample. This value was rounded to the nearest integer for positive numbers and set to zero for negative numbers. Enrichment values were then summed across all kmers to yield a position frequency matrix, which was represented visually as a sequence logo using the command line utility weblogo. Those RGNs with consistency among the most enriched kmers (sequence logo information content > 0.2 when including the top 100 enriched kmers) and with qualitatively consistent PAMs across plasmid libraries (T2 and C2) were deemed to have bona fide PAMs. The final PAM for each of these RGNs was obtained by summing counts across both plasmid libraries, normalizing counts, computing kmer enrichment values, summing across kmers to yield a position frequency matrix, then visually representing the PAM as a sequence logo using the command line utility weblogo.
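The enrichment calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the Example: the scaling constant `scale` (reads per common library size), the function names, and the toy counts are hypothetical, and Python's built-in `round` resolves exact halves to the nearest even integer, which may differ from the rounding rule actually applied.

```python
BASES = "ACGT"


def normalize(counts, scale=1_000_000):
    """Library size normalization: counts per `scale` reads (scale is an assumption)."""
    total = sum(counts.values())
    return {kmer: n * scale / total for kmer, n in counts.items()}


def enrichment(control, targeting, scale=1_000_000):
    """Control minus targeting normalized counts; rounded if positive, else zero."""
    ctrl = normalize(control, scale)
    targ = normalize(targeting, scale)
    values = {}
    for kmer, c in ctrl.items():
        diff = c - targ.get(kmer, 0.0)
        values[kmer] = round(diff) if diff > 0 else 0
    return values


def position_frequency_matrix(enrich):
    """Sum enrichment values per base and position across all kmers."""
    if not enrich:
        return []
    length = len(next(iter(enrich)))
    pfm = [dict.fromkeys(BASES, 0) for _ in range(length)]
    for kmer, weight in enrich.items():
        for pos, base in enumerate(kmer):
            pfm[pos][base] += weight
    return pfm


# Toy 2-mer counts (not data from the disclosure): "AA" is depleted when targeted.
control = {"AA": 50, "AC": 50}
targeting = {"AA": 10, "AC": 90}
enrich = enrichment(control, targeting, scale=100)
pfm = position_frequency_matrix(enrich)
```

In this toy case only the depleted kmer "AA" receives a positive enrichment value, so the matrix carries weight for A at both positions. A matrix of this kind is the input that a logo tool such as weblogo would render.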
Example 3: ImntRNA identification
[256] Systems with an identifiable PAM were grown without the PAM library to mid-log phase, pelleted, and flash frozen. RNA was isolated from the pellets using a mirVANA miRNA Isolation Kit (Life Technologies, Carlsbad, CA), and sequencing libraries were prepared from the isolated RNA using an NEBNext Small RNA Library Prep kit (NEB, Beverly, MA). The library prep was fractionated on a 6% polyacrylamide gel to capture RNA species shorter than 200 nt in order to detect crRNAs and tracrRNAs. Deep sequencing (75 bp paired-end) was performed on a NextSeq 500 (High Output kit). Reads were quality trimmed using Cutadapt and mapped to reference genomes using Bowtie2. A custom RNA-seq pipeline was written to detect the expressed small non-coding RNA transcripts. Processed boundaries were determined by sequence coverage of the native locus. RNA sequencing depth confirmed the boundaries of the ImntRNA by identifying the transcript containing the motif of MGGGY GN4- sCRYCCK (Figs. 12-15). Manual curation of RNAs was performed using secondary structure prediction by NUPACK, an RNA folding software. ImntRNA cassettes were prepared by DNA synthesis and were generally designed as follows (5'->3'): processed ImntRNA operably linked at its 3' end to a 20-30 bp spacer sequence.
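The cassette layout stated above (processed ImntRNA followed at its 3' end by a 20-30 bp spacer) can be expressed as a short Python sketch. The function name, the validation rules, and the placeholder sequences are assumptions for illustration only; none of the sequences shown are sequences of the disclosure.

```python
DNA_ALPHABET = set("ACGT")


def build_cassette(processed_rna_dna, spacer, min_len=20, max_len=30):
    """Concatenate 5'->3': processed ImntRNA (as DNA) followed by the spacer.

    The 20-30 bp spacer range comes from the text; the checks are illustrative.
    """
    processed = processed_rna_dna.upper()
    spacer = spacer.upper()
    if not processed or set(processed) - DNA_ALPHABET or set(spacer) - DNA_ALPHABET:
        raise ValueError("sequences must be non-empty and contain only A, C, G, T")
    if not min_len <= len(spacer) <= max_len:
        raise ValueError(f"spacer must be {min_len}-{max_len} bp, got {len(spacer)}")
    return processed + spacer


# Placeholder sequences only.
cassette = build_cassette("ACGTACGT", "G" * 23)
```

A spacer outside the 20-30 bp range is rejected rather than silently truncated, so a design error surfaces before synthesis is ordered.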
[257] For in vitro assays, ImntRNAs were synthesized by in vitro transcription of the ImntRNA cassettes with a GeneArt™ Precision gRNA Synthesis Kit (ThermoFisher). Activity was confirmed by combining the purified RGN with the ImntRNA in 20 mM HEPES (pH 7.5 at 37°C), 25 mM NaCl, 1 mM DTT, and 5 mM MgCl2 (Reaction Buffer) for 30 min at 37°C. The ribonucleoprotein (RNP) complex was then added in excess to linear dsDNA in the reaction buffer and incubated at various temperatures for 30 min. The reaction was then inactivated with EDTA, Proteinase K, and RNase A before being run on a denaturing PAGE gel. Cleavage was visually confirmed (Fig. 15) and quantified (Fig. 16).
Example 4: Trans Activated DNA Cleavage by Cas12m proteins
[258] ImntRNAs and elmntRNAs were synthesized by in vitro transcription of the ImntRNA cassettes with a GeneArt™ Precision gRNA Synthesis Kit (ThermoFisher). Activity was confirmed by combining the purified RGN with the ImntRNA in 20 mM HEPES (pH 7.5 at 37°C), 25 mM NaCl, 1 mM DTT, and 5 mM MgCl2 (Reaction Buffer) for 30 min at 37°C. The ribonucleoprotein (RNP) complex was then added in excess to linear dsDNA or ssDNA that matched the target sequence of the ImntRNA, or to no target DNA, along with M13 ssDNA in the reaction buffer and incubated at 37°C for a time course. The reaction was then inactivated with EDTA, Proteinase K, and RNase A before being run on an agarose gel. Trans activated cleavage of the M13 ssDNA was visually confirmed (Fig. 20).
Example 5: Demonstration of active CRISPR array-based DNA interference in bacteria cells
[259] PAM requirements for each RGN were determined using a bacterial PAM depletion assay essentially adapted from Karvelis, et al. (2020). Nucleic acids research, 48(9), 5016-5023. Briefly, plasmids containing the C2 library sequence or the wild-type spacer sequences of two different spacers with the appropriate PAM sequence were synthesized in pTwist Amp High Copy or pTwist Chlor High Copy (Twist Bioscience) and transformed into T7 Express E. coli (NEB) cells harboring pET28b expression vectors containing the minimal CRISPR operon, with the repeat-spacer array truncated to contain only three spacer sequences, including the C2 library sequence in the distal-most repeat. The modified minimal CRISPR operon in the pET28b backbone was under the control of T7 promoters. The transformation reaction was allowed to recover for 1 hr, after which it was plated in a 1:10 serial dilution onto LB agar containing IPTG, kanamycin, and the specific antibiotic for the targeting plasmid (either carbenicillin or chloramphenicol) and grown overnight at 37°C. The plates were compared to non-target control sequences that were on the same backbone but did not contain a matching spacer/PAM sequence. Active interference was defined as a greater growth density on negative (non-target) plasmids than on target plasmids (Figs. 7-10).
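The plate comparison described in this example can be sketched in a few lines. This is only an illustrative sketch: the function names, the idea of counting colonies at a single dilution step, and the example counts are assumptions, and the text gives no numeric cutoff beyond "greater" growth on the non-target control.

```python
def relative_cfu(colonies: int, dilution_step: int, dilution_factor: int = 10) -> int:
    """Back-calculate relative CFU from colonies counted at one step of a 1:10 dilution series.

    dilution_step = 0 is the undiluted plating, 1 is the first 1:10 dilution, and so on.
    """
    if colonies < 0 or dilution_step < 0:
        raise ValueError("colonies and dilution_step must be non-negative")
    return colonies * dilution_factor ** dilution_step


def shows_interference(target_cfu: float, non_target_cfu: float) -> bool:
    """Call interference when the non-target control grows more densely than the target plasmid."""
    return non_target_cfu > target_cfu


# Hypothetical counts: 4 colonies at the first dilution for the target plasmid,
# 12 colonies at the third dilution for the non-target control.
target = relative_cfu(4, 1)
control = relative_cfu(12, 3)
print(target, control, shows_interference(target, control))
```

In practice growth density may be scored by eye from the dilution spots; the colony-count form above is just one way to make the comparison explicit.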
Example 6: Demonstration of gene editing activity on endogenous targets in mammalian cells
[260] The RGN was codon-optimized for human expression and cloned into expression cassettes with an N-terminal SV40 NLS and a C-terminal FLAG tag and c-myc NLS, under control of a CMV promoter for mammalian expression. The sequences are set forth in Table 14.
[262] ImntRNA expression constructs, each encoding a single ImntRNA under the control of a human RNA polymerase III U6 promoter, were produced and introduced into an expression vector containing GFP under control of a CMV promoter. Guides were designed to target regions of selected genes with the appropriate PAM for the system. The constructs described were introduced into mammalian cells. One day prior to transfection, HEK293T cells (Sigma) were plated in 24-well dishes in Dulbecco’s modified Eagle medium (DMEM) plus 10% (vol/vol) fetal bovine serum (Gibco) and 1% Penicillin-Streptomycin (Gibco). The next day, when the cells were at 50-60% confluency, 500 ng of an RGN expression plasmid plus 500 ng of a single ImntRNA expression plasmid were co-transfected using 1.5 uL of Lipofectamine 3000 (Thermo Scientific) per well, following the manufacturer’s instructions. After 48 hours of growth, total genomic DNA was harvested using a genomic DNA isolation kit (Macherey-Nagel) according to the manufacturer’s instructions.
[263] The total genomic DNA was then analyzed to determine the rate of editing in the targeted gene. Oligonucleotides were produced to be used for PCR amplification and subsequent analysis of the amplified genomic target site. All PCR reactions were performed using 10 uL of 2X Master Mix Platinum SuperFi DNA polymerase (Thermo Scientific) in a 20 uL reaction including 0.5 uM of each primer specific for each guide using a program of: 98°C, 1 min; 35 cycles of [98°C, 10 sec; 65°C, 15 sec; 72°C, 30 sec]; 72°C, 5 min; 12°C, forever. Primers for PCR#2 include Nextera Read 1 and Read 2 Transposase Adapter overhang sequences for Illumina sequencing.
[264] Following the PCR amplification, DNA was cleaned using a PCR cleanup kit (Zymo) according to the manufacturer’s instructions and eluted in water. Products containing the Illumina overhang sequences underwent library preparation following the Illumina 16S Metagenomic Sequencing Library protocol. Deep sequencing was performed on an Illumina NextSeq platform. Typically, 200,000 150 bp paired-end reads (2 x 100,000 reads) were generated per amplicon. The reads were analyzed using CRISPResso (Pinello, et al. 2016 Nature Biotech, 34:695-697) to calculate the rates of editing. Output alignments were hand-curated to confirm insertion and deletion sites as well as to identify microhomology sites at the recombination sites. The overall rates of editing for actively edited samples are shown in Table 15.
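The editing-rate calculation in this paragraph amounts to the fraction of aligned reads that carry an insertion or deletion. The sketch below illustrates that arithmetic only; it is not the CRISPResso interface, and the `Allele` record, its field names, and the example counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Allele:
    """One distinct aligned allele for an amplicon (hypothetical record layout)."""
    reads: int      # number of reads supporting this allele
    inserted: int   # bases inserted relative to the reference amplicon
    deleted: int    # bases deleted relative to the reference amplicon


def indel_rate_percent(alleles: Iterable[Allele]) -> float:
    """Percentage of aligned reads whose allele contains at least one inserted or deleted base."""
    alleles = list(alleles)
    total = sum(a.reads for a in alleles)
    if total == 0:
        raise ValueError("no aligned reads")
    edited = sum(a.reads for a in alleles if a.inserted > 0 or a.deleted > 0)
    return 100.0 * edited / total


# Hypothetical amplicon: 900 unmodified reads, 60 reads with a 7 bp deletion,
# 40 reads with a 1 bp insertion.
example = [Allele(900, 0, 0), Allele(60, 0, 7), Allele(40, 1, 0)]
print(indel_rate_percent(example))
```

Hand curation, as described above, would still be needed to separate true edits from sequencing or alignment artifacts.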
Example 7: Demonstration of base editing activity on endogenous targets in mammalian cells
[266] The coding sequence of the identified RGN is codon-optimized for expression in mammalian cells and introduced into an expression cassette that produces a fusion protein in which an N-terminal NLS tag is operably linked at its C-terminal end to a codon-optimized known eukaryotic deaminase sequence (APOBEC3A). The deaminase is operably linked at its C-terminal end to a flexible amino acid linker, and the linker is operably linked at its C-terminal end to the RNA-guided nuclease, which has been mutated to have an inactive RuvC domain (dEGS0091_D93R_D240A_D416A); that is, the RGN is catalytically dead. The RNA-guided DNA binding polypeptide is operably linked at its C-terminal end to a flexible amino acid linker, and that linker is operably linked to a uracil protecting peptide (UPP, developed in house). The uracil protecting peptide is operably linked at its C-terminal end to a flexible amino acid linker, and that linker is operably linked at its C-terminal end to a second NLS. Each of these expression cassettes is introduced into a vector capable of driving the expression of the fusion protein in mammalian cells. A vector capable of expressing ImntRNA to target the deaminase-RGN-UPP fusion protein to the determined genomic location was also produced. These guide RNAs can guide the deaminase-RGN-UPP fusion protein to the target genome sequence for base editing.
[267] Using liposome transfection, vectors capable of expressing the deaminase-RGN-UPP fusion protein and guide RNAs were transfected into HEK293T cells. The day before transfection, the cells were distributed in a 24-well plate in growth medium (DMEM + 10% fetal bovine serum + 1% penicillin/streptomycin) at 1.3 x 10^5 cells/well. Lipofectamine® 3000 reagent (Thermo Fisher Scientific) was used according to the manufacturer's instructions to transfect 500 ng of deaminase-RGN fusion expression vector and 500 ng of guide RNA expression vector. 48-72 hours after liposome transfection, genomic DNA was harvested from the transfected cells, and the DNA was sequenced and analyzed for the presence of targeted cytosine base editing mutations using CRISPResso2 (Clement K, et al. Nat Biotechnol. 2019 Mar; 37(3):224-226. doi: 10.1038/s41587-019-0032-3. PubMed PMID: 30809026).
[268] Tables 16 and 17 show the editing rate of cytidine bases for each deaminase-RGN-UPP fusion protein and the rate of targeted cytosine deamination for the deaminase-RGN-UPP targeted to the same region as the catalytically dead RGN-UPP. Active cytosine base editing was defined as a greater than 5x increase of C>D SNP base editing along the targeted window of the deaminase-RGN-UPP under investigation, and a >4x increase of C>T SNP base editing at highly mutated cytosines. With a catalytically dead nuclease domain, the RGN will not generate detectable INDEL formation by itself. When it is fused with an active deaminase that acts on the opposite strand, a cytosine will be turned into a uracil. The uracil is rapidly removed from the DNA, leaving an abasic site, and eventually a gap, on the strand opposite the strand bound by the ImntRNA. This can result in a double-stranded break, which is repaired through non-homologous end joining (NHEJ) with detectable INDEL formation. However, in the presence of an active UPP, the converted uracil is protected from removal, the abasic site is never generated, and NHEJ does not occur. This also leads to predominantly C>T conversions: because the uracil created by the deaminase is protected and not removed before the nicked strand is repaired, it will be read as a thymine when the nicked strand is replicated, and an adenosine will be inserted across from it. Then, when the uracil is removed, it will be replaced by thymine during the excision repair process, fixing the mutation as C>T.
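The activity definition in this paragraph combines two fold-change thresholds (greater than 5x and greater than 4x over the control). A minimal sketch of that decision rule follows. It assumes the first threshold refers to conversion of cytosine to any other base across the targeted window and the second to cytosine-to-thymine conversion at highly mutated cytosines, with the catalytically dead RGN-UPP as the control; the function names and the small floor used to avoid division by zero are also assumptions.

```python
def fold_increase(sample_rate: float, control_rate: float, floor: float = 1e-6) -> float:
    """Fold change of a sample editing rate over its control.

    The floor (an assumption, not from the text) avoids dividing by zero
    when the control shows no editing at all.
    """
    return sample_rate / max(control_rate, floor)


def is_active_base_editor(
    c_to_other_sample: float,
    c_to_other_control: float,
    c_to_t_sample: float,
    c_to_t_control: float,
) -> bool:
    """Apply both thresholds: >5x for C-to-non-C editing and >4x for C-to-T editing."""
    return (
        fold_increase(c_to_other_sample, c_to_other_control) > 5
        and fold_increase(c_to_t_sample, c_to_t_control) > 4
    )


# Hypothetical rates in percent of reads, sample versus control.
print(is_active_base_editor(6.0, 1.0, 5.0, 1.0))
print(is_active_base_editor(5.0, 1.0, 5.0, 1.0))
```

Both comparisons are strict, so a sample sitting exactly at 5x or 4x is not called active under this reading.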
Claims
1. A nucleic acid targeting system, comprising
1) a polypeptide comprising an RNA-guided nuclease (RGN) protein comprising tri split RuvC domain (RuvC-I, RuvC-II, and RuvC-III), and
2) at least one long monomeric nucleic acid targeting RNA (ImntRNA) molecule binding to RNA-guided nuclease protein and targeting a nucleic acid sequence of interest, said ImntRNA scaffold comprising in 5’ to 3’ orientation: a) a sequence interacting with the RGN, b) a sequence partially complementary to the nucleotides of the CRISPR repeat array sequence, c) at least 7 nucleotides from the 3’ end of the CRISPR repeat array sequence, d) directly followed by a spacer sequence complementary to the target nucleic acid sequence of interest.
2. The nucleic acid targeting system of claim 1, wherein the target nucleic acid is a dsDNA.
3. The nucleic acid targeting system of claim 1, wherein the target nucleic acid is a ssDNA.
4. The nucleic acid targeting system of claim 1, wherein the RGN does not comprise an HNH domain.
5. The nucleic acid targeting system of claim 1, wherein the RGN has a length of less than 600 amino acids.
6. The nucleic acid targeting system of claim 1, wherein the sequence partially complementary to the nucleotides of the CRISPR repeat array sequence is not complementary to up to 6 nucleotides from the 3’ end of the CRISPR repeat array sequence.
7. The nucleic acid targeting system of claim 1, wherein the scaffold further comprises a conserved sequence forming a hairpin structure on or near a 5’ end of the scaffold.
8. The nucleic acid targeting system of claim 7, wherein the conserved sequence forming a hairpin structure comprises in 5’ to 3’ orientation the sequence MGGGYGN4-8CRYCCK (SEQ ID NO: 73).
9. The nucleic acid targeting system of claim 7, wherein the scaffold comprises a stretch of nucleotides capable of forming 1 or more hairpin structures between the conserved sequence forming a hairpin structure on or near a 5’ end of the scaffold and sequence partially complementary to the nucleotides of the CRISPR repeat array sequence.
10. The nucleic acid targeting system of claim 1, wherein the sequence interacting with the RGN has a length of 30-70 nucleotides.
11. The nucleic acid targeting system of claim 1, wherein the at least 7 nucleotides from the 3’ end of the CRISPR repeat array sequence and the sequence partially complementary to the CRISPR repeat array sequence form a hairpin structure.
12. The nucleic acid targeting system of claim 1, wherein the ImntRNA comprises all the nucleotides of the CRISPR repeat sequence.
13. The nucleic acid targeting system of claim 1, wherein the at least 7 nucleotides of CRISPR repeat array sequence are obtained by truncating nucleotides from the 5’ end of the full CRISPR array repeat sequence.
14. The nucleic acid targeting system of claim 1, wherein the CRISPR repeat array sequence has the length of 30-35 nucleotides.
15. The nucleic acid targeting system of claim 1, wherein the spacer sequence complementary to the target nucleic acid sequence of interest immediately follows the nucleotides of the CRISPR repeat sequence.
16. The nucleic acid targeting system of claim 1, wherein the tri split RuvC domain is split by a recognition (REC) domain and a target nucleic acid-binding (TNB) domain.
17. The nucleic acid targeting system of claim 1, wherein the RGN protein comprises the sequence at least 90% identical to the sequence selected from SEQ ID NO: 1-9.
18. The nucleic acid targeting system of claim 16, wherein the CRISPR repeat nucleotides are derived from the sequence selected from SEQ ID NO: 29-37 correspondingly.
19. The nucleic acid targeting system of claim 16, wherein at least one ImntRNA comprises the sequence as set forth in Table 5.
20. The nucleic acid targeting system of claim 1, wherein the target sequence is adjacent to a PAM sequence.
21. The nucleic acid targeting system of claim 1, wherein the target nucleic acid sequence is within a eukaryotic cell.
22. The nucleic acid targeting system of claim 1, wherein said RGN polypeptide is nuclease dead, and wherein the RGN polypeptide is operably linked to a base-editing polypeptide.
23. The nucleic acid targeting system of claim 1, wherein said at least one ImntRNA is adjacent to a PAM sequence.
24. One or more isolated polynucleotides encoding the nucleic acid targeting system of any one of claims 1-23.
25. The isolated polynucleotides of claim 24, wherein the polynucleotide sequences encoding of the nucleic acid targeting system have been codon optimized for optimal expression in a target cell or organism.
26. One or more vectors comprising one or more isolated polynucleotides of claim 24.
27. The one or more vectors of claim 24, wherein said vector is a lentiviral or an AAV vector.
28. A vector comprising polynucleotides encoding a nucleic acid targeting system of any one of claims 1-23.
29. The vector of claim 28, wherein said vector is a lentiviral or an AAV vector.
30. A cell comprising the nucleic acid targeting system of any one of claims 1-23, the polynucleotide of claim 24, or the vector of claim 28.
31. A cell according to claim 30, wherein said cell is a eukaryotic cell.
32. A composition comprising the DNA targeting system of any one of claims 1-23, one or more polynucleotides of claim 24, or one or more vectors of claim 28.
33. A method for binding to a target DNA sequence comprising contacting the DNA targeting system according to any one of claims 1-23 with said target DNA sequence.
34. The method of claim 33, where said binding is performed in a cell.
35. A method for cleaving and/or modifying a target nucleic acid sequence, comprising
1) contacting the target nucleic acid sequence with a nucleic acid targeting system of any one of claims 1-23; and
2) incubating said nucleic acid targeting system with the target nucleic acid for the time and under conditions sufficient for the cleaving and/or modification to occur.
36. The method of claim 35, wherein said target nucleic acid sequence is a DNA.
37. The method of claim 35, wherein said modified target DNA sequence comprises insertion of heterologous nucleic acid sequence into the target DNA sequence.
38. The method of claim 35, wherein said modified target DNA sequence comprises deletion of at least one nucleotide from the target DNA sequence.
39. The method of claim 35, wherein said modified target DNA sequence comprises mutation of at least one nucleotide in the target DNA sequence.
40. The method of claim 35, wherein the target nucleic acid sequence is within a cell.
41. The method of claim 40, wherein the cell is a eukaryotic cell.
42. The method of claim 35, further comprising culturing the cell under conditions sufficient for expression of the RGN polypeptide and selecting a cell comprising said modified target nucleic acid sequence.
43. A pharmaceutical composition comprising the nucleic acid targeting system of any one of claims 1-23, one or more polynucleotides of claim 24, or one or more vectors of claim 28.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263371911P | 2022-08-19 | 2022-08-19 | |
US63/371,911 | 2022-08-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024038168A1 true WO2024038168A1 (en) | 2024-02-22 |
Family
ID=87863559
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2023/072745 WO2024038168A1 (en) | 2022-08-19 | 2023-08-17 | Novel rna-guided nucleases and nucleic acid targeting systems comprising such |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024038168A1 (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5049386A (en) | 1985-01-07 | 1991-09-17 | Syntex (U.S.A.) Inc. | N-ω,(ω-1)-dialkyloxy)- and N-(ω,(ω-1)-dialkenyloxy)Alk-1-YL-N,N,N-tetrasubstituted ammonium lipids and uses therefor |
US5173414A (en) | 1990-10-30 | 1992-12-22 | Applied Immune Sciences, Inc. | Production of recombinant adeno-associated virus vectors |
WO1993024640A2 (en) | 1992-06-04 | 1993-12-09 | The Regents Of The University Of California | Methods and compositions for in vivo gene therapy |
US20030087817A1 (en) | 1999-01-12 | 2003-05-08 | Sangamo Biosciences, Inc. | Regulation of endogenous gene expression in cells using zinc finger proteins |
US9790490B2 (en) | 2015-06-18 | 2017-10-17 | The Broad Institute Inc. | CRISPR enzymes and systems |
WO2018027078A1 (en) | 2016-08-03 | 2018-02-08 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US20200054679A1 (en) * | 2017-05-04 | 2020-02-20 | The Trustees Of The University Of Pennsylvania | Compositions and Methods for Gene Editing in T cells using CRISPR/Cpf1 |
WO2020123887A2 (en) * | 2018-12-14 | 2020-06-18 | Pioneer Hi-Bred International, Inc. | Novel crispr-cas systems for genome editing |
WO2021238128A1 (en) * | 2020-05-28 | 2021-12-02 | 上海科技大学 | Genome editing system and method |
WO2022087494A1 (en) * | 2020-10-23 | 2022-04-28 | The Broad Institute, Inc. | Reprogrammable iscb nucleases and uses thereof |
WO2022140572A1 (en) * | 2020-12-23 | 2022-06-30 | Mammoth Biosciences, Inc. | Compositions and methods of using programmable nucleases for inducing cell death |
Non-Patent Citations (43)
Title |
---|
"Nucleases", 1993, COLD SPRING HARBOR LABORATORY PRESS |
ABUDAYYEH ET AL., SCIENCE, vol. 353, 2016, pages aaf5573 |
AUSUBEL ET AL.: "Current Protocols in Molecular Biology", 2003, JOHN WILEY & SONS |
BUCHSCHER ET AL., J. VIROL., vol. 66, 1992, pages 1635 - 1640 |
CARRIE & SMALL, BIOCHIM BIOPHYS ACTA, vol. 1833, 2013, pages 253 - 259 |
CLEMENT K ET AL., NAT BIOTECHNOL, vol. 37, no. 3, March 2019 (2019-03-01), pages 224 - 226 |
GAUDELLI ET AL., NATURE, vol. 551, 2017, pages 464 - 471 |
GRUBER ET AL., CELL, vol. 106, no. 1, 2008, pages 23 - 24 |
GUSCHIN ET AL., METHODS MOL BIOL, vol. 649, 2010, pages 247 - 256 |
HARRINGTON ET AL., MOLECULAR CELL, vol. 79, 2020, pages 416 - 424 |
HARRINGTON ET AL., SCIENCE, vol. 360, no. 6387, 2018, pages 436 - 439 |
HERRMANN & NEUPERT, IUBMB LIFE, vol. 55, 2003, pages 219 - 225 |
JEDRZEJCZYK DOMINIKA ET AL: "CRISPR-Cas12a Nucleases Function With Structurally Engineered crRNAs - SynThetic trAcrRNA", RESEARCH SQUARE, 21 October 2021 (2021-10-21), XP093005416, Retrieved from the Internet <URL:https://www.researchsquare.com/article/rs-1003466/v1> [retrieved on 20221206], DOI: 10.21203/rs.3.rs-1003466/v1 * |
JINEK M. ET AL.: "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity", SCIENCE, vol. 337, 2012, pages 816 - 821, XP055229606, DOI: 10.1126/science.1225829 |
KAMINSKI ET AL., NAT BIOMED ENG, vol. 5, 2021, pages 643 - 656 |
KARVELIS ET AL., GENOME BIOL, vol. 16, 2015, pages 253 |
KARVELIS ET AL., NATURE, vol. 599, 2021, pages 692 - 696 |
KARVELIS ET AL., NUCLEIC ACIDS RES, vol. 48, no. 9, 2020, pages 5016 - 5023 |
KLEINSTIVER ET AL., NAT BIOTECHNOL, vol. 37, 2019, pages 276 - 282 |
KLEINSTIVER ET AL., NATURE, vol. 523, 2015, pages 481 - 485 |
KOONIN EUGENE V ET AL: "Diversity, classification and evolution of CRISPR-Cas systems", CURRENT OPINION IN MICROBIOLOGY, vol. 37, 1 June 2017 (2017-06-01), pages 67 - 78, XP085276922, ISSN: 1369-5274, DOI: 10.1016/J.MIB.2017.05.008 * |
LANGE ET AL., J. BIOL. CHEM., vol. 282, 2007, pages 5101 - 5105 |
LI LIN ET AL: "Engineering the Direct Repeat Sequence of crRNA for Optimization of FnCpf1-Mediated Genome Editing in Human Cells", MOLECULAR THERAPY, vol. 26, no. 11, 1 November 2018 (2018-11-01), US, pages 2650 - 2657, XP055719687, ISSN: 1525-0016, DOI: 10.1016/j.ymthe.2018.08.021 * |
MAKAROVA ET AL., NATURE REVIEWS MICROBIOLOGY, vol. 13, 2015, pages 1 - 15 |
MILLER, J. VIROL., vol. 65, pages 2220 - 2224 |
MILLETTI F, DRUG DISCOV TODAY, vol. 17, 2012, pages 850 - 860 |
NASSOURY & MORSE, BIOCHIM BIOPHYS ACTA, vol. 1743, 2005, pages 5 - 19 |
NISHIMASU ET AL., CELL, 2014 |
PINELLO ET AL., NATURE BIOTECH, vol. 34, 2016, pages 695 - 697 |
RAY ET AL., BIOCONJUG CHEM, vol. 26, no. 6, 2015, pages 1004 - 7 |
RUSSEL ET AL., THE CRISPR JOURNAL, vol. 3, no. 6, 2020, pages 462 - 469 |
SAMBROOK & RUSSELL: "Molecular Cloning: A Laboratory Manual", 2001, COLD SPRING HARBOR PRESS |
SOLL, CURR OPIN PLANT BIOL, vol. 5, 2002, pages 529 - 535 |
SOMMERFELT ET AL., VIROL., vol. 176, 1990, pages 58 - 59 |
STEMMER, PROC. NATL. ACAD. SCI. USA, 1994 |
SWARTS ET AL., MOL. CELL, vol. 66, 2017, pages 221 - 233 |
WILSON ET AL., J. VIROL., vol. 63, 1989, pages 2374 - 2378 |
YAMANO ET AL., CELL, vol. 165, no. 4, 2016, pages 949 - 962 |
ZETSCHE ET AL., CELL, vol. 163, 2015, pages 759 - 771 |
ZHANG BO ET AL: "Mechanistic insights into the R-loop formation and cleavage in CRISPR-Cas12i1", NATURE COMMUNICATIONS, vol. 12, no. 1, 9 June 2021 (2021-06-09), XP055968167, Retrieved from the Internet <URL:https://www.nature.com/articles/s41467-021-23876-5.pdf> DOI: 10.1038/s41467-021-23876-5 * |
ZHANG ET AL., CHEM. SCI., vol. 7, 2016, pages 4951 - 4957 |
ZUKER & STIEGLER, NUCLEIC ACIDS RES, vol. 9, 1981, pages 133 - 148 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23762167 Country of ref document: EP Kind code of ref document: A1 |