WO2024081383A2 - Compositions and methods for targeting, editing, or modifying genes - Google Patents
- Publication number
- WO2024081383A2 (application PCT/US2023/035060)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- sequence
- targeter
- certain embodiments
- modulator
Links
- 238000000034 method Methods 0.000 title claims description 104
- 239000000203 mixture Substances 0.000 title claims description 104
- 108090000623 proteins and genes Proteins 0.000 title abstract description 333
- 230000008685 targeting Effects 0.000 title description 23
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 535
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 529
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 529
- 125000003729 nucleotide group Chemical group 0.000 claims abstract description 323
- 239000002773 nucleotide Substances 0.000 claims abstract description 318
- 101710163270 Nuclease Proteins 0.000 claims description 175
- 102000040430 polynucleotide Human genes 0.000 claims description 142
- 108091033319 polynucleotide Proteins 0.000 claims description 142
- 239000002157 polynucleotide Substances 0.000 claims description 142
- 125000006850 spacer group Chemical group 0.000 claims description 50
- 230000027455 binding Effects 0.000 claims description 37
- 230000000295 complement effect Effects 0.000 claims description 35
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 21
- 210000003527 eukaryotic cell Anatomy 0.000 claims description 18
- 239000003623 enhancer Substances 0.000 claims description 17
- 239000005557 antagonist Substances 0.000 claims description 6
- 101150050733 Gnas gene Proteins 0.000 claims 6
- 102000004169 proteins and genes Human genes 0.000 abstract description 244
- 230000004048 modification Effects 0.000 abstract description 78
- 238000012986 modification Methods 0.000 abstract description 78
- 108091033409 CRISPR Proteins 0.000 abstract description 53
- 238000010354 CRISPR gene editing Methods 0.000 abstract description 48
- 229920002477 rna polymer Polymers 0.000 abstract description 5
- 235000018102 proteins Nutrition 0.000 description 243
- 210000004027 cell Anatomy 0.000 description 205
- 108020004414 DNA Proteins 0.000 description 83
- 230000014509 gene expression Effects 0.000 description 83
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 76
- 125000003275 alpha amino acid group Chemical group 0.000 description 74
- 230000000694 effects Effects 0.000 description 62
- 210000001744 T-lymphocyte Anatomy 0.000 description 53
- 239000002585 base Substances 0.000 description 41
- 230000009977 dual effect Effects 0.000 description 36
- 241000282414 Homo sapiens Species 0.000 description 35
- 210000002865 immune cell Anatomy 0.000 description 35
- 108010081734 Ribonucleoproteins Proteins 0.000 description 34
- 102000004389 Ribonucleoproteins Human genes 0.000 description 34
- 230000001105 regulatory effect Effects 0.000 description 34
- -1 His Chemical compound 0.000 description 33
- 238000003776 cleavage reaction Methods 0.000 description 33
- 230000007017 scission Effects 0.000 description 33
- 108010077850 Nuclear Localization Signals Proteins 0.000 description 32
- 108700019146 Transgenes Proteins 0.000 description 29
- 206010028980 Neoplasm Diseases 0.000 description 28
- 108091079001 CRISPR RNA Proteins 0.000 description 25
- 201000011510 cancer Diseases 0.000 description 25
- 239000012636 effector Substances 0.000 description 25
- 230000035772 mutation Effects 0.000 description 23
- 108090000765 processed proteins & peptides Proteins 0.000 description 23
- 239000013598 vector Substances 0.000 description 22
- 108091008874 T cell receptors Proteins 0.000 description 21
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 21
- 102000016266 T-Cell Antigen Receptors Human genes 0.000 description 20
- 239000008194 pharmaceutical composition Substances 0.000 description 20
- 230000000875 corresponding effect Effects 0.000 description 19
- 238000003780 insertion Methods 0.000 description 19
- 230000037431 insertion Effects 0.000 description 19
- 230000001965 increasing effect Effects 0.000 description 18
- 102000004196 processed proteins & peptides Human genes 0.000 description 18
- 108020005004 Guide RNA Proteins 0.000 description 17
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 description 17
- 108010019670 Chimeric Antigen Receptors Proteins 0.000 description 15
- 150000001875 compounds Chemical class 0.000 description 15
- 230000001681 protective effect Effects 0.000 description 15
- 230000001225 therapeutic effect Effects 0.000 description 15
- 230000007018 DNA scission Effects 0.000 description 14
- DRTQHJPVMGBUCF-XVFCMESISA-N Uridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-XVFCMESISA-N 0.000 description 14
- 239000000427 antigen Substances 0.000 description 13
- 108091007433 antigens Proteins 0.000 description 13
- 102000036639 antigens Human genes 0.000 description 13
- 230000006870 function Effects 0.000 description 13
- 239000002502 liposome Substances 0.000 description 13
- 229920001184 polypeptide Polymers 0.000 description 13
- HMFHBZSHGGEWLO-SOOFDHNKSA-N D-ribofuranose Chemical compound OC[C@H]1OC(O)[C@H](O)[C@@H]1O HMFHBZSHGGEWLO-SOOFDHNKSA-N 0.000 description 12
- 102000053602 DNA Human genes 0.000 description 12
- 108091029865 Exogenous DNA Proteins 0.000 description 12
- PYMYPHUHKUWMLA-LMVFSUKVSA-N Ribose Natural products OC[C@@H](O)[C@@H](O)[C@@H](O)C=O PYMYPHUHKUWMLA-LMVFSUKVSA-N 0.000 description 12
- 230000003213 activating effect Effects 0.000 description 12
- HMFHBZSHGGEWLO-UHFFFAOYSA-N alpha-D-Furanose-Ribose Natural products OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 12
- 239000000872 buffer Substances 0.000 description 12
- 108700028146 Genetic Enhancer Elements Proteins 0.000 description 11
- 208000026350 Inborn Genetic disease Diseases 0.000 description 11
- 235000001014 amino acid Nutrition 0.000 description 11
- 238000013459 approach Methods 0.000 description 11
- 230000003247 decreasing effect Effects 0.000 description 11
- 230000004069 differentiation Effects 0.000 description 11
- 201000010099 disease Diseases 0.000 description 11
- 208000016361 genetic disease Diseases 0.000 description 11
- 239000002105 nanoparticle Substances 0.000 description 11
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 10
- 230000015556 catabolic process Effects 0.000 description 10
- 238000006731 degradation reaction Methods 0.000 description 10
- 210000005260 human cell Anatomy 0.000 description 10
- 230000002829 reductive effect Effects 0.000 description 10
- 210000000130 stem cell Anatomy 0.000 description 10
- 210000001519 tissue Anatomy 0.000 description 10
- 102000037982 Immune checkpoint proteins Human genes 0.000 description 9
- 108091008036 Immune checkpoint proteins Proteins 0.000 description 9
- DNIAPMSPPWPWGF-UHFFFAOYSA-N Propylene glycol Chemical compound CC(O)CO DNIAPMSPPWPWGF-UHFFFAOYSA-N 0.000 description 9
- 230000015572 biosynthetic process Effects 0.000 description 9
- 208000035475 disorder Diseases 0.000 description 9
- 239000012634 fragment Substances 0.000 description 9
- RYYWUUFWQRZTIU-UHFFFAOYSA-K thiophosphate Chemical compound [O-]P([O-])([O-])=S RYYWUUFWQRZTIU-UHFFFAOYSA-K 0.000 description 9
- 108020004705 Codon Proteins 0.000 description 8
- 239000003937 drug carrier Substances 0.000 description 8
- 238000010362 genome editing Methods 0.000 description 8
- 210000003958 hematopoietic stem cell Anatomy 0.000 description 8
- 238000000338 in vitro Methods 0.000 description 8
- 108020004999 messenger RNA Proteins 0.000 description 8
- 229920001223 polyethylene glycol Polymers 0.000 description 8
- 230000008569 process Effects 0.000 description 8
- 235000002639 sodium chloride Nutrition 0.000 description 8
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 7
- 241000124008 Mammalia Species 0.000 description 7
- 238000003556 assay Methods 0.000 description 7
- DRTQHJPVMGBUCF-PSQAKQOGSA-N beta-L-uridine Natural products O[C@H]1[C@@H](O)[C@H](CO)O[C@@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-PSQAKQOGSA-N 0.000 description 7
- 238000007385 chemical modification Methods 0.000 description 7
- 150000004713 phosphodiesters Chemical class 0.000 description 7
- DRTQHJPVMGBUCF-UHFFFAOYSA-N uracil arabinoside Natural products OC1C(O)C(CO)OC1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-UHFFFAOYSA-N 0.000 description 7
- 229940045145 uridine Drugs 0.000 description 7
- 230000003612 virological effect Effects 0.000 description 7
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 6
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 6
- 239000002202 Polyethylene glycol Substances 0.000 description 6
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 6
- 229940024606 amino acid Drugs 0.000 description 6
- 150000001413 amino acids Chemical class 0.000 description 6
- 230000001580 bacterial effect Effects 0.000 description 6
- 230000008901 benefit Effects 0.000 description 6
- 238000002716 delivery method Methods 0.000 description 6
- 238000004520 electroporation Methods 0.000 description 6
- 238000001415 gene therapy Methods 0.000 description 6
- 230000002068 genetic effect Effects 0.000 description 6
- 230000001939 inductive effect Effects 0.000 description 6
- 230000010354 integration Effects 0.000 description 6
- 239000000463 material Substances 0.000 description 6
- 238000011160 research Methods 0.000 description 6
- 150000003839 salts Chemical class 0.000 description 6
- 238000012216 screening Methods 0.000 description 6
- 238000006467 substitution reaction Methods 0.000 description 6
- 239000013603 viral vector Substances 0.000 description 6
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 6
- 102100029588 Deoxycytidine kinase Human genes 0.000 description 5
- 108010033174 Deoxycytidine kinase Proteins 0.000 description 5
- 101000983747 Homo sapiens MHC class II transactivator Proteins 0.000 description 5
- 101000952182 Homo sapiens Max-like protein X Proteins 0.000 description 5
- 102100026371 MHC class II transactivator Human genes 0.000 description 5
- 108700018351 Major Histocompatibility Complex Proteins 0.000 description 5
- 102100037423 Max-like protein X Human genes 0.000 description 5
- 239000002253 acid Substances 0.000 description 5
- 230000004075 alteration Effects 0.000 description 5
- 102000015736 beta 2-Microglobulin Human genes 0.000 description 5
- 108010081355 beta 2-Microglobulin Proteins 0.000 description 5
- 239000003795 chemical substances by application Substances 0.000 description 5
- 230000003993 interaction Effects 0.000 description 5
- 125000005647 linker group Chemical group 0.000 description 5
- 150000002632 lipids Chemical class 0.000 description 5
- 230000008439 repair process Effects 0.000 description 5
- 239000002904 solvent Substances 0.000 description 5
- 230000020382 suppression by virus of host antigen processing and presentation of peptide antigen via MHC class I Effects 0.000 description 5
- 238000011191 terminal modification Methods 0.000 description 5
- 238000013518 transcription Methods 0.000 description 5
- 230000035897 transcription Effects 0.000 description 5
- 229940035893 uracil Drugs 0.000 description 5
- OVONXEQGWXGFJD-UHFFFAOYSA-N 4-sulfanylidene-1h-pyrimidin-2-one Chemical compound SC=1C=CNC(=O)N=1 OVONXEQGWXGFJD-UHFFFAOYSA-N 0.000 description 4
- 101000860090 Acidaminococcus sp. (strain BV3L6) CRISPR-associated endonuclease Cas12a Proteins 0.000 description 4
- CIWBSHSKHKDKBQ-JLAZNSOCSA-N Ascorbic acid Chemical compound OC[C@H](O)[C@H]1OC(=O)C(O)=C1O CIWBSHSKHKDKBQ-JLAZNSOCSA-N 0.000 description 4
- 239000004215 Carbon black (E152) Substances 0.000 description 4
- 108091007741 Chimeric antigen receptor T cells Proteins 0.000 description 4
- 108091026890 Coding region Proteins 0.000 description 4
- 108700010070 Codon Usage Proteins 0.000 description 4
- 241000701022 Cytomegalovirus Species 0.000 description 4
- 108010006124 DNA-Activated Protein Kinase Proteins 0.000 description 4
- 102000005768 DNA-Activated Protein Kinase Human genes 0.000 description 4
- 102000004190 Enzymes Human genes 0.000 description 4
- 108090000790 Enzymes Proteins 0.000 description 4
- 108060002716 Exonuclease Proteins 0.000 description 4
- WHUUTDBJXJRKMK-VKHMYHEASA-N L-glutamic acid Chemical compound OC(=O)[C@@H](N)CCC(O)=O WHUUTDBJXJRKMK-VKHMYHEASA-N 0.000 description 4
- 241001465754 Metazoa Species 0.000 description 4
- 108091034117 Oligonucleotide Proteins 0.000 description 4
- WCUXLLCKKVVCTQ-UHFFFAOYSA-M Potassium chloride Chemical compound [Cl-].[K+] WCUXLLCKKVVCTQ-UHFFFAOYSA-M 0.000 description 4
- 102100037272 T cell receptor beta constant 1 Human genes 0.000 description 4
- 108020004566 Transfer RNA Proteins 0.000 description 4
- 230000009471 action Effects 0.000 description 4
- 210000004369 blood Anatomy 0.000 description 4
- 239000008280 blood Substances 0.000 description 4
- 239000000969 carrier Substances 0.000 description 4
- 230000008859 change Effects 0.000 description 4
- 230000007812 deficiency Effects 0.000 description 4
- 230000001419 dependent effect Effects 0.000 description 4
- 239000003085 diluting agent Substances 0.000 description 4
- 230000002708 enhancing effect Effects 0.000 description 4
- 102000013165 exonuclease Human genes 0.000 description 4
- 238000009472 formulation Methods 0.000 description 4
- 238000009396 hybridization Methods 0.000 description 4
- 229930195733 hydrocarbon Natural products 0.000 description 4
- 238000001727 in vivo Methods 0.000 description 4
- 238000010348 incorporation Methods 0.000 description 4
- 210000004185 liver Anatomy 0.000 description 4
- 210000004962 mammalian cell Anatomy 0.000 description 4
- 210000003071 memory t lymphocyte Anatomy 0.000 description 4
- 201000006417 multiple sclerosis Diseases 0.000 description 4
- 230000006780 non-homologous end joining Effects 0.000 description 4
- 210000004940 nucleus Anatomy 0.000 description 4
- 239000013612 plasmid Substances 0.000 description 4
- 229920000642 polymer Polymers 0.000 description 4
- 230000002062 proliferating effect Effects 0.000 description 4
- 230000010076 replication Effects 0.000 description 4
- 238000012163 sequencing technique Methods 0.000 description 4
- WYWHKKSPHMUBEB-UHFFFAOYSA-N tioguanine Chemical compound N1C(N)=NC(=S)C2=C1N=CN2 WYWHKKSPHMUBEB-UHFFFAOYSA-N 0.000 description 4
- 238000013519 translation Methods 0.000 description 4
- 238000011144 upstream manufacturing Methods 0.000 description 4
- 239000003981 vehicle Substances 0.000 description 4
- WVDDGKGOMKODPV-UHFFFAOYSA-N Benzyl alcohol Chemical compound OCC1=CC=CC=C1 WVDDGKGOMKODPV-UHFFFAOYSA-N 0.000 description 3
- 102100024217 CAMPATH-1 antigen Human genes 0.000 description 3
- 102100038078 CD276 antigen Human genes 0.000 description 3
- 108010065524 CD52 Antigen Proteins 0.000 description 3
- 108090000565 Capsid Proteins Proteins 0.000 description 3
- 102100024423 Carbonic anhydrase 9 Human genes 0.000 description 3
- 102100023321 Ceruloplasmin Human genes 0.000 description 3
- 108010077544 Chromatin Proteins 0.000 description 3
- 102000004127 Cytokines Human genes 0.000 description 3
- 108090000695 Cytokines Proteins 0.000 description 3
- FBPFZTCFMRRESA-FSIIMWSLSA-N D-Glucitol Natural products OC[C@H](O)[C@H](O)[C@@H](O)[C@H](O)CO FBPFZTCFMRRESA-FSIIMWSLSA-N 0.000 description 3
- FBPFZTCFMRRESA-KVTDHHQDSA-N D-Mannitol Chemical compound OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)CO FBPFZTCFMRRESA-KVTDHHQDSA-N 0.000 description 3
- FBPFZTCFMRRESA-JGWLITMVSA-N D-glucitol Chemical compound OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO FBPFZTCFMRRESA-JGWLITMVSA-N 0.000 description 3
- 101710177611 DNA polymerase II large subunit Proteins 0.000 description 3
- 101710184669 DNA polymerase II small subunit Proteins 0.000 description 3
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 3
- 241000196324 Embryophyta Species 0.000 description 3
- 102100031940 Epithelial cell adhesion molecule Human genes 0.000 description 3
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 3
- 101000860092 Francisella tularensis subsp. novicida (strain U112) CRISPR-associated endonuclease Cas12a Proteins 0.000 description 3
- 208000009329 Graft vs Host Disease Diseases 0.000 description 3
- 102100034458 Hepatitis A virus cellular receptor 2 Human genes 0.000 description 3
- 101000962483 Homo sapiens Max dimerization protein 1 Proteins 0.000 description 3
- 101001000302 Homo sapiens Max-interacting protein 1 Proteins 0.000 description 3
- 101000957259 Homo sapiens Mitotic spindle assembly checkpoint protein MAD2A Proteins 0.000 description 3
- 101000611936 Homo sapiens Programmed cell death protein 1 Proteins 0.000 description 3
- 102100021593 Interleukin-7 receptor subunit alpha Human genes 0.000 description 3
- 206010064912 Malignant transformation Diseases 0.000 description 3
- 229930195725 Mannitol Natural products 0.000 description 3
- 102100039185 Max dimerization protein 1 Human genes 0.000 description 3
- 102100038792 Mitotic spindle assembly checkpoint protein MAD2A Human genes 0.000 description 3
- 108010008707 Mucin-1 Proteins 0.000 description 3
- 102100034256 Mucin-1 Human genes 0.000 description 3
- 208000002678 Mucopolysaccharidoses Diseases 0.000 description 3
- 102000002488 Nucleoplasmin Human genes 0.000 description 3
- 102000011931 Nucleoproteins Human genes 0.000 description 3
- 108010061100 Nucleoproteins Proteins 0.000 description 3
- 102100040678 Programmed cell death protein 1 Human genes 0.000 description 3
- 241000714474 Rous sarcoma virus Species 0.000 description 3
- 108020004682 Single-Stranded DNA Proteins 0.000 description 3
- 102100037298 T cell receptor beta constant 2 Human genes 0.000 description 3
- 102100021657 Tyrosine-protein phosphatase non-receptor type 6 Human genes 0.000 description 3
- 241000700605 Viruses Species 0.000 description 3
- 108010017070 Zinc Finger Nucleases Proteins 0.000 description 3
- 238000009825 accumulation Methods 0.000 description 3
- 230000022131 cell cycle Effects 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 3
- 210000003483 chromatin Anatomy 0.000 description 3
- 210000000349 chromosome Anatomy 0.000 description 3
- 230000021615 conjugation Effects 0.000 description 3
- 230000002950 deficient Effects 0.000 description 3
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 3
- LOKCTEFSRHRXRJ-UHFFFAOYSA-I dipotassium trisodium dihydrogen phosphate hydrogen phosphate dichloride Chemical compound P(=O)(O)(O)[O-].[K+].P(=O)(O)([O-])[O-].[Na+].[Na+].[Cl-].[K+].[Cl-].[Na+] LOKCTEFSRHRXRJ-UHFFFAOYSA-I 0.000 description 3
- 229940079593 drug Drugs 0.000 description 3
- 239000003814 drug Substances 0.000 description 3
- 230000001973 epigenetic effect Effects 0.000 description 3
- 150000002148 esters Chemical class 0.000 description 3
- 210000001808 exosome Anatomy 0.000 description 3
- OVBPIULPVIDEAO-LBPRGKRZSA-N folic acid Chemical compound C=1N=C2NC(N)=NC(=O)C2=NC=1CNC1=CC=C(C(=O)N[C@@H](CCC(O)=O)C(O)=O)C=C1 OVBPIULPVIDEAO-LBPRGKRZSA-N 0.000 description 3
- 102000037865 fusion proteins Human genes 0.000 description 3
- 108020001507 fusion proteins Proteins 0.000 description 3
- 235000011187 glycerol Nutrition 0.000 description 3
- 208000024908 graft versus host disease Diseases 0.000 description 3
- 230000002779 inactivation Effects 0.000 description 3
- 238000001990 intravenous administration Methods 0.000 description 3
- 230000000670 limiting effect Effects 0.000 description 3
- 230000036212 malign transformation Effects 0.000 description 3
- 239000000594 mannitol Substances 0.000 description 3
- 235000010355 mannitol Nutrition 0.000 description 3
- 239000003550 marker Substances 0.000 description 3
- 206010028093 mucopolysaccharidosis Diseases 0.000 description 3
- 238000007837 multiplex assay Methods 0.000 description 3
- 108060005597 nucleoplasmin Proteins 0.000 description 3
- 239000000546 pharmaceutical excipient Substances 0.000 description 3
- 239000002953 phosphate buffered saline Substances 0.000 description 3
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 3
- 150000003013 phosphoric acid derivatives Chemical class 0.000 description 3
- 102000005962 receptors Human genes 0.000 description 3
- 108020003175 receptors Proteins 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 230000001177 retroviral effect Effects 0.000 description 3
- 125000000548 ribosyl group Chemical group C1([C@H](O)[C@H](O)[C@H](O1)CO)* 0.000 description 3
- 238000010187 selection method Methods 0.000 description 3
- 208000002491 severe combined immunodeficiency Diseases 0.000 description 3
- 239000011780 sodium chloride Substances 0.000 description 3
- 239000000600 sorbitol Substances 0.000 description 3
- 235000000346 sugar Nutrition 0.000 description 3
- 238000013268 sustained release Methods 0.000 description 3
- 231100000419 toxicity Toxicity 0.000 description 3
- 230000001988 toxicity Effects 0.000 description 3
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 3
- 241000701161 unidentified adenovirus Species 0.000 description 3
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 2
- RJBDSRWGVYNDHL-XNJNKMBASA-N (2S,4R,5S,6S)-2-[(2S,3R,4R,5S,6R)-5-[(2S,3R,4R,5R,6R)-3-acetamido-4,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-2-[(2R,3S,4R,5R,6R)-4,5-dihydroxy-2-(hydroxymethyl)-6-[(E,2R,3S)-3-hydroxy-2-(octadecanoylamino)octadec-4-enoxy]oxan-3-yl]oxy-3-hydroxy-6-(hydroxymethyl)oxan-4-yl]oxy-5-amino-6-[(1S,2R)-2-[(2S,4R,5S,6S)-5-amino-2-carboxy-4-hydroxy-6-[(1R,2R)-1,2,3-trihydroxypropyl]oxan-2-yl]oxy-1,3-dihydroxypropyl]-4-hydroxyoxane-2-carboxylic acid Chemical compound CCCCCCCCCCCCCCCCCC(=O)N[C@H](CO[C@@H]1O[C@H](CO)[C@@H](O[C@@H]2O[C@H](CO)[C@H](O[C@@H]3O[C@H](CO)[C@H](O)[C@H](O)[C@H]3NC(C)=O)[C@H](O[C@@]3(C[C@@H](O)[C@H](N)[C@H](O3)[C@H](O)[C@@H](CO)O[C@@]3(C[C@@H](O)[C@H](N)[C@H](O3)[C@H](O)[C@H](O)CO)C(O)=O)C(O)=O)[C@H]2O)[C@H](O)[C@H]1O)[C@@H](O)\C=C\CCCCCCCCCCCCC RJBDSRWGVYNDHL-XNJNKMBASA-N 0.000 description 2
- WRMNZCZEMHIOCP-UHFFFAOYSA-N 2-phenylethanol Chemical compound OCCC1=CC=CC=C1 WRMNZCZEMHIOCP-UHFFFAOYSA-N 0.000 description 2
- DVLFYONBTKHTER-UHFFFAOYSA-N 3-(N-morpholino)propanesulfonic acid Chemical compound OS(=O)(=O)CCCN1CCOCC1 DVLFYONBTKHTER-UHFFFAOYSA-N 0.000 description 2
- OIVLITBTBDPEFK-UHFFFAOYSA-N 5,6-dihydrouracil Chemical compound O=C1CCNC(=O)N1 OIVLITBTBDPEFK-UHFFFAOYSA-N 0.000 description 2
- RYVNIFSIEDRLSJ-UHFFFAOYSA-N 5-(hydroxymethyl)cytosine Chemical compound NC=1NC(=O)N=CC=1CO RYVNIFSIEDRLSJ-UHFFFAOYSA-N 0.000 description 2
- DCPSTSVLRXOYGS-UHFFFAOYSA-N 6-amino-1h-pyrimidine-2-thione Chemical compound NC1=CC=NC(S)=N1 DCPSTSVLRXOYGS-UHFFFAOYSA-N 0.000 description 2
- 102100031126 6-phosphogluconolactonase Human genes 0.000 description 2
- 108010029731 6-phosphogluconolactonase Proteins 0.000 description 2
- PEHVGBZKEYRQSX-UHFFFAOYSA-N 7-deaza-adenine Chemical compound NC1=NC=NC2=C1C=CN2 PEHVGBZKEYRQSX-UHFFFAOYSA-N 0.000 description 2
- MSSXOMSJDRHRMC-UHFFFAOYSA-N 9H-purine-2,6-diamine Chemical compound NC1=NC(N)=C2NC=NC2=N1 MSSXOMSJDRHRMC-UHFFFAOYSA-N 0.000 description 2
- 102100024643 ATP-binding cassette sub-family D member 1 Human genes 0.000 description 2
- 102100022089 Acyl-[acyl-carrier-protein] hydrolase Human genes 0.000 description 2
- 201000011452 Adrenoleukodystrophy Diseases 0.000 description 2
- 241000099173 Anaerovibrio sp. Species 0.000 description 2
- 102000006942 B-Cell Maturation Antigen Human genes 0.000 description 2
- 108010008014 B-Cell Maturation Antigen Proteins 0.000 description 2
- 102100024222 B-lymphocyte antigen CD19 Human genes 0.000 description 2
- BTBUEUYNUDRHOZ-UHFFFAOYSA-N Borate Chemical compound [O-]B([O-])[O-] BTBUEUYNUDRHOZ-UHFFFAOYSA-N 0.000 description 2
- 108010008629 CA-125 Antigen Proteins 0.000 description 2
- 108700012439 CA9 Proteins 0.000 description 2
- 101710185679 CD276 antigen Proteins 0.000 description 2
- 102000017420 CD3 protein, epsilon/gamma/delta subunit Human genes 0.000 description 2
- 108050005493 CD3 protein, epsilon/gamma/delta subunit Proteins 0.000 description 2
- 108010040467 CRISPR-Associated Proteins Proteins 0.000 description 2
- 101000909256 Caldicellulosiruptor bescii (strain ATCC BAA-1888 / DSM 6725 / Z-1320) DNA polymerase I Proteins 0.000 description 2
- 208000022526 Canavan disease Diseases 0.000 description 2
- 241001040999 Candidatus Methanoplasma termitum Species 0.000 description 2
- 108010022366 Carcinoembryonic Antigen Proteins 0.000 description 2
- 102100025475 Carcinoembryonic antigen-related cell adhesion molecule 5 Human genes 0.000 description 2
- 108700004991 Cas12a Proteins 0.000 description 2
- 102100024965 Caspase recruitment domain-containing protein 11 Human genes 0.000 description 2
- 208000010693 Charcot-Marie-Tooth Disease Diseases 0.000 description 2
- 108010009685 Cholinergic Receptors Proteins 0.000 description 2
- 102100028757 Chondroitin sulfate proteoglycan 4 Human genes 0.000 description 2
- 102100022641 Coagulation factor IX Human genes 0.000 description 2
- 208000014567 Congenital Disorders of Glycosylation Diseases 0.000 description 2
- 201000002200 Congenital disorder of glycosylation Diseases 0.000 description 2
- 201000003883 Cystic fibrosis Diseases 0.000 description 2
- 102100039498 Cytotoxic T-lymphocyte protein 4 Human genes 0.000 description 2
- 201000008163 Dentatorubral pallidoluysian atrophy Diseases 0.000 description 2
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 2
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 2
- 241000702421 Dependoparvovirus Species 0.000 description 2
- 102100031780 Endonuclease Human genes 0.000 description 2
- 102100030340 Ephrin type-A receptor 2 Human genes 0.000 description 2
- 101710116743 Ephrin type-A receptor 2 Proteins 0.000 description 2
- 108010066687 Epithelial Cell Adhesion Molecule Proteins 0.000 description 2
- 241001109644 Eubacterium coprostanoligenes Species 0.000 description 2
- 102100024405 GPI-linked NAD(P)(+)-arginine ADP-ribosyltransferase 1 Human genes 0.000 description 2
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 2
- 108010018962 Glucosephosphate Dehydrogenase Proteins 0.000 description 2
- 102100041003 Glutamate carboxypeptidase 2 Human genes 0.000 description 2
- WHUUTDBJXJRKMK-UHFFFAOYSA-N Glutamic acid Natural products OC(=O)C(N)CCC(O)=O WHUUTDBJXJRKMK-UHFFFAOYSA-N 0.000 description 2
- 239000004471 Glycine Substances 0.000 description 2
- 102000010956 Glypican Human genes 0.000 description 2
- 108050001154 Glypican Proteins 0.000 description 2
- 108050007237 Glypican-3 Proteins 0.000 description 2
- 108010043121 Green Fluorescent Proteins Proteins 0.000 description 2
- 102000004144 Green Fluorescent Proteins Human genes 0.000 description 2
- 102000000310 HNH endonucleases Human genes 0.000 description 2
- 108050008753 HNH endonucleases Proteins 0.000 description 2
- 108010007707 Hepatitis A Virus Cellular Receptor 2 Proteins 0.000 description 2
- 101000824278 Homo sapiens Acyl-[acyl-carrier-protein] hydrolase Proteins 0.000 description 2
- 101000783751 Homo sapiens Adenosine receptor A2a Proteins 0.000 description 2
- 101000980825 Homo sapiens B-lymphocyte antigen CD19 Proteins 0.000 description 2
- 101000889276 Homo sapiens Cytotoxic T-lymphocyte protein 4 Proteins 0.000 description 2
- 101000981252 Homo sapiens GPI-linked NAD(P)(+)-arginine ADP-ribosyltransferase 1 Proteins 0.000 description 2
- 101000892862 Homo sapiens Glutamate carboxypeptidase 2 Proteins 0.000 description 2
- 101001043809 Homo sapiens Interleukin-7 receptor subunit alpha Proteins 0.000 description 2
- 101001137987 Homo sapiens Lymphocyte activation gene 3 protein Proteins 0.000 description 2
- 101000581981 Homo sapiens Neural cell adhesion molecule 1 Proteins 0.000 description 2
- 101000662909 Homo sapiens T cell receptor beta constant 1 Proteins 0.000 description 2
- 101000662902 Homo sapiens T cell receptor beta constant 2 Proteins 0.000 description 2
- 101000831007 Homo sapiens T-cell immunoreceptor with Ig and ITIM domains Proteins 0.000 description 2
- 101000655352 Homo sapiens Telomerase reverse transcriptase Proteins 0.000 description 2
- 101000617285 Homo sapiens Tyrosine-protein phosphatase non-receptor type 6 Proteins 0.000 description 2
- MHAJPDPJQMAIIY-UHFFFAOYSA-N Hydrogen peroxide Chemical compound OO MHAJPDPJQMAIIY-UHFFFAOYSA-N 0.000 description 2
- DGAQECJNVWCQMB-PUAWFVPOSA-M Ilexoside XXIX Chemical compound C[C@@H]1CC[C@@]2(CC[C@@]3(C(=CC[C@H]4[C@]3(CC[C@@H]5[C@@]4(CC[C@@H](C5(C)C)OS(=O)(=O)[O-])C)C)[C@@H]2[C@]1(C)O)C)C(=O)O[C@H]6[C@@H]([C@H]([C@@H]([C@H](O6)CO)O)O)O.[Na+] DGAQECJNVWCQMB-PUAWFVPOSA-M 0.000 description 2
- 102100020793 Interleukin-13 receptor subunit alpha-2 Human genes 0.000 description 2
- 101710112634 Interleukin-13 receptor subunit alpha-2 Proteins 0.000 description 2
- 208000027747 Kennedy disease Diseases 0.000 description 2
- KDXKERNSBIXSRK-YFKPBYRVSA-N L-lysine Chemical compound NCCCC[C@H](N)C(O)=O KDXKERNSBIXSRK-YFKPBYRVSA-N 0.000 description 2
- 102000017578 LAG3 Human genes 0.000 description 2
- 241000416293 Lachnospiraceae bacterium COE1 Species 0.000 description 2
- 101710192602 Latent membrane protein 1 Proteins 0.000 description 2
- 102100024640 Low-density lipoprotein receptor Human genes 0.000 description 2
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 description 2
- 239000007993 MOPS buffer Substances 0.000 description 2
- TWRXJAOTZQYOKJ-UHFFFAOYSA-L Magnesium chloride Chemical compound [Mg+2].[Cl-].[Cl-] TWRXJAOTZQYOKJ-UHFFFAOYSA-L 0.000 description 2
- CSNNHWWHGAXBCP-UHFFFAOYSA-L Magnesium sulfate Chemical compound [Mg+2].[O-][S+2]([O-])([O-])[O-] CSNNHWWHGAXBCP-UHFFFAOYSA-L 0.000 description 2
- 102100025169 Max-binding protein MNT Human genes 0.000 description 2
- 108010049137 Member 1 Subfamily D ATP Binding Cassette Transporter Proteins 0.000 description 2
- 201000011442 Metachromatic leukodystrophy Diseases 0.000 description 2
- 241000293008 Moraxella caprae Species 0.000 description 2
- 102100023123 Mucin-16 Human genes 0.000 description 2
- 206010056886 Mucopolysaccharidosis I Diseases 0.000 description 2
- 102100027347 Neural cell adhesion molecule 1 Human genes 0.000 description 2
- 101710153660 Nuclear receptor corepressor 2 Proteins 0.000 description 2
- 229910019142 PO4 Inorganic materials 0.000 description 2
- 229920001213 Polysorbate 20 Polymers 0.000 description 2
- 241001299661 Prevotella bryantii Species 0.000 description 2
- 101710120463 Prostate stem cell antigen Proteins 0.000 description 2
- 102100036735 Prostate stem cell antigen Human genes 0.000 description 2
- 108090000412 Protein-Tyrosine Kinases Proteins 0.000 description 2
- 102000004022 Protein-Tyrosine Kinases Human genes 0.000 description 2
- 241001053116 Proteocatella sphenisci Species 0.000 description 2
- 108700020978 Proto-Oncogene Proteins 0.000 description 2
- 102000052575 Proto-Oncogene Human genes 0.000 description 2
- 101000902592 Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1) DNA polymerase Proteins 0.000 description 2
- 239000013614 RNA sample Substances 0.000 description 2
- 230000007022 RNA scission Effects 0.000 description 2
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 2
- 101001039269 Rattus norvegicus Glycine N-methyltransferase Proteins 0.000 description 2
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicon dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 2
- 241001037426 Smithella sp. Species 0.000 description 2
- CDBYLPFSWZWCQE-UHFFFAOYSA-L Sodium Carbonate Chemical compound [Na+].[Na+].[O-]C([O-])=O CDBYLPFSWZWCQE-UHFFFAOYSA-L 0.000 description 2
- UIIMBOGNXHQVGW-UHFFFAOYSA-M Sodium bicarbonate Chemical compound [Na+].OC([O-])=O UIIMBOGNXHQVGW-UHFFFAOYSA-M 0.000 description 2
- DWAQJAXMDSEUJJ-UHFFFAOYSA-M Sodium bisulfite Chemical compound [Na+].OS([O-])=O DWAQJAXMDSEUJJ-UHFFFAOYSA-M 0.000 description 2
- 229940100514 Syk tyrosine kinase inhibitor Drugs 0.000 description 2
- 102100029452 T cell receptor alpha chain constant Human genes 0.000 description 2
- 102100024834 T-cell immunoreceptor with Ig and ITIM domains Human genes 0.000 description 2
- 102100037906 T-cell surface glycoprotein CD3 zeta chain Human genes 0.000 description 2
- 238000010459 TALEN Methods 0.000 description 2
- 101150053558 TRBC1 gene Proteins 0.000 description 2
- 101150117561 TRBC2 gene Proteins 0.000 description 2
- 208000002903 Thalassemia Diseases 0.000 description 2
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 2
- 108010073062 Transcription Activator-Like Effectors Proteins 0.000 description 2
- 108091023040 Transcription factor Proteins 0.000 description 2
- 102000040945 Transcription factor Human genes 0.000 description 2
- 101800005109 Triakontatetraneuropeptide Proteins 0.000 description 2
- 239000007983 Tris buffer Substances 0.000 description 2
- 102100024036 Tyrosine-protein kinase Lck Human genes 0.000 description 2
- 108010053099 Vascular Endothelial Growth Factor Receptor-2 Proteins 0.000 description 2
- 102100033177 Vascular endothelial growth factor receptor 2 Human genes 0.000 description 2
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 2
- 102100022748 Wilms tumor protein Human genes 0.000 description 2
- 101710127857 Wilms tumor protein Proteins 0.000 description 2
- 208000006269 X-Linked Bulbo-Spinal Atrophy Diseases 0.000 description 2
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 2
- 102000034337 acetylcholine receptors Human genes 0.000 description 2
- 150000007513 acids Chemical class 0.000 description 2
- DZBUGLKDJFMEHC-UHFFFAOYSA-N acridine Chemical compound C1=CC=CC2=CC3=CC=CC=C3N=C21 DZBUGLKDJFMEHC-UHFFFAOYSA-N 0.000 description 2
- 230000023445 activated T cell autonomous cell death Effects 0.000 description 2
- 230000004913 activation Effects 0.000 description 2
- 239000004480 active ingredient Substances 0.000 description 2
- OIRDTQYFTABQOQ-KQYNXXCUSA-N adenosine group Chemical group [C@@H]1([C@H](O)[C@H](O)[C@@H](CO)O1)N1C=NC=2C(N)=NC=NC12 OIRDTQYFTABQOQ-KQYNXXCUSA-N 0.000 description 2
- 208000026935 allergic disease Diseases 0.000 description 2
- 230000000735 allogeneic effect Effects 0.000 description 2
- 150000001408 amides Chemical class 0.000 description 2
- 206010002026 amyotrophic lateral sclerosis Diseases 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 2
- 239000003963 antioxidant agent Substances 0.000 description 2
- 235000006708 antioxidants Nutrition 0.000 description 2
- 235000010323 ascorbic acid Nutrition 0.000 description 2
- 239000011668 ascorbic acid Substances 0.000 description 2
- 229960005070 ascorbic acid Drugs 0.000 description 2
- 210000003719 b-lymphocyte Anatomy 0.000 description 2
- WPYMKLBDIGXBTP-UHFFFAOYSA-N benzoic acid Chemical compound OC(=O)C1=CC=CC=C1 WPYMKLBDIGXBTP-UHFFFAOYSA-N 0.000 description 2
- WQZGKKKJIJFFOK-VFUOTHLCSA-N beta-D-glucose Chemical compound OC[C@H]1O[C@@H](O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-VFUOTHLCSA-N 0.000 description 2
- 230000008499 blood brain barrier function Effects 0.000 description 2
- 210000001218 blood-brain barrier Anatomy 0.000 description 2
- 210000000988 bone and bone Anatomy 0.000 description 2
- 210000001185 bone marrow Anatomy 0.000 description 2
- RYYVLZVUVIJVGH-UHFFFAOYSA-N caffeine Chemical compound CN1C(=O)N(C)C(=O)C2=C1N=CN2C RYYVLZVUVIJVGH-UHFFFAOYSA-N 0.000 description 2
- 238000004113 cell culture Methods 0.000 description 2
- 210000000170 cell membrane Anatomy 0.000 description 2
- 238000002659 cell therapy Methods 0.000 description 2
- 230000001413 cellular effect Effects 0.000 description 2
- 210000003169 central nervous system Anatomy 0.000 description 2
- 239000002738 chelating agent Substances 0.000 description 2
- 239000003153 chemical reaction reagent Substances 0.000 description 2
- HVYWMOMLDIMFJA-DPAQBDIFSA-N cholesterol Chemical compound C1C=C2C[C@@H](O)CC[C@]2(C)[C@@H]2[C@@H]1[C@@H]1CC[C@H]([C@H](C)CCCC(C)C)[C@@]1(C)CC2 HVYWMOMLDIMFJA-DPAQBDIFSA-N 0.000 description 2
- 108010039524 chondroitin sulfate proteoglycan 4 Proteins 0.000 description 2
- 208000016532 chronic granulomatous disease Diseases 0.000 description 2
- 150000001860 citric acid derivatives Chemical class 0.000 description 2
- 238000000576 coating method Methods 0.000 description 2
- 238000000205 computational method Methods 0.000 description 2
- 230000002596 correlated effect Effects 0.000 description 2
- 238000012217 deletion Methods 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 239000005547 deoxyribonucleotide Substances 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 239000003599 detergent Substances 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 239000002612 dispersion medium Substances 0.000 description 2
- 239000002552 dosage form Substances 0.000 description 2
- 230000009881 electrostatic interaction Effects 0.000 description 2
- 239000000839 emulsion Substances 0.000 description 2
- 108010087914 epidermal growth factor receptor VIII Proteins 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 210000002950 fibroblast Anatomy 0.000 description 2
- 108091006047 fluorescent proteins Proteins 0.000 description 2
- 102000034287 fluorescent proteins Human genes 0.000 description 2
- 235000019152 folic acid Nutrition 0.000 description 2
- 239000011724 folic acid Substances 0.000 description 2
- ZXQYGBMAQZUVMI-GCMPRSNUSA-N gamma-cyhalothrin Chemical compound CC1(C)[C@@H](\C=C(/Cl)C(F)(F)F)[C@H]1C(=O)O[C@H](C#N)C1=CC=CC(OC=2C=CC=CC=2)=C1 ZXQYGBMAQZUVMI-GCMPRSNUSA-N 0.000 description 2
- 239000005090 green fluorescent protein Substances 0.000 description 2
- 125000001475 halogen functional group Chemical group 0.000 description 2
- 238000003306 harvesting Methods 0.000 description 2
- 210000003494 hepatocyte Anatomy 0.000 description 2
- 230000007062 hydrolysis Effects 0.000 description 2
- 238000006460 hydrolysis reaction Methods 0.000 description 2
- FDGQSTZJBFJUBT-UHFFFAOYSA-N hypoxanthine Chemical compound O=C1NC=NC2=C1NC=N2 FDGQSTZJBFJUBT-UHFFFAOYSA-N 0.000 description 2
- 238000003384 imaging method Methods 0.000 description 2
- 230000000415 inactivating effect Effects 0.000 description 2
- 238000002347 injection Methods 0.000 description 2
- 239000007924 injection Substances 0.000 description 2
- 238000007918 intramuscular administration Methods 0.000 description 2
- 230000007794 irritation Effects 0.000 description 2
- DRAVOWXCEBXPTN-UHFFFAOYSA-N isoguanine Chemical compound NC1=NC(=O)NC2=C1NC=N2 DRAVOWXCEBXPTN-UHFFFAOYSA-N 0.000 description 2
- 208000032839 leukemia Diseases 0.000 description 2
- 210000000265 leukocyte Anatomy 0.000 description 2
- 239000003446 ligand Substances 0.000 description 2
- 210000004698 lymphocyte Anatomy 0.000 description 2
- 210000003738 lymphoid progenitor cell Anatomy 0.000 description 2
- 210000001161 mammalian embryo Anatomy 0.000 description 2
- 235000010270 methyl p-hydroxybenzoate Nutrition 0.000 description 2
- LXCFILQKKLGQFO-UHFFFAOYSA-N methylparaben Chemical compound COC(=O)C1=CC=C(O)C=C1 LXCFILQKKLGQFO-UHFFFAOYSA-N 0.000 description 2
- 238000002493 microarray Methods 0.000 description 2
- 238000000520 microinjection Methods 0.000 description 2
- 239000011859 microparticle Substances 0.000 description 2
- 208000022018 mucopolysaccharidosis type 2 Diseases 0.000 description 2
- 208000011045 mucopolysaccharidosis type 3 Diseases 0.000 description 2
- 210000003205 muscle Anatomy 0.000 description 2
- 239000002070 nanowire Substances 0.000 description 2
- 210000000822 natural killer cell Anatomy 0.000 description 2
- 210000002569 neuron Anatomy 0.000 description 2
- 239000003921 oil Substances 0.000 description 2
- 230000006548 oncogenic transformation Effects 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 210000000496 pancreas Anatomy 0.000 description 2
- 230000036961 partial effect Effects 0.000 description 2
- 235000021317 phosphate Nutrition 0.000 description 2
- 108010079892 phosphoglycerol kinase Proteins 0.000 description 2
- XUYJLQHKOGNDPB-UHFFFAOYSA-N phosphonoacetic acid Chemical compound OC(=O)CP(O)(O)=O XUYJLQHKOGNDPB-UHFFFAOYSA-N 0.000 description 2
- 229920001983 poloxamer Polymers 0.000 description 2
- 239000000256 polyoxyethylene sorbitan monolaurate Substances 0.000 description 2
- 235000010486 polyoxyethylene sorbitan monolaurate Nutrition 0.000 description 2
- 229920000136 polysorbate Polymers 0.000 description 2
- 229920000036 polyvinylpyrrolidone Polymers 0.000 description 2
- 239000001267 polyvinylpyrrolidone Substances 0.000 description 2
- 235000013855 polyvinylpyrrolidone Nutrition 0.000 description 2
- 239000001103 potassium chloride Substances 0.000 description 2
- 235000011164 potassium chloride Nutrition 0.000 description 2
- 238000002360 preparation method Methods 0.000 description 2
- 239000003755 preservative agent Substances 0.000 description 2
- 210000001236 prokaryotic cell Anatomy 0.000 description 2
- 230000035755 proliferation Effects 0.000 description 2
- ULWHHBHJGPPBCO-UHFFFAOYSA-N propane-1,1-diol Chemical compound CCC(O)O ULWHHBHJGPPBCO-UHFFFAOYSA-N 0.000 description 2
- YPFDHNVEDLHUCE-UHFFFAOYSA-N propane-1,3-diol Chemical compound OCCCO YPFDHNVEDLHUCE-UHFFFAOYSA-N 0.000 description 2
- QELSKZZBTMNZEB-UHFFFAOYSA-N propylparaben Chemical compound CCCOC(=O)C1=CC=C(O)C=C1 QELSKZZBTMNZEB-UHFFFAOYSA-N 0.000 description 2
- 125000006239 protecting group Chemical group 0.000 description 2
- 108020001580 protein domains Proteins 0.000 description 2
- 230000008707 rearrangement Effects 0.000 description 2
- 230000006798 recombination Effects 0.000 description 2
- 238000005215 recombination Methods 0.000 description 2
- 125000002652 ribonucleotide group Chemical group 0.000 description 2
- YGSDEFSMJLZEOE-UHFFFAOYSA-N salicylic acid Chemical compound OC(=O)C1=CC=CC=C1O YGSDEFSMJLZEOE-UHFFFAOYSA-N 0.000 description 2
- 230000011664 signaling Effects 0.000 description 2
- 210000003491 skin Anatomy 0.000 description 2
- 235000010267 sodium hydrogen sulphite Nutrition 0.000 description 2
- GEHJYWRUCIMESM-UHFFFAOYSA-L sodium sulfite Chemical compound [Na+].[Na+].[O-]S([O-])=O GEHJYWRUCIMESM-UHFFFAOYSA-L 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 239000000243 solution Substances 0.000 description 2
- 241000894007 species Species 0.000 description 2
- 239000003381 stabilizer Substances 0.000 description 2
- 239000007858 starting material Substances 0.000 description 2
- 230000001954 sterilising effect Effects 0.000 description 2
- 238000004659 sterilization and disinfection Methods 0.000 description 2
- 238000007920 subcutaneous administration Methods 0.000 description 2
- 230000004083 survival effect Effects 0.000 description 2
- 239000012730 sustained-release form Substances 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 230000004797 therapeutic response Effects 0.000 description 2
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 2
- 229960003087 tioguanine Drugs 0.000 description 2
- 230000002103 transcriptional effect Effects 0.000 description 2
- 108091006107 transcriptional repressors Proteins 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- NMEHNETUFHBYEG-IHKSMFQHSA-N tttn Chemical group C([C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC=1NC=NC=1)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)NCC(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N1[C@@H](CCC1)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCSC)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@@H](NC(=O)[C@H]1N(CCC1)C(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](N)[C@@H](C)O)[C@@H](C)O)C1=CC=CC=C1 NMEHNETUFHBYEG-IHKSMFQHSA-N 0.000 description 2
- 210000003171 tumor-infiltrating lymphocyte Anatomy 0.000 description 2
- 239000000277 virosome Substances 0.000 description 2
- 239000000080 wetting agent Substances 0.000 description 2
- 210000005253 yeast cell Anatomy 0.000 description 2
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-Amino-3-hydroxypropanoic acid Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 description 1
- ALNDFFUAQIVVPG-NGJCXOISSA-N (2r,3r,4r)-3,4,5-trihydroxy-2-methoxypentanal Chemical compound CO[C@@H](C=O)[C@H](O)[C@H](O)CO ALNDFFUAQIVVPG-NGJCXOISSA-N 0.000 description 1
- YIMATHOGWXZHFX-WCTZXXKLSA-N (2r,3r,4r,5r)-5-(hydroxymethyl)-3-(2-methoxyethoxy)oxolane-2,4-diol Chemical compound COCCO[C@H]1[C@H](O)O[C@H](CO)[C@H]1O YIMATHOGWXZHFX-WCTZXXKLSA-N 0.000 description 1
- JNYAEWCLZODPBN-JGWLITMVSA-N (2r,3r,4s)-2-[(1r)-1,2-dihydroxyethyl]oxolane-3,4-diol Chemical class OC[C@@H](O)[C@H]1OC[C@H](O)[C@H]1O JNYAEWCLZODPBN-JGWLITMVSA-N 0.000 description 1
- XMQUEQJCYRFIQS-YFKPBYRVSA-N (2s)-2-amino-5-ethoxy-5-oxopentanoic acid Chemical compound CCOC(=O)CC[C@H](N)C(O)=O XMQUEQJCYRFIQS-YFKPBYRVSA-N 0.000 description 1
- BRCNMMGLEUILLG-NTSWFWBYSA-N (4s,5r)-4,5,6-trihydroxyhexan-2-one Chemical group CC(=O)C[C@H](O)[C@H](O)CO BRCNMMGLEUILLG-NTSWFWBYSA-N 0.000 description 1
- WHBMMWSBFZVSSR-GSVOUGTGSA-N (R)-3-hydroxybutyric acid Chemical compound C[C@@H](O)CC(O)=O WHBMMWSBFZVSSR-GSVOUGTGSA-N 0.000 description 1
- 108090000344 1,4-alpha-Glucan Branching Enzyme Proteins 0.000 description 1
- 102100028734 1,4-alpha-glucan-branching enzyme Human genes 0.000 description 1
- IIZPXYDJLKNOIY-JXPKJXOSSA-N 1-palmitoyl-2-arachidonoyl-sn-glycero-3-phosphocholine Chemical compound CCCCCCCCCCCCCCCC(=O)OC[C@H](COP([O-])(=O)OCC[N+](C)(C)C)OC(=O)CCC\C=C/C\C=C/C\C=C/C\C=C/CCCCC IIZPXYDJLKNOIY-JXPKJXOSSA-N 0.000 description 1
- 102100026205 1-phosphatidylinositol 4,5-bisphosphate phosphodiesterase gamma-1 Human genes 0.000 description 1
- IHPYMWDTONKSCO-UHFFFAOYSA-N 2,2'-piperazine-1,4-diylbisethanesulfonic acid Chemical compound OS(=O)(=O)CCN1CCN(CCS(O)(=O)=O)CC1 IHPYMWDTONKSCO-UHFFFAOYSA-N 0.000 description 1
- MPXDAIBTYWGBSL-UHFFFAOYSA-N 2,4-difluoro-1-methylbenzene Chemical compound CC1=CC=C(F)C=C1F MPXDAIBTYWGBSL-UHFFFAOYSA-N 0.000 description 1
- APHFXDBDLKPMTA-UHFFFAOYSA-N 2-(3-decanoyl-4,5,7-trihydroxynaphthalen-2-yl)acetic acid Chemical compound CCCCCCCCCC(=O)c1c(CC(O)=O)cc2cc(O)cc(O)c2c1O APHFXDBDLKPMTA-UHFFFAOYSA-N 0.000 description 1
- SXGZJKUKBWWHRA-UHFFFAOYSA-N 2-(N-morpholiniumyl)ethanesulfonate Chemical compound [O-]S(=O)(=O)CC[NH+]1CCOCC1 SXGZJKUKBWWHRA-UHFFFAOYSA-N 0.000 description 1
- JKMHFZQWWAIEOD-UHFFFAOYSA-N 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid Chemical compound OCC[NH+]1CCN(CCS([O-])(=O)=O)CC1 JKMHFZQWWAIEOD-UHFFFAOYSA-N 0.000 description 1
- QKNYBSVHEMOAJP-UHFFFAOYSA-N 2-amino-2-(hydroxymethyl)propane-1,3-diol;hydron;chloride Chemical compound Cl.OCC(N)(CO)CO QKNYBSVHEMOAJP-UHFFFAOYSA-N 0.000 description 1
- XQCZBXHVTFVIFE-UHFFFAOYSA-N 2-amino-4-hydroxypyrimidine Chemical compound NC1=NC=CC(O)=N1 XQCZBXHVTFVIFE-UHFFFAOYSA-N 0.000 description 1
- MWBWWFOAEOYUST-UHFFFAOYSA-N 2-aminopurine Chemical compound NC1=NC=C2N=CNC2=N1 MWBWWFOAEOYUST-UHFFFAOYSA-N 0.000 description 1
- 125000004200 2-methoxyethyl group Chemical group [H]C([H])([H])OC([H])([H])C([H])([H])* 0.000 description 1
- JHRPHASLIZOEBJ-UHFFFAOYSA-N 2-methylpyridine-3-carbaldehyde Chemical compound CC1=NC=CC=C1C=O JHRPHASLIZOEBJ-UHFFFAOYSA-N 0.000 description 1
- OALHHIHQOFIMEF-UHFFFAOYSA-N 3',6'-dihydroxy-2',4',5',7'-tetraiodo-3h-spiro[2-benzofuran-1,9'-xanthene]-3-one Chemical compound O1C(=O)C2=CC=CC=C2C21C1=CC(I)=C(O)C(I)=C1OC1=C(I)C(O)=C(I)C=C21 OALHHIHQOFIMEF-UHFFFAOYSA-N 0.000 description 1
- QARJWQSAAYUDJA-UHFFFAOYSA-N 3-(aminomethyl)-1,4-dihydroxy-3-(hydroxymethyl)-2-methylbutane-2-sulfonic acid Chemical compound OCC(C)(S(O)(=O)=O)C(CN)(CO)CO QARJWQSAAYUDJA-UHFFFAOYSA-N 0.000 description 1
- WCKQPPQRFNHPRJ-UHFFFAOYSA-N 4-[[4-(dimethylamino)phenyl]diazenyl]benzoic acid Chemical compound C1=CC(N(C)C)=CC=C1N=NC1=CC=C(C(O)=O)C=C1 WCKQPPQRFNHPRJ-UHFFFAOYSA-N 0.000 description 1
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 1
- LQLQRFGHAALLLE-UHFFFAOYSA-N 5-bromouracil Chemical compound BrC1=CNC(=O)NC1=O LQLQRFGHAALLLE-UHFFFAOYSA-N 0.000 description 1
- JDBGXEHEIRGOBU-UHFFFAOYSA-N 5-hydroxymethyluracil Chemical compound OCC1=CNC(=O)NC1=O JDBGXEHEIRGOBU-UHFFFAOYSA-N 0.000 description 1
- KSNXJLQDQOIRIP-UHFFFAOYSA-N 5-iodouracil Chemical compound IC1=CNC(=O)NC1=O KSNXJLQDQOIRIP-UHFFFAOYSA-N 0.000 description 1
- LRSASMSXMSNRBT-UHFFFAOYSA-N 5-methylcytosine Chemical compound CC1=CNC(=O)N=C1N LRSASMSXMSNRBT-UHFFFAOYSA-N 0.000 description 1
- UJBCLAXPPIDQEE-UHFFFAOYSA-N 5-prop-1-ynyl-1h-pyrimidine-2,4-dione Chemical compound CC#CC1=CNC(=O)NC1=O UJBCLAXPPIDQEE-UHFFFAOYSA-N 0.000 description 1
- VOBFOFTXJVSVTJ-UHFFFAOYSA-N 5-prop-2-enyl-1h-pyrimidine-2,4-dione Chemical compound C=CCC1=CNC(=O)NC1=O VOBFOFTXJVSVTJ-UHFFFAOYSA-N 0.000 description 1
- PPYAFPNEHGRGIQ-UHFFFAOYSA-N 6-amino-5-ethynyl-1h-pyrimidin-2-one Chemical compound NC1=NC(=O)NC=C1C#C PPYAFPNEHGRGIQ-UHFFFAOYSA-N 0.000 description 1
- QNNARSZPGNJZIX-UHFFFAOYSA-N 6-amino-5-prop-1-ynyl-1h-pyrimidin-2-one Chemical compound CC#CC1=CNC(=O)N=C1N QNNARSZPGNJZIX-UHFFFAOYSA-N 0.000 description 1
- LHCPRYRLDOSKHK-UHFFFAOYSA-N 7-deaza-8-aza-adenine Chemical compound NC1=NC=NC2=C1C=NN2 LHCPRYRLDOSKHK-UHFFFAOYSA-N 0.000 description 1
- LOSIULRWFAEMFL-UHFFFAOYSA-N 7-deazaguanine Chemical compound O=C1NC(N)=NC2=C1CC=N2 LOSIULRWFAEMFL-UHFFFAOYSA-N 0.000 description 1
- VKKXEIQIGGPMHT-UHFFFAOYSA-N 7h-purine-2,8-diamine Chemical compound NC1=NC=C2NC(N)=NC2=N1 VKKXEIQIGGPMHT-UHFFFAOYSA-N 0.000 description 1
- 229960005508 8-azaguanine Drugs 0.000 description 1
- 102100031585 ADP-ribosyl cyclase/cyclic ADP-ribose hydrolase 1 Human genes 0.000 description 1
- 101150048848 ART10 gene Proteins 0.000 description 1
- 208000035657 Abasia Diseases 0.000 description 1
- 241000093740 Acidaminococcus sp. Species 0.000 description 1
- 241000251468 Actinopterygii Species 0.000 description 1
- 229930024421 Adenine Natural products 0.000 description 1
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 1
- 108010052875 Adenine deaminase Proteins 0.000 description 1
- 241000972680 Adeno-associated virus - 6 Species 0.000 description 1
- 241001164825 Adeno-associated virus - 8 Species 0.000 description 1
- 102100035990 Adenosine receptor A2a Human genes 0.000 description 1
- 201000011374 Alagille syndrome Diseases 0.000 description 1
- 102100038910 Alpha-enolase Human genes 0.000 description 1
- 208000024827 Alzheimer disease Diseases 0.000 description 1
- 102100037435 Antiviral innate immune response receptor RIG-I Human genes 0.000 description 1
- 101710127675 Antiviral innate immune response receptor RIG-I Proteins 0.000 description 1
- 108091023037 Aptamer Proteins 0.000 description 1
- 239000004475 Arginine Substances 0.000 description 1
- 208000034318 Argininemia Diseases 0.000 description 1
- 206010058298 Argininosuccinate synthetase deficiency Diseases 0.000 description 1
- 101150038108 Art7 gene Proteins 0.000 description 1
- 108010031480 Artificial Receptors Proteins 0.000 description 1
- DCXYFEDJOCDNAF-UHFFFAOYSA-N Asparagine Natural products OC(=O)C(N)CC(N)=O DCXYFEDJOCDNAF-UHFFFAOYSA-N 0.000 description 1
- 206010003591 Ataxia Diseases 0.000 description 1
- 206010003594 Ataxia telangiectasia Diseases 0.000 description 1
- 206010003805 Autism Diseases 0.000 description 1
- 208000020706 Autistic disease Diseases 0.000 description 1
- 108090001008 Avidin Proteins 0.000 description 1
- 102100029822 B- and T-lymphocyte attenuator Human genes 0.000 description 1
- 102100038080 B-cell receptor CD22 Human genes 0.000 description 1
- 102100022005 B-lymphocyte antigen CD20 Human genes 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 108091032955 Bacterial small RNA Proteins 0.000 description 1
- 241000218495 Bactrocera correcta Species 0.000 description 1
- 206010061692 Benign muscle neoplasm Diseases 0.000 description 1
- 239000005711 Benzoic acid Substances 0.000 description 1
- 102100022548 Beta-hexosaminidase subunit alpha Human genes 0.000 description 1
- BVKZGUZCCUSVTD-UHFFFAOYSA-M Bicarbonate Chemical compound OC([O-])=O BVKZGUZCCUSVTD-UHFFFAOYSA-M 0.000 description 1
- 208000005692 Bloom Syndrome Diseases 0.000 description 1
- 241000283690 Bos taurus Species 0.000 description 1
- 101000964894 Bos taurus 14-3-3 protein zeta/delta Proteins 0.000 description 1
- 241001536303 Botryococcus braunii Species 0.000 description 1
- 241000168061 Butyrivibrio proteoclasticus Species 0.000 description 1
- 239000002126 C01EB10 - Adenosine Substances 0.000 description 1
- 102100027207 CD27 antigen Human genes 0.000 description 1
- 102100032912 CD44 antigen Human genes 0.000 description 1
- 102100025221 CD70 antigen Human genes 0.000 description 1
- 238000010356 CRISPR-Cas9 genome editing Methods 0.000 description 1
- 241000238097 Callinectes sapidus Species 0.000 description 1
- 102100025570 Cancer/testis antigen 1 Human genes 0.000 description 1
- 241000949035 Candidatus Microgenomates Species 0.000 description 1
- 241000223283 Candidatus Peregrinibacteria bacterium GW2011_GWA2_33_10 Species 0.000 description 1
- 241001316580 Candidatus Roizmanbacteria Species 0.000 description 1
- 102100033040 Carbonic anhydrase 12 Human genes 0.000 description 1
- 101150117674 Cd247 gene Proteins 0.000 description 1
- 108010067225 Cell Adhesion Molecules Proteins 0.000 description 1
- 102000016289 Cell Adhesion Molecules Human genes 0.000 description 1
- 108010051109 Cell-Penetrating Peptides Proteins 0.000 description 1
- 102000020313 Cell-Penetrating Peptides Human genes 0.000 description 1
- 102100025064 Cellular tumor antigen p53 Human genes 0.000 description 1
- 201000008992 Charcot-Marie-Tooth disease type 1B Diseases 0.000 description 1
- 241000195597 Chlamydomonas reinhardtii Species 0.000 description 1
- 244000249214 Chlorella pyrenoidosa Species 0.000 description 1
- 235000007091 Chlorella pyrenoidosa Nutrition 0.000 description 1
- GHXZTYHSJHQHIJ-UHFFFAOYSA-N Chlorhexidine Chemical compound C=1C=C(Cl)C=CC=1NC(N)=NC(N)=NCCCCCCN=C(N)N=C(N)NC1=CC=C(Cl)C=C1 GHXZTYHSJHQHIJ-UHFFFAOYSA-N 0.000 description 1
- 201000011297 Citrullinemia Diseases 0.000 description 1
- 102100026735 Coagulation factor VIII Human genes 0.000 description 1
- 206010010317 Congenital absence of bile ducts Diseases 0.000 description 1
- 206010010356 Congenital anomaly Diseases 0.000 description 1
- 206010053138 Congenital aplastic anaemia Diseases 0.000 description 1
- 208000001819 Crigler-Najjar Syndrome Diseases 0.000 description 1
- 241000938605 Crocodylia Species 0.000 description 1
- 229920000858 Cyclodextrin Polymers 0.000 description 1
- 102100026846 Cytidine deaminase Human genes 0.000 description 1
- 108010031325 Cytidine deaminase Proteins 0.000 description 1
- 102100025621 Cytochrome b-245 heavy chain Human genes 0.000 description 1
- WQZGKKKJIJFFOK-QTVWNMPRSA-N D-mannopyranose Chemical compound OC[C@H]1OC(O)[C@@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-QTVWNMPRSA-N 0.000 description 1
- 108010060248 DNA Ligase ATP Proteins 0.000 description 1
- 230000005778 DNA damage Effects 0.000 description 1
- 231100000277 DNA damage Toxicity 0.000 description 1
- 102100033195 DNA ligase 4 Human genes 0.000 description 1
- 241000450599 DNA viruses Species 0.000 description 1
- 102000052510 DNA-Binding Proteins Human genes 0.000 description 1
- 101710096438 DNA-binding protein Proteins 0.000 description 1
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 1
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 1
- 241000252212 Danio rerio Species 0.000 description 1
- 239000004375 Dextrin Substances 0.000 description 1
- 229920001353 Dextrin Polymers 0.000 description 1
- SHIBSTMRCDJXLN-UHFFFAOYSA-N Digoxigenin Natural products C1CC(C2C(C3(C)CCC(O)CC3CC2)CC2O)(O)C2(C)C1C1=CC(=O)OC1 SHIBSTMRCDJXLN-UHFFFAOYSA-N 0.000 description 1
- 206010061818 Disease progression Diseases 0.000 description 1
- 206010013801 Duchenne Muscular Dystrophy Diseases 0.000 description 1
- 102000001301 EGF receptor Human genes 0.000 description 1
- 108060006698 EGF receptor Proteins 0.000 description 1
- 241000258955 Echinodermata Species 0.000 description 1
- 241001370750 Echinopsis oxygona Species 0.000 description 1
- 102100032384 Ecto-ADP-ribosyltransferase 3 Human genes 0.000 description 1
- 102100036992 Ecto-ADP-ribosyltransferase 5 Human genes 0.000 description 1
- 108010042407 Endonucleases Proteins 0.000 description 1
- 102000004533 Endonucleases Human genes 0.000 description 1
- 108700024394 Exon Proteins 0.000 description 1
- 239000001116 FEMA 4028 Substances 0.000 description 1
- 108010076282 Factor IX Proteins 0.000 description 1
- 108010054218 Factor VIII Proteins 0.000 description 1
- 102000001690 Factor VIII Human genes 0.000 description 1
- 201000003542 Factor VIII deficiency Diseases 0.000 description 1
- 201000004939 Fanconi anemia Diseases 0.000 description 1
- 108010009306 Forkhead Box Protein O1 Proteins 0.000 description 1
- 102100035427 Forkhead box protein O1 Human genes 0.000 description 1
- 208000001914 Fragile X syndrome Diseases 0.000 description 1
- 241000589602 Francisella tularensis Species 0.000 description 1
- 241000589599 Francisella tularensis subsp. novicida Species 0.000 description 1
- 241000588088 Francisella tularensis subsp. novicida U112 Species 0.000 description 1
- 208000024412 Friedreich ataxia Diseases 0.000 description 1
- 102100022629 Fructose-2,6-bisphosphatase Human genes 0.000 description 1
- 102100022277 Fructose-bisphosphate aldolase A Human genes 0.000 description 1
- 230000010190 G1 phase Effects 0.000 description 1
- 208000027472 Galactosemias Diseases 0.000 description 1
- 241000287828 Gallus gallus Species 0.000 description 1
- 108010010803 Gelatin Proteins 0.000 description 1
- 230000010596 Gene Editing or Modification Effects 0.000 description 1
- 102100031181 Glyceraldehyde-3-phosphate dehydrogenase Human genes 0.000 description 1
- 208000006562 Glycogen Storage Disease Type VII Diseases 0.000 description 1
- 102100039262 Glycogen [starch] synthase, muscle Human genes 0.000 description 1
- 206010018464 Glycogen storage disease type I Diseases 0.000 description 1
- 206010053185 Glycogen storage disease type II Diseases 0.000 description 1
- 206010018462 Glycogen storage disease type V Diseases 0.000 description 1
- 102000003886 Glycoproteins Human genes 0.000 description 1
- 108090000288 Glycoproteins Proteins 0.000 description 1
- HVLSXIKZNLPZJJ-TXZCQADKSA-N HA peptide Chemical compound C([C@@H](C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](C)C(O)=O)NC(=O)[C@H]1N(CCC1)C(=O)[C@@H](N)CC=1C=CC(O)=CC=1)C1=CC=C(O)C=C1 HVLSXIKZNLPZJJ-TXZCQADKSA-N 0.000 description 1
- 239000007995 HEPES buffer Substances 0.000 description 1
- 102100028970 HLA class I histocompatibility antigen, alpha chain E Human genes 0.000 description 1
- 102100028967 HLA class I histocompatibility antigen, alpha chain G Human genes 0.000 description 1
- 102100030595 HLA class II histocompatibility antigen gamma chain Human genes 0.000 description 1
- 102000025850 HLA-A2 Antigen Human genes 0.000 description 1
- 108010074032 HLA-A2 Antigen Proteins 0.000 description 1
- 108010024164 HLA-G Antigens Proteins 0.000 description 1
- 102100031573 Hematopoietic progenitor cell antigen CD34 Human genes 0.000 description 1
- 101800000637 Hemokinin Proteins 0.000 description 1
- 208000009292 Hemophilia A Diseases 0.000 description 1
- 208000002972 Hepatolenticular Degeneration Diseases 0.000 description 1
- 102100039869 Histone H2B type F-S Human genes 0.000 description 1
- 102000011787 Histone Methyltransferases Human genes 0.000 description 1
- 108010036115 Histone Methyltransferases Proteins 0.000 description 1
- 108010033040 Histones Proteins 0.000 description 1
- 101000691599 Homo sapiens 1-phosphatidylinositol 4,5-bisphosphate phosphodiesterase gamma-1 Proteins 0.000 description 1
- 101000600756 Homo sapiens 3-phosphoinositide-dependent protein kinase 1 Proteins 0.000 description 1
- 101000777636 Homo sapiens ADP-ribosyl cyclase/cyclic ADP-ribose hydrolase 1 Proteins 0.000 description 1
- 101000882335 Homo sapiens Alpha-enolase Proteins 0.000 description 1
- 101000864344 Homo sapiens B- and T-lymphocyte attenuator Proteins 0.000 description 1
- 101000884305 Homo sapiens B-cell receptor CD22 Proteins 0.000 description 1
- 101000897405 Homo sapiens B-lymphocyte antigen CD20 Proteins 0.000 description 1
- 101100219559 Homo sapiens CARD11 gene Proteins 0.000 description 1
- 101000914511 Homo sapiens CD27 antigen Proteins 0.000 description 1
- 101000868273 Homo sapiens CD44 antigen Proteins 0.000 description 1
- 101000934356 Homo sapiens CD70 antigen Proteins 0.000 description 1
- 101000856237 Homo sapiens Cancer/testis antigen 1 Proteins 0.000 description 1
- 101000867855 Homo sapiens Carbonic anhydrase 12 Proteins 0.000 description 1
- 101000761179 Homo sapiens Caspase recruitment domain-containing protein 11 Proteins 0.000 description 1
- 101000721661 Homo sapiens Cellular tumor antigen p53 Proteins 0.000 description 1
- 101000911390 Homo sapiens Coagulation factor VIII Proteins 0.000 description 1
- 101000589618 Homo sapiens Ecto-ADP-ribosyltransferase 3 Proteins 0.000 description 1
- 101001024566 Homo sapiens Ecto-ADP-ribosyltransferase 4 Proteins 0.000 description 1
- 101001024570 Homo sapiens Ecto-ADP-ribosyltransferase 5 Proteins 0.000 description 1
- 101000823463 Homo sapiens Fructose-2,6-bisphosphatase Proteins 0.000 description 1
- 101000755879 Homo sapiens Fructose-bisphosphate aldolase A Proteins 0.000 description 1
- 101000886596 Homo sapiens Geminin Proteins 0.000 description 1
- 101000926939 Homo sapiens Glucocorticoid receptor Proteins 0.000 description 1
- 101001036130 Homo sapiens Glycogen [starch] synthase, muscle Proteins 0.000 description 1
- 101000986085 Homo sapiens HLA class I histocompatibility antigen, alpha chain E Proteins 0.000 description 1
- 101001082627 Homo sapiens HLA class II histocompatibility antigen gamma chain Proteins 0.000 description 1
- 101000777663 Homo sapiens Hematopoietic progenitor cell antigen CD34 Proteins 0.000 description 1
- 101001068133 Homo sapiens Hepatitis A virus cellular receptor 2 Proteins 0.000 description 1
- 101000840551 Homo sapiens Hexokinase-2 Proteins 0.000 description 1
- 101001035372 Homo sapiens Histone H2B type F-S Proteins 0.000 description 1
- 101000994365 Homo sapiens Integrin alpha-6 Proteins 0.000 description 1
- 101001078143 Homo sapiens Integrin alpha-IIb Proteins 0.000 description 1
- 101001082073 Homo sapiens Interferon-induced helicase C domain-containing protein 1 Proteins 0.000 description 1
- 101000998120 Homo sapiens Interleukin-3 receptor subunit alpha Proteins 0.000 description 1
- 101001050577 Homo sapiens Kinesin-like protein KIF2A Proteins 0.000 description 1
- 101001090713 Homo sapiens L-lactate dehydrogenase A chain Proteins 0.000 description 1
- 101000972918 Homo sapiens MAX gene-associated protein Proteins 0.000 description 1
- 101001036580 Homo sapiens Max dimerization protein 4 Proteins 0.000 description 1
- 101000576320 Homo sapiens Max-binding protein MNT Proteins 0.000 description 1
- 101000615488 Homo sapiens Methyl-CpG-binding domain protein 2 Proteins 0.000 description 1
- 101000653374 Homo sapiens Methylcytosine dioxygenase TET2 Proteins 0.000 description 1
- 101000957106 Homo sapiens Mitotic spindle assembly checkpoint protein MAD1 Proteins 0.000 description 1
- 101000934338 Homo sapiens Myeloid cell surface antigen CD33 Proteins 0.000 description 1
- 101000986595 Homo sapiens Ornithine transcarbamylase, mitochondrial Proteins 0.000 description 1
- 101000904196 Homo sapiens Pancreatic secretory granule membrane major glycoprotein GP2 Proteins 0.000 description 1
- 101000579123 Homo sapiens Phosphoglycerate kinase 1 Proteins 0.000 description 1
- 101001026214 Homo sapiens Potassium voltage-gated channel subfamily A member 5 Proteins 0.000 description 1
- 101000610551 Homo sapiens Prominin-1 Proteins 0.000 description 1
- 101001048456 Homo sapiens Protein Hook homolog 2 Proteins 0.000 description 1
- 101001000998 Homo sapiens Protein phosphatase 1 regulatory subunit 12C Proteins 0.000 description 1
- 101001091538 Homo sapiens Pyruvate kinase PKM Proteins 0.000 description 1
- 101001012157 Homo sapiens Receptor tyrosine-protein kinase erbB-2 Proteins 0.000 description 1
- 101000932478 Homo sapiens Receptor-type tyrosine-protein kinase FLT3 Proteins 0.000 description 1
- 101000829367 Homo sapiens Src substrate cortactin Proteins 0.000 description 1
- 101000874179 Homo sapiens Syndecan-1 Proteins 0.000 description 1
- 101000914496 Homo sapiens T-cell antigen CD7 Proteins 0.000 description 1
- 101000738335 Homo sapiens T-cell surface glycoprotein CD3 zeta chain Proteins 0.000 description 1
- 101000934341 Homo sapiens T-cell surface glycoprotein CD5 Proteins 0.000 description 1
- 101000914514 Homo sapiens T-cell-specific surface glycoprotein CD28 Proteins 0.000 description 1
- 101000799181 Homo sapiens TP53-binding protein 1 Proteins 0.000 description 1
- 101000831496 Homo sapiens Toll-like receptor 3 Proteins 0.000 description 1
- 101000669402 Homo sapiens Toll-like receptor 7 Proteins 0.000 description 1
- 101000800483 Homo sapiens Toll-like receptor 8 Proteins 0.000 description 1
- 101000611023 Homo sapiens Tumor necrosis factor receptor superfamily member 6 Proteins 0.000 description 1
- 101000851376 Homo sapiens Tumor necrosis factor receptor superfamily member 8 Proteins 0.000 description 1
- 101000851370 Homo sapiens Tumor necrosis factor receptor superfamily member 9 Proteins 0.000 description 1
- 101001047681 Homo sapiens Tyrosine-protein kinase Lck Proteins 0.000 description 1
- 101000666896 Homo sapiens V-type immunoglobulin domain-containing suppressor of T-cell activation Proteins 0.000 description 1
- 101001117146 Homo sapiens [Pyruvate dehydrogenase (acetyl-transferring)] kinase isozyme 1, mitochondrial Proteins 0.000 description 1
- 206010020460 Human T-cell lymphotropic virus type I infection Diseases 0.000 description 1
- 241000714260 Human T-lymphotropic virus 1 Species 0.000 description 1
- 208000023105 Huntington disease Diseases 0.000 description 1
- 208000015178 Hurler syndrome Diseases 0.000 description 1
- 208000000563 Hyperlipoproteinemia Type II Diseases 0.000 description 1
- 208000029663 Hypophosphatemia Diseases 0.000 description 1
- UGQMRVRMYYASKQ-UHFFFAOYSA-N Hypoxanthine nucleoside Natural products OC1C(O)C(CO)OC1N1C(NC=NC2=O)=C2N=C1 UGQMRVRMYYASKQ-UHFFFAOYSA-N 0.000 description 1
- 101150066050 IL7R gene Proteins 0.000 description 1
- 229940076838 Immune checkpoint inhibitor Drugs 0.000 description 1
- 108060003951 Immunoglobulin Proteins 0.000 description 1
- 102100032816 Integrin alpha-6 Human genes 0.000 description 1
- 102100025306 Integrin alpha-IIb Human genes 0.000 description 1
- 102100027353 Interferon-induced helicase C domain-containing protein 1 Human genes 0.000 description 1
- 108010002352 Interleukin-1 Proteins 0.000 description 1
- 102000013462 Interleukin-12 Human genes 0.000 description 1
- 108010065805 Interleukin-12 Proteins 0.000 description 1
- 102000003812 Interleukin-15 Human genes 0.000 description 1
- 108090000172 Interleukin-15 Proteins 0.000 description 1
- 102000003810 Interleukin-18 Human genes 0.000 description 1
- 108090000171 Interleukin-18 Proteins 0.000 description 1
- 102100030703 Interleukin-22 Human genes 0.000 description 1
- 102100033493 Interleukin-3 receptor subunit alpha Human genes 0.000 description 1
- 108010002586 Interleukin-7 Proteins 0.000 description 1
- 102000000704 Interleukin-7 Human genes 0.000 description 1
- LPHGQDQBBGAPDZ-UHFFFAOYSA-N Isocaffeine Natural products CN1C(=O)N(C)C(=O)C2=C1N(C)C=N2 LPHGQDQBBGAPDZ-UHFFFAOYSA-N 0.000 description 1
- 102000002698 KIR Receptors Human genes 0.000 description 1
- 108010043610 KIR Receptors Proteins 0.000 description 1
- 102100023426 Kinesin-like protein KIF2A Human genes 0.000 description 1
- 125000000998 L-alanino group Chemical group [H]N([*])[C@](C([H])([H])[H])([H])C(=O)O[H] 0.000 description 1
- ODKSFYDXXFIFQN-BYPYZUCNSA-N L-arginine Chemical compound OC(=O)[C@@H](N)CCCN=C(N)N ODKSFYDXXFIFQN-BYPYZUCNSA-N 0.000 description 1
- ODKSFYDXXFIFQN-BYPYZUCNSA-P L-argininium(2+) Chemical compound NC(=[NH2+])NCCC[C@H]([NH3+])C(O)=O ODKSFYDXXFIFQN-BYPYZUCNSA-P 0.000 description 1
- DCXYFEDJOCDNAF-REOHCLBHSA-N L-asparagine Chemical compound OC(=O)[C@@H](N)CC(N)=O DCXYFEDJOCDNAF-REOHCLBHSA-N 0.000 description 1
- CKLJMWTZIZZHCS-REOHCLBHSA-N L-aspartic acid Chemical compound OC(=O)[C@@H](N)CC(O)=O CKLJMWTZIZZHCS-REOHCLBHSA-N 0.000 description 1
- ZDXPYRJPNDTMRX-VKHMYHEASA-N L-glutamine Chemical compound OC(=O)[C@@H](N)CCC(N)=O ZDXPYRJPNDTMRX-VKHMYHEASA-N 0.000 description 1
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 description 1
- 102100034671 L-lactate dehydrogenase A chain Human genes 0.000 description 1
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 1
- 108010001831 LDL receptors Proteins 0.000 description 1
- 241000448224 Lachnospiraceae bacterium MA2020 Species 0.000 description 1
- 241000448225 Lachnospiraceae bacterium MC2017 Species 0.000 description 1
- 241000689670 Lachnospiraceae bacterium ND2006 Species 0.000 description 1
- 208000005870 Lafora disease Diseases 0.000 description 1
- 208000014161 Lafora myoclonic epilepsy Diseases 0.000 description 1
- 101710128836 Large T antigen Proteins 0.000 description 1
- 101150028321 Lck gene Proteins 0.000 description 1
- 201000003533 Leber congenital amaurosis Diseases 0.000 description 1
- 241000270322 Lepidosauria Species 0.000 description 1
- 241001148627 Leptospira inadai Species 0.000 description 1
- 208000009625 Lesch-Nyhan syndrome Diseases 0.000 description 1
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 1
- 239000004472 Lysine Substances 0.000 description 1
- 208000015439 Lysosomal storage disease Diseases 0.000 description 1
- 102100022621 MAX gene-associated protein Human genes 0.000 description 1
- 101710120903 Malignant T-cell-amplified sequence 1 Proteins 0.000 description 1
- 101710186853 Malignant T-cell-amplified sequence 1 homolog Proteins 0.000 description 1
- 208000035051 Malignant migrating focal seizures of infancy Diseases 0.000 description 1
- 102100039515 Max dimerization protein 4 Human genes 0.000 description 1
- 102000008840 Melanoma-associated antigen 1 Human genes 0.000 description 1
- 108050000731 Melanoma-associated antigen 1 Proteins 0.000 description 1
- 108090000015 Mesothelin Proteins 0.000 description 1
- 102000003735 Mesothelin Human genes 0.000 description 1
- 102100021299 Methyl-CpG-binding domain protein 2 Human genes 0.000 description 1
- 102100030803 Methylcytosine dioxygenase TET2 Human genes 0.000 description 1
- 108060004795 Methyltransferase Proteins 0.000 description 1
- 102100034068 Monocarboxylate transporter 1 Human genes 0.000 description 1
- 241001193016 Moraxella bovoculi 237 Species 0.000 description 1
- 102100025748 Mothers against decapentaplegic homolog 3 Human genes 0.000 description 1
- 101710143111 Mothers against decapentaplegic homolog 3 Proteins 0.000 description 1
- 206010028095 Mucopolysaccharidosis IV Diseases 0.000 description 1
- 208000028781 Mucopolysaccharidosis type 1 Diseases 0.000 description 1
- 208000025915 Mucopolysaccharidosis type 6 Diseases 0.000 description 1
- 101000590284 Mus musculus 26S proteasome non-ATPase regulatory subunit 14 Proteins 0.000 description 1
- 101100078999 Mus musculus Mx1 gene Proteins 0.000 description 1
- 102100038895 Myc proto-oncogene protein Human genes 0.000 description 1
- 101710135898 Myc proto-oncogene protein Proteins 0.000 description 1
- 102100025243 Myeloid cell surface antigen CD33 Human genes 0.000 description 1
- 201000004458 Myoma Diseases 0.000 description 1
- 206010068871 Myotonic dystrophy Diseases 0.000 description 1
- OVBPIULPVIDEAO-UHFFFAOYSA-N N-Pteroyl-L-glutaminsaeure Natural products C=1N=C2NC(N)=NC(=O)C2=NC=1CNC1=CC=C(C(=O)NC(CCC(O)=O)C(O)=O)C=C1 OVBPIULPVIDEAO-UHFFFAOYSA-N 0.000 description 1
- YNLCVAQJIKOXER-UHFFFAOYSA-N N-[tris(hydroxymethyl)methyl]-3-aminopropanesulfonic acid Chemical compound OCC(CO)(CO)NCCCS(O)(=O)=O YNLCVAQJIKOXER-UHFFFAOYSA-N 0.000 description 1
- 108700043217 N-acetyl glutamate synthetase deficiency Proteins 0.000 description 1
- 206010071092 N-acetylglutamate synthase deficiency Diseases 0.000 description 1
- 241001250129 Nannochloropsis gaditana Species 0.000 description 1
- 241000244206 Nematoda Species 0.000 description 1
- 102000003729 Neprilysin Human genes 0.000 description 1
- 108090000028 Neprilysin Proteins 0.000 description 1
- 108010069196 Neural Cell Adhesion Molecules Proteins 0.000 description 1
- 102100023616 Neural cell adhesion molecule L1-like protein Human genes 0.000 description 1
- 229940122426 Nuclease inhibitor Drugs 0.000 description 1
- 108091005461 Nucleic proteins Proteins 0.000 description 1
- 108700020796 Oncogene Proteins 0.000 description 1
- 102000043276 Oncogene Human genes 0.000 description 1
- 206010030348 Open-Angle Glaucoma Diseases 0.000 description 1
- 208000000599 Ornithine Carbamoyltransferase Deficiency Disease Diseases 0.000 description 1
- 206010052450 Ornithine transcarbamoylase deficiency Diseases 0.000 description 1
- 208000035903 Ornithine transcarbamylase deficiency Diseases 0.000 description 1
- 102100028200 Ornithine transcarbamylase, mitochondrial Human genes 0.000 description 1
- 241000283973 Oryctolagus cuniculus Species 0.000 description 1
- KJWZYMMLVHIVSU-IYCNHOCDSA-N PGK1 Chemical compound CCCCC[C@H](O)\C=C\[C@@H]1[C@@H](CCCCCCC(O)=O)C(=O)CC1=O KJWZYMMLVHIVSU-IYCNHOCDSA-N 0.000 description 1
- 102100024019 Pancreatic secretory granule membrane major glycoprotein GP2 Human genes 0.000 description 1
- 206010033892 Paraplegia Diseases 0.000 description 1
- 241000182952 Parcubacteria group bacterium GW2011_GWC2_44_17 Species 0.000 description 1
- 208000018737 Parkinson disease Diseases 0.000 description 1
- 241001494479 Pecora Species 0.000 description 1
- 108091093037 Peptide nucleic acid Proteins 0.000 description 1
- 206010034620 Peripheral sensory neuropathy Diseases 0.000 description 1
- 241000009328 Perro Species 0.000 description 1
- 201000011252 Phenylketonuria Diseases 0.000 description 1
- 102100028251 Phosphoglycerate kinase 1 Human genes 0.000 description 1
- 102000014750 Phosphorylase Kinase Human genes 0.000 description 1
- 108010064071 Phosphorylase Kinase Proteins 0.000 description 1
- 108010073135 Phosphorylases Proteins 0.000 description 1
- 102000009097 Phosphorylases Human genes 0.000 description 1
- 101150080509 Plcg1 gene Proteins 0.000 description 1
- RVGRUAULSDPKGF-UHFFFAOYSA-N Poloxamer Chemical compound C1CO1.CC1CO1 RVGRUAULSDPKGF-UHFFFAOYSA-N 0.000 description 1
- 108010061844 Poly(ADP-ribose) Polymerases Proteins 0.000 description 1
- 102000012338 Poly(ADP-ribose) Polymerases Human genes 0.000 description 1
- 229920000776 Poly(Adenosine diphosphate-ribose) polymerase Polymers 0.000 description 1
- 229920002873 Polyethylenimine Polymers 0.000 description 1
- 108010039918 Polylysine Proteins 0.000 description 1
- 241000878522 Porphyromonas crevioricanis Species 0.000 description 1
- 241001135241 Porphyromonas macacae Species 0.000 description 1
- 241001302521 Prevotella albensis Species 0.000 description 1
- 241001135219 Prevotella disiens Species 0.000 description 1
- 102100040120 Prominin-1 Human genes 0.000 description 1
- 102000007327 Protamines Human genes 0.000 description 1
- 108010007568 Protamines Proteins 0.000 description 1
- 102100023602 Protein Hook homolog 1 Human genes 0.000 description 1
- 102100035620 Protein phosphatase 1 regulatory subunit 12C Human genes 0.000 description 1
- 102100034911 Pyruvate kinase PKM Human genes 0.000 description 1
- 230000026279 RNA modification Effects 0.000 description 1
- 230000004570 RNA-binding Effects 0.000 description 1
- 102100022491 RNA-binding protein NOB1 Human genes 0.000 description 1
- 102000002490 Rad51 Recombinase Human genes 0.000 description 1
- 108010068097 Rad51 Recombinase Proteins 0.000 description 1
- 241000773293 Rappaport Species 0.000 description 1
- 102100030086 Receptor tyrosine-protein kinase erbB-2 Human genes 0.000 description 1
- 102100020718 Receptor-type tyrosine-protein kinase FLT3 Human genes 0.000 description 1
- 108700005075 Regulator Genes Proteins 0.000 description 1
- 108020005091 Replication Origin Proteins 0.000 description 1
- 208000007014 Retinitis pigmentosa Diseases 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 241000283984 Rodentia Species 0.000 description 1
- 108091006296 SLC2A1 Proteins 0.000 description 1
- 108091006298 SLC2A3 Proteins 0.000 description 1
- 108091006647 SLC9A1 Proteins 0.000 description 1
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 1
- 101100215928 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) ALY1 gene Proteins 0.000 description 1
- 241000593524 Sargassum patens Species 0.000 description 1
- 201000002883 Scheie syndrome Diseases 0.000 description 1
- 102000007562 Serum Albumin Human genes 0.000 description 1
- 108010071390 Serum Albumin Proteins 0.000 description 1
- 102100030980 Sodium/hydrogen exchanger 1 Human genes 0.000 description 1
- 102100023536 Solute carrier family 2, facilitated glucose transporter member 1 Human genes 0.000 description 1
- 102100022722 Solute carrier family 2, facilitated glucose transporter member 3 Human genes 0.000 description 1
- 208000032930 Spastic paraplegia Diseases 0.000 description 1
- PFNFFQXMRSDOHW-UHFFFAOYSA-N Spermine Natural products NCCCNCCCCNCCCN PFNFFQXMRSDOHW-UHFFFAOYSA-N 0.000 description 1
- 208000009415 Spinocerebellar Ataxias Diseases 0.000 description 1
- 102100023719 Src substrate cortactin Human genes 0.000 description 1
- 108010090804 Streptavidin Proteins 0.000 description 1
- 108091027544 Subgenomic mRNA Proteins 0.000 description 1
- 229930006000 Sucrose Natural products 0.000 description 1
- CZMRCDWAGMRECN-UGDNZRGBSA-N Sucrose Chemical compound O[C@H]1[C@H](O)[C@@H](CO)O[C@@]1(CO)O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O1 CZMRCDWAGMRECN-UGDNZRGBSA-N 0.000 description 1
- 241001602708 Sulfuricurvum sp. Species 0.000 description 1
- 102100035721 Syndecan-1 Human genes 0.000 description 1
- 102100027208 T-cell antigen CD7 Human genes 0.000 description 1
- 102100025244 T-cell surface glycoprotein CD5 Human genes 0.000 description 1
- 102100027213 T-cell-specific surface glycoprotein CD28 Human genes 0.000 description 1
- 101710156963 TP53-binding protein 1 Proteins 0.000 description 1
- 102100034107 TP53-binding protein 1 Human genes 0.000 description 1
- 208000022292 Tay-Sachs disease Diseases 0.000 description 1
- 210000000447 Th1 cell Anatomy 0.000 description 1
- 210000004241 Th2 cell Anatomy 0.000 description 1
- RYYWUUFWQRZTIU-UHFFFAOYSA-N Thiophosphoric acid Chemical class OP(O)(S)=O RYYWUUFWQRZTIU-UHFFFAOYSA-N 0.000 description 1
- 108010022394 Threonine synthase Proteins 0.000 description 1
- 108091046915 Threose nucleic acid Proteins 0.000 description 1
- 102000008235 Toll-Like Receptor 9 Human genes 0.000 description 1
- 108010060818 Toll-Like Receptor 9 Proteins 0.000 description 1
- 102100024324 Toll-like receptor 3 Human genes 0.000 description 1
- 102100039390 Toll-like receptor 7 Human genes 0.000 description 1
- 102100033110 Toll-like receptor 8 Human genes 0.000 description 1
- 208000035317 Total hypoxanthine-guanine phosphoribosyl transferase deficiency Diseases 0.000 description 1
- 101710150448 Transcriptional regulator Myc Proteins 0.000 description 1
- 101800000385 Transmembrane protein Proteins 0.000 description 1
- 102000044209 Tumor Suppressor Genes Human genes 0.000 description 1
- 108700025716 Tumor Suppressor Genes Proteins 0.000 description 1
- 102100022153 Tumor necrosis factor receptor superfamily member 4 Human genes 0.000 description 1
- 101710165473 Tumor necrosis factor receptor superfamily member 4 Proteins 0.000 description 1
- 102100036857 Tumor necrosis factor receptor superfamily member 8 Human genes 0.000 description 1
- 102100036856 Tumor necrosis factor receptor superfamily member 9 Human genes 0.000 description 1
- 206010045261 Type IIa hyperlipidaemia Diseases 0.000 description 1
- 101710128901 Tyrosine-protein phosphatase non-receptor type 6 Proteins 0.000 description 1
- 208000014769 Usher Syndromes Diseases 0.000 description 1
- 108010079206 V-Set Domain-Containing T-Cell Activation Inhibitor 1 Proteins 0.000 description 1
- 102100038929 V-set domain-containing T-cell activation inhibitor 1 Human genes 0.000 description 1
- 102100038282 V-type immunoglobulin domain-containing suppressor of T-cell activation Human genes 0.000 description 1
- 108010073929 Vascular Endothelial Growth Factor A Proteins 0.000 description 1
- 102000005789 Vascular Endothelial Growth Factors Human genes 0.000 description 1
- 108010019530 Vascular Endothelial Growth Factors Proteins 0.000 description 1
- 102100026383 Vasopressin-neurophysin 2-copeptin Human genes 0.000 description 1
- 108091093126 WHP Posttranscriptional Response Element Proteins 0.000 description 1
- 208000018839 Wilson disease Diseases 0.000 description 1
- 208000006110 Wiskott-Aldrich syndrome Diseases 0.000 description 1
- 208000027024 X-linked chronic granulomatous disease Diseases 0.000 description 1
- 201000006083 Xeroderma Pigmentosum Diseases 0.000 description 1
- 241001531273 [Eubacterium] eligens Species 0.000 description 1
- 102100024148 [Pyruvate dehydrogenase (acetyl-transferring)] kinase isozyme 1, mitochondrial Human genes 0.000 description 1
- 239000003070 absorption delaying agent Substances 0.000 description 1
- 150000001242 acetic acid derivatives Chemical class 0.000 description 1
- 230000021736 acetylation Effects 0.000 description 1
- 238000006640 acetylation reaction Methods 0.000 description 1
- 208000037919 acquired disease Diseases 0.000 description 1
- 239000013543 active substance Substances 0.000 description 1
- 201000010275 acute porphyria Diseases 0.000 description 1
- 230000004721 adaptive immunity Effects 0.000 description 1
- 229960000643 adenine Drugs 0.000 description 1
- 229960005305 adenosine Drugs 0.000 description 1
- 239000002671 adjuvant Substances 0.000 description 1
- 239000000048 adrenergic agonist Substances 0.000 description 1
- 229940126157 adrenergic receptor agonist Drugs 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 206010064930 age-related macular degeneration Diseases 0.000 description 1
- 239000000556 agonist Substances 0.000 description 1
- 229910001508 alkali metal halide Inorganic materials 0.000 description 1
- 150000008045 alkali metal halides Chemical class 0.000 description 1
- 208000006682 alpha 1-Antitrypsin Deficiency Diseases 0.000 description 1
- SRHNADOZAAWYLV-XLMUYGLTSA-N alpha-L-Fucp-(1->2)-beta-D-Galp-(1->4)-[alpha-L-Fucp-(1->3)]-beta-D-GlcpNAc Chemical compound O[C@H]1[C@H](O)[C@H](O)[C@H](C)O[C@H]1O[C@H]1[C@H](O[C@H]2[C@@H]([C@@H](NC(C)=O)[C@H](O)O[C@@H]2CO)O[C@H]2[C@H]([C@H](O)[C@H](O)[C@H](C)O2)O)O[C@H](CO)[C@H](O)[C@@H]1O SRHNADOZAAWYLV-XLMUYGLTSA-N 0.000 description 1
- 125000003368 amide group Chemical group 0.000 description 1
- 125000003277 amino group Chemical group 0.000 description 1
- 210000004102 animal cell Anatomy 0.000 description 1
- 238000010171 animal model Methods 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 239000003242 anti bacterial agent Substances 0.000 description 1
- 102000025171 antigen binding proteins Human genes 0.000 description 1
- 108091000831 antigen binding proteins Proteins 0.000 description 1
- 239000004599 antimicrobial Substances 0.000 description 1
- 238000002617 apheresis Methods 0.000 description 1
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 1
- 201000003554 argininosuccinic aciduria Diseases 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 229960001230 asparagine Drugs 0.000 description 1
- 235000009582 asparagine Nutrition 0.000 description 1
- 230000003385 bacteriostatic effect Effects 0.000 description 1
- 210000003651 basophil Anatomy 0.000 description 1
- 239000011324 bead Substances 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 229960000686 benzalkonium chloride Drugs 0.000 description 1
- 235000010233 benzoic acid Nutrition 0.000 description 1
- 229960004365 benzoic acid Drugs 0.000 description 1
- 235000019445 benzyl alcohol Nutrition 0.000 description 1
- CADWTSSKOVRVJC-UHFFFAOYSA-N benzyl(dimethyl)azanium;chloride Chemical compound [Cl-].C[NH+](C)CC1=CC=CC=C1 CADWTSSKOVRVJC-UHFFFAOYSA-N 0.000 description 1
- WHGYBXFWUBPSRW-FOUAGVGXSA-N beta-cyclodextrin Chemical compound OC[C@H]([C@H]([C@@H]([C@H]1O)O)O[C@H]2O[C@@H]([C@@H](O[C@H]3O[C@H](CO)[C@H]([C@@H]([C@H]3O)O)O[C@H]3O[C@H](CO)[C@H]([C@@H]([C@H]3O)O)O[C@H]3O[C@H](CO)[C@H]([C@@H]([C@H]3O)O)O[C@H]3O[C@H](CO)[C@H]([C@@H]([C@H]3O)O)O3)[C@H](O)[C@H]2O)CO)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O)[C@@H]3O[C@@H]1CO WHGYBXFWUBPSRW-FOUAGVGXSA-N 0.000 description 1
- 235000011175 beta-cyclodextrine Nutrition 0.000 description 1
- 229960004853 betadex Drugs 0.000 description 1
- 201000005271 biliary atresia Diseases 0.000 description 1
- 230000003115 biocidal effect Effects 0.000 description 1
- 230000031018 biological processes and functions Effects 0.000 description 1
- 238000001574 biopsy Methods 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 206010071434 biotinidase deficiency Diseases 0.000 description 1
- 210000002449 bone cell Anatomy 0.000 description 1
- KQNZDYYTLMIZCT-KQPMLPITSA-N brefeldin A Chemical compound O[C@@H]1\C=C\C(=O)O[C@@H](C)CCC\C=C\[C@@H]2C[C@H](O)C[C@H]21 KQNZDYYTLMIZCT-KQPMLPITSA-N 0.000 description 1
- JUMGSHROWPPKFX-UHFFFAOYSA-N brefeldin-A Natural products CC1CCCC=CC2(C)CC(O)CC2(C)C(O)C=CC(=O)O1 JUMGSHROWPPKFX-UHFFFAOYSA-N 0.000 description 1
- 239000007975 buffered saline Substances 0.000 description 1
- 239000006172 buffering agent Substances 0.000 description 1
- 239000004067 bulking agent Substances 0.000 description 1
- DQXBYHZEEUGOBF-UHFFFAOYSA-N but-3-enoic acid;ethene Chemical compound C=C.OC(=O)CC=C DQXBYHZEEUGOBF-UHFFFAOYSA-N 0.000 description 1
- 229960001948 caffeine Drugs 0.000 description 1
- VJEONQKOZGKCAK-UHFFFAOYSA-N caffeine Natural products CN1C(=O)N(C)C(=O)C2=C1C=CN2C VJEONQKOZGKCAK-UHFFFAOYSA-N 0.000 description 1
- 210000000234 capsid Anatomy 0.000 description 1
- 208000022843 carbamoyl phosphate synthetase I deficiency disease Diseases 0.000 description 1
- 150000001720 carbohydrates Chemical class 0.000 description 1
- 235000014633 carbohydrates Nutrition 0.000 description 1
- 125000002091 cationic group Chemical group 0.000 description 1
- 210000003855 cell nucleus Anatomy 0.000 description 1
- 230000005754 cellular signaling Effects 0.000 description 1
- 235000013330 chicken meat Nutrition 0.000 description 1
- 229960003260 chlorhexidine Drugs 0.000 description 1
- 229940107161 cholesterol Drugs 0.000 description 1
- 235000012000 cholesterol Nutrition 0.000 description 1
- 208000036733 chronic X-linked granulomatous disease Diseases 0.000 description 1
- 239000011248 coating agent Substances 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000004040 coloring Methods 0.000 description 1
- 230000000536 complexating effect Effects 0.000 description 1
- 239000008139 complexing agent Substances 0.000 description 1
- 239000012141 concentrate Substances 0.000 description 1
- 238000013270 controlled release Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 229920001577 copolymer Polymers 0.000 description 1
- 230000000139 costimulatory effect Effects 0.000 description 1
- 230000002338 cryopreservative effect Effects 0.000 description 1
- 239000013078 crystal Substances 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 230000001086 cytosolic effect Effects 0.000 description 1
- 210000001151 cytotoxic T lymphocyte Anatomy 0.000 description 1
- 238000012350 deep sequencing Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 239000003405 delayed action preparation Substances 0.000 description 1
- 230000017858 demethylation Effects 0.000 description 1
- 238000010520 demethylation reaction Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 235000019425 dextrin Nutrition 0.000 description 1
- 239000008121 dextrose Substances 0.000 description 1
- 201000010064 diabetes insipidus Diseases 0.000 description 1
- 239000005546 dideoxynucleotide Substances 0.000 description 1
- 235000014113 dietary fatty acids Nutrition 0.000 description 1
- QONQRTHLHBTMGP-UHFFFAOYSA-N digitoxigenin Natural products CC12CCC(C3(CCC(O)CC3CC3)C)C3C11OC1CC2C1=CC(=O)OC1 QONQRTHLHBTMGP-UHFFFAOYSA-N 0.000 description 1
- SHIBSTMRCDJXLN-KCZCNTNESA-N digoxigenin Chemical compound C1([C@@H]2[C@@]3([C@@](CC2)(O)[C@H]2[C@@H]([C@@]4(C)CC[C@H](O)C[C@H]4CC2)C[C@H]3O)C)=CC(=O)OC1 SHIBSTMRCDJXLN-KCZCNTNESA-N 0.000 description 1
- 102000004419 dihydrofolate reductase Human genes 0.000 description 1
- NAGJZTKCGNOGPW-UHFFFAOYSA-K dioxido-sulfanylidene-sulfido-$l^{5}-phosphane Chemical compound [O-]P([O-])([S-])=S NAGJZTKCGNOGPW-UHFFFAOYSA-K 0.000 description 1
- 150000002016 disaccharides Chemical class 0.000 description 1
- 230000005750 disease progression Effects 0.000 description 1
- 239000006185 dispersion Substances 0.000 description 1
- 238000004090 dissolution Methods 0.000 description 1
- 230000005782 double-strand break Effects 0.000 description 1
- 238000012377 drug delivery Methods 0.000 description 1
- 241001493065 dsRNA viruses Species 0.000 description 1
- 239000000975 dye Substances 0.000 description 1
- 230000008482 dysregulation Effects 0.000 description 1
- 210000002308 embryonic cell Anatomy 0.000 description 1
- 239000003995 emulsifying agent Substances 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- JOZGNYDSEBIJDH-UHFFFAOYSA-N eniluracil Chemical compound O=C1NC=C(C#C)C(=O)N1 JOZGNYDSEBIJDH-UHFFFAOYSA-N 0.000 description 1
- 238000001952 enzyme assay Methods 0.000 description 1
- 210000003979 eosinophil Anatomy 0.000 description 1
- 230000004049 epigenetic modification Effects 0.000 description 1
- 238000012236 epigenome editing Methods 0.000 description 1
- 201000006517 essential tremor Diseases 0.000 description 1
- 235000019441 ethanol Nutrition 0.000 description 1
- BEFDCLMNVWHSGT-UHFFFAOYSA-N ethenylcyclopentane Chemical compound C=CC1CCCC1 BEFDCLMNVWHSGT-UHFFFAOYSA-N 0.000 description 1
- 239000005038 ethylene vinyl acetate Substances 0.000 description 1
- 230000029142 excretion Effects 0.000 description 1
- 230000001036 exonucleolytic effect Effects 0.000 description 1
- 239000013604 expression vector Substances 0.000 description 1
- 229960004222 factor ix Drugs 0.000 description 1
- 229960000301 factor viii Drugs 0.000 description 1
- 201000001386 familial hypercholesterolemia Diseases 0.000 description 1
- 229930195729 fatty acid Natural products 0.000 description 1
- 239000000194 fatty acid Substances 0.000 description 1
- 150000004665 fatty acids Chemical class 0.000 description 1
- VLMZMRDOMOGGFA-WDBKCZKBSA-N festuclavine Chemical compound C1=CC([C@H]2C[C@H](CN(C)[C@@H]2C2)C)=C3C2=CNC3=C1 VLMZMRDOMOGGFA-WDBKCZKBSA-N 0.000 description 1
- 230000001605 fetal effect Effects 0.000 description 1
- 239000000945 filler Substances 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 239000000796 flavoring agent Substances 0.000 description 1
- 238000000684 flow cytometry Methods 0.000 description 1
- 239000012530 fluid Substances 0.000 description 1
- 239000007850 fluorescent dye Substances 0.000 description 1
- IJJVMEJXYNJXOJ-UHFFFAOYSA-N fluquinconazole Chemical compound C=1C=C(Cl)C=C(Cl)C=1N1C(=O)C2=CC(F)=CC=C2N=C1N1C=NC=N1 IJJVMEJXYNJXOJ-UHFFFAOYSA-N 0.000 description 1
- 229940014144 folate Drugs 0.000 description 1
- 229960000304 folic acid Drugs 0.000 description 1
- 229940118764 francisella tularensis Drugs 0.000 description 1
- 238000004108 freeze drying Methods 0.000 description 1
- 230000002538 fungal effect Effects 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 230000000799 fusogenic effect Effects 0.000 description 1
- 150000002270 gangliosides Chemical class 0.000 description 1
- 229920000159 gelatin Polymers 0.000 description 1
- 239000008273 gelatin Substances 0.000 description 1
- 235000019322 gelatine Nutrition 0.000 description 1
- 235000011852 gelatine desserts Nutrition 0.000 description 1
- 238000012239 gene modification Methods 0.000 description 1
- 230000030279 gene silencing Effects 0.000 description 1
- 230000005017 genetic modification Effects 0.000 description 1
- 230000007614 genetic variation Effects 0.000 description 1
- 235000013617 genetically modified food Nutrition 0.000 description 1
- 230000008826 genomic mutation Effects 0.000 description 1
- 210000004602 germ cell Anatomy 0.000 description 1
- 239000008103 glucose Substances 0.000 description 1
- 229960002989 glutamic acid Drugs 0.000 description 1
- ZDXPYRJPNDTMRX-UHFFFAOYSA-N glutamine Natural products OC(=O)C(N)CCC(N)=O ZDXPYRJPNDTMRX-UHFFFAOYSA-N 0.000 description 1
- 208000015362 glutaric aciduria Diseases 0.000 description 1
- 108020004445 glyceraldehyde-3-phosphate dehydrogenase Proteins 0.000 description 1
- 208000007345 glycogen storage disease Diseases 0.000 description 1
- 210000002288 golgi apparatus Anatomy 0.000 description 1
- 210000003714 granulocyte Anatomy 0.000 description 1
- 230000012010 growth Effects 0.000 description 1
- 229940093915 gynecological organic acid Drugs 0.000 description 1
- 229910052736 halogen Inorganic materials 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 210000002443 helper t lymphocyte Anatomy 0.000 description 1
- 208000009429 hemophilia B Diseases 0.000 description 1
- 208000033552 hepatic porphyria Diseases 0.000 description 1
- 208000006454 hepatitis Diseases 0.000 description 1
- 231100000283 hepatitis Toxicity 0.000 description 1
- 125000005842 heteroatom Chemical group 0.000 description 1
- 210000003630 histaminocyte Anatomy 0.000 description 1
- 229940088597 hormone Drugs 0.000 description 1
- 239000005556 hormone Substances 0.000 description 1
- 230000005099 host tropism Effects 0.000 description 1
- 102000055905 human ADORA2A Human genes 0.000 description 1
- 102000054910 human GMNN Human genes 0.000 description 1
- 102000055958 human TP53BP1 Human genes 0.000 description 1
- 210000003917 human chromosome Anatomy 0.000 description 1
- 239000000017 hydrogel Substances 0.000 description 1
- 229910052739 hydrogen Inorganic materials 0.000 description 1
- 239000001257 hydrogen Substances 0.000 description 1
- 229960002163 hydrogen peroxide Drugs 0.000 description 1
- 229920001477 hydrophilic polymer Polymers 0.000 description 1
- 230000002209 hydrophobic effect Effects 0.000 description 1
- 201000011286 hyperargininemia Diseases 0.000 description 1
- 102000027596 immune receptors Human genes 0.000 description 1
- 108091008915 immune receptors Proteins 0.000 description 1
- 230000028993 immune response Effects 0.000 description 1
- 210000000987 immune system Anatomy 0.000 description 1
- 239000012274 immune-checkpoint protein inhibitor Substances 0.000 description 1
- 102000018358 immunoglobulin Human genes 0.000 description 1
- 229940072221 immunoglobulins Drugs 0.000 description 1
- 238000003364 immunohistochemistry Methods 0.000 description 1
- 230000003308 immunostimulating effect Effects 0.000 description 1
- 230000003116 impacting effect Effects 0.000 description 1
- 238000002513 implantation Methods 0.000 description 1
- 230000001976 improved effect Effects 0.000 description 1
- 238000011065 in-situ storage Methods 0.000 description 1
- 239000000411 inducer Substances 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 208000000509 infertility Diseases 0.000 description 1
- 230000036512 infertility Effects 0.000 description 1
- 208000021267 infertility disease Diseases 0.000 description 1
- 108700032552 influenza virus INS1 Proteins 0.000 description 1
- 238000001802 infusion Methods 0.000 description 1
- 239000003112 inhibitor Substances 0.000 description 1
- 230000002401 inhibitory effect Effects 0.000 description 1
- 108010074108 interleukin-21 Proteins 0.000 description 1
- 210000000936 intestine Anatomy 0.000 description 1
- 230000003834 intracellular effect Effects 0.000 description 1
- 230000029225 intracellular protein transport Effects 0.000 description 1
- 238000007912 intraperitoneal administration Methods 0.000 description 1
- 208000028867 ischemia Diseases 0.000 description 1
- 239000007951 isotonicity adjuster Substances 0.000 description 1
- 208000006443 lactic acidosis Diseases 0.000 description 1
- 239000000787 lecithin Substances 0.000 description 1
- 229940067606 lecithin Drugs 0.000 description 1
- 235000010445 lecithin Nutrition 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 238000011068 loading method Methods 0.000 description 1
- 210000004072 lung Anatomy 0.000 description 1
- 210000001165 lymph node Anatomy 0.000 description 1
- 125000003588 lysine group Chemical group [H]N([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])(N([H])[H])C(*)=O 0.000 description 1
- 210000002540 macrophage Anatomy 0.000 description 1
- 208000002780 macular degeneration Diseases 0.000 description 1
- 229910001629 magnesium chloride Inorganic materials 0.000 description 1
- 235000011147 magnesium chloride Nutrition 0.000 description 1
- 229910052943 magnesium sulfate Inorganic materials 0.000 description 1
- 235000019341 magnesium sulphate Nutrition 0.000 description 1
- 239000002122 magnetic nanoparticle Substances 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000001404 mediated effect Effects 0.000 description 1
- 201000001441 melanoma Diseases 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- 125000000956 methoxy group Chemical group [H]C([H])([H])O* 0.000 description 1
- 239000004292 methyl p-hydroxybenzoate Substances 0.000 description 1
- 229960002216 methylparaben Drugs 0.000 description 1
- YACKEPLHDIMKIO-UHFFFAOYSA-N methylphosphonic acid Chemical compound CP(O)(O)=O YACKEPLHDIMKIO-UHFFFAOYSA-N 0.000 description 1
- 108091070501 miRNA Proteins 0.000 description 1
- 239000000693 micelle Substances 0.000 description 1
- 239000002679 microRNA Substances 0.000 description 1
- 244000005700 microbiome Species 0.000 description 1
- 239000003094 microcapsule Substances 0.000 description 1
- 230000000394 mitotic effect Effects 0.000 description 1
- 210000001616 monocyte Anatomy 0.000 description 1
- 150000002772 monosaccharides Chemical class 0.000 description 1
- 201000002273 mucopolysaccharidosis II Diseases 0.000 description 1
- 208000005340 mucopolysaccharidosis III Diseases 0.000 description 1
- 208000000690 mucopolysaccharidosis VI Diseases 0.000 description 1
- 208000010978 mucopolysaccharidosis type 4 Diseases 0.000 description 1
- 210000000663 muscle cell Anatomy 0.000 description 1
- 230000003387 muscular Effects 0.000 description 1
- 201000006938 muscular dystrophy Diseases 0.000 description 1
- 210000000066 myeloid cell Anatomy 0.000 description 1
- 230000009826 neoplastic cell growth Effects 0.000 description 1
- 208000004296 neuralgia Diseases 0.000 description 1
- 208000021722 neuropathic pain Diseases 0.000 description 1
- 230000007935 neutral effect Effects 0.000 description 1
- 239000000346 nonvolatile oil Substances 0.000 description 1
- 230000012223 nuclear import Effects 0.000 description 1
- 230000030648 nucleus localization Effects 0.000 description 1
- 230000009437 off-target effect Effects 0.000 description 1
- 229920001542 oligosaccharide Polymers 0.000 description 1
- 150000002482 oligosaccharides Chemical class 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 150000007524 organic acids Chemical class 0.000 description 1
- 235000005985 organic acids Nutrition 0.000 description 1
- 201000011278 ornithine carbamoyltransferase deficiency Diseases 0.000 description 1
- 230000003647 oxidation Effects 0.000 description 1
- 238000007254 oxidation reaction Methods 0.000 description 1
- FJKROLUGYXJWQN-UHFFFAOYSA-N papa-hydroxy-benzoic acid Natural products OC(=O)C1=CC=C(O)C=C1 FJKROLUGYXJWQN-UHFFFAOYSA-N 0.000 description 1
- 238000007911 parenteral administration Methods 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 230000037361 pathway Effects 0.000 description 1
- 230000035515 penetration Effects 0.000 description 1
- 238000010647 peptide synthesis reaction Methods 0.000 description 1
- 230000000144 pharmacologic effect Effects 0.000 description 1
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 description 1
- COLNVLDHVKWLRT-QMMMGPOBSA-N phenylalanine group Chemical group N[C@@H](CC1=CC=CC=C1)C(=O)O COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 description 1
- ZJAOAACCNHFJAH-UHFFFAOYSA-N phosphonoformic acid Chemical compound OC(=O)P(O)(O)=O ZJAOAACCNHFJAH-UHFFFAOYSA-N 0.000 description 1
- 150000008298 phosphoramidates Chemical class 0.000 description 1
- 239000002504 physiological saline solution Substances 0.000 description 1
- 229960000502 poloxamer Drugs 0.000 description 1
- 229920001200 poly(ethylene-vinyl acetate) Polymers 0.000 description 1
- 229920000747 poly(lactic acid) Polymers 0.000 description 1
- 229920002401 polyacrylamide Polymers 0.000 description 1
- 230000008488 polyadenylation Effects 0.000 description 1
- 229920000728 polyester Polymers 0.000 description 1
- 239000008389 polyethoxylated castor oil Substances 0.000 description 1
- 229920000656 polylysine Polymers 0.000 description 1
- 229920005862 polyol Polymers 0.000 description 1
- 150000003077 polyols Chemical class 0.000 description 1
- 108010000222 polyserine Proteins 0.000 description 1
- 229950008882 polysorbate Drugs 0.000 description 1
- 229940068977 polysorbate 20 Drugs 0.000 description 1
- 229940068965 polysorbates Drugs 0.000 description 1
- 239000013641 positive control Substances 0.000 description 1
- 230000029279 positive regulation of transcription, DNA-dependent Effects 0.000 description 1
- 239000002243 precursor Substances 0.000 description 1
- 210000004986 primary T-cell Anatomy 0.000 description 1
- 201000006366 primary open angle glaucoma Diseases 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000002035 prolonged effect Effects 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 235000010232 propyl p-hydroxybenzoate Nutrition 0.000 description 1
- 239000004405 propyl p-hydroxybenzoate Substances 0.000 description 1
- 229960003415 propylparaben Drugs 0.000 description 1
- 229940048914 protamine Drugs 0.000 description 1
- 235000004252 protein component Nutrition 0.000 description 1
- 230000017854 proteolysis Effects 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 239000002213 purine nucleotide Substances 0.000 description 1
- 150000003212 purines Chemical class 0.000 description 1
- 239000011535 reaction buffer Substances 0.000 description 1
- 108700015048 receptor decoy activity proteins Proteins 0.000 description 1
- 230000007115 recruitment Effects 0.000 description 1
- 108010054624 red fluorescent protein Proteins 0.000 description 1
- 230000022532 regulation of transcription, DNA-dependent Effects 0.000 description 1
- 210000003289 regulatory T cell Anatomy 0.000 description 1
- 230000008521 reorganization Effects 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 210000003705 ribosome Anatomy 0.000 description 1
- 229960004889 salicylic acid Drugs 0.000 description 1
- 201000000980 schizophrenia Diseases 0.000 description 1
- 201000005572 sensory peripheral neuropathy Diseases 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 208000007056 sickle cell anemia Diseases 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 229940126586 small molecule drug Drugs 0.000 description 1
- 150000003384 small molecules Chemical class 0.000 description 1
- 239000011734 sodium Substances 0.000 description 1
- 229910052708 sodium Inorganic materials 0.000 description 1
- 235000015424 sodium Nutrition 0.000 description 1
- 235000017557 sodium bicarbonate Nutrition 0.000 description 1
- 229910000030 sodium bicarbonate Inorganic materials 0.000 description 1
- 229910000029 sodium carbonate Inorganic materials 0.000 description 1
- 229940079827 sodium hydrogen sulfite Drugs 0.000 description 1
- 229940001482 sodium sulfite Drugs 0.000 description 1
- 235000010265 sodium sulphite Nutrition 0.000 description 1
- 210000001082 somatic cell Anatomy 0.000 description 1
- 235000010199 sorbic acid Nutrition 0.000 description 1
- 239000004334 sorbic acid Substances 0.000 description 1
- 229940075582 sorbic acid Drugs 0.000 description 1
- 238000001179 sorption measurement Methods 0.000 description 1
- 230000009870 specific binding Effects 0.000 description 1
- 229940063675 spermine Drugs 0.000 description 1
- 210000000952 spleen Anatomy 0.000 description 1
- 230000002269 spontaneous effect Effects 0.000 description 1
- 230000000087 stabilizing effect Effects 0.000 description 1
- 238000011146 sterile filtration Methods 0.000 description 1
- 150000003431 steroids Chemical class 0.000 description 1
- 210000002784 stomach Anatomy 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 239000012536 storage buffer Substances 0.000 description 1
- 239000005720 sucrose Substances 0.000 description 1
- 150000005846 sugar alcohols Chemical class 0.000 description 1
- 150000008163 sugars Chemical class 0.000 description 1
- 239000004094 surface-active agent Substances 0.000 description 1
- 239000000375 suspending agent Substances 0.000 description 1
- 230000002459 sustained effect Effects 0.000 description 1
- 208000024891 symptom Diseases 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 238000010809 targeting technique Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- RTKIYNMVFMVABJ-UHFFFAOYSA-L thimerosal Chemical compound [Na+].CC[Hg]SC1=CC=CC=C1C([O-])=O RTKIYNMVFMVABJ-UHFFFAOYSA-L 0.000 description 1
- 229940033663 thimerosal Drugs 0.000 description 1
- 210000001541 thymus gland Anatomy 0.000 description 1
- 108010078373 tisagenlecleucel Proteins 0.000 description 1
- 230000009258 tissue cross reactivity Effects 0.000 description 1
- 230000001256 tonic effect Effects 0.000 description 1
- 239000012443 tonicity enhancing agent Substances 0.000 description 1
- 231100000331 toxic Toxicity 0.000 description 1
- 230000002588 toxic effect Effects 0.000 description 1
- 230000005030 transcription termination Effects 0.000 description 1
- 108091006106 transcriptional activators Proteins 0.000 description 1
- 230000037426 transcriptional repression Effects 0.000 description 1
- 230000014621 translational initiation Effects 0.000 description 1
- 102000027257 transmembrane receptors Human genes 0.000 description 1
- 108091008578 transmembrane receptors Proteins 0.000 description 1
- ODLHGICHYURWBS-LKONHMLTSA-N trappsol cyclo Chemical compound CC(O)COC[C@H]([C@H]([C@@H]([C@H]1O)O)O[C@H]2O[C@@H]([C@@H](O[C@H]3O[C@H](COCC(C)O)[C@H]([C@@H]([C@H]3O)O)O[C@H]3O[C@H](COCC(C)O)[C@H]([C@@H]([C@H]3O)O)O[C@H]3O[C@H](COCC(C)O)[C@H]([C@@H]([C@H]3O)O)O[C@H]3O[C@H](COCC(C)O)[C@H]([C@@H]([C@H]3O)O)O3)[C@H](O)[C@H]2O)COCC(O)C)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O)[C@@H]3O[C@@H]1COCC(C)O ODLHGICHYURWBS-LKONHMLTSA-N 0.000 description 1
- GPRLSGONYQIRFK-MNYXATJNSA-N triton Chemical compound [3H+] GPRLSGONYQIRFK-MNYXATJNSA-N 0.000 description 1
- 229960000281 trometamol Drugs 0.000 description 1
- 230000010415 tropism Effects 0.000 description 1
- 201000011296 tyrosinemia Diseases 0.000 description 1
- 210000000623 ulna Anatomy 0.000 description 1
- 241001430294 unidentified retrovirus Species 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
- 239000011782 vitamin Substances 0.000 description 1
- 235000013343 vitamin Nutrition 0.000 description 1
- 229940088594 vitamin Drugs 0.000 description 1
- 229930003231 vitamin Natural products 0.000 description 1
- 150000003722 vitamin derivatives Chemical class 0.000 description 1
- 239000008215 water for injection Substances 0.000 description 1
- 238000001262 western blot Methods 0.000 description 1
Definitions
- CRISPR-Cas systems of bacterial and archaeal adaptive immunity have been adapted for precise targeting of genomic DNA in eukaryotic cells.
- the CRISPR-Cas systems are easy to set up, scalable, and amenable to targeting multiple positions within the eukaryotic genome, thereby providing a major resource for new applications in genome engineering.
- Two distinct classes of CRISPR-Cas systems have been identified. Class 1 CRISPR-Cas systems utilize multi-protein effector complexes, whereas class 2 CRISPR-Cas systems utilize single-protein effectors (see, Makarova et al. (2017) CELL, 168: 328).
- type II and type V systems typically target DNA and type VI systems typically target RNA (id.).
- Naturally occurring type II effector complexes consist of Cas9, CRISPR RNA (crRNA), and trans-activating CRISPR RNA (tracrRNA), but the crRNA and tracrRNA can be fused as a single guide RNA in an engineered system for simplicity (see, Wang et al. (2016) ANNU. REV. BIOCHEM., 85: 227).
- type V systems, such as type V-A, type V-C, and type V-D systems, do not require tracrRNA and use crRNA alone as the guide for cleavage of target DNA (see, Zetsche et al. (2015) CELL, 163: 759; Makarova et al. (2017) CELL, 168: 328).
- the CRISPR-Cas systems have been engineered for various purposes, such as genomic DNA cleavage, base editing, epigenome editing, and genomic imaging (see, e.g., Wang et al. (2016) ANNU. REV. BIOCHEM., 85: 227 and Rees et al. (2018) NAT. REV. GENET., 19: 770).
- Figure 1A is a schematic representation showing the structure of an exemplary single guide Type V-A CRISPR system.
- Figure 1B is a schematic representation showing the structure of an exemplary dual guide Type V-A CRISPR system.
- Figures 2A-C are a series of schematic representations showing incorporation of a protecting group (e.g., a protective nucleotide sequence or a chemical modification) (Figure 2A), a donor template-recruiting sequence (Figure 2B), and an editing enhancer (Figure 2C) into a Type V-A CRISPR-Cas system.
- Figure 3 shows a schematic of a Type V-A nucleic acid guide nuclease comprising a dual guide nucleic acid.
- DETAILED DESCRIPTION
- Outline
- I. Engineered non-naturally-occurring dual guide CRISPR-Cas systems
  - A. Cas proteins
  - B. Guide nucleic acids
  - C. gNA modifications
- II. Composition and methods for targeting, editing, and/or modifying genomic DNA
  - A. Ribonucleoprotein (RNP) delivery and “cas RNA” delivery
  - B. CRISPR expression systems
  - C. Donor templates
  - D. Efficiency and specificity
  - E. Multiplex
  - F. Genomic safe harbors
  - G. Guide nucleic acids
- III. Pharmaceutical compositions
- IV. Therapeutic uses
  - A. Gene therapies
- V. Kits
- VI. Embodiments
- VII. Equivalents
- I.
- a CRISPR-Cas system generally comprises a Cas protein and one or more guide nucleic acids (gNAs).
- the Cas protein can be directed to a specific location in a double-stranded DNA target by recognizing a protospacer adjacent motif (PAM) in the non-target strand of the DNA, and the one or more guide nucleic acids can be directed to a specific location by hybridizing with a target nucleotide sequence, also referred to herein as a target sequence, in the target strand of the target polynucleotide.
- PAM protospacer adjacent motif
- a guide nucleic acid can be designed to comprise a nucleotide sequence called a spacer sequence that is at least partially complementary to and can hybridize with a target nucleotide sequence, where the target nucleotide sequence is located adjacent to a PAM in an orientation operable with the Cas protein. It has been observed that not all CRISPR-Cas systems designed by these criteria are equally effective.
- the larger polynucleotide in which a target nucleotide sequence is located may be referred to as a target polynucleotide; e.g., a chromosome or other genomic DNA, or portion thereof, or any other suitable polynucleotide within which a target nucleotide sequence is located.
- the target polynucleotide in double stranded DNA comprises two strands.
- the strand of the DNA duplex to which the spacer sequence is complementary is called the “target strand” herein, while the strand with which the spacer sequence shares sequence identity is called the “non-target strand” herein.
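The strand naming can be illustrated with a minimal Python sketch. The helper names and the example sequence are hypothetical and do not come from this document; the sketch also assumes full complementarity and an RNA spacer, whereas the text only requires the spacer to be at least partially complementary.

```python
# Illustrative only: helper names and the example sequence are hypothetical.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]

def spacer_for(non_target_protospacer: str) -> str:
    """Spacer RNA (5' to 3') sharing sequence identity with the non-target strand.

    Assumes full identity; the only change is T -> U because the spacer is RNA.
    """
    return non_target_protospacer.upper().replace("T", "U")

non_target = "GATTACAGATTACAGATTAC"        # non-target strand, 5' to 3'
target = reverse_complement(non_target)    # target strand, 5' to 3'
spacer = spacer_for(non_target)            # hybridizes with the target strand
```

Under these assumptions the spacer reads the same as the non-target strand (with U in place of T) and is the reverse complement of the target strand.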
- Class 1 CRISPR-Cas systems utilize multi-protein effector complexes, whereas class 2 CRISPR-Cas systems utilize single-protein effectors.
- type II and type V systems typically target DNA and type VI systems typically target RNA (id.).
- Naturally occurring type II effector complexes include Cas9, CRISPR RNA (crRNA), and trans-activating CRISPR RNA (tracrRNA), but the crRNA and tracrRNA can be fused as a single guide RNA in an engineered system for simplicity (see, Wang et al. (2016) ANNU. REV. BIOCHEM., 85: 227).
- Naturally occurring type V systems, such as type V-A, type V-C, and type V-D systems, do not require tracrRNA and use crRNA alone as the guide for cleavage of target DNA (see, Zetsche et al. (2015) CELL, 163: 759; Makarova et al. (2017) CELL, 168: 328).
- Naturally occurring type II CRISPR-Cas systems (e.g., CRISPR-Cas9 systems) generally comprise two guide nucleic acids, called crRNA and tracrRNA, which form a complex by nucleotide hybridization.
- Single guide nucleic acids capable of activating type II Cas nucleases have been developed, for example, by linking the crRNA and the tracrRNA (see, e.g., U.S. Patent Nos. 10,266,850 and 8,906,616).
- Naturally occurring type II Cas proteins comprise a RuvC-like nuclease domain and an HNH endonuclease domain, and recognize a 3’ G-rich PAM located immediately downstream from the target nucleotide sequence, the orientation determined using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate.
- These CRISPR-Cas systems cleave a double-stranded DNA to generate a blunt end.
- the cleavage site is generally 3-4 nucleotides upstream from the PAM on the non-target strand.
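The type II geometry described above (a 3’ PAM immediately downstream of the target nucleotide sequence on the non-target strand, with a blunt cut a few nucleotides upstream of the PAM) can be sketched in Python as follows. The `NGG` PAM pattern, the 20-nucleotide spacer length, and the function name are assumptions made for illustration; the text itself only says the PAM is G-rich and that the cut lies 3-4 nucleotides upstream.

```python
import re

def find_type_ii_sites(non_target: str, spacer_len: int = 20, cut_offset: int = 3):
    """Return (protospacer_start, pam_start, cut_index) tuples, 0-based.

    Scans only the strand given, read 5' to 3' as the non-target strand.
    Assumes an NGG PAM (illustrative; the text says only "3' G-rich").
    cut_index marks a blunt cut between non_target[cut_index - 1] and
    non_target[cut_index], i.e. cut_offset nucleotides upstream of the PAM.
    """
    seq = non_target.upper()
    sites = []
    # Lookahead so that overlapping PAM candidates are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        protospacer_start = pam_start - spacer_len
        if protospacer_start < 0:
            continue  # not enough sequence upstream of the PAM
        sites.append((protospacer_start, pam_start, pam_start - cut_offset))
    return sites

example = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
sites = find_type_ii_sites(example)
```

Setting `cut_offset=4` models the other end of the 3-4 nucleotide range given in the text.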
- Naturally occurring Type V-A, Type V-C, and Type V-D CRISPR-Cas systems lack a tracrRNA and rely on a single crRNA to guide the CRISPR-Cas complex to the target polynucleotide.
- Dual guide nucleic acids capable of activating type V-A, type V-C, or type V-D Cas nucleases have been developed, for example, by splitting the single crRNA into a targeter nucleic acid and a modulator nucleic acid (see, e.g., International (PCT) Application Publication No. WO 2021/067788).
- Naturally occurring type V-A Cas proteins comprise a RuvC-like nuclease domain but lack an HNH endonuclease domain, and recognize a 5’ T-rich PAM located immediately upstream from the target nucleotide sequence, the orientation determined using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate.
- These CRISPR-Cas systems cleave a double-stranded DNA to generate a staggered double-stranded break rather than a blunt end.
- the cleavage site is distant from the PAM site (e.g., separated by at least 10, 11, 12, 13, 14, or 15 nucleotides downstream from the PAM on the non-target strand and/or separated by at least 15, 16, 17, 18, or 19 nucleotides upstream from the sequence complementary to PAM on the target strand).
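A comparable sketch for the type V-A geometry (a 5’ T-rich PAM immediately upstream of the target nucleotide sequence, with a staggered cut distant from the PAM) is given below. The `TTTV` PAM pattern, the 20-nucleotide spacer length, and the default cut offsets of 18 and 23 nucleotides are illustrative assumptions, not values stated in this document; they are chosen only to fall within the ranges given above.

```python
import re

def find_type_va_sites(non_target: str, spacer_len: int = 20,
                       nt_cut: int = 18, ts_cut: int = 23):
    """Return candidate type V-A sites on the given non-target strand (5' to 3').

    Assumes a TTTV PAM (V = A, C, or G); the text says only "5' T-rich".
    nt_cut and ts_cut are hypothetical distances, in nucleotides downstream
    of the PAM, of the cuts on the non-target and target strands, both
    expressed in non-target-strand coordinates (0-based cut indices).
    """
    seq = non_target.upper()
    sites = []
    for match in re.finditer(r"(?=TTT[ACG])", seq):
        protospacer_start = match.start() + 4  # the PAM is 4 nt long
        # Require room for the protospacer and for both cut positions.
        if protospacer_start + max(spacer_len, ts_cut) > len(seq):
            continue
        sites.append({
            "pam_start": match.start(),
            "protospacer": seq[protospacer_start:protospacer_start + spacer_len],
            "non_target_cut": protospacer_start + nt_cut,
            "target_cut": protospacer_start + ts_cut,
            "overhang": ts_cut - nt_cut,  # length of the staggered end
        })
    return sites

example = "GC" + "TTTA" + "ACGTACGTACGTACGTACGT" + "GCGCGC"
sites = find_type_va_sites(example)
```

Because the two strands are cut at different offsets, the sketch reports an overhang length instead of a single blunt cut position.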
- Elements in an exemplary single guide CRISPR-Cas system, e.g., a type V-A CRISPR-Cas system, are shown in Figure 1A.
- the single gNA can also be called a “crRNA” or “single gRNA” where it is present in the form of an RNA.
- an optional 5’ sequence (e.g., a tail), a modulator stem sequence, a loop, a targeter stem sequence complementary to the modulator stem sequence, and a spacer sequence that is at least partially complementary to and can hybridize with a target sequence in the target strand of the target polynucleotide.
- the sequence including the 5’ tail and the modulator stem sequence can also be called a “modulator sequence” herein.
- a fragment of the single guide nucleic acid from the optional 5’ tail to the targeter stem sequence, also called a “scaffold sequence” herein, binds the Cas protein.
- the first guide nucleic acid, which can be called a “modulator nucleic acid” herein, comprises, from 5’ to 3’, an optional 5’ tail and a modulator stem sequence. Where a 5’ tail is present, the sequence including the 5’ tail and the modulator stem sequence can also be called a “modulator sequence” herein.
- the second guide nucleic acid, which can be called a “targeter nucleic acid” herein, comprises, from 5’ to 3’, a targeter stem sequence complementary to the modulator stem sequence and a spacer sequence that is at least partially complementary to and can hybridize with the target sequence in the target strand of the target polynucleotide.
- the duplex between the modulator stem sequence and the targeter stem sequence, plus the optional 5’ tail, constitutes a structure that binds the Cas protein.
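The relationship between the single guide and dual guide layouts described above can be sketched as a small Python data structure. The class name, field names, and every sequence are hypothetical placeholders used only to show the 5’ to 3’ ordering of the elements; they are not sequences disclosed in this document.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GuideParts:
    """Guide elements, each written 5' to 3' (all values hypothetical)."""
    tail: str            # optional 5' tail; may be an empty string
    modulator_stem: str
    loop: str            # present only in the single guide layout
    targeter_stem: str
    spacer: str

    def single_guide(self) -> str:
        """Tail, modulator stem, loop, targeter stem, spacer in one molecule."""
        return (self.tail + self.modulator_stem + self.loop
                + self.targeter_stem + self.spacer)

    def dual_guide(self) -> tuple[str, str]:
        """Return (modulator nucleic acid, targeter nucleic acid)."""
        modulator = self.tail + self.modulator_stem
        targeter = self.targeter_stem + self.spacer
        return modulator, targeter

parts = GuideParts(tail="UAAU", modulator_stem="UCUAC", loop="UCUU",
                   targeter_stem="GUAGA", spacer="GAUUACAGAUUACAGAUUAC")
modulator, targeter = parts.dual_guide()
```

In this sketch the dual guide is simply the single guide with the loop removed and the two halves kept as separate molecules, which mirrors the splitting of a single crRNA into a modulator nucleic acid and a targeter nucleic acid.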
- the PAM in the non-target strand of the target DNA binds the Cas protein.
- the targeter nucleic acid and the modulator nucleic acid, while not in the same nucleic acid, i.e., not linked end-to-end through a traditional internucleotide bond, can be covalently conjugated to each other through one or more chemical modifications introduced into these nucleic acids, thereby increasing the stability of the double-stranded complex and/or improving other characteristics of the system.
- the terms “targeter stem sequence” and “modulator stem sequence” can refer to a pair of nucleotide sequences in one or more guide nucleic acids that hybridize with each other.
- When a targeter stem sequence and a modulator stem sequence are contained in a single guide nucleic acid, the targeter stem sequence is proximal to a spacer sequence designed to hybridize with a target nucleotide sequence, and the modulator stem sequence is proximal to the targeter stem sequence.
- When a targeter stem sequence and a modulator stem sequence are in separate nucleic acids, the targeter stem sequence is in the same nucleic acid as a spacer sequence designed to hybridize with a target nucleotide sequence.
- the duplex formed between the targeter stem sequence and the modulator stem sequence corresponds to the duplex formed between the crRNA and the tracrRNA.
- the duplex formed between the targeter stem sequence and the modulator stem sequence corresponds to the stem portion of a stem-loop structure in the scaffold sequence of the crRNA. It is understood that 100% complementarity is not required between the targeter stem sequence and the modulator stem sequence.
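Since 100% complementarity between the two stem sequences is not required, it can be useful to quantify how complementary a given pair is. The following Python sketch counts Watson-Crick pairs in an antiparallel, ungapped alignment of equal-length stems. The function name and the example stems are hypothetical, and ignoring wobble pairs, bulges, and unequal lengths is a simplifying assumption.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately not counted.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def stem_complementarity(modulator_stem: str, targeter_stem: str) -> float:
    """Fraction of positions that pair when the stems are aligned antiparallel.

    Both stems are written 5' to 3'; DNA-style T is treated as U.
    Assumes equal-length stems and no bulges (a simplification).
    """
    mod = modulator_stem.upper().replace("T", "U")
    tgt = targeter_stem.upper().replace("T", "U")[::-1]  # read 3' to 5'
    if not mod or len(mod) != len(tgt):
        raise ValueError("stems must be non-empty and of equal length")
    paired = sum((m, t) in WATSON_CRICK for m, t in zip(mod, tgt))
    return paired / len(mod)

full = stem_complementarity("UCUAC", "GUAGA")     # every position pairs
partial = stem_complementarity("UCUAC", "GUAGG")  # one mismatch at the stem end
```

A threshold on this fraction would be a design choice of the user; the document does not state one at this point.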
- Figure 3 shows a Type V-A nucleic acid-guided nuclease (301) complexed with a dual gNA comprising a modulator nucleic acid (306) and a targeter nucleic acid (307), wherein the modulator nucleic acid and targeter nucleic acid are hybridized through a stem.
- the targeter nucleic acid further comprises a spacer sequence (305) at least partially complementary to a target nucleotide sequence (304), i.e., a protospacer, in a target polynucleotide (302) adjacent to a suitable PAM (303).
- the nucleic acid-guided nuclease complex can generate one or more strand breaks (308) in the target polynucleotide at or near the target nucleotide sequence.
- a guide nucleic acid, either as a single guide nucleic acid alone (targeter and modulator nucleic acids are part of a single polynucleotide) or as a dual gNA comprising a separate targeter nucleic acid used in combination with a cognate modulator nucleic acid, is capable of binding a CRISPR Associated (Cas) protein, e.g., a Cas nuclease.
- Cas CRISPR Associated
- the guide nucleic acid is capable of activating a Cas nuclease.
- a gNA capable of activating a particular Cas nuclease is said to be “compatible” with the Cas nuclease; a Cas nuclease capable of being activated by a particular gNA is said to be “compatible” with the gNA.
- CRISPR-Associated protein can refer to a naturally occurring Cas protein or an engineered Cas protein.
- Non-limiting examples of Cas protein engineering include mutations and modifications of the Cas protein that alter the activity of the Cas protein, alter the PAM specificity, broaden the range of recognized PAMs, and/or reduce the ability to modify one or more off-target loci as compared to a corresponding unmodified Cas protein.
- the altered activity of engineered Cas comprises altered ability (e.g., specificity or kinetics) to bind a naturally occurring gNA (e.g., gRNA) or an engineered gNA (e.g., gRNA), altered ability (e.g., specificity or kinetics) to bind a target nucleotide sequence, altered processivity of nucleic acid scanning, and/or altered effector (e.g., nuclease) activity.
- a Cas protein having nuclease activity can be referred to as a “CRISPR-Associated nuclease” or “Cas nuclease,” or simply “nuclease,” as used interchangeably herein.
- the Cas protein is a type V-A, type V-C, or type V-D Cas protein. In certain embodiments, the Cas protein is a type V-A Cas protein. In other embodiments, the Cas protein is a type II Cas protein, e.g., a Cas9 protein.
- a type V-A Cas nuclease comprises Cpf1. Cpf1 proteins are known in the art and are described, e.g., in U.S. Patent Nos. 9,790,490 and 10,113,179. Cpf1 orthologs can be found in various bacterial and archaeal genomes.
- the Cpf1 protein is derived from Francisella novicida U112 (Fn), Acidaminococcus sp. BV3L6 (As), Lachnospiraceae bacterium ND2006 (Lb), Lachnospiraceae bacterium MA2020 (Lb2), Candidatus Methanoplasma termitum (CMt), Moraxella bovoculi 237 (Mb), Porphyromonas crevioricanis (Pc), Prevotella disiens (Pd), Francisella tularensis 1, Francisella tularensis subsp.
- a type V-A Cas nuclease comprises AsCpf1 or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 3 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 3 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease comprises LbCpf1 or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 4 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 4 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease comprises FnCpf1 or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 5 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 5 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease comprises Prevotella bryantii Cpf1 (PbCpf1) or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 6 of International (PCT) Application Publication No.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 6 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease comprises Proteocatella sphenisci Cpf1 (PsCpf1) or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 7 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 7 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease comprises Anaerovibrio sp.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 8 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 8 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease comprises Moraxella caprae Cpf1 (McCpf1) or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 9 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 9 of International (PCT) Application Publication No.
- a type V-A Cas nuclease comprises Lachnospiraceae bacterium COE1 Cpf1 (Lb3Cpf1) or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 10 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 10 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease comprises Eubacterium coprostanoligenes Cpf1 (EcCpf1) or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 11 of International (PCT) Application Publication No.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 11 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease is not Cpf1. In certain embodiments, a type V-A Cas nuclease is not AsCpf1.
- a type V-A Cas nuclease comprises MAD1, MAD2, MAD3, MAD4, MAD5, MAD6, MAD7, MAD8, MAD9, MAD10, MAD11, MAD12, MAD13, MAD14, MAD15, MAD16, MAD17, MAD18, MAD19, or MAD20, or variants thereof.
- MAD1-MAD20 are known in the art and are described in U.S. Patent No. 9,982,279.
- a type V-A Cas nuclease comprises MAD7 or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 37. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 37.
- MAD7 (SEQ ID NO: 37) MNNGTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKEDELRGENRQILKDIMDDYYRGF ISETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQTEYRKAIHKKFANDDRFKNMFSAKLISD ILPEFVIHNNNYSASEKEEKTQVIKLFSRFATSFKDYFKNRANCFSADDISSSSCHRIVNDNAEI FFSNALVYRRIVKSLSNDDINKISGDMKDSLKEMSLEEIYSYEKYGEFITQEGISFYNDICGKVN SFMNLYCQKNKENKNLYKLQKLHKQILCIADTSYEVPYKFESDEEVYQSVNGFLDNISSKHIVER LRKIGDNYNGYNLDKIYIVSKFYESVSQKTYRDWETINTALEIHYNNILPGNGKSKADKVKKAVK N
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 38. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 38.
- MAD2 (SEQ ID NO: 38) MSSLTKFTNKYSKQLTIKNELIPVGKTLENIKENGLIDGDEQLNENYQKAKIIVDDFLRDFINKA LNNTQIGNWRELADALNKEDEDNIEKLQDKIRGIIVSKFETFDLFSSYSIKKDEKIIDDDNDVEE EELDLGKKTSSFKYIFKKNLFKLVLPSYLKTTNQDKLKIISSFDNFSTYFRGFFENRKNIFTKKP ISTSIAYRIVHDNFPKFLDNIRCFNVWQTECPQLIVKADNYLKSKNVIAKDKSLANYFTVGAYDY FLSQNGIDFYNNIIGGLPAFAGHEKIQGLNEFINQECQKDSELKSKLKNRHAFKMAVLFKQILSD REKSFVIDEFESDAQVIDAVKNFYAEQCKDNNVIFNLLNLIKNIAFLSDDELDGIFIEGKYLSSV SQKLYSDWSKLRNDIEDSANSKQ
- Csm1 proteins are known in the art and are described in U.S. Patent No. 9,896,696. Csm1 orthologs can be found in various bacterial and archaeal genomes.
- a Csm1 protein is derived from Smithella sp. SCADC (Sm), Sulfuricurvum sp. (Ss), or Microgenomates (Roizmanbacteria) bacterium (Mb).
- a type V-A Cas nuclease comprises SmCsm1 or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 12 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 12 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease comprises SsCsm1 or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 13 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 13 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas nuclease comprises MbCsm1 or a variant thereof.
- a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 14 of International (PCT) Application Publication No. WO 2021/158918.
- a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 14 of International (PCT) Application Publication No. WO 2021/158918.
- the type V-A Cas nuclease comprises an ART nuclease or a variant thereof.
- the type V-A nuclease comprises an ART1, ART2, ART3, ART4, ART5, ART6, ART7, ART8, ART9, ART10, ART11, ART12, ART13, ART14, ART15, ART16, ART17, ART18, ART19, ART20, ART21, ART22, ART23, ART24, ART25, ART26, ART27, ART28, ART29, ART30, ART31, ART32, ART33, ART34, ART35, or ART11* (i.e., ART11_L679F, i.e., ART11 wherein leucine (L) at amino acid position 679 is replaced with phenylalanine (F
- the type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence designated for the individual ART nuclease as shown in Table 1.
- nucleic acid-guided nuclease comprising a nucleic acid-guided nuclease polypeptide having at least 85% identity to an amino acid sequence represented by SEQ ID NOs: 1-36 or a nucleic acid encoding a nucleic acid-guided nuclease polypeptide comprising at least 85% identity with the polynucleotide represented by SEQ ID NOs: 1-36.
- nucleic acid-guided nuclease comprising a polypeptide having at least 90% identity to the amino acid sequence represented by SEQ ID NOs: 1-36, wherein the polypeptide does not contain a peptide motif of YLFQIYNKDF (SEQ ID NO: 39).
- nucleic acid-guided nuclease comprising a nucleic acid encoding a polypeptide having at least 90% identity to nucleic acids represented by SEQ ID NOs: 808-845 wherein an encoded polypeptide does not contain a peptide motif of YLFQIYNKDF (SEQ ID NO: 39).
- nucleic acid-guided nuclease wherein the polypeptide comprises at least 90% identity with the amino acid sequence represented by SEQ ID NOs: 1-9. In certain embodiments, provided is a nucleic acid-guided nuclease, wherein the polypeptide comprises a polypeptide comprising at least 90% identity with the amino acid sequence represented by SEQ ID NO: 2, 11, or 36.
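The identity thresholds and the motif exclusion recited above can be checked mechanically. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length with "-" marking gaps and assuming identity is counted as matching columns over all alignment columns; the text does not fix an identity definition here, and the function names are hypothetical.

```python
EXCLUDED_MOTIF = "YLFQIYNKDF"  # peptide motif given in the text as SEQ ID NO: 39


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns (assumed definition)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_criteria(candidate_aligned: str, reference_aligned: str,
                   threshold: float = 90.0) -> bool:
    """True if identity >= threshold and the excluded motif is absent."""
    ungapped = candidate_aligned.replace("-", "")
    identical_enough = percent_identity(candidate_aligned, reference_aligned) >= threshold
    return identical_enough and EXCLUDED_MOTIF not in ungapped
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only covers the comparison step.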
- a Cas nuclease comprises ABW1 (SEQ ID NO: 3), ABW2 (SEQ ID NO: 16), ABW3 (SEQ ID NO: 29), ABW4 (SEQ ID NO: 42), ABW5 (SEQ ID NO: 55), ABW6 (SEQ ID NO: 68), ABW7 (SEQ ID NO: 81), ABW8 (SEQ ID NO: 94), or ABW9 (SEQ ID NO: 107) (all SEQ ID NOs for ABW1-9 and variants thereof from International (PCT) Application Publication No. WO 2021/108324), or variants thereof, such as any one of variants 1-10 of ABW1 (SEQ ID NOs: 4-13, respectively), any one of variants 1-10 of ABW2 (SEQ ID NOs: 17-26, respectively), any one of variants 1-10 of ABW3 (SEQ ID NOs: 30-39, respectively), any one of variants 1-10 of ABW4 (SEQ ID NOs: 43-52, respectively), any one of variants 1-10 of ABW5 (SEQ ID NOs: 56-65, respectively), any one of variants 1-10 of ABW6 (SEQ ID NOs: 69-78, respectively), any one of variants 1-10 of ABW7 (SEQ ID NOs: 82-91, respectively), any one of variants 1-10 of ABW8 (SEQ ID NOs: 95-104, respectively), or any one of variants 1-10 of ABW9 (SEQ ID NOs: 108-117, respectively).
- More type V-A Cas nucleases and their corresponding naturally occurring CRISPR-Cas systems can be identified by computational and experimental methods known in the art, e.g., as described in U.S. Patent No. 9,790,490 and Shmakov et al. (2015) MOL. CELL, 60: 385.
- Exemplary computational methods include analysis of putative Cas proteins by homology modeling, structural BLAST, PSI-BLAST, or HHPred, and analysis of putative CRISPR loci by identification of CRISPR arrays.
- the Cas protein is a Cas nuclease that directs cleavage of one or both strands at the target locus, such as the target strand (i.e., the strand having the target nucleotide sequence that is at least partially complementary to and can hybridize with a single guide nucleic acid or dual guide nucleic acids) and/or the non-target strand.
- the Cas nuclease directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more nucleotides from the first or last nucleotide of the target nucleotide sequence or its complementary sequence.
- the cleavage is staggered, i.e., it generates sticky ends.
- the cleavage generates a staggered cut with a 5' overhang.
- the cleavage generates a staggered cut with a 5' overhang of 1 to 5 nucleotides, e.g., of 4 or 5 nucleotides.
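The relationship between the two cut positions and the resulting overhang can be sketched as follows. This is an illustration under stated assumptions: both cut positions are expressed in the coordinates of one strand read 5' to 3' (called the "top" strand here), the function name is hypothetical, and the example positions used in the usage note are illustrative only.

```python
def describe_cut(top_cut: int, bottom_cut: int) -> tuple[str, int]:
    """Classify a double-strand cut from its two nick positions.

    top_cut: number of nucleotides 5' of the nick on the top strand.
    bottom_cut: position of the nick on the complementary strand,
        expressed in the same top-strand coordinates.
    """
    length = abs(top_cut - bottom_cut)
    if length == 0:
        return ("blunt", 0)
    # When the top strand is nicked closer to its 5' end than the bottom
    # strand, each fragment is left with a protruding 5' end.
    kind = "5' overhang" if top_cut < bottom_cut else "3' overhang"
    return (kind, length)
```

For example, `describe_cut(18, 23)` reports a 5' overhang of 5 nucleotides, which falls within the 1 to 5 nucleotide range recited above.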
- a composition provided herein comprises a Cas nuclease that a compatible guide nucleic acid (gNA), e.g., a gRNA, is capable of activating.
- a composition provided herein further comprises a Cas protein that is related to the Cas nuclease that a compatible guide nucleic acid (gNA), e.g., a gRNA, is capable of activating.
- a Cas protein comprises an amino acid sequence at least 80% (e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) identical to the Cas nuclease amino acid sequence.
- a Cas protein comprises a nuclease-inactive mutant of the Cas nuclease.
- a Cas protein further comprises an effector domain.
- a Cas protein lacks substantially all DNA cleavage activity.
- Such a Cas protein can be generated, e.g., by introducing one or more mutations to an active Cas nuclease (e.g., a naturally occurring Cas nuclease).
- a mutated Cas protein is considered to lack substantially all DNA cleavage activity when the DNA cleavage activity of the protein is no more than about 25%, 10%, 5%, 1%, 0.1%, 0.01%, or less of the DNA cleavage activity of the corresponding non-mutated form, for example, nil or negligible as compared with the non-mutated form.
- a Cas protein may comprise one or more mutations (e.g., a mutation in the RuvC domain of a type V-A Cas protein) and be used as a generic DNA binding protein with or without fusion to an effector domain.
- Exemplary mutations include D908A, E993A, and D1263A with reference to the amino acid positions in AsCpf1; D832A, E925A, and D1180A with reference to the amino acid positions in LbCpf1; and D917A, E1006A, and D1255A with reference to the amino acid positions in FnCpf1. More mutations can be designed and generated according to the crystal structure described in Yamano et al. (2016) CELL, 165: 949.
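The substitution notation used above (original residue, 1-based position, replacement residue, e.g., D908A) can be applied to a sequence programmatically. The sketch below is an assumption-laden illustration: the helper name is hypothetical, and the toy sequence in the usage note is not a Cas protein sequence.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a point substitution written as <old><position><new>, e.g. 'D908A'.

    Positions are 1-based. Raises ValueError if the notation is malformed or
    the residue at that position does not match the stated original residue.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    old, position, new = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != old:
        raise ValueError(
            f"expected {old} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new + sequence[index + 1:]
```

For instance, `apply_substitution("MKDLE", "D3A")` returns `"MKALE"`. The residue check guards against applying a mutation numbered for one ortholog to the sequence of another.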
- a Cas protein, rather than losing nuclease activity to cleave all DNA, may lose the ability to cleave only the target strand or only the non-target strand of a double-stranded DNA, thereby being functional as a nickase (see, Gao et al. (2016) CELL RES., 26: 901). Accordingly, in certain embodiments, a Cas nuclease is a Cas nickase. In certain embodiments, a Cas nuclease has the activity to cleave the non-target strand but substantially lacks the activity to cleave the target strand, e.g., by a mutation in the Nuc domain.
- a Cas nuclease has the cleavage activity to cleave the target strand but lacks substantially the activity to cleave the non-target strand.
- a Cas nuclease has the activity to cleave a double-stranded DNA and result in a double-strand break.
- Cas proteins that lack substantially all DNA cleavage activity or have the ability to cleave only one strand may also be identified from naturally occurring systems. For example, certain naturally occurring CRISPR-Cas systems may retain the ability to bind the target nucleotide sequence but lose entire or partial DNA cleavage activity in eukaryotic (e.g., mammalian or human) cells.
- Such type V-A proteins are disclosed, for example, in Kim et al. (2017) ACS SYNTH. BIOL. 6(7): 1273-82 and Zhang et al. (2017) CELL DISCOV. 3: 17018.
- the activity of a Cas protein e.g., Cas nuclease
- altered activity of an engineered Cas protein comprises increased targeting efficiency and/or decreased off-target binding.
- altered activity comprises modified binding, e.g., increased binding to the target locus (e.g., the target strand or the non-target strand) and/or decreased binding to off-target loci.
- altered activity comprises altered charge in a region of the protein that associates with a single guide nucleic acid or dual guide nucleic acids.
- altered activity of an engineered Cas protein comprises altered charge in a region of the protein that associates with the target strand and/or the non-target strand. In certain embodiments, altered activity of an engineered Cas protein comprises altered charge in a region of the protein that associates with an off-target locus.
- the altered charge can include decreased positive charge, decreased negative charge, increased positive charge, or increased negative charge. For example, decreased negative charge and increased positive charge may generally strengthen binding to the nucleic acid(s) whereas decreased positive charge and increased negative charge may weaken binding to the nucleic acid(s).
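The direction of a charge change can be estimated by counting charged residues in the region of interest. The sketch below is a rough approximation under stated assumptions: Lys and Arg are counted as +1, Asp and Glu as -1, histidine and terminal groups are ignored, and the function name is hypothetical.

```python
POSITIVE = set("KR")  # Lys, Arg counted as +1 (assumption: His ignored)
NEGATIVE = set("DE")  # Asp, Glu counted as -1


def net_charge(region: str) -> int:
    """Approximate net charge of a protein region from residue counts."""
    return sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in region)
```

Comparing `net_charge` before and after a substitution indicates whether the change increases positive charge (expected, per the passage above, to generally strengthen nucleic acid binding) or increases negative charge (expected to weaken it).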
- altered activity comprises increased or decreased steric hindrance between the protein and a single guide nucleic acid or dual guide nucleic acids.
- altered activity comprises increased or decreased steric hindrance between the protein and the target strand and/or the non-target strand. In certain embodiments, altered activity comprises increased or decreased steric hindrance between the protein and an off-target locus.
- a modification or mutation comprises one or more substitutions of Lys, His, Arg, Glu, Asp, Ser, Gly, and/or Thr. In certain embodiments, a modification or mutation comprises one or more substitutions with Gly, Ala, Ile, Glu, and/or Asp. In certain embodiments, modification or mutation comprises one or more amino acid substitutions in the groove between the WED and RuvC domain of the Cas protein (e.g., a type V-A Cas protein).
- altered activity of an engineered Cas protein comprises increased nuclease activity to cleave the target locus. In certain embodiments, altered activity of an engineered Cas protein comprises decreased nuclease activity to cleave an off-target locus. In certain embodiments, altered activity of an engineered Cas protein comprises altered helicase kinetics. In certain embodiments, an engineered Cas protein comprises a modification that alters formation of the CRISPR complex.
- a protospacer adjacent motif (PAM) or PAM-like motif directs binding of a Cas protein complex to a target locus. Many Cas proteins have PAM specificity. The precise sequence and length requirements for the PAM differ depending on the Cas protein used.
- PAM sequences are typically 2-5 base pairs in length and are adjacent to (but located on a different strand of target DNA from) the target nucleotide sequence. PAM sequences can be identified using any suitable method, such as testing cleavage, targeting, or modification of oligonucleotides having the target nucleotide sequence and different PAM sequences.
- Exemplary PAM sequences are provided in Tables 2 and 3.
- a Cas protein comprises MAD7 and the PAM is TTTN, wherein N is A, C, G, or T.
- a Cas protein comprises MAD7 and the PAM is CTTN, wherein N is A, C, G, or T.
- a Cas protein comprises AsCpf1 and the PAM is TTTN, wherein N is A, C, G, or T.
- a Cas protein comprises FnCpf1 and the PAM is 5' TTN, wherein N is A, C, G, or T.
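Candidate target sites for a given PAM can be located by scanning the non-target strand for the PAM and taking the sequence immediately downstream of it. The sketch below is illustrative only: it scans a single strand, supports only A, C, G, T, and N in the PAM, and uses a default target length of 20 nucleotides, which is an assumption and not a value taken from this passage. The function name is hypothetical.

```python
import re

# Regex fragments for the PAM symbols handled in this sketch.
PAM_SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_targets(non_target_strand: str, pam: str = "TTTN",
                 target_len: int = 20) -> list[tuple[int, str]]:
    """Return (pam_start, downstream_sequence) for each PAM occurrence.

    The strand is read 5' to 3'. The reported sequence is the target_len
    nucleotides immediately 3' of the PAM on that same strand.
    """
    pattern = "".join(PAM_SYMBOLS[base] for base in pam)
    # A lookahead lets overlapping PAM occurrences be reported as well.
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for match in regex.finditer(non_target_strand):
        start = match.start() + len(pam)
        end = start + target_len
        if end <= len(non_target_strand):
            hits.append((match.start(), non_target_strand[start:end]))
    return hits
```

A fuller search would also scan the reverse complement so that sites on both strands are reported.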
- PAM sequences for certain other type V-A Cas proteins are disclosed in Zetsche et al. (2015) CELL, 163: 759 and U.S. Patent No. 9,982,279. Further, engineering of the PAM Interacting (PI) domain of a Cas protein may allow programming of PAM specificity, improve target site recognition fidelity, and/or increase the versatility of an engineered, non-naturally occurring system.
- an engineered Cas protein comprises a modification that alters the Cas protein specificity in concert with modification to targeting range.
- Cas mutants can be designed to have increased target specificity as well as accommodating modifications in PAM recognition, for example by choosing mutations that alter PAM specificity (e.g., in the PI domain) and combining those mutations with groove mutations that increase (or if desired, decrease) specificity for the on-target locus versus off-target loci.
- an engineered Cas protein comprises one or more nuclear localization signal (NLS) motifs.
- an engineered Cas protein comprises at least 2 (e.g., at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motifs.
- Non-limiting examples of NLS motifs include: the NLS of SV40 large T-antigen, having the amino acid sequence of PKKKRKV (SEQ ID NO: 40); the NLS from nucleoplasmin, e.g., the nucleoplasmin bipartite NLS having the amino acid sequence of KRPAATKKAGQAKKKK (SEQ ID NO: 41); the c-myc NLS, having the amino acid sequence of PAAKRVKLD (SEQ ID NO: 42) or RQRRNELKRSP (SEQ ID NO: 43); the hRNPA1 M9 NLS, having the amino acid sequence of NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 44); the importin-α IBB domain NLS, having the amino acid sequence of RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 45); the myoma T protein NLS, having the amino acid sequence of
- the one or more NLS motifs are of sufficient strength to drive accumulation of the Cas protein in a detectable amount in the nucleus of a eukaryotic cell.
- the strength of nuclear localization activity may derive from the number of NLS motif(s) in the Cas protein, the particular NLS motif(s) used, the position(s) of the NLS motif(s), or a combination of these and/or other factors.
- an engineered Cas protein comprises at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motif(s) at or near the N-terminus (e.g., within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N-terminus).
- an engineered Cas protein comprises at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motif(s) at or near the C-terminus (e.g., within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the C-terminus).
- an engineered Cas protein comprises at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motif(s) at or near the C-terminus and at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motif(s) at or near the N-terminus.
- the engineered Cas protein comprises one, two, or three NLS motifs at or near the C-terminus.
- the engineered Cas protein comprises one NLS motif at or near the N-terminus and one, two, or three NLS motifs at or near the C-terminus. In certain embodiments, the engineered Cas protein comprises a nucleoplasmin NLS at or near the C-terminus.
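The "at or near the terminus" placements described above can be checked by locating each NLS motif and measuring its distance from either end of the polypeptide. The sketch below makes several assumptions: "within N amino acids" is read as the distance from the terminus to the nearest edge of the motif, the default window of 50 is taken from the upper end of the listed values, and the function name is hypothetical.

```python
SV40_NLS = "PKKKRKV"  # SV40 large T-antigen NLS, as listed in the text


def nls_near_termini(protein: str, motif: str = SV40_NLS,
                     window: int = 50) -> tuple[int, int]:
    """Count motif occurrences near the N-terminus and near the C-terminus.

    An occurrence counts as near the N-terminus if it starts within `window`
    residues of the first residue, and as near the C-terminus if it ends
    within `window` residues of the last residue.
    """
    n_term = 0
    c_term = 0
    start = protein.find(motif)
    while start != -1:
        end = start + len(motif)
        if start <= window:
            n_term += 1
        if len(protein) - end <= window:
            c_term += 1
        start = protein.find(motif, start + 1)
    return n_term, c_term
```

For a construct with one NLS after the initiator methionine and one at the extreme C-terminus, the function returns `(1, 1)`. In a polypeptide shorter than the window, a single motif can count toward both termini.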
- Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to a nucleic acid-targeting protein, such that location within a cell may be visualized. Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting the protein, such as immunohistochemistry, Western blot, or enzyme activity assay.
- Accumulation in the nucleus may also be determined indirectly, such as by an assay that detects the effect of the nuclear import of a Cas protein complex (e.g., assay for DNA cleavage or mutation at the target locus, or assay for altered gene expression activity) as compared to a control not exposed to the Cas protein or exposed to a Cas protein lacking one or more of the NLS motifs.
- a Cas protein may comprise a chimeric Cas protein, e.g., a Cas protein having enhanced function by being a chimera. Chimeric Cas proteins may be new Cas proteins containing fragments from more than one naturally occurring Cas protein or variants thereof.
- fragments of multiple type V-A Cas homologs may be fused to form a chimeric Cas protein.
- a chimeric Cas protein comprises fragments of Cpf1 orthologs from multiple species and/or strains.
- a Cas protein comprises one or more effector domains. The one or more effector domains may be located at or near the N-terminus of the Cas protein and/or at or near the C-terminus of the Cas protein.
- an effector domain comprised in the Cas protein is a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or an SID domain), an exogenous nuclease domain (e.g., FokI), a deaminase domain (e.g., cytidine deaminase or adenine deaminase), or a reverse transcriptase domain (e.g., a high fidelity reverse transcriptase domain).
- a Cas protein comprises one or more protein domains that enhance homology-directed repair (HDR) and/or inhibit non-homologous end joining (NHEJ).
- a Cas protein comprises a dominant negative version of p53-binding protein 1 (53BP1), for example, a fragment of 53BP1 comprising a minimum focus forming region (e.g., amino acids 1231-1644 of human 53BP1).
- a Cas protein comprises a motif that is targeted by APC-Cdh1, such as amino acids 1-110 of human Geminin, thereby resulting in degradation of the fusion protein during the HDR non-permissive G1 phase of the cell cycle.
- a Cas protein comprises an inducible or controllable domain.
- inducers or controllers include light, hormones, and small molecule drugs.
- a Cas protein comprises a light inducible or controllable domain.
- a Cas protein comprises a chemically inducible or controllable domain.
- a Cas protein comprises a tag protein or peptide for ease of tracking and/or purification.
- tag proteins and peptides include fluorescent proteins (e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato), His tags (e.g., 6xHis tag, or gly-6xHis; 8xHis, or gly-8xHis), hemagglutinin (HA) tag, FLAG tag, 3xFLAG tag, and Myc tag.
- a Cas protein is covalently conjugated to a non-protein moiety.
- a guide nucleic acid can be a single gNA (sgNA, e.g., sgRNA), in which the gNA is a single polynucleotide, or a dual gNA (e.g., dual gRNA), in which the gNA comprises two separate polynucleotides (these can in some cases be covalently linked, but not via a conventional internucleotide linkage).
- a single guide nucleic acid is capable of activating a Cas nuclease alone (e.g., in the absence of a tracrRNA).
- a gNA comprises a modulator nucleic acid and a targeter nucleic acid.
- the modulator and targeter nucleic acids are part of a single polynucleotide.
- the modulator and targeter nucleic acids are separate, e.g., not joined by a conventional nucleotide linkage, such as not joined at all.
- the targeter nucleic acid comprises a spacer sequence and a targeter stem sequence.
- the modulator nucleic acid comprises a modulator stem sequence and, generally, further nucleotides, such as nucleotides comprising a 5’ tail.
- the modulator stem sequence and targeter stem sequence can each comprise any suitable number of nucleotides and are of sufficient complementarity that they can hybridize. In a single gNA there may be additional nucleotides between the targeter stem sequence and the modulator stem sequence; these can, in certain cases, form secondary structure, such as a loop.
- the guide nucleic acid comprises a targeter nucleic acid that, in combination with a modulator nucleic acid, is capable of binding a Cas protein. In certain embodiments, the guide nucleic acid comprises a targeter nucleic acid that, in combination with a modulator nucleic acid, is capable of activating a Cas nuclease.
- the system further comprises the Cas protein that the targeter nucleic acid and the modulator nucleic acid are capable of binding or the Cas nuclease that the targeter nucleic acid and the modulator nucleic acid are capable of activating.
- the single or dual guide nucleic acids need to be compatible with a Cas protein (e.g., Cas nuclease) to provide an operative CRISPR system.
- the targeter stem sequence and the modulator stem sequence can be derived from a naturally occurring crRNA capable of activating a Cas nuclease in the absence of a tracrRNA.
- the targeter stem sequence and the modulator stem sequence can be derived from a naturally occurring set of crRNA and tracrRNA, respectively, that are capable of activating a Cas nuclease.
- the nucleotide sequences of the targeter stem sequence and the modulator stem sequence are identical to the corresponding stem sequences of a stem-loop structure in such naturally occurring crRNA.
- Guide nucleic acid sequences that are operative with a type II or type V Cas protein are known in the art and are disclosed, for example, in U.S. Patent Nos. 9,790,490, 9,896,696, 10,113,179, and 10,266,850, and U.S. Patent Application Publication No. 2014/0242664. It is understood that these sequences are merely illustrative, and other guide nucleic acid sequences may also be used with these Cas proteins.
- TABLE 2: Type V-A Cas Protein and Corresponding Single Guide Nucleic Acid Sequences
- a guide nucleic acid, in the context of a type V-A CRISPR-Cas system, comprises a targeter stem sequence listed in Table 3.
- a guide nucleic acid is a single guide nucleic acid that comprises, from 5’ to 3’, a modulator stem sequence, a loop sequence, a targeter stem sequence, and a spacer sequence.
- the targeter stem sequence in the single guide nucleic acid is listed in Table 2 as a bold-underlined portion of scaffold sequence, and the modulator stem sequence is complementary (e.g., 100% complementary) to the targeter stem sequence.
- the single guide nucleic acid comprises, from 5’ to 3’, a modulator sequence listed in Table 2 as an underlined portion of a scaffold sequence, a loop sequence, a targeter stem sequence listed as a bold-underlined portion of the same scaffold sequence, and a spacer sequence.
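The 5’ to 3’ arrangement of a single guide nucleic acid described above (modulator stem, loop, targeter stem, spacer) can be sketched as a simple assembly step with a complementarity check. This is an illustration under stated assumptions: the guide is RNA, the stems are required to be 100% complementary (the text gives this only as an example, so the strict check is a simplification), and the sequences in the usage note are hypothetical placeholders, not entries from Table 2 or Table 3.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def build_single_guide(modulator_stem: str, loop: str,
                       targeter_stem: str, spacer: str) -> str:
    """Assemble a single guide, 5' to 3', after checking stem pairing.

    Raises ValueError if the modulator stem is not the exact reverse
    complement of the targeter stem (strict 100% complementarity).
    """
    if reverse_complement(targeter_stem) != modulator_stem:
        raise ValueError("modulator and targeter stems are not fully complementary")
    return modulator_stem + loop + targeter_stem + spacer
```

With the placeholder inputs `build_single_guide("UCUAC", "UCUU", "GUAGA", "ACGU")`, the result is `"UCUACUCUUGUAGAACGU"`. For a dual guide, the same complementarity check would apply to two separate polynucleotides rather than to one concatenated sequence.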
- an engineered, non-naturally occurring system comprises a single guide nucleic acid comprising a scaffold sequence listed in Table 2.
- the system further comprises a Cas protein (e.g., Cas nuclease) comprising an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in the SEQ ID NO listed in the same line of Table 2.
- the system further comprises a Cas protein (e.g., Cas nuclease) comprising the amino acid sequence set forth in the SEQ ID NO listed in the same line of Table 2.
- the system is useful for targeting, editing, or modifying a nucleic acid comprising a target nucleotide sequence close or adjacent to (e.g., immediately downstream of) a PAM listed in the same line of Table 2 when using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate.
- a guide nucleic acid, e.g., a dual gNA, comprises a targeter guide nucleic acid that comprises, from 5’ to 3’, a targeter stem sequence and a spacer sequence.
- the targeter stem sequence in the targeter nucleic acid is listed in Table 3.
- an engineered, non-naturally occurring system comprises the targeter nucleic acid and a modulator stem sequence complementary (e.g., 100% complementary) to the targeter stem sequence.
- the modulator nucleic acid comprises a modulator sequence listed in the same line of Table 3.
- the system further comprises a Cas protein (e.g., Cas nuclease) comprising an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in the SEQ ID NO listed in the same line of Table 3.
- the system further comprises a Cas protein (e.g., Cas nuclease) comprising the amino acid sequence set forth in the SEQ ID NO listed in the same line of Table 3.
- the system is useful for targeting, editing, or modifying a nucleic acid comprising a target nucleotide sequence close or adjacent to (e.g., immediately downstream of) a PAM listed in the same line of Table 3 when using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate.
- a single guide nucleic acid, the targeter nucleic acid, and/or the modulator nucleic acid can be synthesized chemically or produced in a biological process (e.g., catalyzed by an RNA polymerase in an in vitro reaction). Such reaction or process may limit the lengths of the single guide nucleic acid, targeter nucleic acid, and/or modulator nucleic acid.
- a single guide nucleic acid is no more than 100, 90, 80, 70, 60, 50, 40, 30, or 25 nucleotides in length.
- a single guide nucleic acid is at least 20, 25, 30, 40, 50, 60, 70, 80, or 90 nucleotides in length.
- the single guide nucleic acid is 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 20-25, 25-100, 25-90, 25-80, 25-70, 25-60, 25-50, 25-40, 25-30, 30-100, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-100, 40-90, 40-80, 40-70, 40-60, 40-50, 50-100, 50-90, 50-80, 50-70, 50-60, 60-100, 60-90, 60-80, 60-70, 70-100, 70-90, 70-80, 80-100, 80-90, or 90-100 nucleotides in length.
- a targeter nucleic acid is no more than 100, 90, 80, 70, 60, 50, 40, 30, or 25 nucleotides in length. In certain embodiments, a targeter nucleic acid is at least 20, 25, 30, 40, 50, 60, 70, 80, or 90 nucleotides in length.
- the targeter nucleic acid is 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 20-25, 25-100, 25-90, 25-80, 25-70, 25-60, 25-50, 25-40, 25-30, 30-100, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-100, 40-90, 40-80, 40-70, 40-60, 40-50, 50-100, 50-90, 50-80, 50-70, 50-60, 60-100, 60-90, 60-80, 60-70, 70-100, 70-90, 70-80, 80-100, 80-90, or 90-100 nucleotides in length.
- a modulator nucleic acid is no more than 100, 90, 80, 70, 60, 50, 40, 30, or 20 nucleotides in length. In certain embodiments, a modulator nucleic acid is at least 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 nucleotides in length.
- the modulator nucleic acid is 10-100, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, 10-20, 15-100, 15-90, 15-80, 15-70, 15-60, 15-50, 15-40, 15-30, 15-20, 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 25-100, 25-90, 25-80, 25-70, 25-60, 25-50, 25-40, 25-30, 30-100, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-100, 40-90, 40-80, 40-70, 40-60, 40-50, 50-100, 50-90, 50-80, 50-70, 50-60, 60-100, 60-90, 60-80, 60-70, 70-100, 70-90, 70-80, 80-100, 80-90, or 90-100 nucleotides in length.
- the length of the duplex formed within the single guide nucleic acid or formed between the targeter nucleic acid and the modulator nucleic acid, e.g., in a dual gNA, may be a factor in providing an operative CRISPR system.
- the targeter stem sequence and the modulator stem sequence each consist of 4-10 nucleotides that base pair with each other.
- the targeter stem sequence and the modulator stem sequence each consist of 4-9, 4-8, 4-7, 4-6, 4-5, 5-10, 5-9, 5-8, 5-7, or 5-6 nucleotides that base pair with each other.
- the targeter stem sequence and the modulator stem sequence each consist of 4, 5, 6, 7, 8, 9, or 10 nucleotides. It is understood that the composition of the nucleotides in each sequence affects the stability of the duplex, and a C-G base pair confers greater stability than an A-U base pair.
- 20%-80%, 20%-70%, 20%-60%, 20%-50%, 20%-40%, 20%-30%, 30%-80%, 30%-70%, 30%-60%, 30%-50%, 30%-40%, 40%-80%, 40%-70%, 40%-60%, 40%-50%, 50%-80%, 50%-70%, 50%-60%, 60%-80%, 60%-70%, or 70%-80% of the base pairs are C-G base pairs.
- the targeter stem sequence and the modulator stem sequence each consist of 5 nucleotides. As such, the targeter stem sequence and the modulator stem sequence form a duplex of 5 base pairs. In certain embodiments, 0-4, 0-3, 0-2, 0-1, 1-5, 1-4, 1-3, 1-2, 2-5, 2-4, 2-3, 3-5, 3-4, or 4-5 out of the 5 base pairs are C-G base pairs. In certain embodiments, 0, 1, 2, 3, 4, or 5 out of the 5 base pairs are C-G base pairs. In certain embodiments, the targeter stem sequence consists of 5’-GUAGA-3’ and the modulator stem sequence consists of 5’-UCUAC-3’.
- the targeter stem sequence consists of 5’-GUGGG-3’ and the modulator stem sequence consists of 5’-CCCAC-3’.
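The stem pairing and C-G content described above can be checked with a short sketch. It confirms that a modulator stem sequence is the Watson-Crick reverse complement of a targeter stem sequence and counts the C-G base pairs in the resulting duplex. The function names are hypothetical, and the sketch assumes 100% complementarity with Watson-Crick pairs only (G-U wobble pairs are not considered).

```python
# Watson-Crick partners for RNA nucleotides.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))

def stem_duplex_summary(targeter_stem, modulator_stem):
    """Return (duplex length, number of C-G pairs) for a fully
    complementary targeter/modulator stem pair; raise otherwise."""
    targeter_stem = targeter_stem.upper()
    modulator_stem = modulator_stem.upper()
    if reverse_complement(targeter_stem) != modulator_stem:
        raise ValueError("stem sequences are not 100% complementary")
    # Each G or C in one strand corresponds to exactly one C-G base pair.
    cg_pairs = sum(1 for base in targeter_stem if base in "GC")
    return len(targeter_stem), cg_pairs

# The two stem pairs recited above.
print(stem_duplex_summary("GUAGA", "UCUAC"))  # 5-bp duplex, 2 C-G pairs
print(stem_duplex_summary("GUGGG", "CCCAC"))  # 5-bp duplex, 4 C-G pairs
```

Under these assumptions, the 5’-GUAGA-3’/5’-UCUAC-3’ duplex has 2 of 5 (40%) C-G base pairs and the 5’-GUGGG-3’/5’-CCCAC-3’ duplex has 4 of 5 (80%).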
- the 3’ end of the targeter stem sequence is linked by no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides to the 5’ end of the spacer sequence.
- the targeter stem sequence and the spacer sequence are adjacent to each other, directly linked by an internucleotide bond.
- the targeter stem sequence and the spacer sequence are linked by one nucleotide, e.g., a uridine.
- the targeter stem sequence and the spacer sequence are linked by two or more nucleotides. In certain embodiments, the targeter stem sequence and the spacer sequence are linked by 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides. [0077] In certain embodiments, the targeter nucleic acid further comprises an additional nucleotide sequence 5’ to the targeter stem sequence. In certain embodiments, the additional nucleotide sequence comprises at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50) nucleotides.
- the additional nucleotide sequence consists of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides. In certain embodiments, the additional nucleotide sequence consists of 2 nucleotides. In certain embodiments, the additional nucleotide sequence is reminiscent of the loop or a fragment thereof (e.g., one, two, three, or four nucleotides at the 3’ end of the loop) in a crRNA of a corresponding single guide CRISPR-Cas system. It is understood that an additional nucleotide sequence 5’ to the targeter stem sequence can be dispensable.
- the targeter nucleic acid does not comprise any additional nucleotide 5’ to the targeter stem sequence.
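The 5’-to-3’ arrangement of the targeter nucleic acid described above (optional additional nucleotides, targeter stem sequence, optional linker of no more than 10 nucleotides, spacer sequence) can be sketched as simple string assembly. The function name `build_targeter` and the example spacer are hypothetical and serve only to illustrate the ordering of the elements.

```python
def build_targeter(stem, spacer, linker="", five_prime_extra=""):
    """Assemble a targeter nucleic acid sequence, 5' to 3':
    [additional nucleotides] + targeter stem + [linker] + spacer."""
    if len(linker) > 10:
        # The embodiments above recite a linker of no more than 10 nucleotides.
        raise ValueError("linker exceeds 10 nucleotides")
    return five_prime_extra + stem + linker + spacer

# Targeter stem 5'-GUAGA-3' linked to a hypothetical spacer by one uridine.
print(build_targeter("GUAGA", "CCGAUCGAUC", linker="U"))

# Directly linked stem and spacer, with no additional 5' nucleotides.
print(build_targeter("GUAGA", "CCGAUCGAUC"))
```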
- the targeter nucleic acid or the single guide nucleic acid further comprises an additional nucleotide sequence containing one or more nucleotides at the 3’ end that does not hybridize with the target nucleotide sequence.
- the additional nucleotide sequence may protect the targeter nucleic acid from degradation by 3’-5’ exonuclease.
- the additional nucleotide sequence is no more than 100 nucleotides in length.
- the additional nucleotide sequence is no more than 90, 80, 70, 60, 50, 40, 30, 20, or 10 nucleotides in length. In certain embodiments, the additional nucleotide sequence is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides in length.
- the additional nucleotide sequence is 5-100, 5-50, 5-40, 5-30, 5-25, 5-20, 5-15, 5-10, 10-100, 10-50, 10-40, 10-30, 10-25, 10-20, 10-15, 15-100, 15-50, 15-40, 15-30, 15-25, 15-20, 20-100, 20-50, 20-40, 20-30, 20-25, 25-100, 25-50, 25-40, 25-30, 30-100, 30-50, 30-40, 40-100, 40-50, or 50-100 nucleotides in length.
- the additional nucleotide sequence forms a hairpin with the spacer sequence.
- Such secondary structure may increase the specificity of the guide nucleic acid or the engineered, non-naturally occurring system (see, Kocak et al. (2019) Nat. Biotech. 37: 657-66).
- the free energy change during the hairpin formation is greater than or equal to -20 kcal/mol, -15 kcal/mol, -14 kcal/mol, -13 kcal/mol, -12 kcal/mol, -11 kcal/mol, or -10 kcal/mol.
- the free energy change during the hairpin formation is greater than or equal to -5 kcal/mol, -6 kcal/mol, -7 kcal/mol, -8 kcal/mol, -9 kcal/mol, -10 kcal/mol, -11 kcal/mol, -12 kcal/mol, -13 kcal/mol, -14 kcal/mol, or -15 kcal/mol.
- the free energy change during the hairpin formation is in the range of -20 to -10 kcal/mol, -20 to -11 kcal/mol, -20 to -12 kcal/mol, -20 to -13 kcal/mol, -20 to -14 kcal/mol, -20 to -15 kcal/mol, -15 to -10 kcal/mol, -15 to -11 kcal/mol, -15 to -12 kcal/mol, -15 to -13 kcal/mol, -15 to -14 kcal/mol, -14 to -10 kcal/mol, -14 to -11 kcal/mol, -14 to -12 kcal/mol, -14 to -13 kcal/mol, -13 to -10 kcal/mol, -13 to -11 kcal/mol, -13 to -12 kcal/mol, -12 to -10 kcal/mol, -13 to -11 kcal/mol, -13 to -12 kcal/mol, -12 to -10 kcal/
- the targeter nucleic acid or the single guide nucleic acid does not comprise any nucleotide 3’ to the spacer sequence.
- the modulator nucleic acid further comprises an additional nucleotide sequence 3’ to the modulator stem sequence.
- the additional nucleotide sequence comprises at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50) nucleotides.
- the additional nucleotide sequence consists of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides. In certain embodiments, the additional nucleotide sequence consists of 1 nucleotide (e.g., uridine). In certain embodiments, the additional nucleotide sequence consists of 2 nucleotides. In certain embodiments, the additional nucleotide sequence is reminiscent of the loop or a fragment thereof (e.g., one, two, three, or four nucleotides at the 5’ end of the loop) in a crRNA of a corresponding single guide CRISPR-Cas system.
- an additional nucleotide sequence 3’ to the modulator stem sequence can be dispensable. Accordingly, in certain embodiments, the modulator nucleic acid does not comprise any additional nucleotide 3’ to the modulator stem sequence. [0081] It is understood that the additional nucleotide sequence 5’ to the targeter stem sequence and the additional nucleotide sequence 3’ to the modulator stem sequence, if present, may interact with each other.
- nucleotide immediately 5’ to the targeter stem sequence and the nucleotide immediately 3’ to the modulator stem sequence do not form a Watson-Crick base pair (otherwise they would constitute part of the targeter stem sequence and part of the modulator stem sequence, respectively), other nucleotides in the additional nucleotide sequence 5’ to the targeter stem sequence and the additional nucleotide sequence 3’ to the modulator stem sequence may form one, two, three, or more base pairs (e.g., Watson-Crick base pairs). Such interaction may affect the stability of a complex comprising the targeter nucleic acid and the modulator nucleic acid.
- the stability of a complex comprising a targeter nucleic acid and a modulator nucleic acid can be assessed by the Gibbs free energy change (ΔG) during the formation of the complex, either calculated or actually measured.
- the ΔG during the formation of the complex correlates generally with the ΔG during the formation of a secondary structure within the corresponding single guide nucleic acid.
- RNAfold (rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) as disclosed in Gruber et al. (2008) Nucleic Acids Res., 36(Web Server issue): W70–W74. Unless indicated otherwise, the ΔG values in the present disclosure are calculated by RNAfold for the formation of a secondary structure within a corresponding single guide nucleic acid.
- the ΔG is lower than or equal to -1 kcal/mol, e.g., lower than or equal to -2 kcal/mol, lower than or equal to -3 kcal/mol, lower than or equal to -4 kcal/mol, lower than or equal to -5 kcal/mol, lower than or equal to -6 kcal/mol, lower than or equal to -7 kcal/mol, lower than or equal to -7.5 kcal/mol, or lower than or equal to -8 kcal/mol.
- the ΔG is greater than or equal to -10 kcal/mol, e.g., greater than or equal to -9 kcal/mol, greater than or equal to -8.5 kcal/mol, or greater than or equal to -8 kcal/mol. In certain embodiments, the ΔG is in the range of -10 to -4 kcal/mol.
- the ΔG is in the range of -8 to -4 kcal/mol, -7 to -4 kcal/mol, -6 to -4 kcal/mol, -5 to -4 kcal/mol, -8 to -4.5 kcal/mol, -7 to -4.5 kcal/mol, -6 to -4.5 kcal/mol, or -5 to -4.5 kcal/mol.
- the ΔG is about -8 kcal/mol, -7 kcal/mol, -6 kcal/mol, -5 kcal/mol, -4.9 kcal/mol, -4.8 kcal/mol, -4.7 kcal/mol, -4.6 kcal/mol, -4.5 kcal/mol, -4.4 kcal/mol, -4.3 kcal/mol, -4.2 kcal/mol, -4.1 kcal/mol, or -4 kcal/mol.
- the ΔG may be affected by a sequence in the targeter nucleic acid that is not within the targeter stem sequence, and/or a sequence in the modulator nucleic acid that is not within the modulator stem sequence.
- one or more base pairs may reduce the ΔG, i.e., stabilize the nucleic acid complex.
- the nucleotide immediately 5’ to the targeter stem sequence comprises a uracil or is a uridine
- the nucleotide immediately 3’ to the modulator stem sequence comprises a uracil or is a uridine, thereby forming a nonconventional U-U base pair.
- the modulator nucleic acid or the single guide nucleic acid comprises a nucleotide sequence referred to herein as a “5’ tail” positioned 5’ to the modulator stem sequence.
- the 5’ tail is a nucleotide sequence positioned 5’ to the stem-loop structure of the crRNA.
- a 5’ tail in an engineered type V-A CRISPR-Cas system, whether single guide or dual guide, can be reminiscent of the 5’ tail in a corresponding naturally occurring type V-A CRISPR-Cas system.
- the 5’ tail may participate in the formation of the CRISPR-Cas complex.
- the 5’ tail forms a pseudoknot structure with the modulator stem sequence, which is recognized by the Cas protein (see, Yamano et al. (2016) Cell, 165: 949).
- the 5’ tail is at least 3 (e.g., at least 4 or at least 5) nucleotides in length.
- the 5’ tail is 3, 4, or 5 nucleotides in length.
- the nucleotide at the 3’ end of the 5’ tail comprises a uracil or is a uridine.
- the second nucleotide in the 5’ tail, the position counted from the 3’ end, comprises a uracil or is a uridine.
- the third nucleotide in the 5’ tail, the position counted from the 3’ end, comprises an adenine or is an adenosine.
- This third nucleotide may form a base pair (e.g., a Watson-Crick base pair) with a nucleotide 5’ to the modulator stem sequence.
- the modulator nucleic acid comprises a uridine or a uracil-containing nucleotide 5’ to the modulator stem sequence.
- the 5’ tail comprises the nucleotide sequence of 5’-AUU-3’. In certain embodiments, the 5’ tail comprises the nucleotide sequence of 5’-AAUU-3’. In certain embodiments, the 5’ tail comprises the nucleotide sequence of 5’-UAAUU-3’. In certain embodiments, the 5’ tail is positioned immediately 5’ to the modulator stem sequence. [0086] In certain embodiments, the single guide nucleic acid, the targeter nucleic acid, and/or the modulator nucleic acid are designed to reduce the degree of secondary structure other than the hybridization between the targeter stem sequence and the modulator stem sequence.
- no more than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the single guide nucleic acid other than the targeter stem sequence and the modulator stem sequence participate in self-complementary base pairing when optimally folded.
- no more than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the targeter nucleic acid and/or the modulator nucleic acid participate in self-complementary base pairing when optimally folded.
- Optimal folding may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy.
- mFold as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148).
- Another example folding algorithm is the online webserver RNAfold, developed at Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr and GM Church, 2009, Nature Biotechnology 27(12): 1151-62).
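The percentage of nucleotides participating in self-complementary base pairing can be estimated from a predicted secondary structure. The following minimal sketch assumes the structure is supplied in dot-bracket notation, as output by folding programs such as RNAfold, and that the positions of the targeter stem sequence and modulator stem sequence are known and excluded from the count. The function name, the example structure, and the stem positions are hypothetical, and only simple `(` and `)` pairs are counted (pseudoknot notation is not handled).

```python
def paired_fraction(dot_bracket, excluded=()):
    """Fraction of nucleotides, outside the excluded 0-based positions,
    that are base paired in a dot-bracket secondary structure."""
    skip = set(excluded)
    considered = [c for i, c in enumerate(dot_bracket) if i not in skip]
    if not considered:
        return 0.0
    paired = sum(1 for c in considered if c in "()")
    return paired / len(considered)

# Hypothetical 30-nt single guide: positions 5-9 and 14-18 form the
# modulator stem/targeter stem duplex; a 2-bp hairpin sits in the spacer.
structure = ".....(((((....)))))..((...)).."
stem_positions = list(range(5, 10)) + list(range(14, 19))
fraction = paired_fraction(structure, excluded=stem_positions)
print(f"{fraction:.0%} of non-stem nucleotides are base paired")
```

A guide would satisfy, for example, the "no more than about 25%" embodiment above if the returned fraction is at most 0.25.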
- the targeter nucleic acid is directed to a specific target nucleotide sequence, and a donor template can be designed to modify the target nucleotide sequence or a sequence nearby.
- the single guide nucleic acid or the modulator nucleic acid further comprises a donor template-recruiting sequence capable of hybridizing with a donor template (see Figure 2B).
- Donor templates are described in the “Donor Templates” subsection of section II infra. The donor template and donor template-recruiting sequence can be designed such that they bear sequence complementarity.
- the donor template-recruiting sequence is at least 90% (e.g., at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) complementary to at least a portion of the donor template. In certain embodiments, the donor template-recruiting sequence is 100% complementary to at least a portion of the donor template. In certain embodiments, where the donor template comprises an engineered sequence not homologous to the sequence to be repaired, the donor template-recruiting sequence is capable of hybridizing with the engineered sequence in the donor template.
- the donor template-recruiting sequence is at least 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides in length. In certain embodiments, the donor template-recruiting sequence is positioned at or near the 5’ end of the single guide nucleic acid or at or near the 5’ end of the modulator nucleic acid. In certain embodiments, the donor template-recruiting sequence is linked to the 5’ tail, if present, or to the modulator stem sequence, of the single guide nucleic acid or the modulator nucleic acid through an internucleotide bond or a nucleotide linker.
- the single guide nucleic acid or the modulator nucleic acid further comprises an editing enhancer sequence, which increases the efficiency of gene editing and/or homology-directed repair (HDR) (see Figure 2C).
- Exemplary editing enhancer sequences are described in Park et al. (2018) Nat. Commun. 9: 3313.
- the editing enhancer sequence is positioned 5’ to the 5’ tail, if present, or 5’ to the single guide nucleic acid or the modulator stem sequence.
- the editing enhancer sequence is 1-50, 4-50, 9-50, 15-50, 25-50, 1-25, 4-25, 9-25, 15-25, 1-15, 4-15, 9-15, 1-9, 4-9, or 1-4 nucleotides in length.
- the editing enhancer sequence is about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 55 nucleotides in length.
- the editing enhancer sequence is designed to minimize homology to the target nucleotide sequence or any other sequence that the engineered, non-naturally occurring system may be contacted to, e.g., the genome sequence of a cell into which the engineered, non-naturally occurring system is delivered.
- the editing enhancer is designed to minimize the presence of hairpin structure.
- the editing enhancer can comprise one or more of the chemical modifications disclosed herein.
- the single guide nucleic acid, the modulator nucleic acid, and/or the targeter nucleic acid can further comprise a protective nucleotide sequence that prevents or reduces nucleic acid degradation.
- the protective nucleotide sequence is at least 5 (e.g., at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50) nucleotides in length.
- the length of the protective nucleotide sequence increases the time for an exonuclease to reach the 5’ tail, modulator stem sequence, targeter stem sequence, and/or spacer sequence, thereby protecting these portions of the single guide nucleic acid, the modulator nucleic acid, and/or the targeter nucleic acid from degradation by an exonuclease.
- the protective nucleotide sequence forms a secondary structure, such as a hairpin or a tRNA structure, to reduce the speed of degradation by an exonuclease (see, for example, Wu et al. (2018) Cell. Mol. Life Sci., 75(19): 3593-3607).
- a protective nucleotide sequence is typically located at the 5’ or 3’ end of the single guide nucleic acid, the modulator nucleic acid, and/or the targeter nucleic acid.
- the single guide nucleic acid comprises a protective nucleotide sequence at the 5’ end, at the 3’ end, or at both ends, optionally through a nucleotide linker.
- the modulator nucleic acid comprises a protective nucleotide sequence at the 5’ end, at the 3’ end, or at both ends, optionally through a nucleotide linker.
- the modulator nucleic acid comprises a protective nucleotide sequence at the 5’ end (see Figure 2A).
- the targeter nucleic acid comprises a protective nucleotide sequence at the 5’ end, at the 3’ end, or at both ends, optionally through a nucleotide linker.
- nucleotide sequences can be present in the 5’ portion of a single guide nucleic acid or a modulator nucleic acid, including but not limited to a donor template-recruiting sequence, an editing enhancer sequence, a protective nucleotide sequence, and a linker connecting such sequence to the 5’ tail, if present, or to the modulator stem sequence. It is understood that the functions of donor template recruitment, editing enhancement, protection against degradation, and linkage are not exclusive to each other, and one nucleotide sequence can have one or more of such functions.
- the single guide nucleic acid or the modulator nucleic acid comprises a nucleotide sequence that is both a donor template-recruiting sequence and an editing enhancer sequence.
- the single guide nucleic acid or the modulator nucleic acid comprises a nucleotide sequence that is both a donor template-recruiting sequence and a protective sequence.
- the single guide nucleic acid or the modulator nucleic acid comprises a nucleotide sequence that is both an editing enhancer sequence and a protective sequence.
- the single guide nucleic acid or the modulator nucleic acid comprises a nucleotide sequence that is a donor template-recruiting sequence, an editing enhancer sequence, and a protective sequence.
- the nucleotide sequence 5’ to the 5’ tail, if present, or 5’ to the modulator stem sequence is 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, 10-20, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-90, 40-80, 40-70, 40-60, 40-50, 50-90, 50-80, 50-70, 50-60, 60-90, 60-80, 60-70, 70-90, 70-80, or 80-90 nucleotides in length.
- an engineered, non-naturally occurring system further comprises one or more compounds (e.g., small molecule compounds) that enhance HDR and/or inhibit NHEJ.
- Exemplary compounds having such functions are described in Maruyama et al. (2015) Nat Biotechnol. 33(5): 538-42; Chu et al. (2015) Nat Biotechnol. 33(5): 543-48; Yu et al. (2015) Cell Stem Cell 16(2): 142-47; Pinder et al. (2015) Nucleic Acids Res. 43(19): 9379-92; and Yagiz et al. (2019) Commun. Biol. 2: 198.
- an engineered, non-naturally occurring system further comprises one or more compounds selected from the group consisting of DNA ligase IV antagonists (e.g., SCR7 compound, Ad4 E1B55K protein, and Ad4 E4orf6 protein), RAD51 agonists (e.g., RS-1), DNA-dependent protein kinase (DNA-PK) antagonists (e.g., NU7441 and KU0060648), β3-adrenergic receptor agonists (e.g., L755507), inhibitors of intracellular protein transport from the ER to the Golgi apparatus (e.g., brefeldin A), and any combinations thereof.
- an engineered, non-naturally occurring system comprising a targeter nucleic acid and a modulator nucleic acid is tunable or inducible.
- the targeter nucleic acid, the modulator nucleic acid, and/or the Cas protein can be introduced to the target nucleotide sequence at different times, the system becoming active only when all components are present.
- the amounts of the targeter nucleic acid, the modulator nucleic acid, and/or the Cas protein can be titrated to achieve desired efficiency and specificity.
- Guide nucleic acids including a single guide nucleic acid, a targeter nucleic acid, and/or a modulator nucleic acid, may comprise a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof.
- the single guide nucleic acid comprises a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof.
- the targeter nucleic acid comprises a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof.
- the modulator nucleic acid comprises a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof.
- Spacer sequences can be presented as DNA sequences by including thymidines (T) rather than uridines (U). It is understood that corresponding RNA sequences and DNA/RNA chimeric sequences are also contemplated. For example, where the spacer sequence is an RNA, its sequence can be derived from a DNA sequence disclosed herein by replacing each T with U.
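The derivation of an RNA spacer sequence from a DNA presentation, as described above, amounts to replacing each T with U. A minimal sketch follows; the function name and the example sequence are hypothetical.

```python
def dna_to_rna(dna_sequence):
    """Derive the corresponding RNA sequence from a sequence presented
    as DNA by replacing each thymidine (T) with uridine (U)."""
    return dna_sequence.upper().replace("T", "U")

# Hypothetical spacer presented as DNA and its RNA counterpart.
print(dna_to_rna("ACGTTGCATT"))
```

A DNA/RNA chimeric sequence would instead convert only the positions intended to be ribonucleotides; that case is not covered by this sketch.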
- engineered, non-naturally occurring systems comprising a targeter nucleic acid comprising: a spacer sequence designed to hybridize with a target nucleotide sequence and a targeter stem sequence; and a modulator nucleic acid comprising a modulator stem sequence complementary to the targeter stem sequence, and, optionally, a 5’ sequence, e.g., a tail sequence, wherein, in a single guide nucleic acid the targeter nucleic acid and the modulator nucleic acid are part of a single polynucleotide, and in a dual guide nucleic acid, the targeter nucleic acid and the modulator nucleic acid are separate nucleic acids; modifications can include one or more chemical modifications to one or more nucleotides or internucleotide linkages at or near the 3’ end of the targeter nucleic acid (dual and single and the targeter nucleic acid
- the Cas nuclease is a type V-A Cas nuclease.
- Modulator and/or targeter nucleic sequences can include further sequences, as detailed in the Guide Nucleic Acids section, and modifications can be in these further sequences, as appropriate and apparent to one of skill in the art.
- guide nucleic acid is oriented from 5’ at the modulator nucleic acid to 3’ at the modulator stem sequence, and 5’ at the targeter stem sequence to 3’ at the targeter sequence (see, e.g., Figure 1A and 1B); in certain embodiments, as appropriate, guide nucleic acid is oriented from 3’ at the modulator nucleic acid to 5’ at the modulator stem sequence, and 3’ at the targeter stem sequence to 5’ at the targeter sequence.
- the targeter nucleic acid may comprise a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof.
- the modulator nucleic acid may comprise a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof.
- the targeter nucleic acid is an RNA and the modulator nucleic acid is an RNA.
- a targeter nucleic acid in the form of an RNA is also called targeter RNA, and a modulator nucleic acid in the form of an RNA is also called modulator RNA.
- the nucleotide sequences disclosed herein are presented as DNA sequences by including thymidines (T) and/or RNA sequences including uridines (U). It is understood that corresponding DNA sequences, RNA sequences, and DNA/RNA chimeric sequences are also contemplated.
- 5-100%, 10-100%, 20-100%, 30-100%, 40-100%, 50-100%, 60-100%, 70-100%, 80-100%, 90-100%, 95-100%, 99-100%, 99.5-100% of the gNA is gRNA.
- 20%-80%, 20%-70%, 20%-60%, 20%-50%, 20%-40%, 20%-30%, 30%-80%, 30%-70%, 30%-60%, 30%-50%, 30%-40%, 40%-80%, 40%-70%, 40%-60%, 40%-50%, 50%-80%, 50%-70%, 50%-60%, 60%-80%, 60%-70%, or 70%-80% of gNA is RNA.
- 50% of the gNA is RNA.
- 70% of the gNA is RNA.
- 90% of the gNA is RNA.
- 100% of the gNA is RNA, e.g., a gRNA.
- the remaining portion of the gNA that is not RNA comprises a modified ribonucleotide, a deoxyribonucleotide, a modified deoxyribonucleotide, or a synthetic (e.g., unnatural) nucleotide, for example and without limitation, a threose nucleic acid, locked nucleic acid, peptide nucleic acid, arabinonucleic acid, or hexose nucleic acid, among others.
- the targeter nucleic acid and/or the modulator nucleic acid are RNAs with one or more modifications in a ribose group, one or more modifications in a phosphate group, one or more modifications in a nucleobase, one or more terminal modifications, or a combination thereof.
- Exemplary modifications are disclosed in U.S. Patent Nos. 10,900,034 and 10,767,175, U.S. Patent Application Publication No. 2018/0119140, Watts et al. (2008) Drug Discov. Today 13: 842-55, and Hendel et al. (2015) Nat. Biotechnol. 33: 985.
- the 3’ end of the targeter nucleic acid comprises the spacer sequence.
- the 3’ end of the targeter nucleic acid comprises the targeter stem sequence. Exemplary modifications are disclosed in Dang et al. (2015) Genome Biol. 16: 280, Kocak et al. (2019) Nature Biotech. 37: 657-66, Liu et al.
- Modifications in a ribose group include but are not limited to modifications at the 2' position or modifications at the 4' position.
- the ribose comprises 2'-O-C1-4alkyl, such as 2'-O-methyl (2'-OMe, or M).
- the ribose comprises 2'-O-C1-3alkyl-O-C1-3alkyl, such as 2'-methoxyethoxy (2'-O-CH2CH2OCH3), also known as 2'-O-(2-methoxyethyl) or 2'-MOE.
- the ribose comprises 2'-O-allyl.
- the ribose comprises 2'-O-2,4-Dinitrophenol (DNP).
- the ribose comprises 2'-halo, such as 2'-F, 2'-Br, 2'-Cl, or 2'-I.
- the ribose comprises 2'-NH2.
- the ribose comprises 2'-H (e.g., a deoxynucleotide). In certain embodiments, the ribose comprises 2'-arabino or 2'-F-arabino. In certain embodiments, the ribose comprises 2'-LNA or 2'-ULNA. In certain embodiments, the ribose comprises a 4'-thioribosyl. [0101] Modifications can also include a deoxy group, for example a 2'-deoxy-3'-phosphonoacetate (DP) or a 2'-deoxy-3'-thiophosphonoacetate (DSP).
- Internucleotide linkage modifications in a phosphate group include but are not limited to a phosphorothioate (S), a chiral phosphorothioate, a phosphorodithioate, a boranophosphonate, a C1-4 alkyl phosphonate such as a methylphosphonate, a phosphonocarboxylate such as a phosphonoacetate (P), a phosphonocarboxylate ester such as a phosphonoacetate ester, an amide, a thiophosphonocarboxylate such as a thiophosphonoacetate (SP), a thiophosphonocarboxylate ester such as a thiophosphonoacetate ester, and a 2',5'-linkage having a phosphodiester or any of the modified phosphates above.
- nucleobase examples include but are not limited to 2-thiouracil, 2-thiocytosine, 4-thiouracil, 6-thioguanine, 2-aminoadenine, 2-aminopurine, pseudouracil, hypoxanthine, 7-deazaguanine, 7-deaza-8-azaguanine, 7-deazaadenine, 7-deaza-8-azaadenine, 5-methylcytosine, 5-methyluracil, 5-hydroxymethylcytosine, 5-hydroxymethyluracil, 5,6-dehydrouracil, 5-propynylcytosine, 5-propynyluracil, 5-ethynylcytosine, 5-ethynyluracil, 5-allyluracil, 5-allylcytosine, 5-aminoallyluracil, 5-aminoallyl-cytosine, 5-bromouracil, 5-iodouraci
- Terminal modifications include but are not limited to polyethyleneglycol (PEG), hydrocarbon linkers (such as heteroatom (O,S,N)-substituted hydrocarbon spacers; halo-substituted hydrocarbon spacers; keto-, carboxyl-, amido-, thionyl-, carbamoyl-, thionocarbamoyl-containing hydrocarbon spacers, propanediol), spermine linkers, dyes such as fluorescent dyes (for example, fluoresceins, rhodamines, cyanines), quenchers (for example, dabcyl, BHQ), and other labels (for example biotin, digoxigenin, acridine, streptavidin, avidin, peptid
- a terminal modification comprises a conjugation (or ligation) of the RNA to another molecule comprising an oligonucleotide (such as deoxyribonucleotides and/or ribonucleotides), a peptide, a protein, a sugar, an oligosaccharide, a steroid, a lipid, a folic acid, a vitamin and/or other molecule.
- a terminal modification incorporated into the RNA is located internally in the RNA sequence via a linker such as 2-(4-butylamidofluorescein)propane-1,3-diol bis(phosphodiester) linker, which is incorporated as a phosphodiester linkage and can be incorporated anywhere between two nucleotides in the RNA.
- the modification in the RNA is selected from the group consisting of incorporation of 2'-O-methyl-3'-phosphorothioate (MS), 2'-O-methyl-3'-phosphonoacetate (MP), 2'-O-methyl-3'-thiophosphonoacetate (MSP), 2'-halo-3'-phosphorothioate (e.g., 2'-fluoro-3'-phosphorothioate), 2'-halo-3'-phosphonoacetate (e.g., 2'-fluoro-3'-phosphonoacetate), and 2'-halo-3'-thiophosphonoacetate (e.g., 2'-fluoro-3'-thiophosphonoacetate).
- MS 2'-O-methyl-3'-phosphorothioate
- MP 2'-O-methyl-3'-phosphonoacetate
- MSP 2'-O-methyl-3'-thiophosphonoacetate
- modifications can include 2'-O-methyl (M), a phosphorothioate (S), a phosphonoacetate (P), a thiophosphonoacetate (SP), a 2'-O-methyl-3'-phosphorothioate (MS), a 2'-O-methyl-3'-phosphonoacetate (MP), a 2'-O-methyl-3'-thiophosphonoacetate (MSP), a 2'-deoxy-3'-phosphonoacetate (DP), a 2'-deoxy-3'-thiophosphonoacetate (DSP), or a combination thereof, at or near either the 3’ or 5’ end of either the targeter or modulator nucleic acid, as appropriate for single or dual gNA.
- M 2'-O-methyl
- S phosphorothioate
- P phosphonoacetate
- SP thiophosphonoacetate
- MS 2'-O-methyl-3'-phosphorothioate
- modifications can include either a 5’ or a 3’ propanediol or C3 linker modification.
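The abbreviations listed above can be collected into a small lookup table. The sketch below is illustrative only: the dictionary contents are taken from the abbreviations in the text, while the helper name `expand_modification` is a hypothetical choice and not part of the disclosure.

```python
# Abbreviation -> full name, as given in the text above.
MODIFICATIONS = {
    "M": "2'-O-methyl",
    "S": "phosphorothioate",
    "P": "phosphonoacetate",
    "SP": "thiophosphonoacetate",
    "MS": "2'-O-methyl-3'-phosphorothioate",
    "MP": "2'-O-methyl-3'-phosphonoacetate",
    "MSP": "2'-O-methyl-3'-thiophosphonoacetate",
    "DP": "2'-deoxy-3'-phosphonoacetate",
    "DSP": "2'-deoxy-3'-thiophosphonoacetate",
}


def expand_modification(code: str) -> str:
    """Return the full chemical name for a modification abbreviation.

    Hypothetical helper: lookup is case-insensitive and raises KeyError
    for abbreviations that are not listed in the text.
    """
    key = code.strip().upper()
    if key not in MODIFICATIONS:
        raise KeyError(f"unknown modification abbreviation: {code!r}")
    return MODIFICATIONS[key]


if __name__ == "__main__":
    for code in ("MS", "MSP", "DSP"):
        print(code, "=", expand_modification(code))
```

A table of this kind makes it easy to check that an annotated guide sequence uses only recognized abbreviations before ordering a synthesis.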
- the modification alters the stability of the RNA.
- the modification enhances the stability of the RNA, e.g., by increasing nuclease resistance of the RNA relative to a corresponding RNA without the modification.
- Stability-enhancing modifications include but are not limited to incorporation of 2'-O-methyl, a 2'-O-C1-4 alkyl, 2'-halo (e.g., 2'-F, 2'-Br, 2'-Cl, or 2'-I), 2' MOE, a 2'-O-C1-3 alkyl-O-C1-3 alkyl, 2'-NH2, 2'-H (or 2'-deoxy), 2'-arabino, 2'-F-arabino, 4'-thioribosyl sugar moiety, 3'-phosphorothioate, 3'-phosphonoacetate, 3'-thiophosphonoacetate, 3'-methylphosphonate, 3'-boranophosphate, 3'-phosphorodithioate, locked nucleic acid (“LNA”) nucleotide which comprises a methylene bridge between the 2' and 4' carbons of the ribose ring, and unlocked nucleic acid
- modifications are suitable for use as a protecting group to prevent or reduce degradation of the 5’ sequence, e.g., a tail sequence, modulator stem sequence (dual guide nucleic acids), targeter stem sequence (dual guide nucleic acids), and/or spacer sequence (see, the “Targeter and Modulator nucleic acids” subsection).
- the modification alters the specificity of the engineered, non-naturally occurring system.
- the modification enhances the specificity of the engineered, non-naturally occurring system, e.g., by enhancing on-target binding and/or cleavage, or reducing off-target binding and/or cleavage, or a combination thereof.
- Specificity-enhancing modifications include but are not limited to 2-thiouracil, 2-thiocytosine, 4-thiouracil, 6-thioguanine, 2-aminoadenine, and pseudouracil. A nucleotide within 10, 5, 4, 3, 2, or 1 nucleotide(s) of the 3’ end, for example the 3’ end nucleotide itself, may be modified.
- the modification alters the immunostimulatory effect of the RNA relative to a corresponding RNA without the modification. For example, in certain embodiments, the modification reduces the ability of the RNA to activate TLR7, TLR8, TLR9, TLR3, RIG-I, and/or MDA5.
- the targeter nucleic acid and/or the modulator nucleic acid comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 modified nucleotides or internucleotide linkages.
- the modification can be made at one or more positions in the targeter nucleic acid and/or the modulator nucleic acid such that these nucleic acids retain functionality.
- the modified nucleic acids can still direct the Cas protein to the target nucleotide sequence and allow the Cas protein to exert its effector function.
- the particular modification(s) at a position may be selected based on the functionality of the nucleotide or internucleotide linkage at the position.
- a specificity-enhancing modification may be suitable for a nucleotide or internucleotide linkage in the spacer sequence, the targeter stem sequence, or the modulator stem sequence.
- a stability-enhancing modification may be suitable for one or more terminal nucleotides or internucleotide linkages in the targeter nucleic acid and/or the modulator nucleic acid.
- At least 1 (e.g., at least 2, at least 3, at least 4, or at least 5) terminal nucleotides or internucleotide linkages at or near the 5’ end and/or at least 1 (e.g., at least 2, at least 3, at least 4, or at least 5) terminal nucleotides or internucleotide linkages at or near the 3’ end of the targeter nucleic acid are modified.
- At least 1 (e.g., at least 2, at least 3, at least 4, or at least 5) terminal nucleotides or internucleotide linkages at or near the 5’ end and/or at least 1 (e.g., at least 2, at least 3, at least 4, or at least 5) terminal nucleotides or internucleotide linkages at or near the 3’ end of the modulator nucleic acid are modified.
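To make the terminal-modification pattern concrete, the following minimal sketch computes which positions of a guide fall inside a 5’ and a 3’ terminal window. It assumes 1-based numbering from the 5’ end and treats "at or near the end" as "within the first or last n nucleotides"; both conventions, and the function name `terminal_positions`, are assumptions made for illustration.

```python
def terminal_positions(length: int, n_five_prime: int, n_three_prime: int) -> list[int]:
    """Return 1-based positions (counted 5'->3') inside the terminal windows.

    length         -- number of nucleotides in the targeter or modulator
    n_five_prime   -- how many nucleotides at the 5' end to mark as modified
    n_three_prime  -- how many nucleotides at the 3' end to mark as modified
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if n_five_prime < 0 or n_three_prime < 0:
        raise ValueError("window sizes must be non-negative")
    # Windows are clipped to the sequence, so they may overlap on short guides.
    five = set(range(1, min(n_five_prime, length) + 1))
    three = set(range(max(1, length - n_three_prime + 1), length + 1))
    return sorted(five | three)


if __name__ == "__main__":
    # A 20-nt sequence with three modified nucleotides at each end.
    print(terminal_positions(20, 3, 3))
```

For a 20-nucleotide sequence with three modified nucleotides at each end, the function returns positions 1 to 3 and 18 to 20.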
- when the targeter or modulator nucleic acid is a combination of DNA and RNA, the nucleic acid as a whole is considered as an RNA, and the DNA nucleotide(s) are considered as modification(s) of the RNA, including a 2'-H modification of the ribose and optionally a modification of the nucleobase.
- the targeter nucleic acid and the modulator nucleic acid can be covalently conjugated to each other through one or more chemical modifications introduced into these nucleic acids, thereby increasing the stability of the double-stranded complex and/or improving other characteristics of the system.
- Compositions and methods for targeting, editing, and/or modifying genomic DNA
- An engineered, non-naturally occurring system, such as disclosed herein, can be useful for targeting, editing, and/or modifying a target nucleic acid, such as a DNA (e.g., genomic DNA) in a cell or organism.
- the present invention provides a method of cleaving a target nucleic acid (e.g., DNA) comprising the sequence of a preselected target sequence or a portion thereof, the method comprising contacting the target DNA with an engineered, non-naturally occurring system disclosed herein, thereby resulting in cleavage of the target DNA.
- the present invention provides a method of binding a target nucleic acid (e.g., DNA) comprising the sequence of a preselected target sequence or a portion thereof, the method comprising contacting the target DNA with an engineered, non-naturally occurring system disclosed herein, thereby resulting in binding of the system to the target DNA.
- This method can be useful, e.g., for detecting the presence and/or location of a preselected target gene, for example, if a component of the system (e.g., the Cas protein) comprises a detectable marker.
- a detectable marker e.g., a detectable marker associated with the target DNA.
- methods of modifying a target nucleic acid e.g., DNA
- a structure e.g., protein associated with the target DNA
- the method comprising contacting the target DNA with an engineered, non-naturally occurring system disclosed herein, wherein the Cas protein comprises an effector domain or is associated with an effector protein, thereby resulting in modification of the target DNA or the structure associated with the target DNA.
- a method comprises contacting the target nucleic acid with a CRISPR-Cas complex comprising a targeter nucleic acid, a modulator nucleic acid, and a Cas protein disclosed herein.
- the Cas protein is a type V-A, type V-C, or type V-D Cas protein (e.g., Cas nuclease).
- the Cas protein is a type V-A Cas protein (e.g., Cas nuclease).
- a method of editing a human genomic sequence at one of a group of preselected target gene loci comprising delivering an engineered, non-naturally occurring system disclosed herein into a human cell, thereby resulting in editing of the genomic sequence at the target gene locus in the human cell.
- a method of detecting a human genomic sequence at one of a group of preselected target gene loci comprising delivering the engineered, non-naturally occurring system disclosed herein into a human cell, wherein a component of the system (e.g., the Cas protein) comprises a detectable marker, thereby detecting the target gene locus in the human cell.
- a method of modifying a human chromosome at one of a group of preselected target gene loci comprising delivering the engineered, non-naturally occurring system disclosed herein into a human cell, wherein the Cas protein comprises an effector domain or is associated with an effector protein, thereby resulting in modification of the chromosome at the target gene locus in the human cell.
- the CRISPR-Cas complex may be delivered to a cell by introducing a pre-formed ribonucleoprotein (RNP) complex into the cell. Alternatively, one or more components of the CRISPR-Cas complex may be expressed in the cell.
- RNP ribonucleoprotein
- Exemplary methods of delivery are known in the art and described in, for example, U.S. Patent Nos. 8,697,359, 10,113,167, 10,570,418, 10,829,787, 11,118,194, and 11,125,739 and U.S. Patent Application Publication Nos. 2015/0344912, 2018/0119140, and 2018/0282763.
- contacting a DNA (e.g., genomic DNA) in a cell with a CRISPR-Cas complex does not require delivery of all components of the complex into the cell.
- one or more of the components may be pre-existing in the cell.
- the cell (or a parental/ancestral cell thereof) has been engineered to express the Cas protein, and the single guide nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the single guide nucleic acid), the targeter nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the targeter nucleic acid), and/or the modulator nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the modulator nucleic acid) are delivered into the cell.
- the cell (or a parental/ancestral cell thereof) has been engineered to express the modulator nucleic acid, and the Cas protein (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the Cas protein) and the targeter nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the targeter nucleic acid) are delivered into the cell.
- the cell (or a parental/ancestral cell thereof) has been engineered to express the Cas protein and the modulator nucleic acid, and the targeter nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the targeter nucleic acid) is delivered into the cell.
- the target DNA is in the genome of a target cell.
- the present invention also provides a cell comprising the non-naturally occurring system or a CRISPR expression system described herein.
- the present invention provides a cell whose genome has been modified by the CRISPR-Cas system or complex disclosed herein.
- the target cells can be mitotic or post-mitotic cells from any organism, such as a bacterial cell (e.g., E. coli), an archaeal cell, a cell of a single-cell eukaryotic organism, a plant cell, an algal cell (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, or the like), a fungal cell (e.g., a yeast cell, such as S. cerevisiae), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), or a cell from a mammal (e.g., a cell from a rodent or a cell from a human).
- target cells include but are not limited to a stem cell (e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a germ cell), a somatic cell (e.g., a fibroblast, a hematopoietic cell, a T lymphocyte (e.g., CD8+ T lymphocyte), an NK cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell), and an in vitro or in vivo embryonic cell of an embryo at any stage (e.g., a 1-cell, 2-cell, 4-cell, or 8-cell stage zebrafish embryo).
- Cells may be from established cell lines or may be primary cells (i.e., cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages of the culture).
- primary cultures are cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage.
- the primary cell lines are maintained for fewer than 10 passages in vitro. If the cells are primary cells, they may be harvested from an individual by any suitable method.
- leukocytes may be harvested by apheresis, leukocytapheresis, or density gradient separation, while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, or stomach can be harvested by biopsy.
- the harvested cells may be used immediately, or may be stored under frozen conditions with a cryopreservative and thawed at a later time in a manner as commonly known in the art.
- Ribonucleoprotein (RNP) delivery and “Cas RNA” delivery
- An engineered, non-naturally occurring system disclosed herein can be delivered into a cell by suitable methods known in the art, including but not limited to ribonucleoprotein (RNP) delivery and “Cas RNA” delivery described below.
- a CRISPR-Cas system including a single guide nucleic acid and a Cas protein, or a CRISPR-Cas system including a targeter nucleic acid, a modulator nucleic acid, and a Cas protein can be combined into a RNP complex and then delivered into the cell as a pre-formed complex.
- a “ribonucleoprotein” or “RNP,” as used herein, can refer to a complex comprising a nucleoprotein and a ribonucleic acid.
- nucleoprotein as provided herein can refer to a protein capable of binding a nucleic acid (e.g., RNA, DNA). Where the nucleoprotein binds a ribonucleic acid it can be referred to as “ribonucleoprotein.”
- the interaction between the ribonucleoprotein and the ribonucleic acid may be direct, e.g., by covalent bond, or indirect, e.g., by non-covalent bond (e.g. electrostatic interactions (e.g. ionic bond, hydrogen bond, halogen bond), van der Waals interactions (e.g.
- the ribonucleoprotein includes an RNA-binding motif non-covalently bound to the ribonucleic acid.
- positively charged amino acid residues (e.g., lysine residues) in the RNA-binding motif may form electrostatic interactions with the negatively charged phosphate backbone of the RNA.
- the single guide nucleic acid, or the combination of the targeter nucleic acid and the modulator nucleic acid can be provided in excess molar amount (e.g., at least 2 fold, at least 3 fold, at least 4 fold, or at least 5 fold) relative to the Cas protein.
- the targeter nucleic acid and the modulator nucleic acid are annealed under suitable conditions prior to complexing with the Cas protein.
- the targeter nucleic acid, the modulator nucleic acid, and the Cas protein are directly mixed together to form an RNP.
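The molar-excess guidance above reduces to simple arithmetic. The sketch below shows one way to compute the guide amount for a chosen fold excess over the Cas protein and to convert picomoles to micrograms. The function names and the example molecular weight are hypothetical values chosen for illustration, not figures from the disclosure.

```python
def guide_pmol(cas_pmol: float, fold_excess: float) -> float:
    """Picomoles of guide needed for a given molar excess over the Cas protein."""
    if cas_pmol <= 0:
        raise ValueError("cas_pmol must be positive")
    if fold_excess < 1:
        raise ValueError("fold_excess below 1 is not an excess")
    return cas_pmol * fold_excess


def pmol_to_ug(pmol: float, mol_weight_g_per_mol: float) -> float:
    """Convert picomoles to micrograms: pmol * (g/mol) * 1e-6."""
    return pmol * mol_weight_g_per_mol * 1e-6


if __name__ == "__main__":
    cas = 100.0  # pmol of Cas protein (illustrative)
    guide = guide_pmol(cas, fold_excess=2)  # "at least 2 fold" from the text
    # 14,000 g/mol is a placeholder molecular weight for a short guide RNA.
    print(f"{guide:.0f} pmol guide, about {pmol_to_ug(guide, 14_000):.1f} ug")
```

The same helper applies to the dual-guide case if the targeter and modulator are each supplied at the chosen excess.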
- a variety of delivery methods can be used to introduce an RNP disclosed herein into a cell.
- Exemplary delivery methods or vehicles include but are not limited to microinjection, liposomes (see, e.g., U.S. Patent No. 10,829,787) such as molecular Trojan horse liposomes that deliver molecules across the blood brain barrier (see, Pardridge et al. (2010) Cold Spring Harb. Protoc., doi:10.1101/pdb.prot5407), immunoliposomes, virosomes, microvesicles (e.g., exosomes and ARMMs), polycations, lipid:nucleic acid conjugates, electroporation, cell permeable peptides (see, U.S. Patent No. 11,118,194), nanoparticles, nanowires (see, Shalek et al. (2012) Nano Letters, 12: 6498), exosomes, and perturbation of cell membrane (e.g., by passing cells through a constriction in a microfluidic system, see, U.S. Patent No. 11,125,739).
- the efficiency of RNP delivery can be enhanced by cell cycle synchronization (see, U.S. Patent No. 10,570,418).
- an RNP is delivered into a cell by electroporation.
- a CRISPR-Cas system is delivered into a cell in a “Cas RNA” approach, i.e., delivering (a) a single guide nucleic acid, or a combination of a targeter nucleic acid and a modulator nucleic acid, and (b) an RNA (e.g., messenger RNA (mRNA)) encoding a Cas protein.
- the RNA encoding the Cas protein can be translated in the cell and form a complex with the single guide nucleic acid or combination of the targeter nucleic acid and the modulator nucleic acid intracellularly.
- As in the RNP approach, the delivered RNAs have limited half-lives in cells, even though stability-increasing modification(s) can be made in one or more of the RNAs. Accordingly, the “Cas RNA” approach is suitable for active modification of the genetic or epigenetic information in a cell during a limited time period, such as DNA cleavage, and has the advantage of reducing off-targeting.
- the mRNA can be produced by transcription of a DNA comprising a regulatory element operably linked to a Cas coding sequence.
- the single guide nucleic acid, or the targeter nucleic acid and the modulator nucleic acid are generally provided in excess molar amount (e.g., at least 5 fold, at least 10 fold, at least 20 fold, at least 30 fold, at least 50 fold, or at least 100 fold) relative to the mRNA.
- the targeter nucleic acid and the modulator nucleic acid are annealed under suitable conditions prior to delivery into the cells.
- the targeter nucleic acid and the modulator nucleic acid are delivered into the cells without annealing in vitro.
- a variety of delivery systems can be used to introduce a “Cas RNA” system into a cell.
- Non-limiting examples of delivery methods or vehicles include microinjection, biolistic particles, liposomes (see, e.g., U.S. Patent No. 10,829,787) such as molecular Trojan horse liposomes that deliver molecules across the blood brain barrier (see, Pardridge et al. (2010) Cold Spring Harb. Protoc., doi:10.1101/pdb.prot5407), immunoliposomes, virosomes, polycations, lipid:nucleic acid conjugates, electroporation, nanoparticles, nanowires (see, Shalek et al.
- the CRISPR-Cas system is delivered into a cell in the form of (a) a single guide nucleic acid or a combination of a targeter nucleic acid and a modulator nucleic acid, and (b) a DNA comprising a regulatory element operably linked to a Cas coding sequence.
- the DNA can be provided in a plasmid, viral vector, or any other form described in the “CRISPR Expression Systems” subsection.
- Such a delivery method may result in constitutive expression of the Cas protein in the target cell (e.g., if the DNA is maintained in the cell in an episomal vector or is integrated into the genome), and may increase the risk of off-targeting, which is undesirable when the Cas protein has nuclease activity.
- this approach is useful when the Cas protein comprises a non-nuclease effector (e.g., a transcriptional activator or repressor). It is also useful for research purposes and for genome editing of plants.
- nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding a guide nucleic acid disclosed herein.
- the nucleic acid comprises a regulatory element operably linked to a nucleotide sequence encoding a single guide nucleic acid; this nucleic acid alone can constitute a CRISPR expression system.
- the nucleic acid comprises a regulatory element operably linked to a nucleotide sequence encoding a targeter nucleic acid.
- the nucleic acid further comprises a nucleotide sequence encoding a modulator nucleic acid, wherein the nucleotide sequence encoding the modulator nucleic acid is operably linked to the same regulatory element as the nucleotide sequence encoding the targeter nucleic acid or a different regulatory element; this nucleic acid alone can constitute a CRISPR expression system.
- the present invention provides a CRISPR expression system comprising: (a) a nucleic acid comprising a first regulatory element operably linked to a nucleotide sequence encoding a targeter nucleic acid and (b) a nucleic acid comprising a second regulatory element operably linked to a nucleotide sequence encoding a modulator nucleic acid.
- a CRISPR expression system further comprises a nucleic acid comprising a third regulatory element operably linked to a nucleotide sequence encoding a Cas protein, such as a Cas protein disclosed herein.
- the Cas protein is a type V-A, type V-C, or type V-D Cas protein (e.g., Cas nuclease). In certain embodiments, the Cas protein is a type V-A Cas protein (e.g., Cas nuclease).
- the term “operably linked” can mean that the nucleotide sequence of interest is linked to the regulatory element in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
- the nucleic acids of a CRISPR expression system described above may be independently selected from various nucleic acids such as DNA (e.g., modified DNA) and RNA (e.g., modified RNA).
- the nucleic acids comprising a regulatory element operably linked to one or more nucleotide sequences encoding the guide nucleic acids are in the form of DNA.
- the nucleic acid comprising a third regulatory element operably linked to a nucleotide sequence encoding the Cas protein is in the form of DNA.
- the third regulatory element can be a constitutive or inducible promoter that drives the expression of the Cas protein.
- nucleic acid comprising a third regulatory element operably linked to a nucleotide sequence encoding the Cas protein is in the form of RNA (e.g., mRNA).
- Nucleic acids of a CRISPR expression system can be provided in one or more vectors.
- the term “vector,” as used herein, can refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked.
- Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in cells, such as prokaryotic cells, eukaryotic cells, mammalian cells, or target tissues.
- Non-viral vector delivery systems include DNA plasmids, RNA (e.g.
- Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
- At least one of the vectors is a DNA plasmid.
- at least one of the vectors is a viral vector (e.g., retrovirus, adenovirus, or adeno-associated virus).
- Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors and replication defective viral vectors) do not autonomously replicate in the host cell. Certain vectors, however, may be integrated into the genome of the host cell and thereby are replicated along with the host genome. A skilled person in the art will appreciate that different vectors may be suitable for different delivery methods and have different host tropism, and will be able to select one or more vectors suitable for the use.
- regulatory element can refer to a transcriptional and/or translational control sequence, such as a promoter, enhancer, transcription termination signal (e.g., polyadenylation signal), internal ribosomal entry sites (IRES), protein degradation signal, or the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., a targeter nucleic acid or a modulator nucleic acid) or a coding sequence (e.g., a Cas protein) and/or regulate translation of an encoded polypeptide.
- Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences).
- tissue-specific regulatory sequences may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g., liver, pancreas), or particular cell types (e.g., lymphocytes).
- a vector comprises one or more pol III promoters (e.g., 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g., 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g., 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof.
- pol III promoters include, but are not limited to, U6 and H1 promoters.
- pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer), the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerol kinase (PGK) promoter, and the EF1α promoter.
- RSV Rous sarcoma virus
- CMV cytomegalovirus
- PGK phosphoglycerol kinase
- Also encompassed by the term “regulatory element” are enhancer elements, such as WPRE; CMV enhancers; the R-U5' segment in L
- a vector can be introduced into host cells to produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., CRISPR transcripts, proteins, enzymes, mutant forms thereof, or fusion proteins thereof).
- the nucleotide sequence encoding the Cas protein is codon optimized for expression in a prokaryotic cell (e.g., E. coli), a eukaryotic host cell, e.g., a yeast cell (e.g., S. cerevisiae), a mammalian cell (e.g., a mouse cell, a rat cell, or a human cell), or a plant cell.
- Various species exhibit particular bias for certain codons of a particular amino acid.
- Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules.
- mRNA messenger RNA
- tRNA transfer RNA
- the predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at kazusa.or.jp/codon/ and these tables can be adapted in a number of ways (see, Nakamura et al.
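As a rough illustration of codon optimization, the sketch below back-translates a protein sequence by choosing, for each amino acid, the codon with the highest value in a supplied usage table. This "most frequent codon" rule is only one simple strategy and is not stated to be the method used in the disclosure. The usage numbers in the example are hypothetical placeholders; real values would come from a resource such as the Codon Usage Database mentioned above.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def preferred_codons(usage: dict[str, float]) -> dict[str, str]:
    """Map each amino acid to its highest-usage codon.

    Codons absent from `usage` count as 0; ties go to the
    alphabetically first codon.
    """
    best: dict[str, str] = {}
    for codon in sorted(CODON_TABLE):
        aa = CODON_TABLE[codon]
        if aa not in best or usage.get(codon, 0.0) > usage.get(best[aa], 0.0):
            best[aa] = codon
    return best


def back_translate(protein: str, usage: dict[str, float]) -> str:
    """Return a coding sequence that uses the preferred codon per residue."""
    table = preferred_codons(usage)
    return "".join(table[aa] for aa in protein.upper())


if __name__ == "__main__":
    # Hypothetical per-thousand frequencies, for illustration only.
    example_usage = {"CTG": 40.0, "CTC": 19.0, "AAG": 32.0, "AAA": 24.0}
    print(back_translate("MKL", example_usage))
```

Practical codon optimization usually also weighs GC content, repeats, and unwanted motifs, none of which this sketch attempts.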
- C. Donor templates
- Cleavage of a target nucleotide sequence in the genome of a cell by a CRISPR-Cas system or complex can activate DNA damage pathways, which may rejoin the cleaved DNA fragments by non-homologous end joining (NHEJ) or homology-directed repair (HDR). HDR requires a repair template, either endogenous or exogenous, to transfer the sequence information from the repair template to the target.
- an engineered, non-naturally occurring system or CRISPR expression system further comprises a donor template.
- the term “donor template” can refer to a nucleic acid designed to serve as a repair template at or near the target nucleotide sequence upon introduction into a cell or organism.
- the donor template is complementary to a polynucleotide comprising the target nucleotide sequence or a portion thereof.
- a donor template may overlap with one or more nucleotides of a target nucleotide sequence (e.g., about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, or more nucleotides).
- the nucleotide sequence of the donor template is typically not identical to the genomic sequence that it replaces. Rather, the donor template may contain one or more substitutions, insertions, deletions, inversions or rearrangements with respect to the genomic sequence, so long as sufficient homology is present to support homology-directed repair.
- the donor template comprises a non-homologous sequence flanked by two regions of homology (i.e., homology arms), such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region.
- the donor template comprises a non-homologous sequence 10-100 nucleotides, 50-500 nucleotides, 100-1,000 nucleotides, 200-2,000 nucleotides, or 500-5,000 nucleotides in length positioned between two homology arms.
- the homologous region(s) of a donor template has at least 50% sequence identity to a genomic sequence with which recombination is desired.
- the homology arms are designed or selected such that they are capable of recombining with the nucleotide sequences flanking the target nucleotide sequence under intracellular conditions.
- the donor template comprises a first homology arm homologous to a sequence 5’ to the target nucleotide sequence and a second homology arm homologous to a sequence 3’ to the target nucleotide sequence.
- the first homology arm is at least 50% (e.g., at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identical to a sequence 5’ to the target nucleotide sequence.
- the second homology arm is at least 50% (e.g., at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identical to a sequence 3’ to the target nucleotide sequence.
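The donor layout and the identity thresholds described above can be sketched as follows. The example assembles a donor from a 5’ homology arm, a non-homologous insert, and a 3’ homology arm, and scores each arm against its genomic flank. Identity here is a simple ungapped, position-by-position comparison of equal-length sequences, which is a simplifying assumption; alignment-based identity would be needed for arms containing insertions or deletions. All names and sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def build_donor(left_arm: str, insert: str, right_arm: str) -> str:
    """Concatenate 5' homology arm, non-homologous insert, and 3' homology arm."""
    return left_arm + insert + right_arm


def arms_meet_threshold(left_arm: str, left_flank: str,
                        right_arm: str, right_flank: str,
                        min_identity: float = 50.0) -> bool:
    """Check both arms against the 'at least 50%' floor given in the text."""
    return (percent_identity(left_arm, left_flank) >= min_identity
            and percent_identity(right_arm, right_flank) >= min_identity)


if __name__ == "__main__":
    left_flank, right_flank = "ACGTACGTAC", "GGATCCGGAT"   # toy genomic flanks
    left_arm, right_arm = "ACGTACGTAT", "GGATCCGGAT"       # toy homology arms
    donor = build_donor(left_arm, "TTAATTAA", right_arm)
    print(donor, percent_identity(left_arm, left_flank))
    print(arms_meet_threshold(left_arm, left_flank, right_arm, right_flank))
```

Raising `min_identity` toward the higher values listed in the text (for example 90 or 95) gives a stricter check on the same inputs.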
- the nearest nucleotide of the donor template is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 2000, 3000, 4000, or more nucleotides from the target nucleotide sequence.
- the donor template further comprises an engineered sequence not homologous to the sequence to be repaired. Such engineered sequence can harbor a barcode and/or a sequence capable of hybridizing with a donor template-recruiting sequence disclosed herein.
- the donor template further comprises one or more mutations relative to the genomic sequence, wherein the one or more mutations reduce or prevent cleavage, by the same CRISPR-Cas system, of the donor template or of a modified genomic sequence with at least a portion of the donor template sequence incorporated.
- the PAM adjacent to the target nucleotide sequence and recognized by the Cas nuclease is mutated to a sequence not recognized by the same Cas nuclease.
- the target nucleotide sequence e.g., the seed region
- the one or more mutations are silent with respect to the reading frame of a protein-coding sequence encompassing the mutated sites.
- the donor template can be provided to the cell as single-stranded DNA, single-stranded RNA, double-stranded DNA, or double-stranded RNA. It is understood that a CRISPR-Cas system, such as a system disclosed herein, may possess nuclease activity to cleave the target strand, the non-target strand, or both. When HDR of the target strand is desired, a donor template having a nucleic acid sequence complementary to the target strand is also contemplated.
- [0146] The donor template can be introduced into a cell in linear or circular form.
- the ends of the donor template may be protected (e.g., from exonucleolytic degradation) by methods known to those of skill in the art.
- one or more dideoxynucleotide residues are added to the 3’ terminus of a linear molecule and/or self-complementary oligonucleotides are ligated to one or both ends (see, for example, Chang et al. (1987) PROC. NATL. ACAD. SCI. USA, 84: 4959; Nehls et al. (1996) SCIENCE, 272: 886; see also the chemical modifications for increasing stability and/or specificity of RNA disclosed supra).
- Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues.
- additional lengths of sequence may be included outside of the regions of homology that can be degraded without impacting recombination.
- a donor template can be a component of a vector as described herein, contained in a separate vector, or provided as a separate polynucleotide, such as an oligonucleotide, linear polynucleotide, or synthetic polynucleotide.
- the donor template is a DNA.
- a donor template is in the same nucleic acid as a sequence encoding the single guide nucleic acid, a sequence encoding the targeter nucleic acid, a sequence encoding the modulator nucleic acid, and/or a sequence encoding the Cas protein, where applicable.
- a donor template is provided in a separate nucleic acid.
- a donor template polynucleotide may be of any suitable length, such as about or at least about 50, 75, 100, 150, 200, 500, 1000, 2000, 3000, 4000, or more nucleotides in length.
- a donor template can be introduced into a cell as an isolated nucleic acid.
- a donor template can be introduced into a cell as part of a vector (e.g., a plasmid) having additional sequences such as, for example, replication origins, promoters and genes encoding antibiotic resistance, that are not intended for insertion into the DNA region of interest.
- a donor template can be delivered by viruses (e.g., adenovirus, adeno-associated virus (AAV)).
- the donor template is introduced as an AAV, e.g., a pseudotyped AAV.
- the capsid proteins of the AAV can be selected by a person skilled in the art based upon the tropism of the AAV and the target cell type.
- the donor template is introduced into a hepatocyte as AAV8 or AAV9.
- the donor template is introduced into a hematopoietic stem cell, a hematopoietic progenitor cell, or a T lymphocyte (e.g., CD8+ T lymphocyte) as AAV6 or an AAVHSC (see, U.S. Patent No. 9,890,396).
- sequence of a capsid protein may be modified from a wild-type AAV capsid protein, for example, having at least 50% (e.g., at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) sequence identity to a wild-type AAV capsid sequence.
- the donor template can be delivered to a cell (e.g., a primary cell) by various delivery methods, such as a viral or non-viral method disclosed herein.
- a non-viral donor template is introduced into the target cell as a naked nucleic acid or in complex with a liposome or poloxamer.
- a non-viral donor template is introduced into the target cell by electroporation.
- a viral donor template is introduced into the target cell by infection.
- the engineered, non-naturally occurring system can be delivered before, after, or simultaneously with the donor template (see, International (PCT) Application Publication No. WO 2017/053729). A skilled person in the art will be able to choose proper timing based upon the form of delivery (consider, for example, the time needed for transcription and translation of RNA and protein components) and the half-life of the molecule(s) in the cell.
- the donor template e.g., as an AAV
- the donor template is introduced into the cell within 4 hours (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 90, 120, 150, 180, 210, or 240 minutes) after the introduction of the engineered, non-naturally occurring system.
- the donor template is conjugated covalently to a modulator nucleic acid. Covalent linkages suitable for this conjugation are known in the art and are described, for example, in U.S.
- the donor template is covalently linked to a modulator nucleic acid (e.g., the 5’ end of the modulator nucleic acid) through an internucleotide bond.
- the donor template is covalently linked to a modulator nucleic acid (e.g., the 5’ end of the modulator nucleic acid) through a linker.
- the donor template can comprise any nucleic acid chemistry.
- the donor template can comprise DNA and/or RNA nucleotides.
- the donor template can comprise single-stranded DNA, linear single-stranded RNA, linear double-stranded DNA, linear double-stranded RNA, circular single-stranded DNA, circular single-stranded RNA, circular double-stranded DNA, or circular double-stranded RNA.
- the donor template comprises a mutation in a PAM sequence to partially or completely abolish binding of the RNP to the DNA.
- the donor template is present at a concentration of at least 0.05, 0.01, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.25, 1.5, 1.75, 2, 3, or 4, and/or no more than 0.01, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.25, 1.5, 1.75, 2, 3, 4, or 5 μg μL⁻¹, for example 0.01-5 μg μL⁻¹.
- the donor template comprises one or more promoters.
- the donor template comprises a promoter that shares at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity with any one of SEQ ID NOs: 78-85 of Table 4.
TABLE 4: Promoter sequences
- An engineered, non-naturally occurring system can be evaluated in terms of efficiency and/or specificity in nucleic acid targeting, cleavage, or modification.
- an engineered, non-naturally occurring system has high efficiency. For example, in certain embodiments, at least 1, 1.5, 2, 2.5, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99, or 100% of a population of nucleic acids having the target nucleotide sequence and a cognate PAM, when contacted with the engineered, non-naturally occurring system, is targeted, cleaved, or modified.
- the genomes of at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99, or 100% of a population of cells, when the engineered, non-naturally occurring system is delivered into the cells, are targeted, cleaved, or modified.
- the frequency of off-target events e.g., targeting, cleavage, or modification, depending on the function of the CRISPR-Cas system
- methods for detecting off-target events were summarized in Lazzarotto et al. (2018) Nat. Protoc. 13(11): 2615-42, and include discovery of in situ Cas off-targets and verification by sequencing (DISCOVER-seq) as disclosed in Wienert et al.
- the off-target events include targeting, cleavage, or modification at a given off-target locus (e.g., the locus with the highest occurrence of off-target events detected).
- the off-target events include targeting, cleavage, or modification at all the loci with detectable off-target events, collectively.
- genomic mutations are detected in no more than 0.0001%, 0.0002%, 0.0003%, 0.0004%, 0.0005%, 0.0006%, 0.0007%, 0.0008%, 0.0009%, 0.001%, 0.002%, 0.003%, 0.004%, 0.005%, 0.006%, 0.007%, 0.008%, 0.009%, 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1%, 2%, 3%, 4%, or 5% of the cells at any off-target loci (in aggregate).
- the ratio of the percentage of cells having an on-target event to the percentage of cells having any off-target event is at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10000. It is understood that genetic variation may be present in a population of cells, for example, by spontaneous mutations, and such mutations are not included as off-target events.
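The on-target to off-target ratio described above reduces to simple arithmetic, sketched below. The function names and the example percentages are hypothetical. Subtracting the frequency seen in an untreated control is one assumed way of excluding spontaneous mutations; the disclosure does not prescribe a specific correction.

```python
def background_corrected(pct_treated: float, pct_untreated: float) -> float:
    """Percentage of cells with an event after subtracting the untreated
    control (assumed proxy for spontaneous mutations), floored at 0."""
    return max(pct_treated - pct_untreated, 0.0)


def specificity_ratio(on_target_pct: float, off_target_pct: float) -> float:
    """Ratio of the percentage of cells with an on-target event to the
    percentage of cells with any off-target event."""
    if on_target_pct < 0 or off_target_pct < 0:
        raise ValueError("percentages must be non-negative")
    if off_target_pct == 0:
        return float("inf")  # no detectable off-target events
    return on_target_pct / off_target_pct


# Hypothetical measurements: 80% of cells edited on-target; 0.75% of treated
# cells and 0.25% of untreated cells carry a mutation at off-target loci.
off_pct = background_corrected(0.75, 0.25)
print(off_pct)                           # 0.5
print(specificity_ratio(80.0, off_pct))  # 160.0
```

A ratio of 160 in this made-up example would satisfy the recited thresholds up to "at least 100" but not "at least 200".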
- the method of targeting, editing, and/or modifying a genomic DNA disclosed herein can be conducted in multiplicity.
- a library of targeter nucleic acids can be used to target multiple genomic loci; a library of donor templates can also be used to generate multiple insertions, deletions, and/or substitutions.
- the multiplex assay can be conducted in a screening method wherein each separate cell culture (e.g., in a well of a 96-well plate or a 384-well plate) is exposed to a different guide nucleic acid having a different targeter stem sequence and/or a different donor template.
- the multiplex assay can also be conducted in a selection method wherein a cell culture is exposed to a mixed population of different guide nucleic acids and/or donor templates, and the cells with desired characteristics (e.g., functionality) are enriched or selected by advantageous survival or growth, resistance to a certain agent, expression of a detectable protein (e.g., a fluorescent protein that is detectable by flow cytometry), etc.
- the plurality of guide nucleic acids and/or the plurality of donor templates are designed for saturation editing. For example, in certain embodiments, each nucleotide position in a sequence of interest is systematically modified with each of all four traditional bases, A, T, G and C.
- each sequence in each gene from a pool of genes of interest is modified, for example, according to a CRISPR design algorithm.
- each sequence from a pool of exogenous elements of interest (e.g., protein coding sequences, non-protein coding genes, regulatory elements) is inserted into one or more given loci of the genome.
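Saturation editing, in which each position of a sequence of interest is systematically changed to each of the four bases, can be sketched as an enumeration of single-nucleotide variants. The function name `saturation_variants` and the input sequence are hypothetical, and the sketch skips the reference base at each position because that substitution would reproduce the unmodified sequence.

```python
from typing import List, Tuple

BASES = "ATGC"  # the four traditional bases named in the text


def saturation_variants(seq: str) -> List[Tuple[int, str, str, str]]:
    """Enumerate every single-nucleotide substitution of `seq`.

    Returns tuples of (0-based position, reference base, alternate base,
    variant sequence). The reference base is skipped at each position.
    """
    seq = seq.upper()
    variants = []
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                variants.append((i, ref, alt, seq[:i] + alt + seq[i + 1:]))
    return variants


# Hypothetical 3-nt sequence of interest: 3 positions x 3 alternates = 9 variants.
for pos, ref, alt, variant in saturation_variants("ACG"):
    print(pos, ref, alt, variant)
```

Each variant sequence could then serve as the non-homologous portion of one donor template in a library, under the assumption that one template is designed per variant.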
- the multiplex methods suitable for carrying out a screening or selection method, which is typically conducted for research purposes, may be different from the methods suitable for therapeutic purposes.
- constitutive expression of a Cas nuclease and/or a guide nucleic acid may be desirable.
- the constitutive expression provides a large window during which other elements can be introduced.
- constitutive expression of certain elements can increase the efficiency and reduce the complexity of a screening or selection process.
- Inducible expression of certain elements of the system disclosed herein may also be used for research purposes given similar advantages.
- Expression may be induced by an exogenous agent (e.g., a small molecule) or by an endogenous molecule or complex present in a particular cell type (e.g., at a particular stage of differentiation).
- Methods known in the art, such as those described herein, can be used for constitutively or inducibly expressing one or more elements.
- the specificity of CRISPR nucleases is at least partially dictated by the uniqueness of the spacer (in combination with the spacer sequence’s proximity to a requisite PAM), and its off-target score can be calculated with algorithms, such as crispr.mit.edu (Hsu et al. (2013) Nat. Biotech. 31: 827-832).
- the algorithm for gRNA prediction should be able to make alignments with repeated regions and low-complexity sequences.
- the efficiency of the screening or selection process can also be improved by pre-assembling a plurality of RNP complexes in a multiplex manner.
- the method disclosed herein further comprises a step of identifying a guide nucleic acid, a Cas protein, a donor template, or a combination of two or more of these elements from the screening or selection process.
- a set of barcodes may be used, for example, in the donor template between two homology arms, to facilitate the identification.
- the method further comprises harvesting the population of cells; selectively amplifying a genomic DNA or RNA sample including the target nucleotide sequence(s) and/or the barcodes; and/or sequencing the genomic DNA or RNA sample and/or the barcodes that have been selectively amplified.
- the present invention provides a library comprising a plurality of guide nucleic acids, such as a plurality of guide nucleic acids disclosed herein.
- the present invention provides a library comprising a plurality of nucleic acids each comprising a regulatory element operably linked to a different guide nucleic acid such as a different guide nucleic acid disclosed herein.
- These libraries can be used in combination with one or more Cas proteins or Cas-coding nucleic acids, such as disclosed herein, and/or one or more donor templates, such as disclosed herein, for a screening or selection method.
- Genomic safe harbors are an area of research seeking to modify genes of living organisms to improve our understanding of gene function and to develop methods for genome engineering that treat genetic or acquired diseases, among many others.
- skilled artisans use one or more available tools to introduce changes into the genome at targeted locations to modify the sequence of a target polynucleotide, e.g., a target gene, in desired ways, e.g., modulate gene expression, modulate gene sequences, remove gene sequences, introduce genes, e.g., exogenous DNA, e.g., transgenes, and the like.
- Efficient transgene insertion may be accomplished through non-precise methods including but not limited to viral vectors, such as retroviral vectors, e.g., adeno-associated virus (AAV) and the like, or precise methods including but not limited to guided nucleases, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), homing endonucleases, e.g., restriction endonucleases, or nucleic acid-guided nucleases, e.g., CRISPR-Cas, e.g., Cas9 and Cas12a and engineered versions thereof.
- Exogenous genes, e.g., transgenes, inserted into the genome of a target human cell either randomly, e.g., through retroviral vectors, or in a targeted manner, e.g., through the action of a nucleic acid-guided nuclease, such as Cas, may interact with other genomic elements in unpredictable ways.
- due to the complex transcriptional regulation of genes in mammalian cells through networks of cis and trans regulatory elements, such as proximal and distal enhancers, and multiple transcription factors, attempts to alter the default genomic architecture by integration of exogenous DNA, e.g., transgenes, or synthetic sequences can affect the expression of the transgene itself, leading to attenuation or complete silencing, and/or the expression of both nearby and distant endogenous genes. Such effects can, e.g., compromise the safety checkpoints that healthy cells have, including through dysregulation of expression of key genes, such as oncogenes and tumor suppressor genes, that can alter cellular behavior in dramatic ways, i.e., promoting clonal expansion or malignant transformation of the host.
- a suitable target polynucleotide comprising a target nucleotide sequence in the human genome is desired, wherein the insertion of a transgene leads to sufficient expression of the transgene in a therapeutic cell, e.g., a T cell, e.g., a CAR T cell, or a precursor cell, e.g., a stem cell, such as a hematopoietic stem cell, without malignant transformation or any other disruption that would be harmful to an individual after implantation.
- Expression of exogenous genes, e.g., transgenes, in desired cell types and/or developmental/differentiation stages relies on integration into a suitable target polynucleotide comprising a target nucleotide sequence that results in expression from the candidate locus to a degree sufficient for the intended purpose.
- Expression from a specific genomic site can be affected by many factors including but not limited to cell type and differentiation stage, as one or more components of the target polynucleotide get activated during differentiation while others get silenced, and changes in chromatin architecture.
- compositions and methods for genome engineering comprise compositions.
- Certain embodiments comprise compositions for editing genomes.
- gNAs novel guide nucleic acids
- a target polynucleotide includes a polynucleotide in which a target nucleotide sequence is located.
- a “target nucleotide sequence” includes a sequence to which a guide sequence can bind, e.g., has complementarity to, where binding between a target nucleotide sequence and a guide sequence may allow the activity of a nucleic acid-guided nuclease complex.
- gNAs e.g., gRNAs
- gRNAs that are complementary to a target nucleotide sequence in a target polynucleotide into which insertion of exogenous DNA, e.g., a transgene, does not negatively affect the cell, e.g., does not significantly affect the expression of one or more endogenous genes or result in a malignant transformation of the cell.
- gene expression demonstrated in the human target cell is maintained through differentiation of the human target cell and/or through proliferation in the one or more progeny cells at a level sufficient for the ultimate use of the cells.
- Certain embodiments disclosed herein concern novel nucleic acid-guided nuclease complexes, e.g., RNPs, such as Cas bound to a gNA, that are complementary to a target nucleotide sequence within a target polynucleotide and hydrolyze the phosphodiester backbone (also referred to as cleaving or cutting) in at least one position on at least one strand of the target polynucleotide.
- Certain embodiments disclosed herein concern methods for selecting and using gNAs, e.g., gRNAs, for genome engineering.
- Certain embodiments concern methods for using gNAs that are complementary to a target nucleotide sequence within a target polynucleotide, synthesizing the gNA and nucleic acid-guided nuclease, and/or combining the nucleic acid-guided nuclease with the gNA to form a nucleic acid-guided nuclease complex, e.g., RNP.
- Certain embodiments disclosed herein concern methods.
- Certain embodiments disclosed herein concern methods for engineering genomes.
- nucleic acid-guided nuclease complex e.g., RNP
- a donor template e.g., an exogenous DNA, e.g., a transgene
- the nucleic acid-guided nuclease cleaves the backbone at at least one position in at least one of the strands of the target polynucleotide and the donor template is used to repair the cleaved target polynucleotide, introducing at least a portion of the donor template into the target polynucleotide.
- exogenous DNA or a “transgene” includes any gene, natural or synthetic, which is introduced into the genome of an organism or cell to which it is not endogenous.
- the transgene may or may not retain the ability to be expressed and/or produce RNA or protein in the human target cell.
- the transgene may or may not alter the resulting phenotype of the human target cell.
- Certain embodiments include human target cells, e.g., a eukaryotic cell, e.g., a mammalian cell, such as a human cell, for example a stem cell or an immune cell, generated through a method where the nucleic acid-guided nuclease complex, e.g., RNP, is introduced, e.g., transfected, into a human target cell along with a donor template, e.g., as an exogenous DNA or a transgene, such as a chimeric antigen receptor (CAR), in which the nucleic acid-guided nuclease cleaves at or near a target sequence in a target polynucleotide and the donor template is used to repair the cleaved target polynucleotide, introducing at least a portion of the donor template into the target polynucleotide.
- a “human target cell” includes a cell into which an exogenous product, e.g., a protein, a nucleic acid, or a combination thereof, has been introduced.
- a human target cell may be used to produce a gene product from an exogenous DNA, e.g., a transgene, such as an exogenous protein, e.g., a CAR.
- a human target cell may comprise a target nucleotide sequence within a target polynucleotide wherein a nucleic acid-guided nuclease hybridizes and cleaves at a site of cleavage at one or more positions on one or more strands of the target polynucleotide at or near the target nucleotide sequence.
- a “site of cleavage” includes the location or locations at which a nucleic acid-guided nuclease complex will hydrolyze the phosphodiester backbone of a single-stranded or double-stranded target polynucleotide, after binding at a target nucleotide sequence in the target polynucleotide.
- binding of the nucleic acid-guided nuclease complex to a target nucleotide sequence within the target polynucleotide can result in hydrolysis of one of the strands of the target polynucleotide at or near the target nucleotide sequence, resulting in strand cleavage.
- the nucleic acid-guided nuclease complex can cleave either strand of the target polynucleotide.
- binding of the nucleic acid-guided nuclease complex to a target nucleotide sequence within a target polynucleotide can result in hydrolysis of both strands of the target polynucleotide at or near the target nucleotide sequence, resulting in cleavage of both strands.
- the sites of cleavage can be the same for both strands, resulting in a blunt end, or the sites of cleavage for each strand can be offset resulting in single strand overhangs, e.g., sticky ends.
- mismatches at or near the site of cleavage may or may not affect the cleavage efficiency of the nucleic acid-guided nuclease complex.
- Exemplary characteristics of a target nucleotide sequence that can demonstrate predictable function without potentially harmful alterations in human target cell genomic activity include one or more of (1) >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, (2) >150 kb, for example, >200, such as >250, and in some cases >300 kb away from any miRNA/other functional small RNA, (3) >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, (4) >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any replication origin, (5) >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any ultra-conserved element, (6) demonstrating low transcriptional activity, (7) outside of a copy number variable region, (8) located in open chromatin, and (9) unique
- In certain embodiments, provided herein are compositions. In certain embodiments, provided herein are compositions for engineering a human target cell at suitable target nucleotide sequences within a target polynucleotide of the human target cell.
- [0173] In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least one of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least two of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least three of the exemplary characteristics.
- a suitable target polynucleotide that comprises a target nucleotide sequence has at least four of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least five of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least six of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least seven of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least eight of the exemplary characteristics.
- a suitable target polynucleotide that comprises a target nucleotide sequence has all the exemplary characteristics.
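The exemplary characteristics and the "at least N of the characteristics" embodiments above can be sketched as a simple checklist. The `CandidateSite` class, its field names, and the example values are hypothetical; the distance cut-offs use the least stringent values recited in the text (>150 kb and >10 kb), and characteristic (9) is modelled only as a boolean `unique` flag because the source text does not elaborate on it here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSite:
    """Hypothetical annotation record for one candidate target site."""
    kb_to_cancer_gene: float         # distance to nearest known cancer-related gene
    kb_to_small_rna: float           # distance to nearest miRNA/functional small RNA
    kb_to_5prime_gene_end: float     # distance to nearest 5' gene end
    kb_to_replication_origin: float  # distance to nearest replication origin
    kb_to_ultraconserved: float      # distance to nearest ultra-conserved element
    low_transcription: bool          # (6) low transcriptional activity
    outside_cnv_region: bool         # (7) outside a copy number variable region
    open_chromatin: bool             # (8) located in open chromatin
    unique: bool                     # (9) unique (assumed boolean annotation)


def criteria_met(site: CandidateSite) -> List[bool]:
    """Evaluate characteristics (1)-(9) in the order listed in the text."""
    return [
        site.kb_to_cancer_gene > 150,
        site.kb_to_small_rna > 150,
        site.kb_to_5prime_gene_end > 10,
        site.kb_to_replication_origin > 10,
        site.kb_to_ultraconserved > 10,
        site.low_transcription,
        site.outside_cnv_region,
        site.open_chromatin,
        site.unique,
    ]


def has_at_least(site: CandidateSite, n: int) -> bool:
    """True if the site shows at least `n` of the nine characteristics."""
    return sum(criteria_met(site)) >= n


# Hypothetical site that is too close (100 kb) to a cancer-related gene.
site = CandidateSite(100, 400, 25, 60, 15, True, True, True, True)
print(sum(criteria_met(site)))  # 8
print(has_at_least(site, 9))    # False
```

Stricter embodiments (for example >300 kb or >50 kb) could be expressed by replacing the numeric cut-offs, which are kept inline here only for brevity.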
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end.
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least one additional exemplary characteristic.
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least two additional exemplary characteristics.
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least three additional exemplary characteristics.
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least four additional exemplary characteristics.
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least five additional exemplary characteristics.
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least six additional exemplary characteristics.
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least seven additional exemplary characteristics.
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises all eight additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least one additional exemplary characteristic.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least two additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least three additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least four additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least five additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least six additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least seven additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises all eight additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, and >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least one additional exemplary characteristic.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least two additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least three additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least four additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least five additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least six additional exemplary characteristics.
- a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises all seven additional exemplary characteristics.
- a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and >150, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene.
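The distance criteria above can be illustrated with a short filter. This is a minimal sketch only: the `CandidateSite` record, its field names, and the four example sites are hypothetical and are not taken from Table 5; the default cut-offs (>150 kb from a known cancer-related gene, >10 kb from any 5’ gene end) and the stricter cut-offs (>300 kb, >50 kb) are the values recited in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSite:
    """Hypothetical record for a candidate target polynucleotide."""
    name: str
    kb_to_cancer_gene: float      # distance to nearest known cancer-related gene (kb)
    kb_to_5prime_gene_end: float  # distance to nearest 5' gene end (kb)


def passes_distance_criteria(site: CandidateSite,
                             min_cancer_kb: float = 150.0,
                             min_5prime_kb: float = 10.0) -> bool:
    """Return True when both distances strictly exceed their cut-offs."""
    return (site.kb_to_cancer_gene > min_cancer_kb
            and site.kb_to_5prime_gene_end > min_5prime_kb)


# Illustrative values only; these are not real genomic coordinates.
candidates = [
    CandidateSite("site_A", 320.0, 55.0),
    CandidateSite("site_B", 180.0, 12.0),
    CandidateSite("site_C", 140.0, 60.0),
    CandidateSite("site_D", 400.0, 8.0),
]

baseline = [s.name for s in candidates if passes_distance_criteria(s)]
strictest = [s.name for s in candidates
             if passes_distance_criteria(s, min_cancer_kb=300.0, min_5prime_kb=50.0)]
```

With these made-up inputs, the baseline cut-offs retain `site_A` and `site_B`, and the strictest recited cut-offs retain only `site_A`.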
- a suitable target polynucleotide comprising a target nucleotide sequence may comprise any one of SEQ ID NOs: 2020-2043 of Table 5.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to any one of SEQ ID NOs: 2020-2043.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 98% identical to any one of SEQ ID NOs: 2020-2043. In a more preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 99% identical to any one of SEQ ID NOs: 2020-2043.
- [0179] In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence, e.g., for transgene insertion, may comprise any one of SEQ ID NOs: 2020-2042 of Table 5.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to any one of SEQ ID NOs: 2020-2042.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 98% identical to any one of SEQ ID NOs: 2020-2042. In a more preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 99% identical to any one of SEQ ID NOs: 2020-2042.
- [0180] In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence, e.g., for transgene insertion, may comprise any one of SEQ ID NOs: 2020-2041 and 2043 of Table 5.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to any one of SEQ ID NOs: 2020-2041 and 2043.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 98% identical to any one of SEQ ID NOs: 2020-2041 and 2043. In a more preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 99% identical to any one of SEQ ID NOs: 2020-2041 and 2043.
- [0181] In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence, e.g., for transgene insertion, may comprise any one of SEQ ID NOs: 2020-2041 of Table 5.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to any one of SEQ ID NOs: 2020-2041.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 98% identical to any one of SEQ ID NOs: 2020-2041. In a more preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 99% identical to any one of SEQ ID NOs: 2020-2041.
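The percent-identity thresholds recited above can be illustrated with a simple calculation. This is a sketch under stated assumptions: the two sequences are assumed to be already aligned to equal length, identity is taken as matching positions divided by alignment length, and gap characters (`-`) never count as matches. The text here does not specify an alignment method, so a real comparison would normally use an established alignment tool. The 100-nt `reference` and `variant` strings are hypothetical and are not SEQ ID NOs from Table 5.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions carrying the same nucleotide.

    Assumes both inputs are pre-aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not aligned_a:
        raise ValueError("sequences must be non-empty")
    a, b = aligned_a.upper(), aligned_b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold_percent: float) -> bool:
    """True when identity is at least the given threshold (e.g., 90, 98, 99)."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Hypothetical 100-nt sequences differing at a single position.
reference = "ACGT" * 25
variant = "T" + reference[1:]

identity = percent_identity(reference, variant)
```

Here a single mismatch over 100 positions gives 99% identity, which satisfies the "at least 98%" and "at least 99%" embodiments but not "at least 99.5%".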
- a suitable target polynucleotide comprising a target nucleotide sequence may comprise at least a portion of, for example, nucleotides 1-495, 1-490, 1-485, 1-480, 1-475, 1-470, 1-465, 1-460, 1-455, 1-450, 1-445, 1-440, 1-435, 1-430, 1-425, 1-420, 1-415, 1-410, 1-405, or 1-400, of any one of SEQ ID NOs: 2020-2030 of Table 5.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to the portion of any one of SEQ ID NOs: 2020-2030.
- a suitable target polynucleotide comprising a target nucleotide sequence may comprise at least a portion of, for example, nucleotides 5-500, 10-500, 15-500, 20-500, 25-500, 30-500, 35-500, 40-500, 45-500, 50-500, 55-500, 60-500, 65-500, 70-500, 75-500, 80-500, 85-500, 90-500, 95-500, or 100-500, of any one of SEQ ID NOs: 2031-2041 of Table 5.
- a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to the portion of any one of SEQ ID NOs: 2031-2041.
- Table 5: suitable target polynucleotides comprising a target nucleotide sequence for transgene insertion
- expression of an exogenous DNA, e.g., transgene, inserted in a target polynucleotide at or near a target nucleotide sequence may depend on cell type and differentiation stage, as one or more components of a target polynucleotide are activated during differentiation while others are silenced, which may or may not be correlated with rearrangements of the chromatin architecture during differentiation.
- a suitable target polynucleotide comprising a target nucleotide sequence demonstrates suitable expression of an inserted exogenous DNA, e.g., transgene, throughout differentiation and clonal expansion.
- the guide nucleic acid comprises: (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the targeter stem sequence, and (b) a 5’ sequence.
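The stem complementarity described above can be sketched as a reverse-complement check. This is an illustrative simplification: it tests only perfect Watson-Crick pairing between RNA stems of equal length and ignores G-U wobble pairs and partial complementarity. The five-nucleotide stem sequences are hypothetical and are not taken from Table 6 or Table 7.

```python
# Watson-Crick pairing for RNA; G-U wobble pairs are deliberately not modeled.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def stems_are_complementary(targeter_stem: str, modulator_stem: str) -> bool:
    """True when the modulator stem is the exact reverse complement of the
    targeter stem, i.e., the two stems can form a fully paired duplex."""
    return modulator_stem.upper() == reverse_complement_rna(targeter_stem)


# Hypothetical stem sequences, written 5' to 3'.
targeter_stem = "GGCAU"
modulator_stem = "AUGCC"

paired = stems_are_complementary(targeter_stem, modulator_stem)
```

For these hypothetical stems the check returns `True`; changing any one base of `modulator_stem` makes it return `False`.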
- the modulator nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 6.
- the modulator nucleic acid comprises one or more modifications as disclosed herein, preferably at least 1, 2, 3, 4, 5, 6, 7 and/or not more than 2, 3, 4, 5, 6, 7, or 8 modifications at or near the 5’ end of the modulator nucleic acid.
- the modulator nucleic acid comprises a 5’ 2’-O-methoxy modified nucleotide.
- the modulator nucleic acid comprises at least 1, 2, 3, 4, or 5 and/or not more than 2, 3, 4, 5, or 5 modified phosphodiester linkages as disclosed herein.
- the modulator nucleic acid comprises 1 phosphorothioate modified linkage at or near the 5’ end.
- the modulator nucleic acid comprises 2 phosphorothioate modified linkages at or near the 5’ end.
- the targeter nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 7.
- the targeter nucleic acid comprises one or more modifications as disclosed herein, preferably at least 1, 2, 3, 4, 5, 6, 7 and/or not more than 2, 3, 4, 5, 6, 7, or 8 modifications at or near the 3’ end of the targeter nucleic acid.
- the targeter nucleic acid comprises at least 1, 2, 3, 4, 5, 6, or 7 and/or not more than 2, 3, 4, 5, 6, 8, or 8 3’ 2’-O-methoxy modified nucleotides, preferably 1-5, more preferably 1-3. In certain embodiments, the targeter nucleic acid comprises at least 1, 2, 3, 4, or 5 and/or not more than 2, 3, 4, 5, or 5 modified phosphodiester linkages as disclosed herein. In certain embodiments, the targeter nucleic acid comprises 1 phosphorothioate modified linkage at or near the 3’ end. In certain embodiments, the targeter nucleic acid comprises 2 phosphorothioate modified linkages at or near the 3’ end.
- the targeter nucleic acid comprises 3-5 phosphorothioate modifications at or near the 3’ end. In certain embodiments, the targeter nucleic acid comprises at least 1, 2, 3, 4, 5, 6, or 7 and/or not more than 2, 3, 4, 5, 6, 7, or 8 2’-fluoro modifications at or near the 3’ end.
- Table 6 modulator sequences
- Table 7 targeter sequences
- compositions comprising a guide nucleic acid, an engineered, non-naturally occurring system, or a eukaryotic cell, such as a guide nucleic acid, an engineered, non-naturally occurring system, or a eukaryotic cell, disclosed herein.
- the composition comprises an RNP comprising a guide nucleic acid, such as a guide nucleic acid disclosed herein, and a Cas protein (e.g., Cas nuclease).
- the composition comprises a single guide nucleic acid, such as a single guide nucleic acid disclosed herein.
- the composition comprises an RNP comprising the single guide nucleic acid, and a Cas protein (e.g., Cas nuclease).
- the composition comprises a complex of a targeter nucleic acid and a modulator nucleic acid, such as a complex of a targeter nucleic acid and a modulator nucleic acid disclosed herein.
- the composition comprises an RNP comprising the targeter nucleic acid, the modulator nucleic acid, and a Cas protein (e.g., Cas nuclease).
- the method comprising incubating a single guide nucleic acid, such as a single guide nucleic acid disclosed herein, with a Cas protein, thereby producing a complex of the single guide nucleic acid and the Cas protein (e.g., an RNP).
- the method further comprises purifying the complex (e.g., the RNP).
- a method of producing a composition comprising incubating a targeter nucleic acid and a modulator nucleic acid, such as a targeter nucleic acid and a modulator nucleic acid disclosed herein, under suitable conditions, thereby producing a composition (e.g., pharmaceutical composition) comprising a complex of the targeter nucleic acid and the modulator nucleic acid.
- the method further comprises incubating the targeter nucleic acid and the modulator nucleic acid with a Cas protein (e.g., the Cas nuclease that the targeter nucleic acid and the modulator nucleic acid are capable of activating or a related Cas protein), thereby producing a complex of the targeter nucleic acid, the modulator nucleic acid, and the Cas protein (e.g., an RNP).
- the method further comprises purifying the complex (e.g., the RNP).
- a guide nucleic acid, an engineered, non-naturally occurring system, a CRISPR expression system, or a cell comprising such system or modified by such system disclosed herein is combined with a pharmaceutically acceptable carrier.
- pharmaceutically acceptable can refer to those compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit-to-risk ratio.
- compositions include buffers, carriers, and excipients suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio.
- Pharmaceutically acceptable carriers include any of the standard pharmaceutical carriers, such as a phosphate buffered saline solution, water, emulsions (e.g., oil/water or water/oil emulsions), and various types of wetting agents.
- the compositions also can include stabilizers and preservatives.
- Pharmaceutically acceptable carriers include buffers, solvents, dispersion media, coatings, isotonic and absorption delaying agents, or the like, that are compatible with pharmaceutical administration. The use of such media and agents for pharmaceutically active substances is known in the art.
- a pharmaceutical composition disclosed herein comprises a salt, e.g., NaCl, MgCl2, KCl, MgSO4, etc.; a buffering agent, e.g., a Tris buffer, N-(2-Hydroxyethyl)piperazine-N′-(2-ethanesulfonic acid) (HEPES), 2-(N-Morpholino)ethanesulfonic acid (MES), MES sodium salt, 3-(N-Morpholino)propanesulfonic acid (MOPS), N-tris[Hydroxymethyl]methyl-3-aminopropanesulfonic acid (TAPS), etc.; a solubilizing agent; a detergent, e.g., a non-ionic detergent such as Tween-20, etc.; a nuclease inhibitor; or the like.
- a subject composition comprises a subject DNA-targeting RNA, e.g., gRNA, and a buffer for stabilizing nucleic acids.
- a pharmaceutical composition may contain formulation materials for modifying, maintaining or preserving, for example, the pH, osmolarity, viscosity, clarity, color, isotonicity, odor, sterility, stability, rate of dissolution or release, adsorption or penetration of the composition.
- suitable formulation materials include, but are not limited to, amino acids (such as glycine, glutamine, asparagine, arginine or lysine); antimicrobials; antioxidants (such as ascorbic acid, sodium sulfite or sodium hydrogen-sulfite); buffers (such as borate, bicarbonate, Tris-HCl, citrates, phosphates or other organic acids); bulking agents (such as mannitol or glycine); chelating agents (such as ethylenediamine tetraacetic acid (EDTA)); complexing agents (such as caffeine, polyvinylpyrrolidone, beta-cyclodextrin or hydroxypropyl-beta-cyclodextrin); fillers; monosaccharides; disaccharides; and other carbohydrates (such as glucose, mannose or dextrins); proteins (such as serum albumin, gelatin or immunoglobulins); coloring, flavoring and diluting agents; emulsifying agents
- a pharmaceutical composition may contain nanoparticles, e.g., polymeric nanoparticles, liposomes, or micelles (see Anselmo et al. (2016) Bioeng. Transl. Med. 1: 10-29).
- the pharmaceutical composition comprises an inorganic nanoparticle.
- Exemplary inorganic nanoparticles include, e.g., magnetic nanoparticles (e.g., Fe3MnO2) or silica.
- the outer surface of the nanoparticle can be conjugated with a positively charged polymer (e.g., polyethylenimine, polylysine, polyserine) which allows for attachment (e.g., conjugation or entrapment) of payload.
- the pharmaceutical composition comprises an organic nanoparticle (e.g., entrapment of the payload inside the nanoparticle).
- Exemplary organic nanoparticles include, e.g., SNALP liposomes that contain cationic lipids together with neutral helper lipids which are coated with polyethylene glycol (PEG) and protamine and nucleic acid complex coated with lipid coating.
- the pharmaceutical composition comprises a liposome, for example, a liposome disclosed in International (PCT) Application Publication No. WO 2015/148863.
- the pharmaceutical composition comprises a targeting moiety to increase target cell binding or uptake of nanoparticles and liposomes.
- targeting moieties include cell specific antigens, monoclonal antibodies, single chain antibodies, aptamers, polymers, sugars, and cell penetrating peptides.
- the pharmaceutical composition comprises a fusogenic or endosome-destabilizing peptide or polymer.
- a pharmaceutical composition may contain a sustained- or controlled-delivery formulation.
- sustained- or controlled-delivery means, such as liposome carriers, bio-erodible microparticles or porous beads, and depot injections, are also known to those skilled in the art.
- Sustained-release preparations may include, e.g., porous polymeric microparticles or semipermeable polymer matrices in the form of shaped articles, e.g., films, or microcapsules.
- Sustained release matrices may include polyesters, hydrogels, polylactides, copolymers of L-glutamic acid and gamma ethyl-L-glutamate, poly(2-hydroxyethyl-methacrylate), ethylene vinyl acetate, or poly-D-(-)-3-hydroxybutyric acid.
- Sustained release compositions may also include liposomes that can be prepared by any of several methods known in the art.
- a pharmaceutical composition of the invention can be administered by a variety of methods known in the art. The route and/or mode of administration vary depending upon the desired results. Administration can be intravenous, intramuscular, intraperitoneal, or subcutaneous, or administered proximal to the site of the target.
- the pharmaceutically acceptable carrier should be suitable for intravenous, intramuscular, subcutaneous, parenteral, spinal or epidermal administration (e.g., by injection or infusion).
- the active compound (e.g., the guide nucleic acid, engineered, non-naturally occurring system, or CRISPR expression system disclosed herein) may be coated in a material to protect the compound from the action of acids and other natural conditions that may inactivate the compound.
- Formulation components suitable for parenteral administration include a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerin, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as EDTA; buffers such as acetates, citrates or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose.
- suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, NJ) or phosphate buffered saline (PBS).
- the carrier should be stable under the conditions of manufacture and storage, and should be preserved against microorganisms.
- the carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol), and suitable mixtures thereof.
- Pharmaceutical formulations preferably are sterile. Sterilization can be accomplished by any suitable method, e.g., filtration through sterile filtration membranes.
- compositions of the invention can be prepared in accordance with methods well known and routinely practiced in the art. See, e.g., Remington: The Science and Practice of Pharmacy, Mack Publishing Co., 20th ed., 2000; and Sustained and Controlled Release Drug Delivery Systems, J. R. Robinson, ed., Marcel Dekker, Inc., New York, 1978. Pharmaceutical compositions are preferably manufactured under GMP conditions.
- compositions of the invention typically employ a therapeutically effective dose or efficacious dose of the guide nucleic acid, engineered, non-naturally occurring system, or CRISPR expression system disclosed herein.
- the compositions disclosed herein are formulated into pharmaceutically acceptable dosage forms by conventional methods known to those of skill in the art. Dosage regimens are adjusted to provide the optimum desired response (e.g., a therapeutic response). For example, a single bolus may be administered, several divided doses may be administered over time or the dose may be proportionally reduced or increased as indicated by the exigencies of the therapeutic situation. It is especially advantageous to formulate parenteral compositions in dosage unit form for ease of administration and uniformity of dosage.
- Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the subjects to be treated; each unit contains a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier.
- [0202] Actual dosage levels of the active ingredients in the pharmaceutical compositions of the invention can be varied so as to obtain an amount of the active ingredient which is effective to achieve the desired therapeutic response for a particular patient, composition, and mode of administration, without being toxic to the patient.
- the selected dosage level depends upon a variety of pharmacokinetic factors including the activity of the particular compositions disclosed herein employed, or the ester, salt or amide thereof, the route of administration, the time of administration, the rate of excretion of the particular compound being employed, the duration of the treatment, other drugs, compounds and/or materials used in combination with the particular compositions employed, the age, sex, weight, condition, general health and prior medical history of the patient being treated, and like factors.
- Guide nucleic acids, engineered, non-naturally occurring systems, and the CRISPR expression systems, e.g., as disclosed herein, are useful for targeting, editing, and/or modifying the genomic DNA in a cell or organism.
- guide nucleic acids and systems can be used to treat a disease or disorder in which modification of genetic or epigenetic information is desirable.
- a method of treating a disease or disorder comprising administering to a subject in need thereof a guide nucleic acid, a non-naturally occurring system, a CRISPR expression system, or a cell disclosed herein.
- subject includes human and non-human animals.
- Non-human animals include all vertebrates, e.g., mammals and non-mammals, such as non-human primates, sheep, dog, cow, chickens, amphibians, and reptiles.
- treatment can refer to obtaining a desired pharmacologic and/or physiologic effect.
- the effect may be therapeutic in terms of a partial or complete cure for a disease and/or adverse effect attributable to the disease or delaying the disease progression.
- Treatment covers any treatment of a disease in a mammal, e.g., in a human, and includes: (a) inhibiting the disease, i.e., arresting its development; and (b) relieving the disease, i.e., causing regression of the disease.
- a disease or disorder may be identified by genetic methods and treated prior to manifestation of any medical symptom.
- it can be important to control the concentration of the CRISPR-Cas system delivered.
- Optimal concentrations can be determined by testing different concentrations in a cellular, tissue, or non-human eukaryote animal model and using deep sequencing to analyze the extent of modification at potential off-target genomic loci. The concentration that gives the highest level of on-target modification while minimizing the level of off-target modification is generally selected for ex vivo or in vivo delivery.
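The selection step described above can be sketched as a small decision rule. The text does not give a formula for trading on-target against off-target modification, so the rule below is an assumption made for illustration: among tested concentrations whose off-target rate is at or below a chosen tolerance, pick the one with the highest on-target rate, breaking ties in favor of the lower off-target rate. The `EditingResult` type, the tolerance, the concentration units, and all numbers are hypothetical; in practice the rates would come from deep sequencing.

```python
from typing import NamedTuple, Optional, Sequence


class EditingResult(NamedTuple):
    """Hypothetical summary of one tested concentration."""
    concentration: float  # arbitrary units (assumed)
    on_target: float      # fraction of reads modified at the intended site
    off_target: float     # fraction of reads modified at off-target loci


def select_concentration(results: Sequence[EditingResult],
                         max_off_target: float) -> Optional[EditingResult]:
    """Pick the highest on-target result whose off-target rate is tolerated.

    Returns None when no tested concentration meets the tolerance.
    """
    eligible = [r for r in results if r.off_target <= max_off_target]
    if not eligible:
        return None
    # Highest on-target first; a lower off-target rate breaks ties.
    return max(eligible, key=lambda r: (r.on_target, -r.off_target))


# Made-up titration data for illustration only.
results = [
    EditingResult(0.5, 0.40, 0.001),
    EditingResult(1.0, 0.72, 0.004),
    EditingResult(2.0, 0.85, 0.009),
    EditingResult(4.0, 0.88, 0.050),
]

best = select_concentration(results, max_off_target=0.01)
```

With this made-up titration, the highest concentration is excluded for exceeding the tolerance, and the concentration of 2.0 is selected. A different weighting of on-target against off-target activity would be equally consistent with the text.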
- the guide nucleic acid, the engineered, non-naturally occurring system, and the CRISPR expression system disclosed herein can be used to treat any suitable disease or disorder that can be improved by the system in a cell.
- certain methods disclosed herein are particularly suitable for editing or modifying a proliferating cell, such as a stem cell (e.g., a hematopoietic stem cell), a progenitor cell (e.g., a hematopoietic progenitor cell or a lymphoid progenitor cell), or a memory cell (e.g., a memory T cell).
- the engineered, non-naturally occurring system of the present invention has the advantage of increasing or decreasing the efficiency of nucleic acid cleavage by, for example, adjusting the hybridization of dual guide nucleic acids. As a result, it can be used to minimize off-target events when creating genetically engineered proliferating cells.
- the guide nucleic acid, the engineered, non-naturally occurring system, and/or the CRISPR expression system disclosed herein can be used to engineer an immune cell.
- Immune cells include but are not limited to lymphocytes (e.g., B lymphocytes or B cells, T lymphocytes or T cells, and natural killer cells), myeloid cells (e.g., monocytes, macrophages, eosinophils, mast cells, basophils, and granulocytes), and the stem and progenitor cells that can differentiate into these cell types (e.g., hematopoietic stem cells, hematopoietic progenitor cells, and lymphoid progenitor cells).
- the cells can include autologous cells derived from a subject to be treated, or alternatively allogeneic cells derived from a donor.
- the immune cell is a T cell, which can be, for example, a cultured T cell, a primary T cell, a T cell from a cultured T cell line (e.g., Jurkat, SupT1), or a T cell obtained from a mammal, for example, from a subject to be treated. If obtained from a mammal, the T cell can be obtained from numerous sources, including but not limited to blood, bone marrow, lymph node, the thymus, or other tissues or fluids. T cells can also be enriched or purified.
- the T cell can be any type of T cell and can be of any developmental stage, including but not limited to, CD4+/CD8+ double positive T cells, CD4+ helper T cells (e.g., Th1 and Th2 cells), CD8+ T cells (e.g., cytotoxic T cells), tumor infiltrating lymphocytes (TILs), memory T cells (e.g., central memory T cells and effector memory T cells), regulatory T cells, naive T cells, or the like.
- an immune cell e.g., a T cell, is engineered to express an exogenous gene.
- an engineered CRISPR system disclosed herein may catalyze DNA cleavage at the gene locus, allowing for site-specific integration of the exogenous gene at the gene locus by HDR.
- an immune cell, e.g., a T cell, is engineered to express a chimeric antigen receptor (CAR), i.e., the T cell comprises an exogenous nucleotide sequence encoding a CAR.
- the term “chimeric antigen receptor” or “CAR” includes any artificial receptor including an antigen-specific binding moiety and one or more signaling chains derived from an immune receptor.
- CARs can comprise a single chain fragment variable (scFv) of an antibody specific for an antigen coupled via hinge and transmembrane regions to cytoplasmic domains of T cell signaling molecules, e.g., a T cell costimulatory domain (e.g., from CD28, CD137, OX40, ICOS, or CD27) in tandem with a T cell triggering domain (e.g., from CD3ζ).
- a T cell expressing a chimeric antigen receptor is referred to as a CAR T cell.
- Exemplary CAR T cells include CD19 targeted CTL019 cells (see, Grupp et al. (2015) BLOOD, 126: 4983), 19-28z cells (see, Park et al. (2015) J. CLIN.
- an immune cell, e.g., a T cell, binds an antigen, e.g., a cancer antigen, through an endogenous T cell receptor (TCR).
- an immune cell, e.g., a T cell, is engineered to express an exogenous TCR, e.g., an exogenous naturally occurring TCR or an exogenous engineered TCR.
- T cell receptors comprise two chains, referred to as the α- and β-chains, that combine on the surface of a T cell to form a heterodimeric receptor that can recognize MHC-restricted antigens.
- Each of the α- and β-chains comprises a constant region and a variable region.
- Each variable region of the α- and β-chains defines three loops, referred to as complementarity determining regions (CDRs), known as CDR1, CDR2, and CDR3, that confer the T cell receptor with antigen binding activity and binding specificity.
- a CAR or TCR binds a cancer antigen selected from B-cell maturation antigen (BCMA), mesothelin, prostate specific membrane antigen (PSMA), prostate stem cell antigen (PSCA), carbonic anhydrase IX (CAIX), carcinoembryonic antigen (CEA), CD5, CD7, CD10, CD19, CD20, CD22, CD30, CD33, CD34, CD38, CD41, CD44, CD49f, CD56, CD70, CD74, CD123, CD133, CD138, epithelial glycoprotein-2 (EGP-2), epithelial glycoprotein-40 (EGP-40), epithelial cell adhesion molecule (EpCAM), receptor-type tyrosine-protein kinase (FLT3), folate-binding protein (FBP), fetal acetylcholine receptor (AChR), folate receptor-α and β (FRα and β), Ganglioside G2 (GD2), Gan
- TCR subunit loci, e.g., the TCR α constant (TRAC) locus, the TCR β constant 1 (TRBC1) locus, and the TCR β constant 2 (TRBC2) locus. It is understood that insertion in the TRAC locus reduces tonic CAR signaling and enhances T cell potency (see, Eyquem et al. (2017) NATURE, 543: 113).
- an immune cell is engineered to have reduced expression of an endogenous TCR or TCR subunit, e.g., TRAC, TRBC1, and/or TRBC2.
- the cell may be engineered to have partially reduced or no expression of the endogenous TCR or TCR subunit.
- the immune cell is engineered to have less than 80% (e.g., less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or less than 5%) of the expression of the endogenous TCR or TCR subunit relative to a corresponding unmodified or parental cell.
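The relative-expression thresholds above can be made concrete with a short calculation. This is a sketch only: expression is assumed to be measured on a linear scale in the same units for the engineered cell and the unmodified or parental cell (for example, a background-subtracted fluorescence intensity), and the measurement values are hypothetical. The threshold list is the one recited in the text.

```python
from typing import Optional

# Percent-of-parental thresholds recited in the text.
THRESHOLDS_PERCENT = (80, 70, 60, 50, 40, 30, 20, 10, 5)


def relative_expression_percent(engineered: float, parental: float) -> float:
    """Expression in the engineered cell as a percentage of the parental cell."""
    if parental <= 0:
        raise ValueError("parental expression must be positive")
    return 100.0 * engineered / parental


def tightest_threshold_met(engineered: float, parental: float) -> Optional[int]:
    """Return the smallest recited threshold that the relative expression is
    strictly below, or None when expression is not below 80% of parental."""
    relative = relative_expression_percent(engineered, parental)
    met = [t for t in THRESHOLDS_PERCENT if relative < t]
    return min(met) if met else None


# Hypothetical measurements in arbitrary units.
knockdown = tightest_threshold_met(engineered=120.0, parental=1000.0)
unchanged = tightest_threshold_met(engineered=900.0, parental=1000.0)
```

In this example the first cell retains 12% of parental expression, so it satisfies "less than 20%" but not "less than 10%"; the second cell, at 90%, meets none of the recited thresholds.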
- the immune cell is engineered to have no detectable expression of the endogenous TCR or TCR subunit. Exemplary approaches to reduce expression of TCRs using CRISPR systems are described in U.S. Patent No. 9,181,527, Liu et al.
- T cells also express major histocompatibility complex (MHC) or human leukocyte antigen (HLA) genes, and inactivation of these endogenous genes may reduce an immune response, thereby allowing use of allogeneic T cells as starting materials for preparation of CAR T cells.
- a T-cell is engineered to have reduced expression of one or more endogenous class I or class II MHCs or HLAs (e.g., beta 2-microglobulin (B2M), class II major histocompatibility complex transactivator (CIITA)).
- the cell may be engineered to have partially reduced or no expression of an endogenous MHC or HLA.
- the immune cell is engineered to have less than 80% (e.g., less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or less than 5%) of the expression of endogenous MHC (e.g., B2M, CIITA) relative to a corresponding unmodified or parental cell.
- a cell may be engineered to have expression of, e.g., HLA-E and/or HLA-G, in order to avoid attack by natural killer (NK) cells.
- Exemplary approaches to reduce expression of MHCs using CRISPR systems are described in Liu et al. (2017) CELL RES, 27: 154, Ren et al. (2017) CLIN CANCER RES, 23: 2255, and Ren et al. (2017) ONCOTARGET, 8: 17002.
- Other genes that may be inactivated include but are not limited to CD3, CD52, and deoxycytidine kinase (DCK).
- inactivation of DCK may render the immune cells (e.g., T cells) resistant to purine nucleotide analogue (PNA) compounds, which are often used to compromise the host immune system in order to reduce a GVHD response during an immune cell therapy.
- the immune cell is engineered to have less than 80% (e.g., less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or less than 5%) of the expression of endogenous CD52 or DCK relative to a corresponding unmodified or parental cell.
- an immune cell is engineered to have reduced expression of an immune checkpoint protein.
- immune checkpoint proteins expressed by wild-type T cells include but are not limited to PDCD1 (PD-1), CTLA4, ADORA2A (A2AR), B7-H3, B7-H4, BTLA, KIR, LAG3, HAVCR2 (TIM3), TIGIT, VISTA, PTPN6 (SHP-1), and FAS.
- the cell may be modified to have partially reduced or no expression of the immune checkpoint protein.
- the immune cell is engineered to have less than 80% (e.g., less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or less than 5%) of the expression of the immune checkpoint protein relative to a corresponding unmodified or parental cell.
- Exemplary approaches to reduce expression of immune checkpoint proteins using CRISPR systems are described in International (PCT) Publication No. WO 2017/017184, Cooper et al.
- the immune cell can be engineered to have reduced expression of an endogenous gene, e.g., an endogenous gene described above, by gene editing or modification.
- an engineered CRISPR system disclosed herein may result in DNA cleavage at a gene locus, thereby inactivating the targeted gene.
- an engineered CRISPR system disclosed herein may be fused to an effector domain (e.g., a transcriptional repressor or histone methylase) to reduce the expression of the target gene.
- the immune cell can also be engineered to express an exogenous protein (besides an antigen-binding protein described above) at the locus of a human ADORA2A, B2M, CD52, CIITA, CTLA4, DCK, FAS, HAVCR2, LAG3, PDCD1, PTPN6, TIGIT, TRAC, TRBC1, TRBC2, CARD11, CD247, IL7R, LCK, or PLCG1 gene.
- an immune cell, e.g., a T cell, is modified to express a dominant-negative form of an immune checkpoint protein.
- the dominant-negative form of the checkpoint inhibitor can act as a decoy receptor to bind or otherwise sequester the natural ligand that would otherwise bind and activate the wild-type immune checkpoint protein.
- engineered immune cells, for example, T cells, containing dominant-negative forms of an immune suppressor are described, for example, in International (PCT) Publication No. WO 2017/040945.
- a gene e.g., a transcription factor, a cytokine, or an enzyme
- the immune cell is modified to express TET2, FOXO1, IL-12, IL-15, IL-18, IL-21, IL-7, GLUT1, GLUT3, HK1, HK2, GAPDH, LDHA, PDK1, PKM2, PFKFB3, PGK1, ENO1, GYS1, and/or ALDOA.
- the modification is an insertion of a nucleotide sequence encoding the protein operably linked to a regulatory element.
- the modification is a substitution of a single nucleotide polymorphism (SNP) site in the endogenous gene.
- a variant of a gene for example, a variant that has greater activity than the respective wild-type gene.
- the immune cell is modified to express a variant of CARD11, CD247, IL7R, LCK, or PLCG1.
- certain gain-of-function variants of IL7R were disclosed in Zenatti et al. (2011) NAT. GENET., 43(10): 932-39.
- the variant can be expressed from the native locus of the respective wild-type gene by delivering an engineered system described herein for targeting the native locus in combination with a donor template that carries the variant or a portion thereof.
- a protein e.g., a cytokine or an enzyme
- the immune cell is modified to express CA9, CA12, a V-ATPase subunit, NHE1, and/or MCT-1.
- Gene therapies [0224] It is understood that the engineered, non-naturally occurring system and CRISPR expression system, e.g., as disclosed herein, can be used to treat a genetic disease or disorder, i.e., a disease or disorder associated with or otherwise mediated by an undesirable mutation in the genome of a subject.
- Exemplary genetic diseases or disorders include age-related macular degeneration, adrenoleukodystrophy (ALD), Alagille syndrome, alpha-1-antitrypsin deficiency, argininemia, argininosuccinic aciduria, ataxia (e.g., Friedreich ataxia, spinocerebellar ataxias, ataxia telangiectasia, essential tremor, spastic paraplegia), autism, biliary atresia, biotinidase deficiency, carbamoyl phosphate synthetase I deficiency, carbohydrate deficient glycoprotein syndrome (CDGS), a central nervous system (CNS)-related disorder (e.g., Alzheimer's disease, amyotrophic lateral sclerosis (ALS), Canavan disease (CD), ischemia, multiple sclerosis (MS), neuropathic pain, Parkinson's disease), Bloom's syndrome, cancer, Charcot-Marie-T
- diabetes insipidus, Fabry, familial hypercholesterolemia (LDL receptor defect), Fanconi's anemia, fragile X syndrome, a fatty acid oxidation disorder, galactosemia, glucose-6-phosphate dehydrogenase (G6PD), glycogen storage diseases (e.g., type I (glucose-6-phosphatase deficiency, Von Gierke), II (alpha glucosidase deficiency, Pompe), III (debrancher enzyme deficiency, Cori), IV (brancher enzyme deficiency, Anderson), V (muscle glycogen phosphorylase deficiency, McArdle), VII (muscle phosphofructokinase deficiency, Tauri), VI (liver phosphorylase deficiency, Hers), IX (liver glycogen phosphorylase kinase deficiency)), hemophilia A (associated with defective factor VIII), hemophilia B (associated with defective factor IX), Huntington’s disease
- Additional exemplary genetic diseases or disorders and associated information are available on the world wide web at kumc.edu/gec/support, genome.gov/10001200, and ncbi.nlm.nih.gov/books/NBK22183/. Additional exemplary genetic diseases or disorders, associated genetic mutations, and gene therapy approaches to treat genetic diseases or disorders are described in International (PCT) Publication Nos.
- kits containing any one or more of the elements disclosed in the above systems, libraries, methods, and compositions.
- the kit comprises an engineered, non-naturally occurring system as disclosed herein and instructions for using the kit.
- the instructions may be specific to the applications and methods described herein.
- one or more of the elements of the system are provided in a solution.
- one or more of the elements of the system are provided in lyophilized form, and the kit further comprises a diluent.
- Elements may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, a tube, or immobilized on the surface of a solid base (e.g., chip or microarray).
- the kit comprises one or more of the nucleic acids and/or proteins described herein.
- the kit provides all elements of the systems of the invention.
- the targeter nucleic acid and the modulator nucleic acid are provided in separate containers.
- the targeter nucleic acid and the modulator nucleic acid are pre-complexed, and the complex is provided in a single container.
- the kit comprises a Cas protein or a nucleic acid comprising a regulatory element operably linked to a nucleic acid encoding a Cas protein provided in a separate container.
- the kit comprises a Cas protein pre-complexed with the single guide nucleic acid or a combination of the targeter nucleic acid and the modulator nucleic acid, and the complex is provided in a single container.
- the kit further comprises one or more donor templates provided in one or more separate containers.
- the kit comprises a plurality of donor templates as disclosed herein (e.g., in separate tubes or immobilized on the surface of a solid base such as a chip or a microarray), one or more guide nucleic acids disclosed herein, and optionally a Cas protein or a regulatory element operably linked to a nucleic acid encoding a Cas protein as disclosed herein.
- kits are useful for identifying a donor template that introduces optimal genetic modification in a multiplex assay.
- the CRISPR expression systems as disclosed herein are also suitable for use in a kit.
- a kit further comprises one or more reagents and/or buffers for use in a process utilizing one or more of the elements described herein.
- Reagents may be provided in any suitable container and may be provided in a form that is usable in a particular assay, or in a form that requires addition of one or more other components before use (e.g., in concentrate or lyophilized form).
- a buffer may be a reaction or storage buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof.
- the buffer is alkaline.
- the buffer has a pH from about 7 to about 10.
- the kit further comprises a pharmaceutically acceptable carrier.
- the kit further comprises one or more devices or other materials for administration to a subject.
- V. Embodiments [0232] in embodiment 1 provided herein is a composition comprising a synthetic guide nucleic acid (gNA) comprising: (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the targeter stem sequence, and (b) a 5’ sequence; wherein the targeter stem sequence and the modulator stem sequence each comprise 4-10 nucleotides that base pair with each other, and the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex.
- embodiment 2 provided herein is the composition of embodiment 1, wherein the targeter stem sequence and the modulator stem sequence each comprise 4-6 nucleotides that base pair with each other.
- embodiment 3 provided herein is the composition of embodiment 2, wherein the targeter stem sequence and the modulator stem sequence each comprise five nucleotides that base pair with each other.
- embodiment 4 provided herein is the composition of embodiment 3, wherein (1) the targeter nucleic acid comprises an additional nucleotide sequence 5’ to the targeter stem sequence comprising an additional at least 2 nucleotides, and (2) the modulator nucleic acid comprises an additional nucleotide sequence 3’ to the modulator stem sequence comprising an additional at least 2 nucleotides.
- embodiment 5 provided herein is the composition of embodiment 2, wherein the targeter stem sequence and the modulator stem sequence each comprise four nucleotides that base pair with each other.
- embodiment 6 provided herein is the composition of any one of the preceding embodiments, wherein the targeter nucleic acid comprises an additional nucleotide sequence 5’ to the targeter stem sequence comprising an additional at least two nucleotides.
- embodiment 7 provided herein is the composition of any one of the preceding embodiments, wherein the targeter stem sequence and the modulator stem sequence share at least 80% sequence complementarity.
- embodiment 8 provided herein is the composition of any one of the preceding embodiments, wherein at least 40% of the base pairs in the stem are C-G base pairs.
- embodiment 9 provided herein is the composition of any one of the preceding embodiments, wherein the targeter and modulator nucleic acids comprise a single polynucleotide.
- embodiment 10 provided herein is the composition of any one of embodiments 1-8, wherein the targeter and modulator nucleic acids are separate polynucleotides.
- embodiment 11 provided herein is the composition of embodiment 1, wherein the targeter nucleic acid or the modulator nucleic acid, or both, comprise one or more modified nucleotides at or near its 3’ end, if present, at or near its 5’ end, if present, or both.
- embodiment 12 provided herein is the composition of embodiment 11, wherein the modulator nucleic acid comprises at least one modified nucleotide and at least two modified internucleotide linkages within the first five nucleotides from the 5’ end.
- embodiment 13 provided herein is the composition of any one of the preceding embodiments, further comprising a Type V nucleic acid-guided nuclease complexed with the gNA.
- embodiment 14 provided herein is the composition of embodiment 13, wherein the Type V nucleic acid-guided nuclease is at least 80% identical to an ABW, ART, or MAD nuclease.
- modulator nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 6.
- embodiment 16 provided herein is the composition of any one of the preceding embodiments, wherein the targeter nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 7.
- a method of editing a genome of a eukaryotic cell comprising (I) delivering to the eukaryotic cell (A) one or more synthetic guide nucleic acids (gNA), or polynucleotides encoding the one or more gNAs, comprising (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the targeter stem sequence, and (b) a 5’ sequence; wherein the targeter stem sequence and the modulator stem sequence each comprise 4-10 nucleotides that base pair with each other, and the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex; (B) one or more Type V nucleic acid-guided nucleases, or polynucleo
- embodiment 18 provided herein is the method of embodiment 17, further comprising treating the eukaryotic cell with an HDR enhancer.
- the HDR enhancer comprises a DNA-PK antagonist, preferably M3814.
- embodiment 20 provided herein is the method of any one of embodiments 17-19, wherein the method comprises delivering at least two gNAs, or polynucleotides encoding the gNAs, wherein each gNA comprises a different spacer sequence such that when complexed with a nucleic acid-guided nuclease, the nucleic acid-guided nuclease complexes form strand breaks in the genome at or near each of the target nucleotide sequences.
- compositions comprising a synthetic guide nucleic acid (gNA) comprising (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the targeter stem sequence, and (b) a 5’ sequence; wherein (1) the targeter nucleic acid and modulator nucleic acids are separate polynucleotides, (2) the predicted minimum free energy of the targeter stem sequence and the modulator stem sequence as determined by the RNAcofold WebServer is between -10 and -4 kcal/mol, and (3) the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex.
- embodiment 22 provided herein is the composition of embodiment 21, wherein the predicted minimum free energy of the targeter stem sequence and the modulator stem sequence as determined by the RNAcofold WebServer is between -7 and -4 kcal/mol.
- embodiment 23 provided herein is the composition of embodiment 21 or 22, wherein the targeter stem sequence and the modulator stem sequence each comprise 4-6 nucleotides that base pair with each other.
- embodiment 24 provided herein is the composition of any one of embodiments 21-23, wherein the targeter stem sequence and the modulator stem sequence share at least 80% sequence complementarity.
- embodiment 25 provided herein is the composition of any one of embodiments 21-24, wherein at least 40% of the base pairs in the stem are C-G base pairs.
- embodiment 26 provided herein is the composition of any one of embodiments 21-25, wherein the targeter and modulator nucleic acids comprise a single polynucleotide.
- embodiment 27 provided herein is the composition of any one of embodiments 21-25, wherein the targeter and modulator nucleic acids are separate polynucleotides.
- embodiment 28 provided herein is the composition of any one of embodiments 21-27, wherein the targeter nucleic acid or the modulator nucleic acid, or both, comprise one or more modified nucleotides at or near its 3’ end, if present, at or near its 5’ end, if present, or both.
- embodiment 29 provided herein is the composition of any one of embodiments 21-28, wherein the modulator nucleic acid comprises at least one modified nucleotide and at least two modified internucleotide linkages within the first five nucleotides from the 5’ end.
- embodiment 30 provided herein is the composition of any one of embodiments 21-29, further comprising a Type V nucleic acid-guided nuclease.
- embodiment 31 provided herein is the composition of embodiment 30, wherein the Type V nucleic acid-guided nuclease is at least 80% identical to an ABW, ART, or MAD nuclease.
- embodiment 32 provided herein is the composition of any one of embodiments 21-31, wherein the modulator nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 6.
- embodiment 33 provided herein is the composition of any one of embodiments 21-32, wherein the targeter nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 7.
- a method of editing a genome of a eukaryotic cell comprising (I) delivering to the eukaryotic cell (A) one or more synthetic guide nucleic acids (gNA), or polynucleotides encoding the one or more gNAs, comprising (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the targeter stem sequence, and (b) a 5’ sequence; wherein (1) the targeter nucleic acid and modulator nucleic acids are separate polynucleotides, (2) the predicted minimum free energy of the targeter stem sequence and the modulator stem sequence as determined by the RNAcofold WebServer is between -10 and -4 kcal/mol, and (3) the gNA is capable of binding to and forming
- embodiment 35 provided herein is the method of embodiment 34, further comprising treating the eukaryotic cell with an HDR enhancer.
- the HDR enhancer comprises a DNA-PK antagonist, preferably M3814.
- embodiment 37 provided herein is the method of any one of embodiments 34-36, wherein the method comprises delivering at least two gNAs, or polynucleotides encoding the gNAs, wherein each gNA comprises a different spacer sequence such that when complexed with a nucleic acid-guided nuclease, the nucleic acid-guided nuclease complexes form strand breaks in the genome at or near each of the target nucleotide sequences.
- compositions are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are compositions of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps.
- an element or component is said to be included in and/or selected from a list of recited elements or components, it should be understood that the element or component can be any one of the recited elements or components, or the element or component can be selected from a group consisting of two or more of the recited elements or components.
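The stem criteria recited in the embodiments above (4-10 base-paired nucleotides, at least 80% complementarity between the targeter and modulator stem sequences, and at least 40% C-G base pairs) can be illustrated with a short sketch. The function names and the example stem sequences are hypothetical and chosen only for illustration. The sketch assumes equal-length stems without bulges, counts only Watson-Crick pairs, and does not compute the predicted minimum free energy, which the embodiments determine with the RNAcofold WebServer.

```python
# Watson-Crick RNA pairs only; G-U wobble pairs are not counted (an assumption of this sketch).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_metrics(targeter_stem: str, modulator_stem: str) -> dict:
    """Score an antiparallel duplex between two stem sequences, both written 5'->3'."""
    t, m = targeter_stem.upper(), modulator_stem.upper()
    if not t or len(t) != len(m):
        raise ValueError("sketch assumes non-empty stems of equal length, no bulges")
    # Position i of the targeter stem faces position -(i + 1) of the modulator stem.
    pairs = list(zip(t, reversed(m)))
    paired = [p for p in pairs if p in WATSON_CRICK]
    gc_pairs = [p for p in paired if set(p) == {"G", "C"}]
    return {
        "base_pairs": len(paired),
        "complementarity": len(paired) / len(pairs),
        "gc_fraction": len(gc_pairs) / len(paired) if paired else 0.0,
    }


def meets_stem_criteria(targeter_stem: str, modulator_stem: str,
                        min_bp: int = 4, max_bp: int = 10,
                        min_complementarity: float = 0.8,
                        min_gc_fraction: float = 0.4) -> bool:
    """Apply the length, complementarity, and C-G thresholds to one stem pair."""
    s = stem_metrics(targeter_stem, modulator_stem)
    return (min_bp <= s["base_pairs"] <= max_bp
            and s["complementarity"] >= min_complementarity
            and s["gc_fraction"] >= min_gc_fraction)


if __name__ == "__main__":
    # Hypothetical five-nucleotide stems, for illustration only.
    print(stem_metrics("GUAGA", "UCUAC"))
    print(meets_stem_criteria("GUAGA", "UCUAC"))
```

The thresholds are exposed as keyword arguments so that the narrower ranges (for example, 4-6 base-paired nucleotides) can be checked with the same function.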
Abstract
Provided herein are nucleic acids useful as guide nucleic acids (gNAs), e.g., guide ribonucleic acids (gRNAs), in a CRISPR system wherein the guide nucleic acids contain one or more modifications to one or more nucleotides, use of such guide nucleic acids in modifying cells, and other uses wherein CRISPR Cas proteins are utilized.
Description
COMPOSITIONS AND METHODS FOR TARGETING, EDITING, OR MODIFYING GENES
STATEMENT AS TO FEDERALLY FUNDED RESEARCH
[0001] None
REFERENCE TO RELATED APPLICATIONS
[0002] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/415,539, filed October 12, 2022, the disclosure of which is hereby incorporated by reference in its entirety for all purposes.
BACKGROUND
[0003] Recent advances have been made in precise genome targeting technologies. For example, specific loci in genomic DNA can be targeted, edited, or otherwise modified by designer meganucleases, zinc finger nucleases, or transcription activator-like effectors (TALEs). Furthermore, the CRISPR-Cas systems of bacterial and archaeal adaptive immunity have been adapted for precise targeting of genomic DNA in eukaryotic cells. Compared to the earlier generations of genome editing tools, the CRISPR-Cas systems are easy to set up, scalable, and amenable to targeting multiple positions within the eukaryotic genome, thereby providing a major resource for new applications in genome engineering.
[0004] Two distinct classes of CRISPR-Cas systems have been identified. Class 1 CRISPR-Cas systems utilize multi-protein effector complexes, whereas class 2 CRISPR-Cas systems utilize single-protein effectors (see, Makarova et al. (2017) CELL, 168: 328). Among the three types of class 2 CRISPR-Cas systems, type II and type V systems typically target DNA and type VI systems typically target RNA (id.). Naturally occurring type II effector complexes consist of Cas9, CRISPR RNA (crRNA), and trans-activating CRISPR RNA (tracrRNA), but the crRNA and tracrRNA can be fused as a single guide RNA in an engineered system for simplicity (see, Wang et al. (2016) ANNU. REV. BIOCHEM., 85: 227). Certain naturally occurring type V systems, such as type V-A, type V-C, and type V-D systems, do not require tracrRNA and use crRNA alone as the guide for cleavage of target DNA (see, Zetsche et al. (2015) CELL, 163: 759; Makarova et al. (2017) CELL, 168: 328).
[0005] The CRISPR-Cas systems have been engineered for various purposes, such as genomic DNA cleavage, base editing, epigenome editing, and genomic imaging (see, e.g., Wang et al. (2016) ANNU. REV. BIOCHEM., 85: 227 and Rees et al. (2018) NAT. REV. GENET., 19: 770). Although significant developments have been made, there still remains a need for new and useful CRISPR-Cas systems as powerful precise genome targeting tools.
INCORPORATION BY REFERENCE
[0006] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
[0008] Figure 1A is a schematic representation showing the structure of an exemplary single guide Type V-A CRISPR system. Figure 1B is a schematic representation showing the structure of an exemplary dual guide Type V-A CRISPR system.
[0009] Figures 2A-C show a series of schematic representations showing incorporation of a protecting group (e.g., a protective nucleotide sequence or a chemical modification) (Figure 2A), a donor template-recruiting sequence (Figure 2B), and an editing enhancer (Figure 2C) into a Type V-A CRISPR-Cas system. These additional elements are shown in the context of a dual guide Type V-A CRISPR system, but it is understood that they can also be present in other CRISPR systems, including a single guide Type V-A CRISPR system, a single guide Type II CRISPR system, or a dual guide Type II CRISPR system.
[0010] Figure 3 shows a schematic of a Type V-A nucleic acid-guided nuclease comprising a dual guide nucleic acid.
DETAILED DESCRIPTION
Outline
I. Engineered non-naturally-occurring dual guide CRISPR-cas systems
A. Cas proteins
B. Guide nucleic acids
C. gNA modifications
II. Composition and methods for targeting, editing, and/or modifying genomic DNA
A. Ribonucleoprotein (RNP) delivery and “cas RNA” delivery
B. CRISPR expression systems
C. Donor templates
D. Efficiency and specificity
E. Multiplex
F. Genomic safe harbors
G. Guide nucleic acids
III. Pharmaceutical compositions
IV. Therapeutic uses
A. Gene therapies
V. Kits
VI. Embodiments
VI. Equivalents
I. Engineered non-naturally-occurring dual guide CRISPR-cas systems
[0011] A CRISPR-Cas system generally comprises a Cas protein and one or more guide nucleic acids (gNAs). The Cas protein can be directed to a specific location in a double-stranded DNA target by recognizing a protospacer adjacent motif (PAM) in the non-target strand of the DNA, and the one or more guide nucleic acids can be directed to a specific location by hybridizing with a target nucleotide sequence, also referred to herein as a target sequence, in the target strand of the target polynucleotide. Typically, both PAM recognition and target nucleotide sequence hybridization are required for stable binding of a CRISPR-Cas complex to the DNA target and, if the Cas protein has an effector function (e.g., nuclease activity), activation of the effector function. As a result, when creating a CRISPR-Cas system, a guide nucleic acid can be designed to comprise a nucleotide sequence called a spacer sequence that is at least partially complementary to and can hybridize with a target nucleotide sequence, where the target nucleotide sequence is located adjacent to a PAM in an orientation operable with the Cas protein. It has been observed that not all CRISPR-Cas systems designed by these criteria are equally effective. The larger polynucleotide in which a target nucleotide sequence is located may be referred to as a target polynucleotide; e.g., a chromosome or other genomic DNA, or portion thereof, or any other suitable polynucleotide within which a target nucleotide sequence is located. The target polynucleotide in double stranded DNA comprises two strands. The strand of the DNA duplex to which the spacer sequence is complementary herein is called the “target strand,” while the strand to which the spacer sequence shares sequence identity herein is called the “non-target strand.”
[0012] Two distinct classes of CRISPR-Cas systems have been identified. Class 1 CRISPR-Cas systems utilize multi-protein effector complexes, whereas class 2 CRISPR-Cas systems utilize single-protein effectors (see, Makarova et al. (2017) CELL, 168: 328). Among the types of class 2 CRISPR-Cas systems, type II and type V systems typically target DNA and type VI systems typically target RNA (id.). Naturally occurring type II effector complexes include Cas9, CRISPR RNA (crRNA), and trans-activating CRISPR RNA (tracrRNA), but the crRNA and tracrRNA can be fused as a single guide RNA in an engineered system for simplicity (see, Wang et al. (2016) ANNU. REV. BIOCHEM., 85: 227). Certain naturally occurring type V systems, such as type V-A, type V-C, and type V-D systems, do not require tracrRNA and use crRNA alone as the guide for cleavage of target DNA (see, Zetsche et al. (2015) CELL, 163: 759; Makarova et al. (2017) CELL, 168: 328).
[0013] Naturally occurring type II CRISPR-Cas systems (e.g., CRISPR-Cas9 systems) generally comprise two guide nucleic acids, called crRNA and tracrRNA, which form a complex by nucleotide hybridization. Single guide nucleic acids capable of activating type II Cas nucleases have been developed, for example, by linking the crRNA and the tracrRNA (see, e.g., U.S. Patent Nos. 10,266,850 and 8,906,616). Naturally occurring type II Cas proteins comprise a RuvC-like nuclease domain and an HNH endonuclease domain, and recognize a 3’ G-rich PAM located immediately downstream from the target nucleotide sequence, the orientation determined using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate. The CRISPR-Cas systems cleave a double-stranded DNA to generate a blunt end. The cleavage site is generally 3-4 nucleotides upstream from the PAM on the non-target strand.
[0014] Naturally occurring Type V-A, Type V-C, and Type V-D CRISPR-Cas systems lack a tracrRNA and rely on a single crRNA to guide the CRISPR-Cas complex to the target polynucleotide. Dual guide nucleic acids capable of activating type V-A, type V-C, or type V-D Cas nucleases have been developed, for example, by splitting the single crRNA into a targeter nucleic acid and a modulator nucleic acid (see, e.g., International (PCT) Application Publication No. WO 2021/067788). Naturally occurring type V-A Cas proteins comprise a RuvC-like nuclease domain but lack an HNH endonuclease domain, and recognize a 5’ T-rich PAM located immediately upstream from the target nucleotide sequence, the orientation determined using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate. These CRISPR-Cas systems cleave a double-stranded DNA to generate a staggered double-stranded break rather than a blunt end. The cleavage site is distant from the PAM site (e.g., separated by at least 10, 11, 12, 13, 14, or 15 nucleotides downstream from the PAM on the non-target strand and/or separated by at least 15, 16, 17, 18, or 19 nucleotides upstream from the sequence complementary to PAM on the target strand).
[0015] Elements in an exemplary single guide CRISPR Cas system, e.g., a type V-A CRISPR-Cas system, are shown in Figure 1A. The single gNA can also be called a “crRNA” or “single gRNA” where it is present in the form of an RNA. It can comprise, from 5’ to 3’, an optional 5’ sequence, e.g., a tail, a modulator stem sequence, a loop, a targeter stem sequence complementary to the modulator stem sequence, and a spacer sequence that is at least partially complementary to and can hybridize with a target sequence in the target strand of the target polynucleotide. Where a 5’ tail is present, the sequence including the 5’ tail and the modulator stem sequence can also be called a “modulator sequence” herein. A fragment of the single guide nucleic acid from the optional 5’ tail to the targeter stem sequence, also called a “scaffold sequence” herein, binds the Cas protein. In addition, the PAM in the non-target strand of the target DNA binds the Cas protein.
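As a rough illustration of the type V-A geometry described above, the following sketch scans a non-target strand for a 5’ T-rich PAM, takes the sequence immediately downstream as the protospacer, and assembles a single guide RNA in the 5’ to 3’ order tail, modulator stem, loop, targeter stem, spacer. The PAM pattern `TTT[ACG]`, the 20-nucleotide spacer length, the function names, and all example sequences are assumptions made for this sketch and are not taken from this disclosure. Because the spacer shares sequence identity with the non-target strand, the spacer RNA is obtained here by replacing T with U in the protospacer.

```python
import re


def find_type_va_sites(non_target_strand: str,
                       pam_regex: str = "TTT[ACG]",  # assumed T-rich PAM pattern
                       spacer_len: int = 20):        # assumed spacer length
    """Return (pam_start, pam, protospacer) for each PAM followed by a full-length protospacer."""
    seq = non_target_strand.upper()
    sites = []
    # A lookahead is used so that overlapping PAM matches are all reported.
    for match in re.finditer(f"(?=({pam_regex}))", seq):
        pam = match.group(1)
        start = match.start() + len(pam)  # protospacer begins immediately 3' of the PAM
        protospacer = seq[start:start + spacer_len]
        if len(protospacer) == spacer_len:
            sites.append((match.start(), pam, protospacer))
    return sites


def assemble_single_guide(protospacer_dna: str, modulator_stem: str, loop: str,
                          targeter_stem: str, five_prime_tail: str = "") -> str:
    """Concatenate guide elements 5'->3': tail, modulator stem, loop, targeter stem, spacer."""
    spacer_rna = protospacer_dna.upper().replace("T", "U")
    return five_prime_tail + modulator_stem + loop + targeter_stem + spacer_rna


if __name__ == "__main__":
    # Hypothetical non-target strand and scaffold elements, for illustration only.
    strand = "GGCATTTCACGTACGTACGTACGTACGTGGA"
    for pam_start, pam, protospacer in find_type_va_sites(strand):
        guide = assemble_single_guide(protospacer, "UCUAC", "UCUU", "GUAGA",
                                      five_prime_tail="UAAUU")
        print(pam_start, pam, protospacer, guide)
```

For a dual guide, the same elements would be split into two molecules: the tail plus modulator stem as the modulator nucleic acid, and the targeter stem plus spacer as the targeter nucleic acid, with no loop joining them.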
[0016] Elements in an exemplary dual guide type CRISPR Cas system, e.g., a dual guide type V-A CRISPR-Cas system are shown in Figure 1B. The first guide nucleic acid, which can be called a “modulator nucleic acid” herein, comprises, from 5’ to 3’, an optional 5’ tail and a modulator stem sequence. Where a 5’ tail is present, the sequence including the 5’ tail and the modulator stem sequence can also called a “modulator sequence” herein. The second guide nucleic acid, which can be called “targeter nucleic acid” herein, comprises, from 5’ to 3’, a targeter stem sequence complementary to the modulator stem sequence and a spacer sequence that is at least partially complementary to and can hybridize with the target sequence in the target strand of the target polynucleotide. The duplex between the modulator stem sequence and the targeter stem sequence, plus the optional 5’ tail, constitute a structure that binds the Cas protein. In addition, the PAM in the non-target strand of the target DNA binds the Cas protein. It is understood that, in a dual gNA, e.g., dual gRNA, the targeter nucleic acid and the modulator nucleic acid, while not in the same nucleic acids, i.e., not linked end-to-end through a traditional internucleotide bond, can be covalently conjugated to each other through one or more chemical modifications introduced into these nucleic acids, thereby increasing the stability of the double- stranded complex and/or improving other characteristics of the system. [0017] The terms “targeter stem sequence” and “modulator stem sequence,” as used herein, can refer to a pair of nucleotide sequences in one or more guide nucleic acids that hybridize with each other. When a targeter stem sequence and a modulator stem sequence are contained in a single guide nucleic acid, the targeter stem sequence is proximal to a spacer sequence designed to hybridize with a target nucleotide sequence, and the modulator stem sequence is proximal to the targeter stem sequence. 
When a targeter stem sequence and a modulator stem sequence are in separate nucleic acids, the targeter stem sequence is in the same nucleic acid as a spacer sequence designed to hybridize with a target nucleotide sequence. In a CRISPR-Cas system that naturally includes separate crRNA and tracrRNA (e.g., a type II system), the duplex formed between the targeter stem sequence and the modulator stem sequence corresponds to the duplex formed between the crRNA and the tracrRNA. In a CRISPR-Cas system that naturally includes a single crRNA but no tracrRNA (e.g., a type V-A system), the duplex formed between the targeter stem sequence and the modulator stem sequence corresponds to the stem portion of a stem-loop structure in the scaffold sequence of the crRNA. It is understood that 100% complementarity is not required between the targeter stem sequence and the modulator stem sequence. In a type V-A CRISPR-Cas system, however, the targeter stem sequence is typically 100% complementary to the modulator stem sequence. [0018] An illustrative example of a nucleic acid-guided nuclease complex is shown in Figure 3. Specifically, Figure 3 shows a Type V-A nucleic acid guided nuclease (301) complexed with
a dual gNA comprising a modulator nucleic acid (306) and a targeter nucleic acid (307), wherein the modulator nucleic acid and targeter nucleic acid are hybridized through a stem. The targeter nucleic acid further comprises a spacer sequence (305) at least partially complementary to a target nucleotide sequence (304), i.e., a protospacer, in a target polynucleotide (302) adjacent to a suitable PAM (303). Upon binding to the target nucleotide sequence, the nucleic acid-guided nuclease complex can generate one or more strand breaks (308) in the target polynucleotide at or near the target nucleotide sequence.
A. Cas proteins
[0019] A guide nucleic acid, either as a single guide nucleic acid alone (targeter and modulator nucleic acids are part of a single polynucleotide) or as a dual gNA comprising a separate targeter nucleic acid used in combination with a cognate modulator nucleic acid, is capable of binding a CRISPR-Associated (Cas) protein, e.g., a Cas nuclease. In certain embodiments, the guide nucleic acid, either as a single guide nucleic acid alone (targeter and modulator nucleic acids are part of a single polynucleotide) or as a dual gNA comprising a separate targeter nucleic acid used in combination with a cognate modulator nucleic acid, is capable of activating a Cas nuclease. A gNA capable of activating a particular Cas nuclease is said to be “compatible” with the Cas nuclease; a Cas nuclease capable of being activated by a particular gNA is said to be “compatible” with the gNA.
[0020] The terms “CRISPR-Associated protein,” “Cas protein,” and “Cas,” as used interchangeably herein, can refer to a naturally occurring Cas protein or an engineered Cas protein. 
Non-limiting examples of Cas protein engineering include mutations and modifications of the Cas protein that alter the activity of the Cas, alter the PAM specificity, broaden the range of recognized PAMs, and/or reduce the ability to modify one or more off-target loci as compared to a corresponding unmodified Cas. In certain embodiments, the altered activity of an engineered Cas comprises altered ability (e.g., specificity or kinetics) to bind a naturally occurring gNA, e.g., gRNA, or an engineered gNA, e.g., gRNA, altered ability (e.g., specificity or kinetics) to bind a target nucleotide sequence, altered processivity of nucleic acid scanning, and/or altered effector (e.g., nuclease) activity. A Cas protein having nuclease activity can be referred to as a “CRISPR-Associated nuclease” or “Cas nuclease,” or simply “nuclease,” as used interchangeably herein.
[0021] In certain embodiments, the Cas protein is a type V-A, type V-C, or type V-D Cas protein. In certain embodiments, the Cas protein is a type V-A Cas protein. In other embodiments, the Cas protein is a type II Cas protein, e.g., a Cas9 protein.
[0022] In certain embodiments, a type V-A Cas nuclease comprises Cpf1. Cpf1 proteins are known in the art and are described, e.g., in U.S. Patent Nos. 9,790,490 and 10,113,179. Cpf1 orthologs can be found in various bacterial and archaeal genomes. For example, in certain embodiments, the Cpf1 protein is derived from Francisella novicida U112 (Fn), Acidaminococcus sp. BV3L6 (As), Lachnospiraceae bacterium ND2006 (Lb), Lachnospiraceae bacterium MA2020 (Lb2), Candidatus Methanoplasma termitum (CMt), Moraxella bovoculi 237 (Mb), Porphyromonas crevioricanis (Pc), Prevotella disiens (Pd), Francisella tularensis 1, Francisella tularensis subsp. novicida, Prevotella albensis, Lachnospiraceae bacterium MC2017 1, Butyrivibrio proteoclasticus, Peregrinibacteria bacterium GW2011_GWA2_33_10, Parcubacteria bacterium GW2011_GWC2_44_17, Smithella sp. SCADC, Eubacterium eligens, Leptospira inadai, Porphyromonas macacae, Prevotella bryantii, Proteocatella sphenisci, Anaerovibrio sp. RM50, Moraxella caprae, Lachnospiraceae bacterium COE1, or Eubacterium coprostanoligenes.
[0023] In certain embodiments, a type V-A Cas nuclease comprises AsCpf1 or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 3 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 3 of International (PCT) Application Publication No. WO 2021/158918.
[0024] In certain embodiments, a type V-A Cas nuclease comprises LbCpf1 or a variant thereof. 
In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 4 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 4 of International (PCT) Application Publication No. WO 2021/158918.
[0025] In certain embodiments, a type V-A Cas nuclease comprises FnCpf1 or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 5 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 5 of International (PCT) Application Publication No. WO 2021/158918.
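The percent-identity thresholds recited in the paragraphs above can be illustrated with a short Python sketch. It assumes the two amino acid sequences have already been aligned (gaps written as "-") and counts identical residues over alignment columns in which at least one sequence has a residue; this denominator is one common convention and is an assumption here, since this passage does not prescribe a particular alignment algorithm or identity formula. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1  # identical residues; a gap against a residue never matches
    return 100.0 * matches / compared if compared else 0.0


def meets_identity_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the aligned pair is at least threshold_percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# First ten residues of the MAD7 sequence versus a hypothetical variant with
# one substitution (T5S); the variant is an illustrative assumption.
reference = "MNNGTNNFQN"
variant = "MNNGSNNFQN"
print(percent_identity(reference, variant))
print(meets_identity_threshold(reference, variant, 85))
```

For full-length proteins the alignment itself would normally be produced by an established alignment tool before a calculation of this kind is applied.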
[0026] In certain embodiments, a type V-A Cas nuclease comprises Prevotella bryantii Cpf1 (PbCpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 6 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 6 of International (PCT) Application Publication No. WO 2021/158918.
[0027] In certain embodiments, a type V-A Cas nuclease comprises Proteocatella sphenisci Cpf1 (PsCpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 7 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 7 of International (PCT) Application Publication No. WO 2021/158918.
[0028] In certain embodiments, a type V-A Cas nuclease comprises Anaerovibrio sp. RM50 Cpf1 (As2Cpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 8 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 8 of International (PCT) Application Publication No. WO 2021/158918.
[0029] In certain embodiments, a type V-A Cas nuclease comprises Moraxella caprae Cpf1 (McCpf1) or a variant thereof. 
In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 9 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 9 of International (PCT) Application Publication No. WO 2021/158918.
[0030] In certain embodiments, a type V-A Cas nuclease comprises Lachnospiraceae bacterium COE1 Cpf1 (Lb3Cpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 10 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid
sequence set forth in SEQ ID NO: 10 of International (PCT) Application Publication No. WO 2021/158918.
[0031] In certain embodiments, a type V-A Cas nuclease comprises Eubacterium coprostanoligenes Cpf1 (EcCpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 11 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 11 of International (PCT) Application Publication No. WO 2021/158918.
[0032] In certain embodiments, a type V-A Cas nuclease is not Cpf1. In certain embodiments, a type V-A Cas nuclease is not AsCpf1.
[0033] In certain embodiments, a type V-A Cas nuclease comprises MAD1, MAD2, MAD3, MAD4, MAD5, MAD6, MAD7, MAD8, MAD9, MAD10, MAD11, MAD12, MAD13, MAD14, MAD15, MAD16, MAD17, MAD18, MAD19, or MAD20, or variants thereof. MAD1-MAD20 are known in the art and are described in U.S. Patent No. 9,982,279.
[0034] In certain embodiments, a type V-A Cas nuclease comprises MAD7 or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 37. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 37. 
[0035] MAD7 (SEQ ID NO: 37) MNNGTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKEDELRGENRQILKDIMDDYYRGF ISETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQTEYRKAIHKKFANDDRFKNMFSAKLISD ILPEFVIHNNNYSASEKEEKTQVIKLFSRFATSFKDYFKNRANCFSADDISSSSCHRIVNDNAEI FFSNALVYRRIVKSLSNDDINKISGDMKDSLKEMSLEEIYSYEKYGEFITQEGISFYNDICGKVN SFMNLYCQKNKENKNLYKLQKLHKQILCIADTSYEVPYKFESDEEVYQSVNGFLDNISSKHIVER LRKIGDNYNGYNLDKIYIVSKFYESVSQKTYRDWETINTALEIHYNNILPGNGKSKADKVKKAVK NDLQKSITEINELVSNYKLCSDDNIKAETYIHEISHILNNFEAQELKYNPEIHLVESELKASELK NVLDVIMNAFHWCSVFMTEELVDKDNNFYAELEEIYDEIYPVISLYNLVRNYVTQKPYSTKKIKL NFGIPTLADGWSKSKEYSNNAIILMRDNLYYLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLL PGPNKMIPKVFLSSKTGVETYKPSAYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAIHPEW KNFGFDFSDTSTYEDISGFYREVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDFSKKSTG NDNLHTMYLKNLFSEENLKDIVLKLNGEAEIFFRKSSIKNPIIHKKGSILVNRTYEAEEKDQFGN
IQIVRKNIPENIYQELYKYFNDKSDKELSDEAAKLKNVVGHHEAATNIVKDYRYTYDKYFLHMPI TINFKANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYVSVIDTCGNIVEQKSFNIVNGYDYQ IKLKQQEGARQIARKEWKEIGKIKEIKEGYLSLVIHEISKMVIKYNAIIAMEDLSYGFKKGRFKV ERQVYQKFETMLINKLNYLVFKDISITENGGLLKGYQLTYIPDKLKNVGHQCGCIFYVPAAYTSK IDPTTGFVNIFKFKDLTVDAKREFIKKFDSIRYDSEKNLFCFTFDYNNFITQNTVMSKSSWSVYT YGVRIKRRFVNGRFSNESDTIDITKDMEKTLEMTDINWRDGHDLRQDIIDYEIVQHIFEIFRLTV QMRNSLSELEDRDYDRLISPVLNENNIFYDSAKAGDALPKDADANGAYCIALKGLYEIKQITENW KEDGKFSRDKLKISNKDWFDFIQNKRYL [0036] In certain embodiments, a type V-A Cas nuclease comprises MAD2 or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 38. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 38. [0037] MAD2 (SEQ ID NO: 38) MSSLTKFTNKYSKQLTIKNELIPVGKTLENIKENGLIDGDEQLNENYQKAKIIVDDFLRDFINKA LNNTQIGNWRELADALNKEDEDNIEKLQDKIRGIIVSKFETFDLFSSYSIKKDEKIIDDDNDVEE EELDLGKKTSSFKYIFKKNLFKLVLPSYLKTTNQDKLKIISSFDNFSTYFRGFFENRKNIFTKKP ISTSIAYRIVHDNFPKFLDNIRCFNVWQTECPQLIVKADNYLKSKNVIAKDKSLANYFTVGAYDY FLSQNGIDFYNNIIGGLPAFAGHEKIQGLNEFINQECQKDSELKSKLKNRHAFKMAVLFKQILSD REKSFVIDEFESDAQVIDAVKNFYAEQCKDNNVIFNLLNLIKNIAFLSDDELDGIFIEGKYLSSV SQKLYSDWSKLRNDIEDSANSKQGNKELAKKIKTNKGDVEKAISKYEFSLSELNSIVHDNTKFSD LLSCTLHKVASEKLVKVNEGDWPKHLKNNEEKQKIKEPLDALLEIYNTLLIFNCKSFNKNGNFYV DYDRCINELSSVVYLYNKTRNYCTKKPYNTDKFKLNFNSPQLGEGFSKSKENDCLTLLFKKDDNY YVGIIRKGAKINFDDTQAIADNTDNCIFKMNYFLLKDAKKFIPKCSIQLKEVKAHFKKSEDDYIL SDKEKFASPLVIKKSTFLLATAHVKGKKGNIKKFQKEYSKENPTEYRNSLNEWIAFCKEFLKTYK AATIFDITTLKKAEEYADIVEFYKDVDNLCYKLEFCPIKTSFIENLIDNGDLYLFRINNKDFSSK STGTKNLHTLYLQAIFDERNLNNPTIMLNGGAELFYRKESIEQKNRITHKAGSILVNKVCKDGTS LDDKIRNEIYQYENKFIDTLSDEAKKVLPNVIKKEATHDITKDKRFTSDKFFFHCPLTINYKEGD TKQFNNEVLSFLRGNPDINIIGIDRGERNLIYVTVINQKGEILDSVSFNTVTNKSSKIEQTVDYE 
EKLAVREKERIEAKRSWDSISKIATLKEGYLSAIVHEICLLMIKHNAIVVLENLNAGFKRIRGGL SEKSVYQKFEKMLINKLNYFVSKKESDWNKPSGLLNGLQLSDQFESFEKLGIQSGFIFYVPAAYT SKIDPTTGFANVLNLSKVRNVDAIKSFFSNFNEISYSKKEALFKFSFDLDSLSKKGFSSFVKFSK SKWNVYTFGERIIKPKNKQGYREDKRINLTFEMKKLLNEYKVSFDLENNLIPNLTSANLKDTFWK
ELFFIFKTTLQLRNSVTNGKEDVLISPVKNAKGEFFVSGTHNKTLPQDCDANGAYHIALKGLMIL ERNNLVREEKDTKKIMAISNVDWFEYVQKRRGVL
[0038] In certain embodiments, a type V-A Cas nuclease comprises Csm1. Csm1 proteins are known in the art and are described in U.S. Patent No. 9,896,696. Csm1 orthologs can be found in various bacterial and archaeal genomes. For example, in certain embodiments, a Csm1 protein is derived from Smithella sp. SCADC (Sm), Sulfuricurvum sp. (Ss), or Microgenomates (Roizmanbacteria) bacterium (Mb).
[0039] In certain embodiments, a type V-A Cas nuclease comprises SmCsm1 or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 12 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 12 of International (PCT) Application Publication No. WO 2021/158918.
[0040] In certain embodiments, a type V-A Cas nuclease comprises SsCsm1 or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 13 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 13 of International (PCT) Application Publication No. WO 2021/158918.
[0041] In certain embodiments, a type V-A Cas nuclease comprises MbCsm1 or a variant thereof. 
In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 14 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 14 of International (PCT) Application Publication No. WO 2021/158918.
[0042] In certain embodiments, the type V-A Cas nuclease comprises an ART nuclease or a variant thereof. In general, such nuclease sequences have < 60% amino acid (AA) sequence similarity to Cas12a, < 60% AA sequence similarity to a positive control nuclease, and > 80% query cover. In certain embodiments, the type V-A nuclease comprises an ART1, ART2, ART3, ART4, ART5, ART6, ART7, ART8, ART9, ART10, ART11, ART12, ART13, ART14, ART15, ART16, ART17, ART18, ART19, ART20, ART21, ART22, ART23, ART24, ART25, ART26, ART27, ART28, ART29, ART30, ART31, ART32, ART33, ART34, ART35, or ART11* (i.e.,
ART11_L679F, i.e., ART11 wherein leucine (L) at amino acid position 679 is replaced with phenylalanine (F)) nuclease, as shown in Table 1. In certain embodiments, the type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence designated for the individual ART nuclease as shown in Table 1. In certain embodiments, provided is a nucleic acid-guided nuclease comprising a nucleic acid-guided nuclease polypeptide having at least 85% identity to an amino acid sequence represented by SEQ ID NOs: 1-36 or a nucleic acid encoding a nucleic acid-guided nuclease polypeptide comprising at least 85% identity with the polynucleotide represented by SEQ ID NOs: 1-36. In certain embodiments, provided is a nucleic acid-guided nuclease comprising a polypeptide having at least 90% identity to the amino acid sequence represented by SEQ ID NOs: 1-36, wherein the polypeptide does not contain a peptide motif of YLFQIYNKDF (SEQ ID NO: 39). In certain embodiments, provided is a nucleic acid-guided nuclease comprising a nucleic acid encoding a polypeptide having at least 90% identity to nucleic acids represented by SEQ ID NOs: 808-845 wherein an encoded polypeptide does not contain a peptide motif of YLFQIYNKDF (SEQ ID NO: 39). In certain embodiments, provided is a nucleic acid-guided nuclease wherein the polypeptide comprises at least 90% identity with the amino acid sequence represented by SEQ ID NOs: 1-9. In certain embodiments, provided is a nucleic acid-guided nuclease, wherein the polypeptide comprises a polypeptide comprising at least 90% identity with the amino acid sequence represented by SEQ ID NO: 2, 11, or 36. TABLE 1: ART nucleases
[0043] In certain embodiments, a Cas nuclease comprises ABW1 (SEQ ID NO: 3), ABW2 (SEQ ID NO: 16), ABW3 (SEQ ID NO: 29), ABW4 (SEQ ID NO: 42), ABW5 (SEQ ID NO: 55), ABW6 (SEQ ID NO: 68), ABW7 (SEQ ID NO: 81), ABW8 (SEQ ID NO: 94), or ABW9 (SEQ ID NO: 107) (all SEQ ID NOs for ABW1-9 and variants thereof from International (PCT) Application Publication No. WO 2021/108324), or variants thereof, such as any one of variants 1-10 of ABW1 (SEQ ID NOs: 4-13, respectively), any one of variants 1-10 of ABW2 (SEQ ID NOs: 17-26, respectively), any one of variants 1-10 of ABW3 (SEQ ID NOs: 30-39, respectively), any one of variants 1-10 of ABW4 (SEQ ID NOs: 43-52, respectively), any one of variants 1-10 of ABW5 (SEQ ID NOs: 56-65, respectively), any one of variants 1-10 of ABW6 (SEQ ID NOs: 69-78, respectively), any one of variants 1-10 of ABW7 (SEQ ID NOs: 82-91, respectively), any one of variants 1-10 of ABW8 (SEQ ID NOs: 95-104, respectively), or any one of variants 1-10 of ABW9 (SEQ ID NOs: 108-117, respectively). ABW1-ABW9 and variants thereof are known in the art and are described in International (PCT) Application Publication No. WO 2021/108324.
[0044] More type V-A Cas nucleases and their corresponding naturally occurring CRISPR-Cas systems can be identified by computational and experimental methods known in the art, e.g., as described in U.S. Patent No. 9,790,490 and Shmakov et al. (2015) MOL. CELL, 60: 385. Exemplary computational methods include analysis of putative Cas proteins by homology modeling, structural BLAST, PSI-BLAST, or HHPred, and analysis of putative CRISPR loci by identification of CRISPR arrays. Exemplary experimental methods include in vitro cleavage assays and in-cell nuclease assays (e.g., the Surveyor assay) as described in Zetsche et al. (2015) CELL, 163: 759. 
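To make the idea of identifying a CRISPR array concrete, the following Python sketch flags short sequences that recur at regular intervals, which is the defining layout of direct repeats separated by spacers. It is a deliberately naive illustration and not the method of the cited references: the function name, the exact-match requirement, and all parameter values (repeat length, spacer length range, minimum repeat count) are assumptions chosen so the toy example is easy to follow.

```python
from collections import defaultdict


def find_candidate_arrays(dna, repeat_len=8, min_spacer=4, max_spacer=12, min_repeats=3):
    """Return (repeat, start_positions) for exact repeats separated by spacer-sized gaps."""
    # Index every k-mer by the positions at which it starts.
    positions = defaultdict(list)
    for i in range(len(dna) - repeat_len + 1):
        positions[dna[i:i + repeat_len]].append(i)

    hits = []
    for kmer, starts in positions.items():
        run = [starts[0]]
        for start in starts[1:]:
            gap = start - (run[-1] + repeat_len)  # spacer length between consecutive copies
            if min_spacer <= gap <= max_spacer:
                run.append(start)
            else:
                if len(run) >= min_repeats:
                    hits.append((kmer, run))
                run = [start]
        if len(run) >= min_repeats:
            hits.append((kmer, run))
    return hits


# Toy locus: three copies of a hypothetical 8-nt repeat separated by two
# distinct 6-nt spacers (all sequences are illustrative assumptions).
REPEAT = "GTCTAAGA"
toy_locus = REPEAT + "ACGTAC" + REPEAT + "TGCATG" + REPEAT
print(find_candidate_arrays(toy_locus))
```

Real repeats are longer and may contain mismatches, so a practical search would tolerate inexact repeats and would examine the flanking sequence for open reading frames encoding putative Cas proteins.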
[0045] In certain embodiments, the Cas protein is a Cas nuclease that directs cleavage of one or both strands at the target locus, such as the target strand (i.e., the strand having the target nucleotide sequence that is at least partially complementary to and can hybridize with a single guide nucleic acid or dual guide nucleic acids) and/or the non-target strand. In certain embodiments, the Cas nuclease directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more nucleotides from the first or last nucleotide of the target nucleotide sequence or its complementary sequence. In certain embodiments, the
cleavage is staggered, i.e., generating sticky ends. In certain embodiments, the cleavage generates a staggered cut with a 5' overhang. In certain embodiments, the cleavage generates a staggered cut with a 5' overhang of 1 to 5 nucleotides, e.g., of 4 or 5 nucleotides. In certain embodiments, the cleavage site is distant from the PAM, e.g., the cleavage occurs after the 18th nucleotide on the non-target strand and after the 23rd nucleotide on the target strand.
[0046] In certain embodiments, a composition provided herein comprises a Cas nuclease that a compatible guide nucleic acid (gNA), e.g., a gRNA, is capable of activating. In certain embodiments, a composition provided herein further comprises a Cas protein that is related to the Cas nuclease that a compatible guide nucleic acid (gNA), e.g., a gRNA, is capable of activating. For example, in certain embodiments, a Cas protein comprises an amino acid sequence at least 80% (e.g., at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) identical to the Cas nuclease amino acid sequence. In certain embodiments, a Cas protein comprises a nuclease-inactive mutant of the Cas nuclease. In certain embodiments, a Cas protein further comprises an effector domain.
[0047] In certain embodiments, a Cas protein lacks substantially all DNA cleavage activity. Such a Cas protein can be generated, e.g., by introducing one or more mutations to an active Cas nuclease (e.g., a naturally occurring Cas nuclease). A mutated Cas protein is considered to lack substantially all DNA cleavage activity when the DNA cleavage activity of the protein is no more than about 25%, 10%, 5%, 1%, 0.1%, 0.01%, or less of the DNA cleavage activity of the corresponding non-mutated form, for example, nil or negligible as compared with the non-mutated form. 
Thus, a Cas protein may comprise one or more mutations (e.g., a mutation in the RuvC domain of a type V-A Cas protein) and be used as a generic DNA binding protein with or without fusion to an effector domain. Exemplary mutations include D908A, E993A, and D1263A with reference to the amino acid positions in AsCpf1; D832A, E925A, and D1180A with reference to the amino acid positions in LbCpf1; and D917A, E1006A, and D1255A with reference to the amino acid position numbering of the FnCpf1. More mutations can be designed and generated according to the crystal structure described in Yamano et al. (2016) CELL, 165: 949. [0048] It is understood that a Cas protein, rather than losing nuclease activity to cleave all DNA, may lose the ability to cleave only the target strand or only the non-target strand of a double-stranded DNA, thereby being functional as a nickase (see, Gao et al. (2016) CELL RES., 26: 901). Accordingly, in certain embodiments, a Cas nuclease is a Cas nickase. In certain embodiments, a Cas nuclease has the activity to cleave the non-target strand but lacks substantially the activity to cleave the target strand, e.g., by a mutation in the Nuc domain. In
certain embodiments, a Cas nuclease has the activity to cleave the target strand but lacks substantially the activity to cleave the non-target strand.
[0049] In certain embodiments, a Cas nuclease has the activity to cleave a double-stranded DNA and result in a double-strand break.
[0050] Cas proteins that lack substantially all DNA cleavage activity or have the ability to cleave only one strand may also be identified from naturally occurring systems. For example, certain naturally occurring CRISPR-Cas systems may retain the ability to bind the target nucleotide sequence but lose all or part of their DNA cleavage activity in eukaryotic (e.g., mammalian or human) cells. Such type V-A proteins are disclosed, for example, in Kim et al. (2017) ACS SYNTH. BIOL. 6(7): 1273-82 and Zhang et al. (2017) CELL DISCOV. 3: 17018.
[0051] The activity of a Cas protein (e.g., Cas nuclease) can be altered, e.g., by creating an engineered Cas protein. In certain embodiments, altered activity of an engineered Cas protein comprises increased targeting efficiency and/or decreased off-target binding. While not wishing to be bound by theory, it is hypothesized that off-target binding can be recognized by the Cas protein, for example, by the presence of one or more mismatches between the spacer sequence and the target nucleotide sequence, which may affect the stability and/or conformation of the CRISPR-Cas complex. In certain embodiments, altered activity comprises modified binding, e.g., increased binding to the target locus (e.g., the target strand or the non-target strand) and/or decreased binding to off-target loci. In certain embodiments, altered activity comprises altered charge in a region of the protein that associates with a single guide nucleic acid or dual guide nucleic acids. In certain embodiments, altered activity of an engineered Cas protein comprises altered charge in a region of the protein that associates with the target strand and/or the non-target strand. 
In certain embodiments, altered activity of an engineered Cas protein comprises altered charge in a region of the protein that associates with an off-target locus. The altered charge can include decreased positive charge, decreased negative charge, increased positive charge, or increased negative charge. For example, decreased negative charge and increased positive charge may generally strengthen binding to the nucleic acid(s) whereas decreased positive charge and increased negative charge may weaken binding to the nucleic acid(s). In certain embodiments, altered activity comprises increased or decreased steric hindrance between the protein and a single guide nucleic acid or dual guide nucleic acids. In certain embodiments, altered activity comprises increased or decreased steric hindrance between the protein and the target strand and/or the non-target strand. In certain embodiments, altered activity comprises increased or decreased steric hindrance between the protein and an off-target locus. In certain embodiments, a modification or mutation comprises one or more substitutions of Lys, His, Arg, Glu, Asp, Ser, Gly, and/or Thr. In certain embodiments, a modification or mutation comprises
one or more substitutions with Gly, Ala, Ile, Glu, and/or Asp. In certain embodiments, modification or mutation comprises one or more amino acid substitutions in the groove between the WED and RuvC domain of the Cas protein (e.g., a type V-A Cas protein). [0052] In certain embodiments, altered activity of an engineered Cas protein comprises increased nuclease activity to cleave the target locus. In certain embodiments, altered activity of an engineered Cas protein comprises decreased nuclease activity to cleave an off-target locus. In certain embodiments, altered activity of an engineered Cas protein comprises altered helicase kinetics. In certain embodiments, an engineered Cas protein comprises a modification that alters formation of the CRISPR complex. [0053] In certain embodiments, a protospacer adjacent motif (PAM) or PAM-like motif directs binding of a Cas protein complex to a target locus. Many Cas proteins have PAM specificity. The precise sequence and length requirements for the PAM differ depending on the Cas protein used. PAM sequences are typically 2-5 base pairs in length and are adjacent to (but located on a different strand of target DNA from) the target nucleotide sequence. PAM sequences can be identified using any suitable method, such as testing cleavage, targeting, or modification of oligonucleotides having the target nucleotide sequence and different PAM sequences. [0054] Exemplary PAM sequences are provided in Tables 2 and 3. In certain embodiments, a Cas protein comprises MAD7 and the PAM is TTTN, wherein N is A, C, G, or T. In certain embodiments, a Cas protein comprises MAD7 and the PAM is CTTN, wherein N is A, C, G, or T. In certain embodiments, a Cas protein comprises AsCpf1 and the PAM is TTTN, wherein N is A, C, G, or T. In certain embodiments, a Cas protein comprises FnCpf1 and the PAM is 5' TTN, wherein N is A, C, G, or T. PAM sequences for certain other type V-A Cas proteins are disclosed in Zetsche et al. 
(2015) CELL, 163: 759 and U.S. Patent No. 9,982,279. Further, engineering of the PAM Interacting (PI) domain of a Cas protein may allow programming of PAM specificity, improve target site recognition fidelity, and/or increase the versatility of an engineered, non-naturally occurring system. Exemplary approaches to alter the PAM specificity of Cpf1 are described in Gao et al. (2017) NAT. BIOTECHNOL., 35: 789.
[0055] In certain embodiments, an engineered Cas protein comprises a modification that alters the Cas protein specificity in concert with modification to targeting range. Cas mutants can be designed to have increased target specificity as well as accommodating modifications in PAM recognition, for example by choosing mutations that alter PAM specificity (e.g., in the PI domain) and combining those mutations with groove mutations that increase (or if desired, decrease) specificity for the on-target locus versus off-target loci. The Cas modifications described herein can be used to counter loss of specificity resulting from alteration of PAM
recognition, enhance gain of specificity resulting from alteration of PAM recognition, counter gain of specificity resulting from alteration of PAM recognition, or enhance loss of specificity resulting from alteration of PAM recognition.
[0056] In certain embodiments, an engineered Cas protein comprises one or more nuclear localization signal (NLS) motifs. In certain embodiments, an engineered Cas protein comprises at least 2 (e.g., at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motifs. Non-limiting examples of NLS motifs include: the NLS of SV40 large T-antigen, having the amino acid sequence of PKKKRKV (SEQ ID NO: 40); the NLS from nucleoplasmin, e.g., the nucleoplasmin bipartite NLS having the amino acid sequence of KRPAATKKAGQAKKKK (SEQ ID NO: 41); the c-myc NLS, having the amino acid sequence of PAAKRVKLD (SEQ ID NO: 42) or RQRRNELKRSP (SEQ ID NO: 43); the hRNPA1 M9 NLS, having the amino acid sequence of NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 44); the importin-α IBB domain NLS, having the amino acid sequence of RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 45); the myoma T protein NLS, having the amino acid sequence of VSRKRPRP (SEQ ID NO: 46) or PPKKARED (SEQ ID NO: 47); the human p53 NLS, having the amino acid sequence of PQPKKKPL (SEQ ID NO: 48); the mouse c-abl IV NLS, having the amino acid sequence of SALIKKKKKMAP (SEQ ID NO: 49); the influenza virus NS1 NLS, having the amino acid sequence of DRLRR (SEQ ID NO: 50) or PKQKKRK (SEQ ID NO: 51); the hepatitis virus antigen NLS, having the amino acid sequence of RKLKKKIKKL (SEQ ID NO: 52); the mouse Mx1 protein NLS, having the amino acid sequence of REKKKFLKRR (SEQ ID NO: 53); the human poly(ADP-ribose) polymerase NLS, having the amino acid sequence of KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 54); the human glucocorticoid receptor NLS, having the amino acid sequence of RKCLQAGMNLEARKTKK (SEQ ID NO: 55); and synthetic NLS motifs such as 
PAAKKKKLD (SEQ ID NO: 56). [0057] In general, the one or more NLS motifs are of sufficient strength to drive accumulation of the Cas protein in a detectable amount in the nucleus of a eukaryotic cell. The strength of nuclear localization activity may derive from the number of NLS motif(s) in the Cas protein, the particular NLS motif(s) used, the position(s) of the NLS motif(s), or a combination of these and/or other factors. In certain embodiments, an engineered Cas protein comprises at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motif(s) at or near the N-terminus (e.g., within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N-terminus). In certain embodiments, an engineered Cas protein comprises at least 1 (e.g., at least 2, at least 3, at least 4,
at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motif(s) at or near the C-terminus (e.g., within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the C-terminus). In certain embodiments, an engineered Cas protein comprises at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motif(s) at or near the C-terminus and at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10) NLS motif(s) at or near the N-terminus. In certain embodiments, the engineered Cas protein comprises one, two, or three NLS motifs at or near the C-terminus. In certain embodiments, the engineered Cas protein comprises one NLS motif at or near the N-terminus and one, two, or three NLS motifs at or near the C-terminus. In certain embodiments, the engineered Cas protein comprises a nucleoplasmin NLS at or near the C-terminus. [0058] Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to a nucleic acid-targeting protein, such that location within a cell may be visualized. Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting the protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay that detects the effect of the nuclear import of a Cas protein complex (e.g., assay for DNA cleavage or mutation at the target locus, or assay for altered gene expression activity) as compared to a control not exposed to the Cas protein or exposed to a Cas protein lacking one or more of the NLS motifs. [0059] A Cas protein may comprise a chimeric Cas protein, e.g., a Cas protein having enhanced function by being a chimera. 
Chimeric Cas proteins may be new Cas proteins containing fragments from more than one naturally occurring Cas protein or variants thereof. For example, fragments of multiple type V-A Cas homologs (e.g., orthologs) may be fused to form a chimeric Cas protein. In certain embodiments, a chimeric Cas protein comprises fragments of Cpf1 orthologs from multiple species and/or strains. [0060] In certain embodiments, a Cas protein comprises one or more effector domains. The one or more effector domains may be located at or near the N-terminus of the Cas protein and/or at or near the C-terminus of the Cas protein. In certain embodiments, an effector domain comprised in the Cas protein is a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or an SID domain), an exogenous nuclease domain (e.g., FokI), a deaminase domain (e.g., cytidine deaminase or adenine deaminase), or a reverse transcriptase domain (e.g., a high fidelity reverse transcriptase domain). Other activities of effector domains include but are not limited to methylase activity, demethylase activity, transcription release factor activity, translational initiation activity, translational activation
activity, translational repression activity, histone modification (e.g., acetylation or demethylation) activity, single-stranded RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, and nucleic acid binding activity. [0061] In certain embodiments, a Cas protein comprises one or more protein domains that enhance homology-directed repair (HDR) and/or inhibit non-homologous end joining (NHEJ). Exemplary protein domains having such functions are described in Jayavaradhan et al. (2019) NAT. COMMUN.10(1): 2866 and Janssen et al. (2019) MOL. THER. NUCLEIC ACIDS 16: 141-54. In certain embodiments, a Cas protein comprises a dominant negative version of p53-binding protein 1 (53BP1), for example, a fragment of 53BP1 comprising a minimum focus forming region (e.g., amino acids 1231-1644 of human 53BP1). In certain embodiments, a Cas protein comprises a motif that is targeted by APC-Cdh1, such as amino acids 1-110 of human Geminin, thereby resulting in degradation of the fusion protein during the HDR non-permissive G1 phase of the cell cycle. [0062] In certain embodiments, a Cas protein comprises an inducible or controllable domain. Non-limiting examples of inducers or controllers include light, hormones, and small molecule drugs. In certain embodiments, a Cas protein comprises a light inducible or controllable domain. In certain embodiments, a Cas protein comprises a chemically inducible or controllable domain. [0063] In certain embodiments, a Cas protein comprises a tag protein or peptide for ease of tracking and/or purification. Non-limiting examples of tag proteins and peptides include fluorescent proteins (e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato), HIS tags (e.g., 6×His tag, or gly-6xHis; 8xHis, or gly-8xHis), hemagglutinin (HA) tag, FLAG tag, 3xFLAG tag, and Myc tag. 
[0064] In certain embodiments, a Cas protein is conjugated to a non-protein moiety, such as a fluorophore useful for genomic imaging. In certain embodiments, a Cas protein is covalently conjugated to the non-protein moiety. The terms “CRISPR-Associated protein,” “Cas protein,” “Cas,” “CRISPR-Associated nuclease,” and “Cas nuclease” are used herein to include such conjugates despite the presence of one or more non-protein moieties.
B. Guide nucleic acids
[0065] A guide nucleic acid can be a single gNA (sgNA, e.g., sgRNA), in which the gNA is a single polynucleotide, or a dual gNA (e.g., dual gRNA), in which the gNA comprises two separate polynucleotides (these can in some cases be covalently linked, but not via a conventional internucleotide linkage). In certain embodiments, a single guide nucleic acid is capable of activating a Cas nuclease alone (e.g., in the absence of a tracrRNA). [0066] In general, a gNA comprises a modulator nucleic acid and a targeter nucleic acid. In a sgNA the modulator and targeter nucleic acids are part of a single polynucleotide. In a dual gNA the modulator and targeter nucleic acids are separate, e.g., not joined by a conventional nucleotide linkage, such as not joined at all. The targeter nucleic acid comprises a spacer sequence and a targeter stem sequence. The modulator nucleic acid comprises a modulator stem sequence and, generally, further nucleotides, such as nucleotides comprising a 5’ tail. The modulator stem sequence and targeter stem sequence can each comprise any suitable number of nucleotides and are of sufficient complementarity that they can hybridize. In a single gNA there may be additional nucleotides between the targeter stem sequence and the modulator stem sequence; these can, in certain cases, form secondary structure, such as a loop. [0067] In certain embodiments, the guide nucleic acid comprises a targeter nucleic acid that, in combination with a modulator nucleic acid, is capable of binding a Cas protein. In certain embodiments, the guide nucleic acid comprises a targeter nucleic acid that, in combination with a modulator nucleic acid, is capable of activating a Cas nuclease. In certain embodiments, the system further comprises the Cas protein that the targeter nucleic acid and the modulator nucleic acid are capable of binding or the Cas nuclease that the targeter nucleic acid and the modulator nucleic acid are capable of activating. [0068] It is contemplated that the single or dual guide nucleic acids need to be compatible with a Cas protein (e.g., Cas nuclease) to provide an operative CRISPR system. 
For example, the targeter stem sequence and the modulator stem sequence can be derived from a naturally occurring crRNA capable of activating a Cas nuclease in the absence of a tracrRNA. Alternatively, the targeter stem sequence and the modulator stem sequence can be derived from a naturally occurring set of crRNA and tracrRNA, respectively, that are capable of activating a Cas nuclease. In certain embodiments, the nucleotide sequences of the targeter stem sequence and the modulator stem sequence are identical to the corresponding stem sequences of a stem-loop structure in such naturally occurring crRNA. [0069] Guide nucleic acid sequences that are operative with a type II or type V Cas protein are known in the art and are disclosed, for example, in U.S. Patent Nos.9,790,490, 9,896,696, 10,113,179, and 10,266,850, and U.S. Patent Application Publication No.2014/0242664. It is understood that these sequences are merely illustrative, and other guide nucleic acid sequences may also be used with these Cas proteins.
TABLE 2: Type V-A Cas Protein and Corresponding Single Guide Nucleic Acid Sequences
TABLE 3: Type V-A Cas Protein and Corresponding Dual Guide Nucleic Acid Sequences
comprised in the modulator nucleic acid 5’ and/or 3’ to a “modulator sequence” listed herein. 2 In the consensus PAM sequences, N represents A, C, G, or T. Where the PAM sequence is preceded by “5’,” it means that the PAM is located immediately upstream of the target nucleotide sequence when using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate. [0070] In certain embodiments, a guide nucleic acid, in the context of a type V-A CRISPR-Cas system, comprises a targeter stem sequence listed in Table 3. The same targeter stem sequences, as a portion of scaffold sequences, are bold-underlined in Table 2. [0071] In certain embodiments, a guide nucleic acid is a single guide nucleic acid that comprises, from 5’ to 3’, a modulator stem sequence, a loop sequence, a targeter stem sequence, and a spacer sequence. In certain embodiments, the targeter stem sequence in the single guide nucleic acid is listed in Table 2 as a bold-underlined portion of scaffold sequence, and the modulator stem sequence is complementary (e.g., 100% complementary) to the targeter stem sequence. In certain embodiments, the single guide nucleic acid comprises, from 5’ to 3’, a modulator sequence listed in Table 2 as an underlined portion of a scaffold sequence, a loop sequence, a targeter stem sequence listed as a bold-underlined portion of the same scaffold sequence, and a spacer sequence. In certain embodiments, an engineered, non-naturally occurring system comprises a single guide nucleic acid comprising a scaffold sequence listed in Table 2. In certain embodiments, the system further comprises a Cas protein (e.g., Cas nuclease) comprising an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in the SEQ ID NO listed in the same line of Table 2. 
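By way of non-limiting illustration, the 5’-to-3’ arrangement of a single guide nucleic acid recited in this paragraph can be sketched in Python as follows. The function name, the loop sequence UCUU, and the poly-A spacer are hypothetical placeholders chosen only for illustration and are not values taken from Table 2; the two stem sequences are those of the 5-nucleotide stem embodiment described herein.

```python
RNA_BASES = set("ACGU")

def assemble_single_guide(modulator_stem, loop, targeter_stem, spacer):
    """Join the four elements in 5'-to-3' order:
    modulator stem, loop, targeter stem, spacer."""
    parts = [modulator_stem, loop, targeter_stem, spacer]
    for part in parts:
        # Reject empty elements and anything outside the RNA alphabet.
        if not part or set(part) - RNA_BASES:
            raise ValueError(f"not a non-empty RNA sequence: {part!r}")
    return "".join(parts)

# Placeholder inputs (assumptions, not values from Table 2).
guide = assemble_single_guide(
    modulator_stem="UCUAC",
    loop="UCUU",          # hypothetical loop sequence
    targeter_stem="GUAGA",
    spacer="A" * 20,      # hypothetical 20-nucleotide spacer
)
print(guide)
```

In practice the spacer would be chosen for the target nucleotide sequence and the scaffold elements for the Cas protein in use.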
In certain embodiments, the system further comprises a Cas protein (e.g., Cas nuclease) comprising the amino acid sequence set forth in the SEQ ID NO listed in the same line of Table 2. In certain embodiments, the system is useful for targeting, editing, or modifying a nucleic acid comprising a target nucleotide sequence close or adjacent to (e.g., immediately downstream of) a PAM listed in the same line of Table 2 when using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate. [0072] In certain embodiments, a guide nucleic acid, e.g., a dual gNA, comprises a targeter guide nucleic acid that comprises, from 5’ to 3’, a targeter stem sequence and a spacer sequence. In certain embodiments, the targeter stem sequence in the targeter nucleic acid is listed in Table 3. In certain embodiments, an engineered, non-naturally occurring system comprises the targeter nucleic acid and a modulator stem sequence complementary (e.g., 100% complementary) to the targeter stem sequence. In certain embodiments, the modulator nucleic acid comprises a
modulator sequence listed in the same line of Table 3. In certain embodiments, the system further comprises a Cas protein (e.g., Cas nuclease) comprising an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in the SEQ ID NO listed in the same line of Table 3. In certain embodiments, the system further comprises a Cas protein (e.g., Cas nuclease) comprising the amino acid sequence set forth in the SEQ ID NO listed in the same line of Table 3. In certain embodiments, the system is useful for targeting, editing, or modifying a nucleic acid comprising a target nucleotide sequence close or adjacent to (e.g., immediately downstream of) a PAM listed in the same line of Table 3 when using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate. [0073] A single guide nucleic acid, the targeter nucleic acid, and/or the modulator nucleic acid can be synthesized chemically or produced in a biological process (e.g., catalyzed by an RNA polymerase in an in vitro reaction). Such reaction or process may limit the lengths of the single guide nucleic acid, targeter nucleic acid, and/or modulator nucleic acid. In certain embodiments, a single guide nucleic acid is no more than 100, 90, 80, 70, 60, 50, 40, 30, or 25 nucleotides in length. In certain embodiments, a single guide nucleic acid is at least 20, 25, 30, 40, 50, 60, 70, 80, or 90 nucleotides in length. In certain embodiments, the single guide nucleic acid is 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 20-25, 25-100, 25-90, 25-80, 25-70, 25-60, 25-50, 25-40, 25-30, 30-100, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-100, 40-90, 40-80, 40-70, 40-60, 40-50, 50-100, 50-90, 50-80, 50-70, 50-60, 60-100, 60-90, 60-80, 60-70, 70-100, 70-90, 70-80, 80-100, 80-90, or 90-100 nucleotides in length. 
In certain embodiments, a targeter nucleic acid is no more than 100, 90, 80, 70, 60, 50, 40, 30, or 25 nucleotides in length. In certain embodiments, a targeter nucleic acid is at least 20, 25, 30, 40, 50, 60, 70, 80, or 90 nucleotides in length. In certain embodiments, the targeter nucleic acid is 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 20-25, 25-100, 25-90, 25-80, 25-70, 25-60, 25-50, 25-40, 25-30, 30-100, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-100, 40-90, 40-80, 40-70, 40-60, 40-50, 50-100, 50-90, 50-80, 50-70, 50-60, 60-100, 60-90, 60-80, 60-70, 70-100, 70-90, 70-80, 80-100, 80-90, or 90-100 nucleotides in length. In certain embodiments, a modulator nucleic acid is no more than 100, 90, 80, 70, 60, 50, 40, 30, or 20 nucleotides in length. In certain embodiments, a modulator nucleic acid is at least 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 nucleotides in length. In certain embodiments, the modulator nucleic acid is 10-100, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, 10-20, 15-100, 15-90, 15-80, 15-70, 15-60, 15-50, 15-40, 15-30, 15-20, 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 25-100, 25-90, 25-80, 25-70, 25-60, 25-50, 25-40, 25-30, 30-100, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40,
40-100, 40-90, 40-80, 40-70, 40-60, 40-50, 50-100, 50-90, 50-80, 50-70, 50-60, 60-100, 60-90, 60-80, 60-70, 70-100, 70-90, 70-80, 80-100, 80-90, or 90-100 nucleotides in length. [0074] It is contemplated that the length of the duplex formed within the single guide nucleic acid or formed between the targeter nucleic acid and the modulator nucleic acid, e.g., in a dual gNA, may be a factor in providing an operative CRISPR system. In certain embodiments, the targeter stem sequence and the modulator stem sequence each consist of 4-10 nucleotides that base pair with each other. In certain embodiments, the targeter stem sequence and the modulator stem sequence each consist of 4-9, 4-8, 4-7, 4-6, 4-5, 5-10, 5-9, 5-8, 5-7, or 5-6 nucleotides that base pair with each other. In certain embodiments, the targeter stem sequence and the modulator stem sequence each consist of 4, 5, 6, 7, 8, 9, or 10 nucleotides. It is understood that the composition of the nucleotides in each sequence affects the stability of the duplex, and a C-G base pair confers greater stability than an A-U base pair. In certain embodiments, 20%-80%, 20%-70%, 20%-60%, 20%-50%, 20%-40%, 20%-30%, 30%-80%, 30%-70%, 30%-60%, 30%-50%, 30%-40%, 40%-80%, 40%-70%, 40%-60%, 40%-50%, 50%-80%, 50%-70%, 50%-60%, 60%-80%, 60%-70%, or 70%-80% of the base pairs are C-G base pairs. [0075] In certain embodiments, the targeter stem sequence and the modulator stem sequence each consist of 5 nucleotides. As such, the targeter stem sequence and the modulator stem sequence form a duplex of 5 base pairs. In certain embodiments, 0-4, 0-3, 0-2, 0-1, 1-5, 1-4, 1-3, 1-2, 2-5, 2-4, 2-3, 3-5, 3-4, or 4-5 out of the 5 base pairs are C-G base pairs. In certain embodiments, 0, 1, 2, 3, 4, or 5 out of the 5 base pairs are C-G base pairs. In certain embodiments, the targeter stem sequence consists of 5’-GUAGA-3’ and the modulator stem sequence consists of 5’-UCUAC-3’. 
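As a non-limiting illustration, the complementarity of a targeter stem sequence and a modulator stem sequence, and the number of C-G base pairs in the resulting duplex, can be checked with a short Python sketch such as the following. The function names are hypothetical, and the sketch assumes Watson-Crick pairing only (no G-U wobble pairs) between stems that are fully complementary.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the Watson-Crick reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def stems_pair(targeter_stem, modulator_stem):
    """True if the two stems, both written 5'->3', form a fully paired duplex."""
    return modulator_stem == reverse_complement(targeter_stem)

def cg_pair_count(targeter_stem):
    """Number of C-G base pairs in the duplex, assuming a fully paired stem."""
    return sum(1 for base in targeter_stem if base in "CG")

targeter, modulator = "GUAGA", "UCUAC"
print(stems_pair(targeter, modulator))                       # duplex fully paired?
print(cg_pair_count(targeter), "of", len(targeter), "pairs are C-G")
```

For the 5’-GUAGA-3’ and 5’-UCUAC-3’ stems, 2 of the 5 base pairs are C-G base pairs, i.e., 40%.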
In certain embodiments, the targeter stem sequence consists of 5’-GUGGG-3’ and the modulator stem sequence consists of 5’-CCCAC-3’. [0076] In certain embodiments, in a type V-A system, the 3’ end of the targeter stem sequence is linked by no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides to the 5’ end of the spacer sequence. In certain embodiments, the targeter stem sequence and the spacer sequence are adjacent to each other, directly linked by an internucleotide bond. In certain embodiments, the targeter stem sequence and the spacer sequence are linked by one nucleotide, e.g., a uridine. In certain embodiments, the targeter stem sequence and the spacer sequence are linked by two or more nucleotides. In certain embodiments, the targeter stem sequence and the spacer sequence are linked by 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides. [0077] In certain embodiments, the targeter nucleic acid further comprises an additional nucleotide sequence 5’ to the targeter stem sequence. In certain embodiments, the additional nucleotide sequence comprises at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least
35, at least 40, at least 45, or at least 50) nucleotides. In certain embodiments, the additional nucleotide sequence consists of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides. In certain embodiments, the additional nucleotide sequence consists of 2 nucleotides. In certain embodiments, the additional nucleotide sequence is reminiscent of the loop or a fragment thereof (e.g., one, two, three, or four nucleotides at the 3’ end of the loop) in a crRNA of a corresponding single guide CRISPR-Cas system. It is understood that an additional nucleotide sequence 5’ to the targeter stem sequence can be dispensable. Accordingly, in certain embodiments, the targeter nucleic acid does not comprise any additional nucleotide 5’ to the targeter stem sequence. [0078] In certain embodiments, the targeter nucleic acid or the single guide nucleic acid further comprises an additional nucleotide sequence containing one or more nucleotides at the 3’ end that does not hybridize with the target nucleotide sequence. The additional nucleotide sequence may protect the targeter nucleic acid from degradation by a 3’-5’ exonuclease. In certain embodiments, the additional nucleotide sequence is no more than 100 nucleotides in length. In certain embodiments, the additional nucleotide sequence is no more than 90, 80, 70, 60, 50, 40, 30, 20, or 10 nucleotides in length. In certain embodiments, the additional nucleotide sequence is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides in length. In certain embodiments, the additional nucleotide sequence is 5-100, 5-50, 5-40, 5-30, 5-25, 5-20, 5-15, 5-10, 10-100, 10-50, 10-40, 10-30, 10-25, 10-20, 10-15, 15-100, 15-50, 15-40, 15-30, 15-25, 15-20, 20-100, 20-50, 20-40, 20-30, 20-25, 25-100, 25-50, 25-40, 25-30, 30-100, 30-50, 30-40, 40-100, 40-50, or 50-100 nucleotides in length. [0079] In certain embodiments, the additional nucleotide sequence forms a hairpin with the spacer sequence. 
Such secondary structure may increase the specificity of guide nucleic acid or the engineered, non-naturally occurring system (see, Kocak et al. (2019) Nat. Biotech.37: 657- 66). In certain embodiments, the free energy change during the hairpin formation is greater than or equal to -20 kcal/mol, -15 kcal/mol, -14 kcal/mol, -13 kcal/mol, -12 kcal/mol, -11 kcal/mol, or -10 kcal/mol. In certain embodiments, the free energy change during the hairpin formation is greater than or equal to -5 kcal/mol, -6 kcal/mol, -7 kcal/mol, -8 kcal/mol, -9 kcal/mol, -10 kcal/mol, -11 kcal/mol, -12 kcal/mol, -13 kcal/mol, -14 kcal/mol, or -15 kcal/mol. In certain embodiments, the free energy change during the hairpin formation is in the range of -20 to -10 kcal/mol, -20 to -11 kcal/mol, -20 to -12 kcal/mol, -20 to -13 kcal/mol, -20 to -14 kcal/mol, -20 to -15 kcal/mol, -15 to -10 kcal/mol, -15 to -11 kcal/mol, -15 to -12 kcal/mol, -15 to -13 kcal/mol, -15 to -14 kcal/mol, -14 to -10 kcal/mol, -14 to -11 kcal/mol, -14 to -12 kcal/mol, -14 to -13 kcal/mol, -13 to -10 kcal/mol, -13 to -11 kcal/mol, -13 to -12 kcal/mol, -12 to -10
kcal/mol, -12 to -11 kcal/mol, or -11 to -10 kcal/mol. In other embodiments, the targeter nucleic acid or the single guide nucleic acid does not comprise any nucleotide 3’ to the spacer sequence. [0080] In certain embodiments, the modulator nucleic acid further comprises an additional nucleotide sequence 3’ to the modulator stem sequence. In certain embodiments, the additional nucleotide sequence comprises at least 1 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50) nucleotides. In certain embodiments, the additional nucleotide sequence consists of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides. In certain embodiments, the additional nucleotide sequence consists of 1 nucleotide (e.g., uridine). In certain embodiments, the additional nucleotide sequence consists of 2 nucleotides. In certain embodiments, the additional nucleotide sequence is reminiscent of the loop or a fragment thereof (e.g., one, two, three, or four nucleotides at the 5’ end of the loop) in a crRNA of a corresponding single guide CRISPR-Cas system. It is understood that an additional nucleotide sequence 3’ to the modulator stem sequence can be dispensable. Accordingly, in certain embodiments, the modulator nucleic acid does not comprise any additional nucleotide 3’ to the modulator stem sequence. [0081] It is understood that the additional nucleotide sequence 5’ to the targeter stem sequence and the additional nucleotide sequence 3’ to the modulator stem sequence, if present, may interact with each other. 
For example, although the nucleotide immediately 5’ to the targeter stem sequence and the nucleotide immediately 3’ to the modulator stem sequence do not form a Watson-Crick base pair (otherwise they would constitute part of the targeter stem sequence and part of the modulator stem sequence, respectively), other nucleotides in the additional nucleotide sequence 5’ to the targeter stem sequence and the additional nucleotide sequence 3’ to the modulator stem sequence may form one, two, three, or more base pairs (e.g., Watson-Crick base pairs). Such interaction may affect the stability of a complex comprising the targeter nucleic acid and the modulator nucleic acid. [0082] The stability of a complex comprising a targeter nucleic acid and a modulator nucleic acid can be assessed by the Gibbs free energy change (ΔG) during the formation of the complex, either calculated or actually measured. Where all the predicted base pairing in the complex occurs between a base in the targeter nucleic acid and a base in the modulator nucleic acid, i.e., there is no intra-strand secondary structure, the ΔG during the formation of the complex correlates generally with the ΔG during the formation of a secondary structure within the corresponding single guide nucleic acid. Methods of calculating or measuring the ΔG are known in the art. An exemplary method is RNAfold (rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) as disclosed in Gruber et al. (2008) Nucleic Acids Res.,
36(Web Server issue): W70–W74. Unless indicated otherwise, the ΔG values in the present disclosure are calculated by RNAfold for the formation of a secondary structure within a corresponding single guide nucleic acid. In certain embodiments, the ΔG is lower than or equal to -1 kcal/mol, e.g., lower than or equal to -2 kcal/mol, lower than or equal to -3 kcal/mol, lower than or equal to -4 kcal/mol, lower than or equal to -5 kcal/mol, lower than or equal to -6 kcal/mol, lower than or equal to -7 kcal/mol, lower than or equal to -7.5 kcal/mol, or lower than or equal to -8 kcal/mol. In certain embodiments, the ΔG is greater than or equal to -10 kcal/mol, e.g., greater than or equal to -9 kcal/mol, greater than or equal to -8.5 kcal/mol, or greater than or equal to -8 kcal/mol. In certain embodiments, the ΔG is in the range of -10 to -4 kcal/mol. In certain embodiments, the ΔG is in the range of -8 to -4 kcal/mol, -7 to -4 kcal/mol, -6 to -4 kcal/mol, -5 to -4 kcal/mol, -8 to -4.5 kcal/mol, -7 to -4.5 kcal/mol, -6 to -4.5 kcal/mol, or -5 to -4.5 kcal/mol. In certain embodiments, the ΔG is about -8 kcal/mol, -7 kcal/mol, -6 kcal/mol, -5 kcal/mol, -4.9 kcal/mol, -4.8 kcal/mol, -4.7 kcal/mol, -4.6 kcal/mol, -4.5 kcal/mol, -4.4 kcal/mol, -4.3 kcal/mol, -4.2 kcal/mol, -4.1 kcal/mol, or -4 kcal/mol. [0083] It is understood that the ΔG may be affected by a sequence in the targeter nucleic acid that is not within the targeter stem sequence, and/or a sequence in the modulator nucleic acid that is not within the modulator stem sequence. For example, one or more base pairs (e.g., Watson-Crick base pair) between an additional sequence 5’ to the targeter stem sequence and an additional sequence 3’ to the modulator stem sequence may reduce the ΔG, i.e., stabilize the nucleic acid complex. 
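As a non-limiting illustration of how the recited ΔG ranges might be applied, the following Python sketch screens candidate designs against the -10 to -4 kcal/mol range. The function name, the design labels, and the ΔG values are hypothetical; in practice the ΔG values would be obtained from a folding program such as RNAfold rather than entered by hand. Note that a lower (more negative) ΔG corresponds to a more stable complex.

```python
def dg_in_window(dg_kcal_per_mol, lower=-10.0, upper=-4.0):
    """True if the free energy change lies within [lower, upper] kcal/mol."""
    return lower <= dg_kcal_per_mol <= upper

# Hypothetical, hand-entered values for illustration only.
candidates = {"design_a": -4.7, "design_b": -2.1, "design_c": -11.3}

selected = sorted(name for name, dg in candidates.items() if dg_in_window(dg))
print(selected)
```

Narrower ranges recited above (e.g., -8 to -4.5 kcal/mol) can be screened by passing different `lower` and `upper` values.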
In certain embodiments, the nucleotide immediately 5’ to the targeter stem sequence comprises a uracil or is a uridine, and the nucleotide immediately 3’ to the modulator stem sequence comprises a uracil or is a uridine, thereby forming a nonconventional U-U base pair. [0084] In certain embodiments, the modulator nucleic acid or the single guide nucleic acid comprises a nucleotide sequence referred to herein as a “5’ tail” positioned 5’ to the modulator stem sequence. In a naturally occurring type V-A CRISPR-Cas system, the 5’ tail is a nucleotide sequence positioned 5’ to the stem-loop structure of the crRNA. A 5’ tail in an engineered type V-A CRISPR-Cas system, whether single guide or dual guide, can be reminiscent of the 5’ tail in a corresponding naturally occurring type V-A CRISPR-Cas system. [0085] Without being bound by theory, it is contemplated that the 5’ tail may participate in the formation of the CRISPR-Cas complex. For example, in certain embodiments, the 5’ tail forms a pseudoknot structure with the modulator stem sequence, which is recognized by the Cas protein (see, Yamano et al. (2016) Cell, 165: 949). In certain embodiments, the 5’ tail is at least 3 (e.g., at least 4 or at least 5) nucleotides in length. In certain embodiments, the 5’ tail is 3, 4, or 5 nucleotides in length. In certain embodiments, the nucleotide at the 3’ end of the 5’ tail
comprises a uracil or is a uridine. In certain embodiments, the second nucleotide in the 5’ tail, the position counted from the 3’ end, comprises a uracil or is a uridine. In certain embodiments, the third nucleotide in the 5’ tail, the position counted from the 3’ end, comprises an adenine or is an adenosine. This third nucleotide may form a base pair (e.g., a Watson-Crick base pair) with a nucleotide 5’ to the modulator stem sequence. Accordingly, in certain embodiments, the modulator nucleic acid comprises a uridine or a uracil-containing nucleotide 5’ to the modulator stem sequence. In certain embodiments, the 5’ tail comprises the nucleotide sequence of 5’-AUU-3’. In certain embodiments, the 5’ tail comprises the nucleotide sequence of 5’-AAUU-3’. In certain embodiments, the 5’ tail comprises the nucleotide sequence of 5’-UAAUU-3’. In certain embodiments, the 5’ tail is positioned immediately 5’ to the modulator stem sequence. [0086] In certain embodiments, the single guide nucleic acid, the targeter nucleic acid, and/or the modulator nucleic acid are designed to reduce the degree of secondary structure other than the hybridization between the targeter stem sequence and the modulator stem sequence. In certain embodiments, no more than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the single guide nucleic acid other than the targeter stem sequence and the modulator stem sequence participate in self-complementary base pairing when optimally folded. In certain embodiments, no more than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the targeter nucleic acid and/or the modulator nucleic acid participate in self-complementary base pairing when optimally folded. Optimal folding may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. 
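As a non-limiting illustration, once a predicted fold is available in dot-bracket notation (the notation output by folding programs such as RNAfold, in which parentheses mark paired nucleotides and dots mark unpaired nucleotides), the fraction of nucleotides outside the two stem sequences that participate in base pairing can be computed as sketched below in Python. The function name, the example structure, and the stem coordinates are hypothetical and are given only for illustration.

```python
def paired_fraction_outside(structure, excluded_positions):
    """Fraction of paired nucleotides among positions not in excluded_positions.

    structure: dot-bracket string, e.g. "((..))".
    excluded_positions: set of 0-based indices to ignore (the stem sequences).
    """
    considered = [c for i, c in enumerate(structure) if i not in excluded_positions]
    if not considered:
        return 0.0
    paired = sum(1 for c in considered if c in "()")
    return paired / len(considered)

# Hypothetical fold: 5-nt 5' tail, 5-nt modulator stem, 4-nt loop, 5-nt targeter
# stem, then a 20-nt spacer whose first 12 nt form an unwanted 4-bp hairpin.
structure = "....." + "(((((" + "...." + ")))))" + "((((....))))" + "." * 8
stems = set(range(5, 10)) | set(range(14, 19))

fraction = paired_fraction_outside(structure, stems)
print(f"{fraction:.1%} of non-stem nucleotides are base paired")
```

In this hypothetical fold, 8 of the 29 non-stem nucleotides are paired, so the design would satisfy a "no more than about 30%" criterion but not a "no more than about 25%" criterion.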
An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res.9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr and GM Church, 2009, Nature Biotechnology 27(12): 1151-62). [0087] The targeter nucleic acid is directed to a specific target nucleotide sequence, and a donor template can be designed to modify the target nucleotide sequence or a sequence nearby. It is understood, therefore, that association of the single guide nucleic acid, the targeter nucleic acid, or the modulator nucleic acid with a donor template can increase editing efficiency and reduce off-targeting. Accordingly, in certain embodiments, the single guide nucleic acid or the modulator nucleic acid further comprises a donor template-recruiting sequence capable of hybridizing with a donor template (see Figure 2B). Donor templates are described in the “Donor Templates” subsection of section II infra. The donor template and donor template-recruiting sequence can be designed such that they bear sequence complementarity. In certain
embodiments, the donor template-recruiting sequence is at least 90% (e.g., at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) complementary to at least a portion of the donor template. In certain embodiments, the donor template-recruiting sequence is 100% complementary to at least a portion of the donor template. In certain embodiments, where the donor template comprises an engineered sequence not homologous to the sequence to be repaired, the donor template-recruiting sequence is capable of hybridizing with the engineered sequence in the donor template. In certain embodiments, the donor template-recruiting sequence is at least 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides in length. In certain embodiments, the donor template-recruiting sequence is positioned at or near the 5’ end of the single guide nucleic acid or at or near the 5’ end of the modulator nucleic acid. In certain embodiments, the donor template-recruiting sequence is linked to the 5’ tail, if present, or to the modulator stem sequence, of the single guide nucleic acid or the modulator nucleic acid through an internucleotide bond or a nucleotide linker. [0088] In certain embodiments, the single guide nucleic acid or the modulator nucleic acid further comprises an editing enhancer sequence, which increases the efficiency of gene editing and/or homology-directed repair (HDR) (see Figure 2C). Exemplary editing enhancer sequences are described in Park et al. (2018) Nat. Commun.9: 3313. In certain embodiments, the editing enhancer sequence is positioned 5’ to the 5’ tail, if present, or 5’ to the single guide nucleic acid or the modulator stem sequence. In certain embodiments, the editing enhancer sequence is 1-50, 4-50, 9-50, 15-50, 25-50, 1-25, 4-25, 9-25, 15-25, 1-15, 4-15, 9-15, 1-9, 4-9, or 1-4 nucleotides in length. 
In certain embodiments, the editing enhancer sequence is about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 55 nucleotides in length. The editing enhancer sequence is designed to minimize homology to the target nucleotide sequence or any other sequence that the engineered, non-naturally occurring system may be contacted to, e.g., the genome sequence of a cell into which the engineered, non-naturally occurring system is delivered. In certain embodiments, the editing enhancer is designed to minimize the presence of hairpin structure. The editing enhancer can comprise one or more of the chemical modifications disclosed herein. [0089] The single guide nucleic acid, the modulator nucleic acid, and/or the targeter nucleic acid can further comprise a protective nucleotide sequence that prevents or reduces nucleic acid degradation. In certain embodiments, the protective nucleotide sequence is at least 5 (e.g., at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50) nucleotides in length. The length of the protective nucleotide sequence increases the time for an exonuclease to reach the 5’ tail, modulator stem sequence, targeter stem sequence, and/or spacer sequence, thereby protecting these portions of the single guide nucleic acid, the modulator nucleic acid, and/or the targeter nucleic acid from degradation by an exonuclease. In certain
embodiments, the protective nucleotide sequence forms a secondary structure, such as a hairpin or a tRNA structure, to reduce the speed of degradation by an exonuclease (see, for example, Wu et al. (2018) Cell. Mol. Life Sci., 75(19): 3593-3607). Secondary structures can be predicted by methods known in the art, such as the online webserver RNAfold developed at University of Vienna using the centroid structure prediction algorithm (see, Gruber et al. (2008) Nucleic Acids Res., 36: W70). Certain chemical modifications, which may be present in the protective nucleotide sequence, can also prevent or reduce nucleic acid degradation, as disclosed in the “RNA Modifications” subsection infra. [0090] A protective nucleotide sequence is typically located at the 5’ or 3’ end of the single guide nucleic acid, the modulator nucleic acid, and/or the targeter nucleic acid. In certain embodiments, the single guide nucleic acid comprises a protective nucleotide sequence at the 5’ end, at the 3’ end, or at both ends, optionally through a nucleotide linker. In certain embodiments, the modulator nucleic acid comprises a protective nucleotide sequence at the 5’ end, at the 3’ end, or at both ends, optionally through a nucleotide linker. In particular embodiments, the modulator nucleic acid comprises a protective nucleotide sequence at the 5’ end (see Figure 2A). In certain embodiments, the targeter nucleic acid comprises a protective nucleotide sequence at the 5’ end, at the 3’ end, or at both ends, optionally through a nucleotide linker. [0091] As described above, various nucleotide sequences can be present in the 5’ portion of a single guide nucleic acid or a modulator nucleic acid, including but not limited to a donor template-recruiting sequence, an editing enhancer sequence, a protective nucleotide sequence, and a linker connecting such sequence to the 5’ tail, if present, or to the modulator stem sequence. 
It is understood that the functions of donor template recruitment, editing enhancement, protection against degradation, and linkage are not exclusive to each other, and one nucleotide sequence can have one or more of such functions. For example, in certain embodiments, the single guide nucleic acid or the modulator nucleic acid comprises a nucleotide sequence that is both a donor template-recruiting sequence and an editing enhancer sequence. In certain embodiments, the single guide nucleic acid or the modulator nucleic acid comprises a nucleotide sequence that is both a donor template-recruiting sequence and a protective sequence. In certain embodiments, the single guide nucleic acid or the modulator nucleic acid comprises a nucleotide sequence that is both an editing enhancer sequence and a protective sequence. In certain embodiments, the single guide nucleic acid or the modulator nucleic acid comprises a nucleotide sequence that is a donor template-recruiting sequence, an editing enhancer sequence, and a protective sequence. In certain embodiments, the nucleotide sequence 5’ to the 5’ tail, if present, or 5’ to the modulator stem sequence is 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 10-90, 10-80, 10-70, 10-60,
10-50, 10-40, 10-30, 10-20, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-90, 40-80, 40-70, 40-60, 40-50, 50-90, 50-80, 50-70, 50-60, 60-90, 60-80, 60-70, 70-90, 70-80, or 80-90 nucleotides in length. [0092] In certain embodiments, an engineered, non-naturally occurring system further comprises one or more compounds (e.g., small molecule compounds) that enhance HDR and/or inhibit NHEJ. Exemplary compounds having such functions are described in Maruyama et al. (2015) Nat Biotechnol. 33(5): 538-42; Chu et al. (2015) Nat Biotechnol. 33(5): 543-48; Yu et al. (2015) Cell Stem Cell 16(2): 142-47; Pinder et al. (2015) Nucleic Acids Res. 43(19): 9379-92; and Yagiz et al. (2019) Commun. Biol. 2: 198. In certain embodiments, an engineered, non-naturally occurring system further comprises one or more compounds selected from the group consisting of DNA ligase IV antagonists (e.g., SCR7 compound, Ad4 E1B55K protein, and Ad4 E4orf6 protein), RAD51 agonists (e.g., RS-1), DNA-dependent protein kinase (DNA-PK) antagonists (e.g., NU7441 and KU0060648), β3-adrenergic receptor agonists (e.g., L755507), inhibitors of intracellular protein transport from the ER to the Golgi apparatus (e.g., brefeldin A), and any combinations thereof. [0093] In certain embodiments, an engineered, non-naturally occurring system comprising a targeter nucleic acid and a modulator nucleic acid is tunable or inducible. For example, in certain embodiments, the targeter nucleic acid, the modulator nucleic acid, and/or the Cas protein can be introduced to the target nucleotide sequence at different times, the system becoming active only when all components are present. In certain embodiments, the amounts of the targeter nucleic acid, the modulator nucleic acid, and/or the Cas protein can be titrated to achieve desired efficiency and specificity. 
In certain embodiments, an excess amount of a nucleic acid comprising the targeter stem sequence or the modulator stem sequence can be added to the system, thereby dissociating the complex of the targeter nucleic acid and the modulator nucleic acid and turning off the system. C. gNA modifications [0094] Guide nucleic acids, including a single guide nucleic acid, a targeter nucleic acid, and/or a modulator nucleic acid, may comprise a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof. In certain embodiments, the single guide nucleic acid comprises a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof. In certain embodiments, the targeter nucleic acid comprises a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof. In certain embodiments, the modulator nucleic acid comprises a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof. Spacer sequences can be presented as DNA sequences by including
thymidines (T) rather than uridines (U). It is understood that corresponding RNA sequences and DNA/RNA chimeric sequences are also contemplated. For example, where the spacer sequence is an RNA, its sequence can be derived from a DNA sequence disclosed herein by replacing each T with U. As a result, for the purpose of describing a nucleotide sequence, T and U are used interchangeably herein. [0095] In certain embodiments of engineered, non-naturally occurring systems comprising: a targeter nucleic acid comprising a spacer sequence designed to hybridize with a target nucleotide sequence and a targeter stem sequence; and a modulator nucleic acid comprising a modulator stem sequence complementary to the targeter stem sequence, and, optionally, a 5’ sequence, e.g., a tail sequence, wherein, in a single guide nucleic acid, the targeter nucleic acid and the modulator nucleic acid are part of a single polynucleotide, and in a dual guide nucleic acid, the targeter nucleic acid and the modulator nucleic acid are separate nucleic acids; modifications can include one or more chemical modifications to one or more nucleotides or internucleotide linkages at or near the 3’ end of the targeter nucleic acid (dual and single gNA), at or near the 5’ end of the targeter nucleic acid (dual gNA), at or near the 3’ end of the modulator nucleic acid (dual gNA), at or near the 5’ end of the modulator nucleic acid (single and dual gNA), or combinations thereof as appropriate for single or dual gNA. In certain embodiments, the Cas nuclease is a type V-A Cas nuclease. Modulator and/or targeter nucleic acid sequences can include further sequences, as detailed in the Guide Nucleic Acids section, and modifications can be in these further sequences, as appropriate and apparent to one of skill in the art. 
In embodiments described in this section, below, in certain embodiments, guide nucleic acid is oriented from 5’ at the modulator nucleic acid to 3’ at the modulator stem sequence, and 5’ at the targeter stem sequence to 3’ at the targeter sequence (see, e.g., Figure 1A and 1B); in certain embodiments, as appropriate, guide nucleic acid is oriented from 3’ at the modulator nucleic acid to 5’ at the modulator stem sequence, and 3’ at the targeter stem sequence to 5’ at the targeter sequence. [0096] The targeter nucleic acid may comprise a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof. The modulator nucleic acid may comprise a DNA (e.g., modified DNA), an RNA (e.g., modified RNA), or a combination thereof. In certain embodiments, the targeter nucleic acid is an RNA and the modulator nucleic acid is an RNA. A targeter nucleic acid in the form of an RNA is also called targeter RNA, and a modulator nucleic acid in the form of an RNA is also called modulator RNA. The nucleotide sequences disclosed herein are presented as DNA sequences by including thymidines (T) and/or RNA sequences including uridines (U). It is understood that corresponding DNA sequences, RNA sequences, and DNA/RNA chimeric sequences are also contemplated. For example, where a spacer sequence is
presented as a DNA sequence, a nucleic acid comprising this spacer sequence as an RNA can be derived from the DNA sequence disclosed herein by replacing each T with U. As a result, for the purpose of describing a nucleotide sequence, T and U are used interchangeably herein. [0097] In certain embodiments, some or all of the gNA is RNA, e.g., a gRNA. In certain embodiments, 5-100%, 10-100%, 20-100%, 30-100%, 40-100%, 50-100%, 60-100%, 70-100%, 80-100%, 90-100%, 95-100%, 99-100%, or 99.5-100% of the gNA is RNA. In certain embodiments, 20%-80%, 20%-70%, 20%-60%, 20%-50%, 20%-40%, 20%-30%, 30%-80%, 30%-70%, 30%-60%, 30%-50%, 30%-40%, 40%-80%, 40%-70%, 40%-60%, 40%-50%, 50%-80%, 50%-70%, 50%-60%, 60%-80%, 60%-70%, or 70%-80% of the gNA is RNA. In certain embodiments, 50% of the gNA is RNA. In certain embodiments, 70% of the gNA is RNA. In certain embodiments, 90% of the gNA is RNA. In certain embodiments, 100% of the gNA is RNA, e.g., a gRNA. In further embodiments, the remaining portion of the gNA that is not RNA comprises a modified ribonucleotide, a deoxyribonucleotide, a modified deoxyribonucleotide, or a synthetic, e.g., unnatural, nucleotide, for example, and not intended to be limiting, threose nucleic acid, locked nucleic acid, peptide nucleic acid, arabinonucleic acid, or hexose nucleic acid, among others. [0098] In certain embodiments, the targeter nucleic acid and/or the modulator nucleic acid are RNAs with one or more modifications in a ribose group, one or more modifications in a phosphate group, one or more modifications in a nucleobase, one or more terminal modifications, or a combination thereof. Exemplary modifications are disclosed in U.S. Patent Nos. 10,900,034 and 10,767,175, U.S. Patent Application Publication No. 2018/0119140, Watts et al. (2008) Drug Discov. Today 13: 842-55, and Hendel et al. (2015) Nat. Biotechnol. 33: 985. 
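The T/U convention stated in paragraph [0096] can be illustrated with a short sketch. The function names and the example spacer below are hypothetical and are not sequences or tools from this disclosure; the sketch only shows how a spacer presented in DNA notation maps to its RNA notation, and how two sequences can be compared when T and U are treated interchangeably.

```python
# Minimal sketch (hypothetical helper names; the example spacer is made up).

_ALLOWED = set("ACGTU")


def dna_to_rna(seq: str) -> str:
    """Return the RNA notation of a sequence by replacing each T with U."""
    seq = seq.upper()
    unexpected = set(seq) - _ALLOWED
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.replace("T", "U")


def same_sequence(a: str, b: str) -> bool:
    """Compare two sequences, treating T and U as interchangeable."""
    return dna_to_rna(a) == dna_to_rna(b)


if __name__ == "__main__":
    spacer_dna = "ATGCTTGACCTA"  # hypothetical spacer in DNA notation
    print(dna_to_rna(spacer_dna))
    print(same_sequence(spacer_dna, "AUGCUUGACCUA"))
```

Normalizing both inputs to one notation before comparison keeps the interchangeability rule in a single place.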
[0099] In certain embodiments, a targeter nucleic acid, e.g., RNA, comprises at least one nucleotide at or near the 3’ end comprising a modification to a ribose, a phosphate group, or a nucleobase, or a terminal modification. In certain embodiments, the 3’ end of the targeter nucleic acid comprises the spacer sequence. In certain embodiments, the 3’ end of the targeter nucleic acid comprises the targeter stem sequence. Exemplary modifications are disclosed in Dang et al. (2015) Genome Biol. 16: 280, Kocak et al. (2019) Nature Biotech. 37: 657-66, Liu et al. (2019) Nucleic Acids Res. 47(8): 4169-4180, Schubert et al. (2018) J. Cytokine Biol. 3(1): 121, Teng et al. (2019) Genome Biol. 20(1): 15, Watts et al. (2008) Drug Discov. Today 13(19-20): 842-55, and Wu et al. (2018) Cell Mol. Life. Sci. 75(19): 3593-607. [0100] Modifications in a ribose group include but are not limited to modifications at the 2' position or modifications at the 4' position. For example, in certain embodiments, the ribose comprises 2'-O-C1-4alkyl, such as 2'-O-methyl (2'-OMe, or M). In certain embodiments, the ribose comprises 2'-O-C1-3alkyl-O-C1-3alkyl, such as 2'-methoxyethoxy (2'-O—CH2CH2OCH3)
also known as 2'-O-(2-methoxyethyl) or 2'-MOE. In certain embodiments, the ribose comprises 2'-O-allyl. In certain embodiments, the ribose comprises 2'-O-2,4-Dinitrophenol (DNP). In certain embodiments, the ribose comprises 2'-halo, such as 2'-F, 2'-Br, 2'-Cl, or 2'-I. In certain embodiments, the ribose comprises 2'-NH2. In certain embodiments, the ribose comprises 2'-H (e.g., a deoxynucleotide). In certain embodiments, the ribose comprises 2'-arabino or 2'-F-arabino. In certain embodiments, the ribose comprises 2'-LNA or 2'-ULNA. In certain embodiments, the ribose comprises a 4'-thioribosyl. [0101] Modifications can also include a deoxy group, for example a 2'-deoxy-3'-phosphonoacetate (DP) or a 2'-deoxy-3'-thiophosphonoacetate (DSP). [0102] Internucleotide linkage modifications in a phosphate group include but are not limited to a phosphorothioate (S), a chiral phosphorothioate, a phosphorodithioate, a boranophosphonate, a C1-4alkyl phosphonate such as a methylphosphonate, a phosphonocarboxylate such as a phosphonoacetate (P), a phosphonocarboxylate ester such as a phosphonoacetate ester, an amide, a thiophosphonocarboxylate such as a thiophosphonoacetate (SP), a thiophosphonocarboxylate ester such as a thiophosphonoacetate ester, and a 2',5'-linkage having a phosphodiester or any of the modified phosphates above. Various salts, mixed salts and free acid forms are also included. 
[0103] Modifications in a nucleobase include but are not limited to 2-thiouracil, 2-thiocytosine, 4-thiouracil, 6-thioguanine, 2-aminoadenine, 2-aminopurine, pseudouracil, hypoxanthine, 7-deazaguanine, 7-deaza-8-azaguanine, 7-deazaadenine, 7-deaza-8-azaadenine, 5-methylcytosine, 5-methyluracil, 5-hydroxymethylcytosine, 5-hydroxymethyluracil, 5,6-dehydrouracil, 5-propynylcytosine, 5-propynyluracil, 5-ethynylcytosine, 5-ethynyluracil, 5-allyluracil, 5-allylcytosine, 5-aminoallyluracil, 5-aminoallyl-cytosine, 5-bromouracil, 5-iodouracil, diaminopurine, difluorotoluene, dihydrouracil, an abasic nucleotide, Z base, P base, Unstructured Nucleic Acid, isoguanine, isocytosine (see, Piccirilli et al. (1990) NATURE, 343: 33), 5-methyl-2-pyrimidine (see, Rappaport (1993) BIOCHEMISTRY, 32: 3047), x(A,G,C,T), and y(A,G,C,T). [0104] Terminal modifications include but are not limited to polyethyleneglycol (PEG), hydrocarbon linkers (such as heteroatom (O,S,N)-substituted hydrocarbon spacers; halo-substituted hydrocarbon spacers; keto-, carboxyl-, amido-, thionyl-, carbamoyl-, thionocarbamoyl-containing hydrocarbon spacers, propanediol), spermine linkers, dyes such as fluorescent dyes (for example, fluoresceins, rhodamines, cyanines), quenchers (for example, dabcyl, BHQ), and other labels (for example biotin, digoxigenin, acridine, streptavidin, avidin, peptides and/or proteins). In certain embodiments, a terminal modification comprises a conjugation (or ligation) of the RNA to another molecule comprising an oligonucleotide (such as
deoxyribonucleotides and/or ribonucleotides), a peptide, a protein, a sugar, an oligosaccharide, a steroid, a lipid, a folic acid, a vitamin and/or other molecule. In certain embodiments, a terminal modification incorporated into the RNA is located internally in the RNA sequence via a linker such as 2-(4-butylamidofluorescein)propane-1,3-diol bis(phosphodiester) linker, which is incorporated as a phosphodiester linkage and can be incorporated anywhere between two nucleotides in the RNA. [0105] The modifications disclosed above can be combined in the targeter nucleic acid and/or the modulator nucleic acid that are in the form of RNA. In certain embodiments, the modification in the RNA is selected from the group consisting of incorporation of 2'-O-methyl-3'-phosphorothioate (MS), 2'-O-methyl-3'-phosphonoacetate (MP), 2'-O-methyl-3'-thiophosphonoacetate (MSP), 2'-halo-3'-phosphorothioate (e.g., 2'-fluoro-3'-phosphorothioate), 2'-halo-3'-phosphonoacetate (e.g., 2'-fluoro-3'-phosphonoacetate), and 2'-halo-3'-thiophosphonoacetate (e.g., 2'-fluoro-3'-thiophosphonoacetate). [0106] In certain embodiments, modifications can include 2'-O-methyl (M), a phosphorothioate (S), a phosphonoacetate (P), a thiophosphonoacetate (SP), a 2'-O-methyl-3'-phosphorothioate (MS), a 2'-O-methyl-3'-phosphonoacetate (MP), a 2'-O-methyl-3'-thiophosphonoacetate (MSP), a 2'-deoxy-3'-phosphonoacetate (DP), a 2'-deoxy-3'-thiophosphonoacetate (DSP), or a combination thereof, at or near either the 3’ or 5’ end of either the targeter or modulator nucleic acid, as appropriate for single or dual gNA. In certain embodiments, modifications can include either a 5’ or a 3’ propanediol or C3 linker modification. [0107] In certain embodiments, the modification alters the stability of the RNA. In certain embodiments, the modification enhances the stability of the RNA, e.g., by increasing nuclease resistance of the RNA relative to a corresponding RNA without the modification. 
Stability-enhancing modifications include but are not limited to incorporation of 2'-O-methyl, a 2'-O-C1-4alkyl, 2'-halo (e.g., 2'-F, 2'-Br, 2'-Cl, or 2'-I), 2'-MOE, a 2'-O-C1-3alkyl-O-C1-3alkyl, 2'-NH2, 2'-H (or 2'-deoxy), 2'-arabino, 2'-F-arabino, 4'-thioribosyl sugar moiety, 3'-phosphorothioate, 3'-phosphonoacetate, 3'-thiophosphonoacetate, 3'-methylphosphonate, 3'-boranophosphate, 3'-phosphorodithioate, locked nucleic acid (“LNA”) nucleotide which comprises a methylene bridge between the 2' and 4' carbons of the ribose ring, and unlocked nucleic acid (“ULNA”) nucleotide. Such modifications are suitable for use as a protecting group to prevent or reduce degradation of the 5’ sequence, e.g., a tail sequence, modulator stem sequence (dual guide nucleic acids), targeter stem sequence (dual guide nucleic acids), and/or spacer sequence (see, the “Targeter and Modulator nucleic acids” subsection).
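As an illustration only, the placement of a modification at or near the ends of a guide RNA can be sketched in code using the abbreviations defined in paragraph [0106] (M, S, P, SP, MS, MP, MSP, DP, DSP). The function names, the bracket notation, the default of three terminal positions, and the example sequence are assumptions made for this sketch and are not taken from the disclosure. The sketch also simplifies by attaching the label to a nucleotide position and does not model whether a 3' linkage actually exists at the final nucleotide.

```python
# Minimal sketch (hypothetical names and notation; not a synthesis specification).
from typing import List, Optional, Tuple

# Abbreviations as defined in paragraph [0106] of the text.
KNOWN_MODS = {"M", "S", "P", "SP", "MS", "MP", "MSP", "DP", "DSP"}


def annotate_terminal_mods(
    seq: str, n5: int = 3, n3: int = 3, mod: str = "MS"
) -> List[Tuple[str, Optional[str]]]:
    """Label the first n5 and last n3 positions of seq with a modification."""
    if mod not in KNOWN_MODS:
        raise ValueError(f"unknown modification label: {mod}")
    if n5 < 0 or n3 < 0:
        raise ValueError("n5 and n3 must be non-negative")
    if n5 + n3 > len(seq):
        raise ValueError("terminal windows overlap or exceed sequence length")
    annotated = []
    for i, nt in enumerate(seq):
        is_terminal = i < n5 or i >= len(seq) - n3
        annotated.append((nt, mod if is_terminal else None))
    return annotated


def render(annotated: List[Tuple[str, Optional[str]]]) -> str:
    """Render annotated positions 5' to 3', e.g. 'A[MS] U G'."""
    return " ".join(f"{nt}[{m}]" if m else nt for nt, m in annotated)


if __name__ == "__main__":
    # Hypothetical 8-nt RNA; two modified positions at the 5' end, one at the 3' end.
    print(render(annotate_terminal_mods("AUGCUUGA", n5=2, n3=1)))
```

Keeping the modification as a per-position label makes it straightforward to count modified positions or to restrict them to particular regions such as the spacer or a stem sequence.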
[0108] In certain embodiments, the modification alters the specificity of the engineered, non-naturally occurring system. In certain embodiments, the modification enhances the specificity of the engineered, non-naturally occurring system, e.g., by enhancing on-target binding and/or cleavage, or reducing off-target binding and/or cleavage, or a combination thereof. Specificity-enhancing modifications include but are not limited to 2-thiouracil, 2-thiocytosine, 4-thiouracil, 6-thioguanine, 2-aminoadenine, and pseudouracil. In certain embodiments, a nucleotide within 10, 5, 4, 3, 2, or 1 nucleotide(s) of the 3’ end, for example the 3’ end nucleotide, is modified. [0109] In certain embodiments, the modification alters the immunostimulatory effect of the RNA relative to a corresponding RNA without the modification. For example, in certain embodiments, the modification reduces the ability of the RNA to activate TLR7, TLR8, TLR9, TLR3, RIG-I, and/or MDA5. [0110] In certain embodiments, the targeter nucleic acid and/or the modulator nucleic acid comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 modified nucleotides or internucleotide linkages. The modification can be made at one or more positions in the targeter nucleic acid and/or the modulator nucleic acid such that these nucleic acids retain functionality. For example, the modified nucleic acids can still direct the Cas protein to the target nucleotide sequence and allow the Cas protein to exert its effector function. It is understood that the particular modification(s) at a position may be selected based on the functionality of the nucleotide or internucleotide linkage at the position. For example, a specificity-enhancing modification may be suitable for a nucleotide or internucleotide linkage in the spacer sequence, the targeter stem sequence, or the modulator stem sequence. 
A stability-enhancing modification may be suitable for one or more terminal nucleotides or internucleotide linkages in the targeter nucleic acid and/or the modulator nucleic acid. In certain embodiments, at least 1 (e.g., at least 2, at least 3, at least 4, or at least 5) terminal nucleotides or internucleotide linkages at or near the 5’ end and/or at least 1 (e.g., at least 2, at least 3, at least 4, or at least 5) terminal nucleotides or internucleotide linkages at or near the 3’ end of the targeter nucleic acid are modified. In certain embodiments, 5 or fewer (e.g., 1 or fewer, 2 or fewer, 3 or fewer, or 4 or fewer) terminal nucleotides or internucleotide linkages at or near the 5’ end and/or 5 or fewer (e.g., 1 or fewer, 2 or fewer, 3 or fewer, or 4 or fewer) terminal nucleotides or internucleotide linkages at or near the 3’ end of the targeter nucleic acid are modified. In certain embodiments, at least 1 (e.g., at least 2, at least 3, at least 4, or at least 5) terminal nucleotides or internucleotide linkages at or near the 5’ end and/or at least 1 (e.g., at least 2, at least 3, at least 4, or at least 5) terminal nucleotides or internucleotide linkages at or near the 3’ end of the modulator nucleic acid are modified. In certain embodiments, 5 or fewer (e.g., 1 or fewer, 2 or fewer, 3 or fewer, or 4 or fewer) terminal
nucleotides or internucleotide linkages at or near the 5’ end and/or 5 or fewer (e.g., 1 or fewer, 2 or fewer, 3 or fewer, or 4 or fewer) terminal nucleotides or internucleotide linkages at or near the 3’ end of the modulator nucleic acid are modified. Selection of positions for modifications is described in U.S. Patent Nos. 10,900,034 and 10,767,175. As used in this paragraph, where the targeter or modulator nucleic acid is a combination of DNA and RNA, the nucleic acid as a whole is considered as an RNA, and the DNA nucleotide(s) are considered as modification(s) of the RNA, including a 2'-H modification of the ribose and optionally a modification of the nucleobase. [0111] It is understood that, in dual guide nucleic acid systems, the targeter nucleic acid and the modulator nucleic acid, while not in the same nucleic acid,
i.e., not linked end-to-end through a traditional internucleotide bond, can be covalently conjugated to each other through one or more chemical modifications introduced into these nucleic acids, thereby increasing the stability of the double-stranded complex and/or improving other characteristics of the system. II. Composition and methods for targeting, editing, and/or modifying genomic DNA [0112] An engineered, non-naturally occurring system, such as disclosed herein, can be useful for targeting, editing, and/or modifying a target nucleic acid, such as a DNA (e.g., genomic DNA) in a cell or organism. [0113] The present invention provides a method of cleaving a target nucleic acid (e.g., DNA) comprising the sequence of a preselected target sequence or a portion thereof, the method comprising contacting the target DNA with an engineered, non-naturally occurring system disclosed herein, thereby resulting in cleavage of the target DNA. [0114] In addition, the present invention provides a method of binding a target nucleic acid (e.g., DNA) comprising the sequence of a preselected target sequence or a portion thereof, the method comprising contacting the target DNA with an engineered, non-naturally occurring system disclosed herein, thereby resulting in binding of the system to the target DNA. This method can be useful, e.g., for detecting the presence and/or location of a preselected target gene, for example, if a component of the system (e.g., the Cas protein) comprises a detectable marker. 
[0115] In addition, provided are methods of modifying a target nucleic acid (e.g., DNA) comprising the sequence of a preselected target sequence or a portion thereof, or a structure (e.g., protein) associated with the target DNA (e.g., a histone protein in a chromosome), the method comprising contacting the target DNA with an engineered, non-naturally occurring system disclosed herein, wherein the Cas protein comprises an effector domain or is associated with an effector protein, thereby resulting in modification of the target DNA or the structure associated
with the target DNA. The modification corresponds to the function of the effector domain or effector protein. Exemplary functions described in the “Cas Proteins” subsection in Section I supra are applicable hereto. [0116] An engineered, non-naturally occurring system can be contacted with the target nucleic acid as a complex. Accordingly, in certain embodiments, a method comprises contacting the target nucleic acid with a CRISPR-Cas complex comprising a targeter nucleic acid, a modulator nucleic acid, and a Cas protein disclosed herein. In certain embodiments, the Cas protein is a type V-A, type V-C, or type V-D Cas protein (e.g., Cas nuclease). In certain embodiments, the Cas protein is a type V-A Cas protein (e.g., Cas nuclease). [0117] In certain embodiments, provided is a method of editing a human genomic sequence at one of a group of preselected target gene loci, the method comprising delivering an engineered, non-naturally occurring system disclosed herein into a human cell, thereby resulting in editing of the genomic sequence at the target gene locus in the human cell. In certain embodiments, provided herein is a method of detecting a human genomic sequence at one of a group of preselected target gene loci, the method comprising delivering the engineered, non- naturally occurring system disclosed herein into a human cell, wherein a component of the system (e.g., the Cas protein) comprises a detectable marker, thereby detecting the target gene locus in the human cell. In certain embodiments, provided herein is a method of modifying a human chromosome at one of a group of preselected target gene loci, the method comprising delivering the engineered, non-naturally occurring system disclosed herein into a human cell, wherein the Cas protein comprises an effector domain or is associated with an effector protein, thereby resulting in modification of the chromosome at the target gene locus in the human cell. 
[0118] The CRISPR-Cas complex may be delivered to a cell by introducing a pre-formed ribonucleoprotein (RNP) complex into the cell. Alternatively, one or more components of the CRISPR-Cas complex may be expressed in the cell. Exemplary methods of delivery are known in the art and described in, for example, U.S. Patent Nos.8,697,359, 10,113,167, 10,570,418, 10,829,787, 11,118,194, and 11,125,739 and U.S. Patent Application Publication Nos. 2015/0344912, 2018/0119140, and 2018/0282763. [0119] It is understood that contacting a DNA (e.g., genomic DNA) in a cell with a CRISPR- Cas complex does not require delivery of all components of the complex into the cell. For example, one or more of the components may be pre-existing in the cell. In certain embodiments, the cell (or a parental/ancestral cell thereof) has been engineered to express the Cas protein, and the single guide nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the single guide nucleic acid), the targeter nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding
the targeter nucleic acid), and/or the modulator nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the modulator nucleic acid) are delivered into the cell. In certain embodiments, the cell (or a parental/ancestral cell thereof) has been engineered to express the modulator nucleic acid, and the Cas protein (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the Cas protein) and the targeter nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the targeter nucleic acid) are delivered into the cell. In certain embodiments, the cell (or a parental/ancestral cell thereof) has been engineered to express the Cas protein and the modulator nucleic acid, and the targeter nucleic acid (or a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding the targeter nucleic acid) is delivered into the cell. [0120] In certain embodiments, the target DNA is in the genome of a target cell. Accordingly, the present invention also provides a cell comprising the non-naturally occurring system or a CRISPR expression system described herein. In addition, the present invention provides a cell whose genome has been modified by the CRISPR-Cas system or complex disclosed herein. [0121] The target cells can be mitotic or post-mitotic cells from any organism, such as a bacterial cell (e.g., E. coli), an archaeal cell, a cell of a single-cell eukaryotic organism, a plant cell, an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, or the like, a fungal cell (e.g., a yeast cell, such as S. cerevisiae), an animal cell, a cell from an invertebrate animal (e.g., 
fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell from a mammal, a cell from a rodent, or a cell from a human. The types of target cells include but are not limited to a stem cell (e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a germ cell), a somatic cell (e.g., a fibroblast, a hematopoietic cell, a T lymphocyte (e.g., CD8+ T lymphocyte), an NK cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell), an in vitro or in vivo embryonic cell of an embryo at any stage (e.g., a 1-cell, 2-cell, 4-cell, or 8-cell stage zebrafish embryo). Cells may be from established cell lines or may be primary cells (i.e., cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages of the culture). For example, primary cultures are cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Typically, the primary cell lines are maintained for fewer than 10 passages in vitro. If the cells are primary cells, they may be harvested from an individual by any suitable method. For example, leukocytes may be harvested by apheresis, leukocytapheresis, or density gradient
separation, while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, or stomach can be harvested by biopsy. The harvested cells may be used immediately, or may be stored under frozen conditions with a cryopreservative and thawed at a later time in a manner as commonly known in the art. A. Ribonucleoprotein (RNP) delivery and “Cas RNA” delivery [0122] An engineered, non-naturally occurring system disclosed herein can be delivered into a cell by suitable methods known in the art, including but not limited to ribonucleoprotein (RNP) delivery and “Cas RNA” delivery described below. [0123] In certain embodiments, a CRISPR-Cas system including a single guide nucleic acid and a Cas protein, or a CRISPR-Cas system including a targeter nucleic acid, a modulator nucleic acid, and a Cas protein, can be combined into an RNP complex and then delivered into the cell as a pre-formed complex. This method is suitable for active modification of the genetic or epigenetic information in a cell during a limited time period. For example, where the Cas protein has nuclease activity to modify the genomic DNA of the cell, the nuclease activity only needs to be retained for a period of time to allow DNA cleavage, and prolonged nuclease activity may increase off-targeting. Similarly, certain epigenetic modifications can be maintained in a cell once established and can be inherited by daughter cells. [0124] A “ribonucleoprotein” or “RNP,” as used herein, can refer to a complex comprising a nucleoprotein and a ribonucleic acid. A “nucleoprotein” as provided herein can refer to a protein capable of binding a nucleic acid (e.g., RNA, DNA). Where the nucleoprotein binds a ribonucleic acid it can be referred to as “ribonucleoprotein.” The interaction between the ribonucleoprotein and the ribonucleic acid may be direct, e.g., by covalent bond, or indirect, e.g., by non-covalent bond (e.g. electrostatic interactions (e.g. 
ionic bond, hydrogen bond, halogen bond), van der Waals interactions (e.g. dipole-dipole, dipole-induced dipole, London dispersion), ring stacking (pi effects), hydrophobic interactions, or the like). In certain embodiments, the ribonucleoprotein includes an RNA-binding motif non-covalently bound to the ribonucleic acid. For example, positively charged amino acid residues (e.g., lysine residues) in the RNA-binding motif may form electrostatic interactions with the negatively charged phosphate backbone of the RNA. [0125] To ensure efficient loading of the Cas protein, the single guide nucleic acid, or the combination of the targeter nucleic acid and the modulator nucleic acid, can be provided in excess molar amount (e.g., at least 2 fold, at least 3 fold, at least 4 fold, or at least 5 fold) relative to the Cas protein. In certain embodiments, the targeter nucleic acid and the modulator nucleic acid are annealed under suitable conditions prior to complexing with the Cas protein. In other
embodiments, the targeter nucleic acid, the modulator nucleic acid, and the Cas protein are directly mixed together to form an RNP. [0126] A variety of delivery methods can be used to introduce an RNP disclosed herein into a cell. Exemplary delivery methods or vehicles include but are not limited to microinjection, liposomes (see, e.g., U.S. Patent No. 10,829,787) such as molecular Trojan horse liposomes that deliver molecules across the blood-brain barrier (see, Pardridge et al. (2010) Cold Spring Harb. Protoc., doi:10.1101/pdb.prot5407), immunoliposomes, virosomes, microvesicles (e.g., exosomes and ARMMs), polycations, lipid:nucleic acid conjugates, electroporation, cell permeable peptides (see, U.S. Patent No. 11,118,194), nanoparticles, nanowires (see, Shalek et al. (2012) Nano Letters, 12: 6498), exosomes, and perturbation of cell membrane (e.g., by passing cells through a constriction in a microfluidic system, see, U.S. Patent No. 11,125,739). Where the target cell is a proliferating cell, the efficiency of RNP delivery can be enhanced by cell cycle synchronization (see, U.S. Patent No. 10,570,418). In certain embodiments, an RNP is delivered into a cell by electroporation. [0127] In certain embodiments, a CRISPR-Cas system is delivered into a cell in a “Cas RNA” approach, i.e., delivering (a) a single guide nucleic acid, or a combination of a targeter nucleic acid and a modulator nucleic acid, and (b) an RNA (e.g., messenger RNA (mRNA)) encoding a Cas protein. The RNA encoding the Cas protein can be translated in the cell and form a complex with the single guide nucleic acid or combination of the targeter nucleic acid and the modulator nucleic acid intracellularly. Similar to the RNP approach, RNAs have limited half-lives in cells, even though stability-increasing modification(s) can be made in one or more of the RNAs. 
Accordingly, the “Cas RNA” approach is suitable for active modification of the genetic or epigenetic information in a cell during a limited time period, such as DNA cleavage, and has the advantage of reducing off-targeting. [0128] The mRNA can be produced by transcription of a DNA comprising a regulatory element operably linked to a Cas coding sequence. Given that multiple copies of Cas protein can be generated from one mRNA, the single guide nucleic acid, or the targeter nucleic acid and the modulator nucleic acid, are generally provided in excess molar amount (e.g., at least 5 fold, at least 10 fold, at least 20 fold, at least 30 fold, at least 50 fold, or at least 100 fold) relative to the mRNA. In certain embodiments, the targeter nucleic acid and the modulator nucleic acid are annealed under suitable conditions prior to delivery into the cells. In other embodiments, the targeter nucleic acid and the modulator nucleic acid are delivered into the cells without annealing in vitro. [0129] A variety of delivery systems can be used to introduce a “Cas RNA” system into a cell. Non-limiting examples of delivery methods or vehicles include microinjection, biolistic
particles, liposomes (see, e.g., U.S. Patent No. 10,829,787) such as molecular Trojan horse liposomes that deliver molecules across the blood-brain barrier (see, Pardridge et al. (2010) Cold Spring Harb. Protoc., doi:10.1101/pdb.prot5407), immunoliposomes, virosomes, polycations, lipid:nucleic acid conjugates, electroporation, nanoparticles, nanowires (see, Shalek et al. (2012) Nano Letters, 12: 6498), exosomes, and perturbation of cell membrane (e.g., by passing cells through a constriction in a microfluidic system, see, U.S. Patent No. 11,125,739). Specific examples of the “nucleic acid only” approach by electroporation are described in International (PCT) Publication No. WO 2016/164356. [0130] In certain embodiments, the CRISPR-Cas system is delivered into a cell in the form of (a) a single guide nucleic acid or a combination of a targeter nucleic acid and a modulator nucleic acid, and (b) a DNA comprising a regulatory element operably linked to a Cas coding sequence. The DNA can be provided in a plasmid, viral vector, or any other form described in the “CRISPR Expression Systems” subsection. Such a delivery method may result in constitutive expression of Cas protein in the target cell (e.g., if the DNA is maintained in the cell in an episomal vector or is integrated into the genome), and may increase the risk of off-targeting, which is undesirable when the Cas protein has nuclease activity. Notwithstanding, this approach is useful when the Cas protein comprises a non-nuclease effector (e.g., a transcriptional activator or repressor). It is also useful for research purposes and for genome editing of plants. B. CRISPR expression systems [0131] Also provided herein is a nucleic acid comprising a regulatory element operably linked to a nucleotide sequence encoding a guide nucleic acid disclosed herein. 
In certain embodiments, the nucleic acid comprises a regulatory element operably linked to a nucleotide sequence encoding a single guide nucleic acid; this nucleic acid alone can constitute a CRISPR expression system. In certain embodiments, the nucleic acid comprises a regulatory element operably linked to a nucleotide sequence encoding a targeter nucleic acid. In certain embodiments, the nucleic acid further comprises a nucleotide sequence encoding a modulator nucleic acid, wherein the nucleotide sequence encoding the modulator nucleic acid is operably linked to the same regulatory element as the nucleotide sequence encoding the targeter nucleic acid or a different regulatory element; this nucleic acid alone can constitute a CRISPR expression system. [0132] In addition, the present invention provides a CRISPR expression system comprising: (a) a nucleic acid comprising a first regulatory element operably linked to a nucleotide sequence encoding a targeter nucleic acid and (b) a nucleic acid comprising a second regulatory element operably linked to a nucleotide sequence encoding a modulator nucleic acid.
[0133] In certain embodiments, a CRISPR expression system further comprises a nucleic acid comprising a third regulatory element operably linked to a nucleotide sequence encoding a Cas protein, such as a Cas protein disclosed herein. In certain embodiments, the Cas protein is a type V-A, type V-C, or type V-D Cas protein (e.g., Cas nuclease). In certain embodiments, the Cas protein is a type V-A Cas protein (e.g., Cas nuclease). [0134] As used in this context, the term “operably linked” can mean that the nucleotide sequence of interest is linked to the regulatory element in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell). [0135] The nucleic acids of a CRISPR expression system described above may be independently selected from various nucleic acids such as DNA (e.g., modified DNA) and RNA (e.g., modified RNA). In certain embodiments, the nucleic acids comprising a regulatory element operably linked to one or more nucleotide sequences encoding the guide nucleic acids are in the form of DNA. In certain embodiments, the nucleic acid comprising a third regulatory element operably linked to a nucleotide sequence encoding the Cas protein is in the form of DNA. The third regulatory element can be a constitutive or inducible promoter that drives the expression of the Cas protein. In other embodiments, the nucleic acid comprising a third regulatory element operably linked to a nucleotide sequence encoding the Cas protein is in the form of RNA (e.g., mRNA). [0136] Nucleic acids of a CRISPR expression system can be provided in one or more vectors. The term “vector,” as used herein, can refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. 
Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids into cells, such as prokaryotic cells, eukaryotic cells, mammalian cells, or target tissues. Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. Gene therapy procedures are known in the art and disclosed in Van Brunt (1988) BIOTECHNOLOGY, 6: 1149; Anderson (1992) SCIENCE, 256: 808; Nabel & Felgner (1993) TIBTECH, 11: 211; Mitani & Caskey (1993) TIBTECH, 11: 162; Dillon (1993) TIBTECH, 11: 167; Miller (1992) NATURE, 357: 455; Vigne (1995) RESTORATIVE NEUROLOGY AND NEUROSCIENCE, 8: 35; Kremer & Perricaudet (1995) BRITISH MEDICAL BULLETIN, 51: 31; Haddada et al. (1995) CURRENT TOPICS IN MICROBIOLOGY AND IMMUNOLOGY, 199: 297; Yu et al. (1994) GENE THERAPY, 1: 13; and Doerfler and Bohm (Eds.) (2012) The Molecular Repertoire of Adenoviruses II: Molecular Biology of Virus-Cell Interactions. In certain
embodiments, at least one of the vectors is a DNA plasmid. In certain embodiments, at least one of the vectors is a viral vector (e.g., retrovirus, adenovirus, or adeno-associated virus). [0137] Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors and replication defective viral vectors) do not autonomously replicate in the host cell. Certain vectors, however, may be integrated into the genome of the host cell and thereby are replicated along with the host genome. A skilled person in the art will appreciate that different vectors may be suitable for different delivery methods and have different host tropism, and will be able to select one or more vectors suitable for the use. [0138] The term “regulatory element,” as used herein, can refer to a transcriptional and/or translational control sequence, such as a promoter, enhancer, transcription termination signal (e.g., polyadenylation signal), internal ribosomal entry sites (IRES), protein degradation signal, or the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., a targeter nucleic acid or a modulator nucleic acid) or a coding sequence (e.g., a Cas protein) and/or regulate translation of an encoded polypeptide. Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY, 185, Academic Press, San Diego, Calif. (1990). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). 
A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g., liver, pancreas), or particular cell types (e.g., lymphocytes). Regulatory elements may also direct expression in a temporal-dependent manner, such as in a cell-cycle dependent or developmental stage-dependent manner, which may or may not also be tissue or cell-type specific. In certain embodiments, a vector comprises one or more pol III promoters (e.g., 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g., 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g., 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof. Examples of pol III promoters include, but are not limited to, U6 and H1 promoters. Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer), the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter. Also encompassed by the term “regulatory element” are enhancer elements, such as WPRE; CMV enhancers; the R-U5' segment in LTR of HTLV-I (see, Takebe et al. (1988) MOL. CELL. BIOL., 8: 466); SV40
enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (see, O’Hare et al. (1981) PROC. NATL. ACAD. SCI. USA., 78: 1527). It will be appreciated by those skilled in the art that the design of the expression vector can depend on factors such as the choice of the host cell to be transformed, the level of expression desired, etc. A vector can be introduced into host cells to produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., CRISPR transcripts, proteins, enzymes, mutant forms thereof, or fusion proteins thereof). [0139] In certain embodiments, the nucleotide sequence encoding the Cas protein is codon optimized for expression in a prokaryotic cell (e.g., E. coli) or a eukaryotic host cell, e.g., a yeast cell (e.g., S. cerevisiae), a mammalian cell (e.g., a mouse cell, a rat cell, or a human cell), or a plant cell. Various species exhibit particular bias for certain codons of a particular amino acid. Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at kazusa.or.jp/codon/ and these tables can be adapted in a number of ways (see, Nakamura et al. (2000) NUCL. ACIDS RES., 28: 292). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available. 
In certain embodiments, the codon optimization facilitates or improves expression of the Cas protein in the host cell. C. Donor templates [0140] Cleavage of a target nucleotide sequence in the genome of a cell by a CRISPR-Cas system or complex can activate DNA damage pathways, which may rejoin the cleaved DNA fragments by NHEJ or HDR. HDR requires a repair template, either endogenous or exogenous, to transfer the sequence information from the repair template to the target. [0141] In certain embodiments, an engineered, non-naturally occurring system or CRISPR expression system further comprises a donor template. As used herein, the term “donor template” can refer to a nucleic acid designed to serve as a repair template at or near the target nucleotide sequence upon introduction into a cell or organism. In certain embodiments, the donor template is complementary to a polynucleotide comprising the target nucleotide sequence or a portion thereof. When optimally aligned, a donor template may overlap with one or more nucleotides of
a target nucleotide sequence (e.g., about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, or more nucleotides). The nucleotide sequence of the donor template is typically not identical to the genomic sequence that it replaces. Rather, the donor template may contain one or more substitutions, insertions, deletions, inversions or rearrangements with respect to the genomic sequence, so long as sufficient homology is present to support homology-directed repair. In certain embodiments, the donor template comprises a non-homologous sequence flanked by two regions of homology (i.e., homology arms), such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region. In certain embodiments, the donor template comprises a non-homologous sequence 10-100 nucleotides, 50-500 nucleotides, 100-1,000 nucleotides, 200-2,000 nucleotides, or 500-5,000 nucleotides in length positioned between two homology arms. [0142] Generally, the homologous region(s) of a donor template has at least 50% sequence identity to a genomic sequence with which recombination is desired. The homology arms are designed or selected such that they are capable of recombining with the nucleotide sequences flanking the target nucleotide sequence under intracellular conditions. In certain embodiments, where HDR of the non-target strand is desired, the donor template comprises a first homology arm homologous to a sequence 5’ to the target nucleotide sequence and a second homology arm homologous to a sequence 3’ to the target nucleotide sequence. In certain embodiments, the first homology arm is at least 50% (e.g., at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identical to a sequence 5’ to the target nucleotide sequence. 
In certain embodiments, the second homology arm is at least 50% (e.g., at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identical to a sequence 3’ to the target nucleotide sequence. In certain embodiments, when the donor template sequence and a polynucleotide comprising a target nucleotide sequence are optimally aligned, the nearest nucleotide of the donor template is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 2000, 3000, 4000, or more nucleotides from the target nucleotide sequence. [0143] In certain embodiments, the donor template further comprises an engineered sequence not homologous to the sequence to be repaired. Such engineered sequence can harbor a barcode and/or a sequence capable of hybridizing with a donor template-recruiting sequence disclosed herein. [0144] In certain embodiments, the donor template further comprises one or more mutations relative to the genomic sequence, wherein the one or more mutations reduce or prevent cleavage,
by the same CRISPR-Cas system, of the donor template or of a modified genomic sequence with at least a portion of the donor template sequence incorporated. In certain embodiments, in the donor template, the PAM adjacent to the target nucleotide sequence and recognized by the Cas nuclease is mutated to a sequence not recognized by the same Cas nuclease. In certain embodiments, in the donor template, the target nucleotide sequence (e.g., the seed region) is mutated. In certain embodiments, the one or more mutations are silent with respect to the reading frame of a protein-coding sequence encompassing the mutated sites. [0145] The donor template can be provided to the cell as single-stranded DNA, single-stranded RNA, double-stranded DNA, or double-stranded RNA. It is understood that a CRISPR-Cas system, such as a system disclosed herein, may possess nuclease activity to cleave the target strand, the non-target strand, or both. When HDR of the target strand is desired, a donor template having a nucleic acid sequence complementary to the target strand is also contemplated. [0146] The donor template can be introduced into a cell in linear or circular form. If introduced in linear form, the ends of the donor template may be protected (e.g., from exonucleolytic degradation) by methods known to those of skill in the art. For example, one or more dideoxynucleotide residues are added to the 3’ terminus of a linear molecule and/or self-complementary oligonucleotides are ligated to one or both ends (see, for example, Chang et al. (1987) PROC. NATL. ACAD. SCI. USA, 84: 4959; Nehls et al. (1996) SCIENCE, 272: 886; see also the chemical modifications for increasing stability and/or specificity of RNA disclosed supra). 
Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues. As an alternative to protecting the termini of a linear donor template, additional lengths of sequence may be included outside of the regions of homology that can be degraded without impacting recombination. [0147] A donor template can be a component of a vector as described herein, contained in a separate vector, or provided as a separate polynucleotide, such as an oligonucleotide, linear polynucleotide, or synthetic polynucleotide. In certain embodiments, the donor template is a DNA. In certain embodiments, a donor template is in the same nucleic acid as a sequence encoding the single guide nucleic acid, a sequence encoding the targeter nucleic acid, a sequence encoding the modulator nucleic acid, and/or a sequence encoding the Cas protein, where applicable. In certain embodiments, a donor template is provided in a separate nucleic acid. A donor template polynucleotide may be of any suitable length, such as about or at least about 50, 75, 100, 150, 200, 500, 1000, 2000, 3000, 4000, or more nucleotides in length.
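The donor template layout described in paragraphs [0141]-[0142] (a non-homologous sequence positioned between two homology arms, each arm having at least 50% identity to the genomic sequence flanking the target) can be illustrated with a minimal Python sketch. The function names, the toy reference sequence, and the cut-site coordinate below are hypothetical and are not taken from this disclosure; the identity calculation assumes two pre-aligned, equal-length sequences and does not perform an alignment.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (no gaps handled)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def build_donor(genome: str, cut_site: int, insert: str, arm_len: int) -> str:
    """Return 5' arm + insert + 3' arm, copying both arms from the reference.

    cut_site is a 0-based index; the 5' arm ends immediately before it and
    the 3' arm starts at it. This coordinate convention is an assumption.
    """
    if cut_site - arm_len < 0 or cut_site + arm_len > len(genome):
        raise ValueError("homology arms would extend past the reference sequence")
    left_arm = genome[cut_site - arm_len:cut_site]
    right_arm = genome[cut_site:cut_site + arm_len]
    return left_arm + insert + right_arm


# Toy example: 8-nt arms around position 12 of a made-up 28-nt reference.
genome = "ACGTACGTTTGACCATGGCAAGCTTAGC"
donor = build_donor(genome, cut_site=12, insert="GGATCC", arm_len=8)

# Check each arm against the 50% identity floor stated in [0142].
left_ok = percent_identity(donor[:8], genome[4:12]) >= 50.0
right_ok = percent_identity(donor[-8:], genome[12:20]) >= 50.0
```

In practice the arms would be far longer than 8 nucleotides and may deliberately differ from the genome (for example, at a mutated PAM), in which case the identity check is the informative step.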
[0148] A donor template can be introduced into a cell as an isolated nucleic acid. Alternatively, a donor template can be introduced into a cell as part of a vector (e.g., a plasmid) having additional sequences such as, for example, replication origins, promoters and genes encoding antibiotic resistance, that are not intended for insertion into the DNA region of interest. Alternatively, a donor template can be delivered by viruses (e.g., adenovirus, adeno-associated virus (AAV)). In certain embodiments, the donor template is introduced as an AAV, e.g., a pseudotyped AAV. The capsid proteins of the AAV can be selected by a person skilled in the art based upon the tropism of the AAV and the target cell type. For example, in certain embodiments, the donor template is introduced into a hepatocyte as AAV8 or AAV9. In certain embodiments, the donor template is introduced into a hematopoietic stem cell, a hematopoietic progenitor cell, or a T lymphocyte (e.g., CD8+ T lymphocyte) as AAV6 or an AAVHSC (see, U.S. Patent No. 9,890,396). It is understood that the sequence of a capsid protein (VP1, VP2, or VP3) may be modified from a wild-type AAV capsid protein, for example, having at least 50% (e.g., at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) sequence identity to a wild-type AAV capsid sequence. [0149] The donor template can be delivered to a cell (e.g., a primary cell) by various delivery methods, such as a viral or non-viral method disclosed herein. In certain embodiments, a non-viral donor template is introduced into the target cell as a naked nucleic acid or in complex with a liposome or poloxamer. In certain embodiments, a non-viral donor template is introduced into the target cell by electroporation. In other embodiments, a viral donor template is introduced into the target cell by infection. 
The engineered, non-naturally occurring system can be delivered before, after, or simultaneously with the donor template (see, International (PCT) Application Publication No. WO 2017/053729). A skilled person in the art will be able to choose proper timing based upon the form of delivery (consider, for example, the time needed for transcription and translation of RNA and protein components) and the half-life of the molecule(s) in the cell. In particular embodiments, where the CRISPR-Cas system including the Cas protein is delivered by electroporation (e.g., as an RNP), the donor template (e.g., as an AAV) is introduced into the cell within 4 hours (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 90, 120, 150, 180, 210, or 240 minutes) after the introduction of the engineered, non-naturally occurring system. [0150] In certain embodiments, the donor template is conjugated covalently to a modulator nucleic acid. Covalent linkages suitable for this conjugation are known in the art and are described, for example, in U.S. Patent No.9,982,278 and Savic et al. (2018) ELIFE 7:e33761. In certain embodiments, the donor template is covalently linked to a modulator nucleic acid (e.g.,
the 5’ end of the modulator nucleic acid) through an internucleotide bond. In certain embodiments, the donor template is covalently linked to a modulator nucleic acid (e.g., the 5’ end of the modulator nucleic acid) through a linker. [0151] In certain embodiments, the donor template can comprise any nucleic acid chemistry. In certain embodiments, the donor template can comprise DNA and/or RNA nucleotides. In certain embodiments, the donor template can comprise single-stranded DNA, linear single-stranded RNA, linear double-stranded DNA, linear double-stranded RNA, circular single-stranded DNA, circular single-stranded RNA, circular double-stranded DNA, or circular double-stranded RNA. In certain embodiments, the donor template comprises a mutation in a PAM sequence to partially or completely abolish binding of the RNP to the DNA. In certain embodiments, the donor template is present at a concentration of at least 0.05, 0.01, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.25, 1.5, 1.75, 2, 3, or 4, and/or no more than 0.01, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.25, 1.5, 1.75, 2, 3, 4, or 5 μg μL⁻¹, for example 0.01-5 μg μL⁻¹. In certain embodiments, the donor template comprises one or more promoters. In certain embodiments, the donor template comprises a promoter that shares at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity with any one of SEQ ID NOs: 78-85 of Table 4. TABLE 4: Promoter sequences
D. Efficiency and specificity [0152] An engineered, non-naturally occurring system can be evaluated in terms of efficiency and/or specificity in nucleic acid targeting, cleavage, or modification. [0153] In certain embodiments, an engineered, non-naturally occurring system has high efficiency. For example, in certain embodiments, at least 1, 1.5, 2, 2.5, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99, or 100% of a population of nucleic acids having the target nucleotide sequence and a cognate PAM, when contacted with the engineered, non-naturally occurring system, is targeted, cleaved, or modified. In certain embodiments, the genomes of at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80,
85, 90, 95, 96, 97, 98, 99, or 100% of a population of cells, when the engineered, non-naturally occurring system is delivered into the cells, are targeted, cleaved, or modified. [0154] It has been observed that for a given spacer sequence, the occurrence of on-target events and the occurrence of off-target events are generally correlated. For certain therapeutic purposes, lower on-target efficiency can be tolerated and low off-target frequency is more desirable. For example, when editing or modifying a proliferating cell that will be delivered to a subject and proliferate in vivo, tolerance to off-target events is low. Prior to delivery, it is possible to assess the on-target and off-target events, thereby selecting one or more colonies that have the desired edit or modification and lack any undesired edit or modification. Notwithstanding, the on-target efficiency may need to meet a certain standard to be suitable for therapeutic use. High editing efficiency in a standard CRISPR-Cas system allows tuning of the system, for example, by reducing the binding of the guide nucleic acids to the Cas protein, without losing therapeutic applicability. [0155] In certain embodiments, when a population of nucleic acids having the target nucleotide sequence and a cognate PAM is contacted with the engineered, non-naturally occurring system disclosed herein, the frequency of off-target events (e.g., targeting, cleavage, or modification, depending on the function of the CRISPR-Cas system) is reduced. Methods of assessing off-target events were summarized in Lazzarotto et al. (2018) Nat. Protoc. 13(11): 2615-42, and include discovery of in situ Cas off-targets and verification by sequencing (DISCOVER-seq) as disclosed in Wienert et al. (2019) Science 364(6437): 286-89; genome-wide unbiased identification of double-stranded breaks (DSBs) enabled by sequencing (GUIDE-seq) as disclosed in Kleinstiver et al. (2016) Nat. 
Biotech. 34: 869-74; and circularization for in vitro reporting of cleavage effects by sequencing (CIRCLE-seq) as described in Kocak et al. (2019) Nat. Biotech. 37: 657-66. In certain embodiments, the off-target events include targeting, cleavage, or modification at a given off-target locus (e.g., the locus with the highest occurrence of off-target events detected). In certain embodiments, the off-target events include targeting, cleavage, or modification at all the loci with detectable off-target events, collectively. [0156] In certain embodiments, genomic mutations are detected in no more than 0.0001%, 0.0002%, 0.0003%, 0.0004%, 0.0005%, 0.0006%, 0.0007%, 0.0008%, 0.0009%, 0.001%, 0.002%, 0.003%, 0.004%, 0.005%, 0.006%, 0.007%, 0.008%, 0.009%, 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1%, 2%, 3%, 4%, or 5% of the cells at any off-target loci (in aggregate). In certain embodiments, the ratio of the percentage of cells having an on-target event to the percentage of cells having any off-target event (e.g., the ratio of the percentage of cells having an on-target editing event to the percentage of cells having a mutation at any off-target loci) is at least 10, 20,
30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10000. It is understood that genetic variation may be present in a population of cells, for example, by spontaneous mutations, and such mutations are not included as off-target events. E. Multiplexing [0157] The method of targeting, editing, and/or modifying a genomic DNA disclosed herein can be conducted in multiplicity. For example, a library of targeter nucleic acids can be used to target multiple genomic loci; a library of donor templates can also be used to generate multiple insertions, deletions, and/or substitutions. The multiplex assay can be conducted in a screening method wherein each separate cell culture (e.g., in a well of a 96-well plate or a 384-well plate) is exposed to a different guide nucleic acid having a different targeter stem sequence and/or a different donor template. The multiplex assay can also be conducted in a selection method wherein a cell culture is exposed to a mixed population of different guide nucleic acids and/or donor templates, and the cells with desired characteristics (e.g., functionality) are enriched or selected by advantageous survival or growth, resistance to a certain agent, expression of a detectable protein (e.g., a fluorescent protein that is detectable by flow cytometry), etc. [0158] In certain embodiments, the plurality of guide nucleic acids and/or the plurality of donor templates are designed for saturation editing. For example, in certain embodiments, each nucleotide position in a sequence of interest is systematically modified with each of all four traditional bases, A, T, G and C. In other embodiments, at least one sequence in each gene from a pool of genes of interest is modified, for example, according to a CRISPR design algorithm. 
In certain embodiments, each sequence from a pool of exogenous elements of interest (e.g., protein coding sequences, non-protein coding genes, regulatory elements) is inserted into one or more given loci of the genome. [0159] It is understood that the multiplex methods suitable for the purpose of carrying out a screening or selection method, which is typically conducted for research purposes, may be different from the methods suitable for therapeutic purposes. For example, constitutive expression of certain elements (e.g., a Cas nuclease and/or a guide nucleic acid) may be undesirable for therapeutic purposes due to the potential of increased off-targeting. Conversely, for research purposes, constitutive expression of a Cas nuclease and/or a guide nucleic acid may be desirable. For example, the constitutive expression provides a large window during which other elements can be introduced. When a stable cell line is established for the constitutive expression, the number of exogenous elements that need to be co-delivered into a single cell is also reduced. Therefore, constitutive expression of certain elements can increase the efficiency
and reduce the complexity of a screening or selection process. Inducible expression of certain elements of the system disclosed herein may also be used for research purposes given similar advantages. Expression may be induced by an exogenous agent (e.g., a small molecule) or by an endogenous molecule or complex present in a particular cell type (e.g., at a particular stage of differentiation). Methods known in the art, such as those described herein, can be used for constitutively or inducibly expressing one or more elements. For example, the specificity of CRISPR nucleases is at least partially dictated by the uniqueness of the spacer (in combination with the spacer sequence’s proximity to a requisite PAM), and its off-target score can be calculated with algorithms, such as that of crispr.mit.edu (Hsu et al. (2013) Nat. Biotech. 31: 827-832). The highest possible score is 100, which indicates a high probability of specificity and few off-target sites. Because our SHS library targets intergenic regions, the algorithm for gRNA prediction should be able to make alignments with repeated regions and low-complexity sequences. [0160] It is further understood that despite the need to introduce multiple elements (the single guide nucleic acid and the Cas protein; or the targeter nucleic acid, the modulator nucleic acid, and the Cas protein), these elements can be delivered into the cell as a single complex of pre-formed RNP. Therefore, the efficiency of the screening or selection process can also be achieved by pre-assembling a plurality of RNP complexes in a multiplex manner. [0161] In certain embodiments, the method disclosed herein further comprises a step of identifying a guide nucleic acid, a Cas protein, a donor template, or a combination of two or more of these elements from the screening or selection process. A set of barcodes may be used, for example, in the donor template between two homology arms, to facilitate the identification. 
In specific embodiments, the method further comprises harvesting the population of cells; selectively amplifying a genomic DNA or RNA sample including the target nucleotide sequence(s) and/or the barcodes; and/or sequencing the genomic DNA or RNA sample and/or the barcodes that have been selectively amplified. [0162] In addition, the present invention provides a library comprising a plurality of guide nucleic acids, such as a plurality of guide nucleic acids disclosed herein. In another aspect, the present invention provides a library comprising a plurality of nucleic acids each comprising a regulatory element operably linked to a different guide nucleic acid such as a different guide nucleic acid disclosed herein. These libraries can be used in combination with one or more Cas proteins or Cas-coding nucleic acids, such as disclosed herein, and/or one or more donor templates, such as disclosed herein, for a screening or selection method.
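By way of non-limiting illustration, identification by barcode after selective amplification and sequencing can be sketched in Python as follows. The function name `count_barcodes`, the flanking sequences, the barcode length, and the reads are all hypothetical; the sketch assumes a fixed-length barcode located between two known constant sequences (for example, the inner ends of the two homology arms) and requires exact matches, with no tolerance for sequencing errors.

```python
from collections import Counter
from typing import Iterable


def count_barcodes(
    reads: Iterable[str],
    left_flank: str,
    right_flank: str,
    barcode_len: int,
) -> Counter:
    """Count fixed-length barcodes found between two constant flanks.

    Assumptions: exact flank matches only; the first occurrence of the
    left flank in a read is used; reads without both flanks are skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        start = read.find(left_flank)
        if start == -1:
            continue  # left flank absent from this read
        barcode_start = start + len(left_flank)
        barcode_end = barcode_start + barcode_len
        # Accept the barcode only if the right flank follows it directly.
        if read[barcode_end:barcode_end + len(right_flank)] == right_flank:
            counts[read[barcode_start:barcode_end]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical reads; real data would come from a sequencing run.
    example_reads = [
        "GGACCAAAATTGCC",
        "GGACCAAAATTGCC",
        "TTACCCGCGTTGAA",
        "GGGGGGGG",
    ]
    print(count_barcodes(example_reads, "ACC", "TTG", 4).most_common())
```

Barcodes enriched after selection would appear with higher counts; any normalization against the pre-selection library is outside the scope of this sketch.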
F. Genomic safe harbors [0163] Genome engineering is an area of research seeking to modify genes of living organisms to improve our understanding of gene function and to develop methods for genome engineering that treat genetic or acquired diseases, among many others. To modify the genome of target cells, skilled artisans use one or more available tools to introduce changes into the genome at targeted locations to modify the sequence of a target polynucleotide, e.g., a target gene, in desired ways, e.g., modulate gene expression, modulate gene sequences, remove gene sequences, introduce genes, e.g., exogenous DNA, e.g., transgenes, and the like. Efficient transgene insertion may be accomplished through non-precise methods including but not limited to viral vectors, such as retroviral vectors, e.g., adeno-associated virus (AAV) and the like, or precise methods including but not limited to guided nucleases, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), homing endonucleases, e.g., restriction endonucleases, or nucleic acid-guided nucleases, e.g., CRISPR-Cas, e.g., Cas9 and Cas12a and engineered versions thereof. [0164] Exogenous genes, e.g., transgenes, inserted into the genome of a target human cell either randomly, e.g., through retroviral vectors, or in a targeted manner, e.g., through the action of a nucleic acid-guided nuclease, such as Cas, may interact with other genomic elements in unpredictable ways. 
Due to the complex transcriptional regulation of genes in mammalian cells through networks of cis and trans regulatory elements, such as proximal and distal enhancers, and multiple transcription factors, attempts to alter the default genomic architecture by integration of exogenous DNA, e.g., transgenes, or synthetic sequences can affect the expression of the transgene itself, leading to complete attenuation or complete silencing, and/or the expression of both nearby and distant endogenous genes that can, e.g., compromise the safety checkpoints that healthy cells have, including dysregulation of expression of key genes, such as oncogenes and tumor suppressor genes, that can alter cellular behavior in dramatic ways, i.e., promoting clonal expansion or malignant transformation of the host. [0165] Gene integration next to regulatory elements of proto-oncogenes has been shown to cause oncogenic transformation, which is particularly important when engineering cells for therapeutic applications. Therefore, the identification of a suitable target polynucleotide comprising a target nucleotide sequence in the human genome wherein the insertion of a transgene leads to suitable expression of the transgene without disruption of neighboring genes is desired. In particular, for gene and cell therapy applications, a suitable target polynucleotide comprising a target nucleotide sequence in the human genome wherein the insertion of a transgene leads to sufficient expression of the transgene in a therapeutic cell, e.g., a T cell, e.g., a CAR T cell; or precursor cell, e.g., a stem cell, such as a hematopoietic stem cell, without
malignant transformation or any other disruption that would be harmful to an individual after implantation is desired. [0166] Expression of exogenous genes, e.g., transgenes, in desired cell types and/or developmental/differentiation stages relies on integration into a suitable target polynucleotide comprising a target nucleotide sequence that results in sufficient expression, to a degree sufficient for the intended purpose, from the candidate locus. Expression from a specific genomic site can be affected by many factors including but not limited to cell type and differentiation stage, as one or more components of the target polynucleotide get activated during differentiation while others get silenced, and changes in chromatin architecture. Therefore, the identification of suitable target polynucleotides comprising a target nucleotide sequence in the human genome wherein insertion of exogenous DNA, e.g., a transgene, leads to sufficient expression in the target human cell, and, in the case of stem cells, the expression is maintained at a sufficient level through (1) differentiation and (2) clonal expansion is desired. The current disclosure provides significant advances in the ability to engineer human genomes by providing compositions and methods for targeting and delivering exogenous genes, e.g., transgenes, to the suitable target polynucleotide comprising a target nucleotide sequence. [0167] Provided herein are compositions and methods for genome engineering. Certain embodiments comprise compositions. Certain embodiments comprise compositions for editing genomes. Certain embodiments disclosed herein concern novel guide nucleic acids (gNAs), e.g., gRNAs, that are complementary to a target nucleotide sequence in a target polynucleotide. As used herein, a “target polynucleotide” includes a polynucleotide in which a target nucleotide sequence is located. 
As used herein, a “target nucleotide sequence” includes a sequence to which a guide sequence can bind, e.g., has complementarity to, where binding between a target nucleotide sequence and a guide sequence may allow the activity of a nucleic acid-guided nuclease complex. Further embodiments disclosed herein concern novel gNAs, e.g., gRNAs, that are complementary to a target nucleotide sequence in a target polynucleotide into which insertion of exogenous DNA, e.g., a transgene, does not negatively affect the cell, e.g., significantly affect the expression of one or more endogenous genes or result in a malignant transformation of the cell. In further embodiments disclosed herein, gene expression demonstrated in the human target cell is maintained through differentiation of the human target cell and/or through proliferation in the one or more progeny cells at a level sufficient for the ultimate use of the cells. Certain embodiments disclosed herein concern novel nucleic acid-guided nuclease complexes, e.g., RNPs, such as Cas bound to a gNA, that are complementary to a target nucleotide sequence within a target polynucleotide and hydrolyze the phosphodiester backbone (also referred to as cleaving or cutting) in at least one position on at least one strand of the target polynucleotide. Certain
embodiments disclosed herein concern methods for selecting and using gNAs, e.g., gRNAs, for genome engineering. Certain embodiments concern methods for using gNAs that are complementary to a target nucleotide sequence within a target polynucleotide, synthesizing the gNA and nucleic acid-guided nuclease, and/or combining the nucleic acid-guided nuclease with the gNA to form a nucleic acid-guided nuclease complex, e.g., RNP. Certain embodiments disclosed herein concern methods. Certain embodiments disclosed herein concern methods for engineering genomes. Certain embodiments disclosed herein concern methods where a nucleic acid-guided nuclease complex, e.g., RNP, is introduced, e.g., transfected, into a human target cell along with a donor template, e.g., an exogenous DNA, e.g., a transgene, in which the nucleic acid-guided nuclease cleaves the backbone at at least one position in at least one of the strands of the target polynucleotide and the donor template is used to repair the cleaved target polynucleotide, introducing at least a portion of the donor template into the target polynucleotide. As used herein, “exogenous DNA” or a “transgene” includes any gene, natural or synthetic, which is introduced into the genome of an organism or cell to which it is not endogenous. The transgene may or may not retain the ability to be expressed and/or produce RNA or protein in the human target cell. The transgene may or may not alter the resulting phenotype of the human target cell. 
Certain embodiments include human target cells, e.g., a eukaryotic cell, e.g., a mammalian cell, such as a human cell, for example a stem cell or an immune cell, generated through a method where the nucleic acid-guided nuclease complex, e.g., RNP, is introduced, e.g., transfected, into a human target cell along with a donor template, e.g., as an exogenous DNA or a transgene, such as a chimeric antigen receptor (CAR), in which the nucleic acid-guided nuclease cleaves at or near a target sequence in a target polynucleotide and the donor template is used to repair the cleaved target polynucleotide, introducing at least a portion of the donor template into the target polynucleotide. Certain embodiments disclosed herein include promoter sequences adjacent to an exogenous gene, e.g., a transgene; in certain cases, constructs including the promoter, when introduced into a target polynucleotide of a human target cell, e.g., an immune cell or a stem cell, maintain sufficient gene expression in the edited human target cell for the intended purpose of the cell or its progeny. In certain embodiments, the human target cell is viable after introduction of the exogenous DNA. [0168] As used herein, a “human target cell” includes a cell into which an exogenous product, e.g., a protein, a nucleic acid, or a combination thereof, has been introduced. In certain cases, a human target cell may be used to produce a gene product from an exogenous DNA, e.g., a transgene, such as an exogenous protein, e.g., a CAR. In certain cases, a human target cell may comprise a target nucleotide sequence within a target polynucleotide wherein a nucleic acid-guided
nuclease hybridizes and cleaves at a site of cleavage at one or more positions on one or more strands of the target polynucleotide at or near the target nucleotide sequence. [0169] As used herein, a “site of cleavage” includes the location or locations at which a nucleic acid-guided nuclease complex will hydrolyze the phosphodiester backbone of a single-stranded or double-stranded target polynucleotide, after binding at a target nucleotide sequence in the target polynucleotide. In certain cases in which the target polynucleotide of a nucleic acid-guided nuclease complex is double stranded, binding of the nucleic acid-guided nuclease complex to a target nucleotide sequence within the target polynucleotide can result in hydrolysis of one of the strands of the target polynucleotide at or near the target nucleotide sequence, resulting in strand cleavage. In such a case, the nucleic acid-guided nuclease complex can cleave either strand of the target polynucleotide. In certain cases, binding of the nucleic acid-guided nuclease complex to a target nucleotide sequence within a target polynucleotide can result in hydrolysis of both strands of the target polynucleotide at or near the target nucleotide sequence, resulting in cleavage of both strands. The sites of cleavage can be the same for both strands, resulting in a blunt end, or the sites of cleavage for each strand can be offset resulting in single strand overhangs, e.g., sticky ends. In certain cases, mismatches at or near the site of cleavage may or may not affect the cleavage efficiency of the nucleic acid-guided nuclease complex. [0170] In certain cases, uncontrolled gene integration next to regulatory elements of proto-oncogenes has been shown to cause oncogenic transformation, which is particularly important when engineering cells for therapeutic applications. 
Therefore, it is desired to identify suitable target polynucleotides comprising target nucleotide sequences that result in safe, stable integration of exogenous DNA with sufficient expression in a human target cell and its resultant progeny. [0171] Exemplary characteristics of a target nucleotide sequence that can demonstrate predictable function without potentially harmful alterations in human target cell genomic activity include one or more of (1) >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, (2) >150 kb, for example, >200, such as >250, and in some cases >300 kb away from any miRNA/other functional small RNA, (3) >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, (4) >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any replication origin, (5) >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any ultra-conserved element, (6) demonstrating low transcriptional activity, (7) outside of a copy number variable region, (8) located in open chromatin, and (9) unique, i.e., 1 copy per genome.
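By way of non-limiting illustration, the nine exemplary characteristics can be expressed as a simple checklist in Python. The class `CandidateSite`, its field names, and the example values are hypothetical; the sketch uses only the least stringent distance thresholds recited above (>150 kb and >10 kb) and assumes that the distances and annotations have already been computed from a genome annotation of the user's choosing.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CandidateSite:
    """Pre-computed annotations for one candidate locus (hypothetical fields)."""
    kb_to_cancer_gene: float
    kb_to_small_rna: float
    kb_to_5prime_gene_end: float
    kb_to_replication_origin: float
    kb_to_ultraconserved: float
    low_transcription: bool
    in_copy_number_variable_region: bool
    in_open_chromatin: bool
    copies_per_genome: int


def criteria_met(site: CandidateSite) -> Dict[str, bool]:
    """Evaluate characteristics (1)-(9) using the least stringent cutoffs."""
    return {
        "1_far_from_cancer_gene": site.kb_to_cancer_gene > 150,
        "2_far_from_small_rna": site.kb_to_small_rna > 150,
        "3_far_from_5prime_gene_end": site.kb_to_5prime_gene_end > 10,
        "4_far_from_replication_origin": site.kb_to_replication_origin > 10,
        "5_far_from_ultraconserved": site.kb_to_ultraconserved > 10,
        "6_low_transcription": site.low_transcription,
        "7_outside_cnv_region": not site.in_copy_number_variable_region,
        "8_open_chromatin": site.in_open_chromatin,
        "9_unique": site.copies_per_genome == 1,
    }


def count_met(site: CandidateSite) -> int:
    """Number of exemplary characteristics satisfied by the site."""
    return sum(criteria_met(site).values())


if __name__ == "__main__":
    # Hypothetical annotations for illustration only.
    site = CandidateSite(320, 180, 25, 12, 9.5, True, False, True, 1)
    print(count_met(site), criteria_met(site))
```

Returning the per-criterion results, rather than a single pass/fail flag, allows a candidate to be ranked by how many characteristics it satisfies.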
[0172] In certain embodiments, provided herein are compositions. In certain embodiments, provided herein are compositions for engineering a human target cell at suitable target nucleotide sequences within a target polynucleotide of the human target cell. [0173] In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least one of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least two of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least three of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least four of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least five of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least six of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least seven of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has at least eight of the exemplary characteristics. In certain embodiments, a suitable target polynucleotide that comprises a target nucleotide sequence has all the exemplary characteristics. [0174] In certain embodiments, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end. 
In certain embodiments, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least one additional exemplary characteristic. 
In certain embodiments, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least two additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least three additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least four additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least five additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises at least six additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in
some cases >50 kb away from any 5’ gene end and further comprises at least seven additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and further comprises all eight additional exemplary characteristics. [0175] In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least one additional exemplary characteristic. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least two additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least three additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least four additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least five additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least six additional exemplary characteristics. 
In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises at least seven additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene and further comprises all eight additional exemplary characteristics. [0176] In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, and >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least one additional exemplary characteristic. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away
from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least two additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least three additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least four additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least five additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises at least six additional exemplary characteristics. In certain embodiments, a suitable target polynucleotide is >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene, >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end, and further comprises all seven additional exemplary characteristics. 
[0177] In a preferred embodiment, a suitable target polynucleotide is >10 kb, for example, >20, such as >30, and in some cases >50 kb away from any 5’ gene end and >150 kb, for example, >200, such as >250, and in some cases >300 kb away from a known cancer-related gene. [0178] In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence, e.g., for transgene insertion, may comprise any one of SEQ ID NOs: 2020-2043 of Table 5. In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to any one of SEQ ID NOs: 2020-2043. In a preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 98% identical to any one of SEQ ID NOs: 2020-2043. In a more preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 99% identical to any one of SEQ ID NOs: 2020-2043.
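By way of non-limiting illustration, a percent identity threshold of the kind recited above can be checked in Python as sketched below. This passage does not specify how identity is computed; the sketch assumes the simplest case, in which the two sequences are already aligned, have equal length, and contain no gaps. The function names and the example sequences are hypothetical and do not correspond to any SEQ ID NO.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percent of positions with identical bases.

    Assumption: both sequences are pre-aligned, gap-free, and of equal
    length; a gapped alignment would need a dedicated alignment tool.
    """
    if len(candidate) != len(reference):
        raise ValueError("sequences must have equal length in this sketch")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(
        a == b for a, b in zip(candidate.upper(), reference.upper())
    )
    return 100.0 * matches / len(reference)


def meets_identity_threshold(
    candidate: str, reference: str, threshold: float = 98.0
) -> bool:
    """True if identity is at least the threshold (default 98%)."""
    return percent_identity(candidate, reference) >= threshold


if __name__ == "__main__":
    # Hypothetical 10-nt sequences differing at one position.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(meets_identity_threshold("ACGTACGTAC", "ACGTACGTAA", 90.0))
```

For a 500-nucleotide reference, a 98% threshold under this definition tolerates up to ten mismatched positions.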
[0179] In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence, e.g., for transgene insertion, may comprise any one of SEQ ID NOs: 2020-2042 of Table 5. In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to any one of SEQ ID NOs: 2020-2042. In a preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 98% identical to any one of SEQ ID NOs: 2020-2042. In a more preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 99% identical to any one of SEQ ID NOs: 2020-2042. [0180] In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence, e.g., for transgene insertion, may comprise any one of SEQ ID NOs: 2020-2041 and 2043 of Table 5. In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to any one of SEQ ID NOs: 2020-2041 and 2043. In a preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 98% identical to any one of SEQ ID NOs: 2020-2041 and 2043. In a more preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 99% identical to any one of SEQ ID NOs: 2020-2041 and 2043. 
[0181] In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence, e.g., for transgene insertion, may comprise any one of SEQ ID NOs: 2020-2041 of Table 5. In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to any one of SEQ ID NOs: 2020-2041. In a preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 98% identical to any one of SEQ ID NOs: 2020-2041. In a more preferred embodiment, a suitable target polynucleotide comprising a target nucleotide sequence is at least 99% identical to any one of SEQ ID NOs: 2020-2041. [0182] In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence, e.g., for transgene insertion, may comprise at least a portion of, for
example, nucleotides 1-495, 1-490, 1-485, 1-480, 1-475, 1-470, 1-465, 1-460, 1-455, 1-450, 1-445, 1-440, 1-435, 1-430, 1-425, 1-420, 1-415, 1-410, 1-405, or 1-400, of any one of SEQ ID NOs: 2020-2030 of Table 5. In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to the portion of any one of SEQ ID NOs: 2020-2030. [0183] In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence, e.g., for transgene insertion, may comprise at least a portion of, for example, nucleotides 5-500, 10-500, 15-500, 20-500, 25-500, 30-500, 35-500, 40-500, 45-500, 50-500, 55-500, 60-500, 65-500, 70-500, 75-500, 80-500, 85-500, 90-500, 95-500, or 100-500, of any one of SEQ ID NOs: 2031-2041 of Table 5. In certain embodiments, a suitable target polynucleotide comprising a target nucleotide sequence is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or completely identical to the portion of any one of SEQ ID NOs: 2031-2041. Table 5: Suitable target polynucleotides comprising a target nucleotide sequence for transgene insertion
[0184] In certain cases, expression of an exogenous DNA, e.g., transgene, inserted in a target polynucleotide at or near a target nucleotide sequence may depend on cell type and differentiation stage, as one or more components of a target polynucleotide get activated during differentiation while others get silenced, which may or may not be correlated with reorganization of the chromatin architecture during differentiation. To overcome this, in certain embodiments, in addition to the exemplary characteristics described above, a suitable target polynucleotide comprising a target nucleotide sequence demonstrates suitable
expression of an inserted exogenous DNA, e.g., transgene, throughout differentiation and clonal expansion. G. Guide nucleic acids [0185] In certain embodiments, provided herein are compositions, methods, and kits comprising a guide nucleic acid. In certain embodiments, the guide nucleic acid comprises: (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the targeter stem sequence, and (b) a 5’ sequence. In certain embodiments, the modulator nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 6. In certain embodiments, the modulator nucleic acid comprises one or more modifications as disclosed herein, preferably at least 1, 2, 3, 4, 5, 6, 7 and/or not more than 2, 3, 4, 5, 6, 7, or 8 modifications at or near the 5’ end of the modulator nucleic acid. In preferred embodiments, the modulator nucleic acid comprises a 5’ 2’-O-methoxy modified nucleotide. In certain embodiments, the modulator nucleic acid comprises at least 1, 2, 3, 4, or 5 and/or not more than 2, 3, 4, 5, or 5 modified phosphodiester linkages as disclosed herein. In certain embodiments, the modulator nucleic acid comprises 1 phosphorothioate modified linkage at or near the 5’ end. In certain embodiments, the modulator nucleic acid comprises 2 phosphorothioate modified linkages at or near the 5’ end. [0186] In certain embodiments, the targeter nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 7. 
In certain embodiments, the targeter nucleic acid comprises one or more modifications as disclosed herein, preferably at least 1, 2, 3, 4, 5, 6, 7 and/or not more than 2, 3, 4, 5, 6, 7, or 8 modifications at or near the 3’ end of the targeter nucleic acid. In preferred embodiments, the targeter nucleic acid comprises at least 1, 2, 3, 4, 5, 6, or 7 and/or not more than 2, 3, 4, 5, 6, 7, or 8 3’ 2’-O-methoxy modified nucleotides, preferably 1-5, more preferably 1-3. In certain embodiments, the targeter nucleic acid comprises at least 1, 2, 3, 4, or 5 and/or not more than 2, 3, 4, 5, or 5 modified phosphodiester linkages as disclosed herein. In certain embodiments, the targeter nucleic acid comprises 1 phosphorothioate modified linkage at or near the 3’ end. In certain embodiments, the targeter nucleic acid comprises 2 phosphorothioate modified linkages at or near the 3’ end. In certain embodiments, the targeter nucleic acid comprises 3-5 phosphorothioate modifications at or near the 3’ end. In certain embodiments, the targeter nucleic acid comprises at least 1, 2, 3, 4, 5, 6, or 7 and/or not more than 2, 3, 4, 5, 6, 7, or 8 2’-fluoro modifications at or near the 3’ end.
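By way of illustration only, the percent-identity thresholds recited for the modulator and targeter sequences can be checked with a short calculation. The following is a minimal sketch, assuming the two sequences have already been aligned to the same length and that identity is counted position by position; the function names and the 20-nucleotide sequences are hypothetical and are not taken from Table 6 or Table 7.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match."""
    if len(candidate) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    # Case-insensitive, position-by-position comparison (an assumed identity measure).
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_threshold(candidate: str, reference: str, threshold: float) -> bool:
    """True if the candidate is at least `threshold` percent identical to the reference."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical sequences for illustration; they differ at a single position.
REFERENCE = "ACGUACGUACGUACGUACGU"
CANDIDATE = "ACGUACGUACCUACGUACGU"

if __name__ == "__main__":
    print(percent_identity(CANDIDATE, REFERENCE))
    print(meets_threshold(CANDIDATE, REFERENCE, 90))
```

Sequences of unequal length would first require an alignment step, which this sketch deliberately leaves out.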
Table 6: modulator sequences
Table 7: targeter sequences
III. Pharmaceutical compositions [0187] Provided herein is a composition (e.g., pharmaceutical composition) comprising a guide nucleic acid, an engineered, non-naturally occurring system, or a eukaryotic cell, such as a guide nucleic acid, an engineered, non-naturally occurring system, or a eukaryotic cell, disclosed herein. In certain embodiments, the composition comprises an RNP comprising a guide nucleic acid, such as a guide nucleic acid disclosed herein, and a Cas protein (e.g., Cas nuclease). In certain embodiments, the composition comprises a single guide nucleic acid, such as a single guide nucleic acid disclosed herein. In certain embodiments, the composition comprises an RNP comprising the single guide nucleic acid, and a Cas protein (e.g., Cas nuclease). In certain embodiments, the composition comprises an RNP comprising the targeter nucleic acid, the modulator nucleic acid, and a Cas protein (e.g., Cas nuclease). In certain embodiments, the composition comprises a complex of a targeter nucleic acid and a modulator nucleic acid, such as a complex of a targeter nucleic acid and a modulator nucleic acid disclosed herein. In certain
embodiments, the composition comprises an RNP comprising the targeter nucleic acid, the modulator nucleic acid, and a Cas protein (e.g., Cas nuclease). [0188] In certain embodiments provided herein is a method of producing a composition, the method comprising incubating a single guide nucleic acid, such as a single guide nucleic acid disclosed herein, with a Cas protein, thereby producing a complex of the single guide nucleic acid and the Cas protein (e.g., an RNP). In certain embodiments, the method further comprises purifying the complex (e.g., the RNP). [0189] In certain embodiments, provided is a method of producing a composition, the method comprising incubating a targeter nucleic acid and a modulator nucleic acid, such as a targeter nucleic acid and a modulator nucleic acid disclosed herein, under suitable conditions, thereby producing a composition (e.g., pharmaceutical composition) comprising a complex of the targeter nucleic acid and the modulator nucleic acid. In certain embodiments, the method further comprises incubating the targeter nucleic acid and the modulator nucleic acid with a Cas protein (e.g., the Cas nuclease that the targeter nucleic acid and the modulator nucleic acid are capable of activating or a related Cas protein), thereby producing a complex of the targeter nucleic acid, the modulator nucleic acid, and the Cas protein (e.g., an RNP). In certain embodiments, the method further comprises purifying the complex (e.g., the RNP). [0190] For therapeutic use, a guide nucleic acid, an engineered, non-naturally occurring system, a CRISPR expression system, or a cell comprising such system or modified by such system disclosed herein is combined with a pharmaceutically acceptable carrier. 
The term “pharmaceutically acceptable” as used herein can refer to those compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit-to-risk ratio. [0191] The term “pharmaceutically acceptable carrier” as used herein includes buffers, carriers, and excipients suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio. Pharmaceutically acceptable carriers include any of the standard pharmaceutical carriers, such as a phosphate buffered saline solution, water, emulsions (e.g., such as an oil/water or water/oil emulsions), and various types of wetting agents. The compositions also can include stabilizers and preservatives. For examples of carriers, stabilizers and adjuvants, see, e.g., Martin, Remington’s Pharmaceutical Sciences, 15th Ed., Mack Publ. Co., Easton, PA (1975). Pharmaceutically acceptable carriers include buffers, solvents, dispersion media, coatings, isotonic and absorption delaying agents, or the like, that are
compatible with pharmaceutical administration. The use of such media and agents for pharmaceutically active substances is known in the art. [0192] In certain embodiments, a pharmaceutical composition disclosed herein comprises a salt, e.g., NaCl, MgCl2, KCl, MgSO4, etc.; a buffering agent, e.g., a Tris buffer, N-(2-Hydroxyethyl)piperazine-N′-(2-ethanesulfonic acid) (HEPES), 2-(N-Morpholino)ethanesulfonic acid (MES), MES sodium salt, 3-(N-Morpholino)propanesulfonic acid (MOPS), N-tris[Hydroxymethyl]methyl-3-aminopropanesulfonic acid (TAPS), etc.; a solubilizing agent; a detergent, e.g., a non-ionic detergent such as Tween-20, etc.; a nuclease inhibitor; or the like. For example, in certain embodiments, a subject composition comprises a subject DNA-targeting RNA, e.g., gRNA, and a buffer for stabilizing nucleic acids. [0193] In certain embodiments, a pharmaceutical composition may contain formulation materials for modifying, maintaining or preserving, for example, the pH, osmolarity, viscosity, clarity, color, isotonicity, odor, sterility, stability, rate of dissolution or release, adsorption or penetration of the composition. 
In such embodiments, suitable formulation materials include, but are not limited to, amino acids (such as glycine, glutamine, asparagine, arginine or lysine); antimicrobials; antioxidants (such as ascorbic acid, sodium sulfite or sodium hydrogen-sulfite); buffers (such as borate, bicarbonate, Tris-HCl, citrates, phosphates or other organic acids); bulking agents (such as mannitol or glycine); chelating agents (such as ethylenediamine tetraacetic acid (EDTA)); complexing agents (such as caffeine, polyvinylpyrrolidone, beta-cyclodextrin or hydroxypropyl-beta-cyclodextrin); fillers; monosaccharides; disaccharides; and other carbohydrates (such as glucose, mannose or dextrins); proteins (such as serum albumin, gelatin or immunoglobulins); coloring, flavoring and diluting agents; emulsifying agents; hydrophilic polymers (such as polyvinylpyrrolidone); low molecular weight polypeptides; salt-forming counterions (such as sodium); preservatives (such as benzalkonium chloride, benzoic acid, salicylic acid, thimerosal, phenethyl alcohol, methylparaben, propylparaben, chlorhexidine, sorbic acid or hydrogen peroxide); solvents (such as glycerin, propylene glycol or polyethylene glycol); sugar alcohols (such as mannitol or sorbitol); suspending agents; surfactants or wetting agents (such as pluronics, PEG, sorbitan esters, polysorbates such as polysorbate 20, polysorbate, triton, tromethamine, lecithin, cholesterol, tyloxapol); stability enhancing agents (such as sucrose or sorbitol); tonicity enhancing agents (such as alkali metal halides, preferably sodium or potassium chloride, mannitol, sorbitol); delivery vehicles; diluents; excipients and/or pharmaceutical adjuvants (see, Remington’s Pharmaceutical Sciences, 18th ed. (Mack Publishing Company, 1990)).
[0194] In certain embodiments, a pharmaceutical composition may contain nanoparticles, e.g., polymeric nanoparticles, liposomes, or micelles (see Anselmo et al. (2016) Bioeng. Transl. Med. 1: 10-29). In certain embodiments, the pharmaceutical composition comprises an inorganic nanoparticle. Exemplary inorganic nanoparticles include, e.g., magnetic nanoparticles (e.g., Fe3MnO2) or silica. The outer surface of the nanoparticle can be conjugated with a positively charged polymer (e.g., polyethylenimine, polylysine, polyserine) which allows for attachment (e.g., conjugation or entrapment) of payload. In certain embodiments, the pharmaceutical composition comprises an organic nanoparticle (e.g., entrapment of the payload inside the nanoparticle). Exemplary organic nanoparticles include, e.g., SNALP liposomes that contain cationic lipids together with neutral helper lipids which are coated with polyethylene glycol (PEG), and protamine and nucleic acid complexes coated with a lipid coating. In certain embodiments, the pharmaceutical composition comprises a liposome, for example, a liposome disclosed in International (PCT) Application Publication No. WO 2015/148863. [0195] In certain embodiments, the pharmaceutical composition comprises a targeting moiety to increase target cell binding or uptake of nanoparticles and liposomes. Exemplary targeting moieties include cell specific antigens, monoclonal antibodies, single chain antibodies, aptamers, polymers, sugars, and cell penetrating peptides. In certain embodiments, the pharmaceutical composition comprises a fusogenic or endosome-destabilizing peptide or polymer. [0196] In certain embodiments, a pharmaceutical composition may contain a sustained- or controlled-delivery formulation. Techniques for formulating sustained- or controlled-delivery means, such as liposome carriers, bio-erodible microparticles or porous beads and depot injections, are also known to those skilled in the art. 
Sustained-release preparations may include, e.g., porous polymeric microparticles or semipermeable polymer matrices in the form of shaped articles, e.g., films, or microcapsules. Sustained release matrices may include polyesters, hydrogels, polylactides, copolymers of L-glutamic acid and gamma ethyl-L-glutamate, poly(2-hydroxyethyl-methacrylate), ethylene vinyl acetate, or poly-D-(-)-3-hydroxybutyric acid. Sustained release compositions may also include liposomes that can be prepared by any of several methods known in the art. [0197] A pharmaceutical composition of the invention can be administered by a variety of methods known in the art. The route and/or mode of administration vary depending upon the desired results. Administration can be intravenous, intramuscular, intraperitoneal, or subcutaneous, or administered proximal to the site of the target. The pharmaceutically acceptable carrier should be suitable for intravenous, intramuscular, subcutaneous, parenteral, spinal or epidermal administration (e.g., by injection or infusion). Depending on the route of administration, the active compound (e.g., the guide nucleic acid, engineered, non-naturally
occurring system, or CRISPR expression system disclosed herein) may be coated in a material to protect the compound from the action of acids and other natural conditions that may inactivate the compound. [0198] Formulation components suitable for parenteral administration include a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerin, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as EDTA; buffers such as acetates, citrates or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose. [0199] For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, NJ) or phosphate buffered saline (PBS). The carrier should be stable under the conditions of manufacture and storage, and should be preserved against microorganisms. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol), and suitable mixtures thereof. [0200] Pharmaceutical formulations preferably are sterile. Sterilization can be accomplished by any suitable method, e.g., filtration through sterile filtration membranes. Where the composition is lyophilized, filter sterilization can be conducted prior to or following lyophilization and reconstitution. In certain embodiments, the pharmaceutical composition is lyophilized, and then reconstituted in buffered saline, at the time of administration. [0201] Pharmaceutical compositions of the invention can be prepared in accordance with methods well known and routinely practiced in the art. See, e.g., Remington: The Science and Practice of Pharmacy, Mack Publishing Co., 20th ed., 2000; and Sustained and Controlled Release Drug Delivery Systems, J. R. 
Robinson, ed., Marcel Dekker, Inc., New York, 1978. Pharmaceutical compositions are preferably manufactured under GMP conditions. Typically, a therapeutically effective dose or efficacious dose of the guide nucleic acid, engineered, non-naturally occurring system, or CRISPR expression system disclosed herein is employed in the pharmaceutical compositions of the invention. The compositions disclosed herein are formulated into pharmaceutically acceptable dosage forms by conventional methods known to those of skill in the art. Dosage regimens are adjusted to provide the optimum desired response (e.g., a therapeutic response). For example, a single bolus may be administered, several divided doses may be administered over time or the dose may be proportionally reduced or increased as indicated by the exigencies of the therapeutic situation. It is especially advantageous to formulate parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for
the subjects to be treated; each unit contains a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier. [0202] Actual dosage levels of the active ingredients in the pharmaceutical compositions of the invention can be varied so as to obtain an amount of the active ingredient which is effective to achieve the desired therapeutic response for a particular patient, composition, and mode of administration, without being toxic to the patient. The selected dosage level depends upon a variety of pharmacokinetic factors including the activity of the particular compositions disclosed herein employed, or the ester, salt or amide thereof, the route of administration, the time of administration, the rate of excretion of the particular compound being employed, the duration of the treatment, other drugs, compounds and/or materials used in combination with the particular compositions employed, the age, sex, weight, condition, general health and prior medical history of the patient being treated, and like factors. IV. Therapeutic uses [0203] Guide nucleic acids, engineered, non-naturally occurring systems, and the CRISPR expression systems, e.g., as disclosed herein, are useful for targeting, editing, and/or modifying the genomic DNA in a cell or organism. These guide nucleic acids and systems, as well as a cell comprising one of the systems or a cell whose genome has been modified by one of the systems, can be used to treat a disease or disorder in which modification of genetic or epigenetic information is desirable. Accordingly, provided herein is a method of treating a disease or disorder, the method comprising administering to a subject in need thereof a guide nucleic acid, a non-naturally occurring system, a CRISPR expression system, or a cell disclosed herein. [0204] The term “subject” includes human and non-human animals. 
Non-human animals include all vertebrates, e.g., mammals and non-mammals, such as non-human primates, sheep, dog, cow, chickens, amphibians, and reptiles. Except when noted, the terms “patient” or “subject” are used herein interchangeably. [0205] The terms “treatment”, “treating”, “treat”, “treated”, or the like, as used herein, can refer to obtaining a desired pharmacologic and/or physiologic effect. The effect may be therapeutic in terms of a partial or complete cure for a disease and/or adverse effect attributable to the disease or delaying the disease progression. “Treatment”, as used herein, covers any treatment of a disease in a mammal, e.g., in a human, and includes: (a) inhibiting the disease, i.e., arresting its development; and (b) relieving the disease, i.e., causing regression of the disease. It is understood that a disease or disorder may be identified by genetic methods and treated prior to manifestation of any medical symptom.
[0206] For minimization of toxicity and off-target effect, it can be important to control the concentration of the CRISPR-Cas system delivered. Optimal concentrations can be determined by testing different concentrations in a cellular, tissue, or non-human eukaryote animal model and using deep sequencing to analyze the extent of modification at potential off-target genomic loci. The concentration that gives the highest level of on-target modification while minimizing the level of off-target modification is generally selected for ex vivo or in vivo delivery. [0207] It is understood that the guide nucleic acid, the engineered, non-naturally occurring system, and the CRISPR expression system disclosed herein can be used to treat any suitable disease or disorder that can be improved by the system in a cell. [0208] For therapeutic purposes, certain methods disclosed herein are particularly suitable for editing or modifying a proliferating cell, such as a stem cell (e.g., a hematopoietic stem cell), a progenitor cell (e.g., a hematopoietic progenitor cell or a lymphoid progenitor cell), or a memory cell (e.g., a memory T cell). Given that such a cell is delivered to a subject and will proliferate in vivo, tolerance to off-target events is low. Prior to delivery, however, it is possible to assess the on-target and off-target events, thereby selecting one or more colonies that have the desired edit or modification and lack any undesired edit or modification. Therefore, lower editing or modifying efficiency can be tolerated for such a cell. The engineered, non-naturally occurring system of the present invention has the advantage of increasing or decreasing the efficiency of nucleic acid cleavage by, for example, adjusting the hybridization of dual guide nucleic acids. As a result, it can be used to minimize off-target events when creating genetically engineered proliferating cells. 
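By way of illustration only, the concentration selection described in paragraph [0206] can be expressed as a simple rule applied to titration data. The following is a minimal sketch under stated assumptions: the off-target ceiling, the tie-breaking rule, the record fields, and all numerical values are hypothetical and are not taken from the disclosure, and other selection criteria may be equally suitable.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TitrationPoint:
    concentration_nm: float  # delivered concentration (hypothetical units: nM)
    on_target: float         # fraction of reads modified at the intended locus
    off_target: float        # highest fraction modified at any profiled off-target locus


def select_concentration(
    points: Sequence[TitrationPoint], max_off_target: float
) -> Optional[TitrationPoint]:
    """Pick the point with the highest on-target editing among those whose
    off-target editing does not exceed the assumed ceiling."""
    eligible = [p for p in points if p.off_target <= max_off_target]
    if not eligible:
        return None
    # Ties in on-target editing are broken in favor of lower off-target editing.
    return max(eligible, key=lambda p: (p.on_target, -p.off_target))


# Hypothetical deep-sequencing results from a three-point titration.
POINTS = [
    TitrationPoint(10.0, 0.35, 0.001),
    TitrationPoint(50.0, 0.72, 0.004),
    TitrationPoint(250.0, 0.88, 0.031),
]

if __name__ == "__main__":
    print(select_concentration(POINTS, max_off_target=0.01))
```

In this hypothetical data set the highest concentration gives the most on-target editing but exceeds the assumed off-target ceiling, so the intermediate concentration is selected.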
[0209] In certain embodiments, the guide nucleic acid, the engineered, non-naturally occurring system, and/or the CRISPR expression system disclosed herein can be used to engineer an immune cell. Immune cells include but are not limited to lymphocytes (e.g., B lymphocytes or B cells, T lymphocytes or T cells, and natural killer cells), myeloid cells (e.g., monocytes, macrophages, eosinophils, mast cells, basophils, and granulocytes), and the stem and progenitor cells that can differentiate into these cell types (e.g., hematopoietic stem cells, hematopoietic progenitor cells, and lymphoid progenitor cells). The cells can include autologous cells derived from a subject to be treated, or alternatively allogeneic cells derived from a donor. [0210] In certain embodiments, the immune cell is a T cell, which can be, for example, a cultured T cell, a primary T cell, a T cell from a cultured T cell line (e.g., Jurkat, SupT1), or a T cell obtained from a mammal, for example, from a subject to be treated. If obtained from a mammal, the T cell can be obtained from numerous sources, including but not limited to blood, bone marrow, lymph node, the thymus, or other tissues or fluids. T cells can also be enriched or purified. The T cell can be any type of T cell and can be of any developmental stage, including
but not limited to, CD4+/CD8+ double positive T cells, CD4+ helper T cells (e.g., Th1 and Th2 cells), CD8+ T cells (e.g., cytotoxic T cells), tumor infiltrating lymphocytes (TILs), memory T cells (e.g., central memory T cells and effector memory T cells), regulatory T cells, naive T cells, or the like. [0211] In certain embodiments, an immune cell, e.g., a T cell, is engineered to express an exogenous gene. For example, in certain embodiments, an engineered CRISPR system disclosed herein may catalyze DNA cleavage at the gene locus, allowing for site-specific integration of the exogenous gene at the gene locus by HDR. [0212] In certain embodiments, an immune cell, e.g., a T cell, is engineered to express a chimeric antigen receptor (CAR), i.e., the T cell comprises an exogenous nucleotide sequence encoding a CAR. As used herein, the term “chimeric antigen receptor” or “CAR” includes any artificial receptor including an antigen-specific binding moiety and one or more signaling chains derived from an immune receptor. CARs can comprise a single chain fragment variable (scFv) of an antibody specific for an antigen coupled via hinge and transmembrane regions to cytoplasmic domains of T cell signaling molecules, e.g., a T cell costimulatory domain (e.g., from CD28, CD137, OX40, ICOS, or CD27) in tandem with a T cell triggering domain (e.g., from CD3ζ). A T cell expressing a chimeric antigen receptor is referred to as a CAR T cell. Exemplary CAR T cells include CD19 targeted CTL019 cells (see, Grupp et al. (2015) BLOOD, 126: 4983), 19-28z cells (see, Park et al. (2015) J. CLIN. ONCOL., 33: 7010), and KTE-C19 cells (see, Locke et al. (2015) BLOOD, 126: 3991). Additional exemplary CAR T cells are described in U.S. Patent Nos. 7,446,190, 8,399,645, 8,906,682, 9,181,527, 9,272,002, 9,266,960, 10,253,086, 10,640,569, and 10,808,035, and International (PCT) Publication Nos. WO 2013/142034, WO 2015/120180, WO 2015/188141, WO 2016/120220, and WO 2017/040945. 
Exemplary approaches to express CARs using CRISPR systems are described in Hale et al. (2017) MOL THER METHODS CLIN DEV., 4: 192, MacLeod et al. (2017) MOL THER, 25: 949, and Eyquem et al. (2017) NATURE, 543: 113. [0213] In certain embodiments, an immune cell, e.g., a T cell, binds an antigen, e.g., a cancer antigen, through an endogenous T cell receptor (TCR). In certain embodiments, an immune cell, e.g., a T cell, is engineered to express an exogenous TCR, e.g., an exogenous naturally occurring TCR or an exogenous engineered TCR. T cell receptors comprise two chains, referred to as the α- and β-chains, that combine on the surface of a T cell to form a heterodimeric receptor that can recognize MHC-restricted antigens. Each of the α- and β-chains comprises a constant region and a variable region. Each variable region of the α- and β-chains defines three loops, referred to as complementarity determining regions (CDRs) and known as CDR1, CDR2, and CDR3, that confer the T cell receptor with antigen binding activity and binding specificity.
[0214] In certain embodiments, a CAR or TCR binds a cancer antigen selected from B-cell maturation antigen (BCMA), mesothelin, prostate specific membrane antigen (PSMA), prostate stem cell antigen (PSCA), carbonic anhydrase IX (CAIX), carcinoembryonic antigen (CEA), CD5, CD7, CD10, CD19, CD20, CD22, CD30, CD33, CD34, CD38, CD41, CD44, CD49f, CD56, CD70, CD74, CD123, CD133, CD138, epithelial glycoprotein-2 (EGP-2), epithelial glycoprotein-40 (EGP-40), epithelial cell adhesion molecule (EpCAM), receptor-type tyrosine-protein kinase (FLT3), folate-binding protein (FBP), fetal acetylcholine receptor (AChR), folate receptor-α and β (FRα and β), Ganglioside G2 (GD2), Ganglioside G3 (GD3), epidermal growth factor receptor 2 (HER-2/ERB2), epidermal growth factor receptor vIII (EGFRvIII), ERB3, ERB4, human telomerase reverse transcriptase (hTERT), Interleukin-13 receptor subunit alpha-2 (IL-13Rα2), κ-light chain, kinase insert domain receptor (KDR), Lewis A (CA19.9), Lewis Y (LeY), L1 cell adhesion molecule (L1CAM), melanoma-associated antigen 1 (melanoma antigen family A1, MAGE-A1), Mucin 16 (MUC-16), Mucin 1 (MUC-1; e.g., a truncated MUC-1), NKG2D ligands, cancer-testis antigen NY-ESO-1, oncofetal antigen (h5T4), tumor-associated glycoprotein 72 (TAG-72), vascular endothelial growth factor R2 (VEGF-R2), Wilms tumor protein (WT-1), type 1 tyrosine-protein kinase transmembrane receptor (ROR1), B7-H3 (CD276), B7-H6 (Nkp30), Chondroitin sulfate proteoglycan-4 (CSPG4), DNAX Accessory Molecule (DNAM-1), Ephrin type A Receptor 2 (EpHA2), Fibroblast Associated Protein (FAP), Gp100/HLA-A2, Glypican 3 (GPC3), HA-1H, HERK-V, IL-11Rα, Latent Membrane Protein 1 (LMP1), Neural cell-adhesion molecule (N-CAM/CD56), and Trail Receptor (TRAIL-R). 
[0215] Genetic loci suitable for insertion of a CAR- or exogenous TCR-encoding sequence include but are not limited to safe harbor loci (e.g., the AAVS1 locus) and TCR subunit loci (e.g., the TCR α constant (TRAC) locus, the TCR β constant 1 (TRBC1) locus, and the TCR β constant 2 (TRBC2) locus). It is understood that insertion in the TRAC locus reduces tonic CAR signaling and enhances T cell potency (see, Eyquem et al. (2017) NATURE, 543: 113). Furthermore, inactivation of the endogenous TRAC, TRBC1, or TRBC2 gene may reduce a graft-versus-host disease (GVHD) response, thereby allowing use of allogeneic T cells as starting materials for preparation of CAR T cells. Accordingly, in certain embodiments, an immune cell, e.g., a T cell, is engineered to have reduced expression of an endogenous TCR or TCR subunit, e.g., TRAC, TRBC1, and/or TRBC2. The cell may be engineered to have partially reduced or no expression of the endogenous TCR or TCR subunit. For example, in certain embodiments, the immune cell, e.g., a T cell, is engineered to have less than 80% (e.g., less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or less than 5%) of the expression of the endogenous TCR or TCR subunit relative to a corresponding unmodified or parental cell. In certain embodiments, the immune cell, e.g., a T cell, is engineered to have no
detectable expression of the endogenous TCR or TCR subunit. Exemplary approaches to reduce expression of TCRs using CRISPR systems are described in U.S. Patent No. 9,181,527, Liu et al. (2017) CELL RES, 27: 154, Ren et al. (2017) CLIN CANCER RES, 23: 2255, Cooper et al. (2018) LEUKEMIA, 32: 1970, and Ren et al. (2017) ONCOTARGET, 8: 17002. [0216] It is understood that certain immune cells, such as T cells, also express major histocompatibility complex (MHC) or human leukocyte antigen (HLA) genes, and inactivation of these endogenous genes may reduce an immune response, thereby allowing use of allogeneic T cells as starting materials for preparation of CAR T cells. Accordingly, in certain embodiments, an immune cell, e.g., a T cell, is engineered to have reduced expression of one or more endogenous class I or class II MHCs or HLAs (e.g., beta 2-microglobulin (B2M), class II major histocompatibility complex transactivator (CIITA)). The cell may be engineered to have partially reduced or no expression of an endogenous MHC or HLA. For example, in certain embodiments, the immune cell, e.g., a T cell, is engineered to have less than 80% (e.g., less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or less than 5%) of the expression of endogenous MHC (e.g., B2M, CIITA) relative to a corresponding unmodified or parental cell. In certain embodiments, the immune cell, e.g., a T cell, is engineered to have no detectable expression of an endogenous MHC (e.g., B2M, CIITA). In certain cases, a cell may be engineered to have expression of, e.g., HLA-E and/or HLA-G, in order to avoid attack by natural killer (NK) cells. Exemplary approaches to reduce expression of MHCs using CRISPR systems are described in Liu et al. (2017) CELL RES, 27: 154, Ren et al. (2017) CLIN CANCER RES, 23: 2255, and Ren et al. (2017) ONCOTARGET, 8: 17002. 
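By way of illustration only, the relative-expression thresholds recited in paragraphs [0215] and [0216] (e.g., less than 80% of the expression of a corresponding unmodified or parental cell) amount to a simple ratio. The following is a minimal sketch; the function names, the measurement units, the detection limit, and the numerical values are hypothetical assumptions and are not taken from the disclosure.

```python
def relative_expression(modified: float, parental: float) -> float:
    """Expression in the modified cell as a percentage of the parental cell."""
    if parental <= 0:
        raise ValueError("parental expression must be positive")
    return 100.0 * modified / parental


def classify(
    modified: float,
    parental: float,
    threshold_pct: float = 80.0,    # e.g., "less than 80%"; other recited cutoffs may be used
    detection_limit: float = 0.0,   # assumed assay detection limit, same units as the signal
) -> str:
    """Label a measurement against an assumed threshold and detection limit."""
    if modified <= detection_limit:
        return "no detectable expression"
    if relative_expression(modified, parental) < threshold_pct:
        return "reduced"
    return "not reduced"


if __name__ == "__main__":
    # Hypothetical signal intensities in arbitrary units.
    print(relative_expression(120.0, 1000.0))
    print(classify(120.0, 1000.0))
    print(classify(900.0, 1000.0))
```

The same calculation applies to any of the recited cutoffs (70%, 60%, and so on) by changing `threshold_pct`.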
[0217] Other genes that may be inactivated include but are not limited to CD3, CD52, and deoxycytidine kinase (DCK). For example, inactivation of DCK may render the immune cells (e.g., T cells) resistant to purine nucleotide analogue (PNA) compounds, which are often used to compromise the host immune system in order to reduce a GVHD response during an immune cell therapy. In certain embodiments, the immune cell, e.g., a T cell, is engineered to have less than 80% (e.g., less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or less than 5%) of the expression of endogenous CD52 or DCK relative to a corresponding unmodified or parental cell. [0218] It is understood that the activity of an immune cell (e.g., T cell) may be enhanced by inactivating or reducing the expression of an immune suppressor such as an immune checkpoint protein. Accordingly, in certain embodiments, an immune cell, e.g., a T cell, is engineered to have reduced expression of an immune checkpoint protein. Exemplary immune checkpoint proteins expressed by wild-type T cells include but are not limited to PDCD1 (PD-1), CTLA4, ADORA2A (A2AR), B7-H3, B7-H4, BTLA, KIR, LAG3, HAVCR2 (TIM3), TIGIT, VISTA,
PTPN6 (SHP-1), and FAS. The cell may be modified to have partially reduced or no expression of the immune checkpoint protein. For example, in certain embodiments, the immune cell, e.g., a T cell, is engineered to have less than 80% (e.g., less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or less than 5%) of the expression of the immune checkpoint protein relative to a corresponding unmodified or parental cell. In certain embodiments, the immune cell, e.g., a T cell, is engineered to have no detectable expression of the immune checkpoint protein. Exemplary approaches to reduce expression of immune checkpoint proteins using CRISPR systems are described in International (PCT) Publication No. WO 2017/017184, Cooper et al. (2018) LEUKEMIA, 32: 1970, Su et al. (2016) ONCOIMMUNOLOGY, 6: e1249558, and Zhang et al. (2017) FRONT MED, 11: 554. [0219] The immune cell can be engineered to have reduced expression of an endogenous gene, e.g., an endogenous gene described above, by gene editing or modification. For example, in certain embodiments, an engineered CRISPR system disclosed herein may result in DNA cleavage at a gene locus, thereby inactivating the targeted gene. In other embodiments, an engineered CRISPR system disclosed herein may be fused to an effector domain (e.g., a transcriptional repressor or histone methylase) to reduce the expression of the target gene. [0220] The immune cell can also be engineered to express an exogenous protein (besides an antigen-binding protein described above) at the locus of a human ADORA2A, B2M, CD52, CIITA, CTLA4, DCK, FAS, HAVCR2, LAG3, PDCD1, PTPN6, TIGIT, TRAC, TRBC1, TRBC2, CARD11, CD247, IL7R, LCK, or PLCG1 gene. [0221] In certain embodiments, an immune cell, e.g., a T cell, is modified to express a dominant-negative form of an immune checkpoint protein. 
In certain embodiments, the dominant-negative form of the checkpoint inhibitor can act as a decoy receptor to bind or otherwise sequester the natural ligand that would otherwise bind and activate the wild-type immune checkpoint protein. Examples of engineered immune cells, for example, T cells containing dominant-negative forms of an immune suppressor are described, for example, in International (PCT) Publication No. WO 2017/040945. [0222] In certain embodiments, an immune cell, e.g., a T cell, is modified to express a gene (e.g., a transcription factor, a cytokine, or an enzyme) that regulates the survival, proliferation, activity, or differentiation (e.g., into a memory cell) of the immune cell. In certain embodiments, the immune cell is modified to express TET2, FOXO1, IL-12, IL-15, IL-18, IL-21, IL-7, GLUT1, GLUT3, HK1, HK2, GAPDH, LDHA, PDK1, PKM2, PFKFB3, PGK1, ENO1, GYS1, and/or ALDOA. In certain embodiments, the modification is an insertion of a nucleotide sequence encoding the protein operably linked to a regulatory element. In certain embodiments, the modification is a substitution of a single nucleotide polymorphism (SNP) site in the
endogenous gene. In certain embodiments, an immune cell, e.g., a T cell, is modified to express a variant of a gene, for example, a variant that has greater activity than the respective wild-type gene. In certain embodiments, the immune cell is modified to express a variant of CARD11, CD247, IL7R, LCK, or PLCG1. For example, certain gain-of-function variants of IL7R were disclosed in Zenatti et al. (2011) NAT. GENET. 43(10):932-39. The variant can be expressed from the native locus of the respective wild-type gene by delivering an engineered system described herein for targeting the native locus in combination with a donor template that carries the variant or a portion thereof. [0223] In certain embodiments, an immune cell, e.g., a T cell, is modified to express a protein (e.g., a cytokine or an enzyme) that regulates the microenvironment that the immune cell is designed to migrate to (e.g., a tumor microenvironment). In certain embodiments, the immune cell is modified to express CA9, CA12, a V-ATPase subunit, NHE1, and/or MCT-1. A. Gene therapies [0224] It is understood that the engineered, non-naturally occurring system and CRISPR expression system, e.g., as disclosed herein, can be used to treat a genetic disease or disorder, i.e., a disease or disorder associated with or otherwise mediated by an undesirable mutation in the genome of a subject. 
[0225] Exemplary genetic diseases or disorders include age-related macular degeneration, adrenoleukodystrophy (ALD), Alagille syndrome, alpha-1-antitrypsin deficiency, argininemia, argininosuccinic aciduria, ataxia (e.g., Friedreich ataxia, spinocerebellar ataxias, ataxia telangiectasia, essential tremor, spastic paraplegia), autism, biliary atresia, biotinidase deficiency, carbamoyl phosphate synthetase I deficiency, carbohydrate deficient glycoprotein syndrome (CDGS), a central nervous system (CNS)-related disorder (e.g., Alzheimer's disease, amyotrophic lateral sclerosis (ALS), Canavan disease (CD), ischemia, multiple sclerosis (MS), neuropathic pain, Parkinson's disease), Bloom's syndrome, cancer, Charcot-Marie-Tooth disease (e.g., peroneal muscular atrophy, hereditary motor sensory neuropathy), congenital hepatic porphyria, citrullinemia, Crigler-Najjar syndrome, cystic fibrosis (CF), Dentatorubro-Pallidoluysian Atrophy (DRPLA), diabetes insipidus, Fabry, familial hypercholesterolemia (LDL receptor defect), Fanconi's anemia, fragile X syndrome, a fatty acid oxidation disorder, galactosemia, glucose-6-phosphate dehydrogenase (G6PD) deficiency, glycogen storage diseases (e.g., type I (glucose-6-phosphatase deficiency, Von Gierke), II (alpha glucosidase deficiency, Pompe), III (debrancher enzyme deficiency, Cori), IV (brancher enzyme deficiency, Andersen), V (muscle glycogen phosphorylase deficiency, McArdle), VII (muscle phosphofructokinase deficiency, Tarui), VI (liver phosphorylase deficiency, Hers), IX (liver glycogen phosphorylase kinase
deficiency)), hemophilia A (associated with defective factor VIII), hemophilia B (associated with defective factor IX), Huntington’s disease, glutaric aciduria, hypophosphatemia, Krabbe, lactic acidosis, Lafora disease, Leber's Congenital Amaurosis, Lesch Nyhan syndrome, a lysosomal storage disease, metachromatic leukodystrophy disease (MLD), mucopolysaccharidosis (MPS) (e.g., Hunter syndrome, Hurler syndrome, Maroteaux-Lamy syndrome, Sanfilippo syndrome, Scheie syndrome, Morquio syndrome, other, MPSI, MPSII, MPSIII, MPSIV, MPS 7), a muscular/skeletal disorder (e.g., muscular dystrophy, Duchenne muscular dystrophy), myotonic dystrophy (DM), neoplasia, N-acetylglutamate synthase deficiency, ornithine transcarbamylase deficiency, phenylketonuria, primary open angle glaucoma, retinitis pigmentosa, schizophrenia, Severe Combined Immune Deficiency (SCID), Spinobulbar Muscular Atrophy (SBMA), sickle cell anemia, Usher syndrome, Tay-Sachs disease, thalassemia (e.g., -Thalassemia), trinucleotide repeat disorders, tyrosinemia, Wilson's disease, Wiskott-Aldrich syndrome, X-linked chronic granulomatous disease (CGD), X-linked severe combined immune deficiency, and xeroderma pigmentosum. [0226] Additional exemplary genetic diseases or disorders and associated information are available on the world wide web at kumc.edu/gec/support, genome.gov/10001200, and ncbi.nlm.nih.gov/books/NBK22183/. Additional exemplary genetic diseases or disorders, associated genetic mutations, and gene therapy approaches to treat genetic diseases or disorders are described in International (PCT) Publication Nos. WO 2013/126794, WO 2013/163628, WO 2015/048577, WO 2015/070083, WO 2015/089354, WO 2015/134812, WO 2015/138510, WO 2015/148670, WO 2015/148860, WO 2015/148863, WO 2015/153780, WO 2015/153789, and WO 2015/153791, U.S. Patent Nos. 8,383,604, 8,859,597, 8,956,828, 9,255,130, and 9,273,296, and U.S. 
Patent Application Publication Nos. 2009/0222937, 2009/0271881, 2010/0229252, 2010/0311124, 2011/0016540, 2011/0023139, 2011/0023144, 2011/0023145, 2011/0023146, 2011/0023153, 2011/0091441, 2012/0159653, and 2013/0145487. VI. Kits [0227] It is understood that the guide nucleic acid, the engineered, non-naturally occurring system, the CRISPR expression system, and/or a library disclosed herein can be packaged in a kit suitable for use by a medical provider. Accordingly, in another aspect, the invention provides kits containing any one or more of the elements disclosed in the above systems, libraries, methods, and compositions. In certain embodiments, the kit comprises an engineered, non-naturally occurring system as disclosed herein and instructions for using the kit. The instructions may be specific to the applications and methods described herein. In certain embodiments, one or more of the elements of the system are provided in a solution. In certain embodiments, one or more of the elements of the system are provided in lyophilized form, and the kit further comprises a
diluent. Elements may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, a tube, or immobilized on the surface of a solid base (e.g., chip or microarray). In certain embodiments, the kit comprises one or more of the nucleic acids and/or proteins described herein. In certain embodiments, the kit provides all elements of the systems of the invention. [0228] In certain embodiments of a kit comprising the engineered, non-naturally occurring dual guide system, the targeter nucleic acid and the modulator nucleic acid are provided in separate containers. In other embodiments, the targeter nucleic acid and the modulator nucleic acid are pre-complexed, and the complex is provided in a single container. [0229] In certain embodiments, the kit comprises a Cas protein or a nucleic acid comprising a regulatory element operably linked to a nucleic acid encoding a Cas protein provided in a separate container. In other embodiments, the kit comprises a Cas protein pre-complexed with the single guide nucleic acid or a combination of the targeter nucleic acid and the modulator nucleic acid, and the complex is provided in a single container. [0230] In certain embodiments, the kit further comprises one or more donor templates provided in one or more separate containers. In certain embodiments, the kit comprises a plurality of donor templates as disclosed herein (e.g., in separate tubes or immobilized on the surface of a solid base such as a chip or a microarray), one or more guide nucleic acids disclosed herein, and optionally a Cas protein or a regulatory element operably linked to a nucleic acid encoding a Cas protein as disclosed herein. Such kits are useful for identifying a donor template that introduces optimal genetic modification in a multiplex assay. The CRISPR expression systems as disclosed herein are also suitable for use in a kit. 
[0231] In certain embodiments, a kit further comprises one or more reagents and/or buffers for use in a process utilizing one or more of the elements described herein. Reagents may be provided in any suitable container and may be provided in a form that is usable in a particular assay, or in a form that requires addition of one or more other components before use (e.g., in concentrate or lyophilized form). A buffer may be a reaction or storage buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof. In some embodiments, the buffer is alkaline. In certain embodiments, the buffer has a pH from about 7 to about 10. In certain embodiments, the kit further comprises a pharmaceutically acceptable carrier. In certain embodiments, the kit further comprises one or more devices or other materials for administration to a subject.
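The buffer pH range above ("from about 7 to about 10") can be read together with this application's definition of "about" as a ±10% variation from the nominal value. The following is a minimal sketch of that arithmetic, not a formulation guide. The function names `about` and `ph_within_about_range` are hypothetical, and treating the interval endpoints as inclusive is an assumption of this sketch.

```python
def about(nominal, tolerance=0.10):
    """Return the (lower, upper) bounds for 'about <nominal>'.

    Assumes 'about' means +/-10% of the nominal value, as the term is
    defined in this application; the tolerance is a parameter so other
    readings can be tried.
    """
    delta = abs(nominal) * tolerance
    return (nominal - delta, nominal + delta)


def ph_within_about_range(ph, low_nominal=7.0, high_nominal=10.0):
    """Check a measured pH against 'from about 7 to about 10'.

    Assumption: the widest reading applies, i.e. the lower bound of
    'about 7' up to the upper bound of 'about 10', endpoints inclusive.
    """
    lower, _ = about(low_nominal)
    _, upper = about(high_nominal)
    return lower <= ph <= upper


if __name__ == "__main__":
    print(about(7.0))    # bounds for 'about 7'
    print(about(10.0))   # bounds for 'about 10'
    for ph in (6.0, 6.5, 8.3, 11.5):
        print(ph, ph_within_about_range(ph))
```

Under these assumptions a buffer at pH 6.5 or 8.3 falls inside the range, while pH 6.0 or 11.5 falls outside it.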
V. Embodiments [0232] In embodiment 1 provided herein is a composition comprising a synthetic guide nucleic acid (gNA) comprising: (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the target stem sequence, and (b) a 5’ sequence; wherein the targeter stem sequence and the modulator stem sequence each comprise 4-10 nucleotides that base pair with each other, and the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex. [0233] In embodiment 2 provided herein is the composition of embodiment 1, wherein the targeter stem sequence and the modulator stem sequence each comprise 4-6 nucleotides that base pair with each other. [0234] In embodiment 3 provided herein is the composition of embodiment 2, wherein the targeter stem sequence and the modulator stem sequence each comprise five nucleotides that base pair with each other. [0235] In embodiment 4 provided herein is the composition of embodiment 3, wherein (1) the targeter nucleic acid comprises an additional nucleotide sequence 5’ to the targeter stem sequence comprising an additional at least 2 nucleotides, and (2) the modulator nucleic acid comprises an additional nucleotide sequence 3’ to the modulator stem sequence comprising an additional at least 2 nucleotides. [0236] In embodiment 5 provided herein is the composition of embodiment 2, wherein the targeter stem sequence and the modulator stem sequence each comprise four nucleotides that base pair with each other. [0237] In embodiment 6 provided herein is the composition of any one of the preceding embodiments, wherein the targeter nucleic acid comprises an additional nucleotide sequence 5’ to the targeter stem sequence comprising an additional at least two nucleotides. 
[0238] In embodiment 7 provided herein is the composition of any one of the preceding embodiments, wherein the targeter stem sequence and the modulator stem sequence share at least 80% sequence complementarity. [0239] In embodiment 8 provided herein is the composition of any one of the preceding embodiments, wherein at least 40% of the base pairs in the stem are C-G base pairs. [0240] In embodiment 9 provided herein is the composition of any one of the preceding embodiments, wherein the targeter and modulator nucleic acids comprise a single polynucleotide. [0241] In embodiment 10 provided herein is the composition of embodiments 1-8, wherein the targeter and modulator nucleic acids are separate polynucleotides.
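The stem criteria recited in the embodiments above (4-10 paired nucleotides, at least 80% sequence complementarity, at least 40% C-G base pairs) can be checked with simple arithmetic. The sketch below is illustrative only and makes several assumptions that are not stated in the specification: the two stems have equal length and pair antiparallel without gaps, only Watson-Crick pairs count (G-U wobble pairs are ignored), complementarity is paired positions divided by stem length, and the C-G fraction is computed over the paired positions. The function names and the example sequences are hypothetical.

```python
# Watson-Crick partners for RNA. G-U wobble pairs are deliberately not
# counted; that is an assumption of this sketch.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_pairs(targeter_stem, modulator_stem):
    """Align the stems antiparallel: targeter 5'->3' against modulator 3'->5'."""
    t = targeter_stem.upper().replace("T", "U")
    m = modulator_stem.upper().replace("T", "U")
    if not t or len(t) != len(m):
        raise ValueError("this sketch assumes non-empty stems of equal length")
    return list(zip(t, reversed(m)))


def stem_metrics(targeter_stem, modulator_stem):
    """Return base-pair count, complementarity, and C-G fraction of the stem."""
    pairs = stem_pairs(targeter_stem, modulator_stem)
    paired = [p for p in pairs if p in WC_PAIRS]
    gc = [p for p in paired if set(p) == {"G", "C"}]
    return {
        "length": len(pairs),
        "base_pairs": len(paired),
        "complementarity": len(paired) / len(pairs),
        "gc_fraction": len(gc) / len(paired) if paired else 0.0,
    }


def meets_stem_criteria(metrics):
    """Apply the thresholds recited above (bounds treated as inclusive)."""
    return (
        4 <= metrics["base_pairs"] <= 10
        and metrics["complementarity"] >= 0.80
        and metrics["gc_fraction"] >= 0.40
    )


if __name__ == "__main__":
    # Hypothetical five-nucleotide stems, written 5'->3'.
    m = stem_metrics("GUAGA", "UCUAC")
    print(m, meets_stem_criteria(m))
```

For the hypothetical stems shown, all five positions pair and two of the five pairs are C-G, so the stem sits exactly at the 40% C-G threshold.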
[0242] In embodiment 11 provided herein is the composition of embodiment 1, wherein the targeter nucleic acid or the modulator nucleic acid, or both, comprise one or more modified nucleotides at or near its 3’ end, if present, at or near its 5’ end, if present, or both. [0243] In embodiment 12 provided herein is the composition of embodiment 11, wherein the modulator nucleic acid comprises at least one modified nucleotide and at least two modified internucleotide linkages within the first five nucleotides from the 5’ end. [0244] In embodiment 13 provided herein is the composition of any one of the preceding embodiments, further comprising a Type V nucleic acid-guided nuclease complexed with the gNA. [0245] In embodiment 14 provided herein is the composition of embodiment 13, wherein the Type V nucleic acid-guided nuclease is at least 80% identical to an ABW, ART, or MAD nuclease. [0246] In embodiment 15 provided herein is the composition of any one of the preceding embodiments, wherein the modulator nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 6. [0247] In embodiment 16 provided herein is the composition of any one of the preceding embodiments, wherein the targeter nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 7. 
[0248] In embodiment 17 provided herein is a method of editing a genome of a eukaryotic cell comprising (I) delivering to the eukaryotic cell (A) one or more synthetic guide nucleic acids (gNA), or polynucleotides encoding the one or more gNAs, comprising (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the target stem sequence, and (b) a 5’ sequence; wherein the targeter stem sequence and the modulator stem sequence each comprise 4-10 nucleotides that base pair with each other, and the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex; (B) one or more Type V nucleic acid-guided nucleases, or polynucleotides encoding the one or more nucleases; and, optionally, (C) one or more donor templates, wherein the gNA and the Type V nucleic acid-guided nuclease form a nucleic acid-guided nuclease complex; and (II) contacting the genome with the nucleic acid-guided nuclease complex to form one or more strand breaks in the genome, whereby at least a portion of the donor template is inserted into the genome at or near the one or more strand breaks. [0249] In embodiment 18 provided herein is the method of embodiment 17, further comprising treating the eukaryotic cell with a HDR enhancer.
[0250] In embodiment 19 provided herein is the method of embodiment 18, wherein the HDR enhancer comprises a DNA-PK antagonist, preferably M3814. [0251] In embodiment 20 provided herein is the method of any one of embodiments 17-19, wherein the method comprises delivering at least two gNAs, or polynucleotides encoding the gNAs, wherein each gNA comprises a different spacer sequence such that when complexed with a nucleic acid-guided nuclease, the nucleic acid-guided nuclease complexes form strand breaks in the genome at or near each of the target nucleotide sequences. [0252] In embodiment 21 provided herein is a composition comprising a synthetic guide nucleic acid (gNA) comprising (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the target stem sequence, and (b) a 5’ sequence; wherein (1) the targeter nucleic acid and modulator nucleic acids are separate polynucleotides, (2) the predicted minimum free energy of the targeter stem sequence and the modulator stem sequence as determined by the RNAcofold WebServer is between -10 and -4 kcal/mol, and (3) the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex. [0253] In embodiment 22 provided herein is the composition of embodiment 21, wherein the predicted minimum free energy of the targeter stem sequence and the modulator stem sequence as determined by the RNAcofold WebServer is between -7 and -4 kcal/mol. [0254] In embodiment 23 provided herein is the composition of embodiment 21 or 22, wherein the targeter stem sequence and the modulator stem sequence each comprise 4-6 nucleotides that base pair with each other. 
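The minimum free energy windows recited in embodiments 21 and 22 can be expressed as a simple range check. The sketch below does not compute a free energy. It assumes the predicted value (in kcal/mol) has already been obtained separately from RNAcofold for the targeter and modulator stem sequences. The function names, the labels returned, and the inclusive treatment of the endpoints are assumptions of this sketch, and the example values are hypothetical.

```python
def mfe_in_window(mfe_kcal_per_mol, low=-10.0, high=-4.0):
    """True if a predicted MFE lies within [low, high] kcal/mol.

    Endpoints are treated as inclusive, which is an assumption; the
    text only says 'between'.
    """
    return low <= mfe_kcal_per_mol <= high


def classify_stem_mfe(mfe_kcal_per_mol):
    """Label a predicted MFE against the two windows recited above."""
    if mfe_in_window(mfe_kcal_per_mol, low=-7.0, high=-4.0):
        return "narrow"   # within -7 to -4 kcal/mol (and so also within -10 to -4)
    if mfe_in_window(mfe_kcal_per_mol, low=-10.0, high=-4.0):
        return "broad"    # within -10 to -4 kcal/mol only
    return "outside"      # too stable (< -10) or too weak (> -4)


if __name__ == "__main__":
    # Hypothetical predicted values, not data from this application.
    for mfe in (-12.0, -8.5, -5.2, -3.0):
        print(mfe, classify_stem_mfe(mfe))
```

Because more negative values indicate a more stable predicted duplex, a value below -10 kcal/mol falls outside the window just as a value above -4 kcal/mol does.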
[0255] In embodiment 24 provided herein is the composition of embodiment 21-23, wherein the targeter stem sequence and the modulator stem sequence share at least 80% sequence complementarity. [0256] In embodiment 25 provided herein is the composition of embodiment 21-24, wherein at least 40% of the base pairs in the stem are C-G base pairs. [0257] In embodiment 26 provided herein is the composition of embodiment 21-25, wherein the targeter and modulator nucleic acids comprise a single polynucleotide. [0258] In embodiment 27 provided herein is the composition of embodiment 21-25, wherein the targeter and modulator nucleic acids are separate polynucleotides. [0259] In embodiment 28 provided herein is the composition of embodiment 21-27, wherein the targeter nucleic acid or the modulator nucleic acid, or both, comprise one or more modified nucleotides at or near its 3’ end, if present, at or near its 5’ end, if present, or both.
[0260] In embodiment 29 provided herein is the composition of embodiment 21-28, wherein the modulator nucleic acid comprises at least one modified nucleotide and at least two modified internucleotide linkages within the first five nucleotides from the 5’ end. [0261] In embodiment 30 provided herein is the composition of embodiment 21-29, further comprising a Type V nucleic acid-guided nuclease. [0262] In embodiment 31 provided herein is the composition of embodiment 30, wherein the Type V nucleic acid-guided nuclease is at least 80% identical to an ABW, ART, or MAD nuclease. [0263] In embodiment 32 provided herein is the composition of any one of embodiments 21-31, wherein the modulator nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 6. [0264] In embodiment 33 provided herein is the composition of any one of embodiments 21-32, wherein the targeter nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 7. 
[0265] In embodiment 34 provided herein is a method of editing a genome of a eukaryotic cell comprising (I) delivering to the eukaryotic cell (A) one or more synthetic guide nucleic acids (gNA), or polynucleotides encoding the one or more gNAs, comprising (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the target stem sequence, and (b) a 5’ sequence; wherein (1) the targeter nucleic acid and modulator nucleic acids are separate polynucleotides, (2) the predicted minimum free energy of the targeter stem sequence and the modulator stem sequence as determined by the RNAcofold WebServer is between -10 and -4 kcal/mol, and (3) the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex; (B) one or more Type V nucleic acid-guided nucleases, or polynucleotides encoding the one or more nucleases; and, optionally, (C) one or more donor templates, wherein the gNA and the Type V nucleic acid-guided nuclease form a nucleic acid-guided nuclease complex; and (II) contacting the genome with the nucleic acid-guided nuclease complex to form one or more strand breaks in the genome, whereby at least a portion of the donor template is inserted into the genome at or near the one or more strand breaks. [0266] In embodiment 35 provided herein is the method of embodiment 34, further comprising treating the eukaryotic cell with a HDR enhancer. [0267] In embodiment 36 provided herein is the method of embodiment 35, wherein the HDR enhancer comprises a DNA-PK antagonist, preferably M3814.
[0268] In embodiment 37 provided herein is the method of any one of embodiments 34-36, wherein the method comprises delivering at least two gNAs, or polynucleotides encoding the gNAs, wherein each gNA comprises a different spacer sequence such that when complexed with a nucleic acid-guided nuclease, the nucleic acid-guided nuclease complexes form strand breaks in the genome at or near each of the target nucleotide sequences. VI. Equivalents [0269] Throughout the description, where compositions are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are compositions of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps. [0270] In the application, where an element or component is said to be included in and/or selected from a list of recited elements or components, it should be understood that the element or component can be any one of the recited elements or components, or the element or component can be selected from a group consisting of two or more of the recited elements or components. [0271] Further, it should be understood that elements and/or features of a composition or a method described herein can be combined in a variety of ways without departing from the spirit and scope of the present invention, whether explicit or implicit herein. For example, where reference is made to a particular compound, that compound can be used in various embodiments of compositions of the present invention and/or in methods of the present invention, unless otherwise understood from the context. 
In other words, within this application, embodiments have been described and depicted in a way that enables a clear and concise application to be written and drawn, but it is intended and will be appreciated that embodiments may be variously combined or separated without departing from the present teachings and invention(s). For example, it will be appreciated that all features described and depicted herein can be applicable to all aspects of the invention(s) described and depicted herein. [0272] The terms “a” and “an” and “the” and similar references in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. For example, the term “a cell” includes a plurality of cells, including mixtures thereof. Where the plural form is used for compounds, salts, or the like, this is taken to mean also a single compound, salt, or the like.
[0273] It should be understood that the expression “at least one of” includes individually each of the recited objects after the expression and the various combinations of two or more of the recited objects unless otherwise understood from the context and use. The expression “and/or” in connection with three or more recited objects should be understood to have the same meaning unless otherwise understood from the context. [0274] The use of the term “include,” “includes,” “including,” “have,” “has,” “having,” “contain,” “contains,” or “containing,” including grammatical equivalents thereof, should be understood generally as open-ended and non-limiting, for example, not excluding additional unrecited elements or steps, unless otherwise specifically stated or understood from the context. [0275] Where the use of the term “about” is before a quantitative value, the present invention also includes the specific quantitative value itself, unless specifically stated otherwise. As used herein, the term “about” refers to a ±10% variation from the nominal value unless otherwise indicated or inferred. [0276] It should be understood that the order of steps or order for performing certain actions is immaterial so long as the present invention remains operable. Moreover, two or more steps or actions may be conducted simultaneously. [0277] The use of any and all examples, or exemplary language herein, for example, “such as” or “including,” is intended merely to illustrate better the present invention and does not pose a limitation on the scope of the invention unless claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the present invention. [0278] The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. 
The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein. Scope of the invention is thus indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.
Claims
WHAT IS CLAIMED IS:
1. A composition comprising a synthetic guide nucleic acid (gNA) comprising: (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the target stem sequence, and (b) a 5’ sequence; wherein the targeter stem sequence and the modulator stem sequence each comprise 4-10 nucleotides that base pair with each other, and the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex.
2. The composition of claim 1, wherein the targeter stem sequence and the modulator stem sequence each comprise 4-6 nucleotides that base pair with each other.
3. The composition of claim 2, wherein the targeter stem sequence and the modulator stem sequence each comprise five nucleotides that base pair with each other.
4. The composition of claim 3, wherein (1) the targeter nucleic acid comprises an additional nucleotide sequence 5’ to the targeter stem sequence comprising an additional at least 2 nucleotides, and (2) the modulator nucleic acid comprises an additional nucleotide sequence 3’ to the modulator stem sequence comprising an additional at least 2 nucleotides.
5. The composition of claim 2, wherein the targeter stem sequence and the modulator stem sequence each comprise four nucleotides that base pair with each other.
6. The composition of any one of the preceding claims, wherein the targeter nucleic acid comprises an additional nucleotide sequence 5’ to the targeter stem sequence comprising an additional at least two nucleotides.
7. The composition of any one of the preceding claims, wherein the targeter stem sequence and the modulator stem sequence share at least 80% sequence complementarity.
8. The composition of any one of the preceding claims, wherein at least 40% of the base pairs in the stem are C-G base pairs.
9. The composition of any one of the preceding claims, wherein the targeter and modulator nucleic acids comprise a single polynucleotide.
10. The composition of claims 1-8, wherein the targeter and modulator nucleic acids are separate polynucleotides.
11. The composition of claim 1, wherein the targeter nucleic acid or the modulator nucleic acid, or both, comprise one or more modified nucleotides at or near its 3’ end, if present, at or near its 5’ end, if present, or both.
12. The composition of claim 11, wherein the modulator nucleic acid comprises at least one modified nucleotide and at least two modified internucleotide linkages within the first five nucleotides from the 5’ end.
13. The composition of any one of the preceding claims, further comprising a Type V nucleic acid-guided nuclease complexed with the gNA.
14. The composition of claim 13, wherein the Type V nucleic acid-guided nuclease is at least 80% identical to an ABW, ART, or MAD nuclease.
15. The composition of any one of the preceding claims, wherein the modulator nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 6.
16. The composition of any one of the preceding claims, wherein the targeter nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 7.
17. A method of editing a genome of a eukaryotic cell comprising
(I) delivering to the eukaryotic cell (A) one or more synthetic guide nucleic acids (gNA), or polynucleotides encoding the one or more gNAs, comprising (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the target stem sequence, and (b) a 5’ sequence; wherein the targeter stem sequence and the modulator stem sequence each comprise 4-10 nucleotides that base pair with each other, and the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex; (B) one or more Type V nucleic acid-guided nucleases, or polynucleotides encoding the one or more nucleases; and, optionally, (C) one or more donor templates, wherein the gNA and the Type V nucleic acid-guided nuclease form a nucleic acid-guided nuclease complex; and (II) contacting the genome with the nucleic acid-guided nuclease complex to form one or more strand breaks in the genome, whereby at least a portion of the donor template is inserted into the genome at or near the one or more strand breaks.
18. The method of claim 17, further comprising treating the eukaryotic cell with a HDR enhancer.
19. The method of claim 18, wherein the HDR enhancer comprises a DNA-PK antagonist, preferably M3814.
20. The method of any one of claims 17-19, wherein the method comprises delivering at least two gNAs, or polynucleotides encoding the gNAs, wherein each gNA comprises a different spacer sequence such that when complexed with a nucleic acid-guided nuclease, the nucleic acid-guided nuclease complexes form strand breaks in the genome at or near each of the target nucleotide sequences.
21. A composition comprising a synthetic guide nucleic acid (gNA) comprising (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the target stem sequence, and (b) a 5’ sequence; wherein (1) the targeter nucleic acid and modulator nucleic acids are separate polynucleotides, (2) the predicted minimum free energy of the targeter stem sequence and the modulator stem sequence as determined by the RNAcofold WebServer is between -10 and -4 kcal/mol, and (3) the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex.
22. The composition of claim 21, wherein the predicted minimum free energy of the targeter stem sequence and the modulator stem sequence as determined by the RNAcofold WebServer is between -7 and -4 kcal/mol.
23. The composition of claim 21 or 22, wherein the targeter stem sequence and the modulator stem sequence each comprise 4-6 nucleotides that base pair with each other.
24. The composition of claim 21-23, wherein the targeter stem sequence and the modulator stem sequence share at least 80% sequence complementarity.
25. The composition of claim 21-24, wherein at least 40% of the base pairs in the stem are C-G base pairs.
26. The composition of any one of claims 21-25, wherein the targeter and modulator nucleic acids comprise a single polynucleotide.
27. The composition of any one of claims 21-25, wherein the targeter and modulator nucleic acids are separate polynucleotides.
28. The composition of any one of claims 21-27, wherein the targeter nucleic acid or the modulator nucleic acid, or both, comprise one or more modified nucleotides at or near its 3’ end, if present, at or near its 5’ end, if present, or both.
29. The composition of any one of claims 21-28, wherein the modulator nucleic acid comprises at least one modified nucleotide and at least two modified internucleotide linkages within the first five nucleotides from the 5’ end.
30. The composition of any one of claims 21-29, further comprising a Type V nucleic acid-guided nuclease.
31. The composition of claim 30, wherein the Type V nucleic acid-guided nuclease is at least 80% identical to an ABW, ART, or MAD nuclease.
32. The composition of any one of claims 21-31, wherein the modulator nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 6.
33. The composition of any one of claims 21-32, wherein the targeter nucleic acid comprises a sequence at least 50, 60, 70, 80, 90, 95, 99, 99.5, or 100% identical to any one of the sequences listed in Table 7.
34. A method of editing a genome of a eukaryotic cell comprising (I) delivering to the eukaryotic cell (A) one or more synthetic guide nucleic acids (gNA), or polynucleotides encoding the one or more gNAs, comprising (i) a targeter nucleic acid comprising: (a) a spacer sequence configured to hybridize with a target nucleotide sequence, and (b) a targeter stem sequence; and (ii) a modulator nucleic acid comprising: (a) a modulator stem sequence complementary to the targeter stem sequence, and (b) a 5’ sequence; wherein (1) the targeter nucleic acid and the modulator nucleic acid are separate polynucleotides, (2) the predicted minimum free energy of the targeter stem sequence and the modulator stem sequence as determined by the RNAcofold WebServer is between -10 and -4 kcal/mol, and (3) the gNA is capable of binding to and forming a nucleic acid-guided nuclease complex; (B) one or more Type V nucleic acid-guided nucleases, or polynucleotides encoding the one or more nucleases; and, optionally, (C) one or more donor templates, wherein the gNA and the Type V nucleic acid-guided nuclease form a nucleic acid-guided nuclease complex; and (II) contacting the genome with the nucleic acid-guided nuclease complex to form one or more strand breaks in the genome, whereby at least a portion of the donor template is inserted into the genome at or near the one or more strand breaks.
35. The method of claim 34, further comprising treating the eukaryotic cell with an HDR enhancer.
36. The method of claim 35, wherein the HDR enhancer comprises a DNA-PK antagonist, preferably M3814.
37. The method of any one of claims 34-36, wherein the method comprises delivering at least two gNAs, or polynucleotides encoding the gNAs, wherein each gNA comprises a different spacer sequence such that when complexed with a nucleic acid-guided nuclease, the nucleic acid-guided nuclease complexes form strand breaks in the genome at or near each of the target nucleotide sequences.
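The stem criteria recited in claims 21-25 (predicted free energy window, number of base pairs, percent complementarity, and C-G content) can be illustrated with a minimal sketch. It is not the applicants' method: the function names `stem_metrics` and `mfe_in_range`, the example stems, the exclusion of G-U wobble pairs, and the inclusive treatment of the energy bounds are all assumptions made for illustration. The free energy itself is not computed here; the sketch assumes a value has already been obtained from RNAcofold.

```python
# Watson-Crick RNA pairs only; G-U wobble pairs are NOT counted (an assumption of this sketch).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_metrics(targeter_stem, modulator_stem):
    """Compare two equal-length stems written 5'->3' and return simple duplex metrics."""
    t = targeter_stem.upper().replace("T", "U")
    m = modulator_stem.upper().replace("T", "U")
    if not t or len(t) != len(m):
        raise ValueError("this sketch assumes non-empty, equal-length, gap-free stems")
    # Antiparallel duplex: targeter position i faces modulator position len-1-i.
    facing = list(zip(t, reversed(m)))
    paired = [p for p in facing if p in WATSON_CRICK]
    gc = [p for p in paired if p in {("G", "C"), ("C", "G")}]
    return {
        "base_pairs": len(paired),
        "complementarity": len(paired) / len(facing),
        "gc_fraction": len(gc) / len(paired) if paired else 0.0,
    }


def mfe_in_range(mfe_kcal_per_mol, low=-10.0, high=-4.0):
    """True if an externally computed MFE lies in [low, high]; inclusive bounds are assumed."""
    return low <= mfe_kcal_per_mol <= high


# Hypothetical 5-nt stems, chosen only for illustration.
perfect = stem_metrics("GUAGA", "UCUAC")
one_mismatch = stem_metrics("GUAGA", "UCUAG")
print(perfect)
print(one_mismatch)
print(mfe_in_range(-6.5), mfe_in_range(-11.0))
```

With these hypothetical stems, the first pair forms five base pairs at 100% complementarity with two of five pairs being C-G (40%), while the second forms four base pairs at 80% complementarity. A real evaluation would take the free energy from RNAcofold rather than from a hand-entered number.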
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263415539P | 2022-10-12 | 2022-10-12 | |
US63/415,539 | 2022-10-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024081383A2 (en) | 2024-04-18 |
Family ID: 90670096
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2023/035060 WO2024081383A2 (en) | 2022-10-12 | 2023-10-12 | Compositions and methods for targeting, editing, or modifying genes |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024081383A2 (en) |
- 2023-10-12: WO PCT/US2023/035060 (WO2024081383A2), status unknown
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20220136014A1 (en) | Crispr systems with engineered dual guide nucleic acids | |
CA3036926C (en) | Modified stem cell memory t cells, methods of making and methods of using same | |
KR102587132B1 (en) | Crispr-cpf1-related methods, compositions and components for cancer immunotherapy | |
US20230083383A1 (en) | Compositions and methods for targeting, editing or modifying human genes | |
JP7379447B2 (en) | Peptides and nanoparticles for intracellular delivery of genome editing molecules | |
JP2018519801A (en) | Optimized CRISPR / CAS9 system and method for gene editing in stem cells | |
CN114026227A (en) | Modified immune cells with adenosine deaminase base editor for modifying nucleobases in target sequences | |
JP2022549916A (en) | Compositions and methods for the treatment of liquid cancer | |
WO2023023515A1 (en) | Persistent allogeneic modified immune cells and methods of use thereof | |
JP2022514567A (en) | Nuclease-mediated repeat elongation | |
WO2022067089A1 (en) | Fratricide resistant modified immune cells and methods of using the same | |
KR20240043783A (en) | Method for producing genetically modified cells | |
CA3209863A1 (en) | Compositions and methods for targeting, editing, or modifying genes | |
WO2024081383A2 (en) | Compositions and methods for targeting, editing, or modifying genes | |
WO2023183434A2 (en) | Compositions and methods for generating cells with reduced immunogenicty | |
WO2023225035A2 (en) | Compositions and methods for engineering cells | |
WO2024025908A2 (en) | Compositions and methods for genome editing | |
CN116507629A (en) | RNA scaffold | |
WO2023137233A2 (en) | Compositions and methods for editing genomes | |
WO2023167882A1 (en) | Composition and methods for transgene insertion | |
WO2022266538A2 (en) | Compositions and methods for targeting, editing or modifying human genes | |
WO2022256448A2 (en) | Compositions and methods for targeting, editing, or modifying genes | |
US20230340437A1 (en) | Modified nucleases | |
US20240102007A1 (en) | Gene editing systems comprising a crispr nuclease and uses thereof | |
Gill et al. | DTU DTU Library |