US20200277588A1 - Engineered Cascade Components and Cascade Complexes - Google Patents
- Publication number
- US20200277588A1 (application US16/824,603)
- Authority
- US
- United States
- Prior art keywords
- protein
- cell
- crispr
- nucleic acid
- cascade
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 646
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 513
- 102000040430 polynucleotide Human genes 0.000 claims abstract description 247
- 108091033319 polynucleotide Proteins 0.000 claims abstract description 247
- 239000002157 polynucleotide Substances 0.000 claims abstract description 247
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 228
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 187
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 187
- 239000012636 effector Substances 0.000 claims abstract description 180
- 108020004414 DNA Proteins 0.000 claims description 237
- 210000004027 cell Anatomy 0.000 claims description 220
- 108090000765 processed proteins & peptides Proteins 0.000 claims description 152
- 102000004196 processed proteins & peptides Human genes 0.000 claims description 140
- 229920001184 polypeptide Polymers 0.000 claims description 134
- 108020001507 fusion proteins Proteins 0.000 claims description 97
- 102000037865 fusion proteins Human genes 0.000 claims description 95
- 102000053602 DNA Human genes 0.000 claims description 94
- 125000006850 spacer group Chemical group 0.000 claims description 88
- 230000027455 binding Effects 0.000 claims description 73
- 150000001413 amino acids Chemical class 0.000 claims description 69
- 108091032973 (ribonucleotides)n+m Proteins 0.000 claims description 59
- 125000003275 alpha amino acid group Chemical group 0.000 claims description 34
- 210000005260 human cell Anatomy 0.000 claims description 20
- 210000003527 eukaryotic cell Anatomy 0.000 claims description 16
- 210000000130 stem cell Anatomy 0.000 claims description 11
- 239000000833 heterodimer Substances 0.000 claims description 5
- 210000004263 induced pluripotent stem cell Anatomy 0.000 claims description 3
- 238000000034 method Methods 0.000 abstract description 80
- 102000011931 Nucleoproteins Human genes 0.000 abstract description 8
- 108010061100 Nucleoproteins Proteins 0.000 abstract description 8
- 235000018102 proteins Nutrition 0.000 description 474
- 239000013612 plasmid Substances 0.000 description 152
- 125000005647 linker group Chemical group 0.000 description 120
- 230000014509 gene expression Effects 0.000 description 118
- 239000013598 vector Substances 0.000 description 97
- 108091079001 CRISPR RNA Proteins 0.000 description 88
- 241000588724 Escherichia coli Species 0.000 description 77
- 241000282414 Homo sapiens Species 0.000 description 67
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 66
- 239000000306 component Substances 0.000 description 66
- 125000003729 nucleotide group Chemical group 0.000 description 60
- 239000000203 mixture Substances 0.000 description 59
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 58
- 230000000694 effects Effects 0.000 description 56
- 239000002773 nucleotide Substances 0.000 description 52
- 239000000758 substrate Substances 0.000 description 50
- 239000013604 expression vector Substances 0.000 description 49
- 108091028043 Nucleic acid sequence Proteins 0.000 description 43
- 108091034117 Oligonucleotide Proteins 0.000 description 42
- 108010077850 Nuclear Localization Signals Proteins 0.000 description 41
- 230000004927 fusion Effects 0.000 description 38
- 101710163270 Nuclease Proteins 0.000 description 37
- 239000002585 base Substances 0.000 description 37
- 230000001580 bacterial effect Effects 0.000 description 35
- 230000000295 complement effect Effects 0.000 description 34
- 239000000872 buffer Substances 0.000 description 33
- JLCPHMBAVCMARE-UHFFFAOYSA-N DNA oligonucleotide Polymers 0.000 description 31
- 239000013613 expression plasmid Substances 0.000 description 31
- 229910052739 hydrogen Inorganic materials 0.000 description 31
- 239000001257 hydrogen Substances 0.000 description 31
- 241000196324 Embryophyta Species 0.000 description 30
- 238000003776 cleavage reaction Methods 0.000 description 30
- 239000011780 sodium chloride Substances 0.000 description 29
- 101710153660 Nuclear receptor corepressor 2 Proteins 0.000 description 28
- 238000013461 design Methods 0.000 description 28
- 230000007017 scission Effects 0.000 description 26
- 238000013518 transcription Methods 0.000 description 26
- 238000004519 manufacturing process Methods 0.000 description 25
- 230000035897 transcription Effects 0.000 description 25
- 230000001105 regulatory effect Effects 0.000 description 24
- 238000006243 chemical reaction Methods 0.000 description 23
- 230000007018 DNA scission Effects 0.000 description 22
- 238000000746 purification Methods 0.000 description 22
- PZBFGYYEXUXCOF-UHFFFAOYSA-N TCEP Chemical compound OC(=O)CCP(CCC(O)=O)CCC(O)=O PZBFGYYEXUXCOF-UHFFFAOYSA-N 0.000 description 21
- 238000005842 biochemical reaction Methods 0.000 description 21
- 108010042407 Endonucleases Proteins 0.000 description 19
- 102100030569 Nuclear receptor corepressor 2 Human genes 0.000 description 19
- 241000894007 species Species 0.000 description 19
- 102100031780 Endonuclease Human genes 0.000 description 18
- 239000007983 Tris buffer Substances 0.000 description 18
- 238000002835 absorbance Methods 0.000 description 18
- 238000003491 array Methods 0.000 description 18
- 238000010367 cloning Methods 0.000 description 18
- 238000010362 genome editing Methods 0.000 description 18
- 239000013615 primer Substances 0.000 description 18
- 230000010076 replication Effects 0.000 description 18
- 230000008685 targeting Effects 0.000 description 18
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 18
- 241000701022 Cytomegalovirus Species 0.000 description 17
- 102000004389 Ribonucleoproteins Human genes 0.000 description 17
- 108010081734 Ribonucleoproteins Proteins 0.000 description 17
- 238000003556 assay Methods 0.000 description 17
- 230000000875 corresponding effect Effects 0.000 description 17
- 238000012986 modification Methods 0.000 description 17
- 238000001542 size-exclusion chromatography Methods 0.000 description 17
- 108020004705 Codon Proteins 0.000 description 16
- 101100172748 Mus musculus Ethe1 gene Proteins 0.000 description 16
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 16
- 238000012217 deletion Methods 0.000 description 16
- 230000037430 deletion Effects 0.000 description 16
- 210000004962 mammalian cell Anatomy 0.000 description 16
- 230000004048 modification Effects 0.000 description 16
- 108020005004 Guide RNA Proteins 0.000 description 15
- 238000012545 processing Methods 0.000 description 15
- 239000011347 resin Substances 0.000 description 15
- 229920005989 resin Polymers 0.000 description 15
- 230000006870 function Effects 0.000 description 14
- 230000003993 interaction Effects 0.000 description 14
- PXHVJJICTQNCMI-UHFFFAOYSA-N Nickel Chemical compound [Ni] PXHVJJICTQNCMI-UHFFFAOYSA-N 0.000 description 13
- 239000000499 gel Substances 0.000 description 13
- 239000000523 sample Substances 0.000 description 13
- 210000001519 tissue Anatomy 0.000 description 13
- 102000004190 Enzymes Human genes 0.000 description 12
- 108090000790 Enzymes Proteins 0.000 description 12
- 238000009396 hybridization Methods 0.000 description 12
- RAXXELZNTBOGNW-UHFFFAOYSA-N imidazole Natural products C1=CNC=N1 RAXXELZNTBOGNW-UHFFFAOYSA-N 0.000 description 12
- 230000008488 polyadenylation Effects 0.000 description 12
- 239000000047 product Substances 0.000 description 12
- 235000004252 protein component Nutrition 0.000 description 12
- 108020001580 protein domains Proteins 0.000 description 12
- 230000009261 transgenic effect Effects 0.000 description 12
- 230000009977 dual effect Effects 0.000 description 11
- 238000005516 engineering process Methods 0.000 description 11
- 230000037361 pathway Effects 0.000 description 11
- 230000029279 positive regulation of transcription, DNA-dependent Effects 0.000 description 11
- 230000008439 repair process Effects 0.000 description 11
- 239000000243 solution Substances 0.000 description 11
- 108010031325 Cytidine deaminase Proteins 0.000 description 10
- 108091005804 Peptidases Proteins 0.000 description 10
- 239000004365 Protease Substances 0.000 description 10
- 102100037486 Reverse transcriptase/ribonuclease H Human genes 0.000 description 10
- 108020004682 Single-Stranded DNA Proteins 0.000 description 10
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 10
- 108010006025 bovine growth hormone Proteins 0.000 description 10
- 238000005520 cutting process Methods 0.000 description 10
- 239000012634 fragment Substances 0.000 description 10
- 238000000527 sonication Methods 0.000 description 10
- 238000011144 upstream manufacturing Methods 0.000 description 10
- 108091033409 CRISPR Proteins 0.000 description 9
- 108091026890 Coding region Proteins 0.000 description 9
- 102100026846 Cytidine deaminase Human genes 0.000 description 9
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 9
- 241000701959 Escherichia virus Lambda Species 0.000 description 9
- 108060004795 Methyltransferase Proteins 0.000 description 9
- 210000004899 c-terminal region Anatomy 0.000 description 9
- 239000003623 enhancer Substances 0.000 description 9
- 229920002521 macromolecule Polymers 0.000 description 9
- 239000000463 material Substances 0.000 description 9
- 238000011160 research Methods 0.000 description 9
- 238000001890 transfection Methods 0.000 description 9
- 239000011534 wash buffer Substances 0.000 description 9
- 108091093088 Amplicon Proteins 0.000 description 8
- 241000894006 Bacteria Species 0.000 description 8
- 238000010354 CRISPR gene editing Methods 0.000 description 8
- TWRXJAOTZQYOKJ-UHFFFAOYSA-L Magnesium chloride Chemical compound [Mg+2].[Cl-].[Cl-] TWRXJAOTZQYOKJ-UHFFFAOYSA-L 0.000 description 8
- cas8 Proteins 0.000 description 8
- 239000003153 chemical reaction reagent Substances 0.000 description 8
- 238000012350 deep sequencing Methods 0.000 description 8
- 238000002337 electrophoretic mobility shift assay Methods 0.000 description 8
- 238000000126 in silico method Methods 0.000 description 8
- 239000012139 lysis buffer Substances 0.000 description 8
- 238000002156 mixing Methods 0.000 description 8
- 230000035772 mutation Effects 0.000 description 8
- 239000008188 pellet Substances 0.000 description 8
- 229920000642 polymer Polymers 0.000 description 8
- 150000003839 salts Chemical class 0.000 description 8
- 230000002103 transcriptional effect Effects 0.000 description 8
- 102100039498 Cytotoxic T-lymphocyte protein 4 Human genes 0.000 description 7
- 101000889276 Homo sapiens Cytotoxic T-lymphocyte protein 4 Proteins 0.000 description 7
- 241000699670 Mus sp. Species 0.000 description 7
- 108091023040 Transcription factor Proteins 0.000 description 7
- 102000040945 Transcription factor Human genes 0.000 description 7
- 239000012190 activator Substances 0.000 description 7
- 238000000246 agarose gel electrophoresis Methods 0.000 description 7
- 230000015572 biosynthetic process Effects 0.000 description 7
- 101150049463 cas5 gene Proteins 0.000 description 7
- 230000015556 catabolic process Effects 0.000 description 7
- 238000005277 cation exchange chromatography Methods 0.000 description 7
- 238000005119 centrifugation Methods 0.000 description 7
- 229960005091 chloramphenicol Drugs 0.000 description 7
- WIIZWVCIJKGZOK-RKDXNWHRSA-N chloramphenicol Chemical compound ClC(Cl)C(=O)N[C@H](CO)[C@H](O)C1=CC=C([N+]([O-])=O)C=C1 WIIZWVCIJKGZOK-RKDXNWHRSA-N 0.000 description 7
- 238000006731 degradation reaction Methods 0.000 description 7
- 238000002474 experimental method Methods 0.000 description 7
- 238000001597 immobilized metal affinity chromatography Methods 0.000 description 7
- 238000000338 in vitro Methods 0.000 description 7
- 238000011534 incubation Methods 0.000 description 7
- 230000001404 mediated effect Effects 0.000 description 7
- 239000012528 membrane Substances 0.000 description 7
- 230000037353 metabolic pathway Effects 0.000 description 7
- 239000013600 plasmid vector Substances 0.000 description 7
- 230000007115 recruitment Effects 0.000 description 7
- 230000002441 reversible effect Effects 0.000 description 7
- 238000002415 sodium dodecyl sulfate polyacrylamide gel electrophoresis Methods 0.000 description 7
- 239000006228 supernatant Substances 0.000 description 7
- 238000012360 testing method Methods 0.000 description 7
- 230000037426 transcriptional repression Effects 0.000 description 7
- 230000033616 DNA repair Effects 0.000 description 6
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 6
- 101710175625 Maltose/maltodextrin-binding periplasmic protein Proteins 0.000 description 6
- 241001465754 Metazoa Species 0.000 description 6
- 101100495513 Mus musculus Cflar gene Proteins 0.000 description 6
- 108091081021 Sense strand Proteins 0.000 description 6
- 241000723792 Tobacco etch virus Species 0.000 description 6
- 230000004913 activation Effects 0.000 description 6
- 238000000137 annealing Methods 0.000 description 6
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 6
- 238000010586 diagram Methods 0.000 description 6
- 230000002068 genetic effect Effects 0.000 description 6
- 229910052759 nickel Inorganic materials 0.000 description 6
- 230000008569 process Effects 0.000 description 6
- 230000012743 protein tagging Effects 0.000 description 6
- 230000009870 specific binding Effects 0.000 description 6
- 229960000268 spectinomycin Drugs 0.000 description 6
- UNFWWIHTNXNPBV-WXKVUWSESA-N spectinomycin Chemical compound O([C@@H]1[C@@H](NC)[C@@H](O)[C@H]([C@@H]([C@H]1O1)O)NC)[C@]2(O)[C@H]1O[C@H](C)CC2=O UNFWWIHTNXNPBV-WXKVUWSESA-N 0.000 description 6
- 239000012536 storage buffer Substances 0.000 description 6
- 238000000108 ultra-filtration Methods 0.000 description 6
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 5
- 230000004568 DNA-binding Effects 0.000 description 5
- 108700026244 Open Reading Frames Proteins 0.000 description 5
- 238000012408 PCR amplification Methods 0.000 description 5
- 108091093037 Peptide nucleic acid Proteins 0.000 description 5
- 108010076818 TEV protease Proteins 0.000 description 5
- 108091046915 Threose nucleic acid Proteins 0.000 description 5
- 108700019146 Transgenes Proteins 0.000 description 5
- 238000007792 addition Methods 0.000 description 5
- 238000004458 analytical method Methods 0.000 description 5
- 238000013459 approach Methods 0.000 description 5
- 239000011324 bead Substances 0.000 description 5
- 230000008901 benefit Effects 0.000 description 5
- 101150044165 cas7 gene Proteins 0.000 description 5
- 238000010276 construction Methods 0.000 description 5
- 238000004132 cross linking Methods 0.000 description 5
- 239000000710 homodimer Substances 0.000 description 5
- 239000005457 ice water Substances 0.000 description 5
- BPHPUYQFMNQIOC-NXRLNHOXSA-N isopropyl beta-D-thiogalactopyranoside Chemical compound CC(C)S[C@@H]1O[C@H](CO)[C@H](O)[C@H](O)[C@H]1O BPHPUYQFMNQIOC-NXRLNHOXSA-N 0.000 description 5
- 229930027917 kanamycin Natural products 0.000 description 5
- 229960000318 kanamycin Drugs 0.000 description 5
- SBUJHOSQTJFQJX-NOAMYHISSA-N kanamycin Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CN)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O[C@@H]2[C@@H]([C@@H](N)[C@H](O)[C@@H](CO)O2)O)[C@H](N)C[C@@H]1N SBUJHOSQTJFQJX-NOAMYHISSA-N 0.000 description 5
- 229930182823 kanamycin A Natural products 0.000 description 5
- 108020004999 messenger RNA Proteins 0.000 description 5
- 239000002243 precursor Substances 0.000 description 5
- 108091008146 restriction endonucleases Proteins 0.000 description 5
- 238000000926 separation method Methods 0.000 description 5
- 210000001082 somatic cell Anatomy 0.000 description 5
- 230000009466 transformation Effects 0.000 description 5
- 238000013519 translation Methods 0.000 description 5
- 229940035893 uracil Drugs 0.000 description 5
- 230000003612 virological effect Effects 0.000 description 5
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 5
- JKMHFZQWWAIEOD-UHFFFAOYSA-N 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid Chemical compound OCC[NH+]1CCN(CCS([O-])(=O)=O)CC1 JKMHFZQWWAIEOD-UHFFFAOYSA-N 0.000 description 4
- 101710149870 C-C chemokine receptor type 5 Proteins 0.000 description 4
- 102100035875 C-C chemokine receptor type 5 Human genes 0.000 description 4
- 239000007995 HEPES buffer Substances 0.000 description 4
- 241000238631 Hexapoda Species 0.000 description 4
- 241000124008 Mammalia Species 0.000 description 4
- 241000699666 Mus <mouse, genus> Species 0.000 description 4
- 229930193140 Neomycin Natural products 0.000 description 4
- 229940124158 Protease/peptidase inhibitor Drugs 0.000 description 4
- 102000006382 Ribonucleases Human genes 0.000 description 4
- 108010083644 Ribonucleases Proteins 0.000 description 4
- 108091028664 Ribonucleotide Proteins 0.000 description 4
- 241001648840 Thosea asigna virus Species 0.000 description 4
- 229960000643 adenine Drugs 0.000 description 4
- 101150073130 ampR gene Proteins 0.000 description 4
- 229960000723 ampicillin Drugs 0.000 description 4
- AVKUERGKIZMTKX-NJBDSQKTSA-N ampicillin Chemical compound C1([C@@H](N)C(=O)N[C@H]2[C@H]3SC([C@@H](N3C2=O)C(O)=O)(C)C)=CC=CC=C1 AVKUERGKIZMTKX-NJBDSQKTSA-N 0.000 description 4
- 230000010307 cell transformation Effects 0.000 description 4
- 239000012468 concentrated sample Substances 0.000 description 4
- 229940104302 cytosine Drugs 0.000 description 4
- 238000007405 data analysis Methods 0.000 description 4
- 238000006471 dimerization reaction Methods 0.000 description 4
- 230000002255 enzymatic effect Effects 0.000 description 4
- 125000000524 functional group Chemical group 0.000 description 4
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 4
- 238000001727 in vivo Methods 0.000 description 4
- 238000010348 incorporation Methods 0.000 description 4
- 230000001939 inductive effect Effects 0.000 description 4
- 238000003780 insertion Methods 0.000 description 4
- 230000037431 insertion Effects 0.000 description 4
- 239000006166 lysate Substances 0.000 description 4
- 229910001629 magnesium chloride Inorganic materials 0.000 description 4
- 239000002609 medium Substances 0.000 description 4
- 229960004927 neomycin Drugs 0.000 description 4
- 230000003287 optical effect Effects 0.000 description 4
- 230000002018 overexpression Effects 0.000 description 4
- 239000000137 peptide hydrolase inhibitor Substances 0.000 description 4
- 239000000546 pharmaceutical excipient Substances 0.000 description 4
- 239000002336 ribonucleotide Substances 0.000 description 4
- 238000006467 substitution reaction Methods 0.000 description 4
- 238000012546 transfer Methods 0.000 description 4
- 241001515965 unidentified phage Species 0.000 description 4
- 108020005065 3' Flanking Region Proteins 0.000 description 3
- 108020005029 5' Flanking Region Proteins 0.000 description 3
- 229930024421 Adenine Natural products 0.000 description 3
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 3
- 108010011170 Ala-Trp-Arg-His-Pro-Gln-Phe-Gly-Gly Proteins 0.000 description 3
- 108010040467 CRISPR-Associated Proteins Proteins 0.000 description 3
- 239000006144 Dulbecco's modified Eagle's medium Substances 0.000 description 3
- 101100007792 Escherichia coli (strain K12) casB gene Proteins 0.000 description 3
- 101100382541 Escherichia coli (strain K12) casD gene Proteins 0.000 description 3
- 241001646716 Escherichia coli K-12 Species 0.000 description 3
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical class NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 3
- HVLSXIKZNLPZJJ-TXZCQADKSA-N HA peptide Chemical compound C([C@@H](C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](C)C(O)=O)NC(=O)[C@H]1N(CCC1)C(=O)[C@@H](N)CC=1C=CC(O)=CC=1)C1=CC=C(O)C=C1 HVLSXIKZNLPZJJ-TXZCQADKSA-N 0.000 description 3
- 108010093488 His-His-His-His-His-His Proteins 0.000 description 3
- 102000003960 Ligases Human genes 0.000 description 3
- 108090000364 Ligases Proteins 0.000 description 3
- 101100387131 Myxococcus xanthus (strain DK1622) devS gene Proteins 0.000 description 3
- 102000009572 RNA Polymerase II Human genes 0.000 description 3
- 108010009460 RNA Polymerase II Proteins 0.000 description 3
- 241000714474 Rous sarcoma virus Species 0.000 description 3
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 3
- 235000014680 Saccharomyces cerevisiae Nutrition 0.000 description 3
- 241000194020 Streptococcus thermophilus Species 0.000 description 3
- 108091023045 Untranslated Region Proteins 0.000 description 3
- 230000003321 amplification Effects 0.000 description 3
- 230000008033 biological extinction Effects 0.000 description 3
- 238000006664 bond formation reaction Methods 0.000 description 3
- 101150106467 cas6 gene Proteins 0.000 description 3
- 238000004113 cell culture Methods 0.000 description 3
- 230000001413 cellular effect Effects 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 239000007795 chemical reaction product Substances 0.000 description 3
- 230000021615 conjugation Effects 0.000 description 3
- 210000004748 cultured cell Anatomy 0.000 description 3
- 239000005547 deoxyribonucleotide Substances 0.000 description 3
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 3
- 238000001514 detection method Methods 0.000 description 3
- 238000010494 dissociation reaction Methods 0.000 description 3
- 230000005593 dissociations Effects 0.000 description 3
- 238000010828 elution Methods 0.000 description 3
- 239000012149 elution buffer Substances 0.000 description 3
- 239000012467 final product Substances 0.000 description 3
- 239000005090 green fluorescent protein Substances 0.000 description 3
- 238000010438 heat treatment Methods 0.000 description 3
- 238000004255 ion exchange chromatography Methods 0.000 description 3
- 238000005304 joining Methods 0.000 description 3
- 239000003446 ligand Substances 0.000 description 3
- 102000006392 myotrophin Human genes 0.000 description 3
- 108010058605 myotrophin Proteins 0.000 description 3
- 238000003199 nucleic acid amplification method Methods 0.000 description 3
- 210000000056 organ Anatomy 0.000 description 3
- 239000008194 pharmaceutical composition Substances 0.000 description 3
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 3
- 238000002360 preparation method Methods 0.000 description 3
- 210000001236 prokaryotic cell Anatomy 0.000 description 3
- 239000012264 purified product Substances 0.000 description 3
- 238000012552 review Methods 0.000 description 3
- 125000002652 ribonucleotide group Chemical group 0.000 description 3
- 238000012163 sequencing technique Methods 0.000 description 3
- 238000010583 slow cooling Methods 0.000 description 3
- 239000000126 substance Substances 0.000 description 3
- 238000003786 synthesis reaction Methods 0.000 description 3
- 210000005253 yeast cell Anatomy 0.000 description 3
- WPGCGXIZQYAXHI-JIZZDEOASA-N 2-aminoacetic acid;(2s)-2-amino-3-hydroxypropanoic acid Chemical compound NCC(O)=O.NCC(O)=O.OC[C@H](N)C(O)=O WPGCGXIZQYAXHI-JIZZDEOASA-N 0.000 description 2
- 108091006112 ATPases Proteins 0.000 description 2
- 102000057290 Adenosine Triphosphatases Human genes 0.000 description 2
- 241000272517 Anseriformes Species 0.000 description 2
- 241000203069 Archaea Species 0.000 description 2
- 241000271566 Aves Species 0.000 description 2
- 241000283690 Bos taurus Species 0.000 description 2
- 235000004977 Brassica sinapistrum Nutrition 0.000 description 2
- 241000195940 Bryophyta Species 0.000 description 2
- 102100024217 CAMPATH-1 antigen Human genes 0.000 description 2
- 108010065524 CD52 Antigen Proteins 0.000 description 2
- 238000010440 CRISPR–Cas3 gene editing Methods 0.000 description 2
- OYPRJOBELJOOCE-UHFFFAOYSA-N Calcium Chemical compound [Ca] OYPRJOBELJOOCE-UHFFFAOYSA-N 0.000 description 2
- 241000283707 Capra Species 0.000 description 2
- 108020004635 Complementary DNA Proteins 0.000 description 2
- 241000218631 Coniferophyta Species 0.000 description 2
- 238000011537 Coomassie blue staining Methods 0.000 description 2
- 108020001019 DNA Primers Proteins 0.000 description 2
- 101710177611 DNA polymerase II large subunit Proteins 0.000 description 2
- 101710184669 DNA polymerase II small subunit Proteins 0.000 description 2
- 239000003155 DNA primer Substances 0.000 description 2
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 2
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 2
- 108010053770 Deoxyribonucleases Proteins 0.000 description 2
- 102000016911 Deoxyribonucleases Human genes 0.000 description 2
- 102100024364 Disintegrin and metalloproteinase domain-containing protein 8 Human genes 0.000 description 2
- 241000672609 Escherichia coli BL21 Species 0.000 description 2
- 108060002716 Exonuclease Proteins 0.000 description 2
- 241000233866 Fungi Species 0.000 description 2
- BCCRXDTUTZHDEU-VKHMYHEASA-N Gly-Ser Chemical compound NCC(=O)N[C@@H](CO)C(O)=O BCCRXDTUTZHDEU-VKHMYHEASA-N 0.000 description 2
- 102100038261 Glycerol-3-phosphate phosphatase Human genes 0.000 description 2
- 101710171812 Glycerol-3-phosphate phosphatase Proteins 0.000 description 2
- 102000000587 Glycerolphosphate Dehydrogenase Human genes 0.000 description 2
- 108010041921 Glycerolphosphate Dehydrogenase Proteins 0.000 description 2
- 235000010469 Glycine max Nutrition 0.000 description 2
- 108010043121 Green Fluorescent Proteins Proteins 0.000 description 2
- 102000004144 Green Fluorescent Proteins Human genes 0.000 description 2
- ZRALSGWEFCBTJO-UHFFFAOYSA-N Guanidine Chemical compound NC(N)=N ZRALSGWEFCBTJO-UHFFFAOYSA-N 0.000 description 2
- 108050008339 Heat Shock Transcription Factor Proteins 0.000 description 2
- 102000000039 Heat Shock Transcription Factor Human genes 0.000 description 2
- 101710154606 Hemagglutinin Proteins 0.000 description 2
- 108010068250 Herpes Simplex Virus Protein Vmw65 Proteins 0.000 description 2
- 241000430519 Human rhinovirus sp. Species 0.000 description 2
- UFHFLCQGNIYNRP-UHFFFAOYSA-N Hydrogen Chemical group [H][H] UFHFLCQGNIYNRP-UHFFFAOYSA-N 0.000 description 2
- 102000004286 Hydroxymethylglutaryl CoA Reductases Human genes 0.000 description 2
- 108090000895 Hydroxymethylglutaryl CoA Reductases Proteins 0.000 description 2
- 102000002284 Hydroxymethylglutaryl-CoA Synthase Human genes 0.000 description 2
- 108010000775 Hydroxymethylglutaryl-CoA synthase Proteins 0.000 description 2
- 206010020649 Hyperkeratosis Diseases 0.000 description 2
- 108020004684 Internal Ribosome Entry Sites Proteins 0.000 description 2
- 241000699660 Mus musculus Species 0.000 description 2
- 101100219625 Mus musculus Casd1 gene Proteins 0.000 description 2
- 101100387128 Myxococcus xanthus (strain DK1622) devR gene Proteins 0.000 description 2
- 208000009869 Neu-Laxova syndrome Diseases 0.000 description 2
- 101710093908 Outer capsid protein VP4 Proteins 0.000 description 2
- 101710135467 Outer capsid protein sigma-1 Proteins 0.000 description 2
- 241001494479 Pecora Species 0.000 description 2
- 108091000080 Phosphotransferase Proteins 0.000 description 2
- 101710176177 Protein A56 Proteins 0.000 description 2
- 241000589774 Pseudomonas sp. Species 0.000 description 2
- 108091008103 RNA aptamers Proteins 0.000 description 2
- 241000700159 Rattus Species 0.000 description 2
- 108020004511 Recombinant DNA Proteins 0.000 description 2
- 108700008625 Reporter Genes Proteins 0.000 description 2
- 239000012506 Sephacryl® Substances 0.000 description 2
- 108091027568 Single-stranded nucleotide Proteins 0.000 description 2
- 244000062793 Sorghum vulgare Species 0.000 description 2
- 238000010459 TALEN Methods 0.000 description 2
- IQFYYKKMVGJFEH-XLPZGREQSA-N Thymidine Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 IQFYYKKMVGJFEH-XLPZGREQSA-N 0.000 description 2
- 108091028113 Trans-activating crRNA Proteins 0.000 description 2
- 240000008042 Zea mays Species 0.000 description 2
- 235000002017 Zea mays subsp mays Nutrition 0.000 description 2
- 101710185494 Zinc finger protein Proteins 0.000 description 2
- 102100023597 Zinc finger protein 816 Human genes 0.000 description 2
- 101150067314 aadA gene Proteins 0.000 description 2
- 230000006978 adaptation Effects 0.000 description 2
- 230000004721 adaptive immunity Effects 0.000 description 2
- 238000001042 affinity chromatography Methods 0.000 description 2
- 125000000539 amino acid group Chemical group 0.000 description 2
- 210000004102 animal cell Anatomy 0.000 description 2
- 238000010256 biochemical assay Methods 0.000 description 2
- 230000003115 biocidal effect Effects 0.000 description 2
- 238000010804 cDNA synthesis Methods 0.000 description 2
- 229910052791 calcium Inorganic materials 0.000 description 2
- 239000011575 calcium Substances 0.000 description 2
- 238000004422 calculation algorithm Methods 0.000 description 2
- 101150055766 cat gene Proteins 0.000 description 2
- 239000013592 cell lysate Substances 0.000 description 2
- 210000000349 chromosome Anatomy 0.000 description 2
- 230000008045 co-localization Effects 0.000 description 2
- 239000002299 complementary DNA Substances 0.000 description 2
- 238000004590 computer program Methods 0.000 description 2
- 230000006378 damage Effects 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 238000007865 diluting Methods 0.000 description 2
- 239000000539 dimer Substances 0.000 description 2
- LOKCTEFSRHRXRJ-UHFFFAOYSA-I dipotassium trisodium dihydrogen phosphate hydrogen phosphate dichloride Chemical compound P(=O)(O)(O)[O-].[K+].P(=O)(O)([O-])[O-].[Na+].[Na+].[Cl-].[K+].[Cl-].[Na+] LOKCTEFSRHRXRJ-UHFFFAOYSA-I 0.000 description 2
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 description 2
- 238000011143 downstream manufacturing Methods 0.000 description 2
- 238000001962 electrophoresis Methods 0.000 description 2
- 238000004520 electroporation Methods 0.000 description 2
- 210000002257 embryonic structure Anatomy 0.000 description 2
- 108010030074 endodeoxyribonuclease MluI Proteins 0.000 description 2
- 108010048367 enhanced green fluorescent protein Proteins 0.000 description 2
- 238000006911 enzymatic reaction Methods 0.000 description 2
- 102000013165 exonuclease Human genes 0.000 description 2
- 108091006047 fluorescent proteins Proteins 0.000 description 2
- 102000034287 fluorescent proteins Human genes 0.000 description 2
- 238000012239 gene modification Methods 0.000 description 2
- 230000005017 genetic modification Effects 0.000 description 2
- 235000013617 genetically modified food Nutrition 0.000 description 2
- 230000005484 gravity Effects 0.000 description 2
- 239000000185 hemagglutinin Substances 0.000 description 2
- FDGQSTZJBFJUBT-UHFFFAOYSA-N hypoxanthine Chemical compound O=C1NC=NC2=C1NC=N2 FDGQSTZJBFJUBT-UHFFFAOYSA-N 0.000 description 2
- 238000002372 labelling Methods 0.000 description 2
- 101150109249 lacI gene Proteins 0.000 description 2
- 238000001638 lipofection Methods 0.000 description 2
- 238000011068 loading method Methods 0.000 description 2
- 239000003550 marker Substances 0.000 description 2
- 230000013011 mating Effects 0.000 description 2
- 125000001360 methionine group Chemical group N[C@@H](CCSC)C(=O)* 0.000 description 2
- 238000000520 microinjection Methods 0.000 description 2
- 238000010369 molecular cloning Methods 0.000 description 2
- 235000019198 oils Nutrition 0.000 description 2
- 229910052760 oxygen Inorganic materials 0.000 description 2
- 239000002245 particle Substances 0.000 description 2
- 239000002953 phosphate buffered saline Substances 0.000 description 2
- 108010079892 phosphoglycerol kinase Proteins 0.000 description 2
- 150000008300 phosphoramidites Chemical class 0.000 description 2
- 102000020233 phosphotransferase Human genes 0.000 description 2
- 108010005636 polypeptide C Proteins 0.000 description 2
- 230000000750 progressive effect Effects 0.000 description 2
- 125000006239 protecting group Chemical group 0.000 description 2
- 210000001938 protoplast Anatomy 0.000 description 2
- 239000011541 reaction mixture Substances 0.000 description 2
- 108010054624 red fluorescent protein Proteins 0.000 description 2
- 230000022532 regulation of transcription, DNA-dependent Effects 0.000 description 2
- 210000003705 ribosome Anatomy 0.000 description 2
- 230000035939 shock Effects 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 235000000346 sugar Nutrition 0.000 description 2
- RYYWUUFWQRZTIU-UHFFFAOYSA-K thiophosphate Chemical compound [O-]P([O-])([O-])=S RYYWUUFWQRZTIU-UHFFFAOYSA-K 0.000 description 2
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 2
- 238000011830 transgenic mouse model Methods 0.000 description 2
- 230000032258 transport Effects 0.000 description 2
- 108020003272 trehalose-phosphatase Proteins 0.000 description 2
- 239000013603 viral vector Substances 0.000 description 2
- AUTOLBMXDDTRRT-JGVFFNPUSA-N (4R,5S)-dethiobiotin Chemical compound C[C@@H]1NC(=O)N[C@@H]1CCCCCC(O)=O AUTOLBMXDDTRRT-JGVFFNPUSA-N 0.000 description 1
- KJTLQQUUPVSXIM-ZCFIWIBFSA-M (R)-mevalonate Chemical compound OCC[C@](O)(C)CC([O-])=O KJTLQQUUPVSXIM-ZCFIWIBFSA-M 0.000 description 1
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 1
- CZVCGJBESNRLEQ-UHFFFAOYSA-N 7h-purine;pyrimidine Chemical compound C1=CN=CN=C1.C1=NC=C2NC=NC2=N1 CZVCGJBESNRLEQ-UHFFFAOYSA-N 0.000 description 1
- 241000251468 Actinopterygii Species 0.000 description 1
- 108010052875 Adenine deaminase Proteins 0.000 description 1
- 235000001674 Agaricus brunnescens Nutrition 0.000 description 1
- 229920000936 Agarose Polymers 0.000 description 1
- 235000016626 Agrimonia eupatoria Nutrition 0.000 description 1
- 241000589158 Agrobacterium Species 0.000 description 1
- 241000589156 Agrobacterium rhizogenes Species 0.000 description 1
- 241000589155 Agrobacterium tumefaciens Species 0.000 description 1
- 101710082579 Aminoglycoside N(6')-acetyltransferase type 1 Proteins 0.000 description 1
- 108091023037 Aptamer Proteins 0.000 description 1
- 101100108891 Arabidopsis thaliana PRMT11 gene Proteins 0.000 description 1
- 241000205042 Archaeoglobus fulgidus Species 0.000 description 1
- 241000512259 Ascophyllum nodosum Species 0.000 description 1
- 235000000832 Ayote Nutrition 0.000 description 1
- 241000006382 Bacillus halodurans Species 0.000 description 1
- 108010077805 Bacterial Proteins Proteins 0.000 description 1
- 241000219310 Beta vulgaris subsp. vulgaris Species 0.000 description 1
- DWRXFEITVBNRMK-UHFFFAOYSA-N Beta-D-1-Arabinofuranosylthymine Natural products O=C1NC(=O)C(C)=CN1C1C(O)C(O)C(CO)O1 DWRXFEITVBNRMK-UHFFFAOYSA-N 0.000 description 1
- 102100026189 Beta-galactosidase Human genes 0.000 description 1
- 241001474374 Blennius Species 0.000 description 1
- 241001536303 Botryococcus braunii Species 0.000 description 1
- 241000219198 Brassica Species 0.000 description 1
- 235000011331 Brassica Nutrition 0.000 description 1
- 235000014698 Brassica juncea var multisecta Nutrition 0.000 description 1
- 240000002791 Brassica napus Species 0.000 description 1
- 235000006008 Brassica napus var napus Nutrition 0.000 description 1
- 235000006618 Brassica rapa subsp oleifera Nutrition 0.000 description 1
- 244000188595 Brassica sinapistrum Species 0.000 description 1
- QCMYYKRYFNMIEC-UHFFFAOYSA-N COP(O)=O Chemical class COP(O)=O QCMYYKRYFNMIEC-UHFFFAOYSA-N 0.000 description 1
- 101150018129 CSF2 gene Proteins 0.000 description 1
- 101100063818 Caenorhabditis elegans lig-1 gene Proteins 0.000 description 1
- 101000909256 Caldicellulosiruptor bescii (strain ATCC BAA-1888 / DSM 6725 / Z-1320) DNA polymerase I Proteins 0.000 description 1
- 241000282472 Canis lupus familiaris Species 0.000 description 1
- 101710167800 Capsid assembly scaffolding protein Proteins 0.000 description 1
- 108700004991 Cas12a Proteins 0.000 description 1
- 241000700198 Cavia Species 0.000 description 1
- 241000282693 Cercopithecidae Species 0.000 description 1
- 241000195597 Chlamydomonas reinhardtii Species 0.000 description 1
- 244000249214 Chlorella pyrenoidosa Species 0.000 description 1
- 235000007091 Chlorella pyrenoidosa Nutrition 0.000 description 1
- 241000251556 Chordata Species 0.000 description 1
- 241000186570 Clostridium kluyveri Species 0.000 description 1
- 241000243321 Cnidaria Species 0.000 description 1
- 102000008169 Co-Repressor Proteins Human genes 0.000 description 1
- 108010060434 Co-Repressor Proteins Proteins 0.000 description 1
- 229920000742 Cotton Polymers 0.000 description 1
- 240000004244 Cucurbita moschata Species 0.000 description 1
- 235000009854 Cucurbita moschata Nutrition 0.000 description 1
- 235000009804 Cucurbita pepo subsp pepo Nutrition 0.000 description 1
- 241000159506 Cyanothece Species 0.000 description 1
- 241000272778 Cygnus atratus Species 0.000 description 1
- 102000005381 Cytidine Deaminase Human genes 0.000 description 1
- KJTLQQUUPVSXIM-UHFFFAOYSA-N DL-mevalonic acid Natural products OCCC(O)(C)CC(O)=O KJTLQQUUPVSXIM-UHFFFAOYSA-N 0.000 description 1
- 238000012287 DNA Binding Assay Methods 0.000 description 1
- 108010076804 DNA Restriction Enzymes Proteins 0.000 description 1
- 108091008102 DNA aptamers Proteins 0.000 description 1
- 230000005778 DNA damage Effects 0.000 description 1
- 231100000277 DNA damage Toxicity 0.000 description 1
- 108050008316 DNA endonuclease RBBP8 Proteins 0.000 description 1
- 102100039524 DNA endonuclease RBBP8 Human genes 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- 102100029995 DNA ligase 1 Human genes 0.000 description 1
- 101710148291 DNA ligase 1 Proteins 0.000 description 1
- 102100033688 DNA ligase 3 Human genes 0.000 description 1
- 101710148290 DNA ligase 3 Proteins 0.000 description 1
- 108010093204 DNA polymerase theta Proteins 0.000 description 1
- 102100029766 DNA polymerase theta Human genes 0.000 description 1
- 230000008265 DNA repair mechanism Effects 0.000 description 1
- 108010046331 Deoxyribodipyrimidine photo-lyase Proteins 0.000 description 1
- 241000702421 Dependoparvovirus Species 0.000 description 1
- 229920002307 Dextran Polymers 0.000 description 1
- 241000258955 Echinodermata Species 0.000 description 1
- 102000004533 Endonucleases Human genes 0.000 description 1
- 241000283086 Equidae Species 0.000 description 1
- 101100219622 Escherichia coli (strain K12) casC gene Proteins 0.000 description 1
- 101100438439 Escherichia coli (strain K12) ygbT gene Proteins 0.000 description 1
- 101100005249 Escherichia coli (strain K12) ygcB gene Proteins 0.000 description 1
- 241000206602 Eukaryota Species 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- 241000272496 Galliformes Species 0.000 description 1
- 241000287828 Gallus gallus Species 0.000 description 1
- 241001494297 Geobacter sulfurreducens Species 0.000 description 1
- 241001571609 Geothermobacter Species 0.000 description 1
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 1
- 108010070675 Glutathione transferase Proteins 0.000 description 1
- 239000004471 Glycine Chemical class 0.000 description 1
- 244000068988 Glycine max Species 0.000 description 1
- 108091093094 Glycol nucleic acid Proteins 0.000 description 1
- 241000219146 Gossypium Species 0.000 description 1
- 108050002220 Green fluorescent protein, GFP Proteins 0.000 description 1
- 244000020551 Helianthus annuus Species 0.000 description 1
- 235000003222 Helianthus annuus Nutrition 0.000 description 1
- 102100029100 Hematopoietic prostaglandin D synthase Human genes 0.000 description 1
- 108010033040 Histones Proteins 0.000 description 1
- 241000282412 Homo Species 0.000 description 1
- 101000946926 Homo sapiens C-C chemokine receptor type 5 Proteins 0.000 description 1
- 101000968287 Homo sapiens Denticleless protein homolog Proteins 0.000 description 1
- 101000609277 Homo sapiens Inactive serine protease PAMR1 Proteins 0.000 description 1
- 101000615488 Homo sapiens Methyl-CpG-binding domain protein 2 Proteins 0.000 description 1
- 101001113483 Homo sapiens Poly [ADP-ribose] polymerase 1 Proteins 0.000 description 1
- 101000611936 Homo sapiens Programmed cell death protein 1 Proteins 0.000 description 1
- 101000662902 Homo sapiens T cell receptor beta constant 2 Proteins 0.000 description 1
- 101000788669 Homo sapiens Zinc finger MYM-type protein 2 Proteins 0.000 description 1
- 241000701024 Human betaherpesvirus 5 Species 0.000 description 1
- UGQMRVRMYYASKQ-UHFFFAOYSA-N Hypoxanthine nucleoside Natural products OC1C(O)C(CO)OC1N1C(NC=NC2=O)=C2N=C1 UGQMRVRMYYASKQ-UHFFFAOYSA-N 0.000 description 1
- 102100035692 Importin subunit alpha-1 Human genes 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 102100034343 Integrase Human genes 0.000 description 1
- 108010061833 Integrases Proteins 0.000 description 1
- 108091092195 Intron Proteins 0.000 description 1
- 235000002678 Ipomoea batatas Nutrition 0.000 description 1
- 244000017020 Ipomoea batatas Species 0.000 description 1
- 241000713666 Lentivirus Species 0.000 description 1
- 241000270322 Lepidosauria Species 0.000 description 1
- 108060001084 Luciferase Proteins 0.000 description 1
- 239000005089 Luciferase Substances 0.000 description 1
- 235000007688 Lycopersicon esculentum Nutrition 0.000 description 1
- 241000195947 Lycopodium Species 0.000 description 1
- 108091007767 MALAT1 Proteins 0.000 description 1
- 241000282560 Macaca mulatta Species 0.000 description 1
- 241000218922 Magnoliophyta Species 0.000 description 1
- 240000003183 Manihot esculenta Species 0.000 description 1
- 235000016735 Manihot esculenta subsp esculenta Nutrition 0.000 description 1
- 241000196323 Marchantiophyta Species 0.000 description 1
- 240000004658 Medicago sativa Species 0.000 description 1
- 235000017587 Medicago sativa ssp. sativa Nutrition 0.000 description 1
- 241001042243 Methanocella arvoryzae MRE50 Species 0.000 description 1
- 102100021299 Methyl-CpG-binding domain protein 2 Human genes 0.000 description 1
- 102000016397 Methyltransferase Human genes 0.000 description 1
- 101100224228 Mus musculus Lig1 gene Proteins 0.000 description 1
- 235000003805 Musa ABB Group Nutrition 0.000 description 1
- CHJJGSNFBQVOTG-UHFFFAOYSA-N N-methyl-guanidine Natural products CNC(N)=N CHJJGSNFBQVOTG-UHFFFAOYSA-N 0.000 description 1
- 241001250129 Nannochloropsis gaditana Species 0.000 description 1
- 241000244206 Nematoda Species 0.000 description 1
- 206010028980 Neoplasm Diseases 0.000 description 1
- 244000061176 Nicotiana tabacum Species 0.000 description 1
- 235000002637 Nicotiana tabacum Nutrition 0.000 description 1
- 108010027777 Nucleotide Deaminases Proteins 0.000 description 1
- 102000018809 Nucleotide Deaminases Human genes 0.000 description 1
- 241000283973 Oryctolagus cuniculus Species 0.000 description 1
- 240000007594 Oryza sativa Species 0.000 description 1
- 235000007164 Oryza sativa Nutrition 0.000 description 1
- 230000010718 Oxidation Activity Effects 0.000 description 1
- 108700038250 PAM2-CSK4 Proteins 0.000 description 1
- 239000002033 PVDF binder Substances 0.000 description 1
- 241000282579 Pan Species 0.000 description 1
- 101100484946 Petunia hybrida VPY gene Proteins 0.000 description 1
- 241000286209 Phasianidae Species 0.000 description 1
- 108090000608 Phosphoric Monoester Hydrolases Proteins 0.000 description 1
- 102000004160 Phosphoric Monoester Hydrolases Human genes 0.000 description 1
- 241000013557 Plantaginaceae Species 0.000 description 1
- 235000015266 Plantago major Nutrition 0.000 description 1
- 102100023712 Poly [ADP-ribose] polymerase 1 Human genes 0.000 description 1
- 108010061844 Poly(ADP-ribose) Polymerases Proteins 0.000 description 1
- 102000012338 Poly(ADP-ribose) Polymerases Human genes 0.000 description 1
- 229920000776 Poly(Adenosine diphosphate-ribose) polymerase Polymers 0.000 description 1
- 229920002873 Polyethylenimine Polymers 0.000 description 1
- 108010021757 Polynucleotide 5'-Hydroxyl-Kinase Proteins 0.000 description 1
- 102000008422 Polynucleotide 5'-hydroxyl-kinase Human genes 0.000 description 1
- 241000985694 Polypodiopsida Species 0.000 description 1
- 241000288906 Primates Species 0.000 description 1
- 101710130420 Probable capsid assembly scaffolding protein Proteins 0.000 description 1
- 102100040678 Programmed cell death protein 1 Human genes 0.000 description 1
- 108010001267 Protein Subunits Proteins 0.000 description 1
- 102000002067 Protein Subunits Human genes 0.000 description 1
- 241000125945 Protoparvovirus Species 0.000 description 1
- CZPWVGJYEJSRLH-UHFFFAOYSA-N Pyrimidine Chemical compound C1=CN=CN=C1 CZPWVGJYEJSRLH-UHFFFAOYSA-N 0.000 description 1
- 108091093078 Pyrimidine dimer Proteins 0.000 description 1
- 101000902592 Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1) DNA polymerase Proteins 0.000 description 1
- 102000014450 RNA Polymerase III Human genes 0.000 description 1
- 108010078067 RNA Polymerase III Proteins 0.000 description 1
- 230000004570 RNA-binding Effects 0.000 description 1
- 102000007056 Recombinant Fusion Proteins Human genes 0.000 description 1
- 108010008281 Recombinant Fusion Proteins Proteins 0.000 description 1
- 102000018120 Recombinases Human genes 0.000 description 1
- 108010091086 Recombinases Proteins 0.000 description 1
- 241000712909 Reticuloendotheliosis virus Species 0.000 description 1
- 241000283984 Rodentia Species 0.000 description 1
- 240000000111 Saccharum officinarum Species 0.000 description 1
- 235000007201 Saccharum officinarum Nutrition 0.000 description 1
- 241001138501 Salmonella enterica Species 0.000 description 1
- 241000593524 Sargassum patens Species 0.000 description 1
- 101710204410 Scaffold protein Proteins 0.000 description 1
- 101100206155 Schizosaccharomyces pombe (strain 972 / ATCC 24843) tbp1 gene Proteins 0.000 description 1
- 229920002684 Sepharose Polymers 0.000 description 1
- 238000012300 Sequence Analysis Methods 0.000 description 1
- 241001333726 Shewanella putrefaciens CN-32 Species 0.000 description 1
- 241000700584 Simplexvirus Species 0.000 description 1
- 240000003768 Solanum lycopersicum Species 0.000 description 1
- 101001126150 Solanum lycopersicum Probable aquaporin PIP-type pTOM75 Proteins 0.000 description 1
- 244000061456 Solanum tuberosum Species 0.000 description 1
- 235000002595 Solanum tuberosum Nutrition 0.000 description 1
- 235000011684 Sorghum saccharatum Nutrition 0.000 description 1
- 244000300264 Spinacia oleracea Species 0.000 description 1
- 235000009337 Spinacia oleracea Nutrition 0.000 description 1
- 229920002472 Starch Polymers 0.000 description 1
- 241000193996 Streptococcus pyogenes Species 0.000 description 1
- 241000187392 Streptomyces griseus Species 0.000 description 1
- 235000021536 Sugar beet Nutrition 0.000 description 1
- 241000282887 Suidae Species 0.000 description 1
- 102100037298 T cell receptor beta constant 2 Human genes 0.000 description 1
- 210000001744 T-lymphocyte Anatomy 0.000 description 1
- 241000255588 Tephritidae Species 0.000 description 1
- 101100329497 Thermoproteus tenax (strain ATCC 35583 / DSM 2078 / JCM 9277 / NBRC 100435 / Kra 1) cas2 gene Proteins 0.000 description 1
- 101100273269 Thermus thermophilus (strain ATCC 27634 / DSM 579 / HB8) cse3 gene Proteins 0.000 description 1
- RYYWUUFWQRZTIU-UHFFFAOYSA-N Thiophosphoric acid Chemical class OP(O)(S)=O RYYWUUFWQRZTIU-UHFFFAOYSA-N 0.000 description 1
- 108091036066 Three prime untranslated region Proteins 0.000 description 1
- 108010022394 Threonine synthase Proteins 0.000 description 1
- 108700009124 Transcription Initiation Site Proteins 0.000 description 1
- 108700029229 Transcriptional Regulatory Elements Proteins 0.000 description 1
- 102000008579 Transposases Human genes 0.000 description 1
- 108010020764 Transposases Proteins 0.000 description 1
- 235000021307 Triticum Nutrition 0.000 description 1
- 244000098338 Triticum aestivum Species 0.000 description 1
- 102000006275 Ubiquitin-Protein Ligases Human genes 0.000 description 1
- 108010083111 Ubiquitin-Protein Ligases Proteins 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 241000607477 Yersinia pseudotuberculosis Species 0.000 description 1
- 235000005824 Zea mays ssp. parviglumis Nutrition 0.000 description 1
- 235000016383 Zea mays subsp huehuetenangensis Nutrition 0.000 description 1
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 1
- 102100025085 Zinc finger MYM-type protein 2 Human genes 0.000 description 1
- 241000193445 [Clostridium] stercorarium Species 0.000 description 1
- 230000021736 acetylation Effects 0.000 description 1
- 238000006640 acetylation reaction Methods 0.000 description 1
- 102000005421 acetyltransferase Human genes 0.000 description 1
- 108020002494 acetyltransferase Proteins 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 230000003213 activating effect Effects 0.000 description 1
- 230000002730 additional effect Effects 0.000 description 1
- 230000006154 adenylylation Effects 0.000 description 1
- 210000004504 adult stem cell Anatomy 0.000 description 1
- 239000000443 aerosol Substances 0.000 description 1
- 230000010386 affect regulation Effects 0.000 description 1
- 239000011543 agarose gel Substances 0.000 description 1
- 230000029936 alkylation Effects 0.000 description 1
- 238000005804 alkylation reaction Methods 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 210000003663 amniotic stem cell Anatomy 0.000 description 1
- 238000010171 animal model Methods 0.000 description 1
- 239000002543 antimycotic Substances 0.000 description 1
- 239000007864 aqueous solution Substances 0.000 description 1
- 210000004507 artificial chromosome Anatomy 0.000 description 1
- 125000004429 atom Chemical group 0.000 description 1
- QVGXLLKOCUKJST-UHFFFAOYSA-N atomic oxygen Chemical compound [O] QVGXLLKOCUKJST-UHFFFAOYSA-N 0.000 description 1
- 210000003719 b-lymphocyte Anatomy 0.000 description 1
- 230000007940 bacterial gene expression Effects 0.000 description 1
- 230000010310 bacterial transformation Effects 0.000 description 1
- 210000003323 beak Anatomy 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 108010005774 beta-Galactosidase Proteins 0.000 description 1
- IQFYYKKMVGJFEH-UHFFFAOYSA-N beta-L-thymidine Natural products O=C1NC(=O)C(C)=CN1C1OC(CO)C(O)C1 IQFYYKKMVGJFEH-UHFFFAOYSA-N 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 230000006287 biotinylation Effects 0.000 description 1
- 238000007413 biotinylation Methods 0.000 description 1
- 210000001185 bone marrow Anatomy 0.000 description 1
- 239000006172 buffering agent Substances 0.000 description 1
- 239000008364 bulk solution Substances 0.000 description 1
- 239000001506 calcium phosphate Substances 0.000 description 1
- 229910000389 calcium phosphate Inorganic materials 0.000 description 1
- 235000011010 calcium phosphates Nutrition 0.000 description 1
- 238000002619 cancer immunotherapy Methods 0.000 description 1
- AEIMYPLMACPBOD-UHFFFAOYSA-N carbonocyanidic acid;nickel Chemical compound [Ni].OC(=O)C#N AEIMYPLMACPBOD-UHFFFAOYSA-N 0.000 description 1
- 210000004413 cardiac myocyte Anatomy 0.000 description 1
- 239000000969 carrier Substances 0.000 description 1
- 101150000705 cas1 gene Proteins 0.000 description 1
- 101150055191 cas3 gene Proteins 0.000 description 1
- 101150111685 cas4 gene Proteins 0.000 description 1
- 230000032823 cell division Effects 0.000 description 1
- 239000006285 cell suspension Substances 0.000 description 1
- 108091092356 cellular DNA Proteins 0.000 description 1
- 235000013339 cereals Nutrition 0.000 description 1
- 230000005465 channeling Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 235000013330 chicken meat Nutrition 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 235000005822 corn Nutrition 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 239000013078 crystal Substances 0.000 description 1
- 101150085344 csa5 gene Proteins 0.000 description 1
- 108010082025 cyan fluorescent protein Proteins 0.000 description 1
- 230000009615 deamination Effects 0.000 description 1
- 238000006481 deamination reaction Methods 0.000 description 1
- 230000005860 defense response to virus Effects 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 230000006114 demyristoylation Effects 0.000 description 1
- 210000004443 dendritic cell Anatomy 0.000 description 1
- 230000027832 depurination Effects 0.000 description 1
- 238000000502 dialysis Methods 0.000 description 1
- 102000004419 dihydrofolate reductase Human genes 0.000 description 1
- 239000003085 diluting agent Substances 0.000 description 1
- SWSQBOPZIKWTGO-UHFFFAOYSA-N dimethylaminoamidine Natural products CN(C)C(N)=N SWSQBOPZIKWTGO-UHFFFAOYSA-N 0.000 description 1
- 235000004879 dioscorea Nutrition 0.000 description 1
- 239000002270 dispersing agent Substances 0.000 description 1
- 230000005782 double-strand break Effects 0.000 description 1
- 235000013399 edible fruits Nutrition 0.000 description 1
- 230000002500 effect on skin Effects 0.000 description 1
- 210000001671 embryonic stem cell Anatomy 0.000 description 1
- 108010026638 endodeoxyribonuclease FokI Proteins 0.000 description 1
- 230000009088 enzymatic function Effects 0.000 description 1
- 230000001036 exonucleolytic effect Effects 0.000 description 1
- 239000012091 fetal bovine serum Substances 0.000 description 1
- 230000001605 fetal effect Effects 0.000 description 1
- 210000000604 fetal stem cell Anatomy 0.000 description 1
- 210000003754 fetus Anatomy 0.000 description 1
- 210000002950 fibroblast Anatomy 0.000 description 1
- 230000002538 fungal effect Effects 0.000 description 1
- 238000003209 gene knockout Methods 0.000 description 1
- 238000007429 general method Methods 0.000 description 1
- 238000003205 genotyping method Methods 0.000 description 1
- 239000008103 glucose Substances 0.000 description 1
- 230000013595 glycosylation Effects 0.000 description 1
- 238000006206 glycosylation reaction Methods 0.000 description 1
- 210000003714 granulocyte Anatomy 0.000 description 1
- 230000012010 growth Effects 0.000 description 1
- 238000003306 harvesting Methods 0.000 description 1
- 210000003958 hematopoietic stem cell Anatomy 0.000 description 1
- 239000008241 heterogeneous mixture Substances 0.000 description 1
- HNDVDQJCIGZPNO-UHFFFAOYSA-N histidine Natural products OC(=O)C(N)CC1=CN=CN1 HNDVDQJCIGZPNO-UHFFFAOYSA-N 0.000 description 1
- 230000006801 homologous recombination Effects 0.000 description 1
- 238000002744 homologous recombination Methods 0.000 description 1
- 102000048160 human CCR5 Human genes 0.000 description 1
- 125000004435 hydrogen atom Chemical group [H]* 0.000 description 1
- 230000007062 hydrolysis Effects 0.000 description 1
- 238000006460 hydrolysis reaction Methods 0.000 description 1
- 230000002209 hydrophobic effect Effects 0.000 description 1
- 210000000987 immune system Anatomy 0.000 description 1
- 238000001114 immunoprecipitation Methods 0.000 description 1
- 238000002513 implantation Methods 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 230000002401 inhibitory effect Effects 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 210000004964 innate lymphoid cell Anatomy 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 239000000543 intermediate Substances 0.000 description 1
- 238000007918 intramuscular administration Methods 0.000 description 1
- 238000001990 intravenous administration Methods 0.000 description 1
- 229940065638 intron a Drugs 0.000 description 1
- 108010011989 karyopherin alpha 2 Proteins 0.000 description 1
- 210000002510 keratinocyte Anatomy 0.000 description 1
- 210000003292 kidney cell Anatomy 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 230000029226 lipidation Effects 0.000 description 1
- 239000002502 liposome Substances 0.000 description 1
- 238000009630 liquid culture Methods 0.000 description 1
- 210000005229 liver cell Anatomy 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 210000005265 lung cell Anatomy 0.000 description 1
- 210000002540 macrophage Anatomy 0.000 description 1
- 235000009973 maize Nutrition 0.000 description 1
- 210000001161 mammalian embryo Anatomy 0.000 description 1
- 240000004308 marijuana Species 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 230000010534 mechanism of action Effects 0.000 description 1
- 210000003593 megakaryocyte Anatomy 0.000 description 1
- 230000000442 meristematic effect Effects 0.000 description 1
- 230000002503 metabolic effect Effects 0.000 description 1
- 230000004060 metabolic process Effects 0.000 description 1
- 229910021645 metal ion Inorganic materials 0.000 description 1
- 235000019713 millet Nutrition 0.000 description 1
- 210000001616 monocyte Anatomy 0.000 description 1
- 239000000178 monomer Substances 0.000 description 1
- 125000004573 morpholin-4-yl group Chemical group N1(CCOCC1)* 0.000 description 1
- 210000004985 myeloid-derived suppressor cell Anatomy 0.000 description 1
- 230000007498 myristoylation Effects 0.000 description 1
- 239000002105 nanoparticle Substances 0.000 description 1
- 239000013642 negative control Substances 0.000 description 1
- 210000001178 neural stem cell Anatomy 0.000 description 1
- 230000006780 non-homologous end joining Effects 0.000 description 1
- 239000002777 nucleoside Substances 0.000 description 1
- 125000003835 nucleoside group Chemical group 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 239000001301 oxygen Substances 0.000 description 1
- 101150091418 pam1 gene Proteins 0.000 description 1
- 230000036961 partial effect Effects 0.000 description 1
- 230000006320 pegylation Effects 0.000 description 1
- 239000000816 peptidomimetic Chemical class 0.000 description 1
- 150000008298 phosphoramidates Chemical class 0.000 description 1
- SXADIBFZNXBEGI-UHFFFAOYSA-N phosphoramidous acid Chemical group NP(O)O SXADIBFZNXBEGI-UHFFFAOYSA-N 0.000 description 1
- 230000026731 phosphorylation Effects 0.000 description 1
- 238000006366 phosphorylation reaction Methods 0.000 description 1
- 230000037039 plant physiology Effects 0.000 description 1
- 210000001778 pluripotent stem cell Anatomy 0.000 description 1
- 229920002401 polyacrylamide Polymers 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- 229920002981 polyvinylidene fluoride Polymers 0.000 description 1
- 239000013641 positive control Substances 0.000 description 1
- 230000001124 posttranscriptional effect Effects 0.000 description 1
- 235000012015 potatoes Nutrition 0.000 description 1
- 230000003389 potentiating effect Effects 0.000 description 1
- 238000001556 precipitation Methods 0.000 description 1
- 230000002028 premature Effects 0.000 description 1
- 239000003755 preservative agent Substances 0.000 description 1
- 230000002335 preservative effect Effects 0.000 description 1
- 230000004952 protein activity Effects 0.000 description 1
- 230000004850 protein–protein interaction Effects 0.000 description 1
- 238000010379 pull-down assay Methods 0.000 description 1
- 230000002685 pulmonary effect Effects 0.000 description 1
- 235000015136 pumpkin Nutrition 0.000 description 1
- 239000013635 pyrimidine dimer Substances 0.000 description 1
- 239000002096 quantum dot Substances 0.000 description 1
- 239000000376 reactant Substances 0.000 description 1
- 239000011535 reaction buffer Substances 0.000 description 1
- 238000010188 recombinant method Methods 0.000 description 1
- 230000006798 recombination Effects 0.000 description 1
- 238000005215 recombination Methods 0.000 description 1
- 230000002829 reductive effect Effects 0.000 description 1
- 230000037425 regulation of transcription Effects 0.000 description 1
- 230000009712 regulation of translation Effects 0.000 description 1
- 230000008263 repair mechanism Effects 0.000 description 1
- 230000001177 retroviral effect Effects 0.000 description 1
- 235000009566 rice Nutrition 0.000 description 1
- 238000007480 sanger sequencing Methods 0.000 description 1
- 238000002864 sequence alignment Methods 0.000 description 1
- 238000004904 shortening Methods 0.000 description 1
- 230000007928 solubilization Effects 0.000 description 1
- 238000005063 solubilization Methods 0.000 description 1
- 210000004989 spleen cell Anatomy 0.000 description 1
- 239000003381 stabilizer Substances 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
- 235000019698 starch Nutrition 0.000 description 1
- 239000008107 starch Substances 0.000 description 1
- 210000002784 stomach Anatomy 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 238000007920 subcutaneous administration Methods 0.000 description 1
- 229910052717 sulfur Inorganic materials 0.000 description 1
- 125000004434 sulfur atom Chemical group 0.000 description 1
- 239000000375 suspending agent Substances 0.000 description 1
- 239000000725 suspension Substances 0.000 description 1
- 238000004114 suspension culture Methods 0.000 description 1
- 239000002562 thickening agent Substances 0.000 description 1
- 229940104230 thymidine Drugs 0.000 description 1
- 229940113082 thymine Drugs 0.000 description 1
- 230000005030 transcription termination Effects 0.000 description 1
- 108091006106 transcriptional activators Proteins 0.000 description 1
- 230000014621 translational initiation Effects 0.000 description 1
- QORWJWZARLRLPR-UHFFFAOYSA-H tricalcium bis(phosphate) Chemical compound [Ca+2].[Ca+2].[Ca+2].[O-]P([O-])([O-])=O.[O-]P([O-])([O-])=O QORWJWZARLRLPR-UHFFFAOYSA-H 0.000 description 1
- 241000701161 unidentified adenovirus Species 0.000 description 1
- 241001430294 unidentified retrovirus Species 0.000 description 1
- 235000013311 vegetables Nutrition 0.000 description 1
- 239000003981 vehicle Substances 0.000 description 1
- 238000001262 western blot Methods 0.000 description 1
- 108091005957 yellow fluorescent proteins Proteins 0.000 description 1
- 229910052725 zinc Inorganic materials 0.000 description 1
- 239000011701 zinc Substances 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K7/00—Peptides having 5 to 20 amino acids in a fully defined sequence; Derivatives thereof
- C07K7/04—Linear peptides containing only normal peptide links
- C07K7/06—Linear peptides containing only normal peptide links having 5 to 11 amino acids
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K7/00—Peptides having 5 to 20 amino acids in a fully defined sequence; Derivatives thereof
- C07K7/04—Linear peptides containing only normal peptide links
- C07K7/08—Linear peptides containing only normal peptide links having 12 to 20 amino acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
- C12N15/907—Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/01—Fusion polypeptide containing a localisation/targetting motif
- C07K2319/09—Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/20—Fusion polypeptide containing a tag with affinity for a non-protein ligand
- C07K2319/21—Fusion polypeptide containing a tag with affinity for a non-protein ligand containing a His-tag
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/20—Fusion polypeptide containing a tag with affinity for a non-protein ligand
- C07K2319/22—Fusion polypeptide containing a tag with affinity for a non-protein ligand containing a Strep-tag
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/50—Fusion polypeptide containing protease site
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2800/00—Nucleic acids vectors
- C12N2800/80—Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites
Definitions
- the present application contains a Sequence Listing that has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety.
- the ASCII copy, created on 11 Jan. 2019 is named CBI032-11_ST25.txt and is 2.2 MB in size.
- the present disclosure relates generally to engineered Class 1 Type I CRISPR-Cas (Cascade) systems that comprise multi-protein effector complexes, nucleoprotein complexes comprising Type I CRISPR-Cas subunit proteins and nucleic acid guides, polynucleotides encoding Type I CRISPR-Cas subunit proteins, and guide polynucleotides.
- the disclosure also relates to compositions and methods for making and using the engineered Type I CRISPR-Cas systems of the present invention.
- Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated proteins (Cas) constitute CRISPR-Cas systems.
- the CRISPR-Cas systems provide adaptive immunity against foreign polynucleotides in bacteria and archaea (see, e.g., Barrangou, R., et al., Science 315:1709-1712 (2007); Makarova, K. S., et al., Nature Reviews Microbiology 9:467-477 (2011); Garneau, J. E., et al., Nature 468:67-71 (2010); Sapranauskas, R., et al., Nucleic Acids Research 39:9275-9282 (2011); Koonin, E.
- Class 1 comprising a multiprotein effector complex (Type I (CRISPR-associated complex for antiviral defense (“Cascade”) effector complex), Type III (Cmr/Csm effector complex), and Type IV); and Class 2 comprising a single effector protein (Type II (Cas9), Type V (Cas12a, previously referred to as Cpf1), and Type VI (Cas13a, previously referred to as C2c2)).
- Type I is the most common and diverse, Type III is more common in archaea than in bacteria, and Type IV is the least common.
- Type I systems comprise the signature Cas3 protein.
- the Cas3 protein has helicase and DNase domains responsible for DNA target sequence cleavage.
- seven subtypes of the Type I system have been identified (i.e., Type I-A, I-B, I-C, I-D, I-E, I-F (and variants of I-F, e.g., I-Fv1 and I-Fv2), and I-U) that have a variable number of cas genes.
- Type I cas genes include, but are not limited to, the following: cas7, cas5, cas8, cse2, csa5, cas3, cas2, cas4, cas1, and cas6.
- Examples of organisms having Type I systems are as follows: I-A, Archaeoglobus fulgidus; I-B, Clostridium kluyveri; I-C, Bacillus halodurans; I-U, Geobacter sulfurreducens; I-D, Cyanothece sp. 8802; I-E, Escherichia coli K12; I-F, Yersinia pseudotuberculosis; I-F variant, Shewanella putrefaciens CN-32 (Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78 (2017)).
- Type I systems typically encode proteins that combine with a CRISPR RNA (crRNA or “guide RNA”) to form a Cascade complex. These complexes comprise multiple proteins and a crRNA, which are transcribed from the CRISPR locus.
- primary processing of a pre-crRNA is catalyzed by Cas6. This typically results in a crRNA with a 5′ handle of 8 nucleotides, a spacer region, and a 3′ handle; both 5′ and 3′ handles are derived from the repeat sequence.
- in some systems, the 3′ handle forms a stem-loop structure; in other systems, secondary processing of the 3′ end of crRNA is catalyzed by ribonuclease(s) (van der Oost, J., et al., Nature Reviews Microbiology 12:479-492 (2014)).
- the Cascade effector complexes of the Type I CRISPR-Cas systems comprise a backbone having paralogous Repeat-Associated Mysterious Proteins (RAMPs; e.g., Cas7 and Cas5 proteins) containing the RNA Recognition Motif (RRM) fold and additional “large” and “small” subunit proteins (see, e.g., Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78, FIG. 2 (2017)).
- These Cascade effector complexes typically have a Cas5 subunit protein and several Cas7 subunit proteins.
- Such Cascade effector complexes also comprise the guide RNA.
- the Cascade effector complexes comprise the various subunit proteins arranged in an asymmetric fashion along the length of the guide RNA.
- the Cas5 subunit protein and the large subunit protein (Cas8 protein) are positioned at one end of the complex, enveloping the 5′ end of the guide RNA.
- Several copies of the small subunit protein interact with the guide RNA backbone, which is bound to multiple copies of the Cas7 subunit protein.
- the Cas6 subunit protein, another RAMP protein, is associated with the Cascade effector complex primarily through association with the 3′ handle (repeat region) of the crRNA.
- the Cas6 subunit protein usually functions as the repeat-specific RNase involved in pre-crRNA processing; however, in Type I-C systems, Cas5 functions as the repeat-specific RNase and there is no Cas6.
- the adaptive immunity mechanism of action in the Type I CRISPR-Cas systems involves essentially three phases: adaptation, expression, and interference.
- in the adaptation phase, foreign DNA or RNA infects the host, and proteins encoded by various cas genes bind regions of the infecting DNA or RNA. Such regions are called protospacers.
- a protospacer adjacent motif (PAM) is a short nucleotide sequence (e.g., 2 to 6 base pair DNA sequence) that is adjacent to the protospacer.
- PAM sequences are typically recognized by a Cas1 subunit protein/Cas2 subunit protein complex, wherein the active PAM-sensing site is associated with the Cas1 subunit proteins (Jackson, S. A., et al., Science 356:356(6333) (2017)).
- the CRISPR array comprising multiple spacer-repeat elements is transcribed as a single transcript.
- Individual spacer repeat elements are processed by an endonuclease (e.g., Type I, a Cas6 protein; and Type I-C, a Cas5 protein) into individual crRNAs.
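
The processing step described above can be sketched in Python. This is a toy model under stated assumptions, not the disclosed method: the function name `process_pre_crrna` is hypothetical, the transcript is treated as a plain string of perfect repeats, and a single cut is assumed in each repeat so that its last 8 nucleotides become the 5′ handle of the downstream crRNA while the remainder becomes the 3′ handle of the upstream crRNA.

```python
def process_pre_crrna(pre_crrna, repeat, handle5_len=8):
    """Split a repeat-spacer-repeat transcript into individual crRNAs (toy model).

    Assumption: one cut per repeat, handle5_len nucleotides before the repeat's 3' end.
    """
    if not 0 < handle5_len < len(repeat):
        raise ValueError("handle5_len must be shorter than the repeat")
    # Splitting on the repeat leaves leader, spacers, and trailer;
    # only pieces flanked by a repeat on both sides are spacers.
    pieces = pre_crrna.split(repeat)
    spacers = pieces[1:-1]
    handle5 = repeat[-handle5_len:]   # tail of the upstream repeat
    handle3 = repeat[:-handle5_len]   # head of the downstream repeat
    return [
        {"handle5": handle5, "spacer": s, "handle3": handle3,
         "crrna": handle5 + s + handle3}
        for s in spacers
    ]
```

In this sketch every crRNA from the same array shares the same two handles and differs only in its spacer, which matches the statement that both handles are derived from the repeat sequence.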
- Cas subunit proteins are expressed and associate with the crRNA to form a Cascade effector complex.
- the Cascade effector complex scans foreign polynucleotides infecting the host to identify DNA complementary to the spacer.
- interference occurs when the effector complex identifies a sequence complementary to the spacer that is adjacent to a PAM; the Cas3 protein is then recruited to the DNA-bound Cascade effector complex to cleave and progressively digest the foreign polynucleotide.
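
As a rough illustration of the target search described above, the following Python sketch scans both strands of a DNA string for a PAM followed directly by a protospacer that matches the spacer. The assumptions are loud ones: the function names are hypothetical, the PAM `AAG` is only a placeholder, the PAM is assumed to sit immediately 5′ of the protospacer on the strand whose sequence matches the spacer, and only exact matches are reported.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna):
    """Reverse complement of a DNA string written 5'->3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_targets(dna, spacer_rna, pam="AAG"):
    """Return (strand, protospacer_start) for every PAM + protospacer match.

    The spacer base-pairs with the target strand, so the opposite strand
    carries the spacer sequence itself (with T in place of U).
    """
    protospacer = spacer_rna.upper().replace("U", "T")
    motif = pam + protospacer
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", revcomp(dna.upper()))):
        start = seq.find(motif)
        while start != -1:
            # Report where the protospacer begins on the scanned strand.
            hits.append((strand, start + len(pam)))
            start = seq.find(motif, start + 1)
    return hits
```

A sequence that matches the spacer but lacks the adjacent PAM yields no hit in this sketch, mirroring the requirement that the complementary sequence be adjacent to a PAM.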
- Makarova, K. S., et al., (Cell 168:946 (2017)) provide a summary of genes, homologs, Cascade complexes, and mechanisms of action for Type I CRISPR-Cas systems.
- Although CRISPR-Cas systems have been used for genome editing, there remains a need to improve editing efficiency and editing fidelity of these systems.
- the present invention generally relates to compositions comprising engineered Type I CRISPR-Cas effector complexes, modified guide polynucleotides, and combinations thereof.
- a first engineered Type I CRISPR-Cas effector complex comprising,
- a first Cse2 subunit protein, a first Cas5 subunit protein, a first Cas6 subunit protein, and a first Cas7 subunit protein
- a first fusion protein comprising a first Cas8 subunit protein and a first FokI, wherein the N-terminus of the first Cas8 subunit protein or the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the C-terminus or N-terminus, respectively, of the first FokI, and wherein the first linker polypeptide has a length of between 10 amino acids to 40 amino acids, and
- a first guide polynucleotide comprising a first spacer capable of binding a first nucleic acid target sequence
- a second engineered Type I CRISPR-Cas effector complex comprising,
- a second fusion protein comprising a second Cas8 subunit protein and a second FokI, wherein the N-terminus of the second Cas8 subunit protein or the C-terminus of the second Cas8 protein is covalently connected by a second linker polypeptide to the C-terminus or N-terminus, respectively, of the second FokI, and wherein the second linker polypeptide has a length of between 10 amino acids to 40 amino acids, and
- a second guide polynucleotide comprising a second spacer capable of binding a second nucleic acid target sequence, wherein a protospacer adjacent motif (PAM) of the second nucleic acid target sequence and a PAM of the first nucleic acid target sequence have an interspacer distance between 20 base pairs (bp) to 42 bp.
- the first linker polypeptide and/or the second linker polypeptide has a length of between about 15 amino acids and about 30 amino acids, or between about 17 amino acids and about 20 amino acids. In one embodiment, the first linker polypeptide and the second linker polypeptide have the same length.
- Interspacer distances between the second nucleic acid target sequence and the first nucleic acid target sequence include, but are not limited to, between about 22 bp to about 40 bp, between about 26 bp to about 36 bp, between about 29 bp to about 35 bp, or between about 30 bp to about 34 bp.
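
The interspacer ranges listed above are nested, which a short Python sketch can make explicit. The names below are hypothetical, the "about" qualifiers are ignored, and the endpoints are treated as inclusive, which is an assumption rather than something the text states.

```python
# Interspacer ranges (bp) taken from the text, broadest first.
INTERSPACER_RANGES_BP = {
    "20-42": (20, 42),
    "22-40": (22, 40),
    "26-36": (26, 36),
    "29-35": (29, 35),
    "30-34": (30, 34),
}


def matching_interspacer_ranges(distance_bp):
    """List the labels of every range that contains distance_bp (inclusive)."""
    return [
        label
        for label, (low, high) in INTERSPACER_RANGES_BP.items()
        if low <= distance_bp <= high
    ]
```

For example, a design with a 28 bp interspacer would fall inside the three broadest ranges but outside the two narrowest ones.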
- the first FokI and the second FokI can be monomeric subunits that are capable of associating to form a homodimer, or distinct subunits that are capable of associating to form a heterodimer.
- the N-terminus of the first Cas8 subunit protein is covalently connected by the first linker polypeptide to the C-terminus of the first FokI
- the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the N-terminus of the first FokI
- the N-terminus of the second Cas8 subunit protein is covalently connected by the second linker polypeptide to the C-terminus of the second FokI
- the C-terminus of the second Cas8 subunit protein is covalently connected by a second linker polypeptide to the N-terminus of the second FokI
- the first Cas8 subunit protein and the second Cas8 subunit protein can each comprise a Cas8 subunit protein having a different sequence or both the first and the second Cas8 subunit protein can comprise identical amino acid sequences.
- first Cse2 subunit protein and the second Cse2 subunit protein can each comprise different or identical Cse2 subunit protein amino acid sequences
- first Cas5 subunit protein and the second Cas5 subunit protein can each comprise different or identical Cas5 subunit protein amino acid sequences
- first Cas6 subunit protein and the second Cas6 subunit protein can each comprise different or identical Cas6 subunit protein amino acid sequences
- first Cas7 subunit protein and the second Cas7 subunit protein can each comprise different or identical Cas7 subunit protein amino acid sequences, and combinations thereof.
- the guide polynucleotides comprise RNA.
- FIG. 1A presents a generalized illustration of a Type I CRISPR-Cas effector complex.
- FIG. 1B presents a generalized illustration of a Type I CRISPR-Cas crRNA.
- FIG. 2A, FIG. 2B, and FIG. 2C present illustrative examples of two engineered Type I CRISPR-Cas effector complexes with fusion domains bound to neighboring spacer sequences.
- FIG. 3 presents information related to SEQ ID NO:1 to SEQ ID NO:351.
- FIG. 4A and FIG. 4B present examples of circularly permuted proteins.
- FIG. 5A, FIG. 5B, FIG. 6A, FIG. 6B, FIG. 7A, FIG. 7B, FIG. 7C, FIG. 8A, FIG. 8B, FIG. 9, FIG. 10, FIG. 11A, and FIG. 11B illustrate a variety of examples of engineered Type I CRISPR-Cas effector complexes of the present invention.
- FIG. 12A and FIG. 12B illustrate examples of substrate channels.
- FIG. 13A, FIG. 13B, and FIG. 13C present a generalized illustration of site-directed recruitment of a functional protein domain fused to a Cascade subunit protein by a dCas9:NATNA complex.
- FIG. 14A, FIG. 14B, FIG. 15A, FIG. 15B, and FIG. 15C illustrate examples of engineered Type I CRISPR-Cas effector complexes of the present invention.
- FIG. 16A, FIG. 16B, FIG. 16C, FIG. 17A, FIG. 17B, FIG. 17C, FIG. 18A, FIG. 18B, FIG. 18C, FIG. 19A, FIG. 19B, FIG. 19C, FIG. 19D, FIG. 20A, FIG. 20B, FIG. 21A, and FIG. 21B present examples of engineered Type I CRISPR-Cas effector complexes of the present invention and methods of use thereof.
- FIG. 22A, FIG. 22B, FIG. 22C, FIG. 22D, FIG. 23A, FIG. 23B, FIG. 23C, and FIG. 23D illustrate embodiments of the present invention that use a Cas3 protein comprising active endonuclease activity.
- FIG. 24A, FIG. 24B, FIG. 24C, FIG. 24D, FIG. 24E, FIG. 25, FIG. 26, FIG. 27, and FIG. 28 present schematic diagrams of a variety of Cascade component expression systems.
- FIG. 29, FIG. 30, FIG. 31, FIG. 32A, FIG. 32B, FIG. 33, FIG. 34A, FIG. 34B, and FIG. 35 present data related to genome editing of the engineered Cascade systems of the present invention.
- As used herein, “Cas protein,” “CRISPR-Cas protein,” “CRISPR-Cas subunit protein,” and “Cas subunit protein,” unless otherwise identified, all refer to Class 1 Type I CRISPR-Cas proteins.
- Cas subunit proteins are capable of interacting with one or more cognate polynucleotides (most typically, a crRNA) to form a Type I effector complex (most typically, a ribonucleoprotein complex).
- “Type I CRISPR-Cas effector complex,” “Cascade complex,” “Type I CRISPR-Cas nucleoprotein complex,” and “Type I complexes” are used interchangeably herein.
- the terms “Cascade RNP complex” and “Type I ribonucleoprotein (RNP) complex” refer to a Cascade complex specifically comprising a crRNA (versus a more generic guide polynucleotide, as described below).
- An example of a wild-type Type I CRISPR-Cas effector complex is illustrated in FIG. 1A.
- FIG. 1A is adapted from Makarova, K.
- FIG. 1A illustrates six Cas7 proteins, a Cas5 protein, a Cas8 protein, two Cse2 proteins, a Cas6 protein, and a crRNA associated as a Cascade complex.
- the complex is capable of binding a nucleic acid target sequence.
- the Cascade complex is capable of cleavage of a nucleic acid target sequence.
- the total number of some Cas subunit proteins can vary in Cascade complexes.
- “Cas3” and “Cas3 protein” are used interchangeably herein to refer to Type I CRISPR-Cas3 proteins, modifications, and variants thereof.
- the Type I CRISPR-Cas effector complexes bind foreign DNA complementary to the crRNA guide and recruit Cas3, a trans-acting nuclease-helicase required for target degradation.
- Cas3 proteins have motifs characteristic of helicases from superfamily 2 and contain a DEAD/DEAH box region and a conserved C-terminal domain.
- Cas3 proteins and variants thereof are known in the art (see, e.g., Westra, E. R., et al., Mol Cell.
- dCas3* is a mutated Cas3 protein that does not have any nuclease activity and/or helicase activity.
- “nuclease” refers to an enzyme capable of cleaving phosphodiester bonds, such as those connecting two nucleotides, as found in double-stranded (ds) nucleic acids (e.g., dsDNA, genomic DNA (gDNA), dsRNA), single-stranded (ss) nucleic acids (e.g., ssDNA, RNA), or hybrid dsRNA/DNA.
- An “endonuclease” typically can effect ss- (nicks) or ds-breaks in its target molecules.
- One example of a DNA endonuclease is a FokI enzyme.
- “FokI endonuclease” and “FokI” are used interchangeably herein and refer to a FokI enzyme, FokI homologs, enzymatically active domain(s) of FokI enzymes, and variants of FokI enzymes. FokI dimerization is typically required for DNA cleavage.
- Dimers of FokI can comprise two monomeric subunits that associate to form a homodimer or two distinct monomeric subunits that associate to form a heterodimer (see, e.g., Bitinaite, J., et al., Proceedings of the National Academy of Sciences 95(18):10570-10575 (1998); Ramalingam, S., et al., Journal of Molecular Biology, 405(3):630-641 (2011)).
- One example of a FokI variant is the Sharkey variant described by Guo, et al. (Guo, J., et al., J. Mol. Biol. 400:96-107 (2010)). Additional DNA and RNA nucleases are known in the art.
- CRISPR RNA refers to one or more RNAs with which Cas subunit proteins are capable of interacting to form a Type I effector complex that guides the complex to preferentially bind a nucleic acid target sequence in a polynucleotide (relative to a polynucleotide that does not comprise the nucleic acid target sequence).
- Guide polynucleotide refers to the polynucleotide component of Type I effector complexes and can comprise ribonucleotide bases (e.g., RNA), deoxyribonucleotide bases (e.g., DNA), combinations of ribonucleotide bases and deoxyribonucleotide bases, nucleotides, nucleotide analogs, modified nucleotides, and the like, as well as synthetic, naturally occurring, and non-naturally occurring modified backbone residues or linkages, for example, as described herein.
- An example of a Type I CRISPR-Cas crRNA associated with a nucleic acid target sequence through the crRNA spacer is illustrated in FIG. 1B.
- FIG. 1B is adapted from Hochstrasser, M. L., et al., Molecular Cell 63(5):840-851 (2016).
- In FIG. 1B, the PAM associated with the nucleic acid target sequence and the 5′ and 3′ strands of a double-stranded nucleic acid are illustrated (FIG. 1B, vertical lines represent hydrogen bonds).
- a guide polynucleotide typically comprises a 5′ handle region (FIG. 1B, 5′ Handle Region), a spacer region (FIG. 1B, Spacer)
- FIG. 1B illustrates the Cascade complex spacer bound to the nucleic acid target sequences (FIG. 1B, vertical lines represent hydrogen bonds).
- FIG. 1B also illustrates the protospacer region (FIG. 1B, protospacer).
- the spacer can comprise a region of the crRNA of between about 6 and about 56 nucleotides, wherein the spacer is complementary to a nucleic acid target sequence in a polynucleotide.
- the spacer length can be modified to fine-tune Cascade activity in Type I-E CRISPR-Cas systems.
- Cascade complexes can incorporate an extra Cas7 subunit with every 6 nucleotides added to the crRNA spacer and an extra Cse2 subunit with every 12 nucleotides added to the spacer (Luo, M. L., et al., Nucleic Acids Research 44(15):7385-7394 (2016)).
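
The subunit arithmetic in the preceding sentence can be written out as a small Python sketch. The baseline is an assumption made for illustration: a 32-nucleotide spacer paired with the six Cas7 and two Cse2 proteins shown for the complex in FIG. 1A. The function name is hypothetical, and only spacer extension is modeled because the text describes added nucleotides only.

```python
BASE_SPACER_NT = 32  # assumed baseline spacer length
BASE_CAS7 = 6        # Cas7 copies at baseline (assumed pairing with 32 nt)
BASE_CSE2 = 2        # Cse2 copies at baseline (assumed pairing with 32 nt)


def cascade_stoichiometry(spacer_len):
    """Estimate Cas7 and Cse2 copy numbers for an extended spacer."""
    added = spacer_len - BASE_SPACER_NT
    if added < 0:
        raise ValueError("this sketch only models spacers extended past the baseline")
    return {
        "Cas7": BASE_CAS7 + added // 6,    # one extra Cas7 per 6 nt added
        "Cse2": BASE_CSE2 + added // 12,   # one extra Cse2 per 12 nt added
    }
```

Under these assumptions, a 44-nucleotide spacer (12 nucleotides added) would correspond to eight Cas7 and three Cse2 subunits.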
- the spacer typically comprises a region of between about 32 and about 36 nucleotides.
- “Spacer,” “spacer sequence,” and “nucleic acid target binding sequence” are used interchangeably herein.
- a “stem element” or “stem structure” refers to two strands of nucleic acids that are known or predicted to form a double-stranded region (the “stem element”).
- a “stem-loop element” or “stem-loop structure” refers to a stem structure wherein 3′-end sequences of one strand are covalently bonded to 5′-end sequences of the second strand by a nucleotide sequence of typically single-stranded nucleotides (“a stem-loop element nucleotide sequence”).
- the loop element comprises a loop element nucleotide sequence of between about 3 and about 20 nucleotides in length, preferably between about 4 and about 10 nucleotides in length.
- a loop element nucleotide sequence is a single-stranded nucleotide sequence of unpaired nucleic acid bases that do not interact through hydrogen bond formation to create a stem element within the loop element nucleotide sequence.
- the term “hairpin element” is also used herein to refer to stem-loop structures. Such structures are well known in the art.
- the base pairing may be exact; however, as is known in the art, a stem element does not require exact base pairing.
- the stem element may include one or more base mismatches or non-paired bases.
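
The stem-loop definitions above can be illustrated with a minimal Python check. It is a simplification under stated assumptions: the function name is hypothetical, the stem arms are taken from the two ends of the sequence, only Watson-Crick pairs count as paired (no G-U wobble), and the loop length window of 3 to 20 nucleotides follows the range given in the text.

```python
_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_stem_loop(rna, stem_len, max_mismatches=0, loop_bounds=(3, 20)):
    """Check whether rna can be read as 5'-arm / loop / 3'-arm.

    max_mismatches lets the stem tolerate imperfect base pairing.
    """
    loop_len = len(rna) - 2 * stem_len
    if stem_len < 1 or not loop_bounds[0] <= loop_len <= loop_bounds[1]:
        return False
    arm5 = rna[:stem_len]
    arm3 = rna[-stem_len:]
    # Position i of the 5' arm pairs with position -1-i of the 3' arm.
    mismatches = sum((a, b) not in _PAIRS for a, b in zip(arm5, reversed(arm3)))
    return mismatches <= max_mismatches
```

Raising `max_mismatches` above zero reflects the point that a stem element does not require exact base pairing.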
- An example of a stem-loop structure in a guide polynucleotide is illustrated in FIG. 1B.
- linker element nucleotide sequence refers to either a single-stranded nucleic acid sequence or a double-stranded nucleic acid sequence of one or more nucleotides covalently attached to a first nucleic acid sequence (e.g., 5′-linker nucleotide sequence-first nucleic acid sequence-3′).
- a linker nucleotide sequence connects two separate nucleic acid sequences to form a single polynucleotide (e.g., 5′-first nucleic acid sequence-linker nucleotide sequence-second nucleic acid sequence-3′).
- linker nucleotide sequences include, but are not limited to, 5′-first nucleic acid sequence-linker nucleotide sequence-3′ and 5′-linker nucleotide sequence-first nucleic acid sequence-linker nucleotide sequence-3′.
- the linker element nucleotide sequence can be a single-stranded nucleotide sequence of unpaired nucleic acid bases that do not interact with each other through hydrogen bond formation to create a secondary structure (e.g., a stem-loop structure) within the linker element nucleotide sequence.
- two linker element nucleotide sequences can interact with each other through hydrogen bonding between the two linker element nucleotide sequences.
- a linker polynucleotide encodes a “linker polypeptide.”
- Such a linker polynucleotide typically connects the 3′ end of a first polynucleotide encoding a first polypeptide to the 5′ end of a second polynucleotide encoding a second polypeptide to form a single polynucleotide that encodes a fusion protein comprising N—the first polypeptide—the linker polypeptide—the second polypeptide—C.
- linker polypeptides e.g., N-a first polypeptide-a first linker polypeptide-a second polypeptide-a second linker polypeptide-a third polypeptide-C.
- “Linker polypeptide,” “linker polypeptide sequence,” “amino acid linker sequence,” and “linker sequence” are used interchangeably herein.
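
The N-to-C ordering of a Cas8-FokI fusion protein can be sketched in Python as simple string assembly. Everything here is illustrative: the function name and the one-letter amino acid strings used with it are placeholders rather than sequences from the Sequence Listing, and the 10 to 40 amino acid linker window is treated as inclusive, which is an assumption.

```python
def build_fusion(cas8, foki, linker, foki_at_n_terminus=True):
    """Assemble a fusion protein sequence written N-terminus to C-terminus.

    foki_at_n_terminus=True  -> N-FokI-linker-Cas8-C
        (the N-terminus of Cas8 is joined to the C-terminus of FokI)
    foki_at_n_terminus=False -> N-Cas8-linker-FokI-C
        (the C-terminus of Cas8 is joined to the N-terminus of FokI)
    """
    if not 10 <= len(linker) <= 40:
        raise ValueError("linker length outside the 10 to 40 amino acid window")
    parts = (foki, linker, cas8) if foki_at_n_terminus else (cas8, linker, foki)
    return "".join(parts)
```

The two orientations differ only in which protein contributes the free N-terminus of the fusion.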
- a “connecting nucleotide sequence” refers to a single-stranded nucleic acid linker sequence that covalently connects a first nucleic acid sequence and a second nucleic acid sequence.
- interspacer As used herein, the terms “interspacer,” “interspacer region,” and “interspacer distance” are used interchangeably and refer to the distance between a PAM of a first nucleic acid target sequence (e.g., a first DNA target sequence) and a PAM of a second nucleic acid target sequence (e.g., a second DNA target sequence) typically in a PAM-in orientation, wherein a first Type I CRISPR-Cas effector complex comprises a first spacer capable of binding the first nucleic acid target sequence, and a second Type I CRISPR-Cas effector complex comprises a second spacer capable of binding the second nucleic acid target sequence.
- FIG. 1 presents an illustrative example of two Type I CRISPR-Cas effector complexes (“Cascade1” comprising “crRNA1” and “Cascade2” comprising “crRNA2”) comprising fusion proteins (“FP1” and “FP2”; e.g., FokI) connected with each Cascade complex through linker polynucleotides (“Linker1” and “Linker2”), wherein the CRISPR-Cas effector complexes are bound to neighboring nucleic acid target sequences on double-stranded DNA (“dsDNA”).
- PAM sequences associated with each nucleic acid target sequence are indicated (“PAM1,” open box, and “PAM2,” open box).
- FIG. 2A illustrates an interspacer (shown as a double-arrowheaded line) between two target sites in a PAM-in (PAM-in/PAM-in) configuration.
- FIG. 2B illustrates an interspacer (shown as a double-arrowheaded line) between two target sites in a PAM-in/PAM-out configuration.
- FIG. 2C illustrates an interspacer between two target sites in the PAM-out (PAM-out/PAM-out) configuration.
- FIG. 2A, FIG. 2B, and FIG. 2C also illustrate the separation of the two strands of the dsDNA.
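The interspacer distance between two PAMs can be illustrated with a short calculation. The sketch below is a simplification under stated assumptions: each PAM is given as a half-open (start, end) interval in the coordinates of one strand, and the distance is measured edge to edge between the two PAMs. The function name and the coordinate convention are hypothetical and are not defined in this disclosure.

```python
# Hypothetical sketch: number of base pairs lying between two PAMs.
def interspacer_length(pam1, pam2):
    """Return the count of positions between two non-overlapping PAM intervals.

    Each PAM is a (start, end) tuple, 0-based and half-open, on the same
    coordinate axis. The order in which the PAMs are passed does not matter.
    """
    (start_a, end_a), (start_b, end_b) = sorted([pam1, pam2])
    if start_b < end_a:
        raise ValueError("PAM intervals overlap")
    return start_b - end_a


# Illustrative coordinates only.
gap = interspacer_length((10, 13), (43, 46))
```

Which PAM edges bound the interspacer in a given configuration (PAM-in, PAM-out, or mixed) depends on the orientation of the two target sites, so the intervals passed in should be chosen accordingly.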
- a Cascade complex recognizes a dsDNA target sequence adjacent a PAM. PAM sequences are recognized by Cse1. Base pairing between the crRNA and complementary target DNA strand results in an R-loop with the displaced non-complementary target DNA strand (Beloglazova, N., et al., Nucleic Acids Research 43(1):530-543 (2015)).
- cognate typically refers to a group of Cas subunit proteins (e.g., Cse2, Cas5, Cas6, Cas7, and Cas8) and one or more guide polynucleotides (e.g., a Type I CRISPR-Cas RNA) that are capable of forming a nucleoprotein complex capable of site-directed binding to a nucleic acid target sequence complementary to a spacer present in one of the one or more guide polynucleotides.
- “Wild-type,” “naturally occurring,” and “unmodified” are used herein to mean the typical (or most common) form, appearance, phenotype, or strain existing in nature; for example, the typical form of cells, organisms, polynucleotides, proteins, macromolecular complexes, genes, RNAs, DNAs, or genomes as they occur in, and can be isolated from, a source in nature.
- the wild-type form, appearance, phenotype, or strain serves as the original parent before an intentional modification.
- mutant, variant, engineered, recombinant, and modified forms are not wild-type forms.
- As used herein, the terms “engineered,” “genetically engineered,” “recombinant,” “modified,” “non-naturally occurring,” “non-natural,” and “non-native” are interchangeable and indicate intentional human manipulation.
- “Covalent bond,” “covalently attached,” “covalently bound,” “covalently linked,” “covalently connected,” and “molecular bond” are used interchangeably herein and refer to a chemical bond that involves the sharing of electron pairs between atoms.
- covalent bonds include, but are not limited to, phosphodiester bonds, phosphorothioate bonds, disulfide bonds and peptide bonds (—CO—NH—).
- “Non-covalent bond,” “non-covalently attached,” “non-covalently bound,” “non-covalently linked,” “non-covalent interaction,” and “non-covalently connected” are used interchangeably herein and refer to any relatively weak chemical bond that does not involve sharing of a pair of electrons. Multiple non-covalent bonds often stabilize the conformation of macromolecules and mediate specific interactions between molecules. Examples of non-covalent bonds include, but are not limited to, hydrogen bonding, ionic interactions (e.g., Na⁺Cl⁻), van der Waals interactions, and hydrophobic bonds.
- As used herein, “hydrogen bonding,” “hydrogen-base pairing,” and “hydrogen bonded” are used interchangeably and refer to canonical hydrogen bonding and non-canonical hydrogen bonding including, but not limited to, “Watson-Crick-hydrogen-bonded base pairs” (W—C-hydrogen-bonded base pairs or W—C hydrogen bonding); “Hoogsteen-hydrogen-bonded base pairs” (Hoogsteen hydrogen bonding); and “wobble-hydrogen-bonded base pairs” (wobble hydrogen bonding).
- W—C hydrogen bonding refers to purine-pyrimidine base pairing, e.g., adenine:thymine, guanine:cytosine, and uracil:adenine.
- Hoogsteen hydrogen bonding refers to a variation of base pairing in nucleic acids wherein two nucleobases, one on each strand, are held together by hydrogen bonds in the major groove. This non-W—C hydrogen bonding can allow a third strand to wind around a duplex and form triple-stranded helices.
- Wobble hydrogen bonding refers to a pairing between two nucleotides in RNA molecules that does not follow Watson-Crick base pair rules. There are four major wobble base pairs: guanine:uracil, inosine (hypoxanthine):uracil, inosine:adenine, and inosine:cytosine. Rules for canonical hydrogen bonding and non-canonical hydrogen bonding are known to those of ordinary skill in the art (see, e.g., The RNA World, Third Edition (Cold Spring Harbor Monograph Series), R. F.
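The Watson-Crick and wobble pairs listed above can be captured in a small lookup. The following is a minimal sketch that covers only the pairs named in these definitions; Hoogsteen pairing depends on base geometry and is not modeled. The function name `classify_pair` and the single-letter code `I` for inosine are assumptions made for illustration.

```python
# Hypothetical sketch: classify a base pair using only the pairs named above.
WATSON_CRICK = {frozenset("AT"), frozenset("GC"), frozenset("AU")}
WOBBLE = {frozenset("GU"), frozenset("IU"), frozenset("IA"), frozenset("IC")}


def classify_pair(base1, base2):
    """Return 'Watson-Crick', 'wobble', or 'other' for two single-letter bases."""
    pair = frozenset((base1.upper(), base2.upper()))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "other"
```

Using unordered sets means the result does not depend on which strand each base comes from.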
- “Connect,” “connected,” and “connecting” are used interchangeably herein and refer to a covalent bond or a non-covalent bond between two macromolecules (e.g., polynucleotides, proteins, and the like).
- As used herein, the terms “nucleic acid sequence,” “nucleotide sequence,” and “oligonucleotide” are interchangeable and refer to a polymeric form of nucleotides.
- polynucleotide refers to a polymeric form of nucleotides that has one 5′ end and one 3′ end, and can comprise one or more nucleic acid sequences.
- a “circular polynucleotide” refers to a polynucleotide having a covalent bond between its 5′ end and 3′ end, thus forming the circular polynucleotide.
- the nucleotides may be deoxyribonucleotides (DNA), ribonucleotides (RNA), analogs thereof, or combinations thereof, and may be of any length.
- Polynucleotides may perform any function and may have various secondary and tertiary structures.
- the terms encompass known analogs of natural nucleotides and nucleotides that are modified in the base, sugar, and/or phosphate moieties. Analogs of a particular nucleotide have the same base-pairing specificity (e.g., an analog of A base pairs with T).
- a polynucleotide may comprise one modified nucleotide or multiple modified nucleotides.
- modified nucleotides include, but are not limited to, fluorinated nucleotides, methylated nucleotides, and nucleotide analogs.
- Nucleotide structure may be modified before or after a polymer is assembled. Following polymerization, polynucleotides may be additionally modified via, for example, conjugation with a labeling component or target binding component.
- a nucleotide sequence may incorporate non-nucleotide components. Also encompassed are nucleic acids comprising modified backbone residues or linkages, that are synthetic, naturally occurring, and/or non-naturally occurring, and have similar binding properties as a reference polynucleotide (e.g., DNA or RNA).
- Examples of such analogs include, but are not limited to, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), Locked Nucleic Acid (LNA™) (Exiqon, Inc., Woburn, Mass.) nucleosides, glycol nucleic acid, bridged nucleic acids, and morpholino structures.
- PNAs are synthetic homologs of nucleic acids wherein the polynucleotide phosphate-sugar backbone is replaced by a flexible pseudo-peptide polymer, and nucleobases are linked to the polymer.
- PNAs have the capacity to hybridize with high affinity and specificity to complementary sequences of RNA and DNA.
- the phosphorothioate (PS) bond substitutes a sulfur atom for a non-bridging oxygen in the polynucleotide phosphate backbone. This modification makes the internucleotide linkage resistant to nuclease degradation.
- phosphorothioate bonds are introduced between the last 3 to 5 nucleotides at the 5′-end or 3′-end sequences of a polynucleotide sequence to inhibit exonuclease degradation. Placement of phosphorothioate bonds throughout an entire oligonucleotide helps reduce degradation by endonucleases, as well.
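Placement of terminal phosphorothioate bonds can be sketched as a string annotation. The example below is illustrative only: it assumes an asterisk written between two nucleotides marks a phosphorothioate linkage (a notation used by some oligonucleotide suppliers), and it counts linkages rather than nucleotides. The function name and the default of three linkages per end are hypothetical choices, not values specified by this disclosure.

```python
# Hypothetical sketch: mark phosphorothioate (PS) linkages at the ends of an oligo.
def add_terminal_ps(seq, n_bonds=3, five_prime=True, three_prime=True):
    """Insert '*' after nucleotide i when linkage i (between i and i+1) is PS."""
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    n_links = len(seq) - 1  # an oligo of length L has L-1 internucleotide linkages
    ps_links = set()
    if five_prime:
        ps_links.update(range(0, min(n_bonds, n_links)))
    if three_prime:
        ps_links.update(range(max(n_links - n_bonds, 0), n_links))
    out = []
    for i, base in enumerate(seq):
        out.append(base)
        if i in ps_links:
            out.append("*")
    return "".join(out)


# Illustrative sequence only.
protected = add_terminal_ps("ACGTACGTAC", n_bonds=3)
```

Setting `n_bonds` to at least the number of linkages marks every linkage, which corresponds to the fully phosphorothioated case mentioned for endonuclease resistance.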
- Threose nucleic acid is an artificial genetic polymer.
- the backbone structure of TNA comprises repeating threose sugars linked by phosphodiester bonds.
- TNA polymers are resistant to nuclease degradation.
- TNA can self-assemble by base-pair hydrogen bonding into duplex structures.
- Linkage inversions can be introduced into polynucleotides through use of “reversed phosphoramidites” (see, e.g., www.ucalgary.ca/dnalab/synthesis/-modifications/linkages).
- a 3′-3′ linkage at a terminus of a polynucleotide stabilizes the polynucleotide to exonuclease degradation by creating an oligonucleotide having two 5′-OH termini but lacking a 3′-OH terminus.
- such polynucleotides have phosphoramidite groups on the 5′-OH position and a dimethoxytrityl (DMT) protecting group on the 3′-OH position. Normally, the DMT protecting group is on the 5′-OH and the phosphoramidite is on the 3′-OH.
- Polynucleotide sequences are displayed herein in the conventional 5′ to 3′ orientation unless otherwise indicated.
- sequence identity generally refers to the percent identity of nucleotide bases or amino acids comparing a first polynucleotide or polypeptide to a second polynucleotide or polypeptide using algorithms having various weighting parameters.
- Sequence identity between two polynucleotides or two polypeptides can be determined using sequence alignment by various methods and computer programs (e.g., BLAST, CS-BLAST, PSI-BLAST, FASTA, HMMER, L-ALIGN, and the like) available through the worldwide web at sites including, but not limited to, GENBANK (www.ncbi.nlm.nih.gov/genbank/) and EMBL-EBI (www.ebi.ac.uk). Sequence identity between two polynucleotides or two polypeptide sequences is generally calculated using the standard default parameters of the various methods or computer programs.
- a high degree of sequence identity, as used herein, between two polynucleotides or two polypeptides is typically between about 90% identity and 100% identity, for example, about 90% identity or higher, preferably about 95% identity or higher, more preferably about 98% identity or higher.
- a moderate degree of sequence identity, as used herein, between two polynucleotides or two polypeptides is typically between about 80% identity and about 85% identity, for example, about 80% identity or higher, preferably about 85% identity.
- a low degree of sequence identity, as used herein, between two polynucleotides or two polypeptides is typically between about 50% identity and 75% identity, for example, about 50% identity, preferably about 60% identity, more preferably about 75% identity.
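Percent identity and the three tiers described above can be illustrated with a short calculation. This is a minimal sketch that assumes the two sequences have already been aligned (for example, by one of the programs named above) and are passed as equal-length strings with `-` for gaps. The function names are hypothetical, and the hard cutoffs at 90%, 80%, and 50% are a simplification of the "about" ranges in the text; in particular, values between the stated ranges are assigned to the lower adjacent tier as an assumption.

```python
# Hypothetical sketch: percent identity over a precomputed pairwise alignment.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return identical columns as a percentage of all aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def identity_tier(pct):
    """Map a percent identity to a tier using simplified, assumed cutoffs."""
    if pct >= 90:
        return "high"
    if pct >= 80:
        return "moderate"
    if pct >= 50:
        return "low"
    return "below low"
```

Different alignment programs and scoring parameters can yield different alignments, and therefore different percent identities, for the same pair of sequences.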
- a Cas protein (e.g., Type I-E Cse2, Cas5, Cas6, Cas7, and/or Cas8) comprising amino acid substitutions can have a low degree of sequence identity, a moderate degree of sequence identity, or a high degree of sequence identity over its length to a reference Cas protein (e.g., wild-type Type I-E Cse2, Cas5, Cas6, Cas7, and/or Cas8, respectively).
- a guide polynucleotide can have a low degree of sequence identity, a moderate degree of sequence identity, or a high degree of sequence identity over its length compared with a reference wild-type guide polynucleotide that complexes with the reference Cas proteins (e.g., a guide polynucleotide that forms a complex with a Type I-E Cse2, Cas5, Cas6, Cas7, and/or Cas8).
- hybridization is the process of combining two complementary single-stranded DNA or RNA molecules so as to form a single double-stranded molecule (DNA/DNA, DNA/RNA, RNA/RNA) through hydrogen base pairing.
- Hybridization stringency is typically determined by the hybridization temperature and the salt concentration of the hybridization buffer; e.g., high temperature and low salt provide high stringency hybridization conditions. Examples of salt concentration ranges and temperature ranges for different hybridization conditions are as follows: high stringency, approximately 0.01M to approximately 0.05M salt, hybridization temperature 5° C. to 10° C.
- Tm of duplex nucleic acid sequences is calculated by standard methods well known in the art (see, e.g., Maniatis, T., et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press: New York (1982); Casey, J., et al., Nucleic Acids Research 4:1539-1552 (1977); Bodkin, D.
- High stringency conditions for hybridization typically refer to conditions under which a polynucleotide complementary to a target sequence predominantly hybridizes with the target sequence and substantially does not hybridize to non-target sequences.
- hybridization conditions are of moderate stringency, preferably high stringency.
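One widely used rule of thumb for estimating the melting temperature of a short DNA oligonucleotide is the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). The sketch below applies that rule. It is offered only as an illustration and is not necessarily the method used in the references cited above; it ignores salt concentration, strand concentration, and duplex length effects, and the function name is hypothetical.

```python
# Hypothetical sketch: Wallace-rule Tm estimate for a short DNA oligonucleotide.
def wallace_tm(seq):
    """Return an approximate Tm in degrees Celsius: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

For longer duplexes or for buffer-specific estimates, nearest-neighbor thermodynamic methods are generally preferred over this approximation.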
- complementarity refers to the ability of a nucleic acid sequence to form hydrogen bond(s) with another nucleic acid sequence (e.g., through canonical Watson-Crick base pairing). A percent complementarity indicates the percentage of residues in a nucleic acid sequence that can form hydrogen bonds with a second nucleic acid sequence. If two nucleic acid sequences have 100% complementarity, the two sequences are perfectly complementary, i.e., all of the contiguous residues of a first polynucleotide hydrogen bond with the same number of contiguous residues in a second polynucleotide.
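Percent complementarity as defined above can be sketched for two equal-length strands. This minimal example assumes both strands are written 5′ to 3′, pairs them antiparallel without gaps, and counts only Watson-Crick pairs; the function name is hypothetical.

```python
# Hypothetical sketch: percent of positions forming Watson-Crick pairs.
WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementarity(strand_5to3, other_5to3):
    """Pair strand_5to3 against the reverse of other_5to3 and score the pairs."""
    if len(strand_5to3) != len(other_5to3) or not strand_5to3:
        raise ValueError("strands must be non-empty and of equal length")
    # Reverse the second strand so that position i of each strand faces the other.
    facing = other_5to3.upper()[::-1]
    paired = sum((a, b) in WC_PAIRS for a, b in zip(strand_5to3.upper(), facing))
    return 100.0 * paired / len(strand_5to3)
```

A result of 100.0 corresponds to the perfectly complementary case described in the definition.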
- binding refers to a non-covalent interaction between macromolecules (e.g., between a protein and a polynucleotide, between a polynucleotide and a polynucleotide, between a protein and a protein, and the like). Such non-covalent interaction is also referred to as “associating” or “interacting” (e.g., if a first macromolecule interacts with a second macromolecule, the first macromolecule binds to second macromolecule in a non-covalent manner).
- Sequence-specific binding typically refers to one or more guide polynucleotides capable of forming a complex with Type I CRISPR-Cas subunit proteins (e.g., Cse2, Cas5, Cas6, Cas7, and Cas8) to cause the protein to bind a nucleic acid sequence (e.g., a DNA sequence) comprising a nucleic acid target sequence (e.g., a DNA target sequence) preferentially relative to a second nucleic acid sequence (e.g., a second DNA sequence) without the nucleic acid target binding sequence (e.g., the DNA target binding sequence).
- Binding interactions can be characterized by a dissociation constant (Kd). “Binding affinity” refers to the strength of the binding interaction. An increased binding affinity is correlated with a lower Kd.
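The inverse relationship between Kd and binding affinity can be made concrete with the standard single-site binding expression, fraction bound = [L] / ([L] + Kd). The sketch below assumes simple 1:1 binding at equilibrium with the free ligand concentration approximately equal to the total; the function name is hypothetical.

```python
# Hypothetical sketch: equilibrium occupancy for simple 1:1 binding.
def fraction_bound(ligand_conc, kd):
    """Return the fraction of binding sites occupied at a given ligand concentration.

    ligand_conc and kd must be in the same concentration units.
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (ligand_conc + kd)
```

At a ligand concentration equal to Kd, half of the sites are occupied, and at any fixed concentration a lower Kd gives a higher occupied fraction.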
- effector complexes are said to “target” a polynucleotide if such a complex binds or cleaves a polynucleotide in the nucleic acid target sequence within the polynucleotide.
- a “double-strand break” refers to both strands of a double-stranded segment of DNA being severed. In some instances, if such a break occurs, one strand can be said to have a “sticky end” wherein nucleotides are exposed and not hydrogen bonded to nucleotides on the other strand. In other instances, a “blunt end” can occur wherein both strands remain fully base paired with each other.
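The distinction between sticky and blunt ends can be sketched by comparing where each strand is cut. This is an illustrative simplification that assumes both cut positions are expressed in the same top-strand coordinates (the index of the boundary between two adjacent base pairs); the function name is hypothetical.

```python
# Hypothetical sketch: classify the ends left by a double-strand break.
def classify_dsb_ends(top_cut, bottom_cut):
    """Return ('blunt', 0) or ('sticky', overhang_length) for two cut positions."""
    overhang = abs(top_cut - bottom_cut)
    if overhang == 0:
        return ("blunt", 0)
    return ("sticky", overhang)
```

The overhang length is the number of unpaired nucleotides exposed on each end when the two strands are cut at staggered positions.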
- Donor polynucleotide can be a double-stranded polynucleotide (e.g., DNA), a single-stranded polynucleotide (e.g., DNA or RNA), or a combination thereof.
- Donor polynucleotides can comprise homology arms flanking the insertion sequence (e.g., DSBs in the DNA). The homology arms on each side can vary in length (e.g., 1-50 bases, 50-100 bases, 100-200 bases, 200-300 bases, 300-500 bases, 500-1000 bases). Homology arms can be symmetric or asymmetric in length.
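The arrangement of homology arms around an insertion sequence can be sketched as simple string slicing. The example below is a minimal illustration, assuming the genomic region is available as a string, the break site is given as an index into it, and the arms are taken directly from the sequence on either side of the break. The function name, the arm lengths, and the example sequences are hypothetical.

```python
# Hypothetical sketch: assemble left arm + insertion sequence + right arm.
def build_donor(genomic_seq, cut_index, insert_seq, left_len, right_len):
    """Return a donor whose arms match the sequence flanking cut_index."""
    if not 0 <= cut_index <= len(genomic_seq):
        raise ValueError("cut_index is outside the genomic sequence")
    if cut_index - left_len < 0 or cut_index + right_len > len(genomic_seq):
        raise ValueError("homology arm extends past the available sequence")
    left_arm = genomic_seq[cut_index - left_len:cut_index]
    right_arm = genomic_seq[cut_index:cut_index + right_len]
    return left_arm + insert_seq + right_arm


# Illustrative sequences only; arms may be symmetric or asymmetric in length.
donor = build_donor("AAAACCCCGGGGTTTT", 8, "NNN", left_len=4, right_len=4)
```

Passing different values for `left_len` and `right_len` produces the asymmetric-arm case mentioned above.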
- HDR refers to DNA repair that takes place in cells, for example, during repair of a DSB in genomic DNA.
- HDR requires nucleotide sequence homology and uses a donor or template polynucleotide to repair the sequence wherein the DSB (e.g., within a DNA target sequence) occurred.
- the donor polynucleotide generally has the requisite sequence homology with the sequence flanking the DSB so that the donor polynucleotide can serve as a suitable template for repair.
- HDR results in the transfer of genetic information from, for example, the donor polynucleotide to the DNA target sequence.
- HDR may result in alteration of the DNA target sequence (e.g., insertion, deletion, or mutation) if the donor polynucleotide sequence differs from the DNA target sequence and part or all of the donor polynucleotide is incorporated into the DNA target sequence.
- an entire donor polynucleotide, a portion of the donor polynucleotide, or a copy of the donor polynucleotide is integrated at the site of the DNA target sequence.
- a donor polynucleotide can be used for repair of the break in the DNA target sequence, wherein the repair results in the transfer of genetic information from the donor polynucleotide at the site or in close proximity of the break in the DNA. Accordingly, new genetic information may be inserted or copied at a DNA target sequence.
- a “genomic region” is a segment of a chromosome in the genome of a host cell that is present on either side of the nucleic acid target sequence site or, alternatively, also includes a portion of the nucleic acid target sequence site.
- the homology arms of the donor polynucleotide have sufficient homology to undergo homologous recombination with the corresponding genomic regions.
- the homology arms of the donor polynucleotide share significant sequence homology to the genomic region immediately flanking the nucleic acid target sequence site; it is recognized that the homology arms can be designed to have sufficient homology to genomic regions farther from the nucleic acid target sequence site.
- non-homologous end joining refers to the repair of a DSB in DNA by direct ligation of one terminus of the break to the other terminus of the break without a requirement for a donor polynucleotide.
- NHEJ is a DNA repair pathway available to cells to repair DNA without the use of a repair template. NHEJ in the absence of a donor polynucleotide often results in nucleotides being randomly inserted or deleted at the site of the DSB.
- MMEJ Microhomology-mediated end joining
- DNA repair encompasses any process whereby cellular machinery repairs damage to a DNA molecule contained in the cell.
- the damage repaired can include ss-breaks or DSBs. At least three mechanisms exist to repair DSBs: HDR, NHEJ, and MMEJ.
- DNA repair is also used herein to refer to DNA repair resulting from human manipulation, wherein a target locus is modified, e.g., by inserting, deleting, or substituting nucleotides, all of which represent forms of genome editing.
- recombination refers to a process of exchange of genetic information between two polynucleotides.
- As used herein, the terms “regulatory sequences,” “regulatory elements,” and “control elements” are interchangeable and refer to polynucleotide sequences that are upstream (5′ non-coding sequences), within, or downstream (3′ non-translated sequences) of a polynucleotide target to be expressed. Regulatory sequences influence, for example, the timing of transcription, amount or level of transcription, RNA processing or stability, and/or translation of the related structural nucleotide sequence.
- Regulatory sequences may include activator binding sequences, enhancers, introns, polyadenylation recognition sequences, promoters, transcription start sites, repressor binding sequences, stem-loop structures, translational initiation sequences, internal ribosome entry sites (IRES), translation leader sequences, transcription termination sequences (e.g., polyadenylation signals and poly-U sequences), translation termination sequences, primer binding sites, and the like.
- a vector comprises one or more pol III promoters, one or more pol II promoters, one or more pol I promoters, or combinations thereof.
- pol III promoters include, but are not limited to, U6 and H1 promoters.
- pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer; see, e.g., Boshart, M., et al., Cell 41:521-530 (1985)), the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerol kinase (PGK) promoter, and the EF1α promoter.
- an expression vector may depend on such factors as the choice of the host cell to be transformed, the level of expression desired, and the like.
- a vector can be introduced into host cells to thereby produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acid sequences as described herein.
- Gene refers to a polynucleotide sequence comprising exon(s) and related regulatory sequences.
- a gene may further comprise intron(s) and/or untranslated region(s) (UTR(s)).
- operably linked refers to polynucleotide sequences or amino acid sequences placed into a functional relationship with one another.
- regulatory sequences e.g., a promoter or enhancer
- operably linked regulatory elements are typically contiguous with the coding sequence.
- enhancers can function if separated from a promoter by up to several kilobases or more. Accordingly, some regulatory elements may be operably linked to a polynucleotide sequence but not contiguous with the polynucleotide sequence.
- translational regulatory elements contribute to the modulation of protein expression from a polynucleotide.
- expression refers to transcription of a polynucleotide from a DNA template, resulting in, for example, a messenger RNA (mRNA) or other RNA transcript (e.g., non-coding, such as structural or scaffolding RNAs).
- the term further refers to the process through which transcribed mRNA is translated into peptides, polypeptides, or proteins.
- Transcripts and encoded polypeptides may be referred to collectively as “gene product(s).”
- Expression may include splicing the mRNA in a eukaryotic cell, if the polynucleotide is derived from genomic DNA.
- a Type I CRISPR nucleoprotein complex may modulate the activity of a promoter sequence by binding to a nucleic acid target sequence at or near the promoter or a transcriptional start site or regulator site. Depending on the action occurring after binding, the Type I CRISPR nucleoprotein complex can induce, enhance, suppress, or inhibit transcription of a gene operatively linked to the promoter sequence.
- modulation of gene expression includes both gene activation and gene repression.
- Modulation can be assayed by determining any characteristic directly or indirectly affected by the expression of the target gene. Such characteristics include, for example, changes in RNA or protein levels, protein activity, product levels, expression of the gene, or activity level of reporter genes. Accordingly, the terms “modulating expression,” “inhibiting expression,” and “activating expression” of a gene can refer to the ability of a Type I CRISPR nucleoprotein complex to change, activate, or inhibit transcription of a gene.
- “Vector” and “plasmid,” as used herein, refer to a polynucleotide vehicle to introduce genetic material into a cell.
- Vectors can be linear or circular.
- Vectors can contain a replication sequence capable of effecting replication of the vector in a suitable host cell (e.g., an origin of replication). Upon transformation of a suitable host, the vector can replicate and function independently of the host genome or integrate into the host genome.
- Vector design depends, among other things, on the intended use and host cell for the vector, and the design of a vector of the invention for a particular use and host cell is within the level of skill in the art.
- the four major types of vectors are plasmids, viral vectors, cosmids, and artificial chromosomes.
- vectors comprise an origin of replication, a multicloning site, and/or a selectable marker.
- An expression vector typically comprises an expression cassette.
- expression cassette refers to a polynucleotide construct generated using recombinant methods or by synthetic means and comprising regulatory sequences operably linked to a selected polynucleotide to facilitate expression of the selected polynucleotide in a host cell.
- the regulatory sequences can facilitate transcription of the selected polynucleotide in a host cell, or transcription and translation of the selected polynucleotide in a host cell.
- An expression cassette can, for example, be integrated in the genome of a host cell or be present in a vector to form an expression vector.
- a “targeting vector” is a recombinant DNA construct typically comprising tailored DNA arms, homologous to genomic DNA, that flank elements of a target gene or nucleic acid target sequence (e.g., a DSB).
- a targeting vector comprises a donor polynucleotide. Elements of the target gene can be modified in a number of ways, including deletions and/or insertions. A defective target gene can be replaced by a functional target gene, or in the alternative a functional gene can be knocked out.
- the donor polynucleotide of a targeting vector comprises a selection cassette comprising a selectable marker that is introduced into the target gene. Targeting regions adjacent or within a target gene can be used to affect regulation of gene expression.
- the term “between” is inclusive of end values in a given range (e.g., between 1 and 50 nucleotides in length includes 1 nucleotide and 50 nucleotides; between 5 amino acids and 50 amino acids in length includes 5 amino acids and 50 amino acids).
- amino acid refers to natural and synthetic (unnatural) amino acids, including amino acid analogs, modified amino acids, peptidomimetics, glycine, and D or L optical isomers.
- As used herein, the terms “peptide,” “polypeptide,” “protein,” and “subunit protein” are interchangeable and refer to polymers of amino acids.
- a polypeptide may be of any length. It may be branched or linear, it may be interrupted by non-amino acids, and it may comprise modified amino acids.
- the terms also refer to an amino acid polymer that has been modified through, for example, acetylation, disulfide bond formation, glycosylation, lipidation, phosphorylation, pegylation, biotinylation, cross-linking, and/or conjugation (e.g., with a labeling component or ligand).
- Polypeptide sequences are displayed herein in the conventional N-terminal to C-terminal orientation, unless otherwise indicated.
- Polypeptides and polynucleotides can be made using routine techniques in the field of molecular biology (see, e.g., standard texts discussed above). Furthermore, essentially any polypeptide or polynucleotide is available from commercial sources.
- “Fusion protein” and “chimeric protein,” as used herein, refer to a single protein created by joining two or more proteins, protein domains, or protein fragments or circular permuted polypeptides that do not naturally occur together in a single protein.
- a linker polynucleotide can be used to connect a first protein, protein domains, or protein fragments, or circular permuted polypeptides to a second protein, protein domains, or protein fragments or circular permuted polypeptides.
- a fusion protein can comprise a Type I CRISPR-Cas protein (e.g., Cas8, Cas3) and a functional domain from another protein (e.g., FokI; see, e.g., U.S. Pat. No. 9,885,026, issued 6 Feb. 2018).
- the modification to include such domains in fusion proteins may confer additional activity on engineered Type I CRISPR-Cas proteins.
- Such activities can include nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, glycosylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, and/or myristoylation activity or demyristoylation activity that modifies a polypeptide associated with nucleic acid target sequence (e.
- a fusion protein can comprise epitope tags (e.g., histidine tags, HA tags, FLAG® (Sigma Aldrich, St. Louis, Mo.) tags, Myc tags, nuclear localization signal (NLS) tags, SunTag), reporter protein sequences (e.g., glutathione-S-transferase, beta-galactosidase, luciferase, green fluorescent protein, cyan fluorescent protein, yellow fluorescent protein), and/or nucleic acid sequence binding domains (e.g., a DNA binding domain or an RNA binding domain).
- a fusion protein can also comprise activator domains (e.g., heat shock transcription factors, NFKB activators) or repressor domains (e.g., a KRAB domain).
- the KRAB domain is a potent transcriptional repression module and is located in the amino-terminal sequence of most C2H2 zinc finger proteins (see, e.g., Margolin, J., et al., Proceedings of the National Academy of Sciences of the United States of America 91:4509-4513 (1994); Witzgall, R., et al., Proceedings of the National Academy of Sciences of the United States of America 91:4514-4518 (1994)).
- the KRAB domain typically binds to co-repressor proteins and/or transcription factors via protein-protein interactions, causing transcriptional repression of genes to which KRAB zinc finger proteins (KRAB-ZFPs) bind (see, e.g., Friedman J. R., et al., Genes & Development 10:2067-2078 (1996)).
- linker nucleic acid sequences are used to join the two or more proteins, protein domains, or protein fragments.
- CASCADEa Cascade activation
- Cascade activation is a CRISPR method or system wherein the method or system activates the expression of a gene within the locus of the target nucleic acid sequence.
- an effector domain e.g., VP16 or VP64.
- the guide polynucleotide can be fused 5′ or 3′ to a nucleotide effector domain such as an MS2 binding RNA that also recruits transcription factors. Fusions comprising one or more Cascade subunit proteins and the guide polynucleotide can be combined.
- CASCADEi Cascade inhibition
- Cascade inhibition is a CRISPR method or system wherein the CRISPR method or system downregulates the expression of a gene within the locus of the target nucleic acid sequence.
- an effector domain e.g., KRAB
- the guide polynucleotide can be fused 5′ or 3′ to a nucleotide effector domain that also recruits transcription factors. Fusions comprising one or more Cascade subunit proteins and the guide polynucleotide can be combined.
- a “moiety,” as used herein, refers to a portion of a molecule.
- a moiety can be a functional group or describe a portion of a molecule with multiple functional groups (e.g., that share common structural aspects).
- the terms “moiety” and “functional group” are typically used interchangeably; however, a “functional group” can more specifically refer to a portion of a molecule that comprises some common chemical behavior.
- an affinity tag typically refers to one or more moieties that increase the binding affinity of one macromolecule for another, for example, to facilitate formation of an engineered Type I CRISPR-Cas nucleoprotein complex.
- an affinity tag can be used to increase the binding affinity of one Cas subunit protein for another Cas subunit protein (e.g., a first Cas7 protein for a second Cas7 protein).
- an affinity tag can be used to increase the binding affinity of one or more Cas subunit proteins for a cognate guide polynucleotide.
- Some embodiments of the present invention introduce one or more affinity tags to the N-terminal of a Cas subunit protein sequence, to the C-terminal of a Cas subunit protein sequence, to a position located between the N-terminal and C-terminal of a Cas subunit protein sequence, or to combinations thereof.
- one or more guide polynucleotides comprise an affinity tag that increases binding affinity of the guide polynucleotide with one or more Cas subunit proteins.
- affinity tags are disclosed in U.S. Published Patent Application No. 2014-0315985, published 23 Oct. 2014. Ligands and ligand-binding moieties are paired affinity tags.
- a “cross-link” is a bond that links one polymer chain (e.g., a polynucleotide or polypeptide) to another. Such bonds can be covalent bonds or ionic bonds.
- one polynucleotide can be bound to another polynucleotide by cross linking the polynucleotides.
- a polynucleotide can be cross linked to a polypeptide.
- a polypeptide can be cross linked to a polypeptide.
- cross-linking moiety typically refers to a moiety suitable to provide cross linking between two macromolecules.
- a cross-linking moiety is another example of an affinity tag.
- a “host cell” generally refers to a biological cell.
- a cell is the basic structural, functional, and/or biological unit of an organism.
- a cell can originate from any organism having one or more cells.
- Examples of host cells include, but are not limited to, a prokaryotic cell, eukaryotic cell, a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a cell of a eukaryotic organism, a protozoal cell, a cell from a plant (e.g., cells from plant crops (such as soy, tomatoes, sugar beets, pumpkin, hay, cannabis, tobacco, plantains, yams, sweet potatoes, cassava, potatoes, wheat, sorghum, soybean, rice, corn, maize, oil-producing Brassica (e.g., oil-producing rapeseed and canola), cotton, sugar cane, sunflower, millet, and alfalfa), fruits, vegetables, grains, seeds,
- seaweeds e.g., kelp
- a fungal cell e.g., a yeast cell or a cell from a mushroom
- an animal cell e.g., a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, and the like)
- a cell from a vertebrate animal e.g., fish, amphibian, reptile, bird, or mammal
- a cell from a mammal e.g., a pig, a cow, a goat, a sheep, a rodent, a rat, a mouse, a non-human primate, a human, and the like.
- a cell can be a stem cell or a progenitor cell.
- a host cell is a non-human cell.
- a host cell is a human cell outside of a human body, wherein in particular embodiments the human cell is not introduced into a human body.
- stem cell refers to a cell that has the capacity for self-renewal, i.e., the ability to go through numerous cycles of cell division while maintaining the undifferentiated state.
- Stem cells can be totipotent, pluripotent, multipotent, oligopotent, or unipotent.
- Stem cells can be embryonic, fetal, amniotic, adult, or induced pluripotent stem cells.
- induced pluripotent stem cell refers to a type of pluripotent stem cell that is artificially derived from a non-pluripotent cell, typically a somatic cell.
- the somatic cell is a human somatic cell.
- somatic cells include, but are not limited to, dermal fibroblasts, bone marrow-derived mesenchymal cells, cardiac muscle cells, keratinocytes, liver cells, stomach cells, neural stem cells, lung cells, kidney cells, spleen cells, and pancreatic cells.
- somatic cells include cells of the immune system, including but not limited to, B cells, dendritic cells, granulocytes, innate lymphoid cells, megakaryocytes, monocytes/macrophages, myeloid-derived suppressor cells, natural killer (NK) cells, T cells, thymocytes, and hematopoietic stem cells.
- Plant refers to whole plants, plant organs, plant tissues, germplasm, seeds, plant cells, and progeny of the same.
- Plant cells include, without limitation, cells from seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores.
- Plant parts include differentiated and undifferentiated tissues including, but not limited to, roots, stems, shoots, leaves, pollens, seeds, tumor tissue, and various forms of cells and culture (e.g., single cells, protoplasts, embryos, and callus tissue).
- the plant tissue may be in plant or in a plant organ, tissue, or cell culture.
- Plant organ refers to plant tissue or a group of tissues that constitute a morphologically and functionally distinct part of a plant.
- Subject refers to any member of the phylum Chordata, including, without limitation, humans and other primates, including non-human primates such as rhesus macaques, chimpanzees, and other monkey and ape species; farm animals, such as cattle, sheep, pigs, goats, and horses; domestic mammals, such as dogs and cats; laboratory animals, including rabbits, mice, rats, and guinea pigs; birds, including domestic, wild, and game birds, such as chickens, turkeys and other gallinaceous birds, ducks, and geese; and the like.
- the term does not denote a particular age or gender. Thus, the term includes adult, young, and newborn individuals as well as male and female.
- a host cell is derived from a subject (e.g., stem cells, progenitor cells, or tissue-specific cells).
- the subject is a non-human subject.
- transgenic organism refers to an organism that contains genetic material into which DNA from an unrelated organism has been artificially introduced.
- the term includes the progeny (any generation) of a transgenic organism, provided that the progeny has the genetic modification.
- the transgenic organism is a non-human transgenic organism.
- isolated can refer to a molecule (e.g., a polynucleotide or a polypeptide) that, by human intervention, exists apart from its native environment and is therefore not a product of nature.
- An isolated polynucleotide or polypeptide can exist in a purified form and/or can exist in a non-native environment such as, for example, in a recombinant cell.
- a “substrate channel” refers to the direct transfer of a reactant from one enzymatic reaction to another enzymatic reaction without first diffusing into the bulk environment (Wheeldon, I., et al., Nat. Chem. 8(4):299-309 (2016)). Intermediates of these enzymatic steps are not in equilibrium with the bulk solution, which enables the increased efficiencies and yields in enzymatic processes. Frequently, enzymes in naturally occurring metabolic processes have evolved means of co-localization and assembly into controlled aggregates.
- substrate channel element refers to a component of a metabolic pathway.
- a substrate channel element is an enzyme that catalyzes a chemical reaction.
- substrate channel complex refers to multiple substrate channel elements that are co-localized together via some means.
- RNA scaffold refers to an RNA molecule that peptides can use as a substrate for binding.
- the present invention relates to engineered polynucleotides encoding Cascade components including, but not limited to, Cascade subunit proteins and Cascade guide polynucleotides.
- the present invention relates to engineered polynucleotides encoding Cascade components that are derived from Cascade Type I-E systems.
- Exemplary polynucleotide constructs comprising Cascade proteins and Cascade crRNAs are presented in Example 1.
- Example 1, Table 10, and SEQ ID NO:1 through SEQ ID NO:20 (FIG. 3) present polynucleotide DNA sequences of genes encoding the five subunit proteins of Type I-E Cascade, specifically from E. coli strain K-12 MG1655, as well as the amino acid sequences of the resulting protein components.
- the polynucleotide sequences were derived from E. coli genomic DNA and were codon optimized specifically for expression in E.
- the minimal CRISPR array comprises two repeat sequences (underlined in the CRISPR array sequences presented in Example 1) flanking an exemplary spacer sequence, which represents the guide portion of the crRNA.
- RNA processing by the Cascade endonuclease generates a crRNA with repeat sequences on both the 5′ and 3′ ends, flanking the guide sequence.
- Polynucleotide sequences encoding Cascade components from additional bacterial or archaeal species can be identified and designed following the guidance of the present Specification and using bioinformatics tools such as BLAST and PSI-BLAST to locate, as an example, homologs of Cascade subunit genes from E. coli strain K-12 MG1655, and then inspecting the flanking genomic neighborhood of the Cascade gene to locate and identify genes of the remaining Cascade subunit proteins (see, e.g., Example 14, Example 15). Because Cascade genes co-occur as conserved operons, they are typically arranged in a consistent order, within the same Type I subtype, facilitating their identification and selection for follow-up analysis and experimentation.
- Type I-E systems can be identified by locating Cas8 homologs, identifying promising bacterial species for homologous Cascade testing, and then obtaining or designing polynucleotide sequences encoding the Cas8 and other protein components of the Cascade from those homologous CRISPR-Cas systems.
- the polynucleotide sequences for the proteins were derived from the genomic DNA of the host bacterium, and were codon optimized specifically for expression in E. coli, and/or codon optimized specifically for expression in eukaryotic cells (e.g., human cells).
- the polynucleotide DNA sequences encoding corresponding minimal CRISPR arrays were based on repeat sequences derived from the 12 species and can be used to generate mature crRNAs that function as guide RNAs.
- the minimal CRISPR array comprises two repeat sequences (lower case, underlined) flanking an exemplary “spacer” sequence, which represents the guide portion of the crRNA.
- RNA processing by the endonuclease Cascade subunit generates a crRNA with repeat sequences on both the 5′ and 3′ ends, flanking the guide sequence.
- Minimal CRISPR Arrays (columns: SEQ ID NO; Species; Minimal CRISPR repeat): SEQ ID NO: 37; I-E_Oceanicola sp. HL-35; ctgttccccgcacacgcggggatgaaccg GGTTCTTCGATCTGCGCATCCATGATGCCGCC ctgttccccgcacacgcggggatgaaccg. SEQ ID I-E_ Pseudomonas sp.
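The repeat-spacer-repeat layout of a minimal CRISPR array can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `build_minimal_crispr_array` is hypothetical, and the repeat and spacer strings are read from the Oceanicola sp. HL-35 row above by joining its line-wrapped fragments, which is itself an assumption about how the table was laid out.

```python
def build_minimal_crispr_array(repeat: str, spacer: str) -> str:
    """Return repeat + spacer + repeat (repeats lower case, spacer upper case)."""
    for name, seq in (("repeat", repeat), ("spacer", spacer)):
        # Reject empty input or any character other than A, C, G, T.
        if not seq or set(seq.upper()) - set("ACGT"):
            raise ValueError(f"{name} must be a non-empty DNA sequence (A, C, G, T)")
    return repeat.lower() + spacer.upper() + repeat.lower()


# Assumption: fragments of the Oceanicola sp. HL-35 row joined in reading order.
REPEAT = "ctgttccccgcacacgcggggatgaaccg"
SPACER = "GGTTCTTCGATCTGCGCATCCATGATGCCGCC"

array = build_minimal_crispr_array(REPEAT, SPACER)
print(array)
```

Keeping the repeats in lower case and the spacer in upper case mirrors the typographic convention used for the array sequences in the text.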
- the present invention relates to engineered polynucleotide sequences encoding Cascade components from additional bacterial or archaeal species, within other Type I subtypes, including, but not limited to, Types I-B, I-C, I-F, and variants of I-F, which can be identified and designed following the guidance of the present Specification and by using bioinformatics tools such as BLAST and PSI-BLAST to locate homologs of Cascade genes from hallmark systems typifying each subtype (see, e.g., Makarova, K. S., et al., Nat. Rev. Microbiol. 13(11):722-736 (2015); Koonin, E.
- flanking genomic neighborhoods of the Cascade gene can be inspected to locate and identify genes of the remaining Cascade subunit proteins as disclosed herein.
- additional Type I-F systems can be identified by locating Cas8 homologs (and additional Type I-F variant 2 systems can be identified by locating Cas5 homologs) and identifying promising bacterial species for homologous Cascade testing, and then obtaining or designing polynucleotide sequences encoding the Cas8, Cas5, and other protein components of the Cascade from those homologous CRISPR-Cas systems.
- Polynucleotide DNA sequences of genes encoding the three, four, or five subunit proteins of Cascade from Types I-B, I-C, I-F, and I-F variant 2 from twelve additional homologous Cascade complexes, and the amino acid sequences of the resulting protein components, as well as exemplary minimal CRISPR arrays, are presented as SEQ ID NO:214 through SEQ ID NO:351 (FIG. 3).
- the polynucleotide sequences for the subunit proteins were derived from the genomic DNA of the host bacterium, and were codon optimized specifically for expression in E. coli, and/or codon optimized specifically for expression in eukaryotic cells (e.g., human cells).
- the polynucleotide DNA sequences encoding corresponding minimal CRISPR arrays were based on repeat sequences derived from the twelve species and can be used to generate mature crRNAs that function as guide RNAs.
- the minimal CRISPR array comprises two repeat sequences (lower case, underlined) flanking an exemplary “spacer” sequence, which represents the guide portion of the crRNA.
- RNA processing by the endonuclease Cascade subunit generates a crRNA with repeat sequences on both the 5′ and 3′ ends, flanking the guide sequence.
- Minimal CRISPR Arrays (columns: SEQ ID NO; Species; Minimal CRISPR repeat): SEQ ID NO: 226; I-B_Fusobacterium nucleatum subsp. animalis 3_1_33; atgaactgtaaacttgaaagttttgaaat GTTGACAAATATTCAGATAATTTTTCAAAATCTTTT atgaactgtaaacttgaaaagttttgaaat. SEQ ID I-B_ Campylobacter gtttgctaatgacaatatttgt NO: fetus subsp.
- Example 19 describes the design and testing of multiple Cascade complex homologs, each comprising a Cas subunit protein-FokI fusion protein, to evaluate the efficiency of genome editing for each Cascade complex.
- the present invention relates to modified Cascade subunit proteins.
- Cascade subunit proteins suitable for modification include, but are not limited to, Cascade subunit proteins of the species described herein.
- the present invention relates to engineered circular permutations of Cascade subunit proteins.
- Such circular permutations of a Cascade subunit protein result in a protein structure having different connectivity of the original linear sequence of amino acids of the Cascade subunit protein, but having an overall similar three-dimensional shape (see, e.g., Bliven, S., et al., PLoS Comput. Biol. 8(3):e1002445 (2012)).
- Circular permutations of Cascade subunit proteins can have a number of advantages.
- a circular permutation of a Cas7 subunit protein can create a new N-terminus and a new C-terminus designed to be positioned for connection with an additional polypeptide sequence to form a fusion protein or linker region without disturbing the Cas7 protein fold or the Cascade complex assembly.
- Three examples of circular permutations of Cas7 are illustrated in FIG. 4A and FIG. 4B .
- In FIG. 4A and FIG. 4B, three portions of the protein are shown: an N-terminal portion of the native protein (vertical stripes), a central portion of the native protein (grey shading), and a C-terminal portion of the native protein (no shading).
- FIG. 4A illustrates relocation of an N-terminal portion of the native protein to the C-terminal position of the cpCas7, wherein the N-terminal portion of the native protein is now at the C-terminal end of the cpCas7 and is connected to the central portion of the native protein by a linker polypeptide.
- FIG. 4B illustrates relocation of a C-terminal portion of the native protein to the N-terminal position of the cpCas7, wherein the C-terminal portion of the native protein is now at the N-terminal end of the cpCas7 and is connected to the central portion of the native protein by a linker polypeptide.
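As a rough illustration of what a circular permutation does to the linear order of a sequence, the Python sketch below uses the common textbook scheme: the native termini are joined through a linker and new termini are opened at a chosen cut site. The function name `circularly_permute`, the toy sequence, and the `GSG` linker are all hypothetical, and this generic scheme is not claimed to reproduce the exact linker placement drawn in FIG. 4A or FIG. 4B.

```python
def circularly_permute(native: str, cut: int, linker: str = "") -> str:
    """Move the first `cut` residues to the C-terminal end.

    The result is native[cut:] + linker + native[:cut], so the residue that
    followed the cut becomes the new N-terminus.
    """
    if not 0 < cut < len(native):
        raise ValueError("cut must fall strictly inside the sequence")
    return native[cut:] + linker + native[:cut]


# Hypothetical toy sequence: N-terminal (N), central (C), and C-terminal (T) portions.
toy = "NNNNCCCCCCTTTT"
cp = circularly_permute(toy, cut=4, linker="GSG")
print(cp)  # CCCCCCTTTTGSGNNNN
```

The permuted sequence contains the same residues as the native sequence plus the linker; only the connectivity changes, which is the property the text relies on for preserving the overall fold.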
- Example 10 describes the purification of Cascade complexes comprising circularly permuted Cas7 subunit protein variants and demonstrates that circularly permuted Type I-E CRISPR-Cas subunit proteins can be successfully used to form Cascade complexes having essentially the same composition (based on molecular weight) as Cascade complexes comprising wild-type proteins.
- the present invention relates to Cascade subunit proteins fused to additional polypeptide sequences to create fusion proteins, as well as polynucleotides encoding such fusion proteins.
- Additional polypeptide sequences can include, but are not limited to, proteins, protein domains, protein fragments, and functional domains. Examples of such additional polypeptide sequences include, but are not limited to, sequences derived from transcription activator or repressor domains, and nucleotide deaminases (e.g., a cytidine deaminase or an adenine deaminase such as described in Komor et al., Nature 533:420-424 (2016); Koblan et al., Nat Biotechnol. 2018 May 29, doi:10.1038/nbt.4172). Additional functional domains for fusion proteins are presented herein.
- An additional polypeptide sequence can be fused to any of the Cascade subunit proteins wherein the additional polypeptide sequence is encoded by an additional polynucleotide sequence that is typically appended to either the 5′ or 3′ end of a polynucleotide comprising the coding sequence of a Cascade subunit protein.
- additional polynucleotide sequences that encode amino acid linkers connect a Cascade subunit protein to the additional polypeptide sequences of interest.
- the polynucleotide sequences for the fusion protein partner and the linker sequence can be derived from naturally occurring genomic DNA sequences or may be codon optimized for bacterial expression in E. coli or eukaryotic expression in mammalian cells (e.g., human cells).
- affinity tags e.g., His6, Strep-Tag® II (IBA GMBH LLC, Gottingen, Germany)
- NLS nuclear localization signal or sequence
- maltose binding protein e.g., maltose binding protein
- FokI e.g., exemplary amino acid linker sequences are also disclosed in Example 1.
- Example 11 describes Cascade subunit protein-FokI fusions, as well as Cascade subunit protein fusions to cytidine deaminases, endonucleases, restriction enzymes, a nuclease/helicase, or domains thereof.
- Example 11 describes Cascade subunit protein fusions with other Cascade subunit proteins, as well as Cascade subunit protein fusions with other Cascade subunit fusion proteins and an enzymatic protein domain.
- a Type I CRISPR subunit protein can be evaluated in silico for the ability to be used to generate protein fusions at the N-terminus, C-terminus, or positions between the N-terminus and the C-terminus.
- a Type I CRISPR subunit protein can be linked to one or more fusion domains at the N-terminus, C-terminus, or positions between the N-terminus and the C-terminus using one or more polypeptide linkers.
- polypeptide linkers are set forth in Examples 1, 11, 18, and 19.
- FIG. 5A and FIG. 5B illustrate Cascade complexes comprising a Cas8 subunit protein fused to an additional protein sequence (e.g., a FokI).
- FIG. 5A shows an example of the additional protein sequence (“FP”) connected with the C-terminus of a Cas8 subunit protein using a linker polypeptide.
- FIG. 5B shows an example of the additional protein sequence (“FP”) connected with the N-terminus of a Cas8 subunit protein using a linker polypeptide.
- Example 11A describes in silico design, cloning, expression, and purification of a Type I-E Cas8 fused N-terminally with a FokI nuclease domain.
- FIG. 6A and FIG. 6B illustrate additional examples of Cascade complexes comprising a Cascade subunit protein fused to an additional protein sequence.
- FIG. 6A shows an example of a detectable moiety (e.g., a green fluorescent protein, GFP) fused to each of six Cas7 subunit proteins, each via a linker polypeptide.
- FIG. 6B shows an example of an additional protein sequence (“FP”) connected with Cas6 subunit protein using a linker polypeptide.
- Type I-E Cascade subunit proteins include, but are not limited to, the following: the same subunit (e.g., Cse2_linker_Cse2), circularly permuted subunits (e.g., cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7), a Type I-E Cascade protein fused to a nuclease (e.g., FokI_linker_Cas8, Cas3_linker_Cas8, Cas6_linker_FokI, S1Nuclease_linker_Cse2_linker_Cse2), a Type I-E Cascade protein fused to a
- FIG. 7A, FIG. 7B, and FIG. 7C present illustrations of modified Type I CRISPR-Cas effector complexes that contain cpCas7 (compare FIG. 4A).
- FIG. 7A presents a Cascade complex comprising six individual cpCas7 subunit proteins.
- FIG. 7B presents a Cascade complex comprising six fused cpCas7 subunit proteins, wherein the C-terminus of a cpCas7 subunit protein is connected with the N-terminus of an adjacent cpCas7 subunit protein using a linker polypeptide.
- the Cascade complex comprises six fused cpCas7 subunit proteins (a “backbone”), wherein the C-terminus of the first cpCas7 subunit protein is connected with the N-terminus of the second cpCas7 subunit protein using a linker polypeptide, the C-terminus of the second cpCas7 subunit protein is connected with the N-terminus of a different protein sequence (“FP”) (e.g., a cytidine deaminase) using a linker polypeptide and the C-terminus of this protein coding sequence is connected with the N-terminus of the third cpCas7 using a linker polypeptide.
- One advantage of a fused backbone of cpCas7 subunit proteins is that an additional protein sequence can be introduced at a specific location along the backbone to provide access of the additional protein sequence to different locations along the length of the nucleic acid target sequence to which the guide directs binding of the Cascade complex.
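The idea of placing an additional protein sequence at a chosen position along a fused cpCas7 backbone can be sketched as a simple domain-order builder. The sketch below is illustrative only: the function name `fused_backbone` and its parameters are hypothetical, and the output merely follows the `X_linker_Y` naming style used for fusion constructs elsewhere in this text.

```python
def fused_backbone(n_subunits: int = 6, insert_after=None, insert_name: str = "FP") -> str:
    """Return the N-to-C domain order of a fused cpCas7 backbone.

    If `insert_after` is given, `insert_name` is placed between cpCas7 number
    `insert_after` and the next cpCas7 (1-based), with a linker on each side.
    """
    if n_subunits < 1:
        raise ValueError("n_subunits must be at least 1")
    if insert_after is not None and not 1 <= insert_after < n_subunits:
        raise ValueError("insert_after must name an internal junction")
    parts = []
    for i in range(1, n_subunits + 1):
        parts.append("cpCas7")
        if insert_after == i:
            parts.append(insert_name)
    return "_linker_".join(parts)


# FP placed between the second and third cpCas7 subunits, as in the embodiment above.
construct = fused_backbone(6, insert_after=2, insert_name="FP")
print(construct)
```

Changing `insert_after` moves the additional protein sequence along the backbone, which corresponds to giving it access to a different stretch of the bound target sequence.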
- FIG. 8A and FIG. 8B illustrate further embodiments of modified Type I CRISPR-Cas effector complexes comprising fusion proteins.
- FIG. 8A shows a Cascade complex comprising a Cse2-Cse2 fusion protein.
- FIG. 8B shows a Cascade complex comprising a Cse2-Cse2 fusion protein connected with an additional protein sequence (“FP”).
- Example 11D describes in silico design, cloning, expression, and purification of a Cse2-Cse2 protein fused to a cytidine deaminase.
- one or more nuclear localization signals can be added at the engineered N-terminus or C-terminus of a Cascade protein subunit (e.g., a Cas8-FokI fusion protein, a cpCas7 protein, or a Cse2-Cse2 fusion protein).
- linker polypeptides connect two or more protein coding sequences.
- the lengths of exemplary linker polypeptides are described in the Examples. Typically, linker lengths include, but are not limited to, between about 10 amino acids and about 40 amino acids, between about 15 amino acids and about 30 amino acids, and between about 17 amino acids and about 20 amino acids.
- the amino acid composition of linker polypeptides typically comprises amino acids that are polar, small, and/or charged (e.g., Gly, Ala, Leu, Val, Gln, Ser, Thr, Pro, Glu, Asp, Lys, Arg, His, Asn, Cys, Tyr).
- the linker polypeptide is designed to provide appropriate spacing and positioning of the functional domain and the Cascade protein within the fusion protein (Chichili, C., et al., Protein Science 22(2):153-167 (2013); Chen, X., et al., Adv. Drug Deliv. Rev. 65(10):1357-1369 (2013); George, R., et al., Protein Engineering, Design and Selection 15(11):871-879 (2002)).
- linker polypeptides useful in the practice of the present invention are linker polypeptides identified that connect coding sequences of Cascade proteins to each other in organisms comprising Cascade systems (e.g., the linker polypeptide that connects Cas8 to Cas3 in Streptomyces griseus as described by Westra, E. R., et al., Mol Cell. 46(5): 595-605 (2012)).
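The length ranges and residue types given above for linker polypeptides can be turned into a quick screening check. The Python sketch below is a minimal illustration, not a design tool: the function name `describe_linker` and the `GGS`-repeat example are hypothetical, the "about" bounds are treated as inclusive integers purely for convenience, and the residue set is simply the one-letter form of the amino acids listed in the text.

```python
# One-letter codes for Gly, Ala, Leu, Val, Gln, Ser, Thr, Pro, Glu, Asp,
# Lys, Arg, His, Asn, Cys, Tyr (the residues listed in the text).
LISTED_RESIDUES = set("GALVQSTPEDKRHNCY")

# Assumption: the "about X to about Y" ranges are treated as inclusive bounds.
LENGTH_RANGES = {"10-40": (10, 40), "15-30": (15, 30), "17-20": (17, 20)}


def describe_linker(linker: str) -> dict:
    """Report length, matching length ranges, and residues outside the listed set."""
    linker = linker.upper()
    n = len(linker)
    return {
        "length": n,
        "ranges": [name for name, (lo, hi) in LENGTH_RANGES.items() if lo <= n <= hi],
        "unlisted_residues": sorted(set(linker) - LISTED_RESIDUES),
    }


example = "GGS" * 6  # hypothetical 18-residue flexible linker
report = describe_linker(example)
print(report)
```

A linker that falls outside every range or contains unlisted residues is not necessarily unusable; the text describes these values as typical rather than limiting.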
- Fusion protein coding DNA sequences can be codon optimized for expression in a selected organism such as bacteria, archaea, plants, fungi, or mammalian cells. Codon-optimizing programs are widely available, such as on the Integrated DNA Technologies website (www.idtdna.com/CodonOpt), or through Genscript® services (Genscript, Piscataway, N.J.). To facilitate cloning into the recipient expression vector, additional sequences overlapping with the vector compatible for SLIC cloning (Li, M., et al., Methods Mol. Biol. 852:51-59 (2012)) can be appended at the 5′ and 3′ ends of the DNA sequence.
- Cascade subunit proteins can be fused to transcription activation and/or repression domains.
- a fusion protein can comprise activator domains (e.g., heat shock transcription factors, NFKB activators, VP16, and VP64 (Eguchi, A., et al., PNAS 113(51):E8257-E8266 (2016); Perez-Pinera, P., et al., Nature Methods 10(10):973-6 (2013); Gilbert, L. A., et al., Cell 159(3):647-61 (2014))) or repressor domains (e.g., a KRAB domain).
- linker nucleic acid sequences are used to join the two or more coding sequences for proteins, protein domains, or protein fragments.
- Cascade complexes comprising Type I CRISPR-Cas subunit proteins fused to transcription activators can be used to activate the expression of the gene.
- the target locus can contain a transcriptional start site (TSS) that typically harbors one or more binding site for the transcriptional activation machinery (factors) of a cell.
- FIG. 9 illustrates a Cascade complex comprising six fusion proteins comprising a cpCas7 connected via a linker polypeptide to the transcriptional activator VP64.
- Such modification of a Cascade complex converts the complex into a flexible tool for transcriptional activation of a gene (CASCADEa), wherein targeting a selected gene is achieved by selection of a guide sequence that directs binding of the Cascade complex to one or more regulatory elements (e.g., a TSS) of the selected gene.
- Example 12 describes the design of an E. coli Type I-E cpCas7 protein fused to a VP64 activation domain to confer transcriptional activation activity to the Cascade complex.
- Cascade complexes comprising Type I CRISPR-Cas subunit proteins fused to transcription repressors can be used to repress the expression of the gene.
- the target locus can comprise transcriptional regulatory elements.
- a Cascade subunit protein can be connected to a KRAB domain via a linker polypeptide.
- a Cascade complex comprising the Cascade subunit protein/KRAB domain fusion can convert the complex into a flexible tool for transcriptional repression of a gene (CASCADEi), wherein targeting a selected gene is achieved by selection of a guide sequence that directs binding of the Cascade complex to one or more regulatory elements of the selected gene.
- Cascade subunit proteins can be fused to affinity tags.
- Type I CRISPR-Cas guide polynucleotides can be modified by insertion of a selected polynucleotide element or modification of nucleotides at selected positions within the guide polynucleotides (e.g., substitution of a DNA moiety for an RNA moiety).
- Such embodiments include, but are not limited to, Type I CRISPR-Cas guide polynucleotides fused 5′, 3′, or internally to one or more nucleotide effector domains (e.g., an MS2 or MS2-P65-HSF1 binding RNA or aptamer that recruits transcription factors).
- FIG. 10 illustrates a Type I CRISPR guide polynucleotide comprising an RNA aptamer introduced into the 3′ hairpin of the guide.
- Type I CRISPR-Cas guides can also be modified, typically by lengthening or shortening the Cas7 subunit protein and Cse2 subunit protein binding region.
- FIG. 11A illustrates a Cascade complex with three Cas7 subunits, one Cse2 subunit and a shortened crRNA.
- FIG. 11B illustrates a Cascade complex with nine Cas7 subunits, three Cse2 subunits, and a lengthened crRNA.
- Example 16 describes the generation and testing of modifications of Type I CRISPR-Cas guide crRNAs and the suitability of the modified guides for use in constructing engineered Type I CRISPR-Cas effector complexes.
- the present invention relates to nucleic acid sequences encoding one or more engineered Cascade components, as well as expression cassettes, vectors, and recombinant cells comprising nucleic acid sequences encoding one or more engineered Cascade components.
- Some embodiments of the third aspect of the invention include one or more polynucleotides encoding all the components of a selected Cascade system (e.g., Cse2, Cas5, Cas6, Cas7, and Cas8 proteins, and one or more cognate guides), wherein the components are capable of forming an effector complex.
- the guides have different spacer sequences to direct binding to different nucleic acid target sequences.
- Such embodiments include, but are not limited to, expression cassettes, vectors, and recombinant cells.
- the present invention relates to one or more expression cassettes comprising one or more nucleic acid sequences encoding one or more engineered Cascade components.
- Expression cassettes typically comprise a regulatory sequence involved in one or more of the following: regulation of transcription, post-transcriptional regulation, or regulation of translation.
- Expression cassettes can be introduced into a wide variety of organisms including, but not limited to, bacterial cells, yeast cells, plant cells, and mammalian cells (including human cells).
- Expression cassettes typically comprise functional regulatory sequences corresponding to the organism(s) into which they are being introduced.
- a further embodiment of the present invention relates to vectors, including expression vectors, comprising one or more nucleic acid sequences encoding one or more engineered Cascade components.
- Vectors can also include sequences encoding selectable or screenable markers.
- nuclear targeting sequences can also be added, for example, to Cascade subunit proteins.
- Vectors can also include polynucleotides encoding protein tags (e.g., poly-His tags, hemagglutinin tags, fluorescent protein tags, and bioluminescent tags). The coding sequences for such protein tags can be fused to, for example, one or more nucleic acid sequences encoding a Cascade subunit protein.
- Expression vectors for host cells are commercially available. There are several commercial software products designed to facilitate selection of appropriate vectors and construction thereof, such as insect cell vectors for insect cell transformation and gene expression in insect cells, bacterial plasmids for bacterial transformation and gene expression in bacterial cells, yeast plasmids for cell transformation and gene expression in yeast and other fungi, mammalian vectors for mammalian cell transformation and gene expression in mammalian cells or mammals, and viral vectors (including lentivirus, retrovirus, adenovirus, herpes simplex virus I or II, parvovirus, reticuloendotheliosis virus, and adeno-associated virus (AAV) vectors) for cell transformation and gene expression and methods to easily allow cloning of such polynucleotides.
- Illustrative plant transformation vectors include those derived from a Ti plasmid of Agrobacterium tumefaciens (Lee, L. Y., et al., Plant Physiology 146(2):325-332 (2008)). Also useful and known in the art are Agrobacterium rhizogenes plasmids. For example, SNAPGENE™ (GSL Biotech LLC, Chicago, Ill.; snapgene.com/resources/plasmid_files/your_time_is_valuable/) provides an extensive list of vectors, individual vector sequences, and vector maps, as well as commercial sources for many of the vectors.
- Vectors can be designed that encode Cascade subunit proteins, as well as minimal CRISPR arrays comprising guide sequences of interest. Accordingly, one aspect of the present invention includes such expression systems.
- the Cascade complex is expressed off of three distinct plasmid vectors, which collectively encode the following components: a Cas8 protein; Cse2, Cas7, Cas5, and Cas6 proteins; and a CRISPR crRNA.
- the expression plasmid encoding Cas8 comprises the natural, genomic DNA gene sequence and, in other embodiments, the expression plasmid can encode Cas8 that is codon optimized for expression in a chosen cell type.
- the expression plasmid encoding Cse2, Cas7, Cas5, and Cas6 can contain the natural, genomic DNA gene sequences or can contain gene sequences that have been codon optimized for expression in a chosen cell type.
- the entire Cascade subunit protein coding operon can be placed downstream of a single transcriptional promoter, such that the different proteins are all translated from a single polycistronic transcript.
- the gene encoding the Cascade subunit proteins can be separated from each other, with intervening transcriptional terminators and promoters.
- The expression plasmid encoding the crRNA may contain as few as two repeats flanking a single spacer sequence, downstream of an appropriate transcriptional promoter, or may contain many repeats flanking multiple spacer sequences, of either the same exact guide sequence or multiple distinct guide sequences. Coordinated expression of the CRISPR array and the Cascade subunits, in particular the Cas6 subunit, leads to processing of long precursor crRNAs into mature-length crRNAs, each of which comprises fragments of a single repeat on the 5′ and 3′ ends of the crRNA and a single spacer sequence in the middle.
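- The repeat-spacer layout and its processing into mature crRNAs can be sketched in a few lines of Python. This is a minimal illustration, not the method of the Examples: the repeat and spacers are placeholder strings rather than real sequences, the function names are hypothetical, and the cleavage position (leaving 8 repeat nucleotides on the 5′ end of each mature crRNA) is an assumption chosen for the example because the text does not state it.

```python
# Placeholder 29-nt repeat: 21 "a" characters followed by 8 "b" characters.
# Real repeat and spacer sequences would replace these strings.
REPEAT = "a" * 21 + "b" * 8
HANDLE_5P = 8  # assumed number of repeat nt retained on the 5' end after processing


def build_array(spacers, repeat=REPEAT):
    """Return repeat-spacer-repeat-...-repeat, so every spacer is flanked by repeats."""
    array = repeat
    for spacer in spacers:
        array += spacer + repeat
    return array


def process_array(array, repeat=REPEAT, handle_5p=HANDLE_5P):
    """Cut once inside every repeat and return the fragments between consecutive cuts."""
    cut_offset = len(repeat) - handle_5p
    cuts = []
    pos = array.find(repeat)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = array.find(repeat, pos + len(repeat))
    # Each fragment is: 5' repeat fragment + one spacer + 3' repeat fragment.
    return [array[a:b] for a, b in zip(cuts, cuts[1:])]


minimal_array = build_array(["S" * 32])            # two repeats, one spacer
dual_array = build_array(["S" * 32, "T" * 32])     # three repeats, two spacers
mature = process_array(dual_array)
```

- With these placeholders, an array of N spacers yields N mature crRNAs, each carrying a single spacer between two repeat fragments, which mirrors the description above.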
- An alternative strategy to express the complete Cascade complex in E. coli uses two plasmids: one plasmid that encodes the entire Cas8-Cse2-Cas7-Cas5-Cas6 operon on a single expression plasmid and one encoding the CRISPR crRNA.
- The 5′ end of the Cse2 gene, which normally overlaps with the 3′ end of the Cas8 gene, is separated spatially from the 3′ end of the Cas8 gene in order to append a polynucleotide sequence encoding an affinity tag and/or protease recognition sequence.
- Example 2 describes two types of bacterial expression plasmid systems for the Cascade proteins: the first type comprises two plasmids, a first plasmid encoding the Cas8 protein and a second encoding the 4 subunit proteins of the CasBCDE complex (cse2-cas7-cas5-cas6 operon); and the second type comprises an expression plasmid encoding all 5 subunit proteins of the Cascade complex (cas8-cse2-cas7-cas5-cas6 operon).
- Cognate CRISPR arrays are also described.
- an affinity tag can be appended onto the Cse2 subunit, such as an N-terminal Strep-II tag or a hexahistidine (His6) tag.
- an amino acid sequence recognized by a protease such as TEV protease or the HRV3C protease can be inserted between the affinity tag and the native N-terminus of the Cse2 subunit, such that biochemical cleavage of the sequence with the protease after initial purification liberates the affinity tag from the final recombinant Cascade complex.
- the affinity tag may also be placed on other subunits, or left on the Cse2 subunit and combined with additional affinity tags on other subunits. Examples of Cascade subunit proteins comprising affinity tags are set forth in Example 1, Example 2, and Example 3.
- a strain of E. coli can be transformed with plasmids encoding the CRISPR crRNA as well as the Cse2-Cas7-Cas5-Cas6 genes, protein expression induced, and a Cascade complex that is lacking the Cas8 subunit can be produced.
- This Cascade complex typically is referred to as a Cas8-minus Cascade complex, or alternatively as a CasBCDE complex (Jore, M., et al., Nat. Struct. Mol. Biol. 18(5):529-536 (2011)).
- This purified complex can be biochemically combined with separately purified Cas8 to reconstitute full Cascade (Sashital, D. G., et al., Mol. Cell 46(5):606-615 (2012)).
- Table 4 presents exemplary sequences of bacterial expression plasmids encoding the minimal CRISPR array, Cas8, Cse2-Cas7-Cas5-Cas6 constructs, and Cas8-Cse2-Cas7-Cas5-Cas6 constructs, containing different tags and designs.
- Plasmids that encode Cascade complexes and Cascade complexes from homologous Type I systems can be designed similarly as the exemplary expression plasmid sequences for the Type I-E found in E. coli K-12 MG1655 following the guidance of the present Specification.
- Table 4 additionally contains sequences of expression plasmids expressing Cas8-Cse2-Cas7-Cas5-Cas6 as well as FokI fusions to either the Cas8 gene or the Cas6 gene, for the production of nuclease-Cascade fusions for gene editing experiments.
- Table 5 contains the sequences of single polypromoter bacterial expression plasmids encoding all 5 subunit proteins together with the crRNA from a single bacterial expression plasmid.
- each gene is separated from the other genes it flanks upstream and downstream with a transcriptional promoter and terminator. Additional sequences can be introduced that encode an affinity tag and/or protease recognition tag, as well as a fusion to a nuclease protein, in order to generate a Cascade-nuclease fusion for gene editing.
- Additional bacterial expression plasmids can be designed encoding homologous Cascade complexes from other Type I subtypes and other bacterial or archaeal organisms based on the design criteria herein.
- Such expression plasmids can be designed with genomic DNA sequences for the Cascade genes, or they can be designed with gene sequences that have been codon optimized for expression in E. coli or other bacterial strains.
- In order to express Cascade or effector fusions to Cascade in mammalian cells, such as human cells, eukaryotic expression plasmid vectors were designed to enable expression of the relevant proteins and RNA components by eukaryotic transcription and translation machinery.
- Cascade can be generated in mammalian cells by encoding each of the protein components on a separate expression vector driven by a eukaryotic promoter (e.g., a cytomegalovirus (CMV) promoter), and encoding the crRNA on a separate expression vector driven by an RNA Polymerase III promoter (e.g., the human U6 promoter).
- the CRISPR RNA can be encoded with a minimal CRISPR array containing at least two repeats flanking one or more spacer sequences that function as the guide portion of the mature crRNA.
- the construct generating CRISPR RNA can be designed with additional sequences flanking the outermost repeats in the minimal array. Processing of the precursor CRISPR RNA is enabled by the RNA processing subunit of the Cascade complex (Cas6 subunit protein), which can be expressed from a separate plasmid.
- Table 6 contains the sequences of individual eukaryotic expression plasmids for each protein of the E. coli Type I-E Cascade complex. The Cas8 subunit can be fused to additional effector nuclease domains, such as the FokI nuclease (Example 1 and Example 3). Table 6 also contains the sequences of expression plasmids for the crRNA component of Cascade, encoding two separate dual-guide crRNAs, whereby three repeat sequences flank two spacers. Each of the protein-coding genes can be appended to polynucleotide sequences that encode nuclear localization signals (NLS), affinity tags, and linker sequences connecting those tags.
- Fusions to any of the Cascade subunit proteins can be encoded by additional polynucleotide sequences that typically are appended to either the 5′ or 3′ end of the coding sequence, including additional polynucleotide sequences that encode amino acid linkers connecting the Cascade subunit protein to additional polypeptide sequences of interest. Examples of candidate fusion proteins are described herein.
- Polycistronic expression vectors can be constructed, whereby a single promoter (e.g., CMV promoter) drives simultaneous expression of multiple coding sequences that are separated by a Thosea asigna virus 2A sequence.
- 2A viral peptide sequences induce ribosomal skipping, thus enabling multiple protein-coding genes to be concatenated within a single polycistronic construct for expression in eukaryotic cells.
- polycistronic vectors can be designed that encode 4 or 5 subunits of the Cascade complex on a single transcript driven by a single promoter.
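- The effect of 2A-mediated ribosomal skipping on a concatenated construct can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the subunit sequences are placeholder strings, the function names are hypothetical, the T2A peptide shown is the commonly cited sequence rather than one taken from the Tables, and skipping is modeled as a clean break between the final glycine and proline of the 2A peptide.

```python
# Commonly cited Thosea asigna virus 2A (T2A) peptide; used here for illustration only.
T2A = "EGRGSLLTCGDVEENPGP"


def build_polyprotein(subunits, sep=T2A):
    """Join subunit protein sequences with a 2A peptide between each adjacent pair."""
    return sep.join(subunits)


def ribosomal_skip(polyprotein, sep=T2A):
    """Split the polyprotein at every 2A site, between the final Gly and Pro."""
    products = []
    rest = polyprotein
    while True:
        idx = rest.find(sep)
        if idx == -1:
            products.append(rest)  # last protein has no 2A tail
            return products
        cut = idx + len(sep) - 1   # the final Pro begins the downstream protein
        products.append(rest[:cut])
        rest = rest[cut:]


# Placeholder sequences standing in for three Cascade subunit proteins.
subunits = ["MKKK", "MLLL", "MVVV"]
polyprotein = build_polyprotein(subunits)
products = ribosomal_skip(polyprotein)
```

- In this model every upstream protein retains the 2A residues through the final glycine at its C-terminus, and every downstream protein begins with a proline, which is worth keeping in mind when placing tags or fusions at subunit termini.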
- Table 7 contains the sequences of eukaryotic polycistronic expression plasmids that can be combined with a CRISPR RNA expression plasmid to produce functional Cascade in mammalian cells.
- The CRISPR RNA is encoded within the 3′ untranslated region (UTR) of a protein-coding gene, whose expression is driven by an RNA Polymerase II promoter (e.g., CMV promoter) to produce a transcript.
- the minimal CRISPR array is designed to exist downstream of a protein coding gene such as Cas6, Cas7, or a reporter gene (e.g., an enhanced green fluorescent protein, eGFP), and is separated from the protein coding sequence by a MALAT1 triplex sequence that has previously been shown to confer stability to the upstream transcript.
- When the minimal CRISPR array is processed by the RNA processing subunit of Cascade (typically expressed using a different plasmid), an endonuclease that cleaves the minimal CRISPR array, a break is introduced into the transcript, and the triplex sequence protects the 3′ end of the upstream protein-coding gene from premature exonucleolytic degradation.
- Table 8 contains sequences of three polynucleotide sequences, whereby the CRISPR array is cloned downstream of either Cas6, Cas7, or eGFP, and expression of the entire fusion sequence is driven by a CMV promoter.
- the CRISPR RNA array is encoded on the same vector as the polycistronic construct driving expression of the 5 Cascade subunits; the combination of these two elements generates an all-in-one vector that produces all functional subunits (both protein and RNA) of the Cascade complex, together with any nuclease or effector domains fused to one of the Cascade subunits.
- Table 9 contains two representative sequences of these all-in-one polynucleotide sequences that encode all the respective components to produce functional FokI-Cascade RNPs in mammalian cells.
- Example 3 describes expression systems using separate plasmids expressing each Cascade subunit protein and minimal CRISPR array, expression systems wherein multiple Cascade subunit protein coding sequences are expressed from a single promoter, and an expression system wherein a single plasmid Cascade expression system was constructed to express the entire Cas8-Cse2-Cas7-Cas5-Cas6 operon and a minimal CRISPR array for use in mammalian cells.
- the present invention relates to production of engineered Type I CRISPR-Cas effector complexes by introduction of plasmids encoding one or more components of the engineered Type I CRISPR-Cas effector complexes into host cells.
- Transformed host cells (or recombinant cells) or the progeny of cells that have been transformed or transfected using recombinant DNA techniques can comprise one or more nucleic acid sequences encoding one or more components of an engineered Type I CRISPR-Cas effector complex.
- Methods of introducing polynucleotides (e.g., an expression vector) into host cells are known in the art and are typically selected based on the kind of host cell.
- Such methods include, for example, viral or bacteriophage infection, transfection, conjugation, electroporation, calcium phosphate precipitation, polyethyleneimine-mediated transfection, DEAE-dextran mediated transfection, protoplast fusion, lipofection, liposome-mediated transfection, particle gun technology, microprojectile bombardment, direct microinjection, and nanoparticle-mediated delivery.
- polynucleotides encoding components of engineered Type I CRISPR-Cas effector complexes are introduced into bacterial cells (e.g., E. coli ).
- Example 4 describes a method for introduction and expression of Cas8 protein coding sequences, as well as coding sequences for components of engineered Type I CRISPR-Cas effector complexes for bacterial production of such complexes using E. coli expression systems.
- a variety of exemplary host cells disclosed herein can be used to produce recombinant cells using an engineered Cascade effector complex.
- Such host cells include, but are not limited to, a plant cell, a yeast cell, a bacterial cell, an insect cell, an algal cell, and a mammalian cell.
- The term "transfection" is used below to refer to any method of introducing polynucleotides into a host cell.
- a host cell is transiently or non-transiently transfected with nucleic acid sequences encoding one or more components of a Type I CRISPR-Cas effector complex.
- a cell is transfected as it naturally occurs in a subject.
- a cell that is transfected is first removed from a subject, e.g., a primary cell or progenitor cell.
- the primary cell or progenitor cell is cultured and/or is returned after ex vivo transfection to the same subject or to a different subject.
- Example 9 illustrates the design and delivery of E. coli Type I-E Cascade complexes comprising FokI fusion proteins to facilitate genome editing in human cells.
- the Example describes the delivery of plasmid vectors expressing Cascade complex components into eukaryotic cells.
- the present invention relates to the purification of engineered Type I CRISPR-Cas effector complexes from cells and uses of such complexes.
- Engineered Type I CRISPR-Cas effector complexes are produced in a host cell.
- the engineered Type I CRISPR-Cas effector complexes (in this case Cascade ribonucleoprotein (RNP) complexes) are purified from cell lysates.
- Example 5 describes purification of E. coli Type I-E Cascade RNP complexes produced by overexpression in bacteria as described in Example 4.
- the method uses immobilized metal affinity chromatography followed by size exclusion chromatography.
- the Example also describes methods that can be used to assess the quality of purified Cascade RNP products. Examples are presented illustrating the purification of Cas8, Cas7, Cas6, Cas5, and Cse2 Cascade RNP complexes, Cascade complexes comprising Cas7, Cas6, Cas5, and Cse2 proteins, and FokI-Cas8 fusion proteins.
- the purified, engineered Type I CRISPR-Cas effector complexes can also be used directly in biochemical assays (e.g., binding and/or cleavage assays).
- Example 6 describes production of dsDNA target sequences for use in in vitro DNA binding or cleavage assays.
- the Example describes three methods to produce target sequences, including annealing of synthetic ssDNA oligonucleotides, PCR amplification of selected nucleic acid target sequences from genomic DNA, as well as cloning of nucleic acid target sequences into bacterial plasmids.
- the dsDNA target sequences were used in Cascade binding or cleavage assays.
- the site-specific binding of and/or cutting by one or more engineered Type I CRISPR-Cas effector complexes can be confirmed, if necessary, using an electrophoretic mobility shift assay (see, e.g., Garner, M., et al., Nucleic Acids Research 9(13):3047-3060 (1981); Fried, M., et al., Nucleic Acids Research 9(23):6505-6525 (1981); Fried, M., Electrophoresis 10:366-376 (1989); Gagnon, K., et al., Methods Molecular Biology 703:275-291 (2011); Fillebeen, C., et al., J. Vis. Exp. (94), e52230, doi:10.3791/52230 (2014)), or the biochemical cleavage assay described in Example 7.
- Example 7 demonstrates that engineered Type I CRISPR-Cas effector complexes can exhibit nearly quantitative DNA cleavage, as evidenced by conversion of a supercoiled, circular plasmid substrate into a cleaved, linear form.
- the complexes are introduced directly into a cell, as an alternative to expressing one or more nucleic acid sequences encoding one or more components of engineered Type I CRISPR-Cas effector complexes in a cell.
- the purified, engineered Type I CRISPR-Cas effector complexes can be directly introduced into cells. Methods to introduce the components into a cell include electroporation, lipofection, particle gun technology, and microprojectile bombardment.
- Example 8 illustrates the design and delivery of E. coli Type I-E Cascade complexes comprising Cas subunit protein-FokI fusion proteins to human cells.
- the data in the Example demonstrate delivery of pre-assembled Cascade RNPs into target cells and effective genome editing in human cells.
- the engineered Type I CRISPR-Cas effector complexes described herein can be used to generate non-human transgenic organisms by site specifically introducing a selected polynucleotide sequence (e.g., a portion of a donor polynucleotide) at a DNA target locus in the genome to generate a modification of the genomic DNA.
- the transgenic organism can be an animal or a plant.
- a transgenic animal is typically generated by introducing engineered Type I CRISPR-Cas effector complexes into a zygote cell.
- A basic technique, described with reference to making transgenic mice, involves five basic steps: first, preparation of a system, as described herein, including a suitable donor polynucleotide; second, harvesting of donor zygotes; third, microinjection of the system into the mouse zygote; fourth, implantation of microinjected zygotes into pseudo-pregnant recipient mice; and fifth, performing genotyping and analysis of the modification of the genomic DNA established in founder mice.
- the founder mice will pass the genetic modification to any progeny.
- the founder mice are typically heterozygous for the transgene. Mating between these mice will produce mice that are homozygous for the transgene 25% of the time.
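- The 25% figure follows from single-locus Mendelian segregation and can be checked by enumerating the cross. The short Python sketch below assumes one transgene locus, equal transmission of both alleles, and illustrative allele labels ("T" for the transgene, "t" for the unmodified allele); the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Return the expected genotype frequencies for a single-locus cross."""
    # Each parent contributes one of its two alleles with equal probability.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Mating two heterozygous founders.
ratios = cross("Tt", "Tt")
```

- Under these assumptions the cross gives 25% homozygous transgenic (TT), 50% heterozygous (Tt), and 25% non-transgenic (tt) progeny.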
- a generated transgenic plant typically contains one transgene inserted into one chromosome. It is possible to produce a transgenic plant that is homozygous with respect to a transgene by sexually mating (i.e., selfing) an independent segregant transgenic plant containing a single transgene to itself.
- Typical zygosity assays include, but are not limited to, single nucleotide polymorphism assays and thermal amplification assays that distinguish between homozygotes and heterozygotes.
- the present invention relates to use of engineered Type I CRISPR-Cas effector complexes to create substrate channels.
- fusion proteins comprising substrate channel elements and Cas7 subunit proteins are constructed. These Cas7 fusion proteins are then assembled into an engineered Type I CRISPR-Cas effector complex (e.g., comprising Cse2, Cas5, Cas6, Cas7-substrate channel element fusions, and Cas8).
- the crRNA of the engineered Type I CRISPR-Cas effector complex can be extended to accommodate additional Cas7 subunits (Luo, M., et al., Nucleic Acids Research 44:7385-7394 (2016)).
- Different substrate elements can be fused to Cas7 and then mixed at the desired stoichiometry.
- When these various Cas7 subunits assemble into a complete Type I CRISPR-Cas effector complex, co-localization of substrate elements can improve the efficacy of substrate channeling.
- an RNA scaffold is constructed such that multiple Cas7-substrate channel element fusions can bind to it in the absence of other Type I CRISPR-Cas effector complex components.
- Substrate channel elements can be fused to the N-terminus of Cas7 and/or the C-terminus of Cas7.
- circular permutations of Cas7 can be fused to substrate channel elements.
- FIG. 12A and FIG. 12B present illustrations of substrate channels consisting of three consecutive enzymes in a pathway. Substrate channels facilitate the passing of intermediary metabolic products directly to the active site of the consecutive enzyme in the metabolic pathway chain without release into the extra channel space.
- FIG. 12A illustrates a typical arrangement of an engineered substrate channel. Enzymes E1, E2, and E3 are linked covalently or non-covalently to a scaffold protein (S1, S2, S3) matrix. The substrate is then processed to the product without release to the extra channel space.
- FIG. 12B illustrates one embodiment of the present invention comprising a modified Type I CRISPR-Cas effector complex that carries Enzymes E1, E2, and E3 as fusion proteins to Cas7 subunit proteins, thus creating a substrate channel. cpCas7 proteins and backbones formed of cpCas7 proteins can also be useful in the practice of this aspect of the present invention.
- substrate channel elements can be fused to Cas6.
- the Cas6 subunit of Cascade complexes recognizes specific RNA hairpin structures.
- An RNA scaffold can be constructed that is composed of multiple Cas6 RNA hairpin structures concatenated together. Cas6 peptides from different Cascade complexes have different recognition sequences. Accordingly, RNA scaffolds can be constructed from multiple orthogonal Cas6 RNA hairpins. By fusing different substrate channel elements to orthogonal Cas6 peptides, substrate channel complexes can be assembled in specific stoichiometry.
- Substrate channel elements can be fused to the N-terminus of Cas6 and/or the C-terminus of Cas6.
- circular permutations of Cas6 can be fused to substrate channel elements.
- a heterologous metabolic pathway of interest can be expressed in a model organism, such as E. coli .
- the genes can be codon optimized to express the genes more efficiently.
- the metabolic pathway of interest is the mevalonate pathway from Saccharomyces cerevisiae .
- Substrate channel elements of this pathway include, but are not limited to, acetoacetyl-CoA thiolase (AtoB), hydroxy-methylglutaryl-CoA synthase (HMGS), and hydroxy-methylglutaryl-CoA reductase (HMGR).
- the metabolic pathway of interest is the glycerol synthesis pathway from S. cerevisiae .
- Substrate channel elements of this pathway include, but are not limited to, glycerol-3-phosphate dehydrogenase (GPD1) and glycerol-3-phosphate phosphatase (GPP2).
- the metabolic pathway of interest is the starch hydrolysis pathway from Clostridium stercorarium .
- Substrate channel elements of this pathway include, but are not limited to, CelY and CelZ.
- the metabolic pathway of interest is the glucose phosphotransferase pathway from E. coli .
- Substrate channel elements of this pathway include, but are not limited to, trehalose-6-phosphate synthetase (TPS) and trehalose-6-phosphate phosphatase (TPP).
- the present invention relates to site-directed recruitment of functional domains fused to Cascade subunit proteins by complexes comprising a Class 2 Type II Cas9 protein and a nucleic acid-targeting nucleic acid (NATNA; see e.g., U.S. Pat. No. 9,260,752, issued 16 Feb. 2016; U.S. Pat. No. 9,580,727, issued 28 Feb. 2017; U.S. Pat. No. 9,677,090, issued 13 Jun. 2017; U.S. Pat. No. 9,771,600, issued 26 Sep. 2017; U.S. Pat. No. 9,816,093, issued 14 Nov. 2017).
- Functional domains include, but are not limited to, protein domains having enzymatic function, capable of transcriptional activation, or capable of transcriptional repression.
- Example 13 describes a method of modifying a Class 2 Type II CRISPR sgRNA, crRNA, tracrRNA, or crRNA and tracrRNA sequences with a Class 1 Type I CRISPR repeat stem sequence, allowing for the recruitment of one or more Cascade subunit proteins to a Type II CRISPR Cas protein/guide RNA complex binding site.
- FIG. 13A , FIG. 13B , and FIG. 13C present a generalized illustration of the site-directed recruitment of a functional protein domain fused to a Cascade subunit protein by a dCas9:NATNA complex to a target site.
- a Class 2 Type II CRISPR NATNA ( FIG. 13A, 102 ) comprising a spacer sequence ( FIG. 13A, 101 ) is covalently linked through a linker nucleic acid sequence ( FIG. 13A, 103 ) to a Class 1 Type I CRISPR repeat stem sequence ( FIG. 13A, 104 ).
- The Type II CRISPR NATNA covalently linked to the Type I CRISPR repeat stem sequence ( FIG. 13A, 105 ) is capable of binding to a Type II dCas9 ( FIG. 13A, 106 ) and a Type I Cascade subunit protein (e.g., Cas6; FIG. 13A, 107 ) which is fused through a linker sequence ( FIG. 13A, 108 ) to a functional protein domain (e.g., an enzymatic domain, a transcriptional activation or repression domain; FIG. 13A, 109 ) to form an RNP complex.
- This RNP complex ( FIG. 13B, 110 ) is capable of targeting a double-stranded DNA ( FIG. 13B, 111 ) comprising a target sequence ( FIG. 13B, 112 ) complementary to the Type II CRISPR NATNA spacer sequence ( FIG. 13A, 101 ).
- Target recognition by the RNP complex results in hybridization ( FIG. 13B, 113 ) between the spacer sequence ( FIG. 13A, 101 ) and the target sequence ( FIG. 13B, 112 ).
- Localization of the Cascade subunit-functional domain fusion protein to the DNA allows for modification of the DNA by the functional protein domain or transcriptional regulation of an adjacent gene ( FIG. 13C, 114 ).
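- The complementarity between a spacer and its target sequence can be illustrated with a small Python search over both strands of a dsDNA. This is a sketch under simplifying assumptions: the sequences are short hypothetical placeholders, only perfect Watson-Crick matches are reported, PAM requirements are ignored, and the function names are not taken from the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_target_sites(spacer_rna, top_strand):
    """Return (strand, start) for every site that can base-pair with the spacer.

    Start positions are 0-based and counted 5'->3' along the named strand.
    """
    spacer_dna = spacer_rna.upper().replace("U", "T")
    target = revcomp(spacer_dna)  # strand sequence that hybridizes to the spacer
    hits = []
    for name, seq in (("top", top_strand), ("bottom", revcomp(top_strand))):
        start = seq.find(target)
        while start != -1:
            hits.append((name, start))
            start = seq.find(target, start + 1)
    return hits


# Hypothetical 7-nt spacer and 13-bp duplex, far shorter than real sequences.
sites = find_target_sites("GAUUACA", "CCCTGTAATCGGG")
```

- A real design would add the PAM check and mismatch tolerance appropriate to the Cas protein in use; the sketch shows only the hybridization relationship described above.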
- the present invention relates to compositions comprising engineered Type I CRISPR-Cas effector complexes, modified guide polynucleotides, and combinations thereof.
- the engineered Type I CRISPR-Cas effector complex comprises an associated Cas3 fusion protein.
- An embodiment of this aspect of the present invention relates to a composition comprising two engineered Type I CRISPR-Cas effector complexes each comprising a spacer and a fusion protein comprising a Cas subunit and an endonuclease (e.g., a FokI; see e.g., the Cascade complexes of FIG. 2A , FIG. 2B , and FIG. 2C ), wherein at least two parameters are varied to modulate genome editing efficiency.
- Such parameters include:
- a linker polypeptide used to produce the fusion protein comprising a Cas subunit protein and the endonuclease (e.g., FokI);
- the length of the interspacer distance between the nucleic acid target sequences to which the spacers are capable of binding.
- a first engineered Type I CRISPR-Cas effector complex comprising,
- a first Cse2 subunit protein, a first Cas5 subunit protein, a first Cas6 subunit protein, and a first Cas7 subunit protein,
- a first fusion protein comprising a first Cas8 subunit protein and a first FokI, wherein the N-terminus of the first Cas8 subunit protein or the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the C-terminus or N-terminus, respectively, of the first FokI, and wherein the first linker polypeptide has a length of between 10 amino acids and 40 amino acids, and
- a first guide polynucleotide comprising a first spacer capable of binding a first nucleic acid target sequence
- a second engineered Type I CRISPR-Cas effector complex comprising,
- a second fusion protein comprising a second Cas8 subunit protein and a second FokI, wherein the N-terminus of the second Cas8 subunit protein or the C-terminus of the second Cas8 subunit protein is covalently connected by a second linker polypeptide to the C-terminus or N-terminus, respectively, of the second FokI, and wherein the second linker polypeptide has a length of between 10 amino acids and 40 amino acids, and
- a second guide polynucleotide comprising a second spacer capable of binding a second nucleic acid target sequence, wherein a protospacer adjacent motif (PAM) of the second nucleic acid target sequence and a PAM of the first nucleic acid target sequence have an interspacer distance between 20 bp and 42 bp.
- Examples of such a first engineered Type I CRISPR-Cas effector complex bound to a first nucleic acid target sequence and a second engineered Type I CRISPR-Cas effector complex bound to a second nucleic acid target sequence are illustrated in FIG. 2A , FIG. 2B , and FIG. 2C .
- the length of the first linker polypeptide and/or the second linker polypeptide is a length of between about 15 amino acids and about 30 amino acids, or between about 17 amino acids and about 20 amino acids. In one embodiment, the length of the first linker polypeptide and the second linker polypeptide are the same.
- the first Cas8 subunit protein and the second Cas8 subunit protein can each comprise identical amino acid sequences of the Cas8 subunit protein.
- first Cse2 subunit protein and the second Cse2 subunit protein can each comprise identical amino acid sequences of the Cse2 subunit protein
- first Cas5 subunit protein and the second Cas5 subunit protein can each comprise identical amino acid sequences of the Cas5 subunit protein
- first Cas6 subunit protein and the second Cas6 subunit protein can each comprise identical amino acid sequences of the Cas6 subunit protein
- first Cas7 subunit protein and the second Cas7 subunit protein can each comprise identical amino acid sequences of the Cas7 subunit protein, and combinations thereof.
- the N-terminus of the first Cas8 subunit protein is covalently connected by the first linker polypeptide to the C-terminus of the first FokI
- the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the N-terminus of the first FokI
- the N-terminus of the second Cas8 subunit protein is covalently connected by the second linker polypeptide to the C-terminus of the second FokI
- the C-terminus of the second Cas8 subunit protein is covalently connected by a second linker polypeptide to the N-terminus of the second FokI, and combinations thereof.
- Embodiments of this aspect of the present invention include embodiments wherein the length between the second nucleic acid target sequence and the first nucleic acid target sequence is an interspacer distance between about 22 bp to about 40 bp, between about 26 bp to about 36 bp, between about 29 bp to about 35 bp, or between about 30 bp to about 34 bp.
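- The linker-length and interspacer-distance ranges recited above lend themselves to a simple design check. The Python sketch below copies the numeric bounds from the text; treating the "about" ranges as hard limits, the function names, and the shape of the returned report are assumptions made only for illustration.

```python
# Bounds copied from the text; treating "about" ranges as hard limits is an assumption.
LINKER_RANGE_AA = (10, 40)
INTERSPACER_RANGE_BP = (20, 42)
# Progressively narrower interspacer ranges, listed from widest to narrowest.
NARROWER_INTERSPACER_BP = [(22, 40), (26, 36), (29, 35), (30, 34)]


def in_range(value, bounds):
    """Inclusive range test."""
    low, high = bounds
    return low <= value <= high


def check_pair(linker1_aa, linker2_aa, interspacer_bp):
    """Report which recited ranges a hypothetical FokI-Cascade pair design satisfies."""
    narrower = [r for r in NARROWER_INTERSPACER_BP if in_range(interspacer_bp, r)]
    return {
        "linkers_ok": in_range(linker1_aa, LINKER_RANGE_AA)
        and in_range(linker2_aa, LINKER_RANGE_AA),
        "interspacer_ok": in_range(interspacer_bp, INTERSPACER_RANGE_BP),
        # The ranges are nested, so the last match is the narrowest one.
        "narrowest_range": narrower[-1] if narrower else None,
    }


report = check_pair(linker1_aa=20, linker2_aa=20, interspacer_bp=32)
```

- A check of this kind only confirms that a design falls inside the recited ranges; it does not predict editing efficiency, which the Examples assess experimentally.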
- the first FokI and the second FokI can be monomeric subunits that are capable of associating to form a homodimer, or distinct subunits that are capable of associating to form a heterodimer.
- the guide polynucleotides comprise RNA.
- genomic DNA comprises the PAM of the second nucleic acid target sequence and the PAM of the first nucleic acid target sequence.
- the engineered Type I CRISPR-Cas effector complexes are based on Type I CRISPR-Cas effector complexes of one or more organisms selected from the group consisting of Salmonella enterica, Geothermobacter sp. EPR-M, Methanocella arvoryzae MRE50, Streptococcus thermophilus (strain ND07), S. thermophilus, Pseudomonas sp. S-6-2 and E. coli .
- the engineered Type I CRISPR-Cas effector complexes are based on Type I CRISPR-Cas effector complexes of S. thermophilus, Pseudomonas sp. S-6-2, and/or E. coli.
- Example 18 and Example 20 demonstrate that varying the length of the linker polypeptide used to produce the fusion protein comprising the Cas subunit protein and the FokI and/or varying the length of the interspacer distance between the nucleic acid target sequences to which the spacers are capable of binding facilitate modulation of genome editing efficiency in cells.
- the present invention relates to an engineered Type I CRISPR-Cas effector complex
- a first fusion protein that comprises a Cascade subunit protein (e.g., a Cas8 subunit protein) and a first functional domain (e.g., FokI)
- a second fusion protein that comprises a dCas3* protein and a second functional domain (e.g., FokI).
- the engineered Type I CRISPR-Cas effector complex comprising the first functional domain (e.g., FokI) ( FIG.
- Cas8-linker1-FP1 fusion can bind DNA and can then recruit the dCas3*-second functional domain (e.g., FokI) fusion protein ( FIG. 14A , dCas3*-linker2-FP2).
- the first functional domain FIG. 14A , Cas8-linker1-FP1 fusion
- the second functional domain FIG. 14A
- FIG. 15A illustrates the binding to dsDNA of an engineered Type I CRISPR-Cas effector complex ( FIG. 15A , Cascade) comprising the first functional domain ( FIG. 15A , FD1) connected to a Cas subunit protein ( FIG.
- FIG. 15A striped box
- FIG. 15A Linker 1
- a dCas3* connected to a second functional domain ( FIG. 15A , FD2) via a linker polypeptide ( FIG. 15A , Linker 2) associated with the Cascade complex; thus bringing FD1 and FD2 into proximity and facilitating the interaction of FD1 and FD2.
- Binding of the Cascade complex involves a single PAM sequence ( FIG. 15A , PAM, open box).
- the functional domain being a dimeric endonuclease (e.g., FokI)
- the proximity of FD1 and FD2 facilitates formation of a functional dimer.
- One advantage of this embodiment of the present invention is that a single Cascade complex (recognizing a single PAM sequence) can be used to cleave a double-stranded nucleic acid target sequence, versus using two FokI-Cascade complexes (compare FIG. 15A with FIG. 2A , FIG. 2B , and FIG. 2C ).
- Using two FokI-Cascade complexes requires two PAM sequences in the proper orientation ( FIG. 2A , FIG. 2B , and FIG. 2C ), which can limit selection of proximal nucleic acid target sequences.
- the length and/or composition of the linker polypeptide used to produce the fusion protein comprising a Cas subunit protein and an endonuclease (e.g., FokI)
- Example 21 describes the design and testing of multiple Cas3-FokI linker compositions and lengths and FokI-Cas8 linker compositions and lengths for modulation of genome editing efficiency.
- Another embodiment of this aspect of the invention comprises an engineered Type I CRISPR-Cas effector complex and a fusion protein comprising a dCas3* protein and a functional domain (e.g., cytidine deaminase) connected by a linker polypeptide ( FIG. 14B , dCas3*, Linker, and FP).
- the engineered Type I CRISPR-Cas effector complex can bind DNA and recruit the dCas3*-functional domain (e.g., cytidine deaminase) fusion protein.
- This embodiment can facilitate site-specific targeting of a nucleic acid target sequence for modification by, or interaction with, a functional domain.
- FIG. 15B illustrates an example of an engineered Type I CRISPR-Cas effector complex ( FIG. 15B , Cascade) comprising a fusion protein comprising a dCas3* protein ( FIG. 15B , dCas3*) connected with a functional domain ( FIG. 15B , FD) via a linker polypeptide ( FIG.
- FIG. 15C illustrates another example of an engineered Type I CRISPR-Cas effector complex ( FIG. 15C , Cascade) comprising a fusion protein comprising a dCas3* protein ( FIG. 15C , dCas3*) connected with a functional domain ( FIG. 15C , FD) via a linker polypeptide ( FIG. 15C , Linker), wherein the complex is bound to dsDNA.
- In FIG. 15C , contact of the functional domain with ssDNA is facilitated.
- Some embodiments of the invention can use an engineered Type I CRISPR-Cas effector complex and a mutant form of Cas3 lacking ATPase and/or helicase activity (e.g., the Cas3 can be a nickase).
- the engineered Type I CRISPR-Cas effector complexes can bind DNA and then recruit the ATPase or helicase mutant form of Cas3. This embodiment can facilitate site-specific cleavage of genomic DNA by a mutant form of Cas3.
- the present invention relates to methods of using engineered Type I CRISPR-Cas effector complexes.
- the present invention includes a method of binding a nucleic acid target sequence in a polynucleotide (e.g., dsDNA) comprising providing one or more engineered Type I CRISPR-Cas effector complexes for introduction into a cell or a biochemical reaction and introducing the engineered Type I CRISPR-Cas effector complex(es) into the cell or biochemical reaction, thereby facilitating contact of the engineered Type I CRISPR-Cas effector complex(es) with the polynucleotide.
- a first engineered Type I CRISPR-Cas effector complex comprises a guide complementary to a first nucleic acid target sequence in the polynucleotide and a second engineered Type I CRISPR-Cas effector complex comprises a guide complementary to a second nucleic acid target sequence in the polynucleotide.
- an engineered Type I CRISPR-Cas effector complex comprises a guide complementary to a nucleic acid target sequence in the polynucleotide and further comprises a dCas3* fusion protein capable of associating with the complex.
- a first engineered Type I CRISPR-Cas effector complex binds to a first nucleic acid target sequence and a second engineered Type I CRISPR-Cas effector complex binds to a second nucleic acid target sequence in the polynucleotide.
- an engineered Type I CRISPR-Cas effector complex binds to a nucleic acid target sequence in the polynucleotide, and the effector complex comprises a dCas3* fusion protein associated with the complex.
- Such methods of binding a nucleic acid target sequence can be carried out in vitro (e.g., in a biochemical reaction or in cultured cells; in some embodiments, the cultured cells are human cultured cells that remain in culture and are not introduced into a human); in vivo (e.g., in cells of a living organism, with the proviso that, in some embodiments, the organism is a non-human organism); or ex vivo (e.g., cells removed from a subject, with the proviso that, in some embodiments, the subject is a non-human subject).
- a variety of methods are known in the art to evaluate and/or quantitate interactions between nucleic acid sequences and polypeptides including, but not limited to, the following: chromatin immunoprecipitation (ChIP) assays, DNA electrophoretic mobility shift assays (EMSA), DNA pull-down assays, and microplate capture and detection assays.
- Commercial kits, materials, and reagents are available to practice many of these methods and, for example, can be obtained from the following suppliers: Thermo Scientific (Wilmington, Del.), Signosis (Santa Clara, Calif.), Bio-Rad (Hercules, Calif.), and Promega (Madison, Wis.).
- EMSA see, e.g., Hellman L. M., et al., Nature Protocols 2(8):1849
- the present invention includes a method of cutting a nucleic acid target sequence in a polynucleotide (e.g., a single-strand cut in dsDNA or double-strand cut in dsDNA) comprising providing one or more engineered Type I CRISPR-Cas effector complexes for introduction into a cell or biochemical reaction, and introducing the engineered Type I CRISPR-Cas effector complex(es) into the cell or biochemical reaction, thereby facilitating contact of the engineered Type I CRISPR-Cas effector complex(es) with the polynucleotide.
- a first engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a first nucleic acid target sequence in the polynucleotide and a first nuclease domain (e.g., FokI) ( FIG. 16A , Cascade1)
- a second engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a second nucleic acid target sequence in the polynucleotide and a second nuclease domain (e.g., FokI) ( FIG. 16A , Cascade 2) are introduced into the cell or biochemical reaction.
- an engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a nucleic acid target sequence in the polynucleotide and a first nuclease domain (e.g., FokI) ( FIG. 17A , Cascade), and a dCas3*-second nuclease domain (e.g., FokI) fusion protein ( FIG. 17A , dCas3) capable of associating with the complex are introduced into the cell or biochemical reaction.
- the contacting results in cutting of the nucleic acid target sequence(s) in the polynucleotide (e.g., a dsDNA) by the engineered Type I CRISPR-Cas effector complex(es).
- the first engineered Type I CRISPR-Cas effector complex binds to the first nucleic acid target sequence in dsDNA ( FIG. 16B , Cascade1) and cleaves the first strand of a dsDNA ( FIG. 16C , Cascade1)
- the second engineered Type I CRISPR-Cas effector complex binds to the second nucleic acid target sequence in dsDNA ( FIG.
- the engineered Type I CRISPR-Cas effector complex binds to a nucleic acid target sequence in dsDNA ( FIG. 17B , Cascade) and cleaves the first strand of a dsDNA ( FIG. 17C , Cascade), and the dCas3* fusion protein associates with the complex ( FIG. 17B , dCas3*) and cleaves the second strand of the dsDNA ( FIG. 17C , dCas3*).
- a donor polynucleotide can also be introduced into a cell to facilitate incorporation of at least a portion of the donor polynucleotide into genomic DNA of the cell.
- FIG. 18A illustrates an example of both strands of a dsDNA being cleaved by a first engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a first nucleic acid target sequence in the polynucleotide and a first nuclease domain (e.g., FokI) ( FIG.
- FIG. 18A illustrates a second engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a second nucleic acid target sequence in the polynucleotide and a second nuclease domain (e.g., FokI) ( FIG. 18A , Cascade 2).
- FIG. 18B illustrates a donor polynucleotide comprising homology arms complementary to DNA sequences adjacent the double-strand cut site ( FIG. 18B , Donor, dashed lines).
- FIG. 18C illustrates incorporation of a portion of the donor polynucleotide ( FIG. 18C dashed lines) at the double-strand cut site. Incorporation of the donor polynucleotide is mediated by cellular DNA repair mechanisms (e.g., homology-directed repair).
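- The donor design and incorporation illustrated in FIG. 18B and FIG. 18C can be sketched on a single-strand string model. This is an illustrative simplification only: the function names, the toy sequences, and the representation of the genome as one top-strand string are assumptions, and cellular homology-directed repair is reduced to an exact-match check on the two homology arms.

```python
def build_donor(genome, cut_site, arm_length, insert):
    """Build a donor: left homology arm + insert + right homology arm.
    The arms copy the sequence immediately flanking the cut site."""
    if arm_length <= 0 or cut_site < arm_length or cut_site + arm_length > len(genome):
        raise ValueError("homology arms must fit inside the genome sequence")
    left_arm = genome[cut_site - arm_length:cut_site]
    right_arm = genome[cut_site:cut_site + arm_length]
    return left_arm + insert + right_arm


def simulate_hdr(genome, cut_site, arm_length, donor):
    """Toy model of homology-directed repair: if both donor arms match the
    flanks of the cut site, place the donor's insert at the cut site."""
    if arm_length <= 0 or len(donor) < 2 * arm_length:
        raise ValueError("donor is shorter than its two homology arms")
    left_arm = donor[:arm_length]
    right_arm = donor[len(donor) - arm_length:]
    insert = donor[arm_length:len(donor) - arm_length]
    if genome[cut_site - arm_length:cut_site] != left_arm:
        raise ValueError("left homology arm does not match the genome")
    if genome[cut_site:cut_site + arm_length] != right_arm:
        raise ValueError("right homology arm does not match the genome")
    return genome[:cut_site] + insert + genome[cut_site:]


if __name__ == "__main__":
    genome = "AAACCCGGGTTT"  # toy sequence, not from the patent
    donor = build_donor(genome, cut_site=6, arm_length=3, insert="TAG")
    print(donor)
    print(simulate_hdr(genome, 6, 3, donor))
```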
- an engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a first nucleic acid target sequence in a polynucleotide and a first nuclease domain can be paired with a second component comprising a second nuclease domain, wherein the second component is capable of binding to a second nucleic acid target sequence in the polynucleotide.
- Examples of second components include a transcription activator-like effector nuclease (TALEN) comprising the second nuclease domain, a zinc finger comprising the second nuclease domain, or a dCas9/NATNA complex comprising the second nuclease domain.
- the nucleic acid target sequence is dsDNA (e.g., genomic DNA).
- the nucleic acid target sequence is double-stranded and one or both of the strands is cut. Such methods of cutting a nucleic acid target sequence can be carried out in vitro, in vivo, or ex vivo.
- the present invention includes a method of modifying one or more nucleic acid target sequences in a polynucleotide (e.g., DNA) in a cell or biochemical reaction comprising providing one or more engineered Type I CRISPR-Cas effector complexes (e.g., comprising a Cas subunit protein-cytidine deaminase fusion protein) for introduction into the cell or the biochemical reaction, and introducing the engineered Type I CRISPR-Cas effector complex(es) into the cell or biochemical reaction, thereby facilitating contact of the engineered Type I CRISPR-Cas effector complex(es) with the polynucleotide resulting in binding of the engineered Type I CRISPR-Cas effector complex(es) to the nucleic acid target sequence(s) in the polynucleotide that facilitates modification of the nucleic acid target sequence(s) (e.g., C-to-T, G-to-A,
- FIG. 19A to FIG. 19D illustrate an example of using a Cascade complex comprising a Cas subunit protein-linker polypeptide-cytidine deaminase fusion protein (Cascade/CD complex) to modify a target nucleotide in genomic DNA of a cell.
- the Cascade/CD complex ( FIG. 19A ) is introduced into the cell.
- the Cascade/CD complex comprises a guide complementary to a DNA target sequence adjacent a target cytosine ( FIG. 19B , FIG. 19C ).
- the Cascade/CD complex binds the DNA target sequence ( FIG. 19B ) and the cytidine deaminase converts the cytosine to a uracil ( FIG. 19C ).
- Cellular repair mechanisms can then repair the uracil to a thymine, and change the mismatched guanine to adenine ( FIG. 19D ).
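- The base change described for FIG. 19A to FIG. 19D (cytosine to uracil, then repair to a T:A pair) can be traced on a toy sequence. This is a minimal sketch with hypothetical function names and an invented five-base sequence; the partner strand is written position-by-position against the edited strand rather than in 5′ to 3′ order.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand):
    """Base-pairing partner of each position, in the same index order."""
    return "".join(COMPLEMENT[base] for base in strand)


def deaminate(strand, position):
    """Model the cytidine deaminase step: the target C becomes U."""
    if strand[position] != "C":
        raise ValueError("target base is not a cytosine")
    return strand[:position] + "U" + strand[position + 1:]


def repair(strand_with_u):
    """Model cellular repair: U is read as T, and the partner strand is
    rewritten to pair with it (the former G becomes A)."""
    edited = strand_with_u.replace("U", "T")
    return edited, partner_strand(edited)


if __name__ == "__main__":
    top = "ATCGA"  # toy sequence, not from the patent
    print(top, partner_strand(top))
    intermediate = deaminate(top, 2)
    print(intermediate)
    print(repair(intermediate))
```

Comparing the partner strand before and after repair shows the G-to-A change on the strand opposite the edited cytosine.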
- the present invention includes methods of modulating in vitro or in vivo transcription, for example, transcription of a gene comprising regulatory element sequences.
- Such methods comprise providing one or more engineered Type I CRISPR-Cas effector complexes (e.g., comprising a Cas subunit protein-transcription factor fusion protein) for introduction into the cell or the biochemical reaction, and introducing the engineered Type I CRISPR-Cas effector complex(es) into the cell or biochemical reaction, thereby facilitating contact of the engineered Type I CRISPR-Cas effector complex(es) with the regulatory element sequences resulting in binding of the engineered Type I CRISPR-Cas effector complex(es) to the regulatory element sequences thereby facilitating modulating in vitro or in vivo transcription of the gene comprising the regulatory element sequences.
- FIG. 20A and FIG. 20B present general illustrations of examples for the transcriptional activation of a generic gene (“GENE1”).
- FIG. 20A provides an overview of transcriptional regulation of an endogenous gene in a eukaryotic cell.
- the two dark parallel lines represent double-stranded DNA, the location of Gene 1 ( FIG. 20A , GENE 1) is indicated, as well as the transcriptional start site ( FIG. 20A , TSS) associated with Gene 1.
- a transcription factor ( FIG. 20A , TF)
- RNA polymerase II ( FIG. 20A , Pol II)
- the second panel illustrates association of the TF with its cognate TSS.
- the TF then recruits a transcription activation protein (TP) that then recruits RNA Polymerase II (Pol II).
- the TF and the TP form a complex comprising multiple proteins and possibly other molecules.
- the third panel illustrates the resulting transcription of Gene 1 by Pol II. This type of transcriptional activation is typically dependent on TF(s) that are specific to the expression of a gene(s).
- FIG. 20B presents an illustration of one embodiment of the present invention, wherein a Cascade complex is modified with a protein or factor ( FIG.
- CASCADEa that attracts one or more components in the cells responsible for transcriptional activation (Transcriptional Activation factor; FIG. 20B , TA).
- An example of one such protein or factor is the protein VP64.
- CASCADEa comprises a guide that is capable of binding at or near the TSS ( FIG. 20B , TSS).
- the two dark parallel lines represent double-stranded DNA, the location of Gene 1 ( FIG. 20B , GENE 1) is indicated, as well as the transcriptional start site (TSS) associated with Gene 1.
- the second panel illustrates association of CASCADEa with its target, the TSS.
- the CASCADEa then recruits a transcription activation protein ( FIG. 20B , TA) that then recruits RNA Polymerase II ( FIG. 20B , Pol II).
- the third panel illustrates the resulting transcription of Gene 1 by Pol II.
- FIG. 21A and FIG. 21B present a general illustration of an example for the transcriptional repression of a generic gene ( FIG. 21A , GENE 1) using a Cascade complex comprising a Cas subunit protein-KRAB domain fusion and a guide ( FIG. 21A , CASCADEi) complementary to regulatory sequences ( FIG. 21A , promoter) associated with GENE 1. Binding of CASCADEi to the regulatory sequences ( FIG. 21B ) results in transcriptional repression of GENE 1.
- the present invention relates to using Type I CRISPR systems and Cas3 to delete nucleic acid target sequences in a 3′ to 5′ manner.
- This method can be used to make long range deletions of a specific length and can be useful for creation of gene knockouts.
- a region of a target polynucleotide (e.g., genomic DNA) can be deleted using a combination of a Cascade complex comprising a guide complementary to a first nucleic acid target sequence in the target polynucleotide and a dCas9/NATNA complex wherein the NATNA comprises a spacer sequence complementary to a second nucleic acid target sequence in the target polynucleotide.
- the first and second nucleic acid target sequences are selected to flank the nucleic acid target sequence targeted for deletion.
- a Cas3 protein comprising an active endonuclease activity associates with the Cascade complex and then progressively deletes a single strand of the dsDNA comprising the nucleic acid target sequence targeted for deletion.
- when the Cas3 protein collides with the dCas9/NATNA complex, the Cas3 nuclease activity can be stopped at the second nucleic acid target sequence by the dCas9/NATNA complex.
- FIG. 22A to FIG. 22D illustrate an example of a Cas3 deletion of a nucleic acid target sequence.
- FIG. 22A shows a dsDNA comprising nucleic acid target sequence 1 ( FIG. 22A , NATS1) and nucleic acid target sequence 2 ( FIG.
- FIG. 22A shows the Cascade complex comprising a guide complementary to NATS1 ( FIG. 22A , Cascade), the Cas3 protein ( FIG. 22A , Cas3), and the dCas9/NATNA complex comprising a spacer complementary to NATS2 ( FIG. 22A , dCas9).
- FIG. 22B shows binding of the Cascade complex to NATS1, association of the Cas3 protein with the Cascade complex, and binding of the dCas9/NATNA complex to NATS2.
- FIG. 22C illustrates the progressive deletion by Cas3 of a single strand of the nucleic acid target sequence targeted for deletion.
- FIG. 22D shows the dissociation of the Cas3 protein from the dsDNA at the position of the dCas9/NATNA complex bound to NATS2.
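- The region targeted for deletion in FIG. 22A to FIG. 22D is defined by the two flanking target sequences. The sketch below locates that region on a toy string. The function names and sequences are hypothetical, NATS1 is assumed to lie upstream of NATS2 in the string, and the model reports only the interval between the two sites; it does not represent strand polarity, the single-strand nature of the Cas3 degradation, or subsequent cellular repair.

```python
def flanked_region(sequence, nats1, nats2):
    """Return (start, end) of the interval between NATS1 and NATS2.
    start is the first base after NATS1; end is the first base of NATS2."""
    i = sequence.find(nats1)
    j = sequence.find(nats2)
    if i == -1 or j == -1:
        raise ValueError("both target sequences must be present")
    start, end = i + len(nats1), j
    if start > end:
        raise ValueError("this model assumes NATS1 lies upstream of NATS2")
    return start, end


def delete_region(sequence, start, end):
    """Remove the interval [start, end) from the sequence."""
    return sequence[:start] + sequence[end:]


if __name__ == "__main__":
    seq = "GGAAACCCCCTTTGG"  # toy sequence: AAA = NATS1, TTT = NATS2
    start, end = flanked_region(seq, "AAA", "TTT")
    print(seq[start:end])
    print(delete_region(seq, start, end))
```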
- a region of a target polynucleotide (e.g., genomic DNA) can be deleted using a combination of a first Cascade complex comprising a guide complementary to a first nucleic acid target sequence in the target polynucleotide and a second Cascade complex comprising a guide complementary to a second nucleic acid target sequence in the target polynucleotide.
- the first and second nucleic acid target sequences are selected to flank the nucleic acid target sequence targeted for deletion.
- Cas3 proteins comprising active endonuclease activity associate with each Cascade complex and then progressively delete both strands of the nucleic acid target sequence targeted for deletion.
- FIG. 23A to FIG. 23D illustrate an example of a Cas3 deletion of both strands of a nucleic acid target sequence.
- FIG. 23A shows a dsDNA comprising nucleic acid target sequence 1 ( FIG. 23A , NATS1) and nucleic acid target sequence 2 ( FIG. 23A , NATS2) that flank the nucleic acid target sequence targeted for deletion.
- FIG. 23A shows the first Cascade complex comprising a guide complementary to NATS1 ( FIG. 23A , Cascade1), the Cas3 proteins ( FIG.
- FIG. 23A Cas3
- FIG. 23B shows binding of the Cascade complexes to NATS1 and NATS2, as well as association of the Cas3 proteins with the Cascade complexes.
- FIG. 23C illustrates the progressive deletion by Cas3 of both strands of the nucleic acid target sequence targeted for deletion.
- FIG. 23D shows the dissociation of the Cas3 proteins from the dsDNA at the positions of the Cascade complexes bound to NATS1 and NATS2.
- a kit includes a package with one or more containers holding the kit elements, as one or more separate compositions or, optionally if the compatibility of the components allows, as an admixture.
- a kit also comprises one or more of the following excipients: a buffer, a buffering agent, a salt, a sterile aqueous solution, a preservative, and combinations thereof.
- kits can comprise one or more engineered Type I CRISPR-Cas effector complexes and one or more excipients, or one or more nucleic acid sequences encoding one or more components of engineered Type I CRISPR-Cas effector complexes.
- kits can further comprise instructions for using engineered Type I CRISPR-Cas effector complex compositions.
- Another aspect of the invention relates to methods of making or manufacturing one or more engineered Type I CRISPR-Cas effector complexes, or components thereof.
- a method of making or manufacturing comprises production of engineered Type I CRISPR-Cas effector complexes in a cell and purification of the engineered Type I CRISPR-Cas effector complexes from cell lysates.
- Engineered Type I CRISPR-Cas effector complex compositions can further comprise a detectable label, such as a moiety that can provide a detectable signal.
- detectable labels include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore (FAM), a fluorescent protein (green fluorescent protein (GFP), red fluorescent protein, mCherry, tdTomato), a DNA or RNA aptamer together with a suitable fluorophore (enhanced GFP (eGFP), “Spinach”), a quantum dot, an antibody, and the like.
- a large number and variety of suitable detectable labels are well-known to one of ordinary
- Cells comprising engineered Type I CRISPR-Cas effector complexes, cells modified through the use of engineered Type I CRISPR-Cas effector complexes, or progeny of such cells can be used as pharmaceutical compositions formulated, for example, with a pharmaceutically acceptable excipient.
- pharmaceutically acceptable excipients include carriers, stabilizers, diluents, dispersing agents, suspending agents, thickening agents, and the like.
- the pharmaceutical compositions can facilitate administration of engineered Type I CRISPR-Cas effector complexes to a subject.
- Pharmaceutical compositions can be administered in therapeutically effective amounts by various forms and routes including, for example, intravenous, subcutaneous, intramuscular, oral, aerosol, parenteral, ophthalmic, and pulmonary administration.
- Embodiments of the present invention include, but are not limited to, the following.
- a composition comprising:
- a first engineered Class 1 Type I CRISPR-Cas effector complex comprising,
- a second engineered Class 1 Type I CRISPR-Cas effector complex comprising,
- composition of embodiment 1, wherein the first linker polypeptide has a length of between 15 amino acids and 30 amino acids.
- composition of embodiment 2, wherein the first linker polypeptide has a length of between 17 amino acids and 20 amino acids.
- composition of embodiment 4, wherein the second linker polypeptide has a length of between 17 amino acids and 20 amino acids.
- composition of any preceding embodiment, wherein the N-terminus of the second Cas8 subunit protein is covalently connected by the second linker polypeptide to the C-terminus of the second FokI.
- composition of any preceding embodiment wherein the first Cse2 subunit protein and the second Cse2 subunit protein each comprises identical amino acid sequences, the first Cas5 subunit protein and the second Cas5 subunit protein each comprises identical amino acid sequences, the first Cas6 subunit protein and the second Cas6 subunit protein each comprises identical amino acid sequences, and the first Cas7 subunit protein and the second Cas7 subunit protein each comprises identical amino acid sequences.
- composition of any preceding embodiment, wherein the first guide polynucleotide comprises RNA.
- composition of any preceding embodiment, wherein the second guide polynucleotide comprises RNA.
- genomic DNA comprises the PAM of the second nucleic acid target sequence and the PAM of the first nucleic acid target sequence.
- One or more expression cassettes comprising the one or more nucleic acid sequences of embodiment 26, embodiment 27, or embodiment 26 and embodiment 27.
- One or more vectors comprising the one or more expression cassettes of embodiment 28.
- a method of binding a polynucleotide comprising the first nucleic acid target sequence and the second nucleic acid target sequence comprising:
- composition into the cell or the biochemical reaction, thereby facilitating contact of the first engineered Class 1 Type I CRISPR-Cas effector complex with the first nucleic acid target sequence and contact of the second engineered Class 1 Type I CRISPR-Cas effector complex with the second nucleic acid target sequence, resulting in binding of the first engineered Class 1 Type I CRISPR-Cas effector complex with the first nucleic acid target sequence and binding of the second engineered Class 1 Type I CRISPR-Cas effector complex with the second nucleic acid target sequence in the polynucleotide.
- genomic DNA comprises the polynucleotide.
- composition into the cell or the biochemical reaction, thereby facilitating contact of the first engineered Class 1 Type I CRISPR-Cas effector complex with the first nucleic acid target sequence and contact of the second engineered Class 1 Type I CRISPR-Cas effector complex with the second nucleic acid target sequence, resulting in cutting of the first nucleic acid target sequence by the first engineered Class 1 Type I CRISPR-Cas effector complex and cutting of the second nucleic acid target sequence by the second engineered Class 1 Type I CRISPR-Cas effector complex.
- genomic DNA comprises the polynucleotide.
- a kit comprising: the composition of any one of embodiments 1-21; and a buffer.
- a kit comprising: the one or more nucleic acid sequences of embodiment 26, embodiment 27, or embodiment 26 and embodiment 27; and a buffer.
- a composition comprising:
- a second fusion protein comprising an engineered Class 1 Type I CRISPR-Cas3 fusion protein comprising a dCas3* protein and a second FokI, wherein the N-terminus of the dCas3* protein or the C-terminus of the dCas3* protein is covalently connected by a second linker polypeptide to the C-terminus or N-terminus, respectively, of the second FokI, and wherein the first linker polypeptide has a length of between 10 amino acids to 40 amino acids, effector complex comprising,
- composition of embodiment 36, wherein the first linker polypeptide has a length of between 5 amino acids to 40 amino acids.
- a cell comprising: the composition of any one of embodiments 36 to 38.
- the cell of embodiment 39 wherein the cell is a prokaryotic cell.
- the cell of embodiment 39 wherein the cell is a eukaryotic cell.
- One or more nucleic acid sequences encoding the second fusion protein of any one of embodiments 36 to 38 are provided.
- One or more expression cassettes comprising the one or more nucleic acid sequences of embodiment 42, embodiment 43, or embodiment 42 and embodiment 43.
- One or more vectors comprising the one or more expression cassettes of embodiment 44.
- a method of binding a polynucleotide comprising the nucleic acid target sequence comprising:
- composition into the cell or the biochemical reaction, thereby facilitating contact of the engineered Class 1 Type I CRISPR-Cas effector complex with the nucleic acid target sequence and contact of the second fusion protein with the engineered Class 1 Type I CRISPR-Cas effector complex, resulting in binding of the engineered Class 1 Type I CRISPR-Cas effector complex and the second fusion protein to the nucleic acid target sequence in the polynucleotide.
- genomic DNA comprises the polynucleotide.
- a method of cutting a polynucleotide comprising the nucleic acid target sequence comprising:
- composition into the cell or the biochemical reaction, thereby facilitating contact of the first engineered Class 1 Type I CRISPR-Cas effector complex with the first nucleic acid target sequence and contact of the second engineered Class 1 Type I CRISPR-Cas effector complex with the second nucleic acid target sequence, and
- composition into the cell or the biochemical reaction, thereby facilitating contact of the second engineered Class 1 Type I CRISPR-Cas effector complex with the nucleic acid target sequence and contact of the second fusion protein with the engineered Class 1 Type I CRISPR-Cas effector complex, resulting in cutting of the nucleic acid target sequence by the engineered Class 1 Type I CRISPR-Cas effector complex and the second fusion protein.
- genomic DNA comprises the polynucleotide.
- a kit comprising: the composition of any one of embodiments 36 to 38; and a buffer.
- a kit comprising: the one or more nucleic acid sequences of embodiment 42, embodiment 43, or embodiment 42 and embodiment 43; and a buffer.
- This Example provides a description of the design of polynucleotide components encoding Cascade using gene, protein, and CRISPR sequences derived from a Type I-E CRISPR-Cas system.
- Table 10 presents polynucleotide DNA sequences of genes encoding the five proteins of Cascade from Type I-E, specifically from E. coli strain K-12 MG1655, as well as the amino acid sequences of the resulting protein components. Genomic sequences were obtained from NCBI Reference Sequence NZ_CP014225.1. In the Table, polynucleotide sequences were either amplified from E. coli genomic DNA or were manufacturer-produced polynucleotides encoding Cascade protein components that were codon optimized specifically for expression in E. coli and also for expression in human cells.
- fusion proteins comprising Cascade proteins were designed.
- Table 11 presents polynucleotide DNA sequences of genes encoding Cascade protein fusion proteins, as well as the amino acid sequences of the resulting protein components.
- fusion proteins described in Table 11 include short tri-amino acid linkers connecting the two polypeptide sequences within the fusion construct; this linker typically comprises glycine-glycine-serine (GGS) or glycine-serine-glycine (GSG).
- the His6 (hexahistidine; SEQ ID NO:418) and Strep-Tag™ II (GE Healthcare Bio-Sciences, Pittsburgh, Pa.) (SEQ ID NO:419) peptide tags on the Cse2 protein, when co-expressed with other Cascade proteins, enable purification of the complex via either nickel-nitrilotriacetic acid (Ni-NTA) resin or Strep-Tactin™ (GE Healthcare Bio-Sciences, Pittsburgh, Pa.) resin, respectively.
- the HRV3C (human rhinovirus 3C) protease recognition sequence (SEQ ID NO:420) is cleaved by an HRV3C protease and can be used to remove N-terminal fusions from a protein of interest.
- the NLS (nuclear localization signal; SEQ ID NO:421) peptide tag on the Cas6, Cas7, and/or Cas8 proteins enables nuclear trafficking in eukaryotic systems.
- the HA (hemagglutinin; SEQ ID NO:422) peptide tag on the Cas6 or Cas7 proteins enables detection of heterologous protein expression by Western blotting with an anti-HA antibody.
- the MBP (maltose binding protein; SEQ ID NO:423) peptide fusion is a solubilization tag that facilitates purification of the Cas8 protein.
- the TEV (tobacco etch virus) protease recognition sequence (SEQ ID NO:424) is cleaved by TEV protease and can be used to remove N-terminal fusions from a protein of interest.
- The FokI nuclease domain comprises the Sharkey variant described by Guo, et al. (Guo, J., et al., J. Mol. Biol. 400:96-107 (2010)). Two monomeric FokI subunits associate to form a homodimer and catalyze double-stranded DNA cleavage upon homodimerization.
- a linker sequence (SEQ ID NO:425) is used to fuse the FokI nuclease domain to the Cas8 protein.
- Table 13 contains the polynucleotide DNA sequence of four minimal CRISPR arrays that, when transcribed into precursor crRNA and processed by the RNA endonuclease protein of Cascade, generate mature crRNAs that function as the guide RNA to target complementary DNA sequences in biochemical assays and in cell culture gene editing experiments.
- the minimal CRISPR array comprises two repeat sequences (underlined, lower case) flanking a spacer sequence, which represents the guide portion of the crRNA.
- RNA processing by the Cascade endonuclease protein generates a crRNA with repeat sequences on both the 5′ and 3′ ends, flanking the guide sequence.
- The CRISPR array may also be expanded to include three repeat sequences (underlined) flanking two spacer sequences, which represent the guide portions of two distinct crRNAs generated through RNA processing by the Cascade endonuclease protein.
- the arrays can be further expanded to include additional spacer sequences, if desired.
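- By way of illustration only, the repeat-spacer arrangement described above can be sketched in Python. The repeat and spacer strings below are hypothetical placeholders and are not the sequences of Table 13; the function name `build_crispr_array` is likewise an assumption introduced for this sketch. Repeats are written in lower case and spacers in upper case, following the convention noted for the Table.

```python
# Hypothetical placeholder sequences; NOT the sequences listed in Table 13.
REPEAT = "gattacagattc"
SPACER_1 = "AAAACCCCGGGGTTTT"
SPACER_2 = "TTTTGGGGCCCCAAAA"


def build_crispr_array(repeat: str, spacers: list[str]) -> str:
    """Return repeat-(spacer-repeat)n, so n spacers are flanked by n + 1 repeats."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts = [repeat.lower()]
    for spacer in spacers:
        parts.append(spacer.upper())   # guide portion of one crRNA
        parts.append(repeat.lower())   # repeat recognized during crRNA processing
    return "".join(parts)


minimal_array = build_crispr_array(REPEAT, [SPACER_1])             # 2 repeats, 1 spacer
expanded_array = build_crispr_array(REPEAT, [SPACER_1, SPACER_2])  # 3 repeats, 2 spacers
print(minimal_array)
print(expanded_array)
```

- Passing additional spacers to the same function extends the array in the manner described above.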
- CRISPR Array Sequences (Table 13 column headings: SEQ ID NO, cell type, target, minimal CRISPR array sequence).
- This Example describes the design of bacterial expression vectors that encode the Cascade-associated proteins, as well as a minimal CRISPR array comprising the guide sequence as described in Example 1.
- the construction of Cascade subunit protein expression systems for use with plasmids encoding minimal CRISPR arrays is described.
- a single-plasmid Cascade protein expression system was constructed to express the proteins of either a complex of Cascade in E. coli, known as the CasBCDE complex (which contains the Cse2, Cas7, Cas5, and Cas6 proteins, but not the Cas8 protein), or the entire functional Cascade complex in E. coli.
- the single plasmid system comprises either the cse2-cas7-cas5-cas6 operon, or the entire cas8-cse2-cas7-cas5-cas6 operon on a single expression plasmid.
- the Cas8 protein can be expressed from its own expression plasmid, for use in biochemical experiments where it is mixed together with the CasBCDE complex to reconstitute Cascade.
- the single plasmid Cascade protein expression system comprising a Cas operon was assembled as follows.
- the coding sequences for the cas genes were arranged in the order cse2-cas7-cas5-cas6 (CasBCDE complex) or cas8-cse2-cas7-cas5-cas6 (full Cascade complex), and were separated by sequences corresponding to the wild-type bacterial gene arrangement (see NCBI Reference Sequence NZ_CP014225.1).
- the cse2-cas7-cas5-cas6 and cas8-cse2-cas7-cas5-cas6 operons were cloned into the pCDF (MilliporeSigma, Hayward, Calif.) vector backbone, which confers spectinomycin resistance due to the presence of the aadA gene. Transcription of the operon is driven by a T7 promoter and is under control of the Lac operator; the vector also encodes the LacI repressor. A T7 terminator was cloned downstream of the cse2-cas7-cas5-cas6 or cas8-cse2-cas7-cas5-cas6 operon.
- the vector contains a CDF origin of replication.
- the cas8 gene was cloned into a pET (MilliporeSigma, Hayward, Calif.) family vector backbone, which confers kanamycin resistance due to the presence of the kanR gene. Transcription of the gene is driven by a T7 promoter (PT7), and is under control of the Lac operator (lacO); the vector also encodes the LacI repressor (lacI gene). A T7 terminator was cloned downstream of the cas8 gene. The vector contains a ColE1 origin of replication.
- FIG. 24A , FIG. 24B , FIG. 24C , FIG. 24D , and FIG. 24E present schematic diagrams of overexpression vectors for the cas8, fokI-cas8, the cse2-cas7-cas5-cas6 operon, the cas8-cse2-cas7-cas5-cas6 operon, and the fokI-cas8-cse2-cas7-cas5-cas6 operon.
- The designations shown in FIG. 24A through FIG. 24E are as follows: PT7 (T7 promoter), lacO (Lac operator), His6 (hexahistidine), MBP (maltose binding protein), Strep-Tag™ II, HRV3C (human rhinovirus 3C protease recognition sequence), TEV (tobacco etch virus protease recognition sequence), NLS (nuclear localization signal), kanR (kanamycin resistance gene), lacI (LacI repressor gene), colE1 ori (origin of replication), CDF ori (CloDF13 origin of replication), FokI (FokI nuclease domain, Sharkey variant), and aadA (gene encoding aminoglycoside resistance protein).
- Table 14 provides sequences of bacterial expression plasmids encoding the Cas8 protein, the 4 proteins of the CasBCDE complex (cse2-cas7-cas5-cas6 operon), and all 5 proteins of the Cascade complex (cas8-cse2-cas7-cas5-cas6 operon). Polynucleotide sequences are provided with and without the N-terminal FokI fusion on the Cas8 protein.
- the protein expression vectors encoding the cse2-cas7-cas5-cas6 operon or the cas8-cse2-cas7-cas5-cas6 operon are combined with a vector containing a minimal CRISPR array.
- CRISPR arrays were cloned into the pACYC-Duet1 vector backbone, which confers chloramphenicol resistance due to the camR gene. Transcription of the array is driven by a T7 promoter and is under control of the Lac operator (lacO); the vector also encodes the LacI repressor. A T7 terminator was cloned downstream of the CRISPR array.
- the vector contains a p15A origin of replication.
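- By way of illustration only, the vector backbones described in this Example can be tabulated to check which plasmids may be co-maintained in one cell. The sketch below, in Python, applies a simplified rule of thumb (distinct origins of replication and distinct selection markers); the class name `Plasmid` and the helper names are assumptions introduced for this sketch, and the rule is not a statement about any particular experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str
    origin: str      # origin of replication, as described in this Example
    resistance: str  # antibiotic used for selection


CASCADE_OPERON = Plasmid("pCDF Cascade operon vector", "CDF", "spectinomycin")
CAS8_ONLY = Plasmid("pET-family cas8 vector", "ColE1", "kanamycin")
CRISPR_ARRAY = Plasmid("pACYC-Duet1 CRISPR array vector", "p15A", "chloramphenicol")


def can_coexist(plasmids: list[Plasmid]) -> bool:
    """Simplified check: every plasmid needs a unique origin and a unique marker."""
    origins = [p.origin for p in plasmids]
    markers = [p.resistance for p in plasmids]
    return len(set(origins)) == len(origins) and len(set(markers)) == len(markers)


def selection_antibiotics(plasmids: list[Plasmid]) -> list[str]:
    """Antibiotics needed to select for all plasmids at once."""
    return sorted(p.resistance for p in plasmids)


pair = [CASCADE_OPERON, CRISPR_ARRAY]
print(can_coexist(pair), selection_antibiotics(pair))
```

- Under this simplified rule, the pCDF and pACYC-Duet1 backbones form a compatible pair selected with chloramphenicol and spectinomycin.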
- FIG. 25 contains a schematic diagram of an expression vector containing a CRISPR array with 2 repeats ( FIG. 25 , “repeats”) and 1 spacer ( FIG. 25 , “spacer”).
- the array can be expanded, as described herein.
- the designations in FIG. 25 are described in this Example and in Example 1 and are as follows: PT7 (T7 promoter), lacO (Lac operator), lacI (LacI repressor gene), p15A ori (origin of replication), and camR (chloramphenicol resistance gene).
- Table 15 provides the sequences of bacterial expression plasmids encoding examples of minimal CRISPR arrays.
- This Example describes the design of eukaryotic expression plasmid vectors that encode Cascade-associated proteins, as well as minimal CRISPR arrays comprising the component sequences as described in Example 1.
- Cascade proteins can be expressed in mammalian cells by encoding each of the protein components on a separate expression vector driven by the human cytomegalovirus (CMV) immediate-early promoter/enhancer and encoding the crRNA on a separate expression vector driven by the human U6 promoter.
- the starting plasmid for each expression plasmid was a derivative of pcDNA3.1 (Thermo Scientific, Wilmington, Del.). Coding sequences for the Cascade proteins, codon optimized for expression in human cells (see Example 1), were inserted into the vector downstream of the CMV promoter and upstream of a bovine growth hormone (bGH) polyadenylation signal.
- the cse2 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS and 3×-FLAG epitope tag.
- the cas5 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS.
- the cas6 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS and HA epitope tag.
- the cas7 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS and Myc epitope tag.
- the cas8 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS; in another embodiment, the cas8 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS, HA epitope tag, and FokI nuclease domain.
- Each gene or gene fusion was cloned into a pcDNA3.1 derivative vector backbone, which confers ampicillin resistance due to the presence of the ampR gene.
- the vector also encodes neomycin resistance due to the presence of the neoR gene, which is downstream of an SV40 early promoter (P SV40 ) and origin (SV40 ori), and upstream of an SV40 early polyadenylation signal (SV40 pA).
- FIG. 26 contains a schematic diagram of a mammalian expression vector encoding the FokI-Cas8 fusion protein.
- the designations in FIG. 26 are described in this Example and in Example 1 and are as follows: the human CMV immediate-early promoter/enhancer (P CMV ), NLS (nuclear localization signal), FokI (FokI nuclease domain (Sharkey variant)), Cas8 protein coding sequence, bGH pA (bovine growth hormone polyadenylation signal), f1 ori (f1 phage origin of replication), P SV40 (SV40 early promoter), SV40 ori (SV40 origin), neoR (neomycin resistance gene), SV40 pA (SV40 early polyadenylation signal), colE1 ori (origin of replication), and ampR (ampicillin resistance gene).
- Vectors encoding the other Cascade proteins were designed similarly.
- Table 16 provides the sequences of individual mammalian expression vectors encoding each of Cse2, Cas5, Cas6, Cas7, Cas8, and FokI-Cas8.
- the CRISPR RNA was encoded with a minimal CRISPR array containing three repeats flanking two spacer sequences.
- the construct generating CRISPR RNA can be designed with additional sequences flanking the outermost repeats in the minimal array. Processing of the precursor CRISPR RNA is enabled by the RNA processing protein of the Cascade complex (Cas6 protein), which can be expressed on a separate plasmid.
- the CRISPR array was cloned into the same pcDNA3.1 derivative vector backbone described above, except the human CMV promoter was replaced with the human U6 promoter (P U6 ), and the bGH polyadenylation signal was replaced with a poly-T termination signal.
- FIG. 27 contains a schematic diagram of a eukaryotic expression vector encoding a representative CRISPR array targeting the TRAC gene.
- the designations in FIG. 27 are described in this Example and in Example 1 and are as follows: P U6 (human U6 promoter), repeats (CRISPR RNA repeats), TRAC spacer-1 (first spacer targeting the TRAC gene), TRAC spacer-2 (second spacer targeting the TRAC gene), polyT (poly-T termination signal), f1 ori (f1 phage origin of replication), P SV40 (SV40 early promoter), SV40 ori (SV40 origin), neoR (neomycin resistance gene), SV40 pA (SV40 early polyadenylation signal), colE1 ori (origin of replication), and ampR (ampicillin resistance gene).
- Table 17 provides the sequence of a representative mammalian expression vector encoding a CRISPR array targeting the TRAC gene; a spacer sequence that targets matching DNA sequences in the TRAC gene can be found in Table 13.
- polycistronic expression vectors were constructed. On each, a single CMV promoter drives expression of multiple coding sequences simultaneously that are separated by a 2A viral peptide sequence.
- the Thosea asigna virus 2A peptide sequence induces ribosomal skipping (Liu, Z., et al., Sci. Rep. 7:2193 (2017)), thus enabling multiple protein-coding genes to be concatenated within a single polycistronic construct.
- the starting plasmid for the polycistronic expression plasmid was the same derivative of pcDNA3.1 described above, containing the CMV promoter and bGH polyadenylation signal. Coding sequences for the Cascade proteins, codon optimized for expression in human cells (see Example 1), were joined in the order cas7-cse2-cas5-cas6-cas8, with a polynucleotide sequence coding for the Thosea asigna virus 2A (T2A) peptide inserted in between each pair of genes.
- polynucleotide sequences encoding NLS tags were appended to the 5′ end of each Cascade protein gene, and a polynucleotide sequence encoding the FokI nuclease domain was appended to the 5′ end of the cas8 gene, connected by a 30-amino acid linker sequence.
- the final construct has the following order of elements: NLS-cas7-T2A-NLS-cse2-T2A-NLS-cas5-T2A-NLS-cas6-T2A-NLS-fokI-linker-cas8.
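- By way of illustration only, the element order of the polycistronic construct can be generated programmatically. The Python sketch below works only with element labels, not with sequences; the function name `polycistronic_order` and its parameters are assumptions introduced for this sketch.

```python
def polycistronic_order(genes, n_terminal_extras=None, tag="NLS", separator="T2A"):
    """Build the element order: each gene gets an N-terminal tag, and
    adjacent gene cassettes are separated by a 2A peptide coding sequence."""
    extras = n_terminal_extras or {}
    cassettes = []
    for gene in genes:
        # Extra N-terminal elements (for example fokI and its linker) sit
        # between the tag and the gene itself.
        elements = [tag, *extras.get(gene, []), gene]
        cassettes.append("-".join(elements))
    return f"-{separator}-".join(cassettes)


construct = polycistronic_order(
    ["cas7", "cse2", "cas5", "cas6", "cas8"],
    n_terminal_extras={"cas8": ["fokI", "linker"]},
)
print(construct)
```

- The printed string reproduces the order of elements given above for the final construct.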
- FIG. 28 contains a schematic diagram of an exemplary polycistronic mammalian expression vector encoding all the Cascade proteins.
- the designations in FIG. 28 are described in this Example and in Example 1 and are as follows: the human CMV immediate-early promoter/enhancer (P CMV ), NLS (nuclear localization signal), T2A (polynucleotide sequence coding for the Thosea asigna virus 2A peptide), coding sequences for the Cas7, Cse2, Cas5, and Cas6 proteins, fokI (FokI nuclease domain (Sharkey variant)), a linker sequence, coding sequence for Cas8 protein, bGH pA (bovine growth hormone polyadenylation signal), f1 ori (f1 phage origin of replication), P SV40 (SV40 early promoter), SV40 ori (SV40 origin), neoR (neomycin resistance gene), SV40 pA (SV40 early polyadenylation signal)
- Table 18 provides the sequence of an exemplary polycistronic mammalian expression vector encoding all the Cascade proteins. This vector can be combined with the mammalian expression vector encoding CRISPR RNA described above to produce functional Cascade complexes in mammalian cells.
- a single plasmid Cascade expression system was constructed to express the complete Cascade complex in human cells.
- the plasmid encodes the entire cas8-cse2-cas7-cas5-cas6 operon and a minimal CRISPR array on a single plasmid.
- This plasmid was constructed from the polycistronic protein expression vector (described above in Table 18 and FIG. 28 ) by inserting the minimal CRISPR array along with the upstream human U6 promoter and downstream poly-T termination signal into the MluI restriction site.
- Table 19 provides the sequence of the single plasmid for expression of all five Cascade proteins together with the crRNA to facilitate formation of Cascade complexes in human cells.
- Table 19 entry: construct name ending in T2A_NLS-Cas6_NLS-FokI-Cas8; encodes 5 Cascade proteins and crRNA; Cas8 contains N-terminal NLS-HA-FokI; FokI confers the ability to cleave double-stranded DNA.
- Plasmids were also designed for the expression of the Cas3 protein (SEQ ID NO:21; monomer Cas3 nuclease/helicase E. coli K-12 substr. MG1655) in E. coli and in mammalian cells. Table 20 provides the constructs and sequences of these plasmids.
- This Example describes the introduction and expression of Cas8 subunit protein coding sequences, as well as coding sequences for components of engineered Type I CRISPR-Cas effector complexes, in bacterial cells using E. coli expression systems.
- E. coli Type I-E Cas8 protein was expressed from a plasmid (Example 2, SEQ ID NO:438, Table 14, FIG. 24A ) containing an operon for the IPTG inducible expression of His6-MBP-TEV-Cas8 from a T7 promoter.
- the expression plasmid conferred resistance to kanamycin.
- E. coli cells were transformed with the expression plasmid. Briefly, a 100 μL aliquot of chemically competent E. coli cells (E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.)) in a microcentrifuge tube was thawed on ice for 10 minutes. 35 ng of plasmid DNA was added to the thawed cells and the cells were incubated with the DNA on ice for 8 minutes. Heat shock was performed by placing the microcentrifuge tube in a 42° C. water bath for 30 seconds and then immediately placing the tube in ice for 2 minutes.
- a single colony was picked from the colonies that grew on the antibiotic selection plates and was inoculated into 10 mL of 2×YT media supplemented with kanamycin (50 μg/mL).
- the culture was grown overnight at 37° C. while shaking in an orbital shaker at 200 RPM. 6 mL of the overnight culture were transferred to a 2 L baffled flask having 1 L of 2×YT media supplemented with chloramphenicol (34 μg/mL) and spectinomycin (100 μg/mL).
- the 1 L culture was grown at 37° C. while shaking in an orbital shaker at 200 RPM until the optical density at 600 nm was 0.56.
- a complete set of the five E. coli Cascade proteins and RNA guides were co-expressed in E. coli cells using a two-plasmid system to produce Cascade RNP complexes.
- One plasmid (Example 2, SEQ ID NO:441, Table 14, FIG. 24D ) contained an operon for IPTG inducible expression of the Cse2, Cas5, Cas6, Cas7, and Cas8 proteins from a T7 promoter.
- a His6 affinity tag was included as a translational fusion to the N-terminus of Cse2 (Example 1, SEQ ID NO:392, Table 11).
- the second plasmid coded for the IPTG inducible expression of the J3 guide (Example 2, SEQ ID NO:444, Table 15, FIG. 25 ).
- the Cascade protein expression plasmid conferred spectinomycin resistance, and the Cascade RNA guide expression plasmid conferred chloramphenicol resistance.
- E. coli cells were simultaneously transformed with the two plasmids.
- a 100 μL aliquot of chemically competent E. coli cells (E. coli BL21 Star™ (DE3) cells (Thermofisher, Waltham, Mass.)) was thawed on ice.
- 35 ng of each plasmid was added to the thawed cells and the cells were incubated with the DNA on ice for 8 minutes.
- Heat shock was performed by placing the microcentrifuge tube in a 42° C. water bath for 30 seconds and then immediately placing the microcentrifuge tube in ice for 2 minutes.
- a single colony was picked from the colonies that grew on the antibiotic selection plates and was inoculated into 10 mL of 2×YT media supplemented with chloramphenicol (34 μg/mL) and spectinomycin (100 μg/mL).
- the culture was grown overnight at 37° C. while shaking in an orbital shaker at 200 RPM. 6 mL of the overnight culture were transferred to a 2 L baffled flask having 1 L of 2×YT media supplemented with chloramphenicol (34 μg/mL) and spectinomycin (100 μg/mL).
- the 1 L culture was grown at 37° C. while shaking in an orbital shaker at 200 RPM until the optical density at 600 nm was 0.56.
- This Example describes a method to purify E. coli Type I-E Cascade ribonucleoprotein (RNP) complexes produced by overexpression in bacteria as described in Example 4. The method uses immobilized metal affinity chromatography followed by size exclusion chromatography. This Example also describes the methods used to assess the quality of the purified Cascade RNP product. In addition, this Example describes purification and characterization of Cascade components.
- E. coli Type I-E Cascade RNP complexes were produced as described in Example 4.
- the Cascade complexes were captured using immobilized metal affinity chromatography. Briefly, the re-suspended cell pellets, produced as described in Example 4, were thawed on ice and the volume was brought to 35 mL by addition of 15 mL of lysis buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, and 1 mM TCEP supplemented with 1 Complete™ protease inhibitor tablet (Roche, Basel, Switzerland) per 50 mL of lysis buffer.
- the 50 mL conical tube was placed in an ice water bath and the cells were lysed by two rounds of sonication using a Q500 sonicator with a 1/2 inch tip (Qsonica, Newtown, Conn.). Each round of sonication consisted of a treatment cycle of 2.5 minutes with repeating cycles of 10 seconds of sonication at 50% amplitude followed by 20 seconds of rest. The tube was allowed to cool in the ice water bath for one minute between rounds of sonication. The lysates were clarified by centrifugation at 48,384 RCF for 30 minutes at 4° C.
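- By way of illustration only, the pulsed sonication schedule can be converted into total sonication on-time. The Python sketch below assumes that the stated 2.5 minute treatment cycle is elapsed time that includes the rest intervals; if the instrument timer instead counts only on-time, the figures would differ. The function name `pulse_schedule` is an assumption introduced for this sketch.

```python
def pulse_schedule(cycle_s: int, on_s: int, off_s: int, rounds: int) -> dict:
    """Summarize a pulsed sonication program.

    ASSUMPTION: cycle_s is elapsed time per round, including rest intervals.
    """
    period = on_s + off_s
    full_pulses, remainder = divmod(cycle_s, period)
    # A trailing partial period contributes at most one (possibly shortened) pulse.
    on_per_round = full_pulses * on_s + min(remainder, on_s)
    return {
        "full_pulses_per_round": full_pulses,
        "on_time_per_round_s": on_per_round,
        "total_on_time_s": on_per_round * rounds,
    }


# Two rounds of 2.5 minutes (150 s), 10 s on and 20 s off.
schedule = pulse_schedule(cycle_s=150, on_s=10, off_s=20, rounds=2)
print(schedule)
```

- Under the elapsed-time assumption, each round contains five 10 second pulses, so the two rounds together deliver 100 seconds of sonication.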
- the clarified supernatant was then added to a HisPur™ Ni-NTA resin (Thermofisher, Waltham, Mass.), that had been pre-equilibrated with Ni-wash buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 10 mM imidazole, 5% glycerol, and 1 mM TCEP.
- a 1.5 mL bed volume of nickel affinity resin was used for each 1 L of E. coli expression culture. After one hour of incubation at 4° C. with gentle mixing, the resin was pelleted by centrifugation at 500 RCF for 2 minutes at 4° C. The supernatant was aspirated and the resin was washed 5 times with 5 bed volumes of Ni-wash buffer.
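- By way of illustration only, the resin and wash volumes described above can be scaled to other culture volumes. The Python sketch below assumes simple linear scaling with culture volume, which is an assumption of this sketch and not a statement of the method; the function name `ni_nta_volumes` is hypothetical.

```python
def ni_nta_volumes(culture_l: float, bed_ml_per_l: float = 1.5,
                   washes: int = 5, bed_volumes_per_wash: float = 5.0) -> dict:
    """Resin bed and wash buffer volumes, assuming linear scaling with culture volume."""
    bed_ml = culture_l * bed_ml_per_l
    per_wash_ml = bed_ml * bed_volumes_per_wash
    return {
        "bed_ml": bed_ml,
        "per_wash_ml": per_wash_ml,
        "total_wash_ml": per_wash_ml * washes,
    }


volumes = ni_nta_volumes(culture_l=1.0)
print(volumes)
```

- For the 1 L culture described above, this corresponds to a 1.5 mL bed, 7.5 mL per wash, and 37.5 mL of Ni-wash buffer in total.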
- the bound proteins, including the Cascade RNP complexes, were eluted with Ni-elution buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 300 mM imidazole, 5% glycerol, and 1 mM tris(2-carboxyethyl)phosphine (TCEP).
- the nickel affinity eluate was further purified by size exclusion chromatography (SEC).
- the nickel affinity eluate was concentrated to a final volume of 0.5 mL by ultrafiltration at 12° C. using an Amicon® ultrafiltration spin concentrator (Millipore Sigma, Billerica, Mass.) with an Ultracel®-50 membrane (Millipore Sigma, Billerica, Mass.).
- the concentrated sample was filtered using a 0.22 μm Ultrafree-MC GV Centrifugal Filter (Millipore Sigma, Billerica, Mass.) before being further purified by separation at 4° C.
- SEC buffer composed of 50 mM Tris pH 7.5, 500 mM NaCl, 5% glycerol, 0.1 mM EDTA, and 1 mM TCEP. Proteins were eluted with SEC buffer and 1 ml fractions were collected. The earliest eluting peak, as judged by UV 280, was assumed to be high molecular weight aggregated material and the corresponding fractions were discarded. Subsequent elution fractions were analyzed by Coomassie stained SDS-PAGE.
- Each properly formed complex contained one molecule of Cas8, six molecules of Cas7, one molecule each of Cas6 and Cas5, and two molecules of Cse2. Elution fractions that had the approximate expected stoichiometry of Cascade proteins, when visualized on the SDS-PAGE gel, were pooled. Pooled fractions were analyzed spectrophotometrically to confirm they contained a significant nucleic acid component, as evidenced by an absorbance at 260 nm that is greater than the absorbance at 280 nm.
- the pooled samples were exchanged into storage buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, 0.1 mM EDTA, and 1 mM TCEP by concentrating the pooled samples to 100 μL with an Amicon® spin concentrator with an Ultracel®-50 membrane (Millipore Sigma, Billerica, Mass.) and then diluting 50-fold with the storage buffer. Finally, the sample was concentrated to 10 mg/mL using the same ultrafiltration device and stored at −80° C.
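- By way of illustration only, the effect of the concentrate-and-dilute buffer exchange on a small solute such as NaCl can be estimated. The Python sketch below assumes that the pooled fractions start in SEC buffer (500 mM NaCl) and that small solutes pass freely through the ultrafiltration membrane, so that concentrating the sample does not change their concentration; both are assumptions of this sketch. The function name `exchanged_concentration` is hypothetical.

```python
def exchanged_concentration(start_mm: float, target_mm: float,
                            fold_dilutions: list[float]) -> float:
    """Solute concentration after successive concentrate-then-dilute rounds.

    Each round keeps 1/fold of the current buffer and adds (1 - 1/fold)
    of the target buffer.
    """
    conc = start_mm
    for fold in fold_dilutions:
        conc = conc / fold + target_mm * (1.0 - 1.0 / fold)
    return conc


# NaCl: 500 mM in SEC buffer, 100 mM in storage buffer, one 50-fold dilution.
final_nacl_mm = exchanged_concentration(500.0, 100.0, [50.0])
print(round(final_nacl_mm, 1))
```

- Under these assumptions, a single 50-fold dilution leaves the NaCl concentration at about 108 mM, close to the 100 mM of the storage buffer.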
- the final purified product was analyzed spectrophotometrically to determine the final concentration of the Cascade RNP complexes and to confirm the presence of a nucleic acid component as evidenced by an absorbance at 260 nm that is greater than the absorbance at 280 nm.
- concentration of the Cascade RNP complexes was determined by dividing the absorbance at 280 nm by the calculated absorbance of a 0.1% solution of the intact complex with a 1 cm path length.
- the predicted absorbance of a 0.1% solution of the purified complex is 2.03 cm⁻¹ and was calculated by dividing the sum of the calculated extinction coefficients at 280 nm for each of the molecules in the complex (916940 M⁻¹ cm⁻¹) by the sum of the molecular weights of each of the molecules in the complex (450832 g/mole).
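- By way of illustration only, the concentration calculation described above can be sketched in Python. The extinction coefficient and molecular weight are the summed values given above; the function names are assumptions introduced for this sketch, and the absorbance reading in the usage line is an invented example value, not a measured result.

```python
def a280_of_0p1_percent(ext_coeff_m1_cm1: float, mw_g_per_mol: float) -> float:
    """Absorbance at 280 nm of a 0.1% (1 mg/mL) solution in a 1 cm path."""
    return ext_coeff_m1_cm1 / mw_g_per_mol


def concentration_mg_per_ml(a280: float, ext_coeff_m1_cm1: float,
                            mw_g_per_mol: float, path_cm: float = 1.0) -> float:
    """Concentration from a measured A280, per the calculation described above."""
    return a280 / (a280_of_0p1_percent(ext_coeff_m1_cm1, mw_g_per_mol) * path_cm)


# Summed values stated for the intact Cascade RNP complex.
CASCADE_EXT = 916940.0   # M^-1 cm^-1
CASCADE_MW = 450832.0    # g/mole

abs_0p1 = a280_of_0p1_percent(CASCADE_EXT, CASCADE_MW)
print(round(abs_0p1, 2))

# Hypothetical reading, for illustration only.
example_a280 = 4.07
print(round(concentration_mg_per_ml(example_a280, CASCADE_EXT, CASCADE_MW), 2))
```

- The first printed value corresponds to the predicted absorbance of a 0.1% solution stated above; a sample diluted before measurement would additionally require multiplying by the dilution factor.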
- the final product was analyzed by SDS-PAGE with Coomassie blue staining to confirm that each protein component was present in approximately the correct stoichiometry, and to assess the presence of contaminating proteins.
- SDS-PAGE gels were stained with a Coomassie InstantBlue™ (Expedeon, San Diego, Calif.) stain. Gels were imaged using a Gel Doc™ EZ imager (Bio-Rad, Hercules, Calif.) and annotated using ImageLab software (Bio-Rad, Hercules, Calif.).
- this method for purification of E. coli Type I-E Cascade complexes can be applied to the production of other purified Type I Cascade complexes.
- a Cascade complex composed of a guide RNA and the protein components Cas7, Cas6, Cas5, and Cse2 was purified.
- the L3 guide RNA (Example 2, SEQ ID NO:445, Table 15) was expressed from a first plasmid (Example 2, FIG. 25 ) essentially as described in Example 4B.
- the Cascade proteins were expressed from a second plasmid (Example 2, SEQ ID NO:440, Table 14, FIG. 24C ) essentially as described in Example 4B.
- the complex was captured using affinity chromatography. Re-suspended cell pellets were thawed on ice. In a 50 mL conical tube, the volume was brought up to 35 mL with an additional 15 mL of lysis buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, 1 mM TCEP, and supplemented with 1 Complete™ protease inhibitor tablet (Roche, Basel, Switzerland) per 50 mL of lysis buffer. The 50 mL conical tube was placed in an ice water bath, and the cells were lysed by six rounds of sonication using a Q500 sonicator with a 1/2 inch tip (Qsonica, Newtown, Conn.).
- Each round of sonication consisted of a 1 minute treatment cycle with repeating cycles of 3 seconds of sonication at 90% amplitude followed by 9 seconds of rest.
- the tube was allowed to cool in the ice water bath for one minute between rounds of sonication.
- the lysate was clarified by centrifugation at 48,384 RCF for 30 minutes at 4° C.
- the clarified supernatant was affinity purified by addition of Strep-Tactin® Sepharose® resin (IBA Life Sciences, Gottingen, Germany) that had been pre-equilibrated with Strep-wash buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 1 mM EDTA, 5% glycerol, and 1 mM TCEP.
- a 0.55 mL bed volume of affinity resin was used for each 1 L of E. coli expression culture. After one hour of incubation at 4° C. with gentle mixing, the sample was poured onto a 30 mL disposable gravity flow column (Bio-Rad, Hercules, Calif.) allowing the unbound material to flow through the column. The resin was washed five times with five bed volumes of Strep-wash buffer. Finally, the bound proteins were eluted with two sequential additions of five bed volumes of Strep-elution buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 2.5 mM Desthiobiotin, 5% glycerol, 1 mM EDTA, and 1 mM TCEP.
- the affinity eluate was further purified by SEC.
- the affinity eluate was concentrated to a final volume of 550 μL by ultrafiltration at 12° C. using an Amicon® spin concentrator with an Ultracel®-50 membrane (Millipore Sigma, Billerica, Mass.).
- the concentrated sample was filtered using a 0.22 μm 13 mm UltraCruz® PVDF syringe filter (Santa Cruz Biotechnology, Dallas, Tex.) before being further purified by separation at 4° C.
- the pooled samples were exchanged into storage buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, 0.1 mM EDTA, and 1 mM TCEP by concentrating down to 200 μL with an Amicon® spin concentrator with an Ultracel®-50 membrane (Millipore Sigma, Billerica, Mass.) and then diluting 75-fold with storage buffer.
- the sample was concentrated a second time to 700 μL and again diluted 20-fold with storage buffer. Finally, the sample was concentrated to 4.7 mg/mL in the same ultrafiltration device and stored at −80° C.
- the final purified product was analyzed spectrophotometrically to determine the final concentration of the Cascade RNP complexes and to confirm the presence of a nucleic acid component as evidenced by an absorbance at 260 nm that is greater than the absorbance at 280 nm.
- concentration of the Cascade RNP complexes was determined by dividing the absorbance at 280 nm by the calculated absorbance of a 0.1% solution of the intact complex with a 1 cm path length.
- the predicted absorbance of a 0.1% solution of the purified complex is 2.18 cm⁻¹ and was calculated by dividing the sum of the calculated extinction coefficients at 280 nm for each molecule in the complex (762240 M⁻¹ cm⁻¹) by the sum of the molecular weights of each molecule in the complex (348952.07 g/mole).
- the final product was analyzed by SDS-PAGE with Coomassie blue staining to confirm that each Cascade protein was present in approximately the correct stoichiometry, and to assess the presence of contaminating proteins.
- SDS-PAGE gels were stained with Coomassie InstantBlue™ (Expedeon, San Diego, Calif.) stain. Gels were imaged using a Gel Doc™ EZ imager (Bio-Rad, Hercules, Calif.) and annotated using ImageLab software (Bio-Rad, Hercules, Calif.). Each properly formed complex contained six molecules of Cas7, one molecule each of Cas6 and Cas5, and two molecules of Cse2.
- a method used to purify a fusion protein comprising a FokI nuclease fusion to the E. coli Type I-E Cas8 protein from bacterial over-expression pellets using immobilized metal affinity chromatography, cation exchange chromatography (CIEX), and finally size exclusion chromatography (SEC) is described herein.
- the E. coli Type I-E FokI-Cas8 fusion protein, including a linker sequence, is described in Example 1 (SEQ ID NO:413, Table 11).
- the expression plasmid is described in Example 2 (SEQ ID NO:439, Table 14, FIG. 24B ).
- Cells comprising the fusion protein were produced essentially as described in Example 4A.
- the Cas8 fusion protein contained an N-terminal His6 tag, a maltose binding protein domain, a TEV cleavage site, a FokI nuclease domain, and a 30 amino acid linker.
- the protein was captured using immobilized metal affinity chromatography.
- a 50 mL conical tube containing the re-suspended cell pellets was thawed on ice. The tube was then placed in an ice water bath, and the cells were lysed by sonication using a Q500 sonicator with a 1/4 inch tip (Qsonica, Newtown, Conn.) for a treatment cycle of three minutes with repeating cycles of 10 seconds of sonication at 40% amplitude followed by 20 seconds of rest. The lysates were clarified by centrifugation at 30,970 RCF for 30 minutes at 4° C.
- the clarified supernatant was then added to HisPur™ Ni-NTA resin (Thermofisher, Waltham, Mass.), that had been pre-equilibrated with Ni-wash buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 10 mM imidazole, 5% glycerol, and 1 mM TCEP.
- a 2 mL bed volume of nickel affinity resin was used for 1 L of E. coli expression culture. After one hour of incubation at 4° C. with gentle mixing, the sample was poured onto a 30 mL disposable gravity flow column (Bio-Rad, Hercules, Calif.), allowing the unbound material to flow through the column.
- Ni-elution buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 300 mM imidazole, 5% glycerol, and 1 mM TCEP.
- the nickel affinity eluate was treated with TEV protease to remove the affinity tag.
- TEV protease was added to the eluate at a ratio of 1:25 (w/w).
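- By way of illustration only, the amount of TEV protease corresponding to the stated 1:25 (w/w) ratio can be computed from the mass of protein in the eluate. The Python sketch below assumes the ratio is expressed as protease mass to eluate protein mass; the function name and the protein mass in the usage line are hypothetical.

```python
def tev_mass_mg(eluate_protein_mg: float, ratio_denominator: float = 25.0) -> float:
    """Mass of TEV protease for a 1:ratio_denominator (w/w) protease-to-protein ratio.

    ASSUMPTION: the ratio is protease mass : eluate protein mass.
    """
    if eluate_protein_mg < 0:
        raise ValueError("protein mass must be non-negative")
    return eluate_protein_mg / ratio_denominator


# Hypothetical eluate containing 50 mg of protein.
print(tev_mass_mg(50.0))
```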
- the sample, including TEV, was dialyzed overnight against Ni-wash buffer using a 12 mL Slide-A-Lyzer™, 10K MWCO dialysis cassette (Thermofisher, Waltham, Mass.).
- the TEV protease and the cleaved His6-MBP fragment were removed from the dialyzed sample by Ni affinity chromatography.
- the dialyzed sample was poured over a clean HisPur™ Ni-NTA resin (Thermofisher, Waltham, Mass.) column equilibrated with Ni-wash buffer. The resin was then washed with 1 column volume of Ni-wash buffer. The flow through and wash were combined, concentrated, and exchanged into storage buffer (50 mM Tris pH 7.5, 500 mM NaCl, 5% glycerol, and 1 mM TCEP) using an Amicon® spin concentrator with an Ultracel®-10 membrane (Millipore Sigma, Billerica, Mass.). This sample was then frozen at −80° C. for storage.
- a 10 mL capillary loop was used to load the sample onto a 1 mL HiTrap™ SP HP column (GE Healthcare, Uppsala, Sweden), equilibrated with a buffer comprising CIEX_A buffer and 5% CIEX_B buffer (50 mM Tris pH 7.5, 1 M NaCl, 5% glycerol, and 1 mM TCEP). The flow rate throughout the separation was 0.75 mL/min. The loop was emptied onto the column with 15 mL of 5% CIEX_B buffer. The unbound sample was washed out with an additional 2 mL of 5% CIEX_B buffer.
- the pooled CIEX fractions were further purified by SEC.
- the pooled CIEX fractions were concentrated to a final volume of 0.3 mL by ultrafiltration at 12° C. using an Amicon® spin concentrator with an Ultracel®-10 membrane (Millipore Sigma, Billerica, Mass.).
- the concentrated sample was filtered using a 0.22 μm Ultrafree-MC GV Centrifugal spin filter (Millipore Sigma, Billerica, Mass.), and further purified by separation at 4° C.
- the final purified product was analyzed spectrophotometrically to determine the final concentration of the fusion protein and to confirm the absence of a significant nucleic acid component as evidenced by an absorbance at 280 nm that is greater than the absorbance at 260 nm.
- concentration of the FokI-Cas8 fusion was determined by dividing the absorbance at 280 nm by the calculated absorbance of a 0.1% solution of the intact complex.
- the predicted absorbance of a 0.1% solution of the purified complex is 1.05 cm⁻¹ and was calculated by dividing the extinction coefficient at 280 nm for the FokI-Cas8 fusion (86290 M⁻¹ cm⁻¹) by its molecular weight (82171.32 g/mole).
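- The concentration calculation described above can be sketched as follows. This is a minimal illustration rather than part of the described procedure: the function names and any A280/A260 readings passed in are hypothetical, while the extinction coefficient and molecular weight are the values stated in the text.

```python
# Values stated in the text for the FokI-Cas8 fusion.
EXTINCTION_COEFF_280 = 86290.0   # M^-1 cm^-1
MOLECULAR_WEIGHT = 82171.32      # g/mol


def abs_of_0p1_percent(ext_coeff: float, mol_weight: float) -> float:
    """Predicted A280 of a 0.1% (1 mg/mL) solution over a 1 cm path."""
    return ext_coeff / mol_weight


def concentration_mg_per_ml(a280: float, path_cm: float = 1.0) -> float:
    """Protein concentration (mg/mL) from a measured A280 reading."""
    per_mg_ml = abs_of_0p1_percent(EXTINCTION_COEFF_280, MOLECULAR_WEIGHT)
    return a280 / (per_mg_ml * path_cm)


def low_nucleic_acid(a280: float, a260: float) -> bool:
    """Criterion used in the text: A280 greater than A260."""
    return a280 > a260
```

- For example, `abs_of_0p1_percent(86290.0, 82171.32)` rounds to the 1.05 quoted above, and a hypothetical reading would be converted with `concentration_mg_per_ml(reading)`.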
- the final product was analyzed by SDS-PAGE gels stained with InstantBlue™ stain (Expedeon, San Diego, Calif.). Gels were imaged using a Gel Doc™ EZ imager (Bio-Rad, Hercules, Calif.) and annotated using ImageLab software (Bio-Rad, Hercules, Calif.). This analysis demonstrates that the purified fusion protein was the expected size and that only a low level of contaminating proteins was present.
- Double-stranded DNA (dsDNA) target sequences for use in in vitro DNA binding or cleavage assays with Cascade or Cascade-fusion effector complexes can be produced using several different methods.
- This Example describes three methods to produce target sequences, including annealing of synthetic single-stranded DNA (ssDNA) oligonucleotides, PCR amplification of selected nucleic acid target sequences from genomic DNA, and/or cloning of nucleic acid target sequences into bacterial plasmids.
- the dsDNA target sequences were used in Cascade binding or cleavage assays.
- DNA oligonucleotides encoding the target region of interest comprising the target sequence, also known as the protospacer, that is recognized by the guide portion of CRISPR RNA, the neighboring protospacer adjacent motif (PAM), and additional 5′ and 3′ flanking sequences were purchased from a commercial manufacturer (Integrated DNA Technologies, Coralville, Iowa). Two oligonucleotides were ordered per construct, one comprising the sense strand and one comprising the antisense strand.
- Table 21 lists oligonucleotide sequences that were ordered to contain a target sequence denoted J3, which is derived from bacteriophage lambda genomic DNA. The target and PAM sequences are flanked by 20-bp of additional sequence on both the 5′ and 3′ ends.
- the oligonucleotides were annealed by mixing both oligonucleotides at equimolar concentration (10 μM) in 1× annealing buffer (6 mM HEPES, pH 7.0, and 60 mM KCl), heating at 95° C. for 2 minutes, and then slow cooling. Annealed oligonucleotides were then used directly in DNA binding and/or DNA cleavage assays with Cascade and/or Cascade-effector domain fusion RNPs.
- Four oligonucleotides were ordered per construct, one comprising the 5′ fluorescent-labeled sense strand, one comprising the 5′ unlabeled sense strand, one comprising the 5′ fluorescent-labeled antisense strand, and one comprising the 5′ unlabeled antisense strand.
- the target and PAM sequences are flanked by 20-bp of additional sequence on both the 5′ and 3′ ends.
- Table 22 lists oligonucleotide sequences that were ordered to contain a target sequence denoted J3, which was derived from bacteriophage lambda genomic DNA and a control target sequence denoted CCR5, which was derived from the human CCR5 locus.
- the oligonucleotides were annealed by mixing a labeled and an unlabeled oligonucleotide, two labeled oligonucleotides, or two unlabeled oligonucleotides at equimolar concentration (1 μM) in 1× annealing buffer (6 mM HEPES, pH 7.0, 60 mM KCl), heating at 95° C. for 2 minutes, and then slow cooling. Annealed oligonucleotides were then used directly in DNA binding assays with Cascade and/or Cascade-effector domain fusion RNPs. Cy5 fluorescently-labeled DNA oligonucleotides were imaged with an AZURE c600 Bioimager (Azure BioSystems, Dublin, Calif.).
- This method can be applied to produce additional labeled or unlabeled target or dual-target sequences, whereby a dual target is defined as a target that contains two protospacer sequences targeted by individual Cascade molecules, separated by an interspacer sequence.
- Double-stranded DNA target sequences for dual targets derived from human genomic DNA were produced using PCR amplification directly from genomic DNA template material. Specifically, PCR reactions contained human genomic DNA purified from K562 cells and Q5 Hot Start High-Fidelity 2× Master Mix (New England Biolabs, Ipswich, Mass.), as well as the primers listed in Table 23, where the underlined portions correspond to primer binding sites within genomic DNA.
- PCR was performed according to the manufacturer's instructions (New England Biolabs, Ipswich, Mass.), and the desired product DNA, 288-bp in length, was purified using a Nucleospin Gel and PCR Cleanup kit (Macherey-Nagel, Bethlehem, Pa.). This dsDNA was then used directly in DNA binding and/or DNA cleavage assays with Cascade and/or Cascade-effector domain fusion RNPs.
- DNA oligonucleotides encoding the target region of interest comprising the target sequence, also known as the protospacer, that is recognized by the guide portion of CRISPR RNA, the neighboring protospacer adjacent motif (PAM), and additional 5′ and 3′ flanking sequences were purchased from a commercial manufacturer (Integrated DNA Technologies, Coralville, Iowa).
- the oligonucleotides were designed such that, when annealed, the termini regenerate sticky ends upon cleavage of their respective recognition sites by the restriction enzymes EcoRI and BlpI, or by BamHI and EcoRI. Oligonucleotides were designed to contain a single target sequence derived from the bacteriophage lambda genome, denoted J3.
- oligonucleotides were designed to contain two tandem target sequences derived from the bacteriophage lambda genome, denoted J3 and L3, separated from each other by a 15-bp interspacer sequence. Sequences of these oligonucleotides are listed in Table 24.
- the oligonucleotides contain 5′-phosphorylated ends, which were introduced by the commercial manufacturer or phosphorylated in-house using T4 polynucleotide kinase (New England Biolabs, Ipswich, Mass.).
- the oligonucleotides were then annealed at a final concentration of 1 μM by mixing together equimolar amounts in annealing buffer (6 mM HEPES, pH 7.0, 60 mM KCl), heating to 95° C. for 2 minutes, and then slow-cooling on the benchtop.
- pACYC-Duet1 (MilliporeSigma, Hayward, Calif.) plasmid was double-digested with the corresponding pair of restriction enzymes, either BamHI and EcoRI, or EcoRI and BlpI, whose sticky ends match the sticky ends formed by the termini of the hybridized oligonucleotides.
- the double-digested vector was separated from the removed insert using agarose gel electrophoresis.
- the hybridized oligonucleotides were diluted to a 50 nM stock concentration, and then a 10 μL ligation reaction was formed using hybridized oligonucleotides, the double-digested vector, and Quick Ligase from New England Biolabs. The ligation reaction was then used to transform chemically competent E. coli strains, and after overnight growth on agar plates, individual clones were isolated and grown in liquid culture to generate sufficient bacterial cultures from which to isolate plasmids. Sanger sequencing was then used to validate the desired plasmid sequence.
- Table 25 provides complete vector sequences for plasmids containing the J3 target sequence (SEQ ID NO:481) and plasmids containing the J3 and L3 target sequences separated by the 15-bp interspacer sequence (SEQ ID NO:482).
- the 15-bp interspacer sequence of SEQ ID NO:482 contains unique AvrII and XhoI restriction sites.
- introduction of additional hybridized oligonucleotides into these restriction sites expands the interspacer to longer lengths, for biochemical testing with purified Cascade and Cascade-nuclease fusion RNPs.
- When the crRNA-guided FokI-Cascade fusion complex targets two adjacent DNA sites, dimerization of the FokI domains from adjacent DNA-bound complexes leads to DNA cleavage within the interspacer separating the two target sites.
- Variable interspacer lengths were designed and tested to evaluate a given interspacer length with a given tethering geometry between the FokI nuclease domain and its fused Cascade subunit protein.
- the complete vector sequence for a target DNA substrate containing an expanded interspacer sequence of 30-bp in length is given in Table 25 as SEQ ID NO:483.
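- The dual-target layout described above (two target sites separated by an interspacer of variable length) can be sketched in a few lines. This is an illustrative sketch only: the function names, the placeholder site strings, and the `N` filler are hypothetical, and the sketch deliberately ignores PAM orientation and the actual interspacer sequences, which are defined by the sequences in Table 25.

```python
from typing import Dict, Iterable


def build_dual_target(site_a: str, site_b: str, interspacer: str) -> str:
    """Concatenate two target sites around an interspacer sequence."""
    return site_a + interspacer + site_b


def interspacer_series(site_a: str, site_b: str,
                       lengths: Iterable[int],
                       filler: str = "N") -> Dict[int, str]:
    """Build one dual target per requested interspacer length.

    The filler base is a placeholder; real interspacers would carry
    defined sequence (for example, restriction sites).
    """
    return {n: build_dual_target(site_a, site_b, filler * n) for n in lengths}


# Hypothetical placeholder sites; real sites come from the J3 and L3 targets.
variants = interspacer_series("AAAA", "CCCC", lengths=(15, 30))
```

- Each entry in `variants` differs only in interspacer length, which mirrors the way a given tethering geometry was evaluated against a given interspacer length.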
- the following cloning strategy provided a plasmid substrate that contains several target sequences serially connected along one large insert.
- a gene block was ordered from a commercial manufacturer (Integrated DNA Technologies, Coralville, Iowa) that contained 17 consecutive dual targets.
- the gene block contained 4 bp separating each dual target from a neighboring dual target, and contained 16 dual targets derived from Homo sapiens genomic DNA, as well as one control dual target containing J3/L3 targets derived from the bacteriophage lambda genome.
- the genomic coordinates of the 16 consecutive human dual targets are shown in Table 26.
- the gene block was ordered with flanking SacI and SbfI restriction sites on the ends, such that it could be cloned into SacI and SbfI sites in the pACYC-Duet1 vector.
- the full vector sequence of the multi-target plasmid substrate generated by cloning the gene block into pACYC-Duet1 is presented as SEQ ID NO:484 in Table 25. This multi-target sequence plasmid allowed for biochemical testing of multiple different FokI-Cascade preparations harboring crRNAs targeting one of the serially connected target sites within the plasmid.
- This Example illustrates the use of FokI-Cascade fusion protein complexes in biochemical double-stranded DNA (dsDNA) cleavage assays. Protein reagents were compared in terms of their activity in dsDNA cleavage.
- FokI-Cascade RNPs derived from the E. coli Type I-E Cascade system were designed, recombinantly expressed in E. coli, and purified for use, as outlined in Examples 1, 2, and 5. These RNPs were designed to contain CRISPR RNAs that target either the J3 and L3 target sequences derived from bacteriophage lambda genomic DNA or an intron in the TRAC gene within human genomic DNA. Each RNP preparation is a heterogeneous mixture comprising two FokI-Cascade complexes that are otherwise identical except for the guide portion of the crRNA.
- a FokI-Cascade complex was reconstituted by mixing together a CasBCDE complex (produced using SEQ ID NO:440 and SEQ ID NO:446, as described in Example 2) with purified FokI-Cas8 comprising a 16-aa linker (the general FokI-Cas8 expression vector sequence is described in Example 2, SEQ ID NO:439 in Table 14; the particular 16-aa linker is in Example 1, SEQ ID NO:431 in Table 12).
- Reconstitution was performed in 1× Cascade Cleavage Buffer (20 mM Tris-Cl, pH 7.5, 200 mM NaCl, 5 mM MgCl2, 1 mM TCEP, 5% glycerol) with CasBCDE and FokI-Cas8 both at 1 μM final concentrations.
- reaction mixtures were as follows.
- a plasmid substrate comprising the J3/L3 double-target sequence with a 30-bp interspacer (SEQ ID NO:483 in Table 25) was incubated with varying concentrations of FokI-Cascade complex (3-100 nM) in a 15 μL reaction in 1× Cascade Cleavage Buffer, with the plasmid DNA at a final concentration of 13.3 ng/μL.
- Reactions were incubated for 30 minutes at 37° C., after which 3 μL of 6× SDS loading dye was added. The loading dye was added to denature bound FokI-Cascade complexes.
- the reaction mixture components were resolved by 0.8% agarose gel electrophoresis. Gels were stained after electrophoresis with SYBR™ Safe DNA Gel Stain (Thermo Scientific, Wilmington, Del.).
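- Because the plasmid substrate is specified by mass (13.3 ng/μL) while the FokI-Cascade complex is specified in nM, it can help to convert the plasmid to molar units. The sketch below is a rough estimate under stated assumptions: an average mass of 660 g/mol per base pair is a common approximation, and the 4,000-bp plasmid length is a hypothetical placeholder, not the length of SEQ ID NO:483.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair; approximation (assumption)


def dsdna_nanomolar(ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to nM."""
    grams_per_liter = ng_per_ul * 1e-3        # 1 ng/uL equals 1 mg/L
    molar = grams_per_liter / (length_bp * AVG_BP_MASS)
    return molar * 1e9


def femtomoles_in_reaction(nanomolar: float, volume_ul: float) -> float:
    """nM multiplied by uL gives fmol."""
    return nanomolar * volume_ul


# Hypothetical plasmid length of 4,000 bp, for illustration only.
plasmid_nm = dsdna_nanomolar(13.3, 4000)
plasmid_fmol = femtomoles_in_reaction(plasmid_nm, 15.0)
```

- Under these assumptions the substrate is present at roughly 5 nM, so the 3-100 nM protein titration spans conditions from approximately equimolar to a large excess of complex over plasmid.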
- Streptococcus pyogenes Cas9 protein was programmed with a single-guide RNA (sgRNA) targeting a 20-bp portion of the Cascade J3 target sequence (sgRNA-J3; the spacer sequence is presented as SEQ ID NO:501).
- Cas9/sgRNA-J3 complexes were reconstituted by mixing Cas9 together with a 2-fold molar excess of sgRNA in 1× CCE buffer (20 mM HEPES pH 7.4, 10 mM MgCl2, 150 mM KCl, 5% glycerol).
- FokI-Cascade complex reagents were also tested for their kinetics of target DNA cleavage.
- a plasmid substrate containing the J3/L3 double-target sequence with a 30-bp interspacer (SEQ ID NO:483) was incubated with 200 nM FokI-Cascade complex or 200 nM Cas9-sgRNA in a 15 μL reaction, with the plasmid DNA at a final concentration of 13.3 ng/μL. Reactions were quenched at either 0, 7, 10, 15, 20, 25, or 30 minutes, and reaction components were resolved by agarose gel electrophoresis as described above.
- the FokI-Cascade complex exhibited a similar but slightly slower rate of target DNA cleavage than the Cas9/sgRNA-J3 complex, with the target plasmid quantitatively linearized by the 25 minute time point for the FokI-Cascade complex and by the 20 minute time point for the Cas9/sgRNA-J3 complex.
- FokI-Cascade complex reagents were also tested for their non-specific DNA cleavage and/or nicking activity on the pACYC-Duet1 non-target plasmid substrate, versus specific DNA cleavage of the J3/L3 double-target plasmid substrate.
- Table 27 contains the sequence of the pACYC-Duet1 non-target plasmid substrate used for this control (SEQ ID NO:502). Specifically, the dependence of non-specific and specific DNA target cleavage was investigated as a function of the monovalent salt concentration in the reaction buffer.
- Modified variants of the 1× Cascade Cleavage Buffer (20 mM Tris-Cl, pH 7.5, 200 mM NaCl, 5 mM MgCl2, 1 mM TCEP, and 5% glycerol) were prepared, in which the NaCl concentration was dropped from 200 mM to either 150 mM, 100 mM or 50 mM, and the same cleavage reactions as described above were performed by incubating 200 nM FokI-Cascade complex with either 13.3 ng/μL of the J3/L3 target plasmid or 13.3 ng/μL of the pACYC-Duet1 non-target plasmid.
- non-target plasmid and J3/L3 target plasmid were subjected to the following reaction conditions: −FokI-Cascade complex; +FokI-Cascade complex, 100 mM NaCl buffer+10 mM EDTA; +FokI-Cascade complex, 50 mM NaCl buffer; +FokI-Cascade complex, 100 mM NaCl buffer; +FokI-Cascade complex, 150 mM NaCl buffer; +FokI-Cascade complex, 200 mM NaCl buffer.
- the target plasmid was incubated with only FokI-Cascade 1 complex or only AfeI, and the same reactions were performed with a non-target plasmid that can be cleaved by AfeI but not by FokI-Cascade 1 complex (because the plasmid lacks the J3/L3 dual target).
- Table 27 contains the sequence of the pACYC-Duet1 non-target plasmid substrate used for this control (SEQ ID NO:502).
- non-target plasmid and J3/L3 target plasmid were subjected to the following reaction conditions: −AfeI/−FokI-Cascade complex; −AfeI/+FokI-Cascade complex; +AfeI/+FokI-Cascade complex; and +AfeI/−FokI-Cascade complex.
- the data demonstrate that FokI-Cascade complex cleaved the target plasmid in the expected location, because co-incubation with FokI-Cascade 1 complex and AfeI led to two linear products of the expected lengths.
- additional control plasmid substrates were generated that contain as follows: mutations in the PAM flanking the J3 target, mutations in the PAM flanking the L3 target, mutations in both PAMs flanking J3/L3 targets; mutations in the spacer sequence within the J3 target, mutations in the spacer sequence within the L3 target, mutations in both spacer sequences within J3/L3 targets; and the J3 target but not the L3 target, the L3 target but not the J3 target, and neither J3 nor L3 target.
- the plasmid substrates were as follows: J3 PAM mutant, L3 PAM mutant, J3/L3 PAM mutant, J3 spacer mutant, L3 spacer mutant, J3/L3 spacer mutant, non-target plasmid, J3-only target, L3-only target, and J3/L3 target plasmid.
- Each target was subjected to the following reaction conditions: −NdeI/−FokI-Cascade complex; +NdeI/−FokI-Cascade complex; and −NdeI/+FokI-Cascade 1 complex.
- Table 27 contains the sequences of all the mutated plasmid substrates described above (SEQ ID NO:502 through SEQ ID NO:510).
- DNA cleavage reactions were performed as described above, using 200 nM FokI-Cascade complex and 13.3 ng/μL plasmid substrates; control reactions to linearize each plasmid substrate were performed with NdeI (New England Biolabs, Ipswich, Mass.). Agarose gel electrophoresis was performed as described above. The data demonstrate that efficient double-strand break introduction and linearization of the target plasmid is observed only for the J3/L3 target plasmids, and not for control plasmids harboring PAM or seed mutations, or only one of the two target sites.
- Components for various FokI-Cascade complexes were cloned and overexpressed. RNPs produced by these components were purified and tested for biochemical DNA cleavage, in order to compare activity for different FokI-Cascade complexes.
- FokI-Cascade complexes comprising the following: separately purified CasBCDE complex (produced using SEQ ID NO:440 and SEQ ID NO:446) and FokI-Cas8 (produced using SEQ ID NO:439); FokI-Cascade harboring the J3/L3 guide crRNAs (produced using SEQ ID NO:442 and SEQ ID NO:446); FokI-Cascade harboring an additional nuclear localization signal on either the Cas7 subunit (produced using SEQ ID NO:443 and SEQ ID NO:446) or the Cas6 subunit; FokI-Cascade harboring an additional nuclear localization signal and HA tag on either the Cas7 subunit or the Cas6 subunit; FokI-Cascade that underwent a more stringent purification involving both size exclusion chromatography (SEC) and ion exchange chromatography (IEX); and FokI-Cascade that was
- non-target plasmid and J3/L3 target plasmid were subjected to the following reaction conditions: negative control; AfeI; CasBCDE+FokI-Cas8 complex; FokI-Cascade complex; FokI-Cascade (NLS-Cas6) complex; FokI-Cascade (Cas7-NLS) complex; FokI-Cascade (NLS-HA-Cas6) complex; FokI-Cascade (Cas7-HA-NLS) complex; FokI-Cascade complex (IEX, SEC clean-up); and FokI-Cascade complex (no clean-up).
- each pair of crRNAs contains two unique spacer sequences that correspond to two adjacent target sites in human genomic DNA, separated by an interspacer; the target sequences are described in SEQ ID NO:485 through SEQ ID NO:500.
- Table 28 contains sequences of both crRNAs within each pair that targets Hsa01 through Hsa16 genomic DNA sequences; the spacer of the crRNA is underlined and in lower case, and the sequences 5′ and 3′ of the guide region correspond to repeat sequences from the CRISPR array.
- This Example illustrates the design and delivery of E. coli Type I-E Cascade complexes comprising FokI fusion proteins to facilitate genome editing in human cells and describes their delivery into target cells as pre-assembled Cascade RNP complexes.
- Minimal CRISPR arrays were designed to target eight distinct loci in the human genome. Each minimal CRISPR array contained two spacer sequences, both of which were flanked by CRISPR repeat sequences. The two spacer sequences targeted loci in the genome separated by 30 bp (i.e., a 30-bp interspacer region), and each spacer was designed to bind a target sequence adjacent to an AAG or ATG protospacer adjacent motif (PAM) sequence in the target cell genome.
- Plasmid vectors containing each minimal CRISPR array were produced by ligating annealed oligonucleotides (Integrated DNA Technologies, Coralville, Iowa) into a pACYC-Duet1 (Millipore Sigma, Billerica, Mass.) vector backbone for bacterial expression.
- Overlapping primers to produce selected spacers in minimal CRISPR arrays are set forth in Table 29, and the sequences of the primers are described in Table 30.
- The design of bacterial expression vectors for production of Cascade RNP complexes is detailed in Example 2.
- the cas genes were expressed from a single operon, and the coding sequences for the cas genes were arranged in the order of cas8-cse2-cas7-cas5-cas6.
- the FokI moiety was attached by a 30-aa linker to Cas8, and a nuclear localization signal (NLS) was attached to the N-terminus of FokI-Cas8 (FokI-Cascade complex) and the N-terminus of Cas6 (hereafter referred to as FokI-Cascade-NLS-Cas6 complex, SEQ ID NO:577).
- FokI-Cascade-NLS-Cas6 complexes were purified as assembled complexes from E. coli essentially as described in Example 5A.
- HEK293 cells (ATCC, Manassas, Va.) were cultured in DMEM medium supplemented with 10% FBS and 1× Antibiotic-Antimycotic Solution (Mediatech, Inc., Manassas, Va.) at 37° C., 5% CO2 and 100% humidity.
- HEK293 cells were transfected using the Nucleofector® 96-well Shuttle System (Lonza, Allendale, N.J.). Prior to nucleofection, 5 μl of FokI-Cascade RNPs were transferred to individual wells of a 96-well plate. Each well contained ~225-500 pmol of FokI-Cascade-NLS-Cas6 complexes, depending on the RNP.
- HEK293 cells were transferred to a 50 ml conical centrifuge tube and centrifuged at 200×g for 3 minutes. The media was aspirated and the cell pellet was washed in calcium and magnesium-free PBS. The cells were centrifuged once more and re-suspended in Nucleofector SF buffer (Lonza, Allendale, N.J.) at a concentration of 1×10⁷ cells/ml. 20 μl of this cell suspension was added to the FokI-Cascade-NLS-Cas6 complexes in the 96-well plate, mixed, and then the entire volume was transferred to a 96-well Nucleocuvette™ Plate.
- the plate was then loaded into the Nucleofector™ 96-well Shuttle™ and cells were nucleofected using the 96-CM-130 Nucleofector™ program (Lonza, Allendale, N.J.). Immediately following nucleofection, 80 μl of complete DMEM medium was added to each well of the 96-well Nucleocuvette™ Plate. The entire contents of the well were then transferred to a 96-well tissue culture plate containing 100 μl of complete DMEM medium. The cells were cultured at 37° C., 5% CO2 and 100% humidity for ~72 hours.
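- The per-well quantities implied by the nucleofection set-up above follow from simple arithmetic, sketched below. The function and variable names are hypothetical; the volumes and cell density are those stated in the text.

```python
def cells_delivered(cells_per_ml: float, volume_ul: float) -> float:
    """Number of cells contained in a given volume of suspension."""
    return cells_per_ml * volume_ul / 1000.0


RNP_VOLUME_UL = 5.0        # FokI-Cascade RNP per well
CELL_VOLUME_UL = 20.0      # cell suspension added per well
CELL_DENSITY = 1e7         # cells/ml in Nucleofector SF buffer

cells_per_well = cells_delivered(CELL_DENSITY, CELL_VOLUME_UL)
nucleofection_volume_ul = RNP_VOLUME_UL + CELL_VOLUME_UL
# Volume after adding 80 ul medium and transferring into 100 ul medium.
culture_volume_ul = nucleofection_volume_ul + 80.0 + 100.0
```

- With the stated values, each well receives 2×10⁵ cells in a 25 μl nucleofection volume.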
- the HEK293 cells were centrifuged at 500×g for 5 minutes and the medium was removed. The cells were washed in calcium and magnesium-free PBS. The cell pellets were then re-suspended in 50 μl of QuickExtract DNA Extraction solution (Epicentre, Madison, Wis.). The gDNA samples obtained were then incubated at 37° C. for 10 minutes, 65° C. for 6 minutes, and 95° C. for 3 minutes to stop the reaction. gDNA samples were then diluted with 50 μl of water and stored at −20° C. for subsequent deep sequencing analysis.
- a first PCR was performed using Q5 Hot Start High-Fidelity 2× Master Mix (New England Biolabs, Ipswich, Mass.) at 1× concentration, primers at 0.5 μM each, and 3.75 μL of gDNA in a final volume of 10 μL, and amplified at 98° C. for 1 minute, followed by 35 cycles of 10 seconds at 98° C., 20 seconds at 60° C., and 30 seconds at 72° C., and a final extension at 72° C. for 2 minutes.
- the PCR reaction was diluted 1:100 in water.
- Target-specific primers are shown in Table 31.
- the target-specific primers contained Illumina-compatible sequences so that the amplification products could be analyzed using a MiSeq Sequencer (Illumina, San Diego).
- a second “barcoding” PCR was set up such that each target was amplified with primers (G2 and H2 in Table 30) that each contained unique 8 bp indices (denoted by “NNNNNNNN” in the primer sequence; see SEQ ID NO:575 and SEQ ID NO:576), thus allowing de-multiplexing of each amplicon during sequence analysis.
- the second PCR was performed using Q5 Hot Start High-Fidelity 2× Master Mix (New England Biolabs, Ipswich, Mass.) at 1× concentration, primers at 0.5 μM each, and 1 μL of 1:100 diluted first PCR in a final volume of 10 μL, and amplified at 98° C. for 1 minute, followed by 12 cycles of 10 seconds at 98° C., 20 seconds at 60° C., and 30 seconds at 72° C., and a final extension at 72° C. for 2 minutes. PCR reactions were pooled into a single microfuge tube for SPRIselect bead (Beckman Coulter, Pasadena, Calif.)-based cleanup of amplicons for sequencing.
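- The two cycling programs described above differ only in cycle number, which can be made explicit by encoding them as data. The sketch below is illustrative: the class and field names are hypothetical, and the computed time counts only the programmed holds, not thermocycler ramping.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PcrProgram:
    initial_denature_s: int
    cycles: int
    step_seconds: Tuple[int, int, int]  # (denature, anneal, extend)
    final_extension_s: int

    def hold_time_s(self) -> int:
        """Total programmed hold time in seconds (ramp time excluded)."""
        return (self.initial_denature_s
                + self.cycles * sum(self.step_seconds)
                + self.final_extension_s)


# Parameters taken from the text: 98 C/10 s, 60 C/20 s, 72 C/30 s per cycle.
FIRST_PCR = PcrProgram(60, 35, (10, 20, 30), 120)
BARCODING_PCR = PcrProgram(60, 12, (10, 20, 30), 120)
```

- Under this encoding the first PCR has 2,280 s (38 min) of programmed holds and the barcoding PCR has 900 s (15 min).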
- the microfuge tube was spun in a microcentrifuge to collect the contents of the tube, and was then returned to the magnet and incubated until the solution had cleared, and the supernatant containing the purified amplicons was dispensed into a clean microfuge tube.
- the purified amplicon library was quantified using the NanoDrop™ 2000 system (Thermo Scientific, Wilmington, Del.).
- the amplicon library was normalized to 4 nM concentration as calculated from optical absorbance at 260 nm (NanoDrop™ 2000 system; Thermo Scientific, Wilmington, Del.) and size of the amplicons. The library was analyzed on a MiSeq Sequencer with MiSeq Reagent Kit v2, 300 cycles (Illumina, San Diego), with a paired-end run of two 151-cycle reads plus two eight-cycle index reads.
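- The normalization to 4 nM described above can be sketched as a two-step calculation: convert the measured mass concentration and mean amplicon size to nM, then compute a dilution. The sketch assumes an average mass of 660 g/mol per base pair (a common approximation); the function names, the example concentration, the amplicon size, and the 20 μl final volume are hypothetical.

```python
from typing import Tuple

AVG_BP_MASS = 660.0  # g/mol per base pair; approximation (assumption)


def library_nanomolar(ng_per_ul: float, mean_amplicon_bp: float) -> float:
    """Convert a library concentration in ng/uL to nM."""
    return ng_per_ul * 1e6 / (AVG_BP_MASS * mean_amplicon_bp)


def dilution_volumes(stock_nm: float, target_nm: float = 4.0,
                     final_ul: float = 20.0) -> Tuple[float, float]:
    """Return (library uL, diluent uL) to reach target_nm in final_ul."""
    if stock_nm < target_nm:
        raise ValueError("stock is already below the target concentration")
    stock_ul = final_ul * target_nm / stock_nm
    return stock_ul, final_ul - stock_ul


# Hypothetical example: 6.6 ng/uL library with a 250-bp mean amplicon size.
stock = library_nanomolar(6.6, 250)
library_ul, diluent_ul = dilution_volumes(stock)
```

- A library measured below the target concentration cannot be normalized by dilution, which is why the sketch raises an error in that case.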
- Aligned reads were compared to wild-type loci; reads not aligning to any part of the loci were discarded.
- Reads matching wild-type sequence were tallied.
- Reads with indels (within 10 bp of the FokI-Cascade RNP putative cut site) were categorized by indel type and tallied.
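- The read-tallying logic above can be sketched as follows. This is a simplified illustration, not the analysis pipeline that was used: it assumes each aligned read has already been reduced to a list of indel events relative to the wild-type locus (a hypothetical representation), treats `None` as a read that did not align, and counts reads whose only indels fall outside the window as wild-type.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# (kind, reference start position, length); kind is "ins" or "del".
Indel = Tuple[str, int, int]


def classify_read(indels: Optional[List[Indel]], cut_site: int,
                  window: int = 10) -> Optional[str]:
    """Label a read as wild-type or by its first indel near the cut site."""
    if indels is None:                 # unaligned read: discard
        return None
    for kind, start, length in indels:
        # A deletion spans start..start+length; an insertion sits at start.
        end = start + length if kind == "del" else start
        if start <= cut_site + window and end >= cut_site - window:
            return f"{kind}{length}"
    return "wild-type"


def tally(reads: Iterable[Optional[List[Indel]]], cut_site: int,
          window: int = 10) -> Counter:
    """Count reads per category, skipping discarded reads."""
    counts: Counter = Counter()
    for indels in reads:
        label = classify_read(indels, cut_site, window)
        if label is not None:
            counts[label] += 1
    return counts
```

- The editing frequency at a locus could then be estimated as the indel-labelled reads divided by all retained reads.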
- This Example illustrates the design and delivery of E. coli Type I-E Cascade complexes comprising FokI fusion proteins to facilitate genome editing in human cells.
- This Example also describes the delivery of plasmid vectors expressing Cascade complex components into eukaryotic cells.
- a minimal CRISPR array was designed to target the TRAC locus in the human genome.
- the minimal CRISPR array contained two spacer sequences, both of which were flanked by CRISPR repeat sequences, as described in Examples 1 and 3.
- the two spacer sequences targeted loci in the genome separated by 30 bp and each spacer was complementary to a genomic sequence adjacent to an AAG PAM sequence.
- the plasmid vector containing the minimal CRISPR array was produced by ligating annealed oligonucleotides (Integrated DNA Technologies, Coralville, Iowa) encoding a CRISPR repeat flanked by two spacer sequences into a mammalian expression vector with two CRISPR repeat sequences.
- the resulting plasmid contained a “repeat-spacer-repeat-spacer-repeat” dual guide expressed from the human U6 (hU6) promoter (SEQ ID NO:454).
- FokI-Cascade RNP protein component-encoding genes were cloned into plasmid vectors containing CMV promoters to enable delivery and expression in mammalian cells.
- Cas genes were cloned into separate plasmids (SEQ ID NO:448 through SEQ ID NO:451 and SEQ ID NO:453) or in a single plasmid as a polycistronic construct with each gene linked via 2A viral peptide “ribosome-skipping” sequences (in SEQ ID NO:455).
- FokI-Cascade RNP complexes were delivered into eukaryotic cells via two different methods: cas genes and the minimal CRISPR array were supplied on separate plasmids (“six plasmid”-delivery system, SEQ ID NO:448 through SEQ ID NO:451, SEQ ID NO:453 and SEQ ID NO:454), or one plasmid encoding all cas genes as a polycistronic construct and a second plasmid encoding the minimal CRISPR array (“two plasmid”-delivery system, SEQ ID NO:454 and SEQ ID NO:455).
- Transfections for the six plasmid-delivery system and the two plasmid-delivery system were performed as detailed in Example 8 with the following modifications.
- 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate.
- the six plasmid-delivery system was initially tested by examining the necessity of each component for genome editing. More specifically, plasmid “cocktails” were added to each well such that there was a constant amount (420 ng) of five plasmids and a variable amount of the sixth plasmid (either 0 ng, 70 ng, 700 ng, or 1,400 ng).
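- The titration design above (five plasmids held at 420 ng while the sixth is varied) can be enumerated programmatically. In this sketch the plasmid labels are hypothetical stand-ins for the six plasmids of the delivery system; the amounts are those stated in the text.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical labels for the five cas plasmids and the CRISPR array plasmid.
PLASMIDS = ("fokI_cas8", "cse2", "cas7", "cas5", "cas6", "crispr_array")


def titration_cocktails(plasmids: Sequence[str],
                        constant_ng: int = 420,
                        variable_ng: Sequence[int] = (0, 70, 700, 1400)
                        ) -> List[Tuple[str, Dict[str, int]]]:
    """One cocktail per (varied plasmid, amount) combination."""
    cocktails = []
    for varied in plasmids:
        for amount in variable_ng:
            mix = {name: constant_ng for name in plasmids}
            mix[varied] = amount
            cocktails.append((varied, mix))
    return cocktails


cocktails = titration_cocktails(PLASMIDS)
totals_ng = sorted({sum(mix.values()) for _, mix in cocktails})
```

- With six plasmids and four amounts this gives 24 cocktails, each containing between 2,100 ng and 3,500 ng of total plasmid DNA.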
- the six plasmid-delivery system and the two plasmid-delivery system were compared by nucleofecting in a fixed amount (3.5 μg) of total plasmid DNA while varying the ratio of minimal CRISPR array plasmid to cas-encoding plasmid(s). Finally, lysate was harvested ~72 hours after nucleofection for subsequent deep sequencing analysis.
- Deep sequencing was performed as detailed in Example 8, but only using target-specific primers Y and Z from Table 31.
- FIG. 31 shows data comparing genome editing with the six plasmid-delivery system or the two plasmid-delivery system. Across both methods, the highest levels of editing were achieved with the highest ratio of cas:minimal CRISPR array plasmids. Additionally, the polycistronic plasmid enabled higher levels of editing, potentially due to increased transcription perm of plasmid.
- This Example illustrates in silico design, cloning, expression, and purification of a circularly-permuted (cp) E. coli Type I-E Cas7 protein using a structure-guided modelling approach.
- E. coli Type I-E Cas7 protein (SEQ ID NO:18) was circularly permuted using a structure-guided approach based on the E. coli Cascade crystal structure 5H9E.pdb (www.rcsb.org/pdb/; Hayes, R. P, et al., Nature 530(7591):499-503 (2016)).
- the native Cas7 N-terminus and C-terminus were connected with a two-amino acid peptide linker having the sequence glycine-serine (G-S).
- the polypeptide sequence of this circularized Cas7 was opened at the position corresponding to the peptide bond between residues 301 and 302 in the wild-type Cas7 polypeptide sequence to form a new N-terminus (residue 302) and a new C-terminus (residue 301), resulting in a circularly permuted version of Cas7 protein (cp-Cas7 V1 protein).
- the new N-terminus and new C-terminus were designed to be positioned for connection with a fusion protein or linker region without disturbing the Cas7 protein fold or the Cascade complex assembly.
- a methionine residue was added to the new N-terminus (i.e., the amino acid residue corresponding to residue 302 of the wild-type Cas7 protein) of the cp-Cas7 V1 protein (SEQ ID NO:578).
- a second cp-Cas7 protein, cp-Cas7 V2 protein, was similarly engineered using the G-S linker.
- the N-terminus and C-terminus of the cp-Cas7 V2 protein correspond to residues 339 and 338, respectively, in the wild-type Cas7 sequence.
- the new N-terminus and new C-terminus were designed to be positioned for connection with a fusion protein or linker region without disturbing the Cas7 protein fold or the Cascade complex assembly.
- a methionine residue was added to the N-terminus (i.e., the amino acid residue corresponding to residue 339 of the wild-type Cas7 protein) of the cp-Cas7 V2 protein (SEQ ID NO:579).
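- The circular permutation described above can be expressed as a short string operation: join the native termini with the linker, reopen the sequence before the chosen residue, and prepend a methionine. The sketch below is illustrative only. The function name is hypothetical, and it assumes every wild-type residue (including the native initiator methionine) is retained, which the text does not state explicitly.

```python
def circularly_permute(wild_type: str, new_n_residue: int,
                       linker: str = "GS", add_met: bool = True) -> str:
    """Circularly permute a protein sequence.

    new_n_residue is the 1-based wild-type position that becomes the
    new N-terminus; the residue before it becomes the new C-terminus.
    """
    if not 2 <= new_n_residue <= len(wild_type):
        raise ValueError("new_n_residue must lie inside the sequence")
    head = wild_type[new_n_residue - 1:]   # residues new_n_residue..end
    tail = wild_type[:new_n_residue - 1]   # residues 1..new_n_residue-1
    permuted = head + linker + tail        # linker joins native C- and N-termini
    return ("M" + permuted) if add_met else permuted


# Toy sequence for illustration; a real call would pass the Cas7 sequence.
toy = circularly_permute("ABCDEFGH", new_n_residue=4)
```

- For the designs in the text, the corresponding calls would use `new_n_residue=302` for cp-Cas7 V1 and `new_n_residue=339` for cp-Cas7 V2, applied to the wild-type Cas7 sequence.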
- DNA coding sequences of the in silico designed polypeptide sequences of cp-Cas7 V1 protein and cp-Cas7 V2 protein were codon optimized for expression in E. coli.
- DNA coding sequences were provided to a commercial manufacturer (GenScript, Piscataway, N.J.) for synthesis.
- the DNA sequences were individually introduced into a Cascade-operon expression vector (Table 14; SEQ ID NO:441) to replace the wild-type Cas7 protein in the expression vector as described in Example 2.
- Each expression vector was transformed into E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.) with a second vector encoding a guide RNA for the J3 target (SEQ ID NO:444; Table 15), as described in Example 2.
- Cells were cultured as described in Example 4.
- E. coli Type I-E Cascade complexes containing Cas5, Cas6, cp-Cas7 V1, Cse2, and Cas8 proteins, as well as guide RNA/target J3; and Cas5, Cas6, cp-Cas7 V2, Cse2, and Cas8 proteins as well as guide RNA/target J3, were purified as described in Example 5.
- Cascade/cp-Cas7 complexes were purified as described in this Example and subjected to an EMSA to demonstrate specific binding to their respective target sequence. Briefly, Cascade/cp-Cas7 and Cascade/WT-Cas7 were purified and concentrated to 10 mg/mL. Cy5 double-stranded target DNA was produced as described in Example 6 and diluted to 1 μM in TE buffer (J3 target SEQ ID NO:469 and SEQ ID NO:472 and CCR5 target SEQ ID NO:474 and SEQ ID NO:470). Cascade complexes and labeled double-stranded target DNA were incubated for 30 min at 37° C. at different protein/target ratios.
- This Example illustrates in silico design, cloning, expression, and purification of an E. coli Type I-E Cas8 protein fused to a FokI nuclease domain to confer nuclease activity to the Cascade complex.
- E. coli Type I-E Cas8 was fused N-terminally with a Flavobacterium okeanokoites FokI nuclease domain (GenBank no. AAA24927.1).
- the FokI nuclease domain comprises residues contained in the Sharkey variant described by Guo, et al. (Guo, J., et al., J. Mol. Biol. 400:96-107 (2010)), and catalyzes double-stranded DNA cleavage upon homo-dimerization.
- the amino acid sequence for the FokI nuclease (SEQ ID NO:580) contained residues Q384 to F579 (GenBank no.
- the E. coli Type I-E Cascade H3-His6-MBP-TEV-NLS-GGS-FokISharkey-30aa-linker-Cas8-COOH (SEQ ID NO:439) was expressed and purified as described in Example 4 and Example 5C.
- the protein sequence after TEV cleavage comprises NH3-NLS-GGS-FokISharkey-30aa-linker-Cas8-COOH (SEQ ID NO:587).
- a FokI-Cas8 fusion protein was constructed in a vector that carries NLS-FokI-linker-Cas8_His6-HRV3C-Cse2_Cas7_Cas5_Cas6 as described in Examples 1 and 2 (SEQ ID NO:442).
- Each expression vector was transfected into E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.) with a second vector encoding a guide RNA for the J3 target (SEQ ID NO:444), as described in Example 2. This construct was expressed and purified as described in Example 4B and Example 5A.
- APOBEC corresponds to a gene that is a member of the cytidine deaminase pathway (human APOBEC I Genbank no. AB009426, human APOBEC 3F Genbank no. CH471095, human APOBEC 3G Genbank no. CR456472, rat APOBEC UCSC genome browser ID RGD:2133 rat); AID corresponds to an activation-induced cytidine deaminase (Genbank no.
- PmCDA1 is an AID ortholog (Nishida, et al., Science 16:353 (2016); Iwamatsu, et al., J Biochem 110:151-158 (1991)); PvuIIHIFIT46G is a PvuII high fidelity variant T46G (Fonfara, et al., Nucleic Acids Res, 40:847-860 (2012)); PvuIIsinglechainT46G is described in pdbID 3KSK); I-TevI is a site-specific, sequence-tolerant homing endonuclease from bacteriophage T4 and comprises an N-terminal catalytic domain as well as a C-terminal DNA-binding domain (the domains are connected by a long, flexible linker) (Van Roey, et al., EMBO J, 20:3631-3637 (2001)); BcnI (Sokolowska, et al., J Mol Bio
- the two Cse2 proteins of the Cascade complex were fused together using a structure-guided approach based on the E. coli Cascade crystal structure 5H9E.pdb (www.rcsb.org/pdb/; Hayes, R. P, et al., Nature 530(7591):499-503 (2016)). Briefly, the C-terminus of one Cse2 and the N-terminus of a second Cse2 were fused together using a 10-aa flexible linker (SEQ ID NO:589). The full sequence of the Cse2-Cse2 (CasB_CasB) fusion protein is shown in SEQ ID NO:588.
- Each expression vector was transfected into E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.) with a second vector encoding a guide RNA for the J3 target (SEQ ID NO:444), as described in Example 2.
- the E. coli Type I-E Cascade complex containing Cas5, Cas6, Cas7, Cse2-Cse2, and Cas8 was expressed and purified as described in Example 4B and 5B.
- Electrophoretic Mobility Shift Assays (EMSA) of Cascade/Cse2-Cse2 and J3 Target
- Cascade/Cse2-Cse2 complexes were purified as described in this Example and subjected to an EMSA to demonstrate specific binding to their respective target sequence. Briefly, Cascade/Cse2-Cse2 and Cascade/WT-Cse2 were purified and concentrated to 10 mg/mL. Cy5 double-stranded target DNA was produced as described in Example 6 and diluted to 1 μM in TE buffer (J3 target SEQ ID NO:469 and SEQ ID NO:472 and CCR5 target SEQ ID NO:474 and SEQ ID NO:470). Cascade complexes and labeled double-stranded target DNA were incubated for 30 min at 37° C. at different protein/target ratios.
- The cytidine deaminase rAPOBEC1 (apolipoprotein B mRNA editing enzyme catalytic subunit 1, Rattus norvegicus; NCBI Gene ID: 25383, Ensembl:ENSRNOG00000015411) was selected for fusion.
- the Cse2-Cse2 protein was fused with rAPOBEC1 using a structure-guided approach based on the E. coli Cascade crystal structure 5H9E.pdb (www.rcsb.org/pdb/; Hayes, R. P, et al., Nature 530(7591):499-503 (2016)).
- This Example illustrates the design of an E. coli Type I-E cp-Cas7 protein fused to a VP64 activation domain to confer transcriptional activation activity to the Cascade complex.
- VP64 is a transcriptional activator comprising four tandem copies of VP16 (herpes simplex viral protein 16, DALDDFDLDML (SEQ ID NO:614); amino acids 437-447, UNIPROT:UL48) connected with glycine-serine (GS) linkers.
- VP64 SEQ ID No:615
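- The tandem-repeat architecture of VP64 can be illustrated with a short Python sketch. The VP16 motif is the one quoted above (SEQ ID NO:614); the two-residue "GS" linker and the helper name are assumptions made for illustration, so the output is a VP64-like sequence and is not claimed to match SEQ ID NO:615.

```python
VP16_MOTIF = "DALDDFDLDML"  # SEQ ID NO:614, as quoted in the text


def tandem_repeat(motif: str, copies: int, linker: str) -> str:
    """Join `copies` copies of `motif` with `linker` between neighbours."""
    return linker.join([motif] * copies)


# Assumed minimal glycine-serine linker; the real linker may be longer.
vp64_like = tandem_repeat(VP16_MOTIF, copies=4, linker="GS")
print(vp64_like, len(vp64_like))
```

With an 11-residue motif and three 2-residue linkers the resulting sequence is 50 residues long.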
- E. coli Type I-E cp-Cas7 V2 (SEQ ID NO:616) can be selected for engineering.
- the activation domain VP64 can be fused to the N-terminus of cpCas7 V2 (described in Example 10).
- a linker e.g., 5 to 50 amino acids in length
- DNA sequences can be provided to a commercial manufacturer for synthesis.
- the DNA sequences encoding a VP64-cpCas7 V2 fusion protein can be cloned into an expression vector (e.g., SEQ ID NO:455, wherein VP64-cpCas7 V2 can be substituted for Cas7).
- Each expression vector can be transfected into E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.) with a second vector encoding a guide RNA for the J3 target (SEQ ID NO:444), as described in Example 2.
- Type I-E Cascade complex containing Cas5, Cas6, VP64 cpCas7 V2, Cse2, and Cas8 can be expressed and purified as described in Examples 4 and 5. Purification of the Cascade complexes comprising the fused VP64_cpCas7 V2 variant can be used to form Cascade complexes having essentially the same composition (based on molecular weight) as Cascade complexes comprising wild-type proteins.
- Selection of a guide targeted to the promoter region of a particular gene can be used to verify the ability of the Cascade complex comprising the fused VP64 cpCas7 V2 to facilitate transcriptional activation of the gene.
- This Example describes a method of modifying a Class 2 Type II CRISPR sgRNA, crRNA, tracrRNA, or crRNA and tracrRNA sequence with a Class 1 Type I CRISPR repeat stem sequence (e.g., a Type I-F CRISPR repeat stem sequence) for the recruitment of one or more Cascade subunit proteins (i.e., Cas6, Cas5, etc.) fused to a functional domain, to a Type II CRISPR Cas protein/guide RNA complex binding site.
- The method described here is adapted from Gilbert, L., et al., Cell 154(2):442-451 (2013) and Ferry, Q., et al., Nature Communications 8:14633, doi: 10.1038/ncomms14633 (2017).
- a Type II CRISPR sgRNA, crRNA, tracrRNA, or crRNA and tracrRNA can be selected for engineering.
- a Type II guide RNA sequence can be evaluated in silico for regions of incorporation of a Type I CRISPR repeat stem sequence.
- the Type I CRISPR repeat stem sequence can be attached at the 5′ or 3′ end of the Type II guide RNA, internal to the Type II guide RNA, or can replace secondary structure in the Type II guide RNA (e.g., 3′ hairpin elements).
- Incorporation of the Type I CRISPR repeat stem sequence can be accompanied by a linker element nucleotide sequence.
- An example of a Type II tracrRNA 3′ modified with a Type I CRISPR repeat stem sequence is presented in Table 36.
- Table 36. Type II tracrRNA with 3′ Type I CRISPR Repeat Stem Sequence. SEQ ID NO:617: 5′-AGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUAAGUUCAcugccguauaggcagCUUU-3′. *The Type I CRISPR repeat stem sequence is in lower case letters.
- a corresponding DNA coding sequence is presented as SEQ ID NO: 618.
- a mammalian gene such as C-X-C chemokine receptor type 4 (CXCR4), can be selected for targeting.
- The junction between the 5′ UTR and exon 1 can be scanned in silico for a Type II CRISPR Cas protein target sequence occurring adjacent to a Type II CRISPR Cas protein PAM sequence (e.g., 5′-NGG).
- The 20-nucleotide target sequence occurring upstream of the PAM (i.e., in a 5′ direction) can be incorporated into the Type II crRNA.
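- A minimal sketch of such an in silico scan is given below. It reports every 5′-NGG PAM on the supplied strand together with the 20 nucleotides immediately 5′ of it, and converts a DNA target to its RNA spelling. The function names and the toy input are assumptions for illustration; a practical scan would also examine the reverse complement and score off-target homology.

```python
import re


def find_ngg_targets(dna: str, target_len: int = 20):
    """Return (target_start, target, pam) for each NGG PAM on this strand
    that has `target_len` nucleotides immediately 5' of it (0-based start)."""
    dna = dna.upper()
    hits = []
    # A lookahead is used so that overlapping PAM sites are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        if pam_start >= target_len:
            target = dna[pam_start - target_len:pam_start]
            hits.append((pam_start - target_len, target, match.group(1)))
    return hits


def to_rna(dna: str) -> str:
    """Spell a DNA sequence as RNA (T -> U) for use as a spacer."""
    return dna.upper().replace("T", "U")


toy_region = "A" * 20 + "TGG"  # toy input, not a CXCR4 sequence
hits = find_ngg_targets(toy_region)
print(hits)
```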
- An example of a Type II crRNA targeting CXCR4 is shown in Table 37.
- RNA CXCR4 targeting spacer
- SEQ ID NO:619 can be covalently linked to the 5′ end of the Type II tracrRNA with 3′ Type I CRISPR repeat stem sequence (RNA) (SEQ ID NO:617) with a linker.
- a suitable linker element is 5′-GAAA-3′.
- Type II guide RNAs with the incorporated Type I CRISPR repeat stem sequence can be provided to a commercial manufacturer for synthesis.
- A Type I Cascade subunit protein (e.g., Cas6) can be operably linked to a transcriptional activation or repression domain (e.g., KRAB) and C-terminally tagged with a nuclear localization signal (NLS) as described in Example 12.
- A Type II Cas protein (e.g., Cas9) can be mutated such that it is catalytically inactive (e.g., dCas9) and tagged with an NLS sequence.
- the Cas6-KRAB-NLS protein and the dCas9-NLS protein can be recombinantly expressed and purified from E. coli.
- Ribonucleoprotein complexes can be formed at a ratio of 60 pmol dCas9 protein:60 pmol Cas6-KRAB-NLS:120 pmol CXCR4-targeting crRNA:120 pmol tracrRNA 3′ modified with a Type I CRISPR repeat stem sequence.
- Each of the 120 pmol CXCR4-targeting crRNA and 120 pmol tracrRNA 3′ modified with a Type I CRISPR repeat stem sequence (herein referred to as “modified Type II guide RNA”) can be diluted to the desired total amount (120 pmol) in a final volume of 2 μL, incubated for 2 minutes at 95° C., removed from a thermocycler, and allowed to equilibrate to room temperature.
- dCas9 and the Cas6-KRAB-NLS protein can be diluted to an appropriate concentration in binding buffer (20 mM HEPES, 100 mM KCl, 5 mM MgCl2, and 5% glycerol at pH 7.4) to a final volume of 3 μL and mixed with the 2 μL of Type II guide RNA, followed by incubation at 37° C. for 30 minutes.
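- The amounts above translate into pipetting volumes and final concentrations by simple arithmetic, because 1 μM equals 1 pmol/μL. The sketch below shows the calculation; the 40 μM stock concentration is a hypothetical value chosen for illustration, and the 5 μL final volume is the sum of the 3 μL protein mix and the 2 μL guide RNA described above.

```python
def volume_ul(amount_pmol: float, stock_um: float) -> float:
    """Volume (uL) of a stock needed to deliver `amount_pmol`.
    1 uM == 1 pmol/uL, so volume = amount / concentration."""
    return amount_pmol / stock_um


def final_conc_um(amount_pmol: float, final_volume_ul: float) -> float:
    """Final concentration (uM) of a component in the assembled reaction."""
    return amount_pmol / final_volume_ul


FINAL_VOLUME_UL = 3 + 2      # protein mix + guide RNA, from the text
HYPOTHETICAL_STOCK_UM = 40.0  # assumed stock concentration

dcas9_volume = volume_ul(60, HYPOTHETICAL_STOCK_UM)
dcas9_final = final_conc_um(60, FINAL_VOLUME_UL)
guide_final = final_conc_um(120, FINAL_VOLUME_UL)
print(dcas9_volume, dcas9_final, guide_final)
```

Under these assumptions 60 pmol of protein corresponds to 12 μM and 120 pmol of guide RNA to 24 μM in the 5 μL reaction, a 2:1 guide-to-protein molar ratio.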
- a nontransfected control e.g., buffer only
- unmodified Type II guide RNA, or a Cas6 not linked to a repression domain can be used to assemble negative control RNPs.
- dCas9:Cas6-KRAB-NLS:modified Type II guide RNA nucleoprotein complexes can be transfected into HEK293 cells (ATCC, Manassas, Va.), using the Nucleofector® 96-well Shuttle System (Lonza, Allendale, N.J.) and the following protocol: The complexes can be dispensed in a 5 μL final volume into individual wells of a 96-well plate. The cell culture medium can be removed from the HEK293 cell culture plate and the cells detached with TrypLE™ (Thermo Scientific, Wilmington, Del.).
- Suspended HEK293 cells can be pelleted by centrifugation for 3 minutes at 200×g, TrypLE reagents aspirated, and cells washed with calcium and magnesium-free phosphate buffered saline (PBS). Cells can be pelleted by centrifugation for 3 minutes at 200×g, the PBS aspirated, and the cell pellet re-suspended in 10 mL of calcium and magnesium-free PBS.
- The cells can be counted using the Countess® II Automated Cell Counter (Life Technologies; Grand Island, N.Y.). 2.2×10⁷ cells can be transferred to a 1.5 mL microfuge tube and pelleted.
- The PBS can be aspirated and the cells re-suspended in Nucleofector™ SF (Lonza, Allendale, N.J.) solution to a density of 1×10⁷ cells/mL.
- 20 μL of the cell suspension can then be added to each individual well containing 5 μL of ribonucleoprotein complexes, and the entire volume from each well can be transferred to a well of a 96-well Nucleocuvette™ Plate (Lonza, Allendale, N.J.).
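- The cell numbers in this step can be checked with integer arithmetic, as in the sketch below. It assumes that the stated density is per millilitre (i.e., 1×10⁷ cells/mL, or 10⁴ cells per microlitre); the variable names are illustrative only.

```python
TOTAL_CELLS = 22_000_000          # 2.2 x 10^7 cells pelleted
DENSITY_PER_UL = 10_000           # 1 x 10^7 cells/mL == 10^4 cells/uL (assumed unit)
SUSPENSION_PER_WELL_UL = 20       # cell suspension added to each well
RNP_PER_WELL_UL = 5               # ribonucleoprotein complexes already in the well

resuspension_volume_ul = TOTAL_CELLS // DENSITY_PER_UL
cells_per_well = DENSITY_PER_UL * SUSPENSION_PER_WELL_UL
max_wells = resuspension_volume_ul // SUSPENSION_PER_WELL_UL
well_volume_ul = SUSPENSION_PER_WELL_UL + RNP_PER_WELL_UL

print(resuspension_volume_ul, cells_per_well, max_wells, well_volume_ul)
```

Under this assumption the pellet is re-suspended in 2.2 mL, each well receives 2×10⁵ cells in a 25 μL reaction, and the suspension suffices for a full 96-well plate with some excess.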
- The plate can be loaded onto the Nucleofector™ 96-well Shuttle™ (Lonza, Allendale, N.J.) and cells nucleofected using the 96-CM-130 Nucleofector™ program (Lonza, Allendale, N.J.).
- DMEM Dulbecco's Modified Eagle Medium
- FBS Fetal Bovine Serum
- penicillin and streptomycin Life Technologies, Grand Island, N.Y.
- 50 μL of the cell suspension can be transferred to a 96-well cell culture plate containing 150 μL pre-warmed DMEM complete culture medium.
- The plate can be transferred to a tissue culture incubator and maintained at 37° C. in 5% CO2 for 48 hours.
- cells can be evaluated for repression of CXCR4 expression.
- Culture medium can be aspirated from the HEK293 cells, and the cells can be washed once with calcium and magnesium-free PBS and then trypsinized by the addition of TrypLE (Life Technologies, Grand Island, N.Y.) followed by incubation at 37° C. for 3-5 minutes. Trypsinized cells can be gently pipetted up and down to form a single-cell suspension, and the cells can then be pelleted by centrifugation for 3 minutes at 200×g.
- The culture medium can be aspirated and the cells re-suspended in a 10 mM EDTA/PBS buffer and gently mixed into a single-cell suspension.
- The single-cell suspension can be stained using 0.05% FITC-conjugated anti-human CXCR4 antibody (Medical & Biological Laboratories Co., Japan) in PBS containing 10% FBS for 1 hour at room temperature. Isotype controls and native RNP controls can be similarly stained for reference. Stained cells can then be sorted on an LSR II flow cytometer (BD laboratories, San Jose, Calif.) and the population of FITC-positive fluorescent cells tallied.
- Reduction in CXCR4 expression is measured by a decrease in detected fluorescence of a dCas9:Cas6-KRAB-NLS:modified Type II guide RNA nucleofected sample compared to the measured fluorescence of a non-transfected control.
- A decrease in fluorescence from the flow cytometer can be used to demonstrate that a modified Type II guide RNA with a Type I CRISPR repeat stem sequence can be used in combination with a nuclease-deficient Type II Cas9 protein to recruit and localize a Type I CRISPR Cascade subunit protein fused to a repression domain to a gene target and repress transcription of said gene target.
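- The comparison between a nucleofected sample and the non-transfected control can be expressed as a percent reduction in fluorescence. The sketch below is one simple way to compute it; the function name and the example readings are hypothetical and are not data from this disclosure.

```python
def percent_reduction(sample_signal: float, control_signal: float) -> float:
    """Percent decrease of the sample signal relative to the control signal."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (1.0 - sample_signal / control_signal)


# Hypothetical mean fluorescence values, for illustration only.
reduction = percent_reduction(sample_signal=250.0, control_signal=1000.0)
print(reduction)
```

A value near zero would indicate no detectable repression relative to the control, and a negative value would indicate higher fluorescence in the sample than in the control.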
- This Example describes a method to identify and screen Type I cas genes from different species. The method presented here is adapted from Shmakov, S., et al., Molecular Cell 60(3):385-397 (2015).
- Using the Basic Local Alignment Search Tool (BLAST; blast.ncbi.nlm.nih.gov/Blast.cgi), a search of the genomes of various species can be conducted to identify one or more genes coding for the various gene components of the Type I CRISPR-Cas complex.
- The cas1 integrase gene is a component of both Class 1 and Class 2 CRISPR-Cas families, and upon identification of species containing the cas1 gene, subsequent searches of these genomes can be conducted to isolate genomes comprising Type I-specific genes.
- Genome searches can be anchored upon the CRISPR-Cas integrase gene cas1; an exemplary cas1 sequence from the Type I-E system of E. coli K-12 MG1655 that can be used is SEQ ID NO:621.
- Particular genes (e.g., cas7 and cas5) are core components of the interference complexes of the Type I systems and can be used to further differentiate species containing Type I systems.
- Exemplary sequences of E. coli K-12 MG1655 cas7 and cas5 genes that can be used are SEQ ID NO:622 and SEQ ID NO:623, respectively.
- Genomes identified possessing cas7 and cas5 genes can be further parsed through the identification of the Type I-specific nuclease-helicase cas3 gene or homologs thereof.
- An exemplary E. coli K-12 MG1655 cas3 sequence that can be used is SEQ ID NO:624.
- Genomes containing the CRISPR-Cas integrase gene cas1, the Type I interference complex genes cas7 and cas5, and the nuclease-helicase cas3 gene, or some combination thereof, are likely candidates to contain Type I CRISPR-Cas system(s).
- Type I CRISPR-Cas genes are generally found in proximity to one another in a single genomic locus, typically within 20 kilobases (kb). The area around the cas1, cas7, cas5, or cas3 genes can be searched for other open reading frames (ORFs) of the remaining cas genes that constitute a Type I interference complex.
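- The neighborhood search can be sketched as a coordinate filter: given an anchor gene and a list of annotated ORFs, keep those lying within 20 kb of the anchor. The `Gene` record, the function name, and the coordinates below are hypothetical and serve only to illustrate the window logic.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based genomic start
    end: int    # 1-based genomic end (inclusive)


def genes_near(anchor: Gene, genes: List[Gene], window: int = 20_000) -> List[Gene]:
    """Return genes (other than the anchor) overlapping the region that
    extends `window` bases on either side of the anchor gene."""
    lo, hi = anchor.start - window, anchor.end + window
    return [g for g in genes if g != anchor and g.end >= lo and g.start <= hi]


# Hypothetical annotations around a cas7 anchor hit.
anchor = Gene("cas7", 50_000, 51_000)
annotations = [
    anchor,
    Gene("cas5", 51_100, 51_800),   # adjacent ORF, inside the window
    Gene("orfX", 90_000, 91_000),   # more than 20 kb away
]
neighbors = genes_near(anchor, annotations)
print([g.name for g in neighbors])
```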
- the amino acid sequence of putative ORFs can be compared to known Type I genes for homology or the presence of characteristic protein domains of the Type I protein components can be analyzed using the homology detection and structure prediction search tools available through the Max Planck Institute Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de/#/), or equivalent.
- Type I components e.g., cas genes and the corresponding crRNA
- the Type I components can be tested for their ability to carry out programmable DNA targeting.
- Putative cas genes and the crRNA can be encoded into expression vectors following the guidance of Examples 1, 2, and 3.
- Vectors encoding the various cas genes and crRNA can be introduced into a bacteria strain and the Type I interference complex expressed and purified as described in Examples 4 and 5.
- the elution fraction from the size-exclusion chromatography (SEC) column can be analyzed via SDS-PAGE gel to determine the identity, based on weight, of the protein components comprising a complete Type I interference complex.
- An ethidium bromide gel can also be run to detect the presence of crRNA as part of the interference complex.
- Purified Cascade complexes can be tested for their ability to support in vitro biochemical cleavage of a DNA target as described in Examples 6 and 7.
- Control expression and purification samples, in which individual putative cas genes are not expressed, can be used to determine the required cas genes that constitute a complete Type I interference complex capable of programmable DNA targeting.
- This Example describes a method to identify Type I crRNAs in different species. The method presented here is adapted from Chylinski, K., et al., RNA Biology 10(5):726-737 (2013).
- a search of genomes of various species can be conducted to identify Type I CRISPR-Cas genes as described in Example 17A.
- Genomes that comprise one or more Type I specific cas genes are candidate genomes that are likely to contain CRISPR RNAs (crRNAs) encoded within the CRISPR repeat-spacer array.
- the sequences adjacent to the identified Type I cas genes e.g., a cas7, cas5, or cas3 gene
- Methods for in silico predictive screening can be used to extract the crRNA sequence from the repeat array following Grissa, I. V., et al., Nucleic Acids Research 35(Web Server issue):W52-W57 (2007).
- The crRNA sequence is contained within the CRISPR repeat array and can be identified by its hallmark repeating sequences interspaced by foreign spacer sequences.
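- The repeat-spacer organization can be illustrated with a deliberately simplified Python sketch that splits an array on a known repeat and returns the intervening spacers. It assumes the repeat sequence is already known and matches exactly; dedicated tools such as CRISPRfinder discover repeats de novo and tolerate mismatches. The repeat and array used here are toy sequences.

```python
from typing import List


def extract_spacers(array: str, repeat: str) -> List[str]:
    """Return the sequences found between consecutive exact copies of
    `repeat`. Sequence before the first repeat and after the last repeat
    is treated as flanking sequence and discarded."""
    parts = array.upper().split(repeat.upper())
    return [p for p in parts[1:-1] if p]


toy_repeat = "GTTCAC"  # toy repeat, not a real CRISPR repeat
toy_array = "TT" + toy_repeat + "ACGTACGT" + toy_repeat + "GGGCCC" + toy_repeat + "AA"
spacers = extract_spacers(toy_array, toy_repeat)
print(spacers)
```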
- the putative CRISPR array containing the individual crRNA identified in silico can be further validated using RNA sequencing (RNA-seq).
- Cells from species identified as comprising putative Type I cas genes and crRNA components can be procured from a commercial repository (e.g., ATCC, Manassas, Va.; German Collection of Microorganisms and Cell Cultures GmbH (DSMZ), Braunschweig, Germany).
- Cells can be grown to mid-log phase and total RNA prepped using Trizol reagent (SigmaAldrich, St. Louis, Mo.) and treated with DNaseI (Fermentas, Vilnius, Lithuania).
- 10 μg of the total RNA can be treated with a Ribo-Zero rRNA Removal Kit (Illumina, San Diego, Calif.) and the remaining RNA purified using RNA Clean and Concentrators (Zymo Research, Irvine, Calif.).
- A library can be prepared using a TRUSEQ™ Small RNA Library Preparation Kit (Illumina, San Diego, Calif.), following the manufacturer's instructions. This will result in cDNAs having adapter sequences.
- the resulting cDNA library can be sequenced using MiSeq Sequencer (Illumina, San Diego, Calif.).
- Sequencing reads of the cDNA library can be processed, for example, using the following method.
- Adapter sequences can be removed using cutadapt 1.1 (pypi.python.org/pypi/cutadapt/1.1) and about 15 nucleotides trimmed from the 3′ end of the read to improve read quality.
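- The effect of this read-processing step can be illustrated in plain Python. The sketch below is a stand-in for what the step accomplishes and does not reproduce cutadapt or its options: it removes an exact 3′ adapter match and then trims a fixed number of nucleotides from the 3′ end. The adapter string is a placeholder, not the kit's actual adapter sequence.

```python
def trim_read(read: str, adapter: str, trim_3prime: int = 15) -> str:
    """Remove `adapter` (and anything 3' of it) if present, then trim
    `trim_3prime` nucleotides from the 3' end of what remains."""
    idx = read.find(adapter)
    if idx != -1:
        read = read[:idx]
    if trim_3prime > 0:
        # Reads no longer than the trim length are discarded entirely.
        read = read[:-trim_3prime] if len(read) > trim_3prime else ""
    return read


PLACEHOLDER_ADAPTER = "TGGAATTC"  # hypothetical adapter for illustration
raw_read = "A" * 40 + PLACEHOLDER_ADAPTER + "CC"
clean_read = trim_read(raw_read, PLACEHOLDER_ADAPTER)
print(len(clean_read))
```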
- Reads can be aligned to the genome of the respective species (i.e., from which the putative crRNA is to be identified) using Bowtie 2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml).
- the Sequence Alignment/Map (SAM) file which is generated by Bowtie 2, can be converted into a Binary Alignment/Map (BAM) file using SAMTools (http://samtools.sourceforge.net/) for subsequent sequencing analysis steps.
- Read coverage mapping to the CRISPR locus or loci can be calculated from the BAM file using BedTools (bedtools.readthedocs.org/en/latest/).
- the BED file as generated in the previous step, can be loaded into Integrative Genomics Viewer (IGV; www.broadinstitute.org/igv/) to visualize the sequencing read pileup.
- The read pileup can be used to identify the 5′ and 3′ termini of the transcribed putative crRNA sequence.
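- One simple way to turn a coverage track into candidate transcript termini is to report contiguous runs of positions whose depth meets a threshold. The sketch below assumes per-base depths have already been extracted for the CRISPR locus; the function name, the threshold, and the depth values are illustrative assumptions.

```python
from typing import List, Tuple


def covered_intervals(coverage: List[int], min_depth: int) -> List[Tuple[int, int]]:
    """Return (first, last) 0-based positions of each contiguous run in
    which read depth is at least `min_depth`."""
    intervals = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos                        # run begins: candidate 5' terminus
        elif depth < min_depth and start is not None:
            intervals.append((start, pos - 1))  # run ends: candidate 3' terminus
            start = None
    if start is not None:
        intervals.append((start, len(coverage) - 1))
    return intervals


toy_depths = [0, 0, 5, 9, 9, 8, 0, 0, 7, 7]  # hypothetical per-base depths
runs = covered_intervals(toy_depths, min_depth=5)
print(runs)
```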
- the RNA-seq data can be used to validate that a putative crRNA element is actively transcribed in vivo.
- Putative crRNAs can be tested with their cognate Type I cas genes for the ability to carry out programmable DNA targeting, following the guidance of Example 17A of the present Specification.
- This Example describes the generation and testing of various modifications of Type I guide crRNAs and their suitability for use in constructing Cascade polynucleotide complexes.
- the method described below is adapted from Briner, A., et al., Molecular Cell 56(2):333-339 (2014).
- Modifications can be introduced into the crRNA backbone, and the modified crRNA tested with a cognate Cascade complex to facilitate the identification of regions or positions in the Type I guide crRNA backbone amenable to modification.
- a crRNA from a Type I CRISPR system can be selected for engineering.
- the crRNA sequence can be modified in silico to introduce one or more base substitutions, deletions, or insertions into nucleic acid sequences in regions selected from one or more of the following regions: nucleic acid sequences 5′ of the spacer (5′ handle), the spacer element, Type I CRISPR repeat stem sequence, or 3′ of the Type I CRISPR repeat stem sequence (3′ handle).
- Base modification can also be used to introduce mismatches in the hydrogen base-pair interactions of any of the crRNA regions, or base-pair mutation introducing an alternative hydrogen base-pair interaction through substitution of two bases, wherein the alternative hydrogen base-pair interaction differs from the original hydrogen base-pair interaction (e.g., the original hydrogen base-pair interaction is Watson-Crick base pairing and the substitution of the two bases form a reverse Hoogsteen base pairing).
- Substitution of bases can also be used to introduce hydrogen base-pair interaction within the crRNA backbone.
- Regions of the crRNA can be independently engineered to introduce secondary structure elements into the crRNA backbone.
- secondary structure elements include, but are not limited to, the following: stem-loop elements, stem elements, pseudo-knots, and ribozymes.
- the crRNA guide RNA backbone can be modified to delete portions of the crRNA backbone either through deletion at the 5′ end, 3′ end, or internal to the crRNA.
- Alternative backbone structures can also be introduced.
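- The in silico generation of base-substitution variants can be sketched as follows. The function enumerates every single-base substitution within a chosen region of a crRNA; deletions, insertions, and paired substitutions would be generated analogously. The function name and the toy sequence are assumptions for illustration.

```python
from typing import List, Tuple


def single_substitutions(rna: str, start: int, end: int,
                         alphabet: str = "ACGU") -> List[Tuple[int, str, str, str]]:
    """Return (position, original_base, new_base, variant) for every
    single-base substitution within rna[start:end] (0-based, end-exclusive)."""
    variants = []
    for i in range(start, end):
        for base in alphabet:
            if base != rna[i]:
                variants.append((i, rna[i], base, rna[:i] + base + rna[i + 1:]))
    return variants


toy_crrna = "GUUCAC"  # toy sequence, not a full crRNA
variants = single_substitutions(toy_crrna, 1, 3)
print(len(variants), variants[0])
```

Each position yields three variants, so a region of n nucleotides yields 3n single-substitution crRNAs to test.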
- Modified crRNAs can be evaluated for their ability to support binding by individual Cascade subunit proteins (i.e., Cas6, Cas5, etc.), or to support complete formation of the Cascade protein complex, or to support formation of the Cascade complex and modification of a double-stranded DNA target sequence through recruitment of a nuclease (e.g., Cas3).
- crRNA binding to individual Cascade subunit proteins and Cascade protein complex assembly can be evaluated by nano-ESI mass spectrometry in a manner similar to Jore, M., et al., Nature Structural & Molecular Biology 18:529-536 (2011).
- Biochemical characterization of crRNA and Cascade protein complex modification of a double-stranded DNA target sequence through recruitment of a nuclease can be carried out in a manner similar to those described in Examples 6 and 7.
- Modified crRNA that are capable of supporting formation of the Cascade complex and modification of a double-stranded DNA target sequence through recruitment of a nuclease can be validated for activity in cells using the method described in Example 8.
- This Example illustrates the use of Type I CRISPR proteins and Type I guide crRNAs of the present invention to modify DNA target sequences present in human genomic DNA (gDNA) and to measure the level of cleavage activity at those sites.
- Target sites can be first selected from genomic DNA.
- Type I guide crRNAs can be designed to target the selected sequences.
- Assays (e.g., as described in Example 7) can be performed to determine the level of DNA target sequence cleavage.
- PAM sequences e.g., ATG
- Cascade protein complex e.g., E. coli Type I-E Cascade
- One or more Cascade DNA target sequences (e.g., 32 nucleotides in length) that are 3′ adjacent to an ATG PAM sequence can be identified.
- Criteria for selection of nucleic acid target sequences can include, but are not limited to, the following: homology to other regions in the genome; percent G-C content; melting temperature; presence of homopolymers within the spacer; distance between the two sequences; and other criteria known to one skilled in the art.
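- Target identification and two of the listed selection criteria can be sketched in Python as below. The code finds 32-nucleotide sequences immediately 3′ of an ATG PAM on the supplied strand and computes percent G-C content and the longest homopolymer run. The function names and toy input are assumptions; homology and melting-temperature filters are omitted.

```python
import re
from typing import List, Tuple


def cascade_targets(dna: str, pam: str = "ATG", length: int = 32) -> List[Tuple[int, str]]:
    """Return (start, target) for each `length`-nt sequence lying
    immediately 3' of `pam` on this strand (0-based start)."""
    dna = dna.upper()
    hits = []
    for match in re.finditer(f"(?={re.escape(pam)})", dna):
        start = match.start() + len(pam)
        target = dna[start:start + length]
        if len(target) == length:
            hits.append((start, target))
    return hits


def gc_percent(seq: str) -> float:
    """Percent of bases in `seq` that are G or C."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base in `seq`."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))


toy_dna = "CCATG" + "ACGT" * 8 + "CC"  # toy input, not a genomic sequence
targets = cascade_targets(toy_dna)
print(targets, gc_percent(targets[0][1]), longest_homopolymer(targets[0][1]))
```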
- A DNA target binding sequence that hybridizes to the Cascade DNA target sequence can be incorporated into a guide crRNA.
- the nucleic acid sequence of a guide crRNA construct is typically provided to and synthesized by a commercial manufacturer.
- A guide crRNA, as described herein, can be used with a cognate Type I Cascade protein complex to form crRNA/Cascade protein complexes.
- In vitro cleavage percentages and specificity (i.e., the amount of off-target binding) related to a guide crRNA can be determined, for example, using the cleavage assays described in Example 7, and compared as follows:
- cleavage percentage and specificity for each of the DNA target sequences can be determined. If so desired, cleavage percentage and/or specificity can be altered in further experiments using methods including, but not limited to, modifying the guide crRNA, or introducing effector proteins/effector protein-binding sequences to modify the guide crRNA or the Cascade subunit proteins, or ligand/ligand-binding moieties to modify the guide crRNA or the Cascade subunit proteins.
- the percentage cleavage data and site-specificity data obtained from the cleavage assays can be compared between different DNAs comprising the target binding sequence to identify the DNA target sequences having the desired cleavage percentage and specificity.
- Cleavage percentage data and specificity data provide criteria on which to base choices for a variety of applications. For example, in some situations the activity of the guide crRNA may be the most important factor. In other situations, the specificity of the cleavage site may be relatively more important than the cleavage percentage.
- cleavage percentage and/or specificity can be altered in further experiments using methods including, but not limited to, modifying the guide crRNA, introducing effector proteins/effector protein-binding sequences to modify the guide crRNA or the Cascade subunit proteins, or ligand/ligand-binding moieties to modify the guide crRNA or the Cascade subunit proteins.
- in-cell cleavage percentages and specificities of guide crRNAs can be obtained using, for example, the method described in Example 8, and compared as follows:
- cleavage percentage and specificity for each of the DNA target sequences can be determined. If so desired, cleavage percentage and/or specificity can be altered in further experiments using methods including, but not limited to, modifying the guide crRNA, or introducing effector proteins/effector protein-binding sequences to modify the guide crRNA or the Cascade subunit proteins, or ligand/ligand-binding moieties to modify the guide crRNA or the Cascade subunit proteins.
- the percentage cleavage data and site-specificity data obtained from the cleavage assays can be compared between different DNAs comprising the target binding sequence to identify the DNA target sequences having the desired cleavage percentage and specificity.
- Cleavage percentage data and specificity data provide criteria on which to base choices for a variety of applications. For example, in some situations the activity of the guide crRNA may be the most important factor. In other situations, the specificity of the cleavage site may be relatively more important than the cleavage percentage.
- cleavage percentage and/or specificity can be altered in further experiments using methods including, but not limited to, modifying the guide crRNA, introducing effector proteins/effector protein-binding sequences to modify the guide crRNA or the Cascade subunit proteins, or ligand/ligand-binding moieties to modify the guide crRNA or the Cascade subunit proteins.
- This Example illustrates the design and testing of multiple fusion proteins comprising FokI-Cas8 and linker polypeptides of various lengths, as well as the effect of varying interspacer distances for efficient genome editing.
- Minimal CRISPR arrays were designed to target a set of loci in the human genome at or near two different genes: ADAMTSL1 and PCSK9. Interspacer distances ranged from 14-60 bp, in increments of 2 bp. Four targets were designed for each interspacer distance. Targets were flanked by either AAG or ATG PAM sequences. Dual guides containing “repeat-spacer-repeat-spacer-repeat” sequences were cloned as described in Example 9 with SEQ ID NO:454. SEQ ID NO:625 through SEQ ID NO:816 provide the sequences for the full set of oligonucleotide sequences used to generate the minimal CRISPR arrays.
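- The size of this design can be checked with a short calculation. The interspacer distances and the four targets per distance are taken from the text; the interpretation that each minimal array is built from two oligonucleotides is an inference from the SEQ ID NO range, not something stated here.

```python
# Interspacer distances from 14 to 60 bp inclusive, in 2-bp increments.
distances = list(range(14, 61, 2))
TARGETS_PER_DISTANCE = 4

n_arrays = len(distances) * TARGETS_PER_DISTANCE

# SEQ ID NO:625 through SEQ ID NO:816, inclusive.
n_oligos = 816 - 625 + 1

print(len(distances), n_arrays, n_oligos, n_oligos / n_arrays)
```

The design therefore comprises 24 interspacer distances and 96 minimal CRISPR arrays, and the 192 listed oligonucleotides are consistent with two oligonucleotides per array.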
- FokI-Cascade RNP subunit protein component-encoding genes were cloned into vectors comprising: CMV promoters to enable delivery and expression in mammalian cells; cas genes linked via 2A viral peptide “ribosome-skipping” sequences; a fusion protein comprising FokI and Cas8 connected with a 30-aa linker (SEQ ID NO:455 from Example 3). Additional linker polypeptide sequences of varying length and amino acid composition were designed and used to connect FokI to the Cas8 protein in these vectors. The additional linker polypeptide sequences are listed in Table 38.
- Transfection conditions were essentially as described in Example 8 with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate. Each well contained 2.4 μg of plasmid encoding FokI-Cascade RNP complex subunit protein components and ~1-2 μg of plasmid encoding the minimal CRISPR array.
- Deep sequencing was performed essentially as described in Example 8 with the following modifications. Instead of primers Y and Z from Table 31 of Example 8, the target-specific primers were SEQ ID NO:825 to SEQ ID NO:1016.
- FIG. 32A and FIG. 32B present the results of the data analysis.
- An initial analysis of the data showed genome editing was highest with FokI-Cas8 linkers of 17 and 20 amino acids (SEQ ID NO:821 and SEQ ID NO:822, respectively) and with interspacer distances of ~26 bp and ~30-32 bp.
- This Example illustrates the design and testing of multiple homolog Cascade complexes to evaluate the efficiency of genome editing.
- a panel of sites was identified for testing additional homolog Cascade complexes.
- minimal CRISPR arrays were designed to target a set of loci in the human genome with 30 bp interspacer distances and that were flanked by either AAG or ATG PAM sequences.
- Dual-guide polynucleotides containing “repeat-spacer-repeat-spacer-repeat” sequences were cloned following the method described in Example 9 with SEQ ID NO:454.
- oligonucleotide sequences used to generate the minimal CRISPR arrays are presented as SEQ ID NO:1017 to SEQ ID NO:1130 (Hsa33F, SEQ ID NO:1017, and Hsa33R, SEQ ID NO:1074, exemplify one pair).
- a positive control dual-guide targeting the TRAC locus was included (SEQ ID NO:454).
- FokI-Cascade RNP subunit protein component-encoding genes were cloned into vectors comprising: CMV promoters to enable delivery and expression in mammalian cells; cas genes linked via 2A viral peptide “ribosome-skipping” sequences; a fusion protein comprising FokI and Cas8 connected with a 30-aa linker (SEQ ID NO:455 from Example 3).
- Transfection was performed essentially as described in Example 8 with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate. Each well contained 3 μg of plasmid encoding FokI-Cascade RNP subunit protein components and 0.3 μg of plasmid encoding the minimal CRISPR array.
- Deep sequencing was performed essentially as described in Example 8 with the following modifications. Instead of primers Y and Z from Table 31 of Example 8, the target-specific primers used in this Example were SEQ ID NO:1131 to SEQ ID NO:1244.
- FIG. 33 presents the results of the data analysis.
- Editing ranged from ~6% to below the limit of detection.
- Cas8 protein sequences from different Type I systems were used as queries for psi-BLASTp to generate phylogenetic trees for homolog selection. Specifically, Cas8 from Fusobacterium nucleatum (WP_008798978.1) was used for Type I-B, Cas8 from Bacillus halodurans (WP_010896519.1) was used for Type I-C, Cas8 from E.
- psi-BLASTp was iterated multiple times until thousands of homologs were identified for each Type I system. From this information, phylogenetic trees were built using the interactive Tree of Life online software (iTOL, accessible at itol.embl.de/login.cgi). The trees were visually inspected after auto-collapsing clades using variable branch lengths.
- Cascade homologs were selected only if (1) they were found in organisms that grow at 37° C.; (2) their cas gene operons were intact and had all the expected Cascade subunit protein encoding genes, a cas3 gene, and intact acquisition genes (i.e., cas1 and cas2); (3) their cas gene operon was flanked by one or more CRISPR arrays; and (4) their CRISPR arrays contained >10 spacers.
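- By way of a non-limiting illustration, the four selection criteria can be expressed as a simple filter. The record type, its field names, and the reading of criterion (4) as applying to the largest flanking array are assumptions made for this sketch and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class CandidateSystem:
    """Hypothetical summary of one candidate Type I cas locus."""
    organism: str
    grows_at_37c: bool            # criterion (1)
    has_all_cascade_genes: bool   # criterion (2): expected Cascade subunit genes
    has_cas3: bool                # criterion (2): cas3 gene present
    has_acquisition_genes: bool   # criterion (2): intact acquisition genes
    flanking_crispr_arrays: int   # criterion (3): arrays flanking the operon
    max_spacers_in_array: int     # criterion (4): assumed to mean the largest array


def passes_selection(c: CandidateSystem) -> bool:
    """Return True only if all four criteria are satisfied."""
    operon_intact = (
        c.has_all_cascade_genes and c.has_cas3 and c.has_acquisition_genes
    )
    return (
        c.grows_at_37c
        and operon_intact
        and c.flanking_crispr_arrays >= 1
        and c.max_spacers_in_array > 10
    )


# Placeholder candidates, not organisms from Table 39.
keep = CandidateSystem("organism A", True, True, True, True, 1, 12)
drop = CandidateSystem("organism B", True, True, True, True, 1, 10)
print(passes_selection(keep), passes_selection(drop))
```

The second placeholder fails only because it has exactly 10 spacers, which does not satisfy the ">10 spacers" requirement.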
- The CRISPRfinder program (crispr.i2bc.paris-saclay.fr/Server/) was used to identify putative PAM sequences. Based on the above criteria, the 22 homolog Cascade complexes shown in Table 39 were selected.
- AAG 32 I-E (SEQ ID NO:1250)
- SEQ ID NO:1251: Methanocella arvoryzae MRE50, AAG, 32, I-E
- SEQ ID NO:1252: Pseudomonas aeruginosa DHS01, AAG, 32, I-E
- SEQ ID NO:1253: Lachnospiraceae bacterium KH1T2, GAA, 35, I-E
- SEQ ID NO:1254: Klebsiella pneumoniae strain VRCO0172, GAA, 33, I-E
- SEQ ID NO:1255: Streptococcus thermophilus strain ND07, GAA, 33, I-E
- SEQ ID NO:1256: Streptomyces sp. S4, GAA, 33, I-E
- SEQ ID Campylobacter fetus subsp.
- Sequences for each cas gene from each homolog were synthesized as part of a polycistronic construct that included a fusion protein comprising FokI nuclease and Cas8.
- A set of ~7-8 guides targeting loci with the appropriate PAM sequences was generated.
- A set of ~2-7 guides targeting loci with appropriate PAM sequences was generated.
- Each Cascade complex homolog system required unique repeat sequences to process its cognate guide (SEQ ID NO:1267 to SEQ ID NO:1288).
- Dual guides containing “repeat-spacer-repeat-spacer-repeat” sequences were cloned using the method described in Example 9 for SEQ ID NO:454. Oligonucleotides were phosphorylated on the 5′ end and appended with overhang sequences to enable cloning into plasmid vectors with the appropriate repeat sequences. The full set of oligonucleotide sequences used to generate the minimal CRISPR arrays for the 22 Cascade complex homologs is presented as SEQ ID NO:1289 to SEQ ID NO:1400.
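- As a non-limiting sketch, the “repeat-spacer-repeat-spacer-repeat” layout of a minimal CRISPR array can be assembled by interleaving a homolog-specific repeat with the spacers. The function name and the short placeholder sequences are hypothetical and are not taken from the Sequence Listing; cloning overhangs are not modeled.

```python
def build_minimal_array(repeat: str, spacers: list[str]) -> str:
    """Interleave spacers with a repeat: repeat-(spacer-repeat) for each spacer."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts = [repeat]
    for spacer in spacers:
        parts.extend([spacer, repeat])
    return "".join(parts)


# Placeholder sequences for illustration only.
REPEAT = "GTTCC"
dual_guide = build_minimal_array(REPEAT, ["AAAA", "TTTT"])
print(dual_guide)
```

Passing a single spacer yields the “repeat-spacer-repeat” layout, so the same helper covers single-guide and dual-guide arrays.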
- FokI-Cascade RNP subunit protein component-encoding genes were cloned into vectors comprising: CMV promoters to enable delivery and expression in mammalian cells; cas genes linked via 2A viral peptide “ribosome-skipping” sequences; a fusion protein comprising FokI and Cas8 connected with a 30-aa linker.
- Transfection conditions were essentially as described in Example 8 with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate. Each well contained 1.5 μg of plasmid encoding FokI-Cascade RNP subunit protein components and ~0.5-1.5 μg of plasmid encoding the minimal CRISPR array. Experiments were performed in triplicate and included FokI-Cascade RNP complexes from E. coli (SEQ ID NO:455) targeted to eight sites (Hsa07 from Example 8 and Hsa37, Hsa43, Hsa46, Hsa60, Hsa77, Hsa88, Hsa126 from section D of this Example) as positive controls.
- The following oligonucleotides were used to generate the minimal CRISPR arrays used with the E. coli FokI-Cascade RNP complexes: Hsa37 (SEQ ID NO:1019; SEQ ID NO:1076), Hsa43 (SEQ ID NO:1024; SEQ ID NO:1081), Hsa46 (SEQ ID NO:1027; SEQ ID NO:1084), Hsa60 (SEQ ID NO:1037; SEQ ID NO:1094), Hsa77 (SEQ ID NO:1045; SEQ ID NO:1102), Hsa88 (SEQ ID NO:1050; SEQ ID NO:1107), and Hsa126 (SEQ ID NO:1072; SEQ ID NO:1129).
- Deep sequencing was performed essentially as described in Example 8 with the following modifications.
- The target-specific primers used in this Example were SEQ ID NO:1401 to SEQ ID NO:1512.
- Control samples comprising E. coli Type I-E Cascade were included for comparison and sequenced with target-specific primers corresponding to targets Hsa07 from Example 8 and Hsa37, Hsa43, Hsa46, Hsa60, Hsa77, Hsa88, Hsa126 from this Example.
- Target-specific amplification primers were used for these targets: Hsa37 (SEQ ID NO:1133; SEQ ID NO:1190), Hsa43 (SEQ ID NO:1138; SEQ ID NO:1195), Hsa46 (SEQ ID NO:1141; SEQ ID NO:1198), Hsa60 (SEQ ID NO:1151; SEQ ID NO:1208), Hsa77 (SEQ ID NO:1159; SEQ ID NO:1216), Hsa88 (SEQ ID NO:1164; SEQ ID NO:1221), and Hsa126 (SEQ ID NO:1186; SEQ ID NO:1243).
- FIG. 34A and FIG. 34B show results from these experiments. Editing was observed with many of the Type I-E FokI-Cascade homologs (FIG. 34A). The highest editing was observed with the variant from Pseudomonas sp. S-6-2, while other homologs (i.e., Salmonella enterica, Geothermobacter sp. EPR-M, Methanocella arvoryzae MRE50, and S. thermophilus (strain ND07)) showed editing approximately equivalent to E. coli. Editing with FokI-Cascade RNPs derived from Types I-B, I-C, I-F, and I-Fv2 was not observed and therefore may be below the limit of detection (FIG. 34B).
- This Example illustrates the design and testing of multiple fusion proteins comprising FokI-Cas8 and linker polypeptides of various lengths, as well as the effect of varying interspacer distances for efficient genome editing with Pseudomonas sp. S-6-2 Type I-E CRISPR-Cas systems.
- Minimal CRISPR arrays were designed to target a set of loci in the human genome. Interspacer distances ranged from 23-34 bp, in increments of 1 bp. Eight targets were designed for each of the interspacer distances, and targets were flanked by AAG PAM sequences. Dual guides were generated with PCR-based assembly using three oligonucleotides (SEQ ID NO:1513 to SEQ ID NO:1515) and a unique primer encoding a “repeat-spacer-repeat-spacer-repeat” sequence to enable FokI-Cascade targeting. The full set of unique oligonucleotide sequences used to generate the minimal CRISPR arrays was SEQ ID NO:1516 to SEQ ID NO:1704. PCR-assembled guides were purified and concentrated using SPRIselect® beads (Beckman Coulter, Pasadena, Calif.) essentially according to the manufacturer's instructions.
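- As a worked illustration of the design space described above, the following sketch enumerates one entry per interspacer distance and target. The variable names and the target numbering are assumptions introduced only for this example.

```python
# Interspacer distances of 23-34 bp inclusive, in 1-bp increments.
interspacer_distances_bp = list(range(23, 35))
targets_per_distance = 8

# One (distance, target number) pair per designed target site.
design = [
    (distance, target)
    for distance in interspacer_distances_bp
    for target in range(1, targets_per_distance + 1)
]

print(len(interspacer_distances_bp))  # 12 distinct distances
print(len(design))                    # 12 x 8 = 96 target designs
```

Twelve distances with eight targets each give 96 designs in total.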
- FokI-Cascade RNP subunit protein component-encoding genes were cloned into vectors comprising: CMV promoters to enable delivery and expression in mammalian cells, cas genes linked via 2A “ribosome-skipping” sequences, and FokI attached to Cas8 with a 30-aa linker (SEQ ID NO:1748). Additional linker polypeptide sequences of varying length were designed and used to connect FokI to the Cas8 protein to form fusion proteins. The linker polypeptide sequences are listed in Table 40.
- Transfection was performed essentially as described in Example 8 with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate. Each well contained 5 μg of plasmid encoding FokI-Cascade RNP protein components and ~0.1-0.5 μg of linear PCR product encoding the minimal CRISPR array.
- Deep sequencing was performed essentially as described in Example 8. Instead of primers Y and Z from Table 31 of Example 8, the target-specific primers were SEQ ID NO:1705 to SEQ ID NO:1803.
- This Example illustrates the use of Cas3-FokI and FokI-Cascade to induce dimerization of FokI to generate a double-strand break at a locus in the human genome (see, e.g., FIG. 17A, FIG. 17B, and FIG. 17C). More specifically, this Example details the design and testing of multiple Cas3-FokI linker compositions and lengths and FokI-Cas8 linker compositions and lengths for affecting genome editing efficiency.
- Minimal CRISPR arrays are designed to target three distinct sites flanked by AAG PAMs in the human genome. Sites are selected that were previously shown to support interspacer editing with E. coli FokI-Cascade dimers directed by dual-guides and are therefore known to be permissive for FokI-Cascade binding (e.g., Hsa37, Hsa43, and Hsa46).
- FokI-Cascade systems described in the Examples above used two FokI-Cascade complexes (see, e.g., FIG. 16A, FIG. 16B, and FIG. 16C); accordingly, dual-guides comprising a first guide sequence specifying a first nucleic acid target site and a second guide sequence specifying a second nucleic acid target site can be used. Because the Cas3-FokI-FokI-Cascade system only requires a single PAM, a guide comprising “repeat-spacer-repeat” should be sufficient to facilitate binding of the functional Cascade complex to a nucleic acid target site.
- A dual-guide containing “repeat-spacer-repeat-spacer-repeat” can also be used but, typically in this embodiment, the two spacer sequences direct binding of the Cascade complex to the same nucleic acid target sequence; that is, the two spacers can have the same sequence.
- The guides are cloned essentially as described in Example 9 with SEQ ID NO:454.
- The following annealed oligonucleotides are used for generation of the minimal CRISPR arrays: Hsa37 (SEQ ID NO:1019; SEQ ID NO:1076), Hsa43 (SEQ ID NO:1024; SEQ ID NO:1081), and Hsa46 (SEQ ID NO:1027; SEQ ID NO:1084).
- FokI-Cascade RNP protein component-encoding genes are cloned into plasmid vectors containing CMV promoters to enable delivery and expression in mammalian cells.
- cas genes are linked via 2A “ribosome-skipping” sequences.
- FokI is fused to Cas8 with a 30-aa linker (SEQ ID NO:455 from Example 3). Additional linker sequences of varying length and composition are designed and used to connect FokI to the Cas8 protein. Examples of such sequences are listed in Table 41.
- Cas3 protein from E. coli is fused with FokI on the C-terminus using a 30-aa linker. This fusion is further modified with an NLS sequence on the N-terminus (SEQ ID NO:1806). Additional linker sequences of varying length and composition are designed and used to connect FokI to the Cas3 protein (Table 41 and SEQ ID NO:1804 to SEQ ID NO:1807).
- Additional Cas3-FokI fusion constructs are generated wherein the helicase or nuclease activity of the Cas3 protein is inactivated (SEQ ID NO:1808 to SEQ ID NO:1815). Helicase and nuclease activities are impaired by making D452A and D75A modifications, respectively, of the Cas3 protein (Mulepati, S., et al., J. Biol. Chem. 288(31):22184-22192 (2013)).
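- For illustration only, substitutions written in the “D452A” style can be applied to a protein sequence string with a residue check, as sketched below. The helper name and the toy sequence are hypothetical; the actual Cas3 variant sequences are those given in the Sequence Listing.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a substitution such as 'D75A' (wild type, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1
    if index < 0 or index >= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]


# Toy four-residue sequence, not a real Cas3 sequence.
print(apply_substitution("MDKD", "D2A"))
```

Checking the wild-type residue before substituting guards against numbering offsets, for example when an N-terminal NLS or tag shifts residue positions.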
- Transfection is performed as described in Example 8 with the following modifications.
- Prior to nucleofection, 5 μl of plasmid vector solution is transferred to individual wells of a 96-well plate.
- Each well comprises the following three components: 3 μg of a plasmid encoding a set of FokI-Cascade RNP protein components, 3 μg of a plasmid encoding a Cas3-FokI, and 0.5 μg of a plasmid encoding a minimal CRISPR array.
- The 96-well plate is set up as a matrix to provide all combinations of the three components.
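- One possible way to lay out such a matrix is sketched below. The numbers of FokI-Cas8 and Cas3-FokI constructs (4 and 8) and their labels are hypothetical and were chosen only so that the combinations with the three guides fill one 96-well plate; the row-by-row well assignment is likewise an assumption.

```python
from itertools import product

# Hypothetical construct panels; only the three guide names come from this Example.
fok1_cas8_constructs = [f"FokI-Cas8 linker {i}" for i in range(1, 5)]
cas3_fok1_constructs = [f"Cas3-FokI variant {i}" for i in range(1, 9)]
guides = ["Hsa37", "Hsa43", "Hsa46"]

# Every combination of the three components, one per well.
conditions = list(product(fok1_cas8_constructs, cas3_fok1_constructs, guides))

# Standard 96-well naming: rows A-H, columns 1-12, filled row by row.
wells = [f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)]
if len(conditions) > len(wells):
    raise ValueError("more conditions than wells on one plate")

layout = dict(zip(wells, conditions))
print(len(layout))
print(layout["A1"])
```

With other panel sizes, the same product can be split across several plates or reduced to leave wells free for controls.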
- Deep sequencing is performed as described in Example 8 with the following modifications.
- The target-specific primers used in this Example are as follows: SEQ ID NO:1133 and SEQ ID NO:1190 (Hsa37 target site), SEQ ID NO:1138 and SEQ ID NO:1195 (Hsa43 target site), and SEQ ID NO:1141 and SEQ ID NO:1198 (Hsa46 target site).
- Deep sequencing data analysis is performed as described in Example 8 with the exception that indels ~1 bp to ~25 bp upstream of the FokI-Cascade binding site PAM sequence are tallied. In this manner, the combinations of FokI-Cas8 linker sequences, Cas3-FokI linker sequences, and Cas3 variants that support the most efficient editing can be determined.
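- A minimal sketch of such a tally is given below, assuming that each sequencing read has already been reduced to a list of indel positions expressed as distances (in bp) upstream of the PAM. The input representation, the function name, and the inclusive 1-25 bp window are assumptions for illustration and do not reproduce the analysis pipeline of Example 8.

```python
def editing_fraction(indel_offsets_per_read, window=(1, 25)):
    """Fraction of reads carrying at least one indel inside the window.

    indel_offsets_per_read: one list per read; each entry is the distance in bp
    upstream of the PAM at which an indel was called (empty list = no indel).
    window: inclusive (min, max) distance upstream of the PAM, an assumed
    interpretation of the tallying window.
    """
    low, high = window
    total = len(indel_offsets_per_read)
    if total == 0:
        return 0.0
    edited = sum(
        1
        for offsets in indel_offsets_per_read
        if any(low <= offset <= high for offset in offsets)
    )
    return edited / total


# Toy data: five reads, two with an indel inside the window.
reads = [[], [5], [30], [1, 40], []]
print(editing_fraction(reads))
```

Computing this fraction for each well allows the linker and Cas3 variant combinations to be ranked by editing efficiency.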
Abstract
Description
- This application is a continuation of U.S. patent application Ser. No. 16/104,875, filed 17 Aug. 2018, now allowed, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/684,735, filed 13 Jun. 2018, now pending, the contents of which are herein incorporated by reference in their entirety.
- Not applicable.
- The present application contains a Sequence Listing that has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. The ASCII copy, created on 11 Jan. 2019 is named CBI032-11_ST25.txt and is 2.2 MB in size.
- The present disclosure relates generally to engineered Class 1 Type I CRISPR-Cas (Cascade) systems that comprise multi-protein effector complexes, nucleoprotein complexes comprising Type I CRISPR-Cas subunit proteins and nucleic acid guides, polynucleotides encoding Type I CRISPR-Cas subunit proteins, and guide polynucleotides. The disclosure also relates to compositions and methods for making and using the engineered Type I CRISPR-Cas systems of the present invention.
- Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated proteins (Cas) constitute CRISPR-Cas systems. The CRISPR-Cas systems provide adaptive immunity against foreign polynucleotides in bacteria and archaea (see, e.g., Barrangou, R., et al., Science 315:1709-1712 (2007); Makarova, K. S., et al., Nature Reviews Microbiology 9:467-477 (2011); Garneau, J. E., et al., Nature 468:67-71 (2010); Sapranauskas, R., et al., Nucleic Acids Research 39:9275-9282 (2011); Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78 (2017)). Various CRISPR-Cas systems in their native hosts are capable of DNA targeting (Class 1 Type I; Class 2 Type II and Type V), RNA targeting (Class 2 Type VI), and joint DNA and RNA targeting (Class 1 Type III) (see, e.g., Makarova, K. S., et al., Nat. Rev. Microbiol. 13(11):722-736 (2015); Shmakov, S., et al., Nat. Rev. Microbiol. 15:169-182 (2017); Abudayyeh, O. O., et al., Science 353:1-17 (2016)).
- The classification of CRISPR-Cas systems has had many iterations. Koonin, E. V., et al., (Curr. Opin. Microbiol. 37:67-78 (2017)) proposed a classification system that takes into consideration the signature cas genes specific for individual types and subtypes of CRISPR-Cas systems. The classification also considered sequence similarity between multiple shared Cas proteins, the phylogeny of the best conserved Cas protein, gene organization, and the structure of the CRISPR array. This approach provided a classification scheme that divides CRISPR-Cas systems into two distinct classes: Class 1 comprising a multiprotein effector complex (Type I (CRISPR-associated complex for antiviral defense (“Cascade”) effector complex), Type III (Cmr/Csm effector complex), and Type IV); and Class 2 comprising a single effector protein (Type II (Cas9), Type V (Cas12a, previously referred to as Cpf1), and Type VI (Cas13a, previously referred to as C2c2)). In the Class 1 systems, Type I is the most common and diverse, Type III is more common in archaea than bacteria, and Type IV is least common.
- The Type I systems comprise the signature Cas3 protein. The Cas3 protein has helicase and DNase domains responsible for DNA target sequence cleavage. To date, seven subtypes of the Type I system have been identified (i.e., Type I-A, I-B, I-C, I-D, I-E, I-F (and variants of I-F, e.g., I-Fv1 and I-Fv2), and I-U) that have a variable number of cas genes. Type I cas genes include, but are not limited to, the following: cas7, cas5, cas8, cse2, csa5, cas3, cas2, cas4, cas1, and cas6. Examples of organisms having Type I systems are as follows: I-A, Archaeoglobus fulgidus; I-B, Clostridium kluyveri; I-C, Bacillus halodurans; I-U, Geobacter sulfurreducens; I-D, Cyanothece sp. 8802; I-E, Escherichia coli K12; I-F, Yersinia pseudo-tuberculosis; I-F variant, Shewanella putrefaciens CN-32 (Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78 (2017)).
- Type I systems typically encode proteins that combine with a CRISPR RNA (crRNA or “guide RNA”) to form a Cascade complex. These complexes comprise multiple proteins and a CRISPR RNA (crRNA), which are transcribed from this CRISPR locus. In Type I systems, primary processing of a pre-crRNA is catalyzed by Cas6. This typically results in a crRNA with a 5′ handle of 8 nucleotides, a spacer region, and a 3′ handle; both 5′ and 3′ handles are derived from the repeat sequence. In some systems, the 3′ handle forms a stem-loop structure; in other systems, secondary processing of the 3′ end of crRNA is catalyzed by ribonuclease(s) (van der Oost, J., et al., Nature Reviews Microbiology 12:479-492 (2014)).
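- The primary processing step can be illustrated in silico with the sketch below, which cuts a pre-crRNA within each repeat so that the last 8 nucleotides of the repeat become the 5′ handle of the downstream crRNA and the remainder of the repeat becomes the 3′ handle of the upstream crRNA. The function name and the short placeholder sequences are hypothetical, and secondary 3′ end processing is not modeled.

```python
def process_pre_crrna(pre_crrna: str, repeat: str, handle_5p: int = 8) -> list[str]:
    """Cut inside each repeat and return the crRNAs lying between adjacent cuts."""
    if len(repeat) <= handle_5p:
        raise ValueError("repeat must be longer than the 5' handle")
    cut_offset = len(repeat) - handle_5p  # cut site measured from the repeat start
    cuts = []
    start = pre_crrna.find(repeat)
    while start != -1:
        cuts.append(start + cut_offset)
        start = pre_crrna.find(repeat, start + len(repeat))
    # Fragments outside the first and last cut are discarded in this sketch.
    return [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]


# Placeholder 16-nt repeat and 6-nt spacers, for illustration only.
REPEAT = "CCCCGGGGAUAAACCG"
pre = REPEAT + "AAAAAA" + REPEAT + "UUUUUU" + REPEAT
for crrna in process_pre_crrna(pre, REPEAT):
    print(crrna)
```

Each returned crRNA has the form 5′ handle (8 nt), spacer, 3′ handle, with both handles derived from the repeat.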
- The Cascade effector complexes of the Type I CRISPR-Cas systems comprise a backbone having paralogous Repeat-Associated Mysterious Proteins (RAMPs; e.g., Cas7 and Cas5 proteins) containing the RNA Recognition Motif (RRM) fold and additional “large” and “small” subunit proteins (see, e.g., Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78, FIG. 2 (2017)). These Cascade effector complexes typically have a Cas5 subunit protein and several Cas7 subunit proteins. Such Cascade effector complexes also comprise the guide RNA. The Cascade effector complexes comprise the various subunit proteins arranged in an asymmetric fashion along the length of the guide RNA. The Cas5 subunit protein and the large subunit protein (Cas8 protein) are positioned at one end of the complex, enveloping the 5′ end of the guide RNA. Several copies of the small subunit protein interact with the guide RNA backbone, which is bound to multiple copies of the Cas7 subunit protein. The Cas6 subunit protein, another RAMP protein, is associated with the Cascade effector complex primarily through association with the 3′ handle (repeat region) of the crRNA. The Cas6 subunit protein usually functions as the repeat-specific RNase involved in pre-crRNA processing; however, in Type I-C systems, Cas5 functions as the repeat-specific RNase and there is no Cas6.
- The primary sequences of the CRISPR-Cas Type I Cascade subunit proteins have little sequence identity; however, the presence of homologous RAMP modules and the overall structural similarity of the multiprotein effector complexes support a common origin of these effector complexes (Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78 (2017)).
- The adaptive immunity mechanism of action in the Type I CRISPR-Cas systems involves essentially three phases: adaptation, expression, and interference. In the adaptation phase, a foreign DNA or RNA infects the host and proteins encoded by various cas genes bind regions of the infecting DNA or RNA. Such regions are called protospacers. A protospacer adjacent motif (PAM) is a short nucleotide sequence (e.g., 2 to 6 base pair DNA sequence) that is adjacent to the protospacer. PAM sequences are typically recognized by a Cas1 subunit protein/Cas2 subunit protein complex, wherein the active PAM-sensing site is associated with the Cas1 subunit proteins (Jackson, S. A., et al., Science 356:356(6333) (2017)).
- In the expression phase, the CRISPR array comprising multiple spacer-repeat elements is transcribed as a single transcript. Individual spacer repeat elements are processed by an endonuclease (e.g., Type I, a Cas6 protein; and Type I-C, a Cas5 protein) into individual crRNAs. Cas subunit proteins are expressed and associate with the crRNA to form a Cascade effector complex.
- The Cascade effector complex scans foreign polynucleotides infecting the host to identify DNA complementary to the spacer. In Type I systems, interference occurs when the effector complex identifies a sequence complementary to the spacer that is adjacent to a PAM; the Cas3 protein is then recruited to the DNA-bound Cascade effector complex to cleave and progressively digest the foreign polynucleotide.
- Makarova, K. S., et al., (Cell 168:946 (2017)) provide a summary of genes, homologs, Cascade complexes, and mechanisms of action for Type I CRISPR-Cas systems.
- Although CRISPR-Cas systems have been used for genome editing, there remains a need to improve editing efficiency and editing fidelity of these systems.
- The present invention generally relates to compositions comprising engineered Type I CRISPR-Cas effector complexes, modified guide polynucleotides, and combinations thereof.
- One embodiment of the present invention is a composition comprising:
- a first engineered Type I CRISPR-Cas effector complex comprising,
- a first Cse2 subunit protein, a first Cas5 subunit protein, a first Cas6 subunit protein, and a first Cas7 subunit protein,
- a first fusion protein comprising a first Cas8 subunit protein and a first FokI, wherein the N-terminus of the first Cas8 subunit protein or the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the C-terminus or N-terminus, respectively, of the first FokI, and wherein the first linker polypeptide has a length of between 10 amino acids to 40 amino acids, and
- a first guide polynucleotide comprising a first spacer capable of binding a first nucleic acid target sequence; and
- a second engineered Type I CRISPR-Cas effector complex comprising,
- a second Cse2 subunit protein, a second Cas5 subunit protein, a second Cas6 subunit protein, and a second Cas7 subunit protein,
- a second fusion protein comprising a second Cas8 subunit protein and a second FokI, wherein the N-terminus of the second Cas8 subunit protein or the C-terminus of the second Cas8 protein is covalently connected by a second linker polypeptide to the C-terminus or N-terminus, respectively, of the second FokI, and wherein the second linker polypeptide has a length of between 10 amino acids to 40 amino acids, and
- a second guide polynucleotide comprising a second spacer capable of binding a second nucleic acid target sequence, wherein a protospacer adjacent motif (PAM) of the second nucleic acid target sequence and a PAM of the first nucleic acid target sequence have an interspacer distance between 20 base pairs (bp) to 42 bp.
- In some embodiments, the length of the first linker polypeptide and/or the second linker polypeptide is a length of between about 15 amino acids and about 30 amino acids, or between about 17 amino acids and about 20 amino acids. In one embodiment, the length of the first linker polypeptide and the second linker polypeptide are the same.
- Interspacer distances between the second nucleic acid target sequence and the first nucleic acid target sequence include, but are not limited to, between about 22 bp to about 40 bp, between about 26 bp to about 36 bp, between about 29 bp to about 35 bp, or between about 30 bp to about 34 bp.
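- Purely as an illustration of how the recited ranges nest, the sketch below reports the narrowest recited range that contains a given value. Treating the bounds as inclusive and ignoring the tolerance implied by “about” are simplifying assumptions, and the helper name is hypothetical.

```python
# Ranges recited above, written as (lower bound, upper bound).
INTERSPACER_RANGES_BP = [(20, 42), (22, 40), (26, 36), (29, 35), (30, 34)]
LINKER_RANGES_AA = [(10, 40), (15, 30), (17, 20)]


def narrowest_range(value, ranges):
    """Return the narrowest (low, high) range containing value, or None."""
    hits = [r for r in ranges if r[0] <= value <= r[1]]
    if not hits:
        return None
    return min(hits, key=lambda r: r[1] - r[0])


print(narrowest_range(32, INTERSPACER_RANGES_BP))
print(narrowest_range(19, INTERSPACER_RANGES_BP))
print(narrowest_range(17, LINKER_RANGES_AA))
```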
- The first FokI and the second FokI can be monomeric subunits that are capable of associating to form a homodimer, or distinct subunits that are capable of associating to form a heterodimer.
- In some embodiments, the N-terminus of the first Cas8 subunit protein is covalently connected by the first linker polypeptide to the C-terminus of the first FokI, the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the N-terminus of the first FokI, the N-terminus of the second Cas8 subunit protein is covalently connected by the second linker polypeptide to the C-terminus of the second FokI, the C-terminus of the second Cas8 subunit protein is covalently connected by a second linker polypeptide to the N-terminus of the second FokI, and combinations thereof. The first Cas8 subunit protein and the second Cas8 subunit protein can each comprise a Cas8 subunit protein having a different sequence or both the first and the second Cas8 subunit protein can comprise identical amino acid sequences.
- Similarly, the first Cse2 subunit protein and the second Cse2 subunit protein can each comprise different or identical Cse2 subunit protein amino acid sequences, the first Cas5 subunit protein and the second Cas5 subunit protein can each comprise different or identical Cas5 subunit protein amino acid sequences, the first Cas6 subunit protein and the second Cas6 subunit protein can each comprise different or identical Cas6 subunit protein amino acid sequences, the first Cas7 subunit protein and the second Cas7 subunit protein can each comprise different or identical Cas7 subunit protein amino acid sequences, and combinations thereof.
- In a preferred embodiment, the guide polynucleotides comprise RNA.
- Additional embodiments of the present invention will be readily apparent to those of ordinary skill in the art in view of the disclosures herein.
- The Figures are not proportionally rendered, nor are they to scale. The locations of indicators are approximate.
- FIG. 1A presents a generalized illustration of a Type I CRISPR-Cas effector complex. FIG. 1B presents a generalized illustration of a Type I CRISPR-Cas crRNA.
- FIG. 2A, FIG. 2B, and FIG. 2C present illustrative examples of two engineered Type I CRISPR-Cas effector complexes with fusion domains bound to neighboring spacer sequences.
- FIG. 3 presents information related to SEQ ID NO:1 to SEQ ID NO:351.
- FIG. 4A and FIG. 4B present examples of circularly permuted proteins.
- FIG. 5A, FIG. 5B, FIG. 6A, FIG. 6B, FIG. 7A, FIG. 7B, FIG. 7C, FIG. 8A, FIG. 8B, FIG. 9, FIG. 10, FIG. 11A, and FIG. 11B illustrate a variety of examples of engineered Type I CRISPR-Cas effector complexes of the present invention.
- FIG. 12A and FIG. 12B illustrate examples of substrate channels.
- FIG. 13A, FIG. 13B, and FIG. 13C present a generalized illustration of site-directed recruitment of a functional protein domain fused to a Cascade subunit protein by a dCas9:NATNA complex.
- FIG. 14A, FIG. 14B, FIG. 15A, FIG. 15B, and FIG. 15C illustrate examples of engineered Type I CRISPR-Cas effector complexes of the present invention.
- FIG. 16A, FIG. 16B, FIG. 16C, FIG. 17A, FIG. 17B, FIG. 17C, FIG. 18A, FIG. 18B, FIG. 18C, FIG. 19A, FIG. 19B, FIG. 19C, FIG. 19D, FIG. 20A, FIG. 20B, FIG. 21A, and FIG. 21B present examples of engineered Type I CRISPR-Cas effector complexes of the present invention and methods of use thereof.
- FIG. 22A, FIG. 22B, FIG. 22C, FIG. 22D, FIG. 23A, FIG. 23B, FIG. 23C, and FIG. 23D illustrate embodiments of the present invention that use a Cas3 protein comprising active endonuclease activity.
- FIG. 24A, FIG. 24B, FIG. 24C, FIG. 24D, FIG. 24E, FIG. 25, FIG. 26, FIG. 27, and FIG. 28 present schematic diagrams of a variety of Cascade component expression systems.
- FIG. 29, FIG. 30, FIG. 31, FIG. 32A, FIG. 32B, FIG. 33, FIG. 34A, FIG. 34B, and FIG. 35 present data related to genome editing of the engineered Cascade systems of the present invention.
- All patents, publications, and patent applications cited in the present Specification are herein incorporated by reference as if each individual patent, publication, or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
- It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in the present Specification and the Claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a polynucleotide” includes one or more polynucleotides, and reference to “a vector” includes one or more vectors.
- Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although other methods and materials similar, or equivalent, to those described herein can be useful in the present invention, preferred materials and methods are described herein.
- In view of the teachings of the present Specification and the Examples, one of ordinary skill in the art can apply conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant polynucleotides, as taught, for example, by the following standard texts: Cellular and Molecular Immunology, Ninth Edition, A. K. Abbas., et al., Elsevier (2017), ISBN 978-0323479783; Cancer Immunotherapy Principles and Practice, First Edition, L. H. Butterfield, et al., Demos Medical (2017), ISBN 978-1620700976; Janeway's Immunobiology, Ninth Edition, Kenneth Murphy, Garland Science (2016), ISBN 978-0815345053; Clinical Immunology and Serology: A Laboratory Perspective, Fourth Edition, C. Dorresteyn Stevens, et al., F. A. Davis Company (2016), ISBN 978-0803644663; Antibodies: A Laboratory Manual, Second edition, E. A. Greenfield, Cold Spring Harbor Laboratory Press (2014), ISBN 978-1-936113-81-1; Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, Seventh Edition, R. I. Freshney, Wiley-Blackwell (2016), ISBN 978-1118873656; Transgenic Animal Technology, Third Edition: A Laboratory Handbook, C. A. Pinkert, Elsevier (2014), ISBN 978-0124104907; The Laboratory Mouse, Second Edition, H. Hedrich, Academic Press (2012), ISBN 978-0123820082; Manipulating the Mouse Embryo: A Laboratory Manual, Fourth Edition, R. Behringer, et al., Cold Spring Harbor Laboratory Press (2013), ISBN 978-1936113019; PCR 2: A Practical Approach, M. J. McPherson, et al., IRL Press (1995), ISBN 978-0199634248; Methods in Molecular Biology (Series), J. M. Walker, ISSN 1064-3745, Humana Press; RNA: A Laboratory Manual, D. C. Rio, et al., Cold Spring Harbor Laboratory Press (2010), ISBN 978-0879698911; Methods in Enzymology (Series), Academic Press; Molecular Cloning: A Laboratory Manual (Fourth Edition), M. R. 
Green, et al., Cold Spring Harbor Laboratory Press (2012), ISBN 978-1605500560; Bioconjugate Techniques, Third Edition, G. T. Hermanson, Academic Press (2013), ISBN 978-0123822390; Methods in Plant Biochemistry and Molecular Biology, W. V. Dashek, CRC Press (1997), ISBN 978-0849394805; Plant Cell Culture Protocols (Methods in Molecular Biology), V. M. Loyola-Vargas, et al., Humana Press (2012), ISBN 978-1617798177; Plant Transformation Technologies, C. N. Stewart, et al., Wiley-Blackwell (2011), ISBN 978-0813821955; Recombinant Proteins from Plants (Methods in Biotechnology), C. Cunningham, et al., Humana Press (2010), ISBN 978-1617370212; Plant Genomics: Methods and Protocols (Methods in Molecular Biology), W. Busch, Humana Press (2017), ISBN 978-1493970018; Plant Biotechnology: Methods in Tissue Culture and Gene Transfer, R. Keshavachandran, et al., Orient Blackswan (2008), ISBN 978-8173716164.
- Clustered regularly interspaced short palindromic repeats (CRISPR) and related CRISPR-associated proteins (Cas proteins) constitute CRISPR-Cas systems (see, e.g., Barrangou, R., et al., Science 315:1709-1712 (2007)).
- As used herein, “Cas protein,” “CRISPR-Cas protein,” “CRISPR-Cas subunit protein,” and “Cas subunit protein,” unless otherwise identified, all refer to
Class 1 Type I CRISPR-Cas proteins. Typically, for use in aspects of the present invention, Cas subunit proteins are capable of interacting with one or more cognate polynucleotides (most typically, a crRNA) to form a Type I effector complex (most typically, a ribonucleoprotein complex). Genes encoding Cas subunit proteins are listed in Table 1. -
TABLE 1 Type I CRISPR-Cas Proteins
Universal family name* | Alternative designation | Rep[illegible]plex
Cas5 | CasD, Cas5e, Csc1, Csy2, Csf3, Cas1822 | [illegible] unwinding [illegible]ng, Cas3
Cas8 | Large subunit, CasA, Cse1, Cas8a, Cas8b, Cas8c, Cas8e, Cas8f, Csy1 | [illegible] recruitment
Cse2 | Small subunit, CasB, Cas11 |
Cas7 | CasC, Cse4, Csc2, Csy3, Csf2, Cas1821, Cst2/DevR |
Cas6 | CasE, Cse3, Cas6e, Cas6f, Csy4 |
Cas3 | Cas3′, Cas3″ |
*As defined by Makarova, K. S., et al., Nat. Rev. Microbiol. 13(11): 722-736 (2015); Koonin, E. V., et al., Curr Opin Microbiol. 37: 67-78 (2017).
[illegible] indicates data missing or illegible when filed.
- The terms “Type I CRISPR-Cas effector complex,” “Cascade complex,” “Type I CRISPR-Cas nucleoprotein complex,” and “Type I complexes” are used interchangeably herein. The terms “Cascade RNP complex” and “Type I ribonucleoprotein (RNP) complex” refer to a Cascade complex specifically comprising a crRNA (versus a more generic guide polynucleotide, as described below). An example of a wild-type Type I CRISPR-Cas effector complex is illustrated in
FIG. 1A. FIG. 1A is adapted from Makarova, K. S., et al., Cell 168:946 (2017); and Makarova, K., et al., Nature Reviews Microbiology 13(11):722-736 (2015), doi:10.1038/nrmicro3569. FIG. 1A illustrates six Cas7 proteins, a Cas5 protein, a Cas8 protein, two Cse2 proteins, a Cas6 protein, and a crRNA associated as a Cascade complex. The complex is capable of binding a nucleic acid target sequence. After association of a wild-type Cas3 with the complex, the Cascade complex is capable of cleavage of a nucleic acid target sequence. As noted in Table 1, the total number of some Cas subunit proteins can vary in Cascade complexes.
- “Cas3” and “Cas3 protein” are used interchangeably herein to refer to Type I CRISPR-Cas3 proteins, modifications, and variants thereof. The Type I CRISPR-Cas effector complexes bind foreign DNA complementary to the crRNA guide and recruit Cas3, a trans-acting nuclease-helicase required for target degradation. Cas3 proteins have motifs characteristic of helicases from
superfamily 2 and contain a DEAD/DEAH box region and a conserved C-terminal domain. Cas3 proteins and variants thereof are known in the art (see, e.g., Westra, E. R., et al., Mol Cell. 46(5): 595-605 (2012); Sinkunas, T., et al., EMBO J. 30(7):1335-1342 (2011); Beloglazova, N., et al., EMBO J. 30:4616-4627 (2011); Mulepati, S., et al., J. Biol. Chem. 286:31896-31903 (2011)). As used herein, dCas3* is a mutated Cas3 protein that does not have any nuclease activity and/or helicase activity. - The term “nuclease” as used herein refers to an enzyme capable of cleaving the phosphodiester bonds, such as those connecting two nucleotides, as found in double-stranded (ds) nucleic acids (e.g., dsDNA, genomic DNA (gDNA), dsRNA), single-stranded (ss) nucleic acids (e.g., ssDNA, RNA) or hybrid dsRNA/DNA. An “endonuclease” typically can effect ss- (nicks) or ds-breaks in its target molecules. One example of a DNA endonuclease is a FokI enzyme. “FokI endonuclease” and “FokI” are used interchangeably herein and refer to a FokI enzyme, FokI homologs, enzymatically active domain(s) of FokI enzymes, and variants of FokI enzymes. FokI dimerization is typically required for DNA cleavage. Dimers of FokI can comprise two monomeric subunits that associate to form a homodimer or two distinct monomeric subunits that associate to form a heterodimer (see, e.g., Bitinaite, J., et al., Proceedings of the National Academy of Sciences 95(18):10570-10575 (1998); Ramalingam, S., et al., Journal of Molecular Biology, 405(3):630-641 (2011)). One example of a FokI variant is the Sharkey variant described by Guo, et al. (Guo, J., et al., J. Mol. Biol. 400:96-107 (2010)). Additional DNA and RNA nucleases are known in the art.
- “CRISPR RNA,” “crRNA,” and “guide RNA,” as used herein, refer to one or more RNAs with which Cas subunit proteins are capable of interacting to form a Type I effector complex that guides the complex to preferentially bind a nucleic acid target sequence in a polynucleotide (relative to a polynucleotide that does not comprise the nucleic acid target sequence). “Guide” and “guide polynucleotide” as used herein refer to the polynucleotide component of Type I effector complexes and can comprise ribonucleotide bases (e.g., RNA), deoxyribonucleotide bases (e.g., DNA), combinations of ribonucleotide bases and deoxyribonucleotide bases, nucleotides, nucleotide analogs, modified nucleotides, and the like, as well as synthetic, naturally occurring, and non-naturally occurring modified backbone residues or linkages, for example, as described herein. An example of a Type I CRISPR-Cas crRNA associated with a nucleic acid target sequence through the crRNA spacer is illustrated in
FIG. 1B. FIG. 1B is adapted from Hochstrasser, M. L., et al., Molecular Cell 63(5):840-851 (2016). In FIG. 1B, the PAM associated with the nucleic acid target sequence and the 5′ and 3′ strands of a double-stranded nucleic acid are illustrated (FIG. 1B, vertical lines represent hydrogen bonds). A guide polynucleotide typically comprises a 5′ handle region (FIG. 1B, 5′ Handle Region), a spacer region (FIG. 1B, Spacer) comprising a seed region, and a 3′ hairpin comprising two hydrogen-bonded repeat regions (FIG. 1B, 3′ Hairpin; horizontal lines represent hydrogen bonds). FIG. 1B illustrates the Cascade complex spacer bound to the nucleic acid target sequences (FIG. 1B, vertical lines represent hydrogen bonds). FIG. 1B also illustrates the protospacer region (FIG. 1B, protospacer). The spacer can comprise a region of the crRNA of between about 6 and about 56 nucleotides, wherein the spacer is complementary to a nucleic acid target sequence in a polynucleotide. The spacer length can be modified to fine-tune Cascade activity in Type I-E CRISPR-Cas systems. Cascade complexes can incorporate an extra Cas7 subunit with every 6 nucleotides added to the crRNA spacer and an extra Cse2 subunit with every 12 nucleotides added to the spacer (Luo, M. L., et al., Nucleic Acids Research 44(15):7385-7394 (2016)). The spacer typically comprises a region of between about 32 and about 36 nucleotides.
- The terms “spacer,” “spacer sequence,” and “nucleic acid target binding sequence” are used interchangeably herein.
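The relationship between spacer length and subunit number described above for Type I-E Cascade complexes can be illustrated with a short calculation. The following is a minimal sketch only. It assumes a baseline complex of six Cas7 and two Cse2 subunits (as in FIG. 1A) paired with a 32-nucleotide spacer; the baseline spacer length, the function name, and the rounding down of partial increments are assumptions made for illustration and are not taken from the Specification.

```python
def cascade_subunit_counts(spacer_len, base_spacer_len=32, base_cas7=6, base_cse2=2):
    """Estimate Cas7 and Cse2 copy numbers for an extended crRNA spacer.

    Assumptions (illustrative only): the baseline complex carries
    `base_cas7` Cas7 and `base_cse2` Cse2 subunits at `base_spacer_len`
    nucleotides, and only whole 6-nt / 12-nt increments add a subunit.
    """
    added = spacer_len - base_spacer_len
    if added < 0:
        # The rule stated in the text covers nucleotides added to the spacer.
        raise ValueError("spacer shorter than the assumed baseline")
    return {
        "Cas7": base_cas7 + added // 6,    # one extra Cas7 per 6 nt added
        "Cse2": base_cse2 + added // 12,   # one extra Cse2 per 12 nt added
    }


if __name__ == "__main__":
    for length in (32, 38, 44):
        print(length, cascade_subunit_counts(length))
```

Under these assumptions, a spacer extended by 12 nucleotides would be expected to recruit two additional Cas7 subunits and one additional Cse2 subunit.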
- As used herein, a “stem element” or “stem structure” refers to two strands of nucleic acids that are known or predicted to form a double-stranded region (the “stem element”). A “stem-loop element” or “stem-loop structure” refers to a stem structure wherein 3′-end sequences of one strand are covalently bonded to 5′-end sequences of the second strand by a nucleotide sequence of typically single-stranded nucleotides (“a stem-loop element nucleotide sequence”). In some embodiments, the loop element comprises a loop element nucleotide sequence of between about 3 and about 20 nucleotides in length, preferably between about 4 and about 10 nucleotides in length. In preferred embodiments, a loop element nucleotide sequence is a single-stranded nucleotide sequence of unpaired nucleic acid bases that do not interact through hydrogen bond formation to create a stem element within the loop element nucleotide sequence. The term “hairpin element” is also used herein to refer to stem-loop structures. Such structures are well known in the art. The base pairing may be exact; however, as is known in the art, a stem element does not require exact base pairing. Thus, the stem element may include one or more base mismatches or non-paired bases. An example of a stem-loop structure in a guide polynucleotide is illustrated in
FIG. 1B.
- “Linker element nucleotide sequence,” “linker nucleotide sequence,” and “linker polynucleotide” are used interchangeably herein and refer to either a single-stranded nucleic acid sequence or a double-stranded nucleic acid sequence of one or more nucleotides covalently attached to a first nucleic acid sequence (e.g., 5′-linker nucleotide sequence-first nucleic acid sequence-3′). In some embodiments, a linker nucleotide sequence connects two separate nucleic acid sequences to form a single polynucleotide (e.g., 5′-first nucleic acid sequence-linker nucleotide sequence-second nucleic acid sequence-3′). Other examples of linker nucleotide sequences include, but are not limited to, 5′-first nucleic acid sequence-linker nucleotide sequence-3′ and 5′-linker nucleotide sequence-first nucleic acid sequence-linker nucleotide sequence-3′. In some embodiments, the linker element nucleotide sequence can be a single-stranded nucleotide sequence of unpaired nucleic acid bases that do not interact with each other through hydrogen bond formation to create a secondary structure (e.g., a stem-loop structure) within the linker element nucleotide sequence. In some embodiments, two linker element nucleotide sequences can interact with each other through hydrogen bonding between the two linker element nucleotide sequences. In some embodiments, a linker polynucleotide encodes a “linker polypeptide.” Such a linker polynucleotide typically connects the 3′ end of a first polynucleotide encoding a first polypeptide to the 5′ end of a second polynucleotide encoding a second polypeptide to form a single polynucleotide that encodes a fusion protein comprising N-the first polypeptide-the linker polypeptide-the second polypeptide-C.
In some embodiments of the present invention, more than two polypeptide sequences can be connected in tandem by linker polypeptides (e.g., N-a first polypeptide-a first linker polypeptide-a second polypeptide-a second linker polypeptide-a third polypeptide-C). “Linker polypeptide,” “linker polypeptide sequence,” “amino acid linker sequence,” and “linker sequence” are used interchangeably herein.
- As used herein, a “connecting nucleotide sequence” refers to a single-stranded linker nucleic acid sequence that covalently connects a first nucleic acid sequence and a second nucleic acid sequence.
- As used herein, the terms “interspacer,” “interspacer region,” and “interspacer distance” are used interchangeably and refer to the distance between a PAM of a first nucleic acid target sequence (e.g., a first DNA target sequence) and a PAM of a second nucleic acid target sequence (e.g., a second DNA target sequence) typically in a PAM-in orientation, wherein a first Type I CRISPR-Cas effector complex comprises a first spacer capable of binding the first nucleic acid target sequence, and a second Type I CRISPR-Cas effector complex comprises a second spacer capable of binding the second nucleic acid target sequence.
FIG. 2A, FIG. 2B, and FIG. 2C present illustrative examples of two Type I CRISPR-Cas effector complexes (“Cascade1” comprising “crRNA1” and “Cascade2” comprising “crRNA2”) comprising fusion proteins (“FP1” and “FP2”; e.g., FokI) connected with each Cascade complex through linker polynucleotides (“Linker1” and “Linker2”), wherein the CRISPR-Cas effector complexes are bound to neighboring nucleic acid target sequences on double-stranded DNA (“dsDNA”). PAM sequences associated with each nucleic acid target sequence are indicated (“PAM1,” open box, and “PAM2,” open box). FIG. 2A illustrates an interspacer (shown as a double-arrowheaded line) between two target sites in a PAM-in (PAM-in/PAM-in) configuration. FIG. 2B illustrates an interspacer (shown as a double-arrowheaded line) between two target sites in a PAM-in/PAM-out configuration. FIG. 2C illustrates an interspacer between two target sites in the PAM-out (PAM-out/PAM-out) configuration. FIG. 2A, FIG. 2B, and FIG. 2C also illustrate the separation of the two strands of the dsDNA. A Cascade complex recognizes a dsDNA target sequence adjacent to a PAM. PAM sequences are recognized by Cse1. Base pairing between the crRNA and complementary target DNA strand results in an R-loop with the displaced non-complementary target DNA strand (Beloglazova, N., et al., Nucleic Acids Research 43(1):530-543 (2015)).
- As used herein, the term “cognate” typically refers to a group of Cas subunit proteins (e.g., Cse2, Cas5, Cas6, Cas7, and Cas8) and one or more guide polynucleotides (e.g., a Type I CRISPR-Cas RNA) that are capable of forming a nucleoprotein complex capable of site-directed binding to a nucleic acid target sequence complementary to a spacer present in one of the one or more guide polynucleotides.
- The terms “wild-type,” “naturally occurring,” and “unmodified” are used herein to mean the typical (or most common) form, appearance, phenotype, or strain existing in nature; for example, the typical form of cells, organisms, polynucleotides, proteins, macromolecular complexes, genes, RNAs, DNAs, or genomes as they occur in, and can be isolated from, a source in nature. The wild-type form, appearance, phenotype, or strain serves as the original parent before an intentional modification. Thus, mutant, variant, engineered, recombinant, and modified forms are not wild-type forms.
- As used herein, the terms “engineered,” “genetically engineered,” “recombinant,” “modified,” “non-naturally occurring,” “non-natural,” and “non-native” are interchangeable and indicate intentional human manipulation.
- “Covalent bond,” “covalently attached,” “covalently bound,” “covalently linked,” “covalently connected,” and “molecular bond” are used interchangeably herein and refer to a chemical bond that involves the sharing of electron pairs between atoms. Examples of covalent bonds include, but are not limited to, phosphodiester bonds, phosphorothioate bonds, disulfide bonds and peptide bonds (—CO—NH—).
- “Non-covalent bond,” “non-covalently attached,” “non-covalently bound,” “non-covalently linked,” “non-covalent interaction,” and “non-covalently connected” are used interchangeably herein and refer to any relatively weak chemical bond that does not involve sharing of a pair of electrons. Multiple non-covalent bonds often stabilize the conformation of macromolecules and mediate specific interactions between molecules. Examples of non-covalent bonds include, but are not limited to, hydrogen bonding, ionic interactions (e.g., Na+Cl−), van der Waals interactions, and hydrophobic bonds.
- As used herein, “hydrogen bonding,” “hydrogen-base pairing,” and “hydrogen bonded” are used interchangeably and refer to canonical hydrogen bonding and non-canonical hydrogen bonding including, but not limited to, “Watson-Crick-hydrogen-bonded base pairs” (W—C-hydrogen-bonded base pairs or W—C hydrogen bonding); “Hoogsteen-hydrogen-bonded base pairs” (Hoogsteen hydrogen bonding); and “wobble-hydrogen-bonded base pairs” (wobble hydrogen bonding). W—C hydrogen bonding, including reverse W—C hydrogen bonding, refers to purine-pyrimidine base pairing, e.g., adenine:thymine, guanine:cytosine, and uracil:adenine. Hoogsteen hydrogen bonding, including reverse Hoogsteen hydrogen bonding, refers to a variation of base pairing in nucleic acids wherein two nucleobases, one on each strand, are held together by hydrogen bonds in the major groove. This non-W—C hydrogen bonding can allow a third strand to wind around a duplex and form triple-stranded helices. Wobble hydrogen bonding, including reverse wobble hydrogen bonding, refers to a pairing between two nucleotides in RNA molecules that does not follow Watson-Crick base pair rules. There are four major wobble base pairs: guanine:uracil, inosine (hypoxanthine):uracil, inosine-adenine, and inosine-cytosine. Rules for canonical hydrogen bonding and non-canonical hydrogen bonding are known to those of ordinary skill in the art (see, e.g., The RNA World, Third Edition (Cold Spring Harbor Monograph Series), R. F. Gesteland, Cold Spring Harbor Laboratory Press (2005), ISBN 978-0879697396; The RNA World, Second Edition (Cold Spring Harbor Monograph Series), R. F. Gesteland, et al., Cold Spring Harbor Laboratory Press (1999), ISBN 978-0879695613; The RNA World (Cold Spring Harbor Monograph Series), R. F. Gesteland, et al., Cold Spring Harbor Laboratory Press (1993), ISBN 978-0879694562 (see, e.g., Appendix 1: Structures of Base Pairs Involving at Least Two Hydrogen Bonds, I. Tinoco); Principles of Nucleic Acid Structure, W. 
Saenger, Springer International Publishing AG (1988), ISBN 978-0-387-90761-1; Principles of Nucleic Acid Structure, First Edition, S. Neidle, Academic Press (2007), ISBN 978-0123695079).
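The base-pair categories named above can be expressed as a simple lookup. The following is a minimal sketch, assuming single-letter base codes (A, C, G, T, U, and I for inosine); the function and constant names are hypothetical. Hoogsteen pairing is distinguished by geometry rather than by base identity, so this sketch does not attempt to detect it.

```python
# Watson-Crick pairs listed in the text: adenine:thymine, guanine:cytosine, uracil:adenine.
WATSON_CRICK = {frozenset("AT"), frozenset("GC"), frozenset("AU")}

# The four major wobble pairs listed in the text.
WOBBLE = {frozenset("GU"), frozenset("IU"), frozenset("IA"), frozenset("IC")}


def classify_pair(base1, base2):
    """Return 'Watson-Crick', 'wobble', or 'other' for two single-letter bases."""
    pair = frozenset((base1.upper(), base2.upper()))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    # Includes mismatches and any pairing that depends on geometry (e.g., Hoogsteen).
    return "other"


if __name__ == "__main__":
    for b1, b2 in (("G", "C"), ("G", "U"), ("I", "A"), ("A", "G")):
        print(b1, b2, classify_pair(b1, b2))
```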
- “Connect,” “connected,” and “connecting” are used interchangeably herein and refer to a covalent bond or a non-covalent bond between two macromolecules (e.g., polynucleotides, proteins, and the like).
- As used herein, the terms “nucleic acid sequence,” “nucleotide sequence,” and “oligonucleotide” are interchangeable and refer to a polymeric form of nucleotides. As used herein, the term “polynucleotide” refers to a polymeric form of nucleotides that has one 5′ end and one 3′ end, and can comprise one or more nucleic acid sequences. A “circular polynucleotide” refers to a polynucleotide having a covalent bond between its 5′ end and 3′ end, thus forming the circular polynucleotide. The nucleotides may be deoxyribonucleotides (DNA), ribonucleotides (RNA), analogs thereof, or combinations thereof, and may be of any length. Polynucleotides may perform any function and may have various secondary and tertiary structures. The terms encompass known analogs of natural nucleotides and nucleotides that are modified in the base, sugar, and/or phosphate moieties. Analogs of a particular nucleotide have the same base-pairing specificity (e.g., an analog of A base pairs with T). A polynucleotide may comprise one modified nucleotide or multiple modified nucleotides. Examples of modified nucleotides include, but are not limited to, fluorinated nucleotides, methylated nucleotides, and nucleotide analogs. Nucleotide structure may be modified before or after a polymer is assembled. Following polymerization, polynucleotides may be additionally modified via, for example, conjugation with a labeling component or target binding component. A nucleotide sequence may incorporate non-nucleotide components. Also encompassed are nucleic acids comprising modified backbone residues or linkages, that are synthetic, naturally occurring, and/or non-naturally occurring, and have similar binding properties as a reference polynucleotide (e.g., DNA or RNA). 
Examples of such analogs include, but are not limited to, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2′-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), Locked Nucleic Acid (LNA™) (Exiqon, Inc., Woburn, Mass.) nucleosides, glycol nucleic acid, bridged nucleic acids, and morpholino structures.
- Peptide-nucleic acids (PNAs) are synthetic homologs of nucleic acids wherein the polynucleotide phosphate-sugar backbone is replaced by a flexible pseudo-peptide polymer, and nucleobases are linked to the polymer. PNAs have the capacity to hybridize with high affinity and specificity to complementary sequences of RNA and DNA.
- In phosphorothioate nucleic acids, the phosphorothioate (PS) bond substitutes a sulfur atom for a non-bridging oxygen in the polynucleotide phosphate backbone. This modification makes the internucleotide linkage resistant to nuclease degradation. In some embodiments, phosphorothioate bonds are introduced between the last 3 to 5 nucleotides at the 5′-end or 3′-end sequences of a polynucleotide sequence to inhibit exonuclease degradation. Placement of phosphorothioate bonds throughout an entire oligonucleotide helps reduce degradation by endonucleases, as well.
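Placement of terminal phosphorothioate bonds can be illustrated by annotating an oligonucleotide sequence. The following is a minimal sketch, assuming the asterisk notation that is commonly used when ordering oligonucleotides to mark a phosphorothioate linkage; the notation and the function name are assumptions and are not part of the Specification. The parameter counts linkages rather than nucleotides, because "the last 3 to 5 nucleotides" could be read either way.

```python
def add_terminal_ps(seq, n_bonds=3, end="3'"):
    """Mark `n_bonds` phosphorothioate linkages ('*') at one end of `seq`.

    `end` is "3'" or "5'". Each '*' stands for one modified internucleotide
    linkage between the two flanking nucleotides (assumed notation).
    """
    if n_bonds < 0 or n_bonds > len(seq) - 1:
        raise ValueError("n_bonds must be between 0 and len(seq) - 1")
    if n_bonds == 0:
        return seq
    if end == "3'":
        # The last n_bonds + 1 nucleotides share n_bonds linkages.
        head, tail = seq[:-(n_bonds + 1)], seq[-(n_bonds + 1):]
        return head + "*".join(tail)
    if end == "5'":
        head, tail = seq[:n_bonds + 1], seq[n_bonds + 1:]
        return "*".join(head) + tail
    raise ValueError("end must be \"3'\" or \"5'\"")


if __name__ == "__main__":
    print(add_terminal_ps("ACGTACGT", 3, "3'"))
    print(add_terminal_ps("ACGTACGT", 3, "5'"))
```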
- Threose nucleic acid (TNA) is an artificial genetic polymer. The backbone structure of TNA comprises repeating threose sugars linked by phosphodiester bonds. TNA polymers are resistant to nuclease degradation. TNA can self-assemble by base-pair hydrogen bonding into duplex structures.
- Linkage inversions can be introduced into polynucleotides through use of “reversed phosphoramidites” (see, e.g., www.ucalgary.ca/dnalab/synthesis/-modifications/linkages). A 3′-3′ linkage at a terminus of a polynucleotide stabilizes the polynucleotide against exonuclease degradation by creating an oligonucleotide having two 5′-OH termini but lacking a 3′-OH terminus. Typically, such polynucleotides have phosphoramidite groups on the 5′-OH position and a dimethoxytrityl (DMT) protecting group on the 3′-OH position. Normally, the DMT protecting group is on the 5′-OH and the phosphoramidite is on the 3′-OH.
- Polynucleotide sequences are displayed herein in the conventional 5′ to 3′ orientation unless otherwise indicated.
- As used herein, “sequence identity” generally refers to the percent identity of nucleotide bases or amino acids when comparing a first polynucleotide or polypeptide to a second polynucleotide or polypeptide using algorithms having various weighting parameters. Sequence identity between two polynucleotides or two polypeptides can be determined using sequence alignment by various methods and computer programs (e.g., BLAST, CS-BLAST, PSI-BLAST, FASTA, HMMER, L-ALIGN, and the like) available through the worldwide web at sites including, but not limited to, GENBANK (www.ncbi.nlm.nih.gov/genbank/) and EMBL-EBI (www.ebi.ac.uk). Sequence identity between two polynucleotides or two polypeptide sequences is generally calculated using the standard default parameters of the various methods or computer programs. A high degree of sequence identity, as used herein, between two polynucleotides or two polypeptides is typically between about 90% identity and 100% identity, for example, about 90% identity or higher, preferably about 95% identity or higher, more preferably about 98% identity or higher. A moderate degree of sequence identity, as used herein, between two polynucleotides or two polypeptides is typically between about 80% identity and about 85% identity, for example, about 80% identity or higher, preferably about 85% identity. A low degree of sequence identity, as used herein, between two polynucleotides or two polypeptides is typically between about 50% identity and 75% identity, for example, about 50% identity, preferably about 60% identity, more preferably about 75% identity. For example, a Cas protein (e.g., Type I-E Cse2, Cas5, Cas6, Cas7, and/or Cas8) comprising amino acid substitutions can have a low degree of sequence identity, a moderate degree of sequence identity, or a high degree of sequence identity over its length to a reference Cas protein (e.g., wild-type Type I-E Cse2, Cas5, Cas6, Cas7, and/or Cas8, respectively). 
As another example, a guide polynucleotide can have a low degree of sequence identity, a moderate degree of sequence identity, or a high degree of sequence identity over its length compared with a reference wild-type guide polynucleotide that complexes with the reference Cas proteins (e.g., a guide polynucleotide that forms a complex with a Type I-E Cse2, Cas5, Cas6, Cas7, and/or Cas8).
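The degrees of sequence identity defined above can be illustrated with a short calculation on two sequences that have already been aligned. The following is a minimal sketch, not a substitute for the alignment programs named above. It assumes equal-length aligned strings with "-" as the gap character and counts identity over all alignment columns; the denominator convention and the function names are assumptions, and alignment programs differ on this point. Values that fall between the stated ranges are reported as unclassified.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def identity_degree(pct):
    """Map a percent identity onto the ranges given in the text."""
    if pct >= 90:
        return "high"
    if 80 <= pct <= 85:
        return "moderate"
    if 50 <= pct <= 75:
        return "low"
    return "unclassified"  # outside the ranges stated in the text


if __name__ == "__main__":
    pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(pct, identity_degree(pct))
```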
- As used herein, “hybridization,” “hybridize,” or “hybridizing” is the process of combining two complementary single-stranded DNA or RNA molecules so as to form a single double-stranded molecule (DNA/DNA, DNA/RNA, RNA/RNA) through hydrogen base pairing. Hybridization stringency is typically determined by the hybridization temperature and the salt concentration of the hybridization buffer; e.g., high temperature and low salt provide high stringency hybridization conditions. Examples of salt concentration ranges and temperature ranges for different hybridization conditions are as follows: high stringency, approximately 0.01M to approximately 0.05M salt,
hybridization temperature 5° C. to 10° C. below Tm; moderate stringency, approximately 0.16M to approximately 0.33M salt, hybridization temperature 20° C. to 29° C. below Tm; and low stringency, approximately 0.33M to approximately 0.82M salt, hybridization temperature 40° C. to 48° C. below Tm. Tm of duplex nucleic acid sequences is calculated by standard methods well known in the art (see, e.g., Maniatis, T., et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press: New York (1982); Casey, J., et al., Nucleic Acids Research 4:1539-1552 (1977); Bodkin, D. K., et al., Journal of Virological Methods 10(1):45-52 (1985); Wallace, R. B., et al., Nucleic Acids Research 9(4):879-894 (1981)). Algorithm prediction tools to estimate Tm are also widely available. High stringency conditions for hybridization typically refer to conditions under which a polynucleotide complementary to a target sequence predominantly hybridizes with the target sequence and substantially does not hybridize to non-target sequences. Typically, hybridization conditions are of moderate stringency, preferably high stringency.
- As used herein, “complementarity” refers to the ability of a nucleic acid sequence to form hydrogen bond(s) with another nucleic acid sequence (e.g., through canonical Watson-Crick base pairing). A percent complementarity indicates the percentage of residues in a nucleic acid sequence that can form hydrogen bonds with a second nucleic acid sequence. If two nucleic acid sequences have 100% complementarity, the two sequences are perfectly complementary, i.e., all of the contiguous residues of a first polynucleotide hydrogen bond with the same number of contiguous residues in a second polynucleotide.
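The example stringency ranges given above for hybridization can be tabulated and applied to a known melting temperature. The following is a minimal sketch, assuming the Tm has already been determined by one of the standard methods; the dictionary layout and the function name are hypothetical.

```python
# Example ranges from the text: salt concentration in mol/L and
# hybridization temperature expressed as degrees C below Tm.
STRINGENCY = {
    "high": {"salt_molar": (0.01, 0.05), "below_tm_c": (5, 10)},
    "moderate": {"salt_molar": (0.16, 0.33), "below_tm_c": (20, 29)},
    "low": {"salt_molar": (0.33, 0.82), "below_tm_c": (40, 48)},
}


def hybridization_window(tm_c, stringency):
    """Return (min_temp_c, max_temp_c) for a duplex with melting temperature tm_c."""
    nearest, farthest = STRINGENCY[stringency]["below_tm_c"]
    # The window spans from `farthest` below Tm up to `nearest` below Tm.
    return (tm_c - farthest, tm_c - nearest)


if __name__ == "__main__":
    for level in ("high", "moderate", "low"):
        print(level, hybridization_window(72.0, level), STRINGENCY[level]["salt_molar"])
```

For a duplex with a Tm of 72° C., the high stringency window under these example ranges is 62° C. to 67° C.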
- As used herein, “binding” refers to a non-covalent interaction between macromolecules (e.g., between a protein and a polynucleotide, between a polynucleotide and a polynucleotide, between a protein and a protein, and the like). Such non-covalent interaction is also referred to as “associating” or “interacting” (e.g., if a first macromolecule interacts with a second macromolecule, the first macromolecule binds to the second macromolecule in a non-covalent manner). Some portions of a binding interaction may be sequence-specific (the terms “sequence-specific binding,” “sequence-specifically bind,” “site-specific binding,” and “site specifically binds” are used interchangeably herein). Sequence-specific binding, as used herein, typically refers to one or more guide polynucleotides capable of forming a complex with Type I CRISPR-Cas subunit proteins (e.g., Cse2, Cas5, Cas6, Cas7, and Cas8) to cause the protein to bind a nucleic acid sequence (e.g., a DNA sequence) comprising a nucleic acid target sequence (e.g., a DNA target sequence) preferentially relative to a second nucleic acid sequence (e.g., a second DNA sequence) without the nucleic acid target binding sequence (e.g., the DNA target binding sequence). Not all components of a binding interaction need to be sequence-specific (e.g., contacts of a protein with phosphate residues in a DNA backbone). Binding interactions can be characterized by a dissociation constant (Kd). “Binding affinity” refers to the strength of the binding interaction. An increased binding affinity is correlated with a lower Kd.
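The inverse relationship between binding affinity and Kd can be illustrated numerically. The following is a minimal sketch, assuming a simple one-site equilibrium model in which the fraction of target bound equals [L] / (Kd + [L]), with the binding partner L in excess; this model and the function name are assumptions introduced for illustration and are not taken from the Specification.

```python
def fraction_bound(ligand_conc_nm, kd_nm):
    """Fraction of target occupied at equilibrium for a one-site binding model.

    Both arguments are in the same concentration unit (nM here).
    """
    if ligand_conc_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_nm / (kd_nm + ligand_conc_nm)


if __name__ == "__main__":
    # At a fixed concentration, a lower Kd gives higher occupancy.
    for kd in (1.0, 10.0, 100.0):
        print(kd, round(fraction_bound(10.0, kd), 3))
```

Under this model, half of the target is bound when the concentration of the binding partner equals Kd.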
- As used herein, effector complexes are said to “target” a polynucleotide if such a complex binds or cleaves the polynucleotide at the nucleic acid target sequence within the polynucleotide.
- As used herein, a “double-strand break” (DSB) refers to both strands of a double-stranded segment of DNA being severed. In some instances, if such a break occurs, one strand can be said to have a “sticky end” wherein nucleotides are exposed and not hydrogen bonded to nucleotides on the other strand. In other instances, a “blunt end” can occur wherein both strands remain fully base paired with each other.
- “Donor polynucleotide,” “donor oligonucleotide,” and “donor template” are used interchangeably herein and can be a double-stranded polynucleotide (e.g., DNA), a single-stranded polynucleotide (e.g., DNA or RNA), or a combination thereof. Donor polynucleotides can comprise homology arms flanking the insertion sequence (e.g., DSBs in the DNA). The homology arms on each side can vary in length (e.g., 1-50 bases, 50-100 bases, 100-200 bases, 200-300 bases, 300-500 bases, 500-1000 bases). Homology arms can be symmetric or asymmetric in length. Parameters for the design and construction of donor polynucleotides are well known in the art (see, e.g., Ran, F., et al., Nature Protocols 8(11):2281-2308 (2013); Smithies, O., et al., Nature 317:230-234 (1985); Thomas, K., et al., Cell 44:419-428 (1986); Wu, S., et al., Nature Protocols 3:1056-1076 (2008); Singer, B., et al., Cell 31:25-33 (1982); Shen, P., et al., Genetics 112:441-457 (1986); Watt, V., et al., Proceedings of the National Academy of Sciences of the United States of America 82:4768-4772 (1985); Sugawara, N., et al., Journal of Molecular Cell Biology 12(2):563-575 (1992); Rubnitz, J., et al., Journal of Molecular Cell Biology 4(11):2253-2258 (1984); Ayares, D., et al., Proceedings of the National Academy of Sciences of the United States of America 83(14):5199-5203 (1986); Liskay, R., et al., Genetics 115(1):161-167 (1987)).
- As used herein, “homology-directed repair” (HDR) refers to DNA repair that takes place in cells, for example, during repair of a DSB in genomic DNA. HDR requires nucleotide sequence homology and uses a donor or template polynucleotide to repair the sequence wherein the DSB (e.g., within a DNA target sequence) occurred. The donor polynucleotide generally has the requisite sequence homology with the sequence flanking the DSB so that the donor polynucleotide can serve as a suitable template for repair. HDR results in the transfer of genetic information from, for example, the donor polynucleotide to the DNA target sequence. HDR may result in alteration of the DNA target sequence (e.g., insertion, deletion, or mutation) if the donor polynucleotide sequence differs from the DNA target sequence and part or all of the donor polynucleotide is incorporated into the DNA target sequence. In some embodiments, an entire donor polynucleotide, a portion of the donor polynucleotide, or a copy of the donor polynucleotide is integrated at the site of the DNA target sequence. For example, a donor polynucleotide can be used for repair of the break in the DNA target sequence, wherein the repair results in the transfer of genetic information from the donor polynucleotide at the site or in close proximity of the break in the DNA. Accordingly, new genetic information may be inserted or copied at a DNA target sequence.
- A “genomic region” is a segment of a chromosome in the genome of a host cell that is present on either side of the nucleic acid target sequence site or, alternatively, also includes a portion of the nucleic acid target sequence site. The homology arms of the donor polynucleotide have sufficient homology to undergo homologous recombination with the corresponding genomic regions. In some embodiments, the homology arms of the donor polynucleotide share significant sequence homology to the genomic region immediately flanking the nucleic acid target sequence site; it is recognized that the homology arms can be designed to have sufficient homology to genomic regions farther from the nucleic acid target sequence site.
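The arrangement of homology arms around an insertion sequence in a donor polynucleotide can be sketched as a string operation. The following is a minimal illustration, assuming that the homology arms are copied from the genomic regions immediately flanking the break site and that sequences are given as plain 5′ to 3′ strings; the function name, the toy sequences, and the zero-based coordinate convention are hypothetical.

```python
def build_donor(genome, cut_index, insert, left_arm_len, right_arm_len):
    """Assemble left homology arm + insertion sequence + right homology arm.

    `cut_index` is the zero-based position of the first base to the right
    of the break. Arm lengths may differ (asymmetric arms).
    """
    if not 0 <= cut_index <= len(genome):
        raise ValueError("cut_index outside the genomic sequence")
    if left_arm_len > cut_index or right_arm_len > len(genome) - cut_index:
        raise ValueError("homology arm extends past the end of the sequence")
    left_arm = genome[cut_index - left_arm_len:cut_index]
    right_arm = genome[cut_index:cut_index + right_arm_len]
    return left_arm + insert + right_arm


if __name__ == "__main__":
    toy_genome = "AAAACCCCGGGGTTTT"  # toy sequence for illustration only
    print(build_donor(toy_genome, 8, "NNN", left_arm_len=4, right_arm_len=3))
```

A practical design would use arm lengths within the ranges listed above and would follow the design parameters described in the cited references.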
- As used herein, “non-homologous end joining” (NHEJ) refers to the repair of a DSB in DNA by direct ligation of one terminus of the break to the other terminus of the break without a requirement for a donor polynucleotide. NHEJ is a DNA repair pathway available to cells to repair DNA without the use of a repair template. NHEJ in the absence of a donor polynucleotide often results in nucleotides being randomly inserted or deleted at the site of the DSB.
- “Microhomology-mediated end joining” (MMEJ) is a pathway for repairing a DSB in genomic DNA. MMEJ involves deletions flanking a DSB and alignment of microhomologous sequences internal to the break site before joining. MMEJ is genetically defined and requires the activity of, for example, CtIP, Poly(ADP-Ribose) Polymerase 1 (PARP1), DNA polymerase theta (Pol θ), DNA Ligase 1 (Lig 1), or DNA Ligase 3 (Lig 3). Additional genetic components are known in the art (see, e.g., Sfeir, A., et al., Trends in Biochemical Sciences 40:701-714 (2015)).
- As used herein, “DNA repair” encompasses any process whereby cellular machinery repairs damage to a DNA molecule contained in the cell. The damage repaired can include ss-breaks or DSBs. At least three mechanisms exist to repair DSBs: HDR, NHEJ, and MMEJ. “DNA repair” is also used herein to refer to DNA repair resulting from human manipulation, wherein a target locus is modified, e.g., by inserting, deleting, or substituting nucleotides, all of which represent forms of genome editing.
- As used herein, “recombination” refers to a process of exchange of genetic information between two polynucleotides.
- As used herein, the terms “regulatory sequences,” “regulatory elements,” and “control elements” are interchangeable and refer to polynucleotide sequences that are upstream (5′ non-coding sequences), within, or downstream (3′ non-translated sequences) of a polynucleotide target to be expressed. Regulatory sequences influence, for example, the timing of transcription, amount or level of transcription, RNA processing or stability, and/or translation of the related structural nucleotide sequence. Regulatory sequences may include activator binding sequences, enhancers, introns, polyadenylation recognition sequences, promoters, transcription start sites, repressor binding sequences, stem-loop structures, translational initiation sequences, internal ribosome entry sites (IRES), translation leader sequences, transcription termination sequences (e.g., polyadenylation signals and poly-U sequences), translation termination sequences, primer binding sites, and the like.
- Regulatory elements include those that direct constitutive, inducible, and repressible expression of a nucleotide sequence in many types of host cells and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). In some embodiments, a vector comprises one or more pol III promoters, one or more pol II promoters, one or more pol I promoters, or combinations thereof. Examples of pol III promoters include, but are not limited to, U6 and H1 promoters. Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer; see, e.g., Boshart, M., et al., Cell 41:521-530 (1985)), the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerol kinase (PGK) promoter, and the EF1α promoter. It will be appreciated by those skilled in the art that the design of an expression vector may depend on such factors as the choice of the host cell to be transformed, the level of expression desired, and the like. A vector can be introduced into host cells to thereby produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acid sequences as described herein.
- “Gene,” as used herein, refers to a polynucleotide sequence comprising exon(s) and related regulatory sequences. A gene may further comprise intron(s) and/or untranslated region(s) (UTR(s)).
- As used herein, the term “operably linked” refers to polynucleotide sequences or amino acid sequences placed into a functional relationship with one another. For example, regulatory sequences (e.g., a promoter or enhancer) are “operably linked” to a polynucleotide encoding a gene product if the regulatory sequences regulate or contribute to the modulation of the transcription of the polynucleotide. Operably linked regulatory elements are typically contiguous with the coding sequence. However, enhancers can function if separated from a promoter by up to several kilobases or more. Accordingly, some regulatory elements may be operably linked to a polynucleotide sequence but not contiguous with the polynucleotide sequence. Similarly, translational regulatory elements contribute to the modulation of protein expression from a polynucleotide.
- As used herein, “expression” refers to transcription of a polynucleotide from a DNA template, resulting in, for example, a messenger RNA (mRNA) or other RNA transcript (e.g., non-coding, such as structural or scaffolding RNAs). The term further refers to the process through which transcribed mRNA is translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be referred to collectively as “gene product(s).” Expression may include splicing the mRNA in a eukaryotic cell, if the polynucleotide is derived from genomic DNA.
- As used herein, the term “modulate” refers to a change in the quantity, degree or amount of a function. For example, a Type I CRISPR nucleoprotein complex, as disclosed herein, may modulate the activity of a promoter sequence by binding to a nucleic acid target sequence at or near the promoter or a transcriptional start site or regulator site. Depending on the action occurring after binding, the Type I CRISPR nucleoprotein complex can induce, enhance, suppress, or inhibit transcription of a gene operatively linked to the promoter sequence. Thus, “modulation” of gene expression includes both gene activation and gene repression.
- Modulation can be assayed by determining any characteristic directly or indirectly affected by the expression of the target gene. Such characteristics include, for example, changes in RNA or protein levels, protein activity, product levels, expression of the gene, or activity level of reporter genes. Accordingly, the terms “modulating expression,” “inhibiting expression,” and “activating expression” of a gene can refer to the ability of a Type I CRISPR nucleoprotein complex to change, activate, or inhibit transcription of a gene.
- “Vector” and “plasmid,” as used herein, refer to a polynucleotide vehicle to introduce genetic material into a cell. Vectors can be linear or circular. Vectors can contain a replication sequence capable of effecting replication of the vector in a suitable host cell (e.g., an origin of replication). Upon transformation of a suitable host, the vector can replicate and function independently of the host genome or integrate into the host genome. Vector design depends, among other things, on the intended use and host cell for the vector, and the design of a vector of the invention for a particular use and host cell is within the level of skill in the art. The four major types of vectors are plasmids, viral vectors, cosmids, and artificial chromosomes. Typically, vectors comprise an origin of replication, a multicloning site, and/or a selectable marker. An expression vector typically comprises an expression cassette.
- As used herein, “expression cassette” refers to a polynucleotide construct generated using recombinant methods or by synthetic means and comprising regulatory sequences operably linked to a selected polynucleotide to facilitate expression of the selected polynucleotide in a host cell. For example, the regulatory sequences can facilitate transcription of the selected polynucleotide in a host cell, or transcription and translation of the selected polynucleotide in a host cell. An expression cassette can, for example, be integrated in the genome of a host cell or be present in a vector to form an expression vector.
- As used herein, a “targeting vector” is a recombinant DNA construct typically comprising tailored DNA arms, homologous to genomic DNA, that flank elements of a target gene or nucleic acid target sequence (e.g., a DSB). A targeting vector comprises a donor polynucleotide. Elements of the target gene can be modified in a number of ways, including deletions and/or insertions. A defective target gene can be replaced by a functional target gene, or in the alternative a functional gene can be knocked out. Optionally, the donor polynucleotide of a targeting vector comprises a selection cassette comprising a selectable marker that is introduced into the target gene. Targeting regions adjacent or within a target gene can be used to affect regulation of gene expression.
- As used herein, the term “between” is inclusive of end values in a given range (e.g., between 1 and 50 nucleotides in length includes 1 nucleotide and 50 nucleotides; between 5 amino acids and 50 amino acids in length includes 5 amino acids and 50 amino acids).
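- The inclusive reading of “between” can be stated as a simple length check. The following Python sketch is illustrative only; the function name `length_is_between` is hypothetical and is not part of the present Specification.

```python
def length_is_between(sequence: str, low: int, high: int) -> bool:
    """Return True if len(sequence) lies in the inclusive range [low, high]."""
    # Both end values count as "between", matching the definition above.
    return low <= len(sequence) <= high
```

Under this reading, a 1-nucleotide sequence and a 50-nucleotide sequence both satisfy “between 1 and 50 nucleotides in length,” whereas a 51-nucleotide sequence does not.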
- As used herein, the term “amino acid” (aa) refers to natural and synthetic (unnatural) amino acids, including amino acid analogs, modified amino acids, peptidomimetics, glycine, and D or L optical isomers.
- As used herein, the terms “peptide,” “polypeptide,” “protein,” and “subunit protein” are interchangeable and refer to polymers of amino acids. A polypeptide may be of any length. It may be branched or linear, it may be interrupted by non-amino acids, and it may comprise modified amino acids. The terms also refer to an amino acid polymer that has been modified through, for example, acetylation, disulfide bond formation, glycosylation, lipidation, phosphorylation, pegylation, biotinylation, cross-linking, and/or conjugation (e.g., with a labeling component or ligand). Polypeptide sequences are displayed herein in the conventional N-terminal to C-terminal orientation, unless otherwise indicated.
- Polypeptides and polynucleotides can be made using routine techniques in the field of molecular biology (see, e.g., standard texts discussed above). Furthermore, essentially any polypeptide or polynucleotide is available from commercial sources.
- The terms “fusion protein” and “chimeric protein,” as used herein, refer to a single protein created by joining two or more proteins, protein domains, or protein fragments or circular permuted polypeptides that do not naturally occur together in a single protein. In some embodiments, a linker polynucleotide can be used to connect a first protein, protein domains, or protein fragments, or circular permuted polypeptides to a second protein, protein domains, or protein fragments or circular permuted polypeptides. For example, a fusion protein can comprise a Type I CRISPR-Cas protein (e.g., Cas8, Cas3) and a functional domain from another protein (e.g., FokI; see, e.g., U.S. Pat. No. 9,885,026, issued 6 Feb. 2018). The modification to include such domains in fusion proteins may confer additional activity on engineered Type I CRISPR-Cas proteins. Such activities can include nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, glycosylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, and/or myristoylation activity or demyristoylation activity that modifies a polypeptide associated with nucleic acid target sequence (e.g., a histone).
- In some embodiments, a fusion protein can comprise epitope tags (e.g., histidine tags, HA tags, FLAG® (Sigma Aldrich, St. Louis, Mo.) tags, Myc tags, nuclear localization signal (NLS) tags, SunTag), reporter protein sequences (e.g., glutathione-S-transferase, beta-galactosidase, luciferase, green fluorescent protein, cyan fluorescent protein, yellow fluorescent protein), and/or nucleic acid sequence binding domains (e.g., a DNA binding domain or an RNA binding domain).
- A fusion protein can also comprise activator domains (e.g., heat shock transcription factors, NFKB activators) or repressor domains (e.g., a KRAB domain). As described by Lupo, A., et al., Current Genomics 14(4):268-278 (2013), the KRAB domain is a potent transcriptional repression module and is located in the amino-terminal sequence of most C2H2 zinc finger proteins (see, e.g., Margolin, J., et al., Proceedings of the National Academy of Sciences of the United States of America 91:4509-4513 (1994); Witzgall, R., et al., Proceedings of the National Academy of Sciences of the United States of America 91:4514-4518 (1994)). The KRAB domain typically binds to co-repressor proteins and/or transcription factors via protein-protein interactions, causing transcriptional repression of genes to which KRAB zinc finger proteins (KRAB-ZFPs) bind (see, e.g., Friedman J. R., et al., Genes & Development 10:2067-2678 (1996)). In some embodiments, linker nucleic acid sequences are used to join the two or more proteins, protein domains, or protein fragments.
- As used herein, “CASCADEa” (Cascade activation) is a CRISPR method or system wherein the method or system activates the expression of a gene within the locus of the target nucleic acid sequence. For the recruitment of endogenous transcription factors, one or more subunit proteins in a Cascade complex and/or the guide polynucleotide is typically fused to an effector domain (e.g., VP16 or VP64). In some embodiments, the guide polynucleotide can be fused 5′ or 3′ to a nucleotide effector domain such as an MS2 binding RNA that also recruits transcription factors. Fusions comprising one or more Cascade subunit proteins and the guide polynucleotide can be combined.
- As used herein, “CASCADEi” (Cascade inhibition) is a CRISPR method or system wherein the CRISPR method or system downregulates the expression of a gene within the locus of the target nucleic acid sequence. For the recruitment of endogenous repression factors, one or more subunit proteins in a Cascade complex and/or the guide polynucleotide is typically fused to an effector domain (e.g., KRAB). In some embodiments, the guide polynucleotide can be fused 5′ or 3′ to a nucleotide effector domain that also recruits transcription factors. Fusions comprising one or more Cascade subunit proteins and the guide polynucleotide can be combined.
- A “moiety,” as used herein, refers to a portion of a molecule. A moiety can be a functional group or describe a portion of a molecule with multiple functional groups (e.g., that share common structural aspects). The terms “moiety” and “functional group” are typically used interchangeably; however, a “functional group” can more specifically refer to a portion of a molecule that comprises some common chemical behavior.
- The term “affinity tag,” as used herein, typically refers to one or more moieties that increase the binding affinity of one macromolecule for another, for example, to facilitate formation of an engineered Type I CRISPR-Cas nucleoprotein complex. In some embodiments, an affinity tag can be used to increase the binding affinity of one Cas subunit protein for another Cas subunit protein (e.g., a first Cas7 protein for a second Cas7 protein). In some embodiments, an affinity tag can be used to increase the binding affinity of one or more Cas subunit proteins for a cognate guide polynucleotide. Some embodiments of the present invention introduce one or more affinity tags to the N-terminus of a Cas subunit protein sequence, to the C-terminus of a Cas subunit protein sequence, to a position located between the N-terminus and C-terminus of a Cas subunit protein sequence, or to combinations thereof. In some embodiments of the present invention, one or more guide polynucleotides comprise an affinity tag that increases binding affinity of the guide polynucleotide with one or more Cas subunit proteins. A wide variety of affinity tags are disclosed in U.S. Published Patent Application No. 2014-0315985, published 23 Oct. 2014. Ligands and ligand-binding moieties are paired affinity tags.
- As used herein, a “cross-link” is a bond that links one polymer chain (e.g., a polynucleotide or polypeptide) to another. Such bonds can be covalent bonds or ionic bonds. In some embodiments, one polynucleotide can be bound to another polynucleotide by cross linking the polynucleotides. In other embodiments, a polynucleotide can be cross linked to a polypeptide. In additional embodiments, a polypeptide can be cross linked to a polypeptide.
- The term “cross-linking moiety,” as used herein, typically refers to a moiety suitable to provide cross linking between two macromolecules. A cross-linking moiety is another example of an affinity tag.
- As used herein, a “host cell” generally refers to a biological cell. A cell is the basic structural, functional, and/or biological unit of an organism. A cell can originate from any organism having one or more cells. Examples of host cells include, but are not limited to, a prokaryotic cell, eukaryotic cell, a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a cell of a eukaryotic organism, a protozoal cell, a cell from a plant (e.g., cells from plant crops (such as soy, tomatoes, sugar beets, pumpkin, hay, cannabis, tobacco, plantains, yams, sweet potatoes, cassava, potatoes, wheat, sorghum, soybean, rice, corn, maize, oil-producing Brassica (e.g., oil-producing rapeseed and canola), cotton, sugar cane, sunflower, millet, and alfalfa), fruits, vegetables, grains, seeds, flowering plants, conifers, gymnosperms, ferns, clubmosses, hornworts, liverworts, mosses), an algal cell, (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. agardh, and the like), seaweeds (e.g., kelp), a fungal cell (e.g., a yeast cell or a cell from a mushroom), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, and the like), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, or mammal), a cell from a mammal (e.g., a pig, a cow, a goat, a sheep, a rodent, a rat, a mouse, a non-human primate, a human, and the like). Furthermore, a cell can be a stem cell or a progenitor cell. In some embodiments, a host cell is a non-human cell. In some embodiments, a host cell is a human cell outside of a human body, wherein in particular embodiments the human cell is not introduced into a human body.
- As used herein, “stem cell” refers to a cell that has the capacity for self-renewal, i.e., the ability to go through numerous cycles of cell division while maintaining the undifferentiated state. Stem cells can be totipotent, pluripotent, multipotent, oligopotent, or unipotent. Stem cells can be embryonic, fetal, amniotic, adult, or induced pluripotent stem cells.
- As used herein, “induced pluripotent stem cell” refers to a type of pluripotent stem cell that is artificially derived from a non-pluripotent cell, typically a somatic cell. In some embodiments, the somatic cell is a human somatic cell. Examples of somatic cells include, but are not limited to, dermal fibroblasts, bone marrow-derived mesenchymal cells, cardiac muscle cells, keratinocytes, liver cells, stomach cells, neural stem cells, lung cells, kidney cells, spleen cells, and pancreatic cells. Additional examples of somatic cells include cells of the immune system, including but not limited to, B cells, dendritic cells, granulocytes, innate lymphoid cells, megakaryocytes, monocytes/macrophages, myeloid-derived suppressor cells, natural killer (NK) cells, T cells, thymocytes, and hematopoietic stem cells.
- “Plant,” as used herein, refers to whole plants, plant organs, plant tissues, germplasm, seeds, plant cells, and progeny of the same. Plant cells include, without limitation, cells from seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores. Plant parts include differentiated and undifferentiated tissues including, but not limited to, roots, stems, shoots, leaves, pollens, seeds, tumor tissue, and various forms of cells and culture (e.g., single cells, protoplasts, embryos, and callus tissue). The plant tissue may be in plant or in a plant organ, tissue, or cell culture. “Plant organ” refers to plant tissue or a group of tissues that constitute a morphologically and functionally distinct part of a plant.
- “Subject,” as used herein, refers to any member of the phylum Chordata, including, without limitation, humans and other primates, including non-human primates such as rhesus macaques, chimpanzees, and other monkey and ape species; farm animals, such as cattle, sheep, pigs, goats, and horses; domestic mammals, such as dogs and cats; laboratory animals, including rabbits, mice, rats, and guinea pigs; birds, including domestic, wild, and game birds, such as chickens, turkeys and other gallinaceous birds, ducks, and geese; and the like. The term does not denote a particular age or gender. Thus, the term includes adult, young, and newborn individuals as well as male and female. In some embodiments, a host cell is derived from a subject (e.g., stem cells, progenitor cells, or tissue-specific cells). In some embodiments, the subject is a non-human subject.
- As used herein, “transgenic organism” refers to an organism that contains genetic material into which DNA from an unrelated organism has been artificially introduced. The term includes the progeny (any generation) of a transgenic organism, provided that the progeny has the genetic modification. In some embodiments, the transgenic organism is a non-human transgenic organism.
- As used herein, “isolated” can refer to a molecule (e.g., a polynucleotide or a polypeptide) that, by human intervention, exists apart from its native environment and is therefore not a product of nature. An isolated polynucleotide or polypeptide can exist in a purified form and/or can exist in a non-native environment such as, for example, in a recombinant cell.
- As used herein, a “substrate channel” refers to the direct transfer of a reactant from one enzymatic reaction to another enzymatic reaction without first diffusing into the bulk environment (Wheeldon, I., et al., Nat. Chem. 8(4):299-309 (2016)). Intermediates of these enzymatic steps are not in equilibrium with the bulk solution, which enables the increased efficiencies and yields in enzymatic processes. Frequently, enzymes in naturally occurring metabolic processes have evolved means of co-localization and assembly into controlled aggregates.
- As used herein, “substrate channel element” refers to a component of a metabolic pathway. In some embodiments, a substrate channel element is an enzyme that catalyzes a chemical reaction.
- As used herein, “substrate channel complex” refers to multiple substrate channel elements that are co-localized together via some means.
- As used herein, an “RNA scaffold” refers to an RNA molecule that peptides can use as a substrate for binding.
- In a first aspect, the present invention relates to engineered polynucleotides encoding Cascade components including, but not limited to, Cascade subunit proteins and Cascade guide polynucleotides.
- In one embodiment, the present invention relates to engineered polynucleotides encoding Cascade components that are derived from Cascade Type I-E systems. Exemplary polynucleotide constructs comprising Cascade proteins and Cascade crRNAs are presented in Example 1. Example 1, Table 10, and SEQ ID NO:1 through SEQ ID NO:20 (FIG. 3) present polynucleotide DNA sequences of genes encoding the five subunit proteins of Type I-E Cascade, specifically from E. coli strain K-12 MG1655, as well as the amino acid sequences of the resulting protein components. The polynucleotide sequences were derived from E. coli genomic DNA and were codon optimized specifically for expression in E. coli, and/or codon optimized specifically for expression in eukaryotic cells (e.g., human cells). When this polynucleotide is transcribed into a precursor crRNA and processed by the Cascade RNA endonuclease, a mature crRNA is produced that functions as a guide RNA to target complementary DNA sequences in the genome. The minimal CRISPR array comprises two repeat sequences (underlined in the CRISPR array sequences presented in Example 1) flanking an exemplary spacer sequence, which represents the guide portion of the crRNA. RNA processing by the Cascade endonuclease generates a crRNA with repeat sequences on both the 5′ and 3′ ends, flanking the guide sequence. One of ordinary skill in the art, in view of the teachings of the present Specification and the Examples, can select appropriate spacer sequences to target binding of a Cascade complex to a chosen target sequence (e.g., in genomic DNA).
- Polynucleotide sequences encoding Cascade components from additional bacterial or archaeal species can be identified and designed following the guidance of the present Specification and using bioinformatics tools such as BLAST and PSI-BLAST to locate, as an example, homologs of Cascade subunit genes from E. coli strain K-12 MG1655, and then inspecting the flanking genomic neighborhood of the Cascade gene to locate and identify genes of the remaining Cascade subunit proteins (see, e.g., Example 14, Example 15). Because Cascade genes co-occur as conserved operons, they are typically arranged in a consistent order, within the same Type I subtype, facilitating their identification and selection for follow-up analysis and experimentation. As an example, additional Type I-E systems can be identified by locating Cas8 homologs, identifying promising bacterial species for homologous Cascade testing, and then obtaining or designing polynucleotide sequences encoding the Cas8 and other protein components of the Cascade from those homologous CRISPR-Cas systems.
- Polynucleotide DNA sequences of genes encoding the five subunit proteins of Cascade from twelve species (these species are listed in Table 2) with Cascade complexes homologous to those derived from E. coli strain K-12 MG1655, and the amino acid sequences of the resulting protein components, as well as exemplary minimal CRISPR arrays, are presented as SEQ ID NO:22 through SEQ ID NO:213 (FIG. 3). The polynucleotide sequences for the proteins were derived from the genomic DNA of the host bacterium, and were codon optimized specifically for expression in E. coli, and/or codon optimized specifically for expression in eukaryotic cells (e.g., human cells). The polynucleotide DNA sequences encoding corresponding minimal CRISPR arrays were based on repeat sequences derived from the 12 species and can be used to generate mature crRNA that function as guide RNAs. In Table 2, the minimal CRISPR array comprises two repeat sequences (lower case, underlined) flanking an exemplary “spacer” sequence, which represents the guide portion of the crRNA. RNA processing by the endonuclease Cascade subunit generates a crRNA with repeat sequences on both the 5′ and 3′ ends, flanking the guide sequence.
TABLE 2 Minimal CRISPR Arrays SEQ ID NO: Species Minimal CRISPR repeat SEQ ID I-E_Oceanicola sp. ctgttccccgcacacgcggggatgaaccgGGTTCT NO: 37 HL-35 TCGATCTGCGCATCCATGATGCCGC Cctgttccccgcacacgcggggatgaaccg SEQ ID I-E_Pseudomonas sp. gtgttccccgcacctgcggggatgaaccGGGCCG NO: 53 S-6-2 GGGCGTTTGCGCTGTCAGGGGCGT CCCgtgttccccgcacctgcggggatgaaccg SEQ ID I-E_Salmonella enterica gtgttccccgcgccagcggggataaaccgCAGCTT NO: 69 subsp. enterica TAGCATCGGTCGACAGCCCATCTG serovar Muenster strain GCgtgttccccgcgccagcggggataaaccg SEQ ID I-E_Atlantibacter gtgttccccgcgccagcggggataaaccgTTTTAA NO: 85 hermannii NBRC 105704 AACAGGATGTGGCCCGCCTGGTGC TGgtgttccccgcgccageggggataaaccg SEQ ID I-E_Geothermobacter sp. ctgttccccgcacccgcggggatgaaccgGTCATC NO: 101 EPR-M TATTTTTAATGGACGATATTTTTCA Actgttccccgcacccgcggggatgaaccg SEQ ID I-E_Methylocaldum sp. ctgttccccacgtacgtggggatgaaccgACGGCG NO: 117 14B TAATGGTAATTGTTAGCCGACAAG TTggttccccacgtacgtggggatgaaccg SEQ ID I-E_Methanocella aaagtccccacaggcgtgggggtgaaccgTGATC NO: 133 arvoryzae MRE50 AGTAACCCGGTCACCATTAAACAG ATTaaagtccccacaggcgtgggggtgaaccg SEQ ID I-E_Lachnospiraceae gtattccccacgcacgtggrggtaaatcCGCTGAG NO: 149 bacterium KH1T2 TTTAATTACGCAGCGGAAGCCGGA GCGgtattccccacgcacgtgggggtaaatc SEQ ID I-E_Klebsiella gtcttccccacacgcgtgggggtgtttcCGGCTCTT NO: 165 pneumoniae strain TTTTATCTCCTTCATCCTTCGCTATgt VRCO0172 cttccccacacgcgtgggggtgtttc SEQ ID I-E_Pseudomonas gtgttccccacatgcgtggggatgaaccgGGCACC NO: 181 aeruginosa DHS01 ATCGGCGCCATTGACCGCGCGCTG AAGgtgttccccacatgcgtggggatgaaccg SEQ ID I-E_Streptococcus gtttttcccgcacacgcgggggtgatccTATACCT NO: 197 thermophilus strain ATATCAATGGCCTCCCACGCATAA ND07 GCgtttttcccgcacacgcgggggtgatcc SEQ ID I-E_Streptomyces sp. 
gtcggccccgcacccgcggggatgctccAATGGC NO: 213 S4 CGAGGACGACGGCGATCTGGCCAC GGACgtcggccccgcacccgcggggatgctcc - In another embodiment, the present invention relates to engineered polynucleotide sequences encoding Cascade components from additional bacterial or archaeal species, within other Type I subtypes; including, but not limited, to Types I-B, I-C, I-F, and variants of I-F, which can be identified and designed following the guidance of the present Specification and by using bioinformatics tools such as BLAST and PSI-BLAST to locate homologs of Cascade genes from hallmark systems typifying each subtype (see, e.g., Makarova, K. S., et al., Nat. Rev. Microbiol. 13(11):722-736 (2015); Koonin, E. V., et al., Curr Opin Microbiol. 37:67-78 (2017)). After identifying desirable homologs, the flanking genomic neighborhoods of the Cascade gene can be inspected to locate and identify genes of the remaining Cascade subunit proteins as disclosed herein. As an example, additional Type I-F systems can be identified by locating Cas8 homologs (and additional
Type I-F variant 2 systems can be identified by locating Cas5 homologs) and identifying promising bacterial species for homologous Cascade testing, and then obtaining or designing polynucleotide sequences encoding the Cas8, Cas5, and other protein components of the Cascade from those homologous CRISPR-Cas systems. - Polynucleotide DNA sequences of genes encoding the three, four, or five subunit proteins of Cascade from Types I-B, I-C, I-F, and
I-F variant 2 from twelve additional homologous Cascade complexes, and the amino acid sequences of the resulting protein components, as well as exemplary minimal CRISPR arrays, are presented as SEQ ID NO:214 through SEQ ID NO:351 (FIG. 3 ). The polynucleotide sequences for the subunit proteins were derived from the genomic DNA of the host bacterium, and were codon optimized specifically for expression in E. coli, and/or codon optimized specifically for expression in eukaryotic cells (e.g., human cells). The polynucleotide DNA sequences encoding corresponding minimal CRISPR arrays were based on repeat sequences derived from the twelve species and can be used to generate mature crRNA that function as guide RNAs. In Table 3 the minimal CRISPR array comprises two repeat sequences (lower case, underlined) flanking an exemplary “spacer” sequence, which represents the guide portion of the crRNA. RNA processing by the endonuclease Cascade subunit generates a crRNA with repeat sequences on both the 5′ and 3′ ends, flanking the guide sequence. -
TABLE 3 Minimal CRISPR Arrays
SEQ ID NO: | Species | Minimal CRISPR repeat
SEQ ID NO: 226 | I-B_Fusobacterium nucleatum subsp. animalis 3_1_33 | atgaactgtaaacttgaaaagttttgaaatGTTGACAAATATTCAGATAATTTTTCAAAATCTTTTatgaactgtaaacttgaaaagttttgaaat
SEQ ID NO: 239 | I-B_Campylobacter fetus subsp. testudinum Sp3 | gtttgctaatgacaatatttgtgttaaaacAAGCGTAGCACCAAAAGAAGCGTATGAAAGCATAGgtttgctaatgacaatatttgtgttaaaac
SEQ ID NO: 252 | I-B_Odoribacter splanchnicus DSM 20712 | cttttaattgaactaaggtagaattgaaacTAGGAATAAACCGTACCCAACCACGTAGCCATATACGcttttaattgaactaaggtagaattgaaac
SEQ ID NO: 262 | I-C_Bacillus halodurans C-125 | gtcgcactcttcatgggtgcgtggattgaaatCCTTTGACGGAGAGGGGAACAGGAAATTAGAGAAGgtcgcactcttcatgggtgcgtggattgaaat
SEQ ID NO: 272 | I-C_Desulfovibrio vulgaris RCH1 plasmid pDEVAL01 | gtcgccccccacgcgggggcgtggattgaaacCAGTCTCGTTACCCTGTCGCGGAGGGCGTCGATgtcgccccccacgcgggggcgtggattgaaac
SEQ ID NO: 282 | I-C_Geobacillus thermocatenulatus strain KCTC 3921 | gttgcacccggctattaagccgggtgaggattgaaacTATATCACACAGCTTCTTAGTATCATCGACAACACGTgttgcacccggctattaagccgggtgaggattgaaac
SEQ ID NO: 295 | I-F_Vibrio cholerae strain L15 | gttcactgccgtacaggcagcttagaaaAATATGCAGGGGTTTGAAACGCTCGATGTTATgttcactgccgtacaggcagcttagaaa
SEQ ID NO: 308 | I-F_Klebsiella oxytoca strain ICU1-2b | gttcactgccgtacaggcagcttagaaaAAAAACTGAGCGGCCGCAGAATGAAGTTGTAAgttcactgccgtacaggcagcttagaaa
SEQ ID NO: 321 | I-F_Pseudomonas aeruginosa UCBPP-PA14 | gttcactgccgtgtaggcagctaagaaaACCACCCGCTACCACCGGCAGCCGCACCGGCCgttcactgccgtgtaggcagctaagaaa
SEQ ID NO: 331 | I-Fv2_Shewanella putrefaciens CN-32 | gttcaccgccgcacaggcggcttagaaaTCAACCAAATCATAAATTGCGCGACCACATTGgttcaccgccgcacaggcggcttagaaa
SEQ ID NO: 341 | I-Fv2_Acinetobacter sp. 869535 | gttcactgccatataggcagcttagaaaATCGTTTTTTCATACGAGATTCGAAACGGACAgttcactgccatataggcagcttagaaa
SEQ ID NO: 351 | I-Fv2_Vibrio cholerae HE48 | gttcactgccgcacaggcagcttagaaaTAACCGGAGGCGTACACTCGATAGAGGCAGCGgttcactgccgcacaggcagcttagaaa
- Example 19 describes the design and testing of multiple Cascade complex homologs, each comprising a Cas subunit protein-FokI fusion protein, to evaluate the efficiency of genome editing for each Cascade complex.
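- The repeat-spacer-repeat layout of the minimal CRISPR arrays in Table 3 can be sketched as simple string assembly. The following Python sketch is illustrative only; the function names `minimal_crispr_array` and `split_by_case` are hypothetical helpers, and the only sequence data used are the repeat and spacer of the SEQ ID NO: 295 entry of Table 3.

```python
import re


def minimal_crispr_array(repeat: str, spacers: list[str]) -> str:
    """Build repeat-spacer-repeat(-spacer-repeat ...) as one DNA string.

    Repeats are written in lower case and spacers in upper case,
    mirroring the typography used in Table 3.
    """
    if not spacers:
        raise ValueError("at least one spacer is required")
    repeat = repeat.lower()
    parts = [repeat]
    for spacer in spacers:
        parts.append(spacer.upper())
        parts.append(repeat)
    return "".join(parts)


def split_by_case(array: str) -> list[str]:
    """Split an array into alternating repeat/spacer blocks by letter case."""
    return re.findall(r"[a-z]+|[A-Z]+", array)


# Repeat and spacer copied from the SEQ ID NO: 295 entry of Table 3.
REPEAT_295 = "gttcactgccgtacaggcagcttagaaa"
SPACER_295 = "AATATGCAGGGGTTTGAAACGCTCGATGTTAT"

array_295 = minimal_crispr_array(REPEAT_295, [SPACER_295])
blocks = split_by_case(array_295)  # [repeat, spacer, repeat]
```

Passing two spacers to the same helper yields three repeats flanking two spacers, which is the layout of a dual-guide array.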
- In a second aspect, the present invention relates to modified Cascade subunit proteins. Cascade subunit proteins suitable for modification include, but are not limited to, Cascade subunit proteins of the species described herein.
- In one embodiment, the present invention relates to engineered circular permutations of Cascade subunit proteins. Such circular permutations of a Cascade subunit protein result in a protein structure having different connectivity of the original linear sequence of amino acids of the Cascade subunit protein, but having an overall similar three-dimensional shape (see, e.g., Bliven, S., et al., PLoS Comput. Biol. 8(3):e1002445 (2012)). Circular permutations of Cascade subunit proteins can have a number of advantages. For example, a circular permutation of a Cas7 subunit protein can create a new N-terminus and a new C-terminus designed to be positioned for connection with an additional polypeptide sequence to form a fusion protein or linker region without disturbing the Cas7 protein fold or the Cascade complex assembly. Three examples of circular permutations of Cas7 (circularly permuted Cas7, cpCas7) are illustrated in FIG. 4A and FIG. 4B. In FIG. 4A and FIG. 4B, three portions of the protein are shown: an N-terminal portion of the native protein (vertical stripes), a central portion of the native protein (grey shading), and a C-terminal portion of the native protein (no shading). FIG. 4A illustrates relocation of an N-terminal portion of the native protein to the C-terminal position of the cpCas7, wherein the N-terminal portion of the native protein is now at the C-terminal end of the cpCas7 and is connected to the central portion of the native protein by a linker polypeptide. FIG. 4B illustrates relocation of a C-terminal portion of the native protein to the N-terminal position of the cpCas7, wherein the C-terminal portion of the native protein is now at the N-terminal end of the cpCas7 and is connected to the central portion of the native protein by a linker polypeptide.
- The data in Example 10, from purification of Cascade complexes comprising circularly-permuted Cas7 subunit protein variants, demonstrate that circularly-permuted Type I-E CRISPR-Cas subunit proteins can be successfully used to form Cascade complexes having essentially the same composition (based on molecular weight) as Cascade complexes comprising wild-type proteins.
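- The change in sequence connectivity produced by a circular permutation can be sketched on a one-letter amino acid string. The following Python sketch is illustrative only: the function name `circular_permutation`, the toy sequence, and the toy linker are hypothetical and are not sequences of the present Specification. The sketch assumes the conventional arrangement in which the native C-terminus is joined through the linker to the native N-terminus.

```python
def circular_permutation(sequence: str, new_start: int, linker: str) -> str:
    """Return a circularly permuted sequence.

    The residues before `new_start` (the native N-terminal portion) are moved
    to the C-terminal end, and the native termini are joined by `linker`.
    Assumption: the linker bridges the native C-terminus and N-terminus.
    """
    if not 0 < new_start < len(sequence):
        raise ValueError("new_start must fall strictly inside the sequence")
    return sequence[new_start:] + linker + sequence[:new_start]


# Toy example (hypothetical residues; lower-case linker for visibility).
# N = N-terminal portion, C = central portion, T = C-terminal portion.
native = "NNNCCCCCCTTT"
permuted = circular_permutation(native, new_start=3, linker="gsgs")
```

Choosing a `new_start` near the end of the sequence instead places the native C-terminal portion at the new N-terminal end, as in the arrangement of FIG. 4B.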
- In another embodiment, the present invention relates to Cascade subunit proteins fused to additional polypeptide sequences to create fusion proteins, as well as polynucleotides encoding such fusion proteins. Additional polypeptide sequences can include, but are not limited to, proteins, protein domains, protein fragments, and functional domains. Examples of such additional polypeptide sequences include, but are not limited to, sequences derived from transcription activator or repressor domains, and nucleotide deaminases (e.g., a cytidine deaminase or an adenine deaminase such as described in Komor et al., Nature 533:420-424 (2016); Koblan et al., Nat Biotechnol. 2018 May 29, doi: 10.1038/nbt.4172). Additional functional domains for fusion proteins are presented herein.
- An additional polypeptide sequence can be fused to any of the Cascade subunit proteins wherein the additional polypeptide sequence is encoded by an additional polynucleotide sequence that is typically appended to either the 5′ or 3′ end of a polynucleotide comprising the coding sequence of a Cascade subunit protein. In some embodiments, additional polynucleotide sequences that encode amino acid linkers connect a Cascade subunit protein to the additional polypeptide sequences of interest. In some embodiments, the polynucleotide sequences for the fusion protein partner and the linker sequence can be derived from naturally occurring genomic DNA sequences or may be codon optimized for bacterial expression in E. coli or eukaryotic expression in mammalian cells (e.g., human cells). Examples of fusion proteins comprising affinity tags (e.g., His6, Strep-Tag® II (IBA GMBH LLC, Gottingen, Germany)), nuclear localization signal or sequence (NLS), maltose binding protein, and FokI are presented in Example 1. Exemplary amino acid linker sequences are also disclosed in Example 1.
- Example 11 describes Cascade subunit protein-FokI fusions, as well as Cascade subunit protein fusions to cytidine deaminases, endonucleases, restriction enzymes, a nuclease/helicase, or domains thereof. Example 11 describes Cascade subunit protein fusions with other Cascade subunit proteins, as well as Cascade subunit protein fusions with other Cascade subunit fusion proteins and an enzymatic protein domain. In some embodiments, a Type I CRISPR subunit protein can be evaluated in silico for the ability to be used to generate protein fusions at the N-terminus, C-terminus, or positions between the N-terminus and the C-terminus. In some embodiments, a Type I CRISPR subunit protein can be linked to one or more fusion domains at the N-terminus, C-terminus, or positions between the N-terminus and the C-terminus using one or more polypeptide linkers. Examples of polypeptide linkers are set forth in Examples 1, 11, 18, and 19.
- FIG. 5A and FIG. 5B illustrate Cascade complexes comprising a Cas8 subunit protein fused to an additional protein sequence (e.g., a FokI). FIG. 5A shows an example of the additional protein sequence (“FP”) connected with the C-terminus of a Cas8 subunit protein using a linker polypeptide. FIG. 5B shows an example of the additional protein sequence (“FP”) connected with the N-terminus of a Cas8 subunit protein using a linker polypeptide. Example 11A describes in silico design, cloning, expression, and purification of a Type I-E Cas8 fused N-terminally with a FokI nuclease domain.
- FIG. 6A and FIG. 6B illustrate additional examples of Cascade complexes comprising a Cascade subunit protein fused to an additional protein sequence. FIG. 6A shows an example of a detectable moiety (e.g., a green fluorescent protein, GFP) fused to each of six Cas7 subunit proteins, each via a linker polypeptide. Such a Cascade complex can be useful for detection of binding of the complex to a nucleic acid target sequence by providing significant signal amplification as a result of the presence of the multiple detectable moieties associated with the Cascade complex. FIG. 6B shows an example of an additional protein sequence (“FP”) connected with a Cas6 subunit protein using a linker polypeptide.
- Examples of fusion proteins containing E. coli Type I-E Cascade subunit proteins include, but are not limited to, the following: the same subunit (e.g., Cse2_linker_Cse2), circularly permuted subunits (e.g., cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7), a Type I-E Cascade protein fused to a nuclease (e.g., FokI_linker_Cas8, Cas3_linker_Cas8, Cas6_linker_FokI, S1Nuclease_linker_Cse2_linker_Cse2), a Type I-E Cascade protein fused to a cytidine deaminase (e.g., Cas8_linker_AID, Cse2_linker_Cse2_linker_APOBEC3G), and a Type I-E Cascade protein fused to one or more other Type I-E Cascade proteins (e.g., Cas6_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7, cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_Cas5, Cas6_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_cpCas7_linker_Cas5).
- FIG. 7A, FIG. 7B, and FIG. 7C present illustrations of modified Type I CRISPR-Cas effector complexes that contain cpCas7 (compare FIG. 4A). FIG. 7A presents a Cascade complex comprising six individual cpCas7 subunit proteins. FIG. 7B presents a Cascade complex comprising six fused cpCas7 subunit proteins, wherein the C-terminus of a cpCas7 subunit protein is connected with the N-terminus of an adjacent cpCas7 subunit protein using a linker polypeptide. FIG. 7C presents an embodiment wherein the Cascade complex comprises six fused cpCas7 subunit proteins (a “backbone”), wherein the C-terminus of the first cpCas7 subunit protein is connected with the N-terminus of the second cpCas7 subunit protein using a linker polypeptide, the C-terminus of the second cpCas7 subunit protein is connected with the N-terminus of a different protein sequence (“FP”) (e.g., a cytidine deaminase) using a linker polypeptide, and the C-terminus of this protein coding sequence is connected with the N-terminus of the third cpCas7 using a linker polypeptide. One advantage of such a fused backbone of cpCas7 subunit proteins is that an additional protein sequence can be introduced at a specific location along the backbone to provide access of the additional protein sequence to different locations along the length of the nucleic acid target sequence to which the guide directs binding of the Cascade complex.
- FIG. 8A and FIG. 8B illustrate further embodiments of modified Type I CRISPR-Cas effector complexes comprising fusion proteins. FIG. 8A shows a Cascade complex comprising a Cse2-Cse2 fusion protein. In silico design, cloning, expression, purification, and electrophoretic mobility shift assays are described in Example 11B and Example 11C for Cascade complexes comprising Cse2-Cse2 fusion proteins. FIG. 8B shows a Cascade complex comprising a Cse2-Cse2 fusion protein connected with an additional protein sequence (“FP”). Example 11D describes in silico design, cloning, expression, and purification of a Cse2-Cse2 protein fused to a cytidine deaminase.
- In some embodiments, one or more nuclear localization signals can be added at the engineered N-terminus or C-terminus of a Cascade protein subunit (e.g., a Cas8-FokI fusion protein, a cpCas7 protein, or a Cse2-Cse2 fusion protein).
- In some embodiments of fusion polypeptides, linker polypeptides connect two or more protein coding sequences. The lengths of exemplary linker polypeptides are described in the Examples. Typically, linker lengths include, but are not limited to, between about 10 amino acids and about 40 amino acids, between about 15 amino acids and about 30 amino acids, and between about 17 amino acids and about 20 amino acids. The amino acid composition of linker polypeptides typically comprises amino acids that are polar, small, and/or charged (e.g., Gly, Ala, Leu, Val, Gln, Ser, Thr, Pro, Glu, Asp, Lys, Arg, His, Asn, Cys, Tyr). Following the guidance of the present Specification, the linker polypeptide is designed to provide appropriate spacing and positioning of the functional domain and the Cascade protein within the fusion protein (Chichili, C., et al., Protein Science 22(2):153-167 (2013); Chen, X., et al., 65(10):1357-1369 (2013); George, R., et al., Protein Engineering, Design and Selection 15(11):871-879 (2002)). Additional examples of linker polypeptides useful in the practice of the present invention are linker polypeptides that connect coding sequences of Cascade proteins to each other in organisms comprising Cascade systems (e.g., the linker polypeptide that connects Cas8 to Cas3 in Streptomyces griseus as described by Westra, E. R., et al., Mol Cell. 46(5): 595-605 (2012)).
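- The joining of protein sequences through a linker of bounded length can be sketched as follows. This Python sketch is illustrative only: the `Domain` class, the `fuse` function, the placeholder sequences, and the Gly/Ser linker are hypothetical and are not taken from the Examples. The default bounds reflect the broadest range stated above (about 10 to about 40 amino acids), and the `_linker_` naming follows the construct names used in this Specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    sequence: str  # one-letter amino acid string


def fuse(domains: list[Domain], linker: str,
         min_len: int = 10, max_len: int = 40) -> Domain:
    """Join domains N-terminus to C-terminus with the same linker between each.

    Raises ValueError if the linker length is outside [min_len, max_len].
    """
    if len(domains) < 2:
        raise ValueError("at least two domains are needed for a fusion")
    if not min_len <= len(linker) <= max_len:
        raise ValueError(f"linker length {len(linker)} outside {min_len}-{max_len}")
    name = "_linker_".join(d.name for d in domains)
    sequence = linker.join(d.sequence for d in domains)
    return Domain(name, sequence)


# Placeholder sequences (hypothetical, NOT real FokI or Cas8 sequences).
foki = Domain("FokI", "AAAA")
cas8 = Domain("Cas8", "KKKK")
gs_linker = "GGS" * 5  # hypothetical 15-residue Gly/Ser linker

fusion = fuse([foki, cas8], gs_linker)
```

Tightening `min_len` and `max_len` to 17 and 20 would model the narrowest range given in the text.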
- Fusion protein coding DNA sequences can be codon optimized for expression in a selected organism such as bacteria, archaea, plants, fungi, or mammalian cells. Codon-optimizing programs are widely available, such as on the Integrated DNA Technologies website (www.idtdna.com/CodonOpt), or through Genscript® services (Genscript, Piscataway, N.J.). To facilitate cloning into the recipient expression vector, additional sequences overlapping with the vector compatible for SLIC cloning (Li, M., et al., Methods Mol. Biol. 852:51-59 (2012)) can be appended at the 5′ and 3′ ends of the DNA sequence.
- In other embodiments, Cascade subunit proteins can be fused to transcription activation and/or repression domains. In some embodiments, a fusion protein can comprise activator domains (e.g., heat shock transcription factors, NFKB activators, VP16, and VP64 (Eguchi, A., et al., PNAS 113(51):E8257-E8266 (2016); Perez-Pinera, P., et al., Nature Methods 10(10):973-6 (2013); Gilbert, L. A., et al., Cell 159(3):647-61 (2014))) or repressor domains (e.g., a KRAB domain). In some embodiments, linker nucleic acid sequences are used to join the two or more coding sequences for proteins, protein domains, or protein fragments.
- Cascade complexes comprising Type I CRISPR-Cas subunit proteins fused to transcription activators can be used to activate the expression of a gene. The target locus can contain a transcriptional start site (TSS) that typically harbors one or more binding sites for the transcriptional activation machinery (factors) of a cell. FIG. 9 illustrates a Cascade complex comprising six fusion proteins comprising a cpCas7 connected via a linker polypeptide to the transcriptional activator VP64. Such modification of a Cascade complex converts the complex into a flexible tool for transcriptional activation of a gene (CASCADEa), wherein targeting a selected gene is achieved by selection of a guide sequence that directs binding of the Cascade complex to one or more regulatory elements (e.g., a TSS) of the selected gene. Example 12 describes the design of an E. coli Type I-E cpCas7 protein fused to a VP64 activation domain to confer transcriptional activation activity to the Cascade complex.
- In addition, Cascade complexes comprising Type I CRISPR-Cas subunit proteins fused to transcription repressors can be used to repress the expression of a gene. The target locus can comprise transcriptional regulatory elements. In one embodiment, a Cascade subunit protein can be connected to a KRAB domain via a linker polypeptide. A Cascade complex comprising the Cascade subunit protein/KRAB domain fusion can convert the complex into a flexible tool for transcriptional repression of a gene (CASCADEi), wherein targeting a selected gene is achieved by selection of a guide sequence that directs binding of the Cascade complex to one or more regulatory elements of the selected gene.
- In additional embodiments, Cascade subunit proteins can be fused to affinity tags.
- In other embodiments of the present invention, Type I CRISPR-Cas guide polynucleotides can be modified by insertion of a selected polynucleotide element or modification of nucleotides at selected positions within the guide polynucleotides (e.g., substitution of a DNA moiety for an RNA moiety). Such embodiments include, but are not limited to, Type I CRISPR-Cas guide polynucleotides 5′, 3′ or internally fused to one or more nucleotide effector domains (e.g., an MS2 or MS2-P65-HSF1 binding RNA or aptamer that recruits transcription factors). FIG. 10 illustrates a Type I CRISPR guide polynucleotide comprising an RNA aptamer introduced into the 3′ hairpin of the guide.
- The length of Type I CRISPR-Cas guides can also be modified, typically by lengthening or shortening the Cas7 subunit protein and Cse2 subunit protein binding region. FIG. 11A illustrates a Cascade complex with three Cas7 subunits, one Cse2 subunit, and a shortened crRNA. FIG. 11B illustrates a Cascade complex with nine Cas7 subunits, three Cse2 subunits, and a lengthened crRNA.
- Example 16 describes the generation and testing of modifications of Type I CRISPR-Cas guide crRNAs and the suitability of the modified guides for use in constructing engineered Type I CRISPR-Cas effector complexes.
- In a third aspect, the present invention relates to nucleic acid sequences encoding one or more engineered Cascade components, as well as expression cassettes, vectors, and recombinant cells comprising nucleic acid sequences encoding one or more engineered Cascade components. Some embodiments of the third aspect of the invention include one or more polynucleotides encoding all the components of a selected Cascade system (e.g., Cse2, Cas5, Cas6, Cas7, and Cas8 proteins, and one or more cognate guides), wherein the components are capable of forming an effector complex. Typically, when more than one cognate guide is expressed, the guides have different spacer sequences to direct binding to different nucleic acid target sequences. Such embodiments include, but are not limited to, expression cassettes, vectors, and recombinant cells.
- In one embodiment, the present invention relates to one or more expression cassettes comprising one or more nucleic acid sequences encoding one or more engineered Cascade components. Expression cassettes typically comprise a regulatory sequence involved in one or more of the following: regulation of transcription, post-transcriptional regulation, or regulation of translation. Expression cassettes can be introduced into a wide variety of organisms including, but not limited to, bacterial cells, yeast cells, plant cells, and mammalian cells (including human cells). Expression cassettes typically comprise functional regulatory sequences corresponding to the organism(s) into which they are being introduced.
- A further embodiment of the present invention relates to vectors, including expression vectors, comprising one or more nucleic acid sequences encoding one or more engineered Cascade components. Vectors can also include sequences encoding selectable or screenable markers. Furthermore, nuclear targeting sequences can also be added, for example, to Cascade subunit proteins. Vectors can also include polynucleotides encoding protein tags (e.g., poly-His tags, hemagglutinin tags, fluorescent protein tags, and bioluminescent tags). The coding sequences for such protein tags can be fused to, for example, one or more nucleic acid sequences encoding a Cascade subunit protein.
- General methods for construction of expression vectors are known in the art. Expression vectors for host cells are commercially available. There are several commercial software products designed to facilitate selection of appropriate vectors and construction thereof, such as insect cell vectors for insect cell transformation and gene expression in insect cells, bacterial plasmids for bacterial transformation and gene expression in bacterial cells, yeast plasmids for cell transformation and gene expression in yeast and other fungi, mammalian vectors for mammalian cell transformation and gene expression in mammalian cells or mammals, and viral vectors (including lentivirus, retrovirus, adenovirus, herpes simplex virus I or II, parvovirus, reticuloendotheliosis virus, and adeno-associated virus (AAV) vectors) for cell transformation and gene expression and methods to easily allow cloning of such polynucleotides. Illustrative plant transformation vectors include those derived from a Ti plasmid of Agrobacterium tumefaciens (Lee, L. Y., et al., Plant Physiology 146(2):325-332 (2008)). Also useful and known in the art are Agrobacterium rhizogenes plasmids. For example, SNAPGENE™ (GSL Biotech LLC, Chicago, Ill.; snapgene.com/resources/plasmid_files/your_time_is_valuable/) provides an extensive list of vectors, individual vector sequences, and vector maps, as well as commercial sources for many of the vectors.
- In order to express and purify recombinant Cascade in a bacterial expression system, vectors can be designed that encode Cascade subunit proteins, as well as minimal CRISPR arrays comprising guide sequences of interest. Accordingly, one aspect of the present invention includes such expression systems. In one embodiment, the Cascade complex is expressed from three distinct plasmid vectors, which collectively encode the following components: a Cas8 protein; Cse2, Cas7, Cas5, and Cas6 proteins; and a CRISPR crRNA. In some embodiments, the expression plasmid encoding Cas8 comprises the natural, genomic DNA gene sequence and, in other embodiments, the expression plasmid can encode Cas8 that is codon optimized for expression in a chosen cell type. Similarly, the expression plasmid encoding Cse2, Cas7, Cas5, and Cas6 can contain the natural, genomic DNA gene sequences or can contain gene sequences that have been codon optimized for expression in a chosen cell type. In some embodiments, the entire Cascade subunit protein coding operon can be placed downstream of a single transcriptional promoter, such that the different proteins are all translated from a single polycistronic transcript. In additional embodiments, the genes encoding the Cascade subunit proteins can be separated from each other, with intervening transcriptional terminators and promoters.
- The expression plasmid encoding the crRNA may contain as few as two repeats flanking a single spacer sequence, downstream of an appropriate transcriptional promoter, or may contain many repeats flanking multiple spacer sequences, of either the same exact guide sequence or multiple distinct guide sequences. Coordinated expression of the CRISPR array and the Cascade subunits, in particular the Cas6 subunit, leads to processing of long precursor crRNAs into mature-length crRNAs, each of which comprises fragments of a single repeat on the 5′ and 3′ ends of the crRNA, and a single spacer sequence in the middle.
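- The processing outcome described above, in which each mature crRNA carries a repeat fragment on each end and one spacer in the middle, can be sketched at the sequence level. This Python sketch is illustrative only: the function name `mature_crrnas`, the parameter `five_prime_handle_len`, and the toy sequences are hypothetical, and the sketch assumes a single cut at the same position within every repeat. The actual cut position depends on the Cascade system and is not specified here.

```python
def mature_crrnas(repeat: str, spacers: list[str],
                  five_prime_handle_len: int) -> list[str]:
    """Model processing of repeat-(spacer-repeat)n into one crRNA per spacer.

    Assumption: every repeat is cut once, `five_prime_handle_len` nucleotides
    before its 3' end. The part upstream of the cut stays on the 3' end of the
    preceding crRNA; the part downstream becomes the 5' end of the next crRNA.
    """
    if not 0 < five_prime_handle_len < len(repeat):
        raise ValueError("cut must fall strictly inside the repeat")
    cut = len(repeat) - five_prime_handle_len
    three_prime_fragment = repeat[:cut]  # retained downstream of each spacer
    five_prime_fragment = repeat[cut:]   # retained upstream of each spacer
    return [five_prime_fragment + s + three_prime_fragment for s in spacers]


# Toy sequences (hypothetical; lower case = repeat, upper case = spacer).
toy_repeat = "aaaacccc"
toy_spacers = ["GGGG", "TTTT"]
crrnas = mature_crrnas(toy_repeat, toy_spacers, five_prime_handle_len=3)
```

In this model, the two repeat fragments on any one crRNA together reconstitute exactly one repeat, consistent with the description of "fragments of a single repeat" on the 5′ and 3′ ends.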
- An alternative strategy to express the complete Cascade complex in E. coli uses two plasmids: one plasmid that encodes the entire Cas8-Cse2-Cas7-Cas5-Cas6 operon on a single expression plasmid and one encoding the CRISPR crRNA. In this case, the 5′ end of the Cse2 gene, which normally overlaps with the 3′ end of the Cas8 gene, is separated spatially from the 3′ end of the Cas8 gene, in order to append a polynucleotide sequence encoding an affinity tag and/or protease recognition sequence.
- Example 2 describes two types of bacterial expression plasmid systems for the Cascade proteins: the first type comprises two plasmids, a first plasmid encoding the Cas8 protein and a second encoding the 4 subunit proteins of the CasBCDE complex (cse2-cas7-cas5-cas6 operon); and the second type comprises an expression plasmid encoding all 5 subunit proteins of the Cascade complex (cas8-cse2-cas7-cas5-cas6 operon). Cognate CRISPR arrays are also described.
- In order to facilitate purification of Cascade complexes, an affinity tag can be appended onto the Cse2 subunit, such as an N-terminal Strep-II tag or a hexahistidine (His6) tag. Furthermore, an amino acid sequence recognized by a protease, such as TEV protease or the HRV3C protease can be inserted between the affinity tag and the native N-terminus of the Cse2 subunit, such that biochemical cleavage of the sequence with the protease after initial purification liberates the affinity tag from the final recombinant Cascade complex. The affinity tag may also be placed on other subunits, or left on the Cse2 subunit and combined with additional affinity tags on other subunits. Examples of Cascade subunit proteins comprising affinity tags are set forth in Example 1, Example 2, and Example 3.
- For Type I-E Cascade systems, a strain of E. coli can be transformed with plasmids encoding the CRISPR crRNA as well as the Cse2-Cas7-Cas5-Cas6 genes, protein expression induced, and a Cascade complex that is lacking the Cas8 subunit can be produced. This Cascade complex typically is referred to as a Cas8-minus Cascade complex, or alternatively as a CasBCDE complex (Jore, M., et al., Nat. Struct. Mol. Biol. 18(5):529-536 (2011)). This purified complex can be biochemically combined with separately purified Cas8 to reconstitute full Cascade (Sashital, D. G., et al., Mol. Cell 46(5):606-615 (2012)).
- Table 4 presents exemplary sequences of bacterial expression plasmids encoding the minimal CRISPR array, Cas8, Cse2-Cas7-Cas5-Cas6 constructs, and Cas8-Cse2-Cas7-Cas5-Cas6 constructs, containing different tags and designs. Plasmids that encode Cascade complexes and Cascade complexes from homologous Type I systems can be designed similarly as the exemplary expression plasmid sequences for the Type I-E found in E. coli K-12 MG1655 following the guidance of the present Specification. Table 4 additionally contains sequences of expression plasmids expressing Cas8-Cse2-Cas7-Cas5-Cas6 as well as FokI fusions to either the Cas8 gene or the Cas6 gene, for the production of nuclease-Cascade fusions for gene editing experiments.
TABLE 4 Vectors for Production of Cascade Effector Complexes
SEQ ID NO: | Description | Effector complex species of origin | Type of sequence
SEQ ID NO: 352 | minimal CRISPR array | I-E_Escherichia coli K-12 MG1655 | Spacer sequence targets J3
SEQ ID NO: 353 | minimal CRISPR array | I-E_Escherichia coli K-12 MG1655 | Spacer sequence targets CCR5.1
SEQ ID NO: 354 | minimal CRISPR array (J3/L3) | I-E_Escherichia coli K-12 MG1655 | Dual-guide spacer sequence targets J3 and L3
SEQ ID NO: 355 | minimal CRISPR array (Hsa07) | I-E_Escherichia coli K-12 MG1655 | Dual-guide spacer sequence targets Hsa07
SEQ ID NO: 356 | His6-MBP-TEV-Cas8 | I-E_Escherichia coli K-12 MG1655 | Derived from genomic DNA, with appended tags
SEQ ID NO: 357 | StrepII-HRV3C-Cse2_Cas7_Cas5_Cas6 | I-E_Escherichia coli K-12 MG1655 | Derived from genomic DNA, with appended tags
SEQ ID NO: 358 | Cas8_His6-HRV3C-Cse2_Cas7_Cas5_Cas6 | I-E_Escherichia coli K-12 MG1655 | Derived from genomic DNA, with appended tags
SEQ ID NO: 359 | FokI-30aa-Cas8_His6-HRV3C-Cse2_Cas7_Cas5_Cas6 | I-E_Escherichia coli K-12 MG1655 | Derived from genomic DNA, with appended tags
SEQ ID NO: 360 | FokI-30aa-Cas8_His6-HRV3C-Cse2_Cas7_Cas5_NLS-Cas6 | I-E_Escherichia coli K-12 MG1655 | Derived from genomic DNA, with appended tags
SEQ ID NO: 361 | FokI-30aa-Cas8_His6-HRV3C-Cse2_Cas7-NLS_Cas5_Cas6 | I-E_Escherichia coli K-12 MG1655 | Derived from genomic DNA, with appended tags
SEQ ID NO: 362 | Cas8_His6-HRV3C-Cse2_Cas7_Cas5_Cas6-20aa-FokI | I-E_Escherichia coli K-12 MG1655 | Derived from genomic DNA, with appended tags
- Table 5 contains the sequences of single polypromoter bacterial expression plasmids encoding all 5 subunit proteins together with the crRNA from a single bacterial expression plasmid. In this design, each gene is separated from the other genes it flanks upstream and downstream with a transcriptional promoter and terminator. Additional sequences can be introduced that encode an affinity tag and/or protease recognition tag, as well as a fusion to a nuclease protein, in order to generate a Cascade-nuclease fusion for gene editing.
TABLE 5 Vectors for Production of Cascade Effector Complexes
SEQ ID NO: | Description | Effector complex species of origin | Type of sequence
SEQ ID NO: 363 | Polypromoter, Cas5_Cas3_Cse2_Cas7_Cas6_Cas8_CRISPR(J3) | I-E_Escherichia coli K-12 MG1655 | Derived from genomic DNA, with appended tags
SEQ ID NO: 364 | Polypromoter, Cas5_Cas3_Cse2_Cas7_CRISPR(J3)_Cas6_Cas8 | I-E_Escherichia coli K-12 MG1655 | Derived from genomic DNA, with appended tags
SEQ ID NO: 365 | Polypromoter(EcoCO), CRISPR(J3/L3)_Cse2_Cas7_Cas5_Cas8_Cas6 | I-E_Escherichia coli K-12 MG1655 | E. coli codon-optimized DNA gene sequences
SEQ ID NO: 366 | Polypromoter(EcoCO), CRISPR(J3/L3)_Cse2_Cas7_Cas5_Cas8_FokI-30aa-Cas6 | I-E_Escherichia coli K-12 MG1655 | E. coli codon-optimized DNA gene sequences
SEQ ID NO: 367 | Polypromoter(EcoCO), CRISPR(J3/L3)_Cse2_Cas7_Cas5_Cas6_FokI-30aa-Cas8 | I-E_Escherichia coli K-12 MG1655 | E. coli codon-optimized DNA gene sequences
- Additional bacterial expression plasmids can be designed encoding homologous Cascade complexes from other Type I subtypes and other bacterial or archaeal organisms based on the design criteria herein. Such expression plasmids can be designed with genomic DNA sequences for the Cascade genes, or they can be designed with gene sequences that have been codon optimized for expression in E. coli or other bacterial strains.
- In order to express Cascade or effector fusions to Cascade in mammalian cells, such as human cells, eukaryotic expression plasmid vectors were designed to enable expression of the relevant proteins and RNA components by eukaryotic transcription and translation machinery. In one embodiment, Cascade can be generated in mammalian cells by encoding each of the protein components on a separate expression vector driven by a eukaryotic promoter (e.g., a cytomegalovirus (CMV) promoter), and encoding the crRNA on a separate expression vector driven by an RNA Polymerase III promoter (e.g., the human U6 promoter). The CRISPR RNA can be encoded with a minimal CRISPR array containing at least two repeats flanking one or more spacer sequences that function as the guide portion of the mature crRNA. The construct generating CRISPR RNA can be designed with additional sequences flanking the outermost repeats in the minimal array. Processing of the precursor CRISPR RNA is enabled by the RNA processing subunit of the Cascade complex (Cas6 subunit protein), which can be expressed from a separate plasmid.
- Table 6 contains the sequences of individual eukaryotic expression plasmids for each protein of the E. coli Type I-E Cascade complex. The Cas8 subunit can be fused to additional effector nuclease domains, such as the FokI nuclease (Example 1 and Example 3). Table 6 also contains the sequences of expression plasmids for the crRNA component of Cascade, encoding two separate dual-guide crRNAs, whereby three repeat sequences flank two spacer sequences. Each of the protein-coding genes can be appended to polynucleotide sequences that encode nuclear localization signals (NLS), affinity tags, and linker sequences connecting those tags. Other fusions to any of the Cascade subunit proteins can be encoded by additional polynucleotide sequences that typically are appended to either the 5′ or 3′ end of the coding sequence, including additional polynucleotide sequences that encode amino acid linkers connecting the Cascade subunit protein to additional polypeptide sequences of interest. Examples of candidate fusion proteins are described herein.
-
TABLE 6 Vectors for Production of Cascade Effector Complexes Effector complex SEQ species Type of ID NO: Description of origin sequence SEQ ID Cas8, HsCO I-E_Escherichia Homo sapiens NO: 368 coli K-12 codon-optimized MG1655 DNA gene sequence SEQ ID NLS-Cas8, HsCO I-E_Escherichia Homo sapiens NO: 369 coli K-12 codon-optimized MG1655 DNA gene sequence SEQ ID NLS-HA-FokI- I-E_Escherichia Homo sapiens NO: 370 30aa-Cas8, coli K-12 codon-optimized HsCO MG1655 DNA gene sequence SEQ ID NLS-Cse2, HsCO I-E_Escherichia Homo sapiens NO: 371 coli K-12 codon-optimized MG1655 DNA gene sequence SEQ ID NLS-Cas7, HsCO I-E_Escherichia Homo sapiens NO: 372 coli K-12 codon-optimized MG1655 DNA gene sequence SEQ ID Cas5, HsCO I-E_Escherichia Homo sapiens NO: 373 coli K-12 codon-optimized MG1655 DNA gene sequence SEQ ID NLS-Cas5, HsCO I-E_Escherichia Homo sapiens NO: 374 coli K-12 codon-optimized MG1655 DNA gene sequence SEQ ID Cas6, HsCO I-E_Escherichia Homo sapiens NO: 375 coli K-12 codon-optimized MG1655 DNA gene sequence SEQ ID NLS-Cas6, HsCO I-E_Escherichia Homo sapiens NO: 376 coli K-12 codon-optimized MG1655 DNA gene sequence SEQ ID NLS-V5-FokI- I-E_Escherichia Homo sapiens NO: 377 30aa-Cas8, coli K-12 codon-optimized HsCO MG1655 DNA gene sequence SEQ ID Cas3-NLS, HsCO I-E_Escherichia Homo sapiens NO: 378 coli K-12 codon-optimized MG1655 DNA gene sequence SEQ ID CRISPR(Hsa07) I-E_Escherichia Homo sapiens NO: 379 coli K-12 codon-optimized MG1655 DNA gene sequence - In order to express components of the Cascade complex on fewer expression vectors, polycistronic expression vectors can be constructed, whereby a single promoter (e.g., CMV promoter) drives expression of multiple coding sequence simultaneously that are separated by a Thosea asigna virus 2A sequence. 2A viral peptide sequences induce ribosomal skipping, thus enabling multiple protein-coding genes to be concatenated within a single polycistronic construct for expression in eukaryotic cells. 
Thus, polycistronic vectors can be designed that encode 4 or 5 subunits of the Cascade complex on a single transcript driven by a single promoter. Table 7 contains the sequences of eukaryotic polycistronic expression plasmids that can be combined with a CRISPR RNA expression plasmid to produce functional Cascade in mammalian cells.
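- The polycistronic layout described above can be sketched in code. The following Python sketch is illustrative only and rests on several assumptions: the subunit sequences are short placeholder strings rather than the sequences of Table 6 or Table 7, the function names are hypothetical, and the peptide shown is the commonly cited 18-residue T2A sequence, which may differ from the exact variant used in the vectors. Ribosomal skipping is modeled simply as a break between the final glycine and proline of the 2A peptide.

```python
# Commonly cited T2A peptide (assumption: the actual vectors may use a variant).
T2A = "EGRGSLLTCGDVEENPGP"

def build_polyprotein(subunits):
    """Join placeholder subunit sequences with T2A peptides in the given order."""
    return T2A.join(seq for _, seq in subunits)

def ribosomal_skip(polyprotein):
    """Split the translated product at each 2A site.

    Skipping occurs between the last Gly and the Pro of the 2A peptide, so the
    upstream protein keeps the 2A residues through Gly and the downstream
    protein starts with Pro.
    """
    cut_marker = T2A[:-1] + "|" + T2A[-1]
    return polyprotein.replace(T2A, cut_marker).split("|")

# Placeholder sequences (hypothetical, not real Cascade subunit sequences).
subunits = [
    ("NLS-Cas7", "MAAAA"),
    ("NLS-Cse2", "MCCCC"),
    ("NLS-Cas5", "MDDDD"),
    ("NLS-Cas6", "MFFFF"),
]

products = ribosomal_skip(build_polyprotein(subunits))
for (name, _), product in zip(subunits, products):
    print(name, product)
```

In this model, every subunit except the last carries a C-terminal 2A remnant and every subunit except the first begins with a proline, which is one reason the order of subunits within a polycistronic construct can matter.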
-
TABLE 7
Vectors for Production of Cascade Effector Complexes
SEQ ID NO: | Description | Effector complex species of origin | Type of sequence
SEQ ID NO: 380 | Polycistronic(HsCO), NLS-Cas7_NLS-Cse2_NLS-Cas5_NLS-Cas6 | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
SEQ ID NO: 381 | Polycistronic(HsCO), NLS-Cas7_NLS-Cse2_NLS-Cas5_NLS-Cas6_NLS-Cas8 | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
SEQ ID NO: 382 | Polycistronic(HsCO), NLS-Cas7_NLS-Cse2_NLS-Cas5_NLS-Cas6_NLS-FokI-30aa-Cas8 | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
SEQ ID NO: 383 | Polycistronic(HsCO), NLS-Cas7_NLS-Cse2_NLS-Cas5_NLS-Cas6_NLS-FokI-30aa-Cas8, no epitope tags | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
SEQ ID NO: 384 | Polycistronic(HsCO), NLS-Cas7_NLS-Cse2_NLS-Cas5_NLS-FokI-30aa-Cas6_NLS-Cas8, no epitope tags | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
- In some embodiments, the CRISPR RNA is encoded within the 3′ untranslated region (UTR) of a protein-coding gene, whose expression is driven by an RNA polymerase II promoter (e.g., CMV promoter) to produce a transcript. In such embodiments, the minimal CRISPR array is placed downstream of a protein-coding gene such as Cas6, Cas7, or a reporter gene (e.g., an enhanced green fluorescent protein, eGFP), and is separated from the protein-coding sequence by a MALAT1 triplex sequence that has previously been shown to confer stability to the upstream transcript. The minimal CRISPR array is processed by the RNA-processing subunit of Cascade (typically expressed from a different plasmid), an endonuclease that cleaves the minimal CRISPR array and thereby introduces a break into the transcript; the triplex sequence then protects the 3′ end of the upstream protein-coding portion of the transcript from premature exonucleolytic degradation.
Table 8 contains three polynucleotide sequences in which the CRISPR array is cloned downstream of Cas6, Cas7, or eGFP, and expression of the entire fusion sequence is driven by a CMV promoter.
-
TABLE 8
Vectors for Production of Minimal CRISPR Arrays
SEQ ID NO: | Description | Effector complex species of origin | Type of sequence
SEQ ID NO: 385 | eGFP_MALAT1-triplex_CRISPR(Hsa07) | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
SEQ ID NO: 386 | NLS-Cas7_MALAT1-triplex_CRISPR(Hsa07) | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
SEQ ID NO: 387 | NLS-Cas6_MALAT1-triplex_CRISPR(Hsa07) | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
- In some embodiments, the CRISPR RNA array is encoded on the same vector as the polycistronic construct driving expression of the 5 Cascade subunits; the combination of these two elements generates an all-in-one vector that produces all functional subunits (both protein and RNA) of the Cascade complex, together with any nuclease or effector domains fused to one of the Cascade subunits. Table 9 contains two representative sequences of these all-in-one polynucleotide sequences that encode all the respective components to produce functional FokI-Cascade RNPs in mammalian cells.
-
TABLE 9
Vectors for Production of Cascade Effector Complexes
SEQ ID NO: | Description | Effector complex species of origin | Type of sequence
SEQ ID NO: 388 | hU6_CRISPR(Hsa07)_F, CMV_NLS-Cas7_NLS-Cse2_NLS-Cas5_NLS-Cas6_NLS-FokI-30aa-Cas8 | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
SEQ ID NO: 389 | hU6_CRISPR(Hsa07)_R, CMV_NLS-Cas7_NLS-Cse2_NLS-Cas5_NLS-Cas6_NLS-FokI-30aa-Cas8 | I-E_Escherichia coli K-12 MG1655 | Homo sapiens codon-optimized DNA gene sequence
- Example 3 describes expression systems using separate plasmids expressing each Cascade subunit protein and minimal CRISPR array, expression systems wherein multiple Cascade subunit protein coding sequences are expressed from a single promoter, and an expression system wherein a single plasmid Cascade expression system was constructed to express the entire Cas8-Cse2-Cas7-Cas5-Cas6 operon and a minimal CRISPR array for use in mammalian cells.
- One of ordinary skill in the art following the guidance of the present Specification can design additional mammalian expression vectors encoding other Cascade complexes analogously to the examples provided for the E. coli Type I-E Cascade complex.
- In a fourth aspect, the present invention relates to production of engineered Type I CRISPR-Cas effector complexes by introduction of plasmids encoding one or more components of the engineered Type I CRISPR-Cas effector complexes into host cells. Transformed host cells (or recombinant cells) or the progeny of cells that have been transformed or transfected using recombinant DNA techniques can comprise one or more nucleic acid sequences encoding one or more components of an engineered Type I CRISPR-Cas effector complex. Methods of introducing polynucleotides (e.g., an expression vector) into host cells are known in the art and are typically selected based on the kind of host cell. Such methods include, for example, viral or bacteriophage infection, transfection, conjugation, electroporation, calcium phosphate precipitation, polyethyleneimine-mediated transfection, DEAE-dextran mediated transfection, protoplast fusion, lipofection, liposome-mediated transfection, particle gun technology, microprojectile bombardment, direct microinjection, and nanoparticle-mediated delivery. In one embodiment of the present invention, polynucleotides encoding components of engineered Type I CRISPR-Cas effector complexes are introduced into bacterial cells (e.g., E. coli).
- Example 4 describes a method for introduction and expression of Cas8 protein coding sequences, as well as coding sequences for components of engineered Type I CRISPR-Cas effector complexes for bacterial production of such complexes using E. coli expression systems.
- A variety of exemplary host cells disclosed herein can be used to produce recombinant cells using an engineered Cascade effector complex. Such host cells include, but are not limited to, a plant cell, a yeast cell, a bacterial cell, an insect cell, an algal cell, and a mammalian cell.
- For ease of discussion, “transfection” is used below to refer to any method of introducing polynucleotides into a host cell.
- In some embodiments, a host cell is transiently or non-transiently transfected with nucleic acid sequences encoding one or more components of a Type I CRISPR-Cas effector complex. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is first removed from a subject, e.g., a primary cell or progenitor cell. In some embodiments, the primary cell or progenitor cell is cultured and/or is returned after ex vivo transfection to the same subject or to a different subject.
- Example 9 illustrates the design and delivery of E. coli Type I-E Cascade complexes comprising FokI fusion proteins to facilitate genome editing in human cells. The Example describes the delivery of plasmid vectors expressing Cascade complex components into eukaryotic cells.
- In a fifth aspect, the present invention relates to the purification of engineered Type I CRISPR-Cas effector complexes from cells and uses of such complexes. Engineered Type I CRISPR-Cas effector complexes are produced in a host cell. The engineered Type I CRISPR-Cas effector complexes (in this case Cascade ribonucleoprotein (RNP) complexes) are purified from cell lysates.
- Example 5 describes purification of E. coli Type I-E Cascade RNP complexes produced by overexpression in bacteria as described in Example 4. The method uses immobilized metal affinity chromatography followed by size exclusion chromatography. The Example also describes methods that can be used to assess the quality of purified Cascade RNP products. Examples are presented illustrating the purification of Cas8, Cas7, Cas6, Cas5, and Cse2 Cascade RNP complexes, Cascade complexes comprising Cas7, Cas6, Cas5, and Cse2 proteins, and FokI-Cas8 fusion proteins.
- The purified, engineered Type I CRISPR-Cas effector complexes can also be used directly in biochemical assays (e.g., binding and/or cleavage assays). Example 6 describes production of dsDNA target sequences for use in in vitro DNA binding or cleavage assays. The Example describes three methods to produce target sequences, including annealing of synthetic ssDNA oligonucleotides, PCR amplification of selected nucleic acid target sequences from genomic DNA, as well as cloning of nucleic acid target sequences into bacterial plasmids. The dsDNA target sequences were used in Cascade binding or cleavage assays.
- The site-specific binding of and/or cutting by one or more engineered Type I CRISPR-Cas effector complexes can be confirmed, if necessary, using an electrophoretic mobility shift assay (see, e.g., Garner, M., et al., Nucleic Acids Research 9(13):3047-3060 (1981); Fried, M., et al., Nucleic Acids Research 9(23):6505-6525 (1981); Fried, M., Electrophoresis 10:366-376 (1989); Gagnon, K., et al., Methods Molecular Biology 703:275-291 (2011); Fillebeen, C., et al., J. Vis. Exp. (94), e52230, doi:10.3791/52230 (2014)), or the biochemical cleavage assay described in Example 7.
- The data presented in Example 7 demonstrate that engineered Type I CRISPR-Cas effector complexes can exhibit nearly quantitative DNA cleavage, as evidenced by conversion of a supercoiled, circular plasmid substrate into a cleaved, linear form.
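- As an illustration of how conversion of supercoiled plasmid into linear product might be quantified, the following Python sketch estimates the fraction cleaved from gel band intensities. It is a minimal sketch under stated assumptions: the intensities are hypothetical placeholder values and not data from Example 7, the function name is illustrative, and the simple ratio ignores nicked species and any difference in dye staining between plasmid forms.

```python
def fraction_cleaved(supercoiled, linear):
    """Return the fraction of plasmid converted to the linear form.

    Assumes the two band intensities are background-corrected and that
    supercoiled and linear DNA stain with equal efficiency.
    """
    total = supercoiled + linear
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return linear / total

# Hypothetical densitometry values in arbitrary units (illustration only).
lanes = {
    "no complex": (980.0, 20.0),
    "with complex": (30.0, 970.0),
}

for label, (sc, lin) in lanes.items():
    print(f"{label}: {fraction_cleaved(sc, lin):.1%} cleaved")
```

A value approaching 100% in the treated lane, with little conversion in the untreated lane, would correspond to the nearly quantitative cleavage described above.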
- In another embodiment, the complexes are introduced directly into a cell, as an alternative to expressing one or more nucleic acid sequences encoding one or more components of engineered Type I CRISPR-Cas effector complexes in a cell. The purified, engineered Type I CRISPR-Cas effector complexes can be directly introduced into cells. Methods to introduce the components into a cell include electroporation, lipofection, particle gun technology, and microprojectile bombardment.
- Example 8 illustrates the design and delivery of E. coli Type I-E Cascade complexes comprising Cas subunit protein-FokI fusion proteins to human cells. The data in the Example demonstrate delivery of pre-assembled Cascade RNPs into target cells and effective genome editing in human cells.
- In some embodiments, the engineered Type I CRISPR-Cas effector complexes described herein can be used to generate non-human transgenic organisms by site specifically introducing a selected polynucleotide sequence (e.g., a portion of a donor polynucleotide) at a DNA target locus in the genome to generate a modification of the genomic DNA. The transgenic organism can be an animal or a plant.
- A transgenic animal is typically generated by introducing engineered Type I CRISPR-Cas effector complexes into a zygote cell. A basic technique, described with reference to making transgenic mice (see, e.g., Cho, A., et al., “Generation of Transgenic Mice,” Current Protocols in Cell Biology, CHAPTER.Unit-19.11 (2009)) involves five basic steps: first, preparation of a system, as described herein, including a suitable donor polynucleotide; second, harvesting of donor zygotes; third, microinjection of the system into the mouse zygote; fourth, implantation of microinjected zygotes into pseudo-pregnant recipient mice; and fifth, performing genotyping and analysis of the modification of the genomic DNA established in founder mice. The founder mice will pass the genetic modification to any progeny. The founder mice are typically heterozygous for the transgene. Mating between these mice will produce mice that are homozygous for the transgene 25% of the time.
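- The 25% figure follows from a standard single-locus cross between two heterozygous animals. The Python sketch below enumerates the allele combinations; it assumes Mendelian segregation of a single-copy transgene and equal viability of all genotypes, and the allele symbols ("T" for the transgene, "-" for its absence) and function name are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Return genotype frequencies for a single-locus cross.

    Each parent is a 2-character string of alleles, e.g. "T-" for a
    heterozygous founder carrying one copy of the transgene (T).
    """
    counts = {}
    for a, b in product(parent1, parent2):
        # Sort alleles so that "T-" and "-T" count as the same genotype.
        genotype = "".join(sorted((a, b), reverse=True))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

offspring = cross("T-", "T-")
for genotype, freq in offspring.items():
    print(genotype, freq)
```

Under these assumptions, one quarter of the offspring are homozygous for the transgene, one half are heterozygous, and one quarter do not carry it.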
- Methods for generating transgenic plants are also well known and can be applied using engineered Type I CRISPR-Cas effector complexes. A generated transgenic plant, for example using Agrobacterium-mediated transformation, typically contains one transgene inserted into one chromosome. It is possible to produce a transgenic plant that is homozygous with respect to a transgene by sexually mating (i.e., selfing) an independent segregant transgenic plant containing a single transgene to itself. Typical zygosity assays include, but are not limited to, single nucleotide polymorphism assays and thermal amplification assays that distinguish between homozygotes and heterozygotes.
- In a sixth aspect, the present invention relates to use of engineered Type I CRISPR-Cas effector complexes to create substrate channels. In some embodiments, fusion proteins comprising substrate channel elements and Cas7 subunit proteins are constructed. These Cas7 fusion proteins are then assembled into an engineered Type I CRISPR-Cas effector complex (e.g., comprising Cse2, Cas5, Cas6, Cas7-substrate channel element fusions, and Cas8). In some embodiments, the crRNA of the engineered Type I CRISPR-Cas effector complex can be extended to accommodate additional Cas7 subunits (Luo, M., et al., Nucleic Acids Research 44:7385-7394 (2016)). Different substrate elements can be fused to Cas7 and then mixed at the desired stoichiometry. When these various Cas7 subunits assemble into a complete Type I CRISPR-Cas effector complex, co-localization of substrate elements can improve the efficacy of substrate channeling.
- In some embodiments, an RNA scaffold is constructed such that multiple Cas7-substrate channel element fusions can bind to it in the absence of other Type I CRISPR-Cas effector complex components.
- Substrate channel elements can be fused to the N-terminus of Cas7 and/or the C-terminus of Cas7. In addition, circular permutations of Cas7 can be fused to substrate channel elements.
- FIG. 12A and FIG. 12B present illustrations of substrate channels consisting of three consecutive enzymes in a pathway. Substrate channels facilitate the passing of intermediary metabolic products directly to the active site of the consecutive enzyme in the metabolic pathway chain without release into the extra channel space. FIG. 12A illustrates a typical arrangement of an engineered substrate channel. Enzymes E1, E2, and E3 are linked covalently or non-covalently to a scaffold protein (S1, S2, S3) matrix. The substrate is then processed to the product without release to the extra channel space. FIG. 12B illustrates one embodiment of the present invention comprising a modified Type I CRISPR-Cas effector complex that carries Enzymes E1, E2, and E3 as fusion proteins to Cas7 subunit proteins, thus creating a substrate channel. cpCas7 proteins and backbones formed of cpCas7 proteins can also be useful in the practice of this aspect of the present invention.
- In other embodiments, substrate channel elements can be fused to Cas6. The Cas6 subunit of Cascade complexes recognizes specific RNA hairpin structures. An RNA scaffold can be constructed that is composed of multiple Cas6 RNA hairpin structures concatenated together. Cas6 peptides from different Cascade complexes have different recognition sequences. Accordingly, RNA scaffolds can be constructed from multiple orthogonal Cas6 RNA hairpins. By fusing different substrate channel elements to orthogonal Cas6 peptides, substrate channel complexes can be assembled in specific stoichiometry.
- Substrate channel elements can be fused to the N-terminus of Cas6 and/or the C-terminus of Cas6. In addition, circular permutations of Cas6 can be fused to substrate channel elements.
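- The idea of setting enzyme stoichiometry through the arrangement of orthogonal Cas6 hairpins on an RNA scaffold can be sketched as a simple lookup. In the Python sketch below, the Cas6 labels (Cas6_A, Cas6_B, Cas6_C), the enzyme assignments, and the function names are all hypothetical, and each hairpin is assumed to be bound only by its cognate Cas6 fusion (perfect orthogonality and full occupancy).

```python
from collections import Counter

# Hypothetical pairing of orthogonal Cas6 proteins with fused pathway enzymes.
FUSIONS = {
    "Cas6_A": "E1",
    "Cas6_B": "E2",
    "Cas6_C": "E3",
}

def assemble(scaffold):
    """Map each hairpin on the RNA scaffold to the enzyme it recruits.

    `scaffold` is an ordered list of hairpin labels; each hairpin is assumed
    to be bound only by its cognate Cas6 fusion protein.
    """
    return [FUSIONS[hairpin] for hairpin in scaffold]

def stoichiometry(scaffold):
    """Count how many copies of each enzyme one scaffold recruits."""
    return Counter(assemble(scaffold))

# A scaffold designed to recruit E1:E2:E3 at a 1:2:1 ratio.
scaffold = ["Cas6_A", "Cas6_B", "Cas6_B", "Cas6_C"]
print(assemble(scaffold))
print(dict(stoichiometry(scaffold)))
```

Changing the number or order of hairpins in the scaffold changes the recruited enzyme ratio and spatial order without altering the fusion proteins themselves.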
- In some embodiments, a heterologous metabolic pathway of interest can be expressed in a model organism, such as E. coli. When genes are heterologously expressed, they can be codon optimized for more efficient expression.
- In one embodiment, the metabolic pathway of interest is the mevalonate pathway from Saccharomyces cerevisiae. Substrate channel elements of this pathway include, but are not limited to, acetoacetyl-CoA thiolase (AtoB), hydroxy-methylglutaryl-CoA synthase (HMGS), and hydroxy-methylglutaryl-CoA reductase (HMGR).
- In another embodiment, the metabolic pathway of interest is the glycerol synthesis pathway from S. cerevisiae. Substrate channel elements of this pathway include, but are not limited to, glycerol-3-phosphate dehydrogenase (GPD1) and glycerol-3-phosphate phosphatase (GPP2).
- In yet another embodiment, the metabolic pathway of interest is the starch hydrolysis pathway from Clostridium stercorarium. Substrate channel elements of this pathway include, but are not limited to, CelY and CelZ.
- In an additional embodiment, the metabolic pathway of interest is the glucose phosphotransferase pathway from E. coli. Substrate channel elements of this pathway include, but are not limited to, trehalose-6-phosphate synthetase (TPS) and trehalose-6-phosphate phosphatase (TPP).
- In a seventh aspect, the present invention relates to site-directed recruitment of functional domains fused to Cascade subunit proteins by complexes comprising a Class 2 Type II Cas9 protein and a nucleic acid-targeting nucleic acid (NATNA; see, e.g., U.S. Pat. No. 9,260,752, issued 16 Feb. 2016; U.S. Pat. No. 9,580,727, issued 28 Feb. 2017; U.S. Pat. No. 9,677,090, issued 13 Jun. 2017; U.S. Pat. No. 9,771,600, issued 26 Sep. 2017; U.S. Pat. No. 9,816,093, issued 14 Nov. 2017). Functional domains are disclosed herein and include, but are not limited to, protein domains having enzymatic function, capable of transcriptional activation, or capable of transcriptional repression. Example 13 describes a method of modifying a Class 2 Type II CRISPR sgRNA, crRNA, tracrRNA, or crRNA and tracrRNA sequences with a Class 1 Type I CRISPR repeat stem sequence, allowing for the recruitment of one or more Cascade subunit proteins to a Type II CRISPR Cas protein/guide RNA complex binding site.
- FIG. 13A, FIG. 13B, and FIG. 13C present a generalized illustration of the site-directed recruitment of a functional protein domain fused to a Cascade subunit protein by a dCas9:NATNA complex to a target site. A Class 2 Type II CRISPR NATNA (FIG. 13A, 102) comprising a spacer sequence (FIG. 13A, 101) is covalently linked through a linker nucleic acid sequence (FIG. 13A, 103) to a Class 1 Type I CRISPR repeat stem sequence (FIG. 13A, 104). The Type II CRISPR NATNA covalently linked to the Type I CRISPR repeat stem sequence (FIG. 13A, 105) is capable of binding to a Type II dCas9 (FIG. 13A, 106) and a Type I Cascade subunit protein (e.g., Cas6; FIG. 13A, 107), which is fused through a linker sequence (FIG. 13A, 108) to a functional protein domain (e.g., an enzymatic domain, a transcriptional activation or repression domain; FIG. 13A, 109), to form an RNP complex. This RNP complex (FIG. 13B, 110) is capable of targeting a double-stranded DNA (FIG. 13B, 111) comprising a target sequence (FIG. 13B, 112) complementary to the Type II CRISPR NATNA spacer sequence (FIG. 13A, 101). Target recognition by the RNP complex results in hybridization (FIG. 13B, 113) between the spacer sequence (FIG. 13A, 101) and the target sequence (FIG. 13B, 112). Localization of the Cascade subunit-functional domain fusion protein to the DNA allows for modification of the DNA by the functional protein domain or transcriptional regulation of an adjacent gene (FIG. 13C, 114).
- In an eighth aspect, the present invention relates to compositions comprising engineered Type I CRISPR-Cas effector complexes, modified guide polynucleotides, and combinations thereof. In some embodiments, the engineered Type I CRISPR-Cas effector complex comprises an associated Cas3 fusion protein.
- An embodiment of this aspect of the present invention relates to a composition comprising two engineered Type I CRISPR-Cas effector complexes each comprising a spacer and a fusion protein comprising a Cas subunit and an endonuclease (e.g., a FokI; see, e.g., the Cascade complexes of FIG. 2A, FIG. 2B, and FIG. 2C), wherein at least two parameters are varied to modulate genome editing efficiency. Such parameters include:
- the length of a linker polypeptide used to produce the fusion protein comprising a Cas subunit protein and the endonuclease (e.g., FokI); and
- the length of the interspacer distance between the nucleic acid target sequences to which the spacers are capable of binding.
- Guidance is provided herein regarding the amino acid composition and sequence of linker polypeptides.
- One embodiment of this aspect of the present invention is a composition comprising:
- a first engineered Type I CRISPR-Cas effector complex comprising,
- a first Cse2 subunit protein, a first Cas5 subunit protein, a first Cas6 subunit protein, and a first Cas7 subunit protein,
- a first fusion protein comprising a first Cas8 subunit protein and a first FokI, wherein the N-terminus of the first Cas8 subunit protein or the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the C-terminus or N-terminus, respectively, of the first FokI, and wherein the first linker polypeptide has a length of between 10 amino acids and 40 amino acids, and
- a first guide polynucleotide comprising a first spacer capable of binding a first nucleic acid target sequence; and
- a second engineered Type I CRISPR-Cas effector complex comprising,
- a second Cse2 subunit protein, a second Cas5 subunit protein, a second Cas6 subunit protein, and a second Cas7 subunit protein,
- a second fusion protein comprising a second Cas8 subunit protein and a second FokI, wherein the N-terminus of the second Cas8 subunit protein or the C-terminus of the second Cas8 subunit protein is covalently connected by a second linker polypeptide to the C-terminus or N-terminus, respectively, of the second FokI, and wherein the second linker polypeptide has a length of between 10 amino acids and 40 amino acids, and
- a second guide polynucleotide comprising a second spacer capable of binding a second nucleic acid target sequence, wherein a protospacer adjacent motif (PAM) of the second nucleic acid target sequence and a PAM of the first nucleic acid target sequence have an interspacer distance between 20 bp and 42 bp.
- Examples of such a first engineered Type I CRISPR-Cas effector complex bound to a first nucleic acid target sequence and a second engineered Type I CRISPR-Cas effector complex bound to a second nucleic acid target sequence are illustrated in FIG. 2A, FIG. 2B, and FIG. 2C.
- In some embodiments, the length of the first linker polypeptide and/or the second linker polypeptide is between about 15 amino acids and about 30 amino acids, or between about 17 amino acids and about 20 amino acids. In one embodiment, the lengths of the first linker polypeptide and the second linker polypeptide are the same.
- The first Cas8 subunit protein and the second Cas8 subunit protein can each comprise identical amino acid sequences of the Cas8 subunit protein.
- Similarly, the first Cse2 subunit protein and the second Cse2 subunit protein can each comprise identical amino acid sequences of the Cse2 subunit protein, the first Cas5 subunit protein and the second Cas5 subunit protein can each comprise identical amino acid sequences of the Cas5 subunit protein, the first Cas6 subunit protein and the second Cas6 subunit protein can each comprise identical amino acid sequences of the Cas6 subunit protein, the first Cas7 subunit protein and the second Cas7 subunit protein can each comprise identical amino acid sequences of the Cas7 subunit protein, and combinations thereof.
- Typically, the N-terminus of the first Cas8 subunit protein is covalently connected by the first linker polypeptide to the C-terminus of the first FokI, the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the N-terminus of the first FokI, the N-terminus of the second Cas8 subunit protein is covalently connected by the second linker polypeptide to the C-terminus of the second FokI, the C-terminus of the second Cas8 subunit protein is covalently connected by a second linker polypeptide to the N-terminus of the second FokI, and combinations thereof.
- Embodiments of this aspect of the present invention include embodiments wherein the length between the second nucleic acid target sequence and the first nucleic acid target sequence is an interspacer distance between about 22 bp and about 40 bp, between about 26 bp and about 36 bp, between about 29 bp and about 35 bp, or between about 30 bp and about 34 bp.
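- The two design parameters recited above (linker length of 10 to 40 amino acids and interspacer distance of 20 bp to 42 bp) lend themselves to a simple range check when screening candidate paired designs. The following Python sketch is illustrative only: the class and field names are hypothetical, and treating the range endpoints as inclusive is an assumption rather than something stated in the text.

```python
from dataclasses import dataclass

# Broadest ranges stated in the text; inclusive endpoints are an assumption.
LINKER_RANGE = (10, 40)        # amino acids
INTERSPACER_RANGE = (20, 42)   # base pairs

@dataclass
class PairedDesign:
    first_linker_aa: int
    second_linker_aa: int
    interspacer_bp: int

    def problems(self):
        """Return a list of human-readable range violations (empty if none)."""
        issues = []
        for label, length in (("first linker", self.first_linker_aa),
                              ("second linker", self.second_linker_aa)):
            if not LINKER_RANGE[0] <= length <= LINKER_RANGE[1]:
                issues.append(f"{label} length {length} aa outside {LINKER_RANGE}")
        if not INTERSPACER_RANGE[0] <= self.interspacer_bp <= INTERSPACER_RANGE[1]:
            issues.append(
                f"interspacer {self.interspacer_bp} bp outside {INTERSPACER_RANGE}"
            )
        return issues

print(PairedDesign(30, 30, 32).problems())  # design within both ranges
print(PairedDesign(5, 30, 50).problems())   # short linker and long interspacer
```

The narrower preferred ranges described above could be checked in the same way by substituting different range constants.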
- The first FokI and the second FokI can be monomeric subunits that are capable of associating to form a homodimer, or distinct subunits that are capable of associating to form a heterodimer.
- In a preferred embodiment, the guide polynucleotides comprise RNA.
- In some embodiments, genomic DNA comprises the PAM of the second nucleic acid target sequence and the PAM of the first nucleic acid target sequence.
- In some embodiments, the engineered Type I CRISPR-Cas effector complexes are based on Type I CRISPR-Cas effector complexes of one or more organisms selected from the group consisting of Salmonella enterica, Geothermobacter sp. EPR-M, Methanocella arvoryzae MRE50, Streptococcus thermophilus (strain ND07), S. thermophilus, Pseudomonas sp. S-6-2, and E. coli. In preferred embodiments, the engineered Type I CRISPR-Cas effector complexes are based on Type I CRISPR-Cas effector complexes of S. thermophilus, Pseudomonas sp. S-6-2, and/or E. coli.
- The data presented in Example 18 and Example 20 demonstrate that varying the length of the linker polypeptide used to produce the fusion protein comprising the Cas subunit protein and the FokI and/or varying the length of the interspacer distance between the nucleic acid target sequences to which the spacers are capable of binding facilitate modulation of genome editing efficiency in cells.
- In yet another embodiment, the present invention relates to an engineered Type I CRISPR-Cas effector complex comprising a first fusion protein that comprises a Cascade subunit protein (e.g., a Cas8 subunit protein) and a first functional domain (e.g., FokI), and a second fusion protein that comprises a dCas3* protein and a second functional domain (e.g., FokI). The engineered Type I CRISPR-Cas effector complex comprising the first functional domain (e.g., FokI) (FIG. 14A, Cas8-linker1-FP1 fusion) can bind DNA and can then recruit the dCas3*-second functional domain (e.g., FokI) fusion protein (FIG. 14A, dCas3*-linker2-FP2). In the case where the first functional domain (FIG. 14A, Cas8-linker1-FP1 fusion) and the second functional domain (FIG. 14A, dCas3*-linker2-FP2) comprise subunits of a dimeric protein, the dCas3*-second functional domain (e.g., FokI) fusion protein binds the engineered Type I CRISPR-Cas effector complex comprising the first functional domain (e.g., FokI), facilitating dimerization of the first functional domain and the second functional domain (FIG. 14A). FIG. 15A illustrates the binding to dsDNA of an engineered Type I CRISPR-Cas effector complex (FIG. 15A, Cascade) comprising the first functional domain (FIG. 15A, FD1) connected to a Cas subunit protein (FIG. 15A, striped box) via a linker polypeptide (FIG. 15A, Linker 1) and a dCas3* connected to a second functional domain (FIG. 15A, FD2) via a linker polypeptide (FIG. 15A, Linker 2) associated with the Cascade complex, thus bringing FD1 and FD2 into proximity and facilitating the interaction of FD1 and FD2. Binding of the Cascade complex involves a single PAM sequence (FIG. 15A, PAM, open box). In the case of the functional domain being a dimeric endonuclease (e.g., FokI), the proximity of FD1 and FD2 facilitates formation of a functional dimer.
- One advantage of this embodiment of the present invention is that a single Cascade complex (recognizing a single PAM sequence) can be used to cleave a double-stranded nucleic acid target sequence, versus using two FokI-Cascade complexes (compare FIG. 15A with FIG. 2A, FIG. 2B, and FIG. 2C). Using two FokI-Cascade complexes requires two PAM sequences in the proper orientation (FIG. 2A, FIG. 2B, and FIG. 2C), which can limit selection of proximal nucleic acid target sequences.
- The length and/or composition of the linker polypeptide used to produce the fusion protein comprising a Cas subunit protein and an endonuclease (e.g., FokI), as well as the length and/or composition of the linker polypeptide used to produce the fusion protein comprising a dCas3* protein and an endonuclease, can be varied to modulate genome editing efficiency. Example 21 describes the design and testing of multiple Cas3-FokI linker compositions and lengths and FokI-Cas8 linker compositions and lengths for modulation of genome editing efficiency.
- Another embodiment of this aspect of the invention comprises an engineered Type I CRISPR-Cas effector complex and a fusion protein comprising a dCas3* protein and a functional domain (e.g., cytidine deaminase) connected by a linker polypeptide (FIG. 14B, dCas3*, Linker, and FP). The engineered Type I CRISPR-Cas effector complex can bind DNA and recruit the dCas3*-functional domain (e.g., cytidine deaminase) fusion protein. This embodiment can facilitate site-specific targeting of a nucleic acid target sequence for modification by, or interaction with, a functional domain. In the case of cytidine deaminase, an engineered Type I CRISPR-Cas effector complex and a fusion protein that comprises a dCas3* protein and cytidine deaminase can be used for site-specific base editing in a nucleic acid target sequence. FIG. 15B illustrates an example of an engineered Type I CRISPR-Cas effector complex (FIG. 15B, Cascade) comprising a fusion protein comprising a dCas3* protein (FIG. 15B, dCas3*) connected with a functional domain (FIG. 15B, FD) via a linker polypeptide (FIG. 15B, Linker), wherein the complex is bound to dsDNA. In FIG. 15B, contact of the functional domain with dsDNA is facilitated. FIG. 15C illustrates another example of an engineered Type I CRISPR-Cas effector complex (FIG. 15C, Cascade) comprising a fusion protein comprising a dCas3* protein (FIG. 15C, dCas3*) connected with a functional domain (FIG. 15C, FD) via a linker polypeptide (FIG. 15C, Linker), wherein the complex is bound to dsDNA. In FIG. 15C, contact of the functional domain with ssDNA is facilitated.
- Some embodiments of the invention can use an engineered Type I CRISPR-Cas effector complex and a mutant form of Cas3 lacking ATPase and/or helicase activity (e.g., the Cas3 can be a nickase). The engineered Type I CRISPR-Cas effector complexes can bind DNA and then recruit the ATPase or helicase mutant form of Cas3. This embodiment can facilitate site-specific cleavage of genomic DNA by a mutant form of Cas3.
- Additional functional domains and proteins that can be used to construct fusion proteins with Type I CRISPR-Cas subunit proteins are described in the present Specification and Examples. Linker polypeptide compositions and lengths for Cas3-linker polypeptide-functional domain fusion proteins can be evaluated following the guidance of Example 21 and the present Specification to evaluate effects on the performance of the functional domain.
- In a ninth aspect, the present invention relates to methods of using engineered Type I CRISPR-Cas effector complexes.
- In one embodiment, the present invention includes a method of binding a nucleic acid target sequence in a polynucleotide (e.g., dsDNA) comprising providing one or more engineered Type I CRISPR-Cas effector complexes for introduction into a cell or a biochemical reaction and introducing the engineered Type I CRISPR-Cas effector complex(es) into the cell or biochemical reaction, thereby facilitating contact of the engineered Type I CRISPR-Cas effector complex(es) with the polynucleotide. In one embodiment, a first engineered Type I CRISPR-Cas effector complex comprises a guide complementary to a first nucleic acid target sequence in the polynucleotide and a second engineered Type I CRISPR-Cas effector complex comprises a guide complementary to a second nucleic acid target sequence in the polynucleotide. In another embodiment, an engineered Type I CRISPR-Cas effector complex comprises a guide complementary to a nucleic acid target sequence in the polynucleotide and further comprises a dCas3* fusion protein capable of associating with the complex. Contact of the complex(es) with the polynucleotide results in binding of the engineered Type I CRISPR-Cas effector complex(es) to the nucleic acid target sequence(s) in the polynucleotide. In one embodiment, a first engineered Type I CRISPR-Cas effector complex binds to a first nucleic acid target sequence and a second engineered Type I CRISPR-Cas effector complex binds to a second nucleic acid target sequence in the polynucleotide. In another embodiment, an engineered Type I CRISPR-Cas effector complex binds to a nucleic acid target sequence in the polynucleotide, and the effector complex comprises a dCas3* fusion protein associated with the complex.
- Such methods of binding a nucleic acid target sequence can be carried out in vitro (e.g., in a biochemical reaction or in cultured cells; in some embodiments, the cultured cells are human cultured cells that remain in culture and are not introduced into a human); in vivo (e.g., in cells of a living organism, with the proviso that, in some embodiments, the organism is a non-human organism); or ex vivo (e.g., cells removed from a subject, with the proviso that, in some embodiments, the subject is a non-human subject).
- A variety of methods are known in the art to evaluate and/or quantitate interactions between nucleic acid sequences and polypeptides including, but not limited to, the following: chromatin immunoprecipitation (ChIP) assays, DNA electrophoretic mobility shift assays (EMSA), DNA pull-down assays, and microplate capture and detection assays. Commercial kits, materials, and reagents are available to practice many of these methods and, for example, can be obtained from the following suppliers: Thermo Scientific (Wilmington, Del.), Signosis (Santa Clara, Calif.), Bio-Rad (Hercules, Calif.), and Promega (Madison, Wis.). A common approach to detect interactions between a polypeptide and a nucleic acid sequence is EMSA (see, e.g., Hellman L. M., et al., Nature Protocols 2(8):1849-1861 (2007)).
- In another embodiment, the present invention includes a method of cutting a nucleic acid target sequence in a polynucleotide (e.g., a single-strand cut in dsDNA or double-strand cut in dsDNA) comprising providing one or more engineered Type I CRISPR-Cas effector complexes for introduction into a cell or biochemical reaction, and introducing the engineered Type I CRISPR-Cas effector complex(es) into the cell or biochemical reaction, thereby facilitating contact of the engineered Type I CRISPR-Cas effector complex(es) with the polynucleotide. In one embodiment, a first engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a first nucleic acid target sequence in the polynucleotide and a first nuclease domain (e.g., FokI) (FIG. 16A, Cascade1), and a second engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a second nucleic acid target sequence in the polynucleotide and a second nuclease domain (e.g., FokI) (FIG. 16A, Cascade2) are introduced into the cell or biochemical reaction. In another embodiment, an engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a nucleic acid target sequence in the polynucleotide and a first nuclease domain (e.g., FokI) (FIG. 17A, Cascade), and a dCas3*-second nuclease domain (e.g., FokI) fusion protein (FIG. 17A, dCas3) capable of associating with the complex are introduced into the cell or biochemical reaction. The contacting results in cutting of the nucleic acid target sequence(s) in the polynucleotide (e.g., a dsDNA) by the engineered Type I CRISPR-Cas effector complex(es). In one embodiment, the first engineered Type I CRISPR-Cas effector complex binds to the first nucleic acid target sequence in dsDNA (FIG. 16B, Cascade1) and cleaves the first strand of a dsDNA (FIG. 16C, Cascade1), and the second engineered Type I CRISPR-Cas effector complex binds to the second nucleic acid target sequence in dsDNA (FIG. 16B, Cascade2) and cleaves the second strand of a dsDNA (FIG. 16C, Cascade2). In another embodiment, the engineered Type I CRISPR-Cas effector complex binds to a nucleic acid target sequence in dsDNA (FIG. 17B, Cascade) and cleaves the first strand of a dsDNA (FIG. 17C, Cascade), and the dCas3* fusion protein associates with the complex (FIG. 17B, dCas3*) and cleaves the second strand of the dsDNA (FIG. 17C, dCas3*).
- In an additional embodiment of the method of cutting a nucleic acid target sequence in a polynucleotide, a donor polynucleotide can also be introduced into a cell to facilitate incorporation of at least a portion of the donor polynucleotide into genomic DNA of the cell. FIG. 18A illustrates an example of both strands of a dsDNA being cleaved by a first engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a first nucleic acid target sequence in the polynucleotide and a first nuclease domain (e.g., FokI) (FIG. 18A, Cascade1), and a second engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a second nucleic acid target sequence in the polynucleotide and a second nuclease domain (e.g., FokI) (FIG. 18A, Cascade2). FIG. 18B illustrates a donor polynucleotide comprising homology arms complementary to DNA sequences adjacent the double-strand cut site (FIG. 18B, Donor, dashed lines). FIG. 18C illustrates incorporation of a portion of the donor polynucleotide (FIG. 18C, dashed lines) at the double-strand cut site. Incorporation of the donor polynucleotide is mediated by cellular DNA repair mechanisms (e.g., homology-directed repair).
- In other embodiments, an engineered Type I CRISPR-Cas effector complex comprising a guide complementary to a first nucleic acid target sequence in a polynucleotide and a first nuclease domain can be paired with a second component comprising a second nuclease domain, wherein the second component is capable of binding to a second nucleic acid target sequence in the polynucleotide. Examples of such second components include a transcription activator-like effector nuclease (TALEN) comprising the second nuclease domain, a zinc finger comprising the second nuclease domain, or a dCas9/NATNA complex comprising the second nuclease domain.
- In some embodiments, the nucleic acid target sequence is dsDNA (e.g., genomic DNA). In some embodiments, the nucleic acid target sequence is double-stranded and one or both of the strands is cut. Such methods of cutting a nucleic acid target sequence can be carried out in vitro, in vivo, or ex vivo.
- In yet another embodiment, the present invention includes a method of modifying one or more nucleic acid target sequences in a polynucleotide (e.g., DNA) in a cell or biochemical reaction comprising providing one or more engineered Type I CRISPR-Cas effector complexes (e.g., comprising a Cas subunit protein-cytidine deaminase fusion protein) for introduction into the cell or the biochemical reaction, and introducing the engineered Type I CRISPR-Cas effector complex(es) into the cell or biochemical reaction, thereby facilitating contact of the engineered Type I CRISPR-Cas effector complex(es) with the polynucleotide resulting in binding of the engineered Type I CRISPR-Cas effector complex(es) to the nucleic acid target sequence(s) in the polynucleotide that facilitates modification of the nucleic acid target sequence(s) (e.g., C-to-T, G-to-A, A-to-G, and T-to-C).
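- The base changes named above can be made concrete with a short sketch. The following Python code is an illustration only and is not part of the disclosed method: it models a C-to-T change at a chosen position on one strand of a dsDNA written as a string, and shows that the same edit reads as a G-to-A change on the complementary strand. The function names and the example sequence are hypothetical.

```python
# Illustrative sketch only: models the outcome of a C-to-T edit on a
# double-stranded DNA represented as a 5'->3' string of one strand.
# All names and the example sequence are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def c_to_t_edit(seq: str, position: int) -> str:
    """Return seq with the cytosine at 0-based `position` replaced by thymine."""
    if seq[position] != "C":
        raise ValueError(f"expected C at position {position}, found {seq[position]}")
    return seq[:position] + "T" + seq[position + 1:]


if __name__ == "__main__":
    top = "GATCCGTA"                 # hypothetical target-region sequence
    edited = c_to_t_edit(top, 3)     # edit the C at index 3
    print(top, "->", edited)
    # The same edit seen on the opposite strand is a G-to-A change.
    print(reverse_complement(top), "->", reverse_complement(edited))
```

The sketch records only the final sequence outcome; it does not model the uracil intermediate or the cellular repair steps.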
- FIG. 19A to FIG. 19D illustrate an example of using a Cascade complex comprising a Cas subunit protein-linker polypeptide-cytidine deaminase fusion protein (Cascade/CD complex) to modify a target nucleotide in genomic DNA of a cell. The Cascade/CD complex (FIG. 19A) is introduced into the cell. The Cascade/CD complex comprises a guide complementary to a DNA target sequence adjacent a target cytosine (FIG. 19B, FIG. 19C). The Cascade/CD complex binds the DNA target sequence (FIG. 19B) and the cytidine deaminase converts the cytosine to a uracil (FIG. 19C). Cellular repair mechanisms can then replace the uracil with a thymine and change the mismatched guanine to adenine (FIG. 19D).
- In yet another embodiment, the present invention includes methods of modulating in vitro or in vivo transcription, for example, transcription of a gene comprising regulatory element sequences. Such methods comprise providing one or more engineered Type I CRISPR-Cas effector complexes (e.g., comprising a Cas subunit protein-transcription factor fusion protein) for introduction into the cell or the biochemical reaction, and introducing the engineered Type I CRISPR-Cas effector complex(es) into the cell or biochemical reaction, thereby facilitating contact of the engineered Type I CRISPR-Cas effector complex(es) with the regulatory element sequences. The contact results in binding of the engineered Type I CRISPR-Cas effector complex(es) to the regulatory element sequences, thereby facilitating modulation of in vitro or in vivo transcription of the gene comprising the regulatory element sequences.
- FIG. 20A and FIG. 20B present general illustrations of examples for the transcriptional activation of a generic gene (“GENE1”). FIG. 20A provides an overview of transcriptional regulation of an endogenous gene in a eukaryotic cell. In FIG. 20A, the two dark parallel lines represent double-stranded DNA, the location of Gene 1 (FIG. 20A, GENE 1) is indicated, as well as the transcriptional start site (FIG. 20A, TSS) associated with Gene 1. In the first panel of FIG. 20A, a transcription factor (FIG. 20A, TF) that is needed for the transcriptional activation of Gene 1 and polymerase II (FIG. 20A, Pol II) are illustrated as not yet associated with Gene1-TSS. The second panel illustrates association of the TF with its cognate TSS. The TF then recruits a transcription activation protein (TP) that then recruits RNA Polymerase II (Pol II). Typically, in eukaryotes the TF and the TP form a complex comprising multiple proteins and possibly other molecules. The third panel illustrates the resulting transcription of Gene 1 by Pol II. This type of transcriptional activation is typically dependent on TF(s) that are specific to the expression of a gene(s). FIG. 20B presents an illustration of one embodiment of the present invention, wherein a Cascade complex is modified with a protein or factor (FIG. 20B, CASCADEa) that attracts one or more components in the cells responsible for transcriptional activation (Transcriptional Activation factor; FIG. 20B, TA). An example of one such protein or factor is the protein vp64. CASCADEa comprises a guide that is capable of binding at or near the TSS (FIG. 20B, TSS). In FIG. 20B, the two dark parallel lines represent double-stranded DNA, the location of Gene 1 (FIG. 20B, GENE 1) is indicated, as well as the transcriptional start site (TSS) associated with Gene 1. In the first panel of FIG. 20B, CASCADEa and polymerase II (FIG. 20B, Pol II) are illustrated as not yet associated with Gene1-TSS. The second panel illustrates association of CASCADEa with its target, the TSS. The CASCADEa then recruits a transcription activation protein (FIG. 20B, TA) that then recruits RNA Polymerase II (FIG. 20B, Pol II). The third panel illustrates the resulting transcription of Gene 1 by Pol II. One advantage of this embodiment of the present invention is that transcriptional activation of a gene is not dependent on endogenous transcription factors that bind to the TSS of the gene; rather, the TSS of a gene can be targeted by selection of an appropriate Cascade guide.
- FIG. 21A and FIG. 21B present a general illustration of an example for the transcriptional repression of a generic gene (FIG. 21A, GENE 1) using a Cascade complex comprising a Cas subunit protein-KRAB domain fusion and a guide (FIG. 21A, CASCADEi) complementary to regulatory sequences (FIG. 21A, promoter) associated with GENE 1. Binding of CASCADEi to the regulatory sequences (FIG. 21B) results in transcriptional repression of GENE 1.
- In yet another aspect, the present invention relates to using Type I CRISPR systems and Cas3 to delete nucleic acid target sequences in a 3′ to 5′ manner. This method can be used to make long range deletions of a specific length and can be useful for creation of gene knockouts.
- In one embodiment, a region of a target polynucleotide (e.g., genomic DNA) can be deleted using a combination of a Cascade complex comprising a guide complementary to a first nucleic acid target sequence in the target polynucleotide and a dCas9/NATNA complex wherein the NATNA comprises a spacer sequence complementary to a second nucleic acid target sequence in the target polynucleotide. The first and second nucleic acid target sequences are selected to flank the nucleic acid target sequence targeted for deletion. A Cas3 protein comprising an active endonuclease activity associates with the Cascade complex and then progressively deletes a single strand of the dsDNA comprising the nucleic acid target sequence targeted for deletion. When the Cas3 protein collides with the dCas9/NATNA complex, the Cas3 nuclease activity can be stopped at the second nucleic acid target sequence by the dCas9/NATNA complex.
- FIG. 22A to FIG. 22D illustrate an example of a Cas3 deletion of a nucleic acid target sequence. FIG. 22A shows a dsDNA comprising nucleic acid target sequence 1 (FIG. 22A, NATS1) and nucleic acid target sequence 2 (FIG. 22A, NATS2) that flank the nucleic acid target sequence targeted for deletion. FIG. 22A shows the Cascade complex comprising a guide complementary to NATS1 (FIG. 22A, Cascade), the Cas3 protein (FIG. 22A, Cas3), and the dCas9/NATNA complex comprising a spacer complementary to NATS2 (FIG. 22A, dCas9). FIG. 22B shows binding of the Cascade complex to NATS1, association of the Cas3 protein with the Cascade complex, and binding of the dCas9/NATNA complex to NATS2. FIG. 22C illustrates the progressive deletion by Cas3 of a single strand of the nucleic acid target sequence targeted for deletion. FIG. 22D shows the dissociation of the Cas3 protein from the dsDNA at the position of the dCas9/NATNA complex bound to NATS2.
- In another embodiment, a region of a target polynucleotide (e.g., genomic DNA) can be deleted using a combination of a first Cascade complex comprising a guide complementary to a first nucleic acid target sequence in the target polynucleotide and a second Cascade complex comprising a guide complementary to a second nucleic acid target sequence in the target polynucleotide. The first and second nucleic acid target sequences are selected to flank the nucleic acid target sequence targeted for deletion. Cas3 proteins comprising active endonuclease activity associate with each Cascade complex and then progressively delete both strands of the nucleic acid target sequence targeted for deletion. When each Cas3 protein collides with one of the Cascade complexes, the Cas3 nuclease activity can be stopped at the first and second nucleic acid target sequences by the Cascade complexes.
- FIG. 23A to FIG. 23D illustrate an example of a Cas3 deletion of both strands of a nucleic acid target sequence. FIG. 23A shows a dsDNA comprising nucleic acid target sequence 1 (FIG. 23A, NATS1) and nucleic acid target sequence 2 (FIG. 23A, NATS2) that flank the nucleic acid target sequence targeted for deletion. FIG. 23A shows the first Cascade complex comprising a guide complementary to NATS1 (FIG. 23A, Cascade1), the Cas3 proteins (FIG. 23A, Cas3), and the second Cascade complex comprising a guide complementary to NATS2 (FIG. 23A, Cascade2). FIG. 23B shows binding of the Cascade complexes to NATS1 and NATS2, as well as association of the Cas3 proteins with the Cascade complexes. FIG. 23C illustrates the progressive deletion by Cas3 of both strands of the nucleic acid target sequence targeted for deletion. FIG. 23D shows the dissociation of the Cas3 proteins from the dsDNA at the positions of the Cascade complexes bound to NATS1 and NATS2.
- The engineered Type I CRISPR-Cas effector complexes, as described herein, can be incorporated into a kit. In some embodiments, a kit includes a package with one or more containers holding the kit elements, as one or more separate compositions or, optionally if the compatibility of the components allows, as admixture. In some embodiments, a kit also comprises one or more of the following excipients: a buffer, a buffering agent, a salt, a sterile aqueous solution, a preservative, and combinations thereof. Illustrative kits can comprise one or more engineered Type I CRISPR-Cas effector complexes and one or more excipients, or one or more nucleic acid sequences encoding one or more components of engineered Type I CRISPR-Cas effector complexes.
- Furthermore, kits can further comprise instructions for using engineered Type I CRISPR-Cas effector complex compositions.
- Another aspect of the invention relates to methods of making or manufacturing one or more engineered Type I CRISPR-Cas effector complexes, or components thereof. In one embodiment, a method of making or manufacturing comprises production of engineered Type I CRISPR-Cas effector complexes in a cell and purification of the engineered Type I CRISPR-Cas effector complexes from cell lysates.
- Engineered Type I CRISPR-Cas effector complex compositions can further comprise a detectable label, such as a moiety that can provide a detectable signal. Examples of detectable labels include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore (FAM), a fluorescent protein (green fluorescent protein (GFP), red fluorescent protein, mCherry, tdTomato), a DNA or RNA aptamer together with a suitable fluorophore (enhanced GFP (eGFP), “Spinach”), a quantum dot, an antibody, and the like. A large number and variety of suitable detectable labels are well-known to one of ordinary skill in the art.
- Cells comprising engineered Type I CRISPR-Cas effector complexes, cells modified through the use of engineered Type I CRISPR-Cas effector complexes, or progeny of such cells can be used as pharmaceutical compositions formulated, for example, with a pharmaceutically acceptable excipient. Illustrative excipients include carriers, stabilizers, diluents, dispersing agents, suspending agents, thickening agents, and the like. The pharmaceutical compositions can facilitate administration of engineered Type I CRISPR-Cas effector complexes to a subject. Pharmaceutical compositions can be administered in therapeutically effective amounts by various forms and routes including, for example, intravenous, subcutaneous, intramuscular, oral, aerosol, parenteral, ophthalmic, and pulmonary administration.
- Embodiments of the present invention include, but are not limited to, the following.
- A composition comprising:
- a first engineered Class 1 Type I CRISPR-Cas effector complex comprising,
- a first Cse2 subunit protein, a first Cas5 subunit protein, a first Cas6 subunit protein, and a first Cas7 subunit protein,
- a first fusion protein comprising a first Cas8 subunit protein and a first FokI, wherein the N-terminus of the first Cas8 subunit protein or the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the C-terminus or N-terminus, respectively, of the first FokI, and wherein the first linker polypeptide has a length of between 10 amino acids to 40 amino acids, and
- a first guide polynucleotide comprising a first spacer capable of binding a first nucleic acid target sequence; and
- a second engineered Class 1 Type I CRISPR-Cas effector complex comprising,
- a second Cse2 subunit protein, a second Cas5 subunit protein, a second Cas6 subunit protein, and a second Cas7 subunit protein,
- a second fusion protein comprising a second Cas8 subunit protein and a second FokI, wherein the N-terminus of the second Cas8 subunit protein or the C-terminus of the second Cas8 subunit protein is covalently connected by a second linker polypeptide to the C-terminus or N-terminus, respectively, of the second FokI, and wherein the second linker polypeptide has a length of between 10 amino acids to 40 amino acids, and
- a second guide polynucleotide comprising a second spacer capable of binding a second nucleic acid target sequence, wherein a protospacer adjacent motif (PAM) of the second nucleic acid target sequence and a PAM of the first nucleic acid target sequence have an interspacer distance between 20 bp to 42 bp.
- The composition of embodiment 1, wherein the first linker polypeptide has a length of between 15 amino acids and 30 amino acids.
- The composition of embodiment 2, wherein the first linker polypeptide has a length of between 17 amino acids and 20 amino acids.
- The composition of any one of embodiments 1-3, wherein the second linker polypeptide has a length of between 15 amino acids and 30 amino acids.
- The composition of embodiment 4, wherein the second linker polypeptide has a length of between 17 amino acids and 20 amino acids.
- The composition of any preceding embodiment, wherein the length of the first linker polypeptide and the second linker polypeptide are the same.
- The composition of any preceding embodiment, wherein the second nucleic acid target sequence and the first nucleic acid target sequence each has an interspacer distance between 22 bp to 40 bp.
- The composition of embodiment 7, wherein the second nucleic acid target sequence and the first nucleic acid target sequence each has an interspacer distance between 26 bp to 36 bp.
- The composition of embodiment 8, wherein the second nucleic acid target sequence and the first nucleic acid target sequence each has an interspacer distance between 29 bp to 35 bp.
- The composition of embodiment 9, wherein the second nucleic acid target sequence and the first nucleic acid target sequence each has an interspacer distance between 30 bp to 34 bp.
- The composition of any preceding embodiment, wherein the first FokI and the second FokI are monomeric subunits capable of associating to form a homodimer.
- The composition of any one of embodiments 1-10, wherein the first FokI and the second FokI are distinct monomeric subunits capable of associating to form a heterodimer.
- The composition of any preceding embodiment, wherein the N-terminus of the first Cas8 subunit protein is covalently connected by the first linker polypeptide to the C-terminus of the first FokI.
- The composition of any one of embodiments 1-12, wherein the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the N-terminus of the first FokI.
- The composition of any preceding embodiment, wherein the N-terminus of the second Cas8 subunit protein is covalently connected by the second linker polypeptide to the C-terminus of the second FokI.
- The composition of any one of embodiments 1-14, wherein the C-terminus of the second Cas8 subunit protein is covalently connected by a second linker polypeptide to the N-terminus of the second FokI.
- The composition of any preceding embodiment, wherein the first Cas8 subunit protein and the second Cas8 subunit protein each comprises identical amino acid sequences.
- The composition of any preceding embodiment, wherein the first Cse2 subunit protein and the second Cse2 subunit protein each comprises identical amino acid sequences, the first Cas5 subunit protein and the second Cas5 subunit protein each comprises identical amino acid sequences, the first Cas6 subunit protein and the second Cas6 subunit protein each comprises identical amino acid sequences, and the first Cas7 subunit protein and the second Cas7 subunit protein each comprises identical amino acid sequences.
- The composition of any preceding embodiment, wherein the first guide polynucleotide comprises RNA.
- The composition of any preceding embodiment, wherein the second guide polynucleotide comprises RNA.
- The composition of any preceding embodiment, wherein genomic DNA comprises the PAM of the second nucleic acid target sequence and the PAM of the first nucleic acid target sequence.
- A cell comprising: the composition of any preceding embodiment.
- The cell of embodiment 22, wherein genomic DNA of the cell comprises the PAM of the second nucleic acid target sequence and the PAM of the first nucleic acid target sequence.
- The cell of embodiment
- The cell of embodiment
- One or more nucleic acid sequences encoding the first Cse2 subunit protein, the first Cas5 subunit protein, the first Cas6 subunit protein, the first Cas7 subunit protein, the first fusion protein, and the first guide polynucleotide of any one of embodiments 1-21.
- One or more nucleic acid sequences encoding the second Cse2 subunit protein, the second Cas5 subunit protein, the second Cas6 subunit protein, the second Cas7 subunit protein, the second fusion protein, and the second guide polynucleotide of any one of embodiments 1-21.
- One or more expression cassettes comprising the one or more nucleic acid sequences of embodiment 26, embodiment 27, or embodiment 26 and embodiment 27.
- One or more vectors comprising the one or more expression cassettes of embodiment 28.
- A method of binding a polynucleotide comprising the first nucleic acid target sequence and the second nucleic acid target sequence, the method comprising:
- providing the composition of any one of embodiments 1-21 for introduction into a cell or a biochemical reaction; and
- introducing the composition into the cell or the biochemical reaction, thereby facilitating contact of the first engineered Class 1 Type I CRISPR-Cas effector complex with the first nucleic acid target sequence and contact of the second engineered Class 1 Type I CRISPR-Cas effector complex with the second nucleic acid target sequence, resulting in binding of the first engineered Class 1 Type I CRISPR-Cas effector complex with the first nucleic acid target sequence and binding of the second engineered Class 1 Type I CRISPR-Cas effector complex with the second nucleic acid target sequence in the polynucleotide.
- The method of embodiment 30, wherein genomic DNA comprises the polynucleotide.
- A method of cutting a polynucleotide comprising the first nucleic acid target sequence and the second nucleic acid target sequence, the method comprising:
- providing the composition of any one of embodiments 1-21 for introduction into a cell or a biochemical reaction; and
- introducing the composition into the cell or the biochemical reaction, thereby facilitating contact of the first engineered Class 1 Type I CRISPR-Cas effector complex with the first nucleic acid target sequence and contact of the second engineered Class 1 Type I CRISPR-Cas effector complex with the second nucleic acid target sequence, resulting in cutting of the first nucleic acid target sequence by the first engineered Class 1 Type I CRISPR-Cas effector complex and cutting of the second nucleic acid target sequence by the second engineered Class 1 Type I CRISPR-Cas effector complex.
- The method of embodiment 32, wherein genomic DNA comprises the polynucleotide.
- A kit comprising: the composition of any one of embodiments 1-21; and a buffer.
- A kit comprising: the one or more nucleic acid sequences of embodiment 26, embodiment 27, or embodiment 26 and embodiment 27; and a buffer.
- A composition comprising:
- an engineered Class 1 Type I CRISPR-Cas effector complex comprising,
- a Cse2 subunit protein, a Cas5 subunit protein, a Cas6 subunit protein, and a Cas7 subunit protein,
- a first fusion protein comprising a Cas8 subunit protein and a first FokI, wherein the N-terminus of the first Cas8 subunit protein or the C-terminus of the first Cas8 subunit protein is covalently connected by a first linker polypeptide to the C-terminus or N-terminus, respectively, of the first FokI, and
- a guide polynucleotide comprising a spacer capable of binding a nucleic acid target sequence; and
- a second fusion protein comprising an engineered Class 1 Type I CRISPR-Cas3 fusion protein comprising a dCas3* protein and a second FokI, wherein the N-terminus of the dCas3* protein or the C-terminus of the dCas3* protein is covalently connected by a second linker polypeptide to the C-terminus or N-terminus, respectively, of the second FokI, and wherein the first linker polypeptide has a length of between 10 amino acids to 40 amino acids.
- The composition of embodiment 36, wherein the first linker polypeptide has a length of between 5 amino acids to 40 amino acids.
- The composition of embodiment 36, wherein the first linker polypeptide has a length of between 5 amino acids to 40 amino acids.
- A cell comprising: the composition of any one of embodiments 36 to 38.
- The cell of embodiment 39, wherein the cell is a prokaryotic cell.
- The cell of embodiment 39, wherein the cell is a eukaryotic cell.
- One or more nucleic acid sequences encoding the Cse2 subunit protein, the Cas5 subunit protein, the Cas6 subunit protein, the Cas7 subunit protein, the first fusion protein, and the guide polynucleotide of any one of embodiments 36 to 38.
- One or more nucleic acid sequences encoding the second fusion protein of any one of embodiments 36 to 38.
- One or more expression cassettes comprising the one or more nucleic acid sequences of embodiment 42, embodiment 43, or embodiment 42 and embodiment 43.
- One or more vectors comprising the one or more expression cassettes of embodiment 44.
- A method of binding a polynucleotide comprising the nucleic acid target sequence, the method comprising:
- providing the composition of any one of embodiments 36 to 38 for introduction into a cell or a biochemical reaction; and
- introducing the composition into the cell or the biochemical reaction, thereby facilitating contact of the engineered Class 1 Type I CRISPR-Cas effector complex with the nucleic acid target sequence and contact of the second fusion protein with the engineered Class 1 Type I CRISPR-Cas effector complex, resulting in binding of the engineered Class 1 Type I CRISPR-Cas effector complex and the second fusion protein to the nucleic acid target sequence in the polynucleotide.
- The method of embodiment 46, wherein genomic DNA comprises the polynucleotide.
- A method of cutting a polynucleotide comprising the nucleic acid target sequence, the method comprising:
- providing the composition of any one of embodiments 36 to 38 for introduction into a cell or a biochemical reaction; and
- introducing the composition into the cell or the biochemical reaction, thereby facilitating contact of the first engineered Class 1 Type I CRISPR-Cas effector complex with the first nucleic acid target sequence and contact of the second engineered Class 1 Type I CRISPR-Cas effector complex with the second nucleic acid target sequence, and
- introducing the composition into the cell or the biochemical reaction, thereby facilitating contact of the second engineered Class 1 Type I CRISPR-Cas effector complex with the nucleic acid target sequence and contact of the second fusion protein with the engineered Class 1 Type I CRISPR-Cas effector complex, resulting in cutting of the nucleic acid target sequence by the engineered Class 1 Type I CRISPR-Cas effector complex and the second fusion protein.
- The method of embodiment 48, wherein genomic DNA comprises the polynucleotide.
- A kit comprising: the composition of any one of embodiments 36 to 38; and a buffer.
- A kit comprising: the one or more nucleic acid sequences of embodiment 42, embodiment 43, or embodiment 42 and embodiment 43; and a buffer.
- Although preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. From the present Specification and the Examples, one skilled in the art can ascertain essential characteristics of this invention, and without departing from the spirit and scope thereof, can make changes, substitutions, variations, and modifications of the invention to adapt it to various usages and conditions. Such changes, substitutions, variations, and modifications are also intended to fall within the scope of the present disclosure.
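- As an illustrative aid, the nested linker-length and interspacer-distance ranges recited in the embodiments above for the composition comprising two effector complexes can be expressed as a simple check. The Python sketch below is not part of the disclosure: the function and variable names are hypothetical, the interspacer distance is taken as a given input rather than computed from PAM coordinates, and reading "between X to Y" as inclusive of both endpoints is an assumption.

```python
# Illustrative sketch only. The numeric ranges are those recited in the
# embodiments; treating the endpoints as inclusive is an assumption, as are
# all function and variable names.

# Nested ranges, listed from broadest to narrowest.
LINKER_RANGES_AA = [(10, 40), (15, 30), (17, 20)]
INTERSPACER_RANGES_BP = [(20, 42), (22, 40), (26, 36), (29, 35), (30, 34)]


def narrowest_range(value, ranges):
    """Return the narrowest (low, high) range containing value, or None."""
    match = None
    for low, high in ranges:      # ranges are nested, broadest first
        if low <= value <= high:
            match = (low, high)   # a later match is always narrower
    return match


def check_pair(linker1_aa, linker2_aa, interspacer_bp):
    """Report the narrowest recited range met by each design parameter."""
    return {
        "first linker (aa)": narrowest_range(linker1_aa, LINKER_RANGES_AA),
        "second linker (aa)": narrowest_range(linker2_aa, LINKER_RANGES_AA),
        "interspacer (bp)": narrowest_range(interspacer_bp, INTERSPACER_RANGES_BP),
    }


if __name__ == "__main__":
    # Hypothetical design: 18 aa and 25 aa linkers, 32 bp interspacer distance.
    for name, hit in check_pair(18, 25, 32).items():
        print(name, "->", hit if hit else "outside all recited ranges")
```

A result of None for any parameter indicates that the value falls outside even the broadest recited range.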
- Aspects of the present invention are illustrated in the following Examples. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, concentrations, percent changes, and the like) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, temperature is in degrees Centigrade and pressure is at or near atmospheric. It should be understood that these Examples are given by way of illustration only and are not intended to limit the scope of the present invention.
- This Example provides a description of the design of polynucleotide components encoding Cascade using gene, protein, and CRISPR sequences derived from a Type I-E CRISPR-Cas system.
- Table 10 presents polynucleotide DNA sequences of genes encoding the five proteins of Cascade from Type I-E, specifically from E. coli strain K-12 MG1655, as well as the amino acid sequences of the resulting protein components. Genomic sequences were obtained from NCBI Reference Sequence NZ_CP014225.1. In the Table, polynucleotide sequences were either amplified from E. coli genomic DNA or manufacturer-produced polynucleotides encoding Cascade protein components that were codon optimized specifically for expression in E. coli and also for expression in human cells.
-
TABLE 10 Cas Protein DNA and Amino Acid Sequences
Protein | Type of sequence | DNA coding sequence | Amino acid sequence
Cas8 | genomic | SEQ ID NO: 1 | SEQ ID NO: 16
Cse2 | genomic | SEQ ID NO: 2 | SEQ ID NO: 17
Cas7 | genomic | SEQ ID NO: 3 | SEQ ID NO: 18
Cas5 | genomic | SEQ ID NO: 4 | SEQ ID NO: 19
Cas6 | genomic | SEQ ID NO: 5 | SEQ ID NO: 20
Cas8 | E. coli codon-optimized | SEQ ID NO: 6 | SEQ ID NO: 16
Cse2 | E. coli codon-optimized | SEQ ID NO: 7 | SEQ ID NO: 17
Cas7 | E. coli codon-optimized | SEQ ID NO: 8 | SEQ ID NO: 18
Cas5 | E. coli codon-optimized | SEQ ID NO: 9 | SEQ ID NO: 19
Cas6 | E. coli codon-optimized | SEQ ID NO: 10 | SEQ ID NO: 20
Cas8 | H. sapiens codon-optimized | SEQ ID NO: 11 | SEQ ID NO: 16
Cse2 | H. sapiens codon-optimized | SEQ ID NO: 12 | SEQ ID NO: 17
Cas7 | H. sapiens codon-optimized | SEQ ID NO: 13 | SEQ ID NO: 18
Cas5 | H. sapiens codon-optimized | SEQ ID NO: 14 | SEQ ID NO: 19
Cas6 | H. sapiens codon-optimized | SEQ ID NO: 15 | SEQ ID NO: 20
- In addition, several fusion proteins comprising Cascade proteins were designed. Table 11 presents polynucleotide DNA sequences of genes encoding Cascade protein fusion proteins, as well as the amino acid sequences of the resulting protein components. In most instances, fusion proteins described in Table 11 include short tri-amino acid linkers connecting the two polypeptide sequences within the fusion construct; this linker typically comprises glycine-glycine-serine (GGS) or glycine-serine-glycine (GSG). The exact tri-amino acid linker sequences used in each particular fusion protein can be found in the full-length amino acid sequence in Table 11.
-
TABLE 11 Cascade Fusion Protein Sequences
Cascade protein | Heterologous polypeptide | Heterologous polypeptide fused to the N- or C-terminus of the Cascade protein | DNA coding sequence | Expression system for DNA coding sequence | Amino acid sequence
Cse2 | Strep-tag® II-HRV3C | N | SEQ ID NO: 390 | E. coli | SEQ ID NO: 391
Cse2 | His6-HRV3C | N | SEQ ID NO: 392 | E. coli | SEQ ID NO: 393
Cse2 | NLS | N | SEQ ID NO: 394 | Mammalian | SEQ ID NO: 395
Cas5 | NLS | N | SEQ ID NO: 396 | Mammalian | SEQ ID NO: 397
Cas6 | NLS | N | SEQ ID NO: 398 | E. coli | SEQ ID NO: 399
Cas6 | NLS-HA | N | SEQ ID NO: 400 | E. coli | SEQ ID NO: 401
Cas6 | NLS | N | SEQ ID NO: 402 | Mammalian | SEQ ID NO: 403
Cas7 | NLS | C | SEQ ID NO: 404 | E. coli | SEQ ID NO: 405
Cas7 | HA-NLS | C | SEQ ID NO: 406 | E. coli | SEQ ID NO: 407
Cas7 | NLS | N | SEQ ID NO: 408 | Mammalian | SEQ ID NO: 409
Cas8 | His6-MBP-TEV | N | SEQ ID NO: 410 | E. coli | SEQ ID NO: 411
Cas8 | His6-MBP-TEV-NLS-FokI-linker | N | SEQ ID NO: 412 | E. coli | SEQ ID NO: 413
Cas8 | NLS | N | SEQ ID NO: 414 | Mammalian | SEQ ID NO: 415
Cas8 | NLS-HA-FokI-linker | N | SEQ ID NO: 416 | Mammalian | SEQ ID NO: 417
- The His6 (hexahistidine; SEQ ID NO:418) and Strep-Tag™ II (GE Healthcare Bio-Sciences, Pittsburgh, Pa.) (SEQ ID NO:419) peptide tags on the Cse2 protein, when co-expressed with other Cascade proteins, enable purification of the complex via either nickel-nitrilotriacetic acid (Ni-NTA) resin or Strep-Tactin™ (GE Healthcare Bio-Sciences, Pittsburgh, Pa.) resin, respectively. The HRV3C (human rhinovirus 3C) protease recognition sequence (SEQ ID NO:420) is cleaved by an HRV3C protease and can be used to remove N-terminal fusions from a protein of interest. The NLS (nuclear localization signal; SEQ ID NO:421) peptide tag on the Cas6, Cas7, and/or Cas8 proteins enables nuclear trafficking in eukaryotic systems. The HA (hemagglutinin; SEQ ID NO:422) peptide tag on the Cas6 or Cas7 proteins enables detection of heterologous protein expression by Western blotting with an anti-HA antibody. The MBP (maltose binding protein; SEQ ID NO:423) peptide fusion is a solubilization tag that facilitates purification of the Cas8 protein. The TEV (tobacco etch virus) protease recognition sequence (SEQ ID NO:424) is cleaved by TEV protease and can be used to remove N-terminal fusions from a protein of interest. The FokI nuclease domain comprises the Sharkey variant described by Guo, et al. (Guo, J., et al., J. Mol. Biol. 400:96-107 (2010)); two monomeric FokI subunits associate to form a homodimer and catalyze double-stranded DNA cleavage upon homo-dimerization. A linker sequence (SEQ ID NO:425) is used to fuse the FokI nuclease domain to the Cas8 protein.
- Additional linker sequences of varying length and amino acid composition have been designed that connect the FokI nuclease domain to the Cas8 protein. These amino acid sequences can be found in Table 12.
-
TABLE 12 Amino Acid Linker Sequences
SEQ ID NO: | Linker length (amino acids) | Amino acid sequence
SEQ ID NO: 426 | 5 | GGGGS
SEQ ID NO: 427 | 8 | TGPGAAAR
SEQ ID NO: 428 | 10 | GGSGSSGGSG
SEQ ID NO: 429 | 12 | TGPGAAARAASG
SEQ ID NO: 430 | 15 | GGSGSSGGSGSSGGS
SEQ ID NO: 431 | 16 | SGSETPGTSESATPES
SEQ ID NO: 432 | 20 | SGSETPGTSESATPESGGSG
SEQ ID NO: 433 | 30 | SGSETPGTSESATPESGGSGSSGGSGSSGG
- Table 13 contains the polynucleotide DNA sequence of four minimal CRISPR arrays that, when transcribed into precursor crRNA and processed by the RNA endonuclease protein of Cascade, generate mature crRNAs that function as the guide RNA to target complementary DNA sequences in biochemical assays and in cell culture gene editing experiments.
- The minimal CRISPR array comprises two repeat sequences (underlined, lower case) flanking a spacer sequence, which represents the guide portion of the crRNA. RNA processing by the Cascade endonuclease protein generates a crRNA with repeat sequences on both the 5′ and 3′ ends, flanking the guide sequence. The CRISPR array may also be expanded to include three repeat sequences (underlined) flanking two spacer sequences, which represent the guide portions of two distinct crRNAs generated by RNA processing by the Cascade endonuclease protein. The arrays can be further expanded to include additional spacer sequences, if desired.
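The repeat-spacer layout described above can be sketched in a few lines of Python. This is only an illustration: the repeat and the J3 and L3 spacer sequences are copied from Table 13, while the helper name `build_crispr_array` is hypothetical and is not part of the disclosure.

```python
# Repeat and spacer sequences as listed in Table 13 (repeat in lower case,
# spacers in upper case). The helper below is a hypothetical illustration.
REPEAT = "gagttccccgcgccagcggggataaaccg"
J3_SPACER = "CCAGTGATAAGTGGAATGCCATGTGGGCTGTC"
L3_SPACER = "AGTGGCAGATATAGCCTGGTGGTTCAGGCGGC"


def build_crispr_array(repeat, spacers):
    """Return repeat-spacer-repeat(-spacer-repeat...) as one DNA string.

    N spacers are flanked by N + 1 repeats, so one spacer gives the minimal
    two-repeat array and two spacers give the three-repeat array.
    """
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts = [repeat.lower()]
    for spacer in spacers:
        parts.append(spacer.upper())
        parts.append(repeat.lower())
    return "".join(parts)


minimal_array = build_crispr_array(REPEAT, [J3_SPACER])
expanded_array = build_crispr_array(REPEAT, [J3_SPACER, L3_SPACER])
print(minimal_array)
print(expanded_array)
```

Keeping the repeat in lower case and the spacers in upper case mirrors the notation of Table 13, so the output can be compared by eye with the listed array sequences.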
-
TABLE 13 CRISPR Array Sequences
SEQ ID NO: | Cell type | Target | Minimal CRISPR array sequence
SEQ ID NO: 434 | E. coli | Bacteriophage λ J3 target | gagttccccgcgccagcggggataaaccgCCAGTGATAAGTGGAATGCCATGTGGGCTGTCgagttccccgcgccagcggggataaaccg
SEQ ID NO: 435 | E. coli | Bacteriophage λ L3 target | gagttccccgcgccagcggggataaaccgAGTGGCAGATATAGCCTGGTGGTTCAGGCGGCgagttccccgcgccagcggggataaaccg
SEQ ID NO: 436 | E. coli | Bacteriophage λ L3/J3 targets | gagttccccgcgccagcggggataaaccgCCAGTGATAAGTGGAATGCCATGTGGGCTGTCgagttccccgcgccagcggggataaaccgAGTGGCAGATATAGCCTGGTGGTTCAGGCGGCgagttccccgcgccagcggggataaaccg
SEQ ID NO: 437 | H. sapiens | TRAC gene | gagttccccgcgccagcggggataaaccgGTTGATTTGCCTGCATTGGTGTTACACAGTCTgagttccccgcgccagcggggataaaccgTAAGTTGTGTTCTTCTTTGCCTAGGCCTTCAGgagttccccgcgccagcggggataaaccg
- This Example describes the design of bacterial expression vectors that encode the Cascade-associated proteins, as well as a minimal CRISPR array comprising the guide sequence as described in Example 1. The construction of Cascade subunit protein expression systems for use with plasmids encoding minimal CRISPR arrays is described.
- A single-plasmid Cascade protein expression system was constructed to express the proteins of either a complex of Cascade in E. coli, known as the CasBCDE complex (which contains the Cse2, Cas7, Cas5, and Cas6 proteins, but not the Cas8 protein), or the entire functional Cascade complex in E. coli. The single plasmid system comprises either the cse2-cas7-cas5-cas6 operon, or the entire cas8-cse2-cas7-cas5-cas6 operon on a single expression plasmid. The Cas8 protein can be expressed from its own expression plasmid, for use in biochemical experiments where it is mixed together with the CasBCDE complex to reconstitute Cascade.
- A starting plasmid for expression vector construction was used (see Brouns, S. J. J. et al., Science 321:960-964 (2008)). The single plasmid Cascade protein expression system comprising a Cas operon was assembled as follows. The coding sequences for the cas genes were arranged in the order cse2-cas7-cas5-cas6 (CasBCDE complex) or cas8-cse2-cas7-cas5-cas6 (full Cascade complex), and were separated by sequences corresponding to the wild-type bacterial gene arrangement (see NCBI Reference Sequence NZ_CP014225.1).
- In order to append a polynucleotide sequence encoding an affinity tag (His6 or Strep-Tag™ II), the corresponding coding sequence was inserted at the junction of the 3′ end of the cas8 gene and the 5′ end of the cse2 gene; these two open reading frames overlap in the wild-type genomic DNA sequence.
- In order to append polynucleotide sequences encoding N-terminal NLS and/or NLS-HA tags onto the 5′ end of the cas6 gene, additional spacing was introduced between the cas6 and upstream cas5 genes, because these open reading frames overlap in the wild-type genomic DNA sequence, such that the Shine-Dalgarno sequence for the cas6 gene is within the 3′ portion of the cas5 gene. A new Shine-Dalgarno sequence was inserted upstream of the new NLS-Cas6 or NLS-HA-Cas6 open reading frames, to improve translational efficiency.
- In order to append polynucleotide sequences encoding C-terminal NLS and/or HA-NLS tags onto the 3′ end of the cas7 gene, additional spacing was introduced between the cas7 and downstream cas5 genes, because these open reading frames are in close proximity in the wild-type genomic DNA sequence, such that the Shine-Dalgarno sequence for the cas5 gene is within the 3′ portion of the cas7 gene. A new Shine-Dalgarno sequence was inserted downstream of the new Cas7-NLS or Cas7-HA-NLS open reading frames, to improve translational efficiency for the cas5 gene.
- In order to append polynucleotide sequences encoding N-terminal NLS-FokI-linker fusions to the Cas8 protein, the corresponding coding sequences were inserted at the 5′ end of the cas8 gene.
- The cse2-cas7-cas5-cas6 and cas8-cse2-cas7-cas5-cas6 operons were cloned into the pCDF (MilliporeSigma, Hayward, Calif.) vector backbone, which confers spectinomycin resistance due to the presence of the aadA gene. Transcription of the operon is driven by a T7 promoter and is under control of the Lac operator; the vector also encodes the LacI repressor. A T7 terminator was cloned downstream of the cse2-cas7-cas5-cas6 or cas8-cse2-cas7-cas5-cas6 operon. The vector contains a CDF origin of replication.
- For expression of Cas8 or FokI-Cas8 fusion proteins, the cas8 gene was cloned into a pET (MilliporeSigma, Hayward, Calif.) family vector backbone, which confers kanamycin resistance due to the presence of the kanR gene. Transcription of the operon is driven by a T7 promoter (PT7), and is under control of the Lac operator (lacO); the vector also encodes the LacI repressor (lacI gene). A T7 terminator was cloned downstream of the cas8 gene. The vector contains a ColE1 origin of replication.
-
FIG. 24A, FIG. 24B, FIG. 24C, FIG. 24D, and FIG. 24E present schematic diagrams of overexpression vectors for the cas8, fokI-cas8, the cse2-cas7-cas5-cas6 operon, the cas8-cse2-cas7-cas5-cas6 operon, and the fokI-cas8-cse2-cas7-cas5-cas6 operon. The designations in FIG. 24A, FIG. 24B, FIG. 24C, FIG. 24D, and FIG. 24E are described in this Example and in Example 1 and are as follows: PT7 (T7 promoter), lacO (Lac operator), His6 (hexahistidine), MBP (maltose binding protein), Strep-Tag™ II, HRV3C (human rhinovirus 3C) protease recognition sequence, TEV (tobacco etch virus) protease recognition sequence, NLS (nuclear localization signal), kanR (kanamycin resistance gene), lacI (LacI repressor gene), colE1 ori (origin of replication), CDF ori (CloDF13 origin of replication), FokI nuclease domain (Sharkey variant), and aadA (gene encoding aminoglycoside resistance protein).
- Table 14 provides sequences of bacterial expression plasmids encoding the Cas8 protein, the 4 proteins of the CasBCDE complex (cse2-cas7-cas5-cas6 operon), and all 5 proteins of the Cascade complex (cas8-cse2-cas7-cas5-cas6 operon). Polynucleotide sequences are provided with and without the N-terminal FokI fusion on the Cas8 protein.
-
TABLE 14 Bacterial Plasmid Sequences
SEQ ID NO: | Vector designation | Arrangement of protein coding sequences (N to C) | Notable characteristics
SEQ ID NO: 438 | Cas8 expression vector | His6-MBP-TEV-Cas8 | Can be added to CasBCDE complex to reconstitute Cascade
SEQ ID NO: 439 | FokI-Cas8 expression vector | His6-MBP-TEV-NLS-FokI-linker-Cas8 | FokI confers the ability to cleave double-stranded DNA
SEQ ID NO: 440 | CasBCDE complex expression vector | Strep-tag™ II-HRV3C-Cse2_Cas7_Cas5_Cas6 | When co-expressed with a CRISPR array, generates CasBCDE complex
SEQ ID NO: 441 | Cascade complex expression vector | Cas8_His6-HRV3C-Cse2_Cas7_Cas5_Cas6 | When co-expressed with a CRISPR array, generates Cascade complex
SEQ ID NO: 442 | FokI-Cascade expression vector | NLS-FokI-linker-Cas8_His6-HRV3C-Cse2_Cas7_Cas5_Cas6 | FokI confers the ability to cleave double-stranded DNA targeted by crRNA
SEQ ID NO: 443 | FokI-Cascade expression vector, extra NLS tag | NLS-FokI-linker-Cas8_His6-HRV3C-Cse2_Cas7-NLS_Cas5_Cas6 | FokI confers the ability to cleave double-stranded DNA targeted by crRNA; extra NLS tag on Cas7 protein improves nuclear trafficking
- In order to purify the CasBCDE complex and Cascade complex containing a crRNA, the protein expression vectors encoding the cse2-cas7-cas5-cas6 operon or the cas8-cse2-cas7-cas5-cas6 operon are combined with a vector containing a minimal CRISPR array.
- CRISPR arrays were cloned into the pACYC-Duet1 vector backbone, which confers chloramphenicol resistance due to the camR gene. Transcription of the array is driven by a T7 promoter and is under control of the Lac operator (lacO); the vector also encodes the LacI repressor. A T7 terminator was cloned downstream of the CRISPR array. The vector contains a p15A origin of replication.
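The vector backbones described in this Example use different origins of replication and different selection markers. A minimal sketch of a check for this is shown below; the origins and resistance markers are taken from the text above, whereas the dictionary layout, the function name `can_coexist`, and the rule itself (co-maintained plasmids should differ in both origin and marker) are assumptions made for illustration and are not stated in the Example.

```python
from itertools import combinations

# Origins and markers as described in this Example; the labels are informal.
PLASMIDS = {
    "pCDF (Cascade operon)": {"origin": "CDF", "marker": "spectinomycin"},
    "pET (Cas8)": {"origin": "ColE1", "marker": "kanamycin"},
    "pACYC-Duet1 (CRISPR array)": {"origin": "p15A", "marker": "chloramphenicol"},
}


def can_coexist(names):
    """Assumed rule of thumb: every pair must differ in origin and marker."""
    for first, second in combinations(names, 2):
        a, b = PLASMIDS[first], PLASMIDS[second]
        if a["origin"] == b["origin"] or a["marker"] == b["marker"]:
            return False
    return True


# Two-plasmid system: Cascade operon vector plus CRISPR array vector.
print(can_coexist(["pCDF (Cascade operon)", "pACYC-Duet1 (CRISPR array)"]))
```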
-
FIG. 25 contains a schematic diagram of an expression vector containing a CRISPR array with 2 repeats (FIG. 25, “repeats”) and 1 spacer (FIG. 25, “spacer”). The array can be expanded, as described herein. The designations in FIG. 25 are described in this Example and in Example 1 and are as follows: PT7 (T7 promoter), lacO (Lac operator), lacI (LacI repressor gene), p15A ori (origin of replication), and camR (chloramphenicol resistance gene).
- Table 15 provides the sequences of bacterial expression plasmids encoding examples of minimal CRISPR arrays.
-
TABLE 15 Bacterial Plasmid Sequences
SEQ ID NO: | Vector designation | DNA targeted by spacer | Notable characteristics
SEQ ID NO: 444 | CRISPR(J3) expression vector | Bacteriophage λ J3 target | Two repeats, one spacer
SEQ ID NO: 445 | CRISPR(L3) expression vector | Bacteriophage λ L3 target | Two repeats, one spacer
SEQ ID NO: 446 | CRISPR(J3/L3) expression vector | Bacteriophage λ L3/J3 targets | Three repeats, two spacers
SEQ ID NO: 447 | CRISPR(TRAC) expression vector | TRAC gene | Three repeats, two spacers
- This Example describes the design of eukaryotic expression plasmid vectors that encode Cascade-associated proteins, as well as minimal CRISPR arrays comprising the component sequences as described in Example 1.
- A. Separate Plasmids Expressing Each Cascade Protein and Minimal CRISPR Array
- Cascade proteins can be expressed in mammalian cells by encoding each of the protein components on a separate expression vector driven by the human cytomegalovirus (CMV) immediate-early promoter/enhancer and encoding the crRNA on a separate expression vector driven by the human U6 promoter.
- The starting plasmid for each expression plasmid was a derivative of pcDNA3.1 (Thermo Scientific, Wilmington, Del.). Coding sequences for the Cascade proteins, codon optimized for expression in human cells (see Example 1), were inserted into the vector downstream of the CMV promoter and upstream of a bovine growth hormone (bGH) polyadenylation signal. The cse2 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS and 3×-FLAG epitope tag. The cas5 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS. The cas6 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS and HA epitope tag. The cas7 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS and Myc epitope tag. The cas8 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS; in another embodiment, the cas8 gene was fused to polynucleotide sequences at the 5′ end coding for an N-terminal NLS, HA epitope tag, and FokI nuclease domain.
- Each gene or gene fusion was cloned into a pcDNA3.1 derivative vector backbone, which confers ampicillin resistance due to the presence of the ampR gene. The vector also encodes neomycin resistance due to the presence of the neoR gene, which is downstream of an SV40 early promoter (PSV40) and origin (SV40 ori), and upstream of an SV40 early polyadenylation signal (SV40 pA). In addition to the human CMV immediate-early promoter/enhancer (PCMV) and bGH (bovine growth hormone) polyadenylation signal, the vector contains a T7 promoter upstream of the gene of interest, allowing for in vitro transcription of mRNA. The vector contains an f1 origin of replication as well as a ColE1 origin of replication.
-
FIG. 26 contains a schematic diagram of a mammalian expression vector encoding the FokI-Cas8 fusion protein. The designations in FIG. 26 are described in this Example and in Example 1 and are as follows: the human CMV immediate-early promoter/enhancer (PCMV), NLS (nuclear localization signal), FokI (FokI nuclease domain (Sharkey variant)), Cas8 protein coding sequence, bGH pA (bovine growth hormone polyadenylation signal), f1 ori (f1 phage origin of replication), PSV40 (SV40 early promoter), SV40 ori (SV40 origin), neoR (neomycin resistance gene), SV40 pA (SV40 early polyadenylation signal), colE1 ori (origin of replication), and ampR (ampicillin resistance gene). Vectors encoding the other Cascade proteins were designed similarly.
- Table 16 provides the sequences of individual mammalian expression vectors encoding each of Cse2, Cas5, Cas6, Cas7, Cas8, and FokI-Cas8.
-
TABLE 16 Mammalian Expression Vectors
SEQ ID NO: | Vector designation | Notable characteristics
SEQ ID NO: 448 | Mammalian Cse2 expression vector | Cse2 contains N-terminal NLS-3xFLAG tag
SEQ ID NO: 449 | Mammalian Cas5 expression vector | Cas5 contains N-terminal NLS
SEQ ID NO: 450 | Mammalian Cas6 expression vector | Cas6 contains N-terminal NLS-HA tag
SEQ ID NO: 451 | Mammalian Cas7 expression vector | Cas7 contains N-terminal NLS-Myc tag
SEQ ID NO: 452 | Mammalian Cas8 expression vector | Cas8 contains N-terminal NLS
SEQ ID NO: 453 | Mammalian FokI-Cas8 expression vector | Cas8 contains N-terminal NLS-HA-FokI; FokI confers the ability to cleave double-stranded DNA
- The CRISPR RNA was encoded with a minimal CRISPR array containing three repeats flanking two spacer sequences. The construct generating CRISPR RNA can be designed with additional sequences flanking the outermost repeats in the minimal array. Processing of the precursor CRISPR RNA is enabled by the RNA processing protein of the Cascade complex (Cas6 protein), which can be expressed on a separate plasmid.
- The CRISPR array was cloned into the same pcDNA3.1 derivative vector backbone described above, except the human CMV promoter was replaced with the human U6 promoter (PU6), and the bGH polyadenylation signal was replaced with a poly-T termination signal.
-
FIG. 27 contains a schematic diagram of a eukaryotic expression vector encoding a representative CRISPR array targeting the TRAC gene. The designations in FIG. 27 are described in this Example and in Example 1 and are as follows: PU6 (human U6 promoter), repeats (CRISPR RNA repeats), TRAC spacer-1 (first spacer targeting the TRAC gene), TRAC spacer-2 (second spacer targeting the TRAC gene), polyT (poly-T termination signal), f1 ori (f1 phage origin of replication), PSV40 (SV40 early promoter), SV40 ori (SV40 origin), neoR (neomycin resistance gene), SV40 pA (SV40 early polyadenylation signal), colE1 ori (origin of replication), and ampR (ampicillin resistance gene).
- Table 17 provides the sequence of a representative mammalian expression vector encoding a CRISPR array targeting the TRAC gene; a spacer sequence that targets matching DNA sequences in the TRAC gene can be found in Table 13.
-
TABLE 17 Mammalian Expression Vector
SEQ ID NO: | Vector designation | Spacer complementary to target | Notable characteristics
SEQ ID NO: 454 | Mammalian CRISPR RNA expression vector | TRAC gene | Three repeats, two spacers
- B. Cascade Protein Expression System Wherein Multiple Cascade Protein Coding Sequences are Expressed from a Single Promoter
- In order to express components of the Cascade complex from fewer expression vectors, polycistronic expression vectors were constructed. On each, a single CMV promoter simultaneously drives expression of multiple coding sequences that are separated by a 2A viral peptide sequence. The Thosea asigna virus 2A peptide sequence induces ribosomal skipping (Liu, Z., et al., Sci. Rep. 7:2193 (2017)), thus enabling multiple protein-coding genes to be concatenated within a single polycistronic construct.
- The starting plasmid for the polycistronic expression plasmid was the same derivative of pcDNA3.1 described above, containing the CMV promoter and bGH polyadenylation signal. Coding sequences for the Cascade proteins, codon optimized for expression in human cells (see Example 1), were joined in the order cas7-cse2-cas5-cas6-cas8, with a polynucleotide sequence coding for the Thosea asigna virus 2A (T2A) peptide inserted in between each pair of genes. In addition, polynucleotide sequences encoding NLS tags were appended to the 5′ end of each Cascade protein gene, and a polynucleotide sequence encoding the FokI nuclease domain was appended to the 5′ end of the cas8 gene, connected by a 30-amino acid linker sequence. The final construct has the following order of elements: NLS-cas7-T2A-NLS-cse2-T2A-NLS-cas5-T2A-NLS-cas6-T2A-NLS-fokI-linker-cas8.
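The element order of the polycistronic construct follows a simple rule: each gene is preceded by an NLS tag, cas8 additionally carries the fokI-linker fusion, and adjacent cassettes are separated by T2A. The short Python sketch below reproduces that order as a label string only; the function name `polycistronic_order` and its parameters are hypothetical, and no nucleotide sequences are involved.

```python
def polycistronic_order(genes, extra_fusions=None, tag="NLS", separator="T2A"):
    """Return the element order of a polycistronic construct as a label string.

    genes: gene names in 5' to 3' order.
    extra_fusions: optional mapping of gene name to the list of elements that
    sit between the NLS tag and that gene (hypothetical parameter).
    """
    extra_fusions = extra_fusions or {}
    cassettes = []
    for gene in genes:
        elements = [tag] + list(extra_fusions.get(gene, [])) + [gene]
        cassettes.append("-".join(elements))
    # A separator (T2A) is placed between each pair of adjacent cassettes.
    return f"-{separator}-".join(cassettes)


order = polycistronic_order(
    ["cas7", "cse2", "cas5", "cas6", "cas8"],
    extra_fusions={"cas8": ["fokI", "linker"]},
)
print(order)
```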
-
FIG. 28 contains a schematic diagram of an exemplary polycistronic mammalian expression vector encoding all the Cascade proteins. The designations in FIG. 28 are described in this Example and in Example 1 and are as follows: the human CMV immediate-early promoter/enhancer (PCMV), NLS (nuclear localization signal), T2A (polynucleotide sequence coding for the Thosea asigna virus 2A peptide), coding sequences for the Cas7, Cse2, Cas5, and Cas6 proteins, fokI (FokI nuclease domain (Sharkey variant)), a linker sequence, coding sequence for Cas8 protein, bGH pA (bovine growth hormone polyadenylation signal), f1 ori (f1 phage origin of replication), PSV40 (SV40 early promoter), SV40 ori (SV40 origin), neoR (neomycin resistance gene), SV40 pA (SV40 early polyadenylation signal), colE1 ori (origin of replication), ampR (ampicillin resistance gene), and an MluI restriction site.
- Table 18 provides the sequence of an exemplary polycistronic mammalian expression vector encoding all the Cascade proteins. This vector can be combined with the mammalian expression vector encoding CRISPR RNA described above to produce functional Cascade complexes in mammalian cells.
-
TABLE 18 Mammalian Expression Vectors
SEQ ID NO: | Vector designation | Arrangement of protein coding sequences (N to C) | Notable characteristics
SEQ ID NO: 455 | Polycistronic mammalian expression vector encoding all 5 Cascade proteins | NLS-Cas7-T2A_NLS-Cse2-T2A_NLS-Cas5-T2A_NLS-Cas6-T2A_NLS-FokI-Cas8 | Single protein expression vector encoding all Cascade proteins, each with N-terminal NLS tag. Cas8 contains N-terminal NLS-HA-FokI; FokI confers the ability to cleave double-stranded DNA
- C. Single Plasmid Expression System
- A single plasmid Cascade expression system was constructed to express the complete Cascade complex in human cells. The plasmid encodes the entire cas8-cse2-cas7-cas5-cas6 operon and a minimal CRISPR array on a single plasmid. This plasmid was constructed from the polycistronic protein expression vector (described above in Table 18 and FIG. 28) by inserting the minimal CRISPR array along with the upstream human U6 promoter and downstream poly-T termination signal into the MluI restriction site.
- Table 19 provides the sequence of the single plasmid for expression of all five Cascade proteins together with the crRNA to facilitate formation of Cascade complexes in human cells.
-
TABLE 19 Mammalian Expression Vector
SEQ ID NO: | Vector designation | Arrangement of protein coding sequences (N to C) | Notable characteristics
SEQ ID NO: 456 | Polycistronic mammalian expression vector encoding all 5 Cascade proteins and crRNA | hU6_CRISPR(TRAC), CMV_NLS-Cas7-T2A_NLS-Cse2-T2A_NLS-Cas5-T2A_NLS-Cas6_NLS-FokI-Cas8 | Single protein expression vector encoding crRNA and all Cascade proteins, each with N-terminal NLS tag. Cas8 contains N-terminal NLS-HA-FokI; FokI confers the ability to cleave double-stranded DNA
- Plasmids were also designed for the expression of the Cas3 protein (SEQ ID NO:21; monomer Cas3 nuclease/helicase E. coli K-12 substr. MG1655) in E. coli and in mammalian cells. Table 20 provides the constructs and sequences of these plasmids.
-
TABLE 20 Cas3 Protein Fusions
SEQ ID NO: | Protein | Notable characteristics
SEQ ID NO: 457 | Cas3 | Genomic DNA gene sequence
SEQ ID NO: 458 | Cas3 | Protein amino acid sequence
SEQ ID NO: 459 | His6-MBP-TEV-Cas3 | Derived from genomic DNA gene sequence
SEQ ID NO: 460 | His6-MBP-TEV-Cas3 | Protein amino acid sequence
SEQ ID NO: 461 | His6-MBP-TEV-Cas3 | Cas3 E. coli expression vector
SEQ ID NO: 462 | Cas3, human codon-optimized | Homo sapiens codon-optimized DNA gene sequence
SEQ ID NO: 463 | Cas3-NLS | Homo sapiens codon-optimized DNA gene sequence
SEQ ID NO: 464 | Cas3-NLS | Protein amino acid sequence
SEQ ID NO: 465 | Cas3-NLS | Cas3 mammalian expression vector
- This Example describes the introduction and expression of Cas8 subunit protein coding sequences, as well as coding sequences for components of engineered Type I CRISPR-Cas effector complexes, in bacterial cells using E. coli expression systems.
- A. Expression of Cas8 Protein
- E. coli Type I-E Cas8 protein was expressed from a plasmid (Example 2, SEQ ID NO:438, Table 14, FIG. 24A) containing an operon for the IPTG inducible expression of His6-MBP-TEV-Cas8 from a T7 promoter. The expression plasmid conferred resistance to kanamycin.
- In order to express Cas8 protein, E. coli cells were transformed with the expression plasmid. Briefly, a 100 μL aliquot of chemically competent E. coli cells (E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.)) in a microcentrifuge tube was thawed on ice for 10 minutes. 35 ng of plasmid DNA was added to the thawed cells and the cells were incubated with the DNA on ice for 8 minutes. Heat shock was performed by placing the microcentrifuge tube in a 42° C. water bath for 30 seconds and then immediately placing the tube in ice for 2 minutes. 900 μL of 2×YT media were added to the microcentrifuge tube, and the microcentrifuge tube was placed in a tube rotator at 37° C. for 1 hour. Finally, 100 μL of the recovered cells were plated on LB solid media with kanamycin (50 μg/mL) and incubated overnight at 37° C.
- A single colony was picked from the colonies that grew on the antibiotic selection plates and was inoculated into 10 mL of 2×YT media supplemented with kanamycin (50 μg/mL). The culture was grown overnight at 37° C. while shaking in an orbital shaker at 200 RPM. 6 mL of the overnight culture were transferred to a 2 L baffled flask having 1 L of 2×YT media supplemented with chloramphenicol (34 μg/mL) and spectinomycin (100 μg/mL). The 1 L culture was grown at 37° C. while shaking in an orbital shaker at 200 RPM until the optical density at 600 nm was 0.56.
- Expression from both plasmids was then induced by the addition of IPTG to a final concentration of 1 mM. The induced cultures were grown overnight at 16° C. while shaking in an orbital shaker at 200 RPM. Cells were harvested by centrifugation at 4,000 RCF for 15 minutes at 4° C. The cell pellet was re-suspended in 15 mL of a lysis buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, and 1 mM TCEP supplemented with 1 Complete™ protease inhibitor tablet (Roche, Basel, Switzerland) per 50 mL of lysis buffer. The re-suspended cells were transferred to a 50 mL conical tube for immediate downstream processing. The Cas8 protein was purified and the purified protein characterized essentially as described below for the FokI-Cas8 fusion protein (Example 5C).
- B. Expression of the Components of Cascade RNP Complexes
- A complete set of the five E. coli Cascade proteins and RNA guides were co-expressed in E. coli cells using a two-plasmid system to produce Cascade RNP complexes. One plasmid (Example 2, SEQ ID NO:441, Table 14, FIG. 24D) contained an operon for IPTG inducible expression of the Cse2, Cas5, Cas6, Cas7, and Cas8 proteins from a T7 promoter. A His6 affinity tag was included as a translational fusion to the N-terminus of Cse2 (Example 1, SEQ ID NO:392, Table 11). The second plasmid coded for the IPTG inducible expression of the J3 guide (Example 2, SEQ ID NO:444, Table 15, FIG. 25). The Cascade protein expression plasmid conferred spectinomycin resistance, and the Cascade RNA guide expression plasmid conferred chloramphenicol resistance.
- In order to co-express the Cascade proteins and RNA components in the same cell, E. coli cells were simultaneously transformed with the two plasmids. A 100 μL aliquot of chemically competent E. coli cells (E. coli, BL21 Star™ (DE3) (Thermofisher, Waltham, Mass.)) in a microcentrifuge tube was thawed on ice for 10 minutes. 35 ng of each plasmid was added to the thawed cells and the cells were incubated with the DNA on ice for 8 minutes. Heat shock was performed by placing the microcentrifuge tube in a 42° C. water bath for 30 seconds and then immediately placing the microcentrifuge tube in ice for 2 minutes. 900 μL of 2×YT media were added to the microcentrifuge tube and the microcentrifuge tube was placed in a tube rotator at 37° C. for 1 hour. Finally, 100 μL of the recovered cells were plated on LB solid media with chloramphenicol (34 μg/mL) and spectinomycin (50 μg/mL) and incubated overnight at 37° C.
- A single colony was picked from the colonies that grew on the antibiotic selection plates and was inoculated into 10 mL of 2×YT media supplemented with chloramphenicol (34 μg/mL) and spectinomycin (100 μg/mL). The culture was grown overnight at 37° C. while shaking in an orbital shaker at 200 RPM. 6 mL of the overnight culture were transferred to a 2 L baffled flask having 1 L of 2×YT media supplemented with chloramphenicol (34 μg/mL) and spectinomycin (100 μg/mL). The 1 L culture was grown at 37° C. while shaking in an orbital shaker at 200 RPM until the optical density at 600 nm was 0.56.
- Expression from both plasmids was induced by the addition of IPTG to a final concentration of 1 mM. The induced cultures were grown overnight at 16° C. while shaking in an orbital shaker at 200 RPM. Cells were harvested by centrifugation at 4,000 RCF for 15 minutes at 4° C. The cell pellet was re-suspended in 15 mL of lysis buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, and 1 mM TCEP supplemented with 1 Complete™ protease inhibitor tablet (Roche, Basel, Switzerland) per 50 mL of lysis buffer. The re-suspended cells were transferred to a 50 mL conical tube for immediate downstream processing. Cascade RNP complexes were purified and characterized as described below.
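The induction step relies on the usual dilution relationship C1·V1 = C2·V2. A minimal Python sketch of that arithmetic follows; the 1 mM final IPTG concentration and the 1 L culture volume come from this Example, whereas the 1 M stock concentration and the function name `stock_volume_ml` are assumptions made for illustration. The small volume added by the stock itself is ignored.

```python
def stock_volume_ml(final_conc_mm, culture_volume_ml, stock_conc_mm):
    """Volume of stock (mL) needed to reach final_conc_mm in the culture.

    Uses C1 * V1 = C2 * V2 and ignores the volume contributed by the stock.
    """
    if stock_conc_mm <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc_mm * culture_volume_ml / stock_conc_mm


# 1 mM final IPTG in a 1 L culture, from an assumed 1 M (1000 mM) stock.
iptg_ml = stock_volume_ml(final_conc_mm=1.0, culture_volume_ml=1000.0,
                          stock_conc_mm=1000.0)
print(f"IPTG stock to add: {iptg_ml:.2f} mL")
```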
- This Example describes a method to purify E. coli Type I-E Cascade ribonucleoprotein (RNP) complexes produced by overexpression in bacteria as described in Example 4. The method uses immobilized metal affinity chromatography followed by size exclusion chromatography. This Example also describes the methods used to assess the quality of the purified Cascade RNP product. In addition, this Example describes purification and characterization of Cascade components.
- A. Purification of Cas8, Cas7, Cas6, Cas5, and Cse2 Cascade RNP Complexes
- E. coli Type I-E Cascade RNP complexes were produced as described in Example 4. The Cascade complexes were captured using immobilized metal affinity chromatography. Briefly, the re-suspended cell pellets, produced as described in Example 4, were thawed on ice and the volume was brought to 35 mL by the addition of 15 mL of lysis buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, and 1 mM TCEP supplemented with 1 Complete™ protease inhibitor tablet (Roche, Basel, Switzerland) per 50 mL of lysis buffer.
- The 50 mL conical tube was placed in an ice water bath and the cells were lysed by two rounds of sonication using a Q500 sonicator with a ½ inch tip (Qsonica, Newtown, Conn.). Each round of sonication consisted of a treatment cycle of 2.5 minutes with repeating cycles of 10 seconds of sonication at 50% amplitude followed by 20 seconds of rest. The tube was allowed to cool in the ice water bath for one minute between rounds of sonication. The lysates were clarified by centrifugation at 48,384 RCF for 30 minutes at 4° C. The clarified supernatant was then added to a Hispur™ Ni-NTA resin (Thermofisher, Waltham, Mass.), that had been pre-equilibrated with Ni-wash buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 10 mM imidazole, 5% glycerol, and 1 mM TCEP. A 1.5 mL bed volume of nickel affinity resin was used for each 1 L of E. coli expression culture. After one hour of incubation at 4° C. with gentle mixing, the resin was pelleted by centrifugation at 500 RCF for 2 minutes at 4° C. The supernatant was aspirated and the resin was washed 5 times with 5 bed volumes of Ni-wash buffer. After each wash the resin was pelleted at 500 RCF for 2 minutes at 4° C. and the supernatant was removed by aspiration. Finally, bound proteins (including the Cascade RNP complexes) were eluted by the addition of five bed volumes of Ni-elution buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 300 mM imidazole, 5% glycerol, and 1 mM tris(2-carboxyethyl)phosphine (TCEP). After centrifugation at 500 RCF for 2 minutes at 4° C., the nickel affinity eluate was aspirated into a clean 50 mL conical tube.
- The nickel affinity eluate was further purified by size exclusion chromatography (SEC). The nickel affinity eluate was concentrated to a final volume of 0.5 mL by ultrafiltration at 12° C. using an Amicon® ultrafiltration spin concentrator (Millipore Sigma, Billerica, Mass.) with an Ultracel®-50 membrane (Millipore Sigma, Billerica, Mass.). The concentrated sample was filtered using a 0.22 μm Ultrafree-MC GV Centrifugal Filter (Millipore Sigma, Billerica, Mass.) before being further purified by separation at 4° C. with a flow rate of 0.5 mL/minute on a
HiPrep™ 16/60 Sephacryl® S-300 column (GE Healthcare, Uppsala, Sweden) equilibrated with SEC buffer composed of 50 mM Tris pH 7.5, 500 mM NaCl, 5% glycerol, 0.1 mM EDTA, and 1 mM TCEP. Proteins were eluted with SEC buffer and 1 mL fractions were collected. The earliest eluting peak, as judged by UV280, was assumed to be high molecular weight aggregated material and the corresponding fractions were discarded. Subsequent elution fractions were analyzed by Coomassie stained SDS-PAGE. Each properly formed complex contained one molecule of Cas8, six molecules of Cas7, one molecule each of Cas6 and Cas5, and two molecules of Cse2. Elution fractions that had the approximate expected stoichiometry of Cascade proteins, when visualized on the SDS-PAGE gel, were pooled. Pooled fractions were analyzed spectrophotometrically to confirm they contained a significant nucleic acid component, as evidenced by an absorbance at 260 nm that is greater than the absorbance at 280 nm.
- The pooled samples were exchanged into storage buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, 0.1 mM EDTA, and 1 mM TCEP by concentrating the pooled samples to 100 μL with an Amicon® spin concentrator with an Ultracel®-50 membrane (Millipore Sigma, Billerica, Mass.) and then diluting 50-fold with the storage buffer. Finally, the sample was concentrated to 10 mg/mL using the same ultrafiltration device and stored at −80° C.
- The final purified product was analyzed spectrophotometrically to determine the final concentration of the Cascade RNP complexes and to confirm the presence of a nucleic acid component as evidenced by an absorbance at 260 nm that is greater than the absorbance at 280 nm. The concentration of the Cascade RNP complexes was determined by dividing the absorbance at 280 nm by the calculated absorbance of a 0.1% solution of the intact complex with a 1 cm path length. The predicted absorbance of a 0.1% solution of the purified complex is 2.03 cm−1 and was calculated by dividing the sum of the calculated extinction coefficients at 280 nm for each of the molecules in the complex (916940 M−1 cm−1) by the sum of the molecular weights of each of the molecules in the complex (450832 g/mole).
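The concentration calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the example A280 reading are hypothetical, while the extinction coefficient and molecular weight are the values stated in the text.

```python
# Values stated in the text for the full Cascade RNP complex.
CASCADE_EXT_COEFF = 916940.0   # summed extinction coefficient at 280 nm, M^-1 cm^-1
CASCADE_MOL_WEIGHT = 450832.0  # summed molecular weight, g/mole


def a280_to_mg_per_ml(a280, ext_coeff, mol_weight, path_cm=1.0):
    """Convert an A280 reading to mg/mL (hypothetical helper name).

    ext_coeff / mol_weight is the absorbance of a 1 g/L (0.1%) solution
    in a 1 cm path, so dividing A280 by it yields g/L, i.e. mg/mL.
    """
    abs_0_1_percent = ext_coeff / mol_weight
    return a280 / (abs_0_1_percent * path_cm)


abs_0_1_percent = CASCADE_EXT_COEFF / CASCADE_MOL_WEIGHT
print(round(abs_0_1_percent, 2))  # 2.03, matching the value in the text

# Hypothetical reading: an A280 of 0.61 measured on a diluted sample.
print(a280_to_mg_per_ml(0.61, CASCADE_EXT_COEFF, CASCADE_MOL_WEIGHT))
```

The same function applies to the other preparations in this Example when their stated extinction coefficients and molecular weights are substituted.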
- Additionally, the final product was analyzed by SDS-PAGE with Coomassie blue staining to confirm that each protein component was present in approximately the correct stoichiometry, and to assess the presence of contaminating proteins. SDS-PAGE gels were stained with a Coomassie InstantBlue™ (Expedeon, San Diego, Calif.) stain. Gels were imaged using a Gel Doc™ EZ imager (Bio-Rad, Hercules, Calif.) and annotated using ImageLab software (Bio-Rad, Hercules, Calif.).
- In view of the teachings of the Specification and the Examples, this method for purification of E. coli Type I-E Cascade complexes can be applied to the production of other purified Type I Cascade complexes.
- B. Purification of Cascade Complexes Comprising Cas7, Cas6, Cas5, and Cse2 Proteins
- A Cascade complex composed of a guide RNA and the protein components Cas7, Cas6, Cas5, and Cse2 was purified. The L3 guide RNA (Example 2, SEQ ID NO:445, Table 15) was expressed from a first plasmid (Example 2,
FIG. 25 ) essentially as described in Example 4.B. The Cascade proteins were expressed from a second plasmid (Example 2, SEQ ID NO:440, Table 14,FIG. 24C ) essentially as described in Example 4B. - The complex was captured using affinity chromatography. Re-suspended cell pellets were thawed on ice. In a 50 mL conical tube, the volume was brought up to 35 mL with an additional 15 mL of lysis buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, 1 mM TCEP, and supplemented with 1 Complete™ protease inhibitor tablet (Roche, Basel, Switzerland) per 50 mL of lysis buffer. The 50 mL conical tube was placed in an ice water bath, and the cells were lysed by six rounds of sonication using a Q500 sonicator with a ½ inch tip (Qsonica, Newtown, Conn.). Each round of sonication consisted of a 1 minute treatment cycle with repeating cycles of 3 seconds of sonication at 90% amplitude followed by 9 seconds of rest. The tube was allowed to cool in the ice water bath for one minute between rounds of sonication. The lysate was clarified by centrifugation at 48,384 RCF for 30 minutes at 4° C. The clarified supernatant was affinity purified by addition of Strep-Tactin® Sepharose® resin (IBA Life Sciences, Gottingen, Germany) that had been pre-equilibrated with Strep-wash buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 1 mM EDTA, 5% glycerol, and 1 mM TCEP. A 0.55 mL bed volume of affinity resin was used for each 1 L of E. coli expression culture. After one hour of incubation at 4° C. with gentle mixing, the sample was poured onto a 30 mL disposable gravity flow column (Bio-Rad, Hercules, Calif.) allowing the unbound material to flow through the column. The resin was washed five times with five bed volumes of Strep-wash buffer. Finally, the bound proteins were eluted with two sequential additions of five bed volumes of Strep-elution buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 2.5 mM Desthiobiotin, 5% glycerol, 1 mM EDTA, and 1 mM TCEP.
- The affinity eluate was further purified by SEC. The affinity eluate was concentrated to a final volume of 550 μL by ultrafiltration at 12° C. using an Amicon® spin concentrator with an Ultracel®-50 membrane (Millipore Sigma, Billerica, Mass.). The concentrated sample was filtered using a 0.22 μm 13 mm UltraCruz® PVDF syringe filter (Santa Cruz Biotechnology, Dallas, Tex.) before being further purified by separation at 4° C. with a flow rate of 0.4 mL/minute on a
HiPrep™ 16/60 Sephacryl® S-300 column (GE Healthcare, Uppsala, Sweden) equilibrated with SEC buffer composed of 50 mM Tris pH 7.5, 500 mM NaCl, 5% glycerol, 0.1 mM EDTA, and 1 mM TCEP. Protein was eluted with SEC buffer and 0.75 mL fractions were collected. The earliest eluting peak, as judged by UV280, was assumed to be high molecular weight aggregated material and the corresponding fractions were discarded. Fractions corresponding to the second peak (a shoulder on the back side of the first UV280 peak) were pooled.
- The pooled samples were exchanged into storage buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 5% glycerol, 0.1 mM EDTA, and 1 mM TCEP by concentrating down to 200 μL with an Amicon® spin concentrator with an Ultracel®-50 membrane (Millipore Sigma, Billerica, Mass.) and then diluting 75-fold with storage buffer. The sample was concentrated a second time to 700 μL and again diluted 20-fold with storage buffer. Finally, the sample was concentrated to 4.7 mg/mL in the same ultrafiltration device and stored at −80° C.
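The effect of the concentrate-and-dilute buffer exchange described above can be estimated with a short calculation. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that small buffer components pass freely through the ultrafiltration membrane, so that re-concentration does not change their concentration.

```python
def residual_fraction(dilution_folds):
    """Fraction of the original buffer remaining after successive dilutions."""
    fraction = 1.0
    for fold in dilution_folds:
        fraction /= fold
    return fraction


def nacl_after_exchange(start_mM, target_mM, dilution_folds):
    """Approximate NaCl concentration (mM) after exchange into the target buffer."""
    return target_mM + (start_mM - target_mM) * residual_fraction(dilution_folds)


# Two rounds (75-fold, then 20-fold) from 500 mM NaCl SEC buffer
# into 100 mM NaCl storage buffer, as described for this complex.
two_round = nacl_after_exchange(500.0, 100.0, [75, 20])

# A single 50-fold round, as described in section A for the full complex.
one_round = nacl_after_exchange(500.0, 100.0, [50])

print(round(two_round, 2), round(one_round, 2))
```

Under these assumptions, the two-round exchange leaves 1/1500 of the original SEC buffer, whereas a single 50-fold round leaves 1/50.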
- The final purified product was analyzed spectrophotometrically to determine the final concentration of the Cascade RNP complexes and to confirm the presence of a nucleic acid component as evidenced by an absorbance at 260 nm that is greater than the absorbance at 280 nm. The concentration of the Cascade RNP complexes was determined by dividing the absorbance at 280 nm by the calculated absorbance of a 0.1% solution of the intact complex with a 1 cm path length. The predicted absorbance of a 0.1% solution of the purified complex is 2.18 cm−1 and was calculated by dividing the sum of the calculated extinction coefficients at 280 nm for each molecule in the complex (762240 M−1 cm−1) by the sum of the molecular weights of each molecule in the complex (348952.07 g/mole).
- Additionally, the final product was analyzed by SDS-PAGE with Coomassie blue staining to confirm that each Cascade protein was present in approximately the correct stoichiometry, and to assess the presence of contaminating proteins. SDS-PAGE gels were stained with Coomassie InstantBlue™ (Expedeon, San Diego, Calif.) stain. Gels were imaged using a Gel Doc™ EZ imager (Bio-Rad, Hercules, Calif.) and annotated using ImageLab software (Bio-Rad, Hercules, Calif.). Each properly formed complex contained six molecules of Cas7, one molecule each of Cas6 and Cas5, and two molecules of Cse2.
- C. Purification of FokI-Cas8 Fusion Protein
- This section describes a method used to purify a fusion protein comprising a FokI nuclease fused to the E. coli Type I-E Cas8 protein from bacterial over-expression pellets. The method uses immobilized metal affinity chromatography, followed by cation exchange chromatography (CIEX), and finally size exclusion chromatography (SEC).
- The E. coli Type I-E FokI-Cas8 fusion protein, including a linker sequence, is described in Example 1 (SEQ ID NO:413, Table 11). The expression plasmid is described in Example 2 (SEQ ID NO:439, Table 14,
FIG. 24B). Cells comprising the fusion protein were produced essentially as described in Example 4A. The Cas8 fusion protein contained an N-terminal His6 tag, a maltose binding protein domain, a TEV cleavage site, a FokI nuclease domain, and a 30 amino acid linker. The protein was captured using immobilized metal affinity chromatography. A 50 mL conical tube containing the re-suspended cell pellets was thawed on ice. The tube was then placed in an ice water bath, and the cells were lysed by sonication using a Q500 sonicator with a ¼ inch tip (Qsonica, Newtown, Conn.) for a treatment cycle of three minutes with repeating cycles of 10 seconds of sonication at 40% amplitude followed by 20 seconds of rest. The lysates were clarified by centrifugation at 30,970 RCF for 30 minutes at 4° C. The clarified supernatant was then added to Hispur™ Ni-NTA resin (Thermofisher, Waltham, Mass.) that had been pre-equilibrated with Ni-wash buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 10 mM imidazole, 5% glycerol, and 1 mM TCEP. A 2 mL bed volume of nickel affinity resin was used for 1 L of E. coli expression culture. After one hour of incubation at 4° C. with gentle mixing, the sample was poured onto a 30 mL disposable gravity flow column (Bio-Rad, Hercules, Calif.), allowing the unbound material to flow through the column. The resin was washed five times with five bed volumes of Ni-wash buffer. Finally, the bound proteins were eluted with five bed volumes of Ni-elution buffer composed of 50 mM Tris pH 7.5, 100 mM NaCl, 300 mM imidazole, 5% glycerol, and 1 mM TCEP.
- The nickel affinity eluate was treated with TEV protease to remove the affinity tag. TEV protease was added to the eluate at a ratio of 1:25 (w/w). The sample, including TEV, was dialyzed overnight against Ni-wash buffer using a 12 mL Slide-A-Lyzer™ 10K MWCO dialysis cassette (Thermofisher, Waltham, Mass.).
- The TEV protease and the cleaved His6-MBP fragment were removed from the dialyzed sample by Ni affinity chromatography. The dialyzed sample was poured over a clean Hispur™ Ni-NTA resin (Thermofisher, Waltham, Mass.) column equilibrated with Ni-wash buffer. The resin was then washed with 1 column volume of Ni-wash buffer. The flow through and wash were combined, concentrated, and exchanged into storage buffer (50 mM Tris pH 7.5, 500 mM NaCl, 5% glycerol, and 1 mM TCEP) using an Amicon® spin concentrator with an Ultracel®-10 membrane (Millipore Sigma, Billerica, Mass.). This sample was then frozen at −80° C. for storage.
- The sample was further purified by cation exchange chromatography (CIEX). The sample was thawed on ice and diluted 10-fold from 0.475 mL to 4.75 mL with cold CIEX_A buffer composed of 50 mM Tris pH 7.5, 5% glycerol, and 1 mM TCEP, resulting in a final concentration of 50 mM NaCl. A 10 mL capillary loop was used to load the sample onto a 1 mL Hitrap™ SP HP column (GE Healthcare, Uppsala, Sweden), equilibrated with a buffer comprising CIEX_A buffer and 5% CIEX_B buffer (50 mM Tris pH 7.5, 1 M NaCl, 5% glycerol, and 1 mM TCEP). The flow rate throughout the separation was 0.75 mL/min. The loop was emptied onto the column with 15 mL of 5% CIEX_B buffer. The unbound sample was washed out with an additional 2 mL of 5% CIEX_B buffer. 500 μL fractions were collected as the bound proteins were eluted with an 8 mL linear gradient from 5% to 65% CIEX_B buffer. There were two major UV280 elution peaks. The four fractions corresponding to the first of those two peaks were pooled. The total pooled volume was 2 mL.
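The salt concentrations implied by the cation exchange steps above can be worked out as follows. This Python sketch is illustrative only: the function names are hypothetical, and it assumes ideal linear mixing of the two buffers with no system dead volume or gradient delay.

```python
def nacl_mM(percent_b, buffer_a_mM=0.0, buffer_b_mM=1000.0):
    """NaCl concentration of a mixture containing percent_b % CIEX_B buffer.

    CIEX_A contains no NaCl and CIEX_B contains 1 M NaCl, per the text.
    """
    return buffer_a_mM + (buffer_b_mM - buffer_a_mM) * percent_b / 100.0


def gradient_fractions(start_pct, end_pct, gradient_mL, fraction_mL):
    """Return (fraction number, mid-fraction NaCl in mM) along a linear gradient."""
    n_fractions = int(round(gradient_mL / fraction_mL))
    rows = []
    for i in range(n_fractions):
        mid_mL = (i + 0.5) * fraction_mL
        pct = start_pct + (end_pct - start_pct) * mid_mL / gradient_mL
        rows.append((i + 1, nacl_mM(pct)))
    return rows


# 10-fold dilution of the 500 mM NaCl storage buffer with salt-free CIEX_A.
loaded_nacl = 500.0 / 10

# 8 mL linear gradient from 5% to 65% CIEX_B, collected in 0.5 mL fractions.
fractions = gradient_fractions(5, 65, 8.0, 0.5)

print(loaded_nacl, nacl_mM(5), nacl_mM(65), len(fractions))
```

Under these assumptions the gradient spans 50 mM to 650 mM NaCl over 16 fractions, so the diluted sample is loaded at the same salt concentration as the equilibration buffer.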
- The pooled CIEX fractions were further purified by SEC. The pooled CIEX fractions were concentrated to a final volume of 0.3 mL by ultrafiltration at 12° C. using an Amicon® spin concentrator with an Ultracel®-10 membrane (Millipore Sigma, Billerica, Mass.). The concentrated sample was filtered using a 0.22 μm Ultrafree-MC GV Centrifugal spin filter (Millipore Sigma, Billerica, Mass.), and further purified by separation at 4° C. with a flow rate of 0.6 mL/minute on a 10/300
Superdex™ 200 GL Increase column (GE Healthcare, Uppsala, Sweden) equilibrated with a Cas8 SEC buffer (50 mM Tris pH 7.5, 200 mM NaCl, 5% glycerol, and 1 mM TCEP). The protein was eluted with the Cas8 SEC buffer and 0.5 mL fractions were collected. The earliest eluting peak, as judged by UV280, was assumed to be high molecular weight aggregated material and the corresponding fractions were discarded. A second major UV280 peak was eluted after about 14 mL. The fractions corresponding to this second peak were pooled. The pooled samples were concentrated to 40 μL with an Amicon® spin concentrator with an Ultracel®-3 membrane (Millipore Sigma, Billerica, Mass.). The concentrated sample was stored at −80° C.
- The final purified product was analyzed spectrophotometrically to determine the final concentration of the fusion protein and to confirm the absence of a significant nucleic acid component as evidenced by an absorbance at 280 nm that is greater than the absorbance at 260 nm. The concentration of the FokI-Cas8 fusion was determined by dividing the absorbance at 280 nm by the calculated absorbance of a 0.1% solution of the intact fusion protein. The predicted absorbance of a 0.1% solution of the purified fusion protein is 1.05 cm−1 and was calculated by dividing the extinction coefficient at 280 nm for the FokI-Cas8 fusion (86290 M−1 cm−1) by its molecular weight (82171.32 g/mole). Additionally, the final product was analyzed by SDS-PAGE gels stained with InstantBlue™ stain (Expedeon, San Diego, Calif.). Gels were imaged using a Gel Doc™ EZ imager (Bio-Rad, Hercules, Calif.) and annotated using ImageLab software (Bio-Rad, Hercules, Calif.). This analysis demonstrates that the purified fusion protein was the expected size and that only a low level of contaminating proteins was present.
- Double-stranded DNA (dsDNA) target sequences for use in in vitro DNA binding or cleavage assays with Cascade or Cascade-fusion effector complexes can be produced using several different methods. This Example describes three methods to produce target sequences, including annealing of synthetic single-stranded DNA (ssDNA) oligonucleotides, PCR amplification of selected nucleic acid target sequences from genomic DNA, and/or cloning of nucleic acid target sequences into bacterial plasmids. The dsDNA target sequences were used in Cascade binding or cleavage assays.
- A. Production of dsDNA Target Sequences by Annealing Synthetic Single-Stranded DNA Oligonucleotides
- DNA oligonucleotides encoding the target region of interest comprising the target sequence, also known as the protospacer, that is recognized by the guide portion of CRISPR RNA, the neighboring protospacer adjacent motif (PAM), and additional 5′ and 3′ flanking sequences were purchased from a commercial manufacturer (Integrated DNA Technologies, Coralville, Iowa). Two oligonucleotides were ordered per construct, one comprising the sense strand and one comprising the antisense strand. Table 21 lists oligonucleotide sequences that were ordered to contain a target sequence denoted J3, which is derived from bacteriophage lambda genomic DNA. The target and PAM sequences are flanked by 20 bp of additional sequence on both the 5′ and 3′ ends.
-
TABLE 21 Single-stranded DNA Oligonucleotides
SEQ ID NO: | Description | Sequence
SEQ ID NO: 466 | Forward oligo, J3 target sequence | ATCATCCTCCTGACAATTTTGACAGCCCACATGGCATTCCACTTATCACTGGCATCTTTAAAAGCCAGGACGGTC
SEQ ID NO: 467 | Reverse oligo, J3 target sequence | GACCGTCCTGGCTTTTAAAGATGCCAGTGATAAGTGGAATGCCATGTGGGCTGTCAAAATTGTCAGGAGGATGAT
- The oligonucleotides were annealed by mixing both oligonucleotides at equimolar concentration (10 μM) in 1× annealing buffer (6 mM HEPES, pH 7.0, and 60 mM KCl), heating at 95° C. for 2 minutes, and then slow cooling. Annealed oligonucleotides were then used directly in DNA binding and/or DNA cleavage assays with Cascade and/or Cascade-effector domain fusion RNPs.
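Because the two oligonucleotides in Table 21 are annealed to form a dsDNA target, each is expected to be the reverse complement of the other. The following Python sketch shows one way such a pair could be checked before ordering; the function and variable names are hypothetical, and the sequences are transcribed from Table 21.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# SEQ ID NO: 466, forward oligo, J3 target sequence (from Table 21).
FWD_J3 = (
    "ATCATCCTCCTGACAATTTTGACAGCCCA"
    "CATGGCATTCCACTTATCACTGGCATCTT"
    "TAAAAGCCAGGACGGTC"
)

# SEQ ID NO: 467, reverse oligo, J3 target sequence (from Table 21).
REV_J3 = (
    "GACCGTCCTGGCTTTTAAAGATGCCAGTG"
    "ATAAGTGGAATGCCATGTGGGCTGTCAAA"
    "ATTGTCAGGAGGATGAT"
)

fully_complementary = reverse_complement(FWD_J3) == REV_J3
print(len(FWD_J3), len(REV_J3), fully_complementary)
```

A check of this kind flags transcription errors in either strand, since a single mismatch makes the comparison fail.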
- 5′ Cy5 fluorescently-labeled DNA oligonucleotides encoding the target region of interest comprising the target sequence, also known as the protospacer, that is recognized by the guide portion of CRISPR RNA, the neighboring protospacer adjacent motif (PAM), and additional 5′ and 3′ flanking sequences, were purchased from a commercial manufacturer (Integrated DNA Technologies, Coralville, Iowa). Four oligonucleotides were ordered per construct: one comprising the 5′ fluorescently-labeled sense strand, one comprising the unlabeled sense strand, one comprising the 5′ fluorescently-labeled antisense strand, and one comprising the unlabeled antisense strand. The target and PAM sequences are flanked by 20 bp of additional sequence on both the 5′ and 3′ ends.
- Table 22 lists oligonucleotide sequences that were ordered to contain a target sequence denoted J3, which was derived from bacteriophage lambda genomic DNA and a control target sequence denoted CCR5, which was derived from the human CCR5 locus.
-
TABLE 22 Single-stranded DNA (ssDNA) Oligonucleotides for Fluorescently Labeled dsDNA Target Sequence Formation
SEQ ID NO: | Description | Sequence
SEQ ID NO: 468 | target strand J3 | 5′-CGCCGAGCTCGAATTCTTTTGACAGCCCACATGGCATTCCACTTATCACTGGCATGGATCCTGGCTGTGGTGATG
SEQ ID NO: 469 | non target strand J3 | 5′-CATCACCACAGCCAGGATCCATGCCAGTGATAAGTGGAATGCCATGTGGGCTGTCAAAAGAATTCGAGCTCGGCG
SEQ ID NO: 470 | target strand CCR5 Site | 5′-CGCCGAGCTCGAATTCTTTTTAGGTACCTGGCTGTCGTCCATGCTGTGTTTGCATGGATCCTGGCTGTGGTGATG
SEQ ID NO: 471 | non target strand CCR5 | 5′-CATCACCACAGCCAGGATCCATGCAAACACAGCATGGACGACAGCCAGGTACCTAAAAAGAATTCGAGCTCGGCG
SEQ ID NO: 472 | target strand J3 | 5′-Cy5-CGCCGAGCTCGAATTCTTTTGACAGCCCACATGGCATTCCACTTATCACTGGCATGGATCCTGGCTGTGGTGATG
SEQ ID NO: 473 | non target strand J3 | 5′-Cy5-CATCACCACAGCCAGGATCCATGCCAGTGATAAGTGGAATGCCATGTGGGCTGTCAAAAGAATTCGAGCTCGGCG
SEQ ID NO: 474 | target strand CCR5 Site | 5′-Cy5-CGCCGAGCTCGAATTCTTTTTAGGTACCTGGCTGTCGTCCATGCTGTGTTTGCATGGATCCTGGCTGTGGTGATG
- The oligonucleotides were annealed by mixing a labeled and an unlabeled oligonucleotide, two labeled oligonucleotides, or two unlabeled oligonucleotides at equimolar concentration (1 μM) in 1× annealing buffer (6 mM HEPES, pH 7.0, 60 mM KCl), heating at 95° C. for 2 minutes, and then slow cooling. Annealed oligonucleotides were then used directly in DNA binding assays with Cascade and/or Cascade-effector domain fusion RNPs. Cy5 fluorescently-labeled DNA oligonucleotides were imaged with an AZURE c600 Bioimager (Azure BioSystems, Dublin, Calif.).
- This method can be applied to produce additional labeled or unlabeled target or dual-target sequences, whereby a dual target is defined as a target that contains two protospacer sequences targeted by individual Cascade molecules, separated by an interspacer sequence.
- B. Production of dsDNA Target Sequences by PCR Amplification from Genomic DNA
- Double-stranded DNA target sequences for dual targets derived from human genomic DNA were produced using PCR amplification directly from genomic DNA template material. Specifically, PCR reactions contained human genomic DNA purified from K562 cells and Q5 Hot Start High-
Fidelity 2× Master Mix (New England Biolabs, Ipswich, Mass.), as well as the primers listed in Table 23, where the underlined portions correspond to primer binding sites within genomic DNA. -
TABLE 23 Primers for PCR Amplification
SEQ ID NO: | Description | Sequence
SEQ ID NO: 475 | Forward primer to amplify Hsa07 dual-target from human genomic DNA | CACTCTTTCCCTACACGACGCTCTTCCGATCTTTCCTCCCTAACCTCCACCT
SEQ ID NO: 476 | Reverse primer to amplify Hsa07 dual-target from human genomic DNA | GGAGTTCAGACGTGTGCTCTTCCGATCTTAAAGAGCCCAACCAGATGC
- PCR was performed according to the manufacturer's instructions (New England Biolabs, Ipswich, Mass.), and the desired product DNA, 288 bp in length, was purified using a Nucleospin Gel and PCR Cleanup kit (Macherey-Nagel, Bethlehem, Pa.). This dsDNA was then used directly in DNA binding and/or DNA cleavage assays with Cascade and/or Cascade-effector domain fusion RNPs.
- C. Production of dsDNA Target Sequences by Cloning Target Sequences into Bacterial Plasmids
- DNA oligonucleotides encoding the target region of interest comprising the target sequence, also known as the protospacer, that is recognized by the guide portion of CRISPR RNA, the neighboring protospacer adjacent motif (PAM), and additional 5′ and 3′ flanking sequences were purchased from a commercial manufacturer (Integrated DNA Technologies, Coralville, Iowa). The oligonucleotides were designed such that, when annealed, the termini regenerate sticky ends upon cleavage of their respective recognition sites by the restriction enzymes EcoRI and BlpI, or by BamHI and EcoRI. Oligonucleotides were designed to contain a single target sequence derived from the bacteriophage lambda genome, denoted J3. In addition, oligonucleotides were designed to contain two tandem target sequences derived from the bacteriophage lambda genome, denoted J3 and L3, separated from each other by a 15-bp interspacer sequence. Sequences of these oligonucleotides are listed in Table 24.
-
TABLE 24 Oligonucleotides Comprising Target Sequences
SEQ ID NO: | Description | Restriction enzyme recognition sites | Sequence
SEQ ID NO: 477 | Forward oligonucleotide, J3 target sequence for cloning into pACYC-Duet1 | BamHI and EcoRI | GATCCATGCCAGTGATAAGTGGAATGCCATGTGGGCTGTCAAAAG
SEQ ID NO: 478 | Reverse oligonucleotide, J3 target sequence for cloning into pACYC-Duet1 | BamHI and EcoRI | AATTCTTTTGACAGCCCACATGGCATTCCACTTATCACTGGCATG
SEQ ID NO: 479 | Forward oligonucleotide, J3-15bp-L3 target sequences for cloning into pACYC-Duet1 | EcoRI and BlpI | AATTCTTTTGACAGCCCACATGGCATTCCACTTATCACTGGCATCCTAGGCCTCTCGAGATGAGTGGCAGATATAGCCTGGTGGTTCAGGCGGCGCATGC
SEQ ID NO: 480 | Reverse oligonucleotide, J3-15bp-L3 target sequences for cloning into pACYC-Duet1 | EcoRI and BlpI | TCAGCATGCGCCGCCTGAACCACCAGGCTATATCTGCCACTCATCTCGAGAGGCCTAGGATGCCAGTGATAAGTGGAATGCCATGTGGGCTGTCAAAAG
- The oligonucleotides contain 5′-phosphorylated ends, which were either introduced by the commercial manufacturer or added in-house using T4 polynucleotide kinase (New England Biolabs, Ipswich, Mass.). The oligonucleotides were then annealed at a final concentration of 1 μM by mixing together equimolar amounts in annealing buffer (6 mM HEPES, pH 7.0, 60 mM KCl), heating to 95° C. for 2 minutes, and then slow-cooling on the benchtop.
- Separately, a pACYC-Duet1 (MilliporeSigma, Hayward, Calif.) plasmid was double-digested with the corresponding pair of restriction enzymes, either BamHI and EcoRI, or EcoRI and BlpI, whose sticky ends match the sticky ends formed by the termini of the hybridized oligonucleotides. The double-digested vector was separated from the removed insert using agarose gel electrophoresis.
- In order to clone the hybridized oligonucleotides into the double-digested vector, the hybridized oligonucleotides were diluted to a 50 nM stock concentration, and then a 10 μL ligation reaction was assembled using hybridized oligonucleotides, the double-digested vector, and Quick Ligase from New England Biolabs. The ligation reaction was then used to transform chemically competent E. coli strains, and after overnight growth on agar plates, individual clones were isolated and grown in liquid culture to generate sufficient bacterial cultures from which to isolate plasmids. Sanger sequencing was then used to validate the desired plasmid sequence. Table 25 provides complete vector sequences for plasmids containing the J3 target sequence (SEQ ID NO:481) and plasmids containing the J3 and L3 target sequences separated by the 15-bp interspacer sequence (SEQ ID NO:482).
-
TABLE 25 Complete Plasmid Sequences
SEQ ID NO: | Description of plasmid
SEQ ID NO: 481 | J3 target sequence in pACYC-Duet1
SEQ ID NO: 482 | J3-15bp-L3 target sequences in pACYC-Duet1
SEQ ID NO: 483 | J3-30bp-L3 target sequences in pACYC-Duet1
SEQ ID NO: 484 | multi-target plasmid
- Further cloning manipulations were used to generate additional double-target plasmid constructs. The 15-bp interspacer sequence of SEQ ID NO:482 contains unique AvrII and XhoI restriction sites. Thus, introduction of additional hybridized oligonucleotides into these restriction sites expands the interspacer to longer lengths, for biochemical testing with purified Cascade and Cascade-nuclease fusion RNPs. Because the crRNA-guided FokI-Cascade fusion complex targets two adjacent DNA sites, dimerization of the FokI domains from adjacent DNA-bound complexes leads to DNA cleavage within the interspacer separating the two target sites. Variable interspacer lengths were designed and tested to evaluate a given interspacer length with a given tethering geometry between the FokI nuclease domain and its fused Cascade subunit protein. The complete vector sequence for a target DNA substrate containing an expanded interspacer sequence of 30 bp in length is given in Table 25 as SEQ ID NO:483.
- In addition, the following cloning strategy provided a plasmid substrate that contains several target sequences serially connected along one large insert. A gene block was ordered from a commercial manufacturer (Integrated DNA Technologies, Coralville, Iowa) that contained 17 consecutive dual targets. The gene block contained 4 bp separating each dual target from a neighboring dual target, and contained 16 dual targets derived from Homo sapiens genomic DNA, as well as one control dual target containing J3/L3 targets derived from the bacteriophage lambda genome. The genomic coordinates of the 16 consecutive human dual targets are shown in Table 26. The gene block was ordered with flanking SacI and SbfI restriction sites on the ends, such that it could be cloned into SacI and SbfI sites in the pACYC-Duet1 vector. The full vector sequence of the multi-target plasmid substrate generated by cloning the gene block into pACYC-Duet1 is presented as SEQ ID NO:484 in Table 25. This multi-target sequence plasmid allowed for biochemical testing of multiple different FokI-Cascade preparations harboring crRNAs targeting one of the serially connected target sites within the plasmid.
-
TABLE 26 Human Dual Targets
SEQ ID NO: | Target name | Gene | 5′ spacer target genomic coordinates | 3′ spacer target genomic coordinates
SEQ ID NO: 485 | Hsa01 | PDCD1 | chr2: 241850348-241850382 | chr2: 241850408-241850442
SEQ ID NO: 486 | Hsa02 | CTLA4 | chr2: 203870664-203870698 | chr2: 203870724-203870758
SEQ ID NO: 487 | Hsa03 | TRAC | chr14: 22509340-22509374 | chr14: 22509405-22509439
SEQ ID NO: 488 | Hsa04 | TRAC | chr14: 22509785-22509819 | chr14: 22509850-22509884
SEQ ID NO: 489 | Hsa05 | TRAC | chr14: 22513932-22513966 | chr14: 22513997-22514031
SEQ ID NO: 490 | Hsa06 | TRAC | chr14: 22515993-22516027 | chr14: 22516058-22516092
SEQ ID NO: 491 | Hsa07 | TRAC | chr14: 22516265-22516299 | chr14: 22516330-22516364
SEQ ID NO: 492 | Hsa08 | CD52 | chr1: 26320402-26320436 | chr1: 26320467-26320501
SEQ ID NO: 493 | Hsa09 | CTLA4 | chr2: 203873012-203873046 | chr2: 203873077-203873111
SEQ ID NO: 494 | Hsa10 | CTLA4 | chr2: 203873195-203873229 | chr2: 203873260-203873294
SEQ ID NO: 495 | Hsa11 | TRAC | chr14: 22551630-22551664 | chr14: 22551700-22551734
SEQ ID NO: 496 | Hsa12 | CTLA4 | chr2: 203872758-203872792 | chr2: 203872828-203872862
SEQ ID NO: 497 | Hsa13 | TRAC | chr14: 22551862-22551896 | chr14: 22551937-22551971
SEQ ID NO: 498 | Hsa14 | TRBC2 | chr7: 142801112-142801146 | chr7: 142801187-142801221
SEQ ID NO: 499 | Hsa15 | TRAC | chr14: 22551630-22551664 | chr14: 22551710-22551744
SEQ ID NO: 500 | Hsa16 | CTLA4 | chr2: 203867814-203867848 | chr2: 203867894-203867928
- This Example illustrates the use of FokI-Cascade fusion protein complexes in biochemical double-stranded DNA (dsDNA) cleavage assays. Protein reagents were compared in terms of their activity in dsDNA cleavage.
- FokI-Cascade RNPs derived from the E. coli Type I-E Cascade system were designed, recombinantly expressed in E. coli, and purified for use, as outlined in Examples 1, 2, and 5. These RNPs were designed to contain either CRISPR RNAs that target the J3 and L3 target sequences derived from bacteriophage lambda genomic DNA, or that target an intron in the TRAC gene within human genomic DNA. Each RNP preparation is a heterogeneous mixture comprising two FokI-Cascade complexes that are otherwise identical except for the guide portion of the crRNA.
- A FokI-Cascade complex was reconstituted by mixing together a CasBCDE complex (produced using SEQ ID NO:440 and SEQ ID NO:446, as described in Example 2) with purified FokI-Cas8 comprising a 16-aa linker (the general FokI-Cas8 expression vector sequence is described in Example 2, SEQ ID NO:439 in Table 14; the particular 16-aa linker is in Example 1, SEQ ID NO:431 in Table 12). Reconstitution was performed in 1× Cascade Cleavage Buffer (20 mM Tris-Cl, pH 7.5, 200 mM NaCl, 5 mM MgCl2, 1 mM TCEP, 5% glycerol) with CasBCDE and FokI-Cas8 both at 1 μM final concentrations.
- DNA cleavage assays were set up as follows. A plasmid substrate comprising the J3/L3 double-target sequence with a 30-bp interspacer (SEQ ID NO:483 in Table 25) was incubated with varying concentrations of FokI-Cascade complex (3-100 nM) in a 15 μL reaction in 1× Cascade Cleavage Buffer, with the plasmid DNA at a final concentration of 13.3 ng/μL. Reactions were incubated for 30 minutes at 37° C., after which 3 μL of 6× SDS loading dye was added to denature bound FokI-Cascade complexes. The reaction mixture components were resolved by 0.8% agarose gel electrophoresis. Gels were stained after electrophoresis with SYBR™ Safe DNA Gel Stain (Thermo Scientific, Wilmington, Del.).
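To put the 3-100 nM complex titration in context, the plasmid mass concentration can be converted to an approximate molar concentration. The following is a minimal sketch, not part of the described method. It assumes an average mass of 660 g/mol per base pair (a common approximation) and a plasmid length of 3784 bp, taken as the sum of the 2427 bp and 1357 bp fragments reported for the FokI-Cascade/AfeI double digest of this substrate. The function name is hypothetical. Under these assumptions, 13.3 ng/μL corresponds to roughly 5 nM plasmid.

```python
# Assumed average molar mass of one double-stranded DNA base pair (g/mol).
AVG_BP_MASS = 660.0


def plasmid_nanomolar(ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to a molar concentration (nM)."""
    grams_per_litre = ng_per_ul * 1e-3  # 1 ng/uL equals 1e-3 g/L
    return grams_per_litre / (length_bp * AVG_BP_MASS) * 1e9


# Assumption: plasmid length is the sum of the two fragment sizes reported
# for the FokI-Cascade/AfeI double digest (2427 bp + 1357 bp).
PLASMID_BP = 2427 + 1357

plasmid_nm = plasmid_nanomolar(13.3, PLASMID_BP)
print(f"plasmid: {plasmid_nm:.2f} nM")
for rnp_nm in (3, 100):
    # Molar ratio of complex to plasmid at the ends of the titration range.
    print(f"{rnp_nm} nM complex -> {rnp_nm / plasmid_nm:.1f} complexes per plasmid")
```

On these assumptions the lowest complex concentration is sub-stoichiometric relative to the plasmid, while the highest is in large molar excess.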
- As a positive control, Streptococcus pyogenes Cas9 protein was programmed with a single-guide RNA (sgRNA) targeting a 20-bp portion of the Cascade J3 target sequence (sgRNA-J3; the spacer sequence is presented as SEQ ID NO:501). Cas9/sgRNA-J3 complexes were reconstituted by mixing Cas9 together with a 2-fold molar excess of sgRNA in 1×CCE buffer (20 mM HEPES pH 7.4, 10 mM MgCl2, 150 mM KCl, 5% glycerol). Cleavage by this Cas9/sgRNA-J3 complex was evaluated across the same concentration range (3-100 nM) by incubating reactions for 30 minutes at 37° C. Also included in the experiment were control lanes containing uncut plasmid DNA, as well as plasmid DNA linearized with the NheI restriction enzyme (New England Biolabs, Ipswich, Mass.). Target DNA cleavage is evidenced by a mobility shift in the plasmid, because uncut plasmid DNA is supercoiled and has a faster mobility than cleaved, linearized plasmid DNA. Nicked, open-circular plasmid DNA has a slower mobility than both supercoiled and linearized plasmid DNA.
- The data obtained from these experiments demonstrate that, over the concentration range tested, the FokI-Cascade complex exhibited target DNA cleavage activity similar to that of Cas9-sgRNA. At the highest concentration tested (100 nM), the plasmid target was quantitatively linearized by both the FokI-Cascade complex and Cas9-sgRNA.
- FokI-Cascade complex reagents were also tested for their kinetics of target DNA cleavage. A plasmid substrate containing the J3/L3 double-target sequence with a 30-bp interspacer (SEQ ID NO:483) was incubated with 200 nM FokI-Cascade complex or 200 nM Cas9-sgRNA in a 15 μL reaction, with the plasmid DNA at a final concentration of 13.3 ng/μL. Reactions were quenched at 0, 7, 10, 15, 20, 25, or 30 minutes, and reaction components were resolved by agarose gel electrophoresis as described above. The FokI-Cascade complex cleaved target DNA at a rate similar to, but slightly slower than, that of the Cas9/sgRNA-J3 complex: the target plasmid was quantitatively linearized by the 25-minute time point for the FokI-Cascade complex and by the 20-minute time point for the Cas9/sgRNA-J3 complex.
- FokI-Cascade complex reagents were also tested for their non-specific DNA cleavage and/or nicking activity on the pACYC-Duet1 non-target plasmid substrate, versus specific DNA cleavage of the J3/L3 double-target plasmid substrate. Table 27 contains the sequence of the pACYC-Duet1 non-target plasmid substrate used for this control (SEQ ID NO:502). Specifically, the dependence of non-specific and specific DNA target cleavage was investigated as a function of the monovalent salt concentration in the reaction buffer. Modified variants of the 1× Cascade Cleavage Buffer (20 mM Tris-Cl, pH 7.5, 200 mM NaCl, 5 mM MgCl2, 1 mM TCEP, and 5% glycerol) were prepared, in which the NaCl concentration was lowered from 200 mM to 150 mM, 100 mM, or 50 mM, and the same cleavage reactions as described above were performed by incubating 200 nM FokI-Cascade complex with either 13.3 ng/μL of the J3/L3 target plasmid or 13.3 ng/μL of the pACYC-Duet1 non-target plasmid. Additional control reactions were performed in which the NaCl concentration was maintained at 100 mM, but the 5 mM MgCl2 was replaced with 10 mM EDTA, which was expected to abrogate cleavage because FokI requires divalent metal ions for DNA cleavage. Accordingly, non-target plasmid and J3/L3 target plasmid were subjected to the following reaction conditions: −FokI-Cascade complex; +FokI-Cascade complex, 100 mM NaCl buffer+10 mM EDTA; +FokI-Cascade complex, 50 mM NaCl buffer; +FokI-Cascade complex, 100 mM NaCl buffer; +FokI-Cascade complex, 150 mM NaCl buffer; +FokI-Cascade complex, 200 mM NaCl buffer. The data demonstrate that the FokI-Cascade complex showed non-specific nicking of both the non-target and J3/L3 target plasmids at NaCl concentrations below 200 mM, whereas at 200 mM NaCl the non-target plasmid remained intact and the J3/L3 target plasmid was quantitatively linearized. 
Furthermore, buffer containing EDTA led to a complete abrogation of target cleavage, as expected.
- In order to confirm that the FokI-Cascade complex cleaves the target plasmid at the expected position, that is, within the middle of the interspacer sequence separating the J3 and L3 targets, an experiment was performed in which the target plasmid was first incubated with FokI-Cascade complex, followed by incubation with the AfeI restriction enzyme (New England Biolabs, Ipswich, Mass.), which cleaves elsewhere in the plasmid substrate. Thus, cleavage by both FokI-Cascade 1 complex and AfeI converts the supercoiled, circular plasmid into two linear fragments migrating as distinct species on an agarose gel. Specifically, cleavage was expected to generate fragments that are 2427 bp and 1357 bp in length.
- J3/L3 target plasmid (13.3 ng/μL) was incubated with 200 nM FokI-Cascade 1 complex for 30 minutes, after which 1 μL of AfeI (10 Units/μL; New England Biolabs, Ipswich, Mass.) was added to the reaction, followed by an additional 30-minute incubation at 37° C. Reaction products were resolved by agarose gel electrophoresis, as described above. Additionally, for control experiments, the target plasmid was incubated with only FokI-Cascade 1 complex or only AfeI, and the same reactions were performed with a non-target plasmid that can be cleaved by AfeI but not by FokI-Cascade 1 complex (because the plasmid lacks the J3/L3 dual target). Table 27 contains the sequence of the pACYC-Duet1 non-target plasmid substrate used for this control (SEQ ID NO:502). Accordingly, non-target plasmid and J3/L3 target plasmid were subjected to the following reaction conditions: −AfeI/−FokI-Cascade complex; −AfeI/+FokI-Cascade complex; +AfeI/+FokI-Cascade complex; and +AfeI/−FokI-Cascade complex. The data demonstrate that FokI-Cascade complex cleaved the target plasmid in the expected location, because co-incubation with FokI-Cascade 1 complex and AfeI led to two linear products of the expected lengths.
- In order to further confirm the sequence specificity of DNA cleavage by the FokI-Cascade complex, additional control plasmid substrates were generated that contain the following: mutations in the PAM flanking the J3 target, mutations in the PAM flanking the L3 target, mutations in both PAMs flanking J3/L3 targets; mutations in the spacer sequence within the J3 target, mutations in the spacer sequence within the L3 target, mutations in both spacer sequences within J3/L3 targets; and the J3 target but not the L3 target, the L3 target but not the J3 target, and neither J3 nor L3 target. Accordingly, the plasmid substrates were as follows: J3 PAM mutant, L3 PAM mutant, J3/L3 PAM mutant, J3 spacer mutant, L3 spacer mutant, J3/L3 spacer mutant, non-target plasmid, J3-only target, L3-only target, and J3/L3 target plasmid. 
Each target was subjected to the following reaction conditions: −NdeI/−FokI-Cascade complex; +NdeI/−FokI-Cascade complex; and −NdeI/+FokI-Cascade 1 complex. Table 27 contains the sequences of all the mutated plasmid substrates described above (SEQ ID NO:502 through SEQ ID NO:510).
TABLE 27 Mutated Plasmid Substrate Sequences
SEQ ID NO: | Description of plasmid
SEQ ID NO: 502 | pACYC-Duet1 non-target plasmid
SEQ ID NO: 503 | J3-30bp-L3 target plasmid, J3 PAM mutant
SEQ ID NO: 504 | J3-30bp-L3 target plasmid, L3 PAM mutant
SEQ ID NO: 505 | J3-30bp-L3 target plasmid, J3/L3 PAM mutants
SEQ ID NO: 506 | J3-30bp-L3 target plasmid, J3 spacer mutant
SEQ ID NO: 507 | J3-30bp-L3 target plasmid, L3 spacer mutant
SEQ ID NO: 508 | J3-30bp-L3 target plasmid, J3/L3 spacer mutants
SEQ ID NO: 509 | J3-only target plasmid
SEQ ID NO: 510 | L3-only target plasmid
- DNA cleavage reactions were performed as described above, using 200 nM FokI-Cascade complex and 13.3 ng/μL plasmid substrates; control reactions to linearize each plasmid substrate were performed with NdeI (New England Biolabs, Ipswich, Mass.). Agarose gel electrophoresis was performed as described above. The data demonstrate that efficient double-strand break introduction and linearization were observed only for the J3/L3 target plasmid, and not for control plasmids harboring PAM or seed mutations or only one of the two target sites.
- Components for various FokI-Cascade complexes were cloned and overexpressed. RNPs produced by these components were purified and tested for biochemical DNA cleavage, in order to compare activity for different FokI-Cascade complexes. Specifically, DNA cleavage activities were compared for reconstituted FokI-Cascade complexes comprising the following: separately purified CasBCDE complex (produced using SEQ ID NO:440 and SEQ ID NO:446) and FokI-Cas8 (produced using SEQ ID NO:439); FokI-Cascade harboring the J3/L3 guide crRNAs (produced using SEQ ID NO:442 and SEQ ID NO:446); FokI-Cascade harboring an additional nuclear localization signal on either the Cas7 subunit (produced using SEQ ID NO:443 and SEQ ID NO:446) or the Cas6 subunit; FokI-Cascade harboring an additional nuclear localization signal and HA tag on either the Cas7 subunit or the Cas6 subunit; FokI-Cascade that underwent a more stringent purification involving both size exclusion chromatography (SEC) and ion exchange chromatography (IEX); and FokI-Cascade that was purified only by immobilized metal affinity chromatography (IMAC), without further clean-up.
- Accordingly, non-target plasmid and J3/L3 target plasmid were subjected to the following reaction conditions: negative control; AfeI; CasBCDE+FokI-Cas8 complex; FokI-Cascade complex; FokI-Cascade (NLS-Cas6) complex; FokI-Cascade (Cas7-NLS) complex; FokI-Cascade (NLS-HA-Cas6) complex; FokI-Cascade (Cas7-HA-NLS) complex; FokI-Cascade complex (IEX, SEC clean-up); and FokI-Cascade complex (no clean-up). DNA cleavage reactions were performed with these RNP reagents as described above, using either the non-target plasmid or the consensus J3/L3 target plasmids, and reaction products were resolved by agarose gel electrophoresis. The data demonstrate that all of the RNP reagents, with one exception, exhibit nearly identical and quantitative plasmid DNA cleavage, with no background cleavage of the non-target plasmid. The sole exception was the FokI-Cascade purified without further clean-up, which exhibited more non-specific nicking activity, as seen for the lane in which it was incubated with the non-target plasmid.
- Finally, using the NLS-tagged Cas7 variant of the FokI-Cascade complex as a starting point, 16 different pairs of guide crRNAs were tested for biochemical DNA cleavage of a plasmid substrate containing Homo sapiens genomic sites Hsa01 through Hsa16 serially connected along one large insert (SEQ ID NO:484). Each pair of crRNAs contains two unique spacer sequences that correspond to two adjacent target sites in human genomic DNA, separated by an interspacer; the target sequences are described in SEQ ID NO:485 through SEQ ID NO:500. Table 28 contains sequences of both crRNAs within each pair that targets Hsa01 through Hsa16 genomic DNA sequences; the spacer of the crRNA is underlined and in lower case, and the sequences 5′ and 3′ of the guide region correspond to repeat sequences from the CRISPR array.
TABLE 28 crRNA Sequences SEQ ID NO: DNA target crRNA sequence SEQ ID NO: 511 Hsa01-1 AUAAACCGcgggcaggcagagcuggaggccuuucaggccc GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 512 Hsa01-2 AUAAACCGggccugaggugcugccugggcauguguaaagg GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 513 Hsa02-1 AUAAACCGcacugucacccggaccucaguggcuuugccug GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 514 Hsa02-2 AUAAACCGucugugcggcaaccuacaugauggggaaugag GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 515 Hsa03-1 AUAAACCGaugagcuuguuuguagcaccaccauaauucac GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 516 Hsa03-2 AUAAACCGuacguaaguaguggcaugugucagguggauuc GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 517 Hsa04-1 AUAAACCGaaggcauuuggaccggcagacacauaauugua GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 518 Hsa04-2 AUAAACCGagacuccagagccauccuugggaagagugcug GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 519 Hsa05-1 AUAAACCGacaagagguguguuuccugaauucccacagug GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 520 Hsa05-2 AUAAACCGuaaguguuucuagccauccuugauuuugauca GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 521 Hsa06-1 AUAAACCGuggcuacugcucugucuccugggauccugccu GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 522 Hsa06-2 AUAAACCGgcccauaccuucaaggaaaauuaaggcaaaua GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 523 Hsa07-1 AUAAACCGguugauuugccugcauugguguuacacagucu GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 524 Hsa07-2 AUAAACCGuaaguuguguucuucuuugccuaggccuucag GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 525 Hsa08-1 AUAAACCGgcacugccugucaacuucuacaaccuggugau GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 526 Hsa08-2 AUAAACCGuaggggccaagcagugcccagcugggggucaa GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 527 Hsa09-1 AUAAACCGcuuucacugaaaguggagcugaugugacagaa GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 528 Hsa09-2 AUAAACCGaugugggucaaggaauuaaguuagggaauggc GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 529 Hsa10-1 AUAAACCGgcauaaaauuuaacuugaaaagaucauuucgg GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 530 Hsa10-2 AUAAACCGgcuucaaaaauacucacauggcuauguuuuag GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 531 Hsa11-1 AUAAACCGaggggcaaugcagaggaaggagcgagggagca GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 532 Hsa11-2 AUAAACCGgaggugaaagcugcuaccaccucugugccccc GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 533 Hsa12-1 
AUAAACCGgcugaaauugcuuuucacauucuggcucuguu GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 534 Hsa12-2 AUAAACCGagaguccauauuucaauuuccaagagcugagg GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 535 Hsa13-1 AUAAACCGugcacagccaggggaggcugcagcagccuugc GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 536 Hsa13-2 AUAAACCGauggaucuucaguggguucucuugggcucuag GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 537 Hsa14-1 AUAAACCGccuguggccaggcacaccagugUGGCCUUUUG GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 538 Hsa14-2 AUAAACCGgaggugcacaguggggucagcacagacccgca GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 539 Hsa15-1 AUAAACCGaggggcaaugcagaggaaggagcgagggagca GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 540 Hsa15-2 AUAAACCGcugcuaccaccucugugcccccccggcaaugc GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 541 Hsa16-1 AUAAACCGgacuuuauauagauagcuuugaucccagauau GAGUUCCCCGCGCCAGCGGGG SEQ ID NO: 542 Hsa16-2 AUAAACCGguuuugcucuacuuccugaagaccugaacacc GAGUUCCCCGCGCCAGCGGGG - After the 16 FokI-Cascade complexes were purified, cleavage reactions were performed as described above, wherein the FokI-Cascade complexes were incubated with the plasmid substrate containing Homo sapiens genomic sites Hsa01 through Hsa16, and the reaction products were resolved by agarose gel electrophoresis. The data demonstrate that, of the 16 RNP reagents, 14/16 (Hsa03-Hsa16) exhibited nearly quantitative DNA cleavage, as evidenced by conversion of the supercoiled, circular plasmid substrate into the cleaved, linear form. Only constructs Hsa01 and Hsa02 showed partial nicking activity.
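The crRNA layout used in Table 28 (an 8-nt repeat-derived 5′ sequence, a 32-nt lower-case spacer, and a 21-nt repeat-derived 3′ sequence) can be illustrated with a short sketch. The two flanking sequences are copied from the Table 28 entries; the function names and the validation rules are hypothetical and are only meant to show how one table row is composed.

```python
# Repeat-derived flanks as they appear in every Table 28 entry.
FIVE_PRIME_FLANK = "AUAAACCG"
THREE_PRIME_FLANK = "GAGUUCCCCGCGCCAGCGGGG"
SPACER_LENGTH = 32  # spacer length observed in the Table 28 entries


def build_crrna(spacer: str) -> str:
    """Return 5' flank + lower-case spacer + 3' flank for an RNA or DNA spacer."""
    rna = spacer.lower().replace("t", "u")  # accept DNA-style input
    if len(rna) != SPACER_LENGTH or set(rna) - set("acgu"):
        raise ValueError("spacer must be 32 nt of A/C/G/U (or T)")
    return FIVE_PRIME_FLANK + rna + THREE_PRIME_FLANK


def extract_spacer(crrna: str) -> str:
    """Recover the spacer from a crRNA written in the Table 28 layout."""
    if not (crrna.startswith(FIVE_PRIME_FLANK) and crrna.endswith(THREE_PRIME_FLANK)):
        raise ValueError("crRNA does not carry the expected repeat-derived flanks")
    return crrna[len(FIVE_PRIME_FLANK):-len(THREE_PRIME_FLANK)]


# Spacer of Hsa03-1 (SEQ ID NO: 515), copied from Table 28.
hsa03_1 = build_crrna("augagcuuguuuguagcaccaccauaauucac")
print(hsa03_1, len(hsa03_1))
```

The same spacer written as DNA (ATGAGC...) yields the identical crRNA string, which is convenient when starting from the oligonucleotide sequences in Table 30.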
- This Example illustrates the design of E. coli Type I-E Cascade complexes comprising FokI fusion proteins to facilitate genome editing in human cells, and describes their delivery into target cells as pre-assembled Cascade RNP complexes.
- A. Production of Cascade RNP Complexes Comprising FokI for Transformation into Cells
- Minimal CRISPR arrays were designed to target eight distinct loci in the human genome. Each minimal CRISPR array contained two spacer sequences, both of which were flanked by CRISPR repeat sequences. The two spacer sequences targeted loci in the genome separated by 30 bp (i.e., a 30-bp interspacer region), and each spacer was designed to bind a target sequence adjacent to an AAG or ATG protospacer adjacent motif (PAM) sequence in the target cell genome. Plasmid vectors containing each minimal CRISPR array were produced by ligating annealed oligonucleotides (Integrated DNA Technologies, Coralville, Iowa) into a pACYC-Duet1 (Millipore Sigma, Billerica, Mass.) vector backbone for bacterial expression.
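The 30-bp interspacer stated above can be checked against the genomic coordinates listed in Table 26. The sketch below is illustrative only: it assumes that the coordinates are 1-based and inclusive and that the 5′ target lies upstream of the 3′ target on the same chromosome. The function name is hypothetical, and three of the eight targets are shown.

```python
# (start, end) coordinates of the 5' and 3' spacer targets, copied from Table 26.
TARGETS = {
    "Hsa03": ((22509340, 22509374), (22509405, 22509439)),    # TRAC, chr14
    "Hsa08": ((26320402, 26320436), (26320467, 26320501)),    # CD52, chr1
    "Hsa10": ((203873195, 203873229), (203873260, 203873294)),  # CTLA4, chr2
}


def interspacer_length(five_prime, three_prime):
    """Number of base pairs strictly between the two target intervals.

    Assumes 1-based inclusive coordinates on the same chromosome.
    """
    _, end5 = five_prime
    start3, _ = three_prime
    if start3 <= end5:
        raise ValueError("targets overlap or are in the wrong order")
    return start3 - end5 - 1


for name, (five, three) in TARGETS.items():
    print(name, interspacer_length(five, three), "bp interspacer")
```

Under the stated coordinate convention, each of these three pairs gives a 30-bp gap, consistent with the design described in the text.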
- Overlapping primers to produce selected spacers in minimal CRISPR arrays are set forth in Table 29, and the sequences of the primers are described in Table 30.
-
TABLE 29 Overlapping Primers for Generation of Minimal CRISPR Arrays
Component | Gene target | Primers
Hsa03 Minimal CRISPR array | TRAC intron | A, B
Hsa04 Minimal CRISPR array | TRAC intron | C, D
Hsa05 Minimal CRISPR array | TRAC intron | E, F
Hsa06 Minimal CRISPR array | TRAC intron | G, H
Hsa07 Minimal CRISPR array | TRAC intron | I, J
Hsa08 Minimal CRISPR array | CD52 exon | K, L
Hsa09 Minimal CRISPR array | CTLA4 exon | M, N
Hsa10 Minimal CRISPR array | CTLA4 exon | O, P
TABLE 30 DNA Primer Sequences SEQ ID Oligo- NO: nucleotide Sequence SEQ ID A /5Phos/ACCGATGAGCTTGTTTGTAGCACCACCATAATTC NO: 543 ACGAGTTCCCCGCGCCAGCGGGGATAAACCGTACGTA AGTAGTGGCATGTGTCAGGTGGATTC SEQ ID B /5Phos/ACTCGAATCCACCTGACACATGCCACTACTTACG NO: 544 TACGGTTTATCCCCGCTGGCGCGGGGAACTCGTGAATT ATGGTGGTGCTACAAACAAGCTCAT SEQ ID C /5Phos/ACCGAAGGCATTTGGACCGGCAGACACATAATT NO: 545 GTAGAGTTCCCCGCGCCAGCGGGGATAAACCGAGACT CCAGAGCCATCCTTGGGAAGAGTGCTG SEQ ID D /5Phos/ACTCCAGCACTCTTCCCAAGGATGGCTCTGGAGT NO: 546 CTCGGTTTATCCCCGCTGGCGCGGGGAACTCTACAATT ATGTGTCTGCCGGTCCAAATGCCTT SEQ ID E /5Phos/ACCGACAAGAGGTGTGTTTCCTGAATTCCCACA NO: 547 GTGGAGTTCCCCGCGCCAGCGGGGATAAACCGTAAGT GTTTCTAGCCATCCTTGATTTTGATCA SEQ ID F /5Phos/ACTCTGATCAAAATCAAGGATGGCTAGAAACAC NO: 548 TTACGGTTTATCCCCGCTGGCGCGGGGAACTCCACTGT GGGAATTCAGGAAACACACCTCTTGT SEQ ID G /5Phos/ACCGTGGCTACTGCTCTGTCTCCTGGGATCCTGC NO: 549 CTGAGTTCCCCGCGCCAGCGGGGATAAACCGGCCCAT ACCTTCAAGGAAAATTAAGGCAAATA SEQ ID H /5Phos/ACTCTATTTGCCTTAATTTTCCTTGAAGGTATGG NO: 550 GCCGGTTTATCCCCGCTGGCGCGGGGAACTCAGGCAG GATCCCAGGAGACAGAGCAGTAGCCA SEQ ID I /5Phos/ACCGGTTGATTTGCCTGCATTGGTGTTACACAGT NO: 551 CTGAGTTCCCCGCGCCAGCGGGGATAAACCGTAAGTTG TGTTCTTCTTTGCCTAGGCCTTCAG SEQ ID J /5Phos/ACTCCTGAAGGCCTAGGCAAAGAAGAACACAAC NO: 552 TTACGGTTTATCCCCGCTGGCGCGGGGAACTCAGACTG TGTAACACCAATGCAGGCAAATCAAC SEQ ID K /5Phos/ACCGGCACTGCCTGTCAACTTCTACAACCTGGTG NO: 553 ATGAGTTCCCCGCGCCAGCGGGGATAAACCGTAGGGG CCAAGCAGTGCCCAGCTGGGGGTCAA SEQ ID L /5Phos/ACTCTTGACCCCCAGCTGGGCACTGCTTGGCCCC NO: 554 TACGGTTTATCCCCGCTGGCGCGGGGAACTCATCACCA GGTTGTAGAAGTTGACAGGCAGTGC SEQ ID M /5Phos/ACCGCTTTCACTGAAAGTGGAGCTGATGTGACA NO: 555 GAAGAGTTCCCCGCGCCAGCGGGGATAAACCGATGTG GGTCAAGGAATTAAGTTAGGGAATGGC SEQ ID N /5Phos/ACTCGCCATTCCCTAACTTAATTCCTTGACCCAC NO: 556 ATCGGTTTATCCCCGCTGGCGCGGGGAACTCTTCTGTC ACATCAGCTCCACTTTCAGTGAAAG SEQ ID O /5Phos/ACCGGCATAAAATTTAACTTGAAAAGATCATTT NO: 557 CGGGAGTTCCCCGCGCCAGCGGGGATAAACCGGCTTC AAAAATACTCACATGGCTATGTTTTAG SEQ ID P /5Phos/ACTCCTAAAACATAGCCATGTGAGTATTTTTGAA NO: 558 
GCCGGTTTATCCCCGCTGGCGCGGGGAACTCCCGAAAT GATCTTTTCAAGTTAAATTTTATGC SEQ ID Q CACTCTTTCCCTACACGACGCTCTTCCGATCTAGCCTGG NO: 559 AAAGACACAAAGC SEQ ID R GGAGTTCAGACGTGTGCTCTTCCGATCTCAGCCATCCT NO: 560 TTCCACCTAA SEQ ID S CACTCTTTCCCTACACGACGCTCTTCCGATCTATGCTGC NO: 561 AGGCTTTATGCTT SEQ ID T GGAGTTCAGACGTGTGCTCTTCCGATCTTTAGGCCTGC NO: 562 CTGACTTCTC SEQ ID U CACTCTTTCCCTACACGACGCTCTTCCGATCTGGGAAG NO: 563 AAGACCAACAAGAGG SEQ ID V GGAGTTCAGACGTGTGCTCTTCCGATCTTTCAAGGGAA NO: 564 GAAGCCATTG SEQ ID W CACTCTTTCCCTACACGACGCTCTTCCGATCTAAGGCA NO: 565 GGAATTGGATGAAA SEQ ID X GGAGTTCAGACGTGTGCTCTTCCGATCTAACCTGAGAT NO: 566 GACTGCCCAT SEQ ID Y CACTCTTTCCCTACACGACGCTCTTCCGATCTTTCCTCC NO: 567 CTAACCTCCACCT SEQ ID Z GGAGTTCAGACGTGTGCTCTTCCGATCTTAAAGAGCCC NO: 568 AACCAGATGC SEQ ID A2 CACTCTTTCCCTACACGACGCTCTTCCGATCTGTCTCAG NO: 569 CCTTAGCCCTGTG SEQ ID B2 GGAGTTCAGACGTGTGCTCTTCCGATCTCCCACTGCAA NO: 570 GTACAAGGGT SEQ ID C2 CACTCTTTCCCTACACGACGCTCTTCCGATCTGGATGC NO: 571 GGAACCCAAATTA SEQ ID D2 GGAGTTCAGACGTGTGCTCTTCCGATCTTAGTCTTCTCC NO: 572 CTCGCTCCC SEQ ID E2 CACTCTTTCCCTACACGACGCTCTTCCGATCTTGCAGCA NO: 573 TTATGATGTGGGT SEQ ID F2 GGAGTTCAGACGTGTGCTCTTCCGATCTCAACCTTTAG NO: 574 CATCACTGGCT SEQ ID G2 CAAGCAGAAGACGGCATACGAGATNNNNNNNNG NO: 575 TGACTGGAGTTCAGACGTGTGCTC SEQ ID H2 AATGATACGGCGACCACCGAGATCTACACNNNNN NO: 576 NNNACACTCTTTCCCTACACGACG - The design of bacterial expression vectors for production of Cascade RNP complexes is detailed in Example 2. In brief, each cas gene was expressed from a single operon, and the coding sequences for the cas genes were arranged in the order of cas8-cse2-cas7-cas5-cas6. The FokI moiety was attached by a 30-aa linker to Cas8, and a nuclear localization signal (NLS) was attached to the N-terminus of FokI-Cas8 (FokI-Cascade complex) and the N-terminus of Cas6 (hereafter referred to as FokI-Cascade-NLS-Cas6 complex, SEQ ID NO:577).
- FokI-Cascade-NLS-Cas6 complexes were purified as assembled complexes from E. coli essentially as described in Example 5A.
- B. Transfection of Cascade RNP Complexes Comprising FokI into Eukaryotic Cells
- HEK293 cells (ATCC, Manassas, Va.) were cultured in suspension in DMEM medium supplemented with 10% FBS and 1× Antibiotic-Antimycotic Solution (Mediatech, Inc., Manassas, Va.) at 37° C., 5% CO2 and 100% humidity. HEK293 cells were transfected using the Nucleofector® 96-well Shuttle System (Lonza, Allendale, N.J.). Prior to nucleofection, 5 μl of FokI-Cascade RNPs were transferred to individual wells of a 96-well plate. Each well contained ~225-500 pmol of FokI-Cascade-NLS-Cas6 complexes, depending on the RNP. HEK293 cells were transferred to a 50 ml conical centrifuge tube and centrifuged at 200×g for 3 minutes. The medium was aspirated and the cell pellet was washed in calcium- and magnesium-free PBS. The cells were centrifuged once more and re-suspended in Nucleofector SF buffer (Lonza, Allendale, N.J.) at a concentration of 1×10⁷ cells/ml. 20 μl of this cell suspension was added to the FokI-Cascade-NLS-Cas6 complexes in the 96-well plate and mixed, and the entire volume was then transferred to a 96-well Nucleocuvette™ Plate. The plate was then loaded into the Nucleofector™ 96-well Shuttle™ and cells were nucleofected using the 96-CM-130 Nucleofector™ program (Lonza, Allendale, N.J.). Immediately following nucleofection, 80 μl of complete DMEM medium was added to each well of the 96-well Nucleocuvette™ Plate. The entire contents of each well were then transferred to a 96-well tissue culture plate containing 100 μl of complete DMEM medium. The cells were cultured at 37° C., 5% CO2 and 100% humidity for ~72 hours.
- After ~72 hours, the HEK293 cells were centrifuged at 500×g for 5 minutes and the medium was removed. The cells were washed in calcium- and magnesium-free PBS. The cell pellets were then re-suspended in 50 μl of QuickExtract DNA Extraction Solution (Epicentre, Madison, Wis.). The gDNA samples obtained were then incubated at 37° C. for 10 minutes, 65° C. for 6 minutes, and 95° C. for 3 minutes to stop the reaction. gDNA samples were then diluted with 50 μl of water and stored at −20° C. for subsequent deep sequencing analysis.
- C. Deep Sequencing of gDNA from Transfected Cells
- Using the isolated gDNA, a first PCR was performed using Q5 Hot Start High-Fidelity 2× Master Mix (New England Biolabs, Ipswich, Mass.) at 1× concentration, primers at 0.5 μM each, and 3.75 μL of gDNA in a final volume of 10 μL. Amplification was carried out at 98° C. for 1 minute, followed by 35 cycles of 10 seconds at 98° C., 20 seconds at 60° C., and 30 seconds at 72° C., and a final extension at 72° C. for 2 minutes. The PCR reaction was diluted 1:100 in water. Target-specific primers are shown in Table 31. The target-specific primers contained Illumina-compatible sequences so that the amplification products could be analyzed using a MiSeq Sequencer (Illumina, San Diego).
TABLE 31 Target-specific Primers Used for Sequencing
Target | Oligonucleotides*
Hsa03 on-target | Q, R
Hsa04 on-target | S, T
Hsa05 on-target | U, V
Hsa06 on-target | W, X
Hsa07 on-target | Y, Z
Hsa08 on-target | A2, B2
Hsa09 on-target | C2, D2
Hsa10 on-target | E2, F2
*DNA primer sequences are shown in Table 30
- A second “barcoding” PCR was set up such that each target was amplified with primers (G2 and H2 in Table 30) that each contained a unique 8 bp index (denoted by “NNNNNNNN” in the primer sequence; see SEQ ID NO:575 and SEQ ID NO:576), thus allowing de-multiplexing of each amplicon during sequence analysis.
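The de-multiplexing step enabled by the paired 8 bp indices can be sketched as a lookup from index pair to sample. This is an illustrative simplification rather than the script used in this Example: the index sequences, the sample sheet, and the read tuples are hypothetical, and real sequencer output may report one index as its reverse complement, which is not handled here.

```python
from collections import Counter

# Hypothetical sample sheet: (index 1, index 2) -> amplicon name.
SAMPLE_SHEET = {
    ("ACGTACGT", "TGCATGCA"): "Hsa03",
    ("GATCGATC", "CTAGCTAG"): "Hsa07",
}


def demultiplex(reads, sample_sheet):
    """Count reads per sample using exact matching of both 8 bp indices.

    `reads` is an iterable of (index1, index2, sequence) tuples. Reads whose
    index pair is not in the sample sheet are counted as "undetermined".
    """
    counts = Counter()
    for index1, index2, _sequence in reads:
        counts[sample_sheet.get((index1, index2), "undetermined")] += 1
    return counts


example_reads = [
    ("ACGTACGT", "TGCATGCA", "TTGACC"),
    ("GATCGATC", "CTAGCTAG", "GGCATA"),
    ("ACGTACGT", "CTAGCTAG", "CCGTAA"),  # mismatched pair, left unassigned
]
print(demultiplex(example_reads, SAMPLE_SHEET))
```

Requiring both indices to match is what allows amplicons from different targets to be pooled in a single run.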
- The second PCR was performed using Q5 Hot Start High-Fidelity 2× Master Mix (New England Biolabs, Ipswich, Mass.) at 1× concentration, primers at 0.5 μM each, and 1 μL of 1:100 diluted first PCR in a final volume of 10 μL. Amplification was carried out at 98° C. for 1 minute, followed by 12 cycles of 10 seconds at 98° C., 20 seconds at 60° C., and 30 seconds at 72° C., and a final extension at 72° C. for 2 minutes. PCR reactions were pooled into a single microfuge tube for SPRIselect bead (Beckman Coulter, Pasadena, Calif.)-based cleanup of amplicons for sequencing.
- To the pooled amplicons, 0.9× volumes of SPRIselect beads were added, mixed, and incubated at room temperature (RT) for 10 minutes. The microfuge tube was placed on a magnetic tube stand (Beckman Coulter, Pasadena, Calif.) until the solution had cleared. The supernatant was removed and discarded, and the residual beads were washed with 1 volume of 85% ethanol and incubated at RT for 30 seconds. After incubation, the ethanol was aspirated and the beads were air dried at RT for 10 minutes. The microfuge tube was then removed from the magnetic stand, and 0.25× volumes of water (Qiagen, Hilden, Germany) were added to the beads, mixed vigorously, and incubated for 2 minutes at RT. The microfuge tube was spun in a microcentrifuge to collect the contents of the tube and was then returned to the magnet and incubated until the solution had cleared, and the supernatant containing the purified amplicons was dispensed into a clean microfuge tube. The purified amplicon library was quantified using the Nanodrop™ 2000 system (Thermo Scientific, Wilmington, Del.).
- The amplicon library was normalized to a 4 nM concentration, as calculated from the optical absorbance at 260 nm (Nanodrop™ 2000 system; Thermo Scientific, Wilmington, Del.) and the size of the amplicons. The library was analyzed on a MiSeq Sequencer with MiSeq Reagent Kit v2, 300 cycles (Illumina, San Diego), with two 151-cycle paired-end reads plus two eight-cycle index reads.
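The normalization to 4 nM can be illustrated with a short calculation. The sketch below is not taken from this Example: it assumes an average mass of 660 g/mol per base pair, and the stock concentration, mean amplicon length, and final volume are hypothetical values chosen only to show the arithmetic.

```python
AVG_BP_MASS = 660.0  # assumed average mass of one dsDNA base pair (g/mol)


def library_nanomolar(ng_per_ul: float, mean_amplicon_bp: float) -> float:
    """Convert a library concentration in ng/uL to nM for a given amplicon size."""
    return ng_per_ul * 1e6 / (AVG_BP_MASS * mean_amplicon_bp)


def dilution_volumes(stock_nm: float, target_nm: float, final_ul: float):
    """Return (library volume, diluent volume) in uL to reach target_nm."""
    if stock_nm < target_nm:
        raise ValueError("stock is already below the target concentration")
    library_ul = final_ul * target_nm / stock_nm
    return library_ul, final_ul - library_ul


# Hypothetical inputs: 12 ng/uL stock, 300 bp mean amplicon, 30 uL final volume.
stock_nm = library_nanomolar(12.0, 300)
library_ul, water_ul = dilution_volumes(stock_nm, 4.0, 30.0)
print(f"stock {stock_nm:.1f} nM: mix {library_ul:.2f} uL library + {water_ul:.2f} uL water")
```

Because the conversion depends on amplicon length, a pooled library with amplicons of different sizes would use their mean length.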
- D. Deep Sequencing Data Analysis
- The identity of products in the sequencing data was analyzed based upon the index barcode sequences adapted onto the amplicons in the second round of PCR. A computational script executing the following tasks was used to process the MiSeq data:
- Reads were aligned to the human genome (build GRCh38/38) using Bowtie (bowtie-bio.sourceforge.net/index.shtml) software.
- Aligned reads were compared to wild-type loci; reads not aligning to any part of the loci were discarded.
- Reads matching wild-type sequence were tallied. Reads with indels within 10 bp of the putative FokI-Cascade RNP cut site were categorized by indel type and tallied.
- Total indel reads were divided by the sum of wild-type reads and indel reads to give percent-mutated reads.
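The tallying and percentage steps listed above can be sketched as follows. This is a minimal illustration, not the script used in this Example. It assumes that alignment has already been done and that each read is summarized as a list of (kind, position, length) indel events in amplicon coordinates. Reads whose only indels lie outside the 10 bp window are left out of both tallies here, which is an assumption because the text does not say how such reads were handled.

```python
from collections import Counter

WINDOW_BP = 10  # distance from the putative cut site, as stated in the text


def tally_reads(reads, cut_site, window=WINDOW_BP):
    """Tally wild-type reads and indel reads near the cut site by indel type."""
    counts = Counter()
    for events in reads:
        if not events:
            counts["wild-type"] += 1
            continue
        near = [e for e in events if abs(e[1] - cut_site) <= window]
        if near:
            kind, _position, length = near[0]
            counts[f"{kind}{length}"] += 1  # e.g. "del3" or "ins1"
    return counts


def percent_mutated(counts):
    """Indel reads divided by the sum of wild-type and indel reads, in percent."""
    wild_type = counts["wild-type"]
    indel = sum(n for key, n in counts.items() if key != "wild-type")
    total = wild_type + indel
    return 100.0 * indel / total if total else 0.0


# Hypothetical reads for an amplicon with a putative cut site at position 100.
reads = [[]] * 95 + [[("del", 100, 3)]] * 4 + [[("ins", 102, 1)]] + [[("del", 300, 2)]]
counts = tally_reads(reads, cut_site=100)
print(dict(counts), f"{percent_mutated(counts):.1f}% mutated")
```

In this toy input, the read with a deletion 200 bp from the cut site contributes to neither tally.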
- FIG. 29 shows genome editing as a function of FokI-Cascade-NLS-Cas6 complex nucleofection (n=1). FokI-Cascade-NLS-Cas6 complexes induced editing at all eight loci. Editing ranged from ~0.2% to 5% indels, and indels were centered around the predicted cut site, in the middle of the interspacer region.
- This Example illustrates the design and delivery of E. coli Type I-E Cascade complexes comprising FokI fusion proteins to facilitate genome editing in human cells. This Example also describes the delivery of plasmid vectors expressing Cascade complex components into eukaryotic cells.
- A. Production of a Vector Encoding FokI-Cascade RNP Components to be Transfected into Target Cells
- A minimal CRISPR array was designed to target the TRAC locus in the human genome. The minimal CRISPR array contained two spacer sequences, both of which were flanked by CRISPR repeat sequences, as described in Examples 1 and 3. The two spacer sequences targeted loci in the genome separated by 30 bp, and each spacer was complementary to a genomic sequence adjacent to an AAG PAM sequence. The plasmid vector containing the minimal CRISPR array was produced by ligating annealed oligonucleotides (Integrated DNA Technologies, Coralville, Iowa) encoding a CRISPR repeat flanked by two spacer sequences into a mammalian expression vector with two CRISPR repeat sequences. The resulting plasmid contained a “repeat-spacer-repeat-spacer-repeat” dual guide expressed from the human U6 (hU6) promoter (SEQ ID NO:454).
- FokI-Cascade RNP protein component-encoding genes were cloned into plasmid vectors containing CMV promoters to enable delivery and expression in mammalian cells. Cas genes were cloned into separate plasmids (SEQ ID NO:448 through SEQ ID NO:451 and SEQ ID NO:453) or in a single plasmid as a polycistronic construct with each gene linked via 2A viral peptide “ribosome-skipping” sequences (in SEQ ID NO:455). FokI-Cascade RNP complexes were delivered into eukaryotic cells via two different methods: cas genes and the minimal CRISPR array were supplied on separate plasmids (“six plasmid”-delivery system, SEQ ID NO:448 through SEQ ID NO:451, SEQ ID NO:453 and SEQ ID NO:454), or one plasmid encoding all cas genes as a polycistronic construct and a second plasmid encoding the minimal CRISPR array (“two plasmid”-delivery system, SEQ ID NO:454 and SEQ ID NO:455).
- B. Transfection of Plasmid(s)—Encoding FokI-Cascade Complex RNPs
- Transfections for the six plasmid-delivery system and the two plasmid-delivery system were performed as detailed in Example 8, with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate. The six plasmid-delivery system was initially tested by examining the necessity of each component for genome editing. More specifically, plasmid “cocktails” were added to each well such that there was a constant amount (420 ng) of each of five plasmids and a variable amount of the sixth plasmid (either 0 ng, 70 ng, 700 ng, or 1,400 ng). Next, the six plasmid-delivery system and the two plasmid-delivery system were compared by nucleofecting a fixed amount (3.5 μg) of total plasmid DNA while varying the ratio of minimal CRISPR array plasmid to cas-encoding plasmid(s). Finally, lysate was harvested ~72 hours after nucleofection for subsequent deep sequencing analysis.
- C. Deep Sequencing of gDNA from Transfected Cells and Data Analysis
- Deep sequencing was performed as detailed in Example 8, but only using target-specific primers Y and Z from Table 31.
- D. Deep Sequencing Data Analysis
- Deep sequencing data analysis was performed as detailed in Example 8. FIG. 30 shows genome editing at the TRAC locus as a function of each FokI-Cascade component in the six plasmid-delivery strategy (n=1). As is shown, editing was abolished or dramatically reduced (in the case of Cse2) if a given component was lacking. This confirms that each Cascade component is necessary for editing via plasmid delivery.
- FIG. 31 shows data comparing genome editing with the six plasmid-delivery system or the two plasmid-delivery system. Across both methods, the highest levels of editing were achieved with the highest ratio of cas:minimal CRISPR array plasmids. Additionally, the polycistronic plasmid enabled higher levels of editing, potentially due to increased transcription perm of plasmid.
- This Example illustrates in silico design, cloning, expression, and purification of a circularly-permuted (cp) E. coli Type I-E Cas7 protein using a structure-guided modelling approach.
- A. In Silico Design
- An E. coli Type I-E Cas7 protein (SEQ ID NO:18) was circularly permuted using a structure-guided approach based on the E. coli Cascade crystal structure 5H9E.pdb (www.rcsb.org/pdb/; Hayes, R. P, et al., Nature 530(7591):499-503 (2016)). The native Cas7 N-terminus and C-terminus were connected with a two-amino acid peptide linker having the sequence glycine-serine (G-S). The polypeptide sequence of this circularized Cas7 was opened at the position corresponding to the peptide bond between
residues residue 302 of the wild-type Cas7 protein) of the cp-Cas7 V1 protein (SEQ ID NO:578). - A second cp-Cas7 protein, cp-Cas7 V2 protein, was similarly engineered using the G-S linker. The N-terminus and C-terminus of the cp-Cas7 V2 protein correspond to
residues residue 339 of the wild-type Cas7 protein) of the cp-Cas7 V2 protein (SEQ ID NO:579). - B. Cloning, Expression, and Purification of Cascade Complexes Comprising cp-Cas7
- DNA coding sequences of the in silico designed polypeptide sequences of cp-Cas7 V1 protein and cp-Cas7 V2 protein were codon optimized for expression in E. coli.
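- The circular permutation underlying these designed polypeptide sequences can be illustrated with a minimal sketch. It assumes only what is stated above: the native termini are joined by a G-S linker and the chain is reopened at a chosen peptide bond. The function name, the toy sequence, and the cut position are hypothetical and are not taken from SEQ ID NO:18, SEQ ID NO:578, or SEQ ID NO:579.

```python
def circularly_permute(seq: str, new_n_terminus: int, linker: str = "GS") -> str:
    """Return a circular permutant of `seq`.

    `new_n_terminus` is the 1-based position in the original sequence that
    becomes the first residue of the permuted protein. The original
    C-terminus is joined to the original N-terminus through `linker`.
    """
    if not 2 <= new_n_terminus <= len(seq):
        raise ValueError("new_n_terminus must lie inside the sequence")
    cut = new_n_terminus - 1  # 0-based index of the new first residue
    return seq[cut:] + linker + seq[:cut]


# Toy 10-residue sequence (hypothetical, not Cas7).
toy = "MKTAYIAKQR"
permuted = circularly_permute(toy, new_n_terminus=6)
print(permuted)  # IAKQRGSMKTAY
```

The permuted sequence is longer than the parent by the length of the linker; handling of an initiator methionine is left out of this sketch.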
- These DNA coding sequences were provided to a commercial manufacturer (GenScript, Piscataway, N.J.) for synthesis. The DNA sequences were individually introduced into a Cascade-operon expression vector (Table 14; SEQ ID NO:441) to replace the wild-type Cas7 protein in the expression vector as described in Example 2.
- Each expression vector was transfected into E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.) with a second vector encoding a guide RNA for the J3 target (SEQ ID NO:444; Table 15), as described in Example 2. Cells were cultured as described in Example 4. E. coli Type I-E Cascade complexes containing Cas5, Cas6, cp-Cas7 V1, Cse2, and Cas8 proteins, as well as guide RNA/target J3; and Cas5, Cas6, cp-Cas7 V2, Cse2, and Cas8 proteins, as well as guide RNA/target J3, were purified as described in Example 5.
- Purification of the Cascade complexes comprising the circularly-permuted Cas7 variants demonstrates that circularly-permuted Type I-E CRISPR-Cas subunit proteins can be successfully used to form Cascade complexes having essentially the same composition (based on molecular weight) as Cascade complexes comprising wild-type proteins.
- C. EMSA (Electrophoretic Mobility Shift Assays) of Cascade/Cp-Cas7 and J3 Target
- Cascade/cp-Cas7 complexes were purified as described in this Example and subjected to an EMSA to demonstrate specific binding to their respective target sequence. Briefly, Cascade/cp-Cas7 and Cascade/WT-Cas7 were purified and concentrated to 10 mg/mL. Cy5-labeled double-stranded target DNA was produced as described in Example 6 and diluted to 1 μM in TE buffer (J3 target SEQ ID NO:469 and SEQ ID NO:472 and CCR5 target SEQ ID NO:474 and SEQ ID NO:470). Cascade complexes and labeled double-stranded target DNA were incubated for 30 min at 37° C. at different protein/target ratios. Immediately following the incubation, 2 μl of 50% glycerol was added to the samples and they were loaded on a 5% native PAA gel. Gels were run at 4° C. at 70V for 90 min in 0.5×TBE buffer and imaged on an AZURE c600 Bioimager (Azure BioSystems, Dublin, Calif.), and the bands were quantitated. The data are presented in Table 32.
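- The band quantitation step can be sketched as follows. The source does not state how the gel shift percentage was calculated; this example assumes the common convention of shifted (bound) signal divided by total lane signal. The function name and the intensity values are hypothetical.

```python
def gel_shift_percent(bound_intensity: float, free_intensity: float) -> float:
    """Percent of labeled DNA in the shifted band (assumed convention:
    bound / (bound + free) * 100)."""
    total = bound_intensity + free_intensity
    if total <= 0:
        raise ValueError("total lane intensity must be positive")
    return 100.0 * bound_intensity / total


# Hypothetical background-corrected band intensities (arbitrary units).
print(gel_shift_percent(bound_intensity=44.0, free_intensity=56.0))  # 44.0
```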
-
TABLE 32 Results of Cascade/cp-Cas7 V2 EMSA
Cascade ID and guide | Target DNA | Cascade:dsDNA ratio | Gel shift %
Cascade/WT-Cas7 J3 | J3 | 6.7 | 44
Cascade/cp-Cas7 V2 J3 | J3 | 6.7 | 90
Cascade/WT-Cas7 J3 | CCR5 | 6.7 | LOD*
Cascade/cp-Cas7 V2 J3 | CCR5 | 6.7 | LOD
*LOD = below the limit of detection
- A. Cascade Subunit Fusion with FokI
- This Example illustrates in silico design, cloning, expression, and purification of an E. coli Type I-E Cas8 protein fused to a FokI nuclease domain to confer nuclease activity to the Cascade complex.
- E. coli Type I-E Cas8 was fused N-terminally with a Flavobacterium okeanokoites FokI nuclease domain (GenBank no. AAA24927.1). The FokI nuclease domain comprises residues contained in the Sharkey variant described by Guo, et al. (Guo, J., et al., J. Mol. Biol. 400:96-107 (2010)), and catalyzes double-stranded DNA cleavage upon homo-dimerization. The amino acid sequence for the FokI nuclease (SEQ ID NO:580) contained residues Q384 to F579 (GenBank no. AAA24927.1) and had the following point mutations: E486Q, L499I, and D469N. Briefly, the FokI Sharkey nuclease domain (SEQ ID NO:581) was fused N-terminal to Cas8 using a linker sequence (SEQ ID NO:582). For purification purposes, a hexahistidine tag (His6, SEQ ID NO:583), followed by an MBP tag (SEQ ID NO:584), a TEV protease cleavage sequence (SEQ ID NO:585), a nuclear localization signal (NLS, SEQ ID NO:586), and a GGS linker, were appended N-terminal to residue 384 of FokI. The final construct comprised NH3-His6-MBP-TEV-NLS-GGS-FokISharkey-30aa-linker-Cas8-COOH in the protein sequence (SEQ ID NO:413).
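- Because the FokI domain starts at residue 384 of the full-length protein while the point mutations are given in full-length numbering, applying them to the excised domain requires an offset. The sketch below illustrates that bookkeeping. The function name and the toy sequence are hypothetical; the sequence is not the FokI sequence of SEQ ID NO:580, and the mutations in the usage lines are chosen only to fit the toy sequence.

```python
import re


def apply_point_mutations(domain: str, first_residue: int, mutations: list[str]) -> str:
    """Apply mutations such as 'E486Q' to a domain whose first residue
    carries the full-length number `first_residue`."""
    residues = list(domain)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        index = position - first_residue  # convert to 0-based domain index
        if not 0 <= index < len(residues):
            raise ValueError(f"{mutation} lies outside the domain")
        if residues[index] != wild_type:
            raise ValueError(f"{mutation}: expected {wild_type}, found {residues[index]}")
        residues[index] = replacement
    return "".join(residues)


# Hypothetical 10-residue stretch numbered 384..393 (not real FokI sequence).
toy_domain = "QLVKSELEEK"
print(apply_point_mutations(toy_domain, 384, ["E389Q", "K393N"]))  # QLVKSQLEEN
```

Checking the wild-type residue before substitution catches numbering errors, including a mutation label that was corrupted in transcription.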
- In silico designed DNA sequences were provided to a commercial manufacturer (GenScript, Piscataway, N.J.) for synthesis. The DNA sequences were cloned into a pET expression (MilliporeSigma, Hayward, Calif.) family vector backbone, which confers kanamycin resistance due to the presence of the kanR gene as described in Example 2 resulting in a vector carrying NH3-His6-MBP-TEV-NLS-GGS-FokISharkey-30aa-linker-Cas8-COOH (SEQ ID NO:439).
- The E. coli Type I-E Cascade NH3-His6-MBP-TEV-NLS-GGS-FokISharkey-30aa-linker-Cas8-COOH (SEQ ID NO:439) was expressed and purified as described in Example 4 and Example 5C. The protein sequence after TEV cleavage comprises NH3-NLS-GGS-FokISharkey-30aa-linker-Cas8-COOH (SEQ ID NO:587).
- Similarly, a FokI-Cas8 fusion protein was constructed in a vector that carries NLS-FokI-linker-Cas8_His6-HRV3C-Cse2_Cas7_Cas5_Cas6 as described in Examples 1 and 2 (SEQ ID NO:442). Each expression vector was transfected into E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.) with a second vector encoding a guide RNA for the J3 target (SEQ ID NO:444), as described in Example 2. This construct was expressed and purified as described in Example 4B and Example 5A. Purification of the Cascade complexes comprising the fused FokI-Cas8 variants demonstrates that nuclease-fused Type I-E CRISPR-Cas subunit proteins can be successfully used to form Cascade complexes having essentially the same composition (based on molecular weight) as Cascade complexes comprising wild-type proteins. FokI-Cas8 fusions were successfully used for biochemical cleavage of target nucleic acid (Example 7) and for in-cell cleavage of genomic sequences in eukaryotic cells (Examples 8 and 9).
- Table 33 lists further examples of Cas subunit protein-enzyme fusions. In Table 33, APOBEC corresponds to a gene that is a member of the cytidine deaminase pathway (human APOBEC I Genbank no. AB009426, human APOBEC 3F Genbank no. CH471095, human APOBEC 3G Genbank no. CR456472, rat APOBEC UCSC genome browser ID RGD:2133); AID corresponds to an activation-induced cytidine deaminase (Genbank no. AY536516); PmCDA1 is an AID ortholog (Nishida, et al., Science 16:353 (2016); Iwamatsu, et al., J Biochem 110:151-158 (1991)); PvuIIHIFIT46G is a PvuII high fidelity variant T46G (Fonfara, et al., Nucleic Acids Res, 40:847-860 (2012)); PvuIIsinglechainT46G is described in pdbID 3KSK; I-TevI is a site-specific, sequence-tolerant homing endonuclease from bacteriophage T4 and comprises an N-terminal catalytic domain as well as a C-terminal DNA-binding domain (the domains are connected by a long, flexible linker) (Van Roey, et al., EMBO J, 20:3631-3637 (2001)); and BcnI (Sokolowska, et al., J Mol Biol 369:722-734 (2007)) and MvaI (Kaus-Drobek, et al., Nucleic Acids Res 35:2035-2046 (2007)) are restriction enzymes.
-
TABLE 33 Other Enzyme Fusions such as Nucleases and Cytidine Deaminases with Cas8
SEQ ID NO: | Enzyme | Fusion to Cas8
SEQ ID NO: 593 | Cas8_rAPOBEC1 | C terminal
SEQ ID NO: 594 | Cas8_AID | C terminal
SEQ ID NO: 595 | Cas8_PmCDA1 | C terminal
SEQ ID NO: 596 | Cas8_Human APOBEC1 | C terminal
SEQ ID NO: 597 | Cas8_APOBEC3F | C terminal
SEQ ID NO: 598 | Cas8_APOBEC3G | C terminal
SEQ ID NO: 599 | PvuIIHIFIT46G | N terminal
SEQ ID NO: 600 | PvuIIsinglechainT46G | N terminal
SEQ ID NO: 601 | I-TevI1-169Q158R | N terminal
SEQ ID NO: 602 | I-TevI1-169 | N terminal
SEQ ID NO: 603 | BcnI singlechain | N terminal
SEQ ID NO: 604 | MvaI singlechain | N terminal
SEQ ID NO: 605 | DNaseI | N terminal, C terminal
SEQ ID NO: 606 | Cas3 | N terminal
SEQ ID NO: 607 | S1 Aspergillus | N terminal, C terminal
- B. Cascade Subunit Protein Fusion with Another Cascade Subunit Protein
- The two Cse2 proteins of the Cascade complex were fused together using a structure-guided approach based on the E. coli Cascade crystal structure 5H9E.pdb (www.rcsb.org/pdb/; Hayes, R. P, et al., Nature 530(7591):499-503 (2016)). Briefly, the C-terminus of one Cse2 and the N-terminus of a second Cse2 were fused together using a 10-aa flexible linker (SEQ ID NO:589). The full sequence of the Cse2-Cse2 (CasB_CasB) fusion protein is shown in SEQ ID NO:588.
- In silico designed DNA sequences were provided to a commercial manufacturer (GenScript, Piscataway, N.J.) for synthesis. The DNA sequences were cloned into the expression vector designed in Example 2 (SEQ ID NO:441). The Cse2 sequence was exchanged with SEQ ID NO:588.
- Each expression vector was transfected into E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.) with a second vector encoding a guide RNA for the J3 target (SEQ ID NO:444), as described in Example 2. The E. coli Type I-E Cascade complex containing Cas5, Cas6, Cas7, Cse2-Cse2, and Cas8 was expressed and purified as described in Example 4B and 5B. Purification of the Cascade complexes comprising the fused Cse2-Cse2 variant demonstrates that fused Type I-E CRISPR-Cas subunit proteins successfully formed Cascade complexes having essentially the same composition (based on molecular weight) as Cascade complexes comprising wild-type proteins.
- C. Electrophoretic Mobility Shift Assays (EMSA) of Cascade/Cse2-Cse2 and J3 Target
- Cascade/Cse2-Cse2 complexes were purified as described in this Example and subjected to an EMSA to demonstrate specific binding to their respective target sequence. Briefly, Cascade/Cse2-Cse2 and Cascade/WT-Cse2 were purified and concentrated to 10 mg/mL. Cy5-labeled double-stranded target DNA was produced as described in Example 6 and diluted to 1 μM in TE buffer (J3 target SEQ ID NO:469 and SEQ ID NO:472 and CCR5 target SEQ ID NO:474 and SEQ ID NO:470). Cascade complexes and labeled double-stranded target DNA were incubated for 30 min at 37° C. at different protein/target ratios. Immediately following the incubation, 2 μl of 50% glycerol was added to the samples and they were loaded on a 5% native PAA gel. Gels were run at 4° C. at 70V for 90 min in 0.5×TBE buffer and imaged on an AZURE c600 Bioimager (Azure BioSystems, Dublin, Calif.), and the bands were quantitated. The data are presented in Table 34.
-
TABLE 34 Results of Cascade/Cse2-Cse2 EMSA
Cascade ID and guide | Target DNA | Cascade:dsDNA ratio | Gel shift %
Cascade/WT-Cse2 J3 | J3 | 6.7 | 44
Cascade/Cse2-Cse2 J3 | J3 | 6.7 | 46
Cascade/WT-Cse2 J3 | CCR5 | 6.7 | LOD*
Cascade/Cse2-Cse2 J3 | CCR5 | 6.7 | LOD
*LOD = below the limit of detection
- D. Cascade Subunit Protein Fusion with Another Cascade Subunit Protein and an Enzymatic Protein Domain
- The cytidine deaminase rAPOBEC1 (apolipoprotein B mRNA editing enzyme catalytic subunit 1, Rattus norvegicus; NCBI Gene ID: 25383, Ensembl:ENSRNOG00000015411) was selected for fusion. The Cse2-Cse2 protein was fused with rAPOBEC1 using a structure-guided approach based on the E. coli Cascade crystal structure 5H9E.pdb (www.rcsb.org/pdb/; Hayes, R. P, et al., Nature 530(7591):499-503 (2016)). Briefly, the C-terminus of rAPOBEC1 (SEQ ID NO:590) was fused to the N-terminus of the Cse2-Cse2 dimer (described above) using a 9-aa flexible linker (SEQ ID NO:591). The full sequence of the rAPOBEC1_Cse2-Cse2 fusion protein is shown in SEQ ID NO:592.
- In silico designed DNA sequences were provided to a commercial manufacturer (GenScript, Piscataway, N.J.) for synthesis. The DNA sequences were cloned into the expression vector (SEQ ID NO:441), replacing the Cse2 sequence. Each expression vector was transfected into E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.) with a second vector encoding a guide RNA for the J3 target (SEQ ID NO:444), as described in Example 2. The E. coli Type I-E Cascade complex containing Cas5, Cas6, Cas7, rAPOBEC1_Cse2-Cse2, and Cas8 was expressed and purified as described in Example 4B and 5B. Purification of the Cascade complexes comprising the fused rAPOBEC1_Cse2-Cse2 variant demonstrates that cytidine deaminase fusions to Type I-E CRISPR-Cas subunit proteins were successfully used to form Cascade complexes having essentially the same composition (based on molecular weight) as Cascade complexes comprising wild-type proteins. Table 35 presents examples of enzyme fusions with Cse2-Cse2.
-
TABLE 35 Other Enzyme Fusions Such as Cytidine Deaminases with Cse2-Cse2
SEQ ID NO: | Enzyme | Fusion to Cse2-Cse2
SEQ ID NO: 608 | rAPOBEC1 | N terminal
SEQ ID NO: 609 | AID | C terminal
SEQ ID NO: 610 | CPmCDA1 | C terminal
SEQ ID NO: 611 | Human APOBEC1 | N terminal
SEQ ID NO: 612 | Human APOBEC3F | N terminal
SEQ ID NO: 613 | APOBEC3G | N terminal
- This Example illustrates the design of an E. coli Type I-E cp-Cas7 protein fused to a VP64 activation domain to confer transcriptional activation activity to the Cascade complex.
- VP64 is a transcriptional activator comprising four tandem copies of VP16 (herpes simplex viral protein 16, DALDDFDLDML (SEQ ID NO:614); amino acids 437-447, UNIPROT:UL48) connected with glycine-serine (GS) linkers. When fused to a protein domain that can bind near the promoter of a gene, VP64 (SEQ ID NO:615) acts as a strong transcriptional activator. The E. coli Type I-E cp-Cas7 V2 (SEQ ID NO:616) can be selected for engineering.
- The activation domain VP64 can be fused to the N-terminus of cpCas7 V2 (described in Example 10). A linker (e.g., 5 to 50 amino acids in length) can be selected to operably link cpCas7 V2 and the VP64 domain.
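- The VP64 architecture described above (four VP16 copies joined by glycine-serine linkers) can be sketched in a few lines. This is an illustration only: the exact linker composition and any flanking residues of SEQ ID NO:615 are not given in this section, so the two-residue "GS" linker and the helper name are assumptions.

```python
VP16_PEPTIDE = "DALDDFDLDML"  # VP16 amino acids 437-447 (SEQ ID NO:614)


def tandem_repeat(unit: str, copies: int, linker: str) -> str:
    """Join `copies` of `unit` with `linker` between adjacent copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return linker.join([unit] * copies)


# Assumed two-residue GS linker; the real VP64 sequence may differ.
vp64_like = tandem_repeat(VP16_PEPTIDE, copies=4, linker="GS")
print(vp64_like)
print(len(vp64_like))  # 4 * 11 + 3 * 2 = 50
```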
- In silico designed DNA sequences can be provided to a commercial manufacturer for synthesis. The DNA sequences encoding a VP64-cpCas7 V2 fusion protein can be cloned into an expression vector (e.g., SEQ ID NO:455, wherein VP64-cpCas7 V2 can be substituted for Cas7). Each expression vector can be transfected into E. coli BL21 Star™ cells (Thermofisher, Waltham, Mass.) with a second vector encoding a guide RNA for the J3 target (SEQ ID NO:444), as described in Example 2. The E. coli Type I-E Cascade complex containing Cas5, Cas6, VP64-cpCas7 V2, Cse2, and Cas8 can be expressed and purified as described in Examples 4 and 5. Purification of the Cascade complexes comprising the fused VP64-cpCas7 V2 variant can be used to verify that the fusion forms Cascade complexes having essentially the same composition (based on molecular weight) as Cascade complexes comprising wild-type proteins.
- Selection of a guide targeted to the promoter region of a particular gene can be used to verify the ability of the Cascade complex comprising the fused VP64-cpCas7 V2 to facilitate transcriptional activation of the gene.
- This Example describes a method of modifying a Class 2 Type II CRISPR sgRNA, crRNA, tracrRNA, or crRNA and tracrRNA sequence with a Class 1 Type I CRISPR repeat stem sequence (e.g., a Type I-F CRISPR repeat stem sequence) for the recruitment of one or more Cascade subunit proteins (i.e., Cas6, Cas5, etc.) fused to a functional domain, to a Type II CRISPR Cas protein/guide RNA complex binding site. The method here is adapted from Gilbert, L., et al., Cell 154(2):442-451 (2013) and Ferry, Q., et al., Nature Communications 8, 14633 doi: 10.1038/ncomms14633 (2017).
- A. Modifying a Type II Guide RNA
- A Type II CRISPR sgRNA, crRNA, tracrRNA, or crRNA and tracrRNA (collectively referred to as a “Type II guide RNA”) can be selected for engineering.
- A Type II guide RNA sequence can be evaluated in silico for regions of incorporation of a Type I CRISPR repeat stem sequence. The Type I CRISPR repeat stem sequence can be attached at the 5′ or 3′ end of the Type II guide RNA, internal to the Type II guide RNA, or can replace secondary structure in the Type II guide RNA (e.g., 3′ hairpin elements). Incorporation of the Type I CRISPR repeat stem sequence can be accompanied by a linker element nucleotide sequence. An example of a Type II tracrRNA 3′ modified with a Type I CRISPR repeat stem sequence is presented in Table 36.
TABLE 36 Exemplary Type II tracrRNA with 3′ Type I CRISPR Repeat Stem Sequence
SEQ ID NO: | Sequence*
SEQ ID NO: 617 | 5′-AGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUAAGUUCAcugccguauaggcagCUUU-3′
*Type I CRISPR repeat stem sequence is underlined and in lower case letters. A corresponding DNA coding sequence is presented as SEQ ID NO: 618.
- A mammalian gene, such as C-X-C chemokine receptor type 4 (CXCR4), can be selected for targeting. The junction between the 5′ UTR and exon 1 can be scanned in silico for a Type II CRISPR Cas protein target sequence occurring adjacent a Type II CRISPR Cas protein PAM sequence (e.g., 5′-NGG). The 20-nucleotide target sequence occurring upstream, in a 5′ direction, can be incorporated into the Type II crRNA. An example of a Type II crRNA targeting CXCR4 is shown in Table 37.
TABLE 37 Exemplary Type II crRNA Targeting CXCR4
SEQ ID NO: | Sequence*
SEQ ID NO: 619 | 5′-GAACCAGCGGUUACCAUGGAGUUUUAGAGCUAUGCU-3′
*A corresponding DNA coding sequence is presented as SEQ ID NO: 620.
- Alternatively, the 3′ end of the CXCR4 targeting spacer (RNA) (SEQ ID NO:619) can be covalently linked to the 5′ end of the Type II tracrRNA with 3′ Type I CRISPR repeat stem sequence (RNA) (SEQ ID NO:617) with a linker. A suitable linker element is 5′-GAAA-3′.
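- The in silico steps above (scanning for a 20-nucleotide target 5′ of an NGG PAM, converting it to a crRNA, and optionally linking it to the modified tracrRNA) can be sketched as follows. The crRNA repeat portion and the tracrRNA are copied from Tables 37 and 36, with the tracrRNA joined across the line wrap in the table. The function names and the flanking bases of the toy DNA are hypothetical, and only the forward strand is scanned.

```python
import re

CRRNA_REPEAT = "GUUUUAGAGCUAUGCU"  # 3′ portion of the Table 37 crRNA
TRACR_WITH_STEM = (  # Table 36 sequence; Type I repeat stem in lower case
    "AGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCA"
    "ACUUGAAAAAGUGGCACCGAGUCGGUGCUUAAGUUCA"
    "cugccguauaggcag"
    "CUUU"
)
LINKER = "GAAA"


def find_targets(dna: str, spacer_len: int = 20) -> list[tuple[int, str]]:
    """Return (start, target) for every target followed by a 5′-NGG PAM
    on the given strand. A lookahead is used so overlapping hits are kept."""
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]


def to_crrna(target_dna: str) -> str:
    """Transcribe the DNA target into RNA and append the crRNA repeat."""
    return target_dna.upper().replace("T", "U") + CRRNA_REPEAT


def to_single_guide(crrna: str) -> str:
    """Link the crRNA 3′ end to the tracrRNA 5′ end with the GAAA linker."""
    return crrna + LINKER + TRACR_WITH_STEM


# Toy DNA: the Table 37 spacer as DNA with hypothetical flanks and PAM.
toy_dna = "CCTA" + "GAACCAGCGGTTACCATGGA" + "GGG" + "TTAC"
targets = find_targets(toy_dna)
crrna = to_crrna("GAACCAGCGGTTACCATGGA")
print(crrna)  # GAACCAGCGGUUACCAUGGAGUUUUAGAGCUAUGCU
```

A fuller design step would also scan the reverse complement and filter candidate targets for off-target matches, which this sketch omits.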
- In silico designed Type II guide RNAs with the incorporated Type I CRISPR repeat stem sequence can be provided to a commercial manufacturer for synthesis.
- A Type I Cascade subunit protein (e.g., Cas6) can be operably linked to a transcriptional activation or repression domain (e.g., KRAB) and C-terminally tagged with a nuclear localization signal (NLS) as described in Example 12.
- A Type II Cas protein (e.g., Cas9) can be mutated such that it is catalytically inactive (e.g. dCas9) and tagged with a NLS sequence.
- The Cas6-KRAB-NLS protein and the dCas9-NLS protein can be recombinantly expressed and purified from E. coli.
- Ribonucleoprotein complexes can be formed at a ratio of 60 pmol dCas9 protein:60 pmol Cas6-KRAB-NLS:120 pmol CXCR4 targeting crRNA:120 pmol tracrRNA 3′ modified with a Type I CRISPR repeat stem sequence. Prior to assembly with the dCas9 and the Cas6-KRAB-NLS, each of the 120 pmol CXCR4 targeting crRNA and 120 pmol tracrRNA 3′ modified with a Type I CRISPR repeat stem sequence (herein referred to as “modified Type II guide RNA”) can be diluted to the desired total concentration (120 pmol) in a final volume of 2 μL, incubated for 2 minutes at 95° C., removed from a thermocycler, and allowed to equilibrate to room temperature. dCas9 and the Cas6-KRAB-NLS protein can be diluted to an appropriate concentration in binding buffer (20 mM HEPES, 100 mM KCl, 5 mM MgCl2, and 5% glycerol at pH 7.4) to a final volume of 3 μL and mixed with the 2 μL of Type II guide RNA, followed by incubation at 37° C. for 30 minutes. A nontransfected control (e.g., buffer only), unmodified Type II guide RNA, or a Cas6 not linked to a repression domain, can be used to assemble negative control RNPs.
- B. Cell Transfections Using dCas9:Cas6-KRAB-NLS: Modified Type II Guide RNA
- dCas9:Cas6-KRAB-NLS: modified Type II guide RNA nucleoprotein complexes can be transfected into HEK293 cells (ATCC, Manassas Va.), using the Nucleofector® 96-well Shuttle System (Lonza, Allendale, N.J.) and the following protocol: The complexes can be dispensed in a 5 μL final volume into individual wells of a 96-well plate. The cell culture medium can be removed from the HEK293 cell culture plate and the cells detached with TrypLE™ (Thermo Scientific, Wilmington, Del.). Suspended HEK293 cells can be pelleted by centrifugation for 3 minutes at 200×g, TrypLE reagents aspirated, and cells washed with calcium and magnesium-free phosphate buffered saline (PBS). Cells can be pelleted by centrifugation for 3 minutes at 200×g, the PBS aspirated, and the cell pellet re-suspended in 10 mL of calcium and magnesium-free PBS.
- The cells can be counted using the Countess® II Automated Cell Counter (Life Technologies; Grand Island, N.Y.). 2.2×10⁷ cells can be transferred to a 1.5 ml microfuge tube and pelleted. The PBS can be aspirated and the cells re-suspended in Nucleofector™ SF (Lonza, Allendale, N.J.) solution to a density of 1×10⁷ cells/mL. 20 μL of the cell suspension can then be added to each individual well containing 5 μL of ribonucleoprotein complexes, and the entire volume from each well can be transferred to a well of a 96-well Nucleocuvette™ Plate (Lonza, Allendale, N.J.). The plate can be loaded onto the Nucleofector™ 96-well Shuttle™ (Lonza, Allendale, N.J.) and cells nucleofected using the 96-CM-130 Nucleofector™ program (Lonza, Allendale, N.J.). Post-nucleofection, 70 μL Dulbecco's Modified Eagle Medium (DMEM; Thermo Scientific, Wilmington, Del.), supplemented with 10% Fetal Bovine Serum (FBS; Thermo Scientific, Wilmington, Del.), penicillin and streptomycin (Life Technologies, Grand Island, N.Y.) can be added to each well, and 50 μL of the cell suspension can be transferred to a 96-well cell culture plate containing 150 μL pre-warmed DMEM complete culture medium. The plate can be transferred to a tissue culture incubator and maintained at 37° C. in 5% CO2 for 48 hours.
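- The cell numbers in this step can be checked with simple arithmetic. The sketch below assumes that the cell count reads 2.2×10^7 and the density 1×10^7 cells per mL (the exponents and the unit are inferred from context), and uses hypothetical variable names. It works in integers to avoid rounding artifacts.

```python
cells_total = 22_000_000      # 2.2 x 10^7 cells pelleted
density_per_ml = 10_000_000   # 1 x 10^7 cells per mL after resuspension
volume_per_well_ul = 20       # cell suspension added to each well

# Volume of Nucleofector solution needed to reach the target density.
suspension_volume_ul = cells_total * 1000 // density_per_ml

# Number of 20 uL aliquots that volume supports.
wells_supported = suspension_volume_ul // volume_per_well_ul

# Cells delivered to each well.
cells_per_well = density_per_ml * volume_per_well_ul // 1000

print(suspension_volume_ul)  # 2200
print(wells_supported)       # 110
print(cells_per_well)        # 200000
```

Under these assumptions the suspension covers a full 96-well plate with 14 wells of overage, at 2×10^5 cells per well.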
- 72 hours after nucleofection of the dCas9:Cas6-KRAB-NLS: modified Type II guide RNA nucleoprotein complexes, cells can be evaluated for repression of CXCR4 expression. Culture medium can be aspirated from the HEK293 cells, and the cells can be washed once with calcium and magnesium-free PBS and then trypsinized by the addition of TrypLE (Life Technologies, Grand Island, N.Y.) followed by incubation at 37° C. for 3-5 minutes. Trypsinized cells can be gently pipetted up and down to form a single-cell suspension, and the cells can then be pelleted by centrifugation for 3 minutes at 200×g. After centrifugation the culture medium can be aspirated and the cells re-suspended in a 10 mM EDTA/PBS buffer and gently mixed into a single-cell suspension. The single-cell suspension can be stained using 0.05% FITC-conjugated anti-human CXCR4 antibody (Medical & Biological Laboratories Co., Japan) in PBS containing 10% FBS for 1 hour at room temperature. Isotype controls and native RNP controls can be similarly stained for reference. Stained cells can then be sorted on an LSR II flow cytometer (BD laboratories, San Jose Calif.) and the population of FITC-positive fluorescent cells tallied.
- Reduction in CXCR4 expression is measured by a decrease in detected fluorescence of a dCas9:Cas6-KRAB-NLS: modified Type II guide RNA nucleofected sample compared to the measured fluorescence of a non-transfected control. A decrease in fluorescence from the flow cytometer can be used to demonstrate that a modified Type II guide RNA with a Type I CRISPR repeat stem sequence can be used in combination with a nuclease-deficient Type II Cas9 protein to recruit and localize a Type I CRISPR Cascade subunit protein fused to a repression domain to a gene target and repress transcription of said gene target.
- This Example describes a method to identify and screen Type I cas genes from different species. The method presented here is adapted from Shmakov, S., et al., Molecular Cell 60(3):385-397 (2015).
- A. Identification of Type I CRISPR-Cas Genes
- Using the Basic Local Alignment Search Tool (BLAST, blast.ncbi.nlm.nih.gov/Blast.cgi), a search of the genomes of various species can be conducted to identify one or more genes coding for the various components of the Type I CRISPR-Cas complex. The cas1 integrase gene is a component of both Class 1 and Class 2 CRISPR-Cas families, and upon identification of species containing the cas1 gene, subsequent searches in these genomes can be conducted to isolate genomes comprising Type I-specific genes. Genome searches can be anchored upon the CRISPR-Cas integrase gene cas1; an exemplary cas1 sequence from the Type I-E system of E. coli K-12 MG1655 that can be used is SEQ ID NO:621. Particular genes (e.g., cas7 and cas5) are core components of the interference complexes of the Type I systems and can be used to further differentiate species containing Type I systems. Exemplary sequences of E. coli K-12 MG1655 cas7 and cas5 genes that can be used are SEQ ID NO:622 and SEQ ID NO:623, respectively. Genomes identified as possessing cas7 and cas5 genes can be further parsed through the identification of the Type I-specific nuclease-helicase cas3 gene or homologs thereof. An exemplary E. coli K-12 MG1655 cas3 sequence that can be used is SEQ ID NO:624.
- Genomes containing the CRISPR-Cas integrase gene cas1, the Type I interference complex genes cas7 and cas5, and the nuclease-helicase cas3 gene, or some combination thereof, are likely candidates for comprising Type I CRISPR-Cas system(s). Type I CRISPR-Cas genes are generally found in proximity to one another in a single genomic locus, typically within 20 kilobases (kb). The area around the cas1, cas7, cas5, or cas3 genes can be searched for other open reading frames (ORFs) of the remaining cas genes that constitute a Type I interference complex. The amino acid sequence of putative ORFs can be compared to known Type I genes for homology, or the presence of characteristic protein domains of the Type I protein components can be analyzed using the homology detection and structure prediction search tools available through the Max Planck Institute Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de/#/), or equivalent.
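- The proximity criterion described above (anchor genes found together within roughly 20 kb) can be sketched as a filter over homology hits. The example assumes the hits have already been obtained, for instance from BLAST, and reduced to gene name, contig, and coordinates. The class and function names, the required gene set, and the coordinates are hypothetical.

```python
from typing import NamedTuple


class GeneHit(NamedTuple):
    gene: str
    contig: str
    start: int
    end: int


REQUIRED = frozenset({"cas1", "cas3", "cas5", "cas7"})


def candidate_loci(hits, max_span=20_000, required=REQUIRED):
    """Return (contig, start, end) for the first window on each contig in
    which all `required` genes fall within `max_span` bases."""
    by_contig = {}
    for hit in hits:
        by_contig.setdefault(hit.contig, []).append(hit)

    loci = []
    for contig, contig_hits in by_contig.items():
        contig_hits.sort(key=lambda h: h.start)
        for i, first in enumerate(contig_hits):
            # Hits that end within max_span of the window's first gene.
            window = [h for h in contig_hits[i:] if h.end - first.start <= max_span]
            if required <= {h.gene for h in window}:
                loci.append((contig, first.start, max(h.end for h in window)))
                break
    return loci


# Hypothetical hits: contigA carries a compact locus, contigB does not.
hits = [
    GeneHit("cas1", "contigA", 1_000, 1_900),
    GeneHit("cas3", "contigA", 2_500, 5_000),
    GeneHit("cas7", "contigA", 6_000, 7_000),
    GeneHit("cas5", "contigA", 7_100, 7_800),
    GeneHit("cas1", "contigB", 100, 1_000),
    GeneHit("cas7", "contigB", 50_000, 51_000),
]
print(candidate_loci(hits))  # [('contigA', 1000, 7800)]
```

Because the text allows "some combination thereof", the `required` set is a parameter and can be relaxed to a subset of the anchor genes.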
- B. Screening of Identified Type I Components
- Once a putative collection of Type I components (e.g., cas genes and the corresponding crRNA) has been identified, the Type I components can be tested for their ability to carry out programmable DNA targeting.
- Putative cas genes and the crRNA can be encoded into expression vectors following the guidance of Examples 1, 2, and 3. Vectors encoding the various cas genes and crRNA can be introduced into a bacterial strain and the Type I interference complex expressed and purified as described in Examples 4 and 5. The elution fraction from the size-exclusion chromatography (SEC) column can be analyzed via SDS-PAGE gel to determine the identity, based on weight, of the protein components comprising a complete Type I interference complex. An ethidium bromide gel can also be run to detect the presence of crRNA as part of the interference complex.
- Purified Cascade complexes can be tested for their ability to support in vitro biochemical cleavage of a DNA target as described in Examples 6 and 7.
- Control expression and purification samples, in which a single putative cas gene is not expressed, can be used to determine the required cas genes that constitute a complete Type I interference complex capable of programmable DNA targeting.
- For certain applications, identification of individual cas gene homologs (e.g., cas7) from a genomic sequence is sufficient and additional cas genes need not be identified nor screening performed.
- This Example describes a method to identify Type I crRNAs in different species. The method presented here is adapted from Chylinski, K., et al., RNA Biology 10(5):726-737 (2013).
- A search of genomes of various species can be conducted to identify Type I CRISPR-Cas genes as described in Example 17A. Genomes that comprise one or more Type I-specific cas genes are candidate genomes that are likely to contain CRISPR RNAs (crRNAs) encoded within the CRISPR repeat-spacer array. The sequences adjacent to the identified Type I cas genes (e.g., a cas7, cas5, or cas3 gene) can be probed for an associated CRISPR repeat-spacer array. Methods for in silico predictive screening can be used to extract the crRNA sequence from the repeat array following Grissa, I. V., et al., Nucleic Acids Research 35(Web Server issue):W52-W57 (2007). The crRNA sequence is contained within the CRISPR repeat array and can be identified by its hallmark repeating sequences interspaced by foreign spacer sequences.
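- The hallmark pattern described above (identical repeats interspaced by spacers of similar length) can be illustrated with a deliberately simplified search. This is not the algorithm of the cited reference; it only finds exact repeats of a fixed length `k`, and the function name, parameters, and toy sequence are hypothetical.

```python
from collections import defaultdict


def find_repeat_candidates(seq, k, min_copies=3, min_gap=4, max_gap=12):
    """Return (repeat, positions, spacers) for each exact k-mer that occurs
    at least `min_copies` times with every gap between consecutive copies
    inside [min_gap, max_gap]."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    candidates = []
    for kmer, starts in positions.items():
        if len(starts) < min_copies:
            continue
        gaps = [nxt - (cur + k) for cur, nxt in zip(starts, starts[1:])]
        if all(min_gap <= gap <= max_gap for gap in gaps):
            spacers = [seq[cur + k:nxt] for cur, nxt in zip(starts, starts[1:])]
            candidates.append((kmer, starts, spacers))
    return candidates


# Toy array: three copies of an 8-nt repeat around two 6-nt spacers.
repeat = "GAGTTCCC"
toy_array = "AT" + repeat + "ACGTAC" + repeat + "TTGCAA" + repeat + "GC"
print(find_repeat_candidates(toy_array, k=8))
# [('GAGTTCCC', [2, 16, 30], ['ACGTAC', 'TTGCAA'])]
```

Real arrays have longer repeats and spacers and may contain degenerate repeats, so a dedicated tool remains the appropriate choice for actual screening.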
- A. Preparation of RNA-Seq Library
- The putative CRISPR array containing the individual crRNA identified in silico can be further validated using RNA sequencing (RNA-seq).
- Cells from species identified as comprising putative Type I cas genes and crRNA components can be procured from a commercial repository (e.g., ATCC, Manassas, Va.; German Collection of Microorganisms and Cell Cultures GmbH (DSMZ), Braunschweig, Germany).
- Cells can be grown to mid-log phase and total RNA prepped using Trizol reagent (SigmaAldrich, St. Louis, Mo.) and treated with DNaseI (Fermentas, Vilnius, Lithuania).
- 10 μg of the total RNA can be treated with Ribo-Zero rRNA Removal Kit (Illumina, San Diego, Calif.) and the remaining RNA purified using RNA Clean and Concentrators (Zymo Research, Irvine, Calif.).
- A library can be prepared using a TRUSEQ™ Small RNA Library Preparation Kit (Illumina, San Diego, Calif.), following the manufacturer's instructions. This will result in cDNAs having adapter sequences.
- The resulting cDNA library can be sequenced using MiSeq Sequencer (Illumina, San Diego, Calif.).
- B. Processing of Sequencing Data
- Sequencing reads of the cDNA library can be processed, for example, using the following method.
- Adapter sequences can be removed using cutadapt 1.1 (pypi.python.org/pypi/cutadapt/1.1) and about 15 nucleotides trimmed from the 3′ end of the read to improve read quality.
- Reads can be aligned to the genome of the respective species (i.e., from which the putative crRNA is to be identified) using Bowtie 2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml). The Sequence Alignment/Map (SAM) file, which is generated by Bowtie 2, can be converted into a Binary Alignment/Map (BAM) file using SAMTools (http://samtools.sourceforge.net/) for subsequent sequencing analysis steps.
- Read coverage mapping to the CRISPR locus or loci can be calculated from the BAM file using BedTools (bedtools.readthedocs.org/en/latest/).
- The BED file, as generated in the previous step, can be loaded into Integrative Genomics Viewer (IGV; www.broadinstitute.org/igv/) to visualize the sequencing read pileup. The read pileup can be used to identify the 5′ and 3′ termini of the transcribed putative crRNA sequence. The RNA-seq data can be used to validate that a putative crRNA element is actively transcribed in vivo.
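- The processing steps above can be chained as a series of command-line calls. The sketch below only builds the argument lists and provides an optional runner. It assumes recent releases of cutadapt, Bowtie 2, SAMtools, and BEDTools are installed (the `-u` option is not available in cutadapt 1.1), that a Bowtie 2 index has already been built, and that a bedGraph coverage track is an acceptable stand-in for the BED file mentioned in the text. File names, the adapter placeholder, and function names are hypothetical.

```python
import subprocess


def build_pipeline(reads, adapter, index, prefix):
    """Return a list of (argv, stdout_path) steps; stdout_path is None
    when the tool writes its own output file."""
    no_adapter = f"{prefix}.noadapter.fastq"
    clipped = f"{prefix}.clipped.fastq"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    coverage = f"{prefix}.coverage.bedgraph"
    return [
        # Remove the 3' adapter.
        (["cutadapt", "-a", adapter, "-o", no_adapter, reads], None),
        # Trim 15 nucleotides from the 3' end of each read.
        (["cutadapt", "-u", "-15", "-o", clipped, no_adapter], None),
        # Align single-end reads and write SAM.
        (["bowtie2", "-x", index, "-U", clipped, "-S", sam], None),
        # Convert SAM to BAM, then sort and index.
        (["samtools", "view", "-b", "-o", bam, sam], None),
        (["samtools", "sort", "-o", sorted_bam, bam], None),
        (["samtools", "index", sorted_bam], None),
        # Per-base coverage in bedGraph format, written to stdout.
        (["bedtools", "genomecov", "-ibam", sorted_bam, "-bg"], coverage),
    ]


def run_pipeline(steps):
    """Execute each step, redirecting stdout to a file where requested."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


steps = build_pipeline("reads.fastq", "ADAPTERSEQ", "genome_index", "sample")
for argv, _ in steps:
    print(" ".join(argv))
```

Nothing is executed until `run_pipeline(steps)` is called, so the commands can be reviewed first.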
- Putative crRNA can be tested with their cognate Type I cas genes for the ability to carry out programmable DNA targeting, following the guidance of Example 17. A of the present Specification.
- This Example describes the generation and testing of various modifications of Type I guide crRNAs and their suitability for use in constructing Cascade polynucleotide complexes. The method described below is adapted from Briner, A., et al., Molecular Cell 56(2):333-339 (2014).
- Modifications can be introduced into the crRNA backbone, and the modified crRNA tested with a cognate Cascade complex to facilitate the identification of regions or positions in the Type I guide crRNA backbone amenable to modification.
- A crRNA from a Type I CRISPR system (e.g., E. coli Cascade) can be selected for engineering. The crRNA sequence can be modified in silico to introduce one or more base substitutions, deletions, or insertions into nucleic acid sequences in regions selected from one or more of the following regions: nucleic acid sequences 5′ of the spacer (5′ handle), the spacer element, Type I CRISPR repeat stem sequence, or 3′ of the Type I CRISPR repeat stem sequence (3′ handle).
- Base modification can also be used to introduce mismatches in the hydrogen base-pair interactions of any of the crRNA regions, or base-pair mutation introducing an alternative hydrogen base-pair interaction through substitution of two bases, wherein the alternative hydrogen base-pair interaction differs from the original hydrogen base-pair interaction (e.g., the original hydrogen base-pair interaction is Watson-Crick base pairing and the substitution of the two bases form a reverse Hoogsteen base pairing). Substitution of bases can also be used to introduce hydrogen base-pair interaction within the crRNA backbone.
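- One way to generate the base-substitution variants described above is to enumerate every single-base change within a chosen region of the crRNA. The sketch below is an illustration only; the function name, the toy sequence, and the region boundaries are hypothetical and do not correspond to a specific crRNA of this disclosure.

```python
def single_substitutions(rna: str, start: int, end: int) -> list[tuple[int, str, str]]:
    """Return (position, new_base, variant) for every single-base
    substitution in rna[start:end] (0-based, end exclusive)."""
    if not 0 <= start < end <= len(rna):
        raise ValueError("region must lie inside the sequence")
    variants = []
    for i in range(start, end):
        for base in "ACGU":
            if base != rna[i]:
                variants.append((i, base, rna[:i] + base + rna[i + 1:]))
    return variants


# Toy sequence; positions 1 and 2 stand in for a region such as the 5' handle.
toy_rna = "ACGU"
for position, base, variant in single_substitutions(toy_rna, 1, 3):
    print(position, base, variant)
```

Each position yields three variants, so a region of n bases gives 3n single-substitution designs; deletions and insertions would need separate enumeration.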
- Regions of the crRNA can be independently engineered to introduce secondary structure elements into the crRNA backbone. Such secondary structure elements include, but are not limited to, the following: stem-loop elements, stem elements, pseudo-knots, and ribozymes. Furthermore, the crRNA guide RNA backbone can be modified to delete portions of the crRNA backbone either through deletion at the 5′ end, 3′ end, or internal to the crRNA. Alternative backbone structures can also be introduced.
- In silico designed crRNA sequences can be provided to a commercial manufacturer for synthesis.
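- The in silico modification step described above can be sketched as a simple enumeration of variants within a chosen crRNA region. The helper names, the region coordinates, and the toy RNA sequence below are hypothetical placeholders, not sequences or tools from this Specification.

```python
def single_substitutions(seq, start, end, alphabet="ACGU"):
    """Yield (position, original_base, new_base, variant) for each single-base
    substitution within seq[start:end]."""
    for i in range(start, end):
        for base in alphabet:
            if base != seq[i]:
                yield i, seq[i], base, seq[:i] + base + seq[i + 1:]

def single_deletions(seq, start, end):
    """Yield (position, variant) for each single-base deletion within seq[start:end]."""
    for i in range(start, end):
        yield i, seq[:i] + seq[i + 1:]

# Placeholder sequence; positions 0-3 stand in for a region such as the 5' handle.
crrna = "ACGUACGU"
variants = list(single_substitutions(crrna, 0, 4))
print(len(variants))   # 12 (4 positions x 3 alternative bases)
print(variants[0])     # (0, 'A', 'C', 'CCGUACGU')
```

- Insertions and paired substitutions (for example, swapping both bases of a stem base pair) could be enumerated in the same manner; the resulting sequences would then be sent for synthesis.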
- Modified crRNAs can be evaluated for their ability to support binding by individual Cascade subunit proteins (i.e., Cas6, Cas5, etc.), or to support complete formation of the Cascade protein complex, or to support formation of the Cascade complex and modification of a double-stranded DNA target sequence through recruitment of a nuclease (e.g., Cas3). crRNA binding to individual Cascade subunit proteins and Cascade protein complex assembly can be evaluated by nano-ESI mass spectrometry in a manner similar to Jore, M., et al., Nature Structural & Molecular Biology 18:529-536 (2011). Biochemical characterization of crRNA and Cascade protein complex modification of a double-stranded DNA target sequence through recruitment of a nuclease can be carried out in a manner similar to those described in Examples 6 and 7. Modified crRNA that are capable of supporting formation of the Cascade complex and modification of a double-stranded DNA target sequence through recruitment of a nuclease can be validated for activity in cells using the method described in Example 8.
- This Example illustrates the use of Type I CRISPR proteins and Type I guide crRNAs of the present invention to modify DNA target sequences present in human genomic DNA (gDNA) and to measure the level of cleavage activity at those sites.
- Target sites (DNA target sequences) can be first selected from genomic DNA. Type I guide crRNAs can be designed to target the selected sequences. Assays (e.g., as described in Example 7) can be performed to determine the level of DNA target sequence cleavage.
- A. Selecting DNA Target Sequences from Genomic DNA
- PAM sequences (e.g., ATG) for a Cascade protein complex (e.g., E. coli Type I-E Cascade) can be identified within the selected genomic region.
- One or more Cascade DNA target sequences (e.g., 32 nucleotides in length) that are 3′ adjacent to an ATG PAM sequence can be identified.
- Criteria for selection of nucleic acid target sequences can include, but are not limited to, the following: homology to other regions in the genome; percent G-C content; melting temperature; presence of homopolymers within the spacer; distance between the two sequences; and other criteria known to one skilled in the art.
- A DNA target binding sequence that hybridizes to the Cascade DNA target sequence can be incorporated into a guide crRNA. The nucleic acid sequence of a guide crRNA construct is typically provided to and synthesized by a commercial manufacturer.
- A guide crRNA, as described herein, can be used with a cognate Type I Cascade protein complex to form crRNA/Cascade protein complexes.
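- The target selection steps of this section can be sketched as a scan for PAM-adjacent windows followed by simple sequence metrics. This is a minimal sketch under stated assumptions: only the supplied strand is scanned, the function names are hypothetical, and the toy sequence is not a genomic sequence from this Specification. Criteria such as genome-wide homology and melting temperature are not implemented.

```python
def find_targets(region, pam="ATG", target_len=32):
    """Return (pam_start, target) for each window of target_len nucleotides
    located immediately 3' of a PAM on the supplied strand."""
    hits = []
    pos = region.find(pam)
    while pos != -1:
        t_start = pos + len(pam)
        target = region[t_start:t_start + target_len]
        if len(target) == target_len:      # discard windows truncated by the region end
            hits.append((pos, target))
        pos = region.find(pam, pos + 1)
    return hits

def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def longest_homopolymer(seq):
    """Length of the longest run of one repeated base in a non-empty seq."""
    best = run = 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best

# Toy region: one ATG PAM followed by a 32-nt placeholder target.
region = "CC" + "ATG" + "ACGT" * 8 + "TT"
for pam_start, target in find_targets(region):
    print(pam_start, gc_fraction(target), longest_homopolymer(target))  # 2 0.5 1
```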
- B. Determination of Cleavage Percentages and Specificity
- In vitro cleavage percentages and specificity (i.e., the amount of off-target binding) related to a guide crRNA can be determined, for example, using the cleavage assays described in Example 7, and compared as follows:
- (1) If only a single DNA target sequence is identified or selected for a guide crRNA, the cleavage percentage and specificity for that DNA target sequence can be determined. If so desired, cleavage percentage and/or specificity can be altered in further experiments using methods including, but not limited to, modifying the guide crRNA, or introducing effector proteins/effector protein-binding sequences to modify the guide crRNA or the Cascade subunit proteins, or ligand/ligand-binding moieties to modify the guide crRNA or the Cascade subunit proteins.
- (2) If multiple DNA target sequences are identified or selected for guide crRNAs, the percentage cleavage data and site-specificity data obtained from the cleavage assays can be compared between different DNAs comprising the target binding sequence to identify the DNA target sequences having the desired cleavage percentage and specificity. Cleavage percentage data and specificity data provide criteria on which to base choices for a variety of applications. For example, in some situations the activity of the guide crRNA may be the most important factor. In other situations, the specificity of the cleavage site may be relatively more important than the cleavage percentage. If so desired, cleavage percentage and/or specificity can be altered in further experiments using methods including, but not limited to, modifying the guide crRNA, introducing effector proteins/effector protein-binding sequences to modify the guide crRNA or the Cascade subunit proteins, or ligand/ligand-binding moieties to modify the guide crRNA or the Cascade subunit proteins.
- Alternatively, or in addition to the in vitro analysis, in-cell cleavage percentages and specificities of guide crRNAs can be obtained using, for example, the method described in Example 8, and compared as follows:
- (1) If only a single DNA target sequence is identified or selected for a guide crRNA, the cleavage percentage and specificity for that DNA target sequence can be determined. If so desired, cleavage percentage and/or specificity can be altered in further experiments using methods including, but not limited to, modifying the guide crRNA, or introducing effector proteins/effector protein-binding sequences to modify the guide crRNA or the Cascade subunit proteins, or ligand/ligand-binding moieties to modify the guide crRNA or the Cascade subunit proteins.
- (2) If multiple DNA target sequences are identified or selected for guide crRNAs, the percentage cleavage data and site-specificity data obtained from the cleavage assays can be compared between different DNAs comprising the target binding sequence to identify the DNA target sequences having the desired cleavage percentage and specificity. Cleavage percentage data and specificity data provide criteria on which to base choices for a variety of applications. For example, in some situations the activity of the guide crRNA may be the most important factor. In other situations, the specificity of the cleavage site may be relatively more important than the cleavage percentage. If so desired, cleavage percentage and/or specificity can be altered in further experiments using methods including, but not limited to, modifying the guide crRNA, introducing effector proteins/effector protein-binding sequences to modify the guide crRNA or the Cascade subunit proteins, or ligand/ligand-binding moieties to modify the guide crRNA or the Cascade subunit proteins.
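- The comparison of multiple DNA target sequences by cleavage percentage and specificity can be illustrated with a simple weighted ranking. The scoring formula, the weight parameter, and all numerical values below are assumptions chosen for illustration; the Specification does not prescribe a particular scoring scheme, and the values are not experimental data.

```python
def rank_targets(results, activity_weight=0.5):
    """Rank target names by a weighted sum of cleavage fraction and specificity.

    results maps target name -> (cleavage_fraction, specificity), both on a 0-1 scale.
    activity_weight close to 1 favors activity; close to 0 favors specificity.
    """
    def score(item):
        _, (cleavage, specificity) = item
        return activity_weight * cleavage + (1 - activity_weight) * specificity
    return [name for name, _ in sorted(results.items(), key=score, reverse=True)]

# Hypothetical measurements for three candidate sites.
measurements = {
    "siteA": (0.60, 0.70),
    "siteB": (0.40, 0.95),
    "siteC": (0.20, 0.50),
}
print(rank_targets(measurements, activity_weight=0.5))  # ['siteB', 'siteA', 'siteC']
print(rank_targets(measurements, activity_weight=0.9))  # ['siteA', 'siteB', 'siteC']
```

- Changing the weight reorders the candidates, which mirrors the point that activity may matter most in some applications and specificity in others.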
- This Example illustrates the design and testing of multiple fusion proteins comprising FokI-Cas8 and linker polypeptides of various lengths, as well as the effect of varying interspacer distances for efficient genome editing.
- A. Production of a Vector Encoding E. coli Type I-E Cascade Complex Components Comprising FokI Fusion Proteins to be Transfected into Target Cells
- Minimal CRISPR arrays were designed to target a set of loci in the human genome at or near two different genes: ADAMTSL1 and PCSK9. Interspacer distances ranged from 14-60 bp, in increments of 2 bp. Four targets were designed for each interspacer distance. Targets were flanked by either AAG or ATG PAM sequences. Dual guides containing “repeat-spacer-repeat-spacer-repeat” sequences were cloned as described in Example 9 with SEQ ID NO:454. SEQ ID NO:625 through SEQ ID NO:816 provide the sequences for the full set of oligonucleotide sequences used to generate the minimal CRISPR arrays.
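- The size of this design can be checked with a short calculation. The sketch below enumerates the interspacer distances and compares the number of targets with the number of SEQ ID NOs cited for the oligonucleotides. The reading that the ratio reflects two oligonucleotides per target is an inference from the counts, not a statement made in the text, and the function name is hypothetical.

```python
def design_grid(first_bp, last_bp, step_bp, targets_per_distance):
    """List (interspacer_distance, target_index) pairs for the target panel."""
    return [(d, i)
            for d in range(first_bp, last_bp + 1, step_bp)
            for i in range(targets_per_distance)]

grid = design_grid(14, 60, 2, 4)            # 14-60 bp in 2-bp increments, four targets each
n_distances = len({d for d, _ in grid})
n_seq_ids = 816 - 625 + 1                   # SEQ ID NO:625 through SEQ ID NO:816, inclusive

print(n_distances, len(grid), n_seq_ids)    # 24 96 192
print(n_seq_ids / len(grid))                # 2.0
```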
- FokI-Cascade RNP subunit protein component-encoding genes were cloned into vectors comprising: CMV promoters to enable delivery and expression in mammalian cells; cas genes linked via 2A viral peptide “ribosome-skipping” sequences; a fusion protein comprising FokI and Cas8 connected with a 30-aa linker (SEQ ID NO:455 from Example 3). Additional linker polypeptide sequences of varying length and amino acid composition were designed and used to connect FokI to the Cas8 protein in these vectors. The additional linker polypeptide sequences are listed in Table 38.
-
TABLE 38 Amino Acid Linker Sequences
SEQ ID NO: | Linker length (amino acids) | Amino acid sequence |
---|---|---|
SEQ ID NO: 817 | 5 | GGGGS |
SEQ ID NO: 818 | 8 | TGPGAAAR |
SEQ ID NO: 819 | 10 | GGSGSSGGSG |
SEQ ID NO: 820 | 15 | GGSGSSGGSGSSGGS |
SEQ ID NO: 821 | 17 | ADPTNRAKGLEAVSVAS |
SEQ ID NO: 822 | 20 | SGSETPGTSESATPESGGSG |
SEQ ID NO: 433 | 30 | SGSETPGTSESATPESGGSGSSGGSGSSGG |
SEQ ID NO: 823 | 40 | SGSETPGTSESATPESGGSGSSGGSGSSGGSGSSGGSGSS |
SEQ ID NO: 824 | 50 | SGSETPGTSESATPESGGSGSSGGSGSSGGSGSSGGSGSSGGSGSSGGSG |
- B. Transfection of Vectors Encoding FokI-Cascade RNP Complex Components
- Transfection conditions were essentially as described in Example 8 with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate. Each well contained 2.4 μg of plasmid encoding FokI-Cascade RNP complex subunit protein components and ˜1-2 μg of plasmid encoding the minimal CRISPR array.
- C. Deep Sequencing of gDNA from Transfected Cells
- Deep sequencing was performed essentially as described in Example 8 with the following modifications. Instead of primers Y and Z from Table 31 of Example 8, the target-specific primers were SEQ ID NO:825 to SEQ ID NO:1016.
- D. Deep Sequencing Data Analysis
- Deep sequencing data analysis was performed essentially as described in Example 8.
FIG. 32A and FIG. 32B present the results of the data analysis. In FIG. 32A and FIG. 32B, percent genome editing is shown as a function of FokI-Cas8 linker type and interspacer distance (n=1); the grey-scale vertical bar to the right indicates the percentage of indels. An initial analysis of the data showed genome editing was highest with FokI-Cas8 linkers of 17 and 20 amino acids (SEQ ID NO:821 and SEQ ID NO:822, respectively) and with interspacer distances of ˜26 bp and ˜30-32 bp. The data were reprocessed, and samples with fewer than one thousand sequence reads were removed because they may contain inflated editing values due to low coverage (sites were retained only if all the associated samples contained >1000 reads). These data, presented in FIG. 32A and FIG. 32B, showed that genome editing was highest with FokI-Cas8 linkers of 17 and 20 amino acids (SEQ ID NO:821 and SEQ ID NO:822, respectively) and with interspacer distances of ˜30-32 bp. Thus, efficient genome editing using Type I CRISPR-Cas complexes comprising FokI-Cas8 fusion proteins was achieved by varying the interspacer distance and the linker polypeptide length of the FokI-Cas8 fusion protein. The amino acid composition of the linker polypeptides is discussed herein.
- This Example illustrates the design and testing of multiple homolog Cascade complexes to evaluate the efficiency of genome editing.
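- The coverage filter applied when reprocessing the FIG. 32A and FIG. 32B data (a site was retained only if all of its associated samples contained more than 1000 reads) can be sketched as follows. The data structure, the function name, and the read counts are hypothetical and are shown only to make the retention rule concrete.

```python
def retain_sites(read_counts, min_reads=1000):
    """Keep only sites for which every associated sample has more than min_reads reads.

    read_counts maps site name -> {sample name: number of sequence reads}.
    Sites with no samples are dropped.
    """
    return {
        site: samples
        for site, samples in read_counts.items()
        if samples and all(n > min_reads for n in samples.values())
    }

# Hypothetical read counts per site and linker sample.
counts = {
    "site1": {"linker17": 5200, "linker20": 4100},
    "site2": {"linker17": 900, "linker20": 8000},   # one low-coverage sample
    "site3": {"linker17": 1000, "linker20": 3000},  # exactly 1000 reads is not > 1000
}
print(sorted(retain_sites(counts)))  # ['site1']
```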
- A. Identification of Sites for Testing with Homolog Cascade Complexes
- A panel of sites was identified for testing additional homolog Cascade complexes. Specifically, minimal CRISPR arrays were designed to target a set of loci in the human genome with 30 bp interspacer distances and that were flanked by either AAG or ATG PAM sequences. Dual-guide polynucleotides containing “repeat-spacer-repeat-spacer-repeat” sequences were cloned following the method described in Example 9 with SEQ ID NO:454. The full set of oligonucleotide sequences used to generate the minimal CRISPR arrays are presented as SEQ ID NO:1017 to SEQ ID NO:1130 (Hsa33F, SEQ ID NO:1017, and Hsa33R, SEQ ID NO:1074, exemplify one pair). A positive control dual-guide targeting the TRAC locus was included (SEQ ID NO:454).
- FokI-Cascade RNP subunit protein component-encoding genes were cloned into vectors comprising: CMV promoters to enable delivery and expression in mammalian cells; cas genes linked via 2A viral peptide “ribosome-skipping” sequences; a fusion protein comprising FokI and Cas8 connected with a 30-aa linker (SEQ ID NO:455 from Example 3).
- B. Transfection of Vectors Encoding FokI-Cascade RNP Complex Components
- Transfection conditions were performed essentially as described in Example 8 with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate. Each well contained 3 μg of plasmid encoding FokI-Cascade RNP subunit protein components and 0.3 μg of plasmid encoding the minimal CRISPR array.
- C. Deep Sequencing of gDNA from Transfected Cells
- Deep sequencing was performed essentially as described in Example 8 with the following modifications. Instead of primers Y and Z from Table 31 of Example 8, the target-specific primers used in this Example were SEQ ID NO:1131 to SEQ ID NO:1244.
- D. Deep Sequencing Data Analysis
- Deep sequencing data analysis was performed essentially as described in Example 8.
FIG. 33 presents the results of the data analysis. In FIG. 33, percent genome editing is plotted against 58 test sites (oligonucleotide sequences used to generate these minimal CRISPR arrays are discussed above) in addition to target Hsa07 from Example 8 (n=3). As is shown in FIG. 33, editing ranged from ˜6% to below the limit of detection. From these data, a panel of eight sites (Hsa07 as well as targets 1, 3-5, 10, 13, and 16, corresponding to targets Hsa37, Hsa43, Hsa46, Hsa60, Hsa77, Hsa88, and Hsa126) with AAG PAMs was selected for testing homolog Cascade complexes for genome editing.
- E. Identification of Homolog Cascade Complexes to Test with FokI Nuclease for Genome Editing
- Cas8 protein sequences from different Type I systems were used as queries for psi-BLASTp to generate phylogenetic trees for homolog selection. Specifically, Cas8 from Fusobacterium nucleatum (WP_008798978.1) was used for Type I-B, Cas8 from Bacillus halodurans (WP_010896519.1) was used for Type I-C, Cas8 from E. coli (WP_001050401.1) was used for Type I-E, Cas8 from Pseudomonas aeruginosa (WP_003139224.1) was used for Type I-F, and Cas5 from Shewanella putrefaciens (WP_011919226.1) was used for Type I-Fv2.
- Next, psi-BLASTp was iterated multiple times until thousands of homologs were identified for each Type I system. From this information, phylogenetic trees were built using the interactive Tree of Life online software (iTOL, accessible at itol.embl.de/login.cgi). The trees were visually inspected after auto-collapsing clades using variable branch lengths.
- Lists of organisms falling within major clades were then outputted and manually inspected for selection. In this step, priority was placed on selecting homologs that sampled from different regions of the phylogenetic tree, both for the 12 homologs within Type I-E and for the 2-3 representative homologs for each of Types I-B, I-C, I-F, and I-Fv2. Candidate cas8 and cas5 genes, based on the above phylogenetic analysis, were inputted into NCBI, and the genomic context within the endogenous host bacterium was visually inspected within NCBI's genome graphics browser. Cascade homologs were selected only if (1) they were found in organisms that grow at 37° C.; (2) their cas gene operons were intact and had all the expected Cascade subunit protein encoding genes, a cas3 gene, and intact acquisition genes (i.e., cas1 and cas2); (3) their cas gene operon was flanked by one or more CRISPR arrays; and (4) their CRISPR arrays contained >10 spacers. For some homologs, the CRISPRfinder program (crispr.i2bc.paris-saclay.fr/Server/) was used to identify putative PAM sequences. Based on the above criteria, the 22 homolog Cascade complexes shown in Table 39 were selected.
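- The four selection criteria listed above can be expressed as a simple filter over annotated candidates. This is a sketch only: in the described work the inspection was manual, and the record fields, function name, and candidate entries below are hypothetical placeholders rather than data from this Specification.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    organism: str
    grows_at_37c: bool       # criterion (1)
    operon_intact: bool      # criterion (2): Cascade subunit genes, cas3, and acquisition genes present
    flanking_arrays: int     # criterion (3): number of CRISPR arrays flanking the cas operon
    spacers: int             # criterion (4): number of spacers in the CRISPR array

def passes_selection(c):
    """True if the candidate meets all four selection criteria."""
    return (c.grows_at_37c
            and c.operon_intact
            and c.flanking_arrays >= 1
            and c.spacers > 10)

# Placeholder candidates, not real organisms.
candidates = [
    Candidate("organism A", True, True, 2, 35),
    Candidate("organism B", False, True, 1, 40),   # does not grow at 37 C
    Candidate("organism C", True, True, 1, 10),    # exactly 10 spacers is not > 10
    Candidate("organism D", True, False, 1, 22),   # incomplete operon
]
print([c.organism for c in candidates if passes_selection(c)])  # ['organism A']
```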
-
TABLE 39 Homolog Cascade Complexes
SEQ ID NO: | Cascade homolog organism | PAM | Spacer length | Type |
---|---|---|---|---|
SEQ ID NO: 1245 | Oceanicola sp. HL-35 | AAG | 32 | I-E |
SEQ ID NO: 1246 | Pseudomonas sp. S-6-2 | AAG | 32 | I-E |
SEQ ID NO: 1247 | Salmonella enterica subsp. enterica serovar Muenster strain | AAG | 32 | I-E |
SEQ ID NO: 1248 | Atlantibacter hermannii NBRC 105704 | AAG | 32 | I-E |
SEQ ID NO: 1249 | Geothermobacter sp. EPR-M | AAG | 32 | I-E |
SEQ ID NO: 1250 | Methylocaldum sp. 14B | AAG | 32 | I-E |
SEQ ID NO: 1251 | Methanocella arvoryzae MRE50 | AAG | 32 | I-E |
SEQ ID NO: 1252 | Pseudomonas aeruginosa DHS01 | AAG | 32 | I-E |
SEQ ID NO: 1253 | Lachnospiraceae bacterium KH1T2 | GAA | 35 | I-E |
SEQ ID NO: 1254 | Klebsiella pneumoniae strain VRCO0172 | GAA | 33 | I-E |
SEQ ID NO: 1255 | Streptococcus thermophilus strain ND07 | GAA | 33 | I-E |
SEQ ID NO: 1256 | Streptomyces sp. S4 | GAA | 33 | I-E |
SEQ ID NO: 1257 | Campylobacter fetus subsp. testudinum Sp3 | TCA | 36 | I-B |
SEQ ID NO: 1258 | Odoribacter splanchnicus DSM 20712 | TCA | 36 | I-B |
SEQ ID NO: 1259 | Bacillus halodurans C-125 | TTC | 34 | I-C |
SEQ ID NO: 1260 | Desulfovibrio vulgaris RCH1 plasmid pDEVAL01 | TTC | 34 | I-C |
SEQ ID NO: 1261 | Geobacillus thermocatenulatus strain KCTC 3921 | TTC | 35 | I-C |
SEQ ID NO: 1262 | Vibrio cholerae strain L15 L15_contig8 | CC | 32 | I-F |
SEQ ID NO: 1263 | Pseudomonas aeruginosa UCBPP-PA14 | CC | 32 | I-F |
SEQ ID NO: 1264 | Shewanella putrefaciens CN-32 | CC | 32 | I-Fv2 |
SEQ ID NO: 1265 | Acinetobacter sp. 869535 | CC | 32 | I-Fv2 |
SEQ ID NO: 1266 | Vibrio cholerae HE48 vcoHE48. contig. 11 | CC | 32 | I-Fv2 |
- F. Production of Vectors Encoding FokI-Cascade RNP Components from 22 Distinct Species for Transfection into Target Cells
- Sequences for each cas gene from each homolog were synthesized as part of a polycistronic construct that included a fusion protein comprising FokI nuclease and Cas8. For each Type I-E Cascade complex homolog, a set of ˜7-8 guides targeting loci with the appropriate PAM sequences was generated. For each Type I-B, I-C, I-F, and I-Fv2 Cascade homolog, a set of ˜2-7 guides targeting loci with appropriate PAM sequences was generated. Each Cascade complex homolog system required unique repeat sequences to process its cognate guide (SEQ ID NO:1267 to SEQ ID NO:1288). Dual guides containing “repeat-spacer-repeat-spacer-repeat” sequences were cloned using the method described in Example 9 for SEQ ID NO:454. Oligonucleotides were phosphorylated on the 5′ end and appended with overhang sequences to enable cloning into plasmid vectors with the appropriate repeat sequences. The full set of oligonucleotide sequences used to generate the minimal CRISPR arrays for the 22 Cascade complex homologs is presented as SEQ ID NO:1289 to SEQ ID NO:1400.
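- The “repeat-spacer-repeat-spacer-repeat” layout of a dual guide can be illustrated by a small assembly helper that also checks the spacer length expected for a given homolog (for example, 32 nucleotides for the AAG-PAM Type I-E homologs in Table 39). The function name and all sequences below are placeholders; the actual repeat and spacer sequences are those given by the cited SEQ ID NOs, and the cloning overhangs are not modeled.

```python
def minimal_array(repeat, spacer_a, spacer_b, spacer_len=32):
    """Assemble a repeat-spacer-repeat-spacer-repeat sequence after basic checks."""
    for name, spacer in (("spacer_a", spacer_a), ("spacer_b", spacer_b)):
        if len(spacer) != spacer_len:
            raise ValueError(f"{name} is {len(spacer)} nt, expected {spacer_len}")
        if set(spacer) - set("ACGT"):
            raise ValueError(f"{name} contains characters other than A, C, G, T")
    return repeat + spacer_a + repeat + spacer_b + repeat

# Lowercase 'n' marks a placeholder repeat of arbitrary length, not a real repeat.
placeholder_repeat = "n" * 29
array = minimal_array(placeholder_repeat, "ACGT" * 8, "TGCA" * 8)
print(len(array))  # 151 (three 29-nt repeats plus two 32-nt spacers)
```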
- FokI-Cascade RNP subunit protein component-encoding genes were cloned into vectors comprising: CMV promoters to enable delivery and expression in mammalian cells; cas genes linked via 2A viral peptide “ribosome-skipping” sequences; a fusion protein comprising FokI and Cas8 connected with a 30-aa linker.
- G. Transfection of Plasmids Encoding FokI-Cascade Complex RNPs
- Transfection conditions were essentially as described in Example 8 with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate. Each well contained 1.5 μg of plasmid encoding FokI-Cascade RNP subunit protein components and ˜0.5-1.5 μg of plasmid encoding the minimal CRISPR array. Experiments were performed in triplicate and included FokI-Cascade RNP complexes from E. coli (SEQ ID NO:455) targeted to eight sites (Hsa07 from Example 8 and Hsa37, Hsa43, Hsa46, Hsa60, Hsa77, Hsa88, Hsa126 from section D of this Example) as positive controls. As previously described, the following oligonucleotides were used to generate the minimal CRISPR arrays used with the E. coli positive control: Hsa37 (SEQ ID NO:1019; SEQ ID NO:1076), Hsa43 (SEQ ID NO:1024; SEQ ID NO:1081), Hsa46 (SEQ ID NO:1027; SEQ ID NO:1084), Hsa60 (SEQ ID NO:1037; SEQ ID NO:1094), Hsa77 (SEQ ID NO:1045; SEQ ID NO:1102), Hsa88 (SEQ ID NO:1050; SEQ ID NO:1107), and Hsa126 (SEQ ID NO:1072; SEQ ID NO:1129).
- H. Deep Sequencing of gDNA from Transfected Cells
- Deep sequencing was performed essentially as described in Example 8 with the following modifications. Instead of primers Y and Z from Table 31 of Example 8, the target-specific primers used in this Example were SEQ ID NO:1401 to SEQ ID NO:1512. For both Type I-E RNP complexes and Type I-B, I-C, I-F, and I-Fv2 RNP complexes, control samples comprising E. coli Type I-E Cascade were included for comparison and sequenced with target-specific primers corresponding to targets Hsa07 from Example 8 and Hsa37, Hsa43, Hsa46, Hsa60, Hsa77, Hsa88, Hsa126 from this Example. More specifically, the following target-specific amplification primers were used for these targets: Hsa37 (SEQ ID NO:1133; SEQ ID NO:1190), Hsa43 (SEQ ID NO:1138; SEQ ID NO:1195), Hsa46 (SEQ ID NO:1141; SEQ ID NO:1198), Hsa60 (SEQ ID NO:1151; SEQ ID NO:1208), Hsa77 (SEQ ID NO:1159; SEQ ID NO:1216), Hsa88 (SEQ ID NO:1164; SEQ ID NO:1221), and Hsa126 (SEQ ID NO:1186; SEQ ID NO:1243).
- I. Deep Sequencing Data Analysis
- Deep sequencing data analysis was performed essentially as described in Example 8.
FIG. 34A and FIG. 34B show results from these experiments. Editing was observed with many of the Type I-E FokI-Cascade homologs (FIG. 34A). The highest editing was observed with the variant from Pseudomonas sp. S-6-2, while other homologs (i.e., Salmonella enterica, Geothermobacter sp. EPR-M, Methanocella arvoryzae MRE50, and S. thermophilus (strain ND07)) showed editing approximately equivalent to E. coli. Editing with FokI-Cascade RNPs derived from Types I-B, I-C, I-F, and I-Fv2 was not observed and therefore may be below the limit of detection (FIG. 34B).
- This Example illustrates the design and testing of multiple fusion proteins comprising FokI-Cas8 and linker polypeptides of various lengths, as well as the effect of varying interspacer distances for efficient genome editing with Pseudomonas sp. S-6-2 Type I-E CRISPR-Cas systems.
- A. Production of a Vector Encoding FokI-Cascade RNP Components to be Transfected into Target Cells
- Minimal CRISPR arrays were designed to target a set of loci in the human genome. Interspacer distances ranged from 23-34 bp, in increments of 1 bp. Eight targets were designed for each of the interspacer distances, and targets were flanked by AAG PAM sequences. Dual guides were generated with PCR-based assembly using three oligonucleotides (SEQ ID NO:1513 to SEQ ID NO:1515) and a unique primer encoding a “repeat-spacer-repeat-spacer-repeat” sequence to enable FokI-Cascade targeting. The full set of unique oligonucleotide sequences to generate the minimal CRISPR arrays were SEQ ID NO:1516 to SEQ ID NO:1704. PCR-assembled guides were purified and concentrated using SPRIselect® beads (Beckman Coulter, Pasadena, Calif.) essentially according to the manufacturer's instructions.
- FokI-Cascade RNP subunit protein component-encoding genes were cloned into vectors comprising: CMV promoters to enable delivery and expression in mammalian cells, cas genes linked via 2A “ribosome-skipping” sequences, and FokI attached to Cas8 with a 30-aa linker (SEQ ID NO:1748). Additional linker polypeptide sequences of varying length were designed and used to connect FokI to the Cas8 protein to form fusion proteins. The linker polypeptide sequences are listed in Table 40.
-
TABLE 40 Amino Acid Linker Sequences
Linker length (amino acids) | Amino acid sequence | SEQ ID NO: |
---|---|---|
17 | ADPTNRAKGLEAVSVAS | SEQ ID NO: 821 |
20 | SGSETPGTSESATPESGGSG | SEQ ID NO: 822 |
- B. Transfection of Vectors Encoding FokI-Cascade RNP Complex Components
- Transfection was performed essentially as described in Example 8 with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution was transferred to individual wells of a 96-well plate. Each well contained 5 μg of plasmid encoding FokI-Cascade RNP protein components and ˜0.1-0.5 μg of linear PCR product encoding the minimal CRISPR array.
- C. Deep Sequencing of gDNA from Transfected Cells
- Deep sequencing was performed essentially as described in Example 8. Instead of primers Y and Z from Table 31 of Example 8, the target-specific primers were SEQ ID NO:1705 to SEQ ID NO:1803.
- D. Deep Sequencing Data Analysis
- Deep sequencing data analysis was performed essentially as described in Example 8.
FIG. 35 shows genome editing at 95 sites (n=1). Editing ranged from ˜50% (FIG. 35 shows the mean+/−1 standard deviation) to below the limit of detection, and was related to the interspacer distance and linker polypeptide length. The amino acid composition of the linker polypeptides is discussed herein. Interspacer distances of ˜30-33 bp and linker polypeptide lengths of 17 and 20 amino acids provided very efficient editing.
- This Example illustrates the use of Cas3-FokI and FokI-Cascade to induce dimerization of FokI to generate a double-strand break at a locus in the human genome (see, e.g., FIG. 17A, FIG. 17B, and FIG. 17C). More specifically, this Example details the design and testing of multiple Cas3-FokI linker compositions and lengths and FokI-Cas8 linker compositions and lengths for affecting genome editing efficiency.
- A. Production of Vectors Encoding FokI-Cas3 and FokI-Cascade RNP Components to be Transfected into Target Cells
- Minimal CRISPR arrays are designed to target three distinct sites flanked by AAG PAMs in the human genome. Sites are selected that were previously shown to support interspacer editing with E. coli FokI-Cascade dimers directed by dual-guides and are therefore known to be permissive for FokI-Cascade binding (e.g., Hsa37, Hsa43, and Hsa46).
- The FokI-Cascade systems described in the Examples above used two FokI Cascade complexes (see, e.g., FIG. 16A, FIG. 16B, and FIG. 16C); accordingly, dual-guides comprising a first guide sequence specifying a first nucleic acid target site and a second guide sequence specifying a second nucleic acid target site can be used. Because the Cas3-FokI-FokI-Cascade system only requires a single PAM, a guide comprising “repeat-spacer-repeat” should be sufficient to facilitate binding of the functional Cascade complex to a nucleic acid target site. A dual-guide containing “repeat-spacer-repeat-spacer-repeat” can also be used but, typically in this embodiment, the two spacer sequences direct binding of the Cascade complex to the same nucleic acid target sequence; that is, the two spacers can have the same sequence. The guides are cloned essentially as described in Example 9 with SEQ ID NO:454. The following annealed oligonucleotides are used for generation of the minimal CRISPR arrays: Hsa37 (SEQ ID NO:1019; SEQ ID NO:1076), Hsa43 (SEQ ID NO:1024; SEQ ID NO:1081), and Hsa46 (SEQ ID NO:1027; SEQ ID NO:1084).
- As described in Example 9, FokI-Cascade RNP protein component-encoding genes are cloned into plasmid vectors containing CMV promoters to enable delivery and expression in mammalian cells. cas genes are linked via 2A “ribosome-skipping” sequences. Furthermore, FokI is fused to Cas8 with a 30-aa linker (SEQ ID NO:455 from Example 3). Additional linker sequences of varying length and composition are designed and used to connect FokI to the Cas8 protein. Examples of such sequences are listed in Table 41.
- Cas3 protein from E. coli is fused with FokI on the C-terminus using a 30-aa linker. This fusion is further modified with an NLS sequence on the N-terminus (SEQ ID NO:1806). Additional linker sequences of varying length and composition are designed and used to connect FokI to the Cas3 protein (Table 41 and SEQ ID NO:1804 to SEQ ID NO:1807).
- Additional Cas3-FokI fusion constructs are generated wherein the helicase or nuclease activity of the Cas3 protein is inactivated (SEQ ID NO:1808 to SEQ ID NO:1815). Helicase and nuclease activities are impaired by making D452A and D75A modifications, respectively, of the Cas3 protein (Mulepati, S., et al., J. Biol. Chem. 288(31):22184-22192 (2013)).
-
TABLE 41 Amino Acid Linker Sequences
Linker length (amino acids) | Amino acid sequence | SEQ ID NO: |
---|---|---|
5 | GGGGS | SEQ ID NO: 817 |
10 | GGSGSSGGSG | SEQ ID NO: 819 |
17 | ADPTNRAKGLEAVSVAS | SEQ ID NO: 821 |
20 | SGSETPGTSESATPESGGSG | SEQ ID NO: 822 |
40 | SGSETPGTSESATPESGGSGSSGGSGSSGGSGSSGGSGSS | SEQ ID NO: 823 |
- B. Transfection of Plasmids Encoding FokI-Cascade Complex RNPs
- Transfection conditions are performed as described in Example 8 with the following modifications. Prior to nucleofection, 5 μl of plasmid vector solution are transferred to individual wells of a 96-well plate. Each well comprises the following three components: 3 μg of a plasmid encoding a set of FokI-Cascade RNP protein components, 3 μg of a plasmid encoding a Cas3-FokI, and 0.5 μg of a plasmid encoding a minimal CRISPR array. The 96-well plate is set up as a matrix to provide all combinations of the three components.
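- The matrix of all combinations of the three plasmid components can be laid out programmatically. The sketch below assigns each combination to a well of a 96-well plate in row-major order. The function name, the plasmid labels, and the number of variants per component are hypothetical; only the three target sites (Hsa37, Hsa43, Hsa46) are taken from this Example.

```python
from itertools import product
import string

def plate_layout(cascade_plasmids, cas3_plasmids, arrays, rows=8, cols=12):
    """Map well names (A1..H12, row-major) to (Cascade, Cas3-FokI, array) combinations."""
    combos = list(product(cascade_plasmids, cas3_plasmids, arrays))
    if len(combos) > rows * cols:
        raise ValueError(f"{len(combos)} combinations exceed {rows * cols} wells")
    wells = [f"{r}{c}" for r in string.ascii_uppercase[:rows] for c in range(1, cols + 1)]
    return dict(zip(wells, combos))

# Hypothetical variant labels for illustration.
cascade = [f"FokI-Cas8_linker{n}" for n in (5, 10, 17, 20, 40)]
cas3 = [f"Cas3-FokI_variant{i}" for i in range(1, 5)]
arrays = ["Hsa37", "Hsa43", "Hsa46"]

layout = plate_layout(cascade, cas3, arrays)
print(len(layout))    # 60
print(layout["A1"])   # ('FokI-Cas8_linker5', 'Cas3-FokI_variant1', 'Hsa37')
```

- If the number of combinations exceeds one plate, the helper raises an error rather than silently dropping conditions; additional plates would then be needed.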
- C. Deep Sequencing of gDNA from Transfected Cells
- Deep sequencing is performed as described in Example 8 with the following modifications. Instead of primers Y and Z from Table 4 of Example 8, the target-specific primers used in this Example are as follows: SEQ ID NO:1133 and SEQ ID NO:1190 (Hsa37 target site), SEQ ID NO:1138 and SEQ ID NO:1195 (Hsa43 target site), and SEQ ID NO:1141 and SEQ ID NO:1198 (Hsa46 target site).
- D. Deep Sequencing Data Analysis
- Deep sequencing data analysis is performed as described in Example 8 with the exception that indels ˜1 bp to ˜25 bp upstream of the FokI-Cascade binding site PAM sequence are tallied. In this manner, the combinations of FokI-Cas8 linker sequences, Cas3-FokI linker sequences, and Cas3 variants that support the most efficient editing can be determined.
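- The tallying of indels within a window upstream of the PAM can be sketched as an interval-overlap test. The coordinate convention (0-based, half-open intervals on the PAM-containing strand, with smaller coordinates upstream), the function names, and the example reads are assumptions for illustration; the actual analysis pipeline is that of Example 8.

```python
def indel_in_window(indel_start, indel_end, pam_start, min_up=1, max_up=25):
    """True if the indel interval [indel_start, indel_end) overlaps the window
    spanning min_up to max_up bp upstream of pam_start.

    An insertion should be passed as a 1-bp interval at its insertion point.
    """
    win_start = pam_start - max_up        # most distal base of the window
    win_end = pam_start - min_up + 1      # half-open end; equals pam_start when min_up is 1
    return indel_start < win_end and indel_end > win_start

def percent_edited(reads, pam_start):
    """Percentage of reads carrying at least one indel in the window.

    reads is a list in which each element lists the (start, end) indels of one read.
    """
    if not reads:
        return 0.0
    edited = sum(any(indel_in_window(s, e, pam_start) for s, e in indels)
                 for indels in reads)
    return 100.0 * edited / len(reads)

# Hypothetical reads around a PAM starting at position 100.
reads = [[(90, 93)], [], [(60, 62)], [(99, 101)]]
print(percent_edited(reads, pam_start=100))  # 50.0
```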
- As is apparent to one of skill in the art, various modifications and variations of the above embodiments can be made without departing from the spirit and scope of this invention. Such modifications and variations are within the scope of this invention.
Claims (21)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/824,603 US10781432B1 (en) | 2018-06-13 | 2020-03-19 | Engineered cascade components and cascade complexes |
US17/027,257 US11555181B2 (en) | 2018-06-13 | 2020-09-21 | Engineered cascade components and cascade complexes |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862684735P | 2018-06-13 | 2018-06-13 | |
US16/104,875 US10227576B1 (en) | 2018-06-13 | 2018-08-17 | Engineered cascade components and cascade complexes |
US16/262,773 US10329547B1 (en) | 2018-06-13 | 2019-01-30 | Engineered cascade components and cascade complexes |
US16/420,061 US10457922B1 (en) | 2018-06-13 | 2019-05-22 | Engineered cascade components and cascade complexes |
US16/665,316 US10597648B2 (en) | 2018-06-13 | 2019-10-28 | Engineered cascade components and cascade complexes |
US16/824,603 US10781432B1 (en) | 2018-06-13 | 2020-03-19 | Engineered cascade components and cascade complexes |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/665,316 Continuation US10597648B2 (en) | 2018-06-13 | 2019-10-28 | Engineered cascade components and cascade complexes |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/027,257 Continuation US11555181B2 (en) | 2018-06-13 | 2020-09-21 | Engineered cascade components and cascade complexes |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200277588A1 (en) | 2020-09-03 |
US10781432B1 (en) | 2020-09-22 |
Family
ID=65633019
Family Applications (6)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/104,875 Active US10227576B1 (en) | 2018-06-13 | 2018-08-17 | Engineered cascade components and cascade complexes |
US16/262,773 Active US10329547B1 (en) | 2018-06-13 | 2019-01-30 | Engineered cascade components and cascade complexes |
US16/420,061 Active US10457922B1 (en) | 2018-06-13 | 2019-05-22 | Engineered cascade components and cascade complexes |
US16/665,316 Active US10597648B2 (en) | 2018-06-13 | 2019-10-28 | Engineered cascade components and cascade complexes |
US16/824,603 Active US10781432B1 (en) | 2018-06-13 | 2020-03-19 | Engineered cascade components and cascade complexes |
US17/027,257 Active 2039-07-31 US11555181B2 (en) | 2018-06-13 | 2020-09-21 | Engineered cascade components and cascade complexes |
Family Applications Before (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/104,875 Active US10227576B1 (en) | 2018-06-13 | 2018-08-17 | Engineered cascade components and cascade complexes |
US16/262,773 Active US10329547B1 (en) | 2018-06-13 | 2019-01-30 | Engineered cascade components and cascade complexes |
US16/420,061 Active US10457922B1 (en) | 2018-06-13 | 2019-05-22 | Engineered cascade components and cascade complexes |
US16/665,316 Active US10597648B2 (en) | 2018-06-13 | 2019-10-28 | Engineered cascade components and cascade complexes |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/027,257 Active 2039-07-31 US11555181B2 (en) | 2018-06-13 | 2020-09-21 | Engineered cascade components and cascade complexes |
Country Status (1)
Country | Link |
---|---|
US (6) | US10227576B1 (en) |
Families Citing this family (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201122458D0 (en) * | 2011-12-30 | 2012-02-08 | Univ Wageningen | Modified cascade ribonucleoproteins and uses thereof |
US11293021B1 (en) | 2016-06-23 | 2022-04-05 | Inscripta, Inc. | Automated cell processing methods, modules, instruments, and systems |
US9982279B1 (en) | 2017-06-23 | 2018-05-29 | Inscripta, Inc. | Nucleic acid-guided nucleases |
US10011849B1 (en) | 2017-06-23 | 2018-07-03 | Inscripta, Inc. | Nucleic acid-guided nucleases |
WO2019006436A1 (en) | 2017-06-30 | 2019-01-03 | Inscripta, Inc. | Automated cell processing methods, modules, instruments, and systems |
US10858761B2 (en) | 2018-04-24 | 2020-12-08 | Inscripta, Inc. | Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells |
US10526598B2 (en) | 2018-04-24 | 2020-01-07 | Inscripta, Inc. | Methods for identifying T-cell receptor antigens |
US20210102183A1 (en) * | 2018-06-13 | 2021-04-08 | Caribou Biosciences, Inc. | Engineered cascade components and cascade complexes |
EP4070802A1 (en) | 2018-06-30 | 2022-10-12 | Inscripta, Inc. | Instruments, modules, and methods for improved detection of edited sequences in live cells |
US11142740B2 (en) | 2018-08-14 | 2021-10-12 | Inscripta, Inc. | Detection of nuclease edited sequences in automated modules and instruments |
US11214781B2 (en) | 2018-10-22 | 2022-01-04 | Inscripta, Inc. | Engineered enzyme |
CA3117228A1 (en) | 2018-12-14 | 2020-06-18 | Pioneer Hi-Bred International, Inc. | Novel crispr-cas systems for genome editing |
US20220290120A1 (en) | 2019-02-25 | 2022-09-15 | Novome Biotechnologies, Inc. | Plasmids for gene editing |
US11001831B2 (en) | 2019-03-25 | 2021-05-11 | Inscripta, Inc. | Simultaneous multiplex genome editing in yeast |
EP3947691A4 (en) | 2019-03-25 | 2022-12-14 | Inscripta, Inc. | Simultaneous multiplex genome editing in yeast |
CN111855990B (en) * | 2019-04-29 | 2023-06-27 | South China Normal University | CRISPR/Cas system-based universal colorimetric nucleic acid detection method, kit and application |
US10837021B1 (en) | 2019-06-06 | 2020-11-17 | Inscripta, Inc. | Curing for recursive nucleic acid-guided cell editing |
WO2020257715A1 (en) * | 2019-06-21 | 2020-12-24 | The Regents Of The University Of California | Crispr-cas3 for making genomic deletions and inducing recombination |
CN112354571B (en) * | 2019-07-11 | 2022-02-11 | Beijing Institute of Technology | Multidimensional microfluidic electrophoresis chip, detection device and detection method |
WO2021092254A1 (en) * | 2019-11-06 | 2021-05-14 | Locus Biosciences, Inc. | Phage compositions comprising crispr-cas systems and methods of use thereof |
WO2021102059A1 (en) | 2019-11-19 | 2021-05-27 | Inscripta, Inc. | Methods for increasing observed editing in bacteria |
CN114829607A (en) | 2019-12-18 | 2022-07-29 | Inscripta, Inc. | Cascade/dCas3 complementation assay for in vivo detection of nucleic acid guided nuclease edited cells |
WO2021154706A1 (en) | 2020-01-27 | 2021-08-05 | Inscripta, Inc. | Electroporation modules and instrumentation |
AU2021239868A1 (en) * | 2020-03-16 | 2022-10-06 | Duke University | Methods and compositions for improved type I-E CRISPR based gene silencing |
US20210332388A1 (en) | 2020-04-24 | 2021-10-28 | Inscripta, Inc. | Compositions, methods, modules and instruments for automated nucleic acid-guided nuclease editing in mammalian cells |
US11787841B2 (en) | 2020-05-19 | 2023-10-17 | Inscripta, Inc. | Rationally-designed mutations to the thrA gene for enhanced lysine production in E. coli |
WO2022060749A1 (en) | 2020-09-15 | 2022-03-24 | Inscripta, Inc. | Crispr editing to embed nucleic acid landing pads into genomes of live cells |
US11512297B2 (en) | 2020-11-09 | 2022-11-29 | Inscripta, Inc. | Affinity tag for recombination protein recruitment |
CA3204158A1 (en) | 2021-01-04 | 2022-07-07 | Juhan Kim | Mad nucleases |
EP4274890A1 (en) | 2021-01-07 | 2023-11-15 | Inscripta, Inc. | Mad nucleases |
US11884924B2 (en) | 2021-02-16 | 2024-01-30 | Inscripta, Inc. | Dual strand nucleic acid-guided nickase editing |
WO2024044329A1 (en) * | 2022-08-24 | 2024-02-29 | The Regents Of The University Of Michigan | Crispr base editor |
DE102022004733A1 (en) * | 2022-12-15 | 2024-06-20 | Forschungszentrum Jülich GmbH | Genetically modified microorganism and its use for the production of D-chiro-inositol |
Family Cites Families (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100076057A1 (en) | 2008-09-23 | 2010-03-25 | Northwestern University | TARGET DNA INTERFERENCE WITH crRNA |
JP5926242B2 (en) | 2010-05-10 | 2016-05-25 | The Regents of the University of California | Endoribonuclease composition and method of use thereof |
GB201122458D0 (en) | 2011-12-30 | 2012-02-08 | Univ Wageningen | Modified cascade ribonucleoproteins and uses thereof |
US9688971B2 (en) | 2012-06-15 | 2017-06-27 | The Regents Of The University Of California | Endoribonuclease and methods of use thereof |
CA2879997A1 (en) | 2012-07-25 | 2014-01-30 | The Broad Institute, Inc. | Inducible dna binding proteins and genome perturbation tools and applications thereof |
WO2014022702A2 (en) | 2012-08-03 | 2014-02-06 | The Regents Of The University Of California | Methods and compositions for controlling gene expression by rna processing |
WO2014093479A1 (en) | 2012-12-11 | 2014-06-19 | Montana State University | Crispr (clustered regularly interspaced short palindromic repeats) rna-guided control of gene regulation |
EP2940140B1 (en) | 2012-12-12 | 2019-03-27 | The Broad Institute, Inc. | Engineering of systems, methods and optimized guide compositions for sequence manipulation |
DK2931898T3 (en) | 2012-12-12 | 2016-06-20 | Massachusetts Inst Technology | CONSTRUCTION AND OPTIMIZATION OF SYSTEMS, PROCEDURES AND COMPOSITIONS FOR SEQUENCE MANIPULATION WITH FUNCTIONAL DOMAINS |
AU2013359212B2 (en) | 2012-12-12 | 2017-01-19 | Massachusetts Institute Of Technology | Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation |
NZ712727A (en) | 2013-03-14 | 2017-05-26 | Caribou Biosciences Inc | Compositions and methods of nucleic acid-targeting nucleic acids |
WO2014144592A2 (en) | 2013-03-15 | 2014-09-18 | The General Hospital Corporation | Using truncated guide rnas (tru-grnas) to increase specificity for rna-guided genome editing |
US20150044772A1 (en) | 2013-08-09 | 2015-02-12 | Sage Labs, Inc. | Crispr/cas system-based novel fusion protein and its applications in genome editing |
WO2015070062A1 (en) | 2013-11-07 | 2015-05-14 | Massachusetts Institute Of Technology | Cell-based genomic recorded accumulative memory |
CA2932436A1 (en) | 2013-12-12 | 2015-06-18 | The Broad Institute, Inc. | Compositions and methods of use of crispr-cas systems in nucleotide repeat disorders |
CA2944978C (en) | 2014-04-08 | 2024-02-13 | North Carolina State University | Methods and compositions for rna-directed repression of transcription using crispr-associated genes |
US10662426B2 (en) | 2014-06-11 | 2020-05-26 | Duke University | Compositions and methods for rapid and dynamic flux control using synthetic metabolic valves |
US20150376587A1 (en) | 2014-06-25 | 2015-12-31 | Caribou Biosciences, Inc. | RNA Modification to Engineer Cas9 Activity |
KR20170032406A (en) | 2014-07-15 | 2017-03-22 | Juno Therapeutics, Inc. | Engineered cells for adoptive cell therapy |
US20160053304A1 (en) | 2014-07-18 | 2016-02-25 | Whitehead Institute For Biomedical Research | Methods Of Depleting Target Sequences Using CRISPR |
US20160053272A1 (en) | 2014-07-18 | 2016-02-25 | Whitehead Institute For Biomedical Research | Methods Of Modifying A Sequence Using CRISPR |
WO2016054225A1 (en) | 2014-09-30 | 2016-04-07 | Stc.Unm | Plasmid delivery in the treatment of cancer and other disease states |
AU2015346514B2 (en) | 2014-11-11 | 2021-04-08 | Illumina, Inc. | Polynucleotide amplification using CRISPR-Cas systems |
JP6860483B2 (en) | 2014-11-26 | 2021-04-14 | Technology Innovation Momentum Fund (Israel) Limited Partnership | Bacterial gene targeting reduction |
WO2016108926A1 (en) | 2014-12-30 | 2016-07-07 | The Broad Institute Inc. | Crispr mediated in vivo modeling and genetic screening of tumor growth and metastasis |
MX2017009506A (en) | 2015-01-28 | 2017-11-02 | Pioneer Hi Bred Int | Crispr hybrid dna/rna polynucleotides and methods of use. |
EP3292219B9 (en) | 2015-05-04 | 2022-05-18 | Ramot at Tel-Aviv University Ltd. | Methods and kits for fragmenting dna |
US20180148711A1 (en) * | 2015-05-28 | 2018-05-31 | Coda Biotherapeutics, Inc. | Genome editing vectors |
CN108026536A (en) | 2015-05-29 | 2018-05-11 | North Carolina State University | Methods for screening bacteria, archaea, algae, and yeast using CRISPR nucleic acids |
JP6949728B2 (en) | 2015-05-29 | 2021-10-13 | Juno Therapeutics, Inc. | Compositions and Methods for Modulating Inhibitory Interactions in Genetically Engineered Cells |
JP7051438B2 (en) | 2015-06-15 | 2022-04-11 | North Carolina State University | Methods and Compositions for Efficient Delivery of Nucleic Acid and RNA-Based Antibacterial Agents |
WO2016205745A2 (en) | 2015-06-18 | 2016-12-22 | The Broad Institute Inc. | Cell sorting |
US9790490B2 (en) | 2015-06-18 | 2017-10-17 | The Broad Institute Inc. | CRISPR enzymes and systems |
EP3331905B1 (en) | 2015-08-06 | 2022-10-05 | Dana-Farber Cancer Institute, Inc. | Targeted protein degradation to attenuate adoptive t-cell therapy associated adverse inflammatory responses |
EP3353298B1 (en) | 2015-09-21 | 2023-09-13 | Arcturus Therapeutics, Inc. | Allele selective gene editing and uses thereof |
WO2017059341A1 (en) * | 2015-10-02 | 2017-04-06 | Monsanto Technology Llc | Recombinant maize b chromosome sequence and uses thereof |
EP3362571A4 (en) | 2015-10-13 | 2019-07-10 | Duke University | Genome engineering with type i crispr systems in eukaryotic cells |
WO2017074943A1 (en) * | 2015-10-27 | 2017-05-04 | The Board Of Trustees Of The Leland Stanford Junior University | Methods of inducibly targeting chromatin effectors and compositions for use in the same |
US10946042B2 (en) | 2015-12-01 | 2021-03-16 | The Trustees Of The University Of Pennsylvania | Compositions and methods for selective phagocytosis of human cancer cells |
JP7015239B2 (en) | 2016-01-11 | 2022-03-04 | The Board of Trustees of the Leland Stanford Junior University | Chimeric proteins and methods of regulating gene expression |
US10286073B2 (en) | 2016-02-23 | 2019-05-14 | Ilisa Tech, Inc. | Magnetic control of gene delivery in vivo |
US10815535B2 (en) | 2016-03-28 | 2020-10-27 | The Charles Stark Draper Laboratory, Inc. | Bacteriophage engineering methods |
US20190323038A1 (en) | 2016-06-17 | 2019-10-24 | Montana State University | Bidirectional targeting for genome editing |
US9982267B2 (en) | 2016-10-12 | 2018-05-29 | Feldan Bio Inc. | Rationally-designed synthetic peptide shuttle agents for delivering polypeptide cargos from an extracellular space to the cytosol and/or nucleus of a target eukaryotic cell, uses thereof, methods and kits relating to same |
US10913952B2 (en) | 2016-10-26 | 2021-02-09 | Salk Institute For Biological Studies | Environmental stress response transcriptional regulatory network |
CN110268049B (en) * | 2016-11-22 | 2024-06-14 | National University of Singapore | CD7 expression blockers and chimeric antigen receptors for T cell malignancy immunotherapy |
WO2018148412A1 (en) | 2017-02-09 | 2018-08-16 | The Charles Stark Draper Laboratory Inc. | Recombinant k1-5 bacteriophages and uses thereof |
EP3586255A4 (en) * | 2017-02-22 | 2021-03-31 | Twist Bioscience Corporation | Nucleic acid based data storage |
2018
- 2018-08-17: US16/104,875, patent US10227576B1 (Active)
2019
- 2019-01-30: US16/262,773, patent US10329547B1 (Active)
- 2019-05-22: US16/420,061, patent US10457922B1 (Active)
- 2019-10-28: US16/665,316, patent US10597648B2 (Active)
2020
- 2020-03-19: US16/824,603, patent US10781432B1 (Active)
- 2020-09-21: US17/027,257, patent US11555181B2 (Active)
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10894812B1 (en) | 2020-09-30 | 2021-01-19 | Alpine Roads, Inc. | Recombinant milk proteins |
US10947552B1 (en) | 2020-09-30 | 2021-03-16 | Alpine Roads, Inc. | Recombinant fusion proteins for producing milk proteins in plants |
US10988521B1 (en) | 2020-09-30 | 2021-04-27 | Alpine Roads, Inc. | Recombinant milk proteins |
US11034743B1 (en) | 2020-09-30 | 2021-06-15 | Alpine Roads, Inc. | Recombinant milk proteins |
US11072797B1 (en) | 2020-09-30 | 2021-07-27 | Alpine Roads, Inc. | Recombinant fusion proteins for producing milk proteins in plants |
US11142555B1 (en) | 2020-09-30 | 2021-10-12 | Nobell Foods, Inc. | Recombinant milk proteins |
US11401526B2 (en) | 2020-09-30 | 2022-08-02 | Nobell Foods, Inc. | Recombinant fusion proteins for producing milk proteins in plants |
US11685928B2 (en) | 2020-09-30 | 2023-06-27 | Nobell Foods, Inc. | Recombinant fusion proteins for producing milk proteins in plants |
US11840717B2 (en) | 2020-09-30 | 2023-12-12 | Nobell Foods, Inc. | Host cells comprising a recombinant casein protein and a recombinant kinase protein |
US11952606B2 (en) | 2020-09-30 | 2024-04-09 | Nobell Foods, Inc. | Food compositions comprising recombinant milk proteins |
US12077798B2 (en) | 2023-12-13 | 2024-09-03 | Nobell Foods, Inc. | Food compositions comprising recombinant milk proteins |
Also Published As
Publication number | Publication date |
---|---|
US10227576B1 (en) | 2019-03-12 |
US11555181B2 (en) | 2023-01-17 |
US10329547B1 (en) | 2019-06-25 |
US20210002622A1 (en) | 2021-01-07 |
US10781432B1 (en) | 2020-09-22 |
US10597648B2 (en) | 2020-03-24 |
US20200048622A1 (en) | 2020-02-13 |
US10457922B1 (en) | 2019-10-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11555181B2 (en) | Engineered cascade components and cascade complexes | |
US11001843B2 (en) | Engineered nucleic acid-targeting nucleic acids | |
US10501728B2 (en) | Engineered nucleic-acid targeting nucleic acids | |
AU2019284926B2 (en) | Engineered cascade components and cascade complexes | |
US20210102183A1 (en) | Engineered cascade components and cascade complexes | |
JP2022518329A (en) | CRISPR-Cas12j Enzymes and Systems | |
US20240287453A1 (en) | Persistent allogeneic modified immune cells and methods of use thereof | |
TW202208626A (en) | Rna-guided nucleases and active fragments and variants thereof and methods of use | |
WO2024156084A1 (en) | Variants of cpf1 (cas12a) with improved activity | |
NZ768877A (en) | Engineered cascade components and cascade complexes |
Legal Events
Code | Title | Description |
---|---|---|
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: CARIBOU BIOSCIENCES, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: CAMERON, PETER SEAN; KLOMPE, SANNE EVELINE; STERNBERG, SAMUEL HENRY; SIGNING DATES FROM 20180810 TO 20180813; REEL/FRAME: 052193/0756 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
CC | Certificate of correction | |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |