NZ727952B2 - Nuclease-mediated DNA assembly
- Publication number
- NZ727952B2 NZ727952A NZ72795215A
- Authority
- NZ
- New Zealand
- Prior art keywords
- nucleic acid
- complementary
- sequence
- digested
- joiner oligo
- Prior art date
Links
- 229920003013 deoxyribonucleic acid Polymers 0.000 title claims abstract description 232
- 101700080605 NUC1 Proteins 0.000 title claims abstract description 182
- 101700006494 nucA Proteins 0.000 title claims abstract description 182
- 230000001404 mediated Effects 0.000 title description 2
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 544
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 515
- 230000000295 complement Effects 0.000 claims abstract description 233
- 239000003795 chemical substances by application Substances 0.000 claims abstract description 134
- 102000004169 proteins and genes Human genes 0.000 claims description 131
- 108090000623 proteins and genes Proteins 0.000 claims description 131
- 229920002391 Guide RNA Polymers 0.000 claims description 124
- 108020005004 Guide RNA Proteins 0.000 claims description 121
- 108010082319 CRISPR-Associated Protein 9 Proteins 0.000 claims description 109
- 239000002253 acid Substances 0.000 claims description 77
- 210000004436 Chromosomes, Artificial, Bacterial Anatomy 0.000 claims description 71
- 238000006243 chemical reaction Methods 0.000 claims description 67
- 239000002773 nucleotide Substances 0.000 claims description 62
- 101700083023 EXRN Proteins 0.000 claims description 61
- 101700008821 EXO Proteins 0.000 claims description 60
- 125000003729 nucleotide group Chemical group 0.000 claims description 59
- 229920001850 Nucleic acid sequence Polymers 0.000 claims description 57
- 229920005681 CRISPR RNA Polymers 0.000 claims description 42
- 241000282414 Homo sapiens Species 0.000 claims description 36
- 229920000272 Oligonucleotide Polymers 0.000 claims description 35
- 238000000137 annealing Methods 0.000 claims description 25
- 229920000160 (ribonucleotides)n+m Polymers 0.000 claims description 22
- 102000004190 Enzymes Human genes 0.000 claims description 17
- 108090000790 Enzymes Proteins 0.000 claims description 17
- 150000007513 acids Chemical class 0.000 claims description 16
- 108091007521 restriction endonucleases Proteins 0.000 claims description 15
- 230000035897 transcription Effects 0.000 claims description 14
- 241000283984 Rodentia Species 0.000 claims description 13
- HCHKCACWOHOZIP-UHFFFAOYSA-N zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 claims description 12
- 229910052725 zinc Inorganic materials 0.000 claims description 12
- 239000011701 zinc Substances 0.000 claims description 12
- 102000003960 Ligases Human genes 0.000 claims description 7
- 108090000364 Ligases Proteins 0.000 claims description 7
- 238000000338 in vitro Methods 0.000 claims description 7
- 235000007575 Calluna vulgaris Nutrition 0.000 claims description 5
- 108020004682 Single-Stranded DNA Proteins 0.000 claims description 3
- 238000001514 detection method Methods 0.000 claims description 3
- 206010059866 Drug resistance Diseases 0.000 claims description 2
- 241000353097 Molva molva Species 0.000 claims 1
- 235000018102 proteins Nutrition 0.000 description 125
- 229920000033 CRISPR Polymers 0.000 description 89
- 210000004027 cells Anatomy 0.000 description 83
- 239000000203 mixture Substances 0.000 description 54
- 238000003776 cleavage reaction Methods 0.000 description 52
- 229920000023 polynucleotide Polymers 0.000 description 51
- 239000002157 polynucleotide Substances 0.000 description 51
- UIIMBOGNXHQVGW-UHFFFAOYSA-M buffer Substances [Na+].OC([O-])=O UIIMBOGNXHQVGW-UHFFFAOYSA-M 0.000 description 34
- 230000000694 effects Effects 0.000 description 32
- 238000009396 hybridization Methods 0.000 description 20
- 239000011780 sodium chloride Substances 0.000 description 15
- 230000004568 DNA-binding Effects 0.000 description 14
- 108010042407 Endonucleases Proteins 0.000 description 14
- 102000004533 Endonucleases Human genes 0.000 description 14
- 150000002500 ions Chemical class 0.000 description 14
- 235000001014 amino acid Nutrition 0.000 description 13
- 230000035772 mutation Effects 0.000 description 13
- 239000011541 reaction mixture Substances 0.000 description 13
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 13
- 101700011961 DPOM Proteins 0.000 description 12
- 101710029649 MDV043 Proteins 0.000 description 12
- 101700061424 POLB Proteins 0.000 description 12
- 101700054624 RF1 Proteins 0.000 description 12
- 230000001939 inductive effect Effects 0.000 description 12
- FAPWRFPIFSIZLT-UHFFFAOYSA-M sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 12
- 102000018120 Recombinases Human genes 0.000 description 11
- 108010091086 Recombinases Proteins 0.000 description 11
- 150000001413 amino acids Chemical class 0.000 description 11
- KCXVZYZYPLLWCC-UHFFFAOYSA-N edta Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 11
- 239000003550 marker Substances 0.000 description 11
- 230000004048 modification Effects 0.000 description 11
- 238000006011 modification reaction Methods 0.000 description 11
- 230000001105 regulatory Effects 0.000 description 11
- 239000000758 substrate Substances 0.000 description 11
- 239000002202 Polyethylene glycol Substances 0.000 description 10
- 230000029087 digestion Effects 0.000 description 10
- 238000003780 insertion Methods 0.000 description 10
- 229920001223 polyethylene glycol Polymers 0.000 description 10
- 239000000047 product Substances 0.000 description 10
- 229920002594 Polyethylene Glycol 8000 Polymers 0.000 description 9
- 241000193996 Streptococcus pyogenes Species 0.000 description 9
- 230000000875 corresponding Effects 0.000 description 9
- 101710030587 ligN Proteins 0.000 description 9
- 230000036678 protein binding Effects 0.000 description 9
- -1 tracrRNAs Proteins 0.000 description 9
- 238000010453 CRISPR/Cas method Methods 0.000 description 8
- 108020004705 Codon Proteins 0.000 description 8
- BAWFJGJZGIEFAR-NNYOXOHSSA-N Nicotinamide adenine dinucleotide Chemical compound NC(=O)C1=CC=C[N+]([C@H]2[C@@H]([C@H](O)[C@@H](COP([O-])(=O)OP(O)(=O)OC[C@@H]3[C@H]([C@@H](O)[C@@H](O3)N3C4=NC=NC(N)=C4N=C3)O)O2)O)=C1 BAWFJGJZGIEFAR-NNYOXOHSSA-N 0.000 description 8
- 230000027455 binding Effects 0.000 description 8
- HEDRZPFGACZZDS-UHFFFAOYSA-N chloroform Chemical compound ClC(Cl)Cl HEDRZPFGACZZDS-UHFFFAOYSA-N 0.000 description 8
- 101700077585 ligd Proteins 0.000 description 8
- 229950006238 nadide Drugs 0.000 description 8
- 229920001184 polypeptide Polymers 0.000 description 8
- 239000000243 solution Substances 0.000 description 8
- 238000006467 substitution reaction Methods 0.000 description 8
- TWRXJAOTZQYOKJ-UHFFFAOYSA-L MgCl2 Chemical compound [Mg+2].[Cl-].[Cl-] TWRXJAOTZQYOKJ-UHFFFAOYSA-L 0.000 description 7
- 108009000261 Non-homologous end joining Proteins 0.000 description 7
- 238000004166 bioassay Methods 0.000 description 7
- ISGUIIHZEJGUGQ-UHFFFAOYSA-N heptacosaethylene glycol monomethyl ether Chemical compound COCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCO ISGUIIHZEJGUGQ-UHFFFAOYSA-N 0.000 description 7
- 150000003839 salts Chemical class 0.000 description 7
- 229920002676 Complementary DNA Polymers 0.000 description 6
- 241000700159 Rattus Species 0.000 description 6
- 235000004279 alanine Nutrition 0.000 description 6
- 230000015572 biosynthetic process Effects 0.000 description 6
- 238000004140 cleaning Methods 0.000 description 6
- 230000002708 enhancing Effects 0.000 description 6
- 108020001507 fusion proteins Proteins 0.000 description 6
- 102000037240 fusion proteins Human genes 0.000 description 6
- 239000000499 gel Substances 0.000 description 6
- PEDCQBHIVMGVHV-UHFFFAOYSA-N glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 6
- 238000005304 joining Methods 0.000 description 6
- WCUXLLCKKVVCTQ-UHFFFAOYSA-M potassium chloride Chemical compound [Cl-].[K+] WCUXLLCKKVVCTQ-UHFFFAOYSA-M 0.000 description 6
- 230000002829 reduced Effects 0.000 description 6
- 229920001405 Coding region Polymers 0.000 description 5
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 5
- 238000010367 cloning Methods 0.000 description 5
- 238000010276 construction Methods 0.000 description 5
- 230000002068 genetic Effects 0.000 description 5
- 239000000463 material Substances 0.000 description 5
- 238000000034 method Methods 0.000 description 5
- 102220174584 rs2228570 Human genes 0.000 description 5
- 108010068698 spleen exonuclease Proteins 0.000 description 5
- 210000001519 tissues Anatomy 0.000 description 5
- 238000009966 trimming Methods 0.000 description 5
- 102200028136 CSN1S1 D10A Human genes 0.000 description 4
- 240000002804 Calluna vulgaris Species 0.000 description 4
- 210000004507 Chromosomes, Artificial Anatomy 0.000 description 4
- CZMRCDWAGMRECN-UGDNZRGBSA-N D-sucrose Chemical compound O[C@H]1[C@H](O)[C@@H](CO)O[C@@]1(CO)O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O1 CZMRCDWAGMRECN-UGDNZRGBSA-N 0.000 description 4
- 101710008209 FBLIM1 Proteins 0.000 description 4
- 229920001917 Ficoll Polymers 0.000 description 4
- JKMHFZQWWAIEOD-UHFFFAOYSA-N HEPES Chemical compound OCC[NH+]1CCN(CCS([O-])(=O)=O)CC1 JKMHFZQWWAIEOD-UHFFFAOYSA-N 0.000 description 4
- 239000007995 HEPES buffer Substances 0.000 description 4
- PHTQWCKDNZKARW-UHFFFAOYSA-N Isoamyl alcohol Chemical compound CC(C)CCO PHTQWCKDNZKARW-UHFFFAOYSA-N 0.000 description 4
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 description 4
- 108010077850 Nuclear Localization Signals Proteins 0.000 description 4
- 108010010677 Phosphodiesterase I Proteins 0.000 description 4
- DBMJMQXJHONAFJ-UHFFFAOYSA-M Sodium laurylsulphate Chemical compound [Na+].CCCCCCCCCCCCOS([O-])(=O)=O DBMJMQXJHONAFJ-UHFFFAOYSA-M 0.000 description 4
- GSEJCLTVZPLZKY-UHFFFAOYSA-N Tris Chemical compound OCCN(CCO)CCO GSEJCLTVZPLZKY-UHFFFAOYSA-N 0.000 description 4
- 239000007983 Tris buffer Substances 0.000 description 4
- 125000000539 amino acid group Chemical group 0.000 description 4
- 230000001580 bacterial Effects 0.000 description 4
- 108091006031 fluorescent proteins Proteins 0.000 description 4
- 102000034387 fluorescent proteins Human genes 0.000 description 4
- ZHNUHDYFZUAESO-UHFFFAOYSA-N formamide Chemical compound NC=O ZHNUHDYFZUAESO-UHFFFAOYSA-N 0.000 description 4
- 238000005755 formation reaction Methods 0.000 description 4
- 238000002844 melting Methods 0.000 description 4
- 238000010369 molecular cloning Methods 0.000 description 4
- 239000000178 monomer Substances 0.000 description 4
- ISWSIDIOOBJBQZ-UHFFFAOYSA-N phenol Chemical compound OC1=CC=CC=C1 ISWSIDIOOBJBQZ-UHFFFAOYSA-N 0.000 description 4
- 229920000642 polymer Polymers 0.000 description 4
- 238000000746 purification Methods 0.000 description 4
- 238000011144 upstream manufacturing Methods 0.000 description 4
- 241000699800 Cricetinae Species 0.000 description 3
- RGWHQCVHVJXOKC-SHYZEUOFSA-N Deoxycytidine triphosphate Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](CO[P@](O)(=O)O[P@](O)(=O)OP(O)(O)=O)[C@@H](O)C1 RGWHQCVHVJXOKC-SHYZEUOFSA-N 0.000 description 3
- HAAZLUGHYHWQIW-KVQBGUIXSA-N Deoxyguanosine triphosphate Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 HAAZLUGHYHWQIW-KVQBGUIXSA-N 0.000 description 3
- 108010067770 Endopeptidase K Proteins 0.000 description 3
- 241000701867 Enterobacteria phage T7 Species 0.000 description 3
- 108010062347 HLA-DQ Antigens Proteins 0.000 description 3
- HNDVDQJCIGZPNO-YFKPBYRVSA-N L-histidine Chemical compound OC(=O)[C@@H](N)CC1=CN=CN1 HNDVDQJCIGZPNO-YFKPBYRVSA-N 0.000 description 3
- 210000003470 Mitochondria Anatomy 0.000 description 3
- 210000004940 Nucleus Anatomy 0.000 description 3
- 230000004570 RNA-binding Effects 0.000 description 3
- 125000003275 alpha amino acid group Chemical group 0.000 description 3
- SUYVUBYJARFZHO-RRKCRQDMSA-J dATP(4-) Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@H]1C[C@H](O)[C@@H](COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)O1 SUYVUBYJARFZHO-RRKCRQDMSA-J 0.000 description 3
- NHVNXKFIZYSCEB-XLPZGREQSA-N dTTP Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C1 NHVNXKFIZYSCEB-XLPZGREQSA-N 0.000 description 3
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 description 3
- 108010026638 endodeoxyribonuclease FokI Proteins 0.000 description 3
- 210000003527 eukaryotic cell Anatomy 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 238000002744 homologous recombination Methods 0.000 description 3
- 239000012160 loading buffer Substances 0.000 description 3
- 229910001629 magnesium chloride Inorganic materials 0.000 description 3
- 210000004962 mammalian cells Anatomy 0.000 description 3
- 229910052594 sapphire Inorganic materials 0.000 description 3
- 239000010980 sapphire Substances 0.000 description 3
- XPVFAGULZMMNRQ-SZURENNPSA-M sodium;(7S,9S)-7-[(2R,4S,5S,6S)-4-amino-5-hydroxy-6-methyloxan-2-yl]oxy-6,9,11-trihydroxy-9-(2-hydroxyacetyl)-4-methoxy-8,10-dihydro-7H-tetracene-5,12-dione;N,3-bis(2-chloroethyl)-2-oxo-1,3,2$l^{5}-oxazaphosphinan-2-amine;(5Z)-5-(dimethylaminohydrazinylid Chemical compound [Na+].[O-]S(=O)(=O)CCS.CN(C)N\N=C1/N=CN=C1C(N)=O.ClCCNP1(=O)OCCCN1CCCl.O([C@H]1C[C@@](O)(CC=2C(O)=C3C(=O)C=4C=CC=C(C=4C(=O)C3=C(O)C=21)OC)C(=O)CO)[C@H]1C[C@H](N)[C@H](O)[C@H](C)O1 XPVFAGULZMMNRQ-SZURENNPSA-M 0.000 description 3
- 239000000126 substance Substances 0.000 description 3
- 238000003786 synthesis reaction Methods 0.000 description 3
- 230000002194 synthesizing Effects 0.000 description 3
- 108091005946 yellow fluorescent protein Proteins 0.000 description 3
- 229940009098 Aspartate Drugs 0.000 description 2
- 210000004671 Cell-Free System Anatomy 0.000 description 2
- 108091005937 Cerulean Proteins 0.000 description 2
- 229920002168 Chimeric RNA Polymers 0.000 description 2
- 210000003763 Chloroplasts Anatomy 0.000 description 2
- 241000579895 Chlorostilbon Species 0.000 description 2
- 210000000349 Chromosomes Anatomy 0.000 description 2
- 108091005936 CyPet Proteins 0.000 description 2
- CKLJMWTZIZZHCS-UHFFFAOYSA-N DL-aspartic acid Chemical compound OC(=O)C(N)CC(O)=O CKLJMWTZIZZHCS-UHFFFAOYSA-N 0.000 description 2
- 108010053770 Deoxyribonucleases Proteins 0.000 description 2
- 102000016911 Deoxyribonucleases Human genes 0.000 description 2
- 229920002307 Dextran Polymers 0.000 description 2
- 241000701988 Escherichia virus T5 Species 0.000 description 2
- ZMMJGEGLRURXTF-UHFFFAOYSA-N Ethidium bromide Chemical compound [Br-].C12=CC(N)=CC=C2C2=CC=C(N)C=C2[N+](CC)=C1C1=CC=CC=C1 ZMMJGEGLRURXTF-UHFFFAOYSA-N 0.000 description 2
- 108060002716 Exonucleases Proteins 0.000 description 2
- 102000013165 Exonucleases Human genes 0.000 description 2
- 206010064571 Gene mutation Diseases 0.000 description 2
- UYTPUPDQBNUYGX-UHFFFAOYSA-N Guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 2
- 208000009889 Herpes Simplex Diseases 0.000 description 2
- 229920002521 Macromolecule Polymers 0.000 description 2
- 108020004999 Messenger RNA Proteins 0.000 description 2
- 241000699666 Mus <mouse, genus> Species 0.000 description 2
- 238000002944 PCR assay Methods 0.000 description 2
- SCVFZCLFOSHCOH-UHFFFAOYSA-M Potassium acetate Chemical compound [K+].CC([O-])=O SCVFZCLFOSHCOH-UHFFFAOYSA-M 0.000 description 2
- 101710043164 Segment-4 Proteins 0.000 description 2
- 241000700584 Simplexvirus Species 0.000 description 2
- 241000194020 Streptococcus thermophilus Species 0.000 description 2
- 241000187191 Streptomyces viridochromogenes Species 0.000 description 2
- 229940094937 Thioredoxin Drugs 0.000 description 2
- 229940035295 Ting Drugs 0.000 description 2
- 239000007984 Tris EDTA buffer Substances 0.000 description 2
- 101700038759 VP1 Proteins 0.000 description 2
- 241000545067 Venus Species 0.000 description 2
- 239000011543 agarose gel Substances 0.000 description 2
- 238000000246 agarose gel electrophoresis Methods 0.000 description 2
- 125000003295 alanine group Chemical group N[C@@H](C)C(=O)* 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 2
- 239000002738 chelating agent Substances 0.000 description 2
- 102000021408 chitin binding proteins Human genes 0.000 description 2
- 108091010307 chitin binding proteins Proteins 0.000 description 2
- 230000021615 conjugation Effects 0.000 description 2
- 238000001816 cooling Methods 0.000 description 2
- 230000001186 cumulative Effects 0.000 description 2
- 238000005520 cutting process Methods 0.000 description 2
- 238000010790 dilution Methods 0.000 description 2
- IAZDPXIOMUYVGZ-UHFFFAOYSA-N dimethylsulphoxide Chemical compound CS(C)=O IAZDPXIOMUYVGZ-UHFFFAOYSA-N 0.000 description 2
- 239000010976 emerald Substances 0.000 description 2
- 229910052876 emerald Inorganic materials 0.000 description 2
- 108010050663 endodeoxyribonuclease CreI Proteins 0.000 description 2
- 108091005938 enhanced green fluorescent protein Proteins 0.000 description 2
- 230000002255 enzymatic Effects 0.000 description 2
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 2
- 238000011049 filling Methods 0.000 description 2
- 108010021843 fluorescent protein 583 Proteins 0.000 description 2
- 238000010438 heat treatment Methods 0.000 description 2
- 101700005460 hemA Proteins 0.000 description 2
- 239000000185 hemagglutinin Substances 0.000 description 2
- 229910052739 hydrogen Inorganic materials 0.000 description 2
- 239000001257 hydrogen Substances 0.000 description 2
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 2
- 230000001965 increased Effects 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 230000000670 limiting Effects 0.000 description 2
- 238000011068 load Methods 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 229920002106 messenger RNA Polymers 0.000 description 2
- 125000000371 nucleobase group Chemical group 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 230000036961 partial Effects 0.000 description 2
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 2
- 210000001236 prokaryotic cell Anatomy 0.000 description 2
- 239000011535 reaction buffer Substances 0.000 description 2
- 238000005215 recombination Methods 0.000 description 2
- 238000000926 separation method Methods 0.000 description 2
- 238000009958 sewing Methods 0.000 description 2
- 238000010186 staining Methods 0.000 description 2
- 239000012536 storage buffer Substances 0.000 description 2
- 230000004960 subcellular localization Effects 0.000 description 2
- 238000010381 tandem affinity purification Methods 0.000 description 2
- 102000002933 thioredoxin family Human genes 0.000 description 2
- 108060008226 thioredoxin family Proteins 0.000 description 2
- 230000002103 transcriptional Effects 0.000 description 2
- 108091006091 transcriptional repressors Proteins 0.000 description 2
- DPFYBZWSVVKNPZ-AQWIXGDGSA-N (3S,4S,6R)-2-[[(2R,4R,5R)-3,5-dihydroxy-4-methoxy-6-(methoxymethyl)oxan-2-yl]methoxymethyl]-6-ethyloxane-3,4,5-triol Chemical compound O[C@H]1[C@@H](O)C(O)[C@@H](CC)OC1COC[C@@H]1C(O)[C@H](OC)[C@H](O)C(COC)O1 DPFYBZWSVVKNPZ-AQWIXGDGSA-N 0.000 description 1
- SBASXUCJHJRPEV-UHFFFAOYSA-N 2-(2-Methoxyethoxy)ethanol Chemical compound COCCOCCO SBASXUCJHJRPEV-UHFFFAOYSA-N 0.000 description 1
- CZPWVGJYEJSRLH-UHFFFAOYSA-N 289-95-2 Chemical compound C1=CN=CN=C1 CZPWVGJYEJSRLH-UHFFFAOYSA-N 0.000 description 1
- 101710017486 ACCB-1 Proteins 0.000 description 1
- 101710036216 ATEG_03556 Proteins 0.000 description 1
- 241000007910 Acaryochloris marina Species 0.000 description 1
- 241001135192 Acetohalobium arabaticum Species 0.000 description 1
- 241001464929 Acidithiobacillus caldus Species 0.000 description 1
- 241000605222 Acidithiobacillus ferrooxidans Species 0.000 description 1
- ZKHQWZAMYRWXGA-KQYNXXCUSA-N Adenosine triphosphate Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)[C@H]1O ZKHQWZAMYRWXGA-KQYNXXCUSA-N 0.000 description 1
- 108020004774 Alkaline Phosphatase Proteins 0.000 description 1
- 102000002260 Alkaline Phosphatase Human genes 0.000 description 1
- 241000190857 Allochromatium vinosum Species 0.000 description 1
- 241000147155 Ammonifex degensii Species 0.000 description 1
- 108010063905 Ampligase Proteins 0.000 description 1
- 241000620196 Arthrospira maxima Species 0.000 description 1
- 241001495183 Arthrospira sp. Species 0.000 description 1
- 108091005943 Azurite Proteins 0.000 description 1
- 241000823281 Burkholderiales bacterium Species 0.000 description 1
- NLZUEZXRPGMBCV-UHFFFAOYSA-N Butylhydroxytoluene Chemical compound CC1=CC(C(C)(C)C)=C(O)C(C(C)(C)C)=C1 NLZUEZXRPGMBCV-UHFFFAOYSA-N 0.000 description 1
- 102200034180 C5AR1 G12A Human genes 0.000 description 1
- 238000010356 CRISPR-Cas9 genome editing Methods 0.000 description 1
- 101700003485 CSF2 Proteins 0.000 description 1
- 101700003315 CSF3 Proteins 0.000 description 1
- 108060001965 CSN2 Proteins 0.000 description 1
- 102000000584 Calmodulin Human genes 0.000 description 1
- 108010041952 Calmodulin Proteins 0.000 description 1
- 241001496650 Candidatus Desulforudis Species 0.000 description 1
- 102000019679 Cell-Penetrating Peptides Human genes 0.000 description 1
- 108010051109 Cell-Penetrating Peptides Proteins 0.000 description 1
- 210000000688 Chromosomes, Artificial, Human Anatomy 0.000 description 1
- 108091005949 Citrine Proteins 0.000 description 1
- 241000193403 Clostridium Species 0.000 description 1
- 241000907165 Coleofasciculus chthonoplastes Species 0.000 description 1
- 241000065716 Crocosphaera watsonii Species 0.000 description 1
- 108010082025 Cyan Fluorescent Protein Proteins 0.000 description 1
- 241000159506 Cyanothece Species 0.000 description 1
- 210000000805 Cytoplasm Anatomy 0.000 description 1
- 101710008158 D15 Proteins 0.000 description 1
- 108010061982 DNA Ligases Proteins 0.000 description 1
- 102000012410 DNA Ligases Human genes 0.000 description 1
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 1
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 1
- GRRNUXAQVGOGFE-UHFFFAOYSA-N Destomysin Chemical compound OC1C(NC)CC(N)C(O)C1OC1C2OC3(C(C(O)C(O)C(C(N)CO)O3)O)OC2C(O)C(CO)O1 GRRNUXAQVGOGFE-UHFFFAOYSA-N 0.000 description 1
- 229940119743 Dextran 70 Drugs 0.000 description 1
- MTHSVFCYNBDYFN-UHFFFAOYSA-N Diethylene glycol Chemical compound OCCOCCO MTHSVFCYNBDYFN-UHFFFAOYSA-N 0.000 description 1
- 108020004461 Double-Stranded RNA Proteins 0.000 description 1
- 229960003722 Doxycycline Drugs 0.000 description 1
- XQTWDDCIUJNLTR-CVHRZJFOSA-N Doxycycline Chemical compound O.O=C1C2=C(O)C=CC=C2[C@H](C)[C@@H]2C1=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@@H](N(C)C)[C@@H]1[C@H]2O XQTWDDCIUJNLTR-CVHRZJFOSA-N 0.000 description 1
- 108010027570 EC 2.4.2.22 Proteins 0.000 description 1
- 108010091358 EC 2.4.2.8 Proteins 0.000 description 1
- 102000018251 EC 2.4.2.8 Human genes 0.000 description 1
- 241000588724 Escherichia coli Species 0.000 description 1
- 241000701959 Escherichia virus Lambda Species 0.000 description 1
- 241000326311 Exiguobacterium sibiricum Species 0.000 description 1
- 108010009832 Exodeoxyribonucleases Proteins 0.000 description 1
- 102000009788 Exodeoxyribonucleases Human genes 0.000 description 1
- 241000192016 Finegoldia magna Species 0.000 description 1
- KOSRFJWDECSPRO-WDSKDSINSA-N Glu-Glu Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(O)=O KOSRFJWDECSPRO-WDSKDSINSA-N 0.000 description 1
- 108010058597 HLA-DR Antigens Proteins 0.000 description 1
- 102000006354 HLA-DR Antigens Human genes 0.000 description 1
- 241000700721 Hepatitis B virus Species 0.000 description 1
- MAJYPBAJPNUFPV-UHFFFAOYSA-N Histidinyl-Cysteine Chemical compound SCC(C(O)=O)NC(=O)C(N)CC1=CN=CN1 MAJYPBAJPNUFPV-UHFFFAOYSA-N 0.000 description 1
- 102000003893 Histone Acetyltransferases Human genes 0.000 description 1
- 108090000246 Histone Acetyltransferases Proteins 0.000 description 1
- 102000008157 Histone Demethylases Human genes 0.000 description 1
- 108010074870 Histone Demethylases Proteins 0.000 description 1
- 241000713772 Human immunodeficiency virus 1 Species 0.000 description 1
- 229940097277 Hygromycin B Drugs 0.000 description 1
- 241000282619 Hylobates lar Species 0.000 description 1
- 101710036255 ITEVIIIR Proteins 0.000 description 1
- 101710036252 ITEVIIR Proteins 0.000 description 1
- 101710036251 ITEVIR Proteins 0.000 description 1
- 229920002459 Intron Polymers 0.000 description 1
- 108010025815 Kanamycin Kinase Proteins 0.000 description 1
- 241001430080 Ktedonobacter racemifer Species 0.000 description 1
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 1
- 241000186673 Lactobacillus delbrueckii Species 0.000 description 1
- 241000186869 Lactobacillus salivarius Species 0.000 description 1
- 239000005089 Luciferase Substances 0.000 description 1
- 108060001084 Luciferase family Proteins 0.000 description 1
- 241000501784 Marinobacter sp. Species 0.000 description 1
- 241000204637 Methanohalobium evestigatum Species 0.000 description 1
- 108020004388 MicroRNAs Proteins 0.000 description 1
- 241000190928 Microscilla marina Species 0.000 description 1
- 101710041493 NEDD9 Proteins 0.000 description 1
- 241000167285 Natranaerobius thermophilus Species 0.000 description 1
- 241001515112 Nitrosococcus watsonii Species 0.000 description 1
- 241000203619 Nocardiopsis dassonvillei Species 0.000 description 1
- 241001223105 Nodularia spumigena Species 0.000 description 1
- 241000192673 Nostoc sp. Species 0.000 description 1
- 101710003000 ORF1/ORF2 Proteins 0.000 description 1
- 210000003463 Organelles Anatomy 0.000 description 1
- 241000192520 Oscillatoria sp. Species 0.000 description 1
- 101700074659 PCCA Proteins 0.000 description 1
- 239000008118 PEG 6000 Substances 0.000 description 1
- 229920000890 Palindromic sequence Polymers 0.000 description 1
- 241001425545 Pelotomaculum Species 0.000 description 1
- 241000983938 Petrotoga mobilis Species 0.000 description 1
- 101700030467 Pol Proteins 0.000 description 1
- 241001599925 Polaromonas naphthalenivorans Species 0.000 description 1
- 241001472610 Polaromonas sp. Species 0.000 description 1
- 229920000795 Polyadenylation Polymers 0.000 description 1
- 229920000604 Polyethylene Glycol 200 Polymers 0.000 description 1
- 229920001030 Polyethylene Glycol 4000 Polymers 0.000 description 1
- 229920002584 Polyethylene Glycol 6000 Polymers 0.000 description 1
- 241000590028 Pseudoalteromonas haloplanktis Species 0.000 description 1
- KDCGOANMDULRCW-UHFFFAOYSA-N Purine Chemical compound N1=CNC2=NC=NC2=C1 KDCGOANMDULRCW-UHFFFAOYSA-N 0.000 description 1
- 108020004412 RNA 3' Polyadenylation Signals Proteins 0.000 description 1
- 108010078067 RNA Polymerase III Proteins 0.000 description 1
- 102000014450 RNA Polymerase III Human genes 0.000 description 1
- 102100020572 RUBCNL Human genes 0.000 description 1
- 101710044922 RUBCNL Proteins 0.000 description 1
- 102000006382 Ribonucleases Human genes 0.000 description 1
- 108010083644 Ribonucleases Proteins 0.000 description 1
- 229920001914 Ribonucleotide Polymers 0.000 description 1
- 108020004422 Riboswitch Proteins 0.000 description 1
- 238000002105 Southern blotting Methods 0.000 description 1
- 229940076156 Streptococcus pyogenes Drugs 0.000 description 1
- 241000194022 Streptococcus sp. Species 0.000 description 1
- 241001518258 Streptomyces pristinaespiralis Species 0.000 description 1
- 241000203590 Streptosporangium Species 0.000 description 1
- 241000203587 Streptosporangium roseum Species 0.000 description 1
- 241000192560 Synechococcus sp. Species 0.000 description 1
- 238000010459 TALEN Methods 0.000 description 1
- NKANXQFJJICGDU-QPLCGJKRSA-N Tamoxifen Chemical compound C=1C=CC=CC=1C(/CC)=C(C=1C=CC(OCCN(C)C)=CC=1)/C1=CC=CC=C1 NKANXQFJJICGDU-QPLCGJKRSA-N 0.000 description 1
- 229960001603 Tamoxifen Drugs 0.000 description 1
- 241000206213 Thermosipho africanus Species 0.000 description 1
- 229920000401 Three prime untranslated region Polymers 0.000 description 1
- 108020004440 Thymidine Kinase Proteins 0.000 description 1
- 102000006601 Thymidine Kinase Human genes 0.000 description 1
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 1
- 241000078013 Trichormus variabilis Species 0.000 description 1
- 108010064978 Type II Site-Specific Deoxyribonucleases Proteins 0.000 description 1
- 108010067022 Type III Site-Specific Deoxyribonucleases Proteins 0.000 description 1
- 108090000848 Ubiquitin Proteins 0.000 description 1
- 102400000757 Ubiquitin Human genes 0.000 description 1
- 101710017715 ZNF816 Proteins 0.000 description 1
- 102100001628 ZNF816 Human genes 0.000 description 1
- 101700070836 ZNFP Proteins 0.000 description 1
- 241001673106 [Bacillus] selenitireducens Species 0.000 description 1
- NBIIXXVUZAFLBC-UHFFFAOYSA-K [O-]P([O-])([O-])=O Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 description 1
- 101710017560 accA1 Proteins 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 238000007792 addition Methods 0.000 description 1
- 101700005055 ani-1 Proteins 0.000 description 1
- 108010003152 bacteriophage T7 RNA polymerase Proteins 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 108091005941 blue fluorescent protein Proteins 0.000 description 1
- 239000007853 buffer solution Substances 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 230000003197 catalytic Effects 0.000 description 1
- 230000024881 catalytic activity Effects 0.000 description 1
- 150000001768 cations Chemical class 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
- 239000003153 chemical reaction reagent Substances 0.000 description 1
- YTRQFSDWAXHJCC-UHFFFAOYSA-N chloroform;phenol Chemical compound ClC(Cl)Cl.OC1=CC=CC=C1 YTRQFSDWAXHJCC-UHFFFAOYSA-N 0.000 description 1
- 239000011035 citrine Substances 0.000 description 1
- 239000002299 complementary DNA Substances 0.000 description 1
- 229920000407 conserved sequence Polymers 0.000 description 1
- 108060001960 csm3 Proteins 0.000 description 1
- 230000003247 decreasing Effects 0.000 description 1
- 230000004059 degradation Effects 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 238000006471 dimerization reaction Methods 0.000 description 1
- 238000004520 electroporation Methods 0.000 description 1
- 239000012149 elution buffer Substances 0.000 description 1
- 239000003623 enhancer Substances 0.000 description 1
- 101700012341 exoA Proteins 0.000 description 1
- 101700080064 exoD Proteins 0.000 description 1
- 108010052305 exodeoxyribonuclease III Proteins 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 239000007850 fluorescent dye Substances 0.000 description 1
- 238000010353 genetic engineering Methods 0.000 description 1
- 108010055341 glutamyl-glutamic acid Proteins 0.000 description 1
- LYCAIKOWRPUZTN-UHFFFAOYSA-N glycol Chemical compound OCCO LYCAIKOWRPUZTN-UHFFFAOYSA-N 0.000 description 1
- 239000005090 green fluorescent protein Substances 0.000 description 1
- 238000006460 hydrolysis reaction Methods 0.000 description 1
- 239000004615 ingredient Substances 0.000 description 1
- 108091022076 maltose binding proteins Proteins 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 239000002207 metabolite Substances 0.000 description 1
- 229910052751 metal Inorganic materials 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 102000016397 methyltransferase family Human genes 0.000 description 1
- 108060004795 methyltransferase family Proteins 0.000 description 1
- 229920001239 microRNA Polymers 0.000 description 1
- 239000002679 microRNA Substances 0.000 description 1
- 230000000813 microbial Effects 0.000 description 1
- 230000025608 mitochondrion localization Effects 0.000 description 1
- 108091005593 modified peptides Proteins 0.000 description 1
- 235000013919 monopotassium glutamate Nutrition 0.000 description 1
- 238000003541 multi-stage reaction Methods 0.000 description 1
- 238000002703 mutagenesis Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 230000008506 pathogenesis Effects 0.000 description 1
- 239000003415 peat Substances 0.000 description 1
- 230000000149 penetrating Effects 0.000 description 1
- 239000010452 phosphate Substances 0.000 description 1
- 230000004962 physiological condition Effects 0.000 description 1
- 238000007747 plating Methods 0.000 description 1
- 229920000724 poly(L-arginine) polymer Polymers 0.000 description 1
- 108010011110 polyarginine Proteins 0.000 description 1
- 235000011056 potassium acetate Nutrition 0.000 description 1
- 102000004196 processed proteins & peptides Human genes 0.000 description 1
- 108090000765 processed proteins & peptides Proteins 0.000 description 1
- 108010045647 puromycin N-acetyltransferase Proteins 0.000 description 1
- 238000003753 real-time PCR Methods 0.000 description 1
- 108010054624 red fluorescent protein Proteins 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000000717 retained Effects 0.000 description 1
- 101710004466 rgy Proteins 0.000 description 1
- 101710030364 rgy1 Proteins 0.000 description 1
- 101710030359 rgy2 Proteins 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 102220235118 rs1131691530 Human genes 0.000 description 1
- 238000007480 sanger sequencing Methods 0.000 description 1
- 239000000741 silica gel Substances 0.000 description 1
- 229910002027 silica gel Inorganic materials 0.000 description 1
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N silicium dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 1
- 238000002741 site-directed mutagenesis Methods 0.000 description 1
- 241000894007 species Species 0.000 description 1
- 238000010189 synthetic method Methods 0.000 description 1
- 102000015609 tat Gene Products Human genes 0.000 description 1
- 108010038756 tat Gene Products Proteins 0.000 description 1
- GWBUNZLLLLDXMD-UHFFFAOYSA-H tricopper;dicarbonate;dihydroxide Chemical compound [OH-].[OH-].[Cu+2].[Cu+2].[Cu+2].[O-]C([O-])=O.[O-]C([O-])=O GWBUNZLLLLDXMD-UHFFFAOYSA-H 0.000 description 1
- 230000017613 viral reproduction Effects 0.000 description 1
- 230000001018 virulence Effects 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
- C12N15/1031—Mutagenizing nucleic acids mutagenesis by gene assembly, e.g. assembly by oligonucleotide extension PCR
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/64—General methods for preparing the vector, for introducing it into the cell or for selecting the vector-containing host
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/66—General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease
Abstract
Methods are provided herein for assembling at least two nucleic acids using a sequence specific nuclease agent (e.g., a gRNA-Cas complex) to create end sequences having complementarity and subsequently assembling the overlapping complementary sequences. The nuclease agent (e.g., a gRNA-Cas complex) can create double strand breaks in dsDNA in order to create overlapping end sequences or can create nicks on each strand to produce complementary overhanging end sequences. Assembly using the method described herein can assemble any nucleic acids having overlapping sequences or can use a joiner oligo to assemble sequences without complementary ends.
Description
NUCLEASE-MEDIATED DNA ASSEMBLY
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application No.
62/015,809, filed June 23, 2014, U.S. Provisional Application No. 62/016,400, filed June 24,
2014, and of U.S. Provisional Application No. 62/036,983, filed August 13, 2014, each of
which is hereby incorporated herein in its entirety by reference.
REFERENCE TO A SEQUENCE LISTING SUBMITTED AS A TEXT FILE VIA EFS WEB
The official copy of the sequence listing is submitted electronically via EFS-
Web as an ASCII formatted sequence listing with a file named 461002SEQLIST.TXT,
created on June 23, 2015, and having a size of 66KB, and is filed concurrently with the
specification. The sequence listing contained in this ASCII formatted document is part of the
specification and is herein incorporated by reference in its entirety.
BACKGROUND
Historically, overlap extension could be used as a means of synthesizing larger
double stranded DNA molecules, particularly genes, from overlapping synthetic
oligonucleotides. However, these methods could not effectively combine large DNA
molecules in a rapid manner. Further, site-specific combination of large nucleic acids using
overlapping sequences is often limited by the availability of overlapping sequences at the
desired position in the nucleic acids to be combined. Engineered nuclease enzymes designed
to target specific DNA sequences have attracted attention as powerful tools for genetic
manipulation allowing for targeted gene deletion, replacement, and repair, as well as the
insertion of exogenous sequences. However, existing technologies suffer from limited
precision, which can lead to unpredictable off-target effects and time consuming multistep
reactions.
SUMMARY
Methods are provided herein for assembling nucleic acids having overlapping
sequences. Such methods comprise a method for assembling at least two nucleic acids,
comprising: (a) contacting a first nucleic acid with a first nuclease agent, wherein the first
nuclease agent cleaves the first nucleic acid at a first target site to produce a first digested
nucleic acid with overlapping end sequences between the first digested nucleic acid and a
second nucleic acid; (b) contacting the first digested nucleic acid and the second nucleic acid
with an exonuclease to expose complementary sequences between the first digested nucleic
acid and the second nucleic acid; and (c) assembling the two nucleic acid fragments
generated from step (b). In some such methods step (c) further comprises: (i) annealing the
exposed complementary sequences; (ii) extending 3’ ends of the annealed complementary
sequences; and (iii) ligating the first and the second nucleic acid.
In some of the methods step (a) further comprises contacting the second
nucleic acid with a second nuclease agent, wherein the second nucleic acid does not comprise
the overlapping end sequence, and the second nuclease agent cleaves the second nucleic acid
at a second target site to produce a second digested nucleic acid with the overlapping end
sequences between the first digested nucleic acid and the second digested nucleic acid, and
wherein the second nucleic acid of step (b) is the second digested nucleic acid. In some of the
methods, the overlapping end sequence ranges from 20 bp to 200 bp long.
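The role of the overlapping end sequence can be illustrated in silico. The following is a minimal sketch, not the method of the specification: it assumes both nucleic acids are given as top-strand strings, treats exonuclease resection, annealing, extension, and ligation as a single string merge, and uses hypothetical function names (find_end_overlap, assemble_by_overlap). Only the 20 bp to 200 bp range is taken from the text.

```python
def find_end_overlap(first: str, second: str,
                     min_len: int = 20, max_len: int = 200) -> int:
    """Length of the longest suffix of `first` that equals a prefix of
    `second`, restricted to [min_len, max_len]; 0 if there is none."""
    upper = min(max_len, len(first), len(second))
    for n in range(upper, min_len - 1, -1):
        if first[-n:] == second[:n]:
            return n
    return 0


def assemble_by_overlap(first: str, second: str,
                        min_len: int = 20, max_len: int = 200) -> str:
    """Merge two sequences through their shared end sequence.

    Simplification: the wet-lab steps (exonuclease exposure of the
    complementary strands, annealing, 3' extension, ligation) are
    collapsed into one join of the top-strand strings.
    """
    n = find_end_overlap(first, second, min_len, max_len)
    if n == 0:
        raise ValueError("no overlapping end sequence in the allowed range")
    return first + second[n:]
```

If no overlap of at least 20 bp exists, the sketch raises an error, which corresponds to the case where a second nuclease agent or a joiner oligo would be needed.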
In some of the methods, at least one of the first or second nuclease agent
comprises a Cas protein and a guide RNA (gRNA) (gRNA-Cas complex) that targets the first
or the second target site. For example, the Cas protein can be a Cas9 protein. The Cas9
protein may comprise a RuvC domain and a HNH domain, at least one of which lacks
endonuclease activity. In some embodiments, the gRNA comprises a nucleic acid sequence
encoding a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) RNA
(crRNA) and a trans-activating CRISPR RNA (tracrRNA). The first target site and/or second
target site can be flanked by a Protospacer Adjacent Motif (PAM) sequence. In some of the
methods the nuclease agent comprises a zinc finger nuclease or a Transcription Activator-
Like Effector Nuclease (TALEN).
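As an illustration of a target site flanked by a PAM, the sketch below scans one strand for candidate sites. It rests on assumptions that are not stated in this passage: an NGG PAM (the motif commonly associated with Streptococcus pyogenes Cas9), a 20-nucleotide protospacer immediately 5’ of the PAM, and a hypothetical function name (find_pam_sites). It only lists candidates and says nothing about guide quality or off-target risk.

```python
import re


def find_pam_sites(seq: str, protospacer_len: int = 20):
    """List (protospacer_start, protospacer, pam) for every NGG PAM on
    the given strand that has a full-length protospacer 5' of it.

    Assumptions: NGG PAM and a 20-nt protospacer; only the supplied
    strand is scanned, so the reverse complement must be scanned
    separately.
    """
    seq = seq.upper()
    sites = []
    # A lookahead lets overlapping PAM candidates all be reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        start = pam_start - protospacer_len
        if start >= 0:
            sites.append((start, seq[start:pam_start], match.group(1)))
    return sites
```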
In some of the methods the first, the second, or both nucleic acids are from a
bacterial artificial chromosome. The bacterial artificial chromosome can comprise a human
DNA, a rodent DNA, a synthetic DNA, or a combination thereof. The bacterial artificial
chromosome can comprise a human sequence.
The methods disclosed herein include a method for assembling at least two
nucleic acids, comprising: (a) contacting a first nucleic acid with a first nuclease agent and a
second nuclease agent to produce a first digested nucleic acid, wherein the first nuclease
agent generates a nick on a first strand of the first nucleic acid at a first target site, and the
second nuclease agent generates a nick on a second strand of the first nucleic acid at a second
target site, to produce a first digested nucleic acid comprising 5’ or 3’ overhanging sequence
at one of its ends; (b) annealing the first digested nucleic acid and a second nucleic acid
comprising a complementary sequence to the 5’ or 3’ overhanging sequence; and (c) ligating
the first digested nucleic acid and the second nucleic acid. In some of the methods, step (b)
further comprises extending the 3’ end of the first strand using the second strand as a template
and extending the 3’ end of the second strand using the first strand as a template. In
some of the methods, the first target site is separated by at least 4 bp from the second target
site.
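How two offset nicks give an overhang can be sketched as follows. This is a simplified model under stated assumptions: the duplex is represented by its top strand written 5’ to 3’, both nick positions are given as indices on that top strand, the 4 bp minimum is applied to the nick offset rather than to the target sites themselves, and the function name (paired_nick_overhang) is hypothetical.

```python
def paired_nick_overhang(top: str, top_nick: int, bottom_nick: int,
                         min_offset: int = 4):
    """Return (overhang_type, single_stranded_region) for a duplex
    nicked once on each strand.

    `top_nick` is the index on the top strand where that strand is
    cut; `bottom_nick` is where the bottom strand is cut, expressed in
    the same top-strand coordinates. The region between the two nicks
    becomes single stranded on both resulting ends; it is returned as
    top-strand sequence.
    """
    if abs(top_nick - bottom_nick) < min_offset:
        raise ValueError("nicks are closer together than the minimum offset")
    lo, hi = sorted((top_nick, bottom_nick))
    # Top strand cut upstream of the bottom-strand cut leaves 5' overhangs;
    # the opposite arrangement leaves 3' overhangs.
    kind = "5'" if top_nick < bottom_nick else "3'"
    return kind, top[lo:hi]
```

For example, nicking GAATTC after the first base on the top strand and before the last base on the bottom strand yields a 4-base 5’ overhang, the same geometry produced by a staggered restriction cut.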
In some of the methods, at least one of the first or second nuclease agent
comprises a Cas9 protein and a guide RNA (gRNA) (gRNA-Cas complex) that targets the
first or the second target site. The gRNA can comprise a nucleic acid sequence encoding a
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) RNA (crRNA) and a
trans-activating CRISPR RNA (tracrRNA). In some of the methods, at least one of the first
target site and second target site is flanked by a Protospacer Adjacent Motif (PAM) sequence.
The Cas9 protein can comprise a RuvC domain and a HNH domain, one of which lacks
endonuclease activity.
In some of the methods, the second nucleic acid does not comprise the
complementary sequence to the 5’ or 3’ overhanging sequence of the first digested nucleic
acid, and step (a) further comprises contacting the first digested nucleic acid and the second
digested nucleic acid with a joiner oligo, wherein the joiner oligo comprises: (i) a first
complementary sequence to the 5’ or 3’ overhanging sequence of the first digested nucleic
acid; and (ii) a second complementary sequence to the 5’ or 3’ overhanging sequence of the
second digested nucleic acid. In some methods, the first, the second, or both nucleic acids are
derived from a bacterial artificial chromosome. The bacterial artificial chromosome can
comprise a human DNA, a rodent DNA, a synthetic DNA, or a combination thereof. The
bacterial artificial chromosome can comprise a human polynucleotide sequence. In some
methods, the second nucleic acid comprises a bacterial artificial chromosome.
Methods provided also include a method for assembling two or more nucleic
acid fragments, comprising: (a) contacting a first nucleic acid with at least one nuclease agent
to generate a first digested nucleic acid; (b) contacting the first digested nucleic acid with a
second nucleic acid, a joiner oligo, and an exonuclease, wherein the joiner oligo comprises:
(i) a first complementary sequence that is complementary to the first digested nucleic acid;
(ii) a spacer; and (iii) a second complementary sequence that is complementary to the second
nucleic acid; wherein the exonuclease exposes the first and second complementary
sequences; and (c) assembling the joiner oligo with the first digested nucleic acid and the
second nucleic acid. In some such methods the assembling in step (c) comprises: (i)
annealing the first complementary sequence of the joiner oligo to the first digested nucleic
acid and the second complementary sequence of the joiner oligo to the second nucleic acid;
and (ii) ligating the joiner oligo to the first digested nucleic acid and the second nucleic acid.
In some methods the first complementary sequence and the second
complementary sequence of the joiner oligo comprise between 15 and 120 complementary
bases. In some methods, the spacer of the joiner oligo comprises non-complementary nucleic
acids. In some embodiments, the first digested nucleic acid is seamlessly assembled to the
second nucleic acid.
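The three-part layout of a joiner oligo (first complementary sequence, spacer, second complementary sequence) can be sketched as a simple string construction. The sketch assumes top-strand strings, writes the oligo in the same orientation as the two nucleic acids rather than modelling which strand anneals, and uses a hypothetical function name (design_joiner_oligo); only the 15 to 120 base range for the complementary sequences comes from the text.

```python
def design_joiner_oligo(first_digested: str, second: str,
                        spacer: str = "", arm_len: int = 40) -> str:
    """Build a joiner oligo as: end of the first digested nucleic acid
    + spacer + start of the second nucleic acid.

    `arm_len` is the length of each complementary sequence and must
    lie in the 15-120 base range given in the text.
    """
    if not 15 <= arm_len <= 120:
        raise ValueError("complementary sequences must be 15 to 120 bases")
    if len(first_digested) < arm_len or len(second) < arm_len:
        raise ValueError("input sequence shorter than the requested arm")
    return first_digested[-arm_len:] + spacer + second[:arm_len]
```

An empty spacer joins the two ends directly; a non-empty spacer inserts those bases at the junction.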
In some methods, the nuclease agent is designed to cleave an at least 20 bp
fragment from the end of the first nucleic acid at which the seamless assembly will occur,
wherein, the spacer of the joiner oligo comprises a sequence identical to said at least 20 bp
fragment, wherein no nucleic acid bases are present between the first complementary
sequence and the at least 20 bp fragment, and no nucleic acid bases are present between the
second complementary sequence and the at least 20 bp fragment, such that assembly of said
first nucleic acid with said joiner oligo and said second nucleic acid reconstitutes the at least
20 bp fragment and seamlessly assembles the first and second nucleic acid. In some methods,
the same method is performed with an at least 20 bp fragment from the second nucleic acid as
the spacer sequence. In some methods, the spacer comprises from about 20 bp to about 120
bp. In some methods, the second nucleic acid is contacted with a second nuclease agent and
an exonuclease, wherein the second nuclease agent cleaves the second nucleic acid to
produce a second digested nucleic acid comprising a nucleotide sequence that is
complementary to the second complementary sequence of the joiner oligo, wherein the first
digested nucleic acid is assembled to the second digested nucleic acid. In some methods, the
second nucleic acid is contacted with a restriction enzyme or meganuclease and an
exonuclease, wherein the restriction enzyme or meganuclease cleaves the second nucleic acid
to produce a second digested nucleic acid comprising a nucleotide sequence that is
complementary to the second complementary sequence in the joiner oligo, wherein the first
digested nucleic acid is assembled to the second digested nucleic acid. In some methods, the
3' end of the first and/or the second digested nucleic acids is extended in step (b). The joiner
oligo can be assembled to said first nucleic acid and said second nucleic acid in the same
reaction or sequentially. In some methods, the first, the second, or both nucleic acids are
derived from a bacterial artificial chromosome, at least 10 kb, and/or comprise a human
DNA, rodent DNA, a synthetic DNA, or a combination thereof.
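The seamless case, in which the spacer is identical to the fragment removed by the nuclease agent, can be checked with a short simulation. This is a sketch under assumptions: sequences are top-strand strings, cleavage is modelled as truncation at an index, assembly is modelled as string joining, and the function names (make_seamless_joiner, assemble_with_joiner) are hypothetical. The 20 bp minimum for the removed fragment follows the text.

```python
def make_seamless_joiner(first: str, cut_index: int, second: str,
                         arm_len: int = 40):
    """Cut `first` at `cut_index` and design a joiner oligo whose
    spacer is exactly the removed end fragment.

    Returns (first_digested, joiner).
    """
    removed = first[cut_index:]
    if len(removed) < 20:
        raise ValueError("removed fragment must be at least 20 bp")
    digested = first[:cut_index]
    if len(digested) < arm_len or len(second) < arm_len:
        raise ValueError("input sequence shorter than the requested arm")
    # No extra bases between the arms and the spacer.
    joiner = digested[-arm_len:] + removed + second[:arm_len]
    return digested, joiner


def assemble_with_joiner(digested: str, joiner: str, second: str,
                         arm_len: int) -> str:
    """Join digested + spacer + second after checking that both arms
    of the joiner match the ends they are meant to bridge."""
    if not digested.endswith(joiner[:arm_len]):
        raise ValueError("first arm does not match the first digested end")
    if not second.startswith(joiner[-arm_len:]):
        raise ValueError("second arm does not match the second nucleic acid")
    return digested + joiner[arm_len:-arm_len] + second
```

Because the spacer restores exactly what was removed, the assembled product equals the uncut first sequence followed directly by the second sequence, with no scar at the junction.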
In some of the methods, the at least one nuclease agent or second nuclease
agent comprises a Cas protein and a guide RNA (gRNA) (gRNA-Cas complex) that targets
the first or the second target site. For example, the Cas protein can be a Cas9 protein. The
Cas9 protein may comprise a RuvC domain and a HNH domain, at least one of which lacks
endonuclease activity. In some embodiments, the gRNA comprises a nucleic acid sequence
encoding a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) RNA
(crRNA) and a trans-activating CRISPR RNA (tracrRNA). The first target site and/or second
target site can be flanked by a Protospacer Adjacent Motif (PAM) sequence. In some of the
methods the at least one nuclease agent and/or the second nuclease agent comprises a zinc
finger nuclease or a Transcription Activator-Like Effector Nuclease (TALEN).
In some embodiments, the joiner oligo comprises a gBlock. In some such
methods, the gBlock does not comprise a selection cassette.
Methods are further provided for assembling two or more nucleic acids,
comprising: (a) contacting a first nucleic acid with at least one nuclease agent to generate a
first digested nucleic acid; (b) contacting a second nucleic acid with a second nuclease agent
to generate a second digested nucleic acid; (c) contacting the first digested nucleic acid and
the second digested nucleic acid with a joiner oligo and an exonuclease, wherein the joiner
oligo comprises: (i) a first complementary sequence that is complementary to the first
digested nucleic acid; (ii) a spacer; and (iii) a second complementary sequence that is
complementary to the second digested nucleic acid; wherein the exonuclease exposes the first
and second complementary sequences; and (d) assembling the joiner oligo with the first
digested nucleic acid and the second nucleic acid.
Methods are provided herein for assembling nucleic acids having overlapping
sequences. Such methods comprise a method for assembling at least two nucleic acid
fragments, comprising (a) contacting a first and a second nucleic acid comprising overlapping
sequences with at least one gRNA-Cas complex and an exonuclease, thereby generating two
digested nucleic acid fragments comprising complementary sequences at one of their ends;
and (b) assembling the two nucleic acid fragments generated from step (a). In some methods, the
at least one gRNA-Cas complex cleaves the first nucleic acid at a first target site to produce a
first digested nucleic acid comprising complementary end sequences between the first
digested nucleic acid and the second nucleic acid. In certain methods, step (b) further
comprises: (i) annealing the exposed complementary sequences; (ii) extending 3’ ends of the
annealed complementary sequences; and (iii) ligating the first and the second nucleic acid. In
some methods, step (a) further comprises contacting the second nucleic acid with a second
gRNA-Cas complex, wherein the second nucleic acid does not comprise the overlapping end
sequence, and the second gRNA-Cas complex cleaves the second nucleic acid to produce a
second digested nucleic acid comprising the overlapping end sequences between the first
digested nucleic acid and the second digested nucleic acid. For example, the gRNA-Cas
complex comprises a Cas9 protein. The Cas9 protein can comprise a RuvC domain and a
HNH domain, at least one of which lacks endonuclease activity. In some methods, the
overlapping sequence ranges from 20 bp to 200 bp long. In some methods, the first, the
second, or both nucleic acids are from a bacterial artificial
chromosome. In some methods, the bacterial artificial chromosome comprises a human DNA,
a rodent DNA, a synthetic DNA, or a combination thereof. The bacterial artificial
chromosome can comprise a human sequence.
Methods provided also include a method for assembling two or more nucleic
acid fragments, comprising: (a) exposing a first and a second nucleic acid to at least one
gRNA-Cas complex to generate first and second digested nucleic acids comprising a 5’ or
3’ overhanging sequence at one of their ends; and (b) assembling the two nucleic acid fragments
generated from step (a). In some methods, assembling step (b) comprises: (i) annealing the 5’
and 3’ overhanging sequences; and (ii) ligating the first digested nucleic acid and the second
digested nucleic acid. In some methods, the 5’ and/or 3’ overhanging sequences comprise at
least 4 complementary bases. In some methods, step (b) further comprises extending the 3’
end of the first and the second digested nucleic acids. In some methods, the second nucleic
acid does not comprise a complementary sequence to the 5’ or 3’ overhanging sequence of
the first digested nucleic acid, and step (a) further comprises contacting the first digested
nucleic acid and the second digested nucleic acid with a joiner oligo, wherein the joiner oligo
comprises: (i) a first complementary sequence to the 5’ or 3’ overhanging sequence of the
first digested nucleic acid; and (ii) a second complementary sequence to the 5’ or 3’
overhanging sequence of the second digested nucleic acid. In some methods, the gRNA-Cas
protein complex comprises a Cas9 protein comprising a RuvC domain and a HNH domain,
one of which lacks endonuclease activity. In some methods the gRNA-Cas complex is
provided separately as a crRNA, tracrRNA, and Cas protein. In some methods, the first and
the second nucleic acids comprise a Protospacer Adjacent Motif (PAM) sequence. In some
methods, the first, the second, or both nucleic acids are derived from a bacterial artificial
chromosome. In some methods, the bacterial artificial chromosome comprises a human DNA,
a rodent DNA, a synthetic DNA, or a combination thereof. For example, the bacterial
artificial chromosome can comprise a human polynucleotide sequence.
Methods are further provided for assembling two or more nucleic acids,
comprising: (a) contacting a first nucleic acid with at least one gRNA-Cas complex to
generate a first digested nucleic acid; (b) contacting the first digested nucleic acid with a
second nucleic acid, a joiner oligo, and an exonuclease, wherein the joiner oligo comprises:
(i) a first complementary sequence that is complementary to the first digested nucleic acid; (ii)
a spacer; and (iii) a second complementary sequence that is complementary to the second
nucleic acid; wherein the exonuclease exposes the first and second complementary
sequences; and (c) assembling the joiner oligo with the first digested nucleic acid and the
second nucleic acid. In some methods assembling step (c) comprises (i) annealing the first
complementary sequence of the joiner oligo to the first digested nucleic acid and the second
complementary sequence of the joiner oligo to the second nucleic acid; and (ii) ligating the
joiner oligo to the first digested nucleic acid and the second nucleic acid. In some methods
the first complementary sequence and the second complementary sequence of the joiner oligo
comprise between 15 and 120 complementary bases. In some methods, the spacer of the
joiner oligo comprises non-complementary nucleic acids.
Using the joiner oligo, the first digested nucleic acid can be seamlessly
assembled to the second nucleic acid. In some methods, the gRNA-Cas complex is designed
to cleave an at least 20 bp fragment from the end of the first nucleic acid at which the
seamless assembly will occur, wherein the spacer of the joiner oligo comprises a sequence
identical to said at least 20 bp fragment, wherein no nucleic acid bases are present between
the first complementary sequence and the at least 20 bp fragment, and no nucleic acid bases
are present between the second complementary sequence and the at least 20 bp fragment,
such that assembly of said first nucleic acid with said joiner oligo and said second nucleic
acid reconstitutes the at least 20 bp fragment and seamlessly assembles the first and second
nucleic acid. In some methods, the same method is performed with an at least 20 bp fragment
from the second nucleic acid as the spacer sequence. In some methods, the spacer comprises
from about 20 bp to about 120 bp. In some methods, the second nucleic acid is contacted with
a second gRNA-Cas complex and an exonuclease, wherein the second gRNA-Cas complex
cleaves the second nucleic acid to produce a second digested nucleic acid comprising a
nucleotide sequence that is complementary to the second complementary sequence of the
joiner oligo, wherein the first digested nucleic acid is assembled to the second digested
nucleic acid. In some methods, the second nucleic acid is contacted with a restriction enzyme
or meganuclease and an exonuclease, wherein the restriction enzyme or meganuclease
cleaves the second nucleic acid to produce a second digested nucleic acid comprising a
nucleotide sequence that is complementary to the second complementary sequence in the
joiner oligo, wherein the first digested nucleic acid is assembled to the second digested
nucleic acid. In some methods, the 3' end of the first and/or the second digested nucleic acids
is extended in step (b). The joiner oligo can be assembled to said first nucleic acid and said
second nucleic acid in the same reaction or sequentially. In some methods, the gRNA-Cas
complex comprises a Cas9 protein. In some methods, the first, the second, or both nucleic
acids are derived from a bacterial artificial chromosome, at least 10 kb, and/or comprise a
human DNA, rodent DNA, a synthetic DNA, or a combination thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
shows assembly of a BAC to a PCR product having overlaps designed
to be specific for the BAC. 50 bp overlaps were added to the HYG cassette by PCR.
shows assembly of two BACs having overlapping sequences using two
Cas9 target sites on each BAC. The process of assembly using the method disclosed herein
took 2 days.
shows assembly of two BACs with overlapping sequences using
traditional methods. The process of assembly using traditional methods took 4 weeks.
shows the cloning efficiencies of Cas9/isothermal assembly method
and the time required for BAC cloning steps.
shows the construction of a large targeting vector (LTVEC) using
CRISPR/Cas9 system and isothermal assembly. DNA fragments cleaved with CRISPR/Cas9
were seamlessly assembled using one or more joiner oligos and isothermal assembly.
shows the strategy for using linkers (joiner oligos) for seamlessly
assembling nucleic acids after Cas9 cleavage. A gRNA-Cas9 complex is designed to cleave a
target site located 5’ upstream of an area of interest (arrow) to generate a first Cas9-digested
DNA fragment (5’ DNA). The deleted portion of the 5’ DNA (slashed box) is then used as a
spacer between the 5’ and 3’ overlapping sequences in a joiner oligo. Three components are
assembled in the isothermal assembly reaction: (a) a first Cas9-digested DNA fragment (5’
DNA); (b) a joiner oligo; and (c) a second DNA fragment (3’ DNA). The joiner oligo
comprises from 5’ to 3’: (1) an overlapping sequence with 5’ DNA, (2) a spacer containing
the deleted portion of the first digested fragment, and (3) an overlapping sequence with 3’
DNA. The deleted portion of the 5’DNA is reconstituted during the assembly step.
shows the construction of a DNA vector using CRISPR/Cas9 system
and isothermal assembly.
shows the construction of a large targeting vector using CRISPR/Cas9
system and isothermal assembly.
shows the construction of a targeting vector for replacement of a
portion of a BAC vector with a cassette using isothermal assembly and two linkers (joiner
oligos). The results of various ratios of mBAC to fragments or linkers are presented in panels
#1, #2, #3, and #4.
shows the sequence confirmation of seamless assembly across both
junctions of the assembly reaction between an mBAC (BAC ID: RP23-399M19) and a
cassette using two linkers.
shows the assembly of two mBACs using Cas9 and isothermal
assembly. Assembly between the bMQ50f19 vector and the cassette comprising a
hygromycin resistance gene ubiquitin promoter was seamless.
shows the sequence confirmation of seamless assembly at linker 1,
and sequence confirmation of assembly that was intentionally not seamless at linker 2 and
linker 3.
shows the insertion of large human gene fragments onto a mBAC
using four linkers and isothermal assembly. Cas9 cleaved hGene fragment A from hBAC1,
hGene Fragment B from hBAC2, and mBAC to remove mGene fragments.
shows the insertion of human sequence into a BAC vector using Cas9
and Isothermal Assembly.
shows the insertion of a gBlock comprising a meganuclease site using
Cas9 and Isothermal Assembly. A shows the insertion of a gBlock comprising a PI-
SceI site; and B shows the insertion of a gBlock comprising a MauBI site.
illustrates an example of direct humanization of a targeting vector
using three joiner oligos, Cas9, and isothermal assembly.
illustrates an example of indirect humanization of a targeting vector
using a donor with up and down joiner oligos, Cas9, and isothermal assembly.
illustrates an example of introducing a point mutation using Cas9 and
Isothermal Assembly.
illustrates an example of BAC trimming by Cas9 and isothermal
assembly. In this example, the trimming removes the Ori sequence. The Ori sequence is re-
inserted in the vector using two joiner oligos and isothermal assembly.
DETAILED DESCRIPTION
I. Definitions
The terms “protein,” “polypeptide,” and “peptide,” used interchangeably
herein, include polymeric forms of amino acids of any length, including coded and non-coded
amino acids and chemically or biochemically modified or derivatized amino acids. The terms
also include polymers that have been modified, such as polypeptides having modified peptide
backbones.
The terms “nucleic acid” and “polynucleotide,” used interchangeably herein,
include polymeric forms of nucleotides of any length, including ribonucleotides,
deoxyribonucleotides, or analogs or modified versions thereof. They include single-, double-
and multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, and
polymers comprising purine bases, pyrimidine bases, or other natural, chemically modified,
biochemically modified, non-natural, or derivatized nucleotide bases.
“Codon optimization” generally includes a process of modifying a nucleic acid
sequence for enhanced expression in particular host cells by replacing at least one codon of
the native sequence with a codon that is more frequently or most frequently used in the genes
of the host cell while maintaining the native amino acid sequence. For example, a nucleic
acid encoding a Cas protein can be modified to substitute codons having a higher frequency
of usage in a given prokaryotic or eukaryotic cell, including a bacterial cell, a yeast cell, a
human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, a
hamster cell, or any other host cell, as compared to the naturally occurring nucleic acid
sequence. Codon usage tables are readily available, for example, at the “Codon Usage
Database.” These tables can be adapted in a number of ways. See Nakamura et al. (2000)
Nucleic Acids Research 28:292. Computer algorithms for codon optimization of a particular
sequence for expression in a particular host are also available (see, e.g., Gene Forge).
“Operable linkage” or being “operably linked” includes juxtaposition of two
or more components (e.g., a promoter and another sequence element) such that both
components function normally and allow the possibility that at least one of the components
can mediate a function that is exerted upon at least one of the other components. For
example, a promoter can be operably linked to a coding sequence if the promoter controls the
level of transcription of the coding sequence in response to the presence or absence of one or
more transcriptional regulatory factors.
“Complementarity” of nucleic acids means that a nucleotide sequence in one
strand of nucleic acid, due to orientation of its nucleobase groups, forms hydrogen bonds with
another sequence on an opposing nucleic acid strand. The complementary bases in DNA are
typically A with T and C with G. In RNA, they are typically C with G and U with A.
Complementarity can be perfect or substantial/sufficient. Perfect complementarity between
two nucleic acids means that the two nucleic acids can form a duplex in which every base in
the duplex is bonded to a complementary base by Watson-Crick pairing. "Substantial" or
"sufficient" complementarity means that a sequence in one strand is not completely and/or
perfectly complementary to a sequence in an opposing strand, but that sufficient bonding
occurs between bases on the two strands to form a stable hybrid complex in a set of
hybridization conditions (e.g., salt concentration and temperature). Such conditions can be
predicted by using the sequences and standard mathematical calculations to predict the Tm of
hybridized strands, or by empirical determination of Tm by using routine methods. Tm
includes the temperature at which a population of hybridization complexes formed between
two nucleic acid strands are 50% denatured. At a temperature below the Tm, formation of a
hybridization complex is favored, whereas at a temperature above the Tm, melting or
separation of the strands in the hybridization complex is favored. Tm may be estimated for a
nucleic acid having a known G+C content in an aqueous 1 M NaCl solution by using, e.g.,
Tm=81.5+0.41(% G+C), although other known Tm computations take into account nucleic
acid structural characteristics.
"Hybridization condition" includes the cumulative environment in which one
nucleic acid strand bonds to a second nucleic acid strand by complementary strand
interactions and hydrogen bonding to produce a hybridization complex. Such conditions
include the chemical components and their concentrations (e. g., salts, chelating agents,
formamide) of an aqueous or organic solution containing the nucleic acids, and the
temperature of the mixture. Other factors, such as the length of incubation time or reaction
chamber dimensions may contribute to the environment. See, e.g., Sambrook et al.,
Molecular Cloning, A Laboratory Manual, 2nd ed., pp. 1.90-1.91, 9.47-9.51, 11.47-11.57
(Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989).
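The G+C-based Tm estimate quoted in the definition of complementarity above can be illustrated with a short calculation. The following is a minimal sketch only: the function names and the example sequence are hypothetical, and the formula applies solely under the stated 1 M NaCl condition without accounting for length or nucleic acid structural characteristics.

```python
def gc_percent(seq: str) -> float:
    """Return the G+C content of a sequence as a percentage (0-100)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def estimate_tm(seq: str) -> float:
    """Estimate Tm (degrees C) as 81.5 + 0.41 * (% G+C).

    This is the simple estimate for an aqueous 1 M NaCl solution; other
    Tm computations exist and may give different values.
    """
    return 81.5 + 0.41 * gc_percent(seq)


if __name__ == "__main__":
    example = "ATGCGCGCTA"  # hypothetical sequence, 6 of 10 bases are G or C
    print(f"%GC = {gc_percent(example):.1f}, estimated Tm = {estimate_tm(example):.1f}")
```

Because the estimate depends only on base composition, two sequences of different length but equal G+C content receive the same value, which is one reason empirical determination of Tm may still be needed.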
Hybridization requires that the two nucleic acids contain complementary
sequences, although mismatches between bases are possible. The conditions appropriate for
hybridization between two nucleic acids depend on the length of the nucleic acids and the
degree of complementation, variables well known in the art. The greater the degree of
complementation between two nucleotide sequences, the greater the value of the melting
temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations
between nucleic acids with short stretches of complementarity (e.g., complementarity over 35
or fewer, 30 or fewer, 25 or fewer, 22 or fewer, 20 or fewer, or 18 or fewer nucleotides) the
position of mismatches becomes important (see Sambrook et al., supra, 11.7-11.8).
Typically, the length for a hybridizable nucleic acid is at least about 10 nucleotides.
Illustrative minimum lengths for a hybridizable nucleic acid include at least about 15
nucleotides, at least about 20 nucleotides, at least about 22 nucleotides, at least about 25
nucleotides, and at least about 30 nucleotides. Furthermore, the temperature and wash
solution salt concentration may be adjusted as necessary according to factors such as length
of the region of complementation and the degree of complementation.
The sequence of a polynucleotide need not be 100% complementary to that of
its target nucleic acid to be specifically hybridizable. Moreover, a polynucleotide may
hybridize over one or more segments such that intervening or adjacent segments are not
involved in the hybridization event (e.g., a loop structure or hairpin structure). A
polynucleotide (e.g., gRNA) can comprise at least 70%, at least 80%, at least 90%, at least
95%, at least 99%, or 100% sequence complementarity to a target region within the target
nucleic acid sequence to which it is targeted. For example, a gRNA in which 18 of 20
nucleotides are complementary to a target region, and would therefore specifically hybridize,
would represent 90% complementarity. In this example, the remaining noncomplementary
nucleotides may be clustered or interspersed with complementary nucleotides and need not be
contiguous to each other or to complementary nucleotides.
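The 18-of-20 example above can be reproduced with a small calculation. This is an illustrative sketch under stated assumptions: the guide sequence is hypothetical, only Watson-Crick pairs are counted (no wobble pairing), and the two strands are assumed to be the same length and paired in antiparallel orientation without gaps or loops.

```python
# Watson-Crick pairs counted as complementary (RNA or DNA on either strand).
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of positions that pair when two 5'->3' strands are aligned antiparallel."""
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # read the second strand 3'->5' so positions face each other
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(1 for x, y in zip(a, b) if (x, y) in PAIRS)
    return 100.0 * paired / len(a)


def dna_reverse_complement_of_rna(rna: str) -> str:
    """Return the DNA strand (5'->3') perfectly complementary to an RNA sequence."""
    comp = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(rna.upper()))


if __name__ == "__main__":
    guide = "GACGUUACCGGAUUCAGCUA"            # hypothetical 20-nt guide sequence
    target = list(dna_reverse_complement_of_rna(guide))
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    for i in (3, 10):                          # introduce two interspersed mismatches
        target[i] = swap[target[i]]
    print(percent_complementarity(guide, "".join(target)))
```

The mismatched positions in the usage example are interspersed rather than contiguous, consistent with the statement that noncomplementary nucleotides need not be clustered.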
Percent complementarity between particular stretches of nucleic acid
sequences within nucleic acids can be determined routinely using BLAST programs (basic
local alignment search tools) and PowerBLAST programs known in the art (Altschul et al.
(1990) J. Mol. Biol. 215:403-410; Zhang and Madden (1997) Genome Res. 7:649-656) or by
using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics
Computer Group, University Research Park, Madison Wis.), using default settings, which
uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
The methods and compositions provided herein employ a variety of different
components. It is recognized throughout the description that some components can have
active variants and fragments. Such components include, for example, Cas proteins, CRISPR
RNAs, tracrRNAs, and guide RNAs. Biological activity for each of these components is
described elsewhere herein.
"Sequence identity" or "identity" in the context of two polynucleotides or
polypeptide sequences makes reference to the residues in the two sequences that are the same
when aligned for maximum correspondence over a specified comparison window. When
percentage of sequence identity is used in reference to proteins it is recognized that residue
positions which are not identical often differ by conservative amino acid substitutions, where
amino acid residues are substituted for other amino acid residues with similar chemical
properties (e.g., charge or hydrophobicity) and therefore do not change the functional
properties of the molecule. When sequences differ in conservative substitutions, the percent
sequence identity may be adjusted upwards to correct for the conservative nature of the
substitution. Sequences that differ by such conservative substitutions are said to have
"sequence similarity" or "similarity." Means for making this adjustment are well known to
those of skill in the art. Typically, this involves scoring a conservative substitution as a
partial rather than a full mismatch, thereby increasing the percentage sequence identity.
Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative
substitution is given a score of zero, a conservative substitution is given a score between zero
and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the
program PC/GENE (Intelligenetics, Mountain View, California).
"Percentage of sequence identity" includes the value determined by
comparing two optimally aligned sequences over a comparison window, wherein the portion
of the polynucleotide sequence in the comparison window may comprise additions or
deletions (i.e., gaps) as compared to the reference sequence (which does not comprise
additions or deletions) for optimal alignment of the two sequences. The percentage is
calculated by determining the number of positions at which the identical nucleic acid base or
amino acid residue occurs in both sequences to yield the number of matched positions,
dividing the number of matched positions by the total number of positions in the window of
comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
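The calculation described above (matched positions divided by total positions in the comparison window, multiplied by 100) can be sketched as follows. The sketch assumes the two sequences have already been optimally aligned by some other program and are supplied as equal-length strings with "-" marking gaps; the function name and example alignment are hypothetical and do not reproduce GAP or any other named program.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over a pre-aligned comparison window.

    Every alignment column counts toward the window length, including
    columns that contain a gap; a column is "matched" only when both
    sequences carry the same residue.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matched = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical 9-column window: one gap column and one mismatch column.
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))
```

Scoring conservative amino acid substitutions as partial matches, as discussed for sequence similarity, would require an additional scoring table and is not modeled here.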
Unless otherwise stated, sequence identity/similarity values include the value
obtained using GAP Version 10 using the following parameters: % identity and % similarity
for a nucleotide sequence using GAP Weight of 50 and Length Weight of 3, and the
nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence
using GAP Weight of 8 and Length Weight of 2, and the BLOSUM62 scoring matrix; or any
equivalent program thereof. "Equivalent program" includes any sequence comparison
program that, for any two sequences in question, generates an alignment having identical
nucleotide or amino acid residue matches and an identical percent sequence identity when
compared to the corresponding alignment generated by GAP Version 10.
Compositions or methods “comprising” or “including” one or more recited
elements may include other elements not specifically recited. For example, a composition
that “comprises” or “includes” a protein may contain the protein alone or in combination with
other ingredients.
Designation of a range of values includes all integers within or defining the
range, and all subranges defined by integers within the range.
Unless otherwise apparent from the context, the term “about” encompasses
values within a standard margin of error of measurement (e. g., SEM) of a stated value.
The singular forms of the articles “a,” “an,” and “the” include plural
references unless the context clearly dictates otherwise. For example, the term “a Cas
protein” or “at least one Cas protein” can include a plurality of Cas proteins, including
mixtures thereof.
II. General
Traditional methods of assembling nucleic acids employ time consuming steps
of conventional enzymatic digestion with restriction enzymes, cloning of the nucleic acids,
and ligating nucleic acids together (see, and for an illustration of traditional
methods and timeline). These methods are made more difficult when large fragments or
vectors are being assembled together. The methods provided herein take advantage of the
malleable target specificity of nucleases (e.g., guide RNAs and Cas9 nucleases) to convert
nucleic acids into a form suitable for use in rapid assembly reactions.
Provided herein are methods for assembling at least two nucleic acids using
nuclease agents directed to specific target sites, such as by guide RNA (gRNA) (e.g., Cas
protein directed to specific target sites by guide RNA (gRNA)). Site directed nuclease agents,
for example, guide RNA-directed Cas proteins, allow rapid and efficient combination of
nucleic acids by selecting and manipulating the end sequences generated by their
endonuclease activity. The methods provided herein combine a first polynucleotide with a
nuclease agent (e.g., a gRNA-Cas complex) specific for a desired target site and an
exonuclease. The target site can be chosen such that when the nuclease cleaves the nucleic
acid, the resulting ends created by the cleavage have regions complementary to the ends of
the second nucleic acid (e.g., overlapping ends). These complementary ends can then be
assembled to yield a single assembled nucleic acid. Because the nuclease agent (e.g.,
gRNA-Cas complex) is specific for an individual target site, the present method allows for
modification of nucleic acids in a precise site-directed manner. The present method further
takes advantage of nuclease agent, for example, a gRNA-Cas complex, specificity by
utilizing rapid and efficient assembly methods specially designed for combining overlapping
nucleic acid ends generated by nuclease cleavage or designed and synthesized for the
assembly reaction. For example, by selecting a nuclease agent (e.g., a gRNA-Cas complex)
specific for a target site such that, on cleavage, end sequences complementary to those of a
second nucleic acid are produced, isothermal assembly can be used to assemble the resulting
digested nucleic acid. Thus, by selecting nucleic acids and nuclease agents (e.g., gRNA-Cas
complexes) that result in overlapping end sequences, nucleic acids can be assembled by rapid
combinatorial methods to produce the final assembled nucleic acid in a fast and efficient
manner. Alternatively, nucleic acids not having complementary ends can be assembled with
joiner oligos designed to have complementary ends to each nucleic acid. By using the joiner
oligos, two or more nucleic acids can be seamlessly assembled, thereby reducing unnecessary
sequences in the resulting assembled nucleic acid.
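The relationship between overlapping ends and joiner oligos described above can be illustrated at the sequence level. The following is a simplified sketch only: it treats each nucleic acid as a single top-strand string, the function names, arm length, and example sequences are hypothetical, and it does not model the enzymatic steps of an actual assembly reaction.

```python
def end_overlap(first: str, second: str, min_len: int = 1) -> int:
    """Length of the longest end of `first` that is repeated at the start of `second`."""
    min_len = max(1, min_len)
    longest = min(len(first), len(second))
    for k in range(longest, min_len - 1, -1):
        if first[-k:].upper() == second[:k].upper():
            return k
    return 0


def assemble_by_overlap(first: str, second: str, min_len: int = 20) -> str:
    """Join two sequences that share an overlapping end, keeping the overlap once."""
    k = end_overlap(first, second, min_len)
    if k == 0:
        raise ValueError("no shared end sequence of the required length")
    return first + second[k:]


def joiner_oligo(first: str, second: str, arm: int = 20) -> str:
    """Build a joiner spanning the junction of two sequences lacking overlapping ends.

    The joiner carries `arm` bases from the end of `first` followed by `arm`
    bases from the start of `second`, so it overlaps each nucleic acid.
    """
    if arm < 1 or arm > len(first) or arm > len(second):
        raise ValueError("arm length must fit within both sequences")
    return first[-arm:] + second[:arm]


if __name__ == "__main__":
    print(assemble_by_overlap("AAACCCGGG", "CCGGGTTT", min_len=3))
    print(joiner_oligo("AAACCC", "GGGTTT", arm=3))
```

Because the joiner in this sketch contains only sequence already present at the two ends, the joined product contains no additional sequence at the junction, which corresponds to the seamless assembly noted above.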
III. Nuclease Agent
The present methods employ a nuclease agent for site-directed cleavage of
polynucleotides. Specifically, endonuclease cleavage of polynucleotides at an identified
target site produces a digested polynucleotide with ends that can then be joined to a second
polynucleotide to assemble two or more polynucleotides in a site-specific manner.
“Nuclease agent” includes molecules which possess activity for DNA
cleavage. Particular examples of nuclease agents for use in the methods disclosed herein
include RNA-guided CRISPR-Cas9 system, zinc finger proteins, meganucleases, TAL
domains, TALENs, yeast assembly, recombinases, leucine zippers, CRISPR/Cas,
endonucleases, and other nuclease agents known to those in the art. Nuclease agents can be
selected or designed for specificity in cleaving at a given target site. For example, nuclease
agents can be selected for cleavage at a target site that creates overlapping ends between the
cleaved polynucleotide and a different polynucleotide. Nuclease agents having both protein
and RNA elements as in CRISPR-Cas9 can be supplied with the agents already complexed as
a nuclease agent, or can be supplied with the protein and RNA elements separate, in which
case they complex to form a nuclease agent in the reaction mixtures described herein.
The term “recognition site for a nuclease agent” includes a DNA sequence at
which a nick or double-strand break is induced by a nuclease agent. The recognition site for
a nuclease agent can be endogenous (or native) to the cell or the recognition site can be
exogenous to the cell. In specific embodiments, the recognition site is exogenous to the cell
and thereby is not naturally occurring in the genome of the cell. In still further embodiments,
the recognition site is exogenous to the cell and to the polynucleotides of interest that one
desires to be positioned at the target locus. In further embodiments, the exogenous or
endogenous recognition site is present only once in the genome of the host cell. In specific
embodiments, an endogenous or native site that occurs only once within the genome is
identified. Such a site can then be used to design nuclease agents that will produce a nick or
double-strand break at the endogenous recognition site.
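Whether a candidate recognition site occurs only once in a given sequence can be checked computationally before a nuclease agent is designed. The following is a minimal sketch, assuming the sequence of interest is available as a plain DNA string and that exact matches on either strand are what matter; the function names and example sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def _count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text`, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_sites(sequence: str, site: str) -> int:
    """Count exact occurrences of a recognition site on both strands."""
    sequence, site = sequence.upper(), site.upper()
    if not site:
        raise ValueError("site must not be empty")
    total = _count_overlapping(sequence, site)
    rc = reverse_complement(site)
    if rc != site:  # a palindromic site reads the same on both strands; count it once
        total += _count_overlapping(sequence, rc)
    return total


def is_unique_site(sequence: str, site: str) -> bool:
    """True when the recognition site occurs exactly once in the sequence."""
    return count_sites(sequence, site) == 1


if __name__ == "__main__":
    genome = "AAACCCAAAGGGTTT"  # hypothetical sequence
    print(is_unique_site(genome, "CCCAAAG"), count_sites(genome, "CCC"))
```

The sketch considers exact matches only; near-matches that a nuclease agent might still recognize are not evaluated.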
The length of the recognition site can vary, and includes, for example,
recognition sites that are about 30-36 bp for a zinc finger nuclease (ZFN) pair (i.e., about 15-
18 bp for each ZFN), about 36 bp for a Transcription Activator-Like Effector Nuclease
(TALEN), or about 20 bp for a CRISPR/Cas9 guide RNA.
Active variants and fragments of the exemplified recognition sites are also
provided. Such active variants can comprise at least 65%, 70%, 75%, 80%, 85%, 90%, 91%,
92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the given
recognition site, wherein the active variants retain biological activity and hence are capable of
being recognized and cleaved by a nuclease agent in a sequence-specific manner. Assays to
measure the double-strand break of a recognition site by a nuclease agent are known in the art
(e.g., ® qPCR assay, Frendewey D. et al., Methods in Enzymology, 2010, -
307, which is incorporated by reference herein in its entirety).
In specific embodiments, the recognition site is positioned within the
polynucleotide encoding the selection marker. Such a position can be located within the
coding region of the selection marker or within the regulatory regions, which influence the
expression of the selection marker. Thus, a recognition site of the nuclease agent can be
located in an intron of the selection marker, a promoter, an enhancer, a regulatory region, or
any non-protein-coding region of the polynucleotide encoding the selection marker. In
specific embodiments, a nick or double-strand break at the recognition site disrupts the
activity of the selection marker. Methods to assay for the presence or absence of a functional
selection marker are known.
Any nuclease agent that induces a nick or double-strand break into a desired
recognition site can be used in the methods and compositions disclosed herein. A
naturally-occurring or native nuclease agent can be employed so long as the nuclease agent induces a
nick or double-strand break in a desired recognition site. Alternatively, a modified or
engineered nuclease agent can be employed. An “engineered nuclease agent” comprises a
nuclease that is engineered (modified or derived) from its native form to specifically
recognize and induce a nick or double-strand break in the desired recognition site. Thus, an
engineered nuclease agent can be derived from a native, naturally-occurring nuclease agent or
it can be artificially created or synthesized. The modification of the nuclease agent can be as
little as one amino acid in a protein cleavage agent or one nucleotide in a nucleic acid
cleavage agent. In some embodiments, the engineered nuclease induces a nick or
double-strand break in a recognition site, wherein the recognition site was not a sequence that would
have been recognized by a native (non-engineered or non-modified) nuclease agent.
Producing a nick or double-strand break in a recognition site or other DNA can be referred to
herein as “cutting” or “cleaving” the recognition site or other DNA.
These breaks can then be repaired by the cell in one of two ways: non-homologous
end joining and homology-directed repair (homologous recombination). In non-homologous
end joining (NHEJ), the double-strand breaks are repaired by direct ligation of
the break ends to one another. As such, no new nucleic acid material is inserted into the site,
although some nucleic acid material may be lost, resulting in a deletion. In homology-directed
repair, a donor polynucleotide with homology to the cleaved target DNA sequence
can be used as a template for repair of the cleaved target DNA sequence, resulting in the
transfer of genetic information from the donor polynucleotide to the target DNA. Therefore,
new nucleic acid material may be inserted/copied into the site. The modifications of the target
DNA due to NHEJ and/or homology-directed repair can be used for gene correction, gene
replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene
mutation, etc.
In one embodiment, the nuclease agent is a Transcription Activator-Like
Effector Nuclease (TALEN). TAL effector nucleases are a class of sequence-specific
nucleases that can be used to make double-strand breaks at specific target sequences in the
genome of a prokaryotic or eukaryotic organism. TAL effector nucleases are created by
fusing a native or engineered transcription activator-like (TAL) effector, or functional part
thereof, to the catalytic domain of an endonuclease, such as, for example, FokI. The unique,
modular TAL effector DNA binding domain allows for the design of proteins with potentially
any given DNA recognition specificity. Thus, the DNA binding domains of the TAL effector
nucleases can be engineered to recognize specific DNA target sites and thus, used to make
double-strand breaks at desired target sequences. See, ; Morbitzer et al.
(2010) PNAS 10.1073/pnas.1013133107; Scholze & Boch (2010) Virulence 1:428-432;
Christian et al. Genetics (2010) 186:757-761; Li et al. (2010) Nuc. Acids Res. (2010)
doi:10.1093/nar/gkq704; and Miller et al. (2011) Nature Biotechnology 29:143-148; all of
which are herein incorporated by reference.
Examples of suitable TAL nucleases, and methods for preparing suitable TAL
nucleases, are disclosed, e. g., in US Patent Application No. 2011/0239315 A1, 2011/0269234
A1, 2011/0145940 A1, 2003/0232410 A1, 2005/0208489 A1, 2005/0026157 A1,
2005/0064474 A1, 2006/0188987 A1, and 2006/0063231 A1 (each hereby incorporated by
reference). In various embodiments, TAL effector nucleases are engineered that cut in or
near a target nucleic acid sequence in, e. g., a genomic locus of interest, wherein the target
nucleic acid sequence is at or near a sequence to be modified by a targeting vector. The TAL
nucleases suitable for use with the various methods and compositions provided herein include
those that are specifically designed to bind at or near target nucleic acid sequences to be
modified by targeting vectors as described herein.
In one embodiment, each monomer of the TALEN comprises 33-35 TAL
repeats that recognize a single base pair via two hypervariable residues. In one embodiment,
the nuclease agent is a chimeric protein comprising a TAL repeat-based DNA binding
domain operably linked to an independent nuclease. In one embodiment, the independent
nuclease is a FokI endonuclease. In one embodiment, the nuclease agent comprises a first
TAL-repeat-based DNA binding domain and a second TAL-repeat-based DNA binding
domain, wherein each of the first and the second TAL-repeat-based DNA binding domain is
operably linked to a FokI nuclease subunit, wherein the first and the second
TAL-repeat-based DNA binding domain recognize two contiguous target DNA sequences in each strand
of the target DNA sequence separated by a spacer sequence of varying length (12-20 bp), and
wherein the FokI nuclease subunits dimerize to create an active nuclease that makes a double
strand break at a target sequence.
The nuclease agent employed in the various methods and compositions
disclosed herein can further comprise a zinc-finger nuclease (ZFN). In one embodiment,
each monomer of the ZFN comprises 3 or more zinc finger-based DNA binding domains,
wherein each zinc finger-based DNA binding domain binds to a 3 bp subsite. In other
embodiments, the ZFN is a chimeric protein comprising a zinc finger-based DNA binding
domain operably linked to an independent nuclease. In one embodiment, the independent
endonuclease is a FokI endonuclease. In one embodiment, the nuclease agent comprises a
first ZFN and a second ZFN, wherein each of the first ZFN and the second ZFN is operably
linked to a FokI nuclease subunit, wherein the first and the second ZFN recognize two
contiguous target DNA sequences in each strand of the target DNA sequence separated by
about a 5-7 bp spacer, and wherein the FokI nuclease subunits dimerize to create an active
nuclease that makes a double strand break. See, for example, US20060246567;
US20080182332; US20020081614; US20030021776; WO/2002/057308A2;
US20130123484; US20100291048; WO/2011/017293A2; and Gaj et al. (2013) Trends in
Biotechnology, 31(7):397-405, each of which is herein incorporated by reference.
In one embodiment of the methods provided herein, the nuclease agent
comprises (a) a chimeric protein comprising a zinc finger-based DNA binding domain fused
to a FokI endonuclease; or, (b) a chimeric protein comprising a Transcription Activator-Like
Effector Nuclease (TALEN) fused to a FokI endonuclease.
In still another embodiment, the nuclease agent is a meganuclease.
Meganucleases have been classified into four families based on conserved sequence motifs;
the families are the LAGLIDADG (SEQ ID NO: 16), GIY-YIG, H-N-H, and His-Cys box
families. These motifs participate in the coordination of metal ions and hydrolysis of
phosphodiester bonds. HEases are notable for their long recognition sites, and for tolerating
some sequence polymorphisms in their DNA substrates. Meganuclease domains, structure
and function are known, see for example, Guhan and Muniyappa (2003) Crit Rev Biochem
Mol Biol 38:199-248; Lucas et al., (2001) Nucleic Acids Res 29:960-9; Jurica and Stoddard,
(1999) Cell Mol Life Sci 55:1304-26; Stoddard, (2006) Q Rev Biophys 38:49-95; and Moure
et al., (2002) Nat Struct Biol 9:764. In some examples a naturally occurring variant, and/or
engineered derivative meganuclease is used. Methods for modifying the kinetics, cofactor
interactions, expression, optimal conditions, and/or recognition site specificity, and screening
for activity are known, see for example, Epinat et al., (2003) Nucleic Acids Res 31:2952-62;
Chevalier et al., (2002) Mol Cell 10:895-905; Gimble et al., (2003) Mol Biol 334:993-1008;
Seligman et al., (2002) Nucleic Acids Res 30:3870-9; Sussman et al., (2004) J Mol Biol
342:31-41; Rosen et al., (2006) Nucleic Acids Res 34:4791-800; Chames et al., (2005)
Nucleic Acids Res 33:e178; Smith et al., (2006) Nucleic Acids Res 34:e149; Gruen et al.,
(2002) Nucleic Acids Res ; Chen and Zhao, (2005) Nucleic Acids Res 33:e154;
WO2005105989; WO2003078619; 097854; WO2006097853; WO2006097784; and
WO2004031346.
Any meganuclease can be used herein, including, but not limited to, I-SceI,
I-SceII, I-SceIII, I-SceIV, I-SceV, I-SceVI, I-SceVII, I-CeuI, I-CeuAIIP, I-CreI, I-CrepsbIP,
I-CrepsbIIP, I-CrepsbIIIP, I-CrepsbIVP, I-TliI, I-PpoI, PI-PspI, F-SceI, F-SceII, F-SuvI,
F-TevI, F-TevII, I-AmaI, I-AniI, I-ChuI, I-CmoeI, I-CpaI, I-CpaII, I-CsmI, I-CvuI, I-CvuAIP,
I-DdiI, I-DdiII, I-DirI, I-DmoI, I-HmuI, I-HmuII, I-HsNIP, I-LlaI, I-MsoI, I-NaaI, I-NanI,
I-NcIIP, I-NngP, I-NitI, I-NjaI, I-Nsp236IP, I-PakI, I-PboIP, I-PcuIP, I-PcuAI, I-PcuVI,
I-PngP, I-PobIP, I-PorI, I-PorIIP, I-PprP, I-SpBetaIP, I-ScaI, I-SeXIP, I-SneIP, I-SpomI,
I-SpomCP, I-SpomIP, I-SpomIIP, I-SquIP, I-Ssp6803I, I-SthPhiJP, I-SthPhiST3P,
I-SthPhiSTe3bP, I-TdeIP, I-TevI, I-TevII, I-TevIII, I-UarAP, I-UarHGPAIP, I-UarHGPA13P,
I-VinIP, I-ZbiIP, PI-MtuI, PI-MtuHIP, PI-MtuHIIP, PI-PfuI, PI-PfuII, PI-PkoI, PI-PkoII,
PI-Rma43812IP, PI-SpBetaIP, PI-SceI, PI-TfuI, PI-TfuII, PI-ThyI, PI-TliI, PI-TliII, or any
active variants or fragments thereof.
In one embodiment, the meganuclease recognizes double-stranded DNA
sequences of 12 to 40 base pairs. In one embodiment, the meganuclease recognizes one
perfectly matched target sequence in the genome. In one embodiment, the meganuclease is a
homing nuclease. In one embodiment, the homing nuclease is a LAGLIDADG (SEQ ID NO:
16) family of homing nuclease. In one embodiment, the LAGLIDADG (SEQ ID NO: 16)
family of homing nuclease is selected from I-SceI, I-CreI, and I-DmoI.
Nuclease agents can further comprise restriction endonucleases (restriction
enzymes), which include Type I, Type II, Type III, and Type IV endonucleases. Type I and
Type III restriction endonucleases recognize specific recognition sites, but typically cleave at
a variable position from the nuclease binding site, which can be hundreds of base pairs away
from the cleavage site (recognition site). In Type II systems the restriction activity is
independent of any methylase activity, and cleavage typically occurs at specific sites within
or near to the binding site. Most Type II enzymes cut palindromic sequences, however Type
IIa enzymes recognize non-palindromic recognition sites and cleave outside of the
recognition site, Type IIb enzymes cut sequences twice with both sites outside of the
recognition site, and Type IIs enzymes recognize an asymmetric recognition site and cleave
on one side and at a defined distance of about 1-20 nucleotides from the recognition site.
Type IV restriction enzymes target methylated DNA. Restriction enzymes are further
described and classified, for example in the REBASE database (webpage at rebase.neb.com;
Roberts et al., (2003) Nucleic Acids Res 31:418-20), Roberts et al., (2003) Nucleic Acids Res
31:1805-12, and Belfort et al., (2002) in Mobile DNA II, pp. 761-783, Eds. Craigie et al.,
(ASM Press, Washington, DC). In specific embodiments, at least two endonuclease enzymes
can be selected as the nuclease agents wherein the enzymes create compatible, or
complementary, sticky ends.
The nuclease agent employed in the various methods and compositions can
also comprise a CRISPR/Cas system. Such systems can employ a Cas9 nuclease, which in
some instances, is codon-optimized for the desired cell type in which it is to be expressed.
The system further employs a fused crRNA-tracrRNA construct that functions with the
codon-optimized Cas9. This single RNA is often referred to as a guide RNA or gRNA.
Within a gRNA, the crRNA portion is identified as the ‘target sequence’ for the given
recognition site and the tracrRNA is often referred to as the ‘scaffold’. This system has been
shown to function in a variety of eukaryotic and prokaryotic cells. Briefly, a short DNA
fragment containing the target sequence is inserted into a guide RNA expression plasmid.
The gRNA expression plasmid comprises the target sequence (in some embodiments around
20 nucleotides), a form of the tracrRNA sequence (the scaffold) as well as a suitable
promoter that is active in the cell and necessary elements for proper processing in eukaryotic
cells. Many of the systems rely on custom, complementary oligos that are annealed to form a
double stranded DNA and then cloned into the gRNA expression plasmid. The gRNA
expression cassette and the Cas9 expression cassette are then introduced into the cell. See,
for example, Mali P et al. (2013) Science 2013 Feb 15;339(6121):823-6; Jinek M et al.
Science 2012 Aug 17;337(6096):816-21; Hwang WY et al. Nat Biotechnol 2013
Mar;31(3):227-9; Jiang W et al. Nat Biotechnol 2013 Mar;31(3):233-9; and, Cong L et al.
Science 2013 Feb 15;339(6121):819-23, each of which is herein incorporated by reference.
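The pair of complementary oligos that are annealed to form a double stranded DNA carrying the target sequence can be derived as shown in the following sketch. It is illustrative only: the function names and the example target are hypothetical, and the vector-specific overhangs or flanking bases that a particular gRNA expression plasmid may require are not modeled.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def annealing_oligo_pair(target: str) -> tuple[str, str]:
    """Return (top, bottom) oligos, each written 5'->3', that anneal to each other.

    The top oligo carries the target sequence; the bottom oligo is its
    reverse complement, so the annealed pair is blunt-ended double
    stranded DNA spanning exactly the target sequence.
    """
    top = target.upper()
    return top, reverse_complement(top)


if __name__ == "__main__":
    top, bottom = annealing_oligo_pair("GATTACAGATTACAGATTAC")  # hypothetical 20-nt target
    print("top:   ", top)
    print("bottom:", bottom)
```

In practice the oligos would typically be extended with whatever end sequences the chosen cloning strategy requires before synthesis.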
The methods and compositions disclosed herein can utilize Clustered
Regularly Interspersed Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas)
systems or components of such systems to modify a genome within a cell. CRISPR/Cas
systems include transcripts and other elements involved in the expression of, or directing the
activity of, Cas genes. A CRISPR/Cas system can be a type I, a type II, or a type III system.
The methods and compositions disclosed herein employ CRISPR/Cas systems by utilizing
CRISPR complexes (comprising a guide RNA (gRNA) complexed with a Cas protein) for
site-directed cleavage of nucleic acids.
Some CRISPR/Cas systems used in the methods disclosed herein are non-naturally
occurring. A “non-naturally occurring” system includes anything indicating the
involvement of the hand of man, such as one or more components of the system being altered
or mutated from their naturally occurring state, being at least substantially free from at least
one other component with which they are naturally associated in nature, or being associated
with at least one other component with which they are not naturally associated. For example,
some CRISPR/Cas systems employ non-naturally occurring CRISPR complexes comprising
a gRNA and a Cas protein that do not naturally occur together.
Active variants and fragments of nuclease agents (i.e. an engineered nuclease
agent) are also provided. Such active variants can comprise at least 65%, 70%, 75%, 80%,
85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to
the native nuclease agent, wherein the active variants retain the ability to cut at a desired
recognition site and hence retain nick or double-strand-break-inducing activity. For example,
any of the nuclease agents described herein can be modified from a native endonuclease
sequence and designed to recognize and induce a nick or double-strand break at a recognition
site that was not recognized by the native nuclease agent. Thus, in some embodiments, the
engineered nuclease has a specificity to induce a nick or double-strand break at a recognition
site that is different from the corresponding native nuclease agent recognition site. Assays for
nick or double-strand-break-inducing activity are known and generally measure the overall
activity and specificity of the endonuclease on DNA substrates containing the recognition
site.
IV. CRISPR/Cas Systems (gRNA-Cas complex)
The present methods can employ a CRISPR/Cas system (e.g., gRNA-Cas
complex) for site-directed cleavage of nucleic acids. Specifically, Cas cleavage of nucleic
acids directed by gRNA to an identified target site produces a digested nucleic acid with ends
that can then be joined to a second nucleic acid to assemble two or more nucleic acids in a
site-specific manner.
A "gRNA-Cas complex" includes a complex of a Cas protein with a gRNA.
The gRNA can be designed or selected to direct Cas cleavage to a target site that creates
overlapping ends between the cleaved nucleic acid and a different nucleic acid. The gRNA-
Cas complex can be supplied with the agents already complexed, or can be supplied with the
protein and RNA elements separate, in which case they complex to form a gRNA-Cas
complex in the methods and reaction mixtures described herein.
A. Cas RNA-Guided Endonucleases
Cas proteins generally comprise at least one RNA recognition or binding
domain. Such domains can interact with guide RNAs (gRNAs, described in more detail
below). Cas proteins can also comprise nuclease domains (e.g., DNase or RNase domains),
DNA binding domains, helicase domains, protein-protein interaction domains, dimerization
domains, and other domains. A nuclease domain possesses catalytic activity for nucleic acid
cleavage. Cleavage includes the breakage of the covalent bonds of a nucleic acid molecule.
Cleavage can produce blunt ends or staggered ends, and it can be single-stranded or double-
stranded.
Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5,
Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or
Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB),
Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6,
Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX,
Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions
thereof.
Any Cas protein that induces a nick or double-strand break into a desired
recognition site can be used in the methods and compositions disclosed herein. A naturally-
occurring or native Cas protein can be employed so long as the Cas protein induces a double-
strand break at a desired recognition site. Alternatively, a modified or engineered Cas protein
can be employed. An “engineered Cas protein” comprises a Cas protein that is engineered
(modified or derived) from its native form to specifically recognize and induce a nick or
double-strand break in the desired recognition site. Thus, an engineered Cas protein can be
derived from a native, naturally-occurring Cas protein or it can be artificially created or
synthesized.
In particular embodiments, the Cas protein is Cas9. Cas9 proteins typically
share four key motifs with a conserved architecture. Motifs 1, 2, and 4 are RuvC-like motifs,
and motif 3 is an HNH motif. The nuclease activity of Cas9 cleaves target DNA to produce
double-strand breaks. These breaks can then be repaired by the cell in one of two ways: non-
homologous end joining and homology-directed repair (homologous recombination). In non-
homologous end joining (NHEJ), the double-strand breaks are repaired by direct ligation of
the break ends to one another. As such, no new nucleic acid material is inserted into the site,
although some nucleic acid material may be lost, resulting in a deletion. In homology-
directed repair, a donor polynucleotide with homology to the cleaved target DNA sequence
can be used as a template for repair of the cleaved target DNA sequence, resulting in the
transfer of genetic information from the donor polynucleotide to the target DNA. Therefore,
new nucleic acid material may be inserted/copied into the site. The modifications of the target
DNA due to NHEJ and/or homology-directed repair can be used for gene correction, gene
replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene
mutation, etc.
Cas proteins can be from a type II CRISPR/Cas system. For example, the Cas
protein can be a Cas9 protein or be derived from a Cas9 protein. Cas9 proteins typically
share four key motifs with a conserved architecture. Motifs 1, 2, and 4 are RuvC-like motifs,
and motif 3 is an HNH motif. The Cas9 protein can be from, for example, Streptococcus
pyogenes, Streptococcus thermophilus, Streptococcus sp., Staphylococcus aureus,
Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes,
Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides,
Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii,
Lactobacillus salivarius, Microscilla marina, Burkholderiales bacterium, Polaromonas
naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis
aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii,
Caldicellulosiruptor bescii, Candidatus Desulforudis, Clostridium botulinum, Clostridium
difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum propionicum,
Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum,
Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsoni, Pseudoalteromonas
haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis,
Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira
sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis,
Thermosipho africanus, or Acaryochloris marina. Additional examples of the Cas9 family
members are described in WO 2014/131833, herein incorporated by reference in its entirety.
Cas9 protein from S. pyogenes or derived therefrom is a preferred enzyme. Cas9 protein from
S. pyogenes is assigned SwissProt accession number Q99ZW2.
Cas proteins can be wild type proteins (i.e., those that occur in nature),
modified Cas proteins (i.e., Cas protein variants), or fragments of wild type or modified Cas
proteins. Cas proteins can also be active variants or fragments of wild type or modified Cas
proteins. Active variants or fragments can comprise at least 80%, 85%, 90%, 91%, 92%,
93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the wild type or
modified Cas protein or a portion thereof, wherein the active variants retain the ability to cut
at a desired cleavage site and hence retain nick-inducing or double-strand-break-inducing
activity. Assays for nick-inducing or double-strand-break-inducing activity are known and
generally measure the overall activity and specificity of the Cas protein on DNA substrates
containing the cleavage site.
Cas proteins can be modified to increase or decrease nucleic acid binding
affinity, nucleic acid binding specificity, and/or enzymatic activity. Cas proteins can also be
modified to change any other activity or property of the protein, such as stability. For
example, one or more nuclease domains of the Cas protein can be modified, deleted, or
inactivated, or a Cas protein can be truncated to remove domains that are not essential for the
function of the protein or to optimize (e.g., enhance or reduce) the activity of the Cas protein.
Some Cas proteins comprise at least two nuclease domains, such as DNase
domains. For example, a Cas9 protein can comprise a RuvC-like nuclease domain and an
HNH-like nuclease domain. The RuvC and HNH domains can each cut a different strand of
double-stranded DNA to make a double-stranded break in the DNA. See, e.g., Jinek et al.
(2012) Science 337:816-821, hereby incorporated by reference in its entirety.
One or both of the nuclease domains can be deleted or mutated so that they are
no longer functional or have reduced nuclease activity. If one of the nuclease domains is
deleted or mutated, the resulting Cas protein (e.g., Cas9) can be referred to as a nickase and
can generate a single-strand break at a CRISPR RNA recognition sequence within a double-
stranded DNA but not a double-strand break (i.e., it can cleave the complementary strand or
the non-complementary strand, but not both). If both of the nuclease domains are deleted or
mutated, the resulting Cas protein (e.g., Cas9) will have a reduced ability to cleave both
strands of a double-stranded DNA. An example of a mutation that converts Cas9 into a
nickase is a D10A (aspartate to alanine at position 10 of Cas9) mutation in the RuvC domain
of Cas9 from S. pyogenes. Likewise, H839A (histidine to alanine at amino acid position 839)
or H840A (histidine to alanine at amino acid position 840) in the HNH domain of Cas9 from
S. pyogenes can convert the Cas9 into a nickase. Other examples of mutations that convert
Cas9 into a nickase include the corresponding mutations to Cas9 from S. thermophilus. See,
e.g., Sapranauskas et al. (2011) Nucleic Acids Research 39:9275-9282 and WO 2013/141680,
each of which is herein incorporated by reference in its entirety. Such mutations can be
generated using methods such as site-directed mutagenesis, PCR-mediated mutagenesis, or
total gene synthesis. Examples of other mutations creating nickases can be found, for
example, in WO/2013/176772A1 and WO/2013/142578A1, each of which is herein
incorporated by reference.
Cas proteins can also be fusion proteins. For example, a Cas protein can be
fused to a cleavage domain, an epigenetic modification domain, a transcriptional activation
domain, or a transcriptional repressor domain. See WO 2014/089290, incorporated herein by
reference in its entirety. Cas proteins can also be fused to a heterologous polypeptide
providing increased or decreased stability. The fused domain or heterologous polypeptide
can be located at the N-terminus, the C-terminus, or internally within the Cas protein.
A Cas protein can be fused to a heterologous polypeptide that provides for
subcellular localization. Such heterologous peptides include, for example, a nuclear
localization signal (NLS) such as the SV40 NLS for targeting to the nucleus, a mitochondrial
localization signal for targeting to the mitochondria, an ER retention signal, and the like.
See, e.g., Lange et al. (2007) J. Biol. Chem. 282:5101-5105. Such subcellular localization
signals can be located at the N-terminus, the C-terminus, or anywhere within the Cas protein.
An NLS can comprise a stretch of basic amino acids, and can be a monopartite sequence or a
bipartite sequence.
Cas proteins can also be linked to a cell-penetrating domain. For example, the
cell-penetrating domain can be derived from the HIV-1 TAT protein, the TLM cell-
penetrating motif from human hepatitis B virus, MPG, Pep-1, VP22, a cell-penetrating
peptide from Herpes simplex virus, or a polyarginine peptide sequence. See, for example,
, herein incorporated by reference in its entirety. The cell-penetrating
domain can be located at the N-terminus, the C-terminus, or anywhere within the Cas protein.
Cas proteins can also comprise a heterologous polypeptide for ease of tracking
or purification, such as a fluorescent protein, a purification tag, or an epitope tag. Examples
of fluorescent proteins include green fluorescent proteins (e.g., GFP, GFP-2, tagGFP,
turboGFP, eGFP, Emerald, Azami Green, Monomeric Azami Green, CopGFP, AceGFP,
ZsGreen1), yellow fluorescent proteins (e.g., YFP, eYFP, Citrine, Venus, YPet, PhiYFP,
ZsYellow1), blue fluorescent proteins (e.g., eBFP, eBFP2, Azurite, mKalama1, GFPuv,
Sapphire, T-sapphire), cyan fluorescent proteins (e.g., eCFP, Cerulean, CyPet, AmCyan1,
Midoriishi-Cyan), red fluorescent proteins (mKate, mKate2, mPlum, DsRed monomer,
mCherry, mRFP1, DsRed-Express, DsRed2, DsRed-Monomer, HcRed-Tandem, HcRed1,
AsRed2, eqFP611, mRaspberry, mStrawberry, Jred), orange fluorescent proteins (mOrange,
mKO, Kusabira-Orange, Monomeric Kusabira-Orange, mTangerine, tdTomato), and any
other suitable fluorescent protein. Examples of tags include glutathione-S-transferase (GST),
chitin binding protein (CBP), maltose binding protein, thioredoxin (TRX), poly(NANP),
tandem affinity purification (TAP) tag, myc, AcV5, AU1, AU5, E, ECS, E2, FLAG,
hemagglutinin (HA), nus, Softag 1, Softag 3, Strep, SBP, Glu-Glu, HSV, KT3, S, S1, T7,
V5, VSV-G, histidine (His), biotin carboxyl carrier protein (BCCP), and calmodulin.
In some embodiments, the Cas protein can be modified such that the resulting
nuclease activity is altered. Certain mutations in Cas can reduce the ability of the nuclease to
cleave both the complementary and the non-complementary strands of the target DNA. For
example, Cas proteins can be mutated in known positions such that nuclease activity is
limited to cleavage of either the complementary strand or the non-complementary strand.
Specifically, Cas9 having a D10A (aspartate to alanine at amino acid position 10 of Cas9)
mutation can cleave the complementary strand of the target DNA but has reduced ability to
cleave the non-complementary strand of the target DNA. In some embodiments, Cas9 having
an H840A (histidine to alanine at amino acid position 840) mutation can cleave the non-
complementary strand of the target DNA but has reduced ability to cleave the complementary
strand of the target DNA. The nuclease activity of Cas9 having either a D10A or H840A
mutation would result in a single-strand break (SSB) instead of a DSB. Other residues can be
mutated to achieve the same effect (i.e., inactivate one or the other nuclease portions). As non-
limiting examples, residues D10, G12, G17, E762, H840, N854, N863, H982, H983, A984,
D986, and/or A987 can be altered (i.e., substituted). Further, substitute amino acids other than
alanine can be suitable. In some embodiments when a nuclease has reduced activity (e.g.,
when a Cas9 protein has a D10, G12, G17, E762, H840, N854, N863, H982, H983, A984,
D986, and/or A987 mutation, such as D10A, G12A, G17A, E762A, H840A, N854A, N863A,
H982A, H983A, A984A, and/or D986A), the nuclease can still bind to target DNA in a
site-specific manner (because it is still guided to a target DNA sequence by a gRNA) as long
as it retains the ability to interact with the gRNA.
In some embodiments, Cas is altered such that the nuclease does not cleave
either the complementary or non-complementary strand of target DNA. For example, Cas9
with both the D10A and the H840A mutations has a reduced ability to cleave both the
complementary and the non-complementary strands of the target DNA. Other residues can be
mutated to achieve the same effect (i.e., inactivate one or the other nuclease portions). As
non-limiting examples, residues D10, G12, G17, E762, H840, N854, N863, H982, H983,
A984, D986, and/or A987 can be substituted in order to substantially eliminate nuclease
activity. Further, mutations other than alanine substitutions can be suitable.
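The strand assignments in the two preceding paragraphs can be summarized in a short sketch. It assumes, as the text states for S. pyogenes Cas9, that D10A impairs cleavage of the non-complementary strand and H840A impairs cleavage of the complementary strand. The function names and the set names are hypothetical, and the text describes "reduced ability" rather than complete loss, so the output is a qualitative expectation only.

```python
# Illustrative sketch only; names are hypothetical and not part of the disclosure.
# Domain assignments follow the text: D10A lies in the RuvC domain, H840A in the HNH domain.
RUVC_MUTATIONS = {"D10A"}   # impairs cleavage of the non-complementary strand
HNH_MUTATIONS = {"H840A"}   # impairs cleavage of the complementary strand


def strands_cleaved(mutations):
    """Return the set of target-DNA strands the Cas9 variant is expected to cut."""
    muts = {m.upper() for m in mutations}
    strands = set()
    if not muts & HNH_MUTATIONS:
        strands.add("complementary")
    if not muts & RUVC_MUTATIONS:
        strands.add("non-complementary")
    return strands


def expected_break(mutations):
    """Classify the expected cleavage outcome for a set of mutations."""
    n = len(strands_cleaved(mutations))
    if n == 2:
        return "double-strand break"
    if n == 1:
        return "single-strand break (nickase)"
    return "no efficient cleavage (binding only)"


if __name__ == "__main__":
    for variant in ([], ["D10A"], ["H840A"], ["D10A", "H840A"]):
        print(variant, "->", expected_break(variant))
```

Other residues listed above (e.g., G12, E762, N863) are deliberately left out of the lookup sets because the text does not say which strand each one affects.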
The terms "target site" or "target sequence" can be used interchangeably and
include nucleic acid sequences present in a target DNA to which a DNA-targeting segment of
a gRNA will bind, provided sufficient conditions for binding exist. For example, the target
site (or target sequence) within a target DNA is targeted by (or is bound by, or hybridizes
with, or is complementary to) the Cas protein or gRNA. Suitable DNA/RNA binding
conditions include physiological conditions normally present in a cell. Other suitable
DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art
(see, e.g., Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Harbor
Laboratory Press 2001)). The strand of the target DNA that is complementary to and
hybridizes with the Cas protein or gRNA is referred to as the "complementary strand" and the
strand of the target DNA that is complementary to the "complementary strand" (and is
therefore not complementary to the Cas protein or gRNA) is referred to as the
"noncomplementary strand" or "template strand".
The Cas protein may cleave the nucleic acid at a site within the target
sequence or outside of the target sequence. The "cleavage site" includes the position of a
nucleic acid wherein a Cas protein produces a single-strand break or a double-strand break. If
the Cas protein produces a double-strand break, the cleavage site can be at the same position
on both strands of the nucleic acid (producing blunt ends) or can be at different sites on each
strand (producing sticky or cohesive ends). Sticky ends can also be produced by using two
Cas proteins which produce a single-strand break at cleavage sites on each strand. Site-
specific cleavage of target DNA by Cas9 can occur at locations determined by both (i) base-
pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif,
referred to as the protospacer adjacent motif (PAM), in the target DNA. For example, the
cleavage site of Cas9 can be about 1 to about 10 or about 2 to about 5 base pairs (e.g., 3 base
pairs) upstream of the PAM sequence. In some embodiments (e.g., when Cas9 from S.
pyogenes, or a closely related Cas9, is used), the PAM sequence of the non-complementary
strand can be 5'-XGG-3', where X is any DNA nucleotide and X is immediately 3' of the
target sequence of the non-complementary strand of the target DNA. As such, the PAM
sequence of the complementary strand would be 5'-CCY-3', where Y is any DNA nucleotide
and Y is immediately 5' of the target sequence of the complementary strand of the target
DNA. In some such embodiments, X and Y can be complementary and the X-Y base pair can
be any base pair (e.g., X=C and Y=G; X=G and Y=C; X=A and Y=T; X=T and Y=A).
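The PAM geometry described above can be illustrated with a minimal sketch. It assumes a 5'-XGG-3' PAM read on the non-complementary strand, a 20-nucleotide target sequence immediately 5' of the PAM, and a blunt cut 3 base pairs upstream of the PAM; the 20-nucleotide length and the 3 bp offset are example values taken from the ranges in the text, and the function names are hypothetical.

```python
import re

# Complement table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_xgg_sites(noncomplementary_strand, target_len=20, cut_offset=3):
    """Scan one strand (5'->3') for 5'-XGG-3' PAMs with a full-length target 5' of them.

    Returns a list of dicts with the target sequence, the PAM, and the index
    of the first base 3' of the expected cut (cut_offset bp upstream of the PAM).
    """
    seq = noncomplementary_strand.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs (e.g. in "GGGG") are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        target_start = pam_start - target_len
        if target_start < 0:
            continue  # not enough sequence 5' of the PAM for a full target
        sites.append({
            "target": seq[target_start:pam_start],
            "pam": match.group(1),
            "cut_index": pam_start - cut_offset,
        })
    return sites


if __name__ == "__main__":
    strand = "ACGTACGTACGTACGTACGTTGGCA"
    for site in find_xgg_sites(strand):
        print(site)
        # The same PAM read on the complementary strand has the form 5'-CCY-3'.
        print("PAM on complementary strand:", reverse_complement(site["pam"]))
```

Only one strand is scanned here; sites on the opposite strand could be found by passing the reverse complement of the input through the same function.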
Cas proteins can be provided in any form. For example, a Cas protein can be
provided in the form of a protein, such as a Cas protein complexed with a gRNA.
Alternatively, a Cas protein can be provided in the form of a nucleic acid encoding the Cas
protein, such as an RNA (e.g., messenger RNA (mRNA)) or DNA. Optionally, the nucleic
acid encoding the Cas protein can be codon optimized for efficient translation into protein in
a particular cell or organism. For example, the nucleic acid encoding the Cas protein can be
modified to substitute codons having a higher frequency of usage in a bacterial cell, a yeast
cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell,
or any other host cell of interest, as compared to the naturally occurring polynucleotide
sequence. When a nucleic acid encoding the Cas protein is introduced into the cell, the Cas
protein can be transiently, conditionally, or constitutively expressed in the cell.
Nucleic acids encoding Cas proteins can be stably integrated in the genome of
the cell and operably linked to a promoter active in the cell. Alternatively, nucleic acids
encoding Cas proteins can be operably linked to a promoter in an expression construct.
Expression constructs include any nucleic acid constructs capable of directing expression of a
gene or other nucleic acid sequence of interest (e.g., a Cas gene) and which can transfer such
a nucleic acid sequence of interest to a target cell. For example, the nucleic acid encoding the
Cas protein can be in the targeting vector comprising the nucleic acid insert and/or a vector
comprising the DNA encoding the gRNA, or it can be in a vector or a plasmid that is separate
from the targeting vector comprising the nucleic acid insert and/or separate from a vector
comprising the DNA encoding the gRNA. Promoters that can be used in an expression
construct include, for example, promoters active in a pluripotent rat, eukaryotic, mammalian,
non-human mammalian, human, rodent, mouse, or hamster cell. Such promoters can be, for
example, conditional promoters, inducible promoters, constitutive promoters, or tissue-
specific promoters. Examples of other promoters are described elsewhere herein.
B. Guide RNAs (gRNAs)
A "guide RNA" or "gRNA" includes an RNA molecule that binds to a Cas
protein and targets the Cas protein to a specific location within a target DNA. Guide RNAs
(gRNA) can comprise two segments, a "DNA-targeting segment" and a "protein-binding
segment." "Segment" includes a segment, section, or region of a molecule, such as a
contiguous stretch of nucleotides in an RNA. Some gRNAs comprise two separate RNA
molecules: an "activator-RNA" and a "targeter-RNA". Other gRNAs are a single RNA
molecule (single RNA polynucleotide), which can also be called a "single-molecule gRNA,"
a "single-guide RNA," or an "sgRNA." See, e.g., WO/2013/176772A1, WO/2014/065596A1,
WO/2014/089290A1, WO/2014/093622A2, WO/2014/099750A2, WO/2013/142578A1, and
WO 2014/131833A1, each of which is herein incorporated by reference. The terms "guide
RNA" and "gRNA" include both double-molecule gRNAs and single-molecule gRNAs.
An exemplary two-molecule gRNA comprises a crRNA-like ("CRISPR RNA"
or "targeter-RNA" or "crRNA" or "crRNA repeat") molecule and a corresponding tracrRNA-
like ("trans-acting CRISPR RNA" or "activator-RNA" or "tracrRNA" or "scaffold")
molecule. A crRNA comprises both the DNA-targeting segment (single-stranded) of the
gRNA and a stretch of nucleotides that forms one half of the dsRNA duplex of the protein-
binding segment of the gRNA. A corresponding tracrRNA (activator-RNA) comprises a
stretch of nucleotides that forms the other half of the dsRNA duplex of the protein-binding
segment of the gRNA. A stretch of nucleotides of a crRNA is complementary to and
hybridizes with a stretch of nucleotides of a tracrRNA to form the dsRNA duplex of the
protein-binding domain of the gRNA. As such, each crRNA can be said to have a
corresponding tracrRNA. The crRNA additionally provides the single-stranded DNA-
targeting segment. Accordingly, a gRNA comprises a sequence that hybridizes to a target
sequence, and a tracrRNA.
The crRNA and the corresponding tracrRNA (as a corresponding pair)
hybridize to form a gRNA. The crRNA additionally provides the single-stranded DNA-
targeting segment that hybridizes to a CRISPR RNA recognition sequence. If used for
modification within a cell, the exact sequence of a given crRNA or tracrRNA molecule can
be designed to be specific to the species in which the RNA molecules will be used. See, for
example, Mali P et al. (2013) Science 2013 Feb 15;339(6121):823-6; Jinek M et al. Science
2012 Aug 17;337(6096):816-21; Hwang WY et al. Nat Biotechnol 2013 Mar;31(3):227-9;
Jiang W et al. Nat Biotechnol 2013 Mar;31(3):233-9; and Cong L et al. Science 2013 Feb
15;339(6121):819-23, each of which is herein incorporated by reference.
The DNA-targeting segment (crRNA) of a given gRNA comprises a
nucleotide sequence that is complementary to a sequence in a target DNA. The DNA-
targeting segment of a gRNA interacts with a target DNA in a sequence-specific manner via
hybridization (i.e., base pairing). As such, the nucleotide sequence of the DNA-targeting
segment may vary and determines the location within the target DNA with which the gRNA
and the target DNA will interact. The DNA-targeting segment of a subject gRNA can be
modified to hybridize to any desired sequence within a target DNA. Naturally occurring
crRNAs differ depending on the Cas9 system and organism but often contain a targeting
segment of between 21 and 72 nucleotides in length, flanked by two direct repeats (DR) of a
length of between 21 and 46 nucleotides (see, e.g., WO2014/131833). In the case of S.
pyogenes, the DRs are 36 nucleotides long and the targeting segment is 30 nucleotides long.
The 3' located DR is complementary to and hybridizes with the corresponding tracrRNA,
which in turn binds to the Cas9 protein.
The DNA-targeting segment can have a length of from about 12 nucleotides to
about 100 nucleotides. For example, the DNA-targeting segment can have a length of from
about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to
about 40 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12
nt to about 20 nt, or from about 12 nt to about 19 nt. Alternatively, the DNA-targeting
segment can have a length of from about 19 nt to about 20 nt, from about 19 nt to about 25 nt,
from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40
nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about
60 nt, from about 19 nt to about 70 nt, from about 19 nt to about 80 nt, from about 19 nt to
about 90 nt, from about 19 nt to about 100 nt, from about 20 nt to about 25 nt, from about 20
nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about
20 nt to about 45 nt, from about 20 nt to about 50 nt, from about 20 nt to about 60 nt, from
about 20 nt to about 70 nt, from about 20 nt to about 80 nt, from about 20 nt to about 90 nt, or
from about 20 nt to about 100 nt.
The nucleotide sequence of the DNA-targeting segment that is complementary
to a nucleotide sequence (CRISPR RNA recognition sequence) of the target DNA can have a
length at least about 12 nt. For example, the DNA-targeting sequence (e.g., the sequence
within the DNA-targeting segment that is complementary to a CRISPR RNA recognition
sequence within the target DNA) can have a length at least about 12 nt, at least about 15 nt, at
least about 18 nt, at least about 19 nt, at least about 20 nt, at least about 25 nt, at least about
30 nt, at least about 35 nt, or at least about 40 nt. Alternatively, the DNA-targeting sequence
of the DNA-targeting segment that is complementary to a target sequence of the target DNA
can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about
50 nt, from about 12 nt to about 45 nt, from about 12 nt to about 40 nt, from about 12 nt to
about 35 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12
nt to about 20 nt, from about 12 nt to about 19 nt, from about 19 nt to about 20 nt, from about
19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from
about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt,
from about 19 nt to about 60 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30
nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about
45 nt, from about 20 nt to about 50 nt, or from about 20 nt to about 60 nt. The nucleotide
sequence (the DNA-targeting sequence) of the DNA-targeting segment that is complementary
to a nucleotide sequence (target sequence) of the target DNA can have a length at least about
12 nt. In some cases, the DNA-targeting sequence can have a length of at least about 20 nt.
TracrRNAs can be in any form (e.g., full-length tracrRNAs or active partial
tracrRNAs) and of varying lengths. They can include primary transcripts or processed forms.
For example, tracrRNAs (as part of a single-guide RNA or as a separate molecule as part of a
two-molecule gRNA) may comprise or consist of all or a portion of a wild-type tracrRNA
sequence (e.g., about or more than about 20, 26, 32, 45, 48, 54, 63, 67, 85, or more
nucleotides of a wild-type tracrRNA sequence). Examples of wild-type tracrRNA sequences
from S. pyogenes include 171-nucleotide, 89-nucleotide, 75-nucleotide, and 65-nucleotide
versions. See, for example, Deltcheva et al. (2011) Nature 471:602-607; ,
each of which is incorporated herein by reference in its entirety. Examples of tracrRNAs
within single-guide RNAs (sgRNAs) include the tracrRNA segments found within +48, +54,
+67, and +85 versions of sgRNAs, where "+n" indicates that up to the +n nucleotide of wild-
type tracrRNA is included in the sgRNA. See US 8,697,359, incorporated herein by
reference in its entirety.
The percent complementarity between the DNA-targeting sequence and the
CRISPR RNA recognition sequence within the target DNA can be at least 60% (e.g., at least
65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least
97%, at least 98%, at least 99%, or 100%). The percent complementarity between the DNA-
targeting sequence and the CRISPR RNA recognition sequence within the target DNA is
100% over the seven contiguous 5'-most nucleotides of the target sequence of the
complementary strand of the target DNA. In certain embodiments, the percent
complementarity between the DNA-targeting sequence and the CRISPR RNA recognition
sequence within the target DNA can be at least 60% over about 20 contiguous nucleotides.
As an example, the percent complementarity between the DNA-targeting sequence and the
CRISPR RNA recognition sequence within the target DNA is 100% over the fourteen
contiguous nucleotides at the 5'-most end of the CRISPR RNA recognition sequence within
the complementary strand of the target DNA and as low as 0% over the remainder. In such a
case, the DNA-targeting sequence can be considered to be 14 nucleotides in length. As
another example, the percent complementarity between the DNA-targeting sequence and the
CRISPR RNA recognition sequence within the target DNA is 100% over the seven
contiguous nucleotides at the 5'-most end of the CRISPR RNA recognition sequence within
the complementary strand of the target DNA and as low as 0% over the remainder. In such a
case, the DNA-targeting sequence can be considered to be 7 nucleotides in length.
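The two examples above treat the DNA-targeting sequence as being as long as the run of perfectly paired nucleotides counted from the 5'-most end of the recognition sequence on the complementary strand. The following is one possible sketch of that count. It assumes antiparallel Watson-Crick pairing, so that the 5' end of the target pairs with the 3' end of the guide, and it assumes both sequences are supplied 5' to 3' with equal length. The function name and the pairing table are hypothetical.

```python
# (guide base, target base) pairs accepted as Watson-Crick; the guide is RNA
# (A, C, G, U) and the target strand is DNA (A, C, G, T).
_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def effective_targeting_length(guide_rna, complementary_strand_target):
    """Count contiguous perfect base pairs from the 5'-most end of the target.

    Both arguments are written 5'->3'. Because the duplex is antiparallel,
    the target is read 5'->3' while the guide is read 3'->5'.
    """
    count = 0
    guide_3_to_5 = reversed(guide_rna.upper())
    for target_base, guide_base in zip(complementary_strand_target.upper(), guide_3_to_5):
        if (guide_base, target_base) not in _PAIRS:
            break  # the contiguous run ends at the first mismatch
        count += 1
    return count


if __name__ == "__main__":
    guide = "ACGUACGUACGUACGUACGU"
    perfect_target = "ACGTACGTACGTACGTACGT"
    mismatched_target = "ACGTACGTACGTACATACGT"  # mismatch at the 15th target position
    print(effective_targeting_length(guide, perfect_target))
    print(effective_targeting_length(guide, mismatched_target))
```

With the second target, pairing is perfect over the first fourteen target nucleotides and stops at the fifteenth, which corresponds to the "14 nucleotides in length" case in the text.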
Complementarity of nucleic acids means that a nucleotide sequence in one
strand of nucleic acid, due to orientation of its nucleobase groups, hydrogen bonds to another
sequence on an opposing nucleic acid strand. The complementary bases typically are, in
DNA: A with T and C with G, and, in RNA: C with G, and U with A. Complementarity can
be perfect or substantial/sufficient. Perfect complementarity between two nucleic acids means
that the two nucleic acids can form a duplex in which every base in the duplex is bonded to a
complementary base by Watson-Crick pairing. "Substantial" or "sufficient" complementarity
means that a sequence in one strand is not completely and/or perfectly complementary to a
sequence in an opposing strand, but that sufficient bonding occurs between bases on the two
strands to form a stable hybrid complex in a set of hybridization conditions (e.g., salt
concentration and temperature). Such conditions can be predicted by using the sequences and
standard mathematical calculations to predict the Tm of hybridized strands, or by empirical
determination of Tm by using routine methods. Tm refers to the temperature at which a
population of hybridization complexes formed between two nucleic acid strands are 50%
denatured. At a temperature below the Tm, formation of a hybridization complex is favored,
whereas at a temperature above the Tm, melting or separation of the strands in the
hybridization complex is favored. Tm may be estimated for a nucleic acid having a known
G+C content in an aqueous 1 M NaCl solution by using, e.g., Tm = 81.5 + 0.41(% G+C),
although other known Tm computations take into account nucleic acid structural
characteristics.
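As a worked illustration of the estimate given above, the following sketch computes the G+C percentage of a sequence and applies Tm = 81.5 + 0.41(% G+C). The function names are hypothetical. The result is assumed here to be in degrees Celsius, which the text does not state, and the formula ignores length and structural effects, as the text itself notes.

```python
def gc_percent(seq):
    """Return the G+C content of a nucleic acid sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(base in "GC" for base in seq)
    return 100.0 * gc_count / len(seq)


def estimate_tm(seq):
    """Estimate Tm with the formula from the text: 81.5 + 0.41 * (% G+C)."""
    return 81.5 + 0.41 * gc_percent(seq)


if __name__ == "__main__":
    for s in ("ATATATAT", "ATGCATGC", "GGCCGGCC"):
        print(s, round(gc_percent(s), 1), round(estimate_tm(s), 1))
```

For a sequence with 50% G+C, the estimate is 81.5 + 0.41 × 50 = 102.0.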
"Hybridization condition" refers to the cumulative environment in which one
nucleic acid strand bonds to a second nucleic acid strand by complementary strand
interactions and hydrogen bonding to produce a hybridization complex. Such conditions
include the chemical components and their concentrations (e.g., salts, chelating agents,
formamide) of an aqueous or organic solution containing the nucleic acids, and the
temperature of the mixture. Other factors, such as the length of incubation time or reaction
chamber dimensions may contribute to the environment (e.g., Sambrook et al., Molecular
Cloning, A Laboratory Manual, 2nd ed., pp. 1.90-1.91, 9.47-9.51, 11.47-11.57 (Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989)).
Hybridization requires that the two nucleic acids contain complementary
sequences, although mismatches between bases are possible. The conditions appropriate for
hybridization between two nucleic acids depend on the length of the nucleic acids and the
degree of complementation, variables well known in the art. The greater the degree of
complementation between two nucleotide sequences, the greater the value of the melting
temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations
between nucleic acids with short stretches of complementarity (e.g., complementarity over 35
or less, 30 or less, 25 or less, 22 or less, 20 or less, or 18 or less nucleotides) the position of
mismatches becomes important (see Sambrook et al., supra, 11.7-11.8). Typically, the length
for a hybridizable nucleic acid is at least about 10 nucleotides. Illustrative minimum lengths
for a hybridizable nucleic acid are: at least about 15 nucleotides; at least about 20
nucleotides; at least about 22 nucleotides; at least about 25 nucleotides; and at least about 30
nucleotides. Furthermore, the temperature and wash solution salt concentration may be
adjusted as necessary according to factors such as length of the region of complementation
and the degree of complementation.
The sequence of a polynucleotide need not be 100% complementary to that of
its target nucleic acid to be specifically hybridizable. Moreover, a polynucleotide may
hybridize over one or more segments such that intervening or adjacent segments are not
involved in the hybridization event (e.g., a loop structure or hairpin structure). A
polynucleotide (e.g., gRNA) can comprise at least 70%, at least 80%, at least 90%, at least
95%, at least 99%, or 100% sequence complementarity to a target region within the target
nucleic acid sequence to which it is targeted. For example, a gRNA in which 18 of 20
nucleotides of the gRNA are complementary to a target region, and which would therefore
specifically hybridize, would represent 90 percent complementarity. In this example, the
remaining noncomplementary nucleotides may be clustered or interspersed with
complementary nucleotides and need not be contiguous to each other or to complementary
nucleotides. Percent complementarity between particular stretches of nucleic acid sequences
within nucleic acids can be determined routinely using BLAST programs (basic local
alignment search tools) and PowerBLAST programs known in the art (Altschul et al., J. Mol.
Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) or by using
the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics
Computer Group, University Research Park, Madison Wis.), using default settings, which
uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
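The 18-of-20 example above can be reproduced with a short calculation. The sketch below is a simplified, position-by-position count for an ungapped, antiparallel pairing of a guide RNA with a DNA strand of equal length; it is not BLAST or the Gap program, and G-U wobble pairs are deliberately not counted. The names `perfect_target` and `percent_complementarity` are hypothetical.

```python
# Watson-Crick DNA partner for each RNA base; G-U wobble pairs are not counted.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def perfect_target(guide_rna: str) -> str:
    """Return the DNA strand (written 5'->3') fully complementary to the guide."""
    return "".join(RNA_TO_DNA_PARTNER[b] for b in reversed(guide_rna.upper()))


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percent of guide positions that pair with the antiparallel target strand."""
    guide = guide_rna.upper()
    target = target_dna.upper()[::-1]  # read the target 3'->5' to align antiparallel
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target must be non-empty and of equal length")
    matches = sum(RNA_TO_DNA_PARTNER.get(g) == t for g, t in zip(guide, target))
    return 100.0 * matches / len(guide)
```

With a 20-nucleotide guide and a target carrying two mismatched positions, the function returns 90.0, matching the 90 percent figure in the text.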
The protein-binding segment of a subject gRNA interacts with a Cas protein.
The subject gRNA directs the bound polypeptide to a specific nucleotide sequence within
target DNA via the DNA-targeting segment. The protein-binding segment of a subject gRNA
can comprise two stretches of nucleotides that are complementary to one another. The
complementary nucleotides of the protein-binding segment hybridize to form a
double-stranded RNA duplex (dsRNA). The protein-binding segment of a subject gRNA interacts
with the Cas protein, and the gRNA directs the bound Cas protein to a specific nucleotide
sequence within the target DNA via the DNA-targeting segment.
In certain embodiments, a gRNA as described herein comprises two separate
RNA molecules. Each of the two RNA molecules of a subject gRNA comprises a stretch of
nucleotides that are complementary to one another such that the complementary nucleotides
of the two RNA molecules hybridize to form the double-stranded RNA duplex (e.g., hairpin)
of the protein-binding segment. A subject gRNA can comprise any corresponding crRNA and
tracrRNA pair. In the methods described herein, the gRNA can be used as a complex (e.g.,
gRNA-Cas complex) of crRNA and tracrRNA, or the crRNA and corresponding tracrRNA
can be delivered separately. For example, if multiple gRNAs are used for a cleavage reaction,
individual crRNAs specific for each target site can be delivered separately from a standard
tracrRNA that can complex with each crRNA. In such a method, the crRNAs can complex
with the standard tracrRNA in order to direct a Cas protein to the target site.
Guide RNAs can include modifications or sequences that provide for
additional desirable features (e.g., modified or regulated stability; subcellular targeting;
tracking with a fluorescent label; a binding site for a protein or protein complex; and the
like). Non-limiting examples of such modifications include, for example, a 5' cap (e.g., a
7-methylguanylate cap (m7G)); a 3' polyadenylated tail (i.e., a 3' poly(A) tail); a riboswitch
sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins
and/or protein complexes); a stability control sequence; a sequence that forms a dsRNA
duplex (i.e., a hairpin); a modification or sequence that targets the RNA to a subcellular
location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence
that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a
moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection,
and so forth); a modification or sequence that provides a binding site for proteins (e.g.,
proteins that act on DNA, including transcriptional activators, transcriptional repressors,
DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone
deacetylases, and the like); and combinations thereof.
Guide RNAs can be provided in any form. For example, the gRNA can be
provided in the form of RNA, either as two molecules (separate crRNA and tracrRNA) or as
one molecule (sgRNA), and optionally in the form of a complex with a Cas protein. The
gRNA can also be provided in the form of DNA encoding the RNA. The DNA encoding the
gRNA can encode a single RNA molecule (sgRNA) or separate RNA molecules (e.g.,
separate crRNA and tracrRNA). In the latter case, the DNA encoding the gRNA can be
provided as separate DNA molecules encoding the crRNA and tracrRNA, respectively.
DNAs encoding gRNAs can be stably integrated in the genome of the cell and
operably linked to a promoter active in the cell. Alternatively, DNAs encoding gRNAs can be
operably linked to a promoter in an expression construct. For example, the DNA encoding
the gRNA can be in the targeting vector comprising the nucleic acid insert and/or a vector
comprising the nucleic acid encoding the Cas protein, or it can be in a vector or a plasmid
that is separate from the targeting vector comprising the nucleic acid insert and/or separate
from a vector comprising the nucleic acid encoding the Cas protein. Such promoters can be
active, for example, in a pluripotent rat, eukaryotic, mammalian, non-human mammalian,
human, rodent, mouse, or hamster cell. Such promoters can be, for example, conditional
promoters, inducible promoters, constitutive promoters, or tissue-specific promoters. In some
instances, the promoter is an RNA polymerase III promoter, such as a human U6 promoter, a
rat U6 polymerase III promoter, or a mouse U6 polymerase III promoter. Examples of other
promoters are described elsewhere herein. When a DNA encoding a gRNA is introduced into
the cell, the gRNA can be transiently, conditionally, or constitutively expressed in the cell.
Alternatively, gRNAs can be prepared by various other methods. For
example, gRNAs can be prepared by in vitro transcription using, for example, T7 RNA
polymerase (see, for example, and ). Guide RNAs can
also be synthetically produced molecules prepared by chemical synthesis.
C. CRISPR RNA Recognition Sequences
The term "CRISPR RNA recognition sequence" includes nucleic acid
sequences present in a target DNA to which a DNA-targeting segment of a gRNA will bind,
provided sufficient conditions for binding exist. For example, CRISPR RNA recognition
sequences include sequences to which a guide RNA is designed to have complementarity,
where hybridization between a CRISPR RNA recognition sequence and a DNA targeting
sequence promotes the formation of a CRISPR complex. Full complementarity is not
necessarily required, provided there is sufficient complementarity to cause hybridization and
promote formation of a CRISPR complex. CRISPR RNA recognition sequences also include
cleavage sites for Cas proteins, described in more detail below. A CRISPR RNA recognition
sequence can comprise any polynucleotide, which can be located, for example, in the nucleus
or cytoplasm of a cell or within an organelle of a cell, such as a mitochondrion or chloroplast.
The CRISPR RNA recognition sequence within a target DNA can be targeted
by (i.e., be bound by, or hybridize with, or be complementary to) a Cas protein or a gRNA.
Suitable DNA/RNA binding conditions include physiological conditions normally present in
a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system)
are known in the art (see, e.g., Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook
et al., Cold Spring Harbor Laboratory Press 2001)). The strand of the target DNA that is
complementary to and hybridizes with the Cas protein or gRNA can be called the "complementary strand,"
and the strand of the target DNA that is complementary to the "complementary strand" (and
is therefore not complementary to the Cas protein or gRNA) can be called the
"noncomplementary strand" or "template strand."
The Cas protein can cleave the nucleic acid at a site within or outside of the
nucleic acid sequence present in the target DNA to which the DNA-targeting segment of a
gRNA will bind. The “cleavage site” includes the position of a nucleic acid at which a Cas
protein produces a single-strand break or a double-strand break. For example, formation of a
CRISPR complex (comprising a gRNA hybridized to a CRISPR RNA recognition sequence
and complexed with a Cas protein) can result in cleavage of one or both strands in or near
(e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the nucleic acid
sequence present in a target DNA to which a DNA-targeting segment of a gRNA will bind.
If the cleavage site is outside of the nucleic acid sequence to which the DNA-targeting
segment of the gRNA will bind, the cleavage site is still considered to be within the “CRISPR
RNA recognition sequence.” The cleavage site can be on only one strand or on both strands
of a nucleic acid. Cleavage sites can be at the same position on both strands of the nucleic
acid (producing blunt ends) or can be at different sites on each strand (producing staggered
ends). Staggered ends can be produced, for example, by using two Cas proteins, each of
which produces a single-strand break at a different cleavage site on each strand, thereby
producing a double-strand break. For example, a first nickase can create a single-strand
break on the first strand of double-stranded DNA (dsDNA), and a second nickase can create a
single-strand break on the second strand of dsDNA such that overhanging sequences are
created. In some cases, the CRISPR RNA recognition sequence of the nickase on the first
strand is separated from the CRISPR RNA recognition sequence of the nickase on the second
strand by at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 250, 500, or 1,000
base pairs.
Site-specific cleavage of target DNA by Cas9 can occur at locations
determined by both (i) base-pairing complementarity between the gRNA and the target DNA
and (ii) a short motif, called the protospacer adjacent motif (PAM), in the target DNA. The
PAM can flank the CRISPR RNA recognition sequence. Optionally, the CRISPR RNA
recognition sequence can be flanked by the PAM. For example, the cleavage site of Cas9 can
be about 1 to about 10 or about 2 to about 5 base pairs (e.g., 3 base pairs) upstream or
downstream of the PAM sequence. In some cases (e.g., when Cas9 from S. pyogenes or a
closely related Cas9 is used), the PAM sequence of the non-complementary strand can be
5'-N1GG-3', where N1 is any DNA nucleotide and is immediately 3' of the CRISPR RNA
recognition sequence of the non-complementary strand of the target DNA. As such, the PAM
sequence of the complementary strand would be 5'-CCN2-3', where N2 is any DNA
nucleotide and is immediately 5' of the CRISPR RNA recognition sequence of the
complementary strand of the target DNA. In some such cases, N1 and N2 can be
complementary and the N1-N2 base pair can be any base pair (e.g., N1=C and N2=G; N1=G
and N2=C; N1=A and N2=T; N1=T and N2=A).
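The relationship between the 5'-N1GG-3' PAM on the non-complementary strand and the 5'-CCN2-3' sequence on the complementary strand is simply a reverse complement, which the following minimal sketch makes explicit. It also computes a cut position 3 base pairs upstream of the PAM, using the illustrative "3 base pairs" value mentioned above. The function names and the zero-based indexing convention are assumptions made for this illustration.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(seq.upper()))


def complementary_strand_pam(pam: str) -> str:
    """Given a 5'-NGG-3' PAM, return the 5'-CCN-3' sequence on the other strand."""
    pam = pam.upper()
    if len(pam) != 3 or pam[0] not in DNA_COMPLEMENT or pam[1:] != "GG":
        raise ValueError("expected a three-nucleotide NGG PAM")
    return reverse_complement(pam)


def cut_index(pam_start: int, offset: int = 3) -> int:
    """Zero-based index of the first base 3' of a cut made `offset` bp upstream of the PAM."""
    if offset < 0 or offset > pam_start:
        raise ValueError("offset must lie between 0 and pam_start")
    return pam_start - offset
```

For instance, a TGG PAM (N1=T) corresponds to CCA on the complementary strand (N2=A), and a PAM starting at index 20 gives a cut between indices 16 and 17.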
Examples of CRISPR RNA recognition sequences include a DNA sequence
complementary to the DNA-targeting segment of a gRNA, or such a DNA sequence in
addition to a PAM sequence. For example, the target motif can be a 20-nucleotide DNA
sequence immediately preceding an NGG motif recognized by a Cas protein, such as
GN19NGG (SEQ ID NO: 8) or N20NGG (SEQ ID NO: 24) (see, for example, WC
65825). The guanine at the 5’ end can facilitate transcription by RNA polymerase in
cells. Other examples of CRISPR RNA recognition sequences can include two guanine
nucleotides at the 5’ end (e.g., GGN20NGG; SEQ ID NO: 25) to facilitate efficient
transcription by T7 polymerase in vitro. See, for example, . Other CRISPR
RNA recognition sequences can have between 4-22 nucleotides in length of SEQ ID NOS: 8,
24, and 25, including the 5’ G or GG and the 3’ GG or NGG. Yet other CRISPR RNA
recognition sequences can have between 14 and 20 nucleotides in length of SEQ ID NOS: 8,
24, and 25.
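To illustrate the target motifs described above, the following sketch scans one strand of a DNA sequence for N20NGG or GN19NGG motifs using regular expressions. It is a simplified example: it searches only the strand supplied, treats N as any of A, C, G, or T, and uses the hypothetical names `N20NGG`, `GN19NGG`, and `find_targets`.

```python
import re

# Lookahead patterns so that overlapping candidate sites are all reported.
N20NGG = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")
GN19NGG = re.compile(r"(?=(G[ACGT]{19}[ACGT]GG))")


def find_targets(dna: str, pattern: re.Pattern = N20NGG) -> list:
    """Return (start index, 23-nt motif) for every motif found on the given strand."""
    dna = dna.upper()
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna)]
```

A caller wishing to cover both strands would, under the same assumptions, run `find_targets` a second time on the reverse complement of the sequence.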
The CRISPR RNA recognition sequence can be any nucleic acid sequence
endogenous or exogenous to a cell. The CRISPR RNA recognition sequence can be a
sequence coding a gene product (e. g., a protein) or a non-coding sequence (e. g., a regulatory
sequence) or can include both.
In one embodiment, the Cas protein is a type I Cas protein. In one
embodiment, the Cas protein is a type II Cas protein. In one embodiment, the type II Cas
protein is Cas9. In one embodiment, the first nucleic acid sequence encodes a human
codon-optimized Cas protein.
In one embodiment, the gRNA comprises a nucleic acid sequence encoding a
crRNA and a tracrRNA. In specific embodiments, the Cas protein is Cas9. In some
embodiments, the gRNA comprises (a) the chimeric RNA of the nucleic acid sequence 5’-
GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU-3’
(SEQ ID NO: 1); or (b) the chimeric RNA of the nucleic acid sequence 5’-
GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCG-3’ (SEQ ID NO:
2). In another embodiment, the crRNA comprises 5’-
GUUUUAGAGCUAGAAAUAGCAAGUUAAAAU-3’ (SEQ ID NO: 3); 5’-
GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAG-3’ (SEQ ID NO: 4); or 5’-
GAGUCCGAGCAGAAGAAGAAGUUUUA-3’ (SEQ ID NO: 5). In yet other embodiments,
the tracrRNA comprises 5’-AAGGCUAGUCCG-3’ (SEQ ID NO: 6) or 5’-
AAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU-3’
(SEQ ID NO: 7).
V. Assembly of Polynucleotides
The methods disclosed herein can assemble at least two nucleic acids under
conditions effective to join the DNA molecules to form a substantially intact or seamless
double-stranded DNA molecule. Any nucleic acids of interest having overlapping sequences
can be assembled according to the methods disclosed herein. For example, any DNA
molecules of interest having overlapping sequences can be assembled, including DNAs
which are naturally occurring, cloned DNA molecules, synthetically generated DNAs, etc.
The joined DNA molecules may, if desired, be cloned (e.g., inserted) into a vector using a
method of the invention. Assembling two nucleic acids includes any method of joining
strands of two nucleic acids. For example, assembly includes joining digested nucleic acids
such that strands from each nucleic acid anneal to the other, followed by extension, in which
each strand serves as a template for extension of the other.
In some embodiments, nucleic acids are assembled with a joiner oligo such
that each nucleic acid is assembled to the joiner oligo instead of being assembled directly
together. Assembly with a joiner oligo can position nucleic acid bases between the nucleic
acids that are being assembled that are not part of the nucleic acids to be assembled, but are
part of the joiner oligo. Thus, nucleic acids can be successfully assembled even if extra bases
remain between the nucleic acids. Alternatively, a joiner oligo can be used for seamless
assembly, wherein no extra bases remain between the nucleic acids to be assembled.
In some embodiments, the nucleic acids can be prepared for assembly by
cleavage with a Cas protein, a restriction enzyme (restriction endonuclease) (e.g., any of the
various restriction endonucleases provided elsewhere herein), a meganuclease (e.g., any of
the various meganucleases provided elsewhere herein), or any combination thereof. For
example, one of the nucleic acids to be assembled can be cleaved with a Cas protein and
another nucleic acid to be assembled can be cleaved with a Cas protein, a restriction enzyme,
a meganuclease, or any combination thereof. Following cleavage with a nuclease, the
digested nucleic acid can be assembled directly to another digested nucleic acid having
overlapping end sequences or assembled to a nucleic acid that has not been digested but has
overlapping end sequences. The digested nucleic acid can also be assembled to another
nucleic acid by using a joiner oligo.
In embodiments employing a nuclease agent (e.g., a Cas protein) to produce
overlapping end sequences between two nucleic acid molecules, rapid combinatorial methods
can be used to assemble the digested nucleic acids. For example, a first and a second nucleic
acid having overlapping ends can be combined with a ligase, exonuclease, DNA polymerase,
and nucleotides and incubated at a constant temperature, such as at 50 °C. Specifically, a T5
exonuclease could be used to remove nucleotides from the 5’ ends of dsDNA producing
complementary overhangs. The complementary single-stranded DNA overhangs can then be
annealed, DNA polymerase used for gap filling, and Taq DNA ligase used to seal the
resulting nicks at 50 °C. Thus, two nucleic acids sharing overlapping end sequences can be
joined into a covalently sealed molecule in a one-step isothermal reaction. See, for example,
Gibson, et al. (2009) Nature Methods 6(5): 343-345, herein incorporated by reference in its
entirety. In some embodiments, proteinase K or phenol/chloroform/isoamylalcohol (PCI)
purification is used to remove the nuclease agent (e.g., Cas protein) from the reaction
mixture. In some embodiments, the nuclease agent (e.g., Cas protein) can be removed from
the reaction mixture by silica gel-based column purification.
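The outcome of joining two nucleic acids that share overlapping end sequences can be modeled in silico. The sketch below is only a sequence-level illustration of the end product, not a model of the enzymatic steps: it looks for the longest suffix of the first sequence that equals a prefix of the second (both written as top strands, 5' to 3') and merges them through that overlap. The names `find_end_overlap` and `assemble`, and the default minimum overlap of 20 bases, are assumptions chosen for illustration.

```python
def find_end_overlap(first: str, second: str, min_overlap: int = 20) -> int:
    """Length of the longest suffix of `first` equal to a prefix of `second` (0 if none)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    max_len = min(len(first), len(second))
    for size in range(max_len, min_overlap - 1, -1):
        if first[-size:] == second[:size]:
            return size
    return 0


def assemble(first: str, second: str, min_overlap: int = 20) -> str:
    """Join two top-strand sequences through their shared end overlap."""
    size = find_end_overlap(first, second, min_overlap)
    if size == 0:
        raise ValueError("no overlapping end sequence of sufficient length")
    return first + second[size:]
```

The overlap appears only once in the assembled product, which is the in-silico counterpart of the covalently sealed molecule described above.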
In certain embodiments the methods disclosed herein assemble a vector with a
linear polynucleotide. In other embodiments, the methods disclosed herein assemble at least
two vectors, such as two BAC vectors. The term “BAC vector” includes any bacterial
artificial chromosome. In specific embodiments, the BAC is modified to contain a region
with a nucleotide sequence that overlaps with the nucleotide sequence of a region of a linear
nucleic acid or another vector, for example, another BAC.
First and second single stranded nucleic acids have overlapping ends when the
respective ends are complementary to one another. First and second double stranded nucleic
acids have overlapping ends when a 5’ end of a strand of the first nucleic acid is
complementary to the 3’ end of a strand of the second nucleic acid and vice versa. For
example, for double stranded overlapping end sequences, the strands of one nucleic acid can
have at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least
98%, at least 99%, or 100% identity to a corresponding strand of the other nucleic acid. In
methods disclosed herein, the 5' end of a strand of a dsDNA molecule to be assembled shares
overlapping end sequences with the 3’ end of a strand of the other dsDNA molecule. The
term “overlapping end sequences” includes both strands of a dsDNA molecule. Thus, one
strand from the overlapping region can hybridize specifically to its complementary strand
when the complementary regions of the overlapping sequences are presented in
single-stranded overhangs from the 5’ and 3’ ends of the two polynucleotides to be assembled. In
some embodiments, an exonuclease is used to remove nucleotides from the 5' or 3' end to
create overhanging end sequences. In some embodiments, the overlapping region of the first
and/or second nucleic acid does not exist on the 5' or 3' end until after digestion with a Cas
protein. That is, the overlapping region can be an internal region that is subsequently
converted to an overlapping end sequence following digestion of the nucleic acid(s)
containing the internal overlapping region with a Cas protein. The Cas protein can cleave at a
target site (e.g., cleavage site) within the overlapping region or outside of the overlapping
region.
The length of the overlapping region is preferably of sufficient length such
that the region occurs only once within any of the nucleic acids being assembled. In this
manner, other polynucleotides are prevented from annealing with the end sequences and the
assembly can be specific for the target nucleic acids. The length of the overlapping region can
vary from a minimum of about 10 base pairs (bp) to about 300 bp or more. In general, it is
preferable that the length of the overlap is less than or equal to about the size of the
polynucleotide to be combined, but not less than about 10 bp and not more than about 1000
bp. For the joining of 2 or 3 polynucleotides, about 20-30 bp overlap may be sufficient. For
more than 10 fragments, a preferred overlap is about 80 bp to about 300 bp. In one
embodiment, the overlapping region is of a length that allows it to be generated readily by
synthetic methods, e.g., about 40 bp. In specific embodiments, the length of the overlapping
region can be about 20-200 bp. The overlaps can be about 10, 20, 30, 40, 50, 60, 70, 80, 90,
100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950 or
1,000 bp in length. In some embodiments, the length of the overlapping region is from
20-200 bp. In specific embodiments of the methods disclosed herein at least two polynucleotides
can be assembled wherein an overlapping region on at least one of the polynucleotides is
generated by contact with a nuclease agent (e.g., a gRNA-Cas complex). For example,
endonuclease digestion of a first polynucleotide can create sequences that overlap with the
end sequences of a second polynucleotide, wherein the overlapping end sequences are then
assembled.
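The requirement that the overlapping region occur only once within each nucleic acid being assembled can be checked computationally. The sketch below is one possible interpretation: it counts occurrences of a candidate overlap on both strands of each sequence and accepts the overlap only if every sequence contains it exactly once. The function names are hypothetical, and a real design would typically also tolerate or screen for near matches, which this exact-match sketch does not do.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_overlapping(seq: str, sub: str) -> int:
    """Count occurrences of `sub` in `seq`, including overlapping ones."""
    count, start = 0, seq.find(sub)
    while start != -1:
        count += 1
        start = seq.find(sub, start + 1)
    return count


def occurrences_on_both_strands(seq: str, region: str) -> int:
    """Count exact occurrences of `region` on either strand of `seq`."""
    seq, region = seq.upper(), region.upper()
    rc = reverse_complement(region)
    total = count_overlapping(seq, region)
    if rc != region:  # avoid double counting palindromic regions
        total += count_overlapping(seq, rc)
    return total


def region_is_unique(region: str, nucleic_acids: list) -> bool:
    """True if `region` occurs exactly once in each of the given nucleic acids."""
    return all(occurrences_on_both_strands(s, region) == 1 for s in nucleic_acids)
```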
In the methods disclosed herein, the overlapping sequences can be contacted
with an exonuclease to expose complementary sequences (e.g., complementary single strand
sequences) between the overlapping sequences. The exonuclease digestion is carried out
under conditions that are effective to remove (“chew back”) a sufficient number of
nucleotides to allow for specific annealing of the exposed single-stranded regions of
complementarity. In general, a portion of the region of overlap or the entire region of overlap
is chewed back, leaving overhangs which comprise a portion of the region of overlap or the
entire region of overlap. In some methods, the exonuclease digestion may be carried out by a
polymerase in the absence of dNTPs (e.g., T5 DNA polymerase) whereas in other methods,
the exonuclease digestion may be carried out in the presence of dNTPs by an exonuclease
that lacks polymerase activity (e.g., exonuclease III).
Any of a variety of 5' to 3', double-strand specific exodeoxyribonucleases may
be used to chew back the ends of nucleic acids in the methods disclosed herein. The term "5'
exonuclease" is sometimes used herein to refer to a 5' to 3' exodeoxyribonuclease. A
"non-processive" exonuclease, as used herein, is an exonuclease that degrades a limited number of
(e.g., only a few) nucleotides during each DNA binding event. Digestion with a 5'
exonuclease produces 3' single-stranded overhangs in the DNA molecules. Among other
properties which are desirable for a 5' exonuclease are that it lacks 3' exonuclease activity, it
generates 5' phosphate ends, and it initiates degradation from both 5'-phosphorylated and
unphosphorylated ends. It is also desirable that the enzyme can initiate digestion from the 5' end
of a molecule, whether it is a blunt end, or it has a small 5' or 3' recessed end. Suitable
exonucleases will be evident to the skilled worker. These include, e.g., phage T5 exonuclease
(phage T5 gene D15 product), phage lambda exonuclease, RecE of Rac prophage,
exonuclease VIII from E. coli, phage T7 exonuclease (phage T7 gene 6 product), or any of a
variety of 5' exonucleases that are involved in homologous recombination reactions. In one
embodiment of the invention, the exonuclease is T5 exonuclease or lambda exonuclease. In
another embodiment, the exonuclease is T5 exonuclease. In another embodiment, the
exonuclease is not phage T7 exonuclease. Methods for preparing and using exonucleases and
other enzymes employed in methods of the invention are conventional; and many are
available from commercial sources, such as USB Corporation, 26111 Miles Road, Cleveland,
Ohio 44128, or New England Biolabs, Inc. (NEB), 240 County Road, Ipswich, Mass. 01938-
2723.
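The statement that digestion with a 5' exonuclease produces 3' single-stranded overhangs can be illustrated with a simple strand model. In the sketch below, a blunt dsDNA fragment is represented by its top strand; chewing back `n` nucleotides from the 5' end of each strand leaves a 3' overhang at each end. The function names and the assumption that exactly `n` nucleotides are removed from each 5' end are simplifications made for this illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def chew_back_5prime(top: str, n: int) -> tuple:
    """Remove n nucleotides from the 5' end of each strand of a blunt dsDNA.

    Returns (remaining top strand, remaining bottom strand), both written 5'->3'.
    """
    if n < 1 or 2 * n > len(top):
        raise ValueError("n must be at least 1 and at most half the fragment length")
    top = top.upper()
    bottom = reverse_complement(top)
    return top[n:], bottom[n:]


def three_prime_overhangs(top: str, n: int) -> tuple:
    """Return the single-stranded 3' overhangs of the top and bottom strands (5'->3')."""
    top_remaining, bottom_remaining = chew_back_5prime(top, n)
    return top_remaining[-n:], bottom_remaining[-n:]
```

When two fragments share an end overlap, the 3' overhang on the top strand of the first fragment is the reverse complement of the 3' overhang on the bottom strand of the second, so the two overhangs can anneal.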
Particularly, in embodiments where the region of overlap is very long, it may
only be necessary to chew back a portion of the region (e.g., more than half of the region of
overlap), provided that the single-stranded overhangs thus generated are of sufficient length
and base content to anneal specifically under the conditions of the reaction. The term
"annealing specifically" includes situations wherein a particular pair of single-stranded
overhangs will anneal preferentially (or exclusively) to one another, rather than to other
single-stranded overhangs (e.g., non-complementary overhangs) which are present in the
reaction mixture. By "preferentially" is meant that at least about 95% of the overhangs will
anneal to the complementary overhang. A skilled worker can readily determine the optimal
length for achieving specific annealing of a sequence of interest under a given set of reaction
conditions. Generally, the homologous regions of overlap (the single-stranded overhangs or
their complements) contain identical sequences. However, partially identical sequences may
be used, provided that the single-stranded overhangs can anneal specifically under the
conditions of the reactions.
In certain embodiments, the nuclease agent (e. g., a Cas protein) can create
single strand breaks (i.e., “nicks”) at the target site without cutting both strands of dsDNA. A
“nickase” includes a nuclease agent (e.g., a Cas protein) that creates nicks in dsDNA. In this
manner, two separate nuclease agents (e.g., Cas proteins) (e.g., nickases) specific for a target
site on each strand of dsDNA can create overhanging sequences complementary to
overhanging sequences on another nucleic acid, or a separate region on the same nucleic acid.
The overhanging ends created by contacting a nucleic acid with two nickases specific for
target sites on both strands of dsDNA can be either 5’ or 3’ overhanging ends. For example, a
first nickase can create a single strand break on the first strand of dsDNA, while a second
nickase can create a single strand break on the second strand of dsDNA such that
overhanging sequences are created. The target sites of each nickase creating the single strand
break can be selected such that the overhanging end sequences created are complementary to
overhanging end sequences on a second nucleic acid. Accordingly, the complementary
overhanging ends of the first and second nucleic acid can be annealed by the methods
disclosed herein. In some embodiments, the target site of the nickase on the first strand is
different from the target site of the nickase on the second strand. Different target sites on
separate strands of dsDNA result in single strand breaks separated by at least 2, 3, 4, 5, 6, 7,
8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 250, 500, or 1,000 base pairs.
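Whether a pair of offset nicks yields 5' or 3' overhanging ends follows from the relative positions of the two nicks. The sketch below is a purely geometric model under stated assumptions: positions are zero-based indices on the top strand (written 5' to 3'), a nick at index `i` falls between bases `i-1` and `i`, the bottom-strand nick is expressed in the same top-strand coordinates, and the two nicked strands are assumed to separate. The function name `paired_nick_overhang` is hypothetical.

```python
def paired_nick_overhang(top: str, top_nick: int, bottom_nick: int) -> tuple:
    """Classify the ends produced by one nick on each strand of a dsDNA.

    Returns (end type, overhang sequence in top-strand coordinates).
    """
    length = len(top)
    if not (0 < top_nick < length and 0 < bottom_nick < length):
        raise ValueError("nick positions must fall inside the molecule")
    if top_nick == bottom_nick:
        return ("blunt", "")
    low, high = sorted((top_nick, bottom_nick))
    # If the top strand is nicked to the left of the bottom-strand nick, the
    # protruding single strands end in 5' termini; otherwise in 3' termini.
    kind = "5' overhang" if top_nick < bottom_nick else "3' overhang"
    return (kind, top.upper()[low:high])
```

The length of the returned overhang equals the separation between the two nick sites, which corresponds to the base-pair separations listed above.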
In certain embodiments, the second nucleic acid is also contacted with a first
nickase that creates a nick at a first target site on the second nucleic acid and a second nickase that
creates a nick at a second target site on the second nucleic acid molecule. The overhanging
end sequences created by the nicks at two different sites on the second nucleic acid can be
complementary to the overhanging end sequences created by nicks at two different sites on
the first nucleic acid so that the complementary overhanging end sequences anneal.
In some embodiments, the nucleic acid sequence of a gene of interest spans
across two or more BACs. In such cases, using the methods provided herein, specifically
designed nuclease agents can cut the two or more BACs at the desired locations and the
resulting nucleic acid fragments can be joined together to form the sequence of the gene of interest.
In some embodiments, the overhanging ends created by nicks at different
target sites on both strands of a first nucleic acid are not complementary to the overhanging
ends created by nicks at different target sites on both strands of a second nucleic acid. In
other embodiments, the nucleic acids to be assembled do not have complementary ends such
that a separate nucleic acid is necessary to assemble the noncomplementary ends. A joiner
oligo can be used to join non-complementary ends of two nucleic acids. A “joiner oligo”
includes complementary arms including a polynucleotide or nucleic acid having a
complementary sequence to the ends of a different polynucleotide or nucleic acid. In some
embodiments, a joiner oligo has an arm complementary to a first nucleic acid on the 5’ end, a
central portion (spacer), and an arm complementary to a second nucleic acid on the 3’ end.
Thus, nucleic acids having non-complementary end sequences to each other can be
assembled by annealing each nucleic acid to the same joiner oligo following an exonuclease
treatment. In specific embodiments, the joiner oligo has a first arm complementary to the 5’
or 3' end sequence of a first digested nucleic acid and a second arm complementary to the 5'
or 3’ sequence of a second digested nucleic acid. The joiner oligo can join
non-complementary end sequences that are blunt or have 5’ or 3’ overhanging sequence.
The length of the complementary arm sequences of the joiner oligo should be
sufficient to anneal to the nucleic acids to be assembled following exonuclease treatment. For
example, the length of the complementary arm sequences of the joiner oligo can be at least
about 10, 20, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 110, 120, 130, 140, 150 bp or
more. In specific embodiments, the complementary arm is 15-120 bp, 20-100 bp, 30-90 bp,
-60 bp, or 20-80 bp. In one specific embodiment, the length of the complementary arm
sequences of the joiner oligo is 40 bp. The complementary arms of a joiner oligo can be of
different lengths. The spacer of the joiner oligo, between the end sequences complementary
to the nucleic acids to be assembled, can be at least about 20 bp, 30 bp, 35 bp, 40 bp, 45 bp,
50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 90 bp, 100 bp, 250 bp, 500 bp, 750 bp, 1000
bp, 2000 bp, 3000 bp, 4000 bp, 5000 bp, 8000 bp, 10 kb, 15 kb, 20 kb, or more. For example,
the spacer of a joiner oligo can include a BAC vector or LTVEC. In some embodiments, the
spacer of the joiner oligo can be designed to have sequences specific for detection or
sequences suitable for PCR in order to confirm successful assembly. In some embodiments,
the spacer of the joiner oligo can be designed to introduce one or more restriction enzyme
sites. In some embodiments, the spacer of the joiner oligo can be designed to introduce a drug
resistance gene or a reporter gene. In other embodiments, the spacer can contain at least 20 bp
from an end portion of a nucleic acid to be assembled in order to seamlessly assemble the
nucleic acids. For example, for seamless assembly the spacer can be about 45 bp.
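The arm-spacer-arm layout of a joiner oligo can be sketched at the sequence level. In the example below, all sequences are written in the top-strand sense for simplicity (in practice the arms anneal to complementary single-stranded regions exposed by exonuclease treatment). The function names, the 40-base default arm length (taken from the specific embodiment above), and the use of an EcoRI site (GAATTC) as the spacer are illustrative assumptions.

```python
def design_joiner(first: str, second: str, spacer: str = "", arm_len: int = 40) -> str:
    """Build a joiner: arm matching the end of `first`, spacer, arm matching the start of `second`."""
    if arm_len < 1 or len(first) < arm_len or len(second) < arm_len:
        raise ValueError("both nucleic acids must be at least arm_len long")
    return first[-arm_len:] + spacer + second[:arm_len]


def assemble_with_joiner(first: str, second: str, joiner: str, arm_len: int = 40) -> str:
    """Join `first` and `second` through a joiner whose arms match their ends."""
    if joiner[:arm_len] != first[-arm_len:] or joiner[-arm_len:] != second[:arm_len]:
        raise ValueError("joiner arms do not match the ends of the nucleic acids")
    return first[:-arm_len] + joiner + second[arm_len:]
```

In this model the assembled product is the first nucleic acid, followed by the spacer, followed by the second nucleic acid, so a spacer carrying a restriction site or a PCR-detectable sequence ends up between the two joined molecules.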
In some embodiments, the molar ratio of the nucleic acid to joiner oligo(s) can
be from about 1:1 to about 1:200. In some embodiments, the molar ratio of the nucleic acid
to joiner oligo(s) is about 1:1, 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, 1:10, 1:11, 1:12, 1:13, 1:14,
1:15, 1:16, 1:17, 1:18, 1:19, 1:20, 1:30, 1:40, 1:50, 1:60, 1:70, 1:80, 1:90, 1:100, 1:120,
1:140, 1:160, 1:180, or 1:200. In specific embodiments, the molar ratio of the nucleic acid to
joiner oligo(s) can be from about 1:6 to about 1:20. In one embodiment, the molar ratio is
about 1:6. In another embodiment, the molar ratio is about 1:20.
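Converting a target molar ratio into a mass of joiner oligo is a routine calculation. The sketch below assumes approximate average masses of 650 g/mol per base pair of dsDNA and 330 g/mol per nucleotide of single-stranded DNA; these are common rule-of-thumb values, not figures from this document, and the function names are hypothetical.

```python
AVG_G_PER_MOL_PER_BP = 650.0   # assumed average mass of one dsDNA base pair
AVG_G_PER_MOL_PER_NT = 330.0   # assumed average mass of one ssDNA nucleotide


def pmol_dsdna(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of dsDNA (ng) to picomoles."""
    return mass_ng * 1000.0 / (length_bp * AVG_G_PER_MOL_PER_BP)


def joiner_ng_needed(nucleic_acid_ng: float, nucleic_acid_bp: int,
                     joiner_nt: int, ratio: float = 6.0) -> float:
    """Mass (ng) of a single-stranded joiner oligo for a 1:`ratio` molar ratio."""
    joiner_pmol = pmol_dsdna(nucleic_acid_ng, nucleic_acid_bp) * ratio
    return joiner_pmol * joiner_nt * AVG_G_PER_MOL_PER_NT / 1000.0
```

Under these assumptions, 650 ng of a 1,000 bp nucleic acid is 1 pmol, so a 1:6 ratio with a 100-nucleotide joiner oligo calls for 6 pmol, or about 198 ng, of the oligo.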
In specific embodiments, a joiner oligo is used to seamlessly assemble at least
two nucleic acids. “Seamless” assembly refers to assembly of two nucleic acids wherein no
intervening nucleic acid bases are present between the adjacent ends of the nucleic acids to be
assembled. For example, seamlessly assembled nucleic acids have no nucleic acid bases
present that are not a part of the nucleic acids to be assembled. In order to seamlessly
assemble two nucleic acids, the spacer of a joiner oligo should include nucleic acid sequence
identical to an end portion of either the first or second nucleic acid to be assembled. This end
portion should be removed from the nucleic acid prior to assembling with the joiner oligo.
For example, the end portion can be cleaved by a nuclease agent (e.g., a gRNA-Cas complex)
at least 20 bp from the end of the nucleic acid, such as at least 40 bp or at least 45 bp from the end
of the nucleic acid. Alternatively, the end portion can be cleaved by a nuclease agent (e.g., a
gRNA-Cas complex) at least 2, at least 4, at least 6, at least 8, at least 10, at least 12, at least
, at least 20, at least 25, at least 30, at least 35, at least 37, at least 40, at least 42, at least
45, at least 48, at least 50, at least 55, at least 60, at least 65, at least 70, at least 80, at least
100, at least 110, at least 120, at least 130, at least 140, at least 150 bp from the end of the
nucleic acid to be assembled.
In one embodiment, the joiner oligo can comprise from the 5’ end to the 3’
end: about a 15-120 bp overlap to the 5’ nucleic acid, about 20-50 bp of a 3’ end region of the
5’ nucleic acid, and about a 15-120 bp overlap to the 3’ nucleic acid. In one embodiment, the
joiner oligo can comprise from the 5’ end to the 3’ end: about a 15-120 bp overlap to the 5’
nucleic acid, about 20-50 bp of a 5’ end region of the 3’ nucleic acid, and about a 15-120 bp
overlap to the 3’ nucleic acid. Thus, when the joiner oligo is assembled to the first and second
nucleic acid, the spacer from the joiner oligo reconstitutes the section removed from the
nucleic acid prior to assembly. See, and The term “reconstitutes” includes
replacement of the end portion of the nucleic acid that was cleaved in order to provide a
complete assembled nucleic acid when assembled to the joiner oligo. For example,
reconstituting the cleaved nucleic acid replaces the cleaved portion of the nucleic acid with a
nucleic acid included in the spacer of the joiner oligo having the identical sequence to that of
the cleaved portion.
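The joiner oligo layout described above (overlap to the 5’ nucleic acid, spacer that restores the removed end portion, overlap to the 3’ nucleic acid) can be sketched as simple string operations on top-strand sequences. This is a minimal illustration only: the function names, the default 45 bp cut position, and the default 40 bp overlaps are hypothetical choices within the ranges stated in the text, and real assembly acts on double-stranded DNA with enzymes rather than on strings.

```python
def design_seamless_joiner(first, second, cut_from_end=45, overlap=40):
    """Return (digested_first, joiner) for a seamless join of first + second.

    The last `cut_from_end` bases of `first` stand for the end portion removed
    by the nuclease agent; the joiner carries them back as its spacer.
    """
    if overlap < 1 or cut_from_end < 1:
        raise ValueError("overlap and cut_from_end must be positive")
    if len(first) < cut_from_end + overlap or len(second) < overlap:
        raise ValueError("sequences are too short for the requested design")
    digested_first = first[:-cut_from_end]
    spacer = first[-cut_from_end:]  # 3' end region of the 5' nucleic acid
    joiner = digested_first[-overlap:] + spacer + second[:overlap]
    return digested_first, joiner


def assemble_with_joiner(digested_first, joiner, second, overlap=40):
    """Join the three pieces, keeping one copy of each overlapping region."""
    if overlap < 1 or len(joiner) < 2 * overlap:
        raise ValueError("joiner is too short for the stated overlap")
    if joiner[:overlap] != digested_first[-overlap:]:
        raise ValueError("5' overlap does not match the first nucleic acid")
    if joiner[-overlap:] != second[:overlap]:
        raise ValueError("3' overlap does not match the second nucleic acid")
    spacer = joiner[overlap:len(joiner) - overlap]
    return digested_first + spacer + second
```

In this model the assembled product equals the original first sequence followed directly by the second, which is the sense in which the spacer reconstitutes the cleaved portion without adding intervening bases.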
The joiner oligo can be assembled to a first and second nucleic acid molecule
simultaneously or sequentially. When assembled simultaneously, the joiner oligo can be
contacted with a first and second nucleic acid in the same reaction mixture such that the
resulting assembled nucleic acid comprises the first nucleic acid, joiner oligo, and second
nucleic acid. When assembled sequentially, the joiner oligo is contacted with the first nucleic
acid in an assembly reaction that produces an assembled nucleic acid comprising the first
nucleic acid assembled to the joiner oligo, but not the second nucleic acid. Such an assembled
nucleic acid can then be contacted with the second nucleic acid in a separate assembly
reaction that produces an assembled nucleic acid comprising the first nucleic acid, joiner
oligo, and second nucleic acid. In other embodiments, the joiner oligo is contacted with the
second nucleic acid in an assembly reaction that produces an assembled nucleic acid
comprising the second nucleic acid assembled to the joiner oligo, but not the first nucleic
acid. Such an assembled nucleic acid can then be contacted with the first nucleic acid in a
separate assembly reaction that produces an assembled nucleic acid comprising the first
nucleic acid, joiner oligo, and second nucleic acid.
Any number of joiner oligos can be used in the methods herein to assemble
nucleic acid molecules. For example, 1 joiner oligo can be used to assemble 2 nucleic acid
molecules, 2 joiner oligos can be used to assemble 3 nucleic acid molecules, 3 joiner oligos
can be used to assemble 4 nucleic acid molecules, 4 joiner oligos can be used to assemble 5
nucleic acid molecules, or 5 joiner oligos can be used to assemble 6 nucleic acid molecules.
The number of joiner oligos can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more depending on the
number of nucleic acid molecules to be assembled.
In some embodiments, the joiner oligo comprises a gBlock DNA. A “gBlock”
is a linear double stranded DNA fragment. The gBlock can be from about 50 bp to about
2000 bp. The gBlock can be from about 50 bp to about 100 bp, from about 100 bp to about
200 bp, from about 200 bp to about 300 bp, from about 300 bp to about 400 bp, from about
400 bp to about 500 bp, from about 500 bp to about 600 bp, from about 600 bp to about 800
bp, from about 800 bp to about 1000 bp, from about 1000 bp to about 1250 bp, from about
1250 bp to about 1500 bp, from about 1500 bp to about 1750 bp, or from about 1750 bp to
about 2000 bp.
Assembly of two or more nucleic acids with a gBlock can be screened, for
example, by PCR assays described elsewhere herein (e.g., Example 10). In some cases, the
gBlock does not comprise a selection cassette. Such a method allows for rapid joining of two
or more nucleic acid molecules that can be screened by a simple PCR assay. The gBlock can
comprise any nucleic acid sequence of interest. In some cases, the gBlock can comprise a
target site for a nuclease agent or a target site for any of the various meganucleases or
restriction enzymes provided herein. In other embodiments, a gBlock can comprise a
selection cassette. In some embodiments, the gBlock comprises a DNA sequence of interest.
In one embodiment, the gBlock comprises a human DNA sequence.
The nucleic acids to be assembled or any of the various joiner oligos can also
comprise a selection cassette or a reporter gene. The selection cassette can comprise a
nucleic acid sequence encoding a selection marker, wherein the nucleic acid sequence is
operably linked to a promoter. The promoter can be active in a prokaryotic cell of interest
and/or active in a eukaryotic cell of interest. Such promoters can be an inducible promoter, a
promoter that is endogenous to the reporter gene or the cell, a promoter that is heterologous
to the reporter gene or to the cell, a cell-specific promoter, a tissue-specific promoter or a
developmental stage-specific promoter. In one embodiment, the selection marker is selected
from neomycin phosphotransferase (neor), hygromycin B phosphotransferase (hygr),
puromycin-N-acetyltransferase (puror), blasticidin S deaminase (bsrr), xanthine/guanine
phosphoribosyl transferase (gpt), and herpes simplex virus thymidine kinase (HSV-k), and a
combination thereof. The selection marker of the targeting vector can be flanked by the
upstream and downstream homology arms or found either 5’ or 3’ to the homology arms.
In one embodiment, the nucleic acids to be assembled or any of the various
joiner oligos comprise a reporter gene operably linked to a promoter, wherein the reporter
gene encodes a reporter protein selected from the group consisting of LacZ, mPlum,
mCherry, tdTomato, mStrawberry, J-Red, DsRed, mOrange, mKO, mCitrine, Venus, YPet,
enhanced yellow fluorescent protein (EYFP), Emerald, enhanced green fluorescent protein
(EGFP), CyPet, cyan fluorescent protein (CFP), Cerulean, T-Sapphire, luciferase, alkaline
phosphatase, and a combination thereof. Such reporter genes can be operably linked to a
promoter active in the cell. Such promoters can be an inducible promoter, a promoter that is
endogenous to the reporter gene or the cell, a promoter that is heterologous to the reporter gene
or to the cell, a cell-specific promoter, a tissue-specific promoter or a developmental
stage-specific promoter.
Following the annealing of single stranded DNA (e.g., overhangs produced by
the action of exonuclease when the DNA molecules to be joined are dsDNA or overhangs
produced by creating nicks at different target sites on each strand), the single-stranded gaps
left by the exonuclease are filled in with a suitable, non-strand-displacing, DNA polymerase
and the nicks thus formed are sealed with a ligase. A "non-strand-displacing DNA polymerase,"
as used herein, is a DNA polymerase that terminates synthesis of DNA when it encounters
DNA strands which lie in its path as it proceeds to copy a dsDNA molecule, or that degrades
the encountered DNA strands as it proceeds while concurrently filling in the gap thus created,
thereby generating a "moving nick" (nick translation).
In some embodiments, overlapping end sequences have sufficient
complementarity between the overlapping regions to anneal the single-stranded
complementary ends of each polynucleotide. Following annealing of a single strand of a first
polynucleotide to the complementary strand of a second polynucleotide, the 3’ end of the first
polynucleotide can be extended based on the template of the second polynucleotide strand
and the 3’ end of the second polynucleotide strand can be extended based on the template of
the first polynucleotide strand. By extending the complementary 3’ end of each
polynucleotide, the polynucleotides can be assembled. Following assembly, nicks between
the extended 3’ end of a strand from one fragment and adjacent 5’ end of a strand from the
other fragment can be sealed by ligation. More specifically, the hydroxyl group of the
extended 3’ end of the first polynucleotide can be ligated to the phosphate group of the 5’ end
of the second polynucleotide, and the hydroxyl group of the extended 3’ end of the second
polynucleotide can be ligated to the phosphate group of the 5’ end of the first polynucleotide.
The ligation reaction can be performed by any of a variety of suitable
thermostable DNA ligases. Among the suitable ligases are, for example, Taq ligase,
Ampligase Thermostable DNA ligase (Epicentre Biotechnologies), the Thermostable ligases
disclosed in U.S. Pat. No. 6,576,453, and Thermostable Tfi DNA ligase from Bioneer, Inc.
A suitable amount of a crowding agent, such as PEG, in the reaction mixture
allows for, enhances, or facilitates molecular crowding. Without wishing to be bound by any
particular mechanism, it is suggested that a crowding agent, which allows for molecular
crowding, binds to and ties up water in a solution, allowing components of the solution to
come into closer contact with one another. For example, DNA molecules to be recombined
can come into closer proximity, which facilitates the annealing of the single-stranded
overhangs. Also, it is suggested that enzymes can come into closer contact with their DNA
substrates and can be stabilized by the removal of water molecules. A variety of suitable
crowding agents will be evident to the skilled worker. These include a variety of well-known
macromolecules, such as polymers, e.g., polyethylene glycol (PEG); Ficoll, such as Ficoll 70;
dextran, such as dextran 70; or the like. Much of the discussion in this application is directed
to PEG. However, the discussion is meant also to apply to other suitable crowding agents. A
skilled worker will recognize how to implement routine changes in the method in order to
accommodate the use of other crowding agents.
A suitable amount of a crowding agent, such as PEG, in the reaction mixture
allows for, enhances, or facilitates molecular crowding. For example, crowding agents can
help DNA molecules to be recombined come into closer proximity, which facilitates
the annealing of the single-stranded overhangs. Also, it is suggested that enzymes can come
into closer contact with their DNA substrates and can be stabilized by the removal of water
molecules. A variety of suitable crowding agents will be evident to the skilled worker. These
include a variety of well-known macromolecules, such as polymers, e.g., polyethylene glycol
(PEG); Ficoll, such as Ficoll 70; dextran, such as dextran 70; or the like. In general, when
PEG is used, a concentration of about 5% (weight/volume) is optimal. However, the amount
of PEG can range, e.g., from about 3 to about 7%. Any suitable size of PEG can be used, e.g.,
ranging from about PEG-200 (e.g., PEG-4000, PEG-6000, or PEG-8000) to about PEG-
, or even higher. In the Examples herein, PEG-8000 was used. The crowding agent
can, in addition to enhancing the annealing reaction, enhance ligation.
Reaction components (such as salts, buffers, a suitable energy source (such as
ATP or NAD), pH of the reaction mixture, etc.) that are present in an assembly reaction
mixture may not be optimal for the individual enzymes (exonuclease, polymerase, and
ligase); rather, they serve as a compromise that is effective for the entire set of reactions. For
example, one suitable buffer system identified by the inventors, sometimes referred to herein
as ISO (ISOthermal) Buffer, typically comprises 0.1 M Tris-Cl pH 7.5; 10 mM MgCl2;
0.2 mM each of dGTP, dATP, dTTP and dCTP; 10 mM DTT; 5% PEG-8000; and 1 mM
NAD.
In the methods disclosed herein, at least two nucleic acids are contacted with a
Cas protein and other enzymes under conditions effective to assemble the nucleic acids to
form an assembled double-stranded DNA molecule in which a single copy of the overlapping
region is retained. The described methods can be used to join any DNA molecules of interest,
including DNAs which are naturally occurring, cloned DNA molecules, synthetically
generated DNAs, etc. The joined DNA molecules may, if desired, be cloned into a vector
(e.g., using a method of the invention). In some embodiments, the nucleic acids to be
assembled are codon optimized for introduction and expression in a cell of interest (e.g., a
rodent cell, mouse cell, rat cell, human cell, mammalian cell, microbial cell, yeast cell,
etc.).
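The statement that a single copy of the overlapping region is retained can be illustrated with a short sketch that joins two top-strand sequences sharing a terminal overlap. The function name and the default minimum overlap of 15 bases are assumptions made for illustration, and the string comparison stands in for the annealing, extension, and ligation steps described in the text.

```python
def join_by_overlap(left, right, min_overlap=15):
    """Join two top-strand sequences that share a terminal overlap.

    Finds the longest suffix of `left` that equals a prefix of `right` and
    keeps only one copy of it in the product.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError("no terminal overlap of at least %d bases" % min_overlap)
```

Searching from the longest candidate downward means that an exact overlap is never counted twice in the joined sequence.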
DNA molecules of any length can be joined by methods disclosed herein. For
example, nucleic acids having about 100 bp to about 750 or 1,000, or more, can be joined.
The number of nucleic acids that may be assembled, in one or several assembly stages
according to the methods described herein, may be at least about 2, 3, 4, 6, 8, 10, 15, 20, 25,
50, 100, 200, 500, 1,000, 5,000, or 10,000 DNA molecules, for example in the range of about
2 to about 30 nucleic acids. The number of assembly stages may be about 2, 4, 6, 8, 10, or
more. The number of molecules assembled in a single stage may be in the range of about 2 to
about 10 molecules. The methods of the invention may be used to join together DNA
molecules or cassettes each of which has a starting size of at least or no greater than about 40
bp, 60 bp, 80 bp, 100 bp, 500 bp, 1 kb, 3 kb, 5 kb, 6 kb, 10 kb, 18 kb, 20 kb, 25 kb, 32 kb, 50
kb, 65 kb, 75 kb, 150 kb, 300 kb, 500 kb, 600 kb, 1 Mb, or larger. The assembled end
products may be at least about 500 bp, 1 kb, 3 kb, 5 kb, 6 kb, 10 kb, 18 kb, 20 kb, 25 kb, 32
kb, 50 kb, 65 kb, 75 kb, 150 kb, 300 kb, 500 kb, 600 kb, 1Mb, or larger, for example in the
range of 30 kb to 1 Mb.
In some embodiments, the assembled nucleic acids form a circle and/or
become ligated into a vector to form a circle. The lower size limit for a dsDNA to circularize
is about 200 base pairs. Therefore, the total length of the joined fragments (including, in some
cases, the length of the vector) is at least about 200 bp in length. There is no practical upper
size limit, and joined DNAs of a few hundred kilobase pairs, or larger, can be generated by
the methods disclosed herein. The joined nucleic acids can take the form of either a circle or a
linear molecule.
The methods described herein can be used to assemble a linear fragment with
another linear fragment, a linear fragment with a circular nucleic acid molecule, a circular
nucleic acid molecule with another circular nucleic acid molecule, or any combination of
linear and circular nucleic acids. A “vector” includes any circular nucleic acid molecule. In
certain embodiments, the vector assembled by the methods disclosed herein is a bacterial
artificial chromosome (BAC). The vector (e.g., the BAC) can include a human DNA, a
rodent DNA, a synthetic DNA, or any combination thereof. For example, the BAC can
comprise a human polynucleotide sequence. When joining a mixture of DNA molecules, it is
preferable that the DNAs be present in approximately equimolar amounts.
The nucleic acid used for assembly by the methods disclosed herein can be a
large targeting vector. The term “large targeting vector” or “LTVEC” includes vectors that
comprise homology arms that correspond to and are derived from nucleic acid sequences
used for homologous targeting in cells and/or comprise insert nucleic acids comprising
nucleic acid sequences intended to perform homologous recombination targeting in cells. For
example, the LTVEC makes possible the modification of large loci that cannot be
accommodated by traditional plasmid-based targeting vectors because of their size
limitations. In specific embodiments, the homology arms and/or the insert nucleic acid of the
LTVEC comprises genomic sequence of a eukaryotic cell. The size of the LTVEC is too
large to enable screening of targeting events by conventional assays, e.g., Southern blotting
and long-range (e.g., b) PCR. Examples of the LTVEC include, but are not limited
to, vectors derived from a bacterial artificial chromosome (BAC), a human artificial
chromosome or a yeast artificial chromosome (YAC). Non-limiting examples of LTVECs
and methods for making them are described, e.g., in US Pat. No. 6,586,251, 6,596,541,
7,105,348, and (PCT/US01/45375), and US 2013/0137101, each of which
is herein incorporated by reference.
In some embodiments, cassettes can be inserted into vectors that can later be
removed. Various forms of cassettes can be constructed to allow for deletion in specific cell
or tissue types, at specific developmental stages, or upon induction. Such cassettes can
employ a recombinase system in which the cassette is flanked on both sides by recombinase
recognition sites and can be removed using a recombinase expressed in the desired cell type,
expressed at the desired developmental stage, or expressed or activated upon induction. Such
cassettes can further be constructed to include an array of pairs of different recombinase
recognition sites that are placed such that null, conditional, or combination conditional/null
alleles can be generated, as described in US 2011/0104799, which is incorporated by
reference in its entirety. Regulation of recombinase genes can be controlled in various ways,
such as by operably linking a recombinase gene to a cell-specific, tissue-specific, or
developmentally regulated promoter (or other regulatory element), or by operably linking a
recombinase gene to a 3’-UTR that comprises a recognition site for an miRNA that is
transcribed only in particular cell types, tissue types, or developmental stages. A
recombinase can also be regulated, for example, by employing a fusion protein placing the
recombinase under the control of an effector or metabolite (e.g., CreERT2, whose activity is
positively controlled by tamoxifen), or by placing the recombinase gene under the control of
an inducible promoter (e.g., one whose activity is controlled by doxycycline and TetR or
TetR variants). Examples of various forms of cassettes and means of regulating recombinase
genes are provided, for example, in US 8,518,392; US 8,354,389; and US 8,697,851, each of
which is incorporated by reference in its entirety.
The vectors used for assembling as disclosed herein (e. g., LTVEC) can be of
any length, including, but not limited to, from about 20kb to about 400kb, from about 20kb to
about 30kb, from about 30kb to 40kb, from about 40kb to about 50kb, from about 50kb to
about 75kb, from about 75kb to about 100kb, from about 100kb to 125kb, from about 125kb
to about 150kb, from about 150kb to about 175kb, about 175kb to about 200kb, from about
200kb to about 225kb, from about 225kb to about 250kb, from about 250kb to about 275kb
or from about 275kb to about 300kb, from about 200kb to about 300kb, from about 300kb to
about 350kb, from about 350kb to about 400kb, from about 350kb to about 550kb. In one
embodiment, the LTVEC is about 100kb.
The methods provided herein for assembling nucleic acids can be designed so
as to allow for a deletion from about 5kb to about 10kb, from about 10kb to about 20kb, from
about 20kb to about 40kb, from about 40kb to about 60kb, from about 60kb to about 80kb,
from about 80kb to about 100kb, from about 100kb to about 150kb, or from about 150kb to
about 200kb, from about 200kb to about 300kb, from about 300kb to about 400kb, from
about 400kb to about 500kb, from about 500kb to about 1Mb, from about 1Mb to about
1.5Mb, from about 1.5Mb to about 2Mb, from about 2Mb to about 2.5Mb, or from about
2.5Mb to about 3Mb.
In other instances, the methods provided herein are designed so as to allow for
an insertion of an exogenous nucleic acid sequence ranging from about 5kb to about 10kb,
from about 10kb to about 20kb, from about 20kb to about 40kb, from about 40kb to about
60kb, from about 60kb to about 80kb, from about 80kb to about 100kb, from about 100kb to
about 150kb, from about 150kb to about 200kb, from about 200kb to about 250kb, from
about 250kb to about 300kb, from about 300kb to about 350kb, or from about 350kb to about
400kb. In one embodiment, the insert polynucleotide is about 130 kb or about 155kb.
Linear nucleic acids can be assembled with each other or to vectors by the
methods disclosed herein. The linear molecule can be a vector that has been digested by an
endonuclease (e.g., Cas protein) or any synthetic, artificial, or naturally occurring linear
nucleic acid. In certain embodiments, the linear nucleic acid is created such that the end
sequences overlap with a region of another nucleic acid. The overlapping end sequences of a
linear nucleic acid can be introduced by any method known in the art for generating
customized nucleic acid sequences. For example, the end sequences can be a portion of a
synthetically produced molecule, can be introduced by PCR, or can be introduced by
traditional cloning techniques.
EXAMPLES
The following examples are put forth so as to provide those of ordinary skill in
the art with a complete disclosure and description of how to make and use the present
invention, and are not intended to limit the scope of what the inventors regard as their
invention nor are they intended to represent that the experiments below are all or the only
experiments performed. Efforts have been made to ensure accuracy with respect to numbers
used (e.g., amounts, temperature, etc.) but some experimental errors and deviations should be
accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is
weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or
near atmospheric.
Example 1: BAC digest with CAS9 followed by assembly with a selection cassette
An artificial crRNA and an artificial tracrRNA were designed to target
specific sequences in the MAID 6177 (116 kb LTVEC) for assembly with a 3 kb PCR
product (UB-HYG). The PCR product contained 50 bp overlaps with the vector. First,
dissolve the crRNAs and tracrRNA to 100 uM in Duplex Buffer (30 mM HEPES, pH 7.5, 100
mM Potassium Acetate). In order to anneal the RNAs, add 10 ul of 100 uM crRNA and 10 ul
of 100 uM tracrRNA to 80 ul of annealing buffer. Heat the RNAs in a 90 °C temp block, then
remove the block from the heater and cool on the bench. Final concentration of RNA is about 10 uM.
In order to digest the BAC, clean maxiprep BAC DNA is used and the BAC is
digested according to the following mixture.
BAC DNA (500 ng)    X ul
BSA (100x)          0.5 ul
RNA                 2 ul (1 ul of each tracr:crRNA hybrid)
Cas9 (4.5 mg/ml)    1 ul
10x Buffer          1.5 ul
H2O                 to 15 ul
Digest for 1 hour at 37 °C then de-salt for 30 min. The final reaction buffer contains: 20 mM
Tris 7.5; 100-150 mM NaCl; 10 mM MgCl2; 1 mM DTT; 0.1 mM EDTA; 100 ug/ml BSA;
for a final volume of 15 ul.
In order to assemble the BAC and insert, digest a plasmid or perform PCR to
create an insert. For PCR reactions, run a small aliquot on a gel and look for a single product;
if the product has a single band, then do PCR cleanup instead of gel extraction. A 1:1-1:6
molar ratio for the BAC:Insert is desired. Usually, 50 ng of the purified insert will work. The
following reaction mix can be used:
BAC Digest          4 ul
Insert              1 ul
Assembly Mix        15 ul
Add the DNA and Mix on ice or directly in a PCR machine at 50 °C. Incubate
at 50 °C for 1 hour. Add 0.5 ul of Proteinase K (20 mg/ml) and incubate at 50 °C for 1 hour.
Desalt for 30 min and electroporate 8 ul of the reaction into DH10B cells. 10 ul of the BAC
Digest can be run on a pulsed-field gel to check digestion efficiency. Use RNase-free water
and buffers.
The assembly reaction is carried out as follows: Iso-Thermal Buffer: 3 mL 1M
Tris-HCl (pH 7.5); 150 ul 2M MgCl2; 60 ul 100 mM each: dGTP, dATP, dTTP, dCTP; 300
ul 1M DTT; 1.5 g PEG 8000; 300 ul 100 mM NAD. The iso-thermal Buffer is stored in 320
ul aliquots at -20 °C. The Master Mix is prepared as follows: 320 ul iso-thermal Buffer; 0.64
ul T5 exonuclease (stock conc=10 U/ul); 20 ul Phusion DNA polymerase (stock conc=2
U/ul); 160 ul Taq DNA Ligase (stock conc=40 U/ul); 699.36 ul H2O; mix together, and
aliquot at 15 ul or 30 ul and store at -20 °C. Use 15 ul master mix (MM) in a total volume of 20
ul reaction.
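The master mix volumes above can be checked with simple bookkeeping. The sketch below uses only the volumes and stock concentrations stated in the recipe; the dictionary and function names are hypothetical, and the enzyme amounts per reaction are arithmetic consequences of the recipe rather than values reported in the text.

```python
# Volumes (ul) and enzyme stock concentrations (U/ul) as listed in the recipe.
MASTER_MIX_UL = {
    "iso-thermal buffer": 320.0,
    "T5 exonuclease": 0.64,
    "Phusion DNA polymerase": 20.0,
    "Taq DNA ligase": 160.0,
    "H2O": 699.36,
}
STOCK_UNITS_PER_UL = {
    "T5 exonuclease": 10.0,
    "Phusion DNA polymerase": 2.0,
    "Taq DNA ligase": 40.0,
}


def master_mix_total_ul():
    """Total volume of one master mix batch."""
    return sum(MASTER_MIX_UL.values())


def units_per_reaction(enzyme, mix_ul=15.0):
    """Enzyme units carried into one reaction by `mix_ul` of master mix."""
    batch_units = MASTER_MIX_UL[enzyme] * STOCK_UNITS_PER_UL[enzyme]
    return batch_units * mix_ul / master_mix_total_ul()


def buffer_fraction_in_reaction(mix_ul=15.0, reaction_ul=20.0):
    """Fraction of the final reaction volume that is iso-thermal buffer."""
    in_mix = MASTER_MIX_UL["iso-thermal buffer"] / master_mix_total_ul()
    return in_mix * mix_ul / reaction_ul
```

With the stated volumes, the batch totals about 1200 ul and the iso-thermal buffer makes up one fifth of a 20 ul reaction, so each buffer component is diluted five-fold relative to the stored buffer.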
The tracr RNA sequence used in the example is:
CAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUC (SEQ ID NO: 9).
This CRISPR RNA (crRNA) contains: (1) about 20 nucleotides of RNA complementary to a
target sequence and (2) a tail sequence (GUUUUAGAGCUAUGCUGUUUUG (SEQ ID NO:
)) that will anneal to the tracrRNA.
These steps are outlined in
Example 2: Sewing together two overlapping BACs: Humanized HLA-DQ + Humanized
HLA-DR in mouse MHC II locus (H2-A/H2-E)
An artificial crRNA and an artificial tracrRNA were designed to target
specific sequences in the humanized HLA-DQ BAC for assembly with a humanized HLA-
DR BAC. The vectors contained ~70 bp overlaps with each other created by Cas9 cleavage at
two sites on each vector (See, . Dissolve crRNAs and tracrRNA to 100 uM in Hybe
Buffer. To anneal the RNAs, add 10 ul of 100 uM crRNA and 10 ul of 100 uM tracrRNA to
80 ul of Annealing buffer. Place the RNAs in a 90 °C heat block, then remove the block from the heater
and cool on the bench. Final concentration of RNA is about 10 uM.
In order to digest the BAC, clean maxiprep BAC DNA can be used. Each
BAC can be digested individually according to the following mixture:
BAC DNA (2.5 ug)    X ul
BSA (100x)          0.5 ul
RNA                 4 ul (2 ul of each tracr:crRNA hybrid)
Cas9 (4.5 mg/ml)    1 ul
10x Buffer          5 ul
H2O                 to 50 ul
The BAC vectors should be digested at 37 °C for 1 hour and then heat inactivated for 20 min
at 65 °C. Desalt for 30 min. The digested DNA was purified via
phenol/chloroform/isoamylalcohol (PCI) extraction and then resuspended in 35 ul TE buffer.
In order to assemble the vectors, use 2.5 ul of each of the BACs for the assembly
reaction as follows:
Digested BACs       5 ul (total)
Assembly Mix        15 ul
Add the DNA and Mix on ice or directly in a PCR machine at 50 °C. Incubate
at 50 °C for 1 hour. Desalt for 30 min and electroporate 8 ul of the assembled DNA into
DH10B cells. Use RNase-free water and buffers.
The tracr RNA sequence used in the example is:
CAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUC (SEQ ID NO: 9).
This CRISPR RNA (crRNA) contains: (1) about 20 nucleotides of RNA complementary to a
target sequence and (2) a tail sequence (GUUUUAGAGCUAUGCUGUUUUG (SEQ ID NO:
)) that will anneal to the tracrRNA.
These steps are outlined in
Example 3: Assembling of 2 Cas9-cleaved fragments from 2 different plasmids using linkers
In order to construct a targeting vector, pMJ8502x was cleaved with 2
identical crRNAs to drop out a 400 bp fragment and a 2283 bp Amp backbone. (. Qiagen
columns were used to purify the entire reaction. R6KZenUbiNeo was then cleaved with 2
different crRNAs to separate it into Neo resistance (1086 bp) and backbone (5390 bp). Qiagen
columns were used to purify the entire reaction. (. Cleavage reaction: 1170 ng DNA, 30
ul Buffer, 4 ul annealed RNA (@100 uM), 1.7 ul Cas9 (@0.89 ug/ul), H2O to 60 ul. The
mixture was incubated at 37 °C for 1 hour and purified on a Qiagen column before eluting in
30 ul elution buffer.
The cleaved fragments were then assembled with two linkers to result in a
seamless assembly according to the following reaction mixture: 0.5 ul linker1 (5 ng), 0.5 ul
linker2 (5 ng), 2 ul Neo cleavage (~60 ng), 2 ul Amp cleavage (~60 ng), 15 ul Assembly
Master Mix. The mixture was incubated at 50 °C for 1 hour, and the reaction was dialyzed
against H2O. 10 ul of the reaction was electroporated into electrocompetent Pir cells before
plating on Carb/Kan plates. PCR across the junction showed 6/8 selected colonies were correct,
and this was confirmed by sequencing.
Example 4: Replacement of a portion of a BAC with a cassette using linkers
In order to construct a knock out mouse targeting vector, 40 kb of a BAC
targeting vector was replaced with a selection cassette flanked by recombination recognition
sites. ( 2 linkers were designed to delete a region of interest from mBAC and to insert
the selection cassette, one for 5’ and one for 3’. The linkers had 40 bp overlap to mBAC and
40 bp overlap to a selection cassette. First, 39.5 kb of the 206 kb targeting vector (mBAC)
was cleaved according to the following reaction: 500 ul reaction (bring up with H2O): add 1
ul Cas9 (@0.89 ug/ul), 2 ul each RNA duplex (@50 uM), 250 ul buffer, 220 ul (12.5 ng)
BAC maxi prep, and incubate at 37 °C for 1 hour. The digested DNA was purified via
phenol/chloroform/isoamylalcohol (PCI) extraction and then resuspended in 55 ul TE buffer.
After PCI cleanup of the mBAC cleavage, assembly was done at 50 °C for 1 hr, and 10 ul of
the reaction was electroporated into DH10B cells. (. Sequencing across junctions
confirmed correct assembly. (). Linker 1 (joiner oligo 1) is seamless from mBAC
sequence to Cassette sequence (SEQ ID NO: 12). Linker 2 (joiner oligo 2) is seamless from
Cassette sequence to mBAC sequence (SEQ ID NO: 13).
Example 5: Assembling two BAC vectors using linkers (Joiner Oligos)
Stitching of 2 mBACs by Cas9/isothermal assembly was utilized to make a
targeting vector that contains homology arms to a mouse genomic region and restriction sites
for inserting a human gene by BAC ligation. This targeting vector was used in a BAC
ligation to make a humanized targeting vector. The mBAC was cleaved according to the
following reaction: 12.5 ug DNA, 2 ul each annealed RNA (@50 uM), 10 ul Cas9 (@0.89
ug/ul), 250 ul buffer, H2O to 500 ul. The mixture was incubated at 37 °C for one hour;
cleaned up by phenol/chloroform/isoamylalcohol (PCI) extraction; and resuspended in 20 ul
TE. The two mouse BACs were then assembled together with linkers () according to
the following reaction: 6 ul (2 ug) bMQ-208A16 cleavage, 5.6 ul (2 ug) F19
cleavage, 0.25 ul each linker (@50 uM), 4.3 ul (100 ng) selection cassette (Ubi-Hyg),
12 ul high concentration assembly master mix, 11.35 ul H2O. The reaction mixture was
incubated at 50 °C for 1 hour and dialyzed against H2O at 30 °C. 10 ul or 30 ul of the dialyzed
reaction was used to transform DH10B cells. Sanger sequencing confirmed all junctions.
Illumina sequencing reconfirmed all junctions ( and SEQ ID NO: 17). Linker 1 is
seamless from mBAC to Cassette (SEQ ID NO: 14). Linker 2 is not seamless from cassette to
mBAC. It incorporates a human spacer sequence as per the project design. Linker 3 is not
seamless from mB2 to mB3. It incorporates a unique sequence that was used for PCR
verification. This area was removed when linearized for ES electroporation (SEQ ID NO:
).
illustrates an example of using 4 joiner oligos (linkers) to insert large
human gene fragments onto an mBAC using four linkers and isothermal assembly.
Example 6: Reagents and reaction mixtures for cleavage and assembly
Crispr RNA (crRNA) (ordered as ssRNA) contains: (1) 20 nucleotides of
RNA that is complementary to a target area to cleave; and (2) a tail that will anneal to the
tracr RNA: <20nt crisprRNA>GUUUUAGAGCUAUGCUGUUUUG (SEQ ID NO: 10).
Tracr RNA (ordered as ssRNA):
GUUGGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
GAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 11).
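The crRNA layout given above (a 20-nucleotide guide followed by the fixed tail) can be sketched as a small helper. The function name is hypothetical, and the sketch assumes the caller already supplies the 20-nucleotide guide in the sense in which it should appear in the RNA; choosing the strand and the target site is outside its scope.

```python
# Tail sequence as given in the text; it anneals to the tracr RNA.
CRRNA_TAIL = "GUUUUAGAGCUAUGCUGUUUUG"


def make_crrna(guide):
    """Return a crRNA: a 20-nt guide (DNA or RNA letters) followed by the tail."""
    rna = guide.upper().replace("T", "U")
    if len(rna) != 20 or set(rna) - set("ACGU"):
        raise ValueError("guide must be 20 nucleotides of A, C, G, T/U")
    return rna + CRRNA_TAIL
```

Accepting either DNA or RNA letters is a convenience so that a guide copied from a DNA sequence file can be passed in directly.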
All RNA is resuspended to 100 uM in H2O. 2.5 ul of each crRNA and
tracrRNA is combined with 5 ul of annealing buffer (final concentrations: 10 mM Tris pH
7.5-8.0, 50 mM NaCl, 1 mM EDTA). The mixture is then incubated at 95 °C for 5 minutes
and slowly cooled to room temperature over 1 hour. Cas9 2X cleavage buffer contains 40
mM HEPES pH 7.5 (Final: 20 mM); 300 mM KCl (Final: 150 mM); 1 mM DTT (Final: 0.5
mM); 0.2 mM EDTA (Final: 0.1 mM); 20 mM MgCl2 (Final: 10 mM).
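The final concentrations listed for the 2X cleavage buffer follow from diluting it to half of the reaction volume. A minimal sketch of that dilution, using the stock values stated above, is given below; the dictionary and function names are hypothetical.

```python
# 2X cleavage buffer composition in mM, as listed in the text.
CLEAVAGE_BUFFER_2X_MM = {
    "HEPES pH 7.5": 40.0,
    "KCl": 300.0,
    "DTT": 1.0,
    "EDTA": 0.2,
    "MgCl2": 20.0,
}


def final_concentrations_mm(buffer_ul, reaction_ul, stock_mm=CLEAVAGE_BUFFER_2X_MM):
    """Dilute each buffer component: C_final = C_stock * V_buffer / V_reaction."""
    if reaction_ul <= 0 or not 0 <= buffer_ul <= reaction_ul:
        raise ValueError("buffer volume must lie between 0 and the reaction volume")
    return {name: conc * buffer_ul / reaction_ul for name, conc in stock_mm.items()}
```

Calling final_concentrations_mm(250, 500) corresponds to the 250 ul of 2X buffer in a 500 ul large-scale reaction, and the same function applies unchanged to the 25 ul in 50 ul scaled reaction.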
Large Scale Cas9 Cleavage Reaction: Add in order at room temperature: H2O
to 500 ul, 250 ul 2x cleavage buffer, 12.5 ug DNA, 2 ul of each RNA (50 uM concentration),
ul Cas9 (0.89 mg/ml concentration), and incubate at 37 °C for 1 hour.
This reaction can be scaled as needed, for example: H2O to 50 ul, 25 ul Buffer,
125 ng DNA, 2 ul each RNA (5 uM concentration), 1 ul Cas9 (0.89 mg/ml concentration), and
incubate at 37 °C for 1 hour.
The assembly reaction is carried out as follows: Iso-Thermal Buffer: 3 mL 1M
Tris-HCl (pH 7.5); 150 ul 2M MgCl2; 60 ul 100 mM each: dGTP, dATP, dTTP, dCTP; 300
ul 1M DTT; 1.5 g PEG 8000; 300 ul 100 mM NAD. The iso-thermal Buffer is stored in 320
ul aliquots at -20 °C. The Master Mix is prepared as follows: 320 ul iso-thermal Buffer; 0.64
ul T5 exonuclease (stock conc=10 U/ul); 20 ul Phusion DNA polymerase (stock conc=2
U/ul); 160 ul Taq DNA Ligase (stock conc=40 U/ul); 699.36 ul H2O; mix together, and
aliquot at 15 ul or 30 ul and store at -20 °C. Use 15 ul master mix (MM) in a total volume of 20
ul reaction.
Alternatively, a high concentration master mix (GA MM HC) can be made as
follows: 320 ul iso-thermal buffer; 0.64 ul T5 exonuclease (stock conc=10 U/ul); 20 ul
Phusion DNA polymerase (stock conc=2 U/ul); 160 ul Taq DNA Ligase (stock conc=40
U/ul); mix together and aliquot at 6 ul or 12 ul and store at -20 °C. Use 6 ul of the master mix
in a total volume of 20 ul reaction.
For all assembly reactions, the concentration of DNA should be determined
(e.g., by NanoDrop) and a 1:6 molar ratio (vector to insert(s)) is used. For standard
concentration, 15 ul of the assembly master mix is used. DNA and water are added to a final
volume of 20 ul in a 200 ul PCR tube. The reaction is carried out in a thermocycler at 50 °C for 1
hour. The reaction can then be stored at -20 °C. For high concentration, 6 ul of the high
concentration assembly master mix is used. DNA and water are added to a final volume of 20
ul in a 200 ul PCR tube. The reaction is carried out in a thermocycler at 50 °C for 1 hour.
The reaction can then be stored at -20 °C. Upon completion of the reaction, 10 ul is dialyzed
against water for 30 min and electroporated into appropriate electro-competent cells (e.g.,
DH10B or Pir+ cells).
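The 1:6 vector-to-insert molar ratio can be turned into masses once fragment lengths are known. The sketch below is a minimal illustration, assuming the common approximation of about 650 g/mol per base pair of double-stranded DNA (not a figure from this document); the function names and the example fragment sizes are hypothetical.

```python
# Assumed average mass of one base pair of dsDNA (g/mol); a widely used
# approximation, not a value taken from this document.
AVG_BP_MASS = 650.0


def ng_to_fmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass in ng to an amount in fmol."""
    return mass_ng * 1e6 / (length_bp * AVG_BP_MASS)


def insert_ng_for_ratio(vector_ng: float, vector_bp: int, insert_bp: int,
                        insert_per_vector: float = 6.0) -> float:
    """Mass of insert (ng) giving the requested insert:vector molar ratio."""
    insert_fmol = ng_to_fmol(vector_ng, vector_bp) * insert_per_vector
    return insert_fmol * insert_bp * AVG_BP_MASS / 1e6


# Hypothetical example: 100 ng of a 100 kb vector and a 3 kb insert.
print(round(insert_ng_for_ratio(100, 100_000, 3_000), 2))
```

Because the same per-base-pair mass appears in both conversions, the result depends only on the length ratio, so the exact value assumed for `AVG_BP_MASS` does not change the insert mass.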
Cas9/Isothermal Assembly Reaction: For the Cas9 digest, 2.5 ug of each DNA
(e.g., BAC DNA), 4 ul of 10 uM guide/tracr RNAs each, and 5 ul of Cas9 protein (0.89
mg/ml) are digested for 2 hours at 37 °C. The reaction is heat inactivated at 65 °C for 20 min,
phenol chloroform extracted (e.g., to remove Cas9 protein), washed once with 70% ethanol,
and the DNA resuspended in 35 ul water. The Isothermal Assembly is performed with 5 ul of the
DNA mixed together with 15 ul of the master mix (MM) as described elsewhere herein and
incubated at 50 °C for 1 hour. The reaction is desalted for 30 min and 8 ul of the reaction can
be electroporated into cells.
Example 7: Cas9/Isothermal Assembly to insert human sequence into a BAC vector
In order to construct a humanized targeting vector, MAID 6236 was cleaved
with a gRNA-Cas complex to generate a cleaved fragment with overlapping sequences.
V1568 was also cleaved with a gRNA-Cas complex to generate sequences overlapping with
the fragment of MAID 6236. Cas9/Isothermal assembly was performed as described above,
resulting in insertion of the humanized locus into the vector (V1599). This process is outlined
in .
Example 8: Cas9/Isothermal Assembly Using a gBlock Without Selection
Cas9 digest and assembly can also be performed without selection, for example, by
utilizing gBlock DNA fragments. In order to test the possibility of adding double stranded
DNA into a locus without a selection cassette, gBlock DNA fragments were synthesized and
inserted into the construct. As outlined in A and B, a Cas9/gRNA was designed to
target two sites within the TCR beta locus to delete a 4.4 kb fragment. A gBlock was
designed to introduce a meganuclease recognition site into the construct. The gBlock was
able to insert into the construct without using a selection marker. A shows the
insertion of a PI-SceI gBlock and B demonstrates the insertion of a MauBI gBlock.
The final constructs were confirmed for successful insertion of each of the gBlocks by
PCR junction screens using the primers indicated in Table 1. The protocol for the junction
screens is as follows: The PCR reaction contained: 1 uL DNA, 0.5 uL Primer 1, 0.5 uL Primer
2, 1 uL DMSO, 4 uL dNTPs, 2.5 uL 10x buffer, 0.5 uL Ex-Taq, and 15 uL water. The
reaction was carried out in a thermocycler at 95°C for 3 minutes, 95°C for 30 sec, 55°C for
sec for 25 cycles, followed by 72°C for 30 sec, and 72°C for 5 min. The junction sequences
were confirmed by sequencing.
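The junction-screen PCR recipe can be checked and scaled with a short Python sketch. The volumes are those listed in the protocol above; the function names and the choice to leave template DNA out of the shared mix are illustrative assumptions.

```python
# Per-reaction volumes (uL) from the junction-screen protocol above.
PCR_MIX_UL = {
    "DNA": 1.0,
    "Primer 1": 0.5,
    "Primer 2": 0.5,
    "DMSO": 1.0,
    "dNTPs": 4.0,
    "10x buffer": 2.5,
    "Ex-Taq": 0.5,
    "Water": 15.0,
}


def reaction_volume(mix: dict) -> float:
    """Total volume of one reaction."""
    return sum(mix.values())


def master_mix(mix: dict, n_reactions: int, exclude=("DNA",)) -> dict:
    """Scale shared components for n reactions; template is added per tube."""
    return {name: vol * n_reactions
            for name, vol in mix.items() if name not in exclude}


total = reaction_volume(PCR_MIX_UL)
buffer_fold = 10 * PCR_MIX_UL["10x buffer"] / total
print(total, buffer_fold, master_mix(PCR_MIX_UL, 8))
```

The components add up to a 25 uL reaction, in which 2.5 uL of 10x buffer gives a 1x final concentration.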
Table 1: Primers for junction screening of MAID1715 with either PI-SceI gBlock or MauBI gBlock

MAID1715+PI-SceI gBlock

Primer name | Sequence | Junction size |
---|---|---|
(m380)5' 302p18 detect | GGAAAGCCACCCTGTATGCT (SEQ ID NO: 18) | |
3'down detect 302p18(m41) | CTTGGCCAACAGTGGATGG (SEQ ID NO: 19) | |

Cas9 Primer name | Sequence | DNA Target sequence |
---|---|---|
1715 target-5' | CUAAAAUGAUUCUCAUCUGC GUUUUAGAGCUAUGCUGUUUUG (SEQ ID NO: 20) | CTAAAATGATTCTCATCTGCUKKD (SEQ ID NO: 22) |
1715 target-3' | GCUCUCAACUUCACCCUUUC GUUUUAGAGCUAUGCUGUUUUG (SEQ ID NO: 21) | GCTCTCAACTTCACCCTTTC(TGG) (SEQ ID NO: 23) |

MAID1715+MauBI gBlock

Primer name | Sequence | Junction size |
---|---|---|
(m380)5' 302p18 detect | GGAAAGCCACCCTGTATGCT (SEQ ID NO: 18) | |
3'down detect 302p18(m41) | CTTGGCCAACAGTGGATGG (SEQ ID NO: 19) | |

Cas9 Primer name | Sequence | DNA Target sequence |
---|---|---|
1715 target-5' | CUAAAAUGAUUCUCAUCUGC GUUUUAGAGCUAUGCUGUUUUG (SEQ ID NO: 20) | CTAAAATGATTCTCATCTGCUKKD (SEQ ID NO: 22) |
1715 target-3' | GCUCUCAACUUCACCCUUUC GUUUUAGAGCUAUGCUGUUUUG (SEQ ID NO: 21) | GCTCTCAACTTCACCCTTTC(TGG) (SEQ ID NO: 23) |
Example 9: Cas9/Isothermal Assembly to insert human sequence into a BAC vector using
Joiner Oligos
provides an example of direct humanization using Cas9/isothermal
assembly and joiner oligos. The human fragment and the mouse deletion are dropped out by
Cas9 (each BAC uses 2 crispr RNAs). The human fragment and mouse backbone are linked
together in a Gibson Assembly reaction with 3 linkers (joiner oligos) and a selection cassette.
provides an example of indirect humanization using Cas9/isothermal
assembly and joiner oligos for assembly into a large targeting vector (LTVEC). The human
fragment on the hBAC is cleaved out by Cas9 using 2 crispr RNAs. The donor comprises up
and down joiner oligos and a selection cassette. After hBAC cleavage by Cas9, the fragment
is “captured” by Gibson Assembly using a synthetic donor with incorporated complementary
overhangs. Targeting vector construction is completed by Gibson Assembly or BHR.
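A joiner oligo, as used in these examples, bridges two fragments by carrying sequence that matches the end of one fragment and the start of the next, optionally separated by a spacer. The Python sketch below illustrates that layout only. It assumes both fragments are written 5' to 3' on the same strand, uses a hypothetical helper name, and takes its 15 to 120 base arm limits from the range recited in the claims of this document.

```python
def build_joiner_oligo(first_fragment: str, second_fragment: str,
                       arm_len: int = 40, spacer: str = "") -> str:
    """Return arm-from-first + spacer + arm-from-second (hypothetical helper).

    Assumption: both fragments are given 5' to 3' on the same strand, and
    the joiner is written on that strand as well.
    """
    if not 15 <= arm_len <= 120:
        raise ValueError("arm length outside the 15-120 base range")
    if len(first_fragment) < arm_len or len(second_fragment) < arm_len:
        raise ValueError("fragment shorter than the requested arm")
    left_arm = first_fragment[-arm_len:]    # matches the end of fragment 1
    right_arm = second_fragment[:arm_len]   # matches the start of fragment 2
    return left_arm + spacer + right_arm


# Toy sequences, for illustration only.
joiner = build_joiner_oligo("A" * 30 + "C" * 20, "G" * 20 + "T" * 30,
                            arm_len=20, spacer="NN")
print(joiner)
```

An empty spacer gives a joiner that brings the two fragment ends directly together, while a non-empty spacer can carry extra sequence between them.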
Example 10: Introducing a Point Mutation By Cas9/Isothermal Assembly
provides an example of utilizing Cas9/Isothermal Assembly to
introduce a point mutation. A donor is made by traditional cloning. A selection cassette is
inserted into a synthetic DNA fragment that contains linker overlaps and the point mutation.
The mBAC is cleaved with Cas9, the sequence is removed from the mBAC, and the mBAC is
Gibson Assembled to the donor, resulting in a construct (LTVEC) comprising the point
mutation and the selection cassette.
Example 11: BAC Trimming by Cas9/Isothermal Assembly
provides an example of BAC trimming using the Cas9/isothermal
assembly method. The area that needs to be removed from the LTVEC is trimmed using Cas9.
In this example, the BAC trimming removes the Ori sequence. The Ori is replaced in a
Gibson Assembly reaction using 2 linkers (joiner oligos).
Example 12: Other Methods for BAC digest with Cas9 followed by assembly
Other methods can be used in the methods provided herein including the
following: Synthetic or in vitro-transcribed tracrRNA and crRNA were pre-annealed prior to
the reaction by heating to 95 °C and slowly cooling down to room temperature. Native or
linearized plasmid DNA (300 ng (about 8 nM)) was incubated for 60 min at 37 °C with a
purified Cas9 protein (50-500 nM) and a tracrRNA:crRNA duplex (50-500 nM, 1:1) in a
Cas9 plasmid cleavage buffer (20 mM HEPES pH 7.5, 150 mM KCl, 0.5 mM DTT, 0.1 mM
EDTA) with or without 10 mM MgCl2. The reactions were stopped with 5X DNA loading
buffer containing 250 mM EDTA, resolved by 0.8 or 1% agarose gel electrophoresis and
visualized by ethidium bromide staining. For the Cas9 mutant cleavage assays, the reactions
were stopped with 5X SDS loading buffer (30% glycerol, 1.2% SDS, 250 mM EDTA) prior
to loading on the agarose gel.
An artificial crRNA and an artificial tracrRNA were designed to target
specific sequences in the MAID 6177 (116 kb LTVEC) for assembly with a 3 kb PCR
product G). The PCR product contained 50 bp overlaps with the vector. An
isothermal one-step assembly was used based on the use of an isolated non-thermostable 5' to
3' exonuclease that lacks 3' exonuclease activity as follows. A reaction was set up containing
the following: 100 fmol each dsDNA substrate, 16 ul 5X ISO buffer, 16 ul T5 exonuclease
(0.2 U/ul, Epicentre), 8.0 ul Taq DNA ligase (40 U/ul, NEB), 1.0 ul PhusionTM DNA
polymerase (2 U/ul, NEB), and water to 80 ul. The 5X ISO (ISOthermal) buffer was 25%
PEG-8000, 500 mM Tris-Cl, 50 mM MgCl2, 50 mM DTT, 5 mM NAD, and 1000 uM each
dNTP (pH 7.5).
This gave a final concentration of 1.25 fmol/ul each dsDNA (or 45 fmol/ul
each ssDNA) that was to be assembled, 5% PEG-8000, 100 mM Tris-Cl pH 7.5, 10 mM
MgCl2, 10 mM DTT, 200 uM each dNTP, 1 mM NAD, 0.02 U/ul T5 exonuclease, 4 U/ul
Taq DNA ligase, and 0.03 U/ul PHUSION DNA polymerase.
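The buffer and dsDNA figures above follow from diluting 16 ul of 5X ISO buffer and 100 fmol of each substrate into an 80 ul reaction. The Python sketch below reproduces just that arithmetic; the dictionary layout and function name are illustrative assumptions, and the enzyme concentrations are deliberately left out.

```python
# 5X ISO buffer composition as given in the text (units noted in each key).
ISO_5X = {
    "PEG-8000 (%)": 25.0,
    "Tris-Cl (mM)": 500.0,
    "MgCl2 (mM)": 50.0,
    "DTT (mM)": 50.0,
    "NAD (mM)": 5.0,
    "each dNTP (uM)": 1000.0,
}


def dilute(stock: dict, added_ul: float, total_ul: float) -> dict:
    """Final concentration of each component after adding added_ul of stock."""
    return {name: conc * added_ul / total_ul for name, conc in stock.items()}


final_buffer = dilute(ISO_5X, added_ul=16, total_ul=80)
dsdna_fmol_per_ul = 100 / 80  # 100 fmol of each substrate in 80 ul

print(final_buffer)
print(dsdna_fmol_per_ul)
```

With these inputs each buffer component ends up at one fifth of its stock value and each dsDNA substrate at 1.25 fmol/ul, in line with the figures quoted in the text.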
Methods used 1.64 ul (0.2 U/ul) of T5 exonuclease for substrates that overlap
by 20-80 bp, and for substrates that have larger overlaps (e.g., 200 bp), 1.6 ul (1 U/ul) of T5
exonuclease was used. T5 exonuclease was used as a 1:50 dilution (in T5 exonuclease storage
buffer) from the 10 U/ul T5 exonuclease (Epicentre) concentrated enzyme stock. The reaction
was then incubated at 50 °C for 15 minutes.
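The 1:50 dilution of the 10 U/ul T5 exonuclease stock corresponds to the 0.2 U/ul working concentration mentioned above. A small Python sketch of the dilution arithmetic follows; it assumes "1:50" means one part stock in fifty parts total, and the function name and the 50 ul batch size are hypothetical.

```python
def dilution_plan(stock_u_per_ul: float, target_u_per_ul: float,
                  final_ul: float):
    """Return (fold, stock_ul, diluent_ul) for a simple enzyme dilution.

    Assumption: a 1:N dilution means 1 part stock in N parts total volume.
    """
    if target_u_per_ul <= 0 or target_u_per_ul > stock_u_per_ul:
        raise ValueError("target must be positive and not exceed the stock")
    fold = stock_u_per_ul / target_u_per_ul
    stock_ul = final_ul / fold
    return fold, stock_ul, final_ul - stock_ul


# 10 U/ul stock diluted to 0.2 U/ul in a hypothetical 50 ul batch.
print(dilution_plan(10.0, 0.2, 50.0))
# The 1 U/ul working stock mentioned for larger overlaps.
print(dilution_plan(10.0, 1.0, 50.0))
```

The diluent here would be the T5 exonuclease storage buffer named in the text.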
Example 13: Other Methods for Sewing together two overlapping BACs
Other methods can be used in the methods provided herein including the
following: Synthetic or in vitro-transcribed tracrRNA and crRNA were pre-annealed prior to
the reaction by heating to 95 °C and slowly cooling down to room temperature. Native or
linearized plasmid DNA (300 ng (about 8 nM)) was incubated for 60 min at 37 °C with a
purified Cas9 protein (50-500 nM) and a tracrRNA:crRNA duplex (50-500 nM, 1:1) in a
Cas9 plasmid cleavage buffer (20 mM HEPES pH 7.5, 150 mM KCl, 0.5 mM DTT, 0.1 mM
EDTA) with or without 10 mM MgCl2. The reactions were stopped with 5X DNA loading
buffer containing 250 mM EDTA, resolved by 0.8 or 1% agarose gel electrophoresis and
visualized by ethidium bromide staining. For the Cas9 mutant cleavage assays, the reactions
were stopped with 5X SDS loading buffer (30% glycerol, 1.2% SDS, 250 mM EDTA) prior
to loading on the agarose gel.
An artificial crRNA and an artificial tracrRNA were designed to target
specific sequences in the humanized HLA-DQ BAC for assembly with a humanized HLA-DR
BAC. The vectors contained ~70 bp overlaps with each other created by Cas9 cleavage at
two sites on each vector (See, . An isothermal one-step assembly was used based on
the use of an isolated non-thermostable 5' to 3' exonuclease that lacks 3' exonuclease activity
as follows. A reaction was set up containing approximately the following: 100 fmol each
dsDNA substrate, 16 ul 5X ISO buffer, 16 ul T5 exonuclease (0.2 U/ul, Epicentre), 8.0 ul
Taq DNA ligase (40 U/ul, NEB), 1.0 ul PhusionTM DNA polymerase (2 U/ul, NEB), and
water to 80 ul. The 5X ISO (ISOthermal) buffer was 25% PEG-8000, 500 mM Tris-Cl, 50
mM MgCl2, 50 mM DTT, 5 mM NAD, and 1000 uM each dNTP (pH 7.5).
This gave a final concentration of about 1.25 fmol/ul each dsDNA (or 45
fmol/ul each ssDNA) that was to be assembled, 5% PEG-8000, 100 mM Tris-Cl pH 7.5, 10
mM MgCl2, 10 mM DTT, 200 uM each dNTP, 1 mM NAD, 0.02 U/ul T5 exonuclease, 4
U/ul Taq DNA ligase, and 0.03 U/ul PHUSION DNA polymerase.
Methods used 1.64 ul of 0.2 U/ul T5 exonuclease for substrates that overlap by
20-80 bp, and for substrates that have larger overlaps (e.g., 200 bp), 1.6 ul of 1 U/ul T5
exonuclease was used. T5 exonuclease was used as a 1:50 dilution (in T5 exonuclease storage
buffer) from the 10 U/ul T5 exonuclease (Epicentre) concentrated enzyme stock. The reaction
was then incubated at 50 °C for 15 minutes.
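The assembly in this example relies on the two vectors sharing an overlapping end sequence (about 70 bp here). An in-silico analogue of that step, useful for checking a planned junction, is to find the longest suffix of one fragment that equals a prefix of the other and merge them. The Python sketch below is such a check under stated assumptions: both fragments are written 5' to 3' on the same strand, the function name is hypothetical, and the 20-base minimum is an illustrative default rather than a value from this example.

```python
def join_by_overlap(left: str, right: str, min_overlap: int = 20) -> str:
    """Merge two sequences that share an end overlap (hypothetical helper).

    Searches for the longest suffix of `left` equal to a prefix of `right`
    and returns the merged sequence with the overlap present once.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("no overlap of at least %d bases found" % min_overlap)


# Toy sequences sharing a 20-base overlap, for illustration only.
overlap = "ACGTTGCAAGGCTTAACCGG"
merged = join_by_overlap("TTTTTTTTTT" + overlap, overlap + "CCCCCCCCCC")
print(merged)
```

This only models the sequence outcome; it says nothing about exonuclease chew-back, annealing efficiency, or ligation in the actual reaction.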
Example 14: Other Methods for assembling an insert with a BAC vector
Other methods can be used in the methods provided herein including the
following: Dissolve crRNAs and tracrRNA to 100 uM in Hybe Buffer (10X buffer: 20 mM
Tris 7.5, 100-150 mM NaCl, 10 mM MgCl2, 1 mM DTT, 0.1 mM EDTA, 100 ug/ml BSA).
In order to anneal the RNAs, add 10 ul of 100 uM crRNA and 10 ul of 100 uM tracrRNA to
80 ul of annealing buffer. Heat RNAs in a 90 °C temp block then remove block from heater
and cool on bench. Final concentration of RNA is about 10 uM.
In order to digest the BAC, clean maxiprep BAC DNA is used and the BAC is
digested according to the following mixture.
BAC DNA 500 ng X ul
BSA 0.5 ul
RNA 2 ul (1 ul of each tracr:crRNA hybrid)
Cas9 (4.5 mg/ml) 1 ul
10x Buffer 1.5 ul
H2O to 15 ul
Digest for 1 hour at 37 °C then de-salt for 30 min.
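Because the BAC DNA volume ("X ul") depends on the concentration of the maxiprep, the water volume has to be worked out for each prep. The Python sketch below does that bookkeeping for the 15 ul digest above; the function name and the example concentration of 100 ng/ul are hypothetical.

```python
# Fixed component volumes (ul) from the digest mixture above.
FIXED_UL = {"BSA": 0.5, "RNA": 2.0, "Cas9": 1.0, "10x buffer": 1.5}


def digest_volumes(bac_ng_per_ul: float, bac_ng: float = 500.0,
                   total_ul: float = 15.0) -> dict:
    """Volumes for one BAC digest, with water making up the total."""
    dna_ul = bac_ng / bac_ng_per_ul
    water_ul = total_ul - dna_ul - sum(FIXED_UL.values())
    if water_ul < 0:
        raise ValueError("BAC DNA too dilute to fit in the reaction volume")
    return {"BAC DNA": dna_ul, **FIXED_UL, "H2O": water_ul}


# Hypothetical maxiprep at 100 ng/ul.
print(digest_volumes(100.0))
```

The fixed components take up 5 ul, so a prep below 50 ng/ul cannot supply 500 ng within 15 ul and the helper raises an error in that case.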
In order to assemble the BAC and insert, digest a plasmid or perform PCR to
create an insert. For PCR reactions, run a small aliquot on a gel and look for a clean product;
if the product is not clean then do PCR cleanup instead of gel extraction. A 1:1-1:6 molar
ratio for the BAC:Insert is desired. Usually, 50 ng of the purified insert will work. The
following reaction mix can be used:
BAC Digest 4ul
Insert 1ul
Assembly Mix 15ul
Add the DNA and Mix on ice or directly in a PCR machine at 50 °C. Incubate
at 50 °C for 1 hour. Add 0.5 uL of Proteinase K (20 mg/ml) and incubate at 50 °C for 1 hour.
Desalt for 30 min and electroporate 8 ul of the reaction into DH10B cells. 10 ul of the BAC
Digest can be run on a pulsed-field gel to check digestion efficiency. Use RNase-free water
and buffers. The final reaction buffer contains: 20 mM Tris 7.5; 100-150 mM NaCl; 10 mM
MgCl2; 1 mM DTT; 0.1 mM EDTA; 100 ug/ml BSA; for a final volume of 15 ul.
The tracrRNA sequence used in the example is:
CAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUC (SEQ ID NO: 9).
This CRISPR RNA (crRNA) contains: (1) about 20 nucleotides of RNA complementary to a
target sequence and (2) a tail sequence (GUUUUAGAGCUAUGCUGUUUUG (SEQ ID NO:
)) that will anneal to the tracrRNA.
Claims (35)
1. An in vitro method for assembling two or more nucleic acids, comprising: (a) contacting a first nucleic acid with a first nuclease agent, wherein the first nuclease agent comprises a Cas protein and a guide RNA (gRNA) (gRNA-Cas complex), a zinc finger nuclease, or a Transcription Activator-Like Effector Nuclease (TALEN), wherein the first nuclease agent cleaves the first nucleic acid at a first target site to produce a first digested nucleic acid with an overlapping end sequence shared by a second nucleic acid; (b) contacting the first digested nucleic acid and the second nucleic acid with an exonuclease to expose complementary sequences between the first digested nucleic acid and the second nucleic acid; and (c) assembling the two nucleic acid fragments generated from step (b).
2. The method of claim 1, wherein step (c) comprises: (i) annealing the exposed complementary sequences; (ii) extending the 3’ ends of the annealed complementary sequences; (iii) ligating the first and the second nucleic acids.
3. The method of claim 1 or 2, wherein step (a) further comprises contacting the second nucleic acid with a second nuclease agent, wherein the second nuclease agent cleaves the second nucleic acid at a second target site to produce a second digested nucleic acid with the overlapping end sequence, and wherein the second nucleic acid of step (b) is the second digested nucleic acid.
4. The method of any one of claims 1-3, wherein the overlapping end sequence ranges from 20 bp to 200 bp long.
5. An in vitro method for assembling two or more nucleic acids, comprising: (a) contacting a first nucleic acid with at least one nuclease agent, wherein the at least one nuclease agent comprises a Cas protein and a guide RNA (gRNA) (gRNA-Cas complex), a zinc finger nuclease, or a Transcription Activator-Like Effector Nuclease (TALEN), wherein the at least one nuclease agent cleaves the first nucleic acid at a first target site to generate a first digested nucleic acid; (b) contacting the first digested nucleic acid with a second nucleic acid, a joiner oligo, and an exonuclease, wherein the joiner oligo comprises: (i) a first complementary sequence that is complementary to the first digested nucleic acid; (ii) a spacer; and (iii) a second complementary sequence that is complementary to the second nucleic acid; wherein the exonuclease exposes the first and second complementary sequences; (c) assembling the joiner oligo with the first digested nucleic acid and the second nucleic acid.
6. The method of claim 5, wherein the assembling in step (c) comprises: (i) annealing the first complementary sequence of the joiner oligo to the first digested nucleic acid and the second complementary sequence of the joiner oligo to the second nucleic acid, optionally further comprising extending the 3’ end of the first digested nucleic acid and/or the second nucleic acid; and (ii) ligating the joiner oligo to the first digested nucleic acid and the second nucleic acid.
7. The method of claim 5 or 6, wherein the first complementary sequence of the joiner oligo is between 15 and 120 complementary bases, and the second complementary sequence of the joiner oligo is between 15 and 120 complementary bases, optionally wherein the first complementary sequence of the joiner oligo is between 20 and 80 complementary bases and the second complementary sequence of the joiner oligo is between 20 and 80 complementary bases.
8. The method of any one of claims 5-7, wherein the spacer of the joiner oligo comprises non-complementary nucleic acids.
9. The method of any one of claims 5-8, wherein the first digested nucleic acid is seamlessly assembled to the second nucleic acid.
10. The method of claim 9, wherein the at least one nuclease agent is designed to cleave an at least 20 bp fragment from the end of the first nucleic acid at which the seamless assembly will occur, optionally wherein the fragment is double-stranded, wherein the spacer of the joiner oligo comprises a sequence identical to the at least 20 bp fragment, wherein no nucleic acid bases are present between the first complementary sequence and the at least 20 bp fragment, and no nucleic acid bases are present between the second complementary sequence and the at least 20 bp fragment, such that assembly of the first digested nucleic acid with the joiner oligo and the second nucleic acid reconstitutes the at least 20 bp fragment and seamlessly assembles the first nucleic acid and the second nucleic acid.
11. The method of claim 9 or 10, wherein the joiner oligo comprises a linear double stranded DNA fragment, optionally wherein the linear double stranded DNA fragment does not comprise a selection cassette.
12. The method of any one of claims 5-11, wherein step (a) further comprises: (i) contacting the second nucleic acid with a second nuclease agent, wherein the second nuclease agent cleaves the second nucleic acid to produce a second digested nucleic acid comprising a nucleotide sequence that is complementary to the second complementary sequence of the joiner oligo, wherein the first digested nucleic acid is assembled to the second digested nucleic acid; or (ii) contacting the second nucleic acid with a restriction enzyme or meganuclease, wherein the restriction enzyme or meganuclease cleaves the second nucleic acid to produce a second digested nucleic acid comprising a nucleotide sequence that is complementary to the second complementary sequence in the joiner oligo, wherein the first digested nucleic acid is assembled to the second digested nucleic acid.
13. The method of any one of claims 5-12, wherein the joiner oligo is assembled to the first nucleic acid and the second nucleic acid in the same reaction or sequentially.
14. The method of any one of claims 1-13, wherein the two or more nucleic acids are double-stranded.
15. An in vitro method for assembling two or more nucleic acids, comprising: (a) contacting a first nucleic acid with a first nuclease agent and a second nuclease agent, wherein the first nuclease agent comprises a Cas protein and a guide RNA (gRNA) (gRNA-Cas complex), a zinc finger nuclease, or a Transcription Activator-Like Effector Nuclease (TALEN), and wherein the first nuclease agent cleaves the first nucleic acid at a first target site and the second nuclease agent cleaves the first nucleic acid at a second target site to generate a first digested nucleic acid; (b) contacting the first digested nucleic acid with a first joiner oligo, a second nucleic acid, a second joiner oligo, and an exonuclease, wherein the first joiner oligo comprises: (i) a first complementary sequence that is complementary to the first digested nucleic acid; and (ii) a second complementary sequence that is complementary to the second nucleic acid; and wherein the second joiner oligo comprises: (i) a first complementary sequence that is complementary to the second nucleic acid; and (ii) a second complementary sequence that is complementary to the first digested nucleic acid; and wherein the exonuclease exposes the complementary sequences of the first joiner oligo, the second joiner oligo, the first digested nucleic acid, and the second nucleic acid; and (c) assembling the first digested nucleic acid, the first joiner oligo, the second nucleic acid, and the second joiner oligo.
16. The method of claim 15, wherein the assembling in step (c) comprises: (i) annealing the first complementary sequence of the first joiner oligo to the first digested nucleic acid, annealing the second complementary sequence of the first joiner oligo to the second nucleic acid, annealing the first complementary sequence of the second joiner oligo to the second nucleic acid, and annealing the second complementary sequence of the second joiner oligo to the first digested nucleic acid, optionally further comprising extending the 3’ ends of the annealed complementary sequences; and (ii) ligating the first digested nucleic acid to the first joiner oligo, ligating the first joiner oligo to the second nucleic acid, ligating the second nucleic acid to the second joiner oligo, and ligating the second joiner oligo to the first digested nucleic acid.
17. An in vitro method for assembling three or more nucleic acids, comprising: (a) contacting a first nucleic acid with a first nuclease agent and a second nuclease agent, wherein the first nuclease agent comprises a Cas protein and a guide RNA (gRNA) (gRNA-Cas complex), a zinc finger nuclease, or a Transcription Activator-Like Effector Nuclease (TALEN), and wherein the first nuclease agent cleaves the first nucleic acid at a first target site and the second nuclease agent cleaves the first nucleic acid at a second target site to generate a first digested nucleic acid; (b) contacting the first digested nucleic acid with a first joiner oligo, a second nucleic acid, a second joiner oligo, a third nucleic acid, a third joiner oligo, and an exonuclease, wherein the first joiner oligo comprises: (i) a first complementary sequence that is complementary to the first digested nucleic acid; and (ii) a second complementary sequence that is complementary to the second nucleic acid; and wherein the second joiner oligo comprises: (i) a first complementary sequence that is complementary to the second nucleic acid; and (ii) a second complementary sequence that is complementary to the third nucleic acid; and wherein the third joiner oligo comprises: (i) a first complementary sequence that is complementary to the third nucleic acid; and (ii) a second complementary sequence that is complementary to the first digested nucleic acid; and wherein the exonuclease exposes the complementary sequences of the first joiner oligo, the second joiner oligo, the third joiner oligo, the first digested nucleic acid, the second nucleic acid, and the third nucleic acid; and (c) assembling the first digested nucleic acid, the first joiner oligo, the second nucleic acid, the second joiner oligo, the third nucleic acid, and the third joiner oligo.
18. An in vitro method for assembling four or more nucleic acids, comprising: (a) contacting a first nucleic acid with a first nuclease agent and a second nuclease agent, wherein the first nuclease agent comprises a Cas protein and a guide RNA (gRNA) (gRNA-Cas complex), a zinc finger nuclease, or a Transcription Activator-Like Effector Nuclease (TALEN), and wherein the first nuclease agent cleaves the first nucleic acid at a first target site and the second nuclease agent cleaves the first nucleic acid at a second target site to generate a first digested nucleic acid; (b) contacting the first digested nucleic acid with a first joiner oligo, a second nucleic acid, a second joiner oligo, a third nucleic acid, a third joiner oligo, a fourth nucleic acid, a fourth joiner oligo, and an exonuclease, wherein the first joiner oligo comprises: (i) a first complementary sequence that is complementary to the first digested nucleic acid; and (ii) a second complementary sequence that is complementary to the second nucleic acid; and wherein the second joiner oligo comprises: (i) a first complementary sequence that is complementary to the second nucleic acid; and (ii) a second complementary sequence that is complementary to the third nucleic acid; and wherein the third joiner oligo comprises: (i) a first complementary sequence that is complementary to the third nucleic acid; and (ii) a second complementary sequence that is complementary to the fourth nucleic acid; and wherein the fourth joiner oligo comprises: (i) a first complementary sequence that is complementary to the fourth nucleic acid; and (ii) a second complementary sequence that is complementary to the first digested nucleic acid; and wherein the exonuclease exposes the complementary sequences of the first joiner oligo, the second joiner oligo, the third joiner oligo, the fourth joiner oligo, the first digested nucleic acid, the second nucleic acid, the third nucleic acid, and the fourth nucleic acid; and (c) assembling the first digested nucleic acid, the first joiner oligo, the second nucleic acid, the second joiner oligo, the third nucleic acid, the third joiner oligo, the fourth nucleic acid, and the fourth joiner oligo.
19. The method of claim 17 or 18, wherein the assembling in step (c) comprises: (i) annealing the first digested nucleic acid, the other nucleic acids, and the joiner oligos, optionally further comprising extending the 3’ ends of the annealed sequences; and (ii) ligating the first digested nucleic acid, the other nucleic acids, and the joiner oligos.
20. The method of any one of claims 15-19, wherein the two or more nucleic acids are double-stranded nucleic acids, the first nuclease agent cleaves the first nucleic acid at a first target site to create a first double-strand break, and the second nuclease agent cleaves the first nucleic acid at a second target site to create a second double-strand break.
21. The method of any one of claims 15-20, wherein step (a) further comprises contacting the second nucleic acid with a third nuclease agent, wherein the third nuclease agent cleaves the second nucleic acid at a third target site to generate a second digested nucleic acid.
22. The method of any one of claims 15-20, wherein step (a) further comprises contacting the second nucleic acid with a third nuclease agent and a fourth nuclease agent, wherein the third nuclease agent cleaves the second nucleic acid at a third target site and the fourth nuclease agent cleaves the second nucleic acid at a fourth target site to generate a second digested nucleic acid.
23. The method of any one of claims 15-22, wherein (I) one or more or all of the joiner oligos are linear, double-stranded DNA; and/or (II) the first complementary sequence and the second complementary sequence of one or more or all of the joiner oligos are each between 15 and 120 complementary bases, optionally wherein the first complementary sequence and the second complementary sequence of one or more or all of the joiner oligos are each between 20 and 80 complementary bases.
24. The method of any one of claims 15-23, wherein one or more or all of the joiner oligos are from about 50 bp to about 400 bp, optionally wherein one or more or all of the joiner oligos are from about 100 bp to about 300 bp.
25. The method of any one of claims 15-24, wherein one or more or all of the joiner oligos further comprise a spacer between the first complementary sequence and the second complementary sequence, optionally wherein: (I) the spacer comprises a drug resistance gene, a reporter gene, sequences for detection, sequences suitable for PCR, or one or more restriction enzyme sites to confirm successful assembly; and/or (II) the spacer is from about 20 bp to about 120 bp.
26. The method of any one of claims 15-25, wherein the first digested nucleic acid, the other nucleic acids, and the joiner oligos are assembled in the same reaction or sequentially.
27. The method of any one of claims 15-26, wherein the first digested nucleic acid is seamlessly assembled to the second nucleic acid.
28. The method of claim 27, wherein the cleaving by the first nuclease agent and/or the second nuclease agent removes a double-stranded fragment from an end of the first nucleic acid at which the seamless assembly will occur, and wherein the first joiner oligo further comprises a spacer between the first complementary sequence and the second complementary sequence, and wherein the spacer comprises a sequence identical to the fragment, wherein no nucleic acid bases are present between the first complementary sequence and the sequence identical to the fragment, and no nucleic acid bases are present between the second complementary sequence and the sequence identical to the fragment.
29. The method of any one of claims 15-28, wherein one or more or all of the joiner oligos are single-stranded DNA.
30. The method of any one of claims 1-29, wherein: (I) one or more or all of the nucleic acids are vectors from about 20 kb to about 400 kb in length; and/or (II) the assembled nucleic acid is from 30 kb to 1 Mb in length.
31. The method of any one of claims 1-30, wherein one or more or all of the nucleic acids are at least 10 kb.
32. The method of any one of claims 1-31, wherein: (I) one or more or all of the nucleic acids comprise a bacterial artificial chromosome; and/or (II) one or more or all of the nucleic acids comprise a human DNA, a rodent DNA, a synthetic DNA, or a combination thereof; and/or (III) each nucleic acid comprises a bacterial artificial chromosome, a gene of interest spans the bacterial artificial chromosomes, and the assembly forms the sequence of the gene of interest; and/or (IV) one or more or all of the nucleic acids are circular nucleic acids or linear nucleic acids; and/or (V) the assembled nucleic acid is a circular nucleic acid or a linear nucleic acid.
33. The method of any one of claims 1-32, wherein the method comprises combining the nucleic acids and the joiner oligos with a ligase, the exonuclease, a DNA polymerase, and nucleotides and incubating at a constant temperature.
34. The method of claim 33, wherein the assembly occurs in a one-step isothermal reaction.
35. The method of any one of claims 1-34, wherein the first nuclease agent comprises the Cas protein and the gRNA, wherein the Cas protein is a Cas9 protein, wherein the gRNA comprises a nucleic acid sequence encoding a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA), and wherein the first nuclease agent targets a target site that is immediately flanked by a Protospacer Adjacent Motif (PAM) sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NZ765591A NZ765591A (en) | 2014-06-23 | 2015-06-23 | Nuclease-mediated dna assembly |
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201462015809P | 2014-06-23 | 2014-06-23 | |
US62/015,809 | 2014-06-23 | ||
US201462016400P | 2014-06-24 | 2014-06-24 | |
US62/016,400 | 2014-06-24 | ||
US201462036983P | 2014-08-13 | 2014-08-13 | |
US62/036,983 | 2014-08-13 | ||
PCT/US2015/037199 WO2015200334A1 (en) | 2014-06-23 | 2015-06-23 | Nuclease-mediated dna assembly |
Publications (2)
Publication Number | Publication Date |
---|---|
NZ727952A NZ727952A (en) | 2021-11-26 |
NZ727952B2 true NZ727952B2 (en) | 2022-03-01 |