CN118234854A - Improved prime editing system efficiency using cis-acting regulatory elements - Google Patents
- Publication number
- CN118234854A
- Application number
- CN202280075555.5A
- Authority
- CN
- China
- Prior art keywords
- sequence
- protein
- nucleic acid
- rna
- pegrna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Concepts
- 230000001105 regulatory effect Effects 0.000 title claims abstract description 43
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 155
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 119
- 238000000034 method Methods 0.000 claims abstract description 102
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 91
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 82
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 82
- 239000000203 mixture Substances 0.000 claims abstract description 45
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 claims abstract description 23
- 102100034343 Integrase Human genes 0.000 claims abstract 5
- 108020004414 DNA Proteins 0.000 claims description 84
- 108091033409 CRISPR Proteins 0.000 claims description 57
- 230000014509 gene expression Effects 0.000 claims description 52
- 108091032973 (ribonucleotides)n+m Proteins 0.000 claims description 46
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 35
- 230000000295 complement effect Effects 0.000 claims description 20
- 108020004999 messenger RNA Proteins 0.000 claims description 18
- 108020005004 Guide RNA Proteins 0.000 claims description 17
- 230000027455 binding Effects 0.000 claims description 14
- 230000004048 modification Effects 0.000 claims description 12
- 238000012986 modification Methods 0.000 claims description 12
- 108020004705 Codon Proteins 0.000 claims description 11
- 239000013604 expression vector Substances 0.000 claims description 11
- 125000003275 alpha amino acid group Chemical group 0.000 claims description 9
- 238000010356 CRISPR-Cas9 genome editing Methods 0.000 claims description 7
- 238000012258 culturing Methods 0.000 claims description 7
- 241000700605 Viruses Species 0.000 claims description 6
- 238000010354 CRISPR gene editing Methods 0.000 claims 1
- 210000004027 cell Anatomy 0.000 description 177
- 230000002759 chromosomal effect Effects 0.000 description 165
- 102000037865 fusion proteins Human genes 0.000 description 117
- 108020001507 fusion proteins Proteins 0.000 description 117
- 108010042407 Endonucleases Proteins 0.000 description 109
- 235000018102 proteins Nutrition 0.000 description 105
- 102100031780 Endonuclease Human genes 0.000 description 94
- 125000003729 nucleotide group Chemical group 0.000 description 86
- 239000002773 nucleotide Substances 0.000 description 82
- 238000003776 cleavage reaction Methods 0.000 description 73
- 230000007017 scission Effects 0.000 description 73
- 210000001161 mammalian embryo Anatomy 0.000 description 70
- 102000040430 polynucleotide Human genes 0.000 description 60
- 108091033319 polynucleotide Proteins 0.000 description 60
- 239000002157 polynucleotide Substances 0.000 description 60
- 241001465754 Metazoa Species 0.000 description 56
- 101710163270 Nuclease Proteins 0.000 description 42
- 239000012636 effector Substances 0.000 description 35
- 102000004533 Endonucleases Human genes 0.000 description 33
- 238000011144 upstream manufacturing Methods 0.000 description 32
- 230000005782 double-strand break Effects 0.000 description 28
- 230000008439 repair process Effects 0.000 description 27
- 102100021579 Enhancer of filamentation 1 Human genes 0.000 description 26
- 101000898310 Homo sapiens Enhancer of filamentation 1 Proteins 0.000 description 26
- 241000699666 Mus <mouse, genus> Species 0.000 description 24
- 238000012217 deletion Methods 0.000 description 23
- 230000037430 deletion Effects 0.000 description 23
- 210000002257 embryonic structure Anatomy 0.000 description 22
- 108010077850 Nuclear Localization Signals Proteins 0.000 description 21
- 238000003780 insertion Methods 0.000 description 19
- 230000037431 insertion Effects 0.000 description 19
- 108010017070 Zinc Finger Nucleases Proteins 0.000 description 18
- 230000000694 effects Effects 0.000 description 17
- 239000013598 vector Substances 0.000 description 16
- 238000001890 transfection Methods 0.000 description 15
- 230000010354 integration Effects 0.000 description 14
- 230000029279 positive regulation of transcription, DNA-dependent Effects 0.000 description 14
- 238000013518 transcription Methods 0.000 description 14
- 230000035897 transcription Effects 0.000 description 14
- 238000010453 CRISPR/Cas method Methods 0.000 description 13
- 241000700159 Rattus Species 0.000 description 13
- 108091026890 Coding region Proteins 0.000 description 12
- 102000053602 DNA Human genes 0.000 description 12
- 239000012634 fragment Substances 0.000 description 12
- 230000035772 mutation Effects 0.000 description 12
- 230000000149 penetrating effect Effects 0.000 description 12
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 11
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 11
- 230000004049 epigenetic modification Effects 0.000 description 11
- 230000006780 non-homologous end joining Effects 0.000 description 11
- 239000013600 plasmid vector Substances 0.000 description 11
- 230000004568 DNA-binding Effects 0.000 description 10
- 235000001014 amino acid Nutrition 0.000 description 10
- 230000012361 double-strand break repair Effects 0.000 description 10
- 210000003527 eukaryotic cell Anatomy 0.000 description 10
- 239000000833 heterodimer Substances 0.000 description 10
- 238000006467 substitution reaction Methods 0.000 description 10
- 102100021601 Ephrin type-A receptor 8 Human genes 0.000 description 9
- 101000898676 Homo sapiens Ephrin type-A receptor 8 Proteins 0.000 description 9
- 239000000539 dimer Substances 0.000 description 9
- 210000004962 mammalian cell Anatomy 0.000 description 9
- 230000008569 process Effects 0.000 description 9
- 125000006850 spacer group Chemical group 0.000 description 9
- 108091023040 Transcription factor Proteins 0.000 description 8
- 238000006243 chemical reaction Methods 0.000 description 8
- 230000002068 genetic effect Effects 0.000 description 8
- 238000010362 genome editing Methods 0.000 description 8
- 238000000338 in vitro Methods 0.000 description 8
- 108020005345 3' Untranslated Regions Proteins 0.000 description 7
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 7
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 7
- 108091036066 Three prime untranslated region Proteins 0.000 description 7
- 230000004913 activation Effects 0.000 description 7
- 150000001413 amino acids Chemical class 0.000 description 7
- 101150038500 cas9 gene Proteins 0.000 description 7
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 7
- 239000003623 enhancer Substances 0.000 description 7
- 230000001404 mediated effect Effects 0.000 description 7
- 239000000178 monomer Substances 0.000 description 7
- 241000699800 Cricetinae Species 0.000 description 6
- 102100025169 Max-binding protein MNT Human genes 0.000 description 6
- 102000040945 Transcription factor Human genes 0.000 description 6
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 6
- 239000003550 marker Substances 0.000 description 6
- 241000894007 species Species 0.000 description 6
- 210000000130 stem cell Anatomy 0.000 description 6
- 108091006107 transcriptional repressors Proteins 0.000 description 6
- 229910052725 zinc Inorganic materials 0.000 description 6
- 239000011701 zinc Substances 0.000 description 6
- 230000033616 DNA repair Effects 0.000 description 5
- 102000004190 Enzymes Human genes 0.000 description 5
- 108090000790 Enzymes Proteins 0.000 description 5
- 241000238631 Hexapoda Species 0.000 description 5
- 108090000246 Histone acetyltransferases Proteins 0.000 description 5
- 102000003893 Histone acetyltransferases Human genes 0.000 description 5
- 108091007767 MALAT1 Proteins 0.000 description 5
- -1 amino, carboxyl Chemical group 0.000 description 5
- 230000001580 bacterial effect Effects 0.000 description 5
- 230000015572 biosynthetic process Effects 0.000 description 5
- 230000008859 change Effects 0.000 description 5
- 108091006047 fluorescent proteins Proteins 0.000 description 5
- 102000034287 fluorescent proteins Human genes 0.000 description 5
- 239000000710 homodimer Substances 0.000 description 5
- 210000003734 kidney Anatomy 0.000 description 5
- 238000002372 labelling Methods 0.000 description 5
- 230000008488 polyadenylation Effects 0.000 description 5
- 241000283690 Bos taurus Species 0.000 description 4
- 241000282465 Canis Species 0.000 description 4
- 108010051219 Cre recombinase Proteins 0.000 description 4
- 241000282326 Felis catus Species 0.000 description 4
- 108010033040 Histones Proteins 0.000 description 4
- 108090000144 Human Proteins Proteins 0.000 description 4
- 102000003839 Human Proteins Human genes 0.000 description 4
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 description 4
- 241000288906 Primates Species 0.000 description 4
- 241000283984 Rodentia Species 0.000 description 4
- 241000187191 Streptomyces viridochromogenes Species 0.000 description 4
- 238000010459 TALEN Methods 0.000 description 4
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 4
- 230000001464 adherent effect Effects 0.000 description 4
- 235000004279 alanine Nutrition 0.000 description 4
- 125000000539 amino acid group Chemical group 0.000 description 4
- 230000001419 dependent effect Effects 0.000 description 4
- 208000035475 disorder Diseases 0.000 description 4
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 description 4
- 230000001965 increasing effect Effects 0.000 description 4
- 238000001638 lipofection Methods 0.000 description 4
- 230000011987 methylation Effects 0.000 description 4
- 238000007069 methylation reaction Methods 0.000 description 4
- 238000007481 next generation sequencing Methods 0.000 description 4
- 230000006798 recombination Effects 0.000 description 4
- 238000005215 recombination Methods 0.000 description 4
- 108010054624 red fluorescent protein Proteins 0.000 description 4
- 230000022532 regulation of transcription, DNA-dependent Effects 0.000 description 4
- 238000003786 synthesis reaction Methods 0.000 description 4
- 230000008685 targeting Effects 0.000 description 4
- 210000001519 tissue Anatomy 0.000 description 4
- 230000005945 translocation Effects 0.000 description 4
- 239000013603 viral vector Substances 0.000 description 4
- 240000002900 Arthrospira platensis Species 0.000 description 3
- 235000016425 Arthrospira platensis Nutrition 0.000 description 3
- 108010051109 Cell-Penetrating Peptides Proteins 0.000 description 3
- 102000020313 Cell-Penetrating Peptides Human genes 0.000 description 3
- 241000196324 Embryophyta Species 0.000 description 3
- 102100022846 Histone acetyltransferase KAT2B Human genes 0.000 description 3
- 102100022893 Histone acetyltransferase KAT5 Human genes 0.000 description 3
- 102100038885 Histone acetyltransferase p300 Human genes 0.000 description 3
- 101000882390 Homo sapiens Histone acetyltransferase p300 Proteins 0.000 description 3
- 241001502974 Human gammaherpesvirus 8 Species 0.000 description 3
- 206010025323 Lymphomas Diseases 0.000 description 3
- 208000009869 Neu-Laxova syndrome Diseases 0.000 description 3
- 108091092724 Noncoding DNA Proteins 0.000 description 3
- 241000283973 Oryctolagus cuniculus Species 0.000 description 3
- 240000007594 Oryza sativa Species 0.000 description 3
- 235000007164 Oryza sativa Nutrition 0.000 description 3
- 206010035226 Plasma cell myeloma Diseases 0.000 description 3
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 3
- 241000187747 Streptomyces Species 0.000 description 3
- 229940011019 arthrospira platensis Drugs 0.000 description 3
- 108010006025 bovine growth hormone Proteins 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 201000010099 disease Diseases 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 210000002950 fibroblast Anatomy 0.000 description 3
- 210000005260 human cell Anatomy 0.000 description 3
- 238000001727 in vivo Methods 0.000 description 3
- 239000011159 matrix material Substances 0.000 description 3
- 201000000050 myeloid neoplasm Diseases 0.000 description 3
- 230000009437 off-target effect Effects 0.000 description 3
- 238000005457 optimization Methods 0.000 description 3
- 230000035515 penetration Effects 0.000 description 3
- 229920000642 polymer Polymers 0.000 description 3
- 230000010076 replication Effects 0.000 description 3
- 235000009566 rice Nutrition 0.000 description 3
- 230000000087 stabilizing effect Effects 0.000 description 3
- 230000005030 transcription termination Effects 0.000 description 3
- 241001464929 Acidithiobacillus caldus Species 0.000 description 2
- 241000605222 Acidithiobacillus ferrooxidans Species 0.000 description 2
- 241000251468 Actinopterygii Species 0.000 description 2
- 102000007469 Actins Human genes 0.000 description 2
- 108010085238 Actins Proteins 0.000 description 2
- 241000640374 Alicyclobacillus acidocaldarius Species 0.000 description 2
- 241000272517 Anseriformes Species 0.000 description 2
- 241000620196 Arthrospira maxima Species 0.000 description 2
- 241000271566 Aves Species 0.000 description 2
- 241000906059 Bacillus pseudomycoides Species 0.000 description 2
- 101710201279 Biotin carboxyl carrier protein Proteins 0.000 description 2
- 102100021975 CREB-binding protein Human genes 0.000 description 2
- 241000282472 Canis lupus familiaris Species 0.000 description 2
- 241000282693 Cercopithecidae Species 0.000 description 2
- 108091062157 Cis-regulatory element Proteins 0.000 description 2
- 241000193163 Clostridioides difficile Species 0.000 description 2
- 241000193155 Clostridium botulinum Species 0.000 description 2
- 108091035707 Consensus sequence Proteins 0.000 description 2
- 102000005636 Cyclic AMP Response Element-Binding Protein Human genes 0.000 description 2
- 108010045171 Cyclic AMP Response Element-Binding Protein Proteins 0.000 description 2
- 241000255601 Drosophila melanogaster Species 0.000 description 2
- 102220518659 Enhancer of filamentation 1_D10A_mutation Human genes 0.000 description 2
- 241000283073 Equus caballus Species 0.000 description 2
- 101100219622 Escherichia coli (strain K12) casC gene Proteins 0.000 description 2
- 108700024394 Exon Proteins 0.000 description 2
- 241000233866 Fungi Species 0.000 description 2
- 108700028146 Genetic Enhancer Elements Proteins 0.000 description 2
- 102000005720 Glutathione transferase Human genes 0.000 description 2
- 108010070675 Glutathione transferase Proteins 0.000 description 2
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 2
- NYHBQMYGNKIUIF-UUOKFMHZSA-N Guanosine Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O NYHBQMYGNKIUIF-UUOKFMHZSA-N 0.000 description 2
- 108010068250 Herpes Simplex Virus Protein Vmw65 Proteins 0.000 description 2
- 102000006947 Histones Human genes 0.000 description 2
- 101001046967 Homo sapiens Histone acetyltransferase KAT2A Proteins 0.000 description 2
- 101001047006 Homo sapiens Histone acetyltransferase KAT2B Proteins 0.000 description 2
- 101001046996 Homo sapiens Histone acetyltransferase KAT5 Proteins 0.000 description 2
- 108091092195 Intron Proteins 0.000 description 2
- CKLJMWTZIZZHCS-REOHCLBHSA-N L-aspartic acid Chemical compound OC(=O)[C@@H](N)CC(O)=O CKLJMWTZIZZHCS-REOHCLBHSA-N 0.000 description 2
- 241000186673 Lactobacillus delbrueckii Species 0.000 description 2
- 241000186869 Lactobacillus salivarius Species 0.000 description 2
- 241000124008 Mammalia Species 0.000 description 2
- 108060004795 Methyltransferase Proteins 0.000 description 2
- 241000192710 Microcystis aeruginosa Species 0.000 description 2
- 241000699670 Mus sp. Species 0.000 description 2
- 108091093037 Peptide nucleic acid Proteins 0.000 description 2
- 102000011755 Phosphoglycerate Kinase Human genes 0.000 description 2
- 241000235648 Pichia Species 0.000 description 2
- 102000014450 RNA Polymerase III Human genes 0.000 description 2
- 108010078067 RNA Polymerase III Proteins 0.000 description 2
- 230000004570 RNA-binding Effects 0.000 description 2
- 108091028664 Ribonucleotide Proteins 0.000 description 2
- 241000714474 Rous sarcoma virus Species 0.000 description 2
- 241000235070 Saccharomyces Species 0.000 description 2
- 241000235346 Schizosaccharomyces Species 0.000 description 2
- 241000700584 Simplexvirus Species 0.000 description 2
- 241000256251 Spodoptera frugiperda Species 0.000 description 2
- 241000193996 Streptococcus pyogenes Species 0.000 description 2
- 241000194020 Streptococcus thermophilus Species 0.000 description 2
- 241000282887 Suidae Species 0.000 description 2
- 210000001744 T-lymphocyte Anatomy 0.000 description 2
- 108091046869 Telomeric non-coding RNA Proteins 0.000 description 2
- 101001099217 Thermotoga maritima (strain ATCC 43589 / DSM 3109 / JCM 10099 / NBRC 100826 / MSB8) Triosephosphate isomerase Proteins 0.000 description 2
- 102000002933 Thioredoxin Human genes 0.000 description 2
- IQFYYKKMVGJFEH-XLPZGREQSA-N Thymidine Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 IQFYYKKMVGJFEH-XLPZGREQSA-N 0.000 description 2
- 241000078013 Trichormus variabilis Species 0.000 description 2
- DRTQHJPVMGBUCF-XVFCMESISA-N Uridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-XVFCMESISA-N 0.000 description 2
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 2
- 230000021736 acetylation Effects 0.000 description 2
- 238000006640 acetylation reaction Methods 0.000 description 2
- OIRDTQYFTABQOQ-KQYNXXCUSA-N adenosine Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O OIRDTQYFTABQOQ-KQYNXXCUSA-N 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 2
- 210000004102 animal cell Anatomy 0.000 description 2
- 210000004436 artificial bacterial chromosome Anatomy 0.000 description 2
- 210000001106 artificial yeast chromosome Anatomy 0.000 description 2
- 235000003704 aspartic acid Nutrition 0.000 description 2
- 230000037429 base substitution Effects 0.000 description 2
- OQFSQFPPLPISGP-UHFFFAOYSA-N beta-carboxyaspartic acid Natural products OC(=O)C(N)C(C(O)=O)C(O)=O OQFSQFPPLPISGP-UHFFFAOYSA-N 0.000 description 2
- 230000003115 biocidal effect Effects 0.000 description 2
- 210000004899 c-terminal region Anatomy 0.000 description 2
- 102100029387 cAMP-responsive element modulator Human genes 0.000 description 2
- 101710152311 cAMP-responsive element modulator Proteins 0.000 description 2
- 238000004113 cell culture Methods 0.000 description 2
- 230000001413 cellular effect Effects 0.000 description 2
- 230000000875 corresponding effect Effects 0.000 description 2
- 239000005547 deoxyribonucleotide Substances 0.000 description 2
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 2
- 238000001212 derivatisation Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000006471 dimerization reaction Methods 0.000 description 2
- 210000001671 embryonic stem cell Anatomy 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 108010021843 fluorescent protein 583 Proteins 0.000 description 2
- 108700025906 fos Genes Proteins 0.000 description 2
- HNDVDQJCIGZPNO-UHFFFAOYSA-N histidine Natural products OC(=O)C(N)CC1=CN=CN1 HNDVDQJCIGZPNO-UHFFFAOYSA-N 0.000 description 2
- 238000010348 incorporation Methods 0.000 description 2
- 210000003292 kidney cell Anatomy 0.000 description 2
- 239000002502 liposome Substances 0.000 description 2
- 244000144972 livestock Species 0.000 description 2
- 210000002540 macrophage Anatomy 0.000 description 2
- 210000005075 mammary gland Anatomy 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 201000001441 melanoma Diseases 0.000 description 2
- 238000010369 molecular cloning Methods 0.000 description 2
- 125000004573 morpholin-4-yl group Chemical group N1(CCOCC1)* 0.000 description 2
- 238000002703 mutagenesis Methods 0.000 description 2
- 231100000350 mutagenesis Toxicity 0.000 description 2
- 210000003098 myoblast Anatomy 0.000 description 2
- 201000008968 osteosarcoma Diseases 0.000 description 2
- 229920001184 polypeptide Polymers 0.000 description 2
- 108090000765 processed proteins & peptides Proteins 0.000 description 2
- 102000004196 processed proteins & peptides Human genes 0.000 description 2
- 238000000746 purification Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000010839 reverse transcription Methods 0.000 description 2
- 239000002336 ribonucleotide Substances 0.000 description 2
- 125000002652 ribonucleotide group Chemical group 0.000 description 2
- 229910052594 sapphire Inorganic materials 0.000 description 2
- 239000010980 sapphire Substances 0.000 description 2
- 230000035939 shock Effects 0.000 description 2
- 238000002741 site-directed mutagenesis Methods 0.000 description 2
- 230000009897 systematic effect Effects 0.000 description 2
- 238000010381 tandem affinity purification Methods 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 108060008226 thioredoxin Proteins 0.000 description 2
- 229940094937 thioredoxin Drugs 0.000 description 2
- 238000013519 translation Methods 0.000 description 2
- 108700026220 vif Genes Proteins 0.000 description 2
- 108091005957 yellow fluorescent proteins Proteins 0.000 description 2
- UHDGCWIWMRVCDJ-UHFFFAOYSA-N 1-beta-D-Xylofuranosyl-NH-Cytosine Natural products O=C1N=C(N)C=CN1C1C(O)C(O)C(CO)O1 UHDGCWIWMRVCDJ-UHFFFAOYSA-N 0.000 description 1
- YMHOBZXQZVXHBM-UHFFFAOYSA-N 2,5-dimethoxy-4-bromophenethylamine Chemical compound COC1=CC(CCN)=C(OC)C=C1Br YMHOBZXQZVXHBM-UHFFFAOYSA-N 0.000 description 1
- NEWKHUASLBMWRE-UHFFFAOYSA-N 2-methyl-6-(phenylethynyl)pyridine Chemical compound CC1=CC=CC(C#CC=2C=CC=CC=2)=N1 NEWKHUASLBMWRE-UHFFFAOYSA-N 0.000 description 1
- KQPKMEYBZUPZGK-UHFFFAOYSA-N 4-[(4-azido-2-nitroanilino)methyl]-5-(hydroxymethyl)-2-methylpyridin-3-ol Chemical compound CC1=NC=C(CO)C(CNC=2C(=CC(=CC=2)N=[N+]=[N-])[N+]([O-])=O)=C1O KQPKMEYBZUPZGK-UHFFFAOYSA-N 0.000 description 1
- 241000007910 Acaryochloris marina Species 0.000 description 1
- 241000589220 Acetobacter Species 0.000 description 1
- 241001135192 Acetohalobium arabaticum Species 0.000 description 1
- 208000010507 Adenocarcinoma of Lung Diseases 0.000 description 1
- 108700028369 Alleles Proteins 0.000 description 1
- 241000190857 Allochromatium vinosum Species 0.000 description 1
- 241000147155 Ammonifex degensii Species 0.000 description 1
- 241000187643 Amycolatopsis Species 0.000 description 1
- 235000002198 Annona diversifolia Nutrition 0.000 description 1
- 241001495180 Arthrospira Species 0.000 description 1
- 241001495183 Arthrospira sp. Species 0.000 description 1
- 241000282672 Ateles sp. Species 0.000 description 1
- 108091005950 Azurite Proteins 0.000 description 1
- 108091007065 BIRCs Proteins 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- DWRXFEITVBNRMK-UHFFFAOYSA-N Beta-D-1-Arabinofuranosylthymine Natural products O=C1NC(=O)C(C)=CN1C1C(O)C(O)C(CO)O1 DWRXFEITVBNRMK-UHFFFAOYSA-N 0.000 description 1
- 241001453380 Burkholderia Species 0.000 description 1
- 239000002126 C01EB10 - Adenosine Substances 0.000 description 1
- 108010040163 CREB-Binding Protein Proteins 0.000 description 1
- 238000010446 CRISPR interference Methods 0.000 description 1
- 101150018129 CSF2 gene Proteins 0.000 description 1
- 101150069031 CSN2 gene Proteins 0.000 description 1
- 101100381481 Caenorhabditis elegans baz-2 gene Proteins 0.000 description 1
- 102000000584 Calmodulin Human genes 0.000 description 1
- 108010041952 Calmodulin Proteins 0.000 description 1
- 241001496650 Candidatus Desulforudis Species 0.000 description 1
- 241000283707 Capra Species 0.000 description 1
- 241000700198 Cavia Species 0.000 description 1
- 102100025064 Cellular tumor antigen p53 Human genes 0.000 description 1
- 108091005944 Cerulean Proteins 0.000 description 1
- 206010008342 Cervix carcinoma Diseases 0.000 description 1
- 229920002101 Chitin Polymers 0.000 description 1
- 101000709520 Chlamydia trachomatis serovar L2 (strain 434/Bu / ATCC VR-902B) Atypical response regulator protein ChxR Proteins 0.000 description 1
- 241000195649 Chlorella <Chlorellales> Species 0.000 description 1
- 241000282552 Chlorocebus aethiops Species 0.000 description 1
- 241000867607 Chlorocebus sabaeus Species 0.000 description 1
- 241000579895 Chlorostilbon Species 0.000 description 1
- 108010077544 Chromatin Proteins 0.000 description 1
- 102100031668 Chromodomain Y-like protein Human genes 0.000 description 1
- 102100035371 Chymotrypsin-like elastase family member 1 Human genes 0.000 description 1
- 101710138848 Chymotrypsin-like elastase family member 1 Proteins 0.000 description 1
- 108091005960 Citrine Proteins 0.000 description 1
- 241000907165 Coleofasciculus chthonoplastes Species 0.000 description 1
- 241000699802 Cricetulus griseus Species 0.000 description 1
- 241000938605 Crocodylia Species 0.000 description 1
- 241000065716 Crocosphaera watsonii Species 0.000 description 1
- MIKUYHXYGGJMLM-GIMIYPNGSA-N Crotonoside Natural products C1=NC2=C(N)NC(=O)N=C2N1[C@H]1O[C@@H](CO)[C@H](O)[C@@H]1O MIKUYHXYGGJMLM-GIMIYPNGSA-N 0.000 description 1
- 241000238424 Crustacea Species 0.000 description 1
- 101150074775 Csf1 gene Proteins 0.000 description 1
- 241000159506 Cyanothece Species 0.000 description 1
- 108010060385 Cyclin B1 Proteins 0.000 description 1
- UHDGCWIWMRVCDJ-PSQAKQOGSA-N Cytidine Natural products O=C1N=C(N)C=CN1[C@@H]1[C@@H](O)[C@@H](O)[C@H](CO)O1 UHDGCWIWMRVCDJ-PSQAKQOGSA-N 0.000 description 1
- 241000701022 Cytomegalovirus Species 0.000 description 1
- NYHBQMYGNKIUIF-UHFFFAOYSA-N D-guanosine Natural products C1=2NC(N)=NC(=O)C=2N=CN1C1OC(CO)C(O)C1O NYHBQMYGNKIUIF-UHFFFAOYSA-N 0.000 description 1
- 230000007018 DNA scission Effects 0.000 description 1
- 102100036912 Desmin Human genes 0.000 description 1
- 108010044052 Desmin Proteins 0.000 description 1
- 229920002307 Dextran Polymers 0.000 description 1
- 241000255925 Diptera Species 0.000 description 1
- 108091005947 EBFP2 Proteins 0.000 description 1
- 108091005942 ECFP Proteins 0.000 description 1
- 101710099240 Elastase-1 Proteins 0.000 description 1
- 102100035074 Elongator complex protein 3 Human genes 0.000 description 1
- 208000001976 Endocrine Gland Neoplasms Diseases 0.000 description 1
- 102100037241 Endoglin Human genes 0.000 description 1
- 108010036395 Endoglin Proteins 0.000 description 1
- 241000588914 Enterobacter Species 0.000 description 1
- 241000283086 Equidae Species 0.000 description 1
- 101100007792 Escherichia coli (strain K12) casB gene Proteins 0.000 description 1
- 241000206602 Eukaryota Species 0.000 description 1
- 241000326311 Exiguobacterium sibiricum Species 0.000 description 1
- 108060002716 Exonuclease Proteins 0.000 description 1
- 102100037362 Fibronectin Human genes 0.000 description 1
- 108010067306 Fibronectins Proteins 0.000 description 1
- 241000192016 Finegoldia magna Species 0.000 description 1
- 102100032340 G2/mitotic-specific cyclin-B1 Human genes 0.000 description 1
- 108010001515 Galectin 4 Proteins 0.000 description 1
- 102100039556 Galectin-4 Human genes 0.000 description 1
- 241000287828 Gallus gallus Species 0.000 description 1
- 241000699694 Gerbillinae Species 0.000 description 1
- 102100039289 Glial fibrillary acidic protein Human genes 0.000 description 1
- 101710193519 Glial fibrillary acidic protein Proteins 0.000 description 1
- KOSRFJWDECSPRO-WDSKDSINSA-N Glu-Glu Chemical compound OC(=O)CC[C@H](N)C(=O)N[C@@H](CCC(O)=O)C(O)=O KOSRFJWDECSPRO-WDSKDSINSA-N 0.000 description 1
- 239000004471 Glycine Substances 0.000 description 1
- 244000060234 Gmelina philippensis Species 0.000 description 1
- 108010043121 Green Fluorescent Proteins Proteins 0.000 description 1
- 102000004144 Green Fluorescent Proteins Human genes 0.000 description 1
- 102100021519 Hemoglobin subunit beta Human genes 0.000 description 1
- 108091005904 Hemoglobin subunit beta Proteins 0.000 description 1
- 241000700721 Hepatitis B virus Species 0.000 description 1
- 102000008157 Histone Demethylases Human genes 0.000 description 1
- 108010074870 Histone Demethylases Proteins 0.000 description 1
- 102000011787 Histone Methyltransferases Human genes 0.000 description 1
- 108010036115 Histone Methyltransferases Proteins 0.000 description 1
- 102100022901 Histone acetyltransferase KAT2A Human genes 0.000 description 1
- 101710083341 Histone acetyltransferase KAT2B Proteins 0.000 description 1
- 101710116149 Histone acetyltransferase KAT5 Proteins 0.000 description 1
- 102100033071 Histone acetyltransferase KAT6A Human genes 0.000 description 1
- 102100033070 Histone acetyltransferase KAT6B Human genes 0.000 description 1
- 102100033068 Histone acetyltransferase KAT7 Human genes 0.000 description 1
Abstract
The present invention provides a synthetic nucleic acid composition and methods of use thereof, the synthetic nucleic acid composition comprising: i) a sequence encoding a CRISPR-Cas protein, ii) a sequence encoding a reverse transcriptase, and iii) a sequence encoding a cis-acting regulatory element.
Description
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application Ser. Nos. 63/243,423 and 63/363,247, both filed on 9 and 13, 2022 and 4 and 20, both of which are incorporated herein by reference in their entireties.
Background
Targeted genomic modifications are powerful tools for genetic manipulation of DNA, including manipulation of eukaryotic cells, embryos and animals. For example, an exogenous sequence may be integrated at a targeted genomic location and/or a particular endogenous DNA (e.g., chromosomal) sequence may be deleted, inactivated or modified. Before the advent of CRISPR/Cas9 methods (Perez-Pinera P, Ousterout DG, Gersbach CA, Advances in targeted genome editing, Curr Opin Chem Biol., 2012, 16(3-4):268-77; Hsu PD, Lander ES, Zhang F, Development and Applications of CRISPR-Cas9 for Genome Engineering, Cell, 2014, 157(6):1262-1278), such modifications relied on engineered nucleases, such as zinc finger nucleases (ZFNs) or transcription activator-like effector nucleases (TALENs). These chimeric nucleases contain a programmable sequence-specific DNA binding module linked to a non-specific DNA cleavage domain. However, each new genomic target requires the design of a new ZFN or TALEN that contains a new sequence-specific DNA binding module. Thus, these custom-designed nucleases tend to be expensive and time-consuming to prepare. Furthermore, the limited specificity of ZFNs and TALENs makes them prone to mediating off-target cleavage.
CRISPR/Cas9 technology greatly enhances the ability of researchers to target and manipulate DNA sequences, particularly eukaryotic sequences in vivo. However, CRISPR systems are not without their own limitations. For example, CRISPR/Cas9 systems function by creating double-strand breaks (DSBs) that allow for insertions, deletions, or base substitutions at the break site. However, DSBs are also associated with undesirable consequences including, for example, translocations. Furthermore, known pathological alleles originate from very precise (albeit inappropriate) insertions, deletions or base substitutions, which require precise gene editing to correct. Current techniques often lack the necessary precision and/or efficiency, or lead to unacceptable results.
Recently, Anzalone et al. introduced a CRISPR-based system called prime editing (Anzalone et al., Nature, December 5, 2019, 576:149-157). The prime editing system allows "search and replace" genome editing without double-strand breaks or donor DNA. The authors describe the system as enabling genome editing in human cells that mediates targeted insertions, deletions, all 12 possible base-to-base conversions, and combinations thereof (supra, page 149). However, as is known in the art, the factors affecting the efficiency of prime editing have not been widely studied (Kim et al., Nature Biotechnology, 2021, volume 39, 198-206). There is a need for compositions and methods that improve the efficiency of prime editing systems.
Disclosure of Invention
In various aspects, the invention provides compositions and methods that substantially increase the efficiency of a prime editing system.
In a non-limiting example, the prime editing system (PES) comprises a Cas9 (H840A) nickase-reverse transcriptase (RT) fusion protein and a prime editing guide RNA (pegRNA). The PE:pegRNA complex binds to the target DNA and nicks the strand containing the PAM (protospacer adjacent motif). The resulting 3' end hybridizes to the primer binding site of the pegRNA and then primes reverse transcription of new DNA containing the desired edit, using the reverse transcription template of the pegRNA. Equilibration between the edited 3' flap and the unedited 5' flap, cellular 5' flap cleavage and ligation, and DNA repair result in stably edited DNA (Anzalone et al., 2019) (see Fig. 1; prior art). Current prime editors, as exemplified by the system shown in Fig. 1, do not exhibit the desired high efficiency, which limits their further use in research and therapeutic applications.
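The relationship between the nicked strand, the primer binding site, and the reverse transcription template described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy sequence, and the PBS length are hypothetical and do not come from the application, the nick is assumed to lie 3 nucleotides 5' of the PAM (as is generally described for SpCas9), and sequences are written in the DNA alphabet (an actual pegRNA would carry U in place of T).

```python
# Illustrative sketch only; the function names, the toy sequence, and the default
# PBS length are hypothetical and are not taken from the application.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def design_extension(pam_strand: str, pam_index: int,
                     edited_downstream: str, pbs_length: int = 13) -> dict:
    """Derive the PBS and RT template of a pegRNA 3' extension.

    pam_strand:        PAM-containing strand, 5'->3'
    pam_index:         index of the first PAM base within pam_strand
    edited_downstream: desired sequence 3' of the nick (edit included), 5'->3'
    """
    nick = pam_index - 3  # assumption: the nick lies 3 nt 5' of the PAM
    if nick < pbs_length:
        raise ValueError("not enough sequence 5' of the nick for the PBS")
    upstream = pam_strand[nick - pbs_length:nick]  # 3' end freed by the nick
    pbs = reverse_complement(upstream)             # hybridizes to that 3' end
    rt_template = reverse_complement(edited_downstream)  # copied into the 3' flap
    # Written 5'->3', the extension is the RT template followed by the PBS.
    return {"pbs": pbs, "rt_template": rt_template,
            "extension": rt_template + pbs}


toy_strand = "GATTACAGATTACAGATTAC" + "AGG" + "TTT"  # protospacer + PAM + flank
design = design_extension(toy_strand, pam_index=20,
                          edited_downstream="GACAGG", pbs_length=8)
```

Because the extension is the reverse complement of the nicked 3' end joined to the edited sequence, taking the reverse complement of `design["extension"]` recovers the strand that reverse transcription is expected to produce in this toy case.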
Current prime editing techniques use, for example, a Cas9 (H840A) nickase-reverse transcriptase (RT) fusion protein that binds a prime editing guide RNA (pegRNA). The desired edit depends on the balance between the edited 3' flap and the unedited 5' flap. Because of the large size of the Cas9 (H840A) nickase-RT fusion protein, its stable and efficient expression in target cells remains a challenge, and this limits the desired editing obtained with the PES.
To address this issue, the inventors incorporated a cis-acting regulatory element (e.g., dENE or sRSM1) into the Cas9 (H840A) nickase-RT fusion expression cassette (Fig. 2) to improve its mRNA stability and protein expression. This is believed to greatly enhance the efficiency of prime editing as well as of any other gene editing technique that involves effector protein expression, including CRISPR-Cas9, CRISPR-Cas9 nickase, CRISPRi, CRISPRa, and the like. As shown in the Examples section below, the present invention greatly improves the efficiency of current prime editing techniques without changing the characteristics of the desired edit and without adding any additional components to the prime editing complex.
Accordingly, the present invention relates to compositions and methods for substantially improving prime editing efficiency.
In one aspect, the invention contemplates a synthetic nucleic acid composition comprising: i) a sequence encoding a CRISPR-Cas protein, ii) a sequence encoding a reverse transcriptase, and iii) a sequence encoding a cis-acting regulatory element.
The CRISPR-Cas protein encoded by the synthetic nucleic acid composition of the invention may be any CRISPR-Cas protein known to one of ordinary skill in the art. In one aspect of the invention, the CRISPR-Cas protein is nCas-H840A.
The reverse transcriptase encoded by the synthetic nucleic acid composition may be any reverse transcriptase known to one of ordinary skill in the art. In one aspect of the invention, the reverse transcriptase is M-MLV-RT.
The cis-acting regulatory element encoded by the synthetic nucleic acid compositions of the present invention may be any cis-acting regulatory element known to those skilled in the art. In one aspect of the invention, the cis-acting regulatory element is dENE, ENE or sRSM1.
In one aspect of the invention, the synthetic nucleic acid composition of the invention is DNA.
In one aspect of the invention, the synthetic nucleic acid composition of the invention is RNA.
The invention contemplates that the synthetic nucleic acid compositions of the invention further comprise an expression promoter.
The invention further contemplates the synthetic nucleic acid compositions of the invention in an expression vector.
The invention further contemplates the incorporation of the synthetic nucleic acid compositions of the invention into transfected viruses.
The invention further contemplates that the cis-acting regulatory elements of the synthetic nucleic acid compositions of the invention are located after the stop codon of the CRISPR-Cas9 sequence and before the mRNA terminator.
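The placement rule stated above (cis-acting regulatory element after the stop codon and before the mRNA terminator) can be expressed as a simple ordering check. The following is a minimal sketch, assuming a cassette is modeled as an ordered list of labeled elements; the `Element` class, the `kind` labels, and the example promoter and terminator names are hypothetical and are not part of the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    kind: str  # hypothetical labels: "promoter", "cds", "stop_codon", "cis_element", "terminator"


def cis_element_correctly_placed(cassette: list) -> bool:
    """True if the cis-acting element lies after the stop codon and before the terminator."""
    kinds = [element.kind for element in cassette]
    try:
        stop = kinds.index("stop_codon")
        cis = kinds.index("cis_element")
        terminator = kinds.index("terminator")
    except ValueError:
        return False  # a required element is missing from the cassette
    return stop < cis < terminator


# Example layout, 5'->3'; promoter and terminator names are placeholders.
cassette = [
    Element("promoter", "promoter"),
    Element("nCas9 (H840A)", "cds"),
    Element("M-MLV-RT", "cds"),
    Element("stop", "stop_codon"),
    Element("dENE", "cis_element"),
    Element("terminator", "terminator"),
]
# Same elements with the cis-acting element moved upstream of the coding sequence.
misplaced = [cassette[0], cassette[4], cassette[1], cassette[2], cassette[3], cassette[5]]
```

In this sketch `cis_element_correctly_placed(cassette)` holds, while the `misplaced` layout fails the check because the element precedes the stop codon.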
The invention further contemplates that the synthetic nucleic acid composition of the invention further comprises a prime editing guide RNA (pegRNA), wherein the pegRNA is derived from one of PE1, PE2, and PE3.
The invention further contemplates amino acid sequences encoded by the synthetic nucleic acid compositions of the invention.
The invention also relates to methods of use. In one aspect, the invention features a method of modifying an endogenous DNA sequence, the method comprising: providing: i) an operable expression vector comprising a synthetic nucleic acid composition comprising: 1) a sequence encoding a CRISPR-Cas type II system protein, 2) a sequence encoding a reverse transcriptase, and 3) a sequence comprising a cis-acting regulatory element; ii) a prime editing guide RNA (pegRNA) comprising a primer binding site (PBS); and iii) a cell comprising a target endogenous DNA sequence that is at least 50% complementary to the PBS; transfecting the cell comprising the target endogenous DNA sequence with the synthetic nucleic acid composition of the invention and the pegRNA; and culturing the transfected cell such that the desired modification is made to the endogenous DNA sequence.
The invention further contemplates that the CRISPR-Cas type II system protein encoded by the synthetic nucleic acid composition used in the methods of the invention can be any CRISPR-Cas type II system protein known to one of ordinary skill in the art. In one aspect of the invention, the CRISPR-Cas type II system protein is a Cas9 protein.
The methods of the invention further contemplate that the endogenous DNA sequence is at least 75% complementary to the PBS.
The methods of the invention further contemplate that the endogenous DNA sequence is at least 90% complementary to the PBS.
The methods of the invention further contemplate that the endogenous DNA sequence is at least 95% complementary to the PBS.
The methods of the invention further contemplate that the endogenous DNA sequence is at least 98% complementary to the PBS.
The methods of the invention further contemplate that the endogenous DNA sequence is 100% complementary to the PBS.
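The complementarity thresholds recited above (50%, 75%, 90%, 95%, 98%, 100%) can be illustrated with a short calculation. The sketch below is one possible way to compute percent complementarity, assuming an ungapped, antiparallel comparison of an RNA PBS against an equal-length DNA target; the application does not specify how the percentage is computed, and the function names and toy sequences are hypothetical.

```python
# Watson-Crick pairs between an RNA base (first) and a DNA base (second).
_RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(pbs_rna: str, target_dna: str) -> float:
    """Percent of PBS positions that pair with the target, read antiparallel.

    Both sequences are written 5'->3'. Assumption: equal lengths, no gaps.
    """
    if not pbs_rna or len(pbs_rna) != len(target_dna):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        1
        for rna_base, dna_base in zip(pbs_rna.upper(), reversed(target_dna.upper()))
        if (rna_base, dna_base) in _RNA_DNA_PAIRS
    )
    return 100.0 * paired / len(pbs_rna)


def meets_threshold(pbs_rna: str, target_dna: str, minimum_percent: float) -> bool:
    """True if the PBS is at least minimum_percent complementary to the target."""
    return percent_complementarity(pbs_rna, target_dna) >= minimum_percent


perfect = percent_complementarity("AUCUGUAA", "TTACAGAT")       # fully paired toy PBS
one_mismatch = percent_complementarity("AUCUGUAC", "TTACAGAT")  # last PBS base changed
```

With these toy 8-nucleotide sequences, a single mismatch leaves 7 of 8 positions paired, so the pair would satisfy the 75% threshold but not the 90% threshold.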
The methods of the invention further contemplate that the CRISPR-Cas protein can be any CRISPR-Cas protein known to one of ordinary skill in the art. In one aspect of the invention, it is nCas-H840A.
It is further contemplated that the reverse transcriptase of the methods of the present invention may be any reverse transcriptase known to one of ordinary skill in the art. In one aspect of the invention, it is M-MLV-RT.
It is further contemplated that the cis-acting regulatory element of the methods of the present invention may be any cis-acting regulatory element known to those of skill in the art. In one aspect of the invention, the cis-acting regulatory element is selected from the group consisting of dENE, ENE, and sRSM1.
The methods of the invention further contemplate that the operable expression vector encoding a synthetic nucleic acid of the invention is DNA.
The methods of the invention further contemplate that the operable expression vector encoding a synthetic nucleic acid of the invention is RNA.
The methods of the invention further contemplate the incorporation of the synthetic nucleic acid compositions of the invention into transfected viruses.
The methods of the invention further contemplate that the cis-acting regulatory element of the synthetic nucleic acid composition of the invention is located after the stop codon of the CRISPR-Cas9 sequence and before the mRNA terminator.
The methods of the invention further contemplate that the pegRNA is derived from one of PE1, PE2, and PE3.
The methods of the invention further contemplate introducing a CRISPR/Cas type II system protein encoded in an operable expression vector into a cell.
Drawings
FIG. 1 shows a diagram of the prime editing technique as known in the prior art. nCas9 (H840A) = Cas9 (H840A) nickase; RT = reverse transcriptase; PBS = primer binding site.
FIG. 2 shows a diagram of the introduction of cis-acting regulatory elements into a prime editing expression cassette.
FIG. 3 shows that PE2-dENE enhances the editing efficiency of PE2.
FIG. 4 shows that PE3-dENE enhances the editing efficiency of PE3.
FIG. 5 shows that the 3'-UTR dENE improves PE editing efficiency for the HEK3 target in K562 cells.
FIG. 6 shows that the 3'-UTR dENE does not improve PE editing efficiency for the HEK3 target in HEK293 cells.
Detailed Description
Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The following references provide the skilled artisan with a general definition of many of the terms used in the present invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). The following terms as used herein have their assigned meanings unless otherwise indicated.
When introducing elements of the present disclosure or the preferred embodiments thereof, the articles "a," "an," "the," and "said" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.
The transitional phrases "comprising," "consisting essentially of," and "consisting of" have the meanings given in MPEP 2111.03 (Manual of Patent Examining Procedure; United States Patent and Trademark Office). Any claim using the transitional phrase "consisting essentially of" will be understood to recite only the essential elements of the invention, and any other elements recited in the dependent claims are understood to be non-essential to the invention recited in the claim from which they depend.
As used herein, the term "endogenous sequence" refers to the original chromosomal sequence of a cell.
As used herein, the term "exogenous" refers to a chromosomal sequence that is not native to the cell, or to a chromosomal sequence whose natural location in the genome of the cell is at a different chromosomal location.
"Gene" as used herein refers to the DNA region (including exons and introns) encoding a gene product, as well as all the DNA regions that regulate the production of the gene product, whether or not such regulatory sequences are adjacent to the coding and/or transcribed sequences. Thus, genes include, but are not necessarily limited to, promoter sequences, terminators, translational regulatory sequences, such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, border elements, origins of replication, matrix attachment sites, and locus control regions.
The term "heterologous" refers to an entity that is not endogenous or native to the cell of interest. For example, a heterologous protein refers to a protein that is derived or originally derived from an exogenous source (such as an exogenously introduced nucleic acid sequence). In some cases, the heterologous protein is not normally produced by the cell of interest.
The terms "nucleic acid" and "polynucleotide" refer to polymers of deoxyribonucleotides or ribonucleotides in either a linear or circular conformation, and in either single- or double-stranded form. For the purposes of this disclosure, these terms should not be construed as limiting the length of the polymer. The terms may include known analogs of natural nucleotides, as well as nucleotides modified in the base, sugar and/or phosphate moieties (e.g., phosphorothioate backbones). Typically, an analog of a particular nucleotide has the same base pairing specificity; i.e., an analog of A will base pair with T.
The term "synthetic nucleic acid" refers to a nucleotide sequence synthesized in vitro (e.g., in a laboratory, and manually or with a nucleic acid synthesizer), and wherein the sequence is not found in nature. The sequence may be, for example, DNA or RNA or modifications thereof as described below, may be of any length, and may be any nucleotide sequence, provided that the sequence is not naturally occurring.
The term "nucleotide" refers to deoxyribonucleotide or ribonucleotide. The nucleotides may be standard nucleotides (i.e., adenosine, guanosine, cytidine, thymidine, and uridine) or nucleotide analogs. Nucleotide analogs refer to nucleotides having a modified purine or pyrimidine base or modified ribose moiety. Nucleotide analogs can be naturally occurring nucleotides (e.g., inosine) or non-naturally occurring nucleotides. Non-limiting examples of modifications to the sugar or base portion of a nucleotide include the addition (or removal) of acetyl, amino, carboxyl, carboxymethyl, hydroxyl, methyl, phosphoryl, and thiol groups, as well as the substitution of carbon and nitrogen atoms of the base with other atoms (e.g., 7-deazapurine). Nucleotide analogs also include dideoxynucleotides, 2' -O-methyl nucleotides, locked Nucleic Acids (LNA), peptide Nucleic Acids (PNA), and morpholino oligonucleotides (morpholinos).
The terms "polypeptide" and "protein" are used interchangeably and refer to a polymer of amino acid residues.
Techniques for determining nucleic acid and amino acid sequence identity are known in the art. Typically, such techniques involve determining the nucleotide sequence of the mRNA of a gene and/or determining the amino acid sequence encoded thereby, and comparing these sequences to a second nucleotide or amino acid sequence. Genomic sequences may also be determined and compared in this manner. In general, identity refers to the exact nucleotide-to-nucleotide or amino acid-to-amino acid correspondence between two polynucleotide or polypeptide sequences, respectively. Two or more sequences (polynucleotide or amino acid) may be compared by determining their percent identity. The percent identity of two sequences (whether nucleic acid sequences or amino acid sequences) is the number of exact matches between the two aligned sequences divided by the length of the shorter sequence and multiplied by 100. An approximate alignment of nucleic acid sequences is provided by the local homology algorithm of Smith and Waterman, Advances in Applied Mathematics 2:482-489 (1981). This algorithm can be applied to amino acid sequences by using the scoring matrix developed by Dayhoff, Atlas of Protein Sequences and Structure, M.O. Dayhoff ed., 5 suppl. 3:353-358, National Biomedical Research Foundation, Washington, D.C., USA, and normalized by Gribskov, Nucl. Acids Res. 14(6):6755-6763 (1986). An exemplary implementation of this algorithm to determine percent identity of sequences is provided by the Genetics Computer Group (Madison, Wis.) in the "BestFit" utility application. Other suitable programs for calculating percent identity or similarity between sequences are generally known in the art; for example, another alignment program is BLAST, used with default parameters.
For example, BLASTN and BLASTP may be used with the following default parameters: genetic code = standard; filter = none; strand = both; cutoff = 60; expect = 10; matrix = BLOSUM62; descriptions = 50 sequences; sort by = high score; databases = non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank CDS translations + Swiss protein + Spupdate + PIR. Details of these programs can be found on the GenBank website.
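The percent identity calculation defined above (exact matches between two aligned sequences, divided by the length of the shorter sequence, multiplied by 100) can be illustrated with a minimal Python sketch. The function name, the use of "-" as the gap character, and the example sequences are assumptions made only for illustration; the sketch presumes the sequences have already been aligned by a program such as BestFit or BLAST and does not perform an alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Exact matches are counted column by column; the denominator is the
    length of the shorter sequence with gap characters removed.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    # A column counts as a match only if both residues are identical and not gaps.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    shorter = min(len(a.replace(gap, "")), len(b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Hypothetical aligned sequences: 7 of 8 positions match.
print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

The same function applies to aligned amino acid sequences, since it only compares characters column by column.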
As various changes could be made in the above cells and methods without departing from the scope of the invention, it is intended that all matter contained in the above description and examples set forth below shall be interpreted as illustrative and not in a limiting sense.
Prime editing system
The prime editing system (PES) is an improvement over CRISPR/Cas9 technology. As first described by Anzalone et al., PES uses a prime editing guide RNA (pegRNA) to guide the CRISPR/Cas9 complex to the desired target site in the genome. The pegRNA is described (Marzec et al., Trends in Cell Biology, April 2020, 33:4, 257-259) as containing not only a spacer region complementary to the target DNA strand, but also a primer binding site (PBS) region and the sequence to be introduced into the targeted DNA region. The PBS is complementary to the second DNA strand and serves to create a primer for the reverse transcriptase (RT) linked to the Cas9 nickase. RT is an RNA-dependent DNA polymerase that uses the sequence from the pegRNA as a template. The sequence is copied directly from the pegRNA into the target DNA sequence, thereby altering the target sequence in the desired manner.
In a non-limiting example, prime editing (PE) comprises a Cas9 (H840A) nickase-reverse transcriptase (RT) fusion protein and a prime editing guide RNA (pegRNA). The PE:pegRNA complex binds to the target DNA and nicks the PAM-containing strand. The resulting 3' end hybridizes to the primer binding site and then primes reverse transcription of new DNA containing the desired edit, using the reverse transcription template of the pegRNA. Equilibration between the edited 3' flap and the unedited 5' flap, cleavage of the 5' flap and ligation by endogenous cellular machinery, and DNA repair result in stably edited DNA (Anzalone et al., 2019; see FIG. 1). To date, several versions of the prime editor (PE) have been developed. PE1 [SEQ ID NO:1] is defined by the use of a wild-type Moloney murine leukemia virus reverse transcriptase (M-MLV RT) fused to the C-terminus of a Cas9 (H840A) nickase. PE2 [SEQ ID NO:2] uses an engineered M-MLV RT. PE3 [SEQ ID NO:3] is defined by the introduction of an additional guide RNA to nick the unedited strand, which increases editing efficiency, although it also increases indel frequency. In PE3b (Anzalone et al.), this nicking single guide RNA (sgRNA) targets the edited sequence, thereby preventing nicking of the unedited strand until after editing has occurred, which results in fewer indels in mammalian cells.
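The relationship between the PBS of a pegRNA and the nicked DNA strand to which it hybridizes can be illustrated with a short Python sketch. It derives a fully complementary PBS for a given DNA stretch and computes the percent complementarity between a PBS and that stretch, which is the quantity referred to by the 50% to 100% complementarity thresholds recited for the methods of the invention. The function names and the 10-nucleotide sequences are hypothetical, and the position-by-position antiparallel comparison of equal-length sequences is a simplifying assumption rather than a definition taken from this specification.

```python
# Watson-Crick partner in DNA for each RNA base of the PBS.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}
# RNA base that pairs with each DNA base.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def design_pbs(target_dna: str) -> str:
    """Return the RNA sequence (5'->3') fully complementary to target_dna (5'->3')."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(target_dna.upper()))


def percent_complementarity(pbs_rna: str, target_dna: str) -> float:
    """Percent of PBS positions that base pair with the antiparallel DNA stretch."""
    pbs, target = pbs_rna.upper(), target_dna.upper()
    if len(pbs) != len(target) or not pbs:
        raise ValueError("sequences must be non-empty and of equal length")
    # The strands pair antiparallel, so the PBS is read against the reversed DNA.
    paired = sum(1 for r, d in zip(pbs, reversed(target)) if RNA_TO_DNA_PAIR.get(r) == d)
    return 100.0 * paired / len(pbs)


# Hypothetical 10-nt DNA stretch upstream of the nick (not from the sequence listing).
target = "ACGTACGTAC"
pbs = design_pbs(target)
print(pbs, percent_complementarity(pbs, target))
# A single mismatch in a 10-nt PBS lowers complementarity by 10 percentage points.
print(percent_complementarity("A" + pbs[1:], target))
```

In practice, PBS design also weighs length and melting temperature; the sketch covers only the complementarity measure.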
Cis-regulatory element
The present invention substantially improves the efficiency of a prime editing system by incorporating one or more cis-regulatory elements (CREs) into the system (see FIG. 2). Those of ordinary skill in the art will appreciate, in light of the present description, that cis-regulatory elements other than those specifically exemplified herein may also be suitable for use with the present invention, and that screening for such cis-regulatory elements by following the teachings of the present specification, without undue experimentation, is within the skill and knowledge of those of ordinary skill in the art.
As Wittkopp and Kalay explain (Nature Reviews Genetics, 13(1), pages 59-69), "cis-regulatory element" is a term for a collection of transcription factor binding sites and other non-coding DNA sufficient to activate (or repress) transcription in a defined spatial and/or temporal expression domain. Cis-regulatory elements are a class of cis-regulatory sequences required for the activation and maintenance of transcription. They consist of DNA (usually non-coding DNA) containing binding sites for transcription factors and other regulatory molecules. Promoters, enhancers and silencers are the most commonly recognized types of CRE.
However, promoters required for transcription in eukaryotes typically only produce basal levels of mRNA. Enhancers are more variable than promoters and help up-regulate expression and transcription.
Another way to view the regulation of gene expression by cis- and trans-regulatory elements is that a cis-regulatory element is typically a binding site for one or more trans-acting factors. Cis-regulatory elements are typically present on the same DNA molecule as the genes they regulate, whereas trans-regulatory elements can regulate genes distant from the genes from which they are transcribed. Transcription factors are one example of trans-acting factors.
Enhancers are CREs that affect (enhance) the transcription of genes on the same DNA molecule, and can be found upstream of, downstream of, within the introns of, or even relatively far from the genes they regulate. Multiple enhancers can act in a coordinated manner to regulate transcription of a gene (Wittkopp and Kalay, supra). Many whole-genome sequencing projects have revealed that enhancers are often transcribed as long non-coding RNA (lncRNA) or enhancer RNA (eRNA), changes in the levels of which are often correlated with changes in the level of the target gene mRNA (Melamed P., Yosefzun Y., et al., Transcription, March 2, 7(1):26-31).
While the inventors contemplate that any cis-acting regulatory element would convey benefit to CRISPR-based nucleic acid modification techniques, preferred non-limiting examples of suitable cis-acting regulatory elements are:
Element for Nuclear Expression (ENE): a U-rich internal loop (URIL) flanked by short duplexes, which confers RNA stability. Examples include the ENE from Kaposi's sarcoma-associated herpesvirus (KSHV), the ENE from human metastasis-associated lung adenocarcinoma transcript 1 (MALAT1), the ENE from multiple endocrine neoplasia beta (MENβ), and the like.
Double Element for Nuclear Expression (dENE): contains two URILs with predicted double-helical regions. Examples include the rice TWIFB1 dENE and its 20 mutants (M1 to M20), which are known in the art and are depicted in FIG. 5 of Torabi et al., "RNA stabilization by a poly(A) tail 3'-end binding pocket and other modes of poly(A)-RNA interaction", Science, 2021, 371(6529).
KSHV ENE sequence:
UGUUUUGGCUGGGUUUUUCCUUGUUCGCACCGGACACCU CCAGUGACCAGACGGCAAGGUUUUUAUCCCAGUGUAUAUU[SEQ ID NO:4]
Rhesus rhadinovirus (RRV) ENE sequence:
CGUUUGUGUUGGUUUUUAUGACCAGCUUGGUACAAAACC UGCUGGUGAUUUUUUACCCAACAAAUAAUAAAUAAAA[SEQ ID NO:5]
MALAT1 ENE sequence:
UAGGGUCAUGAAGGUUUUUCUUUUCCUGAGAAAACAACA CGUAUUGUUUUCUCAGGUUUUGCUUUUUGGCCUUUUUCUAGC UU[SEQ ID NO:6]
MALAT1 ENE + A-rich tract sequence:
UAGGGUCAUGAAGGUUUUUCUUUUCCUGAGAAAACAACA CGUAUUGUUUUCUCAGGUUUUGCUUUUUGGCCUUUUUCUAGC UUAAAAAAAAAAAAAGCAAAA[SEQ ID NO:7]
MALAT1 ENE + A-rich tract + mascRNA sequence:
UAGGGUCAUGAAGGUUUUUCUUUUCCUGAGAAAACAACACGUAUUGUUUUCUCAGGUUUUGCUUUUUGGCCUUUUUCUAGCUUAAAAAAAAAAAAAGCAAAAGAUGCUGGUGGUUGGCACUCCUGGUUUCCAGGACGGGGUUCAAAUCCCUGCGGCGUCUUUGCUUUGACU[SEQ ID NO:8]
MALAT1 ENE + A-rich tract variant sequence:
GAAGGUUUUUCUUUUCCUGAGAAAACAACACGUAUUGUU UUCUCAGGUUUUGCUUUUUGGCCUUUUUCUAGCUUAAAAAAA AAAAAAGCAAAA[SEQ ID NO:9]
MENβ ENE sequence:
GCCGCCGCAGGUGUUUCUUUUACUGAGUGCAGCCCAUGG CCGCACUCAGGUUUUGCUUUUCACCUUCCCAUCUG[SEQ ID NO:10]
MENβ ENE + A-rich tract sequence:
GCCGCCGCAGGUGUUUCUUUUACUGAGUGCAGCCCAUGG CCGCACUCAGGUUUUGCUUUUCACCUUCCCAUCUGUGAAAGA GUGAGCAGGAAAAAGCAAAA[SEQ ID NO:11]
MENβ ENE + A-rich tract variant sequence:
AGGUGUUUCUUUUACUGAGUGCAGCCCAUGGCCGCACUC AGGUUUUGCUUUUCACCUUCCCAUCUGUGAAAGAGUGAGCAG GAAAAAGCAAAA[SEQ ID NO:12]
Rice TWIFB1 dENE sequence: UGUUGGCUGUACUCUUUUCUUUGUCAUGGUUUUCUCAAAUAU GAGUUUUUACAUGACAAAGUUUUUAACGAGGCAGCAUGUA [ SEQ ID NO:13].
MCDiV ENE sequence:
GAGUGUAACUCAACAGUUUUUCCUAACCACGCGUCGCGU GGCAGGUUUUUUAAUCUGAGAGUUACAUUC[SEQ ID NO:14]
ATCOPIA27_ ATh-I ENE sequence:
GUGCUGUACUCUUUUUCCUCACUAUGGUUUUGUCCCGAA AGGGUUUUCCUAGUAAGGUUUUAAUGAGGCAGCAU[SEQ ID NO:15]
TUCP _ ZMa ENE sequence:
GGCUGUACUCUUUUUUCCUGUCUAGGGUUUCUCACAAGG GUGAGUUUUACCUAGACAGGUUUUUAACGAGGCAACC[SEQ ID NO:16]
Other ENEs or dENEs, and variants or mutants thereof, are known in the art and are described in Tycowski et al., "Conservation of a Triple-Helix-Forming RNA Stability Element in Noncoding and Genomic RNAs of Diverse Viruses", Cell Rep., 2012, 2:26-32, and Tycowski et al., "Myriad Triple-Helix-Forming Structures in the Transposable Element RNAs of Plants and Fungi", Cell Rep., 2016, 15:1266-1276.
Computational frameworks such as TEISER (Tool for Eliciting Informative Structural Elements in RNA) have been used to identify structural RNA stabilizing motif 1 (sRSM1), the statistically most significant 3' UTR element that stabilizes RNA. These are known in the art and are described in Goodarzi et al., "Systematic discovery of structural elements governing stability of mammalian messenger RNAs", Nature, 2012, 485:264.
Structural RNA stabilizing motif 1 (sRSM 1) sequence set 1:
AAAACUAUUUUGAAGAUGGUGGUGAGCUGCAAAAUAGCUGGAUGGAUUUGAAUGAUUGGGAUGAUACAUCAUUGAACUGCACUUUAUAUAACCAAAGCUUAGCAGUUUGUUAGAUAAGAGUCUAUGUAUGUCUCUGGUUAGGAUGAAGUUAAUUUUAUGUUUUUAACAUGGUAUUUUUGAAGGAGCUAAUGAAACACUGG[SEQ ID NO:17]
Structural RNA stabilizing motif 1 (sRSM 1) sequence set 2:
AUUGUUUCUGGAAACUGCUUGCCAAGACAACAUUUAUUAACUGUUAGAACACUUGCUUUAUGUUUGUGUGUACAUAUUUUCCACAAAUGUUAUAAUUUAUAUAGUGUGGUUGAACAGGAUGCAAUCUUUUGUUGUCUAAAGGUGCUGCAGUUAAAAAAAAAACAACCUUUUCUUUCAAUAUGGCAUGUAGUGGAGUUUUU[SEQ ID NO:18]
Other sRSM sequences are known to those skilled in the art, examples of which can be found in Goodarzi et al., "Systematic discovery of structural elements governing stability of mammalian messenger RNAs", Nature, 2012, 485:264.
Other suitable 3' UTR sequences are known to those of ordinary skill in the art and include, but are not limited to, the c-fos gene and v-fos gene 3' UTRs, the CD47 3' UTR, the BIRC 3' UTR, the beta-actin 3' UTR, the beta-globin 3' UTR, the Hmga2 3' UTR, the cam 2a 3' UTR, the cyclin B1 3' UTR, and U-rich motifs associated with increased mRNA stability.
Other cis-acting regulatory elements are known to those of ordinary skill in the art and are incorporated herein. Those skilled in the art will be able to identify and optimize suitable cis-acting regulatory elements without undue experimentation in light of the teachings of the present specification.
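As an illustration of the cassette layout described for the methods of the invention, in which the cis-acting regulatory element is located after the stop codon of the coding sequence and before the mRNA terminator, the following Python sketch assembles a DNA-level cassette from three parts. The function names, the toy coding sequence and the placeholder terminator are hypothetical and are not taken from the sequence listing; only the rice TWIFB1 dENE is copied from SEQ ID NO:13 above, converted from RNA to DNA notation. The sketch is a string-level illustration under these assumptions, not a validated construct design.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

# Rice TWIFB1 dENE as listed above (SEQ ID NO:13), in RNA notation.
RICE_TWIFB1_DENE_RNA = (
    "UGUUGGCUGUACUCUUUUCUUUGUCAUGGUUUUCUCAAAUAU"
    "GAGUUUUUACAUGACAAAGUUUUUAACGAGGCAGCAUGUA"
)


def clean(seq: str) -> str:
    """Remove whitespace (e.g., line-wrap spaces) and upper-case a sequence."""
    return "".join(seq.split()).upper()


def rna_to_dna(seq: str) -> str:
    """Convert an RNA sequence to DNA notation for use in a DNA vector."""
    return clean(seq).replace("U", "T")


def build_cassette(cds_dna: str, cre_rna: str, terminator_dna: str) -> str:
    """Place the cis-acting element after the stop codon and before the terminator."""
    cds = clean(cds_dna)
    if len(cds) % 3 != 0 or cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence must be in frame and end with a stop codon")
    return cds + rna_to_dna(cre_rna) + clean(terminator_dna)


# Hypothetical toy parts: a 3-codon ORF and a placeholder terminator signal.
cassette = build_cassette("ATGGCCTAA", RICE_TWIFB1_DENE_RNA, "AATAAA")
print(len(cassette))
```

Any of the other ENE, dENE or sRSM sequences listed above could be passed as the `cre_rna` argument in the same way.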
CRISPR/Cas proteins and systems
An understanding of CRISPR/Cas protein systems, both in general and in the context of the invention, is helpful for understanding the invention.
Prime editing guide RNA
As described above, several variations of the prime editor (PE) have been developed. The PE contains a reverse transcriptase fused to an RNA-programmable nickase, together with a prime editing guide RNA (pegRNA), to copy genetic information directly from the extension on the pegRNA into the target genomic locus. The pegRNA therefore "guides" the PE editing machinery to a specific site (the target DNA), where a single strand of the double-stranded DNA is cleaved by the Cas9 enzyme. The pegRNA also contains the sequence encoding the desired edit to the target DNA. Depending on the design of the pegRNA, PE can precisely and efficiently swap any single DNA letter for any other, and can make deletions and insertions. One of ordinary skill in the art will understand how to construct the appropriate pegRNA for a particular target site.
RNA-guided endonucleases
An RNA-guided endonuclease, such as Cas9, may comprise at least one nuclear localization signal, at least one nuclease domain, and at least one domain that interacts with a pegRNA to target the endonuclease to a particular nucleotide sequence for cleavage. Nucleic acids encoding RNA-guided endonucleases are also known, as are methods of modifying chromosomal sequences of eukaryotic cells or embryos using RNA-guided endonucleases. The RNA-guided endonuclease interacts with specific pegRNAs, each of which directs the endonuclease to a specific target site, where the RNA-guided endonuclease introduces a strand break that can be repaired by a DNA repair process such that the chromosomal sequence is modified. Since the pegRNA provides the specificity, the RNA-guided endonuclease is universal and can be used with different pegRNAs to target different genomic sequences. The methods disclosed herein can be used to target and modify specific chromosomal sequences and/or introduce exogenous sequences (or delete endogenous sequences) at targeted locations in the genome of a cell or embryo. Furthermore, the targeting is specific, with limited off-target effects.
The present disclosure provides fusion proteins, wherein the fusion proteins comprise a CRISPR/Cas-like protein or a fragment thereof and an effector domain. Suitable effector domains include, but are not limited to, cleavage domains, epigenetic modification domains, transcriptional activation domains, and transcriptional repressor domains. Each fusion protein is directed by a specific pegRNA to a specific chromosomal sequence, where the effector domain mediates targeted genomic modification or gene regulation. In one aspect, the fusion proteins may act as dimers, increasing the length of the target site and increasing the likelihood that it is unique in the genome (thus reducing off-target effects). For example, endogenous CRISPR systems modify genomic positions based on DNA binding word lengths of about 13-20 bp (Cong et al., Science, 339:819-823). At this word length, only 5-7% of the target sites within the genome are unique (Iseli et al., PLoS One 2(6):e579). In contrast, zinc finger nucleases typically have a DNA binding word length in the range of 30-36 bp, resulting in about 85-87% unique target sites within the human genome. The smaller size of the DNA binding sites utilized by CRISPR-based systems limits and complicates the design of targeted CRISPR-based nucleases near desired locations (such as disease SNPs, small exons, start and stop codons, and other locations within complex genomes). The present disclosure provides not only means for extending the CRISPR DNA binding word length (i.e., to limit off-target activity), but also CRISPR fusion proteins with modified functionality. Thus, the disclosed CRISPR fusion proteins have increased target specificity and unique functionality. Also provided herein are methods of using the fusion proteins to modify or regulate expression of targeted chromosomal sequences.
The RNA-guided endonuclease may comprise at least one nuclear localization signal that allows the endonuclease to enter the nucleus of eukaryotic cells and embryos (such as non-human single-cell embryos). The RNA-guided endonuclease further comprises at least one nuclease domain and at least one domain that interacts with the pegRNA. The RNA-guided endonuclease is directed to a specific nucleic acid sequence (or target site) by the pegRNA. The pegRNA interacts with both the RNA-guided endonuclease and the target site such that, once directed to the target site, the RNA-guided endonuclease is able to introduce a strand break into the target site nucleic acid sequence. Since the pegRNA provides the specificity for targeted cleavage, the RNA-guided endonuclease is universal and can be used with different pegRNAs to cleave different target nucleic acid sequences. The RNA-guided endonuclease may be provided as an isolated protein, as an isolated nucleic acid (i.e., RNA or DNA) encoding the RNA-guided endonuclease, as a vector comprising a nucleic acid encoding the RNA-guided endonuclease, or as a protein-RNA complex comprising the RNA-guided endonuclease plus the pegRNA.
RNA-guided endonucleases can be derived from Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) systems. The CRISPR/Cas system may be a type I, type II or type III system. Non-limiting examples of suitable CRISPR/Cas proteins include Cas3, Cas4, Cas5e (or CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9, Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (or CasA), Cse2 (or CasB), Cse3 (or CasE), Cse4 (or CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csz1, Csx15, Csf1, Csf2, Csf3, Csf4 and Cu1966.
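The effect of DNA binding word length on target-site uniqueness can be illustrated with a toy Python sketch that counts, for a given word length k, the fraction of positions in a sequence whose k-mer occurs exactly once. The function name and the short example sequence are hypothetical, and the sketch considers only one strand of a toy sequence, so it does not reproduce the genome-scale percentages cited above; it only shows the direction of the effect, namely that longer words tend to be more often unique.

```python
from collections import Counter


def unique_site_fraction(sequence: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs exactly once."""
    seq = sequence.upper()
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and the sequence length")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    unique_positions = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique_positions / len(kmers)


# Hypothetical toy sequence with a repetitive stretch.
toy = "AAAAACGT"
for k in (2, 4):
    print(k, round(unique_site_fraction(toy, k), 3))
```

A genome-scale version of this count would also need to consider the reverse strand and, for Cas9, the requirement for an adjacent PAM.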
In one embodiment, the RNA-guided endonuclease is derived from a type II CRISPR/Cas system. In particular embodiments, the RNA-guided endonuclease is derived from a Cas9 protein. The Cas9 protein may be from Streptococcus pyogenes, Streptococcus thermophilus, Streptomyces sp., Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, a bacterium of the order Burkholderiales, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii, Caldicelulosiruptor becscii, Candidatus Desulforudis, Clostridium botulinum, Clostridium difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum thermopropionicum, Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum, Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsoni, Pseudoalteromonas haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis, Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis, Thermosipho africanus or Acaryochloris marina.
Typically, the CRISPR/Cas protein comprises at least one RNA recognition and/or RNA binding domain. The RNA recognition and/or RNA binding domain interacts with the guide RNA. The CRISPR/Cas protein may also comprise nuclease domains (i.e., DNase or RNase domains), DNA binding domains, helicase domains, RNase domains, protein-protein interaction domains, dimerization domains, and other domains.
The CRISPR/Cas-like protein may be a wild-type CRISPR/Cas protein, a modified CRISPR/Cas protein, or a fragment of a wild-type or modified CRISPR/Cas protein. The CRISPR/Cas-like protein may be modified to increase nucleic acid binding affinity and/or specificity, alter an enzymatic activity, and/or change another property of the protein. For example, the nuclease (i.e., DNase, RNase) domains of the CRISPR/Cas-like protein can be modified, deleted, or inactivated. Alternatively, the CRISPR/Cas-like protein may be truncated to remove domains not essential for the function of the fusion protein. The CRISPR/Cas-like protein may also be truncated or modified to optimize the activity of the effector domain of the fusion protein.
In some embodiments, the CRISPR/Cas-like protein may be derived from a wild-type Cas9 protein or a fragment thereof. In other embodiments, the CRISPR/Cas-like protein may be derived from a modified Cas9 protein. For example, the amino acid sequence of the Cas9 protein may be modified to alter one or more properties of the protein (e.g., nuclease activity, affinity, stability, etc.). Alternatively, the domain of the Cas9 protein that is not involved in RNA-guided cleavage may be removed from the protein such that the modified Cas9 protein is smaller than the wild-type Cas9 protein.
Typically, the Cas9 protein comprises at least two nuclease (i.e., dnase) domains. For example, the Cas9 protein may comprise a RuvC-like nuclease domain and an HNH-like nuclease domain. RuvC and HNH domains work together to cleave single strands to create double strand breaks in DNA. (Jinek et al, science, 337:816-821). In some embodiments, cas 9-derived proteins may be modified to contain only one functional nuclease domain (RuvC-like or HNH-like nuclease domain). For example, the Cas 9-derived protein may be modified such that one of the nuclease domains is deleted or mutated such that it is no longer functional (i.e., there is no nuclease activity). In some embodiments where one of the nuclease domains is inactive, the Cas 9-derived protein is capable of introducing a nick into double-stranded nucleic acid (such proteins are referred to as "nickases"), but does not cleave double-stranded DNA. For example, conversion of aspartic acid to alanine (D10A) in the RuvC-like domain converts Cas 9-derived protein to a nickase. Likewise, the conversion of histidine to alanine (H840A or H839A) in the HNH domain converts Cas 9-derived proteins to nickases. Each nuclease domain can be modified using well-known methods such as site-directed mutagenesis, PCR-mediated mutagenesis, and total gene synthesis, among other methods known in the art.
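The point substitutions named above (for example D10A or H840A) follow the usual notation of original residue, 1-based position, and replacement residue. As a minimal Python sketch of that notation, the function below applies such a substitution to an amino acid sequence and checks that the original residue is actually present at the stated position. The function name and the 14-residue toy sequence are hypothetical; the toy sequence is not a Cas9 sequence and merely has an aspartate at position 10 and a histidine at position 13.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a point substitution such as 'D10A' (1-based position) to a protein."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation.upper())
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    seq = protein.upper()
    if not 1 <= position <= len(seq):
        raise ValueError("position is outside the protein sequence")
    # Guard against numbering errors: the stated original residue must be present.
    if seq[position - 1] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {seq[position - 1]}"
        )
    return seq[:position - 1] + replacement + seq[position:]


# Hypothetical toy sequence: D at position 10, H at position 13.
toy_protein = "MAKTLSGELDNGHQ"
print(apply_substitution(toy_protein, "D10A"))
print(apply_substitution(toy_protein, "H13A"))
```

For a real Cas9 sequence, the position numbering depends on the reference protein, which is presumably why the text gives both H840A and H839A for the HNH-domain substitution.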
The RNA-guided endonuclease may comprise at least one nuclear localization signal. Typically, the NLS comprises a stretch of basic amino acids. Nuclear localization signals are known in the art (see, e.g., lange et al, j. Biol. Chem.,2007, 282:5101-5105). For example, in one embodiment, the NLS may be a haplotype sequence, such as PKKKRKV (SEQ ID NO: 19) or PKKKRRV (SEQ ID NO: 8). In another embodiment, the NLS may be a double-typing sequence. In yet another embodiment, the NLS may be KRPAATKKAGQAKKKK (SEQ ID NO: 20). NLS can be located at the N-terminus, C-terminus or internal position of RNA-guided endonucleases.
In some embodiments, the RNA-guided endonuclease may further comprise at least one cell penetrating domain. In one embodiment, the cell penetrating domain may be a cell penetrating peptide sequence derived from an HIV-1TAT protein. For example, the TAT cell penetrating sequence may be GRKKRRQRRRPPQPKKKRKV (SEQ ID NO: 21). In another embodiment, the cell penetrating domain may be a TLM (PLSSIFSRIGDPPKKKRKV; SEQ ID NO: 22), which is a cell penetrating peptide sequence derived from human hepatitis B virus. In yet another embodiment, the cell penetrating domain may be MPG (GALFLGWLGAAGSTMGAPKKKRKV; SEQ ID NO:23 or GALFLGFLGAAGSTMGAWSQPKKKRKV; SEQ ID NO: 24). In another embodiment, the cell penetrating domain may be Pep-1 (KETWWETWWWWQPKKKKKKKV; SEQ ID NO: 25), VP22 (which is a cell penetrating peptide from a herpes simplex virus), or a polyarginine peptide sequence. The cell penetrating domain may be located at the N-terminal, C-terminal or internal positions of the protein.
In other embodiments, the RNA guided endonuclease may further comprise at least one marker domain. Non-limiting examples of the labeling domain include fluorescent proteins, purification tags, and epitope tags. In some embodiments, the marker domain may be a fluorescent protein. Non-limiting examples of suitable fluorescent proteins include green fluorescent protein (e.g., GFP-2, tagGFP, turboGFP, EGFP, emerald, azami Green, monomeric Azami Green, copGFP, aceGFP, zsGreen), yellow fluorescent protein (e.g., YFP, EYFP, citrine, venus, YPet, phiYFP, zsYellow 1), blue fluorescent protein (e.g., EBFP2, azurite, mKalama1, GFPuv, sapphire, T-sapphire), cyan fluorescent protein (e.g., ECFP, cerulean, cyPet, amCyan1, midoriishi-Cyan), red fluorescent protein (mKate、mKate2、mPlum、DsRed monomer、mCherry、mRFP1、DsRed-Express、DsRed2、DsRed-Monomer、HcRed-Tandem、HcRed1、AsRed2、eqFP611、mRasberry、mStrawberry、Jred), and Orange fluorescent protein (mOrange, mKO, kusabira-Orange, monomeric Kusabira-Orange, mTangerine, tdTomato), or any other suitable fluorescent protein. In other embodiments, the marker domain may be a purification tag and/or an epitope tag. Exemplary tags include, but are not limited to, glutathione-S-transferase (GST), chitin Binding Protein (CBP), maltose binding protein, thioredoxin (TRX), poly (NANP), tandem Affinity Purification (TAP) tag 、myc、AcV5、AU1、AU5、E、ECS、E2、FLAG、HA、nus、Softag 1、Softag 3、Strep、SBP、Glu-Glu、HSV、KT3、S、S1、T7、V5、VSV-G、6×His、, biotin Carboxyl Carrier Protein (BCCP), and calmodulin.
In certain embodiments, the RNA-guided endonuclease may be part of a protein-RNA complex comprising a pegRNA. The pegRNA interacts with the RNA-guided endonuclease to direct the endonuclease to a specific target site, wherein the 5' end of the pegRNA base pairs with a specific protospacer sequence.
(II) Fusion proteins
Another aspect of the present disclosure provides a fusion protein comprising a CRISPR/Cas-like protein or a fragment thereof and an effector domain, in combination with a pegRNA and a cis-acting regulatory element. The CRISPR/Cas-like protein is directed to a target site by the pegRNA, where the effector domain can modify or affect the target nucleic acid sequence. The effector domain may be a cleavage domain, an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. The fusion protein may further comprise at least one additional domain selected from a nuclear localization signal, a cell penetrating domain, or a labeling domain.
(A) CRISPR/Cas-like proteins
The fusion protein comprises a CRISPR/Cas-like protein or a fragment thereof. CRISPR/Cas-like proteins are described in detail in section (I) above. The CRISPR/Cas-like protein may be located at the N-terminus, C-terminus, or an internal position of the fusion protein.
In some embodiments, the CRISPR/Cas-like protein of the fusion protein may be derived from a Cas9 protein. The Cas9-derived protein may be wild-type, modified, or a fragment thereof. In some embodiments, the Cas9-derived protein may be modified to contain only one functional nuclease domain (either a RuvC-like or an HNH-like nuclease domain). For example, the Cas9-derived protein may be modified such that one of the nuclease domains is deleted or mutated so that it is no longer functional (i.e., the nuclease activity is absent). In some embodiments where one of the nuclease domains is inactive, the Cas9-derived protein is able to introduce a nick into a double-stranded nucleic acid (such a protein is termed a "nickase") but cannot cleave double-stranded DNA. For example, an aspartate to alanine (D10A) conversion in the RuvC-like domain converts the Cas9-derived protein into a nickase. Likewise, a histidine to alanine (H840A or H839A) conversion in the HNH domain converts the Cas9-derived protein into a nickase. In other embodiments, both the RuvC-like nuclease domain and the HNH-like nuclease domain may be modified or removed such that the Cas9-derived protein is unable to nick or cleave double-stranded nucleic acid. In still other embodiments, all nuclease domains of the Cas9-derived protein may be modified or removed such that the Cas9-derived protein lacks all nuclease activity.
In any of the above embodiments, any or all of the nuclease domains can be inactivated by one or more of deletion, insertion, and/or substitution mutations using well-known methods, such as site-directed mutagenesis, PCR-mediated mutagenesis, and total-gene synthesis, among other methods known in the art. In one exemplary embodiment, the CRISPR/Cas-like protein of the fusion protein is derived from a Cas9 protein, wherein all nuclease domains have been inactivated or deleted.
(B) Effector domains
The fusion protein further comprises an effector domain. The effector domain may be a cleavage domain, an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. The effector domain may be located at the N-terminal, C-terminal or internal position of the fusion protein.
(i) Cleavage domain
In some embodiments, the effector domain is a cleavage domain. As used herein, "cleavage domain" refers to a domain that cleaves DNA. The cleavage domain may be obtained from any endonuclease or exonuclease. Non-limiting examples of endonucleases from which a cleavage domain may be derived include restriction endonucleases and homing endonucleases. See, e.g., New England Biolabs Catalog or Belfort et al. (1997) Nucleic Acids Res. 25:3379-3388. Additional enzymes that cleave DNA are known (e.g., S1 nuclease, mung bean nuclease, pancreatic DNase I, micrococcal nuclease, yeast HO endonuclease). See also Linn et al. (eds.) Nucleases, Cold Spring Harbor Laboratory Press, 1993. One or more of these enzymes (or functional fragments thereof) may be used as a source of cleavage domains.
In some embodiments, the cleavage domain may be derived from a type II-S endonuclease. Type II-S endonucleases cleave DNA at a site that is typically several base pairs away from the recognition site and thus have separable recognition and cleavage domains. These enzymes are generally monomers that transiently associate to form dimers, which cleave each strand of DNA at staggered locations. Non-limiting examples of suitable type II-S endonucleases include BfiI, BpmI, BsaI, BsgI, BsmBI, BsmI, BspMI, FokI, MboII, and SapI. In an exemplary embodiment, the cleavage domain of the fusion protein is a FokI cleavage domain or a derivative thereof.
In certain embodiments, the type II-S cleavage domain can be modified to facilitate dimerization of two different cleavage domains, each of which is attached to a CRISPR/Cas-like protein or fragment thereof. For example, the cleavage domain of FokI can be modified by mutating certain amino acid residues. As non-limiting examples, the amino acid residues at positions 446, 447, 479, 483, 484, 486, 487, 490, 491, 496, 498, 499, 500, 531, 534, 537, and 538 of the FokI cleavage domain are targets for modification. For example, modified FokI cleavage domains that form obligate heterodimers include a pair in which the first modified cleavage domain includes mutations at amino acid positions 490 and 538 and the second modified cleavage domain includes mutations at amino acid positions 486 and 499 (Miller et al., 2007, Nat. Biotechnol. 25:778-785; Szczepek et al., 2007, Nat. Biotechnol. 25:786-793). For example, in one domain (E490K, I538K), the Glu (E) at position 490 may be changed to Lys (K) and the Ile (I) at position 538 may be changed to K, and in the other cleavage domain (Q486E, I499L), the Gln (Q) at position 486 may be changed to E and the I at position 499 may be changed to Leu (L). In other embodiments, modified FokI cleavage domains may include three amino acid changes (Doyon et al., 2011, Nat. Methods 8:74-81). For example, one modified FokI domain (termed ELD) may comprise the Q486E, I499L, and N496D mutations, while the other modified FokI domain (termed KKR) may comprise the E490K, I538K, and H537R mutations.
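The substitution notation used above (wild-type residue, position, replacement residue) can be read mechanically. The following Python sketch is illustrative only: the helper names and the position-to-residue map are hypothetical, and the wild-type residues are inferred solely from the mutation labels quoted in the text, not from a FokI sequence record.

```python
import re

# ELD and KKR substitution sets as named in the text.
FOKI_VARIANTS = {
    "ELD": ["Q486E", "I499L", "N496D"],
    "KKR": ["E490K", "I538K", "H537R"],
}

# Wild-type residues implied by the labels above (assumption: inferred from
# the notation itself, e.g. "Q486E" implies Gln at position 486).
WILD_TYPE = {486: "Q", 490: "E", 496: "N", 499: "I", 537: "H", 538: "I"}


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label such as 'E490K' into ('E', 490, 'K')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitutions(residues: dict[int, str], labels: list[str]) -> dict[int, str]:
    """Return a copy of a {position: residue} map with the substitutions applied."""
    updated = dict(residues)
    for label in labels:
        wild_type, position, replacement = parse_substitution(label)
        if updated.get(position) != wild_type:
            raise ValueError(f"{label}: position {position} is not {wild_type}")
        updated[position] = replacement
    return updated


eld = apply_substitutions(WILD_TYPE, FOKI_VARIANTS["ELD"])
kkr = apply_substitutions(WILD_TYPE, FOKI_VARIANTS["KKR"])
print(eld)
print(kkr)
```

Tracking only the positions named in the text keeps the sketch independent of any particular sequence numbering file; a real workflow would apply the same parsing to a full protein sequence.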
In exemplary embodiments, the effector domain of the fusion protein is a FokI cleavage domain or a modified FokI cleavage domain.
In embodiments where the effector domain is a cleavage domain and the CRISPR/Cas-like protein is derived from a Cas9 protein, the Cas9-derived protein may be modified as discussed herein such that its endonuclease activity is removed. For example, the Cas9-derived protein may be modified by mutating the RuvC and HNH domains such that they no longer have nuclease activity.
(ii) Epigenetic modification domains
In other embodiments, the effector domain of the fusion protein may be an epigenetic modification domain. Typically, the epigenetic modification domain alters the histone structure and/or chromosomal structure without altering the DNA sequence. Altering histone and/or chromatin structure can result in altered gene expression. Examples of epigenetic modifications include, but are not limited to, acetylation or methylation of lysine residues in histones, and methylation of cytosine residues in DNA. Non-limiting examples of suitable epigenetic modification domains include histone acetyl transferase domains, histone deacetylase domains, histone methyltransferase domains, histone demethylase domains, DNA methyltransferase domains, and DNA demethylase domains.
In embodiments where the effector domain is a histone acetyltransferase (HAT) domain, the HAT domain may be derived from EP300 (i.e., E1A binding protein p300), CREBBP (i.e., CREB binding protein), CDY1, CDY2, CDYL1, CLOCK, ELP3, ESA1, GCN5 (KAT2A), HAT1, KAT2B, KAT5, MYST1, MYST2, MYST3, MYST4, NCOA1, NCOA2, NCOA3, NCOAT, P/CAF, Tip60, TAFII250, or TF3C4. In one such embodiment, the HAT domain is p300.
In embodiments where the effector domain is an epigenetic modification domain and the CRISPR/Cas-like protein is derived from a Cas9 protein, the Cas9-derived protein may be modified as discussed herein such that its endonuclease activity is removed. For example, the Cas9-derived protein may be modified by mutating the RuvC and HNH domains such that they no longer have nuclease activity.
(iii) Transcriptional activation domains
In other embodiments, the effector domain of the fusion protein may be a transcriptional activation domain. In general, a transcriptional activation domain interacts with transcriptional control elements and/or transcriptional regulatory proteins (i.e., transcription factors, RNA polymerases, etc.) to increase and/or activate transcription of a gene. In some embodiments, the transcriptional activation domain may be, without limitation, a herpes simplex virus VP16 activation domain, VP64 (which is a tetrameric derivative of VP16), an NF-κB p65 activation domain, p53 activation domains 1 and 2, a CREB (cAMP response element binding protein) activation domain, an E2A activation domain, or an NFAT (nuclear factor of activated T cells) activation domain. In other embodiments, the transcriptional activation domain may be Gal4, Gcn4, MLL, Rtg3, Gln3, Oaf1, Pip2, Pdr1, Pdr3, Pho4, or Leu3. The transcriptional activation domain may be wild-type, or it may be a modified version of the original transcriptional activation domain. In some embodiments, the effector domain of the fusion protein is a VP16 or VP64 transcriptional activation domain.
In embodiments where the effector domain is a transcriptional activation domain and the CRISPR/Cas-like protein is derived from a Cas9 protein, the Cas9-derived protein may be modified as discussed herein such that its endonuclease activity is removed. For example, the Cas9-derived protein may be modified by mutating the RuvC and HNH domains such that they no longer have nuclease activity.
(iv) Transcriptional repressor domains
In other embodiments, the effector domain of the fusion protein may be a transcriptional repressor domain. In general, a transcriptional repressor domain interacts with transcriptional control elements and/or transcriptional regulatory proteins (i.e., transcription factors, RNA polymerases, etc.) to decrease and/or terminate transcription of a gene. Non-limiting examples of suitable transcriptional repressor domains include inducible cAMP early repressor (ICER) domains, Krüppel-associated box A (KRAB-A) repressor domains, YY1 glycine-rich repressor domains, Sp1-like repressors, E(spl) repressors, IκB repressor, and MeCP2.
In embodiments where the effector domain is a transcriptional repressor domain and the CRISPR/Cas-like protein is derived from a Cas9 protein, the Cas9-derived protein may be modified as discussed herein such that its endonuclease activity is removed. For example, the Cas9-derived protein may be modified by mutating the RuvC and HNH domains such that they no longer have nuclease activity.
(C) Additional domains
In some embodiments, the fusion protein further comprises at least one additional domain. Non-limiting examples of suitable additional domains include nuclear localization signals, cell penetration or translocation domains, and labeling domains. Non-limiting examples of suitable nuclear localization signals, cell penetrating domains and labeling domains are presented in section (I) above.
(D) Fusion protein dimers
In embodiments where the effector domain of the fusion protein is a cleavage domain, a dimer comprising at least one fusion protein may be formed. The dimer may be a homodimer or a heterodimer. In some embodiments, the heterodimer comprises two different fusion proteins. In other embodiments, the heterodimer comprises one fusion protein and an additional protein.
In some embodiments, the dimer is a homodimer in which the two fusion protein monomers are identical in primary amino acid sequence. In one embodiment where the dimer is a homodimer, the Cas9-derived proteins are modified such that their endonuclease activity is removed, i.e., such that they have no functional nuclease domains. In certain embodiments in which the Cas9-derived proteins are modified such that their endonuclease activity is removed, each fusion protein monomer comprises the same Cas9-like protein and the same cleavage domain. The cleavage domain can be any cleavage domain, such as any of the exemplary cleavage domains provided herein. In a particular embodiment, the cleavage domain is a FokI cleavage domain or a modified FokI cleavage domain. In such embodiments, specific pegRNAs direct the fusion protein monomers to different but closely adjacent sites such that, upon dimer formation, the nuclease domains of the two monomers create a double-strand break in the target DNA.
In other embodiments, the dimer is a heterodimer of two different fusion proteins. For example, the CRISPR/Cas-like protein of each fusion protein can be derived from a different CRISPR/Cas protein or from an orthologous CRISPR/Cas protein from a different bacterial species. For example, each fusion protein can comprise a Cas9-like protein derived from a different bacterial species. In these embodiments, each fusion protein recognizes a different target site (i.e., a target site specified by the protospacer and/or PAM sequence). For example, the pegRNAs can position the heterodimer at different but closely adjacent sites such that the nuclease domains produce an effective double-strand break in the target DNA. The heterodimer can also comprise Cas9 proteins with nicking activity that are modified such that the nicking locations differ.
Alternatively, the two fusion proteins of the heterodimer may have different effector domains. In embodiments where the effector domain is a cleavage domain, each fusion protein may contain a different modified cleavage domain. For example, each fusion protein may contain a different modified FokI cleavage domain, as detailed in section (II)(B)(i) above. In these embodiments, the Cas9 protein may be modified such that its endonuclease activity is removed.
As will be appreciated by those of skill in the art, the two fusion proteins forming the heterodimer may differ in both CRISPR/Cas-like protein domains and effector domains.
In any of the above embodiments, the homodimer or heterodimer may comprise at least one additional domain selected from a nuclear localization signal (NLS), a cell penetrating or translocation domain, and a labeling domain, as detailed above.
In any of the above embodiments, one or both of the Cas9-derived proteins may be modified such that its endonuclease activity is removed or altered.
In alternative embodiments, the heterodimer comprises one fusion protein and an additional protein. For example, the additional protein may be a nuclease. In one embodiment, the nuclease is a zinc finger nuclease. A zinc finger nuclease comprises a zinc finger DNA binding domain and a cleavage domain. Each zinc finger recognizes and binds three (3) nucleotides. The zinc finger DNA binding domain can comprise from about three zinc fingers to about seven zinc fingers. The zinc finger DNA binding domain may be derived from a naturally occurring protein, or it may be engineered. See, for example, Beerli et al. (2002) Nat. Biotechnol. 20:135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nat. Biotechnol. 19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; Zhang et al. (2000) J. Biol. Chem. 275(43):33850-33860; Doyon et al. (2008) Nat. Biotechnol. 26:702-708; and Santiago et al. (2008) Proc. Natl. Acad. Sci. USA 105:5809-5814. The cleavage domain of the zinc finger nuclease may be any of the cleavage domains detailed in section (II)(B)(i) above. In exemplary embodiments, the cleavage domain of the zinc finger nuclease is a FokI cleavage domain or a modified FokI cleavage domain. Such a zinc finger nuclease will dimerize with a fusion protein comprising a FokI cleavage domain or a modified FokI cleavage domain.
In some embodiments, the zinc finger nuclease may comprise at least one additional domain selected from a nuclear localization signal and a cell penetrating or translocation domain, which are detailed above.
In certain embodiments, any of the fusion proteins detailed above, or a dimer comprising at least one fusion protein, may be part of a protein-RNA complex comprising at least one pegRNA. The pegRNA interacts with the CRISPR/Cas-like protein of the fusion protein to direct the fusion protein to a specific target site, wherein the 5' end of the pegRNA base pairs with a specific protospacer sequence.
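As a worked illustration of the figures above (three nucleotides per zinc finger, and about three to about seven fingers per binding domain), the recognized site length follows by simple multiplication. The function name below is hypothetical, and the sketch deliberately does not enforce the finger-count range because the text qualifies it with "about".

```python
NUCLEOTIDES_PER_FINGER = 3  # stated in the text


def binding_site_length(n_fingers: int) -> int:
    """Nucleotides recognized by a zinc finger array of n_fingers fingers."""
    if n_fingers < 1:
        raise ValueError("a binding domain needs at least one zinc finger")
    return n_fingers * NUCLEOTIDES_PER_FINGER


# Site lengths across the finger counts mentioned in the text (3 through 7).
for n in range(3, 8):
    print(n, "fingers ->", binding_site_length(n), "nt")
```

Under these figures a binding domain spans roughly 9 to 21 nucleotides.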
(III) Nucleic acids encoding RNA-guided endonucleases or fusion proteins
Another aspect of the invention provides a nucleic acid encoding any of the RNA-guided endonucleases or fusion proteins described in sections (I) and (II), respectively, above. The nucleic acid may be RNA or DNA. In one embodiment, the nucleic acid encoding an RNA-guided endonuclease or fusion protein is mRNA. The mRNA may be 5' capped and/or 3' polyadenylated. In another embodiment, the nucleic acid encoding an RNA-guided endonuclease or fusion protein is DNA. The DNA may be present in a vector (see below).
Nucleic acids encoding RNA-guided endonucleases or fusion proteins can be codon optimized for efficient translation into protein in the eukaryotic cell or animal of interest. For example, codons may be optimized for expression in humans, mice, rats, hamsters, cows, pigs, cats, dogs, fish, amphibians, plants, yeast, insects, and the like. Codon optimization programs are available as freeware. Commercial codon optimization programs are also available.
In some embodiments, DNA encoding an RNA-guided endonuclease or fusion protein may be operably linked to at least one promoter control sequence. In some iterations, the DNA coding sequence may be operably linked to a promoter control sequence for expression in the eukaryotic cell or animal of interest. The promoter control sequence may be constitutive, regulated, or tissue specific. Suitable constitutive promoter control sequences include, but are not limited to, the cytomegalovirus immediate early promoter (CMV), the simian virus 40 (SV40) promoter, the adenovirus major late promoter, the Rous sarcoma virus (RSV) promoter, the mouse mammary tumor virus (MMTV) promoter, the phosphoglycerate kinase (PGK) promoter, the elongation factor 1-alpha (EF1α) promoter, ubiquitin promoters, actin promoters, tubulin promoters, immunoglobulin promoters, fragments thereof, or combinations of any of the foregoing. Examples of suitable regulated promoter control sequences include, but are not limited to, those regulated by heat shock, metals, steroids, antibiotics, or alcohol. Non-limiting examples of tissue-specific promoters include the B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, IFN-β promoter, Mb promoter, NphsI promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter. The promoter sequence may be wild-type, or it may be modified for more efficient or efficacious expression. In one exemplary embodiment, the encoding DNA can be operably linked to a CMV promoter for constitutive expression in mammalian cells.
In certain embodiments, the sequence encoding the RNA-guided endonuclease or fusion protein may be operably linked to a promoter sequence that is recognized by a phage RNA polymerase for in vitro mRNA synthesis. In such embodiments, the in vitro transcribed RNA may be purified for use in the methods detailed in sections (IV) and (V) below. For example, the promoter sequence may be a T7, T3 or SP6 promoter sequence or a variant of a T7, T3 or SP6 promoter sequence. In one exemplary embodiment, the DNA encoding the fusion protein is operably linked to a T7 promoter, which T7 promoter is used for in vitro mRNA synthesis using a T7 RNA polymerase.
In alternative embodiments, the sequence encoding the RNA-guided endonuclease or fusion protein may be operably linked to a promoter sequence for in vitro expression of the RNA-guided endonuclease or fusion protein in bacterial or eukaryotic cells. In such embodiments, the expressed protein may be purified for use in the methods detailed in sections (IV) and (V) below. Suitable bacterial promoters include, but are not limited to, T7 promoters, lac operon promoters, trp promoters, variants thereof, and combinations thereof. An exemplary bacterial promoter is tac, which is a hybrid of the trp and lac promoters. Non-limiting examples of suitable eukaryotic promoters are listed above.
In further aspects, DNA encoding an RNA-guided endonuclease or fusion protein may also be linked to a polyadenylation signal (e.g., SV40 polyA signal, bovine growth hormone (BGH) polyA signal, etc.) and/or at least one transcription termination sequence. Furthermore, the sequence encoding the RNA-guided endonuclease or fusion protein may also be linked to a sequence encoding at least one nuclear localization signal, at least one cell penetrating domain, and/or at least one marker domain, which are detailed in section (I) above.
In various embodiments, DNA encoding an RNA-guided endonuclease or fusion protein may be present in a vector. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/mini-chromosomes, transposons, and viral vectors (e.g., lentiviral vectors, adeno-associated viral vectors, etc.). In one embodiment, the DNA encoding the RNA-guided endonuclease or fusion protein is present in a plasmid vector. Non-limiting examples of suitable plasmid vectors include pUC, pBR322, pET, pBluescript, and variants thereof. The vector may comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcription termination sequences, etc.), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like. Additional information can be found in "Current Protocols in Molecular Biology," Ausubel et al., John Wiley & Sons, New York, 2003, or "Molecular Cloning: A Laboratory Manual," Sambrook & Russell, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 3rd edition, 2001.
In some embodiments, an expression vector comprising a sequence encoding an RNA-guided endonuclease or fusion protein may further comprise a sequence encoding a pegRNA. The sequence encoding the pegRNA is generally operably linked to at least one transcriptional control sequence for expression of the pegRNA in the cell or embryo of interest. For example, the DNA encoding the pegRNA may be operably linked to a promoter sequence that is recognized by RNA polymerase III (Pol III). Examples of suitable Pol III promoters include, but are not limited to, mammalian U6, U3, H1, and 7SL RNA promoters.
(IV) Methods for modifying chromosomal sequences using RNA-guided endonucleases
Another aspect of the disclosure includes a method of modifying a chromosomal sequence in a eukaryotic cell or embryo. The method comprises introducing into a eukaryotic cell or embryo (i) at least one RNA-guided endonuclease comprising at least one nuclear localization signal or a nucleic acid encoding at least one RNA-guided endonuclease comprising at least one nuclear localization signal, (ii) at least one pegRNA or DNA encoding at least one pegRNA, and optionally, (iii) at least one donor polynucleotide comprising a donor sequence. The method further comprises culturing the cell or embryo such that each pegRNA directs the RNA-guided endonuclease to a target site in the chromosomal sequence, wherein the RNA-guided endonuclease introduces a double-strand break at the target site, and the double-strand break is repaired by a DNA repair process such that the chromosomal sequence is modified.
In some embodiments, the method may comprise introducing one RNA-guided endonuclease (or encoding nucleic acid) and one pegRNA (or encoding DNA) into the cell or embryo, wherein the RNA-guided endonuclease introduces one double-strand break in the targeted chromosomal sequence. In embodiments where the optional donor polynucleotide is not present, the double-strand break in the chromosomal sequence may be repaired by a non-homologous end joining (NHEJ) repair process. Because NHEJ is error-prone, a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of the break. Thus, the targeted chromosomal sequence may be modified or inactivated. For example, a single nucleotide change (SNP) may produce an altered protein product, or a shift in the reading frame of the coding sequence may inactivate or "knock out" the sequence such that no protein product is made. In embodiments where the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence at the target site during repair of the double-strand break. For example, in embodiments in which the donor sequence is flanked by an upstream sequence and a downstream sequence that have substantial sequence identity to the sequences upstream and downstream, respectively, of the target site in the chromosomal sequence, the donor sequence may be exchanged with or integrated into the chromosomal sequence at the target site during repair mediated by a homology-directed repair process. Alternatively, in embodiments where the donor sequence is flanked by compatible overhangs (or the compatible overhangs are generated in situ by the RNA-guided endonuclease), the donor sequence may be ligated directly to the cleaved chromosomal sequence by a non-homologous repair process during repair of the double-strand break.
The exchange or integration of the donor sequence into the chromosomal sequence modifies the targeted chromosomal sequence or introduces an exogenous sequence into the chromosomal sequence of the cell or embryo.
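The reading-frame point above can be made concrete: a net insertion or deletion whose length is not a multiple of three shifts the frame of a coding sequence, whereas a multiple of three keeps it. The function below is a hypothetical illustration of that rule and does not model the repair process itself.

```python
CODON_LENGTH = 3


def is_frameshift(inserted_nt: int, deleted_nt: int) -> bool:
    """True if the net length change is not a whole number of codons."""
    if inserted_nt < 0 or deleted_nt < 0:
        raise ValueError("nucleotide counts must be non-negative")
    return (inserted_nt - deleted_nt) % CODON_LENGTH != 0


print(is_frameshift(inserted_nt=0, deleted_nt=1))  # single-base deletion
print(is_frameshift(inserted_nt=3, deleted_nt=0))  # one whole codon inserted
```

An in-frame change may still alter or remove amino acids; the check only distinguishes frame-preserving from frame-shifting outcomes.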
In other embodiments, the method may comprise introducing two RNA-guided endonucleases (or encoding nucleic acids) and two pegRNAs (or encoding DNA) into the cell or embryo, wherein the RNA-guided endonucleases introduce two double-strand breaks in the chromosomal sequence. See FIG. 3B. The two breaks may be within a few base pairs of each other, within tens of base pairs, or may be separated by thousands of base pairs. In embodiments where the optional donor polynucleotide is not present, the resulting double-strand breaks may be repaired by a non-homologous repair process such that the sequence between the two cleavage sites is lost and/or a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof occurs during repair of the breaks. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence during repair of the double-strand breaks by a homology-based repair process (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity to the sequences upstream and downstream, respectively, of the targeted site in the chromosomal sequence) or by a non-homologous repair process (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).
In other embodiments, the method may comprise introducing into the cell or embryo one RNA-guided endonuclease modified to cleave one strand of a double-stranded sequence (or encoding nucleic acid) and two pegRNAs (or encoding DNA), wherein each pegRNA directs the RNA-guided endonuclease to a specific target site at which the modified endonuclease cleaves one strand of the double-stranded chromosomal sequence (i.e., introduces a nick), and wherein the two nicks are on opposite strands and sufficiently close to each other to constitute a double-strand break. See FIG. 3A. In embodiments where the optional donor polynucleotide is not present, the resulting double-strand break may be repaired by a non-homologous repair process such that a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof occurs during repair of the break. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence during repair of the double-strand break by a homology-based repair process (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity to the sequences upstream and downstream, respectively, of the targeted site in the chromosomal sequence) or by a non-homologous repair process (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).
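The outcome in which the sequence between two cleavage sites is lost can be sketched on a single-strand string model. This is a simplified illustration, assuming blunt cuts and clean rejoining with no additional insertions or deletions; the function name and the cut-coordinate convention are hypothetical.

```python
def excise_between_cuts(sequence: str, cut_a: int, cut_b: int) -> tuple[str, str]:
    """Rejoin the outer ends after two cuts and return (repaired, lost).

    A cut at index i falls between sequence[i - 1] and sequence[i].
    """
    left, right = sorted((cut_a, cut_b))
    if not 0 <= left <= right <= len(sequence):
        raise ValueError("cut positions must lie within the sequence")
    repaired = sequence[:left] + sequence[right:]
    lost = sequence[left:right]
    return repaired, lost


repaired, lost = excise_between_cuts("AAACCCGGGTTT", 3, 9)
print(repaired, lost)
```

In practice the junction may also carry small insertions, deletions, or substitutions, as the paragraph above notes; the sketch covers only the simple loss of the intervening segment.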
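The paired-nick condition described above (two nicks, on opposite strands, sufficiently close together) can be expressed as a simple predicate. The text does not state a numerical distance, so the sketch below takes the maximum offset as a caller-supplied parameter; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nick:
    """A single-strand cut at a chromosomal coordinate on the '+' or '-' strand."""
    position: int
    strand: str

    def __post_init__(self) -> None:
        if self.strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")


def constitutes_double_strand_break(a: Nick, b: Nick, max_offset: int) -> bool:
    """True if the nicks are on opposite strands and within max_offset of each other.

    max_offset is an assumption supplied by the caller; the text says only
    that the nicks must be "sufficiently close".
    """
    on_opposite_strands = a.strand != b.strand
    close_enough = abs(a.position - b.position) <= max_offset
    return on_opposite_strands and close_enough


print(constitutes_double_strand_break(Nick(100, "+"), Nick(140, "-"), max_offset=50))
```

Two nicks on the same strand, or on opposite strands but too far apart, would fail the check and would not be treated as a double-strand break in this model.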
(A) RNA-guided endonucleases
The method comprises introducing into a cell or embryo at least one RNA-guided endonuclease comprising at least one nuclear localization signal or a nucleic acid encoding at least one RNA-guided endonuclease comprising at least one nuclear localization signal. Such RNA-guided endonucleases and nucleic acids encoding RNA-guided endonucleases are described in sections (I) and (III), respectively, above. The guide RNA may be a pegRNA.
In some embodiments, the RNA-guided endonuclease can be introduced into the cell or embryo as an isolated protein. In such embodiments, the RNA-guided endonuclease may further comprise at least one cell penetrating domain that facilitates cellular uptake of the protein. In other embodiments, the RNA-guided endonuclease may be introduced into the cell or embryo as an mRNA molecule. In other embodiments, the RNA-guided endonuclease may be introduced into the cell or embryo as a DNA molecule. Typically, the DNA sequence encoding the fusion protein is operably linked to a promoter sequence that is functional in the cell or embryo of interest. The DNA sequence may be linear or the DNA sequence may be part of a vector. In further embodiments, the fusion protein may be introduced into the cell or embryo as an RNA-protein complex comprising the fusion protein and pegRNA.
In alternative embodiments, the DNA encoding the RNA-guided endonuclease may further comprise a sequence encoding a pegRNA. In general, each of the sequences encoding the RNA-guided endonuclease and the pegRNA is operably linked to an appropriate promoter control sequence that allows expression of the RNA-guided endonuclease and the pegRNA, respectively, in the cell or embryo. The DNA sequences encoding the RNA-guided endonuclease and the pegRNA may further comprise additional expression control, regulatory, and/or processing sequences. The DNA sequences encoding the RNA-guided endonuclease and the pegRNA may be linear or may be part of a vector.
(B) Prime editing guide RNA (pegRNA)
The method further comprises introducing at least one pegRNA or DNA encoding at least one pegRNA into the cell or embryo. The pegRNA interacts with the RNA-guided endonuclease to direct the endonuclease to a specific target site, at which the 5' end of the pegRNA base pairs with a specific protospacer sequence in the chromosomal sequence.
Each pegRNA contains three regions: a first region at the 5 'end complementary to the target site in the chromosomal sequence, a second internal region forming a stem-loop structure, and a third 3' region that remains substantially single-stranded. The first region of each pegRNA is different such that each pegRNA directs the fusion protein to a particular target site. The second and third regions of each pegRNA may be the same in all pegRNA.
The first region of the pegRNA is complementary to a sequence at a target site in the chromosomal sequence (i.e., a protospacer sequence) such that the first region of the pegRNA can base pair with the target site. In various embodiments, the first region of the pegRNA may comprise from about 10 nucleotides to more than about 25 nucleotides. For example, the length of the base pairing region between the first region of the pegRNA and the target site in the chromosomal sequence can be about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, or more than 25 nucleotides. In an exemplary embodiment, the first region of the pegRNA is about 19, 20, or 21 nucleotides in length.
The pegRNA also includes a second region that forms a secondary structure. In some embodiments, the secondary structure comprises a stem (or hairpin) and a loop. The lengths of the loop and the stem may vary. For example, the loop may range from about 3 to about 10 nucleotides in length, and the stem may range from about 6 to about 20 base pairs in length. The stem may comprise one or more bulges of 1 to about 10 nucleotides. Thus, the total length of the second region may range from about 16 to about 60 nucleotides. In one exemplary embodiment, the loop is about 4 nucleotides in length and the stem comprises about 12 base pairs.
The pegRNA also includes a third region at the 3' end that remains substantially single-stranded. Thus, the third region is not complementary to any chromosomal sequence in the cell of interest and is not complementary to the remainder of the pegRNA. The length of the third region may vary. Typically, the third region is more than about 4 nucleotides in length. For example, the length of the third region may range from about 5 to about 60 nucleotides.
The combined length of the second and third regions of the pegRNA (also referred to as the universal region or the scaffold region) can range from about 30 to about 120 nucleotides. In one aspect, the combined length of the second and third regions of the pegRNA ranges from about 70 to about 100 nucleotides.
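The length ranges given above for the three pegRNA regions can be collected into a simple check. The following is a minimal sketch only: the class name `PegRNARegions`, the helper names, and the use of hard cut-offs for values the text describes as "about" are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

# Approximate bounds quoted in the text, in nucleotides.
# Treating "about" as a hard limit is an assumption made for illustration.
FIRST_REGION_MIN = 10          # text allows "more than about 25", so no upper bound
SECOND_REGION = (16, 60)       # stem-loop region
THIRD_REGION = (5, 60)         # single-stranded 3' region
SCAFFOLD = (30, 120)           # second and third regions combined


@dataclass
class PegRNARegions:
    """Hypothetical container for the three regions of a pegRNA."""
    first: str
    second: str
    third: str

    @property
    def scaffold_length(self) -> int:
        # Combined length of the second and third regions.
        return len(self.second) + len(self.third)


def in_range(value: int, low: int, high=None) -> bool:
    """True if low <= value <= high; high=None means no upper bound."""
    return value >= low and (high is None or value <= high)


def check_lengths(peg: PegRNARegions) -> dict:
    """Report, per region, whether its length falls in the quoted range."""
    return {
        "first": in_range(len(peg.first), FIRST_REGION_MIN),
        "second": in_range(len(peg.second), *SECOND_REGION),
        "third": in_range(len(peg.third), *THIRD_REGION),
        "scaffold": in_range(peg.scaffold_length, *SCAFFOLD),
    }
```

For instance, a placeholder pegRNA with a 20-nucleotide first region, a 28-nucleotide second region, and a 40-nucleotide third region has a 68-nucleotide scaffold and passes all four checks.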
In some embodiments, the pegRNA comprises a single molecule containing all three regions. In other embodiments, the pegRNA may comprise two separate molecules. The first RNA molecule may comprise the first region of the pegRNA and one half of the "stem" of the second region of the pegRNA. The second RNA molecule may comprise the other half of the "stem" of the second region of the pegRNA and the third region of the pegRNA. Thus, in this embodiment, the first and second RNA molecules each contain a nucleotide sequence that is complementary to a sequence in the other. For example, in one embodiment, the first and second RNA molecules each comprise a sequence (of about 6 to about 20 nucleotides) that base pairs with the other sequence to form a functional pegRNA.
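The pairing requirement for a two-molecule pegRNA can be illustrated with a reverse-complement check on the two stem halves. This is a sketch under stated assumptions: only Watson-Crick pairs are counted (G-U wobble pairs and bulges are ignored), the function names are hypothetical, and the 6 to 20 nucleotide window is taken from the text as a hard limit.

```python
# Watson-Crick complements for RNA; wobble pairing is deliberately ignored.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


def stems_can_pair(stem_a: str, stem_b: str,
                   min_bp: int = 6, max_bp: int = 20) -> bool:
    """True if the two stem halves (both 5'->3') form a perfect duplex
    whose length lies within the quoted 6-20 nucleotide window."""
    if len(stem_a) != len(stem_b):
        return False
    if not (min_bp <= len(stem_a) <= max_bp):
        return False
    return stem_b.upper() == reverse_complement_rna(stem_a)
```

A 12-nucleotide stem half such as `GUUUUAGAGCUA` pairs with `UAGCUCUAAAAC` under this definition, whereas a 4-nucleotide duplex is rejected as too short.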
In some embodiments, the pegRNA may be introduced into the cell or embryo as an RNA molecule. The RNA molecule can be transcribed in vitro. Alternatively, the RNA molecule may be chemically synthesized.
In other embodiments, the pegRNA may be introduced into the cell or embryo as a DNA molecule. In such cases, the DNA encoding the pegRNA may be operably linked to a promoter control sequence to express the pegRNA in the cell or embryo of interest. For example, the RNA coding sequence may be operably linked to a promoter sequence recognized by RNA polymerase III (Pol III). Examples of suitable Pol III promoters include, but are not limited to, mammalian U6 or H1 promoters. In exemplary embodiments, the RNA coding sequence is linked to a mouse or human U6 promoter. In other exemplary embodiments, the RNA coding sequence is linked to a mouse or human H1 promoter.
The DNA molecule encoding the pegRNA may be linear or circular. In some embodiments, the DNA sequence encoding the pegRNA may be part of a vector. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/minichromosomes, transposons, and viral vectors. In an exemplary embodiment, the DNA encoding the RNA-guided endonuclease is present in a plasmid vector. Non-limiting examples of suitable plasmid vectors include pUC, pBR322, pET, pBluescript, and variants thereof. The vector may comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcription termination sequences, etc.), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like.
In embodiments where both RNA-guided endonucleases and pegRNA are introduced into a cell as DNA molecules, each may be part of a separate molecule (e.g., one vector containing the fusion protein coding sequence and a second vector containing the pegRNA coding sequence) or both may be part of the same molecule (e.g., one vector containing the coding (and regulatory) sequences of both fusion proteins and pegRNA).
(C) Target site
The RNA-guided endonuclease, together with the pegRNA, is directed to a target site in the chromosomal sequence, where the RNA-guided endonuclease introduces a break in the chromosomal sequence. The target site is not limited in sequence except that the sequence is immediately followed (downstream) by a consensus sequence. This consensus sequence is also known as the protospacer adjacent motif (PAM). Examples of PAMs include, but are not limited to, NGG, NGGNG, and NNAGAAW (where N is defined as any nucleotide and W is defined as A or T). As detailed in section (IV)(b) above, the first region of the pegRNA (at the 5' end) is complementary to the protospacer of the target sequence. Typically, the first region of the pegRNA is about 19 to 21 nucleotides in length. Thus, in certain aspects, the sequence of the target site in the chromosomal sequence is 5'-N19-21-NGG-3', in which the 3'-terminal NGG is the PAM.
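The target-site rule above (a protospacer immediately followed by a PAM) lends itself to a short pattern search. The sketch below is illustrative only: it scans a single strand, uses a fixed protospacer length chosen by the caller, and handles only the IUPAC codes that appear in the PAM examples in the text. The function names are assumptions.

```python
import re

# IUPAC codes needed for the PAM examples in the text (NGG, NGGNG, NNAGAAW).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "W": "[AT]"}


def pam_to_regex(pam: str) -> str:
    """Translate a PAM written with IUPAC codes into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_target_sites(seq: str, pam: str = "NGG", protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every site of the form
    5'-N{protospacer_len}-PAM-3' on the given strand.

    A zero-width lookahead is used so that overlapping sites are all reported.
    """
    pattern = re.compile(
        r"(?=([ACGT]{%d})(%s))" % (protospacer_len, pam_to_regex(pam))
    )
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]
```

A fuller search would also scan the reverse complement and try each protospacer length from 19 to 21; both are omitted here to keep the example short.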
The target site may be in a coding region of a gene, an intron of a gene, a control region of a gene, a non-coding region between genes, or the like. The gene may be a protein-encoding gene or an RNA-encoding gene. The gene may be any gene of interest.
(D) Optional donor polynucleotide
In some embodiments, the method further comprises introducing at least one donor polynucleotide to the target site. The donor polynucleotide comprises at least one donor sequence. In some aspects, the donor sequence of the donor polynucleotide corresponds to an endogenous or native chromosomal sequence. For example, the donor sequence may be substantially identical to a portion of the chromosomal sequence at or near the target site, but it comprises at least one nucleotide change. Thus, the donor sequence may comprise a modified form of the wild-type sequence at the target site such that, upon integration or exchange with the native sequence, the sequence at the targeted chromosomal location comprises at least one nucleotide change. For example, the change may be an insertion of one or more nucleotides, a deletion of one or more nucleotides, a substitution of one or more nucleotides, or a combination thereof. As a result of integrating the modified sequence, the cell or embryo/animal can produce a modified gene product from the targeted chromosomal sequence.
In other aspects, the donor sequence of the donor polynucleotide corresponds to an exogenous sequence. As used herein, an "exogenous" sequence refers to a sequence that is not native to the cell or embryo, or a sequence whose native location in the genome of the cell or embryo is at a different position. For example, the exogenous sequence may comprise a protein coding sequence that may be operably linked to an exogenous promoter control sequence such that, upon integration into the genome, the cell or embryo/animal is capable of expressing the protein encoded by the integrated sequence. Alternatively, the exogenous sequence may be integrated into the chromosomal sequence such that its expression is under the control of an endogenous promoter sequence. In other iterations, the exogenous sequence may be a transcription control sequence, another expression control sequence, an RNA coding sequence, or the like. Integration of an exogenous sequence into a chromosomal sequence is known as a "knock-in".
The length of the donor sequence may and will vary, as will be appreciated by those skilled in the art. For example, the length of the donor sequence may vary from a few nucleotides to hundreds of thousands of nucleotides.
A donor polynucleotide comprising an upstream sequence and a downstream sequence. In some embodiments, the donor sequence in the donor polynucleotide is flanked by an upstream sequence and a downstream sequence that have substantial sequence identity to sequences located upstream and downstream, respectively, of the target site in the chromosomal sequence. Because of these sequence similarities, the upstream and downstream sequences of the donor polynucleotide allow for homologous recombination between the donor polynucleotide and the targeted chromosomal sequence, such that the donor sequence can be integrated into (or exchanged with) the chromosomal sequence.
As used herein, an upstream sequence refers to a nucleic acid sequence that shares substantial sequence identity with a chromosomal sequence upstream of the target site. Similarly, a downstream sequence refers to a nucleic acid sequence that shares substantial sequence identity with a chromosomal sequence downstream of the target site. As used herein, the phrase "substantial sequence identity" refers to sequences having at least about 75% sequence identity. Thus, the upstream and downstream sequences in the donor polynucleotide may have about 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to the sequence upstream or downstream of the targeted site. In exemplary embodiments, the upstream and downstream sequences in the donor polynucleotide may have about 95% to 100% sequence identity to the chromosomal sequence upstream or downstream of the targeted site. In one embodiment, the upstream sequence shares substantial sequence identity with a chromosomal sequence located immediately upstream of (i.e., adjacent to) the target site. In other embodiments, the upstream sequence shares substantial sequence identity with a chromosomal sequence located within about one hundred (100) nucleotides upstream of the target site. Thus, for example, the upstream sequence may share substantial sequence identity with chromosomal sequences located about 1 to about 20, about 21 to about 40, about 41 to about 60, about 61 to about 80, or about 81 to about 100 nucleotides upstream of the target site. In one embodiment, the downstream sequence shares substantial sequence identity with a chromosomal sequence located immediately downstream of (i.e., adjacent to) the target site. In other embodiments, the downstream sequence shares substantial sequence identity with a chromosomal sequence located within about one hundred (100) nucleotides downstream of the targeted site. 
Thus, for example, the downstream sequence may share substantial sequence identity with chromosomal sequences located about 1 to about 20, about 21 to about 40, about 41 to about 60, about 61 to about 80, or about 81 to about 100 nucleotides downstream of the targeted site.
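The "substantial sequence identity" criterion (at least about 75%) can be illustrated with a simple position-by-position comparison. This is a sketch under a strong assumption: the two sequences are already aligned, equal in length, and free of gaps. The text does not specify an alignment method, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned,
    equal-length sequences (no gap handling)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def has_substantial_identity(arm: str, chromosomal: str,
                             threshold: float = 75.0) -> bool:
    """Apply the 'at least about 75%' criterion as a hard threshold."""
    return percent_identity(arm, chromosomal) >= threshold
```

For example, a 10-nucleotide arm that differs from the chromosomal sequence at one position has 90% identity and meets the criterion, while one that differs at half of its positions does not.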
Each upstream or downstream sequence may range in length from about 20 nucleotides to about 5000 nucleotides. In some embodiments, the upstream and downstream sequences may comprise about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2800, 3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, or 5000 nucleotides. In exemplary embodiments, the upstream and downstream sequences may range in length from about 50 to about 1500 nucleotides.
The donor polynucleotide comprising an upstream sequence and a downstream sequence having sequence similarity to the targeted chromosomal sequence may be linear or circular. In embodiments where the donor polynucleotide is circular, it may be part of a vector. For example, the vector may be a plasmid vector.
A donor polynucleotide comprising a targeted cleavage site. In other embodiments, the donor polynucleotide may additionally comprise at least one targeted cleavage site recognized by an RNA-guided endonuclease. The targeted cleavage site added to the donor polynucleotide may be placed upstream or downstream or both upstream and downstream of the donor sequence. For example, the donor sequence may be flanked by targeted cleavage sites such that, upon cleavage by an RNA-guided endonuclease, the donor sequence is flanked by overhangs that are compatible with overhangs in the chromosomal sequence that result after cleavage by the RNA-guided endonuclease. Thus, during double strand break repair, the donor sequence may be linked to the cleaved chromosomal sequence by a non-homologous repair process. Typically, the donor polynucleotide comprising the targeted cleavage site will be circular (e.g., may be part of a plasmid vector).
A donor polynucleotide comprising a short donor sequence with optional overhangs. In alternative embodiments, the donor polynucleotide may be a linear molecule comprising a short donor sequence with an optional short overhang that is compatible with the overhang produced by the RNA-guided endonuclease. In such embodiments, the donor sequence may be directly linked to the cleaved chromosomal sequence during double strand break repair. In some cases, the donor sequence may be less than about 1,000, less than about 500, less than about 250, or less than about 100 nucleotides. In some cases, the donor polynucleotide may be a linear molecule comprising a short donor sequence with blunt ends. In other iterations, the donor polynucleotide may be a linear molecule comprising a short donor sequence with 5' and/or 3' overhangs. The overhang may comprise 1, 2, 3, 4, or 5 nucleotides.
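Overhang compatibility, as used above, can be sketched as a reverse-complement test between two single-stranded ends. The example assumes both overhangs have the same polarity (both 5' or both 3'), are written 5' to 3', and must pair perfectly; the function names and the hard 1 to 5 nucleotide limit are illustrative assumptions.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def overhangs_compatible(donor_overhang: str, chromosome_overhang: str,
                         max_len: int = 5) -> bool:
    """True if two same-polarity overhangs (both 5'->3') can anneal
    perfectly and are 1 to max_len nucleotides long."""
    a, b = donor_overhang.upper(), chromosome_overhang.upper()
    if not (1 <= len(a) <= max_len) or len(a) != len(b):
        return False
    return b == reverse_complement(a)
```

Under this definition, `ACCT` is compatible with `AGGT`, and a palindromic overhang such as `AATT` is compatible with itself.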
Typically, the donor polynucleotide is DNA. The DNA may be single-stranded or double-stranded and/or linear or circular. The donor polynucleotide may be a DNA plasmid, bacterial artificial chromosome (BAC), yeast artificial chromosome (YAC), viral vector, linear DNA segment, PCR fragment, naked nucleic acid, or nucleic acid complexed with a delivery vehicle such as a liposome or poloxamer. In certain embodiments, the donor polynucleotide comprising the donor sequence may be part of a plasmid vector. In any of these cases, the donor polynucleotide comprising the donor sequence may further comprise at least one additional sequence.
(E) Introduction into cells or embryos
RNA-guided endonucleases (or encoding nucleic acids), pegRNA (or encoding DNA), and optionally donor polynucleotides can be introduced into cells or embryos by a variety of means. In some embodiments, the cell or embryo is transfected. Suitable transfection methods include calcium phosphate-mediated transfection, nucleofection (or electroporation), cationic polymer transfection (such as DEAE-dextran or polyethylenimine), viral transduction, virosome transfection, virion transfection, liposome transfection, cationic liposome transfection, immunoliposome transfection, nonliposomal lipid transfection, dendrimer transfection, heat shock transfection, magnetofection, lipofection, gene gun delivery, impalefection, sonoporation, optical transfection, and proprietary agent-enhanced uptake of nucleic acids. Methods of transfection are well known in the art (see, e.g., Ausubel et al., "Current Protocols in Molecular Biology", John Wiley & Sons, New York, 2003, or Sambrook & Russell, "Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 3rd edition, 2001). In other embodiments, the molecule is introduced into the cell or embryo by microinjection. Typically, the embryo is a fertilized one-cell stage embryo of the species of interest. For example, the molecule may be injected into the pronuclei of one-cell embryos.
The RNA-guided endonuclease (or encoding nucleic acid), pegRNA (or DNA encoding pegRNA), and optionally the donor polynucleotide may be introduced into the cell or embryo simultaneously or sequentially. The ratio of RNA-guided endonuclease (or encoding nucleic acid) to pegRNA (or encoding DNA) will typically be about stoichiometric such that they can form an RNA-protein complex. In one embodiment, the DNA encoding the RNA-guided endonuclease and the DNA encoding the pegRNA are delivered together in a plasmid vector.
(F) Culturing cells or embryos
The method further comprises maintaining the cell or embryo under conditions such that pegRNA directs the RNA-guided endonuclease to a target site in the chromosomal sequence and the RNA-guided endonuclease introduces at least one double-strand break in the chromosomal sequence. Double strand breaks can be repaired by a DNA repair process such that the chromosomal sequence is modified by deleting at least one nucleotide, inserting at least one nucleotide, replacing at least one nucleotide, or a combination thereof.
In embodiments where no donor polynucleotide is introduced into the cell or embryo, the double strand break may be repaired via a non-homologous end joining (NHEJ) repair process. Because NHEJ is error-prone, a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of the break. Thus, the chromosomal sequence may be modified such that the reading frame of a coding region is shifted and the chromosomal sequence is inactivated or "knocked out." An inactivated protein-coding chromosomal sequence does not give rise to the protein encoded by the wild-type chromosomal sequence.
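Whether an NHEJ-induced insertion or deletion shifts the reading frame depends on the net length change modulo three. The sketch below illustrates that arithmetic; the function names are assumptions, and an in-frame change may of course still alter or disrupt the protein.

```python
def net_length_change(inserted: int = 0, deleted: int = 0) -> int:
    """Net change in sequence length at the repaired break, in nucleotides."""
    return inserted - deleted


def causes_frameshift(inserted: int = 0, deleted: int = 0) -> bool:
    """True if the net length change is not a multiple of three,
    i.e. the downstream reading frame is shifted."""
    return net_length_change(inserted, deleted) % 3 != 0
```

For example, a 1-nucleotide deletion shifts the frame, whereas a 3-nucleotide deletion, or a 2-nucleotide insertion combined with a 5-nucleotide deletion, leaves it intact.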
In embodiments in which a donor polynucleotide comprising an upstream sequence and a downstream sequence is introduced into a cell or embryo, the double strand break may be repaired by a Homology Directed Repair (HDR) process such that the donor sequence is integrated into the chromosomal sequence. Thus, the exogenous sequence may be integrated into the genome of the cell or embryo, or the targeted chromosomal sequence may be modified by exchanging the wild-type chromosomal sequence for the modified sequence.
In embodiments in which a donor polynucleotide comprising a targeted cleavage site is introduced into a cell or embryo, the RNA-guided endonuclease can cleave both the targeted chromosomal sequence and the donor polynucleotide. The linearized donor polynucleotide may be integrated into the chromosomal sequence at the double strand break site by ligation between the donor polynucleotide and the cleaved chromosomal sequence via the NHEJ process.
In embodiments where a linear donor polynucleotide comprising a short donor sequence is introduced into a cell or embryo, the short donor sequence may be integrated into the chromosomal sequence at the double strand break site via the NHEJ process. Integration can be via blunt-end ligation between the short donor sequence and the chromosomal sequence at the double-strand break site. Alternatively, integration may be via cohesive-end (i.e., with 5' or 3' overhangs) ligation between a short donor sequence flanked by overhangs and the compatible overhangs generated by the RNA-guided endonuclease in the cleaved chromosomal sequence.
Typically, cells are maintained under conditions suitable for cell growth and/or maintenance. Suitable cell culture conditions are well known in the art and are described, for example, in Santiago et al. (2008) PNAS 105:5809-5814; Moehle et al. (2007) PNAS 104:3055-3060; Urnov et al. (2005) Nature 435:646-651; and Lombardo et al. (2007) Nat. Biotechnology 25:1298-1306. Those skilled in the art understand that methods of culturing cells are known in the art and may and will vary depending on the cell type. In all cases, routine optimization can be used to determine the best technique for a particular cell type.
Embryos can be cultured in vitro (e.g., in cell culture). Typically, the embryos are cultured at an appropriate temperature and in an appropriate medium with the necessary O2/CO2 ratio to allow expression of the RNA-guided endonuclease and pegRNA, if necessary. Suitable non-limiting examples of media include M2, M16, KSOM, BMOC, and HTF media. Those skilled in the art will appreciate that the culture conditions may and will vary depending on the species of embryo. In all cases, routine optimization can be used to determine optimal culture conditions for a particular embryo species. In some cases, a cell line may be derived from an embryo cultured in vitro (e.g., an embryonic stem cell line).
Alternatively, the embryo may be cultured in vivo by transferring the embryo into the uterus of a female host. Generally, the female host is from the same or a similar species as the embryo. Preferably, the female host is pseudopregnant. Methods for preparing pseudopregnant female hosts are known in the art. In addition, methods of transferring embryos into female hosts are known. Culturing an embryo in vivo allows the embryo to develop and can result in the live birth of an animal derived from the embryo. Such an animal will contain the modified chromosomal sequence in every cell of the body.
(G) Cell and embryo types
Various eukaryotic cells and embryos are suitable for use in this method. For example, the cell may be a human cell, a non-human mammalian cell, a non-mammalian vertebrate cell, an invertebrate cell, an insect cell, a plant cell, a yeast cell, or a single-cell eukaryotic organism. Typically, the embryo is a non-human mammalian embryo. In particular embodiments, the embryo may be a one-cell non-human mammalian embryo. Exemplary mammalian embryos (including one-cell embryos) include, but are not limited to, mouse, rat, hamster, rodent, rabbit, feline, canine, ovine, porcine, bovine, equine, and primate embryos. In other embodiments, the cells may be stem cells. Suitable stem cells include, but are not limited to, embryonic stem cells, ES-like stem cells, fetal stem cells, adult stem cells, pluripotent stem cells, induced pluripotent stem cells, multipotent stem cells, oligopotent stem cells, unipotent stem cells, and the like. In an exemplary embodiment, the cell is a mammalian cell.
Non-limiting examples of suitable mammalian cells include Chinese hamster ovary (CHO) cells; baby hamster kidney (BHK) cells; mouse myeloma NS0 cells; mouse embryonic fibroblast 3T3 cells (NIH 3T3); mouse B lymphoma A20 cells; mouse melanoma B16 cells; mouse myoblast C2C12 cells; mouse myeloma SP2/0 cells; mouse embryonic mesenchymal C3H-10T1/2 cells; mouse carcinoma CT26 cells; mouse prostate DuCuP cells; mouse mammary gland EMT6 cells; mouse hepatoma Hepa1c1c7 cells; mouse myeloma J5582 cells; mouse epithelial MTD-1A cells; mouse myocardial MyEnd cells; mouse kidney RenCa cells; mouse pancreatic RIN-5F cells; mouse melanoma X64 cells; mouse lymphoma YAC-1 cells; rat glioblastoma 9L cells; rat B lymphoma RBL cells; rat neuroblastoma B35 cells; rat hepatoma cells (HTC); buffalo rat liver BRL 3A cells; canine kidney cells (MDCK); canine mammary gland (CMT) cells; rat osteosarcoma D17 cells; rat monocyte/macrophage DH82 cells; monkey kidney SV-40 transformed fibroblasts (COS7); monkey kidney CVI-76 cells; African green monkey kidney (VERO-76) cells; human embryonic kidney cells (HEK293, HEK293T); human cervical cancer cells (HELA); human lung cells (W138); human hepatocytes (Hep G2); human U2-OS osteosarcoma cells; human A549 cells; human A431 cells; and human K562 cells. An extensive list of mammalian cell lines can be found in the American Type Culture Collection catalog (ATCC, Manassas, Va.).
(V) Methods of modifying chromosomal sequences or regulating expression of chromosomal sequences using fusion proteins
Another aspect of the disclosure includes methods for modifying a chromosomal sequence or regulating expression of a chromosomal sequence in a cell or embryo. The method comprises introducing into a cell or embryo (a) at least one fusion protein or nucleic acid encoding at least one fusion protein, wherein the fusion protein comprises a CRISPR/Cas-like protein or fragment thereof and an effector domain, and (b) at least one pegRNA or DNA encoding pegRNA, wherein pegRNA directs the CRISPR/Cas-like protein of the fusion protein to a targeted site in a chromosomal sequence, and the effector domain of the fusion protein modifies the chromosomal sequence or modulates expression of the chromosomal sequence.
Fusion proteins comprising a CRISPR/Cas-like protein or fragment thereof and an effector domain are described in detail in section (II) above. Typically, the fusion proteins disclosed herein further comprise at least one nuclear localization signal. Nucleic acids encoding the fusion proteins are described in section (III) above. In some embodiments, the fusion protein may be introduced into the cell or embryo as an isolated protein (which may further comprise a cell penetrating domain). Furthermore, the isolated fusion protein may be part of a protein-RNA complex comprising the pegRNA. In other embodiments, the fusion protein may be introduced into the cell or embryo as an RNA molecule (which may be capped and/or polyadenylated). In other embodiments, the fusion protein may be introduced into the cell or embryo as a DNA molecule. For example, the fusion protein and the pegRNA may be introduced into the cell or embryo as discrete DNA molecules or as part of the same DNA molecule. Such DNA molecules may be plasmid vectors.
In some embodiments, the method further comprises introducing at least one zinc finger nuclease into the cell or embryo. Zinc finger nucleases are described in section (II) (d) above. In other embodiments, the method further comprises introducing at least one donor polynucleotide into the cell or embryo. The donor polynucleotide is described in detail in section (IV) (d) above. Means for introducing molecules into cells or embryos and means for culturing cells or embryos are described in sections (IV) (e) and (IV) (f), respectively, above. Suitable cells and embryos are described in section (IV) (g) above.
In certain embodiments where the effector domain of the fusion protein is a cleavage domain (e.g., a FokI cleavage domain or a modified FokI cleavage domain), the method may comprise introducing one fusion protein (or nucleic acid encoding one fusion protein) and two pegRNAs (or DNA encoding two pegRNAs) into the cell or embryo. The two pegRNAs direct the fusion protein to two different target sites in the chromosomal sequence, where the fusion protein dimerizes (e.g., forms a homodimer) such that the two cleavage domains can introduce a double-strand break into the chromosomal sequence. In embodiments where the optional donor polynucleotide is not present, the double-strand break in the chromosomal sequence may be repaired by a non-homologous end joining (NHEJ) repair process. Because NHEJ is error-prone, a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of the break. Thus, the targeted chromosomal sequence may be modified or inactivated. For example, a single nucleotide change (SNP) may produce an altered protein product, or a shift in the reading frame of the coding sequence may inactivate or "knock out" the sequence such that no protein product is produced. In embodiments where the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence at the targeted site during double-strand break repair. For example, in embodiments in which the donor sequence is flanked by an upstream sequence and a downstream sequence that have substantial sequence identity to the sequences upstream and downstream, respectively, of the target site in the chromosomal sequence, the donor sequence may be exchanged with or integrated into the chromosomal sequence at the targeted site during repair mediated by a homology-directed repair process.
Alternatively, in embodiments where the donor sequence is flanked by compatible overhangs (or the compatible overhangs are created by RNA-guided endonuclease sites), the donor sequence may be directly linked to the cleaved chromosomal sequence by a non-homologous repair process during double-strand break repair. The exchange or integration of the donor sequence into the chromosomal sequence modifies the targeted chromosomal sequence or introduces an exogenous sequence into the chromosomal sequence of the cell or embryo.
In other embodiments where the effector domain of the fusion protein is a cleavage domain (e.g., a FokI cleavage domain or a modified FokI cleavage domain), the method may comprise introducing two different fusion proteins (or nucleic acids encoding two different fusion proteins) and two pegRNAs (or DNA encoding two pegRNAs) into the cell or embryo. The fusion proteins may differ as detailed in section (II) above. Each pegRNA directs a fusion protein to a specific target site in the chromosomal sequence, where the fusion proteins dimerize (e.g., form a heterodimer) such that the two cleavage domains can introduce a double-strand break into the chromosomal sequence. In embodiments where the optional donor polynucleotide is not present, the resulting double-strand break may be repaired by a non-homologous repair process such that a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of the break. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence during double-strand break repair by a homology-based repair process (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity to the sequences upstream and downstream, respectively, of the targeted site in the chromosomal sequence) or by a non-homologous repair process (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).
In other embodiments where the effector domain of the fusion protein is a cleavage domain (e.g., a FokI cleavage domain or a modified FokI cleavage domain), the method may comprise introducing into the cell or embryo a fusion protein (or nucleic acid encoding a fusion protein), a pegRNA (or DNA encoding a pegRNA), and a zinc finger nuclease (or nucleic acid encoding a zinc finger nuclease), wherein the zinc finger nuclease comprises a FokI cleavage domain or a modified FokI cleavage domain. The pegRNA directs the fusion protein to a specific chromosomal sequence, and the zinc finger nuclease is directed to another chromosomal sequence, where the fusion protein and the zinc finger nuclease dimerize such that the cleavage domain of the fusion protein and the cleavage domain of the zinc finger nuclease can introduce a double-strand break into the chromosomal sequence. See FIG. 1B. In embodiments where the optional donor polynucleotide is not present, the resulting double-strand break may be repaired by a non-homologous repair process such that a deletion of at least one nucleotide, an insertion of at least one nucleotide, a substitution of at least one nucleotide, or a combination thereof may occur during repair of the break. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide may be exchanged with or integrated into the chromosomal sequence during double-strand break repair by a homology-based repair process (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity to the sequences upstream and downstream, respectively, of the targeted site in the chromosomal sequence) or by a non-homologous repair process (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).
In other embodiments where the effector domain of the fusion protein is a transcriptional activation domain or transcriptional repressor domain, the method may comprise introducing into the cell or embryo a fusion protein (or nucleic acid encoding a fusion protein) and a pegRNA (or DNA encoding a pegRNA). The pegRNA directs the fusion protein to a specific chromosomal sequence, wherein the transcriptional activation domain or transcriptional repressor domain activates or represses, respectively, expression of the targeted chromosomal sequence. See FIG. 2A.
In an alternative embodiment where the effector domain of the fusion protein is an epigenetic modification domain, the method may comprise introducing into the cell or embryo a fusion protein (or nucleic acid encoding a fusion protein) and a pegRNA (or DNA encoding a pegRNA). The pegRNA directs the fusion protein to a specific chromosomal sequence, wherein the epigenetic modification domain modifies the structure of the targeted chromosomal sequence. See FIG. 2B. Epigenetic modifications include acetylation or methylation of histones and/or nucleotide methylation. In some cases, structural modifications of the chromosomal sequence result in changes in expression of the chromosomal sequence.
(VI) genetically modified cells and animals
The present disclosure includes genetically modified cells, non-human embryos, and non-human animals comprising at least one chromosomal sequence that has been modified using an RNA-guided endonuclease-mediated or fusion protein-mediated process, e.g., using the methods described herein. The present disclosure provides cells comprising at least one RNA-guided endonuclease or fusion protein targeting a chromosomal sequence of interest, or a DNA or RNA molecule encoding such an endonuclease or fusion protein, at least one pegRNA, and optionally one or more donor polynucleotides. The present disclosure also provides a non-human embryo comprising at least one DNA or RNA molecule encoding an RNA-guided endonuclease or fusion protein targeting a chromosomal sequence of interest, at least one pegRNA, and optionally one or more donor polynucleotides.
The present disclosure provides genetically modified non-human animals, non-human embryos, or animal cells comprising at least one modified chromosomal sequence. The modified chromosomal sequence may be modified such that it (1) is inactivated, (2) has altered expression or produces an altered protein product, or (3) comprises an integrated sequence. Using the methods described herein, the chromosomal sequence is modified with RNA-guided endonuclease-mediated or fusion protein-mediated processes.
As discussed, one aspect of the present disclosure provides genetically modified animals in which at least one chromosomal sequence has been modified. In one embodiment, the genetically modified animal comprises at least one inactivated chromosomal sequence. The modified chromosomal sequence may be inactivated such that the sequence is not transcribed and/or the functional protein product is not produced. Thus, a genetically modified animal comprising an inactivated chromosomal sequence may be referred to as a "knockout" or a "conditional knockout". The inactivated chromosomal sequence may include a deletion mutation (i.e., deleting one or more nucleotides), an insertion mutation (i.e., inserting one or more nucleotides), or a nonsense mutation (i.e., a single nucleotide is substituted with another nucleotide such that a stop codon is introduced). As a result of the mutation, the targeted chromosomal sequence is inactivated and the functional protein is not produced. The inactivated chromosomal sequence does not contain exogenously introduced sequences. Also included herein are genetically modified animals in which two, three, four, five, six, seven, eight, nine, or ten or more chromosomal sequences are inactivated.
In another embodiment, the modified chromosomal sequence may be altered such that it encodes a variant protein product. For example, a genetically modified animal comprising a modified chromosomal sequence may comprise targeted point mutations or other modifications such that an altered protein product is produced. In one embodiment, the chromosomal sequence may be modified such that at least one nucleotide is altered and the expressed protein comprises an altered amino acid residue (missense mutation). In another embodiment, the chromosomal sequence may be modified to include more than one missense mutation such that more than one amino acid is altered. Furthermore, the chromosomal sequence may be modified to have three nucleotide deletions or insertions such that the expressed protein comprises a single amino acid deletion or insertion. The altered protein or variant protein may have altered properties or activity, such as altered substrate specificity, altered enzymatic activity, altered kinetic rate, etc., as compared to the wild-type protein.
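The effect of a missense mutation and of an in-frame three-nucleotide deletion on the expressed protein can be illustrated with a minimal sketch. The coding sequence, the partial codon table, and the function name `translate` below are hypothetical and chosen only for illustration; they are not sequences from this disclosure.

```python
# Partial codon table: only the codons used in this toy example (an assumption,
# not a complete genetic code).
CODON_TABLE = {"ATG": "M", "GCT": "A", "GAT": "D", "AAA": "K", "GGT": "G", "TAA": "*"}

def translate(cds):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical wild-type coding sequence: ATG GCT AAA GGT TAA
wild_type = "ATGGCTAAAGGTTAA"

# Missense mutation: one nucleotide changed (GCT -> GAT), one residue altered.
missense = wild_type[:4] + "A" + wild_type[5:]

# In-frame deletion of three nucleotides (the AAA codon): one residue removed.
deletion = wild_type[:6] + wild_type[9:]

print(translate(wild_type), translate(missense), translate(deletion))
```

Because the deletion removes exactly one codon, the reading frame downstream of the edit is preserved, which is why only a single amino acid is lost.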
In another embodiment, the genetically modified animal may comprise at least one chromosomally integrated sequence. Genetically modified animals comprising an integrated sequence may be referred to as "knockins" or "conditional knockins". The chromosomally integrated sequence may encode, for example, an ortholog protein, an endogenous protein, or a combination of both. In one embodiment, sequences encoding orthologous or endogenous proteins may be integrated into the chromosomal sequence encoding the protein such that the chromosomal sequence is inactivated, but the exogenous sequence is expressed. In this case, the sequence encoding the ortholog protein or endogenous protein may be operably linked to a promoter control sequence. Alternatively, sequences encoding ortholog or endogenous proteins may be integrated into the chromosomal sequence without affecting expression of the chromosomal sequence. For example, the sequence encoding the protein may be integrated into a "safe harbor (safe harbor)" locus, such as the Rosa26 locus, the HPRT locus, or the AAV locus. The disclosure also includes genetically modified animals in which two, three, four, five, six, seven, eight, nine, or ten or more sequences (including protein-encoding sequences) are integrated into the genome.
The chromosomally integrated sequence encoding a protein may encode a wild-type form of the protein of interest or may encode a protein comprising at least one modification such that an altered form of the protein is produced. For example, a chromosomally integrated sequence encoding a protein associated with a disease or disorder may comprise at least one modification such that the altered form of the protein produced causes or enhances the associated disorder. Alternatively, the chromosomally integrated sequence encoding a protein associated with a disease or disorder may comprise at least one modification such that the altered form of the protein prevents the development of the associated disorder.
In further embodiments, the genetically modified animal may be a "humanized" animal comprising at least one chromosomally integrated sequence encoding a functional human protein. The functional human protein may have no corresponding ortholog in the genetically modified animal. Alternatively, the wild-type animal from which the genetically modified animal is derived may comprise an ortholog corresponding to the functional human protein. In this case, the orthologous sequence in the "humanized" animal is inactivated such that no functional protein is produced, and the "humanized" animal comprises at least one chromosomally integrated sequence encoding the human protein.
In another embodiment, the genetically modified animal may comprise at least one modified chromosomal sequence encoding a protein such that the expression pattern of the protein is altered. For example, regulatory regions controlling protein expression, such as promoters or transcription factor binding sites, may be altered such that the protein is overproduced, or tissue-specific or temporal expression of the protein is altered, or a combination thereof. Alternatively, conditional knockout systems can be used to alter the expression pattern of a protein. Non-limiting examples of conditional knockout systems include the Cre-lox recombination system. The Cre-lox recombination system comprises a Cre recombinase (a site-specific DNA recombinase) that catalyzes the recombination of nucleic acid sequences between specific sites (lox sites) in a nucleic acid molecule. Methods for generating temporal and tissue-specific expression using this system are known in the art. Typically, genetically modified animals are produced that have chromosomal sequences flanked by lox sites. The genetically modified animal comprising the lox-flanked chromosomal sequence can then be crossed with another genetically modified animal expressing Cre recombinase. Offspring animals are then produced that contain both the lox-flanked chromosomal sequence and the Cre recombinase, and the lox-flanked chromosomal sequence is recombined, resulting in a deletion or inversion of the chromosomal sequence encoding the protein. The expression of Cre recombinase can be temporally and conditionally regulated to effect temporally and conditionally regulated recombination of chromosomal sequences.
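The two outcomes of Cre-mediated recombination mentioned above (deletion or inversion) can be sketched with a toy model. This is an illustrative simplification, not a method of the disclosure: the locus is represented as a list of hypothetical element labels, lox sites are written as `lox>` or `lox<` to mark their orientation, and the model tracks only the order of elements, not the strand orientation of each one.

```python
def cre_recombine(elements):
    """Toy model of Cre-lox recombination on a list of labelled elements.

    Lox sites are the labels 'lox>' and 'lox<' (orientation markers).
    Two sites in the same orientation give excision of the intervening
    elements; two sites in opposite orientation give inversion.
    """
    sites = [i for i, e in enumerate(elements) if e in ("lox>", "lox<")]
    if len(sites) != 2:
        raise ValueError("expected exactly two lox sites")
    first, second = sites
    if elements[first] == elements[second]:
        # Directly repeated sites: intervening DNA is excised, one lox site remains.
        return elements[:first + 1] + elements[second + 1:]
    # Inverted sites: the intervening segment is flipped in place.
    return elements[:first + 1] + elements[first + 1:second][::-1] + elements[second:]

# Hypothetical lox-flanked loci.
floxed_same = ["promoter", "lox>", "exon2", "exon3", "lox>", "polyA"]
floxed_inverted = ["promoter", "lox>", "exon2", "exon3", "lox<", "polyA"]

print(cre_recombine(floxed_same))
print(cre_recombine(floxed_inverted))
```

In this sketch the recombination outcome is determined entirely by the relative orientation of the two lox sites, which mirrors how the design of the lox-flanked allele determines whether Cre expression yields a deletion or an inversion.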
In any of these embodiments, the modified chromosomal sequences of the genetically modified animals disclosed herein can be heterozygous. Alternatively, the modified chromosomal sequence of the genetically modified animal may be homozygous.
The genetically modified animals disclosed herein can be crossed to produce animals comprising more than one modified chromosomal sequence or to produce animals in which one or more modified chromosomal sequences are homozygous. For example, two animals comprising the same modified chromosomal sequence may be crossed to produce an animal in which the modified chromosomal sequence is homozygous. Alternatively, animals with different modified chromosomal sequences may be crossed to produce animals comprising two modified chromosomal sequences.
For example, a first animal comprising an inactivated chromosomal sequence of gene "X" may be crossed with a second animal comprising a chromosomally integrated sequence encoding a human gene "X" protein to produce "humanized" gene "X" offspring comprising both the inactivated gene "X" chromosomal sequence and the chromosomally integrated human gene "X" sequence. In addition, the humanized gene "X" animal can be crossed with a humanized gene "Y" animal to produce humanized gene X/gene Y offspring. Those skilled in the art will appreciate that many combinations are possible.
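As an illustration of how crossing can yield homozygous animals, the following sketch enumerates offspring genotypes for a single locus. It assumes simple Mendelian inheritance with equal transmission of each allele, which is an assumption made here for illustration; the allele labels `M` (modified) and `w` (wild type) and the function name `cross` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent_a, parent_b):
    """Return expected offspring genotype fractions for one locus.

    Each parent is a two-character string of alleles, e.g. 'Mw'.
    Assumes each allele is transmitted with equal probability.
    """
    offspring = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(offspring.values())
    return {genotype: Fraction(count, total) for genotype, count in offspring.items()}

# Two animals heterozygous for the same modified chromosomal sequence.
het_by_het = cross("Mw", "Mw")
print(het_by_het)
```

Under these assumptions, one quarter of the offspring of two heterozygous animals are expected to be homozygous for the modified sequence.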
In other embodiments, animals comprising the modified chromosomal sequence may be crossed to combine the modified chromosomal sequence with other genetic backgrounds. As non-limiting examples, other genetic backgrounds may include wild-type genetic backgrounds, genetic backgrounds with deletion mutations, genetic backgrounds with another targeted integration, genetic backgrounds with non-targeted integration.
As used herein, the term "animal" refers to a non-human animal. The animal may be an embryo, a larva or an adult. Suitable animals include vertebrates such as mammals, birds, reptiles, amphibians, crustaceans and fish. Examples of suitable mammals include, but are not limited to, rodents, companion animals, livestock and primates. Non-limiting examples of rodents include mice, rats, hamsters, gerbils and guinea pigs. Suitable companion animals include, but are not limited to, cats, dogs, rabbits, hedgehogs and ferrets. Non-limiting examples of livestock include horses, goats, sheep, pigs, cattle, llamas, and alpacas. Suitable primates include, but are not limited to, pigtail, chimpanzee, lemur, macaque, marmoset, spider monkey, squirrel monkey, and green monkey. Non-limiting examples of birds include chickens, turkeys, ducks, and geese. Alternatively, the animal may be an invertebrate such as an insect, nematode or the like. Non-limiting examples of insects include fruit flies and mosquitoes. An exemplary animal is a rat. Non-limiting examples of suitable rat strains include Dahl Salt-Sensitive, Fischer 344, Lewis, Long Evans Hooded, Sprague-Dawley, and Wistar. In one embodiment, the animal is not a genetically modified mouse. In each of the above iterations of the suitable animals of the invention, the animals do not include exogenously introduced randomly integrated transposon sequences.
Another aspect of the present disclosure provides a genetically modified cell or cell line comprising at least one modified chromosomal sequence. The genetically modified cell or cell line can be derived from any of the genetically modified animals disclosed herein. Alternatively, the chromosomal sequence in the cell may be modified using the methods described herein, as described above (in the paragraph describing chromosomal sequence modification in animals). The disclosure also includes lysates of the cells or cell lines.
Typically, the cell is a eukaryotic cell. Suitable host cells include fungi or yeasts such as Pichia, Saccharomyces, or Schizosaccharomyces; insect cells such as Sf9 cells from Spodoptera frugiperda or S2 cells from Drosophila melanogaster; and animal cells, such as mouse, rat, hamster, non-human primate, or human cells. Exemplary cells are mammalian. The mammalian cells may be primary cells. In general, any primary cell that is susceptible to double-strand breaks can be used. The cells may be of various cell types, such as fibroblasts, myoblasts, T or B cells, macrophages, epithelial cells, and the like.
When a mammalian cell line is used, the cell line may be any established cell line or a primary cell line that has not been described. The cell line may be adherent or non-adherent, or the cell line may be grown under conditions that promote adherent, non-adherent, or organotypic growth using standard techniques known to those skilled in the art. Non-limiting examples of suitable mammalian cells and cell lines are provided in section (IV) (g) herein. In other embodiments, the cells may be stem cells. Non-limiting examples of suitable stem cells are provided in section (IV) (g).
The present disclosure also provides genetically modified non-human embryos comprising at least one modified chromosomal sequence. The chromosomal sequence in the embryo may be modified using the methods described herein, as described above (in the paragraph describing chromosomal sequence modification in animals). In one embodiment, the embryo is a non-human fertilized single cell stage embryo of an animal species of interest. Exemplary mammalian embryos (including single cell embryos) include, but are not limited to, mouse, rat, hamster, rodent, rabbit, cat, canine, ovine, porcine, bovine, equine, and primate embryos.
Examples
The cis-acting regulatory element greatly enhances the efficiency of PE2 and PE3
In a proof-of-concept experiment, a wild-type double element for nuclear expression (dENE) from rice TWIFB1 was added to the 3' end of the prime editor expression cassette, immediately after the stop codon and before the mRNA terminator (e.g., BGH), as shown in FIG. 2, so that the dENE is transcribed into an RNA sequence contained in the prime editor mRNA. As shown in FIGS. 3 and 4, K562 cells were nucleofected with a prime editor (PE2) expression construct either containing or lacking the dENE sequence, together with a pegRNA expression construct targeting the HEK3 site for different types of edits. In the case of PE3, an additional nicking guide RNA expression construct was added to the nucleofection mixture. Cells were harvested three days after nucleofection for next-generation sequencing (NGS) analysis of prime editing efficiency. As shown, including the 3'-UTR dENE in the prime editor (PE2) expression cassette greatly enhanced the editing efficiency of +1 CTT insertion and +5 G deletion editing at the HEK3 target, to 8-fold (FIG. 3), and in the case of PE3 enhanced the editing efficiency of +1 T-to-A conversion editing and of +1 T deletion and +5 G-to-C conversion editing at the HEK3 target to 2-fold (FIG. 4).
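For illustration only, the way a fold enhancement can be derived from NGS read counts is sketched below. The read counts are hypothetical placeholders, not data from the experiments described above, and the function names `editing_efficiency` and `fold_enhancement` are assumptions introduced for this sketch.

```python
def editing_efficiency(edited_reads, total_reads):
    """Fraction of sequencing reads carrying the intended edit."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return edited_reads / total_reads

def fold_enhancement(efficiency_with_element, efficiency_without_element):
    """Ratio of editing efficiency with the element to efficiency without it."""
    if efficiency_without_element <= 0:
        raise ValueError("efficiency without the element must be positive")
    return efficiency_with_element / efficiency_without_element

# Hypothetical read counts (NOT measured values from this disclosure).
with_dene = editing_efficiency(edited_reads=4000, total_reads=10000)
without_dene = editing_efficiency(edited_reads=500, total_reads=10000)

fold = fold_enhancement(with_dene, without_dene)
print(f"with dENE: {with_dene:.1%}, without dENE: {without_dene:.1%}, fold: {fold:.1f}")
```

The comparison is only meaningful when both samples are processed identically, since the fold value is a ratio of two efficiencies measured on the same target and edit type.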
The cis-acting regulatory elements enhance prime editing efficiency in a cell type-dependent manner
In two independent experiments, K562 or HEK293 cells were nucleofected with a prime editor (PE2) expression construct either containing or lacking the dENE sequence, together with a pegRNA expression construct targeting the HEK3 site (GGCCCAGACTGAGCACGTGATGG [SEQ ID NO:26]; the 3'-terminal TGG, underlined in the original, is the PAM sequence) for different types of edits, as shown in FIGS. 5 and 6. In the case of PE3, an additional nicking guide RNA expression construct was added to the nucleofection mixture. Cells were harvested three days after nucleofection for next-generation sequencing (NGS) analysis of prime editing efficiency. As shown in FIG. 5, in K562 cells, including the 3'-UTR dENE in the prime editor (PE2) expression cassette enhanced the editing efficiency of +1 CTT insertion and +5 G deletion editing at the HEK3 target by about 50%, and similarly, in the case of PE3, enhanced the editing efficiency of +1 T-to-A conversion, +1 CTT insertion and +5 G deletion, and +1 T deletion and +5 G-to-C conversion editing at the HEK3 target by about 50%. In contrast, in HEK293 cells, including the 3'-UTR dENE in the prime editor (PE2) expression cassette was not shown to enhance the editing efficiency of the same types of edits at the same HEK3 target, either for PE2 or for PE3, as shown in FIG. 6. This indicates that the 3'-UTR element dENE acts in a cell type-dependent manner, enhancing prime editing efficiency at the tested target in certain cell types.
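The relationship between the HEK3 target sequence, its PAM, and the edit positions named above (+1 and +5) can be sketched as follows. The sketch assumes an NGG PAM and a nick three nucleotides 5' of the PAM, as is typical for Streptococcus pyogenes Cas9, and assumes that position +1 is the first base 3' of the nick; these conventions and the helper names are assumptions introduced here, not statements from the disclosure.

```python
# HEK3 target site as given in SEQ ID NO:26 (protospacer followed by PAM).
HEK3_SITE = "GGCCCAGACTGAGCACGTGATGG"

def split_target(site, pam_len=3):
    """Split a target site into (protospacer, PAM), with the PAM at the 3' end."""
    return site[:-pam_len], site[-pam_len:]

def is_ngg(pam):
    """Check that a 3-nt PAM matches the NGG pattern."""
    return len(pam) == 3 and pam[1:] == "GG"

def base_at(site, position, pam_len=3, nick_offset=3):
    """Return the base at +position, where +1 is the first base 3' of the nick.

    Assumes the nick lies nick_offset nucleotides 5' of the PAM.
    """
    nick_index = len(site) - pam_len - nick_offset
    return site[nick_index + position - 1]

protospacer, pam = split_target(HEK3_SITE)
print(protospacer, pam, is_ngg(pam))
print(base_at(HEK3_SITE, 1), base_at(HEK3_SITE, 5))
```

Under these assumptions, the base at +1 is T and the base at +5 is G, consistent with the "+1 T" and "+5 G" edits named in the experiments above.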
Claims (30)
1. A synthetic nucleic acid composition comprising: i) A sequence encoding a CRISPR-Cas protein, ii) a sequence encoding a reverse transcriptase, and iii) a sequence encoding a cis-acting regulatory element.
2. The synthetic nucleic acid composition of claim 1, wherein the CRISPR-Cas protein is nCas9-H840A.
3. The synthetic nucleic acid composition of any one of claims 1 or 2, wherein the reverse transcriptase is M-MLV-RT.
4. The synthetic nucleic acid composition of any one of claims 1-3, wherein the cis-acting regulatory element is dENE or ENE.
5. The synthetic nucleic acid composition of any one of claims 1-3, wherein the cis-acting regulatory element is sRSM1.
6. The synthetic nucleic acid composition of any one of claims 1-5, wherein the nucleic acid is DNA.
7. The synthetic nucleic acid composition of any one of claims 1-5, wherein the nucleic acid is RNA.
8. The synthetic nucleic acid composition of any one of claims 1-7, wherein the composition further comprises an expression promoter.
9. The synthetic nucleic acid composition of claim 8, wherein the composition is in an expression vector.
10. The synthetic nucleic acid composition of claim 8, wherein the composition is incorporated into a transfected virus.
11. The synthetic nucleic acid composition of any one of claims 1-10, wherein the cis-acting regulatory element is located after the stop codon of the CRISPR-Cas9 sequence and before an mRNA terminator.
12. The synthetic nucleic acid composition of any one of claims 1-11, further comprising a prime editing guide RNA (pegRNA), wherein the pegRNA is derived from one of PE1, PE2, and PE3.
13. An amino acid sequence encoded by the synthetic nucleic acid composition of any one of claims 1-11.
14. A method of modifying an endogenous DNA sequence, the method comprising:
a) Providing: i) An operable expression vector comprising a synthetic nucleic acid composition comprising: 1) a sequence encoding a CRISPR-Cas type II system protein, 2) a sequence encoding a reverse transcriptase, and 3) a sequence comprising a cis-acting regulatory element; ii) a prime editing guide RNA (pegRNA) comprising a Primer Binding Site (PBS); and iii) a cell comprising a target endogenous DNA sequence that is at least 50% complementary to the PBS;
b) Transfecting said cell comprising the target endogenous DNA sequence with the synthetic nucleic acid composition and the pegRNA; and
c) Culturing the transfected cells such that the endogenous DNA sequence is subjected to the desired modification.
15. The method of claim 14, wherein the CRISPR-Cas type II system protein is a Cas9 protein.
16. The method according to claim 14, wherein the endogenous DNA sequence is at least 75% complementary to the PBS.
17. The method according to claim 14, wherein the endogenous DNA sequence is at least 90% complementary to the PBS.
18. The method according to claim 14, wherein the endogenous DNA sequence is at least 95% complementary to the PBS.
19. The method according to claim 14, wherein the endogenous DNA sequence is at least 98% complementary to the PBS.
20. The method according to claim 14, wherein the endogenous DNA sequence is 100% complementary to the PBS.
21. The method of any one of claims 14-20, wherein the CRISPR-Cas protein is nCas9-H840A.
22. The method of any one of claims 14-21, wherein the reverse transcriptase is M-MLV-RT.
23. The method of any one of claims 14-22, wherein the cis-acting regulatory element is dENE or ENE.
24. The method of any one of claims 14-22, wherein the cis-acting regulatory element is sRSM1.
25. The method of any one of claims 14-24, wherein the operable expression vector is DNA.
26. The method of any one of claims 14-24, wherein the operable expression vector is RNA.
27. The synthetic nucleic acid composition of any one of claims 14-26, wherein the composition is incorporated into a transfected virus.
28. The synthetic nucleic acid composition of any one of claims 14-27, wherein the cis-acting regulatory element is located after the stop codon of the CRISPR-Cas9 sequence and before an mRNA terminator.
29. The method of any one of claims 14-28, wherein the pegRNA is derived from one of PE1, PE2, and PE3.
30. The method of any one of claims 14-29, wherein the CRISPR-Cas type II system protein encoded in an operable expression vector is introduced into the cell.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US63/243423 | 2021-09-13 | ||
US202263363247P | 2022-04-20 | 2022-04-20 | |
US63/363247 | 2022-04-20 | ||
PCT/US2022/076175 WO2023039508A1 (en) | 2021-09-13 | 2022-09-09 | Improved prime editing system efficiency with cis-acting regulatory elements |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118234854A true CN118234854A (en) | 2024-06-21 |
Family
ID=91498269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202280075555.5A Pending CN118234854A (en) | 2021-09-13 | 2022-09-09 | Improved prime editing system efficiency using cis-acting regulatory elements |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118234854A (en) |
-
2022
- 2022-09-09 CN CN202280075555.5A patent/CN118234854A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |