WO2024044689A2 - Small molecule-inducible gene expression switches - Google Patents

Small molecule-inducible gene expression switches

Info

Publication number
WO2024044689A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
polynucleotide
seq
transgene
exon
Prior art date
Application number
PCT/US2023/072823
Other languages
French (fr)
Other versions
WO2024044689A3 (en)
Inventor
Eric Tzy-Shi WANG
Yu Zhou
Original Assignee
University Of Florida Research Foundation, Incorporated
Priority date
Filing date
Publication date
Application filed by University Of Florida Research Foundation, Incorporated
Publication of WO2024044689A2
Publication of WO2024044689A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/67 General methods for enhancing the expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N15/86 Viral vectors
    • C12N2740/00 Reverse transcribing RNA viruses
    • C12N2740/00011 Details
    • C12N2740/10011 Retroviridae
    • C12N2740/16011 Human Immunodeficiency Virus, HIV
    • C12N2740/16041 Use of virus, viral particle or viral elements as a vector
    • C12N2740/16043 Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
    • C12N2750/00 ssDNA viruses
    • C12N2750/00011 Details
    • C12N2750/14011 Parvoviridae
    • C12N2750/14111 Dependovirus, e.g. adenoassociated viruses
    • C12N2750/14141 Use of virus, viral particle or viral elements as a vector
    • C12N2750/14143 Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
    • C12N2840/00 Vectors comprising a special translation-regulating system
    • C12N2840/44 Vectors comprising a special translation-regulating system being a specific part of the splice mechanism, e.g. donor, acceptor

Definitions

  • Recombinant viruses e.g., recombinant adeno-associated viruses (AAV) and recombinant lentiviruses, etc.
  • aspects of the application relate to recombinant nucleic acids containing a transgene comprising a ligand-responsive alternatively spliced exon that controls expression of an mRNA (e.g., encoding a protein of interest) or a functional RNA (e.g., a regulatory RNA) encoded by the transgene.
  • the recombinant nucleic acids are delivered to a host cell (e.g., ex vivo or in vivo).
  • a host cell nucleic acid e.g., one or more genomic alleles
  • ligand-responsive alternative splicing is used to regulate AAV-delivered gene expression.
  • ligand-responsive alternative splicing can confer greater control of therapeutic cargoes and may also avoid potential toxicities from constitutive over-expression of therapeutic cargoes.
  • Previous aptazyme-based approaches lack modularity and have leaky, non-zero basal expression.
  • Other efforts using drug-responsive alternative splicing patterns to control AAV-mediated gene expression potentially affect many other cryptic splice sites and are restricted to a single specific molecule.
  • aspects of the present invention relate to the use of alternative splicing switches in mammalian cells and sequence designs that allow for ligand-inducible regulation of gene expression or knockdown.
  • the approach uses rational design, coupled to deep sequencing, to characterize the behavior of hundreds to thousands of synthetic intron/exon cassettes.
  • riboswitch designs that facilitate small molecule-mediated regulation of alternative splicing, and multiple sequence variants, are described. Unlike switches that promote exon inclusion, this design promotes exon skipping upon drug induction.
  • These designed switches can dynamically regulate protein isoforms, protein expression levels, and production of RNA interference triggers. This approach is termed SPlicing by Ligand Induction for Controllable Expression based on Riboswitch (SPLICER).
  • the designs are compact in size and promoter-independent, making them useful regulatory tools that can be incorporated into gene expression cassettes for basic and translational applications.
  • the designs can be useful for controlling the expression patterns (e.g., timing of expression by addition of a ligand) of therapeutically useful genes.
  • polynucleotides of the present disclosure comprise a ligand-responsive sequence.
  • the polynucleotide is a transgene, such as one comprising a cassette which is responsive to certain ligands.
  • the cassettes comprise ligand-responsive sequences which regulate alternative splicing.
  • cassettes may comprise ligand-responsive aptamers that can bind to exogenous or endogenous ligands, resulting in conformational changes in the transcript of the transgene that affect splicing patterns.
  • transgenes of the present disclosures are provided in vectors.
  • the transgenes are provided in recombinant viral genomes that can be provided in AAV particles.
  • the present disclosure relates to a polynucleotide comprising a transgene, wherein the transgene comprises at least one alternatively spliced exon, at least two introns flanking the alternatively spliced exon, and a ligand-responsive aptamer, wherein the presence of the ligand results in splicing out the at least one alternative exon and the ligand-responsive aptamer along with the introns.
  • aspects of the present disclosure relate to the observation that alternatively-spliced exons may be used in the context of viral vectors (e.g., AAV viral vectors or lentivirus viral vectors) to effectively regulate the expression of a coding region of interest (e.g., a coding region of a transgene that encodes a therapeutic protein).
  • the alternatively-spliced exons regulate a coding region of interest in a condition-responsive manner.
  • condition-responsive manner means that the alternatively-spliced exon regulates the expression of a coding region of interest in a manner that is controlled or influenced by one or more conditions, including, but not limited to, environmental conditions, intracellular conditions, extracellular conditions, type of cell (e.g., liver versus kidney cell), gene expression pattern, or disease state. Accordingly, the present disclosure relates to a new approach for regulating expression of a coding region of interest (e.g., a coding region of a transgene that encodes a therapeutic protein) from recombinant viral vectors, optionally in a condition-responsive manner, by coupling the expression of a coding region of interest with an alternatively-spliced exon.
  • the present disclosure describes a variety of exemplary configurations and methods of coupling the expression of a coding region of interest (or multiple portions of coding regions) with an alternatively-spliced exon, but any suitable arrangement or configuration is contemplated so long as the expression of the coding region of interest (e.g., a coding region of a transgene that encodes a therapeutic protein) is configured to come under regulatory control of an alternatively-spliced exon.
  • aspects of the present disclosure relate to a polynucleotide comprising a sequence encoding a ligand-responsive sequence, wherein the polynucleotide is capable of being alternatively spliced in the presence of a ligand to produce a first RNA or a second RNA.
  • the polynucleotide comprises an alternative exon operably linked to the ligand-responsive sequence.
  • the first RNA comprises the alternative exon, wherein the second RNA does not comprise the alternative exon.
  • the first RNA encodes a long isoform of an RNA of interest and/or the second RNA encodes a short isoform of the RNA of interest.
  • the first RNA encodes an RNA of interest.
  • the first RNA is not operably linked to a pre-mature stop codon (e.g., does not contain a pre-mature stop codon).
  • the first RNA is operably linked to a start codon (e.g., contains a start codon).
  • the second RNA encodes an RNA of interest.
  • the second RNA is not operably linked to a pre-mature stop codon (e.g., does not contain a pre-mature stop codon).
  • the second RNA is operably linked to a start codon (e.g., contains a start codon).
  • the RNA of interest is an interfering RNA.
  • the RNA of interest is a microRNA.
  • the second RNA encodes the microRNA.
  • the RNA of interest encodes a protein.
  • the RNA of interest encodes a CRISPR/Cas nuclease or a guide RNA (gRNA).
  • the RNA of interest encodes a therapeutic RNA and/or a therapeutic protein.
  • the ligand-responsive sequence is a risdiplam-responsive sequence or a branaplam-responsive sequence.
  • the alternative exon comprises a first portion of the risdiplam-responsive sequence and an intron downstream of the alternative exon comprises a second portion of the risdiplam-responsive sequence.
  • the first portion of the risdiplam-responsive sequence comprises a WGA sequence and the second portion of the risdiplam-responsive sequence comprises a GTAAGW sequence.
  • the alternative exon further comprises an AGGAAG sequence which is 5’ to the WGA sequence.
  • the alternative exon further comprises an upstream sequence which is 5’ to the AGGAAG sequence. In some embodiments, the upstream sequence comprises at least 10 nucleotides. In some embodiments, the alternative exon further comprises a downstream sequence which is 3’ to the AGGAAG sequence and 5’ to the WGA sequence. In some embodiments, the downstream sequence comprises at least 6 nucleotides.
  • the risdiplam-responsive sequence comprises NNNNNNNNAGGAAGNNNNNNNNNNNNNNNNAWGAGTAAGW (SEQ ID NO: 2183), wherein N is any nucleotide and W is A or T. In some embodiments, the risdiplam-responsive sequence comprises YWWKWWWMKYAGGAAGYTAKTWGTTAWGAGTAAGW (SEQ ID NO:
  • the risdiplam-responsive sequence comprises
  • R is A or G
  • Y is C or T
  • W is A or T.
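
The degenerate codes used in these sequences (N, W, R, Y, K, M, D, H) follow the standard IUPAC nucleotide alphabet, so a candidate sequence can be screened against a motif by expanding each code into a character class. The following is a minimal sketch of that idea, not a method described in the disclosure; the function names and the demo sequence are hypothetical, and the DNA alphabet (T rather than U) is assumed because the listed sequences are written with T.

```python
import re

# Standard IUPAC nucleotide codes (DNA alphabet) as regex character classes.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate a degenerate IUPAC pattern into a regular expression string."""
    return "".join(IUPAC_CLASSES[code] for code in pattern.upper())


def find_motif(pattern: str, sequence: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    lookahead = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# Hypothetical demo sequence, not taken from the sequence listing.
demo = "CCGTAAGTCCGTAAGACC"
print(find_motif("GTAAGW", demo))  # prints [2, 10]
```

Here the GTAAGW motif matches both GTAAGT and GTAAGA because W stands for A or T; the same helper could be pointed at the longer patterns above by passing them as `pattern`.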
  • the branaplam-responsive sequence comprises ATTTAACATTTTTGAGTCAATCCAAGTAATGCAGGAGGTTCATGATTGTGTAGA (SEQ ID NO: 2187).
  • the ligand-responsive sequence is a tetracycline-responsive sequence.
  • the tetracycline-responsive sequence is located in a tetracycline-responsive aptamer comprising the sequence TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2188), wherein W is A or T, wherein D is A, G, or T, wherein M is A or C, wherein K is G or T, and wherein H is A, C, or T.
  • the polynucleotide comprises, from 5’ to 3’, an upstream 3’ splice site, a first stem region, a 5’ splice site reverse complementary sequence, the tetracycline-responsive sequence, a 5’ splice site, a sequence comprising GT, a second stem region, and a downstream 3’ splice site.
  • the upstream 3’ splice site is at least 20 nucleotides long and the two nucleotides at the 3’ end are AG.
  • the downstream 3’ splice site is at least 20 nucleotides long.
  • the first stem region and the second stem region are at least 2 nucleotides long.
  • the 5’ splice site reverse complementary sequence and the 5’ splice site are at least 7 nucleotides long.
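
The 5’ to 3’ element order and the minimum lengths listed above lend themselves to a simple structural check. The sketch below is illustrative only: the class name, field names, and placeholder sequences are hypothetical, and the optional `strict_revcomp` test (exact reverse complementarity between the two 5’ splice site elements) is an assumption about how the "reverse complementary sequence" relates to the 5’ splice site, not something the text specifies.

```python
from dataclasses import dataclass, astuple
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass
class TetSwitchLayout:
    # Fields are declared in 5' to 3' order, as listed in the text.
    upstream_3ss: str
    stem1: str
    five_ss_revcomp: str
    tet_sequence: str
    five_ss: str
    gt_sequence: str
    stem2: str
    downstream_3ss: str

    def assemble(self) -> str:
        """Concatenate the elements in 5' to 3' order."""
        return "".join(astuple(self)).upper()

    def problems(self, strict_revcomp: bool = False) -> List[str]:
        """List every stated constraint that this layout violates."""
        issues = []
        if len(self.upstream_3ss) < 20:
            issues.append("upstream 3' splice site shorter than 20 nt")
        if not self.upstream_3ss.upper().endswith("AG"):
            issues.append("upstream 3' splice site does not end in AG")
        if len(self.downstream_3ss) < 20:
            issues.append("downstream 3' splice site shorter than 20 nt")
        if len(self.stem1) < 2 or len(self.stem2) < 2:
            issues.append("stem region shorter than 2 nt")
        if len(self.five_ss) < 7 or len(self.five_ss_revcomp) < 7:
            issues.append("5' splice site element shorter than 7 nt")
        if "GT" not in self.gt_sequence.upper():
            issues.append("GT-containing sequence lacks GT")
        # Assumption: exact reverse complementarity, checked only on request.
        if strict_revcomp and self.five_ss_revcomp.upper() != reverse_complement(self.five_ss):
            issues.append("5' splice site elements are not exact reverse complements")
        return issues


# Placeholder sequences for illustration; none come from the sequence listing.
layout = TetSwitchLayout(
    upstream_3ss="T" * 18 + "AG",
    stem1="GC",
    five_ss_revcomp=reverse_complement("CAGGTAAGT"),
    tet_sequence="ACGT" * 5,
    five_ss="CAGGTAAGT",
    gt_sequence="GT",
    stem2="GC",
    downstream_3ss="T" * 18 + "AG",
)
print(layout.problems(strict_revcomp=True))  # prints []
```

Returning a list of violations rather than raising on the first one makes it easier to screen many candidate designs and report everything wrong with each.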
  • polynucleotides of the present disclosure are transgenes.
  • the present disclosure relates to a polynucleotide comprising a transgene, wherein the transgene comprises: at least one alternative exon, at least two introns flanking the alternative exon, and a ligand-responsive aptamer, wherein the presence of the ligand results in splicing out the alternative exon, the at least two introns, and the ligand-responsive aptamer from the transgene.
  • the at least one alternative exon and the at least two introns are from the same gene. In some embodiments, the alternative exon and the at least two introns are from different genes.
  • the transgene further comprises two exons flanking the alternative exon, the at least two introns, and the ligand-responsive aptamer comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the transgene further comprises two exons flanking the alternative exon, the at least two introns, and the ligand-responsive aptamer comprising a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the alternative exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
  • the alternative exon comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
  • at least one of the introns comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the introns comprise a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the exons comprise a polynucleotide having a nucleic acid sequence from a microRNA (miRNA) gene, optionally wherein the miRNA gene is a miRNA-16-2 gene.
  • the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2281.
  • the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2281.
  • the ligand-responsive aptamer comprises a polynucleotide comprising a nucleic acid sequence that is 20-60 nucleotides in length.
  • the ligand-responsive aptamer comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2086, 2095, 2112, or 2187-2189.
  • the ligand-responsive aptamer comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2086, 2095, 2112, or 2187-2189.
  • the ligand-responsive aptamer binds to tetracycline.
  • the ligand-responsive aptamer is located in the intron downstream of the alternative exon.
  • the ligand-responsive aptamer is located in the intron upstream of the alternative exon.
  • the ligand-responsive aptamer is located in the alternative exon.
  • the transgene comprises a 3' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239 and a 5' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of Tables 7, 25, 26, or 34.
  • the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
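
The recurring "at least 70% ... at least 99% sequence identity" language can be made concrete with a small calculation. The text does not say how identity is computed (alignment method, gap handling), so the sketch below makes a simplifying assumption: the two sequences are already aligned, of equal length, and compared position by position without gaps. The function names and example sequences are hypothetical.

```python
# Identity thresholds recited in the text, in percent.
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped, position-wise identity between two equal-length sequences."""
    if len(candidate) != len(reference):
        raise ValueError("this simplified measure requires equal-length sequences")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def thresholds_met(identity: float) -> list:
    """Return every recited threshold that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical 10-nt sequences differing at one position.
identity = percent_identity("ACGTACGTAA", "ACGTACGTAC")
print(identity, thresholds_met(identity))  # prints 90.0 [70, 75, 80, 90]
```

For sequences of different lengths, or where insertions and deletions matter, a proper pairwise alignment would be needed before counting matches.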
  • a vector comprises the transgene.
  • the vector is a plasmid.
  • a cell comprises the vector.
  • the cell is a mammalian cell.
  • the cell is a human cell or cell from a human subject.
  • a recombinant viral genome comprises the transgene.
  • the recombinant viral genome is a genome from a recombinant adeno-associated virus (rAAV).
  • the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
  • the AAV ITR sequences are AAV2 ITR sequences.
  • the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, or 2138.
  • the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • an rAAV particle comprises the recombinant viral genome.
  • the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6 (Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
  • the rAAV particle further comprises at least one helper plasmid.
  • the helper plasmid comprises a rep gene and a cap gene.
  • the rep gene encodes Rep78, Rep68, Rep52, or Rep40.
  • the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
  • the rAAV particle comprises two helper plasmids.
  • the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
  • the present disclosure relates to a method of treating a disease or condition in a subject comprising administering the recombinant viral genome or the rAAV particle. In some embodiments, the subject is a mammal.
  • the mammal is a human.
  • the recombinant viral genome or rAAV particle is administered to the subject at least one time.
  • the viral genome or rAAV particle is administered to the subject 2, 3, 4, 5, 6, 7, 8, 9, or 10 times.
  • the viral genome or rAAV particle is administered to the subject parenterally, subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs.
  • the present disclosure relates to a method of regulating the expression of a transgene in a subject comprising administering to a subject a polynucleotide comprising the transgene comprising at least one alternative exon, at least two introns flanking the alternative exon, and a ligand-responsive aptamer, and a ligand, wherein the presence of the ligand results in splicing out the alternative exon, the at least two introns, and the ligand-responsive aptamer from the transgene.
  • the transgene further comprises two exons flanking the alternative exon, the at least two introns, and the ligand-responsive aptamer comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the transgene further comprises two exons flanking the alternative exon, the at least two introns, and the ligand-responsive aptamer comprising a polynucleotide having the nucleic acid sequence set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the transgene comprises a polynucleotide having a nucleic acid sequence from a microRNA (miRNA) gene, optionally wherein the miRNA gene is a miRNA-16-2 gene.
  • the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2281.
  • the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2281.
  • the at least one alternative exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, or 2137.
  • the at least one alternative exon comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, or 2137.
  • at least one of the introns comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the introns comprise a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the ligand-responsive aptamer comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2086, 2095, 2112, or 2187-2189.
  • the ligand-responsive aptamer comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189.
  • the transgene comprises a 3' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239 and a 5' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of Tables 7, 25, 26, or 34.
  • the ligand is tetracycline.
  • the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the transgene comprises a polynucleotide having a nucleic acid sequence set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the transgene is provided in a recombinant viral genome.
  • the recombinant viral genome is a genome from a recombinant adeno-associated virus (rAAV).
  • the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
  • the AAV ITR sequences are AAV2 ITR sequences.
  • the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the recombinant viral genome is provided in an rAAV particle.
  • the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6 (Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
  • the rAAV particle further comprises at least one helper plasmid.
  • the helper plasmid comprises a rep gene and a cap gene.
  • the rep gene encodes Rep78, Rep68, Rep52, or Rep40.
  • the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
  • the rAAV particle comprises two helper plasmids.
  • the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
  • administering results in a fold increase in the RNA level of the exclusion isoform of about 300-400-fold.
  • administering results in a fold increase in the protein level of the exclusion isoform of about 5-25-fold.
  • splicing out the alternative exon, the at least two introns, and the aptamer from the transgene results in the production of a functional start codon in the transgene.
  • splicing out the alternative exon, the at least two introns, and the aptamer results in the removal of a pre-mature stop codon from the transgene.
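
The effect of skipping an alternative exon that carries a premature stop codon can be illustrated with a toy model. This is a deliberately simplified sketch under stated assumptions: splicing is treated as all-or-nothing (real switches shift the ratio of isoforms), introns are omitted because they are removed in both outcomes, and the exon sequences and function names are hypothetical rather than taken from the sequence listing.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(upstream_exon: str, alternative_exon: str, downstream_exon: str,
           ligand_present: bool) -> str:
    """Toy splicing outcome: the ligand causes the alternative exon to be skipped."""
    if ligand_present:
        return upstream_exon + downstream_exon
    return upstream_exon + alternative_exon + downstream_exon


def codons_before_stop(mrna: str) -> Optional[int]:
    """Count sense codons from the first ATG up to (not including) the first in-frame stop."""
    mrna = mrna.upper()
    start = mrna.find("ATG")
    if start == -1:
        return None  # no start codon, nothing to translate
    count = 0
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count


# Hypothetical exons; the alternative exon carries an in-frame TAA.
exon1, alt_exon, exon2 = "ATGGCC", "GCTTAAGCT", "GGCAAATGA"

without_ligand = splice(exon1, alt_exon, exon2, ligand_present=False)
with_ligand = splice(exon1, alt_exon, exon2, ligand_present=True)
print(codons_before_stop(without_ligand))  # prints 3 (truncated at the premature stop)
print(codons_before_stop(with_ligand))     # prints 4 (reads through to the normal stop)
```

In this model the included isoform terminates early inside the alternative exon, while the skipped isoform is read through to the stop codon at the end of the downstream exon.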
  • the present disclosure further relates to the following embodiments.
  • aspects relate to a recombinant viral genome capable of delivering (e.g., expressing) a transgene or coding region thereof in a subject, wherein said recombinant viral genome comprises at least one alternatively-spliced exon and a coding region of the transgene.
  • the alternatively-spliced exon undergoes differential splicing in a condition-responsive manner to result in different spliced transcripts (e.g., mRNA isoforms), whereby the alternatively-spliced exon has been either retained (“spliced in”) or not retained (“spliced-out”) in the resulting spliced transcripts.
  • the alternatively-spliced exon may be spliced-out of the resulting transcript; however, in a cancer cell, the alternatively-spliced exon may be spliced-in the resulting transcript.
  • the alternatively-spliced exon regulates the expression of the coding region of interest by virtue of being either present (spliced-in) or not present (spliced-out) in the resulting mRNA transcript isoform.
  • the alternatively-spliced exon may be provided in the form of a transgene comprising the alternatively-spliced exon, one or more introns (or portion(s) thereof), and one or more additional exons (e.g., constitutive exons).
  • transgenes comprising an alternatively-spliced exon may be referred to herein as comprising an “alternatively-spliced exon cassette.”
  • the configuration of the alternatively-spliced exon cassettes and transgenes is not limited in any way, and examples of such configurations are provided in the Figures.
  • the transgene comprises an alternatively-spliced exon, one or more introns (or portion(s) thereof) and one or more exons.
  • the one or more exons can be constitutive exons (i.e., those that are retained in all mRNA isoforms resulting from splicing).
  • the transgene or the alternatively-spliced exon cassette comprises one intron (or portion thereof).
  • the intron (or portion thereof) is located 3’ or 5’ to an alternatively-spliced exon.
  • the transgene or the alternatively-spliced exon cassette comprises two introns (or portion(s) thereof) (e.g., whereby the one or more introns are flanking introns, i.e., introns that are immediately upstream or downstream of the alternatively-spliced exon).
  • an alternative exon cassette comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778. In some embodiments, an alternative exon cassette comprises a polynucleotide having a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
  • the alternatively-spliced exon comprises at least one modification, relative to a naturally occurring alternatively-spliced exon. In some embodiments, the alternatively-spliced exon comprises at its 3’ end a heterologous start codon or part of a heterologous start codon. In some embodiments, all native start codons located 5’ to the heterologous start codon are disrupted or deleted.
  • the alternatively-spliced exon is located 5’ to the coding region of the transgene.
  • the alternatively-spliced exon cassette comprises two alternatively-spliced exons, each with flanking introns.
  • the two alternatively-spliced exons are adjacent.
  • the constitutive exon is located 5’ to the two alternatively-spliced exons.
  • each alternatively-spliced exon comprises at its 3’ end a heterologous start codon or part of a heterologous start codon. In some embodiments, all native start codons located 5’ to the heterologous start codon of the 5’-most alternatively-spliced exon are disrupted or deleted.
  • only one of the two alternatively-spliced exons is retained in the spliced transcript.
  • the 5’-most alternatively-spliced exon is retained in the spliced transcript.
  • the 3’-most alternatively-spliced exon is retained in the spliced transcript.
  • the alternatively-spliced exon(s) and flanking intron(s) are located within the coding region of the transgene.
  • the alternatively-spliced exon comprises a heterologous, in-frame stop codon.
  • the heterologous, in-frame stop codon is at least 50 nucleotides upstream of the next 5’ splice junction.
  • the heterologous stop codon elicits nonsense-mediated decay.
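The spacing condition above (an in-frame stop codon at least 50 nucleotides upstream of the next 5’ splice junction) can be checked on a candidate exon sequence. The following is a minimal sketch only, not part of the disclosure: the function names, the `frame` parameter, and the choice to measure distance from the last nucleotide of the stop codon to the 3’ end of the exon are all assumptions made for illustration.

```python
# Hypothetical helper names; the disclosure does not define a computational check.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(exon: str, frame: int = 0):
    """Return the 0-based index of the first in-frame stop codon, or None.

    `frame` is the offset of the first complete codon within the exon
    (an assumption; it depends on the upstream reading frame).
    """
    exon = exon.upper()
    for i in range(frame, len(exon) - 2, 3):
        if exon[i:i + 3] in STOP_CODONS:
            return i
    return None


def stop_to_junction_distance(exon: str, frame: int = 0):
    """Nucleotides between the end of the stop codon and the exon's 3' end.

    The exon's 3' end is treated here as the next 5' splice junction.
    """
    pos = first_in_frame_stop(exon, frame)
    if pos is None:
        return None
    return len(exon) - (pos + 3)


def meets_min_distance(exon: str, frame: int = 0, min_nt: int = 50) -> bool:
    """True if an in-frame stop codon lies at least `min_nt` upstream of the junction."""
    distance = stop_to_junction_distance(exon, frame)
    return distance is not None and distance >= min_nt
```

For example, `meets_min_distance("ATGGCCTAA" + "G" * 60)` evaluates the stop codon at index 6 and finds 60 nucleotides between it and the exon end. The threshold can be varied through `min_nt` to explore other spacings.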
  • the alternatively-spliced exon is spliced-in or retained in the presence of one or more conditions (i.e., in a condition-responsive manner) to result in an mRNA isoform comprising the alternatively-spliced exon and a coding region of interest.
  • the one or more conditions comprise the conditions that define one cell type from another.
  • the one or more conditions comprise the intracellular conditions that define a healthy cell state from a diseased cell state.
  • the one or more conditions comprise the presence or absence of activated T cells and/or the presence or absence of a state of inflammation.
  • the one or more conditions comprise one or more signs or symptoms of a disease state, and/or the presence or absence of one or more disease markers. In still other embodiments, the one or more conditions comprise the expression level and/or activity of the endogenous protein that corresponds to the protein encoded by the coding region of interest in the alternatively-spliced exon cassette of the recombinant virus genome.
  • the alternatively-spliced exon may be spliced-in, and the coding region of interest may be upregulated (e.g., if the alternatively-spliced exon comprises a positive regulatory sequence).
  • the alternatively-spliced exon may be spliced-in, and the coding region of interest may be downregulated (e.g., if the alternatively-spliced exon comprises a negative regulatory sequence).
  • the alternatively-spliced exon may be spliced-out, and the coding region of interest may be upregulated (e.g., if the alternatively-spliced exon comprises a negative regulatory sequence that is removed by the splicing-out of the exon).
  • the alternatively-spliced exon may be spliced-out, and the coding region of interest may be downregulated (e.g., if the alternatively-spliced exon comprises a positive regulatory sequence that is removed by the splicing-out of the exon).
  • the one or more conditions may result in the splicing-in or splicing-out of the alternatively-spliced exon.
  • the one or more conditions may cause the alternatively-spliced exon to be spliced-in, and the coding region of interest may be upregulated (e.g., if the alternatively-spliced exon comprises a positive regulatory sequence).
  • the one or more conditions may cause the alternatively-spliced exon to be spliced-in, and the coding region of interest may be downregulated (e.g., if the alternatively-spliced exon comprises a negative regulatory sequence).
  • the one or more conditions may cause the alternatively-spliced exon to be spliced-out, and the coding region of interest may be upregulated (e.g., if the alternatively-spliced exon comprises a negative regulatory sequence that is removed by the splicing-out of the exon).
  • the one or more conditions may cause the alternatively-spliced exon to be spliced-out, and the coding region of interest may be downregulated (e.g., if the alternatively-spliced exon comprises a positive regulatory sequence that is removed by the splicing-out of the exon).
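The four combinations described above (exon spliced-in or spliced-out, carrying a positive or a negative regulatory sequence) reduce to a small truth table. The sketch below only restates that logic; the names `Element` and `expression_effect` are hypothetical and do not appear in the disclosure.

```python
from enum import Enum


class Element(Enum):
    """Kind of regulatory sequence carried by the alternatively-spliced exon."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


def expression_effect(spliced_in: bool, element: Element) -> str:
    """Return the expected direction of regulation for the coding region of interest.

    A regulatory sequence acts only while its exon is retained, so splicing the
    exon out removes the effect of that sequence.
    """
    if spliced_in:
        return "upregulated" if element is Element.POSITIVE else "downregulated"
    return "upregulated" if element is Element.NEGATIVE else "downregulated"
```

Calling `expression_effect(False, Element.NEGATIVE)` corresponds to the case in which a negative regulatory sequence is removed by splicing-out of the exon.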
  • the alternatively-spliced exon comprises an alternatively-spliced exon from a gene selected from the group consisting of: ABCC1, AK 125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13
  • the alternatively-spliced exon comprises an alternatively-spliced exon from or derived from an alternatively-spliced exon of a gene selected from the group consisting of CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
  • the alternatively-spliced exon is or is derived from an alternatively-spliced exon of CAMK2B.
  • the alternatively-spliced exon is or is derived from an alternatively-spliced exon of PKP2.
  • the alternatively-spliced exon is or is derived from an alternatively-spliced exon of LGMN. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of NRAP. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of VPS39. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of KSR1. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of PDLIM3.
  • the alternatively-spliced exon is or is derived from an alternatively-spliced exon of BIN1. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of ARFGAP2. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of KIF13A. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of PICALM.
  • the alternatively-spliced exon is or is derived from exon 11 of BIN1.
  • the alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 37.
  • the alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 37.
  • the alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 38.
  • the alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 38.
  • a component (e.g., an alternative exon; an intronic sequence) which is “derived from” a gene (e.g., BIN1, SMN1) may be derived from the gene in that the component is taken from its wild-type or natural context and put into a non-natural context (e.g., inserted into the nucleic acid sequence of a transgene), and may also be derived from the gene in that the nucleic acid sequence of the component is modified, relative to the wild-type or natural nucleic acid sequence of said component. Modifications to the various components (e.g., introns, exons, etc.) are described elsewhere herein.
  • the alternatively-spliced exon comprises an alternatively-spliced exon comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 23-44.
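The sequence identity thresholds recited above (at least 70% through at least 99%) can be illustrated with a simple calculation. This passage does not specify how identity is computed, so the sketch below makes its own assumptions: the two sequences are already aligned to equal length with `-` as the gap character, and identity is the number of matching columns divided by the number of alignment columns. The function names are hypothetical.

```python
# Thresholds recited in the text above.
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: both inputs have equal length and use '-' for gaps.
    Columns that are gaps in both sequences are ignored; a gap in one
    sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list:
    """List the recited thresholds that a given percent identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]
```

Other identity conventions exist (for example, dividing by the length of the shorter sequence), and a different convention would give different values for the same pair of sequences.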
  • the flanking intron(s) is a native flanking intron(s) (or portion(s) thereof) of the alternatively-spliced exon(s).
  • the flanking intron(s) (or portion(s) thereof) comprises at its 5’ end a 5’ splice donor site.
  • the flanking intron(s) (or portion(s) thereof) comprises at its 3’ end a 3’ splice acceptor site.
  • the flanking intron(s) (or portion(s) thereof) comprises no modifications, relative to a naturally occurring intron (or portion thereof).
  • the flanking intron(s) (or portion(s) thereof) comprises at least one modification, relative to a naturally occurring intron (or portion thereof).
  • the modification is a substitution or deletion of one or more nucleotides.
  • the flanking intron(s) (or portion(s) thereof) is a regulated intron (or portion thereof).
  • the flanking intron(s) is or is derived from an intron of a gene selected from the group consisting of ABCC1, AK 125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7
  • MAP MAP, SMN1, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, ZNF496, CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
  • the flanking intron(s) is or is derived from an intron of SMN1. In some embodiments, the flanking intron(s) which is or is derived from an intron of SMN1 flanks a constitutive exon. In some embodiments, the flanking intron(s) is or is derived from intron 6 and/or intron 7 of SMN1.
  • the flanking intron which is derived from SMN1 intron 6 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 intron 6.
  • the flanking intron which is derived from SMN1 intron 6 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 103.
  • the flanking intron which is derived from SMN1 intron 6 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 103.
  • the flanking intron which is derived from SMN1 intron 7 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 intron 7.
  • the flanking intron which is derived from SMN1 intron 7 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 104. In some embodiments, the flanking intron which is derived from SMN1 intron 7 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 104.
  • the flanking intron(s) is or is derived from an intron of BIN1. In some embodiments, the flanking intron(s) which is or is derived from an intron of BIN1 flanks an alternative exon. In some embodiments, the flanking intron(s) is or is derived from intron 10 and/or intron 11 of BIN1.
  • the flanking intron(s) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 15.
  • the flanking intron(s) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 15.
  • the flanking intron(s) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 16.
  • the flanking intron(s) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 16.
  • the flanking intron(s) comprises an intron comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 1-22, 103, and 104.
  • the constitutive exon is an exon which is natively associated with the coding region of the transgene. In some embodiments, the constitutive exon is not an exon which is natively associated with the coding region of the transgene. In some embodiments, the constitutive exon is or is derived from the same gene as the alternatively-spliced exon(s). In some embodiments, the gene is the gene from which the coding region of the transgene is also derived. In some embodiments, the constitutive exon is not from or derived from the same gene as the alternatively-spliced exon(s).
  • the coding region of the transgene is or is derived from a coding region of a gene selected from the group consisting of MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, a microdystrophin-encoding gene, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, TPM2, TNNT2, CFL2, KBTBD13, KLHL40, KLHL41, L.
  • POLG1, GAA, AGL, PYGM, SLC22A5, OCTN2, ETF, ETFH, PNPLA2, a cytochrome b oxidase-encoding gene, a cytochrome c oxidase-encoding gene, CLCN1, SCN4A, DMPK, CNBP, MYOT, LMNA, CAV3, DNAJB6, DES, TNPO3, HNRPDL, CAPN3, DYSF, an alpha-sarcoglycan-encoding gene, a beta-sarcoglycan-encoding gene, a gamma-sarcoglycan-encoding gene, a delta-sarcoglycan-encoding gene, TCAP, TRIM32, FKRP, FXN, POMT1, FKTN, POMT2, POMGnT1, DAG1, ANO5, PLEC1, TRAPPC11, GMPPB, ISPD, LIMS2, POPDC1, TOR1AIP1, POGLUT
  • the coding region of the transgene is or is derived from MTM1, CAPN3, or FXN. In some embodiments, the coding region of the transgene is or is derived from FXN. In some embodiments, the coding region of the transgene is or is derived from MTM1. In some embodiments, the coding region of the transgene which is or is derived from MTM1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1881. In some embodiments, the coding region of the transgene which is or is derived from MTM1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1881.
  • the coding region of the transgene is or is derived from CAPN3.
  • the coding region of the transgene which is or is derived from CAPN3 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1882.
  • the coding region of the transgene which is or is derived from CAPN3 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1882.
  • a recombinant viral genome of the present disclosure further comprises a promoter.
  • the promoter is a native promoter of the coding region of the transgene. In some embodiments, the promoter is not a native promoter of the coding region of the transgene. In some embodiments, the promoter is constitutive. In some embodiments, the promoter is inducible. In some embodiments, the promoter is a cell-specific promoter. In some embodiments, the promoter is a tissue-specific promoter.
  • the promoter is selected from the group consisting of an EF1 alpha promoter, beta actin promoter, CMV, muscle creatine kinase promoter, C5-12 muscle promoter, MHCK7, CBh, synapsin, MECP2, enolase, GFAP, Desmin, and CAG promoter.
  • the promoter is an MHCK7 promoter.
  • an MHCK7 promoter comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1880.
  • an MHCK7 promoter comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1880.
  • the promoter drives expression of the transgene (e.g., expression of the product encoded by the coding region of interest).
  • the promoter is a ubiquitous promoter.
  • a ubiquitous promoter is a promoter selected from the group consisting of: an EF1 alpha promoter, a beta actin promoter, CMV, CBh, and CAG promoter.
  • the promoter is a tissue-specific promoter, such as a muscle- or heart-biased promoter.
  • a tissue-specific promoter such as a muscle- or heart-biased promoter, is a promoter selected from the group consisting of: a muscle creatine kinase promoter, a C5-12 muscle promoter, MHCK7, and Desmin.
  • the promoter is a neuronal-biased promoter.
  • a neuronal-biased promoter is a promoter selected from the group consisting of: synapsin and MECP2.
  • the promoter is an astrocyte-biased promoter.
  • an astrocyte-biased promoter is a GFAP promoter.
  • the coding region of the transgene comprises at least one modification, relative to a coding region of a naturally occurring gene.
  • the modification is an addition, substitution or deletion of at least one nucleotide.
  • the coding region of the transgene comprises a deletion of a native start codon, or a portion thereof.
  • the coding region of the transgene comprises an addition of a non-native stop codon, or a portion thereof.
  • the transgene comprises one or more recombinant introns (e.g., a 3’ UTR intron).
  • the one or more recombinant introns (e.g., a 3’ UTR intron)
  • nonsense mediated decay (NMD)
  • the naturally occurring gene is a gene selected from the group consisting of MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, a microdystrophin-encoding gene, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, TPM2, TNNT2, CFL2, KBTBD13, KLHL40..
  • the naturally occurring gene is MTM1, CAPN3, or FXN. In some embodiments, the naturally occurring gene is MTM1. In some embodiments, the naturally occurring gene is CAPN3. In some embodiments, the naturally occurring gene is FXN.
  • the coding region of the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882.
  • the coding region of the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882.
  • the recombinant viral genome is a recombinant genome from an adeno-associated virus (rAAV), lentivirus, retrovirus, or foamyvirus.
  • the recombinant viral genome is from an AAV.
  • the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
  • the ITR sequences comprise AAV1, AAV2, AAV5, AAV7, AAV8, or AAV9 ITR sequences.
  • the recombinant viral genome is from a lentivirus.
  • the alternatively-spliced exon cassette is located on the minus strand of the lentivirus genome.
  • a recombinant viral genome of the present disclosure further comprises a 3’ untranslated region (UTR) that is endogenous or exogenous to the transgene.
  • the exogenous 3’ UTR is the 3’ UTR from bovine growth hormone, SV40, EBV, or Myc.
  • the exogenous 3’ UTR is SV40.
  • the SV40 3’ UTR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1883.
  • the SV40 3’ UTR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1883.
  • the exogenous 3’ UTR comprises a polyadenylation (pA) signal.
  • the pA signal is an SV40 pA signal.
  • the viral particle comprising a viral genome according to any embodiment of the present disclosure.
  • the viral particle is an rAAV particle.
  • the rAAV particle comprises an AAV serotype selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
  • the rAAV particle comprises AAV serotype 9.
  • the rAAV particle comprises an AAV derivative or pseudotype selected from the group consisting of an AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y→F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
  • the viral particle further comprises at least one helper plasmid.
  • the helper plasmid comprises a rep gene and a cap gene.
  • the rep gene encodes Rep78, Rep68, Rep52, or Rep40.
  • the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
  • the viral particle comprises two helper plasmids.
  • the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
  • the viral particle is a recombinant lentivirus particle.
  • the lentivirus is a human immunodeficiency virus (HIV1 or HIV2), a feline immunodeficiency virus (FIV), a bovine immunodeficiency virus (BIV), a caprine arthritis encephalitis virus, an equine infectious anemia virus, a jembrana disease virus, a puma lentivirus, a simian immunodeficiency virus, or a visna-maedi virus.
  • the viral particle further comprises a viral envelope.
  • aspects of the invention relate to a method of treating a disease or condition in a subject comprising administering a recombinant viral genome or a viral particle according to any embodiment of the present disclosure to the subject.
  • the subject is a mammal.
  • the mammal is a human.
  • the recombinant viral genome or viral particle is administered to the subject at least one time.
  • the recombinant viral genome or viral particle is administered to the subject 2, 3, 4, 5, 6, 7, 8, 9, or 10 times.
  • the recombinant viral genome or viral particle is administered to the subject parenterally, subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs.
  • the recombinant viral genome or viral particle is administered to the subject by intravenous injection, intramuscular injection, intrathecal injection, or intravitreal injection.
  • the disease or condition is a disease or condition selected from the group consisting of Dentatorubral-pallidoluysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich’s ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type 1, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebellar ataxia type 6 (SCA6), spinocerebellar ataxia type 7 (SCA
  • aspects of the invention relate to a method of regulating transgene expression (e.g., comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) using a viral vector comprising a recombinant viral genome as described herein, wherein the transgene, or coding region of the transgene, is under the regulatory control of an alternatively-spliced exon.
  • the method comprises inserting into the recombinant viral genome at least one alternatively-spliced exon and at least one coding region of interest (e.g., which encodes a therapeutic protein), wherein the expression of the at least one coding region of interest is regulated by the alternatively-spliced exon.
  • the regulation of the coding region of interest depends on (a) the presence or absence of positive or negative regulatory control sequences in the alternatively-spliced exon, and (b) whether the alternatively-spliced exon is spliced-in (i.e., retained) or spliced-out (i.e., removed) from the final mRNA transcript isoform.
  • the recombinant viral genome may be configured with one or more additional introns, exons, and/or regulatory sequences (e.g., promoters, enhancers, and the like that control transcription from the recombinant viral genome).
  • the alternatively-spliced exon may be comprised on a cassette (which may be referred to as an alternatively-spliced exon cassette), comprising the alternatively-spliced exon(s) and one or more introns, which may be inserted into the recombinant viral genome in a manner that couples it to the coding region of interest, such that the expression of the coding region of interest comes under regulatory control of the alternatively-spliced exon of the cassette.
  • the transgene comprises an alternatively-spliced exon, optionally one or more introns (or portion(s) thereof), optionally one or more constitutive exons, and a coding region of interest.
  • aspects of the invention relate to a method of regulating transgene (e.g., comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) expression using a viral vector comprising a recombinant viral genome as described herein.
  • the method comprises: (a) inserting into the recombinant viral genome at least one transgene, wherein the transgene comprises a constitutive exon, at least one alternatively-spliced exon, at least one flanking intron (or portion thereof), and a coding region of a transgene; (b) introducing a heterologous start codon or part of a heterologous start codon at the 3’ end of the alternatively-spliced exon; (c) disrupting or deleting all native start codons located 5’ to the heterologous start codon; and (d) deleting or disrupting one or more native start codons, or a portion(s) thereof, from the coding region of the transgene.
  • the method comprises: (a) inserting into the recombinant viral genome at least one transgene, wherein the transgene comprises a constitutive exon, at least one alternatively-spliced exon, at least one flanking intron (or portion thereof), and a coding region of a transgene; (b) introducing a heterologous start codon or part of a heterologous start codon at the 3’ end of the alternatively-spliced exon; (c) disrupting or deleting all native start codons located 5’ to the heterologous start codon; and (d) adding a heterologous 3’ UTR, or a portion thereof, to the coding region of the transgene.
  • translation of the heterologous 3’ UTR elicits nonsense mediated decay.
  • the constitutive exon, alternatively-spliced exon, and flanking intron (or portion thereof) are each located 5’ to the coding region of the transgene.
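Steps (b) through (d) of the start-codon method above can be sketched on toy sequences. This is an illustration under stated assumptions, not the disclosed procedure: the function names are hypothetical, native ATG triplets are disrupted by a G-to-C point substitution (the text says only "disrupted or deleted"), a complete ATG rather than part of one is appended, and ATG triplets that might form across exon junctions after splicing are not checked.

```python
def disrupt_atgs(seq: str) -> str:
    """Disrupt every ATG triplet (in any frame) by changing it to ATC.

    The G-to-C substitution is an illustrative choice only.
    """
    return seq.upper().replace("ATG", "ATC")


def engineer_cassette(constitutive_exon: str, alt_exon: str, coding_region: str):
    """Return (constitutive exon, alternative exon, coding region) after editing.

    - native ATGs 5' of the heterologous start codon are disrupted (step c)
    - a heterologous ATG is appended to the 3' end of the alternative exon (step b)
    - a native ATG at the 5' end of the coding region is removed (step d)
    """
    const = disrupt_atgs(constitutive_exon)
    alt = disrupt_atgs(alt_exon) + "ATG"
    cds = coding_region.upper()
    if cds.startswith("ATG"):
        cds = cds[3:]
    return const, alt, cds


def spliced_in_mrna(const: str, alt: str, cds: str) -> str:
    """Exonic sequence of the isoform that retains the alternative exon."""
    return const + alt + cds


def spliced_out_mrna(const: str, cds: str) -> str:
    """Exonic sequence of the isoform that skips the alternative exon."""
    return const + cds
```

With the toy inputs `"GGATGCC"`, `"CATGAA"`, and `"ATGGCTTAA"`, the spliced-in isoform carries its first ATG at the junction between the alternative exon and the coding region, while the spliced-out isoform lacks that start codon. Real coding regions usually contain internal ATG codons, which this sketch does not address.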
  • aspects of the invention relate to a method of regulating transgene (e.g., comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) expression using a viral vector comprising a recombinant viral genome as described herein.
  • the method comprises: (a) inserting into the recombinant viral genome at least one transgene, wherein the transgene comprises an alternatively-spliced exon and at least one flanking intron (or portion thereof) within the coding region of the transgene; and (b) introducing into the alternatively-spliced exon a heterologous, in-frame stop codon upstream of the next 5’ splice junction.
  • the heterologous, in-frame stop codon elicits nonsense-mediated decay.
  • the in-frame stop codon is inserted at least 100 nucleotides, at least 95 nucleotides, at least 90 nucleotides, at least 85 nucleotides, at least 80 nucleotides, at least 75 nucleotides, at least 70 nucleotides, at least 65 nucleotides, at least 60 nucleotides, at least 55 nucleotides, at least 50 nucleotides, at least 45 nucleotides, at least 40 nucleotides, at least 35 nucleotides, at least 30 nucleotides, at least 25 nucleotides, at least 20 nucleotides, at least 15 nucleotides, at least 10 nucleotides, or at least 5 nucleotides, or between 1 to 5 nucleotides upstream of the next 5’ splice junction.
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ direction
  • transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; and (iii) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon.
  • all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation, (ii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iii) a nucleotide sequence comprising a second portion of a coding region of the transgene having a 5’ to 3’ orientation; (iv) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site, and (v) a nucleotide sequence comprising a second exonic sequence
  • nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; and (iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon.
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon;
  • nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (iv) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon.
  • all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (iv) a nucleotide sequence comprising a second portion of a coding region of the transgene having a 5’ to 3’ orientation.
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; (iii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (iv) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises a constitutive exon.
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; and (iv) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iv) a nucleotide sequence comprising a second portion of a coding region of the transgene having a 5’ to 3’ orientation; (v) a nucleotide sequence comprising a second intro
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; and (iv) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises a constitutive exon.
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a first alternatively-spliced exon comprising a positive or negative cis-acting element; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (v) a nucleotide
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at
  • aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a
  • aspects of the disclosure relate to a transgene comprising: (i) a constitutive exon and one or more intronic sequences, each from a first gene; (ii) an alternatively-spliced exon cassette, and (iii) a coding region of interest from a third gene.
  • the alternatively-spliced exon cassette comprises: (a) an alternatively-spliced exon, and (b) flanking intronic sequences.
  • each of (a) and (b) are from a second gene.
  • the alternatively-spliced exon comprises an ATG start codon at its 3’ end.
  • the first and second gene are the same gene, the first and third gene are the same gene; or all of the first, second, and third genes are the same gene.
  • the first gene is survival motor neuron 1 (SMN1).
  • the constitutive exon comprises exon 6 of SMN1, or a portion thereof. In some embodiments, the constitutive exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 102. In some embodiments, the constitutive exon comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 102.
  • the one or more intronic sequences of (i) are or are derived from intron 6 and/or intron 7 of SMN1.
  • the one or more intronic sequences of (i) comprise(s) a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 103 and/or SEQ ID NO: 104.
  • the one or more intronic sequences of (i) comprise(s) a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 103 and/or SEQ ID NO: 104.
  • the second gene is a gene selected from the group consisting of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
  • the second gene is bridging integrator 1 (BIN1).
  • the alternatively-spliced exon comprises exon 11 of BIN1.
  • the alternatively-spliced exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 37 or SEQ ID NO: 38.
  • the alternatively-spliced exon comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 37 or SEQ ID NO: 38.
  • flanking intronic sequences of (ii) are or are derived from intron 10 and/or intron 11 of BIN1.
  • the flanking intronic sequences of (ii) each comprise a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 15 or SEQ ID NO: 16.
  • the flanking intronic sequences of (ii) each comprise a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 15 or SEQ ID NO: 16.
  • the alternatively-spliced exon cassette comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
  • the alternatively-spliced exon cassette comprises a polynucleotide having a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
  • the third gene is myotubularin 1 (MTM1) or calpain 3 (CAPN3).
  • the coding region of interest comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882.
  • the coding region of interest comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882.
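The percent-identity thresholds recited in the embodiments above (70% through 99%) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length (with '-' marking gaps) and that a gapped column counts as a mismatch; the text here does not specify an alignment method, so the function names and the gap convention are illustrative assumptions rather than the method of the disclosure.

```python
# Identity thresholds recited in the text, in percent.
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same nucleotide.

    Assumption: inputs are pre-aligned to equal length, '-' marks a gap,
    and any column containing a gap is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def highest_threshold_met(identity: float):
    """Return the largest recited threshold that `identity` satisfies, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None
```

For example, a candidate sequence could be compared against a reference such as SEQ ID NO: 1881 after alignment with any standard tool, and `highest_threshold_met` then reports which of the recited identity levels it reaches.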
  • the alternatively-spliced exon comprises 1-3 nucleic acid substitutions, relative to the wild-type alternatively-spliced exon, to form the ATG start codon within the alternatively-spliced exon.
  • the ATG start codon is formed in the alternatively-spliced exon by 1 nucleic acid substitution.
  • the ATG start codon is formed in the alternatively-spliced exon by 2 nucleic acid substitutions.
  • the ATG start codon is formed in the alternatively-spliced exon by 3 nucleic acid substitutions.
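As a hedged illustration of the 1, 2, or 3 substitutions described above, the following Python sketch counts how many single-nucleotide substitutions would turn the last three nucleotides of an exon into ATG. Placing the codon at the very last three positions is an assumption made for simplicity, and the function name is hypothetical.

```python
def substitutions_to_form_atg(exon: str) -> int:
    """Count substitutions needed to make the final three nucleotides of `exon` read ATG.

    Assumption: the start codon is created at the extreme 3' end of the exon.
    Returns 0 when the exon already ends in ATG.
    """
    exon = exon.upper()
    if len(exon) < 3:
        raise ValueError("exon must be at least 3 nucleotides long")
    # Compare each of the last three positions with the target codon.
    return sum(1 for have, want in zip(exon[-3:], "ATG") if have != want)
```

A wild-type exon ending in AAG would need one substitution (A to T at the middle position), whereas one ending in CCC would need all three.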
  • the alternatively-spliced exon is retained in the spliced transcript. In some embodiments, all native start codons located 5’ to the ATG start codon located within the alternatively-spliced exon are disrupted or deleted.
  • the alternatively-spliced exon cassette is located 5’, relative to the coding region of interest. In some embodiments, the constitutive exon is located 5’, relative to the alternatively-spliced exon cassette. In some embodiments, the one or more intronic sequences of (i) flank the alternatively-spliced exon cassette.
  • the alternatively-spliced exon comprises a heterologous, in-frame stop codon.
  • the heterologous, in-frame stop codon is at least 50 nucleotides upstream of the next 5’ splice junction.
  • the heterologous, in-frame stop codon elicits nonsense-mediated decay.
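The positional requirement above, a stop codon at least 50 nucleotides upstream of the next 5’ splice junction, can be expressed as a simple coordinate check. This Python sketch assumes 0-based coordinates, with the distance measured from the nucleotide after the stop codon to the first nucleotide past the junction; that measuring convention and the function name are assumptions, not taken from the text.

```python
def stop_is_far_enough_upstream(stop_start: int, junction: int, min_distance: int = 50) -> bool:
    """Check whether a stop codon lies at least `min_distance` nt upstream of a splice junction.

    stop_start: 0-based index of the first nucleotide of the stop codon.
    junction:   0-based index of the first nucleotide downstream of the junction.
    Assumption: distance is counted from the end of the stop codon to the junction.
    """
    distance = junction - (stop_start + 3)
    # A stop codon at or beyond the junction yields a non-positive distance and fails.
    return distance >= min_distance
```

With a stop codon starting at position 0, a junction at position 53 leaves exactly 50 nucleotides in between and passes, while a junction at position 52 does not.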
  • the alternatively-spliced exon is retained in the spliced transcript in distinct tissues. In some embodiments, the alternatively-spliced exon is retained in the spliced transcript in skeletal muscle. In some embodiments, the alternatively-spliced exon is not retained in the spliced transcript in heart and/or liver tissue.
  • flanking intronic sequences of (ii)(b) are or are derived from native flanking introns of the alternatively-spliced exon. In some embodiments, the flanking intronic sequences of (ii)(b) each comprise at least one modification, relative to a naturally occurring intronic sequence. In some embodiments, the modification is a substitution or deletion of one or more nucleic acids.
  • the ATG start codon is located at the 3’ end of the alternatively-spliced exon. In some embodiments, the ATG start codon is in the same reading frame as the coding region of interest. In some embodiments, the ATG start codon is within up to 5, 10, 20, or 30 nucleotides upstream of the 3’ end of the alternatively-spliced exon. In some embodiments, the ATG start codon is within up to 5, 10, 20, or 30 nucleotides upstream of the 3’ end of the alternatively-spliced exon and is in the same reading frame as the coding region of interest.
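The two conditions in this embodiment, proximity of the ATG to the 3’ end of the exon and a shared reading frame with the coding region, can be sketched as follows. The sketch assumes that the coding region begins immediately after the exon in the spliced transcript and that the offset is measured from the end of the ATG to the end of the exon; both conventions and all names are illustrative assumptions.

```python
def locate_3prime_atg(exon: str, max_offset: int = 30):
    """Describe the 3'-most ATG in `exon`, or return None if there is none.

    Assumptions: the coding region follows the exon directly after splicing,
    so the ATG is in frame when the number of nucleotides from the A of ATG
    to the exon end is a multiple of three; `offset` counts nucleotides
    between the end of the ATG and the 3' end of the exon.
    """
    exon = exon.upper()
    start = exon.rfind("ATG")
    if start == -1:
        return None
    offset = len(exon) - (start + 3)
    return {
        "start": start,
        "offset": offset,
        "within_window": offset <= max_offset,
        "in_frame": (len(exon) - start) % 3 == 0,
    }
```

Calling `locate_3prime_atg(exon, max_offset=5)` would test the tightest of the recited windows; 10, 20, or 30 can be passed for the others.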
  • the first 10 nucleotides of the flanking intronic sequence which is immediately 3’ to the alternatively-spliced exon comprise 1-5 nucleotide substitutions, relative to the wild-type flanking intronic sequence which is immediately 3’ to the wild-type alternatively-spliced exon.
  • the one or more intronic sequences of (i) each comprise at least one modification, relative to a naturally occurring intronic sequence.
  • the modification is a substitution or deletion of one or more nucleic acids.
  • the coding region of interest comprises at least one modification, relative to a naturally occurring coding region of the third gene.
  • the modification is a substitution or deletion of one or more nucleic acids.
  • the coding region of interest comprises a deletion or disruption of a native start codon.
  • the coding region of interest comprises at least one heterologous stop codon.
  • the at least one heterologous stop codon is at least 50 nucleotides upstream of the next 5’ splice junction.
  • the at least one heterologous stop codon elicits nonsense-mediated decay.
  • a transgene as described in any embodiment of the disclosure further comprises a 3’ untranslated region (UTR).
  • the 3’ UTR is SV40.
  • the SV40 3’ UTR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1883.
  • the SV40 3’ UTR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1883.
  • the 3’ UTR comprises a polyadenylation (pA) site and a cleavage site.
  • the polyadenylation site is an SV40 pA site.
  • a transgene as described in any embodiment of the disclosure further comprises a promoter, wherein the promoter is located 5’, relative to all of (i), (ii), and (iii).
  • the promoter is a tissue-specific promoter.
  • the tissue-specific promoter is an MHCK7 promoter.
  • an MHCK7 promoter comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1880.
  • an MHCK7 promoter comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1880.
  • the alternatively-spliced exon cassette comprises a nucleic acid sequence which is 450 to 650 nucleotides in length.
  • aspects of the disclosure relate to a recombinant viral genome comprising a transgene as described in any embodiment of the disclosure.
  • the recombinant viral genome is a genome from a recombinant adeno-associated virus (rAAV).
  • the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
  • the AAV ITR sequences are AAV2 ITR sequences.
  • an AAV2 ITR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1879. In some embodiments, an AAV2 ITR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1879.
  • the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106.
  • the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106.
  • the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y→F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
  • the rAAV particle further comprises at least one helper plasmid.
  • the helper plasmid comprises a rep gene and a cap gene.
  • the rep gene encodes Rep78, Rep68, Rep52, or Rep40, and/or wherein the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
  • the rAAV particle comprises two helper plasmids.
  • the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
  • the transgene comprises: (i) a constitutive exon and one or more intronic sequences; (ii) an alternative exon cassette; and (iii) a coding region of interest.
  • the alternative exon cassette comprises: (a) an alternatively-spliced exon; (b) at least a portion of the intron immediately upstream of the alternatively-spliced exon, and (c) at least a portion of the intron immediately downstream of the alternatively-spliced exon.
  • the wild-type alternatively-spliced exon does not comprise an ATG start codon at its 3’ end: (1) the 3’ end of the alternatively-spliced exon comprises 1-3 nucleic acid substitutions relative to the wild-type alternatively-spliced exon to form an ATG start codon, and (2) the first 10 nucleotides of the intron immediately downstream of the alternatively-spliced exon comprise 1-5 nucleic acid substitutions relative to the wild-type intron immediately downstream of the wild-type alternatively-spliced exon.
  • the 1-5 nucleic acid substitutions of (2) increase splice site strength.
  • any wild-type start codons within the alternatively-spliced exon located upstream of the ATG start codon at the 3’ end of the alternatively-spliced exon are disrupted or deleted.
  • the recombinant viral genome further comprises a tissue-specific promoter upstream of the alternative exon cassette.
  • the coding region of interest is or is derived from a naturally occurring coding region of MTM1 or CAPN3.
  • the tissue-specific promoter is an MHCK7 promoter.
  • the alternative exon is exon 11 of the BIN1 gene.
  • the constitutive exon is exon 6 of the SMN1 gene.
  • the alternative exon cassette promotes skeletal muscle expression of the coding region of interest and reduces cardiac muscle expression of the coding region of interest.
  • the alternative exon cassette is approximately 600 nucleotides in length.
  • aspects of the disclosure relate to a method of treating a disease or condition in a subject comprising administering a recombinant viral genome or an rAAV particle according to any embodiment of the present disclosure to the subject.
  • the subject is a mammal.
  • the mammal is a human.
  • the recombinant viral genome or rAAV particle is administered to the subject at least one time.
  • the viral genome or rAAV particle is administered to the subject 2, 3, 4, 5, 6, 7, 8, 9, or 10 times.
  • the viral genome or rAAV particle is administered to the subject parenterally, subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs.
  • the viral genome or viral particle is administered to the subject by intravenous injection, intramuscular injection, intrathecal injection, or intravitreal injection.
  • the disease or condition is a disease or condition selected from the group consisting of Dentatorubral-pallido-luysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich's ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type 1, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebellar ataxia type 6 (SCA6), spinocerebellar ataxia type 7 (SCA7)
  • Facioscapulohumeral muscular dystrophy, Congenital muscular dystrophy, Oculopharyngeal muscular dystrophy, Distal muscular dystrophy, Emery-Dreifuss muscular dystrophy, dementia, Parkinson's disease (PD), a PD-related disorder, Prion disease, a motor neuron disease (MND), Progressive bulbar palsy (PBP), Progressive muscular atrophy (PMA).
  • PLS Primary lateral sclerosis
  • SMA Spinal muscular atrophy
  • a bladder cancer, a breast cancer, a colorectal cancer, a kidney cancer, a lung cancer, a lymphoma, a melanoma, an oral cancer, an ovarian cancer, an oropharyngeal cancer, a pancreatic cancer, a prostate cancer, a thyroid cancer, a uterine cancer, Down syndrome, Prader-Willi Syndrome (PWS), Bloom Syndrome, Cockayne Syndrome Type I -216400, Cockayne Syndrome Type III, Cockayne Syndrome Type I, Hutchinson-Gilford Progeria Syndrome, Mandibuloacral Dysplasia with Type A Lipodystrophy, Progeria, Adult Onset Progeroid Syndrome, Neonatal Rothmund-Thomson Syndrome, Seip Syndrome, Werner Syndrome, Replication Focus-Forming Activity 1, myotubular myopathy, Danon Disease, and/or centronuclear myopathy.
  • FIG. 1 is a schematic illustrating the concept of a recombinant viral genome (e.g., rAAV or lentivirus) modified to include a transgene comprising a coding region of interest (e.g., encoding a therapeutic protein) under regulatory control by an alternatively-spliced exon (or an alternatively-spliced exon cassette).
  • Step (b) shows the formation of a pre-mRNA which includes the coding region of interest and the alternatively-spliced exon.
  • Step (c) shows the splicing-out or splicing-in of the alternatively-spliced exon based on one or more conditions (e.g., cell type, disease state, or other intracellular environmental signal).
  • a negative regulatory cis-element such as an mRNA degradation element
  • the removal of a negative regulatory cis-element may lead to the upregulation or increased expression of the transgene, i.e., the increased expression of the product encoded by the coding region of interest.
  • a positive regulatory cis-element such as a translation start signal
  • the maintenance of a positive regulatory cis-element will result in the upregulation or increased expression of the transgene, i.e., the increased expression of the product encoded by the coding region of the transgene.
  • a negative regulatory cis-element such as an mRNA degradation element
  • the maintenance of a negative regulatory cis-element may lead to the downregulation or decreased expression of the transgene, i.e., the decreased expression of the product encoded by the coding region of the transgene.
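The outcomes described for FIG. 1 can be summarized as a small decision rule. In this Python sketch, three of the four cases follow the text directly (negative element removed, positive element maintained, negative element maintained); the fourth case, a positive element that is spliced out, is not stated in this passage and is filled in here as an assumption by symmetry. The function name and string labels are hypothetical.

```python
def expression_effect(element: str, exon_included: bool) -> str:
    """Qualitative effect on transgene expression for a cis-element carried by an alternative exon.

    element:       "positive" (e.g., a translation start signal) or
                   "negative" (e.g., an mRNA degradation element).
    exon_included: True if the alternatively-spliced exon is spliced in.
    Assumption: skipping a positive element decreases expression; the other
    three cases mirror the description of FIG. 1.
    """
    if element not in ("positive", "negative"):
        raise ValueError("element must be 'positive' or 'negative'")
    # Expression rises when a positive element is kept or a negative one is removed.
    favourable = (element == "positive") == exon_included
    return "increased" if favourable else "decreased"
```

The rule is deliberately binary; the text itself uses "may lead to" for the negative-element cases, so real outcomes would be expected to be graded rather than all-or-nothing.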
  • FIG. 2 shows different models of alternative splicing which could be utilized in the nucleic acid vectors of the present disclosure. From top to bottom: a skipped exon model of alternative splicing, a retained intron model of alternative splicing, an alternative 5’ splice site model of alternative splicing, an alternative 3’ splice site model of alternative splicing, a mutually exclusive exon model of alternative splicing, and an alternative last exon model of alternative splicing.
  • White regions represent constitutive exons throughout.
  • Gray regions represent alternatively-spliced exons.
  • One or more of the constitutive exons may be modified to contain a coding region of interest, e.g., a coding region of a transgene that encodes a therapeutic protein.
  • FIGs. 3A-3B show two schematics representing exemplary recombinant viral genomes.
  • FIG. 3A shows a typical recombinant adeno-associated virus (rAAV) genome design.
  • Two AAV inverted terminal repeats (ITRs) flank the transgene.
  • the transgene may comprise a coding region of interest (e.g., encoding a therapeutic protein) under regulatory control of an alternatively-spliced exon (or cassette comprising an alternatively-spliced exon).
  • the cassettes e.g., in the context of a transgene
  • FIG. 3B shows a typical recombinant lentivirus genome design.
  • the 5’ and 3’ sequences of the lentivirus genome flank the packaging signal (PSI), rev response elements (RRE), and transgene.
  • the transgene may comprise a coding region of interest (e.g., encoding a therapeutic protein) under regulatory control of an alternatively-spliced exon (or cassette comprising an alternatively-spliced exon).
  • the promoter and nucleotide sequence comprising the transgene sequence must be encoded on the minus strand of the lentivirus genome to prevent splicing during virus production and packaging.
  • the cassettes may be inserted into a recombinant viral vector genome and comprise an alternatively-spliced exon and, in some embodiments, at least one positive or negative regulatory cis-element.
  • Non-limiting examples of positive or negative regulatory cis-elements located within the alternatively-spliced exons can include, without limitation, a translation start codon, a translation stop codon, a binding site for an RNA binding protein that serves to positively regulate mRNA translation, a binding site for an RNA binding protein that serves to negatively regulate mRNA translation, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate mRNA translation, a binding site for a nucleic acid molecule (e.g., an siRNA) that serves to negatively regulate mRNA stability or degradation, a binding site for an RNA binding protein that serves to positively regulate mRNA stability or degradation, a binding site for an RNA binding protein that serves to negatively regulate mRNA stability or degradation, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate mRNA stability or degradation, a ligand-responsive sequence, or a binding site for a nucle
  • the disclosure embraces any genetic element or region positioned within, or at least associated with, an alternatively-spliced exon which exerts a positive or negative control on the overall expression of a transgene (e.g., encoding a therapeutic protein or a miRNA).
  • the cis-element is within the alternatively-spliced exon, but in other cases, the cis-element is separate from, but at least associated with, the alternatively-spliced exon, such that it becomes spliced-in or spliced-out at the same time as the alternatively-spliced exon.
  • the cassettes may include one or more additional components, including one or more introns.
  • the constitutive exons not comprising the coding region of interest are represented by narrow rectangles
  • introns are represented as dashed lines
  • the alternatively-spliced exons are represented as shaded narrow rectangles.
  • the exon or exons comprising the coding region are indicated as solid thick white rectangles.
  • FIG. 4A is a schematic of a cassette (e.g., in the context of a transgene) embodiment whereby the alternatively-spliced exon is upstream of the exon encoding the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is to the 5’ of the exon encoding the coding region of interest.
  • FIG. 4B is a schematic of a cassette (e.g., in the context of a transgene) embodiment whereby the alternatively-spliced exon is downstream of the exon encoding the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is to the 3’ of the exon encoding the coding region of interest.
  • FIG. 4C is a schematic of a cassette (e.g., in the context of a transgene) embodiment whereby the alternatively-spliced exon is positioned between two separate exons encoding portions of the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is between the exons encoding the portions of the coding region of interest.
  • FIG. 4D shows a non-limiting embodiment of an approach that puts a gene sequence under control of a ligand-responsive sequence.
  • a naturally occurring gene can be engineered to come under the control of a ligand by inserting the cassette into the gene. The portions upstream and downstream of the site at which the cassette is inserted then become separate exons.
  • FIG. 4E shows a non-limiting embodiment of a transgene comprising an alternatively-spliced cassette.
  • the expression cassette comprises a general structure comprising at least one alternative exon, at least two introns flanking the alternative exon, a ligand-response sequence, and a plurality of splice sites.
  • FIG. 4F shows a non-limiting embodiment of a transgene comprising a non-continuous start codon split by the alternatively spliced cassette.
  • the exons comprise a non-continuous start codon such that the 3’-most nucleotides of the upstream exon comprise an A or AT and the 5’-most nucleotides of the downstream exon comprise a TG or G, respectively.
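The split start codon of FIG. 4F can be illustrated with a short check. The following is a minimal sketch only; the function name and the example sequences are hypothetical and are not taken from the disclosure. It tests whether joining an upstream exon to a downstream exon reconstitutes an ATG across the junction, in either the A|TG or the AT|G arrangement.

```python
def reconstitutes_start_codon(upstream_exon: str, downstream_exon: str) -> bool:
    """Return True if splicing the two exons together forms ATG at the junction.

    Two arrangements are checked: upstream ends in "A" and downstream starts
    with "TG", or upstream ends in "AT" and downstream starts with "G".
    """
    up = upstream_exon.upper()
    down = downstream_exon.upper()
    for split in (1, 2):
        head, tail = "ATG"[:split], "ATG"[split:]
        if up.endswith(head) and down.startswith(tail):
            return True
    return False


# Hypothetical exon fragments used only for illustration.
print(reconstitutes_start_codon("GCCA", "TGGCT"))   # A | TG arrangement
print(reconstitutes_start_codon("GCCAT", "GGCT"))   # AT | G arrangement
print(reconstitutes_start_codon("GCCA", "GGCT"))    # no ATG across the junction
```

In this sketch, an alternative exon spliced in between the two flanking exons would separate the two halves, so the check is applied only to the exon pair that is actually joined in a given isoform.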
  • FIG. 4G shows a non-limiting embodiment of an alternatively spliced exon cassette comprising a stop codon that is inserted between two consecutive coding sequences of a gene (e.g., two exons of a gene).
  • the exons flanking the cassette are not translated in the absence of ligand and the presence of a pre-mature stop codon in the alternative exon.
  • FIG. 4H shows a non-limiting embodiment of an alternatively spliced exon cassette that is inserted in a coding sequence for a regulatory RNA molecule.
  • the two exons encode an interfering RNA, such as a miRNA, such that removal of the alternative exon produces a functional miRNA molecule that is capable of regulating gene expression.
  • FIG. 4I shows a non-limiting embodiment of a nucleic acid design to regulate RNA splicing using a ligand-responsive sequence.
  • an intron splits two exons. Ligand binding to the ligand-responsive sequence results in alternative splicing, wherein the exons are brought together to form an RNA that encodes the protein of interest.
  • FIG. 4J shows a non-limiting embodiment of a nucleic acid design to regulate RNA splicing using a ligand-responsive sequence.
  • an intron splits two exons.
  • Ligand binding to the ligand-responsive sequence results in alternative splicing, wherein the exons are disrupted and the RNA cannot encode the protein of interest.
  • FIG. 4K shows a non-limiting embodiment of a ligand-responsive nucleic acid that can be used to differentially regulate the expression of protein isoforms.
  • the alternative exon is flanked by introns.
  • Ligand binding results in exclusion of the alternative exon in the spliced RNA thereby encoding the shorter isoform of the protein.
  • FIG. 4L shows a non-limiting embodiment of a ligand-responsive nucleic acid that can be used to differentially regulate the expression of protein isoforms. The alternative exon is flanked by introns. Ligand binding results in inclusion of the alternative exon in the spliced RNA thereby encoding the longer isoform of the protein. The absence of the ligand results in exclusion of the alternative exon from the spliced RNA which encodes the shorter isoform of the protein.
  • FIG. 4M shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA.
  • the alternative exon comprises a ligand-responsive sequence and prevents a start codon from being in frame with the RNA. Inclusion of the alternative exon in the presence of the ligand leads to production of the protein corresponding to the RNA.
  • FIG. 4N shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA.
  • the alternative exon comprises a ligand-responsive sequence and prevents a start codon from being in frame with the RNA. Inclusion of the alternative exon in the absence of the ligand leads to production of the protein corresponding to the RNA.
  • FIG. 4O shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA.
  • FIG. 4P shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. Presence of the alternative exon causes a pre-mature stop codon to be in frame with the RNA. Exclusion of the alternative exon in the presence of the ligand leads to an RNA which can be translated into a protein.
  • FIG. 4Q shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA.
  • FIG. 4R shows a nonlimiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the presence of the ligand results in formation of the complete microRNA which can function to reduce expression of a target transcript.
  • FIG. 4S shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the absence of the ligand disrupts microRNA structure thereby inhibiting its ability to reduce expression of a target transcript.
  • FIG. 4T shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the presence of the ligand disrupts microRNA structure thereby inhibiting its ability to reduce expression of a target transcript.
  • FIGs. 5A-5G depict various embodiments of the general model of the cassettes (e.g., in the context of a transgene) of FIG. 4A.
  • FIG. 5A depicts an embodiment of the “skipped exon model.”
  • FIG. 5B depicts an embodiment of the “retained intron model.”
  • FIG. 5C depicts an embodiment of the “alternative 5’ splice site model.”
  • FIG. 5D depicts an embodiment of the “alternative 3’ splice site model.”
  • FIG. 5E depicts an embodiment of the “mutually exclusive exon model.”
  • FIG. 5F depicts an exemplary alternatively spliced transcript.
  • FIG. 5G depicts an exemplary constitutively spliced transcript.
  • FIGs. 6A-6G depict various embodiments of the general model of the cassettes (e.g., in the context of a transgene) of FIG. 4B.
  • FIG. 6A depicts an embodiment of the “alternative last exon model.”
  • FIG. 6B depicts an embodiment of the “skipped exon model.”
  • FIG. 6C depicts an embodiment of the “retained intron model.”
  • FIG. 6D depicts an embodiment of the “alternative 5’ splice site model.”
  • FIG. 6E depicts an embodiment of the “alternative 3’ splice site model.”
  • FIG. 6F depicts an embodiment of the “mutually exclusive exon model.”
  • FIG. 6G depicts an embodiment of the “alternative last exon model.”
  • FIGs. 7A-7F depict various embodiments of the general model of the cassettes (e.g., in the context of a transgene) of FIG. 4C.
  • FIG. 7A depicts the “skipped exon model.”
  • FIG. 7B depicts the “retained intron model.”
  • FIG. 7C depicts the “alternative 5’ splice site model.”
  • FIG. 7D depicts the “alternative 3’ splice site model.”
  • FIG. 7E depicts the “mutually exclusive exon model.”
  • FIG. 7F depicts the “alternative last exon model.”
  • FIGs. 8A-8B show embodiments of the general model of the cassettes (e.g., in the context of a transgene).
  • the cassette (e.g., in the context of a transgene) comprises a constitutive exon at the left, an alternatively-spliced exon comprising an ATG (an example of a positive regulatory cis-element) in the middle, and a constitutive exon comprising a coding region of interest (shown with the natural ATG start codon removed to eliminate translation of that exon without further positive control by the alternatively-spliced exon).
  • Black lines indicate intronic sequences (e.g., the flanking introns of the alternatively-spliced exon).
  • Alternative reading frames within the exon comprising the coding sequence may in some embodiments be removed, as appropriate.
  • Under alternative splicing conditions, which are specific to the nature of the chosen alternatively-spliced exon, the alternatively-spliced exon will be included, and productive translation of the coding sequence will result.
  • Under homeostatic conditions (normal splicing conditions), only the constitutive exon will be included, the ATG start codon in the alternatively-spliced exon will be eliminated, and the coding sequence will not be translated.
  • the upper dotted lines show the splicing pattern leading to a splicing-in of the alternatively-spliced exon (expression of the coding region).
  • FIG. 8B shows an embodiment of the general model of the cassettes (e.g., in the context of a transgene) of FIG. 4C.
  • the cassette e.g., in the context of a transgene
  • the cassette comprises an alternatively-spliced exon (shown in gray) positioned between two separate constitutive exons each comprising a portion of the desired coding region.
  • the exon to the left comprises the 5’ end of the coding sequence and the exon to the right comprises the 3’ end of the coding region.
  • An in-frame stop codon is inserted into the alternatively-spliced exon at a location which is >50 nucleotides upstream of the next downstream splice site.
  • Under alternative splicing conditions, which are specific to the nature of the chosen alternatively-spliced exon, the alternatively-spliced exon will be included, and NMD (nonsense-mediated mRNA decay) will result.
  • homeostatic conditions normal splicing conditions
  • the upper dotted lines show the splicing pattern leading to a splicing-in of the alternatively-spliced exon (no or reduced expression of the coding region due to active NMD).
  • the lower dotted lines show the splicing pattern leading to a splicing-out of the alternatively-spliced exon (expression of the coding region).
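The placement of the in-frame stop codon described for FIG. 8B (more than 50 nucleotides upstream of the next downstream splice site) can be checked computationally. The sketch below is illustrative only. It assumes that the distance is counted from the last base of the stop codon to the 3’ end of the alternative exon, and that the exon is read in a caller-supplied frame; the function names, the counting convention, and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(exon: str, frame: int = 0):
    """Return the index of the first stop codon read in `frame`, or None."""
    seq = exon.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def distance_to_exon_end(exon: str, frame: int = 0):
    """Nucleotides between the end of the stop codon and the exon's 3' end."""
    start = first_in_frame_stop(exon, frame)
    if start is None:
        return None
    return len(exon) - (start + 3)


def satisfies_distance_rule(exon: str, frame: int = 0, min_distance: int = 50) -> bool:
    """True if a stop codon exists and lies more than `min_distance` nt upstream."""
    distance = distance_to_exon_end(exon, frame)
    return distance is not None and distance > min_distance


# Hypothetical exons: GCT repeats (alanine codons) around a TAA stop codon.
far_stop = "GCT" * 2 + "TAA" + "GCT" * 20    # 60 nt after the stop codon
near_stop = "GCT" * 2 + "TAA" + "GCT" * 10   # 30 nt after the stop codon
print(satisfies_distance_rule(far_stop), satisfies_distance_rule(near_stop))
```

The reading frame of the alternative exon depends on the phase of the upstream coding exon, so in practice the `frame` argument would be derived from the length of the upstream coding sequence.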
  • FIG. 9 shows a configuration of a gene therapy cargo whose translation can be regulated by alternative splicing. Inclusion of an alternative exon that ends in “ATG” can lead to translation of the downstream coding sequence. Exclusion will prevent appropriate protein translation of the downstream coding sequence.
  • FIG. 10 shows a construct design for the screening of alternative exon cassettes with regulatory activity.
  • the construct used the SMN1 exon 6 and intron 6/7 context.
  • Test alternative exon cassettes were inserted between portions of SMN1 intron 6 and 7.
  • An MHCK7 promoter was used.
  • the coding sequence was derived from the human MTM1 gene.
  • the 3’ UTR contained an SV40 polyadenylation and cleavage site.
  • AAV2 ITRs flanked the construct. Splice site scores of the flanking constitutive exons are listed.
  • FIG. 11 shows a strategy to prevent undesired translation of peptides from alternative reading frames of MTM1.
  • Amino acids generated in the MTM1 reading frame are listed (e.g., GCT encodes Alanine); only the 5’ end of MTM1 sequence is shown. Substitutions that preserve MTM1 reading frame but terminate alternative reading frames are shown. Arrows denote point mutations made to generate stop codons that would terminate open reading frames in the +1 and +2 reading frames. Nucleic acid substitutions are denoted by lower-case letters.
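The idea behind FIG. 11 (synonymous substitutions that leave the primary reading frame unchanged while introducing stop codons in the +1 or +2 frame) can be sketched as follows. This is an illustration under stated assumptions: the 12-nucleotide sequences are hypothetical and are not the MTM1 sequence, the helper names are invented for this example, and the standard genetic code is used.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate `seq` in the given frame; '*' marks a stop codon."""
    s = seq.upper()
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(frame, len(s) - 2, 3))


def alternative_frames_with_stop(seq: str):
    """Return the alternative frames (+1, +2) that contain a stop codon."""
    return [frame for frame in (1, 2) if "*" in translate(seq, frame)]


# Hypothetical example: the lower-case g is a synonymous CTC -> CTG change (Leu).
original = "ATGCTCAGCGCT"
mutated = "ATGCTgAGCGCT"

print(translate(original), translate(mutated))      # same protein in frame 0
print(alternative_frames_with_stop(original))       # no stop in +1 or +2
print(alternative_frames_with_stop(mutated))        # stop (TGA) now in +1
```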
  • FIG. 12 shows a strategy to preserve splice site strength following mutation of bases to introduce ATG to the ends of alternative exons by altering 5’ splice site sequences. Because the addition of ATG to the end of each alternative exon may change the splice site strength, intronic bases were altered to maintain splice site strength and preserve splicing activity. All upstream ATGs were also removed from alternative exons. Splice site strengths were scored by MaxEntScan and are shown. Splice sites are listed for the endogenous sequence (top), the endogenous sequence altered such that ATG is introduced (middle), and a “compensated” splice site sequence (bottom). Nucleic acid substitutions are denoted by lower-case letters.
  • FIG. 13 shows a construct barcoding strategy.
  • a barcode strategy was used in which synonymous mutations were made and used to identify each candidate alternative exon uniquely.
  • FIGs. 14A-14C show percent spliced in (psi) values for each tested cassette exon in various tissues. Psi values were plotted in heart (H), tibialis anterior (TA), and liver (L). Data for tibialis anterior was obtained from animals injected intramuscularly, and data from the other tissues was obtained from animals injected intravenously.
  • FIG. 14A shows data obtained from the following tested cassette exons (from left to right): ARFGAP2, BIN1, CAMK2B, and KIF13A.
  • FIG. 14B shows data obtained from the following tested cassette exons (from left to right): KSR1, LGMN, NRAP, and PDLIM3.
  • FIG. 14C shows data obtained from the following tested cassette exons (from left to right): PICALM, PKP2, and VPS39.
  • FIGs. 15A-15B show percent spliced in (psi) values for each tested exon in tibialis anterior at various times following injection. Psi values were plotted for each sample versus every other sample. The number following the dash indicates the replicate number for that particular week.
  • FIG. 15A shows a first comparison of psi values obtained at different time points following injection.
  • FIG. 15B shows a second comparison of psi values obtained at different time points following injection.
  • FIGs. 16A-16B show the ratios of RNA binding protein (RBP) RNA expression in heart vs. skeletal muscle, or vice-versa.
  • RNA expression values for RNA binding proteins were obtained from publicly available databases. The ratio of expression in heart versus skeletal muscle was computed; the RBPs showing the strongest bias in either direction were plotted.
  • FIG. 16A shows the RBPs which were found to be enriched in muscle tissue, relative to heart tissue.
  • FIG. 16B shows the RBPs which were found to be depleted in muscle tissue, relative to heart tissue.
  • FIG. 17 shows that the intronic sequence upstream of BIN1 exon 11 is enriched for CAC motifs.
  • FIG. 18 shows percent spliced in (psi) values for BIN1 exon 11 in human, rhesus macaque, and dog.
  • Psi values for BIN1 exon 11 for these species were obtained from publicly available datasets and plotted.
  • the dog data includes data from animals modeling XLMTM1, including those also being treated with AAV-MTM1.
  • AAV low, mid, and high denotes AAV-MTM1 treatment in XLMTM1 dogs from Dupont et al. (2020).
  • FIG. 19 shows splice site variants which were considered in the high throughput screen to optimize the BIN1 exon 11 cassette.
  • the endogenous BIN1 3’ splice site is listed (top), along with the endogenous BIN1 5’ splice site (second row from top), the endogenous BIN1 5’ splice site sequence altered such that ATG is introduced (third row from top), and the “compensated” version characterized in the first screen (bottom). Additional splice sites tested are listed below. Nucleic acid substitutions are denoted by lower-case letters.
  • FIG. 20 shows intronic variants which were considered in the high throughput screen to optimize the BIN1 exon 11 cassette. Sequence from the downstream intron of BIN1 exon 11 is shown (top). Putative MBNL binding sites (YGCY motifs) are bolded. Putative RBFOX binding sites (TGCATG) are underlined. Sequence that includes 4 possible alterations is shown (bottom). The alterations, denoted with lower-case letters, either generate additional MBNL binding sites (the first, second, and third alterations, from 5’ to 3’) or an additional RBFOX site (the fourth alteration). Consideration of 0, 1, 2, 3, or 4 alterations in all combinations yields 16 possible sequences to test.
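The count of 16 intronic variants follows from choosing any subset of the four alterations (2^4 = 16). A minimal sketch of the enumeration is given below; the alteration labels are hypothetical placeholders and do not correspond to sequences in the disclosure.

```python
from itertools import combinations

# Hypothetical labels: three alterations that add MBNL sites, one that adds an RBFOX site.
ALTERATIONS = ("MBNL_site_1", "MBNL_site_2", "MBNL_site_3", "RBFOX_site_4")


def enumerate_alteration_sets(alterations=ALTERATIONS):
    """Return every subset of the alterations, from none applied to all applied."""
    subsets = []
    for k in range(len(alterations) + 1):
        subsets.extend(combinations(alterations, k))
    return subsets


variant_sets = enumerate_alteration_sets()
print(len(variant_sets))  # number of intronic sequences to test
for subset in variant_sets[:5]:
    print(subset)
```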
  • FIG. 21 shows a strategy to use PCR amplicons to read the association between barcodes and variants (the codebook). Given short read Illumina sequencing (~75 nucleotides), a PCR strategy was used to associate the downstream barcode with upstream sequence variants.
  • FIG. 22 shows the number of barcodes encoding each variant. A histogram of the number of barcodes encoding each variant is shown for the plasmid library. On average, ~8 barcodes encode each variant.
  • FIGs. 23A-23C show scatters of percent spliced in (psi) values for each variant in different tissues. Each point represents the mean psi for each variant across all barcodes representing that variant. Data from selected tissues is shown.
  • FIG. 23A shows scatter between 2 heart samples, which lies along the diagonal (indicating reproducibility).
  • FIG. 23B shows scatter between 2 gastrocnemius samples, which also lies along the diagonal (indicating reproducibility).
  • FIG. 23C shows scatter between heart and skeletal muscle samples, which lies above the diagonal. This is because psi for most variants is higher in skeletal muscle than in heart.
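The per-variant psi values plotted in FIGs. 23A-23C are described as means across all barcodes representing a variant. The sketch below illustrates one way such a value could be computed. It assumes that psi for a barcode is the fraction of inclusion reads among inclusion plus exclusion reads, which is a common convention but is not stated in the text; the function names, read counts, barcodes, and variant names are all hypothetical.

```python
from collections import defaultdict


def psi(inclusion_reads: int, exclusion_reads: int):
    """Percent spliced in as a fraction; None when the barcode has no reads."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def mean_psi_per_variant(barcode_counts, codebook):
    """Average barcode-level psi over all barcodes mapped to each variant.

    barcode_counts: {barcode: (inclusion_reads, exclusion_reads)}
    codebook:       {barcode: variant}; barcodes absent from it are skipped.
    """
    per_variant = defaultdict(list)
    for barcode, (incl, excl) in barcode_counts.items():
        variant = codebook.get(barcode)
        value = psi(incl, excl)
        if variant is None or value is None:
            continue
        per_variant[variant].append(value)
    return {variant: sum(vals) / len(vals) for variant, vals in per_variant.items()}


# Hypothetical read counts and codebook.
counts = {"AAA": (30, 10), "CCC": (10, 30), "GGG": (5, 5), "TTT": (1, 1)}
codebook = {"AAA": "variant_1", "CCC": "variant_1", "GGG": "variant_2"}
print(mean_psi_per_variant(counts, codebook))
```

Averaging barcode-level values, rather than pooling reads, gives each barcode equal weight regardless of its read depth; pooling would be an equally plausible design choice.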
  • FIGs. 24A-24B show scatters of mean percent spliced in (psi) as computed across multiple animals.
  • FIG. 24A shows data obtained from tibialis anterior (y-axis) versus heart (x-axis) tissue.
  • FIG. 24B shows data obtained from gastrocnemius (y-axis) versus heart (x-axis) tissue.
  • FIGs. 25A-25D show percent spliced in (psi) values as a function of splice site strength for selected samples. Psi values for each variant were grouped by 3’ or 5’ splice site strength; data is shown only for heart sample 1 and gastrocnemius sample 1. There is a trend such that strong splice sites tend to yield higher inclusion levels.
  • FIG. 25A shows the 3’ splice site strength relative to the psi in heart tissue for heart sample 1.
  • FIG. 25B shows the 5’ splice site strength relative to the psi in heart tissue for heart sample 1.
  • FIG. 25C shows the 3’ splice site strength relative to the psi in gastrocnemius tissue for gastrocnemius sample 1.
  • FIG. 25D shows the 5’ splice site strength relative to the psi in gastrocnemius tissue for gastrocnemius sample 1.
  • FIGs. 26A-26B show scatters of mean percent spliced in (psi) for each variant as computed across multiple animals when linked to a CAPN3 cargo. Each point represents the mean psi for each variant across multiple animals (n = 4 for all tissues).
  • FIG. 26A shows data obtained from tibialis anterior (y-axis) versus heart (x-axis) tissue.
  • FIG. 26B shows data obtained from gastrocnemius (y-axis) versus heart (x-axis) tissue.
  • FIG. 27A shows data for heart tissue.
  • FIG. 27B shows data for gastrocnemius tissue.
  • FIG. 28 shows an exemplary riboswitch-regulated alternative exon library design.
  • MBNL1 exon 5 is flanked by 39 different 3’ splice sites and 20 different 5’ splice sites in different construct variants.
  • the 5’ splice site is incorporated into the communication stem of the downstream riboswitch.
  • the 5’ splice site is recognized by U1 snRNP and the exon is included to yield full length MBN.
  • the 5’ splice site is occluded and causes exon 5 skipping.
  • FIG. 29 shows an exemplary workflow for the massively parallel barcoded splicing assay.
  • the barcoded synthetic plasmid library was sequenced to obtain the codebook that links barcode sequences to specific splice site variants.
  • the plasmid library was transfected to analyze splicing patterns for each barcode in the presence and absence of drug.
  • the codebook was then used to decode barcodes, to characterize splicing patterns for individual variants.
  • FIG. 30 shows Psi data for barcodes and variants.
  • psi for uniquely identifiable barcodes in the presence and absence of drug is shown. Barcodes that appear in all six samples (3x drug-, 3x drug+) and in the codebook were plotted. Error bars are shown for three biological replicates.
  • psi for 780 variants with/without drug is shown. Psi values for barcodes linked to the same variant were averaged, and error bars are shown for three biological replicates. The triangle highlights variants with Δpsi > 0.3, representing promising candidates with large dynamic splicing changes in response to drug treatment.
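The library size and the Δpsi filter described for FIGs. 28 and 30 can be sketched together. The 780 variants correspond to 39 3’ splice sites combined with 20 5’ splice sites (39 × 20 = 780). In the sketch below, Δpsi is assumed to be psi without drug minus psi with drug, mirroring the convention stated for FIG. 42B; the text does not state the sign convention for FIG. 30, and the function names and psi values are hypothetical.

```python
from itertools import product


def build_variant_grid(n_three_prime: int = 39, n_five_prime: int = 20):
    """All (3' splice site index, 5' splice site index) combinations."""
    return list(product(range(n_three_prime), range(n_five_prime)))


def responsive_variants(psi_without_drug, psi_with_drug, threshold: float = 0.3):
    """Variants whose psi drops by more than `threshold` when drug is added.

    Delta psi is computed as (psi without drug) - (psi with drug); this sign
    convention is an assumption made for this illustration.
    """
    hits = {}
    for variant, base in psi_without_drug.items():
        if variant not in psi_with_drug:
            continue
        delta = base - psi_with_drug[variant]
        if delta > threshold:
            hits[variant] = delta
    return hits


grid = build_variant_grid()
print(len(grid))  # 39 * 20 combinations

# Hypothetical mean psi values for three variants.
no_drug = {(0, 0): 0.9, (0, 1): 0.6, (1, 0): 0.2}
with_drug = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.25}
print(sorted(responsive_variants(no_drug, with_drug)))
```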
  • FIGs. 31A-31C show analyses of psi and delta psi for various 3’ and 5’ splice site variants.
  • FIG. 31A shows variants that were grouped according to 3’ splice site identity and sorted by mean psi in the absence of tetracycline.
  • FIG. 31B shows variants that were grouped according to 5’ splice site identity and sorted by mean psi in the absence of tetracycline.
  • FIG. 31C shows delta psi plotted in a heatmap format, in which row/columns denote specific 3’ and 5’ splice site combinations. Splice sites were sorted by mean psi in the absence of tetracycline.
  • FIG. 32 shows protein isoform regulation from a single variant.
  • the left-side panel shows gel electrophoresis analysis of RT-PCR products analyzed by fragment analyzer.
  • the right-side panel shows western blot analysis of MBNL protein using an anti-HA tag antibody.
  • FIG. 33 shows an exemplary cassette configuration for alternative splicing-regulated protein expression.
  • the alternative splicing cassette was placed between an ATG and downstream coding sequence for the protein of interest.
  • An HA tag was placed before the ATG for protein immunoblotting.
  • FIG. 34 shows exonic splicing switch variants. Nucleotides that base-pair within the communication stem of the riboswitch are underlined.
  • FIG. 35 shows skipping percentage of exonic splicing switch variants with/without drug.
  • RNA splicing assays were performed by RT-PCR and fragment analyzer.
  • FIG. 36 shows exclusion percentages of AltEx9 following different tetracycline concentrations. Variant AltEx9 was tested against different concentrations of tetracycline, and exon-skipping RNA isoform percentages were calculated.
  • FIGs. 37A-37B show RNA splicing and protein expression regulation of three variant constructs.
  • FIG. 37A shows RT-PCR analysis of RNA splicing patterns of three constructs that were fused to a nano-luciferase reporter in response to drug treatment.
  • FIG. 37B shows nanoluciferase enzymatic activity for three variants, along with exclusion and inclusion isoform controls. Nano-luciferase signal was normalized by co-transfected firefly luciferase (fLuc).
  • FIG. 38 shows alternative splicing regulated protein expression by reconstructing translation initiation. Exon inclusion disrupts translation initiation sites and exon skipping reconstructs strong Kozak sequences for translation of a downstream protein of interest.
  • FIG. 39 shows exemplary designs for alternative splicing-regulated RNA interference.
  • An exemplary pri-miR 16_2 scaffold bearing the miRNA-targeting luciferase reporter is shown.
  • the dashed box denotes the sequence location in which the alternative splicing cassette should be placed.
  • FIG. 40 shows riboswitch-regulated RNAi.
  • Firefly Luciferase reporter signal was normalized by co-transfected renilla luciferase.
  • RNAi (+) is from the pri-miR 16_2 scaffold bearing fLuc miRNA, while RNAi (-) is from a non-functional control RNA.
  • the RNAi AltEx9 has alternative splicing cassette AltEx9 inserted in the pri-miR scaffold.
  • FIG. 41 shows a non-limiting example of a nucleic acid design to regulate S. aureus Cas9 by tetracycline.
  • FIGs. 42A-42B show representative cellular screening results for the nucleic acid shown in FIG. 41.
  • FIG. 42A shows a scatter plot of PSI for each of 2760 variants analyzed in a high throughput screen in HEK293T cells. Each point represents the behavior of an individual variant at a particular dose of tetracycline (y-axis) relative to no tetracycline (x-axis). Circles, squares and triangles denote treatment with 25 µM, 50 µM and 100 µM tetracycline, respectively.
  • FIG. 42B shows a heat map of delta PSI (no tetracycline minus 100 µM tetracycline) as a function of aptamer stem length and splice site strength.
  • FIG. 43 shows a non-limiting example of a nucleic acid design to regulate erythropoietin (EPO) expression by a risdiplam-responsive sequence.
  • FIG. 44 shows representative cellular screening results for the nucleic acid design shown in FIG. 43.
  • the scatter plot shows percent intron removal for 30,455 variants analyzed in a high throughput screen in HEK293T cells. Each point represents the behavior of an individual variant at a particular dose of risdiplam (y-axis) relative to no risdiplam (x-axis). Circles, squares and triangles denote treatment with 250 nM, 500 nM, and 1000 nM risdiplam, respectively.
  • FIGs. 45A-45B show representative data from real-time PCR (RT-PCR) analyses of individual variants shown in FIG. 44.
  • FIG. 45A shows products made from cloning seven distinct variants and testing the expression of said sequences with RT-PCR. Fragment analysis shows the abundance of intron retained product (top band) or intron spliced product (bottom band) in the presence (1 µM) or absence of risdiplam.
  • FIG. 45B shows quantitation of the data shown in FIG. 45A.
  • FIGs. 46A-46C show a non-limiting example of a strategy for using risdiplam-responsive motifs to regulate GABRG2 isoforms.
  • FIG. 46A shows an overview of the mechanism through which risdiplam-responsive sequences identified from the screen performed in FIGs. 44 and 45A-45B (variants 3 and 7) were incorporated into an alternatively spliced gene that allows for production of either the exon 9-containing (long) isoform of GABRG2 or the exon 9-skipped (short) isoform.
  • the gray box indicates GABRG2 exons 1 through 8.
  • the white box indicates exons 9 and 10, and the dotted box indicates the risdiplam-responsive sequence.
  • the black box indicates exon 10 alone.
  • FIG. 46B shows representative data of two different risdiplam-responsive motifs that were tested in Neuro2A cells using RT-PCR. RT-PCR with primers that target the gray and black boxes was performed to evaluate splicing behavior in the presence (1 µM) or absence of risdiplam.
  • FIG. 46C shows quantitation of the data shown in FIG. 46B.
  • FIGs. 47A-47D show a non-limiting example of a strategy for using a risdiplam-responsive motif from POMT2 exon 11b to regulate CSNK1D isoforms.
  • FIG. 47A shows an overview of the mechanism through which risdiplam-responsive sequences were incorporated into an alternatively spliced gene that allows for production of either the exon 9-containing (long) isoform of CSNK1D or the exon 9-skipped (short) isoform.
  • the gray box indicates CSNK1D exons 1 through 8.
  • the white box indicates exons 9 and 10
  • the dotted box indicates the risdiplam-responsive sequence derived from POMT2.
  • the black box indicates exon 10 alone.
  • FIG. 47B shows representative data from testing a nucleic acid in HEK293T and Neuro2A cells using RT-PCR. Primers that target the gray and black boxes were evaluated for splicing behavior in the presence (1 µM) or absence of risdiplam.
  • FIG. 47C shows quantitation of the RT-PCR data in FIG. 47B.
  • FIG. 47D shows a non-limiting example of strategy wherein the construct tested in FIGs.
  • Isoform A indicates the exon 9-skipped isoform.
  • Isoform B indicates the exon 9-included isoform.
  • FIGs. 48A-48C show a non-limiting example of a strategy for repurposing exon 11b in POMT2 to regulate CasMini.
  • FIG. 48A shows a non-limiting example of a nucleic acid design for a risdiplam-responsive splicing cassette that regulates translation of the N-terminal portion of CasMini.
  • Exon 11b and flanking introns from POMT2 were modified to contain a start codon in frame with downstream CasMini. Inclusion of this exon leads to production of an N-terminal portion of CasMini fused to nanoluciferase.
  • FIG. 48B shows representative data from testing the nucleic acid shown in FIG.
  • FIG. 48C shows additional non-limiting examples of variants that were cloned and assayed for nanoluciferase signal in the presence (1 µM) and absence of risdiplam in Neuro2A cells.
  • CTRL denotes a control plasmid which encodes firefly luciferase but not nanoluciferase, to serve as nanoluciferase substrate control.
  • FIGs. 49A-49C show a non-limiting example of a strategy which leverages tetracycline aptamer-regulated splicing to control microRNA biogenesis via exon skipping.
  • FIG. 49A shows a non-limiting example of a tetracycline-responsive exon cassette placed between two halves of a primary microRNA sequence. Exon inclusion leads to suboptimal recognition of the microRNA precursor by Dicer and thus lower production of a mature microRNA. Exon skipping leads to proper recognition of the microRNA precursor by Dicer and thus higher production of the mature microRNA.
  • FIG. 49B shows representative data from assaying the nucleic acid shown in FIG. 49A in a Drosha knockout HEK293 cell line.
  • FIG. 49C shows representative Northern blot data from assaying HEK293T transfected with the nucleic acid shown in FIG. 49A and testing in FIG.
  • FIGs. 50A-50C show a non-limiting example of a strategy which leverages branaplam-regulated splicing to control microRNA biogenesis via exon inclusion.
  • FIG. 50A shows a nonlimiting example of a cassette such that branaplam is capable of enhancing exon inclusion via recognition of certain sequences near the 5’ splice site of the alternatively spliced exon.
  • the primary microRNA sequence was split across the 2nd intron of a cassette exon event derived from SF3B3 such that inclusion of the cassette exon facilitates formation of the full microRNA base stem, which can enhance Drosha recognition and processing.
  • FIG. 50B shows representative Northern blot data from testing several branaplam-responsive cassettes.
  • YZ230 is a control that encodes the sequence expected with exon inclusion.
  • YZ231 is a control that encodes the sequence expected with exon skipping.
  • YZ232 is a variant in which the exon cassette is present and can respond to branaplam.
  • FIG. 50C shows representative data from luciferase assay analysis of knockdown by microRNAs encoded by branaplam-responsive cassettes. In each case, luciferase transcript is targeted by the microRNA.
  • 95 is a construct that constitutively generates a microRNA active against luciferase.
  • 259 is a construct that does not generate a microRNA active against luciferase.
  • 231 and 232 are the constructs as shown in (b), both with and without branaplam.
  • FIGs. 51A-51B show a non-limiting example of a strategy for controlling leaky microRNA production due to basal recognition of an incomplete microRNA stem.
  • FIG. 51A shows non-limiting examples of microRNA scaffolds.
  • YZ95 is a potent primary microRNA scaffold that is effectively recognized by Drosha and can downregulate a GFP reporter transcript comprising a target site.
  • YZ293 was produced by mutating bases in the stem of YZ95 which are recognized by Drosha.
  • YZ301 was produced by re-constituting the complete microRNA stem.
  • FIG. 51B shows analyses of GFP silencing in HEK293 cells using YZ95, YZ293, and YZ301.
  • alternatively-spliced exons to control the expression of one or more genes of interest (e.g., genes that are useful therapeutically and/or diagnostically).
  • alternative splicing of an exon can be placed under the control of a ligand by introducing a ligand-binding sequence (e.g., a sequence encoding a ligand-binding aptamer) into an alternatively spliced exon and/or into at least one of the introns flanking the alternatively spliced exon.
  • a ligand-responsive alternatively spliced exon is introduced into a naturally occurring gene (e.g., at one or both alleles of the gene in the genome of a host cell).
  • a synthetic gene construct is provided that includes a ligand-responsive alternatively spliced exon.
  • alternatively spliced exons can be used to regulate one or more aspects of gene expression (e.g., of mRNA translation and/or RNA function) by including one or more translation stop codons, interrupting a start codon, and/or interrupting a functional RNA sequence (e.g., a mRNA, a regulatory RNA, such as an interfering RNA, and/or a ribozyme).
  • one or more aspects of the application can be used in the context of viral vectors (e.g., AAV viral vectors or lentivirus viral vectors) to effectively regulate the expression of a coding region of interest (e.g., a coding region of a transgene that encodes a therapeutic protein).
  • the alternatively-spliced exons regulate the expression of a coding region of interest in a condition-sensitive manner (e.g., expression in one type of cell but not another, expression in a diseased condition, or expression in the presence of certain intracellular conditions, such as the presence of a ligand).
  • the present disclosure relates to a new approach for regulating expression of a transgene (or a coding region thereof) from a recombinant viral vector that couples alternatively-spliced exons with the expression of a coding region of interest (e.g., a coding region of a transgene encoding a therapeutic protein).
  • the present disclosure describes a variety of exemplary configurations as to how to combine or otherwise pair the expression of a coding region of interest (or multiple portions of coding regions) with an alternatively-spliced exon, but any suitable arrangement or configuration is contemplated so long as the expression of the coding region of interest (or portions thereof) is configured to come under regulatory control of the alternatively-spliced exon.
  • the present disclosure relates to the use of inducibly-spliced exon cassettes in the context of viral vectors (e.g., AAV viral vectors or lentivirus viral vectors) to effectively regulate the expression of a transgene encoding a therapeutic cargo such as microRNAs (miRNA) and proteins.
  • the transgene regulates the expression of an inducibly-spliced exon cassette in a condition-sensitive manner (e.g., the presence of a drug or ligand).
  • the inducibly-spliced cassette encodes an RNA comprising a ligand-responsive sequence which is alternatively spliced (e.g., to exclude or include an alternative exon) in response to ligand binding.
  • the inducibly-spliced cassette comprises a microRNA (miRNA) sequence and a ligand-responsive aptamer controlling the splicing of the said cassette.
  • the inducibly-spliced exon cassette will be either spliced out or not spliced in a manner that can be dependent on one or more environmental conditions, e.g., the presence of an external factor (such as, for example, an administered agent such as a drug or ligand).
  • a recombinant nucleic acid (e.g., a recombinant viral genome) comprises a transgene comprising at least two exons and the alternatively-spliced cassette comprising at least two introns flanking an alternative exon, and a ligand-responsive aptamer.
  • the transgene comprising the inducibly-spliced cassette comprises other regulatory sequences including, but not limited to, 3’ UTRs, 5’ UTRs, poly A sequences, promoters, enhancers, etc.
  • the inducibly-spliced cassette comprises a sequence that is capable of regulating the expression of another gene such as a miRNA.
  • compositions and methods described herein can be useful to regulate expression of therapeutic transcripts (e.g., in the context of viral vector-based treatments for diseases or disorders).
  • the transgene can be spliced in an inducible manner to form a functional miRNA that modulates the expression of a mutated or variant protein or a misexpressed protein that is implicated in a disease or disorder.
  • the present application provides compositions and methods that are useful for delivering genes and gene products (such as RNAs and proteins) that retain or restore therapeutically effective levels of regulation of a protein or variant thereof implicated in a disease or disorder.
  • A schematic representing the disclosed new approach for regulating expression of a transgene (or a coding region of a transgene, e.g., a transgene encoding a therapeutic protein) in a recombinant viral genome using alternatively-spliced exons is provided in FIG. 1.
  • a viral genome may be configured to include a transgene that comprises a coding region of interest (e.g., encoding a therapeutic protein) and an alternatively-spliced exon (or a cassette comprising an alternatively-spliced exon) which regulates the expression of the coding region of the transgene.
  • A number of exemplary embodiments of recombinant nucleic acid molecule constructs that comprise an alternatively-spliced exon and a coding region of interest (e.g., encoding a therapeutic protein) are shown in FIG. 2.
  • FIG. 3 depicts, in general, typical AAV and lentivirus vector constructs comprising a coding region of interest whose expression is driven by a promoter, and which further include the insertion (at any suitable location) of a nucleotide sequence comprising an alternatively-spliced exon (or a cassette comprising an alternatively-spliced exon) to further regulate the expression of the coding region (e.g., by controlling translation or mRNA homeostasis, e.g., mRNA levels).
  • the nucleotide sequence comprising an alternatively-spliced exon may be in the form of a “cassette.” Examples of this are provided in FIGs. 2 and 4-7.
  • Such constructs represent embodiments that enable the disclosed new approach for regulating transgene expression (e.g., the expression of a therapeutic protein) from recombinant viral vectors in a condition-responsive manner, whereby the condition-responsive expression is controlled by alternatively-spliced exons which are included in the recombinant genome of the expression vector in such a manner that imparts a level of control on the expression of a coding region of interest (e.g., encoding a therapeutic protein).
  • alternatively-spliced exons are spliced-in or spliced-out in a manner that can be dependent on one or more environmental conditions, e.g., intracellular conditions, such as a disease state (e.g., cancer) or even a type of cell (e.g., a liver cell versus a neuron, each of which has different intracellular conditions), or the presence of an external factor (such as, for example, an administered agent).
  • In FIG. 1, a generalized schematic of a recombinant AAV is provided in (a), which comprises a transgene located between the left and right ITRs.
  • the transgene is indicated as comprising a coding region of interest (e.g., which encodes a therapeutic protein) and an alternatively-spliced exon that regulates the expression of the transgene (or the product encoded by the coding region of interest). While the drawing depicts a recombinant AAV genome, other recombinant viral vector genomes may be used, such as recombinant lentivirus genomes.
  • the recombinant viral genomes may be delivered or administered to subjects packaged in a viral vector, which refers to an infectious viral particle comprising a recombinant viral genome within a viral capsid, and in addition which may further include a lipid/protein envelope layer for enveloped viruses.
  • the coding region (or exon comprising the coding region) may be combined or arranged with the alternatively-spliced exon in the form of a transgene comprising any suitable arrangement of additional components, including one or more constitutive exons (i.e., those exons present in all spliced mRNA isoforms that result from the initial pre-mRNA transcript) and one or more introns.
  • an alternative exon cassette (comprising the alternatively-spliced exon) may be linked with or coupled to any coding region of interest to impart regulatory control on that coding region of interest.
  • the alternatively-spliced exon may be any naturally-occurring alternatively-spliced exon or any recombinant alternatively-spliced exon.
  • a variety of configurations are contemplated, and no limitation is implied by FIG. 1 as to the possible configurations that may be employed.
  • the alternatively-spliced exon may be located between two exons that each separately comprise a portion of the coding region of interest.
  • the alternatively-spliced exon is located outside of the exon comprising the coding region of interest.
  • the alternatively-spliced exon may be located downstream of the exon encoding the coding region of interest.
  • the alternatively-spliced exon may be located upstream of the exon encoding the coding region of interest.
  • the general descriptions of the configuration of the cassettes comprising the alternatively-spliced exon and the coding region of interest (or the exon comprising the coding region of interest) embrace any suitable configuration, including those embodiments described in FIGs. 2 and 4-8.
  • step (b) shows the formation of a pre-mRNA (i.e., a primary transcription product which has not yet been processed by splicing) which includes the coding region of interest and the alternatively-spliced exon.
  • step (c) shows the splicing-out or splicing-in of the alternatively-spliced exon based on one or more conditions (e.g., cell type, disease state, or other intracellular environmental signal).
  • the splicing-out of the alternatively-spliced exon results in mRNA isoform 1 in (d)
  • the splicing-in of the alternatively-spliced exon results in mRNA isoform 2 in (e).
  • the absence of the alternatively-spliced exon removes a positive or negative regulatory cis-element.
  • the removal of a positive regulatory cis-element, such as a translation start signal, will result in the downregulation or down-expression of the transgene, i.e., the reduced expression of the product encoded by the coding region of interest.
  • the removal of a negative regulatory cis-element, such as an mRNA degradation element, may lead to the upregulation or up-expression of the transgene, i.e., the increased expression of the product encoded by the coding region of interest.
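The two outcomes described for FIG. 1 (splicing-out giving mRNA isoform 1, splicing-in giving mRNA isoform 2) can be illustrated with a minimal Python sketch. The `Segment` type, the `splice` function, and the toy sequences are hypothetical and are not taken from the disclosure; the sketch only assumes that introns are always removed and that the alternative exon is kept or dropped as a unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One piece of a toy pre-mRNA (hypothetical illustration type)."""
    name: str
    kind: str       # "constitutive_exon", "alternative_exon", or "intron"
    sequence: str


def splice(pre_mrna: List[Segment], include_alternative: bool) -> str:
    """Return a mature mRNA: introns are always removed, and the
    alternative exon is kept only when include_alternative is True."""
    kept = []
    for seg in pre_mrna:
        if seg.kind == "constitutive_exon":
            kept.append(seg.sequence)
        elif seg.kind == "alternative_exon" and include_alternative:
            kept.append(seg.sequence)
    return "".join(kept)


# Toy sequences chosen for illustration only.
pre_mrna = [
    Segment("exon1", "constitutive_exon", "AUGGCC"),
    Segment("intron1", "intron", "GUAAGUUUUCAG"),
    Segment("alt_exon", "alternative_exon", "UAAGGC"),
    Segment("intron2", "intron", "GUAAGUCCCCAG"),
    Segment("exon2", "constitutive_exon", "GAAUGA"),
]

isoform_1 = splice(pre_mrna, include_alternative=False)  # exon skipped
isoform_2 = splice(pre_mrna, include_alternative=True)   # exon included
print(isoform_1)
print(isoform_2)
```

In this toy case the included exon carries a stop codon (UAA), so isoform 2 stands in for a transcript in which inclusion of the alternative exon introduces a negative regulatory element.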
  • the disclosure provides methods and compositions for regulating gene expression using viral vectors comprising a recombinant viral genome described herein.
  • Viral vectors can be used to deliver one or more transgenes (comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) for therapeutic, diagnostic, or other purposes.
  • expression of a transgene in a recombinant viral genome can be regulated using alternative splicing of an RNA expressed from the viral genome.
  • aspects of the disclosure relate to methods and compositions for regulating expression of a transgene (comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) using viral vectors comprising a recombinant viral genome described herein.
  • a recombinant viral genome can be engineered to include one or more exons (e.g., one or more of a constitutive exon, an alternatively-spliced exon, and/or engineered versions thereof) that (a) can be either spliced-in or spliced-out of a pre-mRNA encoded by the genome, and (b) include one or more positive or negative regulatory cis-elements that affect protein expression (e.g., mRNA stability and/or translation of the coding region of interest).
  • Different intron and exon configurations can be used to provide for alternatively-spliced exon splicing, as discussed in greater detail herein, and shown in FIG. 2 and FIGs. 4-8 as examples.
  • Non-limiting examples include the following models of alternative splicing: skipped exons, retained introns, alternative 5’ splice sites, alternative 3’ splice sites, mutually exclusive exons, and alternative last exons as illustrated in FIGs. 2 and 4-8.
  • Each of these different intron/exon configurations can be used to leverage alternatively-spliced exons which may, in some embodiments, include one or more positive or negative regulatory cis-elements that promote or limit expression of the coding region of interest.
  • Such sequences may promote translation and/or stability, or inhibit or terminate RNA translation and/or promote RNA degradation.
  • Such cis-acting elements may in some embodiments be sequences that form secondary structures (e.g., that slow translation), bind to one or more regulatory RNAs (e.g., siRNAs), and/or be targeted by one or more intracellular enzymes (e.g., nucleases).
  • splice sites which may result in splicing under specific conditions. Such splice sites can be chosen for their ability to regulate splicing under conditions of interest. Alternatively or additionally, splice sites may be chosen based upon their relative strength, as calculated using a variety of published methods (see, e.g., Yeo & Burge (2004), Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals, J. Comput. Biol., 11(2-3):377-94). Such relative strength may in some embodiments reflect the efficiency of recognition by the core spliceosomal machinery (e.g., U1 and U2 snRNPs).
  • splice sites may be altered to enhance or diminish recognition by the core spliceosomal machinery. Such alterations may be performed, in some embodiments, to achieve the desired regulatory behavior in conditions of interest.
  • splice sites may be used to make splicing responsive to certain endogenous or exogenous factors such that the alternative splicing of the DNA is specific to, such as, for example, certain tissues, certain diseases, certain intracellular conditions, etc.
  • splicing may be additionally or alternatively responsive to an exogenous agent (e.g., a small molecule, antibody, or other compound) which regulates splicing of the pre-mRNA.
  • Alternatively-spliced exons as described herein may in some embodiments be contained within an alternatively-spliced exon cassette, as shown in the various embodiments of FIGs. 2 and 4-8.
  • a recombinant viral genome of the present disclosure comprises a transgene comprising at least one alternatively-spliced exon (or “regulatory”) cassette.
  • a transgene comprising an alternatively-spliced exon cassette comprises at least one alternatively-spliced exon, intronic sequences flanking the alternatively-spliced exon, and an exon comprising a coding region of interest.
  • a transgene comprising a regulatory cassette may in some embodiments also contain additional components, such as a constitutive exon, additional intronic sequences, or both.
  • a transgene comprising an alternatively-spliced exon cassette comprises any one or more of the following components: an alternatively-spliced exon, a flanking intron, an exon comprising a coding region of interest, and/or a constitutive exon.
  • alternative splicing regulation can be used to help control the expression of a coding region of interest encoded by a recombinant viral genome (e.g., an rAAV recombinant genome, a lentivirus recombinant genome).
  • aspects of the invention relate to a method of regulating expression of a coding region of interest using a viral vector comprising a recombinant viral genome described herein.
  • the method comprises: (i) inserting into the recombinant viral genome at least one transgene comprising an alternatively-spliced exon cassette (e.g., such as any of those shown in FIGs.
  • the constitutive exon, alternatively-spliced exon, and flanking intron are each located 5' to the coding region of interest.
  • the method comprises: (i) inserting into the recombinant viral genome at least one transgene comprising an alternatively-spliced exon cassette; and (ii) introducing into the alternatively-spliced exon a heterologous, in-frame stop codon at least 50 nucleotides upstream of the next 5' splice junction.
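The placement rule in step (ii) can be checked computationally. The following Python sketch is illustrative only: the function names, the `phase` parameter, and the example exons are hypothetical, and it assumes that the distance is counted from the last base of the stop codon to the 3' end of the alternative exon (taken here as the next 5' splice junction). The disclosure does not specify this counting convention.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(exon: str, phase: int = 0) -> list:
    """Return 0-based start positions of in-frame stop codons.

    phase is the number of bases at the exon start that complete a codon
    begun in the upstream exon (0, 1, or 2); an assumed convention.
    """
    seq = exon.upper().replace("U", "T")
    return [i for i in range(phase, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def stop_meets_distance_rule(exon: str, phase: int = 0,
                             min_distance: int = 50) -> bool:
    """True if any in-frame stop codon ends at least min_distance
    nucleotides upstream of the exon's 3' end."""
    for start in in_frame_stops(exon, phase):
        distance = len(exon) - (start + 3)  # bases after the stop codon
        if distance >= min_distance:
            return True
    return False


# Hypothetical exons: a stop codon followed by 60 nt or by 30 nt.
exon_ok = "GCC" * 2 + "TAA" + "GCC" * 20
exon_short = "GCC" * 2 + "TAA" + "GCC" * 10

print(stop_meets_distance_rule(exon_ok))     # stop is 60 nt from the exon end
print(stop_meets_distance_rule(exon_short))  # stop is 30 nt from the exon end
```

A design tool built along these lines would also need the reading-frame phase inherited from the upstream exon, which is why `phase` is exposed as a parameter.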
  • a transgene comprising an alternatively-spliced exon cassette comprises any one or more of the following components: an alternatively-spliced exon, a flanking intron, a coding region of interest, and/or a constitutive exon.
  • compositions and methods described herein can be useful to regulate expression of therapeutic transcripts in the context of viral vector-based treatments for diseases or disorders.
  • Abnormal cellular regulation e.g., abnormal regulation of intron splicing of one or more genes
  • Some aspects of the invention therefore concern a method of treating a disease or condition in a subject comprising administering a viral vector of the disclosure to a subject, wherein the viral vector comprises a recombinant viral genome described herein.
  • the present application provides compositions and methods that are useful for delivering genes that retain or restore therapeutically effective levels of regulation (e.g., therapeutically effective regulation of intron splicing).
  • a viral vector (e.g., an rAAV vector, a lentivirus vector, etc.) comprises a recombinant viral genome that includes a nucleic acid that encodes an RNA (e.g., an mRNA) comprising one or more introns.
  • splicing of at least one intron is regulated by one or more intracellular factor(s). Regulation of intron splicing can control the expression level of the RNA and/or of the type of RNA (e.g., of an RNA splice alternative) inside a cell.
  • polynucleotide refers to any nucleic acid comprising naturally-occurring sequences, engineered sequences, or a combination thereof.
  • the term “polynucleotide” may be used interchangeably with the term “nucleic acid”.
  • a polynucleotide may be DNA. In some embodiments, a polynucleotide may be RNA. Accordingly, in some embodiments, the term “polynucleotide” may be used to refer to both DNA and an RNA encoded by or corresponding to said DNA (e.g., an RNA that is alternatively spliced in the presence of a ligand).
  • a polynucleotide (e.g., a guide RNA) is a chemically modified nucleic acid.
  • polynucleotides of the present disclosure comprise a sequence encoding a ligand-responsive sequence.
  • the polynucleotide is capable of being expressed in a cell and alternatively spliced in the presence of the ligand.
  • a ligand induces alternative splicing to produce a first RNA.
  • a ligand induces splicing to produce a second RNA.
  • a polynucleotide comprises all of the sequence information to encode the first and the second RNA, such that one of the RNAs will be more highly expressed in the presence of the ligand and the other RNA will be more highly expressed in the absence of the ligand.
  • the presence of the ligand results in increased expression of the first RNA.
  • the increase in expression of the first RNA in the presence of the ligand is on the order of 2- to 500-fold relative to the expression of the first RNA and/or second RNA in the absence of the ligand.
  • the increase is approximately 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 1-fold to 3-fold, 1-fold to 4-fold, 1-fold to 5-fold, 1-fold to 6-fold, 1-fold to 7-fold, 1-fold to 8-fold, 1-fold to 9-fold, 1-fold to 10-fold, 10-fold to 20-fold, 20-fold to 30-fold, 30-fold to 40-fold, 40-fold to 50-fold, 50-fold to 60-fold, 60-fold to 70-fold, 70-fold to 80-fold, 80-fold to 90-fold, 90-fold to 100-fold, 100-fold to 200-fold, 200-fold to 300-fold, 300-fold to 400-fold, 400-fold to 500-fold, 500-fold to 600-fold, 600-fold to 700-fold, 700-fold to 800-fold, 800-fold to 900-fold, or 900-fold to 1000-fold.
  • the presence of the ligand results in increased expression of the second RNA.
  • the increase in expression of the second RNA in the presence of the ligand is on the order of 2-fold to 500-fold relative to the expression of the first RNA and/or the second RNA in the absence of the ligand.
  • the increase in the second RNA is approximately 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 1-fold to 3-fold, 1-fold to 4-fold, 1-fold to 5-fold, 1-fold to 6-fold, 1-fold to 7-fold, 1-fold to 8-fold, 1-fold to 9-fold, 1-fold to 10-fold, 10-fold to 20-fold, 20-fold to 30-fold, 30-fold to 40-fold, 40-fold to 50-fold, 50-fold to 60-fold, 60-fold to 70-fold, 70-fold to 80-fold, 80-fold to 90-fold, 90-fold to 100-fold, 100-fold to 200-fold, 200-fold to 300-fold, 300-fold to 400-fold, 400-fold to 500-fold, 500-fold to 600-fold, 600-fold to 700-fold, 700-fold to 800-fold, 800-fold to 900-fold, or 900-fold to 1000-fold.
  • polynucleotides may comprise one or more exons.
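The fold-increase language above compares expression of an RNA in the presence of the ligand with expression in its absence. A minimal Python sketch of that arithmetic follows; the function names and the expression values are hypothetical, and the default bounds simply restate the "on the order of 2-fold to 500-fold" range given in the text.

```python
def fold_change(with_ligand: float, without_ligand: float) -> float:
    """Ratio of expression with ligand to expression without ligand."""
    if without_ligand <= 0:
        raise ValueError("expression without ligand must be positive")
    return with_ligand / without_ligand


def in_stated_range(fc: float, low: float = 2.0, high: float = 500.0) -> bool:
    """True if the fold change lies within the stated 2- to 500-fold range."""
    return low <= fc <= high


# Hypothetical reporter readings (arbitrary units), for illustration only.
fc = fold_change(with_ligand=1200.0, without_ligand=40.0)
print(fc, in_stated_range(fc))
```

Dividing by the no-ligand value is one convention; a background-subtracted or normalized reading could be substituted without changing the structure of the calculation.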
  • the polynucleotide may comprise one or more introns.
  • the polynucleotide may comprise the full sequence of a gene, such as one comprising a plurality of exons.
  • the first RNA and the second RNA differ by at least one exon.
  • the first RNA comprises an exon that is not found in the second RNA.
  • binding of a ligand to the ligand-responsive sequence may promote inclusion of one or more alternative exons in the first RNA.
  • binding of a ligand to the ligand-responsive sequence may promote exclusion of one or more alternative exons in the second RNA.
  • each of the one or more alternative exons is flanked by an intron.
  • exons found in polynucleotides correspond to an RNA of interest.
  • the first RNA encodes an RNA of interest (e.g., one that can lead to synthesis of a corresponding protein) and the second RNA does not.
  • the second RNA encodes an RNA of interest (e.g., a microRNA that can bind a target transcript of interest) and the first RNA does not.
  • polynucleotides comprise one or more splice sites.
  • a 3’ splice site is at least 2 nucleotides long.
  • a 3’ splice site is 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, or 90-100 nucleotides long.
  • a 5’ splice site is 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, or 90-100 nucleotides long.
  • a 5’ splice site is at least 7 nucleotides long. In some embodiments, a 5’ splice site is at least 9 nucleotides long.
  • the polynucleotide may comprise one or more 5’ splice sites and 3’ splice sites which are differentially used in the splicing of the RNA encoded by the polynucleotide depending on the presence or absence of the ligand.
  • a polynucleotide comprises at least one alternative exon, at least two introns flanking an alternative exon, and a ligand-responsive aptamer, wherein the presence of the ligand results in splicing out the at least one alternative exon, the at least two introns flanking the at least one alternative exon, and the ligand-responsive aptamer.
  • Non-limiting examples of such polynucleotides are disclosed in FIGs. 4E-4H. However, such disclosures should not be considered limiting as, in other embodiments, it may be desirable to use a ligand to retain an alternatively spliced exon in the spliced RNA. Non-limiting examples of such polynucleotides are disclosed in FIGs. 4L, 4N, 4P, 46A, 47A, and 48A.
  • polynucleotides of the present disclosure are transgenes.
  • polynucleotides e.g., transgenes
  • polynucleotides of the present disclosure are provided in a vector (e.g., a plasmid, phage, transposon, cosmid, chromosome, or artificial chromosome).
  • vectors are single-stranded or double-stranded.
  • vectors are circular (e.g., circular plasmids, nanoplasmids, and minicircle plasmids) or linear.
  • vectors are self-complementary.
  • polynucleotides of the present disclosure are provided in a recombinant viral genome.
  • the polynucleotide comprises a sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, 2183-2255, or 2259-2260.
  • the polynucleotide comprises a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, 2183-2255, or 2259-2260.
  • the polynucleotide comprises an exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to an exon set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the polynucleotide comprises an exon comprising a nucleic acid sequence of an exon as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the polynucleotide comprises an alternative exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an alternative exon as set forth in SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
  • the polynucleotide comprises an alternative exon comprising a nucleic acid sequence of an alternative exon as set forth in any one of SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, or 2137.
  • the polynucleotide comprises an intron having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of intron as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the polynucleotide comprises an intron comprising a nucleic acid sequence of an intron as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the polynucleotide comprises a 3' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 3’ splice site as set forth in SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
  • the polynucleotide comprises at least one 3' splice site comprising a nucleic acid sequence of a 3’ splice site as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
  • the polynucleotide comprises a 5' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 5’ splice site as set forth in Tables 7, 25, 26, or 34.
  • the polynucleotide comprises a 5' splice site comprising a nucleic acid sequence of a 5’ splice site as set forth in any one of Tables 7, 25, 26, or 34.
  • the polynucleotide comprises a ligand-responsive sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260.
  • the polynucleotide comprises at least one ligand-responsive sequence comprising a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260.
  • the polynucleotide comprises a ligand-responsive aptamer having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189.
  • the polynucleotide comprises at least one ligand-responsive aptamer comprising a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189.
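The "at least 70% ... at least 99% sequence identity" thresholds recited above can be made concrete with a small Python sketch. It is a simplification under stated assumptions: the two sequences are taken to be already aligned and of equal length (gaps written as "-"), and identity is computed over all alignment columns. The disclosure does not state which alignment method or denominator is intended, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical 10-nt sequences differing at one position.
reference = "ACGTACGTAC"
candidate = "ACGTTCGTAC"
identity = percent_identity(reference, candidate)
print(identity, identity >= 90.0)
```

For unaligned sequences, a pairwise alignment step would be needed before this calculation, and the resulting percentage depends on the alignment parameters chosen.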
  • a polynucleotide comprises an intron, exon (e.g., alternative exon), and/or a splice site corresponding to a gene selected from the group consisting of: MBNL1; MBNL2; MBNL3; hnRNP A1; hnRNP A2B1; hnRNP C; hnRNP D; hnRNP DL; hnRNP F; hnRNP H; hnRNP K; hnRNP L; hnRNP M; hnRNP R; hnRNP U; FUS; TDP43;
  • TRIM32; FKRP; FXN; POMT1; FKTN; POMT2; POMGnT1; DAG1; ANO5; PLEC1;
  • transgene refers to any recombinant gene or a segment thereof that includes a non-naturally occurring sequence.
  • the non-naturally occurring sequence may in some embodiments be from a different organism, but it need not be.
  • a transgene is a recombinant gene, or segment thereof, from one organism or infectious agent (e.g., a virus) that is introduced into the genome of another organism or infectious agent.
  • the transgene may contain segments of DNA taken from the same organism, but the segments are arranged in a non-natural configuration.
  • the non-naturally occurring sequence is an engineered non-naturally occurring sequence.
  • a transgene may comprise any combination of naturally-occurring and engineered DNA sequences.
  • a transgene may be introduced into the genome of another organism or infectious agent using recombinant DNA techniques.
  • a transgene may include or may be modified to include one or any combination of regulatory sequences, including, but not limited to, transcription regulatory sequences (e.g., promoter, enhancer, silencer, transcription factor binding sequence, 5’ UTR, or 3’ UTR), post-transcriptional regulatory sequences (e.g., acceptor/donor splicing sites and splicing regulatory sequences), ligand-responsive sequences (e.g., aptamers), and/or translation regulatory sequences (e.g., translation initiation signals, translation termination signals, mRNA degradation or decay signals, polyadenylation signals).
  • a regulatory sequence, such as a ligand-responsive aptamer or a ligand-responsive exon, is located in an alternatively-spliced expression cassette between two exon regions of the transgene, thereby separating a single exon into two non-continuous stretches of nucleotides.
  • the transgene encodes an RNA product that plays a regulatory role effecting gene expression in the cell, such as a miRNA.
  • the transgene comprises all components (e.g., exons, introns, regulatory sequences, alternative exons, ligand-responsive aptamers, etc.) which are located between the AAV inverted terminal repeat sequences (see, e.g., FIG. 3A).
  • the transgene comprises a sequence encoding a ligand-responsive sequence (e.g., a ligand-responsive aptamer).
  • the transgene comprises a sequence encoding an RNA of interest.
  • a transgene comprises two or more discontinuous sequences encoding distinct portions of an RNA of interest.
  • a transgene may be modified to comprise an alternatively-spliced exon, defined below, such that the regulation of the expression of the transgene, the product encoded by the transgene, or the target of a miRNA encoded by the transgene comes under control of the alternatively-spliced exon.
  • the alternative splicing of an exon of the transgene is dependent upon the presence of a ligand to which a ligand-responsive aptamer sequence within the transgene binds.
  • the alternative splicing of an exon of the transgene is dependent upon the presence of a ligand to which a ligand-responsive exon within the transgene binds.
  • the alternatively-spliced exon may be configured in a “cassette,” defined below.
  • the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the transgene comprises a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the transgene comprises an exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an exon as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the transgene comprises an exon comprising a nucleic acid sequence of an exon as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the transgene comprises at least two exons having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to the nucleic acid sequences of two exons as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the transgene comprises at least two exons comprising a nucleic acid sequence of two exons as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the transgene comprises an alternative exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an alternative exon as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
  • the transgene comprises at least two exons comprising a nucleic acid sequence of an alternative exon as set forth in any one of SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
  • the transgene comprises an intron having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an intron as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the transgene comprises an intron comprising a nucleic acid sequence of an intron as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the transgene comprises at least two introns having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to the nucleic acid sequences of two introns as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the transgene comprises at least two introns comprising the nucleic acid sequences of two introns as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the transgene comprises at least one 3' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 3’ splice site as set forth in SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
  • the transgene comprises at least one 3' splice site comprising a nucleic acid sequence of a 3’ splice site as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
  • the transgene comprises at least one 5' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 5’ splice site as set forth in Tables 7, 25, 26, or 34.
  • the transgene comprises at least one 5' splice site comprising a nucleic acid sequence of a 5’ splice site as set forth in Tables 7, 25, 26, or 34.
  • the transgene comprises at least one ligand-responsive sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-
  • the transgene comprises at least one ligand-responsive sequence comprising a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260.
  • the transgene comprises at least one ligand-responsive aptamer having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189.
  • the transgene comprises at least one ligand-responsive aptamer comprising a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189.
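The percent-identity thresholds recited in the items above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (gaps written as "-") and takes identity as matching columns divided by alignment length. This section does not specify an alignment method or a denominator, so both are assumptions, and the function names and toy sequences are hypothetical rather than taken from any SEQ ID NO.

```python
# Illustrative only: identity is computed over a pre-aligned pair of equal
# length; the alignment method and the denominator are assumptions.
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)  # percentages recited in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"  # a gap paired with a gap is not a match
    )
    return 100.0 * matches / len(aligned_a)


def thresholds_met(identity: float) -> list:
    """Return every recited threshold that the identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical 20-nt reference and a variant differing at the last two positions.
reference = "GGUAAGUACGAUCCGUAGCA"
variant = "GGUAAGUACGAUCCGUAGGG"
identity = percent_identity(reference, variant)
print(identity, thresholds_met(identity))
```

With 18 of 20 columns matching, the toy variant satisfies the 70%, 75%, 80%, and 90% thresholds but not the 92% threshold or higher.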
  • a “regulatory sequence” or, equivalently, a “regulatory element,” may refer to a nucleotide sequence that regulates, directly or indirectly, any aspect of the expression of a gene or transgene, including regulatory sequences that effect transcription of a gene or transgene into one or more mRNAs, the processing of mRNA (e.g., the splicing of a pre-mRNA comprising exons and introns to produce one or more mRNA isoforms), and/or the translation of a coding region in a mRNA to form a polypeptide product.
  • Non-limiting examples of positive or negative regulatory sequences can include, for instance, (1) a nucleotide sequence element that regulates, modulates, or otherwise controls the amount, stability, and/or degradation of an mRNA encoding a coding region of interest (or portions thereof); and/or (2) a nucleotide sequence element that regulates, modulates, or otherwise controls the translation of a coding region of interest (or portions thereof) encoded by an mRNA.
  • a ligand-responsive sequence may function as a cis-element which is capable of binding to an exogenously administered ligand.
  • a ligand-responsive sequence functions as a cis-element by regulating alternative splicing of the nucleic acid it is provided in (such as an inducibly-spliced cassette of a transgene).
  • a ligand-responsive sequence functions as a positive regulator (e.g., increasing expression or the function of the transgene).
  • a ligand-responsive sequence functions as a negative regulator (e.g., reducing expression or the function of the transgene).
  • a ligand-responsive aptamer may function as a cis-element which is capable of binding to an exogenously administered ligand.
  • the ligand-responsive aptamer functions as a cis-element by regulating alternative splicing of the nucleic acid it is provided in (such as an inducibly-spliced cassette of a transgene).
  • a ligand-responsive aptamer functions as a positive regulator (e.g., increasing expression or the function of the transgene).
  • a ligand-responsive aptamer functions as a negative regulator (e.g., reducing expression or the function of the transgene).
  • polynucleotides of the present disclosure are operably linked to at least one other regulatory sequence in addition to at least one operably linked ligand-responsive sequence described herein.
  • a polynucleotide and regulatory sequences are said to be “operably linked” (which may be used interchangeably with “operatively linked”) when they are covalently linked in such a way as to place the expression (e.g., transcription and/or translation) of the nucleic acid sequence under the influence or control of the regulatory sequences.
  • a promoter region would be operably linked to a nucleic acid sequence if the promoter region were capable of effecting transcription of that DNA sequence such that the corresponding RNA (e.g., a pre-mRNA, a mRNA, a miRNA, etc.) might be present at increased levels in a cell and/or translated into the desired protein or polypeptide.
  • two or more coding regions are operably linked when they are linked in such a way that their transcription from a common promoter results in the expression of two or more proteins having been translated in frame.
  • Non-limiting examples of other regulatory sequences which may be located in polynucleotides comprising ligand-responsive sequences (e.g., a transgene comprising a cassette wherein alternative splicing of RNA encoded by the cassette is regulated by an operably linked ligand-responsive sequence) include transcriptional regulatory sequences (e.g., promoters, enhancers, silencers, transcription factor binding sequences, 5’ UTRs, or 3’ UTRs), post-transcriptional regulatory sequences (e.g., acceptor/donor splicing sites and splicing regulatory sequences), and/or translation regulatory sequences (e.g., translation initiation signals, translation termination signals, mRNA degradation or decay signals, polyadenylation signals).
  • regulatory sequences include, without limitation, promoter sequences, ribosome binding sites, ribozymes, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5’ and 3’ untranslated regions (UTRs), transcriptional start sites, transcription terminator sequences, polyadenylation sequences, introns, and premature stop codons.
  • a pre-mature stop codon may be found in an RNA such that it is in-frame with a sequence encoding an RNA of interest (e.g., located within an exon) which results in production of a truncated protein corresponding to the RNA.
  • the pre-mature stop codon can be UAA, UAG, or UGA.
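Whether a stop codon counts as in-frame depends on the reading frame, which the following minimal sketch makes concrete. It scans an RNA string codon by codon from an assumed frame start and reports the first UAA, UAG, or UGA it meets. The function names and both toy sequences are hypothetical, and the model ignores everything except the codon triplets.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # the three stop codons named in the text


def first_in_frame_stop(rna: str, start: int = 0):
    """Return the 0-based index of the first in-frame stop codon, or None.

    `start` is the index of the first base of the reading frame (assumed known).
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return i
    return None


def is_premature(rna: str, start: int = 0) -> bool:
    """True when a stop codon occurs before the final complete codon of the frame."""
    idx = first_in_frame_stop(rna, start)
    if idx is None:
        return False
    last_codon_start = start + 3 * ((len(rna) - start) // 3 - 1)
    return idx < last_codon_start


# Hypothetical toy sequences.
full_length = "AUGGCUGAAUUCUAA"  # AUG GCU GAA UUC UAA: stop only at the end
truncated = "AUGGCUUGAUUCUAA"    # AUG GCU UGA UUC UAA: stop at the third codon

print(is_premature(full_length), is_premature(truncated))
```

The first toy sequence contains the triplet UGA out of frame (spanning the second and third codons), which the scan correctly ignores; only the second sequence carries a premature stop in frame.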
  • the promoter driving expression of polynucleotides of the present disclosure can be, but is not limited to, a constitutive promoter, an inducible promoter, a tissue-specific promoter, or a synthetic promoter.
  • a constitutive promoter maintains constant expression of RNAs regardless of the conditions or physiological state of a host cell.
  • a constitutive promoter can be, but is not limited to, a Herpes Simplex virus (HSV) promoter, a thymidine kinase (TK) promoter, a Rous Sarcoma Virus (RSV) promoter, a Simian Virus 40 (SV40) promoter, a Mouse Mammary Tumor Virus (MMTV) promoter, an Adenovirus E1A promoter, a cytomegalovirus (CMV) promoter (see, e.g., Boshart et al..
  • PGK phosphoglycerol kinase
  • CAG CAG promoter
  • EF1a human elongation factor-1 alpha
  • inducible promoters allow regulation of gene expression and can be regulated by exogenously supplied compounds, environmental factors such as temperature, or the presence of a specific physiological state.
  • an inducible promoter can be, but is not limited to, an IPTG-inducible promoter, a cytochrome P450 gene promoter, a heat shock protein gene promoter, a metallothionein gene promoter, a hormone-inducible gene promoter, an estrogen gene promoter, or a tetVP16 promoter, the zinc-inducible sheep metallothionine (MT) promoter, the dexamethasone (Dex)-inducible mouse mammary tumor virus (MMTV) promoter, the T7 polymerase promoter system (WO 98/10088), the ecdysone insect promoter (No et al., Proc.
  • inducible promoters which may be useful in this context are those which are regulated by a specific physiological state.
  • tissue-specific regulatory sequences bind tissue-specific transcription factors that induce transcription in a tissue specific manner.
  • tissue-specific promoters include, but are not limited to, retinoschisin proximal promoter, interphotoreceptor retinoid-binding protein enhancer (RS/IRBPa), rhodopsin kinase (RK), liver-specific thyroxin binding globulin (TBG) promoter, an trypsin promoter, a glucagon promoter, a somatostatin promoter, a pancreatic polypeptide (PPY) promoter, a synapsin-1 (Syn) promoter, a creatine kinase (MCK) promoter, a mammalian desmin (DES) promoter, an α-myosin heavy chain (α-MHC) promoter, or a cardiac Troponin T (cTnT) promoter.
  • Beta-actin promoter, hepatitis B virus core promoter (Sandig et al., Gene Ther., 3:1002-9 (1996)); alpha-fetoprotein (AFP) promoter (Arbuthnot et al., Hum. Gene Ther., 7:1503-14 (1996)), bone osteocalcin promoter (Stein et al., Mol. Biol. Rep., 24:185-96 (1997)); bone sialoprotein promoter (Chen et al., J. Bone Miner. Res., 11:654-64 (1996)), CD2 promoter (Hansal et al., J. Immunol., 161:1063-8 (1998)); immunoglobulin heavy chain promoter; T cell receptor α-chain promoter, neuronal such as neuron-specific enolase (NSE) promoter (Andersen et al., Cell. Mol. Neurobiol., 13:503-15 (1993)), neurofilament light-chain gene promoter (Piccioli et al., Proc. Natl. Acad. Sci. USA, 88:5611-5 (1991)), and the neuron-specific vgf gene promoter (Piccioli et al., Neuron, 15:373-84 (1995)), among others which will be apparent to the skilled artisan.
  • polynucleotides of the present disclosure are operably linked to a native promoter of a gene which is endogenous to a cell (e.g., a cell comprising a polynucleotide described herein).
  • the native promoter may be preferred when it is desired that expression of the polynucleotide should mimic the native expression of a gene of interest.
  • the native promoter may be used when expression of the polynucleotide must be regulated temporally, developmentally, in a tissue-specific manner, or in response to specific transcriptional stimuli.
  • other native regulatory sequences such as enhancer elements, polyadenylation sites, and/or Kozak consensus sequences may also be used to mimic the native expression.
  • the regulatory sequence driving expression of a polynucleotide is an RNA pol II promoter.
  • the regulatory sequence is an RNA pol III promoter, such as U6 or H1.
  • the regulatory sequence is an RNA pol II promoter.
  • the regulatory sequence is a CMV enhancer (CMVe).
  • the regulatory sequence is a chicken β-actin (CBA) promoter.
  • the regulatory sequence is a CMVe and a CBA promoter.
  • the regulatory sequence is a CAG promoter.
  • regulatory sequences which may be operably linked to a sequence encoding a polynucleotide described herein include a BDNF promoter, an NGF promoter, an EGF promoter, a growth factor promoter, an axon-specific promoter, a dendrite-specific promoter, a brain-specific promoter, a hippocampal-specific promoter, a kidney-specific promoter, an elafin promoter, a cytokine promoter, an interferon promoter, an α1-antitrypsin promoter, a brain cell-specific promoter, a neural cell-specific promoter, a central nervous system cell-specific promoter, a peripheral nervous system cell-specific promoter, an interleukin promoter, a serpin promoter, a hybrid CMV promoter, a hybrid β-actin promoter, an EF1 promoter, a U1a promoter, a U1b promoter, a Tet-inducible promoter, a VP16-LexA promoter, or
  • a polynucleotide comprises a polyadenylation sequence following the sequence encoding the polynucleotide and before any other 3’ regulatory sequence (e.g., a 3’ AAV ITR).
  • a poly(A) signal sequence is inserted following the sequence encoding the polynucleotide and before any other 3’ sequence (e.g., a 3’ AAV ITR), which signals for the polyadenylation of transcribed mRNA molecules.
  • poly(A) signal sequences include, but are not limited to, bovine growth hormone (bGH) poly(A) signal sequence, SV-40 poly(A) signal sequence, and synthetic poly(A) signal sequences, which are known to cause polyadenylation of eukaryotic transgenes and efficient termination of translation (Azzoni A R et al., J Gene Med. 2007; 9(5):392-402).
  • a regulatory sequence that enhances expression of the polynucleotide may further be inserted following the sequence encoding the RNA of interest and before the 3’ AAV ITR and poly(A) signal sequences.
  • exemplary regulatory sequence includes, but is not limited to, a woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE) (Higashimoto T et al., Gene Ther. 2007, 14(17):1298-304).
(iv) Alternatively-spliced exon
  • exon refers to certain nucleotide sequences comprising exon sequences in addition to exon regions which are either retained (e.g., spliced-in), excluded (e.g., spliced-out), or spliced together (such as forming one continuous exon from an exon that was previously split into two non-continuous regions) during post-transcriptional splicing of a pre-mRNA or pri-miRNA.
  • exon is spliced-in or spliced-out may depend on a number of different factors, including, but not limited to one or more cellular conditions, such as the presence or absence of a disease state (e.g., cancer), type of cell (e.g., liver cell versus skeletal cell), other intracellular conditions, or an external engineered factor (e.g., the administration of an agent such as a ligand).
  • a disease state e.g., cancer
  • type of cell e.g., liver cell versus skeletal cell
  • an external engineered factor e.g., the administration of an agent such as a ligand.
  • exon may be used interchangeably with the term “alternatively-spliced exon” or “alternative exon.”
  • Differential splicing events can result in different spliced transcripts (e.g., mRNA isoforms) that either retain or exclude the alternative exon.
  • exons may comprise one or more positive or negative regulatory cis-elements that exert a positive or negative regulatory control on the expression of a coding region of interest (or portions thereof).
  • exons may comprise one or more positive or negative regulatory cis-elements that exert positive or negative regulatory control on the expression of an RNA, such as one encoding a protein (e.g., an mRNA encoding a therapeutic protein) or a miRNA.
  • Exons may be found in nature in a naturally-occurring gene, or may be modified by changing or altering the sequence thereof, including adding or changing the splice site, and/or adding or changing a positive or negative regulatory cis-element (e.g., a ligand-responsive sequence). Such altered exons may be referred to as “recombinant” or “synthetic” exons. “Recombinant” or “synthetic” may in some embodiments include naturally occurring exons that have been placed into a heterologous gene (e.g., an unmodified exon placed into a non-natural context).
  • the cis-elements mediate localization to a specific cellular compartment, such as, for example, an organelle, the cytoskeleton, plasma membrane, the endoplasmic reticulum, the mitochondria, the nucleus, etc.
  • the polynucleotide (e.g., a transgene or a cassette) comprises an alternative exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an alternative exon as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
  • the polynucleotide (e.g., a transgene or a cassette) comprises an alternative exon comprising a nucleic acid sequence of an alternative exon as set forth in any one of SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
  • cassette refers to any set of introns and/or exons (including an alternatively-spliced exon) capable of exhibiting a splicing pattern to produce different spliced transcripts (e.g., mRNA isoforms).
  • the cassette comprises an alternatively-spliced exon.
  • sequence comprising the intronic sequences (or portions thereof) flanking the alternatively-spliced exon may be referred to as an “alternative splicing cassette” or equivalently, “alternatively-spliced exon cassette” or “alternative exon cassette.”
  • When situated in an alternatively-spliced exon cassette, an alternatively-spliced exon may be alternatively referred to as a “cassette exon.”
  • a “cassette,” and in particular, an “alternatively-spliced exon cassette,” may exclude a coding region of interest, but also may be configured to be operatively linked to any coding region of interest such that the alternatively-spliced exon cassette regulates the expression of the coding region of interest.
  • the term “cassette” refers to a set of introns, alternative exon(s) and ligand-responsive aptamer capable of exhibiting a splicing pattern to produce differentially spliced transcript (e.g., miRNA or mRNA isoforms).
  • the term “cassette” refers to a set of introns, alternative exon(s) and ligand-responsive sequence that is not an aptamer (e.g., a ligand-responsive exon) capable of exhibiting a splicing pattern to produce differentially spliced transcript (e.g., miRNA or mRNA isoforms).
  • the terms “cassette,” “expression cassette,” “inducibly-spliced cassette,” “inducibly-spliced exon cassette” or “alternatively-spliced cassette” may be used equivalently or interchangeably.
  • the inducibly-spliced cassettes of the present disclosure can be considered to be “ligand-responsive” as the presence of the ligand-responsive sequence, such as a ligand-responsive exon or a ligand-responsive aptamer, in the cassette induces the splicing of the transgene comprising the cassette.
  • When situated in an inducibly-spliced cassette, an alternative exon may be alternatively referred to as a “cassette exon.”
  • a “cassette,” and in particular, an “inducibly-spliced cassette,” may exclude a coding region of interest, but also may be configured to be operatively linked to any coding region of interest such that the inducibly-spliced exon cassette regulates the expression of the coding region of interest.
  • Such an example would be an inducibly-spliced exon cassette comprising in its non-spliced form a cis-regulatory element that either negatively or positively regulates the expression of a coding sequence to which it is operatively linked.
  • the presence of a ligand which binds to the ligand-responsive sequence (e.g., an aptamer) of the cassette would result in a splicing reaction which would alter the functionality of the cassette acting as a cis-regulatory element, thereby inducibly changing the expression patterns of the coding region to which the cassette is operatively linked.
  • the cassette comprising the introns, alternative exon, and the ligand-responsive sequence may act as a riboswitch by regulating the splicing patterns of the transgene based on the presence or absence of the ligand.
  • a non-functional start codon (e.g., a start codon provided in the two non-continuous regions of an exon or provided in two separate exons)
  • a functional start codon is produced which promotes protein translation of the downstream sequence.
  • the intronic sequences that split the exon are positioned near an alternative exon and a ligand-responsive aptamer which regulates splicing of the inducibly-spliced cassette.
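The split start codon described above can be pictured with a minimal string model. The sketch assumes a hypothetical pre-mRNA in which the AUG is interrupted by an intron ("A", then the intron, then "UG"), so the codon exists only once the intron is removed. All sequences and names are invented for illustration, and the model does not represent how a ligand or aptamer decides whether splicing occurs.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (kind, sequence); kind is "exon" or "intron"

# Hypothetical pre-mRNA: the start codon AUG is split as "A" | intron | "UG".
PRE_MRNA: List[Segment] = [
    ("exon", "GGCCA"),
    ("intron", "GUAAGUCCCUUUCAG"),  # toy intron beginning GU and ending AG
    ("exon", "UGGCUGAAUAA"),
]


def transcript(segments: List[Segment], intron_removed: bool) -> str:
    """Concatenate segments, dropping introns only when splicing occurs."""
    return "".join(
        seq for kind, seq in segments
        if kind == "exon" or not intron_removed
    )


def has_start_codon(rna: str) -> bool:
    """Report whether a contiguous AUG is present anywhere in the RNA."""
    return "AUG" in rna


unspliced = transcript(PRE_MRNA, intron_removed=False)
spliced = transcript(PRE_MRNA, intron_removed=True)
print(has_start_codon(unspliced), has_start_codon(spliced))
```

In this toy case the unspliced transcript has no contiguous AUG, while joining the two exon regions yields AUG GCU GAA UAA, an open reading frame that begins at the reconstituted start codon.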
  • the cassette comprises a premature stop codon which regulates the translation of the transgene and is spliced out only in the presence of a ligand.
  • the inducibly-spliced cassette comprises a miRNA gene which is non-functional in the absence of the ligand and functional only upon splicing of the cassette in the presence of the ligand.
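The premature-stop-codon configuration can be sketched in the same simplified way. The model below assumes a cassette exon that carries an in-frame stop codon and is skipped when the ligand is present, which is one reading of "spliced out only in the presence of a ligand". The exon sequences, the function names, and the boolean ligand switch are hypothetical, and introns are left out because only the mature mRNA is modelled.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Hypothetical exon sequences; introns are omitted because only the mature
# mRNA is modelled here.
EXON_1 = "AUGGCU"         # AUG GCU
CASSETTE_EXON = "UGAGGC"  # UGA GGC: carries an in-frame premature stop
EXON_2 = "GAAUUCUAA"      # GAA UUC UAA: normal stop at the end


def mature_mrna(ligand_present: bool) -> str:
    """Assumed behaviour: the ligand causes the cassette exon to be skipped."""
    if ligand_present:
        exons = [EXON_1, EXON_2]
    else:
        exons = [EXON_1, CASSETTE_EXON, EXON_2]
    return "".join(exons)


def codons_before_stop(mrna: str) -> int:
    """Count sense codons from the first base up to the first in-frame stop."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


for ligand in (False, True):
    mrna = mature_mrna(ligand)
    print(ligand, mrna, codons_before_stop(mrna))
```

Without the ligand, translation in this toy model would stop after two codons; with the ligand, the cassette exon is absent and all four sense codons precede the normal stop.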
  • the cassette is inserted without making any changes to the sequence flanking the insertion site (e.g., at a genomic site in a host cell).
  • one or more nucleotide sequence changes are made in one or both flanking regions (e.g., at the positions immediately flanking the site of insertion).
  • the one or more nucleotide changes render either or both flanking sequences more compatible with splicing.
  • the one or more nucleotide changes result in either or both flanking sequences becoming effective 3’ and/or 5’ splice sites.
  • the one or more nucleotide changes include introducing one or more sequences that support an effective dynamic range between alternative splicing events of a ligand-induced alternatively spliced exon described in this application. In some embodiments, the one or more nucleotide changes include introducing one or more flanking sequences described in this application.
  • the cassette comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the cassette comprises a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the cassette comprises an exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an exon as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the cassette comprises at least two exons comprising a nucleic acid sequence of an exon as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the cassette comprises at least two exons having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to the nucleic acid sequences of two exons as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the cassette comprises at least two exons comprising the nucleic acid sequences of two exons as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
  • the cassette comprises an alternative exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an alternative exon as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, or 2137, 2236, or 2247-2256.
  • the cassette comprises an alternative exon comprising a nucleic acid sequence of an alternative exon as set forth in any one of SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, or 2137, 2236, or 2247-2256.
  • the cassette comprises an intron having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an intron as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the cassette comprises an intron comprising a nucleic acid sequence of an intron as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the cassette comprises at least two introns having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to the nucleic acid sequences of two introns as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the cassette comprises at least two introns comprising a nucleic acid sequence of two introns as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
  • the cassette comprises at least one 3' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 3’ splice site as set forth in SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
  • the cassette comprises at least one 3' splice site comprising a nucleic acid sequence of a 3’ splice site as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
  • the cassette comprises at least one 5' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 5’ splice site as set forth in Tables 7, 25, 26, or 34.
  • the cassette comprises at least one 5' splice site comprising a nucleic acid sequence of a 5’ splice site as set forth in any one of Tables 7, 25, 26, or 34.
  • the cassette comprises at least one ligand-responsive sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NO: 2086, 2095, or 2112.
  • the cassette comprises at least one ligand-responsive sequence comprising a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NO: 2086, 2095, or 2112.
  • the cassette comprises at least one ligand-responsive aptamer having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260.
  • the cassette comprises at least one ligand-responsive aptamer comprising a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260.
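The sequence-identity thresholds recited above (70% through 99%) can be illustrated with a minimal sketch. It assumes the two sequences are already aligned and of equal length, which is a simplification: identity is normally computed over an alignment that may contain gaps. The function names `percent_identity` and `thresholds_met` are hypothetical and are not part of the disclosure.

```python
# Thresholds recited in the cassette embodiments above.
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(query: str, reference: str) -> float:
    """Ungapped percent identity between two pre-aligned, equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    if not reference:
        raise ValueError("sequences must be non-empty")
    # Case-insensitive position-by-position comparison.
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def thresholds_met(query: str, reference: str) -> list:
    """Return every recited threshold that the query satisfies."""
    identity = percent_identity(query, reference)
    return [t for t in THRESHOLDS if identity >= t]
```

For sequences of different lengths, a pairwise alignment step would be needed before counting matches; that step is deliberately left out of this sketch.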
  • polynucleotides of the present disclosure comprise a ligand-responsive sequence.
  • a “ligand-responsive sequence” refers to a polynucleotide (e.g., an RNA sequence found in a pre-mRNA) having a sequence capable of binding a ligand.
  • a ligand-responsive sequence binds a ligand to regulate alternative splicing of an RNA comprising said ligand-responsive sequence.
  • binding of a ligand to a ligand-responsive sequence induces a specific combination of 5’ and 3’ splice sites to be used during splicing.
  • polynucleotides that comprise ligand-responsive sequences have a balance of splice strengths, such that addition of a ligand sufficiently changes the way that the spliceosome recognizes the sequences involved in regulating splicing.
  • a ligand-responsive sequence comprises approximately 2-200 nucleotides in length. In some embodiments, a ligand-responsive sequence comprises approximately 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-20, 20-30, 30-40, 40-50, 50-60, 70-80, 80-90, 90-100, 100-125, 125-150, 150-175, 175-200, 200-250, or 250-300 nucleotides in length. However, in other embodiments, a ligand-responsive sequence may be greater than 300 nucleotides.
  • a ligand-responsive sequence is generated by modifying a natural exon, natural intron, and/or natural splice site. In some embodiments, such modifications comprise substituting, deleting, and/or inserting one or more nucleotides (e.g., nucleotides in a sequence known to bind to ligands) to enhance ligand binding. In some embodiments, a ligand-responsive sequence is generated by completely replacing an exon, intron, and/or splice site with a sequence not naturally found in the gene. In some embodiments, a ligand-responsive sequence, or a portion thereof, is found in an exon.
  • a ligand-responsive sequence, or a portion thereof, is found in an intron. In some embodiments, a ligand-responsive sequence, or a portion thereof, is found in a 5’ splice site. In some embodiments, a ligand-responsive sequence, or a portion thereof, is found in a 3’ splice site. In some embodiments, a ligand-responsive sequence, or a portion thereof, is placed in an intron downstream of a 5’ splice site. In some embodiments, a ligand-responsive sequence, or a portion thereof, is placed in an intron upstream of a 3’ splice site.
  • a ligand-responsive sequence spans an exon-intron boundary (e.g., a first portion of the ligand-responsive sequence is found in an exon and a separate portion thereof is found in an adjacent intron).
  • a first portion of a ligand-responsive sequence comprises approximately 1-10, 10-20, 20-30, 30-40, 40-50, or more nucleotides and may be located in an exon which is located immediately 5’ of an intron comprising a second portion of the ligand-responsive sequence comprising approximately 1-10, 10-20, 20-30, 30-40, 40-50 or more nucleotides.
  • a ligand-responsive sequence, or a portion thereof, is found 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, or more nucleotides upstream or downstream from a 5’ splice site. In some embodiments, a ligand-responsive sequence, or a portion thereof, is found 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, or more nucleotides upstream or downstream from a 3’ splice site. In some embodiments, the ligand-responsive sequence, or a portion thereof, is found in an intron, an exon (e.g., an alternative exon), and/or a splice site.
  • a polynucleotide may comprise at least one ligand-responsive sequence. In some embodiments, a polynucleotide may comprise a plurality (e.g., 2, 3, 4, or more) of ligand-responsive sequences. In some embodiments, a polynucleotide comprising a plurality of ligand-responsive sequences may be responsive to more than one ligand.
  • Non-limiting examples of ligand responsive sequences are found in Examples 7 and 10 (e.g., SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260, and those in Table 34).
  • a ligand-responsive sequence encodes an RNA sequence that is capable of binding RNA.
  • such a ligand may be capable of binding an RNA (e.g., a sequence in a splice site, exon, intron, and/or aptamer, or a sequence wherein distinct portions thereof are found in a splice site, exon, and/or intron, such as sequences found in pre-mRNA) and/or one or more components of the spliceosome.
  • the binding affinity may be characterized by a dissociation constant in the micromolar, nanomolar, or femtomolar range.
  • ligand-responsive sequences bind biomolecules such as, but not limited to, proteins, peptides, carbohydrates, lipids, nucleic acids, and combinations thereof such as glycoproteins or lipidated proteins.
  • the ligand is a small molecule (e.g., a drug molecule).
  • the ligand is a nucleic acid (e.g., an antisense oligonucleotide, such as an exon-skipper).
  • a ligand-responsive sequence comprises affinity to a non-toxic ligand.
  • a ligand will be tolerable (e.g., does not result in cytotoxicity) across a broad range of concentrations that are sufficient to regulate alternative splicing.
  • a ligand-responsive sequence comprises binding affinity to a ligand that is cell permeable.
  • a ligand-responsive sequence comprises binding affinity to a ligand that is expressed in a cell (e.g., an ASO encoded by a nucleic acid in the cell). In some embodiments, for example, a ligand-responsive sequence may comprise an exon from a gene that exhibits alternative splicing in the presence of an exon-skipping ASO (e.g., an exon 7 from a SMN2 gene further comprising flanking introns which may be derived from the SMN2 gene or an alternatively spliced exon from a dystrophin gene).
  • a ligand-responsive sequence comprises a sequence with binding affinity to a specific ligand when it is expressed as an RNA and adopts a three-dimensional conformation that specifically binds the ligand. In some embodiments, such sequences facilitate alternative splicing of polynucleotides as described herein.
  • the ligand-responsive sequence binds risdiplam.
  • a sequence capable of binding risdiplam comprises WGAGTAAGW, wherein W is A or T.
  • the ligand-responsive sequence binds branaplam.
  • a sequence capable of binding branaplam comprises ATTTAACATTTTTGAGTCAATCCAAGTAATGCAGGAGGTTCATGATTGTGTAGA (SEQ ID NO: 2187).
  • the ligand-responsive sequence binds tetracycline.
  • a sequence capable of binding tetracycline comprises TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2188), wherein W is A or T, wherein D is A, G, or T, wherein M is A or C, wherein K is G or T, and wherein H is A, C, or T.
  • a sequence capable of binding tetracycline comprises TAAAACATACCAYMCGKAAMCGKMTGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2189), wherein Y is C or T, M is A or C, and K is G or T.
  • ligand-responsive sequences are aptamers.
  • “aptamer” or “ligand-responsive aptamer”, as used herein, refers to an oligonucleotide (e.g., single-stranded DNA (ssDNA) or single-stranded RNA (ssRNA)) that can specifically bind to a ligand.
  • ligand-responsive aptamer may be used to describe an aptamer which changes its structural conformation as a result of binding to a ligand.
  • whether the aptamer is spliced out or retained in the transgene is not a direct result of binding to the ligand. In some embodiments, it is dependent on the splice site strength which is regulated by the aptamer (and the location of the aptamer relative to the intron and/or exon sequences that are either spliced out or retained).
  • An aptamer binds to its target with high affinity, selectivity, and specificity (see, e.g., Keefe et al., Aptamers as therapeutics. Nat. Rev. Drug Discov. 2010;9:537-550; Jayasena S.D. Aptamers: An emerging class of molecules that rival antibodies in diagnostics. Clin. Chem. 1999;45:1628-1650).
  • Aptamer binding is determined by its tertiary structure. Target recognition and binding of an aptamer involves three-dimensional, shape-dependent interactions as well as hydrophobic interactions, base-stacking, and intercalation.
  • a ligand refers to a target molecule to which a separate molecule (e.g., an aptamer) binds with specific chemical affinity.
  • ligands of aptamers include biomolecules such as, but not limited to, proteins, peptides, carbohydrates, lipids, nucleic acids, and combinations thereof such as glycoproteins or lipidated proteins.
  • a target molecule of an aptamer is a small molecule or a toxin.
  • aptamers bind cells (e.g., live cells).
  • an aptamer binds to drug molecules, such as tetracycline, branaplam, or risdiplam.
  • aptamers are screened for their ability to bind to ligands through various methods known in the art such as SELEX (see, e.g., Ruscito & DeRosa. Small-Molecule Binding Aptamers: Selection Strategies, Characterization, and Applications. Front. Chem. 2016:4; 1.).
  • an aptamer is said to be “ligand-responsive” if it binds to a ligand target molecule.
  • the terms “aptamer” and “ligand-responsive aptamer” can be used interchangeably.
  • Responding to a ligand may entail a conformational change in the ligand-responsive aptamer thereby altering the 3-D shape of the aptamer upon assuming its
  • an aptamer comprises between 20 and 60 nucleotides, between 25 and 55 nucleotides, between 30 and 50 nucleotides, between 35 and 45 nucleotides, between 20 and 50 nucleotides, between 20 and 40 nucleotides, between 25 and 40 nucleotides, between 20 and 30 nucleotides, between 30 and 40 nucleotides, between 30 and 60 nucleotides, between 40 and 60 nucleotides, or between 50 and 60 nucleotides.
  • aptamers may comprise more than 60 nucleotides (e.g., approximately 80, 100, 120, 140, etc).
  • an aptamer comprises a first stem region and a second stem region.
  • ligand-responsive aptamer stem length influences the sensitivity of an RNA to ligand binding to effect splicing.
  • a stem region comprises at least two nucleotides.
  • a stem region comprises 1-5 nucleotides.
  • a stem region may comprise approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides.
  • a stem region comprises more than 15 nucleotides.
  • the first stem region and the second stem region are the same length.
  • the first stem region and the second stem region are different lengths.
  • the first stem region and the second stem region differ in length by approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides.
  • an aptamer comprises a loop region.
  • a loop region may comprise 1-10, 10-20, 20-30, 30-40, 40-50, or more nucleotides (e.g., 50-75, 75-100, etc.).
  • an aptamer comprises a plurality of stems and loops.
  • an aptamer may comprise 2, 3, 4, 5, or more loops each associated with their own respective first stem region and second stem region.
  • an aptamer regulates the activity of 3’ and 5' splice sites.
  • splice sites may be part of the aptamer structure (e.g., to influence its 3D conformation to effect splicing).
  • splice sites are not found in an aptamer sequence but are located within 1-5, 5-10, 10-15, 15-20, or 20-30 nucleotides of a sequence capable of binding a ligand.
  • a first stem region is located downstream of a 3’ splice site. In some embodiments, a first stem region is located upstream of a 3’ splice site. In some embodiments, a sequence that is not a stem region comprising approximately 1-10, 10-20, 20-30, or more nucleotides is found downstream of the 3’ splice site and first stem region and upstream of the remaining aptamer sequence. In some embodiments, a stem region is located upstream of a 5’ splice site. In some embodiments, a stem region is located downstream of a 5’ splice site. In some embodiments, a sequence that is not a stem region approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides long is found between the 5’ splice site and the second stem region.
  • a polynucleotide comprising a ligand-responsive aptamer comprises a general structure of: [upstream 3’ splice site]-[first stem region]-[5’ splice site reverse complementary sequence]-[Ligand-Binding Sequence]-[5’ splice site]-[second stem region].
  • a polynucleotide comprising a ligand-responsive aptamer comprises a general structure of: [upstream 3’ splice site]-[first stem region]-[5’ splice site reverse complementary sequence]-[Ligand-Binding Sequence]-[5’ splice site]-[sequence comprising at least 2 nucleotides]-[second stem region].
  • said sequences may be flanked by one or more introns and/or exons.
  • a polynucleotide comprising a ligand-responsive aptamer comprises a general structure of: [EXON]-[INTRON]-[upstream 3' splice site]-[first stem region]-[5’ splice site reverse complementary sequence]-[Ligand-Binding Sequence]-[5’ splice site]-[sequence comprising at least 2 nucleotides]-[second stem region]-[INTRON]-[downstream 3' splice site]-[EXON].
  • an aptamer binds risdiplam.
  • the aptamer comprises WGAGTAAGW (SEQ ID NO: 2261), wherein W is A or T.
  • an aptamer binds branaplam.
  • the aptamer comprises ATTTAACATTTTTGAGTCAATCCAAGTAATGCAGGAGGTTCATGATTGTGTAGA (SEQ ID NO: 2187).
  • an aptamer binds tetracycline.
  • the aptamer comprises TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2188), wherein W is A or T, wherein D is A, G, or T, wherein M is A or C, wherein K is G or T, and wherein H is A, C, or T.
  • the aptamer comprises TAAAACATACCAYMCGKAAMCGKMTGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2189), wherein Y is C or T, M is A or C, and K is G or T.
  • a transgene comprises at least one ligand-responsive aptamer that comprises a sequence that is part of a 5' splice site comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 5’ splice site as set forth in SEQ ID NOs: 2086, 2095, 2138, 2188-2189, 2212-2220, or 2236-2239.
  • the transgene comprises at least one ligand-responsive aptamer comprising a polynucleotide comprising a nucleic acid sequence as set forth in SEQ ID NOs: 2086, 2095, 2138, 2188-2189, 2212-2220, or 2236-2239.
  • a ligand-responsive aptamer is provided in a polynucleotide wherein the aptamer sequence is flanked by non-aptamer nucleic acid sequences (e.g., exons and/or introns).
  • the aptamer is provided in an intron of the transgene.
  • the aptamer is provided in an alternative exon of the transgene.
  • the aptamer spans an intron-exon boundary of the transgene.
  • the aptamer, upon binding to a ligand, alters its 3D conformation thereby conveying ligand-dependent regulatory effects on the polynucleotide within which it is provided.
  • binding to a ligand enables the aptamer to regulate the alternative splicing of a polynucleotide within which it is provided.
  • the presence of a ligand increases the translation of an mRNA comprising the ligand-responsive aptamer.
  • the presence of a ligand decreases the translation of an mRNA comprising the ligand-responsive aptamer.
  • the presence of a ligand may enhance the expression of a particular isoform of an mRNA sequence or protein upon binding its cognate ligand-responsive aptamer.
  • the presence of a ligand forces the aptamer to be spliced out of the transgene thereby forming a functional RNA product such as a miRNA.
  • the ligand-responsive aptamer is present in the intron and, therefore, is spliced out of the transgene regardless of the presence or absence of ligand. In such an embodiment, ligand addition also results in splicing out the alternative exon of the transgene.
  • splicing out an aptamer from the transgene as a result of ligand addition causes two regions of a non-continuous exon to be spliced together forming a continuous exon sequence.
  • the ligand-responsive aptamer is alternatively-spliced out of the transgene wherein the presence of the ligand results in the formation of a transgene which does not comprise the ligand-responsive aptamer.
  • the transgene comprises at least one ligand-responsive aptamer comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2086, 2095, 2138, 2188-2189, 2212-2220, or 2236-2239.
  • the transgene comprises at least one ligand-responsive aptamer comprising a polynucleotide comprising a nucleic acid sequence as set forth in SEQ ID NOs: 2086, 2095, 2138, 2188-2189, 2212-2220, or 2236-2239.
  • a ligand-responsive sequence is a risdiplam-responsive sequence.
  • risdiplam enhances recognition of 5’ splice sites (e.g., suboptimal or weak 5’ splice sites) by a component of the spliceosome (e.g., the U1 snRNP). In some embodiments, risdiplam enhances pre-mRNA interactions with the U1 snRNP at a 5’ splice site. In some embodiments, risdiplam interacts with exon sequence upstream of a 5’ splice site to either preclude interaction with splicing silencers or recruit splicing enhancers. Accordingly, in some embodiments, a risdiplam-responsive sequence occurs in an alternative exon at.
  • binding of risdiplam to a risdiplam-responsive sequence will lead to intron exclusion.
  • when the RNA comprises one intron flanked by exons, the presence of risdiplam results in intron removal. A non-limiting example of such embodiments is shown in FIG. 43.
  • binding of risdiplam to a risdiplam-responsive sequence will lead to alternative exon inclusion.
  • when the RNA comprises two introns flanking one or more alternative exons, the presence of risdiplam results in inclusion of the one or more alternative exons. A non-limiting example of such embodiments is shown in FIG. 47A.
  • a risdiplam-responsive sequence comprises a sequence of WGA wherein W corresponds to T or A. In some embodiments, a risdiplam-responsive sequence comprises a sequence of GTAAGW wherein W corresponds to T or A. In some embodiments, a risdiplam-responsive sequence is in an exon-intron boundary with a sequence comprising WGA
  • a risdiplam-responsive sequence comprises AGGAAG which is 5’ of the sequence AWGAgtaagw (SEQ ID NO: 2190), wherein W is A or T.
  • the AGGAAG is preceded by any 5’ sequence and followed by any 3’ sequence.
  • the 5’ sequence preceding the AGGAAG can be 1-5, 5-10, 10-15, 15-20, or more nucleotides in length.
  • the 3’ sequence following the AGGAAG can be 1-5, 5-10, 10-15, 15-20, or more nucleotides in length.
  • the 5’ sequence comprises ATAATTTTTT (SEQ ID NO: 2191), CACTTTTATT (SEQ ID NO: 2192), CATTATAATC (SEQ ID NO: 2193), CCATAAGTTT (SEQ ID NO: 2194), TACTATTTAT (SEQ ID NO: 2195), TCATATCTAT (SEQ ID NO: 2196), or TTAGTATCGT (SEQ ID NO: 2197).
  • the 3’ sequence comprises GTTACGCTTT (SEQ ID NO: 2198), TTGTGTTGTT (SEQ ID NO: 2199), TTAGTGTGTT (SEQ ID NO: 2200), TGATGTATAT (SEQ ID NO: 2201), TTTATCTATC (SEQ ID NO: 2202), TTTTTTACAG (SEQ ID NO: 2203), or CTATTAGTTA (SEQ ID NO: 2204).
  • a risdiplam-responsive sequence comprises the general structure: NNNNNNNNAGGAAGNNNNNNNNNNAWGAgtaagw (SEQ ID NO: 2183), wherein N is any nucleotide and W is A or T.
  • a risdiplam-responsive sequence comprises the general structure: NNNNNNNNAGGAAGNNNNNNh ⁇ T ⁇ n ⁇ LAWGAgtaagw ? (SEQ ID NO: 2205), wherein N is any nucleotide and W is A or T.
  • a risdiplam-responsive sequence comprises the general structure YWWKWWWMKYAGGAAGYTAKT(R)WGTTAWGAgtaagw (SEQ ID NO: 2206), wherein Y is C or T, K is G or T, W is A or T, M is A or C, R is A or G, and (R) is optionally present.
  • a risdiplam-responsive sequence comprises CATTATAATCAGGAAGTTAGTGTGTTAAGAgtaagt (SEQ ID NO: 2207). In some embodiments, a risdiplam-responsive sequence comprises TTAGTATCGTAGGAAGCTATTAGTTAATGgtaagt (SEQ ID NO: 2208). In some embodiments, a risdiplam-responsive sequence comprises ATRTCCACTYAAAAAAATCTGGCGATGGGAGCAGAAWGAgtaagw (SEQ ID NO: 2186), wherein R is A or G, Y is C or T, and W is A or T.
  • a risdiplam-responsive sequence comprises ATGTCCACTTAAAAAAATCTGGCGATGGGAGCAGAAAGAgtaagt (SEQ ID NO: 2209), ATGTCCACTCAAAAAAATCTGGCGATGGGAGCAGAAAGAgtaagt (SEQ ID NO: 2210), or ATATCCACTTAAAAAAATCTGGCGATGGGAGCAGAAAGAgtaagt (SEQ ID NO: 2211).
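The modular layout described above, a 5’ sequence, then AGGAAG, then a 3’ sequence, then AWGAgtaagw, can be sketched as a simple concatenation. The sketch below is illustrative only: the names `CORE_MOTIF` and `assemble_risdiplam_sequence` are hypothetical, and the reading that the lowercase gtaagw portion marks the intronic side of the exon-intron boundary is an assumption based on the notation used in the sequences.

```python
CORE_MOTIF = "AGGAAG"  # motif recited 5' of AWGAgtaagw


def assemble_risdiplam_sequence(five_prime: str, three_prime: str,
                                w_upper: str, w_lower: str) -> str:
    """Concatenate [5' sequence]-AGGAAG-[3' sequence]-AWGA-gtaagw.

    w_upper and w_lower are the choices for the two W positions (A or T).
    """
    for w in (w_upper, w_lower):
        if w.upper() not in ("A", "T"):
            raise ValueError("W must be A or T")
    upper_part = five_prime + CORE_MOTIF + three_prime + "A" + w_upper.upper() + "GA"
    lower_part = "gtaag" + w_lower.lower()  # written in lowercase, as in the source
    return upper_part + lower_part


# Using the 5' sequence of SEQ ID NO: 2193 and the 3' sequence of SEQ ID NO: 2200
# reproduces the sequence recited as SEQ ID NO: 2207.
example = assemble_risdiplam_sequence("CATTATAATC", "TTAGTGTGTT", "A", "T")
```

Other combinations of the recited 5’ and 3’ sequences can be generated the same way; whether a given combination is functionally risdiplam-responsive is an experimental question this sketch does not address.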
  • a risdiplam-responsive sequence comprises a sequence in Variant 3 or Variant 7 (see, e.g., Example 10). In some embodiments, a risdiplam-responsive sequence comprises a sequence in a variant of exon 11b (E11B) of a POMT2 gene (see, e.g., Example 10). In some embodiments, a risdiplam-responsive sequence comprises an A:C mutation at the +10 position in the intron downstream of POMT2 E11B. In some embodiments, a risdiplam-responsive sequence comprises a sequence in YZ312, YZ316, YZ317, or a variant thereof (see, e.g., Example 10).
  • a ligand-responsive sequence is a branaplam-responsive sequence.
  • a branaplam-responsive sequence binds to a ligand to promote alternative exon inclusion.
  • a branaplam-responsive sequence binds to a ligand to promote alternative exon exclusion.
  • branaplam enhances exon inclusion via recognition of sequences near the 5’ splice site of an alternative exon.
  • branaplam regulates interaction (e.g., directly or indirectly) between a 5’ splice site and a spliceosome component (e.g., the U1 snRNP).
  • a branaplam-responsive sequence comprises a sequence in YZ231 or YZ232 (see, e.g., Example 10). In some embodiments, a branaplam-responsive sequence comprises a sequence in YZ301 (see, e.g., Example 10). In some embodiments, for example, when the RNA comprises one intron flanked by at least two exons, the presence of branaplam results in intron removal. In other embodiments, for example, when the RNA comprises two introns flanking one or more alternative exons, the presence of branaplam results in inclusion of the one or more alternative exons. In some embodiments, a branaplam-responsive sequence comprises ATTTAACATTTTTGAGTCAATCCAAGTAATGCAGGAGGTTCATGATTGTGTAGA (SEQ ID NO: 2187).
  • a ligand-responsive sequence is a tetracycline-responsive sequence.
  • a tetracycline-responsive sequence comprises TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2188), wherein W is A or T, wherein D is A, G, or T, wherein M is A or C, wherein K is G or T, and wherein H is A, C, or T.
  • a tetracycline-responsive sequence comprises TAAAACATACCAYMCGKAAMCGKMTGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2189), wherein Y is C or T, M is A or C, and K is G or T.
  • a tetracycline-responsive sequence comprises TAAAACATACCTACCGTAACCGGTAGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2212), TAAAACATACCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2095), or TAAAACATACCAGACGGAAACGTCTGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2086).
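The degenerate sequences above use single-letter ambiguity codes (W, D, M, K, H, Y) as defined in the text. A minimal sketch of how such a consensus could be checked against a concrete sequence is shown below; it converts the consensus into a regular expression. The helper names `consensus_to_regex` and `matches_consensus` are hypothetical, and only the codes defined in this document (plus R and N, which appear elsewhere in it) are included.

```python
import re

# Ambiguity codes as defined in the text.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "D": "[AGT]", "M": "[AC]", "K": "[GT]",
    "H": "[ACT]", "Y": "[CT]", "R": "[AG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a degenerate consensus into a compiled regular expression."""
    return re.compile("".join(CODES[base] for base in consensus.upper()))


def matches_consensus(consensus: str, sequence: str) -> bool:
    """True if the whole sequence is one instance of the consensus."""
    return consensus_to_regex(consensus).fullmatch(sequence.upper()) is not None


SEQ_2188 = "TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA"
SEQ_2189 = "TAAAACATACCAYMCGKAAMCGKMTGGAGAGGTGAAGAATACGACCACCTA"
SEQ_2212 = "TAAAACATACCTACCGTAACCGGTAGGAGAGGTGAAGAATACGACCACCTA"
SEQ_2095 = "TAAAACATACCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTA"
SEQ_2086 = "TAAAACATACCAGACGGAAACGTCTGGAGAGGTGAAGAATACGACCACCTA"

# Each concrete sequence recited above is an instance of the SEQ ID NO: 2188 consensus.
all_match_2188 = all(matches_consensus(SEQ_2188, s)
                     for s in (SEQ_2212, SEQ_2095, SEQ_2086))
```

The same helper applies to the shorter risdiplam motif WGAGTAAGW; to search for a motif inside a longer sequence, `search` could be used on the compiled pattern instead of `fullmatch`.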
  • a tetracycline-responsive sequence is an aptamer.
  • a tetracycline-responsive aptamer is found in a sequence comprising the general structure of: [upstream 3’ splice site]-[first stem region]-[5’ splice site reverse complementary sequence]-[Tetracycline-Binding Sequence]-[5’ splice site]gt[second stem region].
  • the upstream 3’ splice site and downstream 3’ splice site are at least 20 nucleotides long, wherein the last two nucleotides are AG.
  • the upstream 3’ splice site comprises nnnnnnnnnnnnnnnag wherein n is any nucleotide.
  • the first stem region and the second stem region are at least two nucleotides long.
  • the first stem region comprises the sequence NN and the second stem region comprises the sequence nn, wherein N/n is any nucleotide.
  • the 5' splice site reverse complementary sequence and the 5' splice site are at least 7 nucleotides long.
  • the 5' splice site reverse complementary sequence comprises NNNNNNN and the 5' splice site comprises NNNnnn, wherein N/n is any nucleotide.
  • a tetracycline-responsive aptamer is found in a sequence comprising the general structure of: [EXON]-[INTRON]-[upstream 3' splice site]-[first stem region]-[5' splice site reverse complementary sequence]-[Tetracycline-Binding Sequence]-[5' splice site]gt[second stem region]-[INTRON]-[downstream 3' splice site]-[EXON].
  • the upstream 3' splice site may comprise the sequence TCCTCATIGCCTCTCCTT (SEQ ID NO: 2213), TTTCCAACTTATTTCCCT (SEQ ID NO: 2214), CTTACTTTGTATTCCCAT (SEQ ID NO: 2215), AATCTTTATCTCTATTTC (SEQ ID NO: 2216), TGCXICTATCTTACCTTAT (SEQ ID NO: 2217), TGCACTTTCATTCATTTT (SEQ ID NO: 2218), CCACCTTTTTTTATTTTC (SEQ ID NO: 2219), or CCCCCATTTGTCT TCCC X (SEQ ID NO: 2220).
  • the upstream 5' splice site reverse complementary sequence may comprise the reverse complement of CAGGTAA, AACGTAA, CAGGTAC, CCGGTAC, ATCGTAA, GCGGTAC, GAGGTAC, ACGGTAG, CAAGTAA, GAGGTGA, CGCGI AA, GTCGTAA, GAGGTAT, AAGGTAT, TTCGTAA, CCGGTGC, GAGGTAG, CTCGTAA, CTGGTAC, AACGTGA, GCGGTAT, CCGGTAG, or C ACC d TGA.
  • variable sequence for stem region NN and the variable sequence for stem region nn may comprise CA and ac, CC and ac, AC and ac, AC and cc, or AC and ct.
  • the downstream variable region for 3' splice site nnnnnnnnnnnnnnnn may comprise the sequence tttcttttttttcag (SEQ ID NO: 2237), ttcttattctccctttcag (SEQ ID NO: 2238), or ttcttcttctctacctttcag (SEQ ID NO: 2239).
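The general structure recited above, [upstream 3' splice site]-[first stem region]-[5' splice site reverse complementary sequence]-[Tetracycline-Binding Sequence]-[5' splice site]gt[second stem region], can be sketched as an ordered concatenation in which one element is derived from another by reverse complementation. This is a rough illustration under stated assumptions: the function names are hypothetical, the "gt" is treated as a fixed two-nucleotide spacer, and the chosen parts (SEQ ID NO: 2214, CAGGTAA, the CA/ac stem pair, SEQ ID NO: 2095) are simply one combination of elements listed in the text, not a construct the source states was built.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_aptamer_region(upstream_3ss: str, first_stem: str, five_ss: str,
                         binding_seq: str, second_stem: str,
                         spacer: str = "gt") -> str:
    """Join the elements in the order given by the general structure."""
    parts = [
        upstream_3ss,
        first_stem,
        reverse_complement(five_ss),  # pairs with the 5' splice site further downstream
        binding_seq,
        five_ss,
        spacer,
        second_stem,
    ]
    return "".join(parts)


UPSTREAM_3SS = "TTTCCAACTTATTTCCCT"  # SEQ ID NO: 2214
TET_BINDING = "TAAAACATACCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTA"  # SEQ ID NO: 2095

region = build_aptamer_region(UPSTREAM_3SS, "CA", "CAGGTAA", TET_BINDING, "ac")
```

Placing the reverse complement of the 5' splice site upstream of the binding sequence means the two can base-pair across the aptamer, which is consistent with the text's description of splice sites forming part of the aptamer structure.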
  • a tetracycline-responsive aptamer comprises a sequence in YZ150 or a variant thereof (see, e.g., Example 10).
  • a tetracycline-responsive aptamer comprises the sequence of SEQ ID NOs: 2086, 2095, 2112 or 2188.
  • polynucleotides of the present disclosure comprise a sequence encoding an RNA (e.g., an RNA comprising the sequence of an RNA of interest).
  • RNA of interest refers to a functional RNA (e.g., an mRNA that can encode a full-length protein, such as a therapeutic protein, or an interfering RNA that can bind to a target transcript).
  • the RNA of interest is functional when present in what is referred to herein as a “first RNA”. In other embodiments, the RNA of interest is functional when present in what is referred to herein as a “second RNA”. In some embodiments, the RNA of interest is functional in either form corresponding to the “first RNA” and the “second RNA”, wherein the first RNA and second RNA encode different isoforms of the RNA of interest. In some embodiments, an RNA of interest corresponds to any gene or protein sequence described herein (see, e.g., Examples 1-10).
  • a sequence encoding an RNA of interest comprises at least 1-5000 nucleotides in length. In some embodiments, a sequence encoding an RNA of interest is approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9, 1-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1,000, 1,000-1,100, 1,100-1,200, 1,200-1,300, 1,300-1,400, 1,400-1,500, 1,500-1,600, 1,600-1,700, 1,700-1,800, 1,800-1,900, 1,900-2,000, 2,000-2,100, 2,100-2,200, 2,200-2,300, 2,300-2,400, 2,400-2,500, 2,500-2,600, 2,600-2,700, 2,700-2, 2,700-2
  • a sequence encoding an RNA of interest is approximately 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, 1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 1700-1800, 1800-1900, 1900-2000, 2000-2100, 2100-2200, 2200-2300, 2300-2400, 2400-2500, 2500-2600, 2600-2700, 2700-2800, 2800-2900, or 2900-3000 nucleotides long.
  • a polynucleotide comprises two or more sequences encoding distinct portions of an RNA of interest. In some instances, said sequences may be referred to as a “first sequence”, a “second sequence”, or a “third sequence”. In some embodiments, a portion of an RNA of interest comprises at least 1-5000 nucleotides in length.
  • a portion of an RNA of interest comprises approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9, 1-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1,000, 1,000-1,100, 1,100-1,200, 1,200-1,300, 1,300-1,400, 1,400-1,500, 1,500-1,600, 1,600-1,700, 1,700-1,800, 1,800-1,900, 1,900-2,000, 2,000-2,100, 2,100-2,200, 2,200-2,300, 2,300-2,400, 2,400-2,500,
  • a portion of an RNA of interest comprises approximately 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, 1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 1700-1800, 1800-1900, 1900-2000, 2000-2100, 2100-2200, 2200-2300, 2300-2400, 2400-2500, 2500-2600, 2600-2700, 2700-2800, 2800-2900, or 2900-3000 nucleotides long.
  • the RNA of interest may be split into two or more portions each comprising 1-10, hundreds, or thousands of nucleotides in length.
  • the polynucleotide may encode an RNA of interest that is 4000 nucleotides in length, wherein the first sequence comprises 1000 nucleotides of the RNA of interest (e.g., the 5’-most 1000 nucleotides) and the third sequence may comprise 3000 nucleotides of the RNA of interest (e.g., the 3’-most 3000 nucleotides).
  • the first sequence, the second sequence, and/or the third sequence comprises at least one exon. In some embodiments, the second sequence comprises at least one alternative exon. In some embodiments, the first sequence, the second sequence, and/or the third sequence comprises at least one intron. In some embodiments, the first sequence, the second sequence, and/or the third sequence comprises at least one splice site. In some embodiments, the first sequence, the second sequence, and/or the third sequence comprises a ligand-responsive sequence. In some embodiments, the second sequence, and/or the third sequence comprise distinct portions of a ligand-responsive sequence.
  • the first RNA comprises the first sequence, the second sequence, and the third sequence.
  • the second RNA comprises the first sequence and the third sequence.
  • the second RNA lacks the second sequence (e.g., as a result of alternative splicing).
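The relationship between the first RNA (first, second, and third sequences) and the second RNA (first and third sequences only) can be sketched with the 4000-nucleotide example given above. The sequences below are placeholders, not real sequences, and the 60-nucleotide length chosen for the second sequence is an arbitrary assumption for illustration; `spliced_isoforms` is a hypothetical helper name.

```python
def spliced_isoforms(first: str, second: str, third: str):
    """Return (first_rna, second_rna) for a three-part polynucleotide.

    first_rna retains the second sequence (e.g., an alternative exon);
    second_rna lacks it, as would result from alternative splicing.
    """
    first_rna = first + second + third
    second_rna = first + third
    return first_rna, second_rna


# Placeholder 4000-nt RNA of interest, split as in the example above.
rna_of_interest = "A" * 1000 + "C" * 3000
first_seq = rna_of_interest[:1000]   # 5'-most 1000 nucleotides
third_seq = rna_of_interest[1000:]   # 3'-most 3000 nucleotides
second_seq = "G" * 60                # hypothetical alternative exon

first_rna, second_rna = spliced_isoforms(first_seq, second_seq, third_seq)
```

In this arrangement the second RNA reconstitutes the uninterrupted RNA of interest, while the first RNA carries the extra sequence between the two portions. Which of the two forms is the functional one depends on the embodiment, as the text notes.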
  • an RNA of interest encodes a marker.
  • markers include cell surface proteins (e.g., an antibody or antigen-binding fragment thereof, receptors, membrane proteins which become glycosylated upon expression in a cell, etc.), luciferase or variants thereof, alkaline phosphatase or variants thereof, beta-galactosidase or variants thereof, and fluorescent markers (e.g., mNeonGreen, GFP (e.g., SEQ ID NO: 28), EGFP, Superfold GFP, Azami Green, mWasabi, TagGFP, TurboGFP, acGFP, zsGreen, T-sapphire, EBFP, EBFP2, Azurite, TagBFP, ECFP, mECFP, Cerulean, mTurquoise, CyPet, AmCyan1, TagCFP, mTFP1, EYFP, mCitrine, TagYFP, phiYFP
  • an RNA of interest corresponds to a gene selected from the group consisting of: MBNL1; MBNL2; MBNL3; hnRNP A1; hnRNP A2B1; hnRNP C; hnRNP D; hnRNP DL; hnRNP F; hnRNP H; hnRNP K; hnRNP L; hnRNP M; hnRNP R; hnRNP U; FUS; TDP43; PABPN1; ATXN2; TAF15; EWSR1; MATR3; TIA1; FMRP; MTM1; MTMR2; LAMP2; KIF5A; a microdystrophin-encoding gene; C9ORF72; HTT; DNM2; BIN1; RYR1; NEB; ACTA; TPM3; TPM2; TNNT2; CFL2; KBTBD13; KLHL40;
  • TNPO3; HNRPDL; CAPN3; DYSF; an alpha-sarcoglycan-encoding gene; a beta-sarcoglycan-encoding gene; a gamma-sarcoglycan-encoding gene; a delta-sarcoglycan-encoding gene; TCAP; TRIM32; FKRP; FXN; POMT1; FKTN; POMT2; POMGnT1; DAG1; ANO5; PLEC1;
  • MAP3K7; MAP4K2; MBNL2; MFF; NAE1; NCSTN; NR4A3; NRF1; NUP98; PARP6; PCM1; PLAUR; PLSCR3; PPIL5; PPP5C; PTPRC-E4; PTPRC-E6; PTS; RABL5; RAPH1; SEC16A; SFRS3; SFRS7; SI.
  • SNRNP70; STAT6; TBC1D1; TIMM8B; TIR8; TRA2A; TROVE2; UGCGL1; VAP-B; VAV1; ZNF384; ZNF496; CAMK2B; PKP2; LGMN; NRAP; VPS39;
  • an RNA of interest corresponds to a gene encoding a component of a “CRISPR/Cas system” which may be alternatively referred to as a “CRISPR/Cas” molecule.
  • a CRISPR/Cas molecule comprises a Cas nuclease (e.g., Cas9 or a variant thereof, Cas12a or a variant thereof, Cas fusion protein comprising CasPhi, CasMini, etc.).
  • the CRISPR/Cas molecule binds to a guide RNA (gRNA) described herein.
  • the CRISPR/Cas molecule binds to a gRNA encoded by a polynucleotide regulated by an alternatively spliced sequence described herein. In some embodiments, the CRISPR/Cas molecule binds to a gRNA encoded by a separate polynucleotide that does not comprise an alternatively spliced sequence described herein. In some embodiments, the RNA of interest corresponds to a gRNA. In some embodiments, the gRNA binds to a CRISPR/Cas molecule described herein.
  • the gRNA binds to a CRISPR/Cas molecule encoded by a polynucleotide regulated by an alternatively spliced sequence described herein. In some embodiments, the gRNA binds to a CRISPR/Cas molecule encoded by a separate polynucleotide that does not comprise an alternatively spliced sequence described herein.
  • a CRISPR/Cas molecule is of, or derived from, Streptococcus or Staphylococcus aureus (e.g., S. aureus Cas9).
  • a CRISPR/Cas molecule comprises a Cas nuclease variant that is encoded by a shortened variant sequence (e.g., CasMini).
  • such a Cas nuclease may be selected in order to fit within the packaging capacity of an rAAV genome.
  • a CRISPR/Cas molecule may be selected to promote genomic editing with a suitable gRNA.
  • the gRNA may bind to a target domain in the genome of a host cell (e.g., when present in a ribonucleoprotein complex with a CRISPR/Cas nuclease).
  • the gRNA may comprise a targeting domain that may be partially or completely complementary to the target domain.
  • the gRNA comprises a targeting domain that may be partially or completely complementary to the target domain located in a genomic sequence (e.g., a gene) implicated in a disease or disorder (e.g., a mutated gene).
  • a gRNA can be unimolecular (having a single RNA molecule, such as a single RNA molecule including both crRNA and tracrRNA sequences covalently bound to each other), sometimes referred to herein as an sgRNA, or modular (comprising more than one, and typically two, separate RNA molecules).
  • the targeting domain is 15 to 25 nucleotides in length.
  • the gRNA is chemically modified.
  • the RNA of interest corresponds to an erythropoietin (EPO) gene.
  • the RNA of interest corresponds to a GABRG2 gene. In some embodiments, the RNA of interest corresponds to a long protein isoform of GABRG2. In some embodiments, the RNA of interest corresponds to a short protein isoform of GABRG2. In some embodiments, the long protein isoform of GABRG2 comprises a sequence corresponding to exon 9 of the GABRG2 gene. In some embodiments, the short protein isoform of GABRG2 does not comprise a sequence corresponding to exon 9 of the GABRG2 gene. In some embodiments, the RNA of interest corresponds to the CSNK1D gene.
  • the RNA of interest corresponds to a long protein isoform of CSNK1D. In some embodiments, the RNA of interest corresponds to a short protein isoform of CSNK1D. In some embodiments, the long protein isoform of CSNK1D comprises a sequence corresponding to exon 9 of the CSNK1D gene. In some embodiments, the short protein isoform does not comprise a sequence corresponding to exon 9 of the CSNK1D gene.
  • an RNA of interest is a therapeutic RNA and/or encodes a therapeutic protein.
  • a “therapeutic RNA” or “therapeutic protein” leads to a physiological change that is associated with or expected to at least partially, if not fully, remedy at least one symptom associated with a disease, disorder, or condition.
  • a therapeutic RNA may refer to an RNA expressed from a transgene that is therapeutic as an RNA upon expression in a target cell and without being translated into a protein.
  • a therapeutic protein may refer to any proteinaceous molecule that is translated from an RNA expressed from a transgene which is therapeutic upon translation in a target cell.
  • a therapeutic RNA or protein may be therapeutic for any disease, disorder, or condition described herein upon administration to a subject in need thereof.
  • therapeutic RNAs can be, but are not limited to, interfering RNAs (e.g., shRNAs, siRNAs, miRNAs, ncRNAs, piRNAs, pro-siRNAs, etc.), exon-skipping RNAs, enzymatic RNAs, guide RNAs or gRNAs (e.g., sgRNAs) of a CRISPR/Cas editing system (e.g., Cas9-based genome editing and derivatives thereof, such as base editing and prime editing), small nuclear RNAs (snRNAs), ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), and mRNAs.
  • miRNA refers to a nucleic acid which comprises several structural and functional characteristics. miRNAs are single-stranded RNAs of about 19-25 nucleotides that regulate the expression, stability, and/or translation of other mRNAs comprising complementary sequences. miRNAs are cleaved from longer endogenous double-stranded hairpin precursors by the enzymes Drosha and Dicer. miRNAs match genomic regions that can potentially encode precursor RNAs in the form of double-stranded hairpins. miRNAs and their predicted precursor secondary structures are phylogenetically conserved. Drosha, Dicer, and Argonaute are crucial regulators of miRNA biosynthesis, maturation, and function.
  • pre-miRNA biogenesis involves Drosha cleavage of a hairpin-shaped primary miRNA to generate a hairpin precursor with 2 or 3 nucleotide overhangs at the 3' end, and then Dicer cleavage of the precursor miRNA to generate a miRNA duplex. Additionally, the stem-loop structure of pre-miRNA is crucial for miRNA processing wherein disruption of such structures inhibits the Drosha cleavage reaction and, thus, the production of functional miRNAs. Cofactors bind to the pre-miRNA to form a pre-micro ribonucleoprotein (pre-miRNP) and unwind the pre-miRNAs into single-stranded miRNAs. The pre-miRNP is then transformed to miRNP. miRNAs play crucial roles in eukaryotic gene regulation. For instance, miRNAs are thought to interact with target mRNAs through complementary base-pairing which leads to suppressed translation. Separately, miRNAs promote RNA degradation.
  • miRNA expression may be assessed by measuring the levels of the target mRNA and/or its protein product.
  • miRNAs include, but are not limited to, the miRNA-16-2 gene.
  • the transgene comprises the scaffold of primary miRNA 16-2 and an miRNA seed sequence of HSUR4 miRNA.
  • the transgene comprising the pri-miRNA 16-2 scaffold can further comprise a miRNA seed sequence of any miRNA of interest.
  • the miRNA comprises a sequence of YZ150, YZ232, or YZ301.
  • the polynucleotide (e.g., a transgene) comprises at least one miRNA comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 60, 61, or 64.
  • 64 may comprise targeting ability (e.g., reverse complementarity) to a different RNA target but is regulated by a ligand-responsive sequence described herein.
  • the polynucleotide (e.g., a transgene) comprises at least one miRNA comprising a polynucleotide comprising a nucleic acid sequence as set forth in SEQ ID NO: 60, 61, or 64.
  • the polynucleotide (e.g., a transgene) comprises at least one exon comprising a miRNA sequence comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 60, 61, or 64.
  • the polynucleotide (e.g., a transgene) comprises at least one exon comprising a miRNA sequence comprising a polynucleotide comprising a nucleic acid sequence as set forth in SEQ ID NO: 60, 61, or 64.
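The sequence identity thresholds recited above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length (the disclosure does not specify an alignment method in this passage, so that step is omitted), and the function names, the `THRESHOLDS` tuple, and the 20-nucleotide placeholder sequences are hypothetical and do not correspond to SEQ ID NO: 60, 61, or 64.

```python
# Identity thresholds recited in the embodiments above (percent).
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(query: str, reference: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def thresholds_met(query: str, reference: str) -> list[int]:
    """Return every recited 'at least N%' threshold that the query satisfies."""
    identity = percent_identity(query, reference)
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical placeholder sequences: 20 nt with a single mismatch at the last position.
reference_seq = "ACGUACGUACGUACGUACGU"
query_seq = "ACGUACGUACGUACGUACGA"

print(percent_identity(query_seq, reference_seq))
print(thresholds_met(query_seq, reference_seq))
```

A query with one mismatch in 20 positions satisfies the thresholds up to and including 95%, but not 98% or 99%.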
  • therapeutic proteins can be, but are not limited to, enzymes (such as proteases, signaling proteins, transcriptional regulators (e.g., MECP2), Cas9, base editors, prime editors, etc.), enzymatic domains, enzyme substrates, secreted proteins (e.g., progranulin), hormones (e.g., erythropoietin, insulin or a variant thereof, such as a furin-cleavable pro-insulin, etc.), receptors (e.g., chimeric antigen receptors), components of gene editing ribonucleoprotein complexes (e.g., CRISPR/Cas molecules, such as Cas9, base editors, such as adenine base editors and cytidine base editors, prime editors, etc.), a zinc finger nuclease, a TALEN, peptibodies, growth factors, RNA-binding proteins, clotting factors, cytokines, chemokines, activating or inhibitor
  • an “intron” or “intronic sequence” or “intronic regions” can refer to a nucleotide sequence that does not code for a therapeutic protein or therapeutic RNA and is spliced out of the transgene transcript.
  • an “intron” or “intronic sequence” or “intronic regions” can refer to an alternatively spliced sequence (e.g., an intron found in a polynucleotide comprising a risdiplam-responsive sequence). In some embodiments, such splicing may be regulated by the presence or absence of a ligand.
  • the terms “intron” and “intronic sequence” may be used interchangeably.
  • the transgene comprises at least two introns or intronic sequences.
  • An intron, alternatively referred to as a flanking component, may in some embodiments be immediately adjacent to the central component.
  • a central ligand-responsive aptamer may, in some embodiments, be flanked by two introns, wherein such introns are positioned immediately adjacent to the central ligand-responsive aptamer.
  • a central ligand-responsive sequence e.g., one comprising an alternative exon
  • the transgene comprises a polynucleotide comprising an exon or exon region at the 5' and 3' ends with a central region comprising at least two introns, an alternative exon and a ligand-responsive aptamer.
  • introns of the transgene are spliced out of the transgene along with the ligand-responsive aptamer in the presence of the ligand thereby forming a transgene lacking both the at least one intron and the aptamer.
  • in the absence of ligand only the introns are spliced out.
  • the polynucleotide (e.g., a transgene) comprises at least two introns comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2130, 2141, or 2232-2233.
  • the polynucleotide (e.g., a transgene) comprises at least two introns comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2130, 2141, or 2232-2233.
  • the polynucleotide (e.g., a transgene) comprises at least one intron that comprises a sequence that is part of a 3' splice site comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
  • the polynucleotide (e.g., a transgene) comprises at least one intron that comprises a sequence that is part of a 3' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
  • the polynucleotide (e.g., a transgene) comprises at least one intron that comprises a sequence that is part of a 5' splice site comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in Tables 7, 25, 26, or 34.
  • the polynucleotide (e.g., a transgene) comprises at least one intron that comprises a sequence that is part of a 5' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of Tables 7, 25, 26, or 34.
  • an “engineered intron” is an intron which comprises at least one modification, relative to a native intron.
  • an engineered intron may comprise one or more nucleotide deletions, and thus be truncated, relative to a native intron.
  • an “engineered exon” is an exon which comprises at least one modification, relative to a native exon.
  • an engineered exon may comprise one or more nucleotide deletions, and thus be truncated, relative to a native exon.
  • flanking component refers to a component which is located upstream (e.g., 5’) or downstream (e.g., 3’) of a central component (e.g., an exon).
  • a flanking component may in some embodiments be immediately adjacent to the central component, but that is not required by the methods and compositions of the present disclosure.
  • a central alternatively-spliced exon may, in some embodiments, be flanked by two introns, wherein such introns are immediately adjacent to the central alternatively-spliced exon.
  • the same central alternatively-spliced exon may also be flanked by two additional exons, which are located upstream and downstream of the central alternatively-spliced exon, respectively, but which are not immediately adjacent to the central alternatively-spliced exon.
  • a “constitutive exon” is an exon that is present in all spliced transcripts (e.g., mRNA isoforms) formed as a result of splicing a pre-mRNA or miRNA transcripts that are transcribed from a gene.
  • a constitutive exon is therefore common to different mRNA isoforms of a gene.
  • resultant protein isoforms may have related, distinct or even opposing functions.
  • the mRNA and protein isoforms produced by alternative splicing (or equivalently, alternative processing) of primary RNA transcripts may differ in structure, function, localization or other properties.
  • Alternative splicing in particular is known to affect more than half of all human genes, and has been proposed as a primary driver of the evolution of phenotypic complexity in mammals.
  • the number of variants of a gene ranges from two to potentially thousands.
  • the resulting proteins may exhibit different and sometimes antagonistic functional and structural properties, and may inhabit the same cell with the resulting phenotype representing a balance between their expression levels. Defects in splicing have been implicated in human diseases, including cancer.
  • aspects of the invention utilize alternative splicing mechanisms as a method of regulating the expression of a transgene (e.g., encoding a therapeutic protein or miRNA).
  • a recombinant viral genome of the present disclosure comprising the inducibly-spliced exon cassette may behave in a predictable manner, and the transgene and/or coding region of interest may be expressed in specific conditions which are therapeutically beneficial (e.g., in a specific cell type, a specific tissue, a disease state, and/or upon an inflammatory response).
  • aspects of the invention contemplate inducibly-spliced exon cassettes for regulating the expression of coding regions of interest (e.g., encoding therapeutic nucleic acids such as miRNAs and/or therapeutic proteins).
  • aspects of the invention utilize alternative splicing mechanisms as a method of regulating the expression of a transgene (e.g., encoding a therapeutic protein).
  • the alternatively-spliced exons of the application do not necessarily result in alternative sequence isoforms of the encoded protein.
  • an alternatively-spliced exon impacts the level of protein expression without impacting the sequence of the protein that is expressed. That is, the alternatively-spliced exon is utilized as a means of regulation of the expression of the protein of interest.
  • retention of the alternatively-spliced exon in the spliced transcript results in the productive translation of a coding region of interest.
  • exclusion of the alternatively-spliced exon from the spliced transcript results in the coding region of interest not being translated (e.g., the alternatively-spliced exon is spliced out).
  • retention of the alternatively-spliced exon in the spliced transcript results in nonsense mediated decay.
  • exclusion of the alternatively-spliced exon from the spliced transcript results in the productive translation of the coding region of interest.
  • a recombinant viral genome of the present disclosure comprising the alternatively-spliced exon cassette may behave in a predictable manner, and the transgene and/or coding region of interest may be expressed in specific conditions which are therapeutically beneficial (e.g., in a specific cell type, a specific tissue, a disease state, and/or upon an inflammatory response).
  • Transgenes comprising alternatively-spliced exon cassettes may be designed according to any one of several non-limiting models of alternative splicing (shown in FIGs. 2 or 4-8), each of which is specifically contemplated herein, in addition to other models of alternative splicing.
  • aspects of the invention contemplate alternatively-spliced exon cassettes for regulating the expression of coding regions of interest (e.g., encoding therapeutic proteins).
  • the alternatively-spliced exons are spliced-in or spliced-out in a manner that is dependent upon one or more environmental cues, e.g., cell or tissue type, disease state, or intracellular conditions such as the presence of a ligand.
  • the alternatively-spliced exons can be sourced from a naturally occurring gene or may be recombinant, for example, in order to add one or more genetic regulatory elements for influencing expression levels of the transgene and/or coding region of the transgene. Examples of alternatively-spliced exons are disclosed herein.
  • the alternatively-spliced exons may comprise one or more regulatory sequences that modulate the expression of a coding sequence of interest.
  • regulatory sequences may be referred to as cis-elements.
  • cis-elements that impart a positive regulatory control on a coding sequence of interest may be referred to as a positive regulatory cis-element.
  • cis-elements that impart a negative regulatory control on a coding sequence of interest may be referred to as a negative regulatory cis-element.
  • Alternatively-spliced exons may be found in nature in naturally-occurring genes, or may be modified by changing or altering the sequence thereof (e.g., derived from a naturally-occurring gene), including adding or changing the splice site, and/or adding or changing a positive or negative regulatory cis-element.
  • the one or more positive or negative regulatory cis-elements may be located within an alternatively-spliced exon, and may influence the level of expression of a coding region of interest through positive and/or negative controls, and may include any regulatory sequence which exerts, as a consequence of being spliced-in or spliced-out of the final mRNA, either a positive or negative regulation on the expression of the coding region.
  • FIG. 4 shows seven non-limiting embodiments contemplated for the structural configuration of a cassette (e.g., comprised within a transgene) for use with a recombinant virus genome, wherein the cassette (e.g., comprised within a transgene) comprises an alternatively-spliced exon and a coding region, wherein the alternatively-spliced exon further comprises at least one positive or negative regulatory cis-element.
  • the alternatively-spliced exon further comprises at least one positive or negative regulatory cis-element.
  • Non-limiting examples of positive or negative regulatory cis-elements can include, for instance, (1) a nucleotide sequence element that regulates, modulates, or otherwise affects the stability and/or degradation of a mRNA, and (2) a nucleotide sequence element that regulates, modulates, or otherwise affects the translation of a mRNA into one or more encoded polypeptide products (e.g., a therapeutic product).
  • positive or negative regulatory cis-elements may include, but are not limited to, a translation start codon, a translation stop codon, a ligand-responsive aptamer, a binding site for an RNA binding protein that serves to positively regulate transgene expression, a binding site for an RNA binding protein that serves to negatively regulate transgene expression, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate transgene expression, or a binding site for a nucleic acid molecule (e.g., an siRNA or miRNA).
  • the one or more cis-elements can include, but are not limited to, a translation start codon, a translation stop codon, a ligand-responsive aptamer, an siRNA binding site, a miRNA binding site, a sequence forming a stem-loop structure, a sequence forming an RNA dimerization motif, a sequence forming a hairpin structure, a sequence forming an RNA quadruplex, a polypurine tract, a sequence forming a pair of kissing loops, and a sequence forming a tetraloop/tetraloop receptor pair.
  • cis-elements include binding sites recognized by regulatory elements, such as, for example, RNA binding proteins.
  • an RNA binding protein capable of exerting regulatory control once bound is an RNA binding protein described in Van Nostrand, et al. (2020), A large-scale binding and functional map of human RNA-binding proteins, Nature, 583: 711-719, which is herein incorporated by reference with respect to its description of RNA binding proteins.
  • a transgene comprising an inducibly-spliced exon cassette comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • a transgene comprising an inducibly-spliced cassette comprises a polynucleotide sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
  • the cassettes may include one or more additional components, including one or more other constitutive exons, and one or more introns.
  • the constitutive exons not comprising the coding region of interest are represented by narrow rectangles
  • introns are represented as dashed lines
  • the alternatively-spliced exons are represented as shaded narrow rectangles.
  • the exon or exons comprising the coding region are indicated as solid thick white rectangles.
  • the alternatively-spliced exon may contain portions of a coding region of interest.
  • FIG. 4A is a schematic of an embodiment wherein the alternatively-spliced exon is upstream of the exon encoding the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is to the 5’ of the exon encoding the coding region of interest.
  • FIG. 4B is a schematic of an embodiment wherein the alternatively-spliced exon is downstream of the exon encoding the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is to the 3’ of the exon encoding the coding region of interest.
  • FIG. 4C is a schematic of an embodiment wherein the alternatively-spliced exon is positioned between two separate exons encoding portions of the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is between the exons encoding the portions of the coding region of interest.
  • polynucleotides encode an RNA of interest corresponding to any gene described herein (see, e.g., Examples 1-10).
  • the RNA of interest becomes functional as a result of alternative splicing.
  • alternative splicing is induced by binding of a ligand (e.g., a small molecule).
  • the presence of a ligand results in exclusion of an alternatively spliced sequence (e.g., an intron, exon, aptamer, etc.) from the RNA encoded by the polynucleotide which enables the RNA of interest encoded therein to comprise a continuous, non-interrupted sequence, adopt a functional three-dimensional structure (e.g., such as that of a microRNA), and/or be translated into a protein (e.g., a therapeutic protein).
  • the presence of ligand results in inclusion of an alternatively spliced sequence (e.g., one or more exons) in the RNA encoded by the polynucleotide which enables the RNA of interest encoded therein to comprise a sequence encoding an RNA of interest, adopt a functional three dimensional structure (e.g., such as that of a microRNA), and/or be translated into a protein (e.g., a therapeutic protein).
  • ligand-dependent inclusion or exclusion of an alternatively spliced sequence (e.g., one or more exons) enables the RNA of interest encoded therein to be differentially expressed (e.g., to be translated into a long or short protein isoform, respectively).
  • FIG. 4D shows a non-limiting embodiment of an approach that puts a gene sequence under control of a ligand-responsive aptamer.
  • a naturally occurring gene can be engineered to become under the control of a ligand by inserting the cassette into the gene. The portions upstream and downstream of the site at which the cassette is inserted then become separate exons.
  • the cassette is inserted without making any changes to the sequence flanking the insertion site.
  • one or more nucleotide sequence changes are made in one or both flanking regions (e.g., at the positions immediately flanking the site of insertion).
  • the one or more nucleotide changes render either or both flanking sequences more compatible with splicing.
  • the one or more nucleotide changes result in either or both flanking sequences becoming effective 3’ and/or 5' splice sites.
  • the one or more nucleotide changes include introducing one or more sequences that support an effective dynamic range between alternative splicing events of a ligand-induced alternatively spliced exon described in this application.
  • the one or more nucleotide changes include introducing one or more flanking sequences described in this application.
  • FIG. 4E shows a non-limiting embodiment of a transgene comprising an alternatively-spliced cassette.
  • the expression cassette comprises a general structure comprising at least one alternative exon, at least two introns flanking the alternative exon, a ligand-responsive aptamer, and a plurality of splice sites.
  • one exon is positioned 5’ to the cassette sequence and one exon is positioned 3’ to the cassette sequence thereby flanking the intervening at least two introns, alternative exon, ligand-responsive aptamer, and plurality of splice sites.
  • at least two exons flanking the cassette are always present in the RNA molecule transcribed from the transgene regardless of the presence of the ligand or splicing reaction outcomes.
  • the alternative exon comprises the ligand-responsive aptamer wherein the ligand-responsive aptamer regulates the splicing (i.e., removal) of the alternative exon.
  • the alternative exon when the ligand which binds to the aptamer is absent, the alternative exon is present in the spliced RNA molecule transcribed from the transgene.
  • the presence of a ligand which binds to the aptamer results in removal of the alternative exon such that the spliced RNA molecule comprises only the at least two exons and lacks the alternative exon, the two introns, and the ligand-responsive aptamer.
  • the most 5’ intron is downstream (3’) of the most upstream exon and the 3’ most intron is upstream (5’) of the most downstream exon such that the exons exist at the 5’ and 3’ termini of the cassette sequences which include the introns, alternative exon, and the ligand-responsive aptamer.
  • the boundaries of an exon-intron sequence comprise splice sites that regulate the splicing of the cassette.
  • the splicing of the introns occurs regardless of the presence of ligand such that the spliced RNA molecule comprising the cassette sequence lacks the at least two introns.
  • the ligand-responsive aptamer may be located in either one of the introns, in the alternative exon, or may span an intron-exon boundary occurring between the alternative exon and one of the introns.
  • a ligand-responsive aptamer may be included in one of the flanking exon sequences provided that it is configured such that binding of the ligand affects the splicing of the alternatively spliced exon and the ligand-responsive aptamer. In the embodiment illustrated in FIG.
  • the splice sites are provided in multiples of two such that two splice sites (a 5’ site and a 3’ site) are always required to regulate the splicing of a sequence.
  • the 3’ splice site that is 5’ of the alternative exon comprises intronic sequences.
  • the 5’ splice site that is 3’ of the alternative exon comprises both intronic and exonic sequences such that when the alternative exon is included in the RNA molecule it will comprise a partial sequence that is part of the original 5’ splice site.
  • Non-limiting examples of embodiments illustrating this configuration can be found in Example 7 and SEQ ID NO: 2081.
  • FIG. 4F shows a non-limiting embodiment of a transgene comprising a non-continuous start codon split by the alternatively spliced cassette.
  • the exons comprise a non-continuous start codon such that the 3’ most nucleotides of the upstream exon comprise an A or AT and the 5’ most nucleotides of the downstream exon comprise a TG or G, respectively.
  • the absence of a ligand results in splicing reactions that include the alternative exon and thereby produce an RNA molecule that contains a non-continuous start codon that is disrupted by the alternative exon and is not translated into the full-length protein product.
  • the presence of a ligand results in splicing reactions that remove the alternative exon and thereby produce an RNA molecule that comprises a continuous start codon provided by the nucleotides of the first and last exon resulting in translation of the full-length protein product of the transgene.
  • SEQ ID NO: 2131 represents a non-limiting example of a control construct that can be used to assess the inducibility of alternative splicing of a transgene comprising a non-continuous start codon.
  • the transgene lacks the aptamer and alternative exon.
  • SEQ ID NO: 2132 represents another non-limiting example of a control construct that can be used to assess the inducibility of alternative splicing of a transgene comprising a non-continuous start codon.
  • the alternative exon comprising an aptamer disrupts the start codon thereby preventing translation of the transgene.
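The split start codon behavior described for FIG. 4F can be sketched in a few lines of Python. This is a minimal illustration only: the exon sequences, the function names `splice` and `has_continuous_start`, and the `split` parameter are hypothetical and are not taken from any SEQ ID NO of the disclosure. The sketch assumes, as described above, that ligand presence removes the alternative exon.

```python
def splice(upstream, alternative, downstream, ligand_present):
    """Return the spliced sequence for a toy three-exon transgene.

    Assumption (from the FIG. 4F description): the ligand causes the
    alternative exon to be skipped; without ligand it is included.
    """
    if ligand_present:
        return upstream + downstream
    return upstream + alternative + downstream


def has_continuous_start(upstream, transcript, split=1):
    """Check whether an ATG spans the junction at the end of the upstream exon.

    split is the number of start-codon nucleotides supplied by the
    upstream exon: 1 for "A" (downstream supplies "TG"), 2 for "AT"
    (downstream supplies "G").
    """
    start = len(upstream) - split
    return transcript[start:start + 3] == "ATG"


# Hypothetical toy exons: upstream ends in "A", downstream begins with "TG".
upstream = "GCCACCA"
alternative = "CCTTGGCC"
downstream = "TGGTGAGCAAG"

with_ligand = splice(upstream, alternative, downstream, ligand_present=True)
without_ligand = splice(upstream, alternative, downstream, ligand_present=False)

print(has_continuous_start(upstream, with_ligand))
print(has_continuous_start(upstream, without_ligand))
```

In this toy model the start codon is reconstituted only when the alternative exon is skipped, which mirrors the inducible behavior described for the non-continuous start codon configuration.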
  • FIG. 4G shows a non-limiting embodiment of an alternatively spliced exon cassette comprising a pre-mature stop codon that is inserted between two consecutive coding sequences of a gene (e.g., two exons of a gene).
  • the exons flanking the cassette are not translated in the absence of ligand due to the presence of a pre-mature stop codon in the alternative exon (e.g., in frame with the reading frame of the upstream exon).
  • the presence of the stop codon in the alternative exon results in pre-mature termination of translation of the transgene when the alternative exon is not spliced out of the RNA molecule.
  • the presence of a ligand induces splicing upon binding to the aptamer such that the alternative exon comprising the pre-mature stop codon is removed thereby allowing translation to produce the full-length protein product encoded by the transgene.
  • Non-limiting examples of embodiments illustrating this configuration can be found in Example 7 and SEQ ID NOs: 2091, 2099, 2102, 2105, 2108, 2109-2112, 2116, 2118, 2120, 2123, and 2128.
  • the pre-mature stop codon can be UAA, UAG, or UGA provided that it is in frame with the reading frame of the first exon.
  • the stop codon may be provided within the aptamer sequence if the aptamer is provided in the alternative exon.
  • the stop codon may be upstream or downstream of the aptamer and provided in the alternative exon.
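The requirement that the pre-mature stop codon be in frame with the reading frame of the first exon can be illustrated with a short Python sketch. The sequences and the function name `first_in_frame_stop` are hypothetical and chosen only for illustration; the sketch assumes that the reading frame begins at the first nucleotide of the upstream exon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(rna, frame_start=0):
    """Return the index of the first in-frame stop codon, or None.

    Codons are read in steps of three from frame_start, so a stop
    triplet that is out of frame is ignored.
    """
    for i in range(frame_start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical toy exons (RNA alphabet); the upstream exon is 9 nt, i.e. in frame.
upstream = "AUGGCCAAG"
alt_in_frame = "GGAUAACCU"   # codons GGA UAA CCU: stop is in frame
alt_out_of_frame = "GUAACC"  # contains UAA, but shifted by one nucleotide
downstream = "GAGCUGUUC"

included = upstream + alt_in_frame + downstream
skipped = upstream + downstream
shifted = upstream + alt_out_of_frame + downstream

print(first_in_frame_stop(included))
print(first_in_frame_stop(skipped))
print(first_in_frame_stop(shifted))
```

Only the transcript that retains the alternative exon with the in-frame stop terminates early in this model; the skipped transcript and the out-of-frame control are read through.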
  • FIG. 4H shows a non-limiting embodiment of an alternatively spliced exon cassette that is inserted in a coding sequence for a regulatory RNA molecule.
  • the at least two exons encode an interfering RNA, such as a miRNA, such that removal of the alternative exon produces a functional miRNA molecule that is capable of regulating gene expression.
  • the aptamer may be provided in an intron sequence, the alternative exon sequence, or may span the alternative exon and a flanking intron.
  • the sequences encoding the regulatory RNA may comprise a pri-miRNA scaffold and/or miRNA seed sequence.
  • FIG. 4I shows a non-limiting embodiment of a nucleic acid design to regulate RNA splicing using a ligand-responsive sequence.
  • an intron splits two exons. Ligand binding to the ligand-responsive sequence results in alternative splicing, wherein the exons are brought together to form an RNA that encodes the protein of interest.
  • FIG. 4J shows a non-limiting embodiment of a nucleic acid design to regulate RNA splicing using a ligand-responsive sequence.
  • an intron splits two exons. Ligand binding to the ligand-responsive sequence results in alternative splicing, wherein the exons are disrupted and the RNA cannot encode the protein of interest.
  • FIG. 4K shows a non-limiting embodiment of a ligand-responsive nucleic acid that can be used to differentially regulate the expression of protein isoforms.
  • the alternative exon is flanked by introns.
  • Ligand binding results in exclusion of the alternative exon in the spliced RNA thereby encoding the shorter isoform of the protein.
  • the absence of the ligand results in inclusion of the alternative exon in the spliced RNA which encodes the longer isoform of the protein.
  • FIG. 4L shows a non-limiting embodiment of a ligand-responsive nucleic acid that can be used to differentially regulate the expression of protein isoforms.
  • the alternative exon is flanked by introns.
  • Ligand binding results in inclusion of the alternative exon in the spliced RNA thereby encoding the longer isoform of the protein.
  • the absence of the ligand results in exclusion of the alternative exon from the spliced RNA which encodes the shorter isoform of the protein.
  • FIG. 4M shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA.
  • the alternative exon comprises a ligand-responsive sequence and prevents a start codon from being in frame with the RNA. Inclusion of the alternative exon in the presence of the ligand leads to production of the protein corresponding to the RNA.
  • said nucleic acid is useful in providing an inducible ON switch for regulating synthesis of a protein of interest.
  • FIG. 4N shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA.
  • the alternative exon comprises a ligand-responsive sequence and prevents a start codon from being in frame with the RNA. Inclusion of the alternative exon in the absence of the ligand leads to production of the protein corresponding to the RNA.
  • said nucleic acid is useful in providing an inducible OFF switch for regulating synthesis of a protein of interest.
  • FIG. 4O shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. Presence of the alternative exon causes a pre-mature stop codon to be in frame with the RNA. Inclusion of the alternative exon in the presence of the ligand leads to an RNA which cannot be translated into a protein.
  • said nucleic acid is useful in providing an inducible ON switch for regulating synthesis of a protein of interest.
  • FIG. 4P shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. Presence of the alternative exon causes a pre-mature stop codon to be in frame with the RNA. Exclusion of the alternative exon in the presence of the ligand leads to an RNA which can be translated into a protein.
  • said nucleic acid is useful in providing an inducible OFF switch for regulating synthesis of a protein of interest.
  • the alternatively spliced exon and the introns flanking the alternatively spliced exon may include all of the sequences that are useful for splicing.
  • one or more nucleotide changes are also made in one or both flanking exon (e.g., upstream and/or downstream exon) sequences to further support splicing.
  • one or more nucleic acids described herein provide a high level of differential splicing between the presence of ligand and the absence of ligand.
  • the dynamic range (e.g., the level of expression of a gene or protein of interest under the control of an alternatively-spliced exon of the present disclosure in the presence of ligand relative to the absence of ligand) can be greater than 5 fold, greater than 10 fold, greater than 25 fold, greater than 50 fold, greater than 100 fold, 100-250 fold, 250-500 fold, 500-1,000 fold, or more.
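As a worked illustration of the dynamic range described above, the following Python sketch computes the fold change of expression in the presence of ligand relative to the absence of ligand and reports which of the recited thresholds it exceeds. The reporter readings and the function names are hypothetical; the ratio shown is one straightforward reading of "in the presence of ligand relative to the absence of ligand" and is not asserted to be the measurement protocol used in the Examples.

```python
# Fold thresholds recited in the text ("greater than 5 fold", etc.).
THRESHOLDS = (5, 10, 25, 50, 100)


def dynamic_range(expr_with_ligand, expr_without_ligand):
    """Fold change of expression with ligand relative to without ligand."""
    if expr_without_ligand <= 0:
        raise ValueError("baseline expression must be positive")
    return expr_with_ligand / expr_without_ligand


def thresholds_exceeded(fold):
    """Return the recited thresholds that the fold change is greater than."""
    return [t for t in THRESHOLDS if fold > t]


# Hypothetical reporter readings in arbitrary units.
fold = dynamic_range(expr_with_ligand=1200.0, expr_without_ligand=10.0)
print(fold)
print(thresholds_exceeded(fold))
```

For a switch in which the ligand reduces expression, the same ratio could be inverted; that choice is an assumption of the person running the analysis rather than something fixed by the text.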
  • FIGs. 4E-4P illustrate non-limiting embodiments that refer to Exon 1 and Exon 2 or Exon 1, Exon 2, and Exon 3 as examples. However, the same configuration can be used for other exons of a gene, and in some embodiments Exon 1 and Exon 2 and/or Exon 3 in FIGs. 4E-4P could represent other upstream and downstream exons that are not necessarily the first and second exons of a gene.
  • FIG. 4Q shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the absence of the ligand results in formation of the complete microRNA which can function to reduce expression of a target transcript.
  • said nucleic acid is useful in providing an inducible OFF switch for regulating a target transcript.
  • FIG. 4R shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the presence of the ligand results in formation of the complete microRNA which can function to reduce expression of a target transcript.
  • said nucleic acid is useful in providing an inducible ON switch for regulating a target transcript.
  • FIG. 4S shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the absence of the ligand disrupts microRNA structure thereby inhibiting its ability to reduce expression of a target transcript.
  • said nucleic acid is useful in providing an inducible ON switch for regulating a target transcript.
  • FIG. 4T shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the presence of the ligand disrupts microRNA structure thereby inhibiting its ability to reduce expression of a target transcript.
  • said nucleic acid is useful in providing an inducible OFF switch for regulating a target transcript.
  • a ligand may include a variety of molecules including both synthetic and naturally-occurring chemical species.
  • Ligands may include, but are not necessarily limited to, small molecule drugs (e.g., risdiplam, branaplam, etc.), peptides, nucleic acids or modified nucleic acids (e.g., ASOs, such as exon-skipping ASOs), lipids, carbohydrates, and metabolites present in the cell.
  • ligands that bind to aptamers of the present disclosure include tetracycline, theophylline, glycine, adenine, guanine and cyclic GMP (cGMP).
  • Aptamers are single-stranded nucleic acids that bind to ligands based on their specific three dimensional shape and chemical affinity. Aptamers may comprise non-modified or modified nucleotides or combinations thereof.
  • Upon binding to a ligand, aptamers undergo conformational changes that may change their functional properties and, by extension, the functional properties of the molecules they are provided in.
  • Non-limiting examples of aptamers include theophylline-binding aptamer and natural aptamers (riboswitches) that bind to adenine, glycine, and guanine.
  • a ligand is tetracycline.
  • the RNA capable of binding a ligand comprises a tetracycline-responsive sequence.
  • a tetracycline-responsive sequence comprises an aptamer.
  • a tetracycline-responsive sequence comprises a sequence in YZ150 or a variant thereof (see, e.g., Example 10).
  • a tetracycline-responsive sequence comprises a sequence described herein (e.g., an aptamer comprising the sequence of SEQ ID NOs: 2086, 2095, 2112 or 2188; see Example 7 and Example 10 for further details).
  • a ligand is risdiplam.
  • risdiplam promotes interaction between a pre-mRNA corresponding to the polynucleotide and U1 spliceosome at 5’ splice site.
  • risdiplam interacts with risdiplam-responsive sequences in exons to preclude interaction with splicing silencers.
  • risdiplam interacts with risdiplam-responsive sequences in exons to recruit splicing enhancers.
  • a risdiplam-responsive sequence, or a portion thereof, is present in a 5’ splice site.
  • a risdiplam-responsive sequence is present in an exon-intron boundary.
  • the RNA capable of binding a ligand comprises a risdiplam-responsive sequence.
  • binding of risdiplam to a risdiplam-responsive sequence will lead to intron exclusion.
  • the presence of risdiplam results in intron removal. A non-limiting example of such embodiments is diagrammed in FIG. 43.
  • binding of risdiplam to a risdiplam-responsive sequence will lead to alternative exon inclusion.
  • the RNA capable of binding risdiplam comprises two introns flanking one or more alternative exons
  • the presence of risdiplam results in inclusion of the one or more alternative exons.
  • A non-limiting example of such embodiments is diagrammed in FIG. 47A.
  • a ligand is branaplam.
  • the RNA capable of binding a ligand comprises a branaplam-responsive sequence.
  • a branaplam-responsive sequence comprises a sequence in YZ231 or YZ232 (see, e.g., Example 10).
  • a branaplam-responsive sequence comprises a sequence in YZ301 (see, e.g., Example 10).
  • the RNA capable of binding a ligand comprises one intron flanked by exons, and the presence of branaplam results in intron removal.
  • RNA capable of binding branaplam comprises two introns flanking one or more alternative exons
  • the presence of branaplam results in inclusion of the one or more alternative exons.
  • a polynucleotide (e.g., a transgene) comprises at least 70% sequence identity relative to at least one of the nucleic acid sequences as set forth in SEQ ID NOs: 2080-2082, 2084, 2086, 2088-2089, 2091-2097, 2099-2121, 2123, 2127-2132, 2135, 2137-2138, 2141-2143, or 2183-2260.
  • a transgene comprising an alternatively-spliced exon cassette comprises a polynucleotide sequence as set forth in any one of SEQ ID NOs: 45-55, 2236, or 2247-2256.
  • a transgene comprising an alternatively-spliced exon cassette comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 45-55, 2236, or 2247-2256.
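The percent identity figures recited above can be illustrated with a minimal Python sketch. The text here does not state which alignment method or identity convention applies, so the sketch makes explicit assumptions: the two sequences are already aligned to equal length, gaps are written as "-", and identity is counted as matching columns divided by total alignment columns. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over a pre-computed pairwise alignment.

    Assumed convention (one of several in common use): identical,
    non-gap columns divided by the total number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-column alignment with a single mismatch.
reference = "ACGTACGTAC"
candidate = "ACGTTCGTAC"
identity = percent_identity(reference, candidate)
print(identity)
print(identity >= 70.0)
```

Other conventions, for example dividing by the length of the shorter sequence or excluding gap columns, give different values for the same pair, so the convention should be fixed before comparing against a threshold such as 70% or 90%.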
  • the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise a skipped exon model of alternative splicing (see, e.g., FIGs. 5A, 6B, and 7A).
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (e), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’
  • the coding region of interest comprises at its 5’ end a modification comprising the removal of a native ATG start codon (k), and wherein all native ATG start codons located upstream (e.g., 5’) of the heterologous ATG start codon (f) are mutated or deleted.
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation
  • the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (j), wherein the exonic sequence comprises a constitutive exon.
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation (e), wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ to 3’ direction:
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay.
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
  • the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise a retained intron model of alternative splicing (see, e.g., FIGs. 5B, 6C, and 7B).
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (b), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (c); and a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (d), wherein the coding region of interest comprises at its 5’ end a modification comprising the removal of a native ATG start codon (e), and wherein all native ATG start codons located upstream (e.g., 5’) of the heterologous ATG start codon
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation
  • the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (c); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (d), wherein the second exonic sequence comprises a constitutive exon.
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (b), wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (c); a nucleotide sequence comprising a second portion of a coding region of interest having a 5’ to 3’ orientation (d); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (e), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (f) and at its 3’ end a 3’ splice acceptor site (g); and a nucle
  • retention of the alternative exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay.
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
  • the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise an alternative 5’ donor site model of alternative splicing (see, e.g., FIGs. 5C, 6D, and 7C).
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (b), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (c); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (d), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (e) and at its 3’ end a 3’ splice acceptor site (f); and a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (b), wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (c); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (d), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (e) and at its 3’ end a 3’ splice acceptor site (f); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (g), wherein the exonic sequence comprises a constitutive exon.
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a transgene having a 5’ to 3’ orientation (a); a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation (b), wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (c); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (d), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (e) and at its 3’ end a 3’ splice acceptor site (f); and a nucleotide sequence comprising a second portion of a transgene having a 5’ to 3’ orientation (g).
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay.
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
  • the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise an alternative 3’ donor site model of alternative splicing (see, e.g., FIGs. 5D, 6E, and 7D).
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (b), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (e), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (f); and a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (b), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation (e), wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (f); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (f); a nucleotide sequence comprising a second portion of a coding region of interest having a 5’ to 3’ orientation (g); a nucleotide sequence comprising a nucleotide sequence comprising
  • the second intronic sequence comprises at its 5’ end a 5’ splice donor site (i) and at its 3’ end a 3’ splice acceptor site (j); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (k), wherein the second exonic sequence comprises a constitutive exon.
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay.
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
  • the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise a mutually exclusive exon model of alternative splicing (see, e.g., FIGs. 5E, 6F, and 7E).
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (e), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’
  • the third exonic sequence comprises an alternatively-spliced exon; a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation
  • the third intronic sequence comprises at its 5’ end a 5’ splice donor site (l) and at its 3’ end a 3’ splice acceptor site (m); and a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (n), wherein the coding region of interest comprises at its 5’ end a modification comprising the removal of a native ATG start codon (o), wherein all native ATG start codons located upstream (e.g., 5’) of the heterologous ATG start codon (f) are mutated or deleted.
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation
  • the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises a first alternatively-spliced exon comprising a positive or negative cis-acting element (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation
  • the second exonic sequence comprises a second alternatively-spliced exon; a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation
  • the third intronic sequence comprises at its 5’ end a 5’ splice donor site (l) and at its 3’ end a 3’ splice acceptor site (m); and a nucleotide sequence comprising a third exonic sequence having a 5’ to 3’ orientation (n), wherein the third exonic sequence comprises a constitutive exon.
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence
  • the second exonic sequence comprises an alternatively-spliced exon; a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation
  • the third intronic sequence comprises at its 5’ end a 5’ splice donor site (l) and at its 3’ end a 3’ splice acceptor site (m); and a nucleotide sequence comprising a second portion of a coding region of interest having a 5’ to 3’ orientation (n).
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay.
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
  • the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise an alternative last exon model of alternative splicing (see, e.g, FIGs. 6A, 6G, and 7F). Referencing the components as labeled in FIG.
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation
  • the second intronic sequence comprises at its 5’ end a 5’ splice donor site (g) and at its 3’ end a 3’ splice acceptor site (h); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation
  • the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation
  • the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation
  • the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a transgene having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its
  • 3’ end a 3’ splice acceptor site (m); and a nucleotide sequence comprising a second portion of a coding region of interest having a 5’ to 3’ orientation (n).
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay.
  • retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
  • a nucleic acid vector (e.g, a viral vector) of the present invention comprises a transgene comprising at least one alternatively-spliced exon cassette as described herein. Nucleic acid vectors or transgenes may have one alternatively-spliced exon cassette, or multiple such cassettes. In some embodiments, a nucleic acid vector or transgene comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15, or more alternatively-spliced exon cassettes.
  • transgene comprising an alternatively-spliced exon cassette may, in some embodiments, comprise any one or more of the following components: an alternatively-spliced exon, an intron (e.g., a flanking intron), an exon comprising a coding region of interest, and/or a constitutive exon.
  • transgene comprising an alternatively-spliced exon cassette comprises an alternatively-spliced exon, a flanking intron, and an exon comprising a coding region of interest (wherein, in some embodiments, the coding region of interest may be split into portions across two or more exons).
  • a nucleic acid vector or transgene comprises an alternatively-spliced exon cassette, wherein the alternatively-spliced exon cassette comprises, among other components, at least one alternatively-spliced exon.
  • the alternatively-spliced exon cassette comprises 1, 2, 3, or 4 alternatively-spliced exons.
  • the alternatively-spliced exon cassette comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
  • the alternatively-spliced exon is synthetic or recombinant. In some embodiments, the alternatively-spliced exon is considered to be synthetic or recombinant because it undergoes one or more nucleic acid modifications, relative to the wild-type alternatively-spliced exon.
  • a nucleic acid modification may be a substitution or deletion of one or more nucleotides that form the nucleic acid sequence of the alternatively-spliced exon.
  • an alternative exon comprises an ATG start codon at its 3’ end.
  • the “3’ end” comprises the 1, 2, or 3 nucleic acids lying at the 3’ end of the alternative exon.
  • a wild-type or naturally occurring alternative exon may comprise an ATG start codon at its 3’ end.
  • the alternative exon may comprise nucleic acid modifications unrelated to the insertion of a heterologous start codon at the 3’ end of the alternative exon.
  • a wild-type or naturally occurring alternative exon may not comprise an ATG start codon at its 3’ end.
  • modifications are made to the 3’ end of the alternative exon to introduce a heterologous start codon, such that when the alternative exon is spliced-in or retained in the spliced transcript, the downstream coding sequence is translated as a full-length protein.
  • 1, 2, or 3 nucleic acid substitutions may be necessary in order to introduce the heterologous ATG start codon to the 3’ end of the alternative exon, depending on the sequence which is present at the 3’ end of the wild-type or naturally occurring alternative exon.
  • the 3’ end of the alternatively-spliced exon comprises 1 nucleotide substitution, relative to the wild-type alternatively-spliced exon, to form the ATG start codon.
  • the 3’ end of the alternatively-spliced exon comprises 2 nucleotide substitutions, relative to the wild-type alternatively-spliced exon, to form the ATG start codon.
  • the 3’ end of the alternatively-spliced exon comprises 3 nucleotide substitutions, relative to the wild-type alternatively-spliced exon, to form the ATG start codon.
  • the modification comprises the insertion of a heterologous start codon or part of a heterologous start codon at the 3’ end of the alternatively-spliced exon (e.g., 1-3 nucleic acids are added to the 3’ end of the alternatively-spliced exon, rather than substituted, to form an ATG start codon).
  • an alternative exon comprises part of an ATG start codon at its 3’ end.
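The substitution counts described above (1, 2, or 3 substitutions to form an ATG at the 3’ end) can be illustrated with a minimal sketch. The function name `substitutions_to_atg` and the example sequences are hypothetical and are not taken from the disclosure; the sketch only compares the last three nucleotides of an exon with "ATG".

```python
def substitutions_to_atg(exon: str) -> int:
    """Count nucleotide substitutions needed so the exon's last 3 nt read ATG.

    Hypothetical helper for illustration; sequences are DNA strings (A/C/G/T).
    """
    tail = exon.upper()[-3:]
    if len(tail) < 3:
        raise ValueError("exon must be at least 3 nucleotides long")
    # Compare each of the last three positions with the target codon.
    return sum(1 for observed, target in zip(tail, "ATG") if observed != target)


if __name__ == "__main__":
    for seq in ("CCCATG", "CCCATC", "CCCTTC", "CCCCCC"):
        print(seq, substitutions_to_atg(seq))
```

A result of 0 corresponds to an exon that already ends in ATG; results of 1 to 3 correspond to the substitution embodiments listed above.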
  • an alternative exon may comprise, for example, “A” as the last nucleic acid, or “AT” as the last two nucleic acids, which formulate the 3’ end of the alternative exon.
  • the remainder of the ATG start codon may lie at the 5’ end of an exon lying immediately downstream of the alternative exon.
  • the alternative exon may comprise “A” as the last nucleic acid which formulates the 3’ end of the alternative exon, and the exon lying immediately downstream of the alternative exon may comprise “TG” as the first two nucleic acids which formulate the 5’ end of the downstream exon.
  • the alternative exon may comprise “AT” as the last two nucleic acids which formulate the 3’ end of the alternative exon
  • the exon lying immediately downstream of the alternative exon may comprise “G” as the first nucleic acid which formulates the 5’ end of the downstream exon.
  • the ATG formed as a result of the splicing together of the alternative exon and the exon lying immediately downstream of the alternative exon initiates translation of the exon lying immediately downstream of the alternative exon.
  • the exon lying immediately downstream of the alternative exon may be, for example, the coding region of the transgene (e.g., an MTM1 coding region).
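The split start codon arrangement described above ("A" + "TG", or "AT" + "G", across the exon-exon junction) can be checked with a short sketch. The function name and the toy sequences are assumptions made for illustration only.

```python
def junction_forms_start_codon(alt_exon: str, downstream_exon: str) -> bool:
    """Return True if splicing the two exons together creates an ATG that
    spans the junction (last 1 or 2 nt of the alternative exon plus the
    first 2 or 1 nt of the downstream exon).

    Hypothetical helper; it does not detect an ATG lying wholly in one exon.
    """
    alt = alt_exon.upper()
    down = downstream_exon.upper()
    for k in (1, 2):  # k = nucleotides contributed by the alternative exon
        if len(alt) >= k and len(down) >= 3 - k:
            if alt[-k:] + down[:3 - k] == "ATG":
                return True
    return False


if __name__ == "__main__":
    print(junction_forms_start_codon("CCA", "TGGCC"))   # "A" + "TG"
    print(junction_forms_start_codon("CCAT", "GGCC"))   # "AT" + "G"
    print(junction_forms_start_codon("CCC", "TGG"))     # no ATG at junction
```

If the alternative exon is skipped, neither half alone supplies a complete start codon at this position, which is the property the split design relies on.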
  • an alternative exon comprises an ATG start codon, or part of an ATG start codon, within the nucleic acid sequence of the alternative exon (e.g., not at the 3’ end of the alternative exon).
  • the ATG start codon is in the same reading frame as the coding region of interest.
  • the ATG start codon is within up to 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides upstream of the 3’ end of the alternative-spliced exon.
  • the ATG start codon is within 4-6, 5-7, 6-8, 7-9, 8-10, 9-11, 10-12, 13-15, 14-16, 15-17, 16-18, 17-19, 18-20, 19-21, 20-22, 21-23, 22-24, 23-25, 24-26, 25-27, 26-28, 27-29, or 28-30 nucleotides upstream of the 3’ end of the alternative-spliced exon.
  • the ATG start codon is within 4-12, 8-16, 12-20, 16-24, or 20-30 nucleotides upstream of the 3’ end of the alternative-spliced exon.
  • the ATG start codon is within up to 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides upstream of the 3' end of the alternative-spliced exon and is in the same reading frame as the coding region of interest.
  • the ATG start codon is within 4-6, 5-7, 6-8, 7-9, 8-10, 9-11, 10-12, 13-15, 14-16, 15-17, 16-18, 17-19, 18-20, 19-21, 20-22, 21-23, 22-24, 23-25, 24-26, 25-27, 26-28, 27-29, or 28-30 nucleotides upstream of the 3’ end of the alternative-spliced exon and is in the same reading frame as the coding region of interest.
  • the ATG start codon is within 4-12, 8-16, 12-20, 16-24, or 20-30 nucleotides upstream of the 3’ end of the alternative-spliced exon and is in the same reading frame as the coding region of interest.
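As a rough sketch of the positional embodiments above, the following locates ATG triplets near the 3’ end of an exon and reports whether each would be in frame with a downstream coding region. Two conventions here are assumptions, not statements from the disclosure: distance is counted from the "A" of the ATG through the last nucleotide of the exon (so an ATG at the very 3’ end has distance 3), and the coding region's reading frame is taken to begin at the first nucleotide of the downstream exon.

```python
from typing import List, Tuple


def find_start_codons(exon: str, max_distance: int = 30) -> List[Tuple[int, bool]]:
    """List (distance, in_frame) for each ATG within max_distance of the 3' end.

    distance: nucleotides from the A of the ATG through the exon's last
              nucleotide (assumed convention).
    in_frame: True when distance is a multiple of 3, i.e. the downstream
              exon would start on a codon boundary (assumed convention).
    """
    seq = exon.upper()
    hits: List[Tuple[int, bool]] = []
    pos = seq.find("ATG")
    while pos != -1:
        distance = len(seq) - pos
        if distance <= max_distance:
            hits.append((distance, distance % 3 == 0))
        pos = seq.find("ATG", pos + 1)
    return hits


if __name__ == "__main__":
    print(find_start_codons("CCATGAAA"))  # ATG followed by 3 nt
    print(find_start_codons("CCATGAA"))   # ATG followed by 2 nt
```

Other distance conventions (for example, counting from the "G") would shift every value by a constant, so the convention should be fixed before comparing against the ranges recited above.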
  • the alternative exon comprises 1, 2, or 3 nucleic acid substitutions at the 3’ end to result in a heterologous ATG start codon (e.g., if the wild-type alternatively-spliced exon does not comprise an ATG start codon at its 3’ end)
  • the strength of the 5’ splice site of the alternative exon may be diminished, relative to the strength of the 5’ splice site of the wild-type or naturally occurring alternative exon.
  • the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 1-5 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of naturally occurring or wild-type alternative exon. In some embodiments, the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 1 nucleotide substitution, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of naturally occurring or wild-type alternative exon.
  • the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 2 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of naturally occurring or wild-type alternative exon. In some embodiments, the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 3 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of naturally occurring or wild-type alternative exon.
  • the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 4 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of naturally occurring or wild-type alternative exon. In some embodiments, the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 5 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of naturally occurring or wild-type alternative exon. In some embodiments, the 1-5 nucleotide substitutions restore or partially restore the strength of the 5’ splice site of the alternative exon, relative to the strength of the 5’ splice site of the naturally occurring or wild-type alternative exon.
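The 1-5 substitution embodiments above amount to a position-by-position comparison of the first 10 intronic nucleotides. A minimal sketch follows; the function name and the toy intron sequences are hypothetical, and the sketch counts substitutions only (it does not score splice site strength).

```python
def count_intron_substitutions(wild_type_intron: str,
                               modified_intron: str,
                               window: int = 10) -> int:
    """Count mismatches in the first `window` nucleotides of two introns.

    Hypothetical helper; assumes substitutions only (no insertions/deletions),
    so the two windows can be compared position by position.
    """
    wt = wild_type_intron.upper()[:window]
    mod = modified_intron.upper()[:window]
    if len(wt) < window or len(mod) < window:
        raise ValueError("both introns must be at least `window` nt long")
    return sum(1 for a, b in zip(wt, mod) if a != b)


if __name__ == "__main__":
    wt = "GTAAGTATCCTT"
    print(count_intron_substitutions(wt, "GTAAGAATCCAA"))  # changes past nt 10 ignored
    print(count_intron_substitutions(wt, "GTCCGTATCCTT"))
```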
  • the modification comprises disrupting or deleting all native start codons located 5' to the heterologous start codon.
  • the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon, all native start codons located 5' to the heterologous start codon of the 5'-most alternatively-spliced exon are disrupted or deleted.
  • the modification comprises introducing into the alternatively-spliced exon a heterologous, in-frame stop codon at least 50 nucleotides upstream of the next 5' splice junction.
  • the alternatively-spliced exon is a nonsense-mediated decay (NMD) exon.
  • the NMD exon comprises an in-frame stop codon that is at least 50 nucleotides upstream of the next 5’ splice junction.
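The positional criterion above (an in-frame stop codon at least 50 nucleotides upstream of the next 5’ splice junction) can be sketched as a simple check. The names below are hypothetical, and two conventions are assumptions: the exon's 3’ end is treated as the next 5’ splice junction, and distance is measured from the nucleotide after the stop codon to that end.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(exon: str, frame: int = 0) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon, or None.

    `frame` is the offset (0, 1, or 2) of the first complete codon in the exon.
    """
    seq = exon.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def satisfies_50_nt_rule(exon: str, frame: int = 0, min_distance: int = 50) -> bool:
    """True if an in-frame stop lies at least min_distance nt upstream of the
    exon's 3' end (taken here as the next 5' splice junction; an assumption)."""
    stop = first_in_frame_stop(exon, frame)
    if stop is None:
        return False
    nucleotides_after_stop = len(exon) - (stop + 3)
    return nucleotides_after_stop >= min_distance


if __name__ == "__main__":
    long_tail = "GCCGCC" + "TAA" + "GCC" * 17   # 51 nt after the stop
    short_tail = "GCCGCC" + "TAA" + "GCC" * 10  # 30 nt after the stop
    print(satisfies_50_nt_rule(long_tail), satisfies_50_nt_rule(short_tail))
```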
  • the alternatively-spliced exon is considered to be synthetic when it is situated non-naturally (e.g., is linked to a coding sequence to which it would not be linked in wild-type or naturally-occurring conditions), relative to the wild-type alternatively-spliced exon (e.g., is heterologous).
  • the alternatively-spliced exon is considered to be synthetic when it (i) undergoes one or more nucleic acid modifications, and (ii) is situated non-naturally, relative to the wild-type alternatively-spliced exon.
  • the alternatively-spliced exon is a regulatory exon.
  • the regulatory exon is an alternatively regulated exon (e.g., an exon known to be subject to alternative splicing mechanisms).
  • alternative splicing is a process by which exons or portions of exons or noncoding regions within a pre-mRNA transcript are differentially joined or skipped, resulting in multiple protein isoforms being encoded by a single gene.
  • the regulation of alternative splicing is complex. Briefly, alternative splicing is known to be regulated by the functional coupling between transcription and splicing.
  • compositions and methods of the present disclosure utilize the naturally-occurring mechanisms which regulate alternative splicing to express coding regions of interest (e.g., what would be alternatively spliced isoforms in the natural context) in specific biological conditions.
  • additional genetic elements may be incorporated into the DNA.
  • such additional genetic elements may become incorporated into the corresponding pre-mRNA, and may consequently influence, control, or otherwise regulate the splicing of the pre-mRNA to form one or more mRNA isoforms.
  • an alternatively-spliced exon, for which splicing may be regulated, is an exon for which splicing levels differ by at least 5%, for example at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% under two different conditions (e.g., in different tissues, in response to intracellular T cell levels, in response to intracellular levels of one or more RNA binding proteins, in the context of an autoregulated gene, etc.).
  • splicing levels differ by 5% it is meant that the splicing levels for an exon of interest are measured in two different conditions, and the splicing level is compared between the conditions and expressed as a percentage change. For example, if the splicing level in condition A is 80%, and the splicing level in condition B is 85%, the splicing levels between conditions A and B differ by 5%. Likewise, if the splicing level in condition A is 80%, and the splicing level in condition B is 75%, the splicing levels between conditions A and B also differ by 5%.
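The worked examples above (80% vs. 85%, and 80% vs. 75%, each differing by 5%) correspond to an absolute difference in percentage points. A minimal sketch, with hypothetical function names and a default threshold taken from the 5% figure in the text:

```python
def splicing_level_difference(level_a: float, level_b: float) -> float:
    """Absolute difference, in percentage points, between two splicing
    levels, each expressed as a percentage from 0 to 100."""
    for level in (level_a, level_b):
        if not 0.0 <= level <= 100.0:
            raise ValueError("splicing levels must be between 0 and 100")
    return abs(level_a - level_b)


def differs_by_at_least(level_a: float, level_b: float,
                        threshold: float = 5.0) -> bool:
    """True if the two conditions differ by at least `threshold` points."""
    return splicing_level_difference(level_a, level_b) >= threshold


if __name__ == "__main__":
    print(splicing_level_difference(80, 85))  # condition A vs. condition B
    print(splicing_level_difference(80, 75))
    print(differs_by_at_least(80, 82))
```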
  • the step of calculating a difference in expression of certain isoforms of certain genes in certain conditions as described herein is performed by calculating a percent spliced-in (psi) score.
  • a psi (Ψ) score is a value between 0 and 1 (e.g., 0.01, 0.02, 0.03,
  • the Ψ score is calculated (e.g., calculated from RNAseq reads) by dividing the number of inclusion reads (e.g., the number of alternative splicing events for a gene of interest) by the total number of inclusion reads and exclusion reads (e.g., the number of normal (e.g., non-alternative) splicing events for the gene of interest). Therefore, in some embodiments the Ψ score is calculated according to the following formula for the gene of interest: Ψ = (number of inclusion reads) / (number of inclusion reads + number of exclusion reads).
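The calculation described above (inclusion reads divided by the sum of inclusion and exclusion reads) can be sketched as follows. The function name and read counts are hypothetical; this is a plain ratio and does not reproduce the statistical modeling performed by tools such as MISO.

```python
def psi_score(inclusion_reads: int, exclusion_reads: int) -> float:
    """Percent spliced-in: inclusion / (inclusion + exclusion), in [0, 1]."""
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("at least one read is required to compute psi")
    return inclusion_reads / total


if __name__ == "__main__":
    # Hypothetical read counts for one exon in one sample.
    print(psi_score(75, 25))
    print(psi_score(30, 70))
```

Read counts of zero in both categories are rejected rather than returned as 0, since no estimate can be made without reads.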
  • the calculating comprises performing a mixture of isoforms (MISO) analysis.
  • MISO analysis provides an estimate of isoform expression levels within a sample (e.g., a sample comprising a tissue of interest) based on a statistical model and assesses confidence in those estimates.
  • MISO analysis is performed using MISO software (see, e.g., Katz, Y., E. T. Wang, et al. (2010), Analysis and design of RNA sequencing experiments for identifying isoform regulation, Nat Methods 7(12): 1009-1015).
  • a Ψ score higher than (>) 0.50, for example 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.70, 0.71,
  • a Ψ score lower than (<) 0.50 indicates that a lower number of alternative splicing events for the gene of interest are present in the tested sample than the number of regular splicing events.
  • delta psi (ΔΨ) score is used to refer to the calculation of the difference between two Ψ scores for a single gene of interest (e.g., in different tissues, in different intracellular conditions, etc.).
  • the difference between the two calculated Ψ scores is the ΔΨ score.
  • a Ψ score may be any value between 0 and 1, as described herein, a ΔΨ score (that is, the difference between the two calculated Ψ scores) may also be any value between 0 and 1 (e.g., 0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26,
  • a ΔΨ score may be expressed as an absolute value, where the absolute value of, e.g., -0.1 is 0.1.
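The delta psi calculation described above is a subtraction of two psi scores, optionally reported as an absolute value. A minimal sketch with hypothetical names and example values:

```python
def delta_psi(psi_a: float, psi_b: float, absolute: bool = True) -> float:
    """Difference between two psi scores for the same exon.

    With absolute=True the sign is dropped, so a raw difference of -0.1 is
    reported as 0.1.
    """
    for psi in (psi_a, psi_b):
        if not 0.0 <= psi <= 1.0:
            raise ValueError("psi scores must lie between 0 and 1")
    difference = psi_a - psi_b
    return abs(difference) if absolute else difference


if __name__ == "__main__":
    # Hypothetical psi scores for one exon in two tissues.
    print(round(delta_psi(0.2, 0.3), 2))
    print(round(delta_psi(0.2, 0.3, absolute=False), 2))
```

Results are rounded for display because floating-point subtraction of decimal fractions is not exact.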
  • the alternatively-spliced exon is a tissue-specific alternatively-spliced exon.
  • one or more tissue-specific alternatively-spliced exons are included in a recombinant nucleic acid (e.g., in a rAAV).
  • tissue-specific alternatively-spliced exons are described in Supplemental Table S5 from Wang, E. T., et al., (2008), Nature, 456, 470-76, incorporated herein by reference.
  • Other tissue-specific exons can be identified from transcriptome data.
  • RNA sequence motifs that can exhibit tissue-specific activity, thereby controlling the inclusion or exclusion of tissue-specific exons, are described in Badr, E., et al., (2016), PLOS One, 11(11): e0166978, incorporated herein by reference.
  • alternative splicing of the tissue-specific exon results in the expression of the transgene (e.g., of the product encoded by the coding region of interest) in heart tissue, but not in skeletal tissue.
  • alternative splicing of the tissue-specific exon results in the expression of the transgene (e.g., of the product encoded by the coding region of interest) in skeletal tissue, but not in heart tissue.
  • a tissue-specific alternatively-spliced exon comprises an alternatively-spliced exon from any one or more of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
  • the tissue-specific alternatively-spliced exon is or is derived from exon 11 of BIN1.
  • the tissue-specific alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 37.
  • the tissue-specific alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 37. In some embodiments, the tissue-specific alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 38. In some embodiments, the tissue-specific alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 38.
  • an alternatively-spliced exon is an immunoresponsive alternatively-spliced exon (e.g., undergoes alternative splicing in the presence of an enhanced immune response, such as an increased T cell presence).
  • the immunoresponsive alternatively-spliced exon is alternatively spliced in states of cellular inflammation.
  • the immunoresponsive alternatively-spliced exon is alternatively spliced when an abnormally elevated quantity of T cells is present in the intracellular environment (e.g., more T cells are present than under homeostatic conditions).
  • an immunorepressive alternatively-spliced exon comprises an alternatively-spliced exon from any one of ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7, CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1,
  • an alternatively-spliced exon is a cell type-specific alternatively-spliced exon (e.g., undergoes alternative splicing only when located in certain cell types).
  • a cell type-specific alternatively-spliced exon comprises an alternatively-spliced exon as described in Joglekar, et al. (2021), A spatially resolved brain region- and cell type-specific isoform atlas of the postnatal mouse brain. Nature Comm., 12(463), which is incorporated herein by reference with respect to its description of cell type-specific alternative exons.
  • an alternatively-spliced exon is alternatively spliced in cells which exhibit high levels of expression of a particular RNA or protein. In some embodiments, an alternatively-spliced exon is alternatively spliced in cells which exhibit low levels of expression of a particular RNA or protein. High or low expression of a particular protein may in some embodiments be indicative of a disease state. For example, in some forms of frontotemporal dementia, MAPT exon 10 is aberrantly included, leading to increased levels of the 4R vs. 3R isoform. Increased 4R isoform is associated with neurodegeneration.
  • an alternatively-spliced exon is alternatively spliced in cells which exhibit disease (e.g., severe disease).
  • disease comprises Dentatorubral-pallido-luysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich’s ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type 1, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebell
  • an alternatively-spliced exon comprises an exon which may be differentially spliced depending on the intracellular level of the RNA or protein encoded by the coding region associated with the alternatively-spliced exon.
  • an alternatively-spliced exon comprises an alternatively-spliced exon comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 23-44.
  • an alternatively-spliced exon comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 23-44.
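The percent identity thresholds above can be illustrated with a deliberately simplified sketch. It assumes the two sequences are already aligned and of equal length with no gaps; in practice, sequence identity is computed from a pairwise alignment, which this sketch does not perform. The function name and the toy sequences are hypothetical and are not any of the recited SEQ ID NOs.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned
    sequences (simplifying assumption: no gaps, no alignment step)."""
    a, b = seq_a.upper(), seq_b.upper()
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100 * matches / len(a)


if __name__ == "__main__":
    reference = "ACGTACGTAC"   # toy reference, not a sequence from the disclosure
    candidate = "ACGTACGTAA"   # one mismatch in ten positions
    identity = percent_identity(reference, candidate)
    print(identity, identity >= 90)
```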
  • the alternatively-spliced exon is retained in the spliced transcript.
  • Retention of the alternatively-spliced exon in the spliced transcript occurs under the alternative splicing conditions specific to said alternatively-spliced exon as described herein.
  • the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon
  • the 5'-most alternatively-spliced exon is retained in the spliced transcript.
  • the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon
  • the 3'-most alternatively-spliced exon is included in the spliced transcript.
  • all alternatively-spliced exons are included in the spliced transcript.
  • retention of the alternatively-spliced exon in the spliced transcript results in the productive expression of the transgene (e.g., productive translation of the protein).
  • Expression of the product (e.g., therapeutic protein) encoded by the coding region of interest may in some embodiments be desirable.
  • expression of myotubularin 1 is depleted in skeletal muscle, and therefore restoration of myotubularin 1 in skeletal muscle is desirable.
  • expression of the product (e.g., therapeutic protein) encoded by the coding region of interest may be undesirable.
  • in myotubular myopathy expression of myotubularin 1 in the heart may be undesirable.
  • retention of the alternatively-spliced exon in the spliced transcript does not result in the productive expression of the transgene (e.g., no transcription of the RNA and/or no productive translation of the protein).
  • the alternatively-spliced exon is located 5' to the coding region of the transgene. In some embodiments, the alternatively-spliced exon is located 3' to the coding region of the transgene. In some embodiments, the alternatively-spliced exon is located within the coding region of the transgene. In some embodiments, the alternatively-spliced exon is not located within the coding region of the transgene. In some embodiments, the alternatively-spliced exon is located 3' to a constitutive exon. In some embodiments, the alternatively-spliced exon is located 5' to a constitutive exon.
(ii) Constitutive exons
  • the recombinant viral genomes of the present disclosure comprise one or more constitutive exons.
  • the alternatively-spliced exon and the one or more constitutive exons may be configured as a cassette (e.g., comprised within a transgene).
  • the transgene comprising an alternatively-spliced exon cassette comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 constitutive exons.
  • one or more constitutive exons may comprise a coding region of interest, or a portion thereof.
  • the constitutive exon is considered to be constitutive when it is present in all isoforms of spliced mRNAs resulting from the splicing of a pre-mRNA transcript.
  • a constitutive exon may in some embodiments be synthetic, but it need not be.
  • a constitutive exon may be considered synthetic because it undergoes one or more nucleic acid modifications, relative to the wild-type constitutive exon.
  • a nucleic acid modification may be a substitution or deletion of one or more nucleotides that form the nucleic acid sequence of the constitutive exon.
  • the modification comprises disrupting or deleting all native start codons located within the constitutive exon.
  • the constitutive exon is considered to be synthetic when it is situated non-naturally (e.g., is linked to a coding sequence to which it would not be linked in wild-type or naturally-occurring conditions), relative to the wild-type constitutive exon (e.g., is heterologous).
  • the constitutive exon is considered to be synthetic when it (i) undergoes one or more nucleic acid modifications, and (ii) is situated non-naturally, relative to the wild-type constitutive exon.
  • the constitutive exon is naturally occurring (e.g., does not comprise any nucleic acid modifications, relative to the wild-type constitutive exon).
  • the constitutive exon is a native exon associated with the coding region of the transgene.
  • the constitutive exon is from or is derived from the same gene as the alternatively-spliced exon.
  • the constitutive exon is from or is derived from a constitutive exon of a gene selected from the group consisting of: MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP.
  • a gene selected from the group consisting of: MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K
  • TRIM32, FKRP, FXN, POMT1, FKTN, POMT2, POMGnT1, DAG1, ANO5, PLEC1, TRAPPC11, GMPPB, ISPD, LIMS2, POPDC1, TOR1AIP1, POGLUT2, LAMA2, COL6A1, POMT1, POMT2, DUX4, EMD, PAX7, PMP22, MPZ, MFN2, SMCHD1, SMN, Lamin A/C (LMNA), and/or GJB1.
  • the constitutive exon is from or is derived from a constitutive exon of a gene(s) selected from the group consisting of: ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7, CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, L
  • the constitutive exon is from or is derived from a constitutive exon of a gene(s) selected from the group consisting of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
  • the constitutive exon is from or is derived from a constitutive exon of SMN1 .
  • the constitutive exon is from or is derived from exon 6 of SAINI.
  • the constitutive exon which is derived from SAINI exon 6 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SAINI exon 6.
  • the constitutive exon which is derived from SAINI exon 6 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 102.
  • the constitutive exon which is derived from SMN1 exon 6 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 102.
  • the constitutive exon is not a native exon associated with the coding region of the transgene. In some embodiments, the constitutive exon is not from nor is derived from the same gene as the alternatively-spliced exon.
  • a constitutive exon is located 5' to the alternatively-spliced exon. Additionally or alternatively, in some embodiments a constitutive exon is located 3' to the alternatively-spliced exon. In some embodiments, a constitutive exon is located 5' to the coding region of the transgene. Additionally or alternatively, in some embodiments a constitutive exon is located 3' to coding region of the transgene.
  • the constitutive exon is retained in the spliced transcript (e.g., spliced in). In some embodiments, wherein the transgene comprising an alternatively-spliced exon cassette comprises more than one constitutive exon, the 5'-most constitutive exon is retained in the spliced transcript. In some embodiments, wherein the transgene comprising an alternatively-spliced exon cassette comprises more than one constitutive exon, the 3'-most constitutive exon is retained in the spliced transcript. In some embodiments, wherein the transgene comprising an alternatively-spliced exon cassette comprises more than one constitutive exon, all constitutive exons are retained in the spliced transcript. In some embodiments, the constitutive exon is excluded from the spliced transcript (e.g., spliced out).
  • the recombinant viral genomes of the present disclosure comprise one or more introns.
  • the alternatively-spliced exon and the one or more introns (or portions thereof) may be configured as a cassette.
  • a nucleic acid (e.g., a nucleic acid comprising a recombinant viral genome)
  • an alternatively-spliced exon cassette is an RNA molecule (e.g., a pre-mRNA) that contains one or more (e.g., two or more) recombinant (e.g., engineered; e.g., truncated) introns flanking one or more exons.
  • an alternatively-spliced exon cassette is a DNA molecule that encodes the RNA molecule containing one or more recombinant (e.g., engineered; e.g., truncated) introns.
  • a transgene comprising an alternatively-spliced exon cassette contains other regulatory sequences (e.g., promoters, 5’ or 3’ UTRs, or other regulatory sequences) in addition to the gene coding (e.g., protein coding) sequences and the at least one recombinant (e.g., engineered, e.g., truncated) intron for which splicing can be regulated, as described elsewhere herein.
  • a recombinant viral genome of the present disclosure comprises a transgene comprising an alternatively-spliced exon cassette, wherein the alternatively-spliced exon cassette comprises among other components at least one intron (or portion thereof).
  • the intron is a flanking intron (or portion thereof).
  • the alternatively-spliced exon cassette comprises 1, 2, 3, 4, 5, 6, 7, or 8 flanking introns (or portion(s) thereof).
  • an exon (e.g., an alternatively-spliced exon, or a constitutive exon) is flanked by one or more introns (e.g., flanking introns).
  • an alternatively-spliced exon is flanked by one or more introns (or portion(s) thereof).
  • an alternatively-spliced exon is flanked by one intron (or portion thereof).
  • the flanking intron (or portion thereof) is located 3' to the alternatively-spliced exon.
  • the flanking intron (or portion thereof) is located 5' to the alternatively-spliced exon.
  • an alternatively-spliced exon is flanked by two introns (or portions thereof).
  • the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon
  • each alternatively-spliced exon is flanked by at least one, and in some embodiments two, flanking intron(s) (or portion(s) thereof).
  • an intron is a native flanking intron or native flanking intronic sequence of the alternatively-spliced exon.
  • an intron is not a native flanking intron or native flanking intronic sequence of the alternatively-spliced exon.
  • a constitutive exon is flanked by one or more introns (or portion(s) thereof). In some embodiments, a constitutive exon is flanked by one intron (or portion thereof). In some embodiments, wherein the constitutive exon is flanked by one intron, the flanking intron (or portion thereof) is located 3' to the constitutive exon. In some embodiments, wherein the constitutive exon is flanked by one intron, the flanking intron (or portion thereof) is located 5' to the constitutive exon. In some embodiments, a constitutive exon is flanked by two introns (or portions thereof).
  • each constitutive exon is flanked by at least one, and in some embodiments two, flanking intron(s) (or portion(s) thereof).
  • an intron is a native flanking intron or native flanking intronic sequence of the constitutive exon. In some embodiments, an intron is not a native flanking intron or native flanking intronic sequence of the constitutive exon.
  • an intron is a natural intron, and comprises no modifications, relative to a native intron.
  • An intron or intronic sequence may in some embodiments be synthetic, but it need not be.
  • a synthetic intron or intronic sequence may be considered synthetic because it undergoes one or more nucleic acid modifications, relative to the wild-type or native intron.
  • a nucleic acid modification may be a substitution or deletion of one or more nucleotides that form the nucleic acid sequence of the intron or intronic sequence.
  • an intron or intronic sequence is considered to be synthetic when it is situated non-naturally (e.g., is linked to an exon to which it would not be linked in wild-type or naturally-occurring conditions), relative to the wild-type intron or intronic sequence (e.g., is heterologous).
  • the intron or intronic sequence is considered to be synthetic when it (i) undergoes one or more nucleic acid modifications, and (ii) is situated non-naturally, relative to the wild-type intron or intronic sequence.
  • an intron (e.g., a flanking intron (or portion thereof)) comprising one or more nucleic acid modifications, relative to the wild-type intron, is an engineered intron or intronic sequence.
  • the engineered intron or intronic sequence comprises a splice donor and splice acceptor site, and a functional branch point to which the splice donor site can be joined in the first trans-esterification reaction of splicing.
  • an intron e.g., a flanking intron
  • intronic sequence comprising one or more nucleic acid modifications, relative to the wild-type intron
  • by “truncated version of a natural intron” it is meant that the naturally-occurring, full-length intron is shortened (e.g., truncated) via the removal of nucleotides.
  • an engineered (e.g., recombinant) intron or intronic sequence is a truncated version of a natural intron.
  • an engineered intron or intronic sequence can be designed to include functional splice donor and acceptor sites and a functional branch point in addition to one or more regulatory regions that are derived from different introns, or that are non-naturally occurring sequences (e.g., sequence variants of naturally-occurring sequences, consensus sequences, or de novo designed sequences).
  • an engineered intron or intronic sequence is not a truncated version of a naturally occurring intron, but contains one or more sequences from a naturally occurring intron.
  • an intron (e.g., a flanking intron (or portion thereof)) comprising one or more nucleic acid modifications, relative to the wild-type intron, is truncated at its 5’ end.
  • 1-10,000 nucleotides are truncated from the 5’ end (e.g., 1-50, 50-100, 100-500, 500-1,000, 1,000-5,000, 5,000-10,000, 10,000-20,000, 20,000-50,000, or 50,000-100,000 nucleotides are truncated from the 5’ end).
  • the 5’ splice site is not retained in the truncated intron (or portion thereof).
  • the 5’ splice site is retained in the truncated intron (or portion thereof).
  • a different 5’ splice site is included in the truncated intron (or portion thereof).
  • an intron (e.g., a flanking intron (or portion thereof)) comprising one or more nucleic acid modifications, relative to the wild-type intron, is truncated at its 3’ end.
  • 1-10,000 nucleotides are truncated from the 3’ end (e.g., 1-50, 50-100, 100-500, 500-1,000, 1,000-5,000, 5,000-10,000, 10,000-20,000, 20,000-50,000, or 50,000-100,000 nucleotides are truncated from the 3’ end).
  • the 3’ splice site is not retained in the truncated intron (or portion thereof).
  • the 3’ splice site is retained in the truncated intron (or portion thereof).
  • a different 3’ splice site is included in the truncated intron (or portion thereof).
  • an intron (e.g., a flanking intron (or portion thereof)) comprising one or more nucleic acid modifications, relative to the wild-type intron, is truncated at one or more internal locations.
  • 1-10,000 internal nucleotides are removed (e.g., 1-50, 50-100, 100-500, 500-1,000, 1,000-5,000, 5,000-10,000, 10,000-20,000, 20,000-50,000, or 50,000-100,000 internal nucleotides are removed).
  • the splice regulatory region is not retained in the truncated intron (or portion thereof).
  • the splice regulatory region is retained in the truncated intron (or portion thereof).
  • a different splice regulatory region is included in the truncated intron (or portion thereof).
  • an intron (e.g., a flanking intron (or portion thereof)) comprising one or more nucleic acid modifications, relative to the wild-type intron, comprises one or more 5’, 3’, and/or internal deletions.
  • the extent of truncation may depend on the size of the intron (or portion thereof) and the size of the gene. A truncation may require removal of sufficient intronic sequence to result in a recombinant gene construct that is small enough to be packaged in a recombinant virus of interest (e.g., in a recombinant AAV or lentivirus).
  • an intron typically includes one or more sequences required for efficient splicing and/or regulated splicing.
  • an intron or intronic sequence comprises one or more splice junction sites (e.g., a 5’ splice donor site, and/or a 3’ splice acceptor site).
  • an intron or intronic sequence retains a splice donor site (e.g., towards the 5' end of the intron or intronic sequence), a branch site (e.g., towards the 3' end of the intron or intronic sequence), a splice acceptor site (e.g., at the 3' end of the intron or intronic sequence), and a splice regulatory sequence.
  • the intron or intronic sequence comprises a 5’ splice donor site.
  • the 5’ splice donor site is a GU or an AU.
  • the intron or intronic sequence comprises a 3’ splice acceptor site.
  • the 3’ splice acceptor site is an AG or an AC.
  • an intron or intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site.
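The terminal dinucleotides named above (GU or AU at the 5’ end; AG or AC at the 3’ end) can be checked programmatically when designing a truncated intron. The following is a minimal sketch only; the function name `splice_site_summary` and the example sequence are hypothetical and do not come from the disclosure, and the check looks only at the first and last two nucleotides, not at the branch point or any regulatory sequence.

```python
# Illustrative sketch: report whether an intron sequence begins and ends with
# the splice-site dinucleotides listed in the text. Names are hypothetical.

DONOR_DINUCLEOTIDES = {"GU", "AU"}      # 5' splice donor options from the text
ACCEPTOR_DINUCLEOTIDES = {"AG", "AC"}   # 3' splice acceptor options from the text


def splice_site_summary(intron: str) -> dict:
    """Return the terminal dinucleotides of an intron and whether each matches.

    Accepts DNA or RNA letters; T is converted to U before comparison.
    """
    rna = intron.strip().upper().replace("T", "U")
    if len(rna) < 4:
        raise ValueError("intron must be at least 4 nucleotides long")
    donor, acceptor = rna[:2], rna[-2:]
    return {
        "donor": donor,
        "donor_ok": donor in DONOR_DINUCLEOTIDES,
        "acceptor": acceptor,
        "acceptor_ok": acceptor in ACCEPTOR_DINUCLEOTIDES,
    }
```

A truncation that removes either end of the intron would show up as `donor_ok` or `acceptor_ok` being false, which corresponds to the embodiments in which a splice site is not retained and a different one may need to be included.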
  • a regulatory sequence comprises a response element within an AG exclusion zone of the intron.
  • the intron or intronic sequence retains sequence motifs bound by the encoded protein (e.g., YGCY motifs for MBNL1, or GCAUG for RBFOX, or YCAY for NOVA, etc.).
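Whether a candidate intronic sequence retains the motifs mentioned above (YGCY for MBNL1, GCAUG for RBFOX, YCAY for NOVA) can be screened with a simple pattern search. This is a minimal sketch under stated assumptions: Y is read as the IUPAC pyrimidine code (C or U), overlapping matches are counted, and the names `MOTIFS` and `find_motifs` are hypothetical. A motif match does not by itself establish protein binding or splicing regulation.

```python
import re

# Illustrative sketch: locate candidate binding motifs in an intronic sequence.
# Y (pyrimidine) is expanded to the character class [CU].
MOTIFS = {
    "MBNL1 (YGCY)": "[CU]GC[CU]",
    "RBFOX (GCAUG)": "GCAUG",
    "NOVA (YCAY)": "[CU]CA[CU]",
}


def find_motifs(sequence: str) -> dict:
    """Return 0-based start positions of each motif, including overlapping hits."""
    rna = sequence.strip().upper().replace("T", "U")
    hits = {}
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead lets re.finditer report overlapping matches.
        hits[name] = [m.start() for m in re.finditer(f"(?=({pattern}))", rna)]
    return hits
```

Running `find_motifs` on a full-length intron and on its truncated version gives a quick way to see which motif positions were lost in the truncation.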
  • an intron or intronic sequence is spliced out, and is not included in the spliced transcript.
  • an intron or intronic sequence may include one or more human, non-human primate, and/or other mammalian or non-mammalian intron splice-regulatory sequences.
  • the regulatory sequences may have 80%-100% (e.g., 80%-85%, 85%-90%, greater than 90%, 90%-95%, or 95%-100%) sequence identity, relative to a wild-type regulatory sequence.
  • an intron or intronic sequence is approximately 50 to 4000 nucleotides long. In some embodiments, an intron or intronic sequence is approximately 50 to 100, 75-125, 100-150, 125-175, 200-250, 225-275, 300-350, 325-375, 400-450, 425-475, 500-550, 525-575, 600-650, 625-675, 700-750, 725-775, 800-850, 825-875, 900-950, 925-975, 950-1000, 1025-1075, 1050 to 1100, 1075-1125, 1100-1150, 1125-1175, 1200-1250, 1225-1275, 1300-1350, 1325-1375, 1400-1450, 1425-1475, 1500-1550, 1525-1575, 1600-1650, 1625-1675, 1700-1750, 1725-1775, 1800-1850, 1825-1875, 1900-1950, 1925-1975, 1950-2000, 2025-2075, 2050 to 2100, 2075-2125, 2100
  • an intron or intronic sequence is approximately 50-60, 55-65, 60-70, 65-75, 70-80, 75-85, 80-90, 95-105, 100-110, 105-115, 110-120, 115-125, 120-130, 125-135, 130-140, 135-145, 140-150, 145-155, 150-160, 155-165, 160-170, 165-175, 170-180, 175-185, 180-190, 185-195, or 190-200 nucleotides long, or any integer contained therein (e.g., 100, 101, 102, 103, 104, 105, etc.).
  • an intron or intronic sequence is approximately 50-80, 60-90, 70-100, 80-110, 90-120, 100-130, 110-140, 120-150, 130-160, 140-170, 150-180, 160-190, or 170-200 nucleotides long, or any integer contained therein (e.g., 120, 121, 122, 123, 124, 125, etc.).
  • a natural or wild-type intron is truncated or otherwise modified so as to retain only the sequence which regulates the up- or down-stream alternative exon.
  • said regulatory sequence is located within approximately 100-300 nucleotides upstream or downstream of the exon-intron (or intron-exon) border. In some embodiments, said regulatory sequence is located within approximately 100-110, 105-115, 110-120, 115-125, 120-
  • said regulatory sequence is located within approximately 100-130, 110-140, 120-150, 130-160, 140-170, 150-180, 160-190, 170-200, 210-240, 220-250, 230-260, 240-270, 250-280, 260-290, or 270-300 nucleotides upstream or downstream of the exon-intron (or intron-exon) border.
  • the only intron that is comprised within an alternatively-spliced exon cassette is a truncated regulated intron.
  • a regulated intron may in some embodiments be a regulated intron that flanks the alternative exon in its natural or wild-type context. In some embodiments, two regulated introns flank the alternative exon in its natural or wild-type context. A regulated intron may be located 5’ or 3’ relative to the alternative exon in its natural or wild-type context. In some embodiments, a regulated intron or truncated regulated intron is 5' relative to the alternative exon within an alternative exon cassette of the disclosure.
  • a regulated intron or truncated regulated intron is 3’ relative to the alternative exon within an alternative exon cassette of the disclosure.
  • two or more regulated introns are retained and truncated in an alternatively-spliced exon cassette.
  • the two or more truncated regulated introns flank the alternative exon within the alternative exon cassette.
  • all other (e.g., non-regulatory) introns and intronic sequences have been removed.
  • one or more of the other introns may be retained (and optionally truncated) depending on the size of the nucleic acid and the size limitations of the virus, respectively.
  • the only introns or intronic sequences in an alternatively-spliced exon cassette are truncated introns or intronic sequences (e.g., only one, 2, 3, 4, 5, 6, 7, 8, 9, 10 truncated introns or intronic sequences).
  • an alternatively-spliced exon cassette does not contain any full-length introns.
  • an alternatively-spliced exon cassette does not contain any truncated introns or intronic sequences that are not regulated.
  • the intron(s) or intronic sequence(s) flanking an alternative exon(s) comprise an intron or intronic sequence from or derived from a gene selected from the group consisting of: ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, I
  • the intron(s) or intronic sequence(s) flanking an alternative exon(s) comprise an intron or intronic sequence from or derived from a gene selected from the group consisting of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
  • the intron(s) or intronic sequence(s) flanking an alternative exon(s) is or is derived from an intron of BIN1. In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) is or is derived from intron 10 and/or intron 11 of BIN1.
  • the intron(s) or intronic sequence(s) flanking an alternative exon(s) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 15.
  • the intron(s) or intronic sequence(s) flanking an alternative exon(s) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 15.
  • the intron(s) or intronic sequence(s) flanking an alternative exon(s) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 16.
  • the intron(s) or intronic sequence(s) flanking an alternative exon(s) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 16.
  • the intron(s) or intronic sequence(s) flanking an alternative exon(s) comprise an intron or intronic sequence comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2121, 2130, 2141, or 2232-2233.
  • an intron or intronic sequence comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2121, 2130, 2141, or 2232-2233.
  • all the introns (or portion(s) thereof) and exons (or portion thereof) of an alternatively-spliced exon cassette are from the same gene.
  • Some embodiments of the present invention contemplate heterologous gene constructs, wherein introns (or portion(s) thereof) and exons (or portion(s) thereof) from different genes are integrated into a single alternatively-spliced exon cassette or transgene.
  • at least one intron (or portion thereof) and at least one exon (or portion thereof) of the nucleic acid construct are from different genes.
  • an intron (or portion thereof) and/or an exon (or portion thereof) is from or derived from a gene(s) which comprises any one or more of: MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, a microdystrophin-encoding gene, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, TPM2, TNNT2, CFL2, KBTBD13, KI
  • an intron (or portion thereof) and/or an exon (or portion thereof) is from or derived from a gene(s) which comprises any one or more of: ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3,
  • an intron (or portion thereof) and/or an exon (or portion thereof) is from or derived from a gene(s) which comprises any one or more of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
  • one or more introns (or portions thereof) and/or an exon (or portion thereof) is from or derived from BIN1.
  • the one or more introns (or portions thereof) is or is derived from an intron(s) of BIN1. In some embodiments, the one or more introns (or portions thereof) is or is derived from intron 10 and/or intron 11 of BIN1. In some embodiments, the one or more introns (or portions thereof) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 15.
  • the one or more introns (or portions thereof) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 15.
  • the one or more introns (or portions thereof) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 16.
  • the one or more introns (or portions thereof) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 16.
  • an exon (or portion thereof) is or is derived from exon 11 of BIN1.
  • the exon (or portion thereof) which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 37.
  • the exon (or portion thereof) which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 37.
  • the exon (or portion thereof) which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 38.
  • the exon (or portion thereof) which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 38.
  • the one or more introns (or portions thereof) and/or the exon (or portion thereof) which are from or derived from BIN1 together comprise an alternative exon cassette.
  • the alternative exon cassette (which comprises the one or more introns (or portions thereof) and/or the exon (or portion thereof) which are from or derived from BIN1) comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
  • the alternative exon cassette (which comprises the one or more introns (or portions thereof) and/or the exon (or portion thereof) which are from or derived from BIN1) comprises a polynucleotide having a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
  • an alternative exon cassette (e.g., which comprises the one or more introns (or portions thereof) and/or the exon (or portion thereof) which are from or derived from BIN1) is selected for inclusion in a transgene based on the psi values which the alternative exon cassette achieves in a specific tissue of interest (see, e.g., Table 4; Table 5).
  • the alternative exon cassette selected for inclusion in a transgene would be one wherein a high psi value is observed for skeletal tissue, and wherein a low psi value is observed for heart tissue (e.g., the Δ psi between skeletal tissue and heart tissue is large).
  • the alternative exon cassette selected for inclusion in a transgene would be one wherein a high psi value is observed for skeletal tissue.
  • the alternative exon cassette selected for inclusion in a transgene would be one wherein a low psi value is observed for heart tissue.
  • the alternative exon cassette which is included in a transgene may be selected based on a variety of factors including, but not limited to: the identity of the protein cargo to be encoded by the coding region of interest; the Δ psi observed between a first tissue (or condition, etc.) which is of interest and a second tissue (or condition, etc.) which is not of interest; the psi observed in a tissue (or condition, etc.) which is of interest; and/or the psi observed in a tissue (or condition, etc.) which is not of interest.
  • various other factors may also impact which alternative exon cassette is selected for inclusion in a transgene, as described throughout the disclosure.
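The selection logic described above (high psi in a tissue of interest, low psi in a tissue not of interest, and a large Δ psi between them) can be expressed as a simple ranking. The following is a minimal sketch only: the class `CassettePsi`, the function `rank_cassettes`, and the threshold parameters are hypothetical names introduced for illustration, psi is assumed to be expressed as a fraction between 0 and 1, and no values from Table 4 or Table 5 are reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CassettePsi:
    """Psi (fraction spliced in, 0 to 1) of one cassette in two tissues."""
    name: str
    psi_target: float      # tissue of interest, e.g. skeletal tissue
    psi_off_target: float  # tissue not of interest, e.g. heart tissue

    def __post_init__(self):
        for value in (self.psi_target, self.psi_off_target):
            if not 0.0 <= value <= 1.0:
                raise ValueError("psi values must lie between 0 and 1")

    @property
    def delta_psi(self) -> float:
        # Difference in inclusion between the two tissues.
        return self.psi_target - self.psi_off_target


def rank_cassettes(candidates, min_target_psi=0.0, max_off_target_psi=1.0):
    """Filter by optional psi thresholds, then sort by largest delta psi first."""
    eligible = [
        c for c in candidates
        if c.psi_target >= min_target_psi and c.psi_off_target <= max_off_target_psi
    ]
    return sorted(eligible, key=lambda c: c.delta_psi, reverse=True)
```

The two thresholds correspond to the embodiments that select on psi in a single tissue, while the sort key corresponds to selection on Δ psi. Other factors named in the text, such as the identity of the protein cargo, are outside the scope of this sketch.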
  • an intron (or portion thereof) and/or an exon (or portion thereof) is from or derived from SMN1.
  • an intron(s) is or is derived from intron 6 and/or intron 7 of SMN1.
  • the intron which is derived from SMN1 intron 6 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 intron 6.
  • the intron which is derived from SMN1 intron 6 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 103.
  • the intron which is derived from SMN1 intron 6 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 103.
  • the intron which is derived from SMN1 intron 7 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 intron 7.
  • the intron which is derived from SMN1 intron 7 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 104. In some embodiments, the intron which is derived from SMN1 intron 7 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 104.
  • an exon is or is derived from exon 6 of SMN1.
  • the exon which is derived from SMN1 exon 6 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 exon 6.
  • the exon which is derived from SMN1 exon 6 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 102.
  • the exon which is derived from SMN1 exon 6 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 102.
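The percent-identity thresholds recited throughout this section (at least 70% through at least 99%, relative to a reference SEQ ID NO) presuppose some way of computing identity between two sequences. The disclosure does not state which alignment method or denominator is used, so the following is only a sketch under explicit assumptions: a basic global (Needleman-Wunsch) alignment with unit match, mismatch, and gap scores, and identity defined as matching columns divided by total alignment length. The function names are hypothetical, and established alignment tools may report different values for the same pair of sequences.

```python
# Illustrative sketch: percent identity from a simple global alignment.
# Scoring scheme and identity definition are assumptions, not the patent's method.

def global_align(a: str, b: str, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Matching aligned columns as a percentage of alignment length."""
    al_q, al_r = global_align(query.upper(), reference.upper())
    if not al_q:
        raise ValueError("both sequences are empty")
    matches = sum(1 for x, y in zip(al_q, al_r) if x == y and x != "-")
    return 100.0 * matches / len(al_q)


def meets_threshold(query: str, reference: str, threshold: float) -> bool:
    """True if identity to the reference is at least the given percentage."""
    return percent_identity(query, reference) >= threshold
```

For example, `meets_threshold(candidate, reference, 95.0)` would test a candidate against the "at least 95%" tier, where `reference` is the sequence of the relevant SEQ ID NO supplied by the user.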
  • the recombinant viral genomes of the present disclosure comprise one or more regulatory sequences.
  • the regulatory sequences impart a positive control on the expression of a coding sequence of interest.
  • the regulatory sequences impart a negative control on the expression of a coding sequence of interest.
  • Regulatory sequences may be present, inserted, or otherwise included in an alternatively-spliced exon. Such sequences may be referred to as positive or negative regulatory control cis-elements or “regulatory cis-elements” or merely as “cis-elements.”
  • the one or more cis-elements located within an alternatively-spliced exon and which may influence the level of expression of a coding region of interest through positive and/or negative controls may comprehensively include any genetic element which exerts, as a consequence of being spliced-in or spliced-out of the final mRNA, either a positive or negative regulation on the expression of the coding region.
  • Non-limiting examples of positive or negative regulatory cis-elements located within the alternatively-spliced exons can include, without limitation, a translation start codon, a translation stop codon, a ligand-responsive aptamer, a binding site for an RNA binding protein that serves to positively regulate mRNA translation, a binding site for an RNA binding protein that serves to negatively regulate mRNA translation, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate mRNA translation, or a binding site for a nucleic acid molecule (e.g., a miRNA or an siRNA) that serves to negatively regulate mRNA stability or degradation, a binding site for an RNA binding protein that serves to positively regulate mRNA stability or degradation, a binding site for an RNA binding protein that serves to negatively regulate mRNA stability or degradation, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate mRNA stability or degradation,
  • the cis-element is located within the alternatively-spliced exon, but in other cases, the cis-element is separate from, but at least associated with, the alternatively-spliced exon, such that it is spliced-in or spliced-out at the same time as the alternatively-spliced exon.
  • Non-limiting examples of positive or negative regulatory cis-elements can include, for instance, (1) a nucleotide sequence element that regulates, modulates, or otherwise affects the stability and/or degradation of a mRNA; and (2) a nucleotide sequence element that regulates, modulates, or otherwise affects the translation of a mRNA into one or more encoded polypeptide products (e.g., a therapeutic product).
  • the one or more cis-elements can include, but are not limited to, a translation start codon, a translation stop codon, an siRNA binding site, a miRNA binding site, a sequence forming a stem-loop structure, a sequence forming an RNA dimerization motif, a sequence forming a hairpin structure, a sequence forming an RNA quadruplex, a polypurine tract, a sequence forming a pair of kissing loops, and a sequence forming a tetraloop/tetraloop receptor pair.
  • cis-elements include binding sites recognized by regulatory elements, such as, for example, RNA binding proteins.
  • an RNA binding protein may be involved in binding to one or more positive or negative cis-elements and, as such, may be involved in regulating the expression of the coding region of interest.
  • the RNA binding protein is a sequence-specific RNA binding protein.
  • a useful sequence-specific RNA binding protein binds to a target sequence with a binding affinity (e.g., Kd) of 0.01-1000 nM or less (e.g., 0.01 to 1, 1-10, 10-50, 50-100, 100-500, 500-1,000 nM).
  • an RNA binding protein has serine/arginine domains that act as splicing enhancers, or glycine-rich domains that act as splicing repressors.
  • an RNA binding protein acts as an intronic splicing enhancer, intronic splicing silencer, exonic splicing enhancer, or exonic splicing silencer.
  • a sequence-specific RNA binding protein is one that contains zinc fingers, RNA recognition motifs, KH domains, deadbox domains, or dsRBDs.
  • Non-limiting examples of RBPs that contain zinc fingers include: MBNL, TIS11, or TTP.
  • Non-limiting examples of RBPs that contain RNA recognition motifs include hnRNPs and SR proteins, RbFox, PTB, Tra2beta.
  • Non-limiting examples of RNA binding proteins that contain KH domains include Nova, SF1, and FBP.
  • Non-limiting examples of RNA binding proteins that contain deadbox domains are DDX5, DDX6, and DDX17.
  • Non-limiting examples of RNA binding proteins that contain dsRBDs include ADAR, Staufen, and TRBP.
  • RNA binding proteins and their respective sequence-specific binding motifs are known in the art, and can be found, for example, in Perez-Perri, J. I., et al., (2018), Nat. Comm., 9:4408; Van Nostrand, E. L., et al., (2020), Nature, 583, 711-19; and Corley, M., et al., (2020), Cell, (20): 30159-3, the contents of which are hereby incorporated by reference with respect to RNA protein binding sites and RNA binding proteins.
  • the recombinant viral vector genomes may further comprise one or more regulatory sequences and/or genes encoding factors that regulate splicing, including splicing of the alternatively-spliced exon.
  • that regulatory gene encodes a tissue-specific RNA binding protein, an autoregulatory RNA binding protein, or a condition-specific RNA binding protein.
  • the protein auto-regulates splicing of the mRNA encoded by the recombinant viral genome.
  • splicing can be regulated by two or more different splice regulatory proteins that bind to splicing regulatory regions.
  • NRAP exon 12 is highly included in skeletal muscle but absent in heart.
  • TPM2 exon 2 is low in heart but high in smooth muscle.
  • SLC25A3 is very high in heart but low in brain.
  • the recombinant viral genome may further encode a splice-regulatory protein, which can include, for instance, MBNL protein, an SR protein (e.g., SRSF1, SRSF2, SRSF3, SRSF4, SRSF5, SRSF6, SRSF7, SRSF8, SRSF9, SRSF10, SRSF11, or SRSF12), an hnRNP protein, an RbFox protein, a CELF protein, a Nova protein, or a PTB protein.
  • the viral vectors may also encode a splicing factor in the form of an RNA, which may comprise a regulatory RNA molecule, a short hairpin RNA molecule (shRNA), a microRNA molecule, a transfer RNA molecule (tRNA), or an RNA that comprises a DMPK-targeting shRNA or microRNA.
  • the RNA that regulates splicing may also comprise a repeat-targeting shRNA or microRNA (e.g., a CUG shRNA, CAG shRNA, or GGGGCC shRNA), e.g., which targets an RNA binding protein or other member of a related biological pathway.
  • the viral vectors may also encode a splicing factor that comprises a protein-RNA complex.
  • the protein-RNA complex comprises a ribosome, snRNP complex, or other macromolecular complex that can interact with RNA to regulate splicing decisions.
  • a snRNP complex comprises U1 snRNP or U2 snRNP.
  • the intracellular factor comprises a protein-RNA complex, wherein the RNA comprises a ribozyme that targets one or more CUG repeats.
  • the intracellular factor comprises a protein-RNA complex, wherein the RNA comprises a ribozyme that targets specific mRNAs.
  • Non-limiting examples of RNA binding protein motifs and RNA target sequences that can confer or regulate splicing activity are described, for example, in Ray, D., et al., (2014), Nature, 499(7457): 172-77; Lambert, N., et al., (2014), Mol. Cell., 54(5): 887-900; and Van Nostrand, E. L., et al., (2020), Nature, and may be incorporated in the recombinant viral vector genomes described herein to further regulate splicing activity.
  • NMD Nonsense mediated decay
  • the recombinant viral vector genomes may comprise an alternatively-spliced exon cassette configured to regulate expression of a coding region of interest by including a nonsense mediated decay (NMD) exon (e.g., an alternative exon comprising a heterologous stop codon) within the RNA.
  • the NMD exon is flanked by introns (or portion(s) thereof) for which alternative splicing is regulated.
  • an NMD exon is an exon that encodes at least one stop codon that is in frame with a previous exon, wherein the stop codon is upstream (5’) from the 3’ splice site of the exon.
  • the in-frame stop codon is inserted at least 100 nucleotides, at least 95 nucleotides, at least 90 nucleotides, at least 85 nucleotides, at least 80 nucleotides, at least 75 nucleotides, at least 70 nucleotides, at least 65 nucleotides, at least 60 nucleotides, at least 55 nucleotides, at least 50 nucleotides, at least 45 nucleotides, at least 40 nucleotides, at least 35 nucleotides, at least 30 nucleotides, at least 25 nucleotides, at least 20 nucleotides, at least 15 nucleotides, at least 10 nucleotides, or at least 5 nucleotides, or between 1 to 5 nucleotides upstream of the next 5’ splice junction.
  • if the NMD exon is included in the spliced RNA, it causes degradation of the RNA via nonsense-mediated decay. In some embodiments, if the NMD exon is spliced out, the resulting transcript is stable, and in some embodiments encodes a functional (e.g., full-length) protein of interest.
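The NMD exon definition above can be illustrated with a short Python sketch. It scans an exon for stop codons in the reading frame carried over from the previous exon and reports how far each one lies from the downstream end of the exon (the next 5’ splice junction). The function names, the `phase` parameter, and the 50-nucleotide default threshold are illustrative assumptions and are not taken from the disclosure, which lists many alternative distances.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(exon_seq, phase=0):
    """Return (position, distance) pairs for each in-frame stop codon.

    phase (hypothetical parameter): number of leading exon nucleotides that
    complete a codon begun in the previous exon (0, 1, or 2).
    distance: nucleotides between the end of the stop codon and the
    downstream end of the exon, i.e. the next 5' splice junction.
    """
    exon_seq = exon_seq.upper()
    hits = []
    for i in range(phase, len(exon_seq) - 2, 3):
        if exon_seq[i:i + 3] in STOP_CODONS:
            hits.append((i, len(exon_seq) - (i + 3)))
    return hits


def is_candidate_nmd_exon(exon_seq, phase=0, min_distance=50):
    """True if any in-frame stop lies at least min_distance nt upstream
    of the downstream splice junction (threshold is an assumed example)."""
    return any(d >= min_distance for _, d in in_frame_stops(exon_seq, phase))
```

A candidate exon could be screened with `is_candidate_nmd_exon(seq, phase=0, min_distance=50)`, varying `min_distance` to match whichever of the listed thresholds applies.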
  • an alternatively-spliced exon cassette for which splicing is regulated is a construct configured to regulate expression of a protein by including a 5’ exon comprising an amino terminal amino acid encoding sequence (e.g., an ATG or part of the ATG) and/or translation control sequences, wherein the 5’ exon is separated from subsequent exon(s) by an intron for which splicing is regulated.
  • when the intron is spliced out of the RNA transcript, the recombinant 5’ exon is spliced in frame to the subsequent exon(s) and the resulting spliced transcript encodes a protein that is expressed.
  • the recombinant 5’ exon is not spliced to the subsequent exon(s) and as a result a protein is not expressed from the transcript.
  • an intron (or portion thereof) for which splicing is regulated can be included within a gene that encodes a regulatory RNA (e.g., an siRNA).
  • an intron(s) (or portion thereof) for which splicing is regulated and that encodes regulatory RNA(s) can be included in an alternatively-spliced exon cassette encoding an RNA transcript.
  • the recombinant genomes disclosed herein may comprise one or more transgenes.
  • a transgene may be recombinant (or “synthetic”), and may be modified to comprise an alternatively-spliced exon or an alternatively-spliced exon cassette described herein (e.g., see FIG. 1) such that the expression of the transgene or coding region of interest comes under the regulatory control of the alternatively-spliced exon or the presence of a ligand.
  • a transgene may encode any therapeutic agent, including, but not limited to a therapeutic protein, an antibody or fragment thereof, a bispecific antibody or fragment thereof, antigen-binding fragments, a nucleic acid molecule-based therapeutic (e.g., an siRNA, a microRNA, or an oligonucleotide), genome editing components (e.g., CRISPR/Cas9 based proteins and protein fusion and guide RNA molecules), and complexes (e.g., nucleoprotein complexes).
  • a coding region of a transgene may be naturally-occurring, and may in some embodiments comprise no nucleic acid modifications, relative to the coding region of a wild-type gene.
  • a coding region of a transgene may be synthetic. The coding region of a transgene may be considered synthetic if it undergoes one or more nucleic acid modifications, relative to the coding region of a wild-type gene.
  • a nucleic acid modification may be a substitution or deletion of one or more nucleotides that form the nucleic acid sequence of the coding region of the transgene.
  • the modification comprises disrupting or deleting a native start codon located at the 5’ end of the coding region of the transgene.
  • the modification comprises the insertion of an alternatively-spliced exon into the coding region of the transgene.
  • the coding region of the transgene may comprise one or more nucleic acid modifications (e.g., substitutions) such that the coding region comprises a “barcode” sequence.
  • Barcode sequences may be useful in some embodiments to characterize the identity of the transgene (e.g., a transgene comprising a BIN1 alternative exon cassette and an MTM1 coding sequence), for example when multiple transgenes are being tested together.
  • the wobble positions of five codons within the coding region of the transgene are modified to produce a barcode sequence.
  • a “wobble position” is the third nucleic acid of a codon.
  • Nucleic acids lying at wobble positions can be modified without altering the identity of the amino acid encoded by the associated codon (see FIG. 13, SEQ ID NO: 63).
  • the third nucleic acid of each of five consecutive codons in the coding region of the transgene is modified (e.g., 5 total substitutions are made, SEQ ID NOs: 65-75).
  • said modifications result in the formation of a barcode sequence which is 5 nucleic acid sequences in length.
  • the resultant barcode sequence is unique to the transgene within which it is comprised, and can be used to characterize the identity of said transgene.
  • the five codons which are modified are located approximately 350 nucleotides from the 5’ end of the coding region of the transgene. In some embodiments, the five codons which are modified are located approximately 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, or 550 nucleotides from the 5’ end of the coding region of the transgene.
  • the five codons which are modified are located approximately 100-130, 120-150, 140-170, 160-190, 180-210, 200-230, 220-250, 240-270, 260-290, 280-310, 300-330, 320-350, 340-370, 370-400, 390-420, 410-440, 430-460, 450-480, 470-500, 490-520, 510-540, or 530-560 nucleotides from the 5’ end of the coding region of the transgene.
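The wobble-position barcode described above can be sketched as follows. This Python example replaces the third base of five consecutive codons with a synonymous alternative under the standard genetic code, so the encoded amino acids are unchanged. The function names, the `first_codon_index` parameter, and the rule of taking the first synonymous base in T, C, A, G order are assumptions made for illustration; the disclosure does not specify how the substituted bases are chosen.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate complete codons of a DNA coding sequence ('*' marks a stop)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def wobble_barcode(cds, first_codon_index, n_codons=5):
    """Substitute the wobble base of n consecutive codons synonymously.

    first_codon_index (hypothetical parameter): zero-based index of the first
    codon to modify. Returns (modified sequence, string of substituted bases).
    """
    cds = cds.upper()
    out = list(cds)
    barcode = []
    for k in range(first_codon_index, first_codon_index + n_codons):
        codon = cds[3 * k:3 * k + 3]
        if len(codon) < 3:
            raise ValueError(f"codon index {k} is outside the coding sequence")
        aa = CODON_TABLE[codon]
        options = [b for b in BASES
                   if b != codon[2] and CODON_TABLE[codon[:2] + b] == aa]
        if not options:  # e.g. ATG (Met) and TGG (Trp) have no synonym
            raise ValueError(f"codon {codon} at index {k} has no synonymous wobble change")
        out[3 * k + 2] = options[0]
        barcode.append(options[0])
    return "".join(out), "".join(barcode)
```

Because codons such as ATG and TGG have no synonymous wobble alternative, the sketch raises an error rather than silently changing the protein; in practice a different window of five codons would be chosen.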
  • a coding region of a transgene may naturally comprise one or more internal, out-of-frame ATG start codons.
  • where the alternative exon comprises an ATG start codon at its 3’ end, translation of the coding region via an alternate, out-of-frame ATG start codon located within the coding region of the transgene would be undesirable.
  • any modification made to the coding region of the transgene must also preserve translation of the full-length protein when the alternative exon is spliced-in.
  • one or more modifications are made to the coding region of the transgene which preserve translation of the full-length protein in the condition wherein the alternative exon is spliced-in, but which disrupt or terminate translation of the full-length protein in the condition wherein the alternative exon is spliced-out.
  • one or more nucleic acid substitutions are made within the coding region of the transgene to introduce one or more heterologous stop codons located downstream of (e.g., 3’ relative to) one or more of the internal, out-of-frame start codons located within the coding region of the transgene.
  • substitutions may comprise the substitution of 1, 2, or 3 nucleic acids to produce any of a TAA, TGA, or TAG stop codon, depending on the nucleic acids which are naturally present at the desired location within the coding sequence.
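As a minimal sketch of the first step in this design, the Python function below lists internal ATG triplets that are out of frame with the main reading frame and reports whether a stop codon already follows each one in its own frame. ATGs with no downstream stop are the ones where a heterologous stop codon might be considered. The function name and the tuple layout are illustrative assumptions; choosing substitutions that also preserve the full-length protein is not covered here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def out_of_frame_atgs(cds):
    """Find ATGs that are not in frame 0 of the coding sequence.

    Returns a list of (atg_position, frame, stop_position) tuples, where
    frame is 1 or 2 and stop_position is the zero-based position of the
    first stop codon in that ATG's frame, or None if there is none.
    """
    cds = cds.upper()
    results = []
    for pos in range(len(cds) - 2):
        if cds[pos:pos + 3] != "ATG" or pos % 3 == 0:
            continue  # skip non-ATG triplets and in-frame ATGs
        stop_pos = None
        for j in range(pos + 3, len(cds) - 2, 3):
            if cds[j:j + 3] in STOP_CODONS:
                stop_pos = j
                break
        results.append((pos, pos % 3, stop_pos))
    return results
```

Entries whose third element is `None` mark out-of-frame start codons that are not yet followed by a stop in their own frame.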
  • a 3’ UTR intron is included in the transgene which elicits nonsense-mediated decay in the condition wherein the alternative exon is spliced-out (such that translation of the full-length protein is disrupted or terminated), but which preserves translation of the full-length protein in the condition wherein the alternative exon is spliced-in.
  • the coding region or at least one of the exons of the transgene is from or is derived from a coding region from a gene selected from the group consisting of: MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, microdystrophin, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, I PX 12, TNNT2, CFL2, KBTBD13, KLHL40, KLHL41, LMOD3, MYPN, SEPN1, TTN, SPEG, MYH7, TK2, POLG1, GAA, AGE, PYGM, SLC22A5, OCTN2, ETF, ETFH, PNPLA2, cytochrome b/cytochrome c oxidase, CLCN1, SCN4A, DMPK, CN
  • the coding region or at least one exon of the transgene is from or is derived from a coding region of MTM1.
  • the coding region of the transgene which is or is derived from MTM1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1881.
  • the coding region of the transgene which is or is derived from MTM1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1881.
  • the coding region of the transgene is from or is derived from a coding region of CAPN3.
  • the coding region of the transgene which is or is derived from CAPN3 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1882.
  • the coding region of the transgene which is or is derived from CAPN3 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1882.
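The sequence-identity thresholds recited above can be checked with a simple calculation once two sequences have been aligned. The Python sketch below takes two pre-aligned sequences of equal length (gaps written as `-`) and counts identical columns over the alignment length. This choice of denominator is an assumption; other conventions exist and the disclosure does not state which is intended. The function names are hypothetical.

```python
# Thresholds recited in the text for percent sequence identity.
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Gap columns ('-') count toward the length but never as matches
    (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def thresholds_met(identity):
    """Return the recited thresholds that a given percent identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]
```

For example, a candidate coding region aligned against a reference sequence could be passed to `percent_identity`, and the result to `thresholds_met`, to see which of the recited ranges it falls within.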
  • the transgene may encode one or more therapeutic proteins (e.g., a biologic or biosimilar thereof), including, but not limited to: adalimumab, rituximab, pegfilgrastim, infliximab, bevacizumab, trastuzumab, etanercept, and epoetin.
  • a recombinant viral genome comprising an alternatively-spliced exon cassette as described herein is provided in a viral vector (e.g., an rAAV vector; a lentivirus vector).
  • the viral vectors may include rAAV particles, lentivirus particles, or other viral vectors.
  • the recombinant viral genomes packaged into the rAAV or lentiviral vectors further comprise a promoter.
  • the promoter is a constitutive promoter or a regulated promoter.
  • the regulated promoter is an inducible promoter.
  • the promoter comprises any one of: CMV, EF1 alpha, CBh, synapsin, enolase, MECP2, MHCK7, Desmin, or GFAP.
  • an MHCK7 promoter comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1880. In some embodiments, an MHCK7 promoter comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1880.
  • the promoter is a ubiquitous promoter.
  • a ubiquitous promoter is a promoter selected from the group consisting of: an EF1 alpha promoter, a beta actin promoter, CMV, CBh, and CAG promoter.
  • the promoter is a tissue-specific promoter, such as a muscle- or heart-biased promoter.
  • a tissue-specific promoter, such as a muscle- or heart-biased promoter is a promoter selected from the group consisting of: a muscle creatine kinase promoter, a C5-12 muscle promoter, MHCK7, and Desmin.
  • the promoter is a neuronal-biased promoter.
  • a neuronal-biased promoter is a promoter selected from the group consisting of: synapsin and MECP2.
  • the promoter is an astrocyte-biased promoter.
  • an astrocyte-biased promoter is a GFAP promoter.
  • the nucleic acid comprises a promoter and sequence corresponding to an RNA molecule that is capable of being expressed from the nucleic acid.
  • the recombinant viral genome is sufficiently small to be effectively packaged in an AAV viral particle (e.g., the gene construct may be around 0.5-5 kb long, for example around 4.9 kb, 4.8 kb, 4.7 kb, 4.6 kb, 4.5 kb, 4.4 kb, 4.3 kb, 4.2 kb, 4.1 kb, 4 kb, 3.5 kb, or 3 kb long).
  • a nucleic acid comprises one or more truncated and/or recombinant introns, as described elsewhere herein.
  • a recombinant intron for an rAAV vector is typically shorter than 4 kb, but can be between around 20 bases long and around 2,000 bases long to provide space for other components (e.g., exons, regulatory sequences, other introns, viral packaging sequences) in the nucleic acid (e.g., recombinant gene) construct.
  • a recombinant intron is around 50 bases, around 100 bases, around 250 bases, around 500 bases, around 1,000 bases, around 1,500 bases, or around 2,000 bases long.
  • a recombinant intron is shorter than 4 kb, shorter than 3 kb, shorter than 2 kb, shorter than 1 kb, 100-900 bases long, or shorter than 500 bases long.
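The size constraint described above amounts to a length budget: the components of the construct must together stay within the packaging range, and whatever remains bounds the length of any recombinant intron. The Python sketch below illustrates that arithmetic. The 5,000-nucleotide capacity reflects the upper end of the "around 0.5-5 kb" range in the text; the component names, the example lengths, and the choice to count every listed component toward the limit are assumptions for illustration only.

```python
AAV_CAPACITY_NT = 5000  # upper end of the "around 0.5-5 kb" range in the text


def construct_length(components):
    """Total length in nucleotides of a dict mapping component name to length."""
    return sum(components.values())


def fits_in_aav(components, capacity=AAV_CAPACITY_NT):
    """True if the summed component lengths do not exceed the capacity."""
    return construct_length(components) <= capacity


def intron_budget(components, capacity=AAV_CAPACITY_NT):
    """Nucleotides left for a recombinant intron (negative if over capacity)."""
    return capacity - construct_length(components)


# Hypothetical construct; all lengths are placeholders, not values from the disclosure.
example = {
    "5' ITR": 145,
    "promoter": 800,
    "alternative exon cassette (without intron)": 300,
    "coding region": 1800,
    "polyA signal": 200,
    "3' ITR": 145,
}
```

With these placeholder values, `intron_budget(example)` gives the space that would remain for truncated or recombinant introns before the construct exceeds the assumed limit.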
  • the recombinant viral genome contains sufficient viral sequences for packaging in a viral vector (e.g., an rAAV particle).
  • a recombinant viral genome is flanked by viral sequences (for example, terminal repeat sequences) that are useful to package the recombinant viral genome in a viral particle (e.g., encapsidated by viral capsid proteins and/or an envelope, where appropriate).
  • the flanking terminal repeat sequences are rAAV inverted terminal repeats (ITRs).
  • the AAV ITR sequences comprise AAV1, AAV2, AAV5, AAV7, AAV8, or AAV9 ITR sequences.
  • the AAV ITR sequences comprise AAV2 ITR sequences.
  • an AAV2 ITR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1879.
  • an AAV2 ITR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1879.
  • the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106.
  • the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106.
  • the recombinant viral genome is a lentivirus genome comprising a DNA molecule, wherein the DNA molecule comprises sequences that encode an RNA molecule.
  • the recombinant viral genome is encapsidated by an rAAV particle as described herein.
  • the rAAV particle may be of any AAV serotype (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10), including any derivative (including non-naturally occurring variants of a serotype) or pseudotype.
  • the rAAV particle is an AAV8 particle, which may be pseudotyped with AAV2 ITRs.
  • an AAV2 ITR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1879. In some embodiments, an AAV2 ITR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1879.
  • Non-limiting examples of derivatives and pseudotypes include AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV218, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45; or a derivative thereof.
  • the rAAV vector is of serotype AAV8. In some embodiments, the rAAV vector is pseudotyped.
  • AAV serotypes and derivatives/pseudotypes, and methods of producing such derivatives/pseudotypes are known in the art (see, e.g., Asokan A, Schaffer DV, Samulski RJ. The AAV vector toolkit: poised at the clinical crossroads. Mol Ther. 2012 Apr;20(4):699-708. doi: 10.1038/mt.2011.287. Epub 2012 Jan 24).
  • the rAAV particle is a pseudotyped rAAV particle, which comprises (a) a nucleic acid vector comprising ITRs from one serotype (e.g., AAV2) and (b) a capsid comprised of capsid proteins derived from another serotype (e.g., AAV1, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAV10).
  • Exemplary rAAV nucleic acid vectors useful according to the disclosure include single-stranded (ss) or self-complementary (sc) AAV nucleic acid vectors, such as single-stranded or self-complementary recombinant viral genomes.
  • Methods of producing rAAV particles and recombinant viral genomes are also known in the art and commercially available (see, e.g., Zolotukhin et al. Production and purification of serotype 1, 2, and 5 recombinant adeno-associated viral vectors. Methods 28 (2002) 158—167; and U.S.
  • a plasmid containing the recombinant viral genome may be combined with one or more helper plasmids, e.g., that contain a rep gene (e.g., encoding Rep78, Rep68, Rep52 and Rep40) and a cap gene (encoding VP1, VP2, and VP3, including a modified VP3 region), and transfected into a producer cell line such that the rAAV particle can be packaged and subsequently purified.
  • the one or more helper plasmids includes a first helper plasmid comprising a rep gene and a cap gene and a second helper plasmid comprising an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
  • the rep gene is a rep gene derived from AAV2 and the cap gene is derived from AAV2 and includes modifications to the gene in order to produce a modified capsid protein described herein.
  • Helper plasmids, and methods of making such plasmids are known in the art and commercially available (see, e.g., pDM, pDG, pDP1rs, pDP2rs, pDP3rs, pDP4rs, pDP5rs, pDP6rs, pDG(R484E/R585E), and pDP8.ape plasmids from PlasmidFactory, Bielefeld, Germany; other products and services available from Vector Biolabs, Philadelphia, PA; Cellbiolabs, San Diego, CA; Agilent Technologies, Santa Clara, CA; and Addgene, Cambridge, MA; pxx6; Grimm et al.
  • helper plasmids are produced or obtained, which comprise rep and cap ORFs for the desired AAV serotype and the adenoviral VA, E2A (DBP), and E4 genes under the transcriptional control of their native promoters.
  • the cap ORF may also comprise one or more modifications to produce a modified capsid protein as described herein.
  • HEK293 cells available from ATCC® are transfected via CaPO4-mediated transfection, lipids or polymeric molecules such as Polyethylenimine (PEI) with the helper plasmid(s) and a plasmid containing a nucleic acid vector described herein.
  • HEK293 cells are then incubated for at least 60 hours to allow for rAAV particle production.
  • Sf9-based producer stable cell lines are infected with a single recombinant baculovirus containing the nucleic acid vector.
  • HEK293 or BHK cell lines are infected with a HSV containing the nucleic acid vector and optionally one or more helper HSVs containing rep and cap ORFs as described herein and the adenoviral VA, E2A (DBP), and E4 genes under the transcriptional control of their native promoters.
  • the HEK293, BHK, or Sf9 cells are then incubated for at least 60 hours to allow for rAAV particle production.
  • the rAAV particles can then be purified using any method known in the art or described herein, e.g., by iodixanol step gradient, CsCl gradient, chromatography, or polyethylene glycol (PEG) precipitation.
  • engineered and recombinant cells are intended to refer to a cell into which an exogenous polynucleotide segment (such as a DNA segment that leads to the transcription of a biologically active molecule) has been introduced. Therefore, engineered cells are distinguishable from naturally occurring cells, which do not contain a recombinantly introduced exogenous DNA segment. Engineered cells are, therefore, cells that comprise at least one or more heterologous polynucleotide segments introduced through the hand of man.
  • a therapeutic agent such as a transgene comprising an alternatively-spliced cassette
  • to bring a sequence “under the control of” a promoter, one positions the 5' end of the transcription initiation site of the transcriptional reading frame generally between about 1 and about 50 nucleotides “downstream” of (i.e., 3’ of) the chosen promoter.
  • the “upstream” promoter stimulates transcription of the DNA and promotes expression of the encoded polypeptide. This is the meaning of “recombinant expression” in this context.
  • the recombinant nucleic acid (e.g., viral) vector constructs are those that comprise an rAAV nucleic acid vector that contains a therapeutic gene of interest operably linked to one or more promoters that is capable of expressing the gene in one or more selected mammalian cells.
  • nucleic acid vectors are described in detail herein.
  • the transgene comprising an alternatively-spliced exon cassette comprises a polynucleotide sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, 2236, or 2247-2256.
  • the transgene comprising an alternatively-spliced exon cassette comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, 2236, or 2247-2256.
  • a viral vector of the present disclosure comprises a recombinant lentivirus genome.
  • Lentiviruses are the only type of virus that are diploid; they have two strands of RNA.
  • the lentivirus is a retrovirus, meaning it has a single stranded RNA genome with a reverse transcriptase enzyme, which functions to perform transcription of the viral genetic material upon entering the cell.
  • Lentiviruses also have a viral envelope with protruding glycoproteins that aid in attachment to the outer membrane of a host cell.
  • RNA sequences that code for specific proteins that facilitate the incorporation of the viral sequences into genome of a host cell.
  • the “gag” gene codes for the structural components of the viral nucleocapsid proteins: the matrix (MA/p17), the capsid (CA/p24) and the nucleocapsid (NC/p7) proteins.
  • the “pol” domain codes for the reverse transcriptase and integrase enzymes.
  • the “env” domain of the viral genome encodes for the glycoproteins and envelope on the surface of the virus.
  • the ends of the genome are flanked with long terminal repeats (LTRs). LTRs are necessary for integration of the dsDNA into the host chromosome. LTRs also serve as part of the promoter for transcription of the viral genes.
  • the env, gag, and/or pol vector(s) forming the particle do not contain a nucleic acid sequence from the lentiviral genome that expresses an envelope protein.
  • a separate vector containing a nucleic acid sequence encoding an envelope protein operably linked to a promoter is used (e.g., an env vector).
  • such env vector also does not contain a lentiviral packaging sequence.
  • the env nucleic acid sequence encodes a lentiviral envelope protein.
  • the native lentivirus promoter is located in the U3 region of the 3' LTR.
  • the presence of the lentivirus promoter can in some embodiments interfere with heterologous promoters operably linked to a transgene.
  • the lentiviral promoter is deleted.
  • the lentivirus vector contains a deletion within the viral promoter. After reverse transcription, such a deletion is in some embodiments transferred to the 5' LTR, yielding a vector/provirus that is incapable of synthesizing vector transcripts from the 5' LTR in the next round of replication.
  • the lentivirus particle is expressed by a vector system encoding the necessary viral proteins to produce a lentivirus particle.
  • the Pol proteins are expressed by multiple vectors.
  • the gag-pol genes are on the same vector.
  • the gag nucleic acid sequence is on a separate vector than at least some of the pol nucleic acid sequence. In some embodiments, the gag nucleic acid sequence is on a separate vector from all the pol nucleic acid sequences that encode Pol proteins.
  • the lentivirus vector does not contain nucleotides from the lentiviral genome that package lentiviral RNA, referred to as the lentiviral packaging sequence. It will be understood that selective inclusion of envelopes could result in changes in infectivity, such that the lentivirus vector could infect many different types of cells, and could be targeted to specific cell types of interest. Accordingly, in some embodiments, the envelope protein is not from the lentivirus, but from a different virus. The resultant lentivirus particle is referred to as a pseudotyped particle.
  • an env gene that encodes an envelope protein that targets an endocytic compartment such as that of the influenza virus, VSV-G, alpha viruses (Semliki forest virus, Sindbis virus), arenaviruses (lymphocytic choriomeningitis virus), flaviviruses (tick-borne encephalitis virus, Dengue virus), rhabdoviruses (vesicular stomatitis virus, rabies virus), and orthomyxoviruses (influenza virus) is used.
  • the lentivirus is a human immunodeficiency virus (HIV1 or HIV2), a feline immunodeficiency virus (FIV), a bovine immunodeficiency virus (BIV), a caprine arthritis encephalitis virus, an equine infectious anemia virus, a jembrana disease virus, a puma lentivirus, a simian immunodeficiency virus, or a visna-maedi virus.
  • a nucleic acid sequence encoding a transgene comprising an alternatively-spliced exon cassette of the present invention is inserted into the empty lentiviral particles by use of a plurality of vectors each containing a nucleic acid segment of interest and a lentiviral packaging sequence necessary to package lentiviral RNA into the lentiviral particles (the packaging vector).
  • the packaging vector contains a 5' and 3' lentiviral LTR with the desired nucleic acid segment inserted between them.
  • the nucleic acid segment can be antisense molecules or, in some embodiments, encodes a therapeutic protein.
  • the transgene is oriented in the anti-sense orientation within the lentiviral genome. In some embodiments, orienting the transgene in the anti-sense direction within the lentiviral genome avoids the loss of introns (e.g., the splicing-out of introns) during viral packaging.
  • the packaging vector contains a selectable marker gene.
  • marker genes are well known in the art and include such genes as green fluorescent protein (GFP), blue fluorescent protein (BFP), luciferase, LacZ, nerve growth factor receptor (NGFR), etc.
E. Methods of delivering viral vectors
  • Some aspects of the invention contemplate a method of treating a disease or condition in a subject comprising administering a viral vector of the present disclosure to a subject, wherein the viral vectors comprise a recombinant viral genome described herein.
  • the disclosed viral (e.g., rAAV; lentivirus) particles are delivered by administering any one of the compositions disclosed herein to a subject.
  • “administering” or “administration” means providing a material to a subject in a manner that is pharmacologically useful.
  • viral particles are delivered to one or more tissues and cell types in a subject.
  • viral particles are delivered to one or more of muscle, heart, CNS, and immune cells.
  • delivery of a viral particle restores transcriptome homeostasis.
  • Delivery vehicles, vectors, particles, nanoparticles, formulations and components thereof which are suitable for expression of one or more elements of an engineered AAV capsid system as described herein are as described in, for example, International Patent Application Publication Nos. WO 2021/050974 and WO 2021/077000 and International Application No.
  • a viral particle is administered to the subject parenterally.
  • a viral particle is administered to a subject subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs.
  • a viral particle is administered to the subject by injection into the hepatic artery or portal vein.
  • compositions described above or elsewhere herein are typically administered to a subject in an effective amount, that is, an amount capable of producing a desirable result.
  • the desirable result will depend upon the active agent being administered.
  • an effective amount of rAAV particles may be an amount of the particles that are capable of transferring an expression construct to a host organ, tissue, or cell.
  • a therapeutically acceptable amount may be an amount that is capable of treating a disease.
  • dosage for any one subject depends on many factors, including the subject’s size, body surface area, age, the particular composition to be administered, the active ingredient(s) in the composition, time and route of administration, general health, and other drugs being administered concurrently.
  • a single composition comprising viral particles as disclosed herein is administered only once.
  • a subject may need more than 1 administration of a viral composition (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more times).
  • a subject may need to be provided a second administration of any one of the viral compositions as disclosed herein 1 day, 1 week, 1 month, 1 year, 2 years, 5 years, or 10 years after the subject was administered a first composition.
  • a first composition of viral particles is different from the second composition of viral particles.
  • the administration of the composition is repeated at least once (e.g., at least once, at least twice, at least thrice, at least four times, at least five times, at least six times, at least 10 times, at least 25 times, or at least 50 times), and wherein the time between a repeated administration and a previous administration is at least 1 month (e.g., at least 1 month, at least 2 months, at least 3 months, at least 4 months, at least 5 months, at least 6 months, or at least 12 months).
  • the time between a repeated administration and a previous administration is at least 1 month (e.g., at least 1 month, at least 2 months, at least 3 months, at least 4 months, at least 5 months, at least 6 months, or at least 12 months).
  • the administration of the composition is repeated at least once, and wherein the time between a repeated administration and a previous administration is at least 1 year (e.g., at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, or at least 20 years).
  • the administration of the composition is facilitated by AAV capsids such as AAV1-9, e.g., with AAV2 ITRs, or other capsids that sufficiently deliver to affected tissues.
  • AAV capsids such as AAV1-9, e.g., with AAV2 ITRs, or other capsids that sufficiently deliver to affected tissues.
  • Patent Publication No. 2020-0263201; U.S. Patent Publication No. 2020-0101099; U.S. Patent Publication No. 2020-0318082; U.S. Patent Publication No. 2018-0369414; U.S. Patent Publication No. 2019-0330278; U.S. Patent Publication No. 2020-0231986, the contents of each of which are incorporated by reference herein.
  • a mammalian subject is a human, a non-human primate, or other mammalian subject.
  • the subject has one or more mutations associated with aberrant intron and/or alternative splicing.
  • a subject suffers from or is at risk of developing a disease or condition associated with aberrant splice regulation resulting in one or more symptoms of a disease or condition.
  • diseases/conditions include instances in which the homeostasis of RNA binding proteins is altered (e.g., other repeat expansion diseases), or diseases/conditions in which there are mutations in RNA binding protein sequences.
  • the disease or condition is selected from: a repeat expansion disease, a laminopathy, a cardiomyopathy, a muscular dystrophy, a neurodegenerative disease, a cancer, an intellectual disability, and/or premature aging.
  • compositions of this application are administered to a subject resulting in regulated overexpression of the RNA binding protein exhibiting aberrant activity.
  • compositions of this application are administered to a subject resulting in the regulated addition of additional non-mutated, non-aberrant RNA binding protein(s).
  • the disease or condition is selected from the group consisting of: Dentatorubral-pallido-luysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich's ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type 1, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebellar ataxia type 6 (SCA6), spinocerebellar ataxia type 7 (SCA7),
  • Distal muscular dystrophy, Emery-Dreifuss muscular dystrophy, dementia, Parkinson's disease (PD), a PD-related disorder, Prion disease, a motor neuron disease (MND), Progressive bulbar palsy (PBP), Progressive muscular atrophy (PMA), Primary lateral sclerosis (PLS), Spinal muscular atrophy (SMA), a bladder cancer, a breast cancer, a colorectal cancer, a kidney cancer, a lung cancer, a lymphoma, a melanoma, an oral cancer, an ovarian cancer, an oropharyngeal cancer, a pancreatic cancer, a prostate cancer, a thyroid cancer, a uterine cancer, Down syndrome, Prader-Willi Syndrome (PWS), Bloom Syndrome, Cockayne Syndrome Type I -216400, Cockayne Syndrome Type III, Cockayne Syndrome Type I, Hutchinson-Gilford Progeria Syndrome, Mandibuloacral Dysplasia with Type A Lipodystrophy, Progeria,
  • Non-limiting examples of symptoms of these diseases/conditions include neurodevelopmental, neurofunctional, or neurodegenerative changes (e.g., ALS, FTD, Spinocerebellar Ataxias, FXTAS, or Huntington’s Disease symptoms) or abnormal proliferation or migration of cells (e.g., as in cancer).
  • myotonic dystrophy type 1 and type 2 are caused by expanded CTG repeats in the DMPK gene and CCTG repeats in the CNBP gene, respectively. Both diseases are highly multi-systemic with symptoms in skeletal muscles, cardiac tissue, gastrointestinal tract, endocrine system, and central nervous system, among others.
  • the present disclosure relates to methods and compositions that are useful for treating myotonic dystrophy type 1 and type 2 (dystrophia myotonica, DM1 and DM2, respectively), for example by delivering viral particles comprising viral constructs (e.g., containing one or more alternative splicing cassettes) to cells or tissue in a subject.
  • viral particles comprising viral constructs (e.g., containing one or more alternative splicing cassettes)
  • DM1 can also manifest in a severe form called congenital DM1, in which profound developmental delays occur. A 25% chance of death before the age of 18 months and 50% chance of survival into mid-30s has been reported.
  • Methods and compositions of the application can be useful to treat, alleviate, or otherwise improve one or more symptoms of DM1.
  • one or more viral constructs can be delivered to a subject having one or more symptoms of myotonic dystrophy. Such symptoms may include, but are not limited to, delayed muscle relaxation, muscle weakness, prolonged involuntary muscle contraction, loss of muscle, abnormal heart rhythm, cataracts, or difficulty swallowing.
  • a viral composition provided herein is administered to a subject having congenital DM1 or DM2.
  • the viral constructs treat, alleviate, ameliorate, or otherwise improve one or more symptoms associated with DM1 and/or DM2.
  • the viral constructs reduce muscle weakness, reduce muscle loss, reduce muscle wasting, reduce prolonged muscle contractions, improve speech, and/or improve swallowing in a subject.
  • treatment reduces or corrects one or more other symptoms of myotonic dystrophy.
  • splicing of a recombinant intron and/or an alternatively-spliced exon is sufficiently regulated to be therapeutically effective.
  • a recombinant viral genome for delivering a transgene wherein said genome comprises at least one alternatively-spliced exon cassette comprising at least one alternatively-spliced exon, at least one flanking intron, and a coding region of the transgene.
  • Clause 2 The viral genome of clause 1, wherein the alternatively-spliced exon is retained in the spliced transcript.
  • Clause 3 The viral genome of clause 1 or clause 2, wherein the alternatively-spliced exon cassette further comprises at least one constitutive exon.
  • Clause 8 The viral genome of any preceding clause, wherein the alternatively-spliced exon comprises at its 3’ end a heterologous start codon or part of a heterologous start codon.
  • Clause 11 The viral genome of any one of clauses 1-7, wherein the alternatively-spliced exon cassette comprises two alternatively-spliced exons, each with flanking introns.
  • Clause 12 The viral genome of clause 11, wherein the two alternatively-spliced exons are adjacent.
  • Clause 13 The viral genome of clause 11 or clause 12, wherein the constitutive exon is located 5’ to the two alternatively-spliced exons.
  • Clause 14 The viral genome of any one of clauses 11-13, wherein each alternatively-spliced exon comprises at its 3’ end a heterologous start codon or part of a heterologous start codon.
  • Clause 15 The viral genome of clause 14, wherein all native start codons located 5’ to the heterologous start codon of the 5’-most alternatively-spliced exon are disrupted or deleted.
  • Clause 16 The viral genome of any one of clauses 11-15, wherein only one of the two alternatively-spliced exons is retained in the spliced transcript.
  • Clause 17 The viral genome of any one of clauses 11-16, wherein the 5’-most alternatively-spliced exon is retained in the spliced transcript.
  • Clause 18 The viral genome of any one of clauses 11-16, wherein the 3’-most alternatively-spliced exon is retained in the spliced transcript.
  • Clause 19 The viral genome of any preceding clause, wherein the alternatively-spliced exon(s) and flanking intron(s) are located within the coding region of the transgene.
  • Clause 21 The viral genome of clause 20, wherein the heterologous, in-frame stop codon is at least 50 nucleotides upstream of the next 5’ splice junction.
  • Clause 22 The viral genome of clause 20 or clause 21, wherein the heterologous stop codon elicits nonsense-mediated decay.
  • Clause 23 The viral genome of any preceding clause, wherein the alternatively-spliced exon is retained in the spliced transcript in distinct tissues or in distinct cell types.
  • Clause 24 The viral genome of any preceding clause, wherein the alternatively-spliced exon is retained in the spliced transcript in the presence of activated T cells, and/or in states of inflammation.
  • Clause 25 The viral genome of any preceding clause, wherein the alternatively-spliced exon is retained in the spliced transcript in cells exhibiting one or more signs or symptoms of a disease state, and/or in cells exhibiting non-homeostatic levels of the protein encoded by the natural gene comprising the transgene. Clause 26.
  • the alternatively-spliced exon comprises an alternatively-spliced exon from a gene selected from the group consisting of ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4FI, EXOC7, EZH2, FAM120A, FAM136A, FAM36A.
  • a gene selected from the group consisting of ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3,
  • flanking intron(s) is a native flanking intron(s) of the alternatively-spliced exon(s).

Abstract

Provided herein, in some embodiments, are nucleic acid constructs encoding therapeutic nucleic acids (e.g., miRNAs) of interest comprising one or more alternatively-spliced exons that regulate the expression of proteins or RNAs of interest. Such constructs may in some embodiments be useful for delivery in a recombinant viral vector.

Description

SMALL MOLECULE-INDUCIBLE GENE EXPRESSION SWITCHES
RELATED APPLICATIONS
The application claims the benefit under 35 U.S.C. 119(e) of U.S. Provisional Application number 63/373,451 filed August 24, 2022, which is incorporated by reference herein in its entirety.
BACKGROUND
Recombinant viruses (e.g., recombinant adeno-associated viruses (AAV) and recombinant lentiviruses, etc.) can be used to express therapeutic proteins (i.e., therapeutic cargoes) in patients as a form of genetic therapy. Controlling expression of therapeutic cargoes can enhance treatment outcomes in patients.
GOVERNMENT SUPPORT
This invention was made with government support under Grant Number R01 NS 112291, awarded by the National Institutes of Health. The government has certain rights in the invention.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
The contents of the electronic sequence listing (U119670102WO00-SEQ-PRW.xml; Size: 2,846,842 bytes; and Date of Creation: August 23, 2023) are herein incorporated by reference in their entirety.
SUMMARY OF THE INVENTION
Aspects of the application relate to recombinant nucleic acids containing a transgene comprising a ligand-responsive alternatively spliced exon that controls expression of an mRNA (e.g., encoding a protein of interest) or a functional RNA (e.g., a regulatory RNA) encoded by the transgene. In some embodiments, the recombinant nucleic acids are delivered to a host cell (e.g., ex vivo or in vivo). In some embodiments, a host cell nucleic acid (e.g., one or more genomic alleles) is edited to introduce a ligand-responsive alternatively spliced exon into a naturally occurring gene. In some aspects, ligand-responsive alternative splicing is used to regulate AAV-delivered gene expression. In some embodiments, ligand-responsive alternative splicing can confer greater control of therapeutic cargoes and may also avoid potential toxicities from constitutive over-expression of therapeutic cargoes. Previous aptazyme-based approaches lack modularity and have leaky, non-zero basal expression. Other efforts using drug-responsive alternative splicing patterns to control AAV-mediated gene expression potentially affect many other cryptic splice sites and are restricted to a single specific molecule.
Aspects of the present invention relate to the use of alternative splicing switches in mammalian cells and sequence designs that allow for ligand-inducible regulation of gene expression or knockdown. The approach uses rational design, coupled to deep sequencing, to characterize behavior of hundreds to thousands of synthetic intron/exon cassettes. Several riboswitch designs that facilitate small molecule-mediated regulation of alternative splicing and multiple sequence variants are described. Unlike switches that promote exon inclusion, this design promotes exon skipping upon drug induction. These designed switches can dynamically regulate protein isoforms, protein expression levels, and production of RNA interference triggers. This approach is termed SPlicing by Ligand Induction for Controllable Expression based on Riboswitch (SPLICER). The designs are compact in size and promoter-independent, making them useful regulatory tools that can be incorporated into gene expression cassettes for basic and translational applications. In turn, the designs can be useful for controlling the expression patterns (e.g., timing of expression by addition of a ligand) of therapeutically useful genes.
In some embodiments, polynucleotides of the present disclosure comprise a ligand-responsive sequence. In some embodiments, the polynucleotide is a transgene, such as one comprising a cassette which is responsive to certain ligands. To this end, the cassettes comprise ligand-responsive sequences which regulate alternative splicing. For example, cassettes may comprise ligand-responsive aptamers that can bind to exogenous or endogenous ligands, which results in conformational changes in the transcript of the transgene that affect splicing patterns. In some embodiments, transgenes of the present disclosure are provided in vectors. In some embodiments, the transgenes are provided in recombinant viral genomes that can be provided in AAV particles. In this manner, the transgenes can be spliced, and different isoforms of the transgenes can be expressed, in specific tissues in a chemically-inducible manner. In some aspects, the present disclosure relates to a polynucleotide comprising a transgene, wherein the transgene comprises at least one alternatively spliced exon, at least two introns flanking the alternatively spliced exon, and a ligand-responsive aptamer, wherein the presence of the ligand results in splicing out the at least one alternative exon and the ligand-responsive aptamer along with the introns.
Aspects of the present disclosure relate to the observation that alternatively-spliced exons may be used in the context of viral vectors (e.g., AAV viral vectors or lentivirus viral vectors) to effectively regulate the expression of a coding region of interest (e.g., a coding region of a transgene that encodes a therapeutic protein). In certain embodiments, the alternatively-spliced exons regulate a coding region of interest in a condition-responsive manner. As used herein, “condition-responsive manner” means that the alternatively-spliced exon regulates the expression of a coding region of interest in a manner that is controlled or influenced by one or more conditions, including, but not limited to, environmental conditions, intracellular conditions, extracellular conditions, type of cell (e.g., liver versus kidney cell), gene expression pattern, or disease state. Accordingly, the present disclosure relates to a new approach for regulating expression of a coding region of interest (e.g., a coding region of a transgene that encodes a therapeutic protein) from recombinant viral vectors, optionally in a condition-responsive manner, by coupling the expression of a coding region of interest with an alternatively-spliced exon. The present disclosure describes a variety of exemplary configurations and methods of coupling the expression of a coding region of interest (or multiple portions of coding regions) with an alternatively-spliced exon, but any suitable arrangement or configuration is contemplated so long as the expression of the coding region of interest (e.g., a coding region of a transgene that encodes a therapeutic protein) is configured to come under regulatory control of an alternatively-spliced exon.
In some embodiments, aspects of the present disclosure relate to a polynucleotide comprising a sequence encoding a ligand-responsive sequence, wherein the polynucleotide is capable of being alternatively spliced in the presence of a ligand to produce a first RNA or a second RNA. In some embodiments, the polynucleotide comprises an alternative exon operably linked to the ligand-responsive sequence. In some embodiments, the first RNA comprises the alternative exon, wherein the second RNA does not comprise the alternative exon. In some embodiments, the first RNA encodes a long isoform of an RNA of interest and/or the second RNA encodes a short isoform of the RNA of interest.
In some embodiments, the first RNA encodes an RNA of interest. In some embodiments, the first RNA is not operably linked to a pre-mature stop codon (e.g., does not contain a pre-mature stop codon). In some embodiments, the first RNA is operably linked to a start codon (e.g., contains a start codon).
In some embodiments, the second RNA encodes an RNA of interest. In some embodiments, the second RNA is not operably linked to a pre-mature stop codon (e.g., does not contain a pre-mature stop codon). In some embodiments, the second RNA is operably linked to a start codon (e.g., contains a start codon).
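The relationship between the two spliced products and a premature stop codon can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy: the exon strings are invented for illustration and are not sequences from the sequence listing, the function names are assumptions, and the choice of which isoform carries the stop codon depends on the design.

```python
# Toy sketch only: exon sequences and function names are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(upstream_exon, alternative_exon, downstream_exon, include_alternative):
    """Return the spliced RNA (written as DNA letters) with or without the alternative exon."""
    middle = alternative_exon if include_alternative else ""
    return upstream_exon + middle + downstream_exon


def first_in_frame_stop(rna, start_codon="ATG"):
    """Index of the first stop codon in frame with the first start codon, or None."""
    start = rna.find(start_codon)
    if start == -1:
        return None
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_premature_stop(rna):
    """True if the first in-frame stop codon lies before the final codon of the RNA."""
    stop = first_in_frame_stop(rna)
    return stop is not None and stop < len(rna) - 3


# Hypothetical exons: the alternative exon carries an in-frame TAA.
UPSTREAM, ALTERNATIVE, DOWNSTREAM = "ATGGCT", "TAAGCC", "GGTTGA"

first_rna = splice(UPSTREAM, ALTERNATIVE, DOWNSTREAM, include_alternative=True)
second_rna = splice(UPSTREAM, ALTERNATIVE, DOWNSTREAM, include_alternative=False)
```

In this toy arrangement the first RNA (alternative exon included) contains a premature stop codon, while the second RNA (alternative exon skipped) reads through to its terminal stop codon, so the second RNA would be the one encoding the RNA of interest.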
In some embodiments, the RNA of interest is an interfering RNA. In some embodiments, the RNA of interest is a microRNA. In some embodiments, the second RNA encodes the microRNA. In some embodiments, the RNA of interest encodes a protein. In some embodiments, the RNA of interest encodes a CRISPR/Cas nuclease or a guide RNA (gRNA). In some embodiments, the RNA of interest encodes a therapeutic RNA and/or a therapeutic protein.
In some embodiments, the ligand-responsive sequence is a risdiplam-responsive sequence or a branaplam-responsive sequence. In some embodiments, the alternative exon comprises a first portion of the risdiplam-responsive sequence and an intron downstream of the alternative exon comprises a second portion of the risdiplam-responsive sequence. In some embodiments, the first portion of the risdiplam-responsive sequence comprises a WGA sequence and the second portion of the risdiplam-responsive sequence comprises a GTAAGW sequence. In some embodiments, the alternative exon further comprises a AGGAAG sequence which is 5’ to the WGA sequence.
In some embodiments, the alternative exon further comprises an upstream sequence which is 5’ to the AGGAAG sequence. In some embodiments, the upstream sequence comprises at least 10 nucleotides. In some embodiments, the alternative exon further comprises a downstream sequence which is 3’ to the AGGAAG sequence and 5’ to the WGA sequence. In some embodiments, the downstream sequence comprises at least 6 nucleotides. In some embodiments, the risdiplam-responsive sequence comprises NNNNNNNNNNAGGAAGNNNNNNNNNNAWGAGTAAGW (SEQ ID NO: 2183), wherein N is any nucleotide and W is A or T. In some embodiments, the risdiplam-responsive sequence comprises YWWKWWWMKYAGGAAGYTAKTWGTTAWGAGTAAGW (SEQ ID NO: 2184) or YWWKWWWMKYAGGAAGYTAKTRWGTTAWGAGTAAGW (SEQ ID NO: 2185), wherein Y is C or T, K is G or T, W is A or T, M is A or C, and R is A or G. In some embodiments, the risdiplam-responsive sequence comprises ATRTCCACTYAAAAAAATCTGGCGATGGGAGCAGAAWGAGTAAGW (SEQ ID NO: 2186), wherein R is A or G, Y is C or T, and W is A or T.
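The degenerate letters used in the risdiplam-responsive sequences follow the standard IUPAC nucleotide codes, so a candidate sequence can be screened against them by translating each code into a regular-expression character class. The following is a minimal sketch, not a method taken from the disclosure: the function names and the candidate sequence are hypothetical, and the generic pattern is built from its stated parts (10 N, AGGAAG, 10 N, AWGAGTAAGW).

```python
import re

# Standard IUPAC nucleotide codes (DNA alphabet) mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]", "W": "[AT]",
    "D": "[AGT]", "H": "[ACT]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile a degenerate IUPAC motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def find_first(motif, sequence):
    """Return the 0-based index of the first match of motif in sequence, or None."""
    match = iupac_to_regex(motif).search(sequence.upper())
    return match.start() if match else None


# Generic risdiplam-responsive pattern, assembled from its parts to avoid miscounting.
RISDIPLAM_GENERIC = "N" * 10 + "AGGAAG" + "N" * 10 + "AWGAGTAAGW"

# Hypothetical candidate sequence, invented purely for illustration.
candidate = "CC" + "ACGTACGTAC" + "AGGAAG" + "TTTTTTTTTT" + "ATGAGTAAGA" + "GG"
position = find_first(RISDIPLAM_GENERIC, candidate)
```

Here `position` is the offset at which the candidate first satisfies the generic pattern, or `None` if it never does. The same helper accepts the more specific degenerate patterns, since they use only codes present in the table.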
In some embodiments, the branaplam-responsive sequence comprises ATTTAACATTTTTGAGTCAATCCAAGTAATGCAGGAGGTTCATGATTGTGTAGA (SEQ ID NO: 2187).
In some embodiments, the ligand-responsive sequence is a tetracycline-responsive sequence. In some embodiments, the tetracycline-responsive sequence is located in a tetracycline-responsive aptamer comprising the sequence TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2188), wherein W is A or T, wherein D is A, G, or T, wherein M is A or C, wherein K is G or T, and wherein H is A, C, or T.
In some embodiments, the polynucleotide comprises, from 5’ to 3’, an upstream 3' splice site, a first stem region, a 5' splice site reverse complementary sequence, the tetracycline-responsive sequence, a 5' splice site, a sequence comprising GT, a second stem region, and a downstream 3’ splice site. In some embodiments, the upstream 3’ splice site is at least 20 nucleotides long and the two nucleotides at the 3’ end are AG. In some embodiments, the downstream 3’ splice site is at least 20 nucleotides long. In some embodiments, the first stem region and the second stem region are at least 2 nucleotides long. In some embodiments, the 5’ splice site reverse complementary sequence and the 5’ splice site are at least 7 nucleotides long.
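The 5’ to 3’ ordering and the length constraints recited above lend themselves to a simple consistency check. The sketch below is illustrative only: the part names, the example sequences, and the placeholder used for the tetracycline-responsive sequence are all hypothetical, and the requirement that the reverse complementary sequence match the full length of the 5' splice site exactly is an assumption made for the sake of the example.

```python
# Illustrative sketch: part names and sequences are hypothetical placeholders.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 5' to 3' order of elements as recited in the text.
ORDER = (
    "upstream_3ss", "stem_1", "five_ss_revcomp", "tet_sequence",
    "five_ss", "gt_segment", "stem_2", "downstream_3ss",
)


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_switch_layout(parts):
    """Return a list of violated constraints (an empty list means all checks pass)."""
    problems = []
    if len(parts["upstream_3ss"]) < 20:
        problems.append("upstream 3' splice site shorter than 20 nt")
    if not parts["upstream_3ss"].upper().endswith("AG"):
        problems.append("upstream 3' splice site does not end in AG")
    if len(parts["downstream_3ss"]) < 20:
        problems.append("downstream 3' splice site shorter than 20 nt")
    for key in ("stem_1", "stem_2"):
        if len(parts[key]) < 2:
            problems.append(key + " shorter than 2 nt")
    for key in ("five_ss_revcomp", "five_ss"):
        if len(parts[key]) < 7:
            problems.append(key + " shorter than 7 nt")
    # Assumption: exact, full-length complementarity to the 5' splice site.
    if parts["five_ss_revcomp"].upper() != reverse_complement(parts["five_ss"]):
        problems.append("five_ss_revcomp is not the reverse complement of five_ss")
    if "GT" not in parts["gt_segment"].upper():
        problems.append("gt_segment lacks GT")
    return problems


def assemble(parts):
    """Concatenate the parts in 5' to 3' order."""
    return "".join(parts[key] for key in ORDER)


EXAMPLE = {
    "upstream_3ss": "T" * 18 + "AG",
    "stem_1": "GC",
    "five_ss_revcomp": reverse_complement("CAGGTAAGT"),
    "tet_sequence": "ACGT" * 5,  # placeholder, not a real aptamer sequence
    "five_ss": "CAGGTAAGT",
    "gt_segment": "GTAT",
    "stem_2": "GC",
    "downstream_3ss": "C" * 18 + "AG",
}
```

Calling `check_switch_layout(EXAMPLE)` returns an empty list for this toy design, and `assemble(EXAMPLE)` yields the concatenated cassette. The check says nothing about whether a design actually folds or splices as intended.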
In some embodiments, polynucleotides of the present disclosure are transgenes.
In some embodiments, for example, the present disclosure relates to a polynucleotide comprising a transgene, wherein the transgene comprises: at least one alternative exon, at least two introns flanking the alternative exon, and a ligand-responsive aptamer, wherein the presence of the ligand results in splicing out the alternative exon, the at least two introns, and the ligand- responsive aptamer from the transgene.
In some embodiments, wherein the at least one alternative exon and the at least two introns are from the same gene. In some embodiments, wherein the alternative exon and the at least two introns are from different genes.
In some embodiments, wherein the transgene further comprises two exons flanking the alternative exon, the at least two introns, and the ligand-responsive aptamer comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
In some embodiments, wherein the transgene further comprises two exons flanking the alternative exon, the at least two introns, and the ligand-responsive aptamer comprising a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
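Percent sequence identity thresholds such as those recited above are evaluated against an alignment of the candidate and reference sequences. As a rough illustration only, the sketch below computes identity from a simple global (Needleman-Wunsch style) alignment, counting identical columns over the total alignment length. The scoring parameters, the identity definition, and the function names are assumptions chosen for this example; this passage does not specify an alignment method, and established alignment tools may report different values.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity over a global alignment (identical columns / alignment length)."""
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    # Dynamic-programming score matrix with linear gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 100.0


def meets_identity(candidate, reference, threshold_percent):
    """True if the candidate reaches the given percent identity to the reference."""
    return global_identity(candidate, reference) >= threshold_percent
```

For example, `meets_identity(candidate, reference, 90)` would test the "at least 90%" tier for two hypothetical sequences. The matrix is quadratic in sequence length, so the sketch suits short cassettes rather than whole genomes.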
In some embodiments, wherein the alternative exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
In some embodiments, wherein the alternative exon comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
In some embodiments, wherein at least one of the introns comprise a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
In some embodiments, wherein at least one of the introns comprise a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
In some embodiments, wherein at least one of the exons comprise a polynucleotide having a nucleic acid sequence from a microRNA (miRNA) gene, optionally wherein the miRNA gene is a miRNA-16-2 gene. In some embodiments, wherein the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2281.
In some embodiments, wherein the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2281.
In some embodiments, wherein the ligand-responsive aptamer comprises a polynucleotide comprising a nucleic acid sequence that is 20-60 nucleotides in length.
In some embodiments, wherein the ligand-responsive aptamer comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2086, 2095, 2112, or 2187-2189.
In some embodiments, wherein the ligand-responsive aptamer comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2086, 2095, 2112, or 2187-2189.
In some embodiments, wherein the ligand-responsive aptamer binds to tetracycline.
In some embodiments, wherein the ligand-responsive aptamer is located in the intron downstream of the alternative exon.
In some embodiments, wherein the ligand-responsive aptamer is located in the intron upstream of the alternative exon.
In some embodiments, wherein the ligand-responsive aptamer is located in the alternative exon.
In some embodiments, wherein the ligand-responsive aptamer is in the intron downstream of the alternative exon.
In some embodiments, wherein the transgene comprises a 3' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239 and a 5' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of Tables 7, 25, 26, or 34.
In some embodiments, wherein the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
In some embodiments, wherein the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
In some embodiments, a vector comprises the transgene.
In some embodiments, wherein the vector is a plasmid.
In some embodiments, a cell comprises the vector.
In some embodiments, wherein the cell is a mammalian cell.
In some embodiments, wherein the cell is a human cell or cell from a human subject.
In some embodiments, a recombinant viral genome comprises the transgene.
In some embodiments, wherein the recombinant viral genome is a genome from a recombinant adeno-associated virus (rAAV).
In some embodiments, wherein the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
In some embodiments, wherein the AAV ITR sequences are AAV2 ITR sequences.
In some embodiments, wherein the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, or 2138.
In some embodiments, wherein the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
In some embodiments, an rAAV particle comprises the recombinant viral genome.
In some embodiments, wherein the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6 (Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45. In some embodiments, further comprising at least one helper plasmid.
In some embodiments, wherein the helper plasmid comprises a rep gene and a cap gene.
In some embodiments, wherein the rep gene encodes Rep78, Rep68, Rep52, or Rep40, and/or wherein the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
In some embodiments, wherein the rAAV particle comprises two helper plasmids.
In some embodiments, wherein the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
In some embodiments, the present disclosure relates to a method of treating a disease or condition in a subject comprising administering the recombinant viral genome or the rAAV particle. In some embodiments, wherein the subject is a mammal.
In some embodiments, wherein the mammal is a human.
In some embodiments, wherein the recombinant viral genome or rAAV particle is administered to the subject at least one time.
In some embodiments, wherein the viral genome or rAAV particle is administered to the subject 2, 3, 4, 5, 6, 7, 8, 9, or 10 times.
In some embodiments, wherein the viral genome or rAAV particle is administered to the subject parenterally, subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs.
In some embodiments, the present disclosure relates to a method of regulating the expression of a transgene in a subject comprising administering to a subject a polynucleotide comprising the transgene comprising at least one alternative exon, at least two introns flanking the alternative exon, and a ligand-responsive aptamer, and a ligand, wherein the presence of the ligand results in splicing out the alternative exon, the at least two introns, and the ligand-responsive aptamer from the transgene.
In some embodiments, wherein the transgene further comprises two exons flanking the alternative exon, the at least two introns, the ligand-responsive aptamer comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
In some embodiments, wherein the transgene further comprises two exons flanking the alternative exon, the at least two introns, and the ligand-responsive aptamer comprising a polynucleotide having the nucleic acid sequence set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
In some embodiments, wherein the transgene comprises a polynucleotide having a nucleic acid sequence from a microRNA (miRNA) gene, optionally wherein the miRNA gene is a miRNA-16-2 gene.
In some embodiments, wherein the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2281.
In some embodiments, wherein the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2281.
In some embodiments, wherein the at least one alternative exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, or 2137.
In some embodiments, wherein the at least one alternative exon comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, or 2137.
In some embodiments, wherein at least one of the introns comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
In some embodiments, wherein at least one of the introns comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
In some embodiments, wherein the ligand-responsive aptamer comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2086, 2095, 2112, or 2187-2189.
In some embodiments, wherein the ligand-responsive aptamer comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189.
In some embodiments, wherein the transgene comprises a 3' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239 and a 5' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of Tables 7, 25, 26, or 34.
In some embodiments, wherein the ligand is tetracycline.
In some embodiments, wherein the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
In some embodiments, wherein the transgene comprises a polynucleotide having a nucleic acid sequence set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
In some embodiments, wherein the transgene is provided in a recombinant viral genome.
In some embodiments, wherein the recombinant viral genome is a genome from a recombinant adeno-associated virus (rAAV).
In some embodiments, wherein the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
In some embodiments, wherein the AAV ITR sequences are AAV2 ITR sequences.
In some embodiments, wherein the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260. In some embodiments, wherein the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
In some embodiments, wherein the recombinant viral genome is provided in an rAAV particle.
In some embodiments, wherein the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6 (Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
In some embodiments, wherein the rAAV particle further comprises at least one helper plasmid.
In some embodiments, wherein the helper plasmid comprises a rep gene and a cap gene.
In some embodiments, wherein the rep gene encodes Rep78, Rep68, Rep52, or Rep40, and/or wherein the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
In some embodiments, wherein the rAAV particle comprises two helper plasmids.
In some embodiments, wherein the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
In some embodiments, wherein administration of the ligand to the subject results in a fold increase in the RNA level of the exclusion isoform of about 300-400-fold.
In some embodiments, wherein administration of the ligand to the subject results in a fold increase in the protein level of the exclusion isoform of about 5-25-fold.
In some embodiments, splicing out the alternative exon, the at least two introns, and the aptamer from the transgene results in the production of a functional start codon in the transgene.
In some embodiments, splicing out the alternative exon, the at least two introns, and the aptamer results in the removal of a pre-mature stop codon from the transgene.
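The two outcomes described above (formation of a functional start codon, or removal of a premature stop codon) can be sketched with a toy model. All sequences below are invented placeholders, not sequences from the disclosure. The ligand-absent behavior modeled here (alternative exon retained while introns and the aptamer segment are removed) is an assumption of this example, as are the names `splice` and `codons_before_stop`.

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, str]  # (kind, sequence)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(segments: List[Segment], ligand_present: bool) -> str:
    """Join the segments that remain in the mature transcript."""
    # Assumption: introns and the aptamer segment are removed in both cases.
    removed = {"intron", "aptamer"}
    if ligand_present:
        # Per the text, the ligand causes the alternative exon to be spliced out.
        removed.add("alternative_exon")
    return "".join(seq for kind, seq in segments if kind not in removed)


def codons_before_stop(mrna: str) -> Optional[int]:
    """Count codons from the first ATG up to the first in-frame stop codon.

    Returns None when the transcript contains no ATG.
    """
    start = mrna.find("ATG")
    if start == -1:
        return None
    count = 0
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count


# Toy construct A: the alternative exon carries an in-frame premature stop (TAA).
premature_stop = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGT"),
    ("aptamer", "GGAGAGG"),  # placeholder, not a real aptamer sequence
    ("alternative_exon", "TAACGT"),
    ("intron", "GTAAGTTTTCAG"),
    ("exon", "AAAGGGTGA"),
]

# Toy construct B: the start codon forms only when the flanking exons are joined.
split_start = [
    ("exon", "GCCACCAT"),
    ("intron", "GTAAGT"),
    ("alternative_exon", "CTCTCT"),
    ("intron", "GTAAGTTTTCAG"),
    ("exon", "GGCCAAGTGA"),
]

for name, construct in (("A", premature_stop), ("B", split_start)):
    for ligand in (False, True):
        mrna = splice(construct, ligand)
        print(name, "ligand" if ligand else "no ligand", mrna, codons_before_stop(mrna))
```

In construct A the reading frame is truncated when the alternative exon is retained and extends to the final stop codon when it is excluded. In construct B no ATG exists until exclusion joins `AT` and `G` across the exon junction.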
The present disclosure further relates to the following embodiments.
In some embodiments, aspects relate to a recombinant viral genome capable of delivering (e.g., expressing) a transgene or coding region thereof in a subject, wherein said recombinant viral genome comprises at least one alternatively-spliced exon and a coding region of the transgene. In various aspects, the alternatively-spliced exon undergoes differential splicing in a condition-responsive manner to result in different spliced transcripts (e.g., mRNA isoforms), whereby the alternatively-spliced exon has been either retained (“spliced-in”) or not retained (“spliced-out”) in the resulting spliced transcripts. For example, in a healthy cell environment, the alternatively-spliced exon may be spliced-out of the resulting transcript; however, in a cancer cell, the alternatively-spliced exon may be spliced-in the resulting transcript. And, depending upon the regulatory sequences present in the alternatively-spliced exon, and whether those regulatory sequences impart a positive or negative regulatory control on the expression of the coding region of interest, the alternatively-spliced exon regulates the expression of the coding region of interest by virtue of being either present (spliced-in) or not present (spliced-out) in the resulting mRNA transcript isoform.
In some embodiments, the alternatively-spliced exon may be provided in the form of a transgene comprising the alternatively-spliced exon, one or more introns (or portion(s) thereof), and one or more additional exons (e.g., constitutive exons). Such transgenes comprising an alternatively-spliced exon may be referred to herein as comprising “alternatively-spliced exon cassettes.” The configuration of the alternatively-spliced exon cassettes and transgenes is not limited in any way, and examples of such configurations are provided in the Figures.
In some embodiments, the transgene comprises an alternatively-spliced exon, one or more introns (or portion(s) thereof) and one or more exons. In various embodiments, the one or more exons can be constitutive exons (i.e., those that are retained in all mRNA isoforms resulting from splicing). In certain embodiments, the transgene or the alternatively-spliced exon cassette comprises one intron (or portion thereof). In some embodiments, the intron (or portion thereof) is located 3’ or 5’ to an alternatively-spliced exon. In other embodiments, the transgene or the alternatively-spliced exon cassette comprises two introns (or portion(s) thereof) (e.g., whereby the one or more introns are flanking introns, i.e., introns that are immediately upstream or downstream of the alternatively-spliced exon).
In some embodiments, an alternative exon cassette comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778. In some embodiments, an alternative exon cassette comprises a polynucleotide having a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
In some embodiments, the alternatively-spliced exon comprises at least one modification, relative to a naturally occurring alternatively-spliced exon. In some embodiments, the alternatively-spliced exon comprises at its 3’ end a heterologous start codon or part of a heterologous start codon. In some embodiments, all native start codons located 5’ to the heterologous start codon are disrupted or deleted.
In some embodiments, the alternatively-spliced exon is located 5’ to the coding region of the transgene. In some embodiments, the alternatively-spliced exon cassette comprises two alternatively-spliced exons, each with flanking introns. In some embodiments, the two alternatively-spliced exons are adjacent. In some embodiments, the constitutive exon is located 5’ to the two alternatively-spliced exons.
In some embodiments, each alternatively-spliced exon comprises at its 3’ end a heterologous start codon or part of a heterologous start codon. In some embodiments, all native start codons located 5’ to the heterologous start codon of the 5’-most alternatively-spliced exon are disrupted or deleted.
In some embodiments, only one of the two alternatively-spliced exons is retained in the spliced transcript. In some embodiments, the 5’-most alternatively-spliced exon is retained in the spliced transcript. In some embodiments, the 3’-most alternatively-spliced exon is retained in the spliced transcript.
In some embodiments, the alternatively-spliced exon(s) and flanking intron(s) are located within the coding region of the transgene.
In some embodiments, the alternatively-spliced exon comprises a heterologous, in-frame stop codon. In some embodiments, the heterologous, in-frame stop codon is at least 50 nucleotides upstream of the next 5’ splice junction. In some embodiments, the heterologous stop codon elicits nonsense-mediated decay.
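The positional requirement above (a stop codon at least 50 nucleotides upstream of the next 5’ splice junction) can be checked with a short sketch. It assumes the stop codon lies in the alternatively-spliced exon, that the next 5’ splice junction is the 3’ end of that exon, and that distance is measured from the last nucleotide of the stop codon; these conventions and the function names are assumptions of this example.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_to_junction_distance(exon: str, frame: int = 0) -> Optional[int]:
    """Nucleotides between the first in-frame stop codon and the exon's 3' end.

    `frame` is the offset (0, 1, or 2) of the first complete codon in the exon.
    Returns None if the exon has no in-frame stop codon.
    """
    exon = exon.upper()
    for i in range(frame, len(exon) - 2, 3):
        if exon[i:i + 3] in STOP_CODONS:
            return len(exon) - (i + 3)
    return None


def meets_distance_rule(exon: str, frame: int = 0, min_distance: int = 50) -> bool:
    """True if an in-frame stop sits at least `min_distance` nt before the junction."""
    distance = stop_to_junction_distance(exon, frame)
    return distance is not None and distance >= min_distance


# Toy exons (invented sequences): same stop codon, different downstream lengths.
long_exon = "GCC" + "TAA" + "GCA" * 20   # 60 nt follow the stop codon
short_exon = "GCC" + "TAA" + "GCA" * 5   # 15 nt follow the stop codon
```

Here `meets_distance_rule(long_exon)` holds and `meets_distance_rule(short_exon)` does not, since only the first exon places its stop codon 50 or more nucleotides before the junction.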
In various embodiments, the alternatively-spliced exon is spliced-in or retained in the presence of one or more conditions (i.e., in a condition-responsive manner) to result in an mRNA isoform comprising the alternatively-spliced exon and a coding region of interest. In some embodiments, the one or more conditions comprise the conditions that define one cell type from another. In other embodiments, the one or more conditions comprise the intracellular conditions that define a healthy cell state from a diseased cell state. In some embodiments, the one or more conditions comprise the presence or absence of activated T cells and/or the presence or absence of a state of inflammation. In still other embodiments, the one or more conditions comprise one or more signs or symptoms of a disease state, and/or the presence or absence of one or more disease markers. In still other embodiments, the one or more conditions comprise the expression level and/or activity of the endogenous protein that corresponds to the protein encoded by the coding region of interest in the alternatively-spliced exon cassette of the recombinant virus genome. For example, in one embodiment, if the endogenous protein has a low level of expression and/or activity (e.g., due to a defective naturally occurring gene encoding the endogenous protein), the alternatively-spliced exon may be spliced-in, and the coding region of interest may be upregulated (e.g., if the alternatively-spliced exon comprises a positive regulatory sequence). In another embodiment, if the endogenous protein has a low level of expression and/or activity (e.g., due to a defective naturally occurring gene encoding the endogenous protein), the alternatively-spliced exon may be spliced-in, and the coding region of interest may be downregulated (e.g., if the alternatively-spliced exon comprises a negative regulatory sequence).
In still other embodiments, if the endogenous protein has a low level of expression and/or activity (e.g., due to a defective naturally occurring gene encoding the endogenous protein), the alternatively-spliced exon may be spliced-out, and the coding region of interest may be upregulated (e.g., if the alternatively-spliced exon comprises a negative regulatory sequence that is removed by the splicing-out of the exon). In another embodiment, if the endogenous protein has a low level of expression and/or activity (e.g., due to a defective naturally occurring gene encoding the endogenous protein), the alternatively-spliced exon may be spliced-out, and the coding region of interest may be downregulated (e.g., if the alternatively-spliced exon comprises a positive regulatory sequence that is removed by the splicing-out of the exon).
In various embodiments, the one or more conditions (e.g., environmental, intracellular, disease state, cell type, expression pattern, etc.) may result in the splicing-in or splicing-out of the alternatively-spliced exon. For example, the one or more conditions may cause the alternatively-spliced exon to be spliced-in, and the coding region of interest may be upregulated (e.g., if the alternatively-spliced exon comprises a positive regulatory sequence). In another embodiment, the one or more conditions may cause the alternatively-spliced exon to be spliced-in, and the coding region of interest may be downregulated (e.g., if the alternatively-spliced exon comprises a negative regulatory sequence). In still other embodiments, the one or more conditions may cause the alternatively-spliced exon to be spliced-out, and the coding region of interest may be upregulated (e.g., if the alternatively-spliced exon comprises a negative regulatory sequence that is removed by the splicing-out of the exon). In another embodiment, the one or more conditions may cause the alternatively-spliced exon to be spliced-out, and the coding region of interest may be downregulated (e.g., if the alternatively-spliced exon comprises a positive regulatory sequence that is removed by the splicing-out of the exon).
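The four combinations enumerated above reduce to a single rule: expression is upregulated when a positive regulatory sequence is present in the transcript or a negative one is absent, and downregulated otherwise. A minimal sketch of that mapping follows; the enum and function names are hypothetical and chosen only for this illustration.

```python
from enum import Enum


class ExonState(Enum):
    SPLICED_IN = "spliced-in"
    SPLICED_OUT = "spliced-out"


class RegulatorySequence(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


def expression_effect(state: ExonState, regulatory: RegulatorySequence) -> str:
    """Map exon state and regulatory-sequence type to the effect on the coding region."""
    sequence_present = state is ExonState.SPLICED_IN
    acts_positively = regulatory is RegulatorySequence.POSITIVE
    # Positive and present, or negative and absent, both give upregulation.
    return "upregulated" if sequence_present == acts_positively else "downregulated"


# Print the full table of the four cases described in the text.
for state in ExonState:
    for regulatory in RegulatorySequence:
        print(state.value, regulatory.value, "->", expression_effect(state, regulatory))
```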
In some embodiments, the alternatively-spliced exon comprises an alternatively-spliced exon from a gene selected from the group consisting of: ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, ZNF496, CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM. In some embodiments, the alternatively-spliced exon comprises an alternatively-spliced exon from or derived from an alternatively-spliced exon of a gene selected from the group consisting of CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of CAMK2B. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of PKP2. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of LGMN. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of NRAP. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of VPS39. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of KSR1. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of PDLIM3.
In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of BIN1. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of ARFGAP2. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of KIF13A. In some embodiments, the alternatively-spliced exon is or is derived from an alternatively-spliced exon of PICALM.
In some embodiments, the alternatively-spliced exon is or is derived from exon 11 of BIN1. In some embodiments, the alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 37. In some embodiments, the alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 37. In some embodiments, the alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 38. In some embodiments, the alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 38.
In some embodiments, a component (e.g., an alternative exon; an intronic sequence) which is “derived from” a gene (e.g., BIN1, SMN1) may be derived from the gene in that the component is taken from its wild-type or natural context and put into a non-natural context (e.g., inserted into the nucleic acid sequence of a transgene), but may comprise the wild-type or natural nucleic acid sequence of said component. In some embodiments, a component (e.g., an alternative exon; an intronic sequence) which is “derived from” a gene (e.g., BIN1, SMN1) may be derived from the gene in that the component is taken from its wild-type or natural context and put into a non-natural context (e.g., inserted into the nucleic acid sequence of a transgene), and may also be derived from the gene in that the nucleic acid sequence of the component is modified, relative to the wild-type or natural nucleic acid sequence of said component. Modifications to the various components (e.g., introns, exons, etc.) are described elsewhere herein. In some embodiments, the alternatively-spliced exon comprises an alternatively-spliced exon comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 23-44.
In some embodiments, the flanking intron(s) (or portion(s) thereof) is a native flanking intron(s) (or portion(s) thereof) of the alternatively-spliced exon(s). In some embodiments, the flanking intron(s) (or portion(s) thereof) comprises at its 5’ end a 5’ splice donor site. In some embodiments, the flanking intron(s) (or portion(s) thereof) comprises at its 3’ end a 3’ splice acceptor site. In some embodiments, the flanking intron(s) (or portion(s) thereof) comprises no modifications, relative to a naturally occurring intron (or portion thereof). In some embodiments, the flanking intron(s) (or portion(s) thereof) comprises at least one modification, relative to a naturally occurring intron (or portion thereof). In some embodiments, the modification is a substitution or deletion of one or more nucleotides. In some embodiments, the flanking intron(s) (or portion(s) thereof) is a regulated intron (or portion thereof).
In some embodiments, the flanking intron(s) is or is derived from an intron of a gene selected from the group consisting of ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SMN1, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, ZNF496, CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
In some embodiments, the flanking intron(s) is or is derived from an intron of SMN1. In some embodiments, the flanking intron(s) which is or is derived from an intron of SMN1 flanks a constitutive exon. In some embodiments, the flanking intron(s) is or is derived from intron 6 and/or intron 7 of SMN1. In some embodiments, the flanking intron which is derived from SMN1 intron 6 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 intron 6. In some embodiments, the flanking intron which is derived from SMN1 intron 6 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 103. In some embodiments, the flanking intron which is derived from SMN1 intron 6 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 103. In some embodiments, the flanking intron which is derived from SMN1 intron 7 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 intron 7. In some embodiments, the flanking intron which is derived from SMN1 intron 7 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 104. In some embodiments, the flanking intron which is derived from SMN1 intron 7 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 104.
In some embodiments, the flanking intron(s) is or is derived from an intron of BIN1. In some embodiments, the flanking intron(s) which is or is derived from an intron of BIN1 flanks an alternative exon. In some embodiments, the flanking intron(s) is or is derived from intron 10 and/or intron 11 of BIN1. In some embodiments, the flanking intron(s) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 15. In some embodiments, the flanking intron(s) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 15. In some embodiments, the flanking intron(s) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 16. In some embodiments, the flanking intron(s) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 16.
In some embodiments, the flanking intron(s) comprises an intron comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 1-22, 103, and 104.
In some embodiments, the constitutive exon is an exon which is natively associated with the coding region of the transgene. In some embodiments, the constitutive exon is not an exon which is natively associated with the coding region of the transgene. In some embodiments, the constitutive exon is or is derived from the same gene as the alternatively-spliced exon(s). In some embodiments, the gene is the gene from which the coding region of the transgene is also derived. In some embodiments, the constitutive exon is not from or derived from the same gene as the alternatively-spliced exon(s).
In some embodiments, the coding region of the transgene is or is derived from a coding region of a gene selected from the group consisting of MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, a microdystrophin-encoding gene, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, TPM2, TNNT2, CFL2, KBTBD13, KLHL40, KLHL41, LMOD3, MYPN, SEPN1, TTN, SPEG, MYH7, TK2, POLG1, GAA, AGL, PYGM, SLC22A5, OCTN2, ETF, ETFH, PNPLA2, a cytochrome b oxidase-encoding gene, a cytochrome c oxidase-encoding gene, CLCN1, SCN4A, DMPK, CNBP, MYOT, LMNA, CAV3, DNAJB6, DES, TNPO3, HNRPDL, CAPN3, DYSF, an alpha-sarcoglycan-encoding gene, a beta-sarcoglycan-encoding gene, a gamma-sarcoglycan-encoding gene, a delta-sarcoglycan-encoding gene, TCAP, TRIM32, FKRP, FXN, POMT1, FKTN, POMT2, POMGnT1, DAG1, ANO5, PLEC1, TRAPPC11, GMPPB, ISPD, LIMS2, POPDC1, TOR1AIP1, POGLUT2, LAMA2, COL6A1, POMT1, POMT2, DUX4, HMD, PAX7, PMP22, MPZ, MFN2, SMCHD1, SMN, Lamin A/C (LMNA), GJB1, ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, ZNF496, CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM. In some embodiments, the coding region of the transgene is or is derived from MTM1, CAPN3, or FXN.
In some embodiments, the coding region of the transgene is or is derived from FXN. In some embodiments, the coding region of the transgene is or is derived from MTM1. In some embodiments, the coding region of the transgene which is or is derived from MTM1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1881. In some embodiments, the coding region of the transgene which is or is derived from MTM1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1881.
In some embodiments, the coding region of the transgene is or is derived from CAPN3. In some embodiments, the coding region of the transgene which is or is derived from CAPN3 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1882. In some embodiments, the coding region of the transgene which is or is derived from CAPN3 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1882.
In some embodiments, a recombinant viral genome of the present disclosure further comprises a promoter. In some embodiments, the promoter is a native promoter of the coding region of the transgene. In some embodiments, the promoter is not a native promoter of the coding region of the transgene. In some embodiments, the promoter is constitutive. In some embodiments, the promoter is inducible. In some embodiments, the promoter is a cell-specific promoter. In some embodiments, the promoter is a tissue-specific promoter. In some embodiments, the promoter is selected from the group consisting of an EF1 alpha promoter, beta actin promoter, CMV, muscle creatine kinase promoter, C5-12 muscle promoter, MHCK7, CBh, synapsin, MECP2, enolase, GFAP, Desmin, and CAG promoter.
In some embodiments, the promoter is an MHCK7 promoter. In some embodiments, an MHCK7 promoter comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1880. In some embodiments, an MHCK7 promoter comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1880.
In some embodiments, the promoter drives expression of the transgene (e.g., expression of the product encoded by the coding region of interest). In some embodiments, the promoter is a ubiquitous promoter. In some embodiments, a ubiquitous promoter is a promoter selected from the group consisting of: an EF1 alpha promoter, a beta actin promoter, CMV, CBh, and CAG promoter. In some embodiments, the promoter is a tissue-specific promoter, such as a muscle- or heart-biased promoter. In some embodiments, a tissue-specific promoter, such as a muscle- or heart-biased promoter, is a promoter selected from the group consisting of: a muscle creatine kinase promoter, a C5-12 muscle promoter, MHCK7, and Desmin. In some embodiments, the promoter is a neuronal-biased promoter. In some embodiments, a neuronal-biased promoter is a promoter selected from the group consisting of: synapsin and MECP2. In some embodiments, the promoter is an astrocyte-biased promoter. In some embodiments, an astrocyte-biased promoter is a GFAP promoter.
In some embodiments, the coding region of the transgene comprises at least one modification, relative to a coding region of a naturally occurring gene. In some embodiments, the modification is an addition, substitution or deletion of at least one nucleotide. In some embodiments, the coding region of the transgene comprises a deletion of a native start codon, or a portion thereof. In some embodiments, the coding region of the transgene comprises an addition of a non-native stop codon, or a portion thereof. In some embodiments, the transgene comprises one or more recombinant introns (e.g., a 3’ UTR intron). In some embodiments, the one or more recombinant introns (e.g., a 3' UTR intron), when translated, elicits nonsense mediated decay (NMD).
In some embodiments, the naturally occurring gene is a gene selected from the group consisting of MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, a microdystrophin-encoding gene, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, TPM2, TNNT2, CFL2, KBTBD13, KLHL40, KLHL41, LMOD3, MYPN, SEPN1, TTN, SPEG, MYH7, TK2, POLG1, GAA, AGL, PYGM, SLC22A5, OCTN2, ETF, ETFH, PNPLA2, a cytochrome b oxidase-encoding gene, a cytochrome c oxidase-encoding gene, CLCN1, SCN4A, DMPK, CNBP, MYOT, LMNA, CAV3, DNAJB6, DES, TNPO3, HNRPDL, CAPN3, DYSF, an alpha-sarcoglycan-encoding gene, a beta-sarcoglycan-encoding gene, a gamma-sarcoglycan-encoding gene, a delta-sarcoglycan-encoding gene, TCAP, TRIM32, FKRP, FXN, POMT1, FKTN, POMT2, POMGnT1, DAG1, ANO5, PLEC1, TRAPPC11, GMPPB, ISPD, LIMS2, POPDC1, TOR1AIP1, POGLUT2, LAMA2, COL6A1, POMT1, POMT2, DUX4, EMD, PAX7, PMP22, MPZ, MFN2, SMCHD1, SMN, Lamin A/C (LAMN), and/or GJB1. In some embodiments, the naturally occurring gene is MTM1, CAPN3, or FXN. In some embodiments, the naturally occurring gene is MTM1. In some embodiments, the naturally occurring gene is CAPN3. In some embodiments, the naturally occurring gene is FXN.
In some embodiments, the coding region of the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882. In some embodiments, the coding region of the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882.
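As an illustration of the identity thresholds recited above, the following minimal Python sketch computes percent identity over a pair of sequences that have already been aligned. The function names, the use of "-" as a gap character, the choice of alignment length as the denominator, and the toy sequences are assumptions made for illustration only; the disclosure does not prescribe an alignment method, and a practical comparison would ordinarily rely on an established alignment tool.

```python
# Thresholds recited in the text, in percent.
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns holding the same nucleotide.

    Assumes both strings come from a prior pairwise alignment, so they have
    equal length and use '-' for gaps (an illustrative convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must be non-empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def thresholds_met(identity):
    """Return every recited threshold that the identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Toy 10-nucleotide alignment with a single mismatch at the last position.
    identity = percent_identity("ATGCATGCAT", "ATGCATGCAA")
    print(identity, thresholds_met(identity))
```

Other definitions of percent identity (for example, dividing by the length of the shorter sequence) would give different values for gapped alignments, so the denominator used here should be read as one possible convention.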
In some embodiments, the recombinant viral genome is a recombinant genome from an adeno-associated virus (rAAV), lentivirus, retrovirus, or foamyvirus. In some embodiments, the recombinant viral genome is from an AAV. In some embodiments, the transgene is flanked by AAV inverted terminal repeat (ITR) sequences. In some embodiments, the ITR sequences comprise AAV1, AAV2, AAV5, AAV7, AAV8, or AAV9 ITR sequences. In some embodiments, the recombinant viral genome is from a lentivirus. In some embodiments, the alternatively-spliced exon cassette is located on the minus strand of the lentivirus genome.
In some embodiments, a recombinant viral genome of the present disclosure further comprises a 3’ untranslated region (UTR) that is endogenous or exogenous to the transgene. In some embodiments, the exogenous 3’ UTR is the 3’ UTR from bovine growth hormone, SV40, EBV, or Myc.
In some embodiments, the exogenous 3’ UTR is SV40. In some embodiments, the SV40 3’ UTR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1883. In some embodiments, the SV40 3’ UTR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1883.
In some embodiments, the exogenous 3’ UTR comprises a polyadenylation (pA) signal. In some embodiments, the pA signal is an SV40 pA signal.
Aspects of the invention contemplate a viral particle comprising a viral genome according to any embodiment of the present disclosure. In some embodiments, the viral particle is an rAAV particle. In some embodiments, the rAAV particle comprises an AAV serotype selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In some embodiments, the rAAV particle comprises AAV serotype 9. In some embodiments, the rAAV particle comprises an AAV derivative or pseudotype selected from the group consisting of an AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y→F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
In some embodiments, the viral particle further comprises at least one helper plasmid. In some embodiments, the helper plasmid comprises a rep gene and a cap gene. In some embodiments, the rep gene encodes Rep78, Rep68, Rep52, or Rep40. In some embodiments, the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein. In some embodiments, the viral particle comprises two helper plasmids. In some embodiments, the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
In some embodiments, the viral particle is a recombinant lentivirus particle. In some embodiments, the lentivirus is a human immunodeficiency virus (HIV1 or HIV2), a feline immunodeficiency virus (FIV), a bovine immunodeficiency virus (BIV), a caprine arthritis encephalitis virus, an equine infectious anemia virus, a jembrana disease virus, a puma lentivirus, a simian immunodeficiency virus, or a visna-maedi virus. In some embodiments, the viral particle further comprises a viral envelope.
Aspects of the invention relate to a method of treating a disease or condition in a subject comprising administering a recombinant viral genome or a viral particle according to any embodiment of the present disclosure to the subject. In some embodiments, the subject is a mammal. In some embodiments, the mammal is a human. In some embodiments, the recombinant viral genome or viral particle is administered to the subject at least one time. In some embodiments, the recombinant viral genome or viral particle is administered to the subject 2, 3, 4, 5, 6, 7, 8, 9, or 10 times. In some embodiments, the recombinant viral genome or viral particle is administered to the subject parenterally, subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs. In some embodiments, the recombinant viral genome or viral particle is administered to the subject by intravenous injection, intramuscular injection, intrathecal injection, or intravitreal injection.
In some embodiments, the disease or condition is a disease or condition selected from the group consisting of Dentatorubral-pallido-luysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich’s ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type 1, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebellar ataxia type 6 (SCA6), spinocerebellar ataxia type 7 (SCA7), spinocerebellar ataxia type 8 (SCA8), spinocerebellar ataxia type 10 (SCA10), spinocerebellar ataxia type 12 (SCA12), spinocerebellar ataxia type 17 (SCA17), Syndromic / non-syndromic X-linked mental retardation, Emery-Dreifuss muscular dystrophy type 2, familial partial lipodystrophy, limb girdle muscular dystrophy type 1B, dilated cardiomyopathy, familial partial lipodystrophy, Charcot-Marie-Tooth disorder type 2B1, mandibuloacral dysplasia, childhood progeria syndrome (Hutchinson-Gilford syndrome), Werner syndrome, Dilated cardiomyopathy (DCM), Hypertrophic cardiomyopathy (HCM), Restrictive cardiomyopathy (RCM), Left Ventricular Non-compaction (LVNC), Arrhythmogenic Right Ventricular Dysplasia (ARVD), takotsubo cardiomyopathy, Duchenne muscular dystrophy, Becker muscular dystrophy, Limb-girdle muscular dystrophy, Facioscapulohumeral muscular dystrophy, Congenital muscular dystrophy, Oculopharyngeal muscular dystrophy, Distal muscular dystrophy, Emery-Dreifuss muscular dystrophy, dementia, Parkinson’s disease (PD), a PD-related disorder, Prion disease, a motor neuron disease (MND), Progressive bulbar palsy (PBP), Progressive muscular atrophy (PMA), Primary lateral sclerosis (PLS), Spinal muscular atrophy (SMA), a bladder cancer, a breast cancer, a colorectal cancer, a kidney cancer, a lung cancer, a lymphoma, a melanoma, an oral cancer, an ovarian cancer, an oropharyngeal cancer, a pancreatic cancer, a prostate cancer, a thyroid cancer, a uterine cancer, Down syndrome, Prader-Willi Syndrome (PWS), Bloom Syndrome, Cockayne Syndrome Type I -216400, Cockayne Syndrome Type III, Cockayne Syndrome Type I, Hutchinson-Gilford Progeria Syndrome, Mandibuloacral Dysplasia with Type A Lipodystrophy, Progeria, Adult Onset Progeroid Syndrome, Neonatal Rothmund-Thomson Syndrome, Seip Syndrome, Werner Syndrome, Replication Focus-Forming Activity 1, myotubular myopathy, Danon Disease, and/or centronuclear myopathy.
Aspects of the invention relate to a method of regulating transgene expression (e.g., comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) using a viral vector comprising a recombinant viral genome as described herein, wherein the transgene, or coding region of the transgene, is under the regulatory control of an alternatively-spliced exon. In some embodiments, the method comprises inserting into the recombinant viral genome at least one alternatively-spliced exon and at least one coding region of interest (e.g., which encodes a therapeutic protein), wherein the expression of the at least one coding region of interest is regulated by the alternatively-spliced exon. In turn, how the regulation of the coding region of interest is imparted depends on (a) the presence or absence of positive or negative regulatory control sequences in the alternatively-spliced exon, and (b) whether the alternatively-spliced exon is spliced-in (i.e., retained) or spliced-out (i.e., removed) from the final mRNA transcript isoform. The recombinant viral genome may be configured with one or more additional introns, exons, and/or regulatory sequences (e.g., promoters, enhancers, and the like that control transcription from the recombinant viral genome). In addition, the alternatively-spliced exon may be comprised on a cassette (which may be referred to as an alternatively-spliced exon cassette), comprising the alternatively-spliced exon(s) and one or more introns, which may be inserted into the recombinant viral genome in a manner that couples it to the coding region of interest, such that the expression of the coding region of interest comes under regulatory control of the alternatively-spliced exon of the cassette.
In other embodiments, the transgene comprises an alternatively-spliced exon, optionally one or more introns (or portion(s) thereof), optionally one or more constitutive exons, and a coding region of interest.
Aspects of the invention relate to a method of regulating transgene (e.g., comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) expression using a viral vector comprising a recombinant viral genome as described herein. In some embodiments, the method comprises: (a) inserting into the recombinant viral genome at least one transgene, wherein the transgene comprises a constitutive exon, at least one alternatively-spliced exon, at least one flanking intron (or portion thereof), and a coding region of a transgene; (b) introducing a heterologous start codon or part of a heterologous start codon at the 3’ end of the alternatively-spliced exon; (c) disrupting or deleting all native start codons located 5’ to the heterologous start codon; and (d) deleting or disrupting one or more native start codons, or a portion(s) thereof, from the coding region of the transgene. In some embodiments, the method comprises: (a) inserting into the recombinant viral genome at least one transgene, wherein the transgene comprises a constitutive exon, at least one alternatively-spliced exon, at least one flanking intron (or portion thereof), and a coding region of a transgene; (b) introducing a heterologous start codon or part of a heterologous start codon at the 3’ end of the alternatively-spliced exon; (c) disrupting or deleting all native start codons located 5’ to the heterologous start codon; and (d) adding a heterologous 3’ UTR, or a portion thereof, to the coding region of the transgene. In some embodiments, translation of the heterologous 3’ UTR elicits nonsense mediated decay.
In some embodiments, the method comprises: (a) inserting into the recombinant viral genome at least one alternatively-spliced exon cassette, wherein the alternatively-spliced exon cassette comprises a constitutive exon, at least one alternatively-spliced exon, at least one flanking intron (or portion thereof), and a coding region of a transgene; (b) introducing a heterologous start codon or part of a heterologous start codon at the 3’ end of the alternatively-spliced exon; (c) disrupting or deleting all native start codons located 5’ to the heterologous start codon; (d) deleting or disrupting one or more native start codons, or a portion(s) thereof, from the coding region of the transgene; and (e) adding a heterologous 3’ UTR, or a portion thereof, to the coding region of the transgene. In some embodiments, translation of the heterologous 3’ UTR elicits nonsense mediated decay. In some embodiments, the constitutive exon, alternatively-spliced exon, and flanking intron (or portion thereof) are each located 5’ to the coding region of the transgene.
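The step of disrupting or deleting all native start codons located 5’ to the heterologous start codon can be pictured with the following minimal Python sketch, which lists every ATG found upstream of a designated heterologous start codon. The function name, the 0-based coordinate convention, and the toy sequence are hypothetical and are offered only as an illustration under those assumptions; the sketch does not address how each codon would be disrupted.

```python
def upstream_atg_positions(sequence, heterologous_start):
    """List 0-based positions of every ATG lying 5' to the heterologous ATG.

    `heterologous_start` is the 0-based index of the 'A' of the heterologous
    start codon (an illustrative convention, not taken from the disclosure).
    """
    seq = sequence.upper()
    if seq[heterologous_start:heterologous_start + 3] != "ATG":
        raise ValueError("no ATG at the stated heterologous start position")
    positions = []
    i = seq.find("ATG")
    while i != -1 and i < heterologous_start:
        positions.append(i)
        i = seq.find("ATG", i + 1)
    return positions


if __name__ == "__main__":
    # Toy 5' region: two upstream ATGs, then the heterologous ATG at index 18.
    toy = "CCATGGCCTTATGAAACCATG"
    print(upstream_atg_positions(toy, 18))
```

Each reported position marks a candidate codon to mutate or delete so that translation can begin only at the heterologous start codon carried by the alternatively-spliced exon.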
Aspects of the invention relate to a method of regulating transgene (e.g., comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) expression using a viral vector comprising a recombinant viral genome as described herein. In some embodiments, the method comprises: (a) inserting into the recombinant viral genome at least one transgene, wherein the transgene comprises an alternatively-spliced exon and at least one flanking intron (or portion thereof) within the coding region of the transgene; and (b) introducing into the alternatively-spliced exon a heterologous, in-frame stop codon upstream of the next 5’ splice junction. In some embodiments, the heterologous, in-frame stop codon elicits nonsense-mediated decay. In certain embodiments, the in-frame stop codon is inserted at least 100 nucleotides, at least 95 nucleotides, at least 90 nucleotides, at least 85 nucleotides, at least 80 nucleotides, at least 75 nucleotides, at least 70 nucleotides, at least 65 nucleotides, at least 60 nucleotides, at least 55 nucleotides, at least 50 nucleotides, at least 45 nucleotides, at least 40 nucleotides, at least 35 nucleotides, at least 30 nucleotides, at least 25 nucleotides, at least 20 nucleotides, at least 15 nucleotides, at least 10 nucleotides, or at least 5 nucleotides, or between 1 to 5 nucleotides upstream of the next 5’ splice junction.
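The placement of an in-frame stop codon relative to the next 5’ splice junction can be illustrated with the following minimal Python sketch. It scans an exon in a given reading frame and reports, for each stop codon, how many exon nucleotides lie between the end of the stop codon and the 3’ end of the exon. The function name, the `frame` parameter, the way distance is counted, and the toy exon are assumptions for illustration; the disclosure does not define the measurement in these terms.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(exon, frame=0):
    """Return (codon_start, nucleotides_to_exon_end) for in-frame stop codons.

    `frame` is the 0-based index of the first base of the first complete codon
    in the exon (0, 1, or 2). The distance counts the exon nucleotides that
    follow the last base of the stop codon, i.e. those between the stop codon
    and the 5' splice junction at the exon's 3' end (assumed convention).
    """
    seq = exon.upper()
    hits = []
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            hits.append((i, len(seq) - (i + 3)))
    return hits


if __name__ == "__main__":
    # Toy 20-nucleotide exon with a TAA codon starting at index 6 in frame 0.
    toy_exon = "GCTGCTTAAGCTGCTGCTGC"
    print(in_frame_stops(toy_exon))
```

A design check could then compare the reported distance against whichever of the recited minimum distances applies to a given embodiment.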
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (v) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon. In some embodiments, all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
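The arrangement above, in which the heterologous ATG resides in the alternatively-spliced exon and the native ATG is removed from the coding region, can be sketched in Python as a toy model of the two spliced isoforms. All sequences, names, and the "first ATG" rule used to approximate translation initiation are simplifying assumptions for illustration and are not taken from the disclosure.

```python
def mature_mrna(constitutive_exon, alt_exon, coding_region, include_alt_exon):
    """Join exonic sequences after intron removal.

    The alternatively-spliced exon is present only when it is spliced in.
    """
    parts = [constitutive_exon]
    if include_alt_exon:
        parts.append(alt_exon)
    parts.append(coding_region)
    return "".join(parts).upper()


def first_atg(mrna):
    """Index of the first ATG, or -1 if none (a crude stand-in for initiation)."""
    return mrna.find("ATG")


# Toy sequences (hypothetical): no ATG outside the alternative exon.
CONSTITUTIVE = "GCCACC"
ALT_EXON = "CCTGCCACCATG"   # heterologous ATG at the 3' end of the exon
CODING = "GCTGAAGTTTAA"     # coding region with its native ATG removed

if __name__ == "__main__":
    included = mature_mrna(CONSTITUTIVE, ALT_EXON, CODING, True)
    skipped = mature_mrna(CONSTITUTIVE, ALT_EXON, CODING, False)
    print(first_atg(included), first_atg(skipped))
```

In this toy model the isoform that retains the alternatively-spliced exon carries a start codon in frame with the coding region, whereas the isoform that skips the exon carries none, which is the behavior the removal of upstream native ATG codons is meant to secure.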
Other aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (v) a nucleotide sequence comprising a second portion of a coding region of the transgene having a 5’ to 3’ orientation.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises a constitutive exon.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; and (iii) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon. In some embodiments, all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iii) a nucleotide sequence comprising a second portion of a coding region of the transgene having a 5’ to 3’ orientation; (iv) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; and (iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; (iii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (iv) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon. In some embodiments, all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (iv) a nucleotide sequence comprising a second portion of a coding region of the transgene having a 5’ to 3’ orientation.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; (iii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (iv) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises a constitutive exon.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; and (iv) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon. In some embodiments, all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iv) a nucleotide sequence comprising a second portion of a coding region of the transgene having a 5’ to 3’ orientation; (v) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (vi) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; and (iv) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises a constitutive exon.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (v) a nucleotide sequence comprising a third exonic sequence having a 5’ to 3’ orientation, wherein the third exonic sequence comprises an alternatively-spliced exon; (vi) a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation, wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (vii) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon. In some embodiments, all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon; (vi) a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation, wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site (m); and (vii) a nucleotide sequence comprising a second portion of a coding region of the transgene having a 5’ to 3’ orientation.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a first alternatively-spliced exon comprising a positive or negative cis-acting element; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a second alternatively-spliced exon; (vi) a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation, wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (vii) a nucleotide sequence comprising a third exonic sequence having a 5’ to 3’ orientation, wherein the third exonic sequence comprises a constitutive exon.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a first portion of a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon; (vi) a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation, wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (vii) a nucleotide sequence comprising a second portion of a coding region of the transgene having a 5’ to 3’ orientation.
Aspects of the invention relate to a transgene comprising, in the 5’ to 3’ direction: (i) a nucleotide sequence comprising a coding region of the transgene having a 5’ to 3’ orientation; (ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; (iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; (iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and (v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon.
Aspects of the disclosure relate to a transgene comprising: (i) a constitutive exon and one or more intronic sequences, each from a first gene; (ii) an alternatively-spliced exon cassette; and (iii) a coding region of interest from a third gene. In some embodiments, the alternatively-spliced exon cassette comprises: (a) an alternatively-spliced exon, and (b) flanking intronic sequences. In some embodiments, each of (a) and (b) is from a second gene. In some embodiments, the alternatively-spliced exon comprises an ATG start codon at its 3’ end.
In some embodiments, the first and second gene are the same gene, the first and third gene are the same gene; or all of the first, second, and third genes are the same gene.
In some embodiments, the first gene is survival motor neuron 1 (SMN1).
In some embodiments, the constitutive exon comprises exon 6 of SMN1, or a portion thereof. In some embodiments, the constitutive exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 102. In some embodiments, the constitutive exon comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 102.
In some embodiments, the one or more intronic sequences of (i) are or are derived from intron 6 and/or intron 7 of SMN1. In some embodiments, the one or more intronic sequences of (i) comprise(s) a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 103 and/or SEQ ID NO: 104. In some embodiments, the one or more intronic sequences of (i) comprise(s) a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 103 and/or SEQ ID NO: 104.
In some embodiments, the second gene is a gene selected from the group consisting of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM. In some embodiments, the second gene is bridging integrator 1 (BIN1).
In some embodiments, the alternatively-spliced exon comprises exon 11 of BIN1. In some embodiments, the alternatively-spliced exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 37 or SEQ ID NO: 38. In some embodiments, the alternatively-spliced exon comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 37 or SEQ ID NO: 38.
In some embodiments, the flanking intronic sequences of (ii) are or are derived from intron 10 and/or intron 11 of BIN1. In some embodiments, the flanking intronic sequences of (ii) each comprise a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 15 or SEQ ID NO: 16. In some embodiments, the flanking intronic sequences of (ii) each comprise a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 15 or SEQ ID NO: 16.
In some embodiments, the alternatively-spliced exon cassette comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778. In some embodiments, the alternatively-spliced exon cassette comprises a polynucleotide having a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
In some embodiments, the third gene is myotubularin 1 (MTM1) or calpain 3 (CAPN3).
In some embodiments, the coding region of interest comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882. In some embodiments, the coding region of interest comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882.
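The percent-identity thresholds recited above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length and simply counts matching positions; the disclosure does not specify an alignment method here, and the function names `percent_identity` and `meets_identity_threshold` are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions that match between two pre-aligned sequences.

    Assumption: both sequences are already aligned and of equal length.
    A real comparison would first run a pairwise alignment.
    """
    a, b = seq_a.upper(), seq_b.upper()
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 70.0) -> bool:
    """True if the two sequences share at least `threshold` percent identity."""
    return percent_identity(seq_a, seq_b) >= threshold
```

For instance, two four-nucleotide sequences that differ at one position share 75% identity, which satisfies the "at least 70%" and "at least 75%" tiers but not the "at least 80%" tier.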
In some embodiments, if the wild-type alternatively-spliced exon does not comprise an ATG start codon, the alternatively-spliced exon comprises 1-3 nucleic acid substitutions, relative to the wild-type alternatively-spliced exon, to form the ATG start codon within the alternatively-spliced exon. In some embodiments, the ATG start codon is formed in the alternatively-spliced exon by 1 nucleic acid substitution. In some embodiments, the ATG start codon is formed in the alternatively-spliced exon by 2 nucleic acid substitutions. In some embodiments, the ATG start codon is formed in the alternatively-spliced exon by 3 nucleic acid substitutions.
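As a rough illustration of the 1-3 substitution count described above, the following Python sketch counts how many single-nucleotide substitutions would convert a three-nucleotide window of an exon into ATG. The function name, its parameters, and the example sequences are hypothetical and are not taken from the disclosure; the sketch ignores splice-site strength and reading-frame considerations.

```python
from typing import Optional


def substitutions_to_atg(exon_seq: str, start: Optional[int] = None) -> int:
    """Count substitutions needed to turn a 3-nt window of the exon into ATG.

    `start` is the 0-based index of the window's first nucleotide.
    If omitted, the last three nucleotides of the exon are used.
    """
    seq = exon_seq.upper()
    if start is None:
        start = len(seq) - 3
    if start < 0 or start + 3 > len(seq):
        raise ValueError("window must lie entirely within the exon")
    window = seq[start:start + 3]
    # Compare each position of the window with the target codon.
    return sum(1 for have, want in zip(window, "ATG") if have != want)
```

A result of 0 means the window already reads ATG; results of 1, 2, or 3 correspond to the three substitution counts recited above.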
In some embodiments, the alternatively-spliced exon is retained in the spliced transcript. In some embodiments, all native start codons located 5’ to the ATG start codon located within the alternatively-spliced exon are disrupted or deleted.
In some embodiments, the alternatively-spliced exon cassette is located 5’, relative to the coding region of interest. In some embodiments, the constitutive exon is located 5’, relative to the alternatively-spliced exon cassette. In some embodiments, the one or more intronic sequences of (i) flank the alternatively-spliced exon cassette.
In some embodiments, the alternatively-spliced exon comprises a heterologous, in-frame stop codon. In some embodiments, the heterologous, in-frame stop codon is at least 50 nucleotides upstream of the next 5’ splice junction. In some embodiments, the heterologous, in-frame stop codon elicits nonsense-mediated decay.
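The 50-nucleotide spacing described above can be checked with a short Python sketch. It makes two assumptions that the text does not state: the exon's 3’ end is taken as the next 5’ splice junction, and the distance is counted from the last nucleotide of the stop codon to that junction. All function names and the `frame_offset` parameter are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(exon_seq: str, frame_offset: int = 0) -> Optional[int]:
    """Return the 0-based start of the first in-frame stop codon, or None.

    `frame_offset` is the index of the first complete codon in the exon.
    """
    seq = exon_seq.upper()
    for i in range(frame_offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def stop_is_far_enough_upstream(exon_seq: str, frame_offset: int = 0,
                                min_distance: int = 50) -> bool:
    """True if the first in-frame stop lies >= min_distance nt from the exon's 3' end."""
    stop_start = first_in_frame_stop(exon_seq, frame_offset)
    if stop_start is None:
        return False
    # Nucleotides between the end of the stop codon and the exon's 3' end.
    distance = len(exon_seq) - (stop_start + 3)
    return distance >= min_distance
```

Whether a stop codon that passes this check actually elicits nonsense-mediated decay in a given cell type is an experimental question that the sketch does not address.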
In some embodiments, the alternatively-spliced exon is retained in the spliced transcript in distinct tissues. In some embodiments, the alternatively-spliced exon is retained in the spliced transcript in skeletal muscle. In some embodiments, the alternatively-spliced exon is not retained in the spliced transcript in heart and/or liver tissue.
In some embodiments, the flanking intronic sequences of (ii)(b) are or are derived from native flanking introns of the alternatively-spliced exon. In some embodiments, the flanking intronic sequences of (ii)(b) each comprise at least one modification, relative to a naturally occurring intronic sequence. In some embodiments, the modification is a substitution or deletion of one or more nucleic acids.
In some embodiments, the ATG start codon is located at the 3’ end of the alternatively-spliced exon. In some embodiments, the ATG start codon is in the same reading frame as the coding region of interest. In some embodiments, the ATG start codon is within up to 5, 10, 20, or 30 nucleotides upstream of the 3’ end of the alternatively-spliced exon. In some embodiments, the ATG start codon is within up to 5, 10, 20, or 30 nucleotides upstream of the 3’ end of the alternatively-spliced exon and is in the same reading frame as the coding region of interest.
In some embodiments, if the wild-type alternatively-spliced exon does not comprise an ATG start codon at its 3’ end, the first 10 nucleotides of the flanking intronic sequence which is immediately 3’ to the alternatively-spliced exon comprise 1-5 nucleotide substitutions, relative to the wild-type flanking intronic sequence which is immediately 3’ to the wild-type alternatively-spliced exon. In some embodiments, the one or more intronic sequences of (i) each comprise at least one modification, relative to a naturally occurring intronic sequence. In some embodiments, the modification is a substitution or deletion of one or more nucleic acids.
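The positional and reading-frame conditions above can be sketched in Python as follows. The sketch assumes that, after splicing, the alternatively-spliced exon is joined directly to the first codon of the coding region of interest, so an ATG is in frame when the number of nucleotides between it and the exon's 3’ end is a multiple of three. It also assumes the distance is counted from the end of the ATG to the exon's 3’ end. The function name and parameters are hypothetical.

```python
from typing import List


def atg_near_3prime_end_in_frame(exon_seq: str, max_distance: int = 30) -> List[int]:
    """Return 0-based start positions of ATG codons that are both
    within `max_distance` nt of the exon's 3' end and in frame with
    a coding region assumed to begin right after the exon.
    """
    seq = exon_seq.upper()
    hits = []
    start = seq.find("ATG")
    while start != -1:
        # Nucleotides lying between the end of this ATG and the exon's 3' end.
        downstream = len(seq) - (start + 3)
        if downstream <= max_distance and downstream % 3 == 0:
            hits.append(start)
        start = seq.find("ATG", start + 1)
    return hits
```

An empty list means no ATG in the exon satisfies both conditions under these assumptions.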
In some embodiments, the coding region of interest comprises at least one modification, relative to a naturally occurring coding region of the third gene. In some embodiments, the modification is a substitution or deletion of one or more nucleic acids. In some embodiments, the coding region of interest comprises a deletion or disruption of a native start codon. In some embodiments, the coding region of interest comprises at least one heterologous stop codon. In some embodiments, the at least one heterologous stop codon is at least 50 nucleotides upstream of the next 5’ splice junction. In some embodiments, the at least one heterologous stop codon elicits nonsense-mediated decay.
In some embodiments, a transgene as described in any embodiment of the disclosure further comprises a 3’ untranslated region (UTR). In some embodiments, the 3’ UTR is SV40. In some embodiments, the SV40 3’ UTR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1883. In some embodiments, the SV40 3’ UTR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1883. In some embodiments, the 3’ UTR comprises a polyadenylation (pA) site and a cleavage site. In some embodiments, the polyadenylation site is an SV40 pA site.
In some embodiments, a transgene as described in any embodiment of the disclosure further comprises a promoter, wherein the promoter is located 5’, relative to all of (i), (ii), and (iii). In some embodiments, the promoter is a tissue-specific promoter. In some embodiments, the tissue-specific promoter is an MHCK7 promoter. In some embodiments, an MHCK7 promoter comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1880. In some embodiments, an MHCK7 promoter comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1880.
In some embodiments, the alternatively-spliced exon cassette comprises a nucleic acid sequence which is 450 to 650 nucleotides in length. Aspects of the disclosure relate to a recombinant viral genome comprising a transgene as described in any embodiment of the disclosure. In some embodiments, the recombinant viral genome is a genome from a recombinant adeno-associated virus (rAAV). In some embodiments, the transgene is flanked by AAV inverted terminal repeat (ITR) sequences. In some embodiments, the AAV ITR sequences are AAV2 ITR sequences. In some embodiments, an AAV2 ITR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1879. In some embodiments, an AAV2 ITR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1879.
In some embodiments, the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106. In some embodiments, the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106.
Aspects of the disclosure relate to an rAAV particle comprising a recombinant viral genome as described in any embodiment of the disclosure. In some embodiments, the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y→F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45. In some embodiments, the rAAV particle further comprises at least one helper plasmid. In some embodiments, the helper plasmid comprises a rep gene and a cap gene. In some embodiments, the rep gene encodes Rep78, Rep68, Rep52, or Rep40, and/or wherein the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein. In some embodiments, the rAAV particle comprises two helper plasmids. In some embodiments, the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
Aspects of the disclosure relate to a recombinant viral genome comprising a transgene. In some embodiments, the transgene comprises: (i) a constitutive exon and one or more intronic sequences; (ii) an alternative exon cassette; and (iii) a coding region of interest. In some embodiments, the alternative exon cassette comprises: (a) an alternatively-spliced exon; (b) at least a portion of the intron immediately upstream of the alternatively-spliced exon, and (c) at least a portion of the intron immediately downstream of the alternatively-spliced exon. In some embodiments, if the wild-type alternatively-spliced exon does not comprise an ATG start codon at its 3’ end: (1) the 3’ end of the alternatively-spliced exon comprises 1-3 nucleic acid substitutions relative to the wild-type alternatively-spliced exon to form an ATG start codon, and (2) the first 10 nucleotides of the intron immediately downstream of the alternatively-spliced exon comprise 1-5 nucleic acid substitutions relative to the wild-type intron immediately downstream of the wild-type alternatively-spliced exon.
In some embodiments, the 1-5 nucleic acid substitutions of (2) increase splice site strength. In some embodiments, any wild-type start codons within the alternatively-spliced exon located upstream of the ATG start codon at the 3’ end of the alternatively-spliced exon are disrupted or deleted. In some embodiments, the recombinant viral genome further comprises a tissue-specific promoter upstream of the alternative exon cassette. In some embodiments, the coding region of interest is or is derived from a naturally occurring coding region of MTM1 or CAPN3. In some embodiments, the tissue-specific promoter is an MHCK7 promoter. In some embodiments, the alternative exon is exon 11 of the BIN1 gene. In some embodiments, the constitutive exon is exon 6 of the SMN1 gene. In some embodiments, the alternative exon cassette promotes skeletal muscle expression of the coding region of interest and reduces cardiac muscle expression of the coding region of interest. In some embodiments, the alternative exon cassette is approximately 600 nucleotides in length.
Aspects of the disclosure relate to a method of treating a disease or condition in a subject comprising administering a recombinant viral genome or an rAAV particle according to any embodiment of the present disclosure to the subject. In some embodiments, the subject is a mammal. In some embodiments, the mammal is a human. In some embodiments, the recombinant viral genome or rAAV particle is administered to the subject at least one time. In some embodiments, the viral genome or rAAV particle is administered to the subject 2, 3, 4, 5, 6, 7, 8, 9, or 10 times. In some embodiments, the viral genome or rAAV particle is administered to the subject parenterally, subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs. In some embodiments, the viral genome or viral particle is administered to the subject by intravenous injection, intramuscular injection, intrathecal injection, or intravitreal injection.
In some embodiments, the disease or condition is a disease or condition selected from the group consisting of Dentatorubral-pallido-luysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich’s ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type 1, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebellar ataxia type 6 (SCA6), spinocerebellar ataxia type 7 (SCA7), spinocerebellar ataxia type 8 (SCA8), spinocerebellar ataxia type 10 (SCA10), spinocerebellar ataxia type 12 (SCA12), spinocerebellar ataxia type 17 (SCA17), Syndromic / non-syndromic X-linked mental retardation, Emery-Dreifuss muscular dystrophy type 2, familial partial lipodystrophy, limb girdle muscular dystrophy type 1B, dilated cardiomyopathy, familial partial lipodystrophy, Charcot-Marie-Tooth disorder type 2B1, mandibuloacral dysplasia, childhood progeria syndrome (Hutchinson-Gilford syndrome), Werner syndrome, Dilated cardiomyopathy (DCM), Hypertrophic cardiomyopathy (HCM), Restrictive cardiomyopathy (RCM), Left Ventricular Non-compaction (LVNC), Arrhythmogenic Right Ventricular Dysplasia (ARVD), takotsubo cardiomyopathy, Duchenne muscular dystrophy, Becker muscular dystrophy, Limb-girdle muscular dystrophy, Facioscapulohumeral muscular dystrophy, Congenital muscular dystrophy, Oculopharyngeal muscular dystrophy, Distal muscular dystrophy, Emery-Dreifuss muscular dystrophy, dementia, Parkinson's disease (PD), a PD-related disorder, Prion disease, a motor neuron disease (MND), Progressive bulbar palsy (PBP), Progressive muscular atrophy (PMA), Primary lateral sclerosis (PLS), Spinal muscular atrophy (SMA), a bladder cancer, a breast cancer, a colorectal cancer, a kidney cancer, a lung cancer, a lymphoma, a melanoma, an oral cancer, an ovarian cancer, an oropharyngeal cancer, a pancreatic cancer, a prostate cancer, a thyroid cancer, a uterine cancer, Down syndrome, Prader-Willi Syndrome (PWS), Bloom Syndrome, Cockayne Syndrome Type I -216400, Cockayne Syndrome Type III, Cockayne Syndrome Type I, Hutchinson-Gilford Progeria Syndrome, Mandibuloacral Dysplasia with Type A Lipodystrophy, Progeria, Adult Onset Progeroid Syndrome, Neonatal Rothmund-Thomson Syndrome, Seip Syndrome, Werner Syndrome, Replication Focus-Forming Activity 1, myotubular myopathy, Danon Disease, and/or centronuclear myopathy.
BRIEF DESCRIPTION OF DRAWINGS
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which can be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
FIG. 1 is a schematic illustrating the concept of a recombinant viral genome (e.g., rAAV or lentivirus) modified to include a transgene comprising a coding region of interest (e.g., encoding a therapeutic protein) under regulatory control by an alternatively-spliced exon (or an alternatively-spliced exon cassette). Step (b) shows the formation of a pre-mRNA which includes the coding region of interest and the alternatively-spliced exon. Step (c) shows the splicing-out or splicing-in of the alternatively-spliced exon based on one or more conditions (e.g., cell type, disease state, or other intracellular environmental signal). The splicing-out of the alternatively-spliced exon results in mRNA isoform 1 in (d), whereas the splicing-in of the alternatively-spliced exon (ASE) results in mRNA isoform 2 in (e). As shown in (g), the absence of the alternatively-spliced exon removes a positive or negative regulatory cis-element. The removal of a positive regulatory cis-element, such as a translation start signal, will result in the downregulation or decreased expression of the transgene, i.e., the reduced expression of the product encoded by the coding region of interest. However, the removal of a negative regulatory cis-element, such as an mRNA degradation element, may lead to the upregulation or increased expression of the transgene, i.e., the increased expression of the product encoded by the coding region of interest. As shown in (h), the presence of the alternatively-spliced exon splices-in a positive or negative regulatory cis-element associated with the alternatively-spliced exon. The maintenance of a positive regulatory cis-element, such as a translation start signal, will result in the upregulation or increased expression of the transgene, i.e., the increased expression of the product encoded by the coding region of the transgene. However, the maintenance of a negative regulatory cis-element, such as an mRNA degradation element, may lead to the downregulation or decreased expression of the transgene, i.e., the decreased expression of the product encoded by the coding region of the transgene.
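The four outcomes in the FIG. 1 description above reduce to a small truth table, sketched here in Python. The function name and the string labels are hypothetical; the mapping follows the text, which states the positive-element outcomes with "will" and the negative-element outcomes with "may".

```python
def expected_expression_change(exon_included: bool, element: str) -> str:
    """Qualitative effect on transgene expression per the FIG. 1 description.

    `element` is "positive" (e.g., a translation start signal) or
    "negative" (e.g., an mRNA degradation element).
    """
    if element not in ("positive", "negative"):
        raise ValueError("element must be 'positive' or 'negative'")
    if exon_included:
        # Splicing-in keeps the cis-element in the mature mRNA.
        return "increased" if element == "positive" else "decreased"
    # Splicing-out removes the cis-element from the mature mRNA.
    return "decreased" if element == "positive" else "increased"
```

For example, `expected_expression_change(False, "negative")` corresponds to isoform 1 with a degradation element removed, for which the text indicates that expression may increase.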
FIG. 2 shows different models of alternative splicing which could be utilized in the nucleic acid vectors of the present disclosure. From top to bottom: a skipped exon model of alternative splicing, a retained intron model of alternative splicing, an alternative 5’ splice site model of alternative splicing, an alternative 3’ splice site model of alternative splicing, a mutually exclusive exon model of alternative splicing, and an alternative last exon model of alternative splicing. White regions represent constitutive exons throughout. Gray regions represent alternatively-spliced exons. One or more of the constitutive exons may be modified to contain a coding region of interest, e.g., a coding region of a transgene that encodes a therapeutic protein.
FIGs. 3A-3B show two schematics representing exemplary recombinant viral genomes. FIG. 3A shows a typical recombinant adeno-associated virus (rAAV) genome design. Two AAV inverted terminal repeats (ITRs) flank the transgene. The transgene may comprise a coding region of interest (e.g., encoding a therapeutic protein) under regulatory control of an alternatively-spliced exon (or cassette comprising an alternatively-spliced exon). In various embodiments, the cassettes (e.g., in the context of a transgene) may take on the architectures shown in any of FIGs. 2 or 3-8, or any other suitable arrangement of elements so long as the alternatively-spliced exon is configured to regulate the expression of the coding region of interest of the transgene. FIG. 3B shows a typical recombinant lentivirus genome design. The 5’ and 3’ sequences of the lentivirus genome flank the packaging signal (PSI), rev response elements (RRE), and transgene. The transgene may comprise a coding region of interest (e.g., encoding a therapeutic protein) under regulatory control of an alternatively-spliced exon (or cassette comprising an alternatively-spliced exon). When transgenes are introduced using a lentivirus vector genome, the promoter and nucleotide sequence comprising the transgene sequence must be encoded on the minus strand of the lentivirus genome to prevent splicing during virus production and packaging. In various embodiments, the cassettes (e.g., in the context of a transgene) may take on the architectures shown in any of FIGs. 2 or 3-8, or any other suitable arrangement of elements so long as the alternatively-spliced exon is configured to regulate the expression of the coding region of interest of the transgene.
FIGs. 4A-4T show seven embodiments contemplated for the structural configuration of the cassettes (e.g., in the context of a transgene) that may be inserted into a recombinant viral vector genome and which comprise an alternatively-spliced exon and comprising, in some embodiments, at least one positive or negative regulatory cis-element. Non-limiting examples of positive or negative regulatory cis-elements located within the alternatively-spliced exons can include, without limitation, a translation start codon, a translation stop codon, a binding site for an RNA binding protein that serves to positively regulate mRNA translation, a binding site for an RNA binding protein that serves to negatively regulate mRNA translation, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate mRNA translation, a binding site for a nucleic acid molecule (e.g., an siRNA) that serves to negatively regulate mRNA stability or degradation, a binding site for an RNA binding protein that serves to positively regulate mRNA stability or degradation, a binding site for an RNA binding protein that serves to negatively regulate mRNA stability or degradation, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate mRNA stability or degradation, a ligand-responsive sequence, or a binding site for a nucleic acid molecule (e.g., an siRNA) that serves to negatively regulate mRNA stability or degradation. This list of examples is not intended to place any limitation on the scope and meaning of the positive and negative cis-elements and the disclosure embraces any genetic element or region positioned within, or at least associated with, an alternatively-spliced exon which exerts a positive or negative control on the overall expression of a transgene (e.g., encoding a therapeutic protein or a miRNA).
In some cases, the cis-element is within the alternatively -spliced exon, but in other cases, the cis-element is separate from, but at least associated with, the alternatively-spliced exon, such that it becomes spliced-in or spliced-out at the same time as the alternatively-spliced exon. In various embodiments, the cassettes (e.g., in the context of a transgene) may include one or more additional components, including one or more introns. In FIGs, 4A-4C, the constitutive exons not comprising the coding region of interest are represented by narrow rectangles, introns are represented as dashed lines, and the alternatively-spliced exons are represented as shaded narrow rectangles. The exon or exons comprising the coding region (or portions thereof in the case of where the coding region is split into separate exons) are indicated as solid thick white rectangles. FIG. 4A is a schematic of a cassette (e.g, in the context of a transgene) embodiment whereby the alternatively-spliced exon is upstream of the exon encoding the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is to the 5’ of the exon encoding the coding region of interest. FIG. 4B is a schematic of a cassette (e.g., in the context of a transgene) embodiment whereby the alternatively-spliced exon is downstream of the exon encoding the coding region of interest. Said another way, in this embodiment, the alternatively- spliced exon is to the 3: of the exon encoding the coding region of interest. FIG. 4C is a schematic of a cassette (e.g, in the context of a transgene) embodiment whereby the alternatively-spliced exon is positioned between two separate exons encoding portions of the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is between the exons encoding the portions of the coding region of interest. FIG. 
4D shows a non-limiting embodiment of an approach that puts a gene sequence under control of a ligand-responsive sequence. In this embodiment, a naturally occurring gene can be engineered to come under the control of a ligand by inserting the cassette into the gene. The portions upstream and downstream of the site at which the cassette is inserted then become separate exons. FIG. 4E shows a non-limiting embodiment of a transgene comprising an alternatively-spliced cassette. In this embodiment, the expression cassette comprises a general structure comprising at least one alternative exon, at least two introns flanking the alternative exon, a ligand-responsive sequence, and a plurality of splice sites. FIG. 4F shows a non-limiting embodiment of a transgene comprising a non-continuous start codon split by the alternatively spliced cassette. In this embodiment, the exons comprise a non-continuous start codon such that the 3’ most nucleotides of the upstream exon comprise an A or AT and the 5’ most nucleotides of the downstream exon comprise a TG or G, respectively. FIG. 4G shows a non-limiting embodiment of an alternatively spliced exon cassette comprising a stop codon that is inserted between two consecutive coding sequences of a gene (e.g., two exons of a gene). In this embodiment, the exons flanking the cassette are not translated in the absence of ligand and the presence of a premature stop codon in the alternative exon. FIG. 4H shows a non-limiting embodiment of an alternatively spliced exon cassette that is inserted in a coding sequence for a regulatory RNA molecule. In this embodiment, the two exons encode an interfering RNA, such as a miRNA, such that removal of the alternative exon produces a functional miRNA molecule that is capable of regulating gene expression. FIG. 4I shows a non-limiting embodiment of a nucleic acid design to regulate RNA splicing using a ligand-responsive sequence. In this embodiment, an intron splits two exons.
Ligand binding to the ligand-responsive sequence results in alternative splicing, wherein the exons are brought together to form an RNA that encodes the protein of interest. FIG. 4J shows a non-limiting embodiment of a nucleic acid design to regulate RNA splicing using a ligand-responsive sequence. In this embodiment, an intron splits two exons. Ligand binding to the ligand-responsive sequence results in alternative splicing, wherein the exons are disrupted and the RNA cannot encode the protein of interest. FIG. 4K shows a non-limiting embodiment of a ligand-responsive nucleic acid that can be used to differentially regulate the expression of protein isoforms. The alternative exon is flanked by introns. Ligand binding results in exclusion of the alternative exon from the spliced RNA, thereby encoding the shorter isoform of the protein. The absence of the ligand results in inclusion of the alternative exon in the spliced RNA, which encodes the longer isoform of the protein. FIG. 4L shows a non-limiting embodiment of a ligand-responsive nucleic acid that can be used to differentially regulate the expression of protein isoforms. The alternative exon is flanked by introns. Ligand binding results in inclusion of the alternative exon in the spliced RNA, thereby encoding the longer isoform of the protein. The absence of the ligand results in exclusion of the alternative exon from the spliced RNA, which encodes the shorter isoform of the protein. FIG. 4M shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. The alternative exon comprises a ligand-responsive sequence and prevents a start codon from being in frame with the RNA. Inclusion of the alternative exon in the presence of the ligand leads to production of the protein corresponding to the RNA. FIG. 4N shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA.
The alternative exon comprises a ligand-responsive sequence and prevents a start codon from being in frame with the RNA. Inclusion of the alternative exon in the absence of the ligand leads to production of the protein corresponding to the RNA. FIG. 4O shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. Presence of the alternative exon causes a premature stop codon to be in frame with the RNA. Inclusion of the alternative exon in the presence of the ligand leads to an RNA which cannot be translated into a protein. FIG. 4P shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. Presence of the alternative exon causes a premature stop codon to be in frame with the RNA. Exclusion of the alternative exon in the presence of the ligand leads to an RNA which can be translated into a protein. FIG. 4Q shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the absence of the ligand results in formation of the complete microRNA, which can function to reduce expression of a target transcript. FIG. 4R shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the presence of the ligand results in formation of the complete microRNA, which can function to reduce expression of a target transcript. FIG. 4S shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the absence of the ligand disrupts microRNA structure, thereby inhibiting its ability to reduce expression of a target transcript. FIG. 4T shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA.
Inclusion of the second sequence in the presence of the ligand disrupts microRNA structure thereby inhibiting its ability to reduce expression of a target transcript.
FIGs. 5A-5G depict various embodiments of the general model of the cassettes (e.g., in the context of a transgene) of FIG. 4A. FIG. 5A depicts an embodiment of the “skipped exon model.” FIG. 5B depicts an embodiment of the “retained intron model.” FIG. 5C depicts an embodiment of the “alternative 5’ splice site model.” FIG. 5D depicts an embodiment of the “alternative 3’ splice site model.” FIG. 5E depicts an embodiment of the “mutually exclusive exon model.” FIG. 5F depicts an exemplary alternatively spliced transcript. FIG. 5G depicts an exemplary constitutively spliced transcript.
FIGs. 6A-6G depict various embodiments of the general model of the cassettes (e.g., in the context of a transgene) of FIG. 4B. FIG. 6A depicts an embodiment of the “alternative last exon model.” FIG. 6B depicts an embodiment of the “skipped exon model.” FIG. 6C depicts an embodiment of the “retained intron model.” FIG. 6D depicts an embodiment of the “alternative 5’ splice site model.” FIG. 6E depicts an embodiment of the “alternative 3’ splice site model.” FIG. 6F depicts an embodiment of the “mutually exclusive exon model.” FIG. 6G depicts an embodiment of the “alternative last exon model.”
FIGs. 7A-7F depict various embodiments of the general model of the cassettes (e.g., in the context of a transgene) of FIG. 4C. FIG. 7A depicts the “skipped exon model.” FIG. 7B depicts the “retained intron model.” FIG. 7C depicts the “alternative 5’ splice site model.” FIG. 7D depicts the “alternative 3’ splice site model.” FIG. 7E depicts the “mutually exclusive exon model.” FIG. 7F depicts the “alternative last exon model.”
FIGs. 8A-8B show embodiments of the general model of the cassettes (e.g., in the context of a transgene). FIG. 8A shows an embodiment of the general model of the cassettes (e.g., in the context of a transgene) of FIG. 4A. In the approach shown, the cassette (e.g., in the context of a transgene) comprises a constitutive exon at the left, an alternatively-spliced exon comprising an ATG (an example of a positive regulatory cis-element) in the middle, and a constitutive exon comprising a coding region of interest (shown with the natural ATG start codon removed to eliminate translation of that exon without further positive control by the alternatively-spliced exon). Black lines indicate intronic sequences (e.g., the flanking introns of the alternatively-spliced exon). Alternative reading frames within the exon comprising the coding sequence may in some embodiments be removed, as appropriate. Under alternative splicing conditions, which are specific to the nature of the chosen alternatively-spliced exon, the alternatively-spliced exon will be included, and productive translation of the coding sequence will result. In contrast, under homeostatic conditions (normal splicing conditions), only the constitutive exon will be included, the ATG start codon in the alternatively-spliced exon will be eliminated, and the coding sequence will not be translated. The upper dotted lines show the splicing pattern leading to a splicing-in of the alternatively-spliced exon (expression of the coding region).
The lower dotted lines show the splicing pattern leading to a splicing-out of the alternatively-spliced exon (no or reduced expression of the coding region). FIG. 8B shows an embodiment of the general model of the cassettes (e.g., in the context of a transgene) of FIG. 4C. In the approach shown, the cassette (e.g., in the context of a transgene) comprises an alternatively-spliced exon (shown in gray) positioned between two separate constitutive exons each comprising a portion of the desired coding region. The exon to the left comprises the 5’ end of the coding sequence and the exon to the right comprises the 3’ end of the coding region. An in-frame stop codon is inserted into the alternatively-spliced exon at a location which is >50 nucleotides upstream of the next downstream splice site. Under alternative splicing conditions, which are specific to the nature of the chosen alternatively-spliced exon, the alternatively-spliced exon will be included, and NMD (nonsense-mediated mRNA decay) will result. Under homeostatic conditions (normal splicing conditions), only the constitutive exons will be included, and the 5’ and 3’ ends of the coding sequence will be joined, resulting in productive translation of the coding sequence. The upper dotted lines show the splicing pattern leading to a splicing-in of the alternatively-spliced exon (no or reduced expression of the coding region due to active NMD). The lower dotted lines show the splicing pattern leading to a splicing-out of the alternatively-spliced exon (expression of the coding region).
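By way of illustration only, the spacing condition described for FIG. 8B (an in-frame stop codon located more than 50 nucleotides upstream of the next downstream splice site) can be checked with a short script. The following is a minimal sketch and not part of the disclosed constructs: the function names and the toy exon sequences are hypothetical, and the distance is assumed to be counted from the last base of the stop codon to the 3’ end of the alternatively-spliced exon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(exon, frame=0):
    """Return the index of the first in-frame stop codon in the exon, or None."""
    exon = exon.upper()
    for i in range(frame, len(exon) - 2, 3):
        if exon[i:i + 3] in STOP_CODONS:
            return i
    return None


def distance_to_downstream_splice_site(exon, stop_index):
    """Nucleotides between the last base of the stop codon and the exon's 3' end.

    Assumption: the next downstream splice site is the 5' splice site at the
    end of this exon, and the stop codon itself is not counted.
    """
    return len(exon) - (stop_index + 3)


def meets_nmd_spacing(exon, frame=0, min_distance=50):
    """True if an in-frame stop lies more than min_distance nt before the exon end."""
    stop = first_in_frame_stop(exon, frame)
    if stop is None:
        return False
    return distance_to_downstream_splice_site(exon, stop) > min_distance


# Hypothetical toy exons (not sequences from the disclosure).
long_exon = "GCTGCT" + "TAA" + "GCT" * 20   # stop followed by 60 nt
short_exon = "GCTGCT" + "TAA" + "GCT" * 10  # stop followed by 30 nt

print(meets_nmd_spacing(long_exon))
print(meets_nmd_spacing(short_exon))
```

In this sketch the first toy exon satisfies the spacing condition and the second does not; the reading frame in which the exon is entered is supplied by the caller through the frame argument.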
FIG. 9 shows a configuration of a gene therapy cargo whose translation can be regulated by alternative splicing. Inclusion of an alternative exon that ends in “ATG” can lead to translation of the downstream coding sequence. Exclusion will prevent appropriate protein translation of the downstream coding sequence.
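The logic of FIG. 9 can be sketched in a few lines of code, purely as an illustration. In the following minimal example, all sequences and function names are hypothetical; the coding exon is assumed to lack its own start codon, and translation is assumed to require an ATG positioned immediately upstream of the coding sequence.

```python
def spliced_mrna(upstream_exon, alt_exon, coding_exon, include_alt):
    """Join exons, with or without the alternatively-spliced exon."""
    middle = alt_exon if include_alt else ""
    return (upstream_exon + middle + coding_exon).upper()


def start_codon_joined(mrna, coding_exon):
    """True if an ATG sits directly upstream of the coding exon in the mRNA."""
    return ("ATG" + coding_exon.upper()) in mrna


# Hypothetical toy sequences (not sequences from the disclosure).
upstream = "GCCGCC"
alternative = "CTCTCAATG"      # alternative exon ending in ATG
coding = "GCTTCTGCATAA"        # coding exon with its natural ATG removed

included = spliced_mrna(upstream, alternative, coding, include_alt=True)
skipped = spliced_mrna(upstream, alternative, coding, include_alt=False)

print(start_codon_joined(included, coding))
print(start_codon_joined(skipped, coding))
```

Under these assumptions, only the exon-included transcript places a start codon in front of the coding sequence, which mirrors the inclusion-dependent translation described above.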
FIG. 10 shows a construct design for the screening of alternative exon cassettes with regulatory activity. The construct used the SMN1 exon 6 and intron 6/7 context. Test alternative exon cassettes were inserted between portions of SMN1 introns 6 and 7. An MHCK7 promoter was used. The coding sequence was derived from the human MTM1 gene. The 3’ UTR contained an SV40 polyadenylation and cleavage site. AAV2 ITRs flanked the construct. Splice site scores of the flanking constitutive exons are listed.
FIG. 11 shows a strategy to prevent undesired translation of peptides from alternative reading frames of MTM1. Amino acids generated in the MTM1 reading frame are listed (e.g., GCT encodes Alanine); only the 5’ end of MTM1 sequence is shown. Substitutions that preserve MTM1 reading frame but terminate alternative reading frames are shown. Arrows denote point mutations made to generate stop codons that would terminate open reading frames in the +1 and +2 reading frames. Nucleic acid substitutions are denoted by lower-case letters.
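The strategy of FIG. 11 can be illustrated computationally as follows. This is a minimal sketch under stated assumptions: the standard genetic code is used, the twelve-nucleotide sequences are hypothetical toy examples rather than MTM1 sequence, and lower-case letters mark the substituted bases in keeping with the figure convention.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate a DNA sequence in the given frame (0, +1 or +2)."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


def preserves_frame_and_blocks_others(original, mutated):
    """True if frame 0 is unchanged and both +1 and +2 frames contain a stop."""
    same_protein = translate(original) == translate(mutated)
    stops_elsewhere = all("*" in translate(mutated, f) for f in (1, 2))
    return same_protein and stops_elsewhere


# Hypothetical toy sequences; lower-case letters mark substitutions.
original = "CTCAGAGCCGAA"   # Leu-Arg-Ala-Glu
mutated = "CTaAGAGCtGAA"    # same amino acids, stops in +1 and +2 frames

print(translate(original), translate(mutated))
print(translate(mutated, 1), translate(mutated, 2))
print(preserves_frame_and_blocks_others(original, mutated))
```

In this toy case, two synonymous substitutions leave the primary reading frame unchanged while introducing a TAA in the +1 frame and a TGA in the +2 frame.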
FIG. 12 shows a strategy to preserve splice site strength following mutation of bases to introduce ATG to the ends of alternative exons by altering 5’ splice site sequences. Because the addition of ATG to the end of each alternative exon may change the splice site strength, intronic bases were altered to maintain splice site strength and preserve splicing activity. All upstream ATGs were also removed from alternative exons. Splice site strengths were scored by MaxEntScan and are shown. Splice sites are listed for the endogenous sequence (top), the endogenous sequence altered such that ATG is introduced (middle), and a “compensated” splice site sequence (bottom). Nucleic acid substitutions are denoted by lower-case letters.
FIG. 13 shows a construct barcoding strategy. For the first round of screening, a barcode strategy was used in which synonymous mutations were made and used to identify each candidate alternative exon uniquely. Barcodes were ~350 nucleotides away from the splice site with the intent of not affecting splicing. Barcodes were comprised of 5 wobble positions and generated by randomly cloning in: AAY CTN AGA TTY GCN (SEQ ID NO: 101) (2 * 4 * 2 * 4 = 64 possibilities). Barcode sequences (each 5 nucleotides in length) are shown at the end of each row in parentheses.
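For illustration, the barcode count stated for FIG. 13 can be reproduced by expanding the degenerate sequence with the standard IUPAC codes (Y = C or T; N = A, C, G or T). The sketch below is an assumption-labeled example: the function names are hypothetical, and the five-nucleotide barcode is taken to be the third base of each of the five codons.

```python
from itertools import product

# Subset of IUPAC nucleotide codes needed for this template.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}


def expand(degenerate):
    """Enumerate every concrete sequence encoded by a degenerate template."""
    choices = (IUPAC[base] for base in degenerate)
    return ["".join(combo) for combo in product(*choices)]


def wobble_bases(seq):
    """Return the third base of each codon (assumed to form the barcode)."""
    return seq[2::3]


template = "AAYCTNAGATTYGCN"  # AAY CTN AGA TTY GCN, as given for SEQ ID NO: 101
sequences = expand(template)
barcodes = {wobble_bases(s) for s in sequences}

print(len(sequences))   # 2 * 4 * 2 * 4
print(len(barcodes))
```

As the template is written, the AGA codon carries no degenerate base, so under this reading the third barcode position is always A and the remaining four positions account for the 64 possibilities.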
FIGs. 14A-14C show percent spliced in (psi) values for each tested cassette exon in various tissues. Psi values were plotted in heart (H), tibialis anterior (TA), and liver (L). Data for tibialis anterior was obtained from animals injected intramuscularly, and data from the other tissues was obtained from animals injected intravenously. FIG. 14A shows data obtained from the following tested cassette exons (from left to right): ARFGAP2, BIN1, CAMK2B, and KIF13A. FIG. 14B shows data obtained from the following tested cassette exons (from left to right): KSR1, LGMN, NRAP, and PDLIM3. FIG. 14C shows data obtained from the following tested cassette exons (from left to right): PICALM, PKP2, and VPS39.
FIGs. 15A-15B show percent spliced in (psi) values for each tested exon in tibialis anterior at various times following injection. Psi values were plotted for each sample versus every other sample. The number following the dash indicates the replicate number for that particular week. FIG. 15A shows a first comparison of psi values obtained at different time points following injection. FIG. 15B shows a second comparison of psi values obtained at different time points following injection.
FIGs. 16A-16B show the ratios of RNA binding protein (RBP) RNA expression in heart vs. skeletal muscle, or vice-versa. RNA expression values for RNA binding proteins were obtained from publicly available databases. The ratio of expression in heart versus skeletal muscle was computed; the RBPs showing the strongest bias in either direction were plotted. FIG. 16A shows the RBPs which were found to be enriched in muscle tissue, relative to heart tissue. FIG. 16B shows the RBPs which were found to be depleted in muscle tissue, relative to heart tissue.
FIG. 17 shows that the intronic sequence upstream of BIN1 exon 11 is enriched for CAC motifs. Top: ~250 nucleotides upstream of BIN1 exon 11 are shown. Every CAC motif is shown in bold text. Bottom: the last 34 bases of the intron are shown from human, rhesus macaque, mouse, dog, and elephant. Every species shown has 2 CAC motifs within this region except for dog.
FIG. 18 shows percent spliced in (psi) values for BIN1 exon 11 in human, rhesus macaque, and dog. Psi values for BIN1 exon 11 for these species were obtained from publicly available datasets and plotted. The dog data includes data from animals modeling XLMTM1, including those also being treated with AAV-MTM1. AAV low, mid, and high denote AAV-MTM1 treatment in XLMTM1 dogs from Dupont et al. (2020).
FIG. 19 shows splice site variants which were considered in the high throughput screen to optimize the BIN1 exon 11 cassette. The endogenous BIN1 3’ splice site is listed (top), along with the endogenous BIN1 5’ splice site (second row from top), the endogenous BIN1 5’ splice site sequence altered such that ATG is introduced (third row from top), and the “compensated” version characterized in the first screen (bottom). Additional splice sites tested are listed below. Nucleic acid substitutions are denoted by lower-case letters.
FIG. 20 shows intronic variants which were considered in the high throughput screen to optimize the BIN1 exon 11 cassette. Sequence from the downstream intron of BIN1 exon 11 is shown (top). Putative MBNL binding sites (YGCY motifs) are bolded. Putative RBFOX binding sites (TGCATG) are underlined. Sequence that includes 4 possible alterations is shown (bottom). The alterations, denoted with lower-case letters, either generate additional MBNL binding sites (the first, second, and third alterations, from 5’ to 3’) or an additional RBFOX site (the fourth alteration). Consideration of 0, 1, 2, 3, or 4 alterations in all combinations yields 16 possible sequences to test.
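The combinatorics and motif definitions described for FIG. 20 can be illustrated with a brief script. The following is only a sketch: the function names and the fifteen-nucleotide test sequence are hypothetical and are not taken from the BIN1 intron, and overlapping motif occurrences are assumed to be counted separately.

```python
import re
from itertools import combinations

# Lookahead patterns so that overlapping motif occurrences are all reported.
MBNL_MOTIF = re.compile(r"(?=[CT]GC[CT])")   # YGCY
RBFOX_MOTIF = re.compile(r"(?=TGCATG)")


def alteration_subsets(n_alterations=4):
    """All ways of applying 0..n alterations, as tuples of alteration numbers."""
    labels = range(1, n_alterations + 1)
    return [
        subset
        for size in range(n_alterations + 1)
        for subset in combinations(labels, size)
    ]


def motif_positions(seq, pattern):
    """0-based start positions of every (possibly overlapping) motif match."""
    return [m.start() for m in pattern.finditer(seq.upper())]


subsets = alteration_subsets(4)
print(len(subsets))   # 2 ** 4

# Hypothetical toy sequence (not BIN1 intronic sequence).
toy = "AATGCTTTGCATGCC"
print(motif_positions(toy, MBNL_MOTIF))
print(motif_positions(toy, RBFOX_MOTIF))
```

Because each of the four alterations is either present or absent, the enumeration returns sixteen combinations, including the unaltered sequence and the sequence carrying all four alterations.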
FIG. 21 shows a strategy to use PCR amplicons to read the association between barcodes and variants (the codebook). Given short read Illumina sequencing (~75 nucleotides), a PCR strategy was used to associate the downstream barcode with upstream sequence variants.
FIG. 22 shows the number of barcodes encoding each variant. A histogram of the number of barcodes encoding each variant is shown for the plasmid library. On average, ~8 barcodes encode each variant.
FIGs. 23A-23C show scatters of percent spliced in (psi) values for each variant in different tissues. Each point represents the mean psi for each variant across all barcodes representing that variant. Data from selected tissues is shown. FIG. 23A shows scatter between 2 heart samples, which lies along the diagonal (indicating reproducibility). FIG. 23B shows scatter between 2 gastrocnemius samples, which also lies along the diagonal (indicating reproducibility). FIG. 23C shows scatter between heart and skeletal muscle samples, which lies above the diagonal. This is because psi for most variants is higher in skeletal muscle than in heart.
FIGs. 24A-24B show scatters of mean percent spliced in (psi) as computed across multiple animals. Each point represents the mean psi for each variant across multiple animals (n=4 for all tissues). FIG. 24A shows data obtained from tibialis anterior (y-axis) versus heart (x-axis) tissue. FIG. 24B shows data obtained from gastrocnemius (y-axis) versus heart (x-axis) tissue.
FIGs. 25A-25D show percent spliced in (psi) values as a function of splice site strength for selected samples. Psi values for each variant were grouped by 3’ or 5’ splice site strength; data is shown only for heart sample 1 and gastrocnemius sample 1. There is a trend such that strong splice sites tend to yield higher inclusion levels. FIG. 25A shows the 3’ splice site strength relative to the psi in heart tissue for heart sample 1. FIG. 25B shows the 5’ splice site strength relative to the psi in heart tissue for heart sample 1. FIG. 25C shows the 3’ splice site strength relative to the psi in gastrocnemius tissue for gastrocnemius sample 1. FIG. 25D shows the 5’ splice site strength relative to the psi in gastrocnemius tissue for gastrocnemius sample 1.
FIGs. 26A-26B show scatters of mean percent spliced in (psi) for each variant as computed across multiple animals when linked to a CAPN3 cargo. Each point represents the mean psi for each variant across multiple animals (n=4 for all tissues). FIG. 26A shows data obtained from tibialis anterior (y-axis) versus heart (x-axis) tissue. FIG. 26B shows data obtained from gastrocnemius (y-axis) versus heart (x-axis) tissue.
FIGs. 27A-27B show scatters of mean percent spliced in (psi) for each variant when linked to an MTM1 cargo versus a CAPN3 cargo. Each point represents the mean psi for each variant across multiple animals (n=4 for all tissues). The psi value for variants linked to the MTM1 cargo is shown on the x-axis and the psi value for the same variants linked to the CAPN3 cargo is shown on the y-axis. FIG. 27A shows data for heart tissue. FIG. 27B shows data for gastrocnemius tissue.
FIG. 28 shows an exemplary riboswitch-regulated alternative exon library design. MBNL1 exon 5 is flanked by 39 different 3’ splice sites and 20 different 5’ splice sites in different construct variants. The 5’ splice site is incorporated into the communication stem of the downstream riboswitch. In the absence of the drug, the 5’ splice site is recognized by U1 snRNP and the exon is included to yield full length MBNL. In the presence of the drug, the 5’ splice site is occluded, which causes exon 5 skipping.
FIG. 29 shows an exemplary workflow for the massively parallel barcoded splicing assay. The barcoded synthetic plasmid library was sequenced to obtain the codebook that links barcode sequences to specific splice site variants. The plasmid library was transfected to analyze splicing patterns for each barcode in the presence and absence of drug. The codebook was then used to decode barcodes, to characterize splicing patterns for individual variants.
FIG. 30 shows psi data for barcodes and variants. For the left-side panel, psi for uniquely identifiable barcodes in the presence and absence of drug is shown. Barcodes that appear in all six samples (3x drug-, 3x drug+) and in the codebook were plotted. Error bars are shown for three biological replicates. For the right-side panel, psi for 780 variants with/without drug is shown. Psi for barcodes linked to the same variants were averaged, and error bars are shown for three biological replicates. The triangle highlights variants with Δpsi > 0.3, representing promising candidates with large dynamic splicing changes in response to drug treatment.
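The decoding and averaging steps described for FIGs. 29 and 30 can be sketched as follows. This is a minimal illustration under stated assumptions and not the analysis pipeline actually used: psi is taken to be inclusion reads divided by the sum of inclusion and exclusion reads, delta psi is taken to be psi without drug minus psi with drug, and all barcodes, variant names, and read counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def variant_psi(read_counts, codebook):
    """Average per-barcode psi over all barcodes assigned to each variant.

    read_counts maps barcode -> (inclusion_reads, exclusion_reads).
    codebook maps barcode -> variant name.
    Barcodes missing from the codebook, or with no reads, are skipped.
    """
    per_variant = defaultdict(list)
    for barcode, (inclusion, exclusion) in read_counts.items():
        total = inclusion + exclusion
        if barcode not in codebook or total == 0:
            continue
        per_variant[codebook[barcode]].append(inclusion / total)
    return {variant: mean(values) for variant, values in per_variant.items()}


def responsive_variants(psi_no_drug, psi_drug, threshold=0.3):
    """Variants whose delta psi (no drug minus drug) exceeds the threshold."""
    shared = psi_no_drug.keys() & psi_drug.keys()
    deltas = {v: psi_no_drug[v] - psi_drug[v] for v in shared}
    return {v: d for v, d in deltas.items() if d > threshold}


# Hypothetical toy data.
codebook = {"AAAAA": "variant_1", "CCCCC": "variant_1", "GGGGG": "variant_2"}
no_drug = {"AAAAA": (90, 10), "CCCCC": (80, 20), "GGGGG": (50, 50)}
with_drug = {"AAAAA": (40, 60), "CCCCC": (50, 50), "GGGGG": (40, 60), "TTTTT": (5, 5)}

psi_minus = variant_psi(no_drug, codebook)
psi_plus = variant_psi(with_drug, codebook)
print(responsive_variants(psi_minus, psi_plus))
```

In this toy data set only the first variant exceeds the 0.3 threshold; the barcode absent from the codebook is ignored, consistent with plotting only barcodes that can be decoded.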
FIGs. 31A-31C show analyses of psi and delta psi for various 3’ and 5’ splice site variants. FIG. 31A shows variants that were grouped according to 3’ splice site identity and sorted by mean psi in the absence of tetracycline. FIG. 31B shows variants that were grouped according to 5’ splice site identity and sorted by mean psi in the absence of tetracycline. FIG. 31C shows delta psi plotted in a heatmap format, in which rows/columns denote specific 3’ and 5’ splice site combinations. Splice sites were sorted by mean psi in the absence of tetracycline.
FIG. 32 shows protein isoform regulation from a single variant. The left-side panel shows gel electrophoresis analysis of RT-PCR products analyzed by fragment analyzer. The right-side panel shows western blot analysis of MBNL protein using an anti-HA tag antibody.
FIG. 33 shows an exemplary cassette configuration for alternative splicing-regulated protein expression. The alternative splicing cassette was placed between an ATG and downstream coding sequence for the protein of interest. An HA tag was placed before the ATG for protein immunoblotting.
FIG. 34 shows exonic splicing switch variants. Nucleotides that base-pair within the communication stem of the riboswitch are underlined.
FIG. 35 shows skipping percentage of exonic splicing switch variants with/without drug. RNA splicing assays were performed by RT-PCR and fragment analyzer.
FIG. 36 shows exclusion percentages of AltEx9 following treatment with different tetracycline concentrations. Variant AltEx9 was tested against different concentrations of tetracycline, and exon-skipping RNA isoform percentages were calculated.
FIGs. 37A-37B show RNA splicing and protein expression regulation of three variant constructs. FIG. 37A shows RT-PCR analysis of RNA splicing patterns of three constructs that were fused to a nano-luciferase reporter in response to drug treatment. FIG. 37B shows nano-luciferase enzymatic activity for three variants, along with exclusion and inclusion isoform controls. Nano-luciferase signal was normalized by co-transfected firefly luciferase (fLuc).
FIG. 38 shows alternative splicing regulated protein expression by reconstructing translation initiation. Exon inclusion disrupts translation initiation sites and exon skipping reconstructs strong Kozak sequences for translation of a downstream protein of interest.
FIG. 39 shows exemplary designs for alternative splicing-regulated RNA interference. An exemplary pri-miR 16_2 scaffold bearing the miRNA-targeting luciferase reporter is shown. The dashed box denotes the sequence location in which the alternative splicing cassette should be placed.
FIG. 40 shows riboswitch-regulated RNAi. Firefly luciferase reporter signal was normalized by co-transfected Renilla luciferase. RNAi (+) denotes the pri-miR 16_2 scaffold bearing the fLuc miRNA, while RNAi (-) denotes a non-functional control RNA. RNAi AltEx9 has the alternative splicing cassette AltEx9 inserted in the pri-miR scaffold.
FIG. 41 shows a non-limiting example of a nucleic acid design to regulate S. aureus Cas9 by tetracycline.
FIGs. 42A-42B show representative cellular screening results for the nucleic acid shown in FIG. 41. FIG. 42A shows a scatter plot of PSI for each of 2760 variants analyzed in a high throughput screen in HEK293T cells. Each point represents the behavior of an individual variant at a particular dose of tetracycline (y-axis) relative to no tetracycline (x-axis). Circles, squares and triangles denote treatment with 25 µM, 50 µM and 100 µM tetracycline, respectively. FIG. 42B shows a heat map of delta PSI (no tetracycline minus 100 µM tetracycline) as a function of aptamer stem length and splice site strength.
FIG. 43 shows a non-limiting example of a nucleic acid design to regulate erythropoietin (EPO) expression by a risdiplam-responsive sequence.
FIG. 44 shows representative cellular screening results for the nucleic acid design shown in FIG. 43. The scatter plot shows percent intron removal for 30,455 variants analyzed in a high throughput screen in HEK293T cells. Each point represents the behavior of an individual variant at a particular dose of risdiplam (y-axis) relative to no risdiplam (x-axis). Circles, squares and triangles denote treatment with 250 nM, 500 nM, and 1000 nM risdiplam, respectively.
FIGs. 45A-45B show representative data from real-time PCR (RT-PCR) analyses of individual variants shown in FIG. 44. FIG. 45A shows products made from cloning seven distinct variants and testing the expression of said sequences with RT-PCR. Fragment analysis shows the abundance of intron retained product (top band) or intron spliced product (bottom band) in the presence (1 µM) or absence of risdiplam. FIG. 45B shows quantitation of the data shown in FIG. 45A.
FIGs. 46A-46C show a non-limiting example of a strategy for using risdiplam-responsive motifs to regulate GABRG2 isoforms. FIG. 46A shows an overview of the mechanism through which risdiplam-responsive sequences identified from the screen performed in FIGs. 44 and 45A-45B (variants 3 and 7) were incorporated into an alternatively spliced gene that allows for production of either the exon 9-containing (long) isoform of GABRG2 or the exon 9-skipped (short) isoform. The gray box indicates GABRG2 exons 1 through 8. The white box indicates exons 9 and 10, and the dotted box indicates the risdiplam-responsive sequence. The black box indicates exon 10 alone. The introns are synthetic. Addition of risdiplam leads to inclusion of the alternative exon and production of GABRG2L. FIG. 46B shows representative data of two different risdiplam-responsive motifs that were tested in Neuro2A cells using RT-PCR. RT-PCR with primers that target the gray and black boxes was performed to evaluate splicing behavior in the presence (1 µM) or absence of risdiplam. FIG. 46C shows quantitation of the data shown in FIG. 46B.
FIGs. 47A-47D show a non-limiting example of a strategy for using a risdiplam-responsive motif from POMT2 exon 11b to regulate CSNK1D isoforms. FIG. 47A shows an overview of the mechanism through which risdiplam-responsive sequences were incorporated into an alternatively spliced gene that allows for production of either the exon 9-containing (long) isoform of CSNK1D or the exon 9-skipped (short) isoform. The gray box indicates CSNK1D exons 1 through 8. The white box indicates exons 9 and 10, and the dotted box indicates the risdiplam-responsive sequence derived from POMT2. The black box indicates exon 10 alone. The upstream intron is CSNK1D intron 8, and the downstream intron is synthetic. Addition of risdiplam leads to inclusion of the alternative exon and production of the long isoform of CSNK1D. FIG. 47B shows representative data from testing a nucleic acid in HEK293T and Neuro2A cells using RT-PCR. Primers that target the gray and black boxes were used to evaluate splicing behavior in the presence (1 µM) or absence of risdiplam. FIG. 47C shows quantitation of the RT-PCR data in FIG. 47B. FIG. 47D shows a non-limiting example of a strategy wherein the construct tested in FIGs. 47B-47C was further optimized by an A:C mutation at the +10 position in the intron downstream of POMT2 E11B and then a Western blot was performed against protein tags incorporated into the ends of the white and black boxes, respectively. Isoform A indicates the exon 9-skipped isoform. Isoform B indicates the exon 9-included isoform.
FIGs. 48A-48C show a non-limiting example of a strategy for repurposing exon 11b in POMT2 to regulate CasMini. FIG. 48A shows a non-limiting example of a nucleic acid design for a risdiplam-responsive splicing cassette that regulates translation of the N-terminal portion of CasMini. Exon 11b and flanking introns from POMT2 were modified to contain a start codon in frame with downstream CasMini. Inclusion of this exon leads to production of an N-terminal portion of CasMini fused to nanoluciferase. FIG. 48B shows representative data from testing the nucleic acid shown in FIG. 48A for responsiveness to varying concentrations of risdiplam in HEK293T. Top: fragment analyzer bands; bottom: quantitation. FIG. 48C shows additional non-limiting examples of variants that were cloned and assayed for nanoluciferase signal in the presence (1 µM) and absence of risdiplam in Neuro2A cells. CTRL denotes a control plasmid which encodes firefly luciferase but not nanoluciferase, to serve as nanoluciferase substrate control.
FIGs. 49A-49C show a non-limiting example of a strategy which leverages tetracycline aptamer-regulated splicing to control microRNA biogenesis via exon skipping. FIG. 49A shows a non-limiting example of a tetracycline-responsive exon cassette placed between two halves of a primary microRNA sequence. Exon inclusion leads to suboptimal recognition of the microRNA precursor by Dicer and thus lower production of a mature microRNA. Exon skipping leads to proper recognition of the microRNA precursor by Dicer and thus higher production of the mature microRNA. FIG. 49B shows representative data from assaying the nucleic acid shown in FIG. 49A in a Drosha knockout HEK293 cell line. FIG. 49C shows representative Northern blot data from assaying HEK293T cells transfected with the nucleic acid shown in FIG. 49A and tested in FIG. 49B.
FIGs. 50A-50C show a non-limiting example of a strategy which leverages branaplam-regulated splicing to control microRNA biogenesis via exon inclusion. FIG. 50A shows a non-limiting example of a cassette such that branaplam is capable of enhancing exon inclusion via recognition of certain sequences near the 5’ splice site of the alternatively spliced exon. The primary microRNA sequence was split across the 2nd intron of a cassette exon event derived from SF3B3 such that inclusion of the cassette exon facilitates formation of the full microRNA base stem, which can enhance Drosha recognition and processing. FIG. 50B shows representative Northern blot data from testing several branaplam-responsive cassettes. YZ230 is a control that encodes the sequence expected with exon inclusion. YZ231 is a control that encodes the sequence expected with exon skipping. YZ232 is a variant in which the exon cassette is present and can respond to branaplam. FIG. 50C shows representative data from luciferase assay analysis of knockdown by microRNAs encoded by branaplam-responsive cassettes. In each case, luciferase transcript is targeted by the microRNA. 95 is a construct that constitutively generates a microRNA active against luciferase. 259 is a construct that does not generate a microRNA active against luciferase. 231 and 232 are the constructs as shown in FIG. 50B, both with and without branaplam.
FIGs. 51A-51B show a non-limiting example of a strategy for controlling leaky microRNA production due to basal recognition of an incomplete microRNA stem. FIG. 51A shows non-limiting examples of microRNA scaffolds. YZ95 is a potent primary microRNA scaffold that is effectively recognized by Drosha and can downregulate a GFP reporter transcript comprising a target site. YZ293 was produced by mutating bases in the stem of YZ95 which are recognized by Drosha. YZ301 was produced by re-constituting the complete microRNA stem. FIG. 51B shows analyses of GFP silencing in HEK293 cells using YZ95, YZ293, and YZ301.
DETAILED DESCRIPTION
The present disclosure relates to the use of alternatively-spliced exons to control the expression of one or more genes of interest (e.g., genes that are useful therapeutically and/or diagnostically). In some embodiments, alternative splicing of an exon can be placed under the control of a ligand by introducing a ligand-binding sequence (e.g., a sequence encoding a ligand-binding aptamer) into an alternatively spliced exon and/or into at least one of the introns flanking the alternatively spliced exon. In some embodiments, a ligand-responsive alternatively spliced exon is introduced into a naturally occurring gene (e.g., at one or both alleles of the gene in the genome of a host cell). In some embodiments, a synthetic gene construct is provided that includes a ligand-responsive alternatively spliced exon. Accordingly, in some embodiments of the application, alternatively spliced exons can be used to regulate one or more aspects of gene expression (e.g., of mRNA translation and/or RNA function) by including one or more translation stop codons, interrupting a start codon, and/or interrupting a functional RNA sequence (e.g., an mRNA, a regulatory RNA, such as an interfering RNA, and/or a ribozyme).
In some embodiments, one or more aspects of the application (e.g., one or more ligand-responsive alternatively spliced exons) can be used in the context of viral vectors (e.g., AAV viral vectors or lentivirus viral vectors) to effectively regulate the expression of a coding region of interest (e.g., a coding region of a transgene that encodes a therapeutic protein). In certain aspects, the alternatively-spliced exons regulate the expression of a coding region of interest in a condition-sensitive manner (e.g., expression in one type of cell but not another, expression in a diseased condition, or expression in the presence of certain intracellular conditions, such as the presence of a ligand). Accordingly, the present disclosure relates to a new approach for regulating expression of a transgene (or a coding region thereof) from a recombinant viral vector that couples alternatively-spliced exons with the expression of a coding region of interest (e.g., a coding region of a transgene encoding a therapeutic protein). The present disclosure describes a variety of exemplary configurations as to how to combine or otherwise pair the expression of a coding region of interest (or multiple portions of coding regions) with an alternatively-spliced exon, but any suitable arrangement or configuration is contemplated so long as the expression of the coding region of interest (or portions thereof) is configured to come under regulatory control of the alternatively-spliced exon.
In other aspects, the present disclosure relates to the use of inducibly-spliced exon cassettes in the context of viral vectors (e.g., AAV viral vectors or lentivirus viral vectors) to effectively regulate the expression of a transgene encoding a therapeutic cargo such as microRNAs (miRNAs) and proteins. In certain aspects, the transgene regulates the expression of an inducibly-spliced exon cassette in a condition-sensitive manner (e.g., the presence of a drug or ligand). In some embodiments, the inducibly-spliced cassette encodes an RNA comprising a ligand-responsive sequence which is alternatively spliced (e.g., to exclude or include an alternative exon) in response to ligand binding. In certain aspects, the inducibly-spliced cassette comprises a microRNA (miRNA) sequence and a ligand-responsive aptamer controlling the splicing of said cassette. Accordingly, the present disclosure relates to a new approach for regulating splicing of a transgene comprising, for example, a miRNA from a recombinant viral vector in a chemically-inducible manner.
It will be understood that the inducibly-spliced exon cassette will be either spliced out or not spliced in a manner that can be dependent on one or more environmental conditions, e.g., the presence of an external factor (such as, for example, an administered agent such as a drug or ligand). Thus, whether the inducibly-spliced exon cassette is alternatively spliced can be dependent upon the condition of the cell in which the splicing machinery operates.
Thus, in some embodiments a recombinant nucleic acid (e.g., recombinant viral genome) of the present disclosure comprises a transgene comprising at least two exons and the alternatively-spliced cassette comprising at least two introns flanking an alternative exon, and a ligand-responsive aptamer. In other embodiments, a recombinant nucleic acid (e.g., a recombinant viral genome) comprises a transgene comprising one or more ligand-responsive sequences that do not comprise an aptamer (e.g., a ligand-responsive exon). In some embodiments, the transgene comprising the inducibly-spliced cassette comprises other regulatory sequences including, but not limited to, 3’ UTRs, 5’ UTRs, poly A sequences, promoters, enhancers, etc. In some embodiments, the inducibly-spliced cassette comprises a sequence that is capable of regulating the expression of another gene such as a miRNA.
Accordingly, compositions and methods described herein can be useful to regulate expression of therapeutic transcripts (e.g., in the context of viral vector-based treatments for diseases or disorders). In some embodiments, the transgene can be spliced in an inducible manner to form a functional miRNA that modulates the expression of a mutated or variant protein or a misexpressed protein that is implicated in a disease or disorder. In some aspects, the present application provides compositions and methods that are useful for delivering genes and gene products (such as RNAs and proteins) that retain or restore therapeutically effective levels of regulation of a protein or variant thereof implicated in a disease or disorder.
A schematic representing the disclosed new approach for regulating expression of a transgene (or a coding region of a transgene, e.g., a transgene encoding a therapeutic protein) in a recombinant viral genome using alternatively-spliced exons is provided in FIG. 1. As shown in FIG. 1, a viral genome may be configured to include a transgene that comprises a coding region of interest (e.g., encoding a therapeutic protein) and an alternatively-spliced exon (or a cassette comprising an alternatively-spliced exon) which regulates the expression of the coding region of the transgene. In addition, a number of exemplary embodiments of recombinant nucleic acid molecule constructs that comprise an alternatively-spliced exon and a coding region of interest (e.g., encoding a therapeutic protein) are shown in FIG. 2. FIG. 3 depicts, in general, typical AAV and lentivirus vector constructs comprising a coding region of interest whose expression is driven by a promoter, and which further include the insertion (at any suitable location) of a nucleotide sequence comprising an alternatively-spliced exon (or a cassette comprising an alternatively-spliced exon) to further regulate the expression of the coding region (e.g., by controlling translation or mRNA homeostasis, e.g., mRNA levels). In some embodiments, the nucleotide sequence comprising an alternatively-spliced exon may be in the form of a “cassette.” Examples of this are provided in FIGs. 2 and 4-7.
Such constructs represent embodiments that enable the disclosed new approach for regulating transgene expression (e.g., the expression of a therapeutic protein) from recombinant viral vectors in a condition-responsive manner, whereby the condition-responsive expression is controlled by alternatively-spliced exons which are included in the recombinant genome of the expression vector in such a manner that imparts a level of control on the expression of a coding region of interest (e.g., encoding a therapeutic protein). It will be understood that alternatively-spliced exons are spliced-in or spliced-out in a manner that can be dependent on one or more environmental conditions, e.g., intracellular conditions, such as a disease state (e.g., cancer) or even a type of cell (e.g., a liver cell versus a neuron, each of which has different intracellular conditions), or the presence of an external factor (such as, for example, an administered agent). Thus, whether the alternatively-spliced exon is spliced-in or spliced-out can be dependent upon the condition of the cell in which the splicing machinery operates.
Turning to FIG. 1, a generalized schematic of a recombinant AAV is provided in (a) which comprises a transgene located between the left and right ITRs. The transgene is indicated as comprising a coding region of interest (e.g., which encodes a therapeutic protein) and an alternatively-spliced exon that regulates the expression of the transgene (or the product encoded by the coding region of interest). While the drawing depicts a recombinant AAV genome, other recombinant viral vector genomes may be used, such as recombinant lentivirus genomes. The recombinant viral genomes may be delivered or administered to subjects packaged in a viral vector, which refers to an infectious viral particle comprising a recombinant viral genome within a viral capsid, and in addition which may further include a lipid/protein envelope layer for enveloped viruses. In various embodiments, such as those provided in FIG. 2, or FIGs. 4-8, the coding region (or exon comprising the coding region) may be combined or arranged with the alternatively-spliced exon in the form of a transgene comprising any suitable arrangement of additional components, including one or more constitutive exons (i.e., those exons present in all spliced mRNA isoforms that result from the initial pre-mRNA transcript) and one or more introns. In other embodiments, an alternative exon cassette (comprising the alternatively-spliced exon) may be linked with or coupled to any coding region of interest to impart regulatory control on that coding region of interest.
The alternatively-spliced exon may be any naturally-occurring alternatively-spliced exon or any recombinant alternatively-spliced exon. A variety of configurations are contemplated, and no limitation is implied by FIG. 1 as to the possible configurations that may be employed. For instance, the alternatively-spliced exon may be located between two exons that each separately comprise a portion of the coding region of interest. In other instances, the alternatively-spliced exon is located outside of the exon comprising the coding region of interest. In such embodiments, the alternatively-spliced exon may be located downstream of the exon encoding the coding region of interest. In other such embodiments, the alternatively-spliced exon may be located upstream of the exon encoding the coding region of interest. The general descriptions of the configuration of the cassettes comprising the alternatively-spliced exon and the coding region of interest (or the exon comprising the coding region of interest) embrace any suitable configuration, including those embodiments described in FIGs. 2 and 4-8.
In FIG. 1, step (b) shows the formation of a pre-mRNA (i.e., a primary transcription product which has not yet been processed by splicing) which includes the coding region of interest and the alternatively-spliced exon. Step (c) shows the splicing-out or splicing-in of the alternatively-spliced exon based on one or more conditions (e.g., cell type, disease state, or other intracellular environmental signal). The splicing-out of the alternatively-spliced exon results in mRNA isoform 1 in (d), whereas the splicing-in of the alternatively-spliced exon results in mRNA isoform 2 in (e). As shown in (g), the absence of the alternatively-spliced exon removes a positive or negative regulatory cis-element. The removal of a positive regulatory cis-element, such as a translation start signal, will result in the downregulation or down-expression of the transgene, i.e., the reduced expression of the product encoded by the coding region of interest. However, the removal of a negative regulatory cis-element, such as an mRNA degradation element, may lead to the upregulation or up-expression of the transgene, i.e., the increased expression of the product encoded by the coding region of interest. As shown in (h), the presence of the alternatively-spliced exon splices-in a positive or negative regulatory cis-element associated with the alternatively-spliced exon. The maintenance of a positive regulatory cis-element, such as a translation start signal, will result in the upregulation or up-expression of the transgene, i.e., the increased expression of the product encoded by the coding region of the transgene. However, the maintenance of a negative regulatory cis-element, such as an mRNA degradation element, may lead to the downregulation or down-expression of the transgene, i.e., the decreased expression of the product encoded by the coding region of the transgene. Other configurations are also possible and contemplated herein and exemplified below in various embodiments provided in FIGs. 2-8.
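By way of illustration only, the relationship described for FIG. 1 between exon inclusion, the type of regulatory cis-element carried by the alternatively-spliced exon, and the expected direction of transgene expression can be sketched as a small lookup. The names Element and expression_outcome are hypothetical and are introduced here solely for this sketch; the outcomes are the qualitative ones stated in the description, not measured values.

```python
from enum import Enum


class Element(Enum):
    """Type of regulatory cis-element carried by the alternatively-spliced exon."""
    POSITIVE = "positive"  # e.g., a translation start signal
    NEGATIVE = "negative"  # e.g., an mRNA degradation element


def expression_outcome(exon_included: bool, element: Element) -> str:
    """Return the expected qualitative direction of transgene expression.

    Including a positive element, or removing a negative one, is expected
    to raise expression; the opposite cases are expected to lower it.
    """
    if element is Element.POSITIVE:
        return "up" if exon_included else "down"
    return "down" if exon_included else "up"


if __name__ == "__main__":
    # Enumerate the four cases described for mRNA isoforms 1 and 2.
    for included in (False, True):
        for element in Element:
            state = "spliced-in" if included else "spliced-out"
            print(f"exon {state}, {element.value} element: "
                  f"{expression_outcome(included, element)}-regulation")
```

The sketch treats each outcome as deterministic; in the description, removal or maintenance of a negative element "may" change expression, so the mapping should be read as the expected tendency only.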
In certain aspects, the disclosure provides methods and compositions for regulating gene expression using viral vectors comprising a recombinant viral genome described herein. Viral vectors can be used to deliver one or more transgenes (comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) for therapeutic, diagnostic, or other purposes. In some aspects, expression of a transgene in a recombinant viral genome can be regulated using alternative splicing of an RNA expressed from the viral genome.
Thus, aspects of the disclosure relate to methods and compositions for regulating expression of a transgene (comprising a coding region of interest which encodes a protein of interest, such as a therapeutic protein) using viral vectors comprising a recombinant viral genome described herein. A recombinant viral genome can be engineered to include one or more exons (e.g., one or more of a constitutive exon, an alternatively-spliced exon, and/or engineered versions thereof) that (a) can be either spliced-in or spliced-out of a pre-mRNA encoded by the genome, and (b) include one or more positive or negative regulatory cis-elements that affect protein expression (e.g., mRNA stability and/or translation of the coding region of interest).
Different intron and exon configurations can be used to provide for alternatively-spliced exon splicing, as discussed in greater detail herein, and shown in FIG. 2 and FIGs. 4-8 as examples. Non-limiting examples include the following models of alternative splicing: skipped exons, retained introns, alternative 5’ splice sites, alternative 3’ splice sites, mutually exclusive exons, and alternative last exons as illustrated in FIGs. 2 and 4-8. Each of these different intron/exon configurations can be used to leverage alternatively-spliced exons which may, in some embodiments, include one or more positive or negative regulatory cis-elements that promote or limit expression of the coding region of interest. For example, such sequences may promote translation and/or stability, or inhibit or terminate RNA translation and/or promote RNA degradation. Such cis-acting elements may in some embodiments be sequences that form secondary structures (e.g., that slow translation), bind to one or more regulatory RNAs (e.g., siRNAs), and/or be targeted by one or more intracellular enzymes (e.g., nucleases).
It will be appreciated that different types of splice sites exist which may result in splicing under specific conditions. Such splice sites can be chosen for their ability to regulate splicing under conditions of interest. Alternatively or additionally, splice sites may be chosen based upon their relative strength, as calculated using a variety of published methods (see, e.g., Yeo & Burge (2004), Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals, J. Comput. Biol. 11(2-3):377-94). Such relative strength may in some embodiments reflect the efficiency of recognition by the core spliceosomal machinery (e.g., U1 and U2 snRNPs). In some embodiments, splice sites may be altered to enhance or diminish recognition by the core spliceosomal machinery. Such alterations may be performed, in some embodiments, to achieve the desired regulatory behavior in conditions of interest. For example, splice sites may be used to make splicing responsive to certain endogenous or exogenous factors such that the alternative splicing is specific to, for example, certain tissues, certain diseases, certain intracellular conditions, etc. In some embodiments, splicing may be additionally or alternatively responsive to an exogenous agent (e.g., a small molecule, antibody, or other compound) which regulates splicing of the pre-mRNA.
Alternatively-spliced exons as described herein may in some embodiments be contained within an alternatively-spliced exon cassette, as shown in the various embodiments of FIGs. 2 and 4-8.
Thus, in some embodiments a recombinant viral genome of the present disclosure comprises a transgene comprising at least one alternatively-spliced exon (or “regulatory”) cassette. In some embodiments, a transgene comprising an alternatively-spliced exon cassette comprises at least one alternatively-spliced exon, intronic sequences flanking the alternatively-spliced exon, and an exon comprising a coding region of interest. However, a transgene comprising a regulatory cassette may in some embodiments also contain additional components, such as a constitutive exon, additional intronic sequences, or both. Accordingly, in some embodiments, a transgene comprising an alternatively-spliced exon cassette comprises any one or more of the following components: an alternatively-spliced exon, a flanking intron, an exon comprising a coding region of interest, and/or a constitutive exon.
In some aspects, alternative splicing regulation can be used to help control the expression of a coding region of interest encoded by a recombinant viral genome (e.g., an rAAV recombinant genome, a lentivirus recombinant genome). Thus, aspects of the invention relate to a method of regulating expression of a coding region of interest using a viral vector comprising a recombinant viral genome described herein. In some embodiments, the method comprises: (i) inserting into the recombinant viral genome at least one transgene comprising an alternatively-spliced exon cassette (e.g., such as any of those shown in FIGs. 2 and 4-8); (ii) introducing a heterologous start codon or part of a heterologous start codon at the 3' end of the alternatively-spliced exon; (iii) disrupting or deleting all native start codons located 5' to the heterologous start codon; and (iv) deleting a native start codon, or a portion thereof, from, and/or introducing heterologous stop codons into, the exon comprising a coding region of interest. In some embodiments, the constitutive exon, alternatively-spliced exon, and flanking intron are each located 5' to the coding region of interest. In some embodiments, the method comprises: (i) inserting into the recombinant viral genome at least one transgene comprising an alternatively-spliced exon cassette; and (ii) introducing into the alternatively-spliced exon a heterologous, in-frame stop codon at least 50 nucleotides upstream of the next 5' splice junction.
In some embodiments, the heterologous, in-frame stop codon elicits nonsense-mediated decay. In some embodiments, a transgene comprising an alternatively-spliced exon cassette comprises any one or more of the following components: an alternatively-spliced exon, a flanking intron, a coding region of interest, and/or a constitutive exon.
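As a minimal sketch only, the placement criterion for a heterologous, in-frame stop codon (at least 50 nucleotides upstream of the next 5' splice junction) can be checked computationally. The sketch assumes that the alternatively-spliced exon is supplied as a DNA string whose 3' end is the next 5' splice junction, that the reading-frame phase at the start of the exon is known, and that the distance is counted from the last nucleotide of the stop codon to the end of the exon; these conventions, and the function names first_in_frame_stop and stop_meets_distance, are assumptions made for illustration and are not taken from the disclosure.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(exon: str, phase: int = 0) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon, or None.

    phase is the number of leading nucleotides (0, 1, or 2) that complete
    a codon begun in the upstream exon; codons are read from that offset.
    """
    exon = exon.upper()
    for i in range(phase, len(exon) - 2, 3):
        if exon[i:i + 3] in STOP_CODONS:
            return i
    return None


def stop_meets_distance(exon: str, phase: int = 0, min_distance: int = 50) -> bool:
    """Check that the first in-frame stop lies far enough from the exon's 3' end.

    Assumption: distance is the number of nucleotides between the end of
    the stop codon and the end of the exon (the next 5' splice junction).
    """
    stop_index = first_in_frame_stop(exon, phase)
    if stop_index is None:
        return False
    distance = len(exon) - (stop_index + 3)
    return distance >= min_distance


if __name__ == "__main__":
    # Hypothetical exon: a stop codon followed by 60 nt of filler sequence.
    example = "ATG" + "TAA" + "GCT" * 20
    print(stop_meets_distance(example))
```

Such a check only tests the positional criterion; whether nonsense-mediated decay is actually elicited depends on the cellular context and would be determined experimentally.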
Accordingly, compositions and methods described herein can be useful to regulate expression of therapeutic transcripts in the context of viral vector-based treatments for diseases or disorders. Abnormal cellular regulation (e.g., abnormal regulation of intron splicing of one or more genes) can lead to changes in gene regulation and subsequent protein expression associated with a disease state. Some aspects of the invention therefore concern a method of treating a disease or condition in a subject comprising administering a viral vector of the disclosure to a subject, wherein the viral vector comprises a recombinant viral genome described herein. In some aspects, the present application provides compositions and methods that are useful for delivering genes that retain or restore therapeutically effective levels of regulation (e.g., therapeutically effective regulation of intron splicing).
In some aspects, a viral vector (e.g., an rAAV vector, a lentivirus vector, etc.) comprises a recombinant viral genome that includes a nucleic acid that encodes an RNA (e.g., an mRNA) comprising one or more introns. In some embodiments, splicing of at least one intron is regulated by one or more intracellular factor(s). Regulation of intron splicing can control the expression level of the RNA and/or of the type of RNA (e.g., of an RNA splice alternative) inside a cell.
A. Definitions
Unless otherwise defined herein, all scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art. The meaning and scope of the terms are clear; however, in the event of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular. In this disclosure, the use of “or” means “and/or” unless stated otherwise. Furthermore, the use of the term “including,” as well as other forms, such as “includes” and “included,” is not limiting. Things described as “including” or “comprising” can also be configured as “consisting of” or similar language. Also, terms such as “element” or “component” encompass both elements and components comprising one unit and elements and components that comprise more than one subunit unless specifically stated otherwise.
Generally, nomenclatures used in connection with, and techniques of, cell and tissue culture, molecular biology, immunology, microbiology, genetics, and protein and nucleic acid chemistry and hybridization described herein are those well-known and commonly used in the art. The methods and techniques of the present disclosure are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present disclosure unless otherwise indicated. Enzymatic reactions and purification techniques are performed according to manufacturer's specifications, as commonly accomplished in the art or as described herein. The nomenclatures used in connection with, and the laboratory procedures and techniques of, analytical chemistry, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art. Standard techniques are used for chemical syntheses, chemical analyses, pharmaceutical preparation, formulation, and delivery, and treatment of subjects.
That the present disclosure may be more readily understood, select terms are defined below.
(i) Polynucleotide
As used herein, “polynucleotide” refers to any nucleic acid comprising naturally-occurring sequences, engineered sequences, or a combination thereof. In some instances, the term “polynucleotide” may be used interchangeably with the term “nucleic acid”. In some embodiments, a polynucleotide may be DNA. In some embodiments, a polynucleotide may be RNA. Accordingly, in some embodiments, the term “polynucleotide” may be used to refer to both DNA and an RNA encoded by or corresponding to said DNA (e.g., an RNA that is alternatively spliced in the presence of a ligand). In some embodiments, a polynucleotide (e.g., a guide RNA) is a chemically modified nucleic acid.
In some embodiments, polynucleotides of the present disclosure comprise a sequence encoding a ligand-responsive sequence. In some embodiments, the polynucleotide is capable of being expressed in a cell and alternatively spliced in the presence of the ligand. In some embodiments, a ligand induces alternative splicing to produce a first RNA. In some embodiments, a ligand induces splicing to produce a second RNA. Accordingly, in some embodiments, a polynucleotide comprises all of the sequence information to encode the first and the second RNA, such that one of the RNAs will be more highly expressed in the presence of the ligand and the other RNA will be more highly expressed in the absence of the ligand.
In some embodiments, the presence of the ligand results in increased expression of the first RNA. In some embodiments, the increase in expression of the first RNA in the presence of the ligand is on the order of 2-fold to 500-fold relative to the expression of the first RNA and/or second RNA in the absence of the ligand. In some embodiments, the increase is approximately 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 1-fold to 3-fold, 1-fold to 4-fold, 1-fold to 5-fold, 1-fold to 6-fold, 1-fold to 7-fold, 1-fold to 8-fold, 1-fold to 9-fold, 1-fold to 10-fold, 10-fold to 20-fold, 20-fold to 30-fold, 30-fold to 40-fold, 40-fold to 50-fold, 50-fold to 60-fold, 60-fold to 70-fold, 70-fold to 80-fold, 80-fold to 90-fold, 90-fold to 100-fold, 100-fold to 200-fold, 200-fold to 300-fold, 300-fold to 400-fold, 400-fold to 500-fold, 500-fold to 600-fold, 600-fold to 700-fold, 700-fold to 800-fold, 800-fold to 900-fold, or 900-fold to 1000-fold. In some embodiments, the increase in expression of the first RNA in the presence of the ligand is on the order of 5-fold to 25-fold relative to the expression of the first RNA and/or the second RNA in the absence of the ligand.
In some embodiments, the presence of the ligand results in increased expression of the second RNA. In some embodiments, the increase in expression of the second RNA in the presence of the ligand is on the order of 2-fold to 500-fold relative to the expression of the first RNA and/or the second RNA in the absence of the ligand. In some embodiments, the increase in the second RNA is approximately 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 1-fold to 3-fold, 1-fold to 4-fold, 1-fold to 5-fold, 1-fold to 6-fold, 1-fold to 7-fold, 1-fold to 8-fold, 1-fold to 9-fold, 1-fold to 10-fold, 10-fold to 20-fold, 20-fold to 30-fold, 30-fold to 40-fold, 40-fold to 50-fold, 50-fold to 60-fold, 60-fold to 70-fold, 70-fold to 80-fold, 80-fold to 90-fold, 90-fold to 100-fold, 100-fold to 200-fold, 200-fold to 300-fold, 300-fold to 400-fold, 400-fold to 500-fold, 500-fold to 600-fold, 600-fold to 700-fold, 700-fold to 800-fold, 800-fold to 900-fold, or 900-fold to 1000-fold. In some embodiments, the increase in expression of the second RNA in the presence of the ligand is on the order of 5-fold to 25-fold relative to the expression of the first RNA and/or the second RNA in the absence of the ligand.
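For illustration, a fold increase of the kind recited above can be computed as the ratio of expression in the presence of the ligand to expression in its absence. The sketch below assumes that expression of a given RNA has been quantified in arbitrary but consistent units under both conditions; the function names and the numeric inputs are hypothetical and do not represent data from the disclosure.

```python
def fold_induction(with_ligand: float, without_ligand: float) -> float:
    """Return expression with ligand divided by expression without ligand."""
    if without_ligand <= 0:
        raise ValueError("expression without ligand must be positive")
    return with_ligand / without_ligand


def within_fold_range(fold: float, low: float = 2.0, high: float = 500.0) -> bool:
    """Check whether a fold increase falls within an inclusive range."""
    return low <= fold <= high


if __name__ == "__main__":
    # Hypothetical measurements in arbitrary units (not experimental data).
    fold = fold_induction(with_ligand=50.0, without_ligand=5.0)
    print(fold, within_fold_range(fold), within_fold_range(fold, 5.0, 25.0))
```

The same ratio can be formed against either the first RNA or the second RNA as the reference, consistent with the "first RNA and/or second RNA" language above.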
In some embodiments, polynucleotides may comprise one or more exons. In some embodiments, the polynucleotide may comprise one or more introns. In some embodiments, the polynucleotide may comprise the full sequence of a gene, such as one comprising a plurality of exons. In some embodiments, the first RNA and the second RNA differ by at least one exon. In some embodiments, for example, the first RNA comprises an exon that is not found in the second RNA. In some embodiments, binding of a ligand to the ligand-responsive sequence may promote inclusion of one or more alternative exons in the first RNA. In some embodiments, binding of a ligand to the ligand-responsive sequence may promote exclusion of one or more alternative exons in the second RNA. In some embodiments, each of the one or more alternative exons is flanked by an intron.
In some embodiments, exons found in polynucleotides correspond to an RNA of interest. In some embodiments, the first RNA encodes an RNA of interest (e.g., one that can lead to synthesis of a corresponding protein) and the second RNA does not. In some embodiments, the second RNA encodes an RNA of interest (e.g., a microRNA that can bind a target transcript of interest) and the first RNA does not.
In some embodiments, polynucleotides comprise one or more splice sites. In some embodiments, a 3’ splice site is at least 2 nucleotides long. In some embodiments, a 3’ splice site is 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, or 90-100 nucleotides long. In some embodiments, a 5’ splice site is 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, or 90-100 nucleotides long. In some embodiments, a 5’ splice site is at least 7 nucleotides long. In some embodiments, a 5’ splice site is at least 9 nucleotides long. In some embodiments, the polynucleotide may comprise one or more 5’ splice sites and 3’ splice sites which are differentially used in the splicing of the RNA encoded by the polynucleotide depending on the presence or absence of the ligand.
Non-limiting examples of polynucleotides and sequences encoded therein are found in the drawings presented herein, such as in FIGs. 1-8, 28, 33, 38, 39, 41, 43, 46A, 47A, 48A, 49A, 50A, and 51A, and Tables 7-34, respectively. For example, in some embodiments, a polynucleotide comprises at least one alternative exon, at least two introns flanking an alternative exon, and a ligand-responsive aptamer, wherein the presence of the ligand results in splicing out the at least one alternative exon, the at least two introns flanking the at least one alternative exon, and the ligand-responsive aptamer. Non-limiting examples of such polynucleotides are disclosed in FIGs. 4E-4H. However, such disclosures should not be considered limiting as, in other embodiments, it may be desirable to use a ligand to retain an alternatively spliced exon in the spliced RNA. Non-limiting examples of such polynucleotides are disclosed in FIGs. 4L, 4N, 4P, 46A, 47A, and 48A.
In some embodiments, polynucleotides of the present disclosure are transgenes. In some embodiments, polynucleotides (e.g., transgenes) comprise cassettes described herein. In some embodiments, polynucleotides of the present disclosure are provided in a vector (e.g., a plasmid, phage, transposon, cosmid, chromosome, or artificial chromosome). In some embodiments, vectors are single-stranded or double-stranded. In some embodiments, vectors are circular (e.g., circular plasmids, nanoplasmids, and minicircle plasmids) or linear. In some embodiments, vectors are self-complementary. In some embodiments, polynucleotides of the present disclosure are provided in a recombinant viral genome.
In some embodiments, the polynucleotide comprises a sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, 2183-2255, or 2259-2260. In some embodiments, the polynucleotide comprises a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, 2183-2255, or 2259-2260.
In some embodiments, the polynucleotide comprises an exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to an exon set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143. In some embodiments, the polynucleotide comprises an exon comprising a nucleic acid sequence of an exon as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
In some embodiments, the polynucleotide comprises an alternative exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an alternative exon as set forth in SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256. In some embodiments, the polynucleotide comprises an alternative exon comprising a nucleic acid sequence of an alternative exon as set forth in any one of SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, or 2137.
In some embodiments, the polynucleotide comprises an intron having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an intron as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141. In some embodiments, the polynucleotide comprises an intron comprising a nucleic acid sequence of an intron as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
In some embodiments, the polynucleotide comprises a 3’ splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 3’ splice site as set forth in SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239. In some embodiments, the polynucleotide comprises at least one 3’ splice site comprising a nucleic acid sequence of a 3’ splice site as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
In some embodiments, the polynucleotide comprises a 5' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 5’ splice site as set forth in Tables 7, 25, 26, or 34. In some embodiments, the polynucleotide comprises a 5' splice site comprising a nucleic acid sequence of a 5’ splice site as set forth in any one of Tables 7, 25, 26, or 34.
In some embodiments, the polynucleotide comprises a ligand-responsive sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260. In some embodiments, the polynucleotide comprises at least one ligand-responsive sequence comprising a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260.
In some embodiments, the polynucleotide comprises a ligand-responsive aptamer having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189. In some embodiments, the polynucleotide comprises at least one ligand-responsive aptamer comprising a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189.
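The percent-identity thresholds recited in the preceding paragraphs can be illustrated with a minimal sketch. It assumes the two sequences have already been pairwise aligned (equal length, with '-' marking gaps) and that identity is counted as matching columns divided by aligned columns; this passage does not specify an alignment method or identity convention, so both are assumptions made for illustration. The function names and the 20-nucleotide example sequences are hypothetical and are not taken from any SEQ ID NO.

```python
from typing import List

# Thresholds recited in the text: at least 70%, 75%, 80%, 90%, 92%, 95%, 98%, 99%.
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Assumption: inputs come from the same pairwise alignment, have equal
    length, and use '-' for gaps. Columns that are gaps in both are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> List[int]:
    """Return every recited threshold that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical sequences that differ only at the final position.
reference = "GGAUACCAGCUUAUUCAAUU"
candidate = "GGAUACCAGCUUAUUCAAUA"

identity = percent_identity(reference, candidate)
print(identity, thresholds_met(identity))
```

In practice the alignment itself would come from an established tool, and the choice of denominator (alignment length versus reference length) can change the reported value.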
In some embodiments, a polynucleotide comprises an intron, exon (e.g., alternative exon), and/or a splice site corresponding to a gene selected from the group consisting of: MBNL1; MBNL2; MBNL3; hnRNP A1; hnRNP A2B1; hnRNP C; hnRNP D; hnRNP DL; hnRNP F; hnRNP H; hnRNP K; hnRNP L; hnRNP M; hnRNP R; hnRNP U; FUS; TDP43; PABPN1; ATXN2; TAF15; EWSR1; MATR3; TIA1; FMRP; MTM1; MTMR2; LAMP2; KIF5A; a microdystrophin-encoding gene; C9ORF72; HTT; DNM2; BIN1; RYR1; NEB; ACTA; TPM3; TPM2; TNNT2; CFL2; KBTBD13; KLHL40; KLHL41; LMOD3; MYPN; SEPN1; TTN; SPEG; MYH7; TK2; POLG1; GAA; AGL; PYGM; SLC22A5; OCTN2; ETF; ETFH; PNPLA2; a cytochrome b oxidase-encoding gene; a cytochrome c oxidase-encoding gene; CLCN1; SCN4A; DMPK; CNBP; MYOT; LMNA; CAV3; DNAJB6; DES; TNPO3; HNRPDL; CAPN3; DYSF; an alpha-sarcoglycan-encoding gene; a beta-sarcoglycan-encoding gene; a gamma-sarcoglycan-encoding gene; a delta-sarcoglycan-encoding gene; TCAP; TRIM32; FKRP; FXN; POMT1; FKTN; POMT2; POMGnT1; DAG1; ANO5; PLEC1; TRAPPC11; GMPPB; ISPD; LIMS2; POPDC1; TOR1AIP1; POGLUT2; LAMA2; COL6A1; POMT1; POMT2; DUX4; EMD; PAX7; PMP22; MPZ; MFN2; SMCHD1; SMN; Lamin A/C (LAMN); GJB1; ABCC1; AK125149; ASCC2; BAT2D1; BBX; BRD8; BRE; C17orf70; CAMKK2; CBFB; CCAR1; CCDC7CD6; CHTF8; COL4A3BP; COL6A3; CUGBP1; CUGBP2; CXorf45; DENND3; DGUOK; DKFZp762G094; DNAJC7; DNASE1; EIF4A2; EIF4G2; EH <411; EXOC7; EZH2; FAM120A; FAM136A; FAM36A; FARSB; FBXO38; FGFR1OP2; FIP1L1; FOXRED1; FUBP3; GALT; GATA3; GOLGA2; HIF1A; HMMR; HRB; IKZF1; ILF3; IRAK4; IRF1; KCTD13; LEF1; LUC7L; LYRM L MAl . i l e7; MAP2K7; MAP3K7; MAP4K2; MBNL2; MFF; NAE1; NCSTN; NR4A3; NRF1; NUP98; PARP6; PCM1; PLAUR; PLSCR3; PPIL5; PPP5C; PTPRC-E4; PTPRC-E6; PTS; RABL5; RAPH1; SEC16A; SFRS3; SFRS7; SLMAP; SNRNP70; STAT6; TBC1D1; TIMM8B; IIR8; TRA2A; TROVE2; UGCGL1; VAP-B; VAV1; ZNF384; ZNF496; CAMK2B; PKP2; LGMN; NRAP; VPS39; KSR1; PDLIM3; BIN1; ARFGAP2; KIF13A; and PICALM.
(ii) Transgene
As used herein, the term “transgene” refers to any recombinant gene or a segment thereof that includes a non-naturally occurring sequence. The non-naturally occurring sequence may in some embodiments be from a different organism, but it need not be. For example, in some embodiments a transgene is a recombinant gene, or segment thereof, from one organism or infectious agent (e.g., a virus) that is introduced into the genome of another organism or infectious agent. By contrast, in some embodiments, the transgene may contain segments of DNA taken from the same organism, but the segments are arranged in a non-natural configuration. In some embodiments, the non-naturally occurring sequence is an engineered non-naturally occurring sequence. As used herein, a transgene may comprise any combination of naturally-occurring and engineered DNA sequences.
A transgene may be introduced into the genome of another organism or infectious agent using recombinant DNA techniques. In some embodiments, a transgene may include or may be modified to include one or any combination of regulatory sequences, including, but not limited to, transcription regulatory sequences (e.g., promoter, enhancer, silencer, transcription factor binding sequence, 5’ UTR, or 3’ UTR), post-transcriptional regulatory sequences (e.g., acceptor/donor splicing sites and splicing regulatory sequences), ligand-responsive sequences (e.g., aptamers), and/or translation regulatory sequences (e.g., translation initiation signals, translation termination signals, mRNA degradation or decay signals, polyadenylation signals). In some embodiments, a regulatory sequence, such as a ligand-responsive aptamer or a ligand-responsive exon, is located in an alternatively-spliced expression cassette between two exon regions of the transgene, thereby separating a single exon into two non-continuous stretches of nucleotides. In some embodiments, the transgene encodes an RNA product that plays a regulatory role affecting gene expression in the cell, such as a miRNA. In some embodiments, wherein a transgene is introduced into the genome of another organism using a recombinant adeno-associated virus (AAV), the transgene comprises all components (e.g., exons, introns, regulatory sequences, alternative exons, ligand-responsive aptamers, etc.) which are located between the AAV inverted terminal repeat sequences (see, e.g., FIG. 3A).
In some embodiments, the transgene comprises a sequence encoding a ligand-responsive sequence (e.g., a ligand-responsive aptamer). In some embodiments, the transgene comprises a sequence encoding an RNA of interest. In some embodiments, a transgene comprises two or more discontinuous sequences encoding distinct portions of an RNA of interest.
In some embodiments, a transgene may be modified to comprise an alternatively-spliced exon, defined below, such that the regulation of the expression of the transgene, the product encoded by the transgene, or the target of a miRNA encoded by the transgene comes under control of the alternatively-spliced exon. In some embodiments, the alternative splicing of an exon of the transgene is dependent upon the presence of a ligand to which a ligand-responsive aptamer sequence within the transgene binds. In some embodiments, the alternative splicing of an exon of the transgene is dependent upon the presence of a ligand to which a ligand-responsive exon within the transgene binds. The alternatively-spliced exon may be configured in a “cassette,” defined below.
In some embodiments, the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260. In some embodiments, the transgene comprises a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
In some embodiments, the transgene comprises an exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an exon as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143. In some embodiments, the transgene comprises an exon comprising a nucleic acid sequence of an exon as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
In some embodiments, the transgene comprises at least two exons having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to the nucleic acid sequences of two exons as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143. In some embodiments, the transgene comprises at least two exons comprising a nucleic acid sequence of two exons as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
In some embodiments, the transgene comprises an alternative exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an alternative exon as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256. In some embodiments, the transgene comprises at least two exons comprising a nucleic acid sequence of an alternative exon as set forth in any one of SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
In some embodiments, the transgene comprises an intron having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an intron as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141. In some embodiments, the transgene comprises an intron comprising a nucleic acid sequence of an intron as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
In some embodiments, the transgene comprises at least two introns having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to the nucleic acid sequences of two introns as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141. In some embodiments, the transgene comprises at least two introns comprising the nucleic acid sequences of two introns as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
In some embodiments, the transgene comprises at least one 3' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 3’ splice site as set forth in SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239. In some embodiments, the transgene comprises at least one 3' splice site comprising a nucleic acid sequence of a 3’ splice site as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
In some embodiments, the transgene comprises at least one 5' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 5’ splice site as set forth in Tables 7, 25, 26, or 34. In some embodiments, the transgene comprises at least one 5' splice site comprising a nucleic acid sequence of a 5’ splice site as set forth in Tables 7, 25, 26, or 34.
In some embodiments, the transgene comprises at least one ligand-responsive sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260. In some embodiments, the transgene comprises at least one ligand-responsive sequence comprising a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260.
In some embodiments, the transgene comprises at least one ligand-responsive aptamer having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189. In some embodiments, the transgene comprises at least one ligand-responsive aptamer comprising a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189.
(iii) Regulatory sequence
Overview of Regulatory Sequences
As used herein, a “regulatory sequence” or, equivalently, a “regulatory element,” may refer to a nucleotide sequence that regulates, directly or indirectly, any aspect of the expression of a gene or transgene, including regulatory sequences that affect transcription of a gene or transgene into one or more mRNAs, the processing of mRNA (e.g., the splicing of a pre-mRNA comprising exons and introns to produce one or more mRNA isoforms), and/or the translation of a coding region in a mRNA to form a polypeptide product. Non-limiting examples of positive or negative regulatory sequences, such as cis-elements, can include, for instance, (1) a nucleotide sequence element that regulates, modulates, or otherwise controls the amount, stability, and/or degradation of an mRNA encoding a coding region of interest (or portions thereof); and/or (2) a nucleotide sequence element that regulates, modulates, or otherwise controls the translation of a coding region of interest (or portions thereof) encoded by an mRNA.
In some embodiments, a ligand-responsive sequence may function as a cis-element which is capable of binding to an exogenously administered ligand. In some embodiments, a ligand-responsive sequence functions as a cis-element by regulating alternative splicing of the nucleic acid it is provided in (such as an inducibly-spliced cassette of a transgene). In some embodiments, a ligand-responsive sequence functions as a positive regulator (e.g., increasing expression or the function of the transgene). In some embodiments, a ligand-responsive sequence functions as a negative regulator (e.g., reducing expression or the function of the transgene).
In some embodiments, a ligand-responsive aptamer may function as a cis-element which is capable of binding to an exogenously administered ligand. In some embodiments, the ligand-responsive aptamer functions as a cis-element by regulating alternative splicing of the nucleic acid it is provided in (such as an inducibly-spliced cassette of a transgene). In some embodiments, a ligand-responsive aptamer functions as a positive regulator (e.g., increasing expression or the function of the transgene). In some embodiments, a ligand-responsive aptamer functions as a negative regulator (e.g., reducing expression or the function of the transgene).
Non-Limiting Embodiments of Other Regulatory Sequences
In some embodiments, polynucleotides of the present disclosure (e.g., transgenes) are operably linked to at least one other regulatory sequence in addition to at least one operably linked ligand-responsive sequence described herein. As used herein, a polynucleotide and regulatory sequences are said to be “operably linked” (which may be used interchangeably with “operatively linked”) when they are covalently linked in such a way as to place the expression (e.g., transcription and/or translation) of the nucleic acid sequence under the influence or control of the regulatory sequences. For example, a promoter region would be operably linked to a nucleic acid sequence if the promoter region were capable of effecting transcription of that DNA sequence such that the corresponding RNA (e.g., a pre-mRNA, a mRNA, a miRNA, etc.) might be present at increased levels in a cell and/or translated into the desired protein or polypeptide. Similarly, two or more coding regions are operably linked when they are linked in such a way that their transcription from a common promoter results in the expression of two or more proteins having been translated in frame.
Non-limiting examples of other regulatory sequences which may be located in polynucleotides (e.g., transgenes) comprising ligand-responsive sequences (e.g., a transgene comprising a cassette wherein alternative splicing of RNA encoded by the cassette is regulated by an operably linked ligand-responsive sequence) include transcriptional regulatory sequences (e.g., promoters, enhancers, silencers, transcription factor binding sequences, 5’ UTRs, or 3’ UTRs), post-transcriptional regulatory sequences (e.g., acceptor/donor splicing sites and splicing regulatory sequences), and/or translation regulatory sequences (e.g., translation initiation signals, translation termination signals, mRNA degradation or decay signals, polyadenylation signals). In some embodiments, regulatory sequences include, without limitation, promoter sequences, ribosome binding sites, ribozymes, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5’ and 3’ untranslated regions (UTRs), transcriptional start sites, transcription terminator sequences, polyadenylation sequences, introns, and premature stop codons. In some embodiments, for example, a premature stop codon may be found in an RNA such that it is in-frame with a sequence encoding an RNA of interest (e.g., located within an exon), which results in production of a truncated protein corresponding to the RNA. In some embodiments, the premature stop codon can be UAA, UAG, or UGA.
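The in-frame premature stop codon described above can be illustrated with a minimal sketch. It assumes the reading frame begins at a known index of the RNA string and steps through codons three nucleotides at a time, treating a stop as premature if it occurs before the last complete codon of that frame. The function names and example sequences are hypothetical and are not taken from the disclosure.

```python
from typing import Optional

# Stop codons named in the text.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(rna: str, start: int = 0) -> Optional[int]:
    """Index of the first stop codon in the frame beginning at `start`, or None."""
    rna = rna.upper()
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_premature_stop(rna: str, start: int = 0) -> bool:
    """True if an in-frame stop occurs before the last complete codon.

    Assumption: the intended (full-length) stop codon, if any, is the last
    complete codon of the frame, so an earlier stop would truncate the product.
    """
    stop = first_in_frame_stop(rna, start)
    if stop is None:
        return False
    complete_codons = (len(rna) - start) // 3
    last_codon_start = start + 3 * (complete_codons - 1)
    return stop < last_codon_start


# Hypothetical transcripts: AUG GCU UGA GCU UAA versus AUG GCU GCU GCU UAA.
with_premature_stop = "AUGGCUUGAGCUUAA"
without_premature_stop = "AUGGCUGCUGCUUAA"

print(has_premature_stop(with_premature_stop))
print(has_premature_stop(without_premature_stop))
```

A stop codon that is out of frame is not reported by this sketch, which is the distinction the text draws by requiring the codon to be in-frame.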
The promoter driving expression of polynucleotides of the present disclosure can be, but is not limited to, a constitutive promoter, an inducible promoter, a tissue-specific promoter, or a synthetic promoter.
In some embodiments, a constitutive promoter maintains constant expression of RNAs regardless of the conditions or physiological state of a host cell. In some embodiments, a constitutive promoter can be, but is not limited to, a Herpes Simplex virus (HSV) promoter, a thymidine kinase (TK) promoter, a Rous Sarcoma Virus (RSV) promoter, a Simian Virus 40 (SV40) promoter, a Mouse Mammary Tumor Virus (MMTV) promoter, an Adenovirus E1A promoter, a cytomegalovirus (CMV) promoter (see, e.g., Boshart et al., Cell, 41:521-530 (1985)), the phosphoglycerol kinase (PGK) promoter, the CAG promoter, the human elongation factor-1 alpha (EF1α) promoter [Invitrogen], the dihydrofolate reductase promoter, a mammalian housekeeping gene promoter, or a β-actin promoter.
In some embodiments, inducible promoters allow regulation of gene expression and can be regulated by exogenously supplied compounds, environmental factors such as temperature, or the presence of a specific physiological state. In some embodiments, an inducible promoter can be, but is not limited to, an IPTG-inducible promoter, a cytochrome P450 gene promoter, a heat shock protein gene promoter, a metallothionein gene promoter, a hormone-inducible gene promoter, an estrogen gene promoter, a tetVP16 promoter, the zinc-inducible sheep metallothionein (MT) promoter, the dexamethasone (Dex)-inducible mouse mammary tumor virus (MMTV) promoter, the T7 polymerase promoter system (WO 98/10088), the ecdysone insect promoter (No et al., Proc. Natl. Acad. Sci. USA, 93:3346-3351 (1996)), the tetracycline-repressible system (Gossen et al., Proc. Natl. Acad. Sci. USA, 89:5547-5551 (1992)), the tetracycline-inducible system (Gossen et al., Science, 268:1766-1769 (1995), see also Harvey et al., Curr. Opin. Chem. Biol., 2:512-518 (1998)), the RU486-inducible system (Wang et al., Nat. Biotech., 15:239-243 (1997) and Wang et al., Gene Ther., 4:432-441 (1997)), or the rapamycin-inducible system (Magari et al., J. Clin. Invest., 100:2865-2872 (1997)). Still, in other embodiments, inducible promoters which may be useful in this context are those which are regulated by a specific physiological state.
In some embodiments, the tissue-specific regulatory sequences bind tissue-specific transcription factors that induce transcription in a tissue-specific manner. In some embodiments, tissue-specific promoters include, but are not limited to, retinoschisin proximal promoter, interphotoreceptor retinoid-binding protein enhancer (RS/IRBPa), rhodopsin kinase (RK), liver-specific thyroxin binding globulin (TBG) promoter, an trypsin promoter, a glucagon promoter, a somatostatin promoter, a pancreatic polypeptide (PPY) promoter, a synapsin-1 (Syn) promoter, a creatine kinase (MCK) promoter, a mammalian desmin (DES) promoter, an α-myosin heavy chain (α-MHC) promoter, or a cardiac Troponin T (cTnT) promoter. Other exemplary promoters include a beta-actin promoter; a hepatitis B virus core promoter (Sandig et al., Gene Ther., 3:1002-9 (1996)); an alpha-fetoprotein (AFP) promoter (Arbuthnot et al., Hum. Gene Ther., 7:1503-14 (1996)); a bone osteocalcin promoter (Stein et al., Mol. Biol. Rep., 24:185-96 (1997)); a bone sialoprotein promoter (Chen et al., J. Bone Miner. Res., 11:654-64 (1996)); a CD2 promoter (Hansal et al., J. Immunol., 161:1063-8 (1998)); an immunoglobulin heavy chain promoter; a T cell receptor α-chain promoter; and neuronal promoters such as the neuron-specific enolase (NSE) promoter (Andersen et al., Cell. Mol. Neurobiol., 13:503-15 (1993)), the neurofilament light-chain gene promoter (Piccioli et al., Proc. Natl. Acad. Sci. USA, 88:5611-5 (1991)), and the neuron-specific vgf gene promoter (Piccioli et al., Neuron, 15:373-84 (1995)), among others which will be apparent to the skilled artisan.
In some embodiments, polynucleotides of the present disclosure are operably linked to a native promoter of a gene which is endogenous to a cell (e.g., a cell comprising a polynucleotide described herein). In some embodiments, the native promoter may be preferred when it is desired that expression of the polynucleotide should mimic the native expression of a gene of interest. In some embodiments, the native promoter may be used when expression of the polynucleotide must be regulated temporally, developmentally, in a tissue-specific manner, or in response to specific transcriptional stimuli. In a further embodiment, other native regulatory sequences, such as enhancer elements, polyadenylation sites, and/or Kozak consensus sequences may also be used to mimic the native expression.
In some embodiments, the regulatory sequence driving expression of a polynucleotide is an RNA pol II promoter. In some embodiments, the regulatory sequence is an RNA pol III promoter, such as U6 or H1. In some embodiments, the regulatory sequence is an RNA pol II promoter. In some embodiments, the regulatory sequence is a CMV enhancer (CMVe). In some embodiments, the regulatory sequence is a chicken β-actin (CBA) promoter. In some embodiments, the regulatory sequence is a CMVe and a CBA promoter. In some embodiments, the regulatory sequence is a CAG promoter. Other examples of regulatory sequences which may be operably linked to a sequence encoding a polynucleotide described herein include a BDNF promoter, an NGF promoter, an EGF promoter, a growth factor promoter, an axon-specific promoter, a dendrite-specific promoter, a brain-specific promoter, a hippocampal-specific promoter, a kidney-specific promoter, an elafin promoter, a cytokine promoter, an interferon promoter, an α1 antitrypsin promoter, a brain cell-specific promoter, a neural cell-specific promoter, a central nervous system cell-specific promoter, a peripheral nervous system cell-specific promoter, an interleukin promoter, a serpin promoter, a hybrid CMV promoter, a hybrid β-actin promoter, an EF1 promoter, a U1a promoter, a U1b promoter, a Tet-inducible promoter, a VP16 LexA promoter, or a mammalian or avian β-actin promoter.
Non-Limiting Embodiments of 3’ Sequences
In some embodiments, a polynucleotide comprises a polyadenylation sequence following the sequence encoding the polynucleotide and before any other 3’ regulatory sequence (e.g., a 3’ AAV ITR). In some embodiments, a poly(A) signal sequence is inserted following the sequence encoding the polynucleotide and before any other 3’ sequence (e.g., a 3’ AAV ITR), which signals for the polyadenylation of transcribed mRNA molecules. Examples of poly(A) signal sequences include, but are not limited to, bovine growth hormone (bGH) poly(A) signal sequence, SV-40 poly(A) signal sequence, and synthetic poly(A) signal sequences, which are known to cause polyadenylation of eukaryotic transgenes and efficient termination of translation (Azzoni A R et al., J Gene Med. 2007; 9(5):392-402).
In some embodiments, a regulatory sequence that enhances expression of the polynucleotide may further be inserted following the sequence encoding the RNA of interest and before the 3’ AAV ITR and poly(A) signal sequences. An exemplary regulatory sequence includes, but is not limited to, a woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE) (Higashimoto T et al., Gene Ther. 2007; 14(17):1298-304).
(iv) Alternatively-spliced exon
An “exon” refers to certain nucleotide sequences comprising exon sequences in addition to exon regions which are either retained (e.g., spliced-in), excluded (e.g., spliced-out), or spliced together (such as forming one continuous exon from an exon that was previously split into two non-continuous regions) during post-transcriptional splicing of a pre-mRNA or pri-miRNA.
Whether an exon is spliced-in or spliced-out may depend on a number of different factors, including, but not limited to, one or more cellular conditions, such as the presence or absence of a disease state (e.g., cancer), type of cell (e.g., liver cell versus skeletal cell), other intracellular conditions, or an external engineered factor (e.g., the administration of an agent such as a ligand). In some embodiments, the term exon may be used interchangeably with the term “alternatively-spliced exon” or “alternative exon.” Differential splicing events can result in different spliced transcripts (e.g., mRNA isoforms) that either retain or exclude the alternative exon. Further, as disclosed herein, exons may comprise one or more positive or negative regulatory cis-elements that exert a positive or negative regulatory control on the expression of a coding region of interest (or portions thereof). Separately, exons may comprise one or more positive or negative regulatory cis-elements that exert positive or negative regulatory control on the expression of an RNA, such as one encoding a protein (e.g., an mRNA encoding a therapeutic protein) or a miRNA.
Exons may be found in nature in a naturally-occurring gene, or may be modified by changing or altering the sequence thereof, including adding or changing the splice site, and/or adding or changing a positive or negative regulatory cis-element (e.g., a ligand-responsive sequence). Such altered exons may be referred to as “recombinant” or “synthetic” exons. “Recombinant” or “synthetic” may in some embodiments include naturally occurring exons that have been placed into a heterologous gene (e.g., an unmodified exon placed into a non-natural context). In some embodiments, the cis-elements mediate localization to a specific cellular compartment, such as, for example, an organelle, the cytoskeleton, plasma membrane, the endoplasmic reticulum, the mitochondria, the nucleus, etc.
In some embodiments, the polynucleotide (e.g., a transgene or a cassette) comprises an alternative exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an alternative exon as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256. In some embodiments, the polynucleotide (e.g., a transgene or a cassette) comprises an alternative exon comprising a nucleic acid sequence of an alternative exon as set forth in any one of SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
(v) Cassette
As used herein, the term “cassette” refers to any set of introns and/or exons (including an alternatively-spliced exon) capable of exhibiting a splicing pattern to produce different spliced transcripts (e.g., mRNA isoforms).
In some embodiments, the cassette comprises an alternatively-spliced exon. In some embodiments, a sequence comprising the intronic sequences (or portions thereof) flanking the alternatively-spliced exon may be referred to as an “alternative splicing cassette” or equivalently, “alternatively-spliced exon cassette” or “alternative exon cassette.” When situated in an alternatively-spliced exon cassette, an alternatively-spliced exon may be alternatively referred to as a “cassette exon.” For purposes of clarity, a “cassette,” and in particular, an “alternatively-spliced exon cassette,” may exclude a coding region of interest, but also may be configured to be operatively linked to any coding region of interest such that the alternatively-spliced exon cassette regulates the expression of the coding region of interest.
In some embodiments, the term “cassette” refers to a set of introns, alternative exon(s) and ligand-responsive aptamer capable of exhibiting a splicing pattern to produce differentially spliced transcript (e.g., miRNA or mRNA isoforms). In some embodiments, the term “cassette” refers to a set of introns, alternative exon(s) and ligand-responsive sequence that is not an aptamer (e.g., a ligand-responsive exon) capable of exhibiting a splicing pattern to produce differentially spliced transcript (e.g., miRNA or mRNA isoforms).
In some embodiments, the terms “cassette,” “expression cassette,” “inducibly-spliced cassette,” “inducibly-spliced exon cassette” or “alternatively-spliced cassette” may be used equivalently or interchangeably. As such, the inducibly-spliced cassettes of the present disclosure can be considered to be “ligand-responsive” as the presence of the ligand-responsive sequence, such as a ligand-responsive exon or a ligand-responsive aptamer, in the cassette induces the splicing of the transgene comprising the cassette. When situated in an inducibly-spliced cassette, an alternative exon may be alternatively referred to as a “cassette exon.” For purposes of clarity, a “cassette,” and in particular, an “inducibly-spliced cassette,” may exclude a coding region of interest, but also may be configured to be operatively linked to any coding region of interest such that the inducibly-spliced exon cassette regulates the expression of the coding region of interest. Such an example would be an inducibly-spliced exon cassette comprising in its non-spliced form a cis-regulatory element that either negatively or positively regulates the expression of a coding sequence to which it is operatively linked. In this way, the presence of a ligand which binds to the ligand-responsive sequence (e.g., an aptamer) of the cassette would result in a splicing reaction which would alter the functionality of the cassette acting as a cis-regulatory element thereby inducibly changing the expression patterns of the coding region to which the cassette is operatively linked. In this way, the cassette comprising the introns, alternative exon, and the ligand-responsive sequence (e.g., aptamer) may act as a riboswitch by regulating the splicing patterns of the transgene based on the presence or absence of the ligand.
Alternatively, another example would be a non-functional start codon (e.g., a start codon provided in the two non-continuous regions of an exon or provided in two separate exons), wherein upon inducing splicing, a functional start codon is produced which promotes protein translation of the downstream sequence. In some embodiments, the intronic sequences that split the exon are positioned near an alternative exon and a ligand-responsive aptamer which regulates splicing of the inducibly-spliced cassette. In some embodiments, the cassette comprises a premature stop codon which regulates the translation of the transgene and is spliced out only in the presence of a ligand. In some embodiments, the inducibly-spliced cassette comprises a miRNA gene which is non-functional in the absence of the ligand and functional only upon splicing of the cassette in the presence of the ligand.
In some embodiments, the cassette is inserted without making any changes to the sequence flanking the insertion site (e.g., at a genomic site in a host cell). However, in some embodiments one or more nucleotide sequence changes are made in one or both flanking regions (e.g., at the positions immediately flanking the site of insertion). In some embodiments, the one or more nucleotide changes render either or both flanking sequences more compatible with splicing. In some embodiments, the one or more nucleotide changes result in either or both flanking sequences becoming effective 3’ and/or 5’ splice sites. In some embodiments, the one or more nucleotide changes include introducing one or more sequences that support an effective dynamic range between alternative splicing events of a ligand-induced alternatively spliced exon described in this application. In some embodiments, the one or more nucleotide changes include introducing one or more flanking sequences described in this application.
In some embodiments, the cassette comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260. In some embodiments, the cassette comprises a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
In some embodiments, the cassette comprises an exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an exon as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143. In some embodiments, the cassette comprises at least two exons comprising a nucleic acid sequence of an exon as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
In some embodiments, the cassette comprises at least two exons having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to the nucleic acid sequences of two exons as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143. In some embodiments, the cassette comprises at least two exons comprising the nucleic acid sequences of two exons as set forth in any one of SEQ ID NOs: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
In some embodiments, the cassette comprises an alternative exon having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an alternative exon as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, or 2137, 2236, or 2247-2256. In some embodiments, the cassette comprises an alternative exon comprising a nucleic acid sequence of an alternative exon as set forth in any one of SEQ ID NOs: 2084, 2094, 2100, 2103, 2106, 2114, or 2137, 2236, or 2247-2256.
In some embodiments, the cassette comprises an intron having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of an intron as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141. In some embodiments, the cassette comprises an intron comprising a nucleic acid sequence of an intron as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
In some embodiments, the cassette comprises at least two introns having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to the nucleic acid sequences of two introns as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141. In some embodiments, the cassette comprises at least two introns comprising a nucleic acid sequence of two introns as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
In some embodiments, the cassette comprises at least one 3' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 3’ splice site as set forth in SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239. In some embodiments, the cassette comprises at least one 3' splice site comprising a nucleic acid sequence of a 3’ splice site as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
In some embodiments, the cassette comprises at least one 5' splice site having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 5’ splice site as set forth in Tables 7, 25, 26, or 34. In some embodiments, the cassette comprises at least one 5' splice site comprising a nucleic acid sequence of a 5’ splice site as set forth in any one of Tables 7, 25, 26, or 34.
In some embodiments, the cassette comprises at least one ligand-responsive sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NO: 2086, 2095, or 2112. In some embodiments, the cassette comprises at least one ligand-responsive sequence comprising a nucleic acid sequence of a ligand-responsive sequence as set forth in SEQ ID NO: 2086, 2095, or 2112. In some embodiments, the cassette comprises at least one ligand-responsive aptamer having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260. In some embodiments, the cassette comprises at least one ligand-responsive aptamer comprising a nucleic acid sequence of a ligand-responsive aptamer as set forth in SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260.
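The sequence identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences are already aligned and of equal length (an ungapped comparison), which is a simplification chosen for illustration; the disclosure does not specify an alignment method here. The function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
# Identity tiers recited in the text, in percent.
IDENTITY_TIERS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(query: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    Assumption: the sequences are pre-aligned; no gaps are handled.
    """
    if len(query) != len(reference):
        raise ValueError("this sketch only handles equal-length sequences")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def tiers_met(identity: float) -> list[int]:
    """Return every recited threshold that the identity value satisfies."""
    return [tier for tier in IDENTITY_TIERS if identity >= tier]


# Hypothetical 20-nucleotide sequences differing at one position.
reference = "ACGTACGTACGTACGTACGT"
query = "ACGTACGTACGAACGTACGT"
identity = percent_identity(query, reference)
print(identity, tiers_met(identity))
```

In practice, sequences of different lengths would first be aligned with a dedicated alignment tool; the tier check itself is independent of how the identity value is obtained.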
(vi) Ligand-responsive sequences
Overview of Ligand-Responsive Sequences
In some embodiments, polynucleotides of the present disclosure comprise a ligand-responsive sequence. As used herein, a “ligand-responsive sequence” refers to a polynucleotide (e.g., an RNA sequence found in a pre-mRNA) having a sequence capable of binding a ligand.
In some embodiments, a ligand-responsive sequence binds a ligand to regulate alternative splicing of an RNA comprising said ligand-responsive sequence. In some embodiments, binding of a ligand to a ligand-responsive sequence induces a specific combination of 5’ and 3’ splice sites to be used during splicing. In some embodiments, polynucleotides that comprise ligand-responsive sequences have a balance of splice strengths, such that addition of a ligand sufficiently changes the way that the spliceosome recognizes the sequences involved in regulating splicing.
In some embodiments, a ligand-responsive sequence comprises approximately 2-200 nucleotides in length. In some embodiments, a ligand-responsive sequence comprises approximately 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-20, 20-30, 30-40, 40-50, 50-60, 70-80, 80-90, 90-100, 100-125, 125-150, 150-175, 175-200, 200-250, or 250-300 nucleotides in length. However, in other embodiments, a ligand-responsive sequence may be greater than 300 nucleotides.
In some embodiments, a ligand-responsive sequence is generated by modifying a natural exon, natural intron, and/or natural splice site. In some embodiments, such modifications comprise substituting, deleting, and/or inserting one or more nucleotides (e.g., nucleotides in a sequence known to bind to ligands) to enhance ligand binding. In some embodiments, a ligand-responsive sequence is generated by completely replacing an exon, intron, and/or splice site with a sequence not naturally found in the gene. In some embodiments, a ligand-responsive sequence, or a portion thereof, is found in an exon. In some embodiments, a ligand-responsive sequence, or a portion thereof, is found in an intron. In some embodiments, a ligand-responsive sequence, or a portion thereof, is found in a 5’ splice site. In some embodiments, a ligand-responsive sequence, or a portion thereof, is found in a 3’ splice site. In some embodiments, a ligand-responsive sequence, or a portion thereof, is placed in an intron downstream of a 5’ splice site. In some embodiments, a ligand-responsive sequence, or a portion thereof, is placed in an intron upstream of a 3’ splice site. In some embodiments, a ligand-responsive sequence, or a portion thereof, spans an exon-intron boundary (e.g., a first portion of the ligand-responsive sequence is found in an exon and a separate portion thereof is found in an adjacent intron). For example, in some embodiments, a first portion of a ligand-responsive sequence comprises approximately 1-10, 10-20, 20-30, 30-40, 40-50, or more nucleotides and may be located in an exon which is located immediately 5’ of an intron comprising a second portion of the ligand-responsive sequence comprising approximately 1-10, 10-20, 20-30, 30-40, 40-50 or more nucleotides.
In some embodiments, a ligand-responsive sequence, or a portion thereof, is found 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, or more nucleotides upstream or downstream from a 5’ splice site. In some embodiments, a ligand-responsive sequence, or a portion thereof, is found 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, or more nucleotides upstream or downstream from a 3’ splice site. In some embodiments, the ligand-responsive sequence, or a portion thereof, is found in an intron, an exon (e.g., an alternative exon), and/or a splice site.
In some embodiments, a polynucleotide may comprise at least one ligand-responsive sequence. In some embodiments, a polynucleotide may comprise a plurality (e.g., 2, 3, 4, or more) of ligand-responsive sequences. In some embodiments, a polynucleotide comprising a plurality of ligand-responsive sequences may be responsive to more than one ligand.
Non-limiting examples of ligand responsive sequences are found in Examples 7 and 10 (e.g., SEQ ID NOs: 2086, 2095, 2112, 2138, 2183, 2186, 2206-2211, 2213-2220, or 2236-2260, and those in Table 34).
Non-limiting Embodiments of Ligand Binding
In some embodiments, a ligand-responsive sequence encodes an RNA sequence that is capable of binding RNA. In some embodiments, such a ligand may be capable of binding an RNA (e.g., a sequence in a splice site, exon, intron, and/or aptamer, or a sequence wherein distinct portions thereof are found in a splice site, exon, and/or intron, such as sequences found in pre-mRNA) and/or one or more components of the spliceosome. In some embodiments, the binding affinity may be characterized by a dissociation constant in the micromolar, nanomolar, or femtomolar range.
In some embodiments, ligand-responsive sequences bind biomolecules such as, but not limited to, proteins, peptides, carbohydrates, lipids, nucleic acids, and combinations thereof such as glycoproteins or lipidated proteins. In some embodiments, the ligand is a small molecule (e.g., a drug molecule). In some embodiments, the ligand is a nucleic acid (e.g., an antisense oligonucleotide, such as an exon-skipper).
In some embodiments, a ligand-responsive sequence comprises affinity to a non-toxic ligand. For example, in some embodiments, such a ligand will be tolerable (e.g., does not result in cytotoxicity) across a broad range of concentrations that are sufficient to regulate alternative splicing. In some embodiments, a ligand-responsive sequence comprises binding affinity to a ligand that is cell permeable. In some embodiments, a ligand-responsive sequence comprises binding affinity to a ligand that is expressed in a cell (e.g., an ASO encoded by a nucleic acid in the cell). In some embodiments, for example, a ligand-responsive sequence may comprise an exon from a gene that exhibits alternative splicing in the presence of an exon-skipping ASO (e.g., an exon 7 from a SMN2 gene further comprising flanking introns which may be derived from the SMN2 gene or an alternatively spliced exon from a dystrophin gene).
In some embodiments, a ligand-responsive sequence comprises a sequence with binding affinity to a specific ligand when it is expressed as an RNA and adopts a three-dimensional conformation that specifically binds the ligand. In some embodiments, such sequences facilitate alternative splicing of polynucleotides as described herein. However, in other embodiments, a ligand-responsive sequence (e.g., one found in an RNA capable of binding a ligand) comprises a sequence with binding affinity to a plurality of ligands.
In some embodiments, the ligand-responsive sequence binds risdiplam. In some embodiments, a sequence capable of binding risdiplam comprises WGAGTAAGW, wherein W is A or T. In some embodiments, the ligand-responsive sequence binds branaplam. In some embodiments, a sequence capable of binding branaplam comprises ATTTAACATTTTTGAGTCAATCCAAGTAATGCAGGAGGTTCATGATTGTGTAGA (SEQ ID NO: 2187).
In some embodiments, the ligand-responsive sequence binds tetracycline. In some embodiments, a sequence capable of binding tetracycline comprises TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2188), wherein W is A or T, wherein D is A, G, or T, wherein M is A or C, wherein K is G or T, and wherein H is A, C, or T. In some embodiments, a sequence capable of binding tetracycline comprises TAAAACATACCAYMCGKAAMCGKMTGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2189), wherein Y is C or T, M is A or C, and K is G or T.
Ligand-Responsive Aptamers
In some embodiments, ligand-responsive sequences are aptamers. The term “aptamer” or “ligand-responsive aptamer” as used herein, refers to an oligonucleotide (e.g., single-stranded DNA (ssDNA) or single-stranded RNA (ssRNA)) that can specifically bind to a ligand. As used herein, the term “ligand-responsive aptamer” may be used to describe an aptamer which changes its structural conformation as a result of binding to a ligand. In some embodiments, whether the aptamer is spliced out or retained in the transgene is not a direct result of binding to the ligand. In some embodiments, it is dependent on the splice site strength which is regulated by the aptamer (and the location of the aptamer relative to the intron and/or exon sequences that are either spliced out or retained).
An aptamer binds to its target with high affinity, selectivity, and specificity (see, e.g., Keefe et al., Aptamers as therapeutics. Nat. Rev. Drug Discov. 2010;9:537-550; Jayasena S.D. Aptamers: An emerging class of molecules that rival antibodies in diagnostics. Clin. Chem. 1999;45:1628-1650). Aptamer binding is determined by its tertiary structure. Target recognition and binding of an aptamer involves three-dimensional, shape-dependent interactions as well as hydrophobic interactions, base-stacking, and intercalation. It will be known to those in the art that a ligand refers to a target molecule to which a separate molecule (e.g., an aptamer) binds with specific chemical affinity. In some embodiments, ligands of aptamers include biomolecules such as, but not limited to, proteins, peptides, carbohydrates, lipids, nucleic acids, and combinations thereof such as glycoproteins or lipidated proteins. In some embodiments, a target molecule of an aptamer is a small molecule or a toxin. In some embodiments, aptamers bind cells (e.g., live cells). In some embodiments, an aptamer binds to drug molecules, such as tetracycline, branaplam, or risdiplam.
In some embodiments, aptamers are screened for their ability to bind to ligands through various methods known in the art such as SELEX (see, e.g., Ruscito & DeRosa. Small-Molecule Binding Aptamers: Selection Strategies, Characterization, and Applications. Front. Chem. 2016:4; 1.). As such, an aptamer is said to be “ligand-responsive” if it binds to a ligand target molecule. In some embodiments, the terms “aptamer” and “ligand-responsive aptamer” can be used interchangeably. Responding to a ligand may entail a conformational change in the ligand-responsive aptamer thereby altering the 3-D shape of the aptamer upon assuming its bound-state in the presence of its ligand.
In some embodiments, an aptamer comprises between 20 and 60 nucleotides, between 25 and 55 nucleotides, between 30 and 50 nucleotides, between 35 and 45 nucleotides, between 20 and 50 nucleotides, between 20 and 40 nucleotides, between 25 and 40 nucleotides, between 20 and 30 nucleotides, between 30 and 40 nucleotides, between 30 and 60 nucleotides, between 40 and 60 nucleotides, or between 50 and 60 nucleotides. However, in other embodiments, aptamers may comprise more than 60 nucleotides (e.g., approximately 80, 100, 120, 140, etc).
In some embodiments, an aptamer comprises a first stem region and a second stem region. In some embodiments, for example, ligand-responsive aptamer stem length influences the sensitivity of an RNA to ligand binding to effect splicing. In some embodiments, a stem region comprises at least two nucleotides. In some embodiments, a stem region comprises 1-5 nucleotides. In some embodiments, a stem region may comprise approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides. In some embodiments, a stem region comprises more than 15 nucleotides. In some embodiments, the first stem region and the second stem region are the same length. In some embodiments the first stem region and the second stem region are different lengths. In some embodiments, the first stem region and the second stem region differ in length by approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides.
In some embodiments, an aptamer comprises a loop region. In some embodiments, a loop region may comprise 1-10, 10-20, 20-30, 30-40, 40-50, or more nucleotides (e.g., 50-75, 75-100, etc.). In some embodiments, an aptamer comprises a plurality of stems and loops. For example, in some embodiments, an aptamer may comprise 2, 3, 4, 5, or more loops each associated with their own respective first stem region and second stem region.
In some embodiments, an aptamer regulates the activity of 3’ and 5’ splice sites. In some embodiments, splice sites may be part of the aptamer structure (e.g., to influence its 3D conformation to effect splicing). In some embodiments, splice sites are not found in an aptamer sequence but are located within 1-5, 5-10, 10-15, 15-20, or 20-30 nucleotides of a sequence capable of binding a ligand.
In some embodiments, a first stem region is located downstream of a 3’ splice site. In some embodiments, a first stem region is located upstream of a 3’ splice site. In some embodiments, a sequence that is not a stem region comprising approximately 1-10, 10-20, 20-30, or more nucleotides is found downstream of the 3’ splice site and first stem region and upstream of the remaining aptamer sequence. In some embodiments, a stem region is located upstream of a 5’ splice site. In some embodiments, a stem region is located downstream of a 5’ splice site. In some embodiments, a sequence that is not a stem region approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides long is found between the 5’ splice site and the second stem region.
In some embodiments, a polynucleotide comprising a ligand-responsive aptamer comprises a general structure of: [upstream 3’ splice site]-[first stem region]-[5’ splice site reverse complementary sequence]-[Ligand-Binding Sequence]-[5’ splice site]-[second stem region].
In some embodiments, a polynucleotide comprising a ligand-responsive aptamer comprises a general structure of: [upstream 3’ splice site]-[first stem region]-[5’ splice site reverse complementary sequence]-[Ligand-Binding Sequence]-[5’ splice site]-[sequence comprising at least 2 nucleotides]-[second stem region]. In some embodiments, said sequences may be flanked by one or more introns and/or exons.
In some embodiments, for example, a polynucleotide comprising a ligand-responsive aptamer comprises a general structure of: [EXON]-[INTRON]-[upstream 3’ splice site]-[first stem region]-[5’ splice site reverse complementary sequence]-[Ligand-Binding Sequence]-[5’ splice site]-[sequence comprising at least 2 nucleotides]-[second stem region]-[INTRON]-[downstream 3’ splice site]-[EXON]. In some embodiments, an aptamer binds risdiplam. In some embodiments, the aptamer comprises WGAGTAAGW (SEQ ID NO: 2261), wherein W is A or T.
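The bracketed general structures above can be read as an ordered concatenation of sequence elements. The following is a minimal sketch of that reading, assuming DNA-alphabet strings and that the “5’ splice site reverse complementary sequence” element is the exact reverse complement of the 5’ splice site element. Every sequence below is a hypothetical placeholder chosen for illustration, not a sequence from the disclosure, and the function names are likewise illustrative.

```python
# Complement table for the DNA alphabet (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assemble_aptamer_region(upstream_3ss: str, first_stem: str,
                            ligand_binding: str, five_ss: str,
                            spacer: str, second_stem: str) -> str:
    """Concatenate elements in the order of the second general structure.

    Order: [upstream 3' splice site]-[first stem region]-
    [5' splice site reverse complementary sequence]-[ligand-binding sequence]-
    [5' splice site]-[sequence of at least 2 nucleotides]-[second stem region].
    """
    if len(spacer) < 2:
        raise ValueError("spacer must comprise at least 2 nucleotides")
    parts = [upstream_3ss, first_stem, reverse_complement(five_ss),
             ligand_binding, five_ss, spacer, second_stem]
    return "".join(parts)


# Hypothetical placeholder elements (assumptions, for illustration only).
region = assemble_aptamer_region(
    upstream_3ss="TTTCAG", first_stem="GGCC", ligand_binding="AAAA",
    five_ss="GTAAGT", spacer="AT", second_stem="GGCC")
print(region)
```

Flanking exons and introns, as in the third general structure, would be added as further string elements on either side of the assembled region.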
In some embodiments, an aptamer binds branaplam. In some embodiments, the aptamer comprises ATTTAACATTTTTGAGTCAATCCAAGTAATGCAGGAGGTTCATGATTGTGTAGA (SEQ ID NO: 2187).
In some embodiments, an aptamer binds tetracycline. In some embodiments, the aptamer comprises TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2188), wherein W is A or T, wherein D is A, G, or T, wherein M is A or C, wherein K is G or T, and wherein H is A, C, or T. In some embodiments, the aptamer comprises TAAAACATACCAYMCGKAAMCGKMTGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2189), wherein Y is C or T, M is A or C, and K is G or T.
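The degenerate letters used in the sequences above (W, D, M, K, H, Y) follow the standard IUPAC nucleotide codes, and each degenerate sequence therefore stands for a family of concrete sequences. A minimal sketch of how such a sequence might be expanded into a regular expression and how the size of the family might be counted is given below; the function names are illustrative assumptions, and only the motif strings are taken from the text.

```python
import re
from math import prod

# Standard IUPAC nucleotide codes used in the text (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "D": "AGT", "M": "AC", "K": "GT",
    "H": "ACT", "Y": "CT", "R": "AG", "N": "ACGT",
}


def iupac_to_regex(motif: str) -> re.Pattern:
    """Compile a degenerate motif into a regex of character classes."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in motif.upper()))


def count_variants(motif: str) -> int:
    """Number of concrete sequences encoded by a degenerate motif."""
    return prod(len(IUPAC[base]) for base in motif.upper())


RISDIPLAM_MOTIF = "WGAGTAAGW"
TET_MOTIF_2188 = "TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA"

risdiplam_re = iupac_to_regex(RISDIPLAM_MOTIF)
print(count_variants(RISDIPLAM_MOTIF))   # W at two positions
print(count_variants(TET_MOTIF_2188))    # eight degenerate positions
print(bool(risdiplam_re.fullmatch("TGAGTAAGA")))
```

`fullmatch` is used so that a candidate must match the whole motif; `search` could be used instead to locate the motif inside a longer sequence.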
In some embodiments, a transgene comprises at least one ligand-responsive aptamer that comprises a sequence that is part of a 5' splice site comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence of a 5’ splice site as set forth in SEQ ID NOs: 2086, 2095, 2138, 2188-2189, 2212-2220, or 2236-2239. In some embodiments, the transgene comprises at least one ligand-responsive aptamer comprising a polynucleotide comprising a nucleic acid sequence as set forth in SEQ ID NOs: 2086, 2095, 2138, 2188-2189, 2212-2220, or 2236-2239.
In some embodiments, a ligand-responsive aptamer is provided in a polynucleotide wherein the aptamer sequence is flanked by non-aptamer nucleic acid sequences (e.g., exons and/or introns). In some embodiments, the aptamer is provided in an intron of the transgene. In some embodiments, the aptamer is provided in an alternative exon of the transgene. In some embodiments, the aptamer spans an intron-exon boundary of the transgene.
In some embodiments, upon binding to a ligand, the aptamer alters its 3D conformation thereby conveying ligand-dependent regulatory effects on the polynucleotide within which it is provided. In some embodiments, binding to a ligand enables the aptamer to regulate the alternative splicing of a polynucleotide within which it is provided. In some embodiments, the presence of a ligand increases the translation of an mRNA comprising the ligand-responsive aptamer. In some embodiments, the presence of a ligand decreases the translation of an mRNA comprising the ligand-responsive aptamer. In some embodiments, the presence of a ligand may enhance the expression of a particular isoform of an mRNA sequence or protein upon binding its cognate ligand-responsive aptamer. In some embodiments, the presence of a ligand forces the aptamer to be spliced out of the transgene thereby forming a functional RNA product such as a miRNA. In some embodiments, the ligand-responsive aptamer is present in the intron and, therefore, is spliced out of the transgene regardless of the presence or absence of ligand. In such an embodiment, ligand addition also results in splicing out the alternative exon of the transgene. In some embodiments, splicing out an aptamer from the transgene as a result of ligand addition causes two regions of a non-continuous exon to be spliced together forming a continuous exon sequence. In some embodiments, the ligand-responsive aptamer is alternatively-spliced out of the transgene wherein the presence of the ligand results in the formation of a transgene which does not comprise the ligand-responsive aptamer.
In some embodiments, the transgene comprises at least one ligand-responsive aptamer comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2086, 2095, 2138, 2188-2189, 2212-2220, or 2236-2239. In some embodiments, the transgene comprises at least one ligand-responsive aptamer comprising a polynucleotide comprising a nucleic acid sequence as set forth in SEQ ID NOs: 2086, 2095, 2138, 2188-2189, 2212-2220, or 2236-2239.
Risdiplam-Responsive Sequences
In some embodiments, a ligand-responsive sequence is a risdiplam-responsive sequence.
In some embodiments, risdiplam enhances recognition of 5’ splice sites (e.g., suboptimal or weak 5’ splice sites) by a component of the spliceosome (e.g., the U1 snRNP). In some embodiments, risdiplam enhances pre-mRNA interactions with the U1 snRNP at a 5’ splice site. In some embodiments, risdiplam interacts with exon sequence upstream of a 5’ splice site to either preclude interaction with splicing silencers or recruit splicing enhancers. Accordingly, in some embodiments, a risdiplam-responsive sequence occurs in an alternative exon at. the 5’ splice site and upstream of the downstream intron. In some embodiments, binding of risdiplam to a risdiplam-responsive sequence will lead to intron exclusion. In some embodiments, for example, when the RNA comprises one intron flanked by exons, the presence of risdiplam results in intron removal. A non-limiting example of such embodiments is diagrammed in FIG. 43. In other embodiments, binding of risdiplam to a risdiplam-responsive sequence will lead to alternative exon inclusion. In some embodiments, for example, when the RNA comprises two introns flanking one or more alternative exons, the presence of risdiplam results in inclusion of the one or more alternative exons. A non-limiting example of such embodiments is diagrammed in FIG. 47 A.
In some embodiments, a risdiplam-responsive sequence comprises a sequence of WGA wherein W corresponds to T or A. In some embodiments, a risdiplam-responsive sequence comprises a sequence of GTAAGW wherein W corresponds to T or A. In some embodiments, a risdiplam-responsive sequence is in an exon-intron boundary with a sequence comprising WGA|gtaagw wherein W corresponds to T or A and indicates the exon-intron boundary' (introns may be shown in lower case letter in some instances herein). In some embodiments, a risdiplam-responsive sequence comprises the sequence of AGGAAG which may be located in an exon (e.g., an alternative exon).
In some embodiments, a risdiplam-responsive sequence comprises AGGAAG which is 5’ of the sequence AWGAgtaagw (SEQ ID NO: 2190), wherein W is A or T. In some embodiments, the AGGAAG is preceded by any 5’ sequence and proceeded by any 3’ sequence. In some embodiments, the sequence 5’ sequence preceding the AGGAAG can be 1-5, 5-10, 10- 15, 15-20, or more nucleotides in length. In some embodiments, the sequence 3’ sequence proceeding the AGGAAG can be 1-5, 5-10, 10-15, 15-20, or more nucleotides in length. In some embodiments, the 5’ sequence comprises ATAATTTTTT (SEQ ID NO: 2191), CACTTTTATT (SEQ ID NO: 2192), CATTATAATC (SEQ ID NO: 2193), CCATAAGTTT (SEQ ID NO: 2194), TACTATTTAT (SEQ ID NO: 2195), TCATATCT AT (SEQ ID NO: 2196), or TTAGTATCGT (SEQ ID NO: 2197). In some embodiments, the 3’ sequence comprises GTTACGCTTT (SEQ ID NO: 2198), TTGTGTTGTT (SEQ ID NO: 2199), TTAGTGTGTT (SEQ ID NO: 2200), TGATGTATAT (SEQ ID NO: 2201), TTTATCTATC (SEQ ID NO: 2202), TTTTTTACAG (SEQ ID NO: 2203), or CTATTAGTTA (SEQ ID NO: 2204).
In some embodiments, a risdiplam-responsive sequence comprises the general structure: NNNNNNNNNNAGGAAGNNNNNNNNNNAWGAgtaagw (SEQ ID NO: 2183), wherein N is any nucleotide and W is A or T. In some embodiments, a risdiplam-responsive sequence comprises the general structure: NNNNNNNNNNAGGAAGNNNNNNh\T\n\LAWGAgtaagw? (SEQ ID NO: 2205), wherein N is any nucleotide and W is A or T. In some embodiments, a risdiplam-responsive sequence comprises the general structure YWWKWWWMKYAGGAAGYTAKT(R)WGTTAWGAgtaagw (SEQ ID NO: 2206), wherein Y is C or T, K is G or T, VV is A or T, M is A or C, R is A or G, and (R) is optionally present.
In some embodiments, for example, a risdiplam-responsive sequence comprises CATTATAATCAGGAAGTTAGTGTGTTAAGAgtaagt (SEQ ID NO: 2207). In some embodiments, a risdiplam-responsive sequence comprises TTAGTATCGTAGGAAGCTATTAGTTAATGgtaagt (SEQ ID NO: 2208). In some embodiments, a risdiplam-responsive sequence comprises ATRTCCACTYAAAAAAATCTGGCGATGGGAGCAGAAWGAgtaagw (SEQ ID NO: 2186), wherein R is A or G, Y is C or T, and W is A or T. In some embodiments, for example, a risdiplam-responsive sequence comprises, ATGTCCACTTAAAAAAATCTGGCGATGGGAGCAGAAAGAgtaagt (SEQ ID NO: 2209), ATGTCCACTCAAAAAAATCTGGCGATGGGAGCAGAAAGAgtaagt (SEQ ID NO: 2210), or ATATCCACTTAAAAAAATCTGGCGATGGGAGCAGAAAGAgtaagt (SEQ ID NO: 2211).
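As an illustration of how a concrete sequence relates to the general structure of SEQ ID NO: 2183 (ten arbitrary nucleotides, AGGAAG, ten arbitrary nucleotides, then AWGAgtaagw), the following minimal sketch checks a candidate against that structure and splits it at the exon-intron boundary using the lower-case intron convention noted above. The function names are illustrative assumptions; the example sequence is SEQ ID NO: 2207 as printed in the text.

```python
import re

# General structure of SEQ ID NO: 2183 with N = any nucleotide, W = A or T.
GENERAL_STRUCTURE = re.compile(
    r"[ACGT]{10}AGGAAG[ACGT]{10}A[AT]GAGTAAG[AT]")


def matches_general_structure(seq: str) -> bool:
    """True if the whole sequence fits the SEQ ID NO: 2183 structure."""
    return GENERAL_STRUCTURE.fullmatch(seq.upper()) is not None


def split_exon_intron(seq: str) -> tuple[str, str]:
    """Split at the first lower-case letter (intron shown in lower case)."""
    boundary = next((i for i, ch in enumerate(seq) if ch.islower()), len(seq))
    return seq[:boundary], seq[boundary:]


SEQ_2207 = "CATTATAATCAGGAAGTTAGTGTGTTAAGAgtaagt"

print(matches_general_structure(SEQ_2207))
print(split_exon_intron(SEQ_2207))
```

The check is a whole-sequence match, so sequences that embed the structure within longer flanking context would need `search` rather than `fullmatch`.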
In some embodiments, a risdiplam-responsive sequence comprises a sequence in Variant 3 or Variant 7 (see, e.g., Example 10). In some embodiments, a risdiplam-responsive sequence comprises a sequence in a variant of exon 11b (E11B) of a POMT2 gene (see, e.g., Example 10). In some embodiments, a risdiplam-responsive sequence comprises an A:C mutation at the +10 position in the intron downstream of POMT2 E11B. In some embodiments, a risdiplam-responsive sequence comprises a sequence in YZ312, YZ316, YZ317, or a variant thereof (see, e.g., Example 10).
Branaplam-Responsive Sequences
In some embodiments, a ligand-responsive sequence is a branaplam-responsive sequence. In some embodiments, a branaplam-responsive sequence binds to a ligand to promote alternative exon inclusion. In some embodiments, a branaplam-responsive sequence binds to a ligand to promote alternative exon exclusion. In some embodiments, branaplam enhances exon inclusion via recognition of sequences near the 5’ splice site of an alternative exon. In some embodiments, branaplam regulates interaction (e.g., directly or indirectly) between a 5’ splice site and a spliceosome component (e.g., the U1 snRNP).
In some embodiments, a branaplam-responsive sequence comprises a sequence in YZ231 or YZ232 (see, e.g., Example 10). In some embodiments, a branaplam-responsive sequence comprises a sequence in YZ301 (see, e.g., Example 10). In some embodiments, for example, when the RNA comprises one intron flanked by at least two exons, the presence of branaplam results in intron removal. In other embodiments, for example, when the RNA comprises two introns flanking one or more alternative exons, the presence of branaplam results in inclusion of the one or more alternative exons. In some embodiments, a branaplam-responsive sequence comprises ATTTAACATTTTTGAGTCAATCCAAGTAATGCAGGAGGTTCATGATTGTGTAGA (SEQ ID NO: 2187).
Tetracycline-Responsive Sequences
In some embodiments, a ligand-responsive sequence is a tetracycline-responsive sequence.
In some embodiments, a tetracycline-responsive sequence comprises TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2188), wherein W is A or T, wherein D is A, G, or T, wherein M is A or C, wherein K is G or T, and wherein H is A, C, or T.
In some embodiments, a tetracycline-responsive sequence comprises TAAAACATACCAYMCGKAAMCGKMTGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2189), wherein Y is C or T, M is A or C, and K is G or T.
In some embodiments, a tetracycline-responsive sequence comprises TAAAACATACCTACCGTAACCGGTAGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2212), TAAAACATACCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2095), or TAAAACATACCAGACGGAAACGTCTGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2086). In some embodiments, a tetracycline-responsive sequence is an aptamer. In some embodiments, a tetracycline-responsive aptamer is found in a sequence comprising the general structure of: [upstream 3’ splice site]-[first stem region]-[5’ splice site reverse complementary sequence]-[Tetracycline-Binding Sequence]-[5’ splice site]gt[second stem region].
In some embodiments, the upstream 3’ splice site and downstream 3’ splice site are at least 20 nucleotides long, wherein the last two nucleotides are AG. In some embodiments, the upstream 3' splice site comprises nnnnnnnnnnnnnnnnnnag wherein n is any nucleotide. In some embodiments, the first stem region and the second stem region are at least two nucleotides long. In some embodiments, the first stem region comprises the sequence NN and the second stem region comprises the sequence nn, wherein N/n is any nucleotide. In some embodiments, the 5' splice site reverse complementary sequence and the 5' splice site are at least 7 nucleotides long. In some embodiments, the 5' splice site reverse complementary sequence comprises NNNNNNN and the 5' splice site comprises NNNnnnn, wherein N/n is any nucleotide.
In some embodiments, a tetracycline-responsive aptamer is found in a sequence comprising the general structure of: [EXON]-[INTRON]-[upstream 3' splice site]-[first stem region]-[5' splice site reverse complementary sequence]-[Tetracycline-Binding Sequence]-[5' splice site]gt[second stem region]-[INTRON]-[downstream 3' splice site]-[EXON].
In some embodiments, the upstream 3' splice site may comprise the sequence TCCTCATTGCCTCTCCTT (SEQ ID NO: 2213), TTTCCAACTTATTTCCCT (SEQ ID NO: 2214), CTTACTTTGTATTCCCAT (SEQ ID NO: 2215), AATCTTTATCTCTATTTC (SEQ ID NO: 2216), TGCXICTATCTTACCTTAT (SEQ ID NO: 2217), TGCACTTTCATTCATTTT (SEQ ID NO: 2218), CCACCTTTTTTTATTTTC (SEQ ID NO: 2219), or CCCCCATTTGTCT TCCC X (SEQ ID NO: 2220).
In some embodiments, the upstream 5' splice site reverse complementary sequence may comprise the reverse complement of CAGGTAA, AACGTAA, CAGGTAC, CCGGTAC, ATCGTAA, GCGGTAC, GAGGTAC, ACGGTAG, CAAGTAA, GAGGTGA, CGCGTAA, GTCGTAA, GAGGTAT, AAGGTAT, TTCGTAA, CCGGTGC, GAGGTAG, CTCGTAA, CTGGTAC, AACGTGA, GCGGTAT, CCGGTAG, or C ACC d TGA.
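For illustration, the reverse complement of a listed 5' splice site can be derived mechanically. The sketch below is a hypothetical helper, not part of the disclosure; it assumes standard Watson-Crick pairing over A, C, G, and T and is applied to three of the 7-nucleotide sequences recited above.

```python
# Translation table for Watson-Crick complements (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Three of the 5' splice site sequences listed above.
FIVE_PRIME_SPLICE_SITES = ["CAGGTAA", "AACGTAA", "CAGGTAC"]

if __name__ == "__main__":
    for site in FIVE_PRIME_SPLICE_SITES:
        # Each pair could base-pair within the same transcript.
        print(site, reverse_complement(site))
```

For example, the reverse complement of CAGGTAA computed this way is TTACCTG.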
In some embodiments, the variable sequence for stem region NN and the variable sequence for stem region nn may comprise CA and ac, CC and ac, AC and ac, AC and cc, or AC and ct. In some embodiments, the downstream variable region for 3' splice site nnnnnnnnnnnnnnnnnnnn may comprise the sequence tttctttttctctttttcag (SEQ ID NO: 2237), tttcttattctccctttcag (SEQ ID NO: 2238), or tttcttcttctacctttcag (SEQ ID NO: 2239).
In some embodiments, a tetracycline-responsive aptamer comprises a sequence in YZ150 or a variant thereof (see, e.g., Example 10).
In some embodiments, a tetracycline-responsive aptamer comprises the sequence of SEQ ID NOs: 2086, 2095, 2112 or 2188.
(vii) RNAs Encoded by Polynucleotides
Overview of RNAs Encoded by Polynucleotides
In some embodiments, polynucleotides of the present disclosure comprise a sequence encoding an RNA (e.g., an RNA comprising the sequence of an RNA of interest). As used herein, “RNA of interest” refers to a functional RNA (e.g., an mRNA that can encode a full-length protein, such as a therapeutic protein, or an interfering RNA that can bind to a target transcript).
In some embodiments, the RNA of interest is functional when present in what is referred to herein as a “first RNA”. In other embodiments, the RNA of interest is functional when present in what is referred to herein as a “second RNA”. In some embodiments, the RNA of interest is functional in either form corresponding to the “first RNA” and the “second RNA”, wherein the first RNA and second RNA encode different isoforms of the RNA of interest. In some embodiments, an RNA of interest corresponds to any gene or protein sequence described herein (see, e.g., Examples 1-10).
In some embodiments, a sequence encoding an RNA of interest comprises at least 1-5000 nucleotides in length. In some embodiments, a sequence encoding an RNA of interest is approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9, 1-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1,000, 1,000-1,100, 1,100-1,200, 1,200-1,300, 1,300-1,400, 1,400-1,500, 1,500-1,600, 1,600-1,700, 1,700-1,800, 1,800-1,900, 1,900-2,000, 2,000-2,100, 2,100-2,200, 2,200-2,300, 2,300-2,400, 2,400-2,500, 2,500-2,600, 2,600-2,700, 2,700-2,800, 2,800-2,900, 2,900-3,000, 3,000-3,100, 3,100-3,200, 3,200-3,300, 3,300-3,400, 3,400-3,500, 3,500-3,600, 3,600-3,700, 3,700-3,800, 3,900-4,000, 4,000-4,100, 4,100-4,200, 4,200-4,300, 4,300-4,400, 4,400-4,500, 4,500-4,600, 4,600-4,700, 4,700-4,800, 4,800-4,900, or 4,900-5,000 nucleotides long. In some embodiments, a sequence encoding an RNA of interest is approximately 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, 1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 1700-1800, 1800-1900, 1900-2000, 2000-2100, 2100-2200, 2200-2300, 2300-2400, 2400-2500, 2500-2600, 2600-2700, 2700-2800, 2800-2900, or 2900-3000 nucleotides long.
In some embodiments, a polynucleotide comprises two or more sequences encoding distinct portions of an RNA of interest. In some instances, said sequences may be referred to as a “first sequence”, a “second sequence”, or a “third sequence”. In some embodiments, a portion of an RNA of interest comprises at least 1-5000 nucleotides in length. In some embodiments, a portion of an RNA of interest comprises approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1-3, 1-4, 1-5, 1-6, 1-7, 1-8, 1-9, 1-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1,000, 1,000-1,100, 1,100-1,200, 1,200-1,300, 1,300-1,400, 1,400-1,500, 1,500-1,600, 1,600-1,700, 1,700-1,800, 1,800-1,900, 1,900-2,000, 2,000-2,100, 2,100-2,200, 2,200-2,300, 2,300-2,400, 2,400-2,500, 2,500-2,600, 2,600-2,700, 2,700-2,800, 2,800-2,900, 2,900-3,000, 3,000-3,100, 3,100-3,200, 3,200-3,300, 3,300-3,400, 3,400-3,500, 3,500-3,600, 3,600-3,700, 3,700-3,800, 3,900-4,000, 4,000-4,100, 4,100-4,200, 4,200-4,300, 4,300-4,400, 4,400-4,500, 4,500-4,600, 4,600-4,700, 4,700-4,800, 4,800-4,900, or 4,900-5,000 nucleotides long. In some embodiments, a portion of an RNA of interest comprises approximately 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, 1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 1700-1800, 1800-1900, 1900-2000, 2000-2100, 2100-2200, 2200-2300, 2300-2400, 2400-2500, 2500-2600, 2600-2700, 2700-2800, 2800-2900, or 2900-3000 nucleotides long. In some embodiments, when the sequence of the RNA of interest in the polynucleotide (e.g., in an RNA that has not undergone splicing, such as a pre-mRNA) is discontinuous (e.g., interrupted by a sequence comprising a ligand-responsive sequence), the RNA of interest may be split into two or more portions each comprising 1-10, hundreds, or thousands of nucleotides in length. In some embodiments, for example, the polynucleotide may encode an RNA of interest that is 4000 nucleotides in length, wherein the first sequence comprises 1000 nucleotides of the RNA of interest (e.g., the 5’-most 1000 nucleotides) and the third sequence may comprise 3000 nucleotides of the RNA of interest (e.g., the 3’-most 3000 nucleotides).
In some embodiments, the first sequence, the second sequence, and/or the third sequence comprises at least one exon. In some embodiments, the second sequence comprises at least one alternative exon. In some embodiments, the first sequence, the second sequence, and/or the third sequence comprises at least one intron. In some embodiments, the first sequence, the second sequence, and/or the third sequence comprises at least one splice site. In some embodiments, the first sequence, the second sequence, and/or the third sequence comprises a ligand-responsive sequence. In some embodiments, the second sequence, and/or the third sequence comprise distinct portions of a ligand-responsive sequence.
In some embodiments, the first RNA comprises the first sequence, the second sequence, and the third sequence. In some embodiments, the second RNA comprises the first sequence and the third sequence. In some embodiments, the second RNA lacks the second sequence (e.g., as a result of alternative splicing).
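The relationship between the first RNA and the second RNA described above can be sketched as simple concatenation. The example below is illustrative only: the function name `assemble_isoforms` is hypothetical, the sequences are placeholder strings rather than sequences from this disclosure, and the 60-nucleotide length chosen for the second sequence is an assumption made for the example.

```python
def assemble_isoforms(first_seq: str, second_seq: str, third_seq: str):
    """Return (first_rna, second_rna) from the three encoded sequences.

    first_rna retains the second sequence; second_rna lacks it
    (e.g., as a result of alternative splicing).
    """
    first_rna = first_seq + second_seq + third_seq
    second_rna = first_seq + third_seq
    return first_rna, second_rna


# Placeholder sequences (hypothetical): 1000 nt, 60 nt, and 3000 nt.
first_seq = "A" * 1000
second_seq = "C" * 60
third_seq = "G" * 3000

first_rna, second_rna = assemble_isoforms(first_seq, second_seq, third_seq)

if __name__ == "__main__":
    print(len(first_rna), len(second_rna))
```

With these placeholder lengths, the second RNA is 4000 nucleotides long and the first RNA is longer by the length of the second sequence.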
Non-Limiting Embodiments of RNAs Encoded by Polynucleotides
In some embodiments, an RNA of interest encodes a marker. Non-limiting examples of markers include cell surface proteins (e.g., an antibody or antigen-binding fragment thereof, receptors, membrane proteins which become glycosylated upon expression in a cell, etc.), luciferase or variants thereof, alkaline phosphatase or variants thereof, beta-galactosidase or variants thereof, and fluorescent markers (e.g., mNeonGreen, GFP (e.g., SEQ ID NO: 28), EGFP, Superfold GFP, Azami Green, mWasabi, TagGFP, TurboGFP, acGFP, zsGreen, T-sapphire, EBFP, EBFP2, Azurite, TagBFP, ECFP, mECFP, Cerulean, mTurquoise, CyPet, AmCyan1, TagCFP, mTFP1, EYFP, mCitrine, TagYFP, phiYFP, zsYellow1, mBanana, Kusabira Orange, mOrange, dTomato, DsRed, mTangerine, mRuby, mApple, mStrawberry, AsRed2, mRFP1, mCherry, HcRed1, iRFP720, smURFP, and AQ143).
In some embodiments, an RNA of interest corresponds to a gene selected from the group consisting of: MBNL1; MBNL2; MBNL3; hnRNP A1; hnRNP A2B1; hnRNP C; hnRNP D; hnRNP DL; hnRNP F; hnRNP H; hnRNP K; hnRNP L; hnRNP M; hnRNP R; hnRNP U; FUS; TDP43; PABPN1; ATXN2; TAF15; EWSR1; MATR3; TIA1; FMRP; MTM1; MTMR2; LAMP2; KIF5A; a microdystrophin-encoding gene; C9ORF72; HTT; DNM2; BIN1; RYR1; NEB; ACTA; TPM3; TPM2; TNNT2; CFL2; KBTBD13; KLHL40; KLHL41; LMOD3; MYPN; SEPN1; TTN; SPEG; MYH7; TK2; POLG1; GAA; AGE; PYGM; SLC22A5; OCTN2; ETF; ETFH; PNPLA2; a cytochrome b oxidase-encoding gene; a cytochrome c oxidase-encoding gene; CLCN1; SCN4A; DMPK; CNBP; MYOT; LMNA; CAV3; DNAJB6; DES; TNPO3; HNRPDL; CAPN3; DYSF; an alpha-sarcoglycan-encoding gene; a beta-sarcoglycan-encoding gene; a gamma-sarcoglycan-encoding gene; a delta-sarcoglycan-encoding gene; TCAP; TRIM32; FKRP; FXN; POMT1; FKTN; POMT2; POMGnT1; DAG1; ANO5; PLEC1; TRAPPC11; GMPPB; ISPD; LIMS2; POPDC1; TOR1AIP1; POGLUT2; LAMA2; COL6A1; POMT1; POMT2; DUX4; EMD; PAX7; PMP22; MPZ; MFN2; SMCHD1; SMS; Lamin A/C (LAMN); GJB1; ABCC1; AK125149; ASCC2; BAT2D1; BBX; BRD8; BRE; C17orf70; CAMKK2; CBFB; CCAR1; CCDC7; CD6; CHTF8; COL4A3BP; COL6A3; CUGBP1; CUGBP2; CXorf45; DENND3; DGUOK; DKFZp762G094; DNAJC7; DNASE1; EIF4A2; EIF4G2; EIF4H; EXOC7; EZH2; FAM120A; FAM136A; FAM36A; FARSB; FBXO38; FGFR1OP2; FIP1L1; FOXRED1; FUBP3; GALT; GATA3; GOLGA2; HIF1A; HMMR; HRB; IKZF1; ILF3; IRAK4; IRF1; KCTD13; LEF1; LUC7L; LYRM1; MALT1 e7; MAP2K7; MAP3K7; MAP4K2; MBNL2; MFF; NAE1; NCSTN; NR4A3; NRF1; NUP98; PARP6; PCM1; PLAUR; PLSCR3; PPIL5; PPP5C; PTPRC-E4; PTPRC-E6; PTS; RABL5; RAPH1; SEC16A; SFRS3; SFRS7; SLMAP; SNRNP70; STAT6; TBC1D1; TIMM8B; TIR8; TRA2A; TROVE2; UGCGL1; VAP-B; VAV1; ZNF384; ZNF496; CAMK2B; PKP2; LGMN; NRAP; VPS39; KSR1; PDLIM3; BIN1; ARFGAP2; KIF13A; and PICALM.
In some embodiments, an RNA of interest corresponds to a gene encoding a component of a “CRISPR/Cas system” which may be alternatively referred to as a “CRISPR/Cas” molecule. In some embodiments, a CRISPR/Cas molecule comprises a Cas nuclease (e.g., Cas9 or a variant thereof, Cas12a or a variant thereof, Cas fusion protein comprising CasPhi, CasMini, etc.). In some embodiments, the CRISPR/Cas molecule binds to a guide RNA (gRNA) described herein. In some embodiments, the CRISPR/Cas molecule binds to a gRNA encoded by a polynucleotide regulated by an alternatively spliced sequence described herein. In some embodiments, the CRISPR/Cas molecule binds to a gRNA encoded by a separate polynucleotide that does not comprise an alternatively spliced sequence described herein. In some embodiments, the RNA of interest corresponds to a gRNA. In some embodiments, the gRNA binds to a CRISPR/Cas molecule described herein. In some embodiments, the gRNA binds to a CRISPR/Cas molecule encoded by a polynucleotide regulated by an alternatively spliced sequence described herein. In some embodiments, the gRNA binds to a CRISPR/Cas molecule encoded by a separate polynucleotide that does not comprise an alternatively spliced sequence described herein.
In some embodiments, a CRISPR/Cas molecule is of, or derived from, Streptococcus Staphylococcus aureus (e.g., S. aureus Cas9). In some embodiments, a CRISPR/Cas molecule comprises a Cas nuclease variant that is encoded by a shortened variant sequence (e.g., CasMini). In some embodiments, such a Cas nuclease may be selected in order to fit within the packaging capacity of an rAAV genome.
In some embodiments, a CRISPR/Cas molecule may be selected to promote genomic editing with a suitable gRNA. In some embodiments, the gRNA may bind to a target domain in the genome of a host cell (e.g., when present in a ribonucleoprotein complex with a CRISPR/Cas nuclease). In some embodiments, the gRNA may comprise a targeting domain that may be partially or completely complementary to the target domain. In some embodiments, the gRNA comprises a targeting domain that may be partially or completely complementary to the target domain located in a genomic sequence (e.g., a gene) implicated in a disease or disorder (e.g., a mutated gene). In some embodiments, a gRNA can be unimolecular (having a single RNA molecule, such as a single RNA molecule including both crRNA and tracrRNA sequences covalently bound to each other), sometimes referred to herein as an sgRNA, or can comprise more than one, and typically two, separate RNA molecules. In some embodiments, the targeting domain is 15 to 25 nucleotides in length. In some embodiments, the gRNA is chemically modified.
In some embodiments, the RNA of interest corresponds to an erythropoietin (EPO) gene.
In some embodiments, the RNA of interest corresponds to a GABRG2 gene. In some embodiments, the RNA of interest corresponds to a long protein isoform of GABRG2. In some embodiments, the RNA of interest corresponds to a short protein isoform of GABRG2. In some embodiments, the long protein isoform of GABRG2 comprises a sequence corresponding to exon 9 of the GABRG2 gene. In some embodiments, the short protein isoform of GABRG2 does not comprise a sequence corresponding to exon 9 of the GABRG2 gene. In some embodiments, the RNA of interest corresponds to the CSNK1D gene. In some embodiments, the RNA of interest corresponds to a long protein isoform of CSNK1D. In some embodiments, the RNA of interest corresponds to a short protein isoform of CSNK1D. In some embodiments, the long protein isoform of CSNK1D comprises a sequence corresponding to exon 9 of the CSNK1D gene. In some embodiments, the short protein isoform does not comprise a sequence corresponding to exon 9 of the CSNK1D gene.
Non-Limiting Embodiments of Therapeutic RNAs and Therapeutic Proteins
In some embodiments, an RNA of interest is a therapeutic RNA and/or encodes a therapeutic protein. As used herein, a “therapeutic RNA” or “therapeutic protein” leads to a physiological change that is associated with or expected to at least partially, if not fully, remedy at least one symptom associated with a disease, disorder, or condition. A therapeutic RNA may refer to an RNA expressed from a transgene that is therapeutic as an RNA upon expression in a target cell and without being translated into a protein. A therapeutic protein may refer to any proteinaceous molecule that is translated from an RNA expressed from a transgene which is therapeutic upon translation in a target cell. A therapeutic RNA or protein may be therapeutic for any disease, disorder, or condition described herein upon administration to a subject in need thereof.
In some embodiments, therapeutic RNAs can be, but are not limited to, interfering RNAs (e.g., shRNAs, siRNAs, miRNAs, ncRNAs, piRNAs, pro-siRNAs, etc.), exon-skipping RNAs, enzymatic RNAs, guide RNAs or gRNAs (e.g., sgRNAs) of a CRISPR/Cas editing system (e.g., Cas9-based genome editing and derivatives thereof, such as base editing and prime editing), small nuclear RNAs (snRNAs), ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), and mRNAs.
As used herein, “miRNA” refers to a nucleic acid which comprises several structural and functional characteristics. miRNAs are single-stranded RNAs of about 19-25 nucleotides that regulate the expression, stability, and/or translation of other mRNAs comprising complementary sequences. miRNAs are cleaved from longer endogenous double-stranded hairpin precursors by the enzymes Drosha and Dicer. miRNAs match genomic regions that can potentially encode precursor RNAs in the form of double-stranded hairpins. miRNAs and their predicted precursor secondary structures are phylogenetically conserved. Drosha, Dicer, and Argonaute are crucial regulators of miRNA biosynthesis, maturation, and function. Canonical miRNA biogenesis involves Drosha cleavage of the hairpin-shaped primary miRNA to generate a hairpin precursor with 2 or 3 nucleotide overhangs at the 3' end, and then Dicer cleavage of the precursor miRNA to generate a miRNA duplex. Additionally, the stem-loop structure of pre-miRNA is crucial for miRNA processing wherein disruption of such structures inhibits the Drosha cleavage reaction and, thus, the production of functional miRNAs. Cofactors bind to the pre-miRNA to form a pre-micro ribonucleoprotein (pre-miRNP) and unwind the pre-miRNAs into single-stranded miRNAs. The pre-miRNP is then transformed to miRNP. miRNAs play crucial roles in eukaryotic gene regulation. For instance, miRNAs are thought to interact with target mRNAs through complementary base-pairing which leads to suppressed translation. Separately, miRNAs promote RNA degradation.
Due to their small size of 19-25 nucleotides, the use of quantitative real-time PCR for monitoring expression of mature miRNAs is excluded. Therefore, most miRNA researchers currently use Northern blot analysis combined with polyacrylamide gels to examine expression of both the mature and pre-miRNAs. Primer extension has also been used to detect the mature miRNA. Alternatively, miRNA expression may be assessed by measuring the levels of the target mRNA and/or its protein product.
Examples of miRNAs include, but are not limited to, the miRNA-16-2 gene. In some embodiments, the transgene comprises the scaffold of primary miRNA 16-2 and an miRNA seed sequence of HSUR4 miRNA. In some embodiments, the transgene comprising the pri-miRNA 16-2 scaffold can further comprise a miRNA seed sequence of any miRNA of interest. In some embodiments, the miRNA comprises a sequence of YZ150, YZ232, or YZ301.
In some embodiments, the polynucleotide (e.g., a transgene) comprises at least one miRNA comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 60, 61, or 64. In some embodiments, for example, a miRNA having at least 70% sequence identity relative to SEQ ID NO: 60, 61, or 64 may comprise targeting ability (e.g., reverse complementarity) to a different RNA target but is regulated by a ligand-responsive sequence described herein. In some embodiments, the polynucleotide (e.g., a transgene) comprises at least one miRNA comprising a polynucleotide comprising a nucleic acid sequence as set forth in SEQ ID NO: 60, 61, or 64. In some embodiments, the polynucleotide (e.g., a transgene) comprises at least one exon comprising a miRNA sequence comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 60, 61, or 64. In some embodiments, the polynucleotide (e.g., a transgene) comprises at least one exon comprising a miRNA sequence comprising a polynucleotide comprising a nucleic acid sequence as set forth in SEQ ID NO: 60, 61, or 64.
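As a simplified illustration of the percent-identity thresholds recited above, the sketch below compares two equal-length sequences position by position. This passage does not specify how sequence identity is calculated, so the ungapped comparison, the function names `percent_identity` and `thresholds_met`, and the two short test sequences are all assumptions made for the example; in practice identity is typically determined from a pairwise alignment.

```python
# Identity thresholds recited above, in percent.
THRESHOLDS = (70, 75, 80, 90, 92, 95, 98, 99)


def percent_identity(query: str, reference: str) -> float:
    """Percent of identical positions between two equal-length sequences.

    Simplifying assumption: no gaps, so the sequences must already be
    the same length.
    """
    if len(query) != len(reference):
        raise ValueError("this sketch requires equal-length sequences")
    if not reference:
        raise ValueError("sequences must be non-empty")
    identical = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * identical / len(reference)


def thresholds_met(identity: float) -> list:
    """Return every recited threshold that the identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Hypothetical 10-nt sequences differing at one position.
    identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(identity, thresholds_met(identity))
```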
In some embodiments, therapeutic proteins can be, but are not limited to, enzymes (such as proteases, signaling proteins, transcriptional regulators (e.g., MECP2), Cas9, base editors, prime editors, etc.), enzymatic domains, enzyme substrates, secreted proteins (e.g., progranulin), hormones (e.g., erythropoietin, insulin or a variant thereof, such as a furin-cleavable pro-insulin, etc.), receptors (e.g., chimeric antigen receptors), components of gene editing ribonucleoprotein complexes (e.g., CRISPR/Cas molecules, such as Cas9, base editors, such as adenine base editors and cytidine base editors, prime editors, etc.), a zinc finger nuclease, a TALEN, peptibodies, growth factors, RNA-binding proteins, clotting factors, cytokines, chemokines, activating or inhibitory peptides acting on cell surface receptors or ion channels, cell-permeable peptides targeting intracellular processes, thrombolytics, bone morphogenetic proteins, Fc-fusion proteins, anticoagulants, and antibodies or antigen-binding fragments thereof. In some embodiments, a therapeutic RNA or protein is selected for the purposes of gene replacement therapy. In some embodiments, the therapeutic protein is selected for the purposes of vaccine production against a human pathogen (e.g., a protein, or fragment thereof, comprising an antigen of a human pathogen).
(viii) Introns
As used herein, an “intron” or “intronic sequence” or “intronic regions” can refer to a nucleotide sequence that does not code for a therapeutic protein or therapeutic RNA and is spliced out of the transgene transcript. In some instances, an “intron” or “intronic sequence” or “intronic regions” can refer to alternatively spliced sequence (e.g., an intron found in a polynucleotide comprising a risdiplam-responsive sequence). In some embodiments, such splicing may be regulated by the presence or absence of a ligand. In some embodiments, the terms “intron” and “intronic sequence” may be used interchangeably. In some embodiments, the transgene comprises at least two introns or intronic sequences. An intron, alternatively referred to as a flanking component, may in some embodiments be immediately adjacent to the central component. For example, a central ligand-responsive aptamer may, in some embodiments, be flanked by two introns, wherein such introns are positioned immediately adjacent to the central ligand-responsive aptamer. In other embodiments, for example, a central ligand-responsive sequence (e.g., one comprising an alternative exon) may be flanked by two introns. In some embodiments, the transgene comprises a polynucleotide comprising an exon or exon region at the 5' and 3' ends with a central region comprising at least two introns, an alternative exon and a ligand-responsive aptamer. In some embodiments, introns of the transgene are spliced out of the transgene along with the ligand-responsive aptamer in the presence of the ligand thereby forming a transgene lacking both the at least one intron and the aptamer. In some embodiments, in the absence of ligand, only the introns are spliced out.
In some embodiments, the polynucleotide (e.g., a transgene) comprises at least two introns comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2130, 2141, or 2232-2233. In some embodiments, the polynucleotide (e.g., a transgene) comprises at least two introns comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2130, 2141, or 2232-2233.
In some embodiments, the polynucleotide (e.g., a transgene) comprises at least one intron that comprises a sequence that is part of a 3' splice site comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239. In some embodiments, the polynucleotide (e.g., a transgene) comprises at least one intron that comprises a sequence that is part of a 3' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
In some embodiments, the polynucleotide (e.g., a transgene) comprises at least one intron that comprises a sequence that is part of a 5' splice site comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in Tables 7, 25, 26, or 34. In some embodiments, the polynucleotide (e.g., a transgene) comprises at least one intron that comprises a sequence that is part of a 5' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of Tables 7, 25, 26, or 34.
(viii) Additional Terms
As used herein, an “engineered intron” is an intron which comprises at least one modification, relative to a native intron. For example, an engineered intron may comprise one or more nucleotide deletions, and thus be truncated, relative to a native intron.
As used herein, an “engineered exon” is an exon which comprises at least one modification, relative to a native exon. For example, an engineered exon may comprise one or more nucleotide deletions, and thus be truncated, relative to a native exon.
As used herein, a “flanking” component (e.g., a flanking intron) refers to a component which is located upstream (e.g., 5’) or downstream (e.g., 3’) of a central component (e.g., an exon). A flanking component may in some embodiments be immediately adjacent to the central component, but that is not required by the methods and compositions of the present disclosure. For example, a central alternatively-spliced exon may, in some embodiments, be flanked by two introns, wherein such introns are immediately adjacent to the central alternatively-spliced exon. The same central alternatively-spliced exon may also be flanked by two additional exons, which are located upstream and downstream of the central alternatively-spliced exon, respectively, but which are not immediately adjacent to the central alternatively-spliced exon.
As used herein, a “constitutive exon” is an exon that is present in all spliced transcripts (e.g., mRNA isoforms) formed as a result of splicing a pre-mRNA or miRNA transcripts that are transcribed from a gene. A constitutive exon is therefore common to different mRNA isoforms of a gene.
Additional terms are defined throughout the disclosure.
II. Alternative splicing and models used herein
Through alternative splicing of pre-mRNAs, individual mammalian genes often produce multiple mRNAs (i.e., mRNA isoforms) and resultant protein isoforms that may have related, distinct or even opposing functions. The mRNA and protein isoforms produced by alternative splicing (or equivalently, alternative processing) of primary RNA transcripts may differ in structure, function, localization or other properties. Alternative splicing in particular is known to affect more than half of all human genes, and has been proposed as a primary driver of the evolution of phenotypic complexity in mammals. The number of variants of a gene ranges from two to potentially thousands. The resulting proteins may exhibit different and sometimes antagonistic functional and structural properties, and may inhabit the same cell with the resulting phenotype representing a balance between their expression levels. Defects in splicing have been implicated in human diseases, including cancer.
Aspects of the invention utilize alternative splicing mechanisms as a method of regulating the expression of a transgene (e.g., encoding a therapeutic protein or miRNA). Thus, by manipulating the composition and arrangement of an inducibly-spliced exon cassette, a recombinant viral genome of the present disclosure comprising the inducibly-spliced exon cassette may behave in a predictable manner, and the transgene and/or coding region of interest may be expressed in specific conditions which are therapeutically beneficial (e.g., in a specific cell type, a specific tissue, a disease state, and/or upon an inflammatory response). Thus, aspects of the invention contemplate inducibly-spliced exon cassettes for regulating the expression of coding regions of interest (e.g., encoding therapeutic nucleic acids such as miRNAs and/or therapeutic proteins).
Aspects of the invention utilize alternative splicing mechanisms as a method of regulating the expression of a transgene (e.g., encoding a therapeutic protein). However, unlike naturally occurring alternatively-spliced exons, the alternatively-spliced exons of the application do not necessarily result in alternative sequence isoforms of the encoded protein. In many embodiments, an alternatively-spliced exon impacts the level of protein expression without impacting the sequence of the protein that is expressed. That is, the alternatively-spliced exon is utilized as a means of regulation of the expression of the protein of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in the productive translation of a coding region of interest. In some embodiments, exclusion of the alternatively-spliced exon from the spliced transcript results in the coding region of interest not being translated (e.g., the alternatively-spliced exon is spliced out). In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense mediated decay. In some embodiments, exclusion of the alternatively-spliced exon from the spliced transcript results in the productive translation of the coding region of interest.
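One of the scenarios above, in which retention of the alternatively-spliced exon leads to nonsense mediated decay while its exclusion permits productive translation, can be illustrated with a toy reading-frame check. The sketch below is not a model from this disclosure: the three short sequences are hypothetical, the helper names are invented for the example, and an in-frame stop codon ahead of the final codon is used only as a rough proxy for a transcript that might be subject to nonsense mediated decay, which in cells depends on additional factors.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list:
    """Split a coding sequence into complete codons, starting at position 0."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def has_premature_stop(seq: str) -> bool:
    """True if an in-frame stop codon occurs before the last complete codon."""
    return any(codon in STOP_CODONS for codon in codons(seq)[:-1])


# Hypothetical sequences chosen only to illustrate the reading-frame effect.
upstream_exon = "ATGGCC"
alternative_exon = "GGTTAAC"      # 7 nt: shifts the frame and carries TAA
downstream_exon = "GAAGGCTGA"

included = upstream_exon + alternative_exon + downstream_exon
excluded = upstream_exon + downstream_exon

if __name__ == "__main__":
    print("exon included, premature stop:", has_premature_stop(included))
    print("exon excluded, premature stop:", has_premature_stop(excluded))
```

In this toy case the transcript that retains the alternative exon contains an early in-frame stop codon, whereas the transcript lacking it reads through to the terminal stop codon.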
Thus, by manipulating the composition and arrangement of an alternatively-spliced exon cassette, a recombinant viral genome of the present disclosure comprising the alternatively-spliced exon cassette may behave in a predictable manner, and the transgene and/or coding region of interest may be expressed in specific conditions which are therapeutically beneficial (e.g., in a specific cell type, a specific tissue, a disease state, and/or upon an inflammatory response). Transgenes comprising alternatively-spliced exon cassettes may be designed according to any one of several non-limiting models of alternative splicing (shown in FIGs. 2 or 4-8), each of which is specifically contemplated herein, in addition to other models of alternative splicing. Thus, aspects of the invention contemplate alternatively-spliced exon cassettes for regulating the expression of coding regions of interest (e.g., encoding therapeutic proteins).
In various aspects, the alternatively-spliced exons are spliced-in or spliced-out in a manner that is dependent upon one or more environmental cues, e.g., cell or tissue type, disease state, or intracellular conditions such as the presence of a ligand. The alternatively-spliced exons can be sourced from a naturally occurring gene or may be recombinant, for example, in order to add one or more genetic regulatory elements for influencing expression levels of the transgene and/or coding region of the transgene. Examples of alternatively-spliced exons are disclosed herein.
In various embodiments, the alternatively-spliced exons may comprise one or more regulatory sequences that modulate the expression of a coding sequence of interest. Such regulatory sequences may be referred to as cis-elements. Further, cis-elements that impart a positive regulatory control on a coding sequence of interest may be referred to as positive regulatory cis-elements. To the contrary, cis-elements that impart a negative regulatory control on a coding sequence of interest may be referred to as negative regulatory cis-elements.
Alternatively-spliced exons may be found in nature in naturally-occurring genes, or may be modified by changing or altering the sequence thereof (e.g., derived from a naturally-occurring gene), including adding or changing the splice site, and/or adding or changing a positive or negative regulatory cis-element. The one or more positive or negative regulatory cis-elements may be located within an alternatively-spliced exon, may influence the level of expression of a coding region of interest through positive and/or negative controls, and may include any regulatory sequence which exerts, as a consequence of being spliced-in or spliced-out of the final mRNA, either a positive or negative regulation on the expression of the coding region.
FIG. 4 shows seven non-limiting embodiments contemplated for the structural configuration of a cassette (e.g., comprised within a transgene) for use with a recombinant virus genome, wherein the cassette (e.g., comprised within a transgene) comprises an alternatively-spliced exon and a coding region, wherein the alternatively-spliced exon further comprises at least one positive or negative regulatory cis-element. Non-limiting examples of positive or negative regulatory cis-elements can include, for instance, (1) a nucleotide sequence element that regulates, modulates, or otherwise affects the stability and/or degradation of an mRNA, and (2) a nucleotide sequence element that regulates, modulates, or otherwise affects the translation of an mRNA into one or more encoded polypeptide products (e.g., a therapeutic product). Without limitation, positive or negative regulatory cis-elements may include a translation start codon, a translation stop codon, a ligand-responsive aptamer, a binding site for an RNA binding protein that serves to positively regulate transgene expression, a binding site for an RNA binding protein that serves to negatively regulate transgene expression, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate transgene expression, or a binding site for a nucleic acid molecule (e.g., an siRNA or miRNA).
This list of examples is not intended to place any limitation on the scope or meaning of the positive and negative regulatory cis-elements, and the disclosure embraces any genetic element or region positioned within or at least associated with an alternatively-spliced exon which exerts a positive or negative control on the overall expression of a transgene (e.g., encoding a therapeutic protein).
In some embodiments, the one or more cis-elements can include, but are not limited to, a translation start codon, a translation stop codon, a ligand-responsive aptamer, an siRNA binding site, a miRNA binding site, a sequence forming a stem-loop structure, a sequence forming an RNA dimerization motif, a sequence forming a hairpin structure, a sequence forming an RNA quadruplex, a polypurine tract, a sequence forming a pair of kissing loops, and a sequence forming a tetraloop/tetraloop receptor pair. In some embodiments, cis-elements include binding sites recognized by regulatory elements, such as, for example, RNA binding proteins. In some embodiments, an RNA binding protein capable of exerting regulatory control once bound is an RNA binding protein described in Van Nostrand, et al. (2020), A large-scale binding and functional map of human RNA-binding proteins, Nature, 583: 711-719, which is herein incorporated by reference with respect to its description of RNA binding proteins.
In some embodiments, a transgene comprising an inducibly-spliced exon cassette comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260. In some embodiments, a transgene comprising an inducibly-spliced cassette comprises a polynucleotide sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
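The percent-identity thresholds recited above can be illustrated with a small calculation. The following is a minimal sketch only, assuming two sequences that have already been aligned to equal length with "-" as the gap character. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the sequence listing, and the disclosure does not prescribe a particular alignment method or identity definition.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Assumptions (illustrative only): inputs are pre-aligned, equal-length
    strings; '-' marks a gap; columns that are gaps in both sequences are
    ignored; a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable alignment columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy 24-nt sequences (hypothetical), differing by a single substitution.
reference = "ATGGCCATTGTAATGGGCCGCTGA"
variant = "ATGGCCATTGTAATGGGCCGATGA"

identity = percent_identity(reference, variant)
meets_90_percent = identity >= 90.0
```

With one mismatch in 24 aligned positions, the toy variant would fall within the "at least 90% identical" range under this simplified definition.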
In various embodiments, the cassettes (e.g., comprised within a transgene) may include one or more additional components, including one or more other constitutive exons, and one or more introns. In FIGs. 4A-4C, the constitutive exons not comprising the coding region of interest are represented by narrow rectangles, introns are represented as dashed lines, and the alternatively-spliced exons are represented as shaded narrow rectangles. In some embodiments, the exon or exons comprising the coding region (or portions thereof, in embodiments wherein the coding region is split into separate exons) are indicated as solid thick white rectangles. In other embodiments, the alternatively-spliced exon may contain portions of a coding region of interest.
FIG. 4A is a schematic of an embodiment wherein the alternatively-spliced exon is upstream of the exon encoding the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is to the 5’ of the exon encoding the coding region of interest.
FIG. 4B is a schematic of an embodiment wherein the alternatively-spliced exon is downstream of the exon encoding the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is to the 3’ of the exon encoding the coding region of interest.
FIG. 4C is a schematic of an embodiment wherein the alternatively-spliced exon is positioned between two separate exons encoding portions of the coding region of interest. Said another way, in this embodiment, the alternatively-spliced exon is between the exons encoding the portions of the coding region of interest.
Exemplary Uses of Ligands
In some embodiments, polynucleotides encode an RNA of interest corresponding to any gene described herein (see, e.g., Examples 1-10). In some embodiments, the RNA of interest becomes functional as a result of alternative splicing. In some embodiments, alternative splicing is induced by binding of a ligand (e.g., a small molecule).
In some embodiments, for example, the presence of a ligand results in exclusion of an alternatively spliced sequence (e.g., an intron, exon, aptamer, etc.) from the RNA encoded by the polynucleotide which enables the RNA of interest encoded therein to comprise a continuous, non-interrupted sequence, adopt a functional three-dimensional structure (e.g., such as that of a microRNA), and/or be translated into a protein (e.g., a therapeutic protein).
In some embodiments, as a further example, the presence of ligand results in inclusion of an alternatively spliced sequence (e.g., one or more exons) in the RNA encoded by the polynucleotide which enables the RNA of interest encoded therein to comprise a sequence encoding an RNA of interest, adopt a functional three dimensional structure (e.g., such as that of a microRNA), and/or be translated into a protein (e.g., a therapeutic protein).
In other embodiments, ligand-dependent inclusion or exclusion of an alternatively spliced sequence (e.g., one or more exons) in the RNA encoded by the polynucleotide enables the RNA of interest encoded therein to be differentially expressed (e.g., to be translated into a long or short protein isoform, respectively).
FIG. 4D shows a non-limiting embodiment of an approach that puts a gene sequence under control of a ligand-responsive aptamer. In this embodiment, a naturally occurring gene can be engineered to come under the control of a ligand by inserting the cassette into the gene. The portions upstream and downstream of the site at which the cassette is inserted then become separate exons. In some embodiments, the cassette is inserted without making any changes to the sequence flanking the insertion site. However, in some embodiments one or more nucleotide sequence changes are made in one or both flanking regions (e.g., at the positions immediately flanking the site of insertion). In some embodiments, the one or more nucleotide changes render either or both flanking sequences more compatible with splicing. In some embodiments, the one or more nucleotide changes result in either or both flanking sequences becoming effective 3’ and/or 5’ splice sites. In some embodiments, the one or more nucleotide changes include introducing one or more sequences that support an effective dynamic range between alternative splicing events of a ligand-induced alternatively spliced exon described in this application. In some embodiments, the one or more nucleotide changes include introducing one or more flanking sequences described in this application.
FIG. 4E shows a non-limiting embodiment of a transgene comprising an alternatively-spliced cassette. In this embodiment, the expression cassette comprises a general structure comprising at least one alternative exon, at least two introns flanking the alternative exon, a ligand-responsive aptamer, and a plurality of splice sites. In this embodiment, one exon is positioned 5’ to the cassette sequence and one exon is positioned 3’ to the cassette sequence thereby flanking the intervening at least two introns, alternative exon, ligand-responsive aptamer, and plurality of splice sites.
In this embodiment, at least two exons flanking the cassette are always present in the RNA molecule transcribed from the transgene regardless of the presence of the ligand or splicing reaction outcomes. In this embodiment, the alternative exon comprises the ligand-responsive aptamer wherein the ligand-responsive aptamer regulates the splicing (i.e., removal) of the alternative exon. In this embodiment, when the ligand which binds to the aptamer is absent, the alternative exon is present in the spliced RNA molecule transcribed from the transgene. In this embodiment, the presence of a ligand which binds to the aptamer results in removal of the alternative exon such that the spliced RNA molecule comprises only the at least two exons and lacks the alternative exon, the two introns, and the ligand-responsive aptamer. In this embodiment, when two introns are present, for example, the 5’ most intron is downstream (3’) of the most upstream exon and the 3’ most intron is upstream (5’) of the most downstream exon such that the exons exist at the 5’ and 3’ termini of the cassette sequences which include the introns, alternative exon, and the ligand-responsive aptamer. In this embodiment, the boundaries of an exon-intron sequence comprise splice sites that regulate the splicing of the cassette. In this embodiment, the splicing of the introns occurs regardless of the presence of ligand such that the spliced RNA molecule comprising the cassette sequence lacks the at least two introns. However, only in the presence of ligand are the introns spliced out in addition to the alternative exon and the ligand-responsive aptamer. As illustrated in FIG. 4E, the ligand-responsive aptamer may be located in either one of the introns, in the alternative exon, or may span an intron-exon boundary occurring between the alternative exon and one of the introns.
However, in some embodiments, a ligand-responsive aptamer may be included in one of the flanking exon sequences provided that it is configured such that binding of the ligand affects the splicing of the alternatively spliced exon and the ligand-responsive aptamer. In the embodiment illustrated in FIG. 4E, the splice sites are provided in multiples of two such that two splice sites (a 5’ site and a 3’ site) are always required to regulate the splicing of a sequence. In this embodiment, the 3’ splice site that is 5’ of the alternative exon comprises intronic sequences. In this embodiment, the 5’ splice site that is 3’ of the alternative exon comprises both intronic and exonic sequences such that when the alternative exon is included in the RNA molecule it will comprise a partial sequence that is part of the original 5’ splice site. Non-limiting examples of embodiments illustrating this configuration can be found in Example 7 and SEQ ID NO: 2081.
FIG. 4F shows a non-limiting embodiment of a transgene comprising a non-continuous start codon split by the alternatively spliced cassette. In this embodiment, the exons comprise a non-continuous start codon such that the 3’ most nucleotides of the upstream exon comprise an A or AT and the 5’ most nucleotides of the downstream exon comprise a TG or G, respectively. In this embodiment, the absence of a ligand results in splicing reactions that include the alternative exon and thereby produce an RNA molecule that contains a non-continuous start codon that is disrupted by the alternative exon and is not translated into the full-length protein product. In this embodiment, the presence of a ligand results in splicing reactions that remove the alternative exon and thereby produce an RNA molecule that comprises a continuous start codon provided by the nucleotides of the first and last exon, resulting in translation of the full-length protein product of the transgene. Non-limiting examples of embodiments illustrating this configuration can be found in Example 7. SEQ ID NO: 2131 represents a non-limiting example of a control construct that can be used to assess the inducibility of alternative splicing of a transgene comprising a non-continuous start codon. In this embodiment, the transgene lacks the aptamer and alternative exon. Here, the intron sequence is spliced out thereby converting the non-continuous start codon into a continuous start codon resulting in translation of the transgene. SEQ ID NO: 2132 represents another non-limiting example of a control construct that can be used to assess the inducibility of alternative splicing of a transgene comprising a non-continuous start codon. Here, the alternative exon comprising an aptamer disrupts the start codon thereby preventing translation of the transgene.
FIG. 4G shows a non-limiting embodiment of an alternatively spliced exon cassette comprising a pre-mature stop codon that is inserted between two consecutive coding sequences of a gene (e.g., two exons of a gene). In this embodiment, the exons flanking the cassette are not translated in the absence of ligand due to the presence of a pre-mature stop codon in the alternative exon (e.g., in frame with the reading frame of the upstream exon). In this embodiment, the presence of the stop codon in the alternative exon results in pre-mature termination of translation of the transgene when the alternative exon is not spliced out of the RNA molecule. In this embodiment, the presence of a ligand induces splicing upon binding to the aptamer such that the alternative exon comprising the pre-mature stop codon is removed thereby allowing translation to produce the full-length protein product encoded by the transgene. Non-limiting examples of embodiments illustrating this configuration can be found in Example 7 and SEQ ID NOs: 2091, 2099, 2102, 2105, 2108, 2109-2112, 2116, 2118, 2120, 2123, and 2128. In some embodiments, the pre-mature stop codon can be UAA, UAG, or UGA provided that it is in frame with the reading frame of the first exon. In some embodiments, the stop codon may be provided within the aptamer sequence if the aptamer is provided in the alternative exon. In some embodiments, the stop codon may be upstream or downstream of the aptamer and provided in the alternative exon.
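The split start codon configuration (FIG. 4F) and the pre-mature stop codon configuration (FIG. 4G) can be sketched as simple string operations on the spliced product. The following is an illustrative sketch only: the function names and all toy sequences are hypothetical and are not taken from the sequence listing, introns are assumed to be removed in both splicing outcomes, and sequences are written in DNA form (TAA, TAG, TGA corresponding to UAA, UAG, UGA).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA form of UAA, UAG, UGA


def splice(upstream: str, alt_exon: str, downstream: str, include_alt: bool) -> str:
    """Join the flanking exons; the alternative exon is kept only if included."""
    return upstream + (alt_exon if include_alt else "") + downstream


def start_codon_spans_junction(upstream: str, alt_exon: str, downstream: str,
                               include_alt: bool) -> bool:
    """FIG. 4F idea: is an ATG formed across the 3' end of the upstream exon?

    Covers both splits named in the text: 'A' + 'TG' and 'AT' + 'G'.
    """
    transcript = splice(upstream, alt_exon, downstream, include_alt)
    end = len(upstream)
    return any(transcript[end - k:end - k + 3] == "ATG" for k in (1, 2))


def first_in_frame_stop(transcript: str, orf_start: int = 0):
    """FIG. 4G idea: index of the first stop codon in frame with orf_start."""
    for i in range(orf_start, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            return i
    return None


# FIG. 4F toy case: upstream exon ends in 'A', downstream exon begins with 'TG'.
f_up, f_alt, f_down = "GCCACCA", "GGATAAGGC", "TGGCCTAA"
atg_without_alt = start_codon_spans_junction(f_up, f_alt, f_down, include_alt=False)
atg_with_alt = start_codon_spans_junction(f_up, f_alt, f_down, include_alt=True)

# FIG. 4G toy case: the alternative exon carries an in-frame TAA.
g_up, g_alt, g_down = "ATGGCC", "GGATAAGGC", "AAAGGGTGA"
stop_with_alt = first_in_frame_stop(splice(g_up, g_alt, g_down, include_alt=True))
stop_without_alt = first_in_frame_stop(splice(g_up, g_alt, g_down, include_alt=False))
```

In the toy FIG. 4G case, the first in-frame stop codon lies inside the alternative exon when that exon is retained, and coincides with the terminal stop codon of the downstream exon when the alternative exon is removed.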
FIG. 4H shows a non-limiting embodiment of an alternatively spliced exon cassette that is inserted in a coding sequence for a regulatory RNA molecule. In this embodiment, the at least two exons encode an interfering RNA, such as a miRNA, such that removal of the alternative exon produces a functional miRNA molecule that is capable of regulating gene expression. Non-limiting examples of embodiments illustrating this configuration can be found in Example 7 and SEQ ID NO: 2138. In some embodiments, the aptamer may be provided in an intron sequence, the alternative exon sequence, or may span the alternative exon and a flanking intron. In some embodiments, the sequences encoding the regulatory RNA may comprise a pri-miRNA scaffold and/or miRNA seed sequence.
FIG. 4I shows a non-limiting embodiment of a nucleic acid design to regulate RNA splicing using a ligand-responsive sequence. In this embodiment, an intron splits two exons. Ligand binding to the ligand-responsive sequence results in alternative splicing, wherein the exons are brought together to form an RNA that encodes the protein of interest.
FIG. 4J shows a non-limiting embodiment of a nucleic acid design to regulate RNA splicing using a ligand-responsive sequence. In this embodiment, an intron splits two exons. Ligand binding to the ligand-responsive sequence results in alternative splicing, wherein the exons are disrupted and the RNA cannot encode the protein of interest.
FIG. 4K shows a non-limiting embodiment of a ligand-responsive nucleic acid that can be used to differentially regulate the expression of protein isoforms. The alternative exon is flanked by introns. Ligand binding results in exclusion of the alternative exon from the spliced RNA thereby encoding the shorter isoform of the protein. The absence of the ligand results in inclusion of the alternative exon in the spliced RNA which encodes the longer isoform of the protein.
FIG. 4L shows a non-limiting embodiment of a ligand-responsive nucleic acid that can be used to differentially regulate the expression of protein isoforms. The alternative exon is flanked by introns. Ligand binding results in inclusion of the alternative exon in the spliced RNA thereby encoding the longer isoform of the protein. The absence of the ligand results in exclusion of the alternative exon from the spliced RNA which encodes the shorter isoform of the protein.
FIG. 4M shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. The alternative exon comprises a ligand-responsive sequence and prevents a start codon from being in frame with the RNA. Inclusion of the alternative exon in the presence of the ligand leads to production of the protein corresponding to the RNA. In some embodiments, said nucleic acid is useful in providing an inducible ON switch for regulating synthesis of a protein of interest.
FIG. 4N shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. The alternative exon comprises a ligand-responsive sequence and prevents a start codon from being in frame with the RNA. Inclusion of the alternative exon in the absence of the ligand leads to production of the protein corresponding to the RNA. In some embodiments, said nucleic acid is useful in providing an inducible OFF switch for regulating synthesis of a protein of interest.
FIG. 4O shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. Presence of the alternative exon causes a pre-mature stop codon to be in frame with the RNA. Inclusion of the alternative exon in the presence of the ligand leads to an RNA which cannot be translated into a protein. In some embodiments, said nucleic acid is useful in providing an inducible ON switch for regulating synthesis of a protein of interest.
FIG. 4P shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates translation of an RNA. Presence of the alternative exon causes a pre-mature stop codon to be in frame with the RNA. Exclusion of the alternative exon in the presence of the ligand leads to an RNA which can be translated into a protein. In some embodiments, said nucleic acid is useful in providing an inducible OFF switch for regulating synthesis of a protein of interest.
In some embodiments (e.g., some embodiments of examples illustrated in FIGs. 4E-4P), the alternatively spliced exon and the introns flanking the alternatively spliced exon may include all of the sequences that are useful for splicing. However, in some embodiments, one or more nucleotide changes are also made in one or both flanking exon (e.g., upstream and/or downstream exon) sequences to further support splicing. In some embodiments, one or more nucleic acids described herein provide a high level of differential splicing between the presence of ligand and the absence of ligand. In some embodiments, the dynamic range (e.g., the level of expression of a gene or protein of interest under the control of an alternatively-spliced exon of the present disclosure in the presence of ligand relative to the absence of ligand) can be greater than 5 fold, greater than 10 fold, greater than 25 fold, greater than 50 fold, greater than 100 fold, 100-250 fold, 250-500 fold, 500-1,000 fold, or more.
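The dynamic range described above is a ratio of expression with ligand to expression without ligand. The following minimal sketch shows that arithmetic under stated assumptions: the function names are hypothetical, the two readings are made-up illustrative numbers rather than measured data, and only the "greater than" thresholds named in the text are checked.

```python
# "Greater than N fold" thresholds named in the text.
FOLD_THRESHOLDS = (5, 10, 25, 50, 100)


def dynamic_range(expr_with_ligand: float, expr_without_ligand: float) -> float:
    """Fold change of expression in the presence relative to the absence of ligand."""
    if expr_without_ligand <= 0:
        raise ValueError("expression without ligand must be positive")
    return expr_with_ligand / expr_without_ligand


def thresholds_exceeded(fold: float) -> list:
    """Return every named threshold that the fold change strictly exceeds."""
    return [t for t in FOLD_THRESHOLDS if fold > t]


# Hypothetical reporter readings in arbitrary units (illustrative only).
fold = dynamic_range(expr_with_ligand=1200.0, expr_without_ligand=40.0)
exceeded = thresholds_exceeded(fold)
```

For these illustrative readings the ratio is 30 fold, which exceeds the 5, 10, and 25 fold thresholds but not the 50 fold threshold.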
FIGs. 4E-4P illustrate non-limiting embodiments that refer to Exon 1 and Exon 2 or Exon 1, Exon 2, and Exon 3 as examples. However, the same configuration can be used for other exons of a gene, and in some embodiments Exon 1 and Exon 2 and/or Exon 3 in FIGs. 4E-4P could represent other upstream and downstream exons that are not necessarily the first and second exons of a gene.
FIG. 4Q shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the absence of the ligand results in formation of the complete microRNA which can function to reduce expression of a target transcript. In some embodiments, said nucleic acid is useful in providing an inducible OFF switch for regulating a target transcript.
FIG. 4R shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the presence of the ligand results in formation of the complete microRNA which can function to reduce expression of a target transcript. In some embodiments, said nucleic acid is useful in providing an inducible ON switch for regulating a target transcript.
FIG. 4S shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the absence of the ligand disrupts microRNA structure thereby inhibiting its ability to reduce expression of a target transcript. In some embodiments, said nucleic acid is useful in providing an inducible ON switch for regulating a target transcript.
FIG. 4T shows a non-limiting embodiment of a ligand-responsive nucleic acid that regulates production of a microRNA. Inclusion of the second sequence in the presence of the ligand disrupts microRNA structure thereby inhibiting its ability to reduce expression of a target transcript. In some embodiments, said nucleic acid is useful in providing an inducible OFF switch for regulating a target transcript.
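The four microRNA configurations of FIGs. 4Q-4T differ in two yes/no properties: whether the ligand causes inclusion or exclusion of the second sequence, and whether inclusion completes or disrupts the microRNA. The sketch below encodes that logic under simplifying assumptions: splicing is treated as all-or-nothing, the class and field names are hypothetical, and the ON/OFF label is read as whether a functional microRNA is present when the ligand is present, an interpretation that reproduces the labels given for the four figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroRNASwitch:
    figure: str
    included_with_ligand: bool        # True: ligand causes inclusion of the second sequence
    inclusion_completes_mirna: bool   # True: inclusion forms the complete microRNA
                                      # False: inclusion disrupts microRNA structure

    def mirna_functional(self, ligand_present: bool) -> bool:
        # Simplification: the second sequence is included in exactly one ligand state.
        included = ligand_present == self.included_with_ligand
        return included == self.inclusion_completes_mirna

    def switch_type(self) -> str:
        # Assumed reading: ON means the ligand yields a functional microRNA.
        return "ON" if self.mirna_functional(ligand_present=True) else "OFF"


switches = [
    MicroRNASwitch("4Q", included_with_ligand=False, inclusion_completes_mirna=True),
    MicroRNASwitch("4R", included_with_ligand=True, inclusion_completes_mirna=True),
    MicroRNASwitch("4S", included_with_ligand=False, inclusion_completes_mirna=False),
    MicroRNASwitch("4T", included_with_ligand=True, inclusion_completes_mirna=False),
]
labels = {s.figure: s.switch_type() for s in switches}
```

Under this reading, the microRNA is functional with ligand in the FIG. 4R and FIG. 4S configurations and functional without ligand in the FIG. 4Q and FIG. 4T configurations.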
A ligand may include a variety of molecules including both synthetic and naturally-occurring chemical species. Ligands may include, but are not necessarily limited to, small molecule drugs (e.g., risdiplam, branaplam, etc.), peptides, nucleic acids or modified nucleic acids (e.g., ASOs, such as exon-skipping ASOs), lipids, carbohydrates, and metabolites present in the cell.
Examples of ligands that bind to aptamers of the present disclosure include tetracycline, theophylline, glycine, adenine, guanine and cyclic GMP (cGMP). Aptamers are single-stranded nucleic acids that bind to ligands based on their specific three dimensional shape and chemical affinity. Aptamers may comprise non-modified or modified nucleotides or combinations thereof. Upon binding to a ligand, aptamers undergo conformational changes that may change their functional properties and, by extension, the functional properties of the molecules they are provided in. Non-limiting examples of aptamers include theophylline-binding aptamer and natural aptamers (riboswitches) that bind to adenine, glycine, and guanine.
In some embodiments, a ligand is tetracycline. In some embodiments, the RNA capable of binding a ligand comprises a tetracycline-responsive sequence. In some embodiments, a tetracycline-responsive sequence comprises an aptamer. In some embodiments, a tetracycline-responsive sequence comprises a sequence in YZ150 or a variant thereof (see, e.g., Example 10). In some embodiments, a tetracycline-responsive sequence comprises a sequence described herein (e.g., an aptamer comprising the sequence of SEQ ID NOs: 2086, 2095, 2112 or 2188; see Example 7 and Example 10 for further details).
In some embodiments, a ligand is risdiplam. In some embodiments, risdiplam promotes interaction between a pre-mRNA corresponding to the polynucleotide and the U1 spliceosome at the 5’ splice site. In some embodiments, risdiplam interacts with risdiplam-responsive sequences in exons to preclude interaction with splicing silencers. In some embodiments, risdiplam interacts with risdiplam-responsive sequences in exons to recruit splicing enhancers. In some embodiments, a risdiplam-responsive sequence, or a portion thereof, is present in a 5’ splice site. In some embodiments, a risdiplam-responsive sequence, or a portion thereof, is present in an exon-intron boundary. In some embodiments, the RNA capable of binding a ligand comprises a risdiplam-responsive sequence. In some embodiments, binding of risdiplam to a risdiplam-responsive sequence will lead to intron exclusion. In some embodiments, for example, when the RNA capable of binding a ligand comprises one intron flanked by exons, the presence of risdiplam results in intron removal. A non-limiting example of such embodiments is diagrammed in FIG. 43. In some embodiments, binding of risdiplam to a risdiplam-responsive sequence will lead to alternative exon inclusion. In some embodiments, for example, when the RNA capable of binding risdiplam comprises two introns flanking one or more alternative exons, the presence of risdiplam results in inclusion of the one or more alternative exons. A non-limiting example of such embodiments is diagrammed in FIG. 47A.
In some embodiments, a ligand is branaplam. In some embodiments, the RNA capable of binding a ligand comprises a branaplam-responsive sequence. In some embodiments, a branaplam-responsive sequence comprises a sequence in YZ231 or YZ232 (see, e.g., Example 10). In some embodiments, a branaplam-responsive sequence comprises a sequence in YZ301 (see, e.g., Example 10). In some embodiments, for example, when the RNA capable of binding a ligand comprises one intron flanked by exons, the presence of branaplam results in intron removal. In some embodiments, for example, when the RNA capable of binding branaplam comprises two introns flanking one or more alternative exons, the presence of branaplam results in inclusion of the one or more alternative exons. Various specific embodiments of these general groups of configurations are further shown in FIGs. 3-8, 28, 33, 38, 39, 41, 43, 46A, 47A, 48A, 49A, 50A, and 51A, Tables 7-34, and further described as follows. In some embodiments, a polynucleotide (e.g., a transgene) comprises at least 70% sequence identity relative to at least one of the nucleic acid sequences as set forth in SEQ ID NOs: 2080-2082, 2084, 2086, 2088-2089, 2091-2097, 2099-2121, 2123, 2127-2132, 2135, 2137-2138, 2141-2143, or 2183-2260. In some embodiments, a transgene comprising an alternatively-spliced exon cassette comprises a polynucleotide sequence as set forth in any one of SEQ ID NOs: 45-55, 2236, or 2247-2256. In some embodiments, a transgene comprising an alternatively-spliced exon cassette comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 45-55, 2236, or 2247-2256.
(i) Skipped exon model of alternative splicing
In some embodiments, the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise a skipped exon model of alternative splicing (see, e.g., FIGs. 5A, 6B, and 7A).
Referencing the components as labeled in FIG. 5A, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (e), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); and a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (j), wherein the coding region of interest comprises at its 5’ end a modification comprising the removal of a native ATG start codon (k), and wherein all native ATG start codons located upstream (e.g., 5’) of the heterologous ATG start codon (f) are mutated or deleted.
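The FIG. 5A arrangement can be illustrated by assembling the two possible spliced products and checking whether a start codon sits immediately upstream of the coding region. This is a minimal sketch under stated assumptions: the function name and all toy sequences are hypothetical and are not taken from the sequence listing, both introns are assumed to be removed in either outcome and are therefore omitted, and the first ATG in the transcript is taken as the translation start.

```python
def cds_has_start_codon(exon1: str, alt_exon: str, cds_without_atg: str,
                        include_alt: bool) -> bool:
    """True if the first ATG of the spliced product directly precedes the coding region.

    Introns (b) and (g) are spliced out in both outcomes, so only the
    exonic components (a), (e), and (j) are joined here.
    """
    transcript = exon1 + (alt_exon if include_alt else "") + cds_without_atg
    cds_offset = len(transcript) - len(cds_without_atg)
    first_atg = transcript.find("ATG")
    return first_atg != -1 and first_atg + 3 == cds_offset


# Toy components (hypothetical).
constitutive_exon = "GCCACC"     # (a): contains no ATG
alternative_exon = "CCGGAGATG"   # (e): heterologous ATG (f) at its 3' end
coding_region = "GCCAAATAA"      # (j): native ATG removed (k)

translated_when_included = cds_has_start_codon(
    constitutive_exon, alternative_exon, coding_region, include_alt=True)
translated_when_skipped = cds_has_start_codon(
    constitutive_exon, alternative_exon, coding_region, include_alt=False)
```

In this toy case, only the product that retains the alternatively-spliced exon supplies a start codon for the coding region, so skipping the exon leaves the coding region without a translation start.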
Referencing the components as labeled in FIG. 6B, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (j), wherein the second exonic sequence comprises a constitutive exon.
Referencing the components as labeled in FIG. 7A, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation (e), wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); and a nucleotide sequence comprising a second portion of a coding region of interest having a 5’ to 3’ orientation (j).
In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
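The start-codon variant of the skipped exon model (the FIG. 5A arrangement) can be illustrated with a toy simulation. The following is a minimal sketch only: the sequences, the names `splice` and `first_orf`, and the simplification that translation begins at the first ATG of the mature transcript are all assumptions made for illustration and are not taken from the disclosure.

```python
from typing import Optional

# Hypothetical toy sequences, chosen only for illustration.
CONSTITUTIVE_EXON = "GCCACC"      # component (a): contains no ATG
ALTERNATIVE_EXON = "CTCGACATG"    # component (e): ends in the heterologous ATG (f)
CODING_REGION = "GCTTCTAAATAA"    # component (j): native ATG removed (k), ends in TAA

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(include_alternative_exon: bool) -> str:
    """Join exons into a mature transcript; introns are assumed already removed."""
    exons = [CONSTITUTIVE_EXON]
    if include_alternative_exon:
        exons.append(ALTERNATIVE_EXON)
    exons.append(CODING_REGION)
    return "".join(exons)


def first_orf(transcript: str) -> Optional[str]:
    """Return the reading frame from the first ATG through the first in-frame
    stop codon, or None if the transcript has no ATG or no in-frame stop."""
    start = transcript.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            return transcript[start:i + 3]
    return None


print(first_orf(splice(include_alternative_exon=True)))   # exon retained
print(first_orf(splice(include_alternative_exon=False)))  # exon skipped
```

In this toy case the transcript that retains the alternative exon has a start codon in frame with the coding region, while the transcript that skips it has no ATG at all, which is why upstream native ATG codons are described as mutated or deleted.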
(ii) Retained intron model of alternative splicing
In some embodiments, the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise a retained intron model of alternative splicing (see, e.g., FIGs. 5B, 6C, and 7B).
Referencing the components as labeled in FIG. 5B, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (b), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (c); and a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (d), wherein the coding region of interest comprises at its 5’ end a modification comprising the removal of a native ATG start codon (e), and wherein all native ATG start codons located upstream (e.g., 5’) of the heterologous ATG start codon (c) are mutated or deleted.
Referencing the components as labeled in FIG. 6C, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (b), wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (c); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (d), wherein the second exonic sequence comprises a constitutive exon.
Referencing the components as labeled in FIG. 7B, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (b), wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (c); a nucleotide sequence comprising a second portion of a coding region of interest having a 5’ to 3’ orientation (d); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (e), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (f) and at its 3’ end a 3’ splice acceptor site (g); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (h), wherein the second exonic sequence comprises a constitutive exon.
In some embodiments, retention of the alternative exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
(iii) Alternative 5’ splice site model of alternative splicing
In some embodiments, the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise an alternative 5’ donor site model of alternative splicing (see, e.g., FIGs. 5C, 6D, and 7C).
Referencing the components as labeled in FIG. 5C, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (b), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (c); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (d), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (e) and at its 3’ end a 3’ splice acceptor site (f); and a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (g), wherein the coding region of interest comprises at its 5’ end a modification comprising the removal of a native ATG start codon (h), and wherein all native ATG start codons located upstream (e.g., 5’) of the heterologous ATG start codon (c) are mutated or deleted.
Referencing the components as labeled in FIG. 6D, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (b), wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (c); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (d), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (e) and at its 3’ end a 3’ splice acceptor site (f); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (g), wherein the exonic sequence comprises a constitutive exon.
Referencing the components as labeled in FIG. 7C, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a transgene having a 5’ to 3’ orientation (a); a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation (b), wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (c); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (d), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (e) and at its 3’ end a 3’ splice acceptor site (f); and a nucleotide sequence comprising a second portion of a transgene having a 5’ to 3’ orientation (g).
In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
(iv) Alternative 3’ splice site model of alternative splicing
In some embodiments, the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise an alternative 3’ splice site model of alternative splicing (see, e.g., FIGs. 5D, 6E, and 7D).
Referencing the components as labeled in FIG. 5D, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (b), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (e), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (f); and a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (g), wherein the coding region of interest comprises at its 5’ end a modification comprising the removal of a native ATG start codon (h), and wherein all native ATG start codons located upstream (e.g., 5’) of the heterologous ATG start codon (f) are mutated or deleted.
Referencing the components as labeled in FIG. 6E, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation (b), wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation (e), wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (f); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (g), wherein the exonic sequence comprises a constitutive exon.
Referencing the components as labeled in FIG. 7D, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (f); a nucleotide sequence comprising a second portion of a coding region of interest having a 5’ to 3’ orientation (g); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (h), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (i) and at its 3’ end a 3’ splice acceptor site (j); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (k), wherein the second exonic sequence comprises a constitutive exon.
In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
(v) Mutually exclusive exon model of alternative splicing
In some embodiments, the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise a mutually exclusive exon model of alternative splicing (see, e.g., FIGs. 5E, 6F, and 7E).
Referencing the components as labeled in FIG. 5E, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (e), wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); a nucleotide sequence comprising a third exonic sequence having a 5’ to 3’ orientation (j), wherein the third exonic sequence comprises an alternatively-spliced exon; a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation (k), wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site (l) and at its 3’ end a 3’ splice acceptor site (m); and a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (n), wherein the coding region of interest comprises at its 5’ end a modification comprising the removal of a native ATG start codon (o), and wherein all native ATG start codons located upstream (e.g., 5’) of the heterologous ATG start codon (f) are mutated or deleted.
Referencing the components as labeled in FIG. 6F, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises a first alternatively-spliced exon comprising a positive or negative cis-acting element (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (j), wherein the second exonic sequence comprises a second alternatively-spliced exon; a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation (k), wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site (l) and at its 3’ end a 3’ splice acceptor site (m); and a nucleotide sequence comprising a third exonic sequence having a 5’ to 3’ orientation (n), wherein the third exonic sequence comprises a constitutive exon.
Referencing the components as labeled in FIG. 7E, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (j), wherein the second exonic sequence comprises an alternatively-spliced exon; a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation (k), wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site (l) and at its 3’ end a 3’ splice acceptor site (m); and a nucleotide sequence comprising a second portion of a coding region of interest having a 5’ to 3’ orientation (n).
In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
(vi) Alternative last exon model of alternative splicing
In some embodiments, the nucleic acid vectors of the present invention comprise a transgene comprising an alternatively-spliced exon cassette comprising components which, when alternatively spliced, comprise an alternative last exon model of alternative splicing (see, e.g., FIGs. 6A, 6G, and 7F).
Referencing the components as labeled in FIG. 6A, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (a), wherein the first exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (e); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (f), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (g) and at its 3’ end a 3’ splice acceptor site (h); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (i), wherein the second exonic sequence comprises an alternatively-spliced exon.
Referencing the components as labeled in FIG. 6G, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a coding region of interest having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); and a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (j), wherein the second exonic sequence comprises a constitutive exon.
Referencing the components as labeled in FIG. 7F, in some embodiments the transgene comprising an alternatively-spliced exon cassette comprises, in the 5’ to 3’ direction: a nucleotide sequence comprising a first portion of a transgene having a 5’ to 3’ orientation (a); a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation (b), wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site (c) and at its 3’ end a 3’ splice acceptor site (d); a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon (f); a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation (g), wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site (h) and at its 3’ end a 3’ splice acceptor site (i); a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation (j), wherein the second exonic sequence comprises a constitutive exon; a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation (k), wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site (l) and at its 3’ end a 3’ splice acceptor site (m); and a nucleotide sequence comprising a second portion of a coding region of interest having a 5’ to 3’ orientation (n).
In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in expression of the coding region of interest. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript does not result in nonsense-mediated decay. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is regulated by a positive or negative cis-acting element. In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in expression of the coding region of interest, wherein expression of the coding region of interest is not regulated by a positive or negative cis-acting element.
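The stop-codon arrangements described above (a coding region split into two portions around an alternatively-spliced exon that ends in a heterologous stop codon) can be sketched in the same toy fashion. This is an illustrative assumption-laden example, not the disclosed constructs: the sequences and the names `mature_transcript` and `reading_frame` are hypothetical, and the alternative exon is assumed to have a length divisible by three so that its stop codon is in frame.

```python
# Hypothetical toy sequences, chosen only for illustration.
FIRST_PORTION = "ATGGCTTCT"          # first portion of the coding region
ALTERNATIVE_EXON = "GGATCCTAA"       # ends in a heterologous TAA stop codon
SECOND_PORTION = "AAAGGTCCAGAATGA"   # second portion, ends in the normal stop

STOP_CODONS = {"TAA", "TAG", "TGA"}


def mature_transcript(include_alternative_exon: bool) -> str:
    """Join the coding portions, with or without the alternative exon."""
    middle = ALTERNATIVE_EXON if include_alternative_exon else ""
    return FIRST_PORTION + middle + SECOND_PORTION


def reading_frame(transcript: str) -> str:
    """Read codons from position 0 and return everything up to and
    including the first in-frame stop codon."""
    for i in range(0, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            return transcript[:i + 3]
    return transcript


print(reading_frame(mature_transcript(True)))   # terminates inside the alternative exon
print(reading_frame(mature_transcript(False)))  # reads through both coding portions
```

Here retention of the alternative exon terminates the reading frame before the second portion is reached, whereas skipping it yields the full-length frame. Whether a given premature stop also triggers nonsense-mediated decay depends on transcript context and is not modeled in this sketch.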
C. Components of the recombinant vector genomes
In some embodiments, a nucleic acid vector (e.g., a viral vector) of the present invention comprises a transgene comprising at least one alternatively-spliced exon cassette as described herein. Nucleic acid vectors or transgenes may have one alternatively-spliced exon cassette, or multiple such cassettes. In some embodiments, a nucleic acid vector or transgene comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15, or more alternatively-spliced exon cassettes. A transgene comprising an alternatively-spliced exon cassette may, in some embodiments, comprise any one or more of the following components: an alternatively-spliced exon, an intron (e.g., a flanking intron), an exon comprising a coding region of interest, and/or a constitutive exon. In some embodiments, a transgene comprising an alternatively-spliced exon cassette comprises an alternatively-spliced exon, a flanking intron, and an exon comprising a coding region of interest (wherein, in some embodiments, the coding region of interest may be split into portions across two or more exons).
(i) Alternatively-spliced exons
In some embodiments, a nucleic acid vector or transgene comprises an alternatively-spliced exon cassette, wherein the alternatively-spliced exon cassette comprises among other components at least one alternatively-spliced exon. In some embodiments, the alternatively-spliced exon cassette comprises 1, 2, 3, or 4 alternatively-spliced exons. In some other embodiments, the alternatively-spliced exon cassette comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 alternatively-spliced exons. In some embodiments, wherein the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon, the alternatively-spliced exons are adjacent. In some embodiments, wherein the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon, the alternatively-spliced exons are not adjacent.
In some embodiments, the alternatively-spliced exon is synthetic or recombinant. In some embodiments, the alternatively-spliced exon is considered to be synthetic or recombinant because it undergoes one or more nucleic acid modifications, relative to the wild-type alternatively-spliced exon. A nucleic acid modification may be a substitution or deletion of one or more nucleotides that form the nucleic acid sequence of the alternatively-spliced exon.
In some embodiments, an alternative exon comprises an ATG start codon at its 3’ end. In some embodiments, the “3’ end” comprises the 1, 2, or 3 nucleic acids lying at the 3’ end of the alternative exon. As will be understood, in some embodiments a wild-type or naturally occurring alternative exon may comprise an ATG start codon at its 3’ end. In such embodiments, the alternative exon may comprise nucleic acid modifications unrelated to the insertion of a heterologous start codon at the 3’ end of the alternative exon. However, it will be further understood that in some embodiments a wild-type or naturally occurring alternative exon may not comprise an ATG start codon at its 3’ end. In such embodiments, modifications are made to the 3’ end of the alternative exon to introduce a heterologous start codon, such that when the alternative exon is spliced-in or retained in the spliced transcript, the downstream coding sequence is translated as a full-length protein. As will be understood, in some embodiments 1, 2, or 3 nucleic acid substitutions may be necessary in order to introduce the heterologous ATG start codon to the 3’ end of the alternative exon, depending on the sequence which is present at the 3’ end of the wild-type or naturally occurring alternative exon. In some such embodiments, the 3’ end of the alternatively-spliced exon comprises 1 nucleotide substitution, relative to the wild-type alternatively-spliced exon, to form the ATG start codon. In some such embodiments, the 3’ end of the alternatively-spliced exon comprises 2 nucleotide substitutions, relative to the wild-type alternatively-spliced exon, to form the ATG start codon. In some such embodiments, the 3’ end of the alternatively-spliced exon comprises 3 nucleotide substitutions, relative to the wild-type alternatively-spliced exon, to form the ATG start codon.
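The dependence of the substitution count on the wild-type 3’ terminal sequence can be made concrete with a short sketch. The function name `substitutions_for_3prime_atg` and the example exon sequences are hypothetical and serve only to illustrate the counting; they are not sequences from the disclosure.

```python
from typing import List, Tuple


def substitutions_for_3prime_atg(exon: str) -> List[Tuple[int, str, str]]:
    """List the (0-based position, old base, new base) substitutions needed
    so that the last three nucleotides of the exon read ATG."""
    exon = exon.upper()
    if len(exon) < 3:
        raise ValueError("exon must be at least 3 nucleotides long")
    offset = len(exon) - 3
    return [
        (offset + i, old, new)
        for i, (old, new) in enumerate(zip(exon[-3:], "ATG"))
        if old != new
    ]


# Hypothetical wild-type exon ends requiring 0, 1, 2, and 3 substitutions.
for wild_type in ("CCGATG", "CCGAAG", "CCGCAG", "CCGCCC"):
    changes = substitutions_for_3prime_atg(wild_type)
    print(wild_type, len(changes), changes)
```

An exon that already ends in ATG needs no substitution, and the count rises to at most three as fewer of the terminal bases match.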
In some embodiments, the modification comprises the insertion of a heterologous start codon or part of a heterologous start codon at the 3’ end of the alternatively-spliced exon (e.g., 1-3 nucleic acids are added to the 3’ end of the alternatively-spliced exon, rather than substituted, to form an ATG start codon).
In some embodiments, an alternative exon comprises part of an ATG start codon at its 3’ end. In some embodiments, an alternative exon may comprise, for example, “A” as the last nucleic acid, or “AT” as the last two nucleic acids, which formulate the 3’ end of the alternative exon. In such embodiments, the remainder of the ATG start codon may lie at the 5’ end of an exon lying immediately downstream of the alternative exon. For example, in some embodiments the alternative exon may comprise “A” as the last nucleic acid which formulates the 3’ end of the alternative exon, and the exon lying immediately downstream of the alternative exon may comprise “TG” as the first two nucleic acids which formulate the 5’ end of the downstream exon. In some embodiments, the alternative exon may comprise “AT” as the last two nucleic acids which formulate the 3’ end of the alternative exon, and the exon lying immediately downstream of the alternative exon may comprise “G” as the first nucleic acid which formulates the 5’ end of the downstream exon. In some embodiments, the ATG formed as a result of the splicing together of the alternative exon and the exon lying immediately downstream of the alternative exon initiates translation of the exon lying immediately downstream of the alternative exon. In some embodiments, the exon lying immediately downstream of the alternative exon may be, for example, the coding region of the transgene (e.g., an MTM1 coding region).
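The split start codon described in this paragraph can be checked mechanically at an exon-exon junction. The sketch below is illustrative only; the function name `junction_atg_split` and the example sequences are hypothetical.

```python
from typing import Optional


def junction_atg_split(alternative_exon: str, downstream_exon: str) -> Optional[int]:
    """Return how many nucleotides of ATG the alternative exon contributes
    (1 for 'A' + 'TG', 2 for 'AT' + 'G') if splicing the two exons together
    forms an ATG spanning the junction; otherwise return None."""
    alt = alternative_exon.upper()
    down = downstream_exon.upper()
    for k in (1, 2):
        if alt[-k:] == "ATG"[:k] and down[:3 - k] == "ATG"[k:]:
            return k
    return None


print(junction_atg_split("GGCA", "TGGCT"))   # 'A' + 'TG'
print(junction_atg_split("GGCAT", "GGCT"))   # 'AT' + 'G'
print(junction_atg_split("GGCC", "TGGCT"))   # no ATG formed at the junction
```

If the alternative exon is skipped, the downstream exon alone begins with only "TG" or "G", so the complete start codon exists only in the transcript that retains the alternative exon.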
In some embodiments, an alternative exon comprises an ATG start codon, or part of an ATG start codon, within the nucleic acid sequence of the alternative exon (e.g., not at the 3’ end of the alternative exon). In some embodiments, the ATG start codon is in the same reading frame as the coding region of interest. In some embodiments, the ATG start codon is within up to 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides upstream of the 3’ end of the alternatively-spliced exon. In some embodiments, the ATG start codon is within 4-6, 5-7, 6-8, 7-9, 8-10, 9-11, 10-12, 13-15, 14-16, 15-17, 16-18, 17-19, 18-20, 19-21, 20-22, 21-23, 22-24, 23-25, 24-26, 25-27, 26-28, 27-29, or 28-30 nucleotides upstream of the 3’ end of the alternatively-spliced exon. In some embodiments, the ATG start codon is within 4-12, 8-16, 12-20, 16-24, or 20-30 nucleotides upstream of the 3’ end of the alternatively-spliced exon. In some embodiments, the ATG start codon is within up to 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides upstream of the 3' end of the alternatively-spliced exon and is in the same reading frame as the coding region of interest. In some embodiments, the ATG start codon is within 4-6, 5-7, 6-8, 7-9, 8-10, 9-11, 10-12, 13-15, 14-16, 15-17, 16-18, 17-19, 18-20, 19-21, 20-22, 21-23, 22-24, 23-25, 24-26, 25-27, 26-28, 27-29, or 28-30 nucleotides upstream of the 3’ end of the alternatively-spliced exon and is in the same reading frame as the coding region of interest. In some embodiments, the ATG start codon is within 4-12, 8-16, 12-20, 16-24, or 20-30 nucleotides upstream of the 3’ end of the alternatively-spliced exon and is in the same reading frame as the coding region of interest.
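As a non-limiting illustration, the position and reading frame of an internal ATG can be evaluated as sketched below in Python. Two assumptions are made that are not stated in the disclosure: the distance is counted from the “A” of the ATG to the 3’ end of the exon, inclusive, and the coding region of interest is assumed to begin, in frame, at the first nucleotide after the exon junction. The function name and sequences are hypothetical.

```python
def in_frame_start_codons(exon_seq: str, max_distance: int = 30) -> list[int]:
    """List the distances (in nucleotides) of in-frame ATGs from the exon's 3' end.

    Assumptions: distance is counted from the A of the ATG to the last
    nucleotide of the exon, inclusive, and the downstream coding region
    starts in frame at the first nucleotide after the junction, so an ATG
    is in frame when its distance is a multiple of three.
    """
    seq = exon_seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        distance = len(seq) - i
        if distance <= max_distance and distance % 3 == 0:
            hits.append(distance)
    return hits


if __name__ == "__main__":
    # Hypothetical exon fragments, for illustration only.
    print(in_frame_start_codons("CCATGAAA"))  # ATG is 6 nt from the 3' end, in frame
    print(in_frame_start_codons("CCATGAA"))   # ATG is 5 nt from the 3' end, out of frame
```

Under a different distance convention (for example, counting from the “G” of the ATG), the modulus test would need to be adjusted accordingly.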
In some embodiments, wherein the alternative exon comprises 1, 2, or 3 nucleic acid substitutions at the 3' end to result in a heterologous ATG start codon (e.g., if the wild-type alternatively-spliced exon does not comprise an ATG start codon at its 3’ end), the strength of the 5’ splice site of the alternative exon may be diminished, relative to the strength of the 5’ splice site of the wild-type or naturally occurring alternative exon. In such embodiments, one or more additional modifications may be made to the intronic sequence located immediately downstream of the sequence comprising the 3’ end of the alternative exon (see FIG. 12). In some embodiments, the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 1-5 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of the naturally occurring or wild-type alternative exon. In some embodiments, the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 1 nucleotide substitution, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of the naturally occurring or wild-type alternative exon. In some embodiments, the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 2 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of the naturally occurring or wild-type alternative exon. In some embodiments, the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 3 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of the naturally occurring or wild-type alternative exon.
In some embodiments, the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 4 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of naturally occurring or wild-type alternative exon. In some embodiments, the first 10 nucleotides of the intronic sequence located immediately downstream of the alternatively-spliced exon comprise 5 nucleotide substitutions, relative to the naturally occurring or wild-type intronic sequence located immediately downstream of naturally occurring or wild-type alternative exon. In some embodiments, the 1-5 nucleotide substitutions restore or partially restore the strength of the 5’ splice site of the alternative exon, relative to the strength of the 5’ splice site of the naturally occurring or wild-type alternative exon.
Additionally or alternatively, in some embodiments the modification comprises disrupting or deleting all native start codons located 5' to the heterologous start codon. In some embodiments, wherein the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon, all native start codons located 5' to the heterologous start codon of the 5'-most alternatively-spliced exon are disrupted or deleted. Additionally or alternatively, in some embodiments the modification comprises introducing into the alternatively-spliced exon a heterologous, in-frame stop codon at least 50 nucleotides upstream of the next 5' splice junction. In some embodiments, the alternatively-spliced exon is a nonsense-mediated decay (NMD) exon. In some embodiments, the NMD exon comprises an in-frame stop codon that is at least 50 nucleotides upstream of the next 5’ splice junction.
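The positional requirement for the in-frame stop codon can be illustrated with the following Python sketch, which is offered as an example only. It assumes, for simplicity, that the next 5' splice junction coincides with the 3' end of the exon sequence supplied, that the reading frame is given as an offset into that sequence, and that the distance is counted from the nucleotide immediately after the stop codon. The function name, parameters, and sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_upstream_in_frame_stop(exon_seq: str, frame_offset: int = 0,
                               min_distance: int = 50) -> bool:
    """Check whether the first in-frame stop codon lies far enough upstream.

    Assumptions: the next 5' splice junction is the 3' end of exon_seq, and
    the distance is the number of nucleotides between the end of the stop
    codon and that junction. Only the first in-frame stop codon is
    considered, because translation would terminate there.
    """
    seq = exon_seq.upper()
    for i in range(frame_offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            distance = len(seq) - (i + 3)
            return distance >= min_distance
    return False


if __name__ == "__main__":
    # Hypothetical exon sequences, for illustration only.
    print(has_upstream_in_frame_stop("ATGTAA" + "C" * 50))  # stop is 50 nt upstream
    print(has_upstream_in_frame_stop("ATGTAA" + "C" * 49))  # stop is only 49 nt upstream
```

The 50-nucleotide threshold is exposed as a parameter so that other distances can be examined without changing the function.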
In some embodiments, the alternatively-spliced exon is considered to be synthetic when it is situated non-naturally (e.g., is linked to a coding sequence to which it would not be linked in wild-type or naturally-occurring conditions), relative to the wild-type alternatively-spliced exon (e.g., is heterologous). In some embodiments, the alternatively-spliced exon is considered to be synthetic when it (i) undergoes one or more nucleic acid modifications, and (ii) is situated non-naturally, relative to the wild-type alternatively-spliced exon.
In some embodiments, the alternatively-spliced exon is a regulatory exon. In some embodiments, the regulatory exon is an alternatively regulated exon (e.g., an exon known to be subject to alternative splicing mechanisms). It will be appreciated that alternative splicing is a process by which exons or portions of exons or noncoding regions within a pre-mRNA transcript are differentially joined or skipped, resulting in multiple protein isoforms being encoded by a single gene. The regulation of alternative splicing is complex. Briefly, alternative splicing is known to be regulated by the functional coupling between transcription and splicing. Additional molecular features, such as chromatin structure, RNA structure and alternative transcription initiation or alternative transcription termination, collaborate with these basic components to produce the multiple isoforms that result from alternative splicing (see, e.g., Wang, et al., Biomed Rep. 2015 Mar; 3(2): 152-158). In certain embodiments, the compositions and methods of the present disclosure utilize the naturally-occurring mechanisms which regulate alternative splicing to express coding regions of interest (e.g., what would be alternatively spliced isoforms in the natural context) in specific biological conditions. In other embodiments, additional genetic elements may be incorporated into the DNA. In some embodiments, such additional genetic elements may become incorporated into the corresponding pre-mRNA, and may consequently influence, control, or otherwise regulate the splicing of the pre-mRNA to form one or more mRNA isoforms.
In some aspects, an alternatively-spliced exon, for which splicing may be regulated, is an exon for which splicing levels differ by at least 5%, for example at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% under two different conditions (e.g., in different tissues, in response to intracellular T cell levels, in response to intracellular levels of one or more RNA binding proteins, in the context of an autoregulated gene, etc.). By “splicing levels differ by 5%”, it is meant that the splicing levels for an exon of interest are measured in two different conditions, and the splicing level is compared between the conditions and expressed as a difference in percentage points. For example, if the splicing level in condition A is 80%, and the splicing level in condition B is 85%, the splicing levels between conditions A and B differ by 5%. Likewise, if the splicing level in condition A is 80%, and the splicing level in condition B is 75%, the splicing levels between conditions A and B also differ by 5%.
In some embodiments, the step of calculating a difference in expression of certain isoforms of certain genes in certain conditions as described herein is performed by calculating a percent spliced-in (psi) score. A psi (Ψ) score is a value between 0 and 1 (e.g., 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.70, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, or 1.0, or any value included therein such as, e.g., 0.001, 0.0001, etc.) that quantifies alternative splicing occurrences present within a sample, or under certain conditions of interest.
In some embodiments, the psi (Ψ) score is calculated (e.g., calculated from RNAseq reads) by dividing the number of inclusion reads (e.g., the number of alternative splicing events for a gene of interest) by the total number of inclusion reads and exclusion reads (e.g., the number of normal (e.g., non-alternative) splicing events for the gene of interest). Therefore, in some embodiments the Ψ score is calculated according to the following formula for the gene of interest:
Ψ = (number of inclusion reads) / (number of inclusion reads + number of exclusion reads)
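The calculation described above can be illustrated with a minimal Python sketch. The function name and the read counts are hypothetical and are provided for illustration only; this is not the MISO implementation and makes no attempt at the statistical modeling that such software performs.

```python
def psi_score(inclusion_reads: int, exclusion_reads: int) -> float:
    """Percent spliced-in: inclusion reads divided by all informative reads."""
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("psi is undefined when there are no informative reads")
    return inclusion_reads / total


if __name__ == "__main__":
    # Hypothetical read counts: 80 inclusion reads and 20 exclusion reads.
    print(psi_score(80, 20))
```

The explicit error for zero total reads reflects that the ratio has no defined value when neither isoform is observed.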
In some embodiments, the calculating comprises performing a mixture of isoforms (MISO) analysis. MISO analysis provides an estimate of isoform expression levels within a sample (e.g., a sample comprising a tissue of interest) based on a statistical model and assesses confidence in those estimates. In some embodiments, MISO analysis is performed using MISO software (see, e.g., Katz, Y., E. T. Wang, et al. (2010), Analysis and design of RNA sequencing experiments for identifying isoform regulation, Nat Methods 7(12): 1009-1015).
In some embodiments, a psi (Ψ) score higher than (>) 0.50 (for example 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.70, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, or 1.0, or any value included therein such as, e.g., 0.5001, 0.50001, etc.) indicates that a greater number of alternative splicing events for the gene of interest are present in the tested sample than the number of regular splicing events. Conversely, in some embodiments a Ψ score lower than (<) 0.50 (for example 0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, or any value included therein such as, e.g., 0.499, 0.4999, etc.) indicates that a lower number of alternative splicing events for the gene of interest are present in the tested sample than the number of regular splicing events.
As used herein, delta psi (ΔΨ) score is used to refer to the calculation of the difference between two psi (Ψ) scores for a single gene of interest (e.g., in different tissues, in different intracellular conditions, etc.). The difference between the two calculated Ψ scores is the ΔΨ score. It will be understood that, because a Ψ score may be any value between 0 and 1, as described herein, a ΔΨ score (that is, the difference between the two calculated Ψ scores) may also be any value between 0 and 1 (e.g., 0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.70, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, or 1.0, or any value included therein such as, e.g., 0.001, 0.0001, etc.) or any value between 0 and -1 (e.g., 0, -0.01, -0.02, -0.03, -0.04, -0.05, -0.06, -0.07, -0.08, -0.09, -0.10, -0.11, -0.12, -0.13, -0.14, -0.15, -0.16, -0.17, -0.18, -0.19, -0.20, -0.21, -0.22, -0.23, -0.24, -0.25, -0.26, -0.27, -0.28, -0.29, -0.30, -0.31, -0.32, -0.33, -0.34, -0.35, -0.36, -0.37, -0.38, -0.39, -0.40, -0.41, -0.42, -0.43, -0.44, -0.45, -0.46, -0.47, -0.48, -0.49, -0.50, -0.51, -0.52, -0.53, -0.54, -0.55, -0.56, -0.57, -0.58, -0.59, -0.60, -0.61, -0.62, -0.63, -0.64, -0.65, -0.66, -0.67, -0.68, -0.69, -0.70, -0.71, -0.72, -0.73, -0.74, -0.75, -0.76, -0.77, -0.78, -0.79, -0.80, -0.81, -0.82, -0.83, -0.84, -0.85, -0.86, -0.87, -0.88, -0.89, -0.90, -0.91, -0.92, -0.93, -0.94, -0.95, -0.96, -0.97, -0.98, -0.99, or -1.0, or any value included therein such as, e.g., -0.001, -0.0001, etc.). In some embodiments, a ΔΨ score may be expressed as an absolute value, where the absolute value of, e.g., -0.1 is 0.1.
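The delta psi calculation described above can be sketched in Python as follows. This is an illustrative example only: the function name is hypothetical, the example values are not data from the disclosure, and the sign convention (second condition minus first condition) is an assumption made for the purpose of the sketch.

```python
def delta_psi(psi_first: float, psi_second: float, absolute: bool = False) -> float:
    """Difference between two psi scores measured for the same gene of interest.

    Assumed sign convention: second condition minus first condition, so the
    result lies between -1 and 1. Set absolute=True to report the magnitude.
    """
    for value in (psi_first, psi_second):
        if not 0.0 <= value <= 1.0:
            raise ValueError("psi scores must lie between 0 and 1")
    difference = psi_second - psi_first
    return abs(difference) if absolute else difference


if __name__ == "__main__":
    # Hypothetical psi scores for one exon in two conditions.
    print(round(delta_psi(0.80, 0.85), 2))
    print(round(delta_psi(0.80, 0.75), 2))
    print(round(delta_psi(0.80, 0.75, absolute=True), 2))
```

Rounding is applied only when printing, because differences of decimal fractions are not represented exactly in floating-point arithmetic.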
In some embodiments, the alternatively-spliced exon is a tissue-specific alternatively-spliced exon. In some embodiments, one or more tissue-specific alternatively-spliced exons are included in a recombinant nucleic acid (e.g., in a rAAV). Non-limiting examples of tissue-specific alternatively-spliced exons are described in Supplemental Table S5 from Wang, E. T., et al., (2008), Nature, 456, 470-76, incorporated herein by reference. Other tissue-specific exons can be identified from transcriptome data. Non-limiting examples of RNA sequence motifs that can exhibit tissue-specific activity, thereby controlling the inclusion or exclusion of tissue-specific exons, are described in Badr, E., et al., (2016), PLOS One, 11(11): e0166978, incorporated herein by reference. In some embodiments, alternative splicing of the tissue-specific exon results in the expression of the transgene (e.g., of the product encoded by the coding region of interest) in heart tissue, but not in skeletal tissue. In some embodiments, alternative splicing of the tissue-specific exon results in the expression of the transgene (e.g., of the product encoded by the coding region of interest) in skeletal tissue, but not in heart tissue. In some embodiments, a tissue-specific alternatively-spliced exon comprises an alternatively-spliced exon from any one or more of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
In some embodiments, the tissue-specific alternatively-spliced exon is or is derived from exon 11 of BIN1. In some embodiments, the tissue-specific alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 37. In some embodiments, the tissue-specific alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 37. In some embodiments, the tissue-specific alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 38. In some embodiments, the tissue-specific alternatively-spliced exon which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 38.
In some embodiments, an alternatively-spliced exon is an immunoresponsive alternatively-spliced exon (e.g., undergoes alternative splicing in the presence of an enhanced immune response, such as an increased T cell presence). In some embodiments, the immunoresponsive alternatively-spliced exon is alternatively spliced in states of cellular inflammation. In some embodiments, the immunoresponsive alternatively-spliced exon is alternatively spliced when an abnormally elevated quantity of T cells is present in the intracellular environment (e.g., more T cells are present than under homeostatic conditions). In some embodiments, an immunoresponsive alternatively-spliced exon comprises an alternatively-spliced exon from any one of ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, and ZNF496.
In some embodiments, an alternatively-spliced exon is a cell type-specific alternatively-spliced exon (e.g., undergoes alternative splicing only when located in certain cell types). In some embodiments, a cell type-specific alternatively-spliced exon comprises an alternatively-spliced exon as described in Joglekar, et al. (2021), A spatially resolved brain region- and cell type-specific isoform atlas of the postnatal mouse brain. Nature Comm., 12(463), which is incorporated herein by reference with respect to its description of cell type-specific alternative exons.
In some embodiments, an alternatively-spliced exon is alternatively spliced in cells which exhibit high levels of expression of a particular RNA or protein. In some embodiments, an alternatively-spliced exon is alternatively spliced in cells which exhibit low levels of expression of a particular RNA or protein. High or low expression of a particular protein may in some embodiments be indicative of a disease state. For example, in some forms of frontotemporal dementia, MAPT exon 10 is aberrantly included, leading to increased levels of the 4R vs. 3R isoform. Increased 4R isoform is associated with neurodegeneration.
Accordingly, in some embodiments an alternatively-spliced exon is alternatively spliced in cells which exhibit disease (e.g., severe disease). In some embodiments, such disease comprises Dentatorubral-pallido-luysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich's ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type 1, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebellar ataxia type 6 (SCA6), spinocerebellar ataxia type 7 (SCA7), spinocerebellar ataxia type 8 (SCA8), spinocerebellar ataxia type 10 (SCA10), spinocerebellar ataxia type 12 (SCA12), spinocerebellar ataxia type 17 (SCA17), Syndromic / non-syndromic X-linked mental retardation, Emery-Dreifuss muscular dystrophy type 2, familial partial lipodystrophy, limb girdle muscular dystrophy type 1B, dilated cardiomyopathy, familial partial lipodystrophy, Charcot-Marie-Tooth disorder type 2B1, mandibuloacral dysplasia, childhood progeria syndrome (Hutchinson-Gilford syndrome), Werner syndrome, Dilated cardiomyopathy (DCM), Hypertrophic cardiomyopathy (HCM), Restrictive cardiomyopathy (RCM), Left Ventricular Non-compaction (LVNC), Arrhythmogenic Right Ventricular Dysplasia (ARVD), takotsubo cardiomyopathy, Duchenne muscular dystrophy, Becker muscular dystrophy, Limb-girdle muscular dystrophy, Facioscapulohumeral muscular dystrophy, Congenital muscular dystrophy, Oculopharyngeal muscular dystrophy, Distal muscular dystrophy, Emery-Dreifuss muscular dystrophy, dementia, Parkinson's disease (PD), a PD-related disorder, Prion disease, a motor neuron disease (MND), Progressive bulbar palsy (PBP), Progressive muscular atrophy (PMA), Primary lateral sclerosis (PLS), Spinal muscular atrophy (SMA), a bladder cancer, a breast cancer, a colorectal cancer, a kidney cancer, a lung cancer, a lymphoma, a melanoma, an oral cancer, an ovarian cancer, an oropharyngeal cancer, a pancreatic cancer, a prostate cancer, a thyroid cancer, a uterine cancer, Down syndrome, Prader-Willi Syndrome (PWS), Bloom Syndrome, Cockayne Syndrome Type I -216400, Cockayne Syndrome Type III, Cockayne Syndrome Type I, Hutchinson-Gilford Progeria Syndrome, Mandibuloacral Dysplasia with Type A Lipodystrophy, Progeria, Adult Onset Progeroid Syndrome, Neonatal Rothmund-Thomson Syndrome, Seip Syndrome, Werner Syndrome, Replication Focus-Forming Activity 1, myotubular myopathy, Danon Disease, and/or centronuclear myopathy.
In some embodiments, an alternatively-spliced exon comprises an exon which may be differentially spliced depending on the intracellular level of the RNA or protein encoded by the coding region associated with the alternatively-spliced exon.
In some embodiments, an alternatively-spliced exon comprises an alternatively-spliced exon comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 23-44. In some embodiments, an alternatively-spliced exon comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 23-44. In some embodiments, the alternatively-spliced exon is retained in the spliced transcript. Retention of the alternatively-spliced exon in the spliced transcript occurs under the alternative splicing conditions specific to said alternatively-spliced exon as described herein. In some embodiments, wherein the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon, the 5'-most alternatively-spliced exon is retained in the spliced transcript. In some embodiments, wherein the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon, the 3'-most alternatively-spliced exon is included in the spliced transcript. In some embodiments, wherein the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon, all alternatively-spliced exons are included in the spliced transcript.
In some embodiments, retention of the alternatively-spliced exon in the spliced transcript results in the productive expression of the transgene (e.g., productive translation of the protein). Expression of the product (e.g., therapeutic protein) encoded by the coding region of interest may in some embodiments be desirable. For example, in myotubular myopathy, expression of myotubularin 1 is depleted in skeletal muscle, and therefore restoration of myotubularin 1 in skeletal muscle is desirable. However, in some embodiments, expression of the product (e.g., therapeutic protein) encoded by the coding region of interest may be undesirable. For example, in myotubular myopathy, expression of myotubularin 1 in the heart may be undesirable. Accordingly, in some embodiments retention of the alternatively-spliced exon in the spliced transcript does not result in the productive expression of the transgene (e.g., no transcription of the RNA and/or no productive translation of the protein).
In some embodiments, the alternatively-spliced exon is located 5' to the coding region of the transgene. In some embodiments, the alternatively-spliced exon is located 3' to the coding region of the transgene. In some embodiments, the alternatively-spliced exon is located within the coding region of the transgene. In some embodiments, the alternatively-spliced exon is not located within the coding region of the transgene. In some embodiments, the alternatively-spliced exon is located 3' to a constitutive exon. In some embodiments, the alternatively-spliced exon is located 5' to a constitutive exon.
(ii) Constitutive exons
In some embodiments, the recombinant viral genomes of the present disclosure comprise one or more constitutive exons. In various embodiments, the alternatively-spliced exon and the one or more constitutive exons may be configured as a cassette (e.g., comprised within a transgene). In some embodiments, the transgene comprising an alternatively-spliced exon cassette comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 constitutive exons. In various embodiments, one or more constitutive exons may comprise a coding region of interest, or a portion thereof. In some embodiments, the constitutive exon is considered to be constitutive when it is present in all isoforms of spliced mRNAs resulting from the splicing of a pre-mRNA transcript.
A constitutive exon may in some embodiments be synthetic, but it need not be. A constitutive exon may be considered synthetic because it undergoes one or more nucleic acid modifications, relative to the wild-type constitutive exon. A nucleic acid modification may be a substitution or deletion of one or more nucleotides that form the nucleic acid sequence of the constitutive exon. In some embodiments, the modification comprises disrupting or deleting all native start codons located within the constitutive exon.
In some embodiments, the constitutive exon is considered to be synthetic when it is situated non-naturally (e.g., is linked to a coding sequence to which it would not be linked in wild-type or naturally-occurring conditions), relative to the wild-type constitutive exon (e.g., is heterologous). In some embodiments, the constitutive exon is considered to be synthetic when it (i) undergoes one or more nucleic acid modifications, and (ii) is situated non-naturally, relative to the wild-type constitutive exon.
In some embodiments, the constitutive exon is naturally occurring (e.g., does not comprise any nucleic acid modifications, relative to the wild-type constitutive exon). In some embodiments, the constitutive exon is a native exon associated with the coding region of the transgene. In some embodiments, the constitutive exon is from or is derived from the same gene as the alternatively-spliced exon.
In some embodiments, the constitutive exon is from or is derived from a constitutive exon of a gene selected from the group consisting of: MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, a microdystrophin-encoding gene, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, TPM2, TNNT2, CFL2, KBTBD13, KLHL40, KLHL41, LMOD3, MYPN, SEPN1, TTN, SPEG, MYH7, TK2, POLG1, GAA, AGL, PYGM, SLC22A5, OCTN2, ETF, ETFH, PNPLA2, a cytochrome b oxidase-encoding gene, a cytochrome c oxidase-encoding gene, CLCN1, SCN4A, DMPK, CNBP, MYOT, LMNA, CAV3, DNAJB6, DES, TNPO3, HNRPDL, CAPN3, DYSF, an alpha-sarcoglycan-encoding gene, a beta-sarcoglycan-encoding gene, a gamma-sarcoglycan-encoding gene, a delta-sarcoglycan-encoding gene, TCAP, TRIM32, FKRP, FXN, POMT1, FKTN, POMT2, POMGnT1, DAG1, ANO5, PLEC1, TRAPPC11, GMPPB, ISPD, LIMS2, POPDC1, TOR1AIP1, POGLUT2, LAMA2, COL6A1, POMT1, POMT2, DUX4, EMD, PAX7, PMP22, MPZ, MFN2, SMCHD1, SMN, Lamin A/C (LANIN), and/or GJB1.
In some embodiments, the constitutive exon is from or is derived from a constitutive exon of a gene(s) selected from the group consisting of: ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, and/or ZNF496.
In some embodiments, the constitutive exon is from or is derived from a constitutive exon of a gene(s) selected from the group consisting of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM. In some embodiments, the constitutive exon is from or is derived from a constitutive exon of SMN1. In some embodiments, the constitutive exon is from or is derived from exon 6 of SMN1. In some embodiments, the constitutive exon which is derived from SMN1 exon 6 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 exon 6. In some embodiments, the constitutive exon which is derived from SMN1 exon 6 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 102. In some embodiments, the constitutive exon which is derived from SMN1 exon 6 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 102.
In some embodiments, the constitutive exon is not a native exon associated with the coding region of the transgene. In some embodiments, the constitutive exon is neither from nor derived from the same gene as the alternatively-spliced exon.
In some embodiments, a constitutive exon is located 5' to the alternatively-spliced exon. Additionally or alternatively, in some embodiments a constitutive exon is located 3' to the alternatively-spliced exon. In some embodiments, a constitutive exon is located 5' to the coding region of the transgene. Additionally or alternatively, in some embodiments a constitutive exon is located 3' to the coding region of the transgene.
In some embodiments, the constitutive exon is retained in the spliced transcript (e.g., spliced in). In some embodiments, wherein the transgene comprising an alternatively-spliced exon cassette comprises more than one constitutive exon, the 5'-most constitutive exon is retained in the spliced transcript. In some embodiments, wherein the transgene comprising an alternatively-spliced exon cassette comprises more than one constitutive exon, the 3'-most constitutive exon is retained in the spliced transcript. In some embodiments, wherein the transgene comprising an alternatively-spliced exon cassette comprises more than one constitutive exon, all constitutive exons are retained in the spliced transcript. In some embodiments, the constitutive exon is excluded from the spliced transcript (e.g., spliced out).
(iii) Introns
In other embodiments, the recombinant viral genomes of the present disclosure comprise one or more introns. In various embodiments, the alternatively-spliced exon and the one or more introns (or portions thereof) may be configured as a cassette. In some embodiments, a nucleic acid (e.g., a nucleic acid comprising a recombinant viral genome) comprises an alternatively-spliced exon cassette encoding at least one transgene that contains at least one recombinant (e.g., engineered, truncated) intron that supports sufficient splice regulation of the transgene to be therapeutically effective. In some embodiments, an alternatively-spliced exon cassette is an RNA molecule (e.g., a pre-mRNA) that contains one or more (e.g., two or more) recombinant (e.g., engineered; e.g., truncated) introns flanking one or more exons. In some embodiments, an alternatively-spliced exon cassette is a DNA molecule that encodes the RNA molecule containing one or more recombinant (e.g., engineered; e.g., truncated) introns. In some embodiments, a transgene comprising an alternatively-spliced exon cassette contains other regulatory sequences (e.g., promoters, 5’ or 3’ UTRs, or other regulatory sequences) in addition to the gene coding (e.g., protein coding) sequences and the at least one recombinant (e.g., engineered, e.g., truncated) intron for which splicing can be regulated, as described elsewhere herein.
Accordingly, in some embodiments, a recombinant viral genome of the present disclosure comprises a transgene comprising an alternatively-spliced exon cassette, wherein the alternatively-spliced exon cassette comprises among other components at least one intron (or portion thereof). In some embodiments, the intron is a flanking intron (or portion thereof). In some embodiments, the alternatively-spliced exon cassette comprises 1, 2, 3, 4, 5, 6, 7, or 8 flanking introns (or portion(s) thereof).
In some embodiments, an exon (e.g., an alternatively-spliced exon, or a constitutive exon) is flanked by one or more introns (e.g., flanking introns), or portion(s) thereof. In some embodiments, an alternatively-spliced exon is flanked by one or more introns (or portion(s) thereof). In some embodiments, an alternatively-spliced exon is flanked by one intron (or portion thereof). In some embodiments, wherein the alternatively-spliced exon is flanked by one intron, the flanking intron (or portion thereof) is located 3' to the alternatively-spliced exon. In some embodiments, wherein the alternatively-spliced exon is flanked by one intron, the flanking intron (or portion thereof) is located 5' to the alternatively-spliced exon. In some embodiments, an alternatively-spliced exon is flanked by two introns (or portions thereof). In some embodiments, wherein the alternatively-spliced exon cassette comprises more than one alternatively-spliced exon, each alternatively-spliced exon is flanked by at least one, and in some embodiments two, flanking intron(s) (or portion(s) thereof). In some embodiments, an intron is a native flanking intron or native flanking intronic sequence of the alternatively-spliced exon. In some embodiments, an intron is not a native flanking intron or native flanking intronic sequence of the alternatively-spliced exon.
In some embodiments, a constitutive exon is flanked by one or more introns (or portion(s) thereof). In some embodiments, a constitutive exon is flanked by one intron (or portion thereof). In some embodiments, wherein the constitutive exon is flanked by one intron, the flanking intron (or portion thereof) is located 3' to the constitutive exon. In some embodiments, wherein the constitutive exon is flanked by one intron, the flanking intron (or portion thereof) is located 5' to the constitutive exon. In some embodiments, a constitutive exon is flanked by two introns (or portions thereof). In some embodiments, wherein the alternatively-spliced exon cassette comprises more than one constitutive exon, each constitutive exon is flanked by at least one, and in some embodiments two, flanking intron(s) (or portion(s) thereof). In some embodiments, an intron is a native flanking intron or native flanking intronic sequence of the constitutive exon. In some embodiments, an intron is not a native flanking intron or native flanking intronic sequence of the constitutive exon.
In some embodiments, an intron is a natural intron, and comprises no modifications, relative to a native intron.
An intron or intronic sequence may in some embodiments be synthetic, but it need not be. A synthetic intron or intronic sequence may be considered synthetic because it undergoes one or more nucleic acid modifications, relative to the wild-type or native intron. A nucleic acid modification may be a substitution or deletion of one or more nucleotides that form the nucleic acid sequence of the intron or intronic sequence.
In some embodiments, an intron or intronic sequence is considered to be synthetic when it is situated non-naturally (e.g., is linked to an exon to which it would not be linked in wild-type or naturally-occurring conditions), relative to the wild-type intron or intronic sequence (e.g., is heterologous). In some embodiments, the intron or intronic sequence is considered to be synthetic when it (i) undergoes one or more nucleic acid modifications, and (ii) is situated non-naturally, relative to the wild-type intron or intronic sequence.
In some embodiments, an intron (e.g., a flanking intron) (or portion thereof) comprising one or more nucleic acid modifications, relative to the wild-type intron, is an engineered intron or intronic sequence. In some embodiments, the engineered intron or intronic sequence comprises a splice donor and splice acceptor site, and a functional branch point to which the splice donor site can be joined in the first trans-esterification reaction of splicing.
In some embodiments, an intron (e.g., a flanking intron) or intronic sequence comprising one or more nucleic acid modifications, relative to the wild-type intron, comprises a truncated version of a natural intron. By “truncated version of a natural intron”, it is meant that the naturally-occurring, full-length intron is shortened (e.g., truncated) via the removal of nucleotides. In some embodiments, an engineered (e.g., recombinant) intron or intronic sequence is a truncated version of a natural intron. However, in some embodiments an engineered intron or intronic sequence can be designed to include functional splice donor and acceptor sites and a functional branch point in addition to one or more regulatory regions that are derived from different introns, or that are non-naturally occurring sequences (e.g., sequence variants of naturally-occurring sequences, consensus sequences, or de novo designed sequences).
Accordingly, in some embodiments an engineered intron or intronic sequence is not a truncated version of a naturally occurring intron, but contains one or more sequences from a naturally occurring intron.
In some embodiments, an intron (e.g., a flanking intron) (or portion thereof) comprising one or more nucleic acid modifications, relative to the wild-type intron, is truncated at its 5’ end. In some embodiments, 1-10,000 nucleotides are truncated from the 5’ end (e.g., 1-50, 50-100, 100-500, 500-1,000, 1,000-5,000, 5,000-10,000, 10,000-20,000, 20,000-50,000, or 50,000-100,000 nucleotides are truncated from the 5’ end). In some embodiments, the 5’ splice site is not retained in the truncated intron (or portion thereof). In some embodiments, the 5’ splice site is retained in the truncated intron (or portion thereof). In some embodiments, a different 5’ splice site is included in the truncated intron (or portion thereof).
In some embodiments, an intron (e.g., a flanking intron) (or portion thereof) comprising one or more nucleic acid modifications, relative to the wild-type intron, is truncated at its 3’ end. In some embodiments, 1-10,000 nucleotides are truncated from the 3’ end (e.g., 1-50, 50-100, 100-500, 500-1,000, 1,000-5,000, 5,000-10,000, 10,000-20,000, 20,000-50,000, or 50,000-100,000 nucleotides are truncated from the 3’ end). In some embodiments, the 3’ splice site is not retained in the truncated intron (or portion thereof). In some embodiments, the 3’ splice site is retained in the truncated intron (or portion thereof). In some embodiments, a different 3’ splice site is included in the truncated intron (or portion thereof).
In some embodiments, an intron (e.g., a flanking intron) (or portion thereof) comprising one or more nucleic acid modifications, relative to the wild-type intron, is truncated at one or more internal locations. In some embodiments, 1-10,000 internal nucleotides are removed (e.g., 1-50, 50-100, 100-500, 500-1,000, 1,000-5,000, 5,000-10,000, 10,000-20,000, 20,000-50,000, or 50,000-100,000 internal nucleotides are removed). In some embodiments, the splice regulatory region is not retained in the truncated intron (or portion thereof). In some embodiments, the splice regulatory region is retained in the truncated intron (or portion thereof). In some embodiments, a different splice regulatory region is included in the truncated intron (or portion thereof).
In some embodiments, an intron (e.g., a flanking intron) (or portion thereof) comprising one or more nucleic acid modifications, relative to the wild-type intron, comprises one or more 5’, 3’, and/or internal deletions. It should be understood that the extent of truncation may depend on the size of the intron (or portion thereof) and the size of the gene. A truncation may require removal of sufficient intronic sequence to result in a recombinant gene construct that is small enough to be packaged in a recombinant virus of interest (e.g., in a recombinant AAV or lentivirus).
However, an intron typically includes one or more sequences required for efficient splicing and/or regulated splicing. In some embodiments, an intron or intronic sequence comprises one or more splice junction sites (e.g., a 5’ splice donor site, and/or a 3’ splice acceptor site). In some embodiments, an intron or intronic sequence retains a splice donor site (e.g., towards the 5' end of the intron or intronic sequence), a branch site (e.g., towards the 3' end of the intron or intronic sequence), a splice acceptor site (e.g., at the 3' end of the intron or intronic sequence), and a splice regulatory sequence. In some embodiments, the intron or intronic sequence comprises a 5’ splice donor site. In some embodiments, the 5’ splice donor site is a GU or an AU. In some embodiments, the intron or intronic sequence comprises a 3’ splice acceptor site. In some embodiments, the 3’ splice acceptor site is an AG or an AC. In some embodiments, an intron or intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site. In some embodiments, a regulatory sequence comprises a response element within an AG exclusion zone of the intron. In some embodiments, the intron or intronic sequence retains sequence motifs bound by the encoded protein (e.g., YGCY motifs for MBNL1, or GCAUG for RBFOX, or YCAY for NOVA, etc.). In some embodiments, an intron or intronic sequence is spliced out, and is not included in the spliced transcript.
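By way of illustration only, the terminal dinucleotides and binding motifs named above can be checked computationally. The following is a minimal sketch in Python, not part of the disclosed constructs: the function name `check_intron`, the motif table, and the example sequence are hypothetical, and the sketch tests only sequence patterns, not whether splicing actually occurs.

```python
import re

# Degenerate motifs named in the text; Y denotes a pyrimidine (C or U).
# The dictionary keys are illustrative labels, not identifiers from the disclosure.
MOTIFS = {
    "MBNL1_YGCY": "[CU]GC[CU]",
    "RBFOX_GCAUG": "GCAUG",
    "NOVA_YCAY": "[CU]CA[CU]",
}


def check_intron(seq: str) -> dict:
    """Report terminal splice-site dinucleotides and motif counts for an intron.

    Accepts DNA or RNA letters; T is converted to U before scanning.
    """
    rna = seq.upper().replace("T", "U")
    report = {
        # 5' splice donor: GU or AU at the first two positions
        "donor_ok": rna[:2] in ("GU", "AU"),
        # 3' splice acceptor: AG or AC at the last two positions
        "acceptor_ok": rna[-2:] in ("AG", "AC"),
    }
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are all counted.
        report[name] = len(re.findall(f"(?=(?:{pattern}))", rna))
    return report


# Hypothetical 31-nt intronic sequence used only to exercise the function.
example = "GUAAGUCGCUUGCAUGUCAUCUAACUUUCAG"
result = check_intron(example)
```

A sequence that fails the donor or acceptor check, or that has lost all copies of a motif after truncation, would be flagged for closer inspection under this sketch; a passing result does not by itself establish regulated splicing.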
In some embodiments, an intron or intronic sequence may include one or more human, non-human primate, and/or other mammalian or non-mammalian intron splice-regulatory sequences. In some embodiments, the regulatory sequences may have 80%-100% (e.g., 80%-85%, 85%-90%, greater than 90%, 90%-95%, or 95%-100%) sequence identity, relative to a wild-type regulatory sequence.
In some embodiments, an intron or intronic sequence is approximately 50 to 4000 nucleotides long. In some embodiments, an intron or intronic sequence is approximately 50-100, 75-125, 100-150, 125-175, 200-250, 225-275, 300-350, 325-375, 400-450, 425-475, 500-550, 525-575, 600-650, 625-675, 700-750, 725-775, 800-850, 825-875, 900-950, 925-975, 950-1000, 1025-1075, 1050-1100, 1075-1125, 1100-1150, 1125-1175, 1200-1250, 1225-1275, 1300-1350, 1325-1375, 1400-1450, 1425-1475, 1500-1550, 1525-1575, 1600-1650, 1625-1675, 1700-1750, 1725-1775, 1800-1850, 1825-1875, 1900-1950, 1925-1975, 1950-2000, 2025-2075, 2050-2100, 2075-2125, 2100-2150, 2125-2175, 2200-2250, 2225-2275, 2300-2350, 2325-2375, 2400-2450, 2425-2475, 2500-2550, 2525-2575, 2600-2650, 2625-2675, 2700-2750, 2725-2775, 2800-2850, 2825-2875, 2900-2950, 2925-2975, 2950-3000, 3025-3075, 3050-3100, 3075-3125, 3100-3150, 3125-3175, 3200-3250, 3225-3275, 3300-3350, 3325-3375, 3400-3450, 3425-3475, 3500-3550, 3525-3575, 3600-3650, 3625-3675, 3700-3750, 3725-3775, 3800-3850, 3825-3875, 3900-3950, 3925-3975, or 3950-4000 nucleotides long, or any integer contained therein (e.g., 51, 52, 53, 54, 55, etc.). In some embodiments, an intron or intronic sequence is approximately 50-60, 55-65, 60-70, 65-75, 70-80, 75-85, 80-90, 95-105, 100-110, 105-115, 110-120, 115-125, 120-130, 125-135, 130-140, 135-145, 140-150, 145-155, 150-160, 155-165, 160-170, 165-175, 170-180, 175-185, 180-190, 185-195, or 190-200 nucleotides long, or any integer contained therein (e.g., 100, 101, 102, 103, 104, 105, etc.). In some embodiments, an intron or intronic sequence is approximately 50-80, 60-90, 70-100, 80-110, 90-120, 100-130, 110-140, 120-150, 130-160, 140-170, 150-180, 160-190, or 170-200 nucleotides long, or any integer contained therein (e.g., 120, 121, 122, 123, 124, 125, etc.).
In some embodiments, a natural or wild-type intron is truncated or otherwise modified so as to retain only the sequence which regulates the up- or down-stream alternative exon. In some embodiments, said regulatory sequence is located within approximately 100-300 nucleotides upstream or downstream of the exon-intron (or intron-exon) border. In some embodiments, said regulatory sequence is located within approximately 100-110, 105-115, 110-120, 115-125, 120-130, 125-135, 130-140, 135-145, 140-150, 145-155, 150-160, 155-165, 160-170, 165-175, 170-180, 175-185, 180-190, 185-195, 190-200, 205-215, 210-220, 215-225, 220-230, 225-235, 230-240, 235-245, 240-250, 245-255, 250-260, 255-265, 260-270, 265-275, 270-280, 275-285, 280-290, 285-295, or 290-300 nucleotides upstream or downstream of the exon-intron (or intron-exon) border. In some embodiments, said regulatory sequence is located within approximately 100-130, 110-140, 120-150, 130-160, 140-170, 150-180, 160-190, 170-200, 210-240, 220-250, 230-260, 240-270, 250-280, 260-290, or 270-300 nucleotides upstream or downstream of the exon-intron (or intron-exon) border.
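As a non-limiting illustration of the internal truncation described above, the following Python sketch removes the interior of an intron while keeping a window of sequence adjacent to each exon border. The function name `truncate_intron` and the default window sizes of 200 nucleotides are assumptions chosen from within the 100-300 nucleotide range stated above; they are not values taught for any particular cassette, and a real design would also need to confirm that the branch point and regulatory motifs fall inside the retained windows.

```python
def truncate_intron(intron: str, keep_5p: int = 200, keep_3p: int = 200) -> str:
    """Delete internal intronic sequence, retaining the border-proximal windows.

    keep_5p: nucleotides retained at the 5' end (adjacent to the upstream exon).
    keep_3p: nucleotides retained at the 3' end (adjacent to the downstream exon).
    Both defaults are illustrative assumptions, not values from the disclosure.
    """
    if keep_5p < 0 or keep_3p < 0:
        raise ValueError("window sizes must be non-negative")
    # Nothing to remove if the two windows already cover the whole intron.
    if keep_5p + keep_3p >= len(intron):
        return intron
    # Slice by explicit index so that keep_3p == 0 keeps nothing at the 3' end.
    return intron[:keep_5p] + intron[len(intron) - keep_3p:]


# Hypothetical 1,004-nt intron: GU donor, placeholder interior, AG acceptor.
full_length = "GU" + "N" * 1000 + "AG"
shortened = truncate_intron(full_length)
```

In this sketch the 1,004-nucleotide input is reduced to 400 nucleotides while both terminal splice-site dinucleotides are preserved, which mirrors the goal of shrinking a construct toward the packaging limit of the chosen virus.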
In some embodiments, the only intron that is comprised within an alternatively-spliced exon cassette is a truncated regulated intron. A regulated intron may in some embodiments be a regulated intron that flanks the alternative exon in its natural or wild-type context. In some embodiments, two regulated introns flank the alternative exon in its natural or wild-type context. A regulated intron may be located 5’ or 3’ relative to the alternative exon in its natural or wild-type context. In some embodiments, a regulated intron or truncated regulated intron is 5' relative to the alternative exon within an alternative exon cassette of the disclosure. In some embodiments, a regulated intron or truncated regulated intron is 3’ relative to the alternative exon within an alternative exon cassette of the disclosure. In some embodiments, two or more regulated introns are retained and truncated in an alternatively-spliced exon cassette. In some embodiments, the two or more truncated regulated introns flank the alternative exon within the alternative exon cassette. In some embodiments, all other (e.g., non-regulatory) introns and intronic sequences have been removed. However, in some embodiments, one or more of the other introns (e.g., the introns that are not subject to regulated splicing) or intronic sequences may be retained (and optionally truncated) depending on the size of the nucleic acid and the size limitations of the virus, respectively. In some embodiments, the only introns or intronic sequences in an alternatively-spliced exon cassette are truncated introns or intronic sequences (e.g., only one, 2, 3, 4, 5, 6, 7, 8, 9, 10 truncated introns or intronic sequences). In some embodiments, an alternatively-spliced exon cassette does not contain any full-length introns. In some embodiments, an alternatively-spliced exon cassette does not contain any truncated introns or intronic sequences that are not regulated.
In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) comprise an intron or intronic sequence from or derived from a gene selected from the group consisting of: ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, and/or ZNF496.
In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) comprise an intron or intronic sequence from or derived from a gene selected from the group consisting of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) is or is derived from an intron of BIN1. In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) is or is derived from intron 10 and/or intron 11 of BIN1. In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 15. In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 15. In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 16. In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 16.
In some embodiments, the intron(s) or intronic sequence(s) flanking an alternative exon(s) comprise an intron or intronic sequence comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2121, 2130, 2141, or 2232-2233. In some embodiments, an intron or intronic sequence comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2121, 2130, 2141, or 2232-2233.
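For illustration, a percent-identity threshold of the kind recited above can be evaluated as in the following Python sketch. The sketch assumes the two sequences have already been aligned by a separate tool and are supplied as equal-length strings with "-" marking gaps; it counts identical non-gap columns over the full alignment length, which is only one of several conventions for reporting identity. The function names and the example sequences are hypothetical and are not any SEQ ID NO of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Inputs must be pre-aligned (equal length, '-' for gaps). Gap columns count
    toward the denominator but never as matches; this convention is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must be non-empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the pair reaches at least the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-column alignment with a single mismatch at the last position.
identity = percent_identity("ACGUACGUAC", "ACGUACGUAG")
```

Because the result depends on how the alignment was produced and on how gaps are scored, a value computed this way is a screening aid rather than a determination of whether a sequence falls within a recited identity range.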
In some embodiments, all the introns (or portion(s) thereof) and exons (or portion thereof) of an alternatively-spliced exon cassette are from the same gene. Some embodiments of the present invention contemplate heterologous gene constructs, wherein introns (or portion(s) thereof) and exons (or portion(s) thereof) from different genes are integrated into a single alternatively-spliced exon cassette or transgene. In some embodiments, at least one intron (or portion thereof) and at least one exon (or portion thereof) of the nucleic acid construct are from different genes.
In some embodiments, an intron (or portion thereof) and/or an exon (or portion thereof) is from or derived from a gene(s) which comprises any one or more of: MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, a microdystrophin-encoding gene, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, TPM2, TNNT2, CFL2, KBTBD13, KLHL40, KLHL41, LMOD3, MYPN, SEPN1, TTN, SPEG, MYH7, TK2, POLG1, GAA, AGL, PYGM, SLC22A5, OCTN2, ETF, ETFH, PNPLA2, a cytochrome b oxidase-encoding gene, a cytochrome c oxidase-encoding gene, CLCN1, SCN4A, DMPK, CNBP, MYOT, LMNA, CAV3, DNAJB6, DES, TNPO3, HNRPDL, CAPN3, DYSF, an alpha-sarcoglycan-encoding gene, a beta-sarcoglycan-encoding gene, a gamma-sarcoglycan-encoding gene, a delta-sarcoglycan-encoding gene, TCAP, TRIM32, FKRP, FXN, POMT1, FKTN, POMT2, POMGnT1, DAG1, ANO5, PLEC1, TRAPPC11, GMPPB, ISPD, LIMS2, POPDC1, TOR1AIP1, POGLUT2, LAMA2, COL6A1, POMT1, POMT2, DUX4, EMD, PAX7, PMP22, MPZ, MFN2, SMCHD1, SMN1, and/or GJB1.
In some embodiments, an intron (or portion thereof) and/or an exon (or portion thereof) is from or derived from a gene(s) which comprises any one or more of: ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, and/or ZNF496.
In some embodiments, an intron (or portion thereof) and/or an exon (or portion thereof) is from or derived from a gene(s) which comprises any one or more of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
In some embodiments, one or more introns (or portions thereof) and/or an exon (or portion thereof) is from or derived from BIN1.
In some embodiments, the one or more introns (or portions thereof) is or is derived from an intron(s) of BIN1. In some embodiments, the one or more introns (or portions thereof) is or is derived from intron 10 and/or intron 11 of BIN1. In some embodiments, the one or more introns (or portions thereof) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 15. In some embodiments, the one or more introns (or portions thereof) which is or is derived from intron 10 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 15. In some embodiments, the one or more introns (or portions thereof) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 16. In some embodiments, the one or more introns (or portions thereof) which is or is derived from intron 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 16.
In some embodiments, an exon (or portion thereof) is or is derived from exon 11 of BIN1. In some embodiments, the exon (or portion thereof) which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 37. In some embodiments, the exon (or portion thereof) which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 37. In some embodiments, the exon (or portion thereof) which is or is derived from exon 11 of BIN1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 38. In some embodiments, the exon (or portion thereof) which is or is derived from exon 11 of BIN1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 38.
In some embodiments, the one or more introns (or portions thereof) and/or the exon (or portion thereof) which are from or derived from BIN1 together comprise an alternative exon cassette. In some embodiments, the alternative exon cassette (which comprises the one or more introns (or portions thereof) and/or the exon (or portion thereof) which are from or derived from BIN1) comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778. In some embodiments, the alternative exon cassette (which comprises the one or more introns (or portions thereof) and/or the exon (or portion thereof) which are from or derived from BIN1) comprises a polynucleotide having a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
In some embodiments, an alternative exon cassette (e.g., which comprises the one or more introns (or portions thereof) and/or the exon (or portion thereof) which are from or derived from BIN1) is selected for inclusion in a transgene based on the psi values which the alternative exon cassette achieves in a specific tissue of interest (see, e.g., Table 4; Table 5). For example, if the coding region of the transgene encodes a protein which would be therapeutically useful in skeletal tissue (e.g., MTM1), but which would not be desirable to express in heart tissue, the alternative exon cassette selected for inclusion in a transgene would be one wherein a high psi value is observed for skeletal tissue, and wherein a low psi value is observed for heart tissue (e.g., the Δ psi between skeletal tissue and heart tissue is large). In some embodiments, wherein the coding region of the transgene encodes a protein which would be therapeutically useful in skeletal tissue (e.g., MTM1), but which would not be desirable to express in heart tissue, the alternative exon cassette selected for inclusion in a transgene would be one wherein a high psi value is observed for skeletal tissue. In some embodiments, wherein the coding region of the transgene encodes a protein which would be therapeutically useful in skeletal tissue (e.g., MTM1), but which would not be desirable to express in heart tissue, the alternative exon cassette selected for inclusion in a transgene would be one wherein a low psi value is observed for heart tissue. As will be understood, the alternative exon cassette which is included in a transgene may be selected based on a variety of factors including, but not limited to: the identity of the protein cargo to be encoded by the coding region of interest; the Δ psi observed between a first tissue (or condition, etc.) which is of interest and a second tissue (or condition, etc.) which is not of interest; the psi observed in a tissue (or condition, etc.) which is of interest; and/or the psi observed in a tissue (or condition, etc.) which is not of interest. However, various other factors may also impact which alternative exon cassette is selected for inclusion in a transgene, as described throughout the disclosure.
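The selection logic described above can be sketched, by way of example only, as a ranking of candidate cassettes by the difference in psi between a tissue of interest and a tissue not of interest. In the following Python sketch the function name `rank_cassettes`, the cassette names, the tissue labels, and every psi value are hypothetical placeholders; none of them is taken from Table 4 or Table 5, and psi is assumed to be expressed as a fraction between 0 and 1.

```python
from typing import Dict, List, Tuple


def rank_cassettes(
    psi_table: Dict[str, Dict[str, float]],
    target: str,
    off_target: str,
    min_target_psi: float = 0.0,
    max_off_target_psi: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank cassettes by delta psi = psi(target tissue) - psi(off-target tissue).

    Optional cutoffs discard cassettes whose psi is too low in the tissue of
    interest or too high in the tissue not of interest.
    """
    ranked = []
    for name, psi in psi_table.items():
        on, off = psi[target], psi[off_target]
        if on >= min_target_psi and off <= max_off_target_psi:
            ranked.append((name, on - off))
    # Largest delta psi first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical psi values, invented solely for this example.
candidates = {
    "cassette_A": {"skeletal": 0.90, "heart": 0.10},
    "cassette_B": {"skeletal": 0.80, "heart": 0.60},
    "cassette_C": {"skeletal": 0.50, "heart": 0.05},
}
order = [name for name, _ in rank_cassettes(candidates, "skeletal", "heart")]
```

Ranking on delta psi alone favors cassette_C over cassette_B in this invented data even though cassette_B has the higher skeletal psi, which is why the sketch also exposes separate cutoffs for the psi in each tissue; the other factors noted above, such as the identity of the protein cargo, fall outside what this sketch models.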
In some embodiments, an intron (or portion thereof) and/or an exon (or portion thereof) is from or derived from SMN1.
In some embodiments, an intron(s) is or is derived from intron 6 and/or intron 7 of SMN1. In some embodiments, the intron which is derived from SMN1 intron 6 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 intron 6. In some embodiments, the intron which is derived from SMN1 intron 6 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 103. In some embodiments, the intron which is derived from SMN1 intron 6 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 103. In some embodiments, the intron which is derived from SMN1 intron 7 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 intron 7. In some embodiments, the intron which is derived from SMN1 intron 7 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 104. In some embodiments, the intron which is derived from SMN1 intron 7 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 104.
In some embodiments, an exon is or is derived from exon 6 of SMN1. In some embodiments, the exon which is derived from SMN1 exon 6 is a fragment of (e.g., is truncated relative to) the wild-type or naturally occurring sequence of SMN1 exon 6. In some embodiments, the exon which is derived from SMN1 exon 6 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 102. In some embodiments, the exon which is derived from SMN1 exon 6 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 102.
(iv) Positive or negative regulatory cis-element
In other embodiments, the recombinant viral genomes of the present disclosure comprise one or more regulatory sequences. In some embodiments, the regulatory sequences impart a positive control on the expression of a coding sequence of interest. In other embodiments, the regulatory sequences impart a negative control on the expression of a coding sequence of interest. Regulatory sequences may be present, inserted, or otherwise included in an alternatively-spliced exon. Such sequences may be referred to as positive or negative regulatory control cis-elements or “regulatory cis-elements” or merely as “cis-elements.”
The one or more cis-elements located within an alternatively-spliced exon and which may influence the level of expression of a coding region of interest through positive and/or negative controls may comprehensively include any genetic element which exerts, as a consequence of being spliced-in or spliced-out of the final mRNA, either a positive or negative regulation on the expression of the coding region. Non-limiting examples of positive or negative regulatory cis-elements located within the alternatively-spliced exons can include, without limitation, a translation start codon, a translation stop codon, a ligand-responsive aptamer, a binding site for an RNA binding protein that serves to positively regulate mRNA translation, a binding site for an RNA binding protein that serves to negatively regulate mRNA translation, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate mRNA translation, or a binding site for a nucleic acid molecule (e.g., a miRNA or an siRNA) that serves to negatively regulate mRNA stability or degradation, a binding site for an RNA binding protein that serves to positively regulate mRNA stability or degradation, a binding site for an RNA binding protein that serves to negatively regulate mRNA stability or degradation, a binding site for a nucleic acid molecule (e.g., an miRNA) that serves to positively regulate mRNA stability or degradation, a binding site for a nucleic acid molecule (e.g., an siRNA) that serves to negatively regulate mRNA stability or degradation, a nuclease recognition site, a sequence that can form a secondary structure that slows down translation (for example a stem loop that delays the ribosome), or a sequence that can form a secondary structure that promotes translation.
This list of examples is not intended to place any limitation on the scope and meaning of the positive and negative cis-elements, and the disclosure embraces any genetic element or region positioned within or at least associated with an alternatively-spliced exon which exerts a positive or negative control on the overall expression of a coding region of the transgene (e.g., encoding a therapeutic protein).
In some cases, the cis-element is located within the alternatively-spliced exon, but in other cases, the cis-element is separate from, but at least associated with, the alternatively-spliced exon, such that it is spliced-in or spliced-out at the same time as the alternatively-spliced exon. Non-limiting examples of positive or negative regulatory cis-elements can include, for instance, (1) a nucleotide sequence element that regulates, modulates, or otherwise affects the stability and/or degradation of an mRNA; and (2) a nucleotide sequence element that regulates, modulates, or otherwise affects the translation of an mRNA into one or more encoded polypeptide products (e.g., a therapeutic product).
In some embodiments, the one or more cis-elements can include, but are not limited to, a translation start codon, a translation stop codon, an siRNA binding site, an miRNA binding site, a sequence forming a stem-loop structure, a sequence forming an RNA dimerization motif, a sequence forming a hairpin structure, a sequence forming an RNA quadruplex, a polypurine tract, a sequence forming a pair of kissing loops, and a sequence forming a tetraloop/tetraloop receptor pair. In some embodiments, cis-elements include binding sites recognized by regulatory elements, such as, for example, RNA binding proteins.
In some embodiments, an RNA binding protein may be involved in binding to one or more positive or negative cis-elements and, as such, may be involved in regulating the expression of the coding region of interest.
In some embodiments, the RNA binding protein is a sequence-specific RNA binding protein. In some embodiments, a useful sequence-specific RNA binding protein binds to a target sequence with a binding affinity (e.g., Kd) of 0.01-1000 nM or less (e.g., 0.01-1, 1-10, 10-50, 50-100, 100-500, or 500-1,000 nM). In some embodiments, an RNA binding protein has serine/arginine domains that act as splicing enhancers, or glycine-rich domains that act as splicing repressors. In some embodiments, an RNA binding protein acts as an intronic splicing enhancer, intronic splicing silencer, exonic splicing enhancer, or exonic splicing silencer. Different types of sequence-specific RNA binding proteins can be used. In some embodiments, a sequence-specific RNA binding protein is one that contains zinc fingers, RNA recognition motifs, KH domains, DEAD-box domains, or dsRBDs. Non-limiting examples of RNA binding proteins that contain zinc fingers include MBNL, TIS11, and TTP. Non-limiting examples of RNA binding proteins that contain RNA recognition motifs include hnRNPs, SR proteins, RbFox, PTB, and Tra2beta. Non-limiting examples of RNA binding proteins that contain KH domains include Nova, SF1, and FBP. Non-limiting examples of RNA binding proteins that contain DEAD-box domains are DDX5, DDX6, and DDX17. Non-limiting examples of RNA binding proteins that contain dsRBDs include ADAR, Staufen, and TRBP.
Further examples of these types of RNA binding proteins and their respective sequence-specific binding motifs are known in the art, and can be found, for example, in Perez-Perri, J. I., et al., (2018), Nat. Comm., 9:4408; Van Nostrand, E. L., et al., (2020), Nature, 583, 711-19; and Corley, M., et al., (2020), Cell, (20): 30159-3, the contents of which are hereby incorporated by reference with respect to RNA protein binding sites and RNA binding proteins.
(v) Splicing factors
In some embodiments, the recombinant viral vector genomes may further comprise one or more regulatory sequences and/or genes encoding factors that regulate splicing, including splicing of the alternatively-spliced exon.
In some embodiments, the regulatory gene encodes a tissue-specific RNA binding protein, an autoregulatory RNA binding protein, or a condition-specific RNA binding protein. In some embodiments, the protein auto-regulates splicing of the mRNA encoded by the recombinant viral genome. In some embodiments, splicing can be regulated by two or more different splice regulatory proteins that bind to splicing regulatory regions. For example, in some embodiments, NRAP exon 12 is highly included in skeletal muscle but absent in heart. In some embodiments, TPM2 exon 2 is low in heart but high in smooth muscle. In some embodiments, SLC25A3 is very high in heart but low in brain. Many other examples can be found in the literature, and one example of a list of such “switch-like exons” can be found in Wang, E. T., et al., (2008), Nature, 456(7221):470-6. Such sequences may be included in the recombinant viral genomes to further regulate splicing under certain desired conditions. In some embodiments, the recombinant viral genome may further encode a splice-regulatory protein, which can include, for instance, an MBNL protein, an SR protein (e.g., SRSF1, SRSF2, SRSF3, SRSF4, SRSF5, SRSF6, SRSF7, SRSF8, SRSF9, SRSF10, SRSF11, or SRSF12), an hnRNP protein, an RbFox protein, a CELF protein, a Nova protein, or a PTB protein.
In some embodiments, the viral vectors may also encode a splicing factor in the form of an RNA, which may comprise a regulatory RNA molecule, a short hairpin RNA molecule (shRNA), a microRNA molecule, a transfer RNA molecule (tRNA), or an RNA that comprises a DMPK-targeting shRNA or microRNA. The RNA that regulates splicing may also comprise a repeat-targeting shRNA or microRNA (e.g., a CUG shRNA, CAG shRNA, or GGGGCC shRNA), e.g., which targets an RNA binding protein or other member of a related biological pathway.
In some embodiments, the viral vectors may also encode a splicing factor that comprises a protein-RNA complex, wherein the protein-RNA complex comprises a ribosome, an snRNP complex, or another macromolecular complex that can interact with RNA to regulate splicing decisions. In some embodiments, wherein the intracellular factor comprises a protein-RNA complex, the snRNP complex comprises U1 snRNP or U2 snRNP. In some embodiments, wherein the intracellular factor comprises a protein-RNA complex, the RNA comprises a ribozyme that targets one or more CUG repeats. In some embodiments, wherein the intracellular factor comprises a protein-RNA complex, the RNA comprises a ribozyme that targets specific mRNAs.
Non-limiting examples of RNA binding protein motifs and RNA target sequences that can confer or regulate splicing activity are described, for example, in Ray, D., et al., (2014), Nature, 499(7457): 172-77; Lambert, N., et al., (2014), Mol. Cell., 54(5): 887-900; and Van Nostrand, E. L., et al., (2020), Nature, and may be incorporated in the recombinant viral vector genomes described herein to further regulate splicing activity.
(vi) Nonsense mediated decay (NMD) exons
In some embodiments, the recombinant viral vector genomes may comprise an alternatively-spliced exon cassette configured to regulate expression of a coding region of interest by including a nonsense mediated decay (NMD) exon (e.g., an alternative exon comprising a heterologous stop codon) within the RNA. In certain embodiments, the NMD exon is flanked by introns (or portion(s) thereof) for which alternative splicing is regulated. In some embodiments, an NMD exon is an exon that encodes at least one stop codon that is in frame with a previous exon, wherein the stop codon is upstream (5’) from the 3’ splice site of the exon. In various embodiments, the in-frame stop codon is inserted at least 100 nucleotides, at least 95 nucleotides, at least 90 nucleotides, at least 85 nucleotides, at least 80 nucleotides, at least 75 nucleotides, at least 70 nucleotides, at least 65 nucleotides, at least 60 nucleotides, at least 55 nucleotides, at least 50 nucleotides, at least 45 nucleotides, at least 40 nucleotides, at least 35 nucleotides, at least 30 nucleotides, at least 25 nucleotides, at least 20 nucleotides, at least 15 nucleotides, at least 10 nucleotides, or at least 5 nucleotides, or between 1 and 5 nucleotides, upstream of the next 5’ splice junction.
In some embodiments, if the NMD exon is included in the spliced RNA, it causes degradation of the RNA via nonsense-mediated decay. In some embodiments, if the NMD exon is spliced out, the resulting transcript is stable, and in some embodiments encodes a functional (e.g., full-length) protein of interest.
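By way of a non-limiting illustration, the placement of an in-frame stop codon relative to the downstream 5’ splice junction can be checked computationally. The following is a minimal sketch only; the function names, the `phase` parameter, and the example exon sequence are hypothetical and are not taken from the disclosure. The sketch assumes that the exon is supplied as a DNA string and that the 3’ end of the exon corresponds to the next 5’ splice junction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(exon, phase=0):
    """Locate the first in-frame stop codon in an exon (hypothetical helper).

    phase is the number of leading exon nucleotides that complete a codon
    started in the previous exon (0, 1, or 2), so that codons are read in
    frame with that previous exon.
    Returns (stop_index, distance_to_exon_end) or None if no stop is found.
    """
    exon = exon.upper()
    for i in range(phase, len(exon) - 2, 3):
        if exon[i:i + 3] in STOP_CODONS:
            # Nucleotides between the end of the stop codon and the exon's 3' end
            return i, len(exon) - (i + 3)
    return None


def meets_min_distance(exon, min_nt, phase=0):
    """True if an in-frame stop lies at least min_nt upstream of the exon's 3' end."""
    hit = first_in_frame_stop(exon, phase)
    return hit is not None and hit[1] >= min_nt


# Illustrative exon only: one sense codon, a TAA stop, then 60 nt of filler
example_exon = "GCTTAA" + "GCT" * 20
print(first_in_frame_stop(example_exon))      # (3, 60)
print(meets_min_distance(example_exon, 50))   # True
```

The distance thresholds recited above (e.g., at least 50 nucleotides) can be passed as `min_nt`; whether a given distance actually elicits nonsense-mediated decay in a particular cell type is an empirical question that this sketch does not address.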
In some embodiments, an alternatively-spliced exon cassette for which splicing is regulated is a construct configured to regulate expression of a protein by including a 5’ exon comprising an amino terminal amino acid encoding sequence (e.g., an ATG or part of the ATG) and/or translation control sequences, wherein the 5’ exon is separated from subsequent exon(s) by an intron for which splicing is regulated. In some embodiments, if the intron is spliced out of the RNA transcript, the recombinant 5’ exon is spliced in frame to the subsequent exon(s) and the resulting spliced transcript encodes a protein that is expressed. In some embodiments, if the intron is not spliced out of the RNA transcript, the recombinant 5’ exon is not spliced to the subsequent exon(s) and as a result a protein is not expressed from the transcript.
In some embodiments, an intron (or portion thereof) for which splicing is regulated can be included within a gene that encodes a regulatory RNA (e.g., an siRNA). In some embodiments, an intron(s) (or portion thereof) for which splicing is regulated and that encodes regulatory RNA(s) can be included in an alternatively-spliced exon cassette encoding an RNA transcript.
(vii) Transgenes and coding regions thereof
In various embodiments, the recombinant genomes disclosed herein may comprise one or more transgenes. A transgene may be recombinant (or “synthetic”), and may be modified to comprise an alternatively-spliced exon or an alternatively-spliced exon cassette described herein (e.g., see FIG. 1) such that the expression of the transgene or coding region of interest comes under the regulatory control of the alternatively-spliced exon or the presence of a ligand. A transgene (e.g., a coding region of a transgene) may encode any therapeutic agent, including, but not limited to, a therapeutic protein, an antibody or fragment thereof, a bispecific antibody or fragment thereof, antigen-binding fragments, a nucleic acid molecule-based therapeutic (e.g., an siRNA, a microRNA, or an oligonucleotide), genome editing components (e.g., CRISPR/Cas9 based proteins and protein fusion and guide RNA molecules), and complexes (e.g., nucleoprotein complexes).
A coding region of a transgene may be naturally-occurring, and may in some embodiments comprise no nucleic acid modifications, relative to the coding region of a wild-type gene. In some embodiments, a coding region of a transgene may be synthetic. The coding region of a transgene may be considered synthetic if it undergoes one or more nucleic acid modifications, relative to the coding region of a wild-type gene. A nucleic acid modification may be a substitution or deletion of one or more nucleotides that form the nucleic acid sequence of the coding region of the transgene. In some embodiments, the modification comprises disrupting or deleting a native start codon located at the 5’ end of the coding region of the transgene. In some embodiments, the modification comprises the insertion of an alternatively-spliced exon into the coding region of the transgene.
In some embodiments, the coding region of the transgene may comprise one or more nucleic acid modifications (e.g., substitutions) such that the coding region comprises a “barcode” sequence. Barcode sequences may be useful in some embodiments to characterize the identity of the transgene (e.g., a transgene comprising a BIN1 alternative exon cassette and MTM1 coding sequence), for example when multiple transgenes are being tested together. In some embodiments, the wobble positions of five codons within the coding region of the transgene are modified to produce a barcode sequence. As will be understood, a “wobble position” is the third nucleotide of a codon. Nucleotides lying at wobble positions can be modified without altering the identity of the amino acid encoded by the associated codon (see FIG. 13, SEQ ID NO: 63). Thus, in some embodiments, the third nucleotide of each of five consecutive codons in the coding region of the transgene is modified (e.g., 5 total substitutions are made, SEQ ID NOs: 65-75). In some embodiments, said modifications result in the formation of a barcode sequence which is 5 nucleotides in length. In some embodiments, the resultant barcode sequence is unique to the transgene within which it is comprised, and can be used to characterize the identity of said transgene.
In some embodiments, the five codons which are modified are located approximately 350 nucleotides from the 5’ end of the coding region of the transgene. In some embodiments, the five codons which are modified are located approximately 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, or 550 nucleotides from the 5’ end of the coding region of the transgene. In some embodiments, the five codons which are modified are located approximately 100-130, 120-150, 140-170, 160-190, 180-210, 200-230, 220-250, 240-270, 260-290, 280-310, 300-330, 320-350, 340-370, 370-400, 390-420, 410-440, 430-460, 450-480, 470-500, 490-520, 510-540, or 530-560 nucleotides from the 5’ end of the coding region of the transgene.
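For purposes of illustration only, the wobble-position barcoding described above can be sketched as follows. The sketch assumes the standard genetic code; the function names and the example coding sequence are hypothetical and do not correspond to any SEQ ID NO of the disclosure. Note that a codon such as ATG (Met) or TGG (Trp) has no synonymous third-position alternative, so the sketch raises an error rather than altering the encoded amino acid.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a codon-aligned DNA coding sequence ('*' marks a stop)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def synonymous_wobble_variants(codon):
    """Codons differing from `codon` only at the third base, same amino acid."""
    aa = CODON_TABLE[codon]
    return [codon[:2] + b for b in BASES
            if b != codon[2] and CODON_TABLE[codon[:2] + b] == aa]


def barcode_cds(cds, start, n_codons=5):
    """Substitute the wobble base of n_codons consecutive codons from `start`.

    `start` is a 0-based, codon-aligned offset (hypothetical convention).
    The first available synonymous variant is chosen for each codon.
    """
    if start % 3 != 0:
        raise ValueError("start must be codon-aligned")
    seq = list(cds.upper())
    for k in range(n_codons):
        i = start + 3 * k
        codon = "".join(seq[i:i + 3])
        variants = synonymous_wobble_variants(codon)
        if not variants:
            raise ValueError(f"no synonymous wobble change for {codon} at {i}")
        seq[i + 2] = variants[0][2]
    return "".join(seq)


# Illustrative sequence only
original = "ATGGCTCTGGGAACCGTTTAA"
barcoded = barcode_cds(original, start=3)
print(barcoded)                                     # ATGGCCCTTGGTACTGTCTAA
print(translate(original) == translate(barcoded))   # True
```

In practice a start offset near the positions recited above (e.g., approximately 350 nucleotides from the 5’ end) would be chosen, and a distinct set of substitutions would be selected per transgene so that each barcode is unique; the sketch simply takes the first synonymous option at each codon.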
In some embodiments, a coding region of a transgene may naturally comprise one or more internal, out-of-frame ATG start codons. As will be understood, in the splicing condition wherein the alternative exon (comprising an ATG start codon at its 3’ end) is spliced-out, translation of the coding region via an alternate, out-of-frame ATG start codon located within the coding region of the transgene would be undesirable. However, any modification made to the coding region of the transgene must also preserve translation of the full-length protein when the alternative exon is spliced-in. Accordingly, in some embodiments one or more modifications are made to the coding region of the transgene which preserve translation of the full-length protein in the condition wherein the alternative exon is spliced-in, but which disrupt or terminate translation of the full-length protein in the condition wherein the alternative exon is spliced-out. In some embodiments, one or more nucleic acid substitutions are made within the coding region of the transgene to introduce one or more heterologous stop codons located downstream of (e.g., 3’ relative to) one or more of the internal, out-of-frame start codons located within the coding region of the transgene. As will be understood, such substitutions may comprise the substitution of 1, 2, or 3 nucleic acids to produce any of a TAA, TGA, or TAG stop codon, depending on the nucleic acids which are naturally present at the desired location within the coding sequence. Additionally or alternatively, in some embodiments a 3’ UTR intron is included in the transgene which elicits nonsense-mediated decay in the condition wherein the alternative exon is spliced-out (such that translation of the full-length protein is disrupted or terminated), but which preserves translation of the full-length protein in the condition wherein the alternative exon is spliced-in.
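As a non-limiting sketch of how candidate sites for such substitutions might be identified, the following scans a coding region for internal ATG codons that are out of frame with the main reading frame and reports, for each, the first stop codon already present in that alternate frame (if any). The function name and example sequences are hypothetical. The sketch only locates candidates; it does not choose substitutions, and in particular it does not verify that a proposed substitution is synonymous in the main reading frame, which would be required to preserve the full-length protein.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def out_of_frame_atgs(cds):
    """List internal out-of-frame ATGs and their first downstream in-frame stop.

    Frame 0 is taken to be the main reading frame of `cds`.
    Returns a list of (atg_index, stop_index_or_None) tuples, 0-based.
    """
    cds = cds.upper()
    hits = []
    for i in range(len(cds) - 2):
        if cds[i:i + 3] == "ATG" and i % 3 != 0:
            stop = None
            # Walk codon by codon in the frame set by this ATG
            for j in range(i + 3, len(cds) - 2, 3):
                if cds[j:j + 3] in STOP_CODONS:
                    stop = j
                    break
            hits.append((i, stop))
    return hits


# Illustrative sequences only (main frame: ATG GCA TGG CT? AAG TGA)
print(out_of_frame_atgs("ATGGCATGGCTTAAGTGA"))  # [(5, 11)]
print(out_of_frame_atgs("ATGGCATGGCTGAAGTGA"))  # [(5, None)]
```

An entry whose second element is `None` marks an out-of-frame ATG with no terminating stop codon within the coding region, i.e., a location where introducing a heterologous stop codon as described above may be considered.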
In some embodiments, the coding region or at least one of the exons of the transgene is from or is derived from a coding region from a gene selected from the group consisting of: MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, microdystrophin, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, I PX 12, TNNT2, CFL2, KBTBD13, KLHL40, KLHL41, LMOD3, MYPN, SEPN1, TTN, SPEG, MYH7, TK2, POLG1, GAA, AGE, PYGM, SLC22A5, OCTN2, ETF, ETFH, PNPLA2, cytochrome b/cytochrome c oxidase, CLCN1, SCN4A, DMPK, CNBP, MYOT, LMNA (Lamin A/C), CAV3, DNAJB6, DES, TNPO3, HNRPDL, CAPN3, DYSF, alpha-sarcoglycan, beta-sarcoglycan, gamma-sarcoglycan, delta-sarcoglycan, TCAP, TRIM32, FKRP, FXN, POMT1, FKTN, POMT2, POMGnT1, DAG1, ANO5, PLEC1, TRAPPC11, GMPPB, ISPD, LIMS2, POPDC1, TOR1AIP1, POGLUT2, LAMA2, COL6A1, POMT1, POMT2, DUX4, EMD, PAX7, PMP22, MPZ, MFN2, SMCHD1, or GJB1. In some embodiments, the coding region of the transgene is from or is derived from a coding region of FXN.
In some embodiments, the coding region or at least one exon of the transgene is from or is derived from a coding region of MTM1. In some embodiments, the coding region of the transgene which is or is derived from MTM1 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1881. In some embodiments, the coding region of the transgene which is or is derived from MTM1 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1881.
In some embodiments, the coding region of the transgene is from or is derived from a coding region of CAPN3. In some embodiments, the coding region of the transgene which is or is derived from CAPN3 comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1882. In some embodiments, the coding region of the transgene which is or is derived from CAPN3 comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1882.
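The sequence identity thresholds recited herein can be illustrated with a simple calculation. The following is a minimal sketch that assumes the two sequences have already been aligned to equal length by a separate alignment step, with gaps written as "-"; the disclosure does not specify an alignment method in this passage, and the function name and example sequences are hypothetical. Identity is computed here as matching columns divided by aligned columns, which is one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a precomputed pairwise alignment.

    Columns in which both sequences carry a gap are ignored; a gap opposite
    a nucleotide counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Illustrative sequences only
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(percent_identity("ACGT-CGT", "ACGTACGT"))      # 87.5
```

A candidate polynucleotide would satisfy an “at least 90%” threshold under this convention if the returned value is 90.0 or greater.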
In other embodiments, the transgene may encode one or more therapeutic proteins (e.g., a biologic or biosimilar thereof), including, but not limited to: adalimumab, rituximab, pegfilgrastim, infliximab, bevacizumab, trastuzumab, etanercept, and epoetin.
D. Packaging recombinant viral genomes into viral vectors
Aspects of the present disclosure provide for the packaging of the herein disclosed recombinant viral genomes into viral vectors (i.e., complete viral particles which may infect cells to deliver the recombinant genomes, and the concomitant expression of the transgenes in a manner dependent on the alternatively-spliced exons). Thus, in some embodiments a recombinant viral genome comprising an alternatively-spliced exon cassette as described herein is provided in a viral vector (e.g., an rAAV vector; a lentivirus vector). The viral vectors may include rAAV particles, lentivirus particles, or other viral vectors.
In some embodiments, the recombinant viral genomes packaged into the rAAV or lentiviral vectors further comprise a promoter. In some embodiments, the promoter is a constitutive promoter or a regulated promoter. In some embodiments, the regulated promoter is an inducible promoter. In some embodiments, the promoter comprises any one of: CMV, EF1alpha, CBh, synapsin, enolase, MECP2, MHCK7, Desmin, or GFAP.
In some embodiments, an MHCK7 promoter comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1880. In some embodiments, an MHCK7 promoter comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1880.
In some embodiments, the promoter is a ubiquitous promoter. In some embodiments, a ubiquitous promoter is a promoter selected from the group consisting of: an EF1alpha promoter, a beta actin promoter, CMV, CBh, and CAG promoter. In some embodiments, the promoter is a tissue-specific promoter, such as a muscle- or heart-biased promoter. In some embodiments, a tissue-specific promoter, such as a muscle- or heart-biased promoter, is a promoter selected from the group consisting of: a muscle creatine kinase promoter, a C5-12 muscle promoter, MHCK7, and Desmin. In some embodiments, the promoter is a neuronal-biased promoter. In some embodiments, a neuronal-biased promoter is a promoter selected from the group consisting of: synapsin and MECP2. In some embodiments, the promoter is an astrocyte-biased promoter. In some embodiments, an astrocyte-biased promoter is a GFAP promoter. Thus, in some embodiments, the nucleic acid comprises a promoter and sequence corresponding to an RNA molecule that is capable of being expressed from the nucleic acid.
In some embodiments, the recombinant viral genome is sufficiently small to be effectively packaged in an AAV viral particle (e.g., the gene construct may be around 0.5-5 kb long, for example around 4.9 kb, 4.8 kb, 4.7 kb, 4.6 kb, 4.5 kb, 4.4 kb, 4.3 kb, 4.2 kb, 4.1 kb, 4 kb, 3.5 kb, or 3 kb long). So as to fit into the AAV viral particle, in some embodiments a nucleic acid comprises one or more truncated and/or recombinant introns, as described elsewhere herein. Accordingly, a recombinant intron for an rAAV vector is typically shorter than 4 kb, but can be between around 20 bases long and around 2,000 bases long to provide space for other components (e.g., exons, regulatory sequences, other introns, viral packaging sequences) in the nucleic acid (e.g., recombinant gene) construct. In some embodiments a recombinant intron is around 50 bases, around 100 bases, around 250 bases, around 500 bases, around 1,000 bases, around 1,500 bases, or around 2,000 bases long. In some embodiments, a recombinant intron is shorter than 4 kb, shorter than 3 kb, shorter than 2 kb, shorter than 1 kb, 100-900 bases long, or shorter than 500 bases long.
In some embodiments, the recombinant viral genome contains sufficient viral sequences for packaging in a viral vector (e.g., an rAAV particle). For example, in some embodiments a recombinant viral genome is flanked by viral sequences (for example, terminal repeat sequences) that are useful to package the recombinant viral genome in a viral particle (e.g., encapsidated by viral capsid proteins and/or an envelope, where appropriate). In some embodiments, the flanking terminal repeat sequences are rAAV inverted terminal repeats (ITRs). In some embodiments, the AAV ITR sequences comprise AAV1, AAV2, AAV5, AAV7, AAV8, or AAV9 ITR sequences. In some embodiments, the AAV ITR sequences comprise AAV2 ITR sequences. In some embodiments, an AAV2 ITR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1879. In some embodiments, an AAV2 ITR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1879. In some embodiments, the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106. In some embodiments, the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106.
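The size constraint described above amounts to a simple length budget, which may be sketched as follows. The 5,000-base limit reflects the upper end of the “around 0.5-5 kb” range stated above; the component names and every component length in the example are hypothetical placeholders chosen for illustration and are not taken from any construct of the disclosure.

```python
# Upper end of the "around 0.5-5 kb" construct size range recited in the text
AAV_MAX_CONSTRUCT_NT = 5000


def construct_length(components):
    """Total length of a construct given a mapping of component name -> length (nt)."""
    return sum(components.values())


def fits_in_aav(components, limit=AAV_MAX_CONSTRUCT_NT):
    """True if the summed component lengths do not exceed the packaging limit."""
    return construct_length(components) <= limit


def remaining_budget(components, limit=AAV_MAX_CONSTRUCT_NT):
    """Nucleotides still available (e.g., for recombinant introns); may be negative."""
    return limit - construct_length(components)


# Hypothetical component lengths, for illustration only
example = {
    "5' ITR": 145,
    "promoter": 800,
    "alternative exon cassette": 600,
    "coding region": 1800,
    "polyA signal": 200,
    "3' ITR": 145,
}
print(construct_length(example))   # 3690
print(fits_in_aav(example))        # True
print(remaining_budget(example))   # 1310
```

Under these placeholder values, roughly 1.3 kb would remain for recombinant introns, which is consistent with the use of truncated introns of around 20 to around 2,000 bases described above.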
In some embodiments, the recombinant viral genome is a lentivirus genome comprising a DNA molecule, wherein the DNA molecule comprises sequences that encode an RNA molecule.
(i) Manufacture of rAAV vectors
In some embodiments, the recombinant viral genome is encapsidated by an rAAV particle as described herein. The rAAV particle may be of any AAV serotype (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10), including any derivative (including non-naturally occurring variants of a serotype) or pseudotype. In some embodiments, the rAAV particle is an AAV8 particle, which may be pseudotyped with AAV2 ITRs. In some embodiments, an AAV2 ITR comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 1879. In some embodiments, an AAV2 ITR comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 1879.
Non-limiting examples of derivatives and pseudotypes include AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV218, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y73 IF), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShHIO, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45; or a derivative thereof. In some embodiments, the rAAV vector is of serotype AAV8. In some embodiments, the rAAV vector is pseudotyped. Such AAV serotypes and derivatives/pseudotypes, and methods of producing such derivatives/pseudotypes are known in the art (see, e.g.. Mol Ther. 2012 Apr;20(4):699-708. doi: 10.1038/mt.2011.287. 2012 Jan 24. The AAV vector toolkit: poised at the clinical crossroads. Asokan Al, Schaffer DV, Samulski RJ.). In some embodiments, the rAAV particle is a pseudotyped rAAV particle, which comprises (a) a nucleic acid vector comprising ITRs from one serotype (e.g., AAV2) and (b) a capsid comprised of capsid proteins derived from another serotype (e.g., AAV1, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAV10). Methods for producing and using pseudotyped rAAV vectors are known in the art (see, e.g., Duan el al., J. Virol., 75:7662-7671, 2001 , Halbert el al., J. Virol., 74:1524- 1532, 2000; Zolotukhin el al., Methods, 28:158-167, 2002; and Auricchio el al.. Hum. Molec. Genet, 10:3075-3081, 2001).
Exemplary' rAAV nucleic acid vectors useful according to the disclosure include singlestranded (ss) or self-complementary (sc) AAV nucleic acid vectors, such as single-stranded or self-complementary recombinant viral genomes.
Methods of producing rAAV particles and recombinant viral genomes are also known in the art and commercially available (see, e.g., Zolotukhin et al. Production and purification of serotype 1, 2, and 5 recombinant adeno-associated viral vectors. Methods 28 (2002) 158—167; and U.S. Patent Publication Numbers US20070015238 and US20120322861, which are incorporated herein by reference; and plasmids and kits available from ATCC and Cell Biolabs, Inc.), For example, a plasmid containing the recombinant viral genome may be combined with one or more helper plasmids, e.g., that contain a rep gene (e.g, encoding Rep78, Rep68, Rep52 and Rep40) and a cap gene (encoding VP1, VP2, and VP3, including a modified VP3 region), and transfected into a producer cel l line such that the nAAV particle can be packaged and subsequently purified.
In some embodiments, the one or more helper plasmids includes a first helper plasmid comprising a rep gene and a cap gene and a second helper plasmid comprising a El a gene, a E lb gene, a E4 gene, a E2a gene, and a MA gene. In some embodiments, the rep gene is a rep gene derived from AAV2 and the cap gene is derived from AAV2 and includes modifications to the gene in order to produce a modified capsid protein described herein. Helper plasmids, and methods of making such plasmids, are known in the art and commercially available (see, e.g., pDM, pDG, pDPlrs, pDP2rs, pDP3rs, pDP4rs, pDPSrs, pDP6rs, pDG(R484E/R585E), and pDPS.ape plasmids from PlasmidFactory, Bielefeld, Germany; other products and services available from Vector Biolabs, Philadelphia, PA; Cellbiolabs, San Diego, CA; Agilent Technologies, Santa Clara, Ca; and Addgene, Cambridge, MA; pxx6; Grimm et al. (1998), Novel Tools for Production and Purification of Recombinant Adenoassociated Vims Vectors, Human Gene Therapy, Vol. 9, 2745-2760, Kern, A. etal. (2003), Identification of a Heparin- Binding Motif on Adeno- Associated Virus Type 2 Capsids, Journal of Virology, Vol. 77, 11072- 1 1081.; Grimm et al. (2003), Helper Virus-Free, Optically Controllable, and Two-Plasmid-Based Production of Adeno-associated Virus Vectors of Serotypes 1 to 6, Molecular Therapy, Vol. 7, 839-850; Kronenberg et al. (2005), A Conformational Change in the Adeno-Associated Virus Type 2 Capsid Leads to the Exposure of Hidden VP1 N Termini, Journal of Virology, Vol. 79, 5296-5303; and Moullier, P. and Snyder, R.O. (2008), International efforts for recombinant adeno-associated viral vector reference standards, Molecular Therapy, Vol. 16, 1185-1188).
An exemplary, non-limiting, rAAV particle production method is described next. One or more helper plasmids are produced or obtained, which comprise rep and cap ORFs for the desired AAV serotype and the adenoviral V A, E2 A (DBP), and E4 genes under the transcriptional control of their native promoters. The cap ORF may also comprise one or more modifications to produce a modified capsid protein as described herein. HEK293 cells (available from ATCC®) are transfected via CaPO4-mediated transfection, lipids or polymeric molecules such as Polyethylenimine (PEI) with the helper plasmid(s) and a plasmid containing a nucleic acid vector described herein. The HEK293 cells are then incubated for at least 60 hours to allow for rAAV particle production. Alternatively, in another example Sf9-based producer stable cell lines are infected with a single recombinant baculovirus containing the nucleic acid vector. As a further alternative, in another example HEK293 or BHK cell lines are infected with a HSV containing the nucleic acid vector and optionally one or more helper HSVs containing rep and cap ORFs as described herein and the adenoviral VA, E2A (DBP), and E4 genes under the transcriptional control of their native promoters. The HEK293, BHK, or Sf9 cells are then incubated for at least 60 hours to allow for rAAV particle production. The rAAV particles can then be purified using any method known the art or described herein, e.g, by iodixanol step gradient, CsCl gradient, chromatography, or polyethylene glycol (PEG) precipitation.
As used herein, the terms “engineered” and “recombinant” cells are intended to refer to a cell into which an exogenous polynucleotide segment (such as DN A segment that, leads to the transcription of a biologically active molecule) has been introduced. Therefore, engineered cells are distinguishable from naturally occurring cells, which do not contain a recombinantly introduced exogenous DNA segment. Engineered cells are, therefore, cells that comprise at least one or more heterologous polynucleotide segments introduced through the hand of man.
To express a therapeutic agent (such as a transgene comprising an alternatively-spliced cassette) in accordance with the present invention one may prepare a tyrosine capsid-modified rAAV particle containing an expression vector that comprises a therapeutic agent-encoding nucleic acid segment under the control of one or more promoters. To bring a sequence “under the control of’ a promoter, one positions the 5' end of the transcription initiation site of the transcriptional reading frame generally between about 1 and about 50 nucleotides “downstream” of (i.e., 3’ of) the chosen promoter. The “upstream” promoter stimulates transcription of the DNA and promotes expression of the encoded polypeptide. This is the meaning of “recombinant expression” in this context. In some embodiments, the recombinant nucleic acid (e.g., viral) vector constructs are those that comprise an rAAV nucleic acid vector that contains a therapeutic gene of interest operably linked to one or more promoters that is capable of expressing the gene in one or more selected mammalian cells. Such nucleic acid vectors are described in detail herein.
In some embodiments, wherein the recombinant viral genome is an rAAV genome, the transgene comprising an alternatively-spliced exon cassette comprises a polynucleotide sequence as set forth in any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, 2236, or 2247-2256. In some embodiments, wherein the recombinant viral genome is an rAAV genome, the transgene comprising an alternatively-spliced exon cassette comprises a polynucleotide sequence that is 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, 2236, or 2247-2256.
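As a purely illustrative aid, the percent-identity thresholds recited above can be sketched in a few lines of Python. This is a minimal sketch, not a method prescribed by this disclosure: it assumes the two sequences have already been aligned (gaps written as "-"), and it uses one common convention in which identity is the number of matching columns divided by the number of aligned columns. The function names `percent_identity` and `meets_identity_threshold` are hypothetical, and other conventions for the denominator exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences (gaps as '-').

    Assumption: identity = matching columns / aligned columns, where
    columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Drop columns where both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are equal and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 90.0) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned 10-nucleotide sequences that differ at a single position would score 90% under this convention and would therefore satisfy the lowest recited threshold.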
(ii) Manufacture of lentivirus vectors
In some embodiments, a viral vector of the present disclosure comprises a recombinant lentivirus genome. Lentiviruses are the only type of virus that are diploid; they have two strands of RNA. The lentivirus is a retrovirus, meaning it has a single stranded RNA genome with a reverse transcriptase enzyme, which functions to perform transcription of the viral genetic material upon entering the cell. Lentiviruses also have a viral envelope with protruding glycoproteins that aid in attachment to the outer membrane of a host cell.
Within the lentivirus genome are RNA sequences that code for specific proteins that facilitate the incorporation of the viral sequences into the genome of a host cell. The “gag” gene codes for the structural components of the viral nucleocapsid proteins: the matrix (MA/p17), the capsid (CA/p24) and the nucleocapsid (NC/p7) proteins. The “pol” domain codes for the reverse transcriptase and integrase enzymes. Lastly, the “env” domain of the viral genome encodes for the glycoproteins and envelope on the surface of the virus. The ends of the genome are flanked with long terminal repeats (LTRs). LTRs are necessary for integration of the dsDNA into the host chromosome. LTRs also serve as part of the promoter for transcription of the viral genes.
In some embodiments, the env, gag, and/or pol vector(s) forming the particle do not contain a nucleic acid sequence from the lentiviral genome that expresses an envelope protein. In some embodiments, a separate vector containing a nucleic acid sequence encoding an envelope protein operably linked to a promoter is used (e.g., an env vector). In some embodiments, such env vector also does not contain a lentiviral packaging sequence. In some embodiments, the env nucleic acid sequence encodes a lentiviral envelope protein.
The native lentivirus promoter is located in the U3 region of the 3' LTR. As will be understood by those of skill in the art, the presence of the lentivirus promoter can in some embodiments interfere with heterologous promoters operably linked to a transgene. To minimize such interference and better regulate the expression of transgenes, in some embodiments the lentiviral promoter is deleted. In some embodiments, the lentivirus vector contains a deletion within the viral promoter. After reverse transcription, such a deletion is in some embodiments transferred to the 5' LTR, yielding a vector/provirus that is incapable of synthesizing vector transcripts from the 5' LTR in the next round of replication.
In some embodiments, the lentivirus particle is expressed by a vector system encoding the necessary viral proteins to produce a lentivirus particle. In some embodiments, there is at least one vector containing a nucleic acid sequence encoding the lentiviral Pol proteins necessary for reverse transcription and integration, operably linked to a promoter. In some embodiments, the Pol proteins are expressed by multiple vectors. In some embodiments, there is also a vector containing a nucleic acid sequence encoding the lentiviral Gag proteins necessary for forming a viral capsid operably linked to a promoter. In some embodiments, the gag-pol genes are on the same vector. In some embodiments, the gag nucleic acid sequence is on a separate vector than at least some of the pol nucleic acid sequence. In some embodiments, the gag nucleic acid sequence is on a separate vector from all the pol nucleic acid sequences that encode Pol proteins.
In some embodiments, the lentivirus vector does not contain nucleotides from the lentiviral genome that package lentiviral RNA, referred to as the lentiviral packaging sequence. It will be understood that selective inclusion of envelopes could result in changes in infectivity, such that the lentivirus vector could infect many different types of cells, and could be targeted to specific cell types of interest. Accordingly, in some embodiments, the envelope protein is not from the lentivirus, but from a different virus. The resultant lentivirus particle is referred to as a pseudotyped particle. In some embodiments, an env gene that encodes an envelope protein that targets an endocytic compartment such as that of the influenza virus, VSV-G, alpha viruses (Semliki forest virus, Sindbis virus), arenaviruses (lymphocytic choriomeningitis virus), flaviviruses (tick-borne encephalitis virus, Dengue virus), rhabdoviruses (vesicular stomatitis virus, rabies virus), and orthomyxoviruses (influenza virus) is used.
In some embodiments, the lentivirus is a human immunodeficiency virus (HIV1 or HIV2), a feline immunodeficiency virus (FIV), a bovine immunodeficiency virus (BIV), a caprine arthritis encephalitis virus, an equine infectious anemia virus, a jembrana disease virus, a puma lentivirus, a simian immunodeficiency virus, or a visna-maedi virus.
In some embodiments, a nucleic acid sequence encoding a transgene comprising an alternatively-spliced exon cassette of the present invention is inserted into the empty lentiviral particles by use of a plurality of vectors each containing a nucleic acid segment of interest and a lentiviral packaging sequence necessary to package lentiviral RNA into the lentiviral particles (the packaging vector). In some embodiments, the packaging vector contains a 5' and 3' lentiviral LTR with the desired nucleic acid segment inserted between them. The nucleic acid segment can be antisense molecules or, in some embodiments, encodes a therapeutic protein. As will be understood, proper orientation of the transgene within the lentiviral genome is necessary to avoid the loss of introns (e.g., the splicing-out of introns) during viral packaging. Accordingly, in some embodiments, the transgene is oriented in the anti-sense orientation within the lentiviral genome. In some embodiments, orienting the transgene in the anti-sense direction within the lentiviral genome avoids the loss of introns (e.g., the splicing-out of introns) during viral packaging.
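The anti-sense orientation described above can be illustrated at the sequence level with a short Python sketch. It is offered only as an illustration under stated assumptions: the helper names `reverse_complement` and `assemble_insert` are hypothetical, the flanking sequences are placeholders rather than real LTR sequences, and the sketch handles only the unambiguous bases A, C, G, T and N. Placing the reverse complement of the cassette between the flanking elements means the cassette's splice sites are not presented in the sense direction of the genomic transcript.

```python
# Translation table for complementing unambiguous DNA bases (plus N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_insert(upstream: str, cassette: str, downstream: str,
                    antisense: bool = True) -> str:
    """Place a cassette between two flanking sequences.

    When `antisense` is True, the cassette is inserted as its reverse
    complement, i.e. in the anti-sense orientation relative to the
    flanking (hypothetical) backbone sequences.
    """
    body = reverse_complement(cassette) if antisense else cassette
    return upstream + body + downstream
```

Applying `reverse_complement` twice returns the original cassette, which is a convenient sanity check when preparing sequence files in either orientation.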
In some embodiments, the packaging vector contains a selectable marker gene. Such marker genes are well known in the art and include such genes as green fluorescent protein (GFP), blue fluorescent protein (BFP), luciferase, LacZ, nerve growth factor receptor (NGFR), etc.
E. Methods of delivering viral vectors
Some aspects of the invention contemplate a method of treating a disease or condition in a subject comprising administering a viral vector of the present disclosure to a subject, wherein the viral vectors comprise a recombinant viral genome described herein. Accordingly, provided herein is a method of delivering the disclosed viral (e.g., rAAV; lentivirus) particles. In some embodiments, viral particles are delivered by administering any one of the compositions disclosed herein to a subject. In some embodiments, “administering” or “administration” means providing a material to a subject in a manner that is pharmacologically useful. In some embodiments, viral particles are delivered to one or more tissues and cell types in a subject. In some embodiments, viral particles are delivered to one or more of muscle, heart, CNS, and immune cells. In some embodiments, delivery of a viral particle restores transcriptome homeostasis.
Delivery vehicles, vectors, particles, nanoparticles, formulations and components thereof which are suitable for expression of one or more elements of an engineered AAV capsid system as described herein are as described in, for example, International Patent Application Publication Nos. WO 2021/050974 and WO 2021/077000 and International Application No. PCT/US2021/042812, the contents of each of which are incorporated by reference herein.
In some embodiments, a viral particle is administered to the subject parenterally. In some embodiments, a viral particle is administered to a subject subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs. In some embodiments, a viral particle is administered to the subject by injection into the hepatic artery or portal vein.
To “treat” a disease, as the term is used herein, means to reduce the frequency or severity of at least one sign or symptom of a disease or disorder experienced by a subject. The compositions described above or elsewhere herein are typically administered to a subject in an effective amount, that is, an amount capable of producing a desirable result. The desirable result will depend upon the active agent being administered. For example, an effective amount of rAAV particles may be an amount of the particles that are capable of transferring an expression construct to a host organ, tissue, or cell. A therapeutically acceptable amount may be an amount that is capable of treating a disease. As is well known in the medical and veterinary arts, dosage for any one subject depends on many factors, including the subject’s size, body surface area, age, the particular composition to be administered, the active ingredient(s) in the composition, time and route of administration, general health, and other drugs being administered concurrently.
In some embodiments, a single composition comprising viral particles as disclosed herein is administered only once. In some embodiments, a subject may need more than 1 administration of a viral composition (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more times). For example, a subject may need to be provided a second administration of any one of the viral compositions as disclosed herein 1 day, 1 week, 1 month, 1 year, 2 years, 5 years, or 10 years after the subject was administered a first composition. In some embodiments, a first composition of viral particles is different from the second composition of viral particles.
In some embodiments, the administration of the composition is repeated at least once (e.g., at least once, at least twice, at least thrice, at least four times, at least five times, at least six times, at least 10 times, at least 25 times, or at least 50 times), and wherein the time between a repeated administration and a previous administration is at least 1 month (e.g., at least 1 month, at least 2 months, at least 3 months, at least 4 months, at least 5 months, at least 6 months, or at least 12 months). In some embodiments, the administration of the composition is repeated at least once, and wherein the time between a repeated administration and a previous administration is at least 1 year (e.g., at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, or at least 20 years).
In some embodiments, the administration of the composition is facilitated by AAV capsids such as AAV1-9, e.g., with AAV2 ITRs, or other capsids that sufficiently deliver to affected tissues.
Additional AAV vectors are described in International Patent Application Publication No. WO 2019/2071632, the content of which is incorporated by reference herein.
Further AAV vectors are described in International Patent Application Publication Nos. WO 2020/086881 and WO 2020/235543, the contents of each of which are incorporated by reference herein.
Further AAV vectors are described in International Patent Application Publication Nos. WO 2005/033321; WO 2006/110689; WO 2007/127264; WO 2008/027084; WO 2009/073103; WO 2009/073104; WO 2009/105084; WO 2009/134681; WO 2009/136977; WO 2010/051367; WO 2010/138675; WO 2001/038187; WO 2012/112832; WO 2015/054653; WO 2016/179496; WO 2017/100791; WO 2017/019994; WO 2018/209154; WO 2019/067982; WO 2019/195701; WO 2019/217911; WO 2020/041498; WO 2020/210839; U.S. Patent No. 7,906,111; U.S. Patent No. 9,737,618; U.S. Patent No. 10,265,417; U.S. Patent No. 10,485,883; U.S. Patent No. 10,695,441; U.S. Patent No. 10,722,598; U.S. Patent No. 8,999,678; U.S. Patent No. 10,301,648; U.S. Patent No. 10,626,415; U.S. Patent No. 9,198,984; U.S. Patent No. 10,155,931; U.S. Patent No. 8,524,219; U.S. Patent No. 9,206,238; U.S. Patent No. 8,685,387; U.S. Patent No. 9,359,618; U.S. Patent No. 8,231,880; U.S. Patent No. 8,470,310; U.S. Patent No. 9,597,363; U.S. Patent No. 8,940,290; U.S. Patent No. 9,593,346; U.S. Patent No. 10,501,757; U.S. Patent No. 10,786,568; U.S. Patent No. 10,973,928; U.S. Patent No. 10,519,198; U.S. Patent No. 8,846,031; U.S. Patent No. 9,617,561; U.S. Patent No. 9,884,071; U.S. Patent No. 10,406,173; U.S. Patent No. 9,596,220; U.S. Patent No. 9,719,010; U.S. Patent No. 10,117,125; U.S. Patent No. 10,526,584; U.S. Patent No. 10,881,548; U.S. Patent No. 10,738,087; U.S. Patent Publication No. 2011-023353; U.S. Patent Publication No. 2019-0015527; U.S. Patent Publication No. 2020-155704; U.S. Patent Publication No. 2017-0191079; U.S. Patent Publication No. 2019-0218574; U.S. Patent Publication No. 2020-0208176; U.S. Patent Publication No. 2020-0325491; U.S. Patent Publication No. 2019-0055523; U.S. Patent Publication No. 2020-0385689; U.S. Patent Publication No. 2009-0317417; U.S. Patent Publication No. 2016-0051603; U.S. Patent Publication No. 2016-00244783; U.S. Patent Publication No. 2017-0183636; U.S. Patent Publication No. 2020-0263201; U.S. Patent Publication No. 2020-0101099; U.S. Patent Publication No. 2020-0318082; U.S. Patent Publication No. 2018-0369414; U.S. Patent Publication No. 2019-0330278; U.S. Patent Publication No. 2020-0231986, the contents of each of which are incorporated by reference herein.
F. Subjects
Aspects of the disclosure relate to methods for use with a subject (e.g., a mammal). In some embodiments, a mammalian subject is a human, a non-human primate, or other mammalian subject. In some embodiments, the subject has one or more mutations associated with aberrant intron and/or alternative splicing.
In some embodiments, a subject suffers from or is at risk of developing a disease or condition associated with aberrant splice regulation resulting in one or more symptoms of a disease or condition. Non-limiting examples of these diseases/conditions include instances in which the homeostasis of RNA binding proteins is altered (e.g., other repeat expansion diseases), or diseases/conditions in which there are mutations in RNA binding protein sequences. In some embodiments, the disease or condition is selected from: a repeat expansion disease, a laminopathy, a cardiomyopathy, a muscular dystrophy, a neurodegenerative disease, a cancer, an intellectual disability, and/or premature aging.
In a non-limiting example, compositions of this application are administered to a subject resulting in regulated overexpression of the RNA binding protein exhibiting aberrant activity. In another non-limiting example, compositions of this application are administered to a subject resulting in the regulated addition of additional non-mutated, non-aberrant RNA binding protein(s).
In some embodiments, the disease or condition is selected from the group consisting of: Dentatorubral-pallido-luysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich's ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type 1, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebellar ataxia type 6 (SCA6), spinocerebellar ataxia type 7 (SCA7), spinocerebellar ataxia type 8 (SCA8), spinocerebellar ataxia type 10 (SCA10), spinocerebellar ataxia type 12 (SCA12), spinocerebellar ataxia type 17 (SCA17), Syndromic / non-syndromic X-linked mental retardation, Emery-Dreifuss muscular dystrophy type 2, familial partial lipodystrophy, limb girdle muscular dystrophy type 1B, dilated cardiomyopathy, familial partial lipodystrophy, Charcot-Marie-Tooth disorder type 2B1, mandibuloacral dysplasia, childhood progeria syndrome (Hutchinson-Gilford syndrome), Werner syndrome, Dilated cardiomyopathy (DCM), Hypertrophic cardiomyopathy (HCM), Restrictive cardiomyopathy (RCM), Left Ventricular Non-compaction (LVNC), Arrhythmogenic Right Ventricular Dysplasia (ARVD), takotsubo cardiomyopathy, Duchenne muscular dystrophy, Becker muscular dystrophy, Limb-girdle muscular dystrophy, Facioscapulohumeral muscular dystrophy, Congenital muscular dystrophy, Oculopharyngeal muscular dystrophy, Distal muscular dystrophy, Emery-Dreifuss muscular dystrophy, dementia, Parkinson's disease (PD), a PD-related disorder, Prion disease, a motor neuron disease (MND), Progressive bulbar palsy (PBP), Progressive muscular atrophy (PMA), Primary lateral sclerosis (PLS), Spinal muscular atrophy (SMA), a bladder cancer, a breast cancer, a colorectal cancer, a kidney cancer, a lung cancer, a lymphoma, a melanoma, an oral cancer, an ovarian cancer, an oropharyngeal cancer, a pancreatic cancer, a prostate cancer, a thyroid cancer, a uterine cancer, Down syndrome, Prader-Willi Syndrome (PWS), Bloom Syndrome, Cockayne Syndrome Type I -216400, Cockayne Syndrome Type III, Cockayne Syndrome Type I, Hutchinson-Gilford Progeria Syndrome, Mandibuloacral Dysplasia with Type A Lipodystrophy, Progeria, Adult Onset Progeroid Syndrome, Neonatal Rothmund-Thomson Syndrome, Seip Syndrome, Werner Syndrome, Replication Focus-Forming Activity 1, myotubular myopathy, Danon Disease, and/or centronuclear myopathy.
Non-limiting examples of symptoms of these diseases/conditions include neurodevelopmental, neurofunctional, or neurodegenerative changes (e.g., ALS, FTD, Spinocerebellar Ataxias, FXTAS, or Huntington’s Disease symptoms) or abnormal proliferation or migration of cells (e.g., as in cancer). For example, myotonic dystrophy type 1 and type 2 (dystrophia myotonica, DM1 and DM2, respectively) are caused by expanded CTG repeats in the DMPK gene and CCTG repeats in the CNBP gene, respectively. Both diseases are highly multi-systemic with symptoms in skeletal muscles, cardiac tissue, gastrointestinal tract, endocrine system, and central nervous system, among others.
In some aspects, the present disclosure relates to methods and compositions that are useful for treating myotonic dystrophy type 1 and type 2 (dystrophia myotonica, DM1 and DM2, respectively), for example by delivering viral particles comprising viral constructs (e.g., containing one or more alternative splicing cassettes) to cells or tissue in a subject. In addition to the symptoms described above, DM1 can also manifest in a severe form called congenital DM1, in which profound developmental delays occur. A 25% chance of death before the age of 18 months and 50% chance of survival into mid-30s has been reported. Methods and compositions of the application can be useful to treat, alleviate, or otherwise improve one or more symptoms of DM1.
Accordingly, in some embodiments one or more viral constructs can be delivered to a subject having one or more symptoms of myotonic dystrophy. Such symptoms may include, but are not limited to, delayed muscle relaxation, muscle weakness, prolonged involuntary muscle contraction, loss of muscle, abnormal heart rhythm, cataracts, or difficulty swallowing. In some embodiments, a viral composition provided herein is administered to a subject having congenital DM1 or DM2. In some embodiments, the viral constructs treat, alleviate, ameliorate, or otherwise improve one or more symptoms associated with DM1 and/or DM2. In some embodiments, the viral constructs reduce muscle weakness, reduce muscle loss, reduce muscle wasting, reduce prolonged muscle contractions, improve speech, and/or improve swallowing in a subject. In some embodiments, treatment reduces or corrects one or more other symptoms of myotonic dystrophy.
In some embodiments, splicing of a recombinant intron and/or an alternatively-spliced exon is sufficiently regulated to be therapeutically effective.
G. Enumerated embodiments
Certain embodiments are set forth in the enumerated clauses below.
Clause 1. A recombinant viral genome for delivering a transgene, wherein said genome comprises at least one alternatively-spliced exon cassette comprising at least one alternatively-spliced exon, at least one flanking intron, and a coding region of the transgene.
Clause 2. The viral genome of clause 1, wherein the alternatively-spliced exon is retained in the spliced transcript.
Clause 3. The viral genome of clause 1 or clause 2, wherein the alternatively-spliced exon cassette further comprises at least one constitutive exon.
Clause 4. The viral genome of any preceding clause, wherein the alternatively-spliced exon cassette comprises one flanking intron.
Clause 5. The viral genome of clause 4, wherein the flanking intron is located 3’ or 5’ to the alternatively-spliced exon.
Clause 6. The viral genome of any one of clauses 1-3, wherein the alternatively-spliced exon cassette comprises two flanking introns.
Clause 7. The viral genome of any preceding clause, wherein the alternatively-spliced exon comprises at least one modification, relative to a naturally occurring alternatively-spliced exon.
Clause 8. The viral genome of any preceding clause, wherein the alternatively-spliced exon comprises at its 3’ end a heterologous start codon or part of a heterologous start codon.
Clause 9. The viral genome of clause 8, wherein all native start codons located 5’ to the heterologous start codon are disrupted or deleted.
Clause 10. The viral genome of any preceding clause, wherein the alternatively-spliced exon is located 5’ to the coding region of the transgene.
Clause 11. The viral genome of any one of clauses 1-7, wherein the alternatively-spliced exon cassette comprises two alternatively-spliced exons, each with flanking introns.
Clause 12. The viral genome of clause 11, wherein the two alternatively-spliced exons are adjacent.
Clause 13. The viral genome of clause 11 or clause 12, wherein the constitutive exon is located 5’ to the two alternatively-spliced exons.
Clause 14. The viral genome of any one of clauses 11-13, wherein each alternatively-spliced exon comprises at its 3’ end a heterologous start codon or part of a heterologous start codon.
Clause 15. The viral genome of clause 14, wherein all native start codons located 5’ to the heterologous start codon of the 5’-most alternatively-spliced exon are disrupted or deleted.
Clause 16. The viral genome of any one of clauses 11-15, wherein only one of the two alternatively-spliced exons is retained in the spliced transcript.
Clause 17. The viral genome of any one of clauses 11-16, wherein the 5’-most alternatively-spliced exon is retained in the spliced transcript.
Clause 18. The viral genome of any one of clauses 11-16, wherein the 3’-most alternatively-spliced exon is retained in the spliced transcript.
Clause 19. The viral genome of any preceding clause, wherein the alternatively-spliced exon(s) and flanking intron(s) are located within the coding region of the transgene.
Clause 20. The viral genome of any preceding clause, wherein the alternatively-spliced exon comprises a heterologous, in-frame stop codon.
Clause 21. The viral genome of clause 20, wherein the heterologous, in-frame stop codon is at least 50 nucleotides upstream of the next 5’ splice junction.
Clause 22. The viral genome of clause 20 or clause 21, wherein the heterologous stop codon elicits nonsense-mediated decay.
Clause 23. The viral genome of any preceding clause, wherein the alternatively-spliced exon is retained in the spliced transcript in distinct tissues or in distinct cell types.
Clause 24. The viral genome of any preceding clause, wherein the alternatively-spliced exon is retained in the spliced transcript in the presence of activated T cells, and/or in states of inflammation.
Clause 25. The viral genome of any preceding clause, wherein the alternatively-spliced exon is retained in the spliced transcript in cells exhibiting one or more signs or symptoms of a disease state, and/or in cells exhibiting non-homeostatic levels of the protein encoded by the natural gene comprising the transgene.
Clause 26. The viral genome of any preceding clause, wherein the alternatively-spliced exon comprises an alternatively-spliced exon from a gene selected from the group consisting of ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, ZNF496, CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
Clause 27. The viral genome of any preceding clause, wherein the alternatively-spliced exon comprises an alternatively-spliced exon comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 23-44.
Clause 28. The viral genome of any preceding clause, wherein the flanking intron(s) is a native flanking intron(s) of the alternatively-spliced exon(s).
Clause 29. The viral genome of any preceding clause, wherein the flanking intron(s) comprises at its 5’ end a 5’ splice donor site.
Clause 30. The viral genome of any preceding clause, wherein the flanking intron(s) comprises at its 3’ end a 3’ splice donor site.
Clause 31. The viral genome of any preceding clause, wherein the flanking intron(s) comprises no modifications, relative to a naturally occurring intron.
Clause 32. The viral genome of any one of clauses 1-31, wherein the flanking intron(s) comprises at least one modification, relative to a naturally occurring intron.
Clause 33. The viral genome of clause 32, wherein the modification is a substitution or deletion of one or more nucleotides.
Clause 34. The viral genome of any preceding clause, wherein the flanking intron(s) is a regulated intron.
Clause 35. The viral genome of any preceding clause, wherein the flanking intron(s) comprises an intron from a gene selected from the group consisting of ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, ZNF496, CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
Clause 36. The viral genome of any preceding clause, wherein the flanking intron(s) comprises an intron comprising a polynucleotide sequence as set forth in any one of SEQ ID NOs: 1-22, 103, and 104.
Clause 37. The viral genome of any one of clauses 3-36, wherein the constitutive exon is a native exon of the transgene.
Clause 38. The viral genome of any one of clauses 3-36, wherein the constitutive exon is not a native exon of the transgene.
Clause 39. The viral genome of any one of clauses 3-38, wherein the constitutive exon is from the same gene as the alternatively-spliced exon(s).
Clause 40. The viral genome of clause 39, wherein the gene is the transgene.
Clause 41. The viral genome of any one of clauses 3-38, wherein the constitutive exon is not from the same gene as the alternatively-spliced exon(s).
Clause 42. The viral genome of any one of clauses 39-41, wherein the gene is a gene selected from the group consisting of MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, a microdystrophin-encoding gene, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, TPM2, TNNT2, CFL2, KBTBD13, KLHL40, KLHL41, LMOD3, MYPN, SEPN1, TTN, SPEG, MYH7, TK2, POLG1, GAA, AGL, PYGM, SLC22A5, OCTN2, ETF, ETFH, PNPLA2, a cytochrome b oxidase-encoding gene, a cytochrome c oxidase-encoding gene, CLCN1, SCN4A, DMPK, CNBP, MYOT, LMNA, CAV3, DNAJB6, DES, TNPO3, HNRPDL, CAPN3, DYSF, an alpha-sarcoglycan-encoding gene, a beta-sarcoglycan-encoding gene, a gamma-sarcoglycan-encoding gene, a delta-sarcoglycan-encoding gene, TCAP, TRIM32, FKRP, FXN, POMT1, FKTN, POMT2, POMGnT1, DAG1, ANO5, PLEC1, TRAPPC11, GMPPB, ISPD, LIMS2, POPDC1, TOR1AIP1, POGLUT2, LAMA2, COL6A1, POMT1, POMT2, DUX4, EMD, PAX7, PMP22, MPZ, MFN2, SMCHD1, SMN, Lamin A/C (LAMN), GJB1, ABCC1, AK125149, ASCC2, BAT2D1, BBX, BRD8, BRE, C17orf70, CAMKK2, CBFB, CCAR1, CCDC7CD6, CHTF8, COL4A3BP, COL6A3, CUGBP1, CUGBP2, CXorf45, DENND3, DGUOK, DKFZp762G094, DNAJC7, DNASE1, EIF4A2, EIF4G2, EIF4H, EXOC7, EZH2, FAM120A, FAM136A, FAM36A, FARSB, FBXO38, FGFR1OP2, FIP1L1, FOXRED1, FUBP3, GALT, GATA3, GOLGA2, HIF1A, HMMR, HRB, IKZF1, ILF3, IRAK4, IRF1, KCTD13, LEF1, LUC7L, LYRM1, MALT1 e7, MAP2K7, MAP3K7, MAP4K2, MBNL2, MFF, NAE1, NCSTN, NR4A3, NRF1, NUP98, PARP6, PCM1, PLAUR, PLSCR3, PPIL5, PPP5C, PTPRC-E4, PTPRC-E6, PTS, RABL5, RAPH1, SEC16A, SFRS3, SFRS7, SLMAP, SNRNP70, STAT6, TBC1D1, TIMM8B, TIR8, TRA2A, TROVE2, UGCGL1, VAP-B, VAV1, ZNF384, ZNF496, CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
Clause 43. The viral genome of any preceding clause, further comprising a promoter.
Clause 44. The viral genome of clause 43, wherein the promoter is a native promoter of the transgene.
Clause 45. The viral genome of clause 43, wherein the promoter is not a native promoter of the transgene.
Clause 46. The viral genome of any one of clauses 43-45, wherein the promoter is constitutive.
Clause 47. The viral genome of any one of clauses 43-45, wherein the promoter is inducible.
Clause 48. The viral genome of any one of clauses 43-47, wherein the promoter is a tissue-specific promoter.
Clause 49. The viral genome of any one of clauses 43-48, wherein the promoter is selected from the group consisting of an EF1 alpha promoter, beta actin promoter, CMV, muscle creatine kinase promoter, C5-12 muscle promoter, MHCK7, CBh, synapsin, MECP2, enolase, GFAP, Desmin, and CAG promoter.
Clause 50. The viral genome of any one of clauses 43-49, wherein the promoter drives expression of the transgene.
Clause 51. The viral genome of any one of clauses 1-50, wherein the coding region of the transgene comprises at least one modification, relative to a coding region of a naturally occurring gene.
Clause 52. The viral genome of clause 51, wherein the modification is a substitution or deletion of at least one nucleotide.
Clause 53. The viral genome of clause 51 or clause 52, wherein the coding region of the transgene comprises a deletion of a native start codon, or a portion thereof.
Clause 54. The viral genome of any preceding clause, wherein the transgene comprises one or more recombinant introns.
Clause 55. The viral genome of any one of clauses 51-54, wherein the naturally occurring gene is a gene selected from the group consisting of MBNL1, MBNL2, MBNL3, hnRNP A1, hnRNP A2B1, hnRNP C, hnRNP D, hnRNP DL, hnRNP F, hnRNP H, hnRNP K, hnRNP L, hnRNP M, hnRNP R, hnRNP U, FUS, TDP43, PABPN1, ATXN2, TAF15, EWSR1, MATR3, TIA1, FMRP, MTM1, MTMR2, LAMP2, KIF5A, a microdystrophin-encoding gene, C9ORF72, HTT, DNM2, BIN1, RYR1, NEB, ACTA, TPM3, TPM2, TNNT2, CFL2, KBTBD13, KLHL40, KLHL41, LMOD3, MYPN, SEPN1, TTN, SPEG, MYH7, TK2, POLG1, GAA, AGL, PYGM, SLC22A5, OCTN2, ETF, ETFH, PNPLA2, a cytochrome b oxidase-encoding gene, a cytochrome c oxidase-encoding gene, CLCN1, SCN4A, DMPK, CNBP, MYOT, LMNA, CAV3, DNAJB6, DES, TNPO3, HNRPDL, CAPN3, DYSF, an alpha-sarcoglycan-encoding gene, a beta-sarcoglycan-encoding gene, a gamma-sarcoglycan-encoding gene, a delta-sarcoglycan-encoding gene, TCAP, TRIM32, FKRP, FXN, POMT1, FKTN, POMT2, POMGnT1, DAG1, ANO5, PLEC1, TRAPPC11, GMPPB, ISPD, LIMS2, POPDC1, TOR1AIP1, POGLUT2, LAMA2, COL6A1, POMT1, POMT2, DUX4, EMD, PAX7, PMP22, MPZ, MFN2, SMCHD1, SMN, Lamin A/C (LAMN), and/or GJB1.
Clause 56. The viral genome of any preceding clause, wherein the viral genome is a genome from a recombinant adeno-associated virus (rAAV), lentivirus, retrovirus, or foamy virus.
Clause 57. The viral genome of clause 56, wherein the viral genome is from an rAAV.
Clause 58. The viral genome of clause 56 or clause 57, wherein the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
Clause 59. The viral genome of clause 58, wherein the ITR sequences comprise AAV1, AAV2, AAV5, AAV7, AAV8, or AAV9 ITR sequences.
Clause 60. The viral genome of clause 56, wherein the viral genome is from a lentivirus.
Clause 61. The viral genome of clause 60, wherein the alternatively-spliced exon cassette is located on the minus strand of the lentivirus genome.
Clause 62. The viral genome of any preceding clause, further comprising a 3’ untranslated region (UTR) that is endogenous or exogenous to the transgene.
Clause 63. The viral genome of clause 62, wherein the exogenous 3’ UTR is the 3’ UTR from bovine growth hormone, SV40, EBV, or Myc.
Clause 64. A viral particle comprising a viral genome according to any preceding clause.
Clause 65. The viral particle of clause 64, wherein the viral particle is an rAAV particle.
Clause 66. The viral particle of clause 65, wherein the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
Clause 67. The viral particle of clause 65, wherein the rAAV particle comprises AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh.32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CH1-P6, AAV2.5, AAV6.2, AAV218, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
Clause 68. The viral particle of any one of clauses 64-67, further comprising at least one helper plasmid.
Clause 69. The viral particle of clause 68, wherein the helper plasmid comprises a rep gene and a cap gene.
Clause 70. The viral particle of clause 69, wherein the rep gene encodes Rep78, Rep68, Rep52, or Rep40.
Clause 71. The viral particle of clause 69 or clause 70, wherein the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
Clause 72. The viral particle of any one of clauses 68-71, wherein the viral particle comprises two helper plasmids.
Clause 73. The viral particle of clause 72, wherein the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
Clause 74. The viral particle of clause 64, wherein the viral particle is a recombinant lentivirus particle.
Clause 75. The viral particle of clause 74, wherein the lentivirus is a human immunodeficiency virus (HIV1 or HIV2), a feline immunodeficiency virus (FIV), a bovine immunodeficiency virus (BIV), a caprine arthritis encephalitis virus, an equine infectious anemia virus, a jembrana disease virus, a puma lentivirus, a simian immunodeficiency virus, or a visna-maedi virus.
Clause 76. The viral particle of clause 74 or clause 75, further comprising a viral envelope.
Clause 77. A method of treating a disease or condition in a subject comprising administering a viral genome according to any one of clauses 1-63 or a viral particle according to any one of clauses 64-76 to the subject.
Clause 78. The method of clause 77, wherein the subject is a mammal.
Clause 79. The method of clause 78, wherein the mammal is a human.
Clause 80. The method of any one of clauses 77-79, wherein the viral genome or viral particle is administered to the subject at least one time.
Clause 81. The method of clause 80, wherein the viral genome or viral particle is administered to the subject 2, 3, 4, 5, 6, 7, 8, 9, or 10 times.
Clause 82. The method of any one of clauses 77-81, wherein the viral genome or viral particle is administered to the subject parenterally, subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs.
Clause 83. The method of any one of clauses 77-82, wherein the viral genome or viral particle is administered to the subject by intravenous injection, intramuscular injection, intrathecal injection, or intravitreal injection.
Clause 84. The method of any one of clauses 77-83, wherein the disease or condition is a disease or condition selected from the group consisting of Dentatorubral-pallido-luysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich's ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type 1, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebellar ataxia type 6 (SCA6), spinocerebellar ataxia type 7 (SCA7), spinocerebellar ataxia type 8 (SCA8), spinocerebellar ataxia type 10 (SCA10), spinocerebellar ataxia type 12 (SCA12), spinocerebellar ataxia type 17 (SCA17), Syndromic/non-syndromic X-linked mental retardation, Emery-Dreifuss muscular dystrophy type 2, familial partial lipodystrophy, limb girdle muscular dystrophy type 1B, dilated cardiomyopathy, familial partial lipodystrophy, Charcot-Marie-Tooth disorder type 2B1, mandibuloacral dysplasia, childhood progeria syndrome (Hutchinson-Gilford syndrome), Werner syndrome, Dilated cardiomyopathy (DCM), Hypertrophic cardiomyopathy (HCM), Restrictive cardiomyopathy (RCM), Left Ventricular Non-compaction (LVNC), Arrhythmogenic Right Ventricular Dysplasia (ARVD), takotsubo cardiomyopathy, Duchenne muscular dystrophy, Becker muscular dystrophy, Limb-girdle muscular dystrophy, Facioscapulohumeral muscular dystrophy, Congenital muscular dystrophy, Oculopharyngeal muscular dystrophy, Distal muscular dystrophy, Emery-Dreifuss muscular dystrophy, dementia, Parkinson's disease (PD), a PD-related disorder, Prion disease, a motor neuron disease (MND), Progressive bulbar palsy (PBP), Progressive muscular atrophy (PMA), Primary lateral sclerosis (PLS), Spinal muscular atrophy (SMA), a bladder cancer, a breast cancer, a colorectal cancer, a kidney cancer, a lung cancer, a lymphoma, a melanoma, an oral cancer, an ovarian cancer, an oropharyngeal cancer, a pancreatic cancer, a prostate cancer, a thyroid cancer, a uterine cancer, Down syndrome, Prader-Willi Syndrome (PWS), Bloom Syndrome, Cockayne Syndrome Type I -216400, Cockayne Syndrome Type III, Cockayne Syndrome Type I, Hutchinson-Gilford Progeria Syndrome, Mandibuloacral Dysplasia with Type A Lipodystrophy, Progeria, Adult Onset Progeroid Syndrome, Neonatal Rothmund-Thomson Syndrome, Seip Syndrome, Werner Syndrome, Replication Focus-Forming Activity I, myotubular myopathy, Danon Disease, and/or centronuclear myopathy.
Clause 85. A method of regulating transgene expression using a viral vector comprising a viral genome, the method comprising:
(a) inserting into the viral genome at least one alternatively-spliced exon cassette, wherein the alternatively-spliced exon cassette comprises a constitutive exon, at least one alternatively-spliced exon, at least one flanking intron, and a coding region of a transgene;
(b) introducing a heterologous start codon or part of a heterologous start codon at the 3’ end of the alternatively-spliced exon;
(c) disrupting or deleting all native start codons located 5’ to the heterologous start codon; and
(d) deleting a native start codon, or a portion thereof, from the coding region of the transgene, wherein the constitutive exon, alternatively-spliced exon, and flanking intron are each located 5’ to the coding region of the transgene.
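Steps (b) and (c) of Clause 85 can be illustrated with a minimal Python sketch that lists every ATG lying 5’ to the heterologous start codon. The function name, the 0-based coordinate convention, and the assumption that the input is the spliced transcript with the alternatively-spliced exon included are all hypothetical and are not taken from the disclosure; the example sequences are invented for illustration only.

```python
def upstream_atg_positions(sequence: str, heterologous_start: int) -> list[int]:
    """Return 0-based positions of every ATG that begins 5' of the heterologous start.

    Assumption: `sequence` is the spliced, exon-included transcript written as DNA,
    and `heterologous_start` is the index of the 'A' of the heterologous ATG.
    All reading frames are scanned.
    """
    seq = sequence.upper()
    if seq[heterologous_start:heterologous_start + 3] != "ATG":
        raise ValueError("no ATG at the stated heterologous start position")
    return [i for i in range(heterologous_start) if seq[i:i + 3] == "ATG"]


# Invented example: one native ATG at index 3, heterologous ATG at index 12.
before = "GCCATGGCCACCATGGTG"
# Same sequence after the upstream ATG has been mutated to TTG.
after = "GCCTTGGCCACCATGGTG"

print(upstream_atg_positions(before, 12))
print(upstream_atg_positions(after, 12))
```

An empty list indicates that, under these assumptions, no native start codon remains upstream of the heterologous start codon.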
Clause 86. A method of regulating transgene expression using a viral vector comprising a viral genome, the method comprising:
(a) inserting into the viral genome at least one alternatively-spliced exon cassette, wherein the alternatively-spliced exon cassette comprises an alternatively-spliced exon and at least one flanking intron within the coding region of the transgene; and
(b) introducing into the alternatively-spliced exon a heterologous, in-frame stop codon at least 50 nucleotides upstream of the next 5’ splice junction, wherein the heterologous, in-frame stop codon elicits nonsense-mediated decay.
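The positional requirement in step (b) of Clause 86 can be sketched as a simple check. The sketch below is illustrative only: the function name, the `frame_offset` parameter, and the choice to measure the distance from the last nucleotide of the stop codon to the 3’ end of the exon (taken here as the position of the next 5’ splice junction) are assumptions, not definitions from the disclosure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_meets_distance_rule(
    exon: str,
    stop_index: int,
    frame_offset: int = 0,
    min_distance: int = 50,
) -> bool:
    """Check a candidate stop codon inside an alternatively-spliced exon.

    Assumptions (hypothetical conventions):
    - `exon` is the exon sequence written as DNA, 5' to 3'.
    - `stop_index` is the 0-based index of the first base of the stop codon.
    - `frame_offset` is the index within the exon of the first base of a codon
      in the transgene reading frame.
    - the exon's 3' end stands in for the next 5' splice junction.
    """
    seq = exon.upper()
    is_stop = seq[stop_index:stop_index + 3] in STOP_CODONS
    in_frame = (stop_index - frame_offset) % 3 == 0
    distance_to_junction = len(seq) - (stop_index + 3)
    return is_stop and in_frame and distance_to_junction >= min_distance


# Invented example exons: 60 and 40 nucleotides between the stop codon and the exon end.
far_enough = "ATGTAA" + "C" * 60
too_close = "ATGTAA" + "C" * 40

print(stop_meets_distance_rule(far_enough, 3))
print(stop_meets_distance_rule(too_close, 3))
```

The 50-nucleotide threshold is exposed as a parameter so that other distances can be tested without changing the function.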
Clause 87. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon;
(iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(v) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon, wherein all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
Clause 88. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first portion of a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon;
(iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(v) a nucleotide sequence comprising a second portion of the coding region of the transgene having a 5’ to 3’ orientation.
Clause 89. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element;
(iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises a constitutive exon.
Clause 90. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon;
(ii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; and
(iii) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon, wherein all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
Clause 91. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first portion of a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon;
(iii) a nucleotide sequence comprising a second portion of the coding region of the transgene having a 5’ to 3’ orientation;
(iv) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon.
Clause 92. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; and
(iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon.
Clause 93. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon;
(ii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon;
(iii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(iv) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon, wherein all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
Clause 94. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first portion of a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon;
(iii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(iv) a nucleotide sequence comprising a second portion of the coding region of the transgene having a 5’ to 3’ orientation.
Clause 95. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element;
(iii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(iv) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises a constitutive exon.
Clause 96. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon;
(ii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon; and
(iv) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon, wherein all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
Clause 97. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first portion of a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon;
(iv) a nucleotide sequence comprising a second portion of the coding region of the transgene having a 5’ to 3’ orientation;
(v) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(vi) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon.
Clause 98. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising an intronic sequence having a 5’ to 3’ orientation, wherein the intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising an exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element; and
(iv) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the exonic sequence comprises a constitutive exon.
Clause 99. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous ATG start codon;
(iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(v) a nucleotide sequence comprising a third exonic sequence having a 5’ to 3’ orientation, wherein the third exonic sequence comprises an alternatively-spliced exon;
(vi) a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation, wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(vii) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation, wherein the coding region of the transgene comprises at its 5’ end a modification comprising the removal of a native ATG start codon, wherein all native ATG start codons located upstream of the heterologous ATG start codon are mutated or deleted.
Clause 100. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first portion of a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon;
(iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon;
(vi) a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation, wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site (m); and
(vii) a nucleotide sequence comprising a second portion of the coding region of the transgene having a 5’ to 3’ orientation.
Clause 101. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation (e), wherein the first exonic sequence comprises a first alternatively-spliced exon comprising a positive or negative cis-acting element;
(iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a second alternatively-spliced exon;
(vi) a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation, wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(vii) a nucleotide sequence comprising a third exonic sequence having a 5’ to 3’ orientation, wherein the third exonic sequence comprises a constitutive exon.
Clause 102. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises a constitutive exon;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation;
(iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises an alternatively-spliced exon.
Clause 103. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a first portion of a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising at its 3’ end a heterologous stop codon;
(iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon;
(vi) a nucleotide sequence comprising a third intronic sequence having a 5’ to 3’ orientation, wherein the third intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(vii) a nucleotide sequence comprising a second portion of the coding region of the transgene having a 5’ to 3’ orientation.
Clause 104. An alternatively-spliced exon cassette comprising, in the 5’ to 3’ direction:
(i) a nucleotide sequence comprising a coding region of a transgene having a 5’ to 3’ orientation;
(ii) a nucleotide sequence comprising a first intronic sequence having a 5’ to 3’ orientation, wherein the first intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site;
(iii) a nucleotide sequence comprising a first exonic sequence having a 5’ to 3’ orientation, wherein the first exonic sequence comprises an alternatively-spliced exon comprising a positive or negative cis-acting element;
(iv) a nucleotide sequence comprising a second intronic sequence having a 5’ to 3’ orientation, wherein the second intronic sequence comprises at its 5’ end a 5’ splice donor site and at its 3’ end a 3’ splice acceptor site; and
(v) a nucleotide sequence comprising a second exonic sequence having a 5’ to 3’ orientation, wherein the second exonic sequence comprises a constitutive exon.
Clause 105. A transgene comprising:
(i) a constitutive exon and one or more intronic sequences, each from a first gene;
(ii) an alternatively-spliced exon cassette, wherein the alternatively-spliced exon cassette comprises:
(a) an alternatively-spliced exon, and
(b) flanking intronic sequences, wherein each of (a) and (b) are from a second gene, and
(iii) a coding region of interest from a third gene, wherein the alternatively-spliced exon comprises an ATG start codon.
Clause 106. The transgene of clause 105, wherein the first and second gene are the same gene; the first and third gene are the same gene; or all of the first, second, and third genes are the same gene.
Clause 107. The transgene of clause 105 or clause 106, wherein the first gene is survival motor neuron 1 (SMN1).
Clause 108. The transgene of any one of clauses 105-107, wherein the constitutive exon comprises exon 6 of SMN1, or a portion thereof.
Clause 109. The transgene of any one of clauses 105-108, wherein the constitutive exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 102.
Clause 110. The transgene of any one of clauses 105-109, wherein the constitutive exon comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 102.
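The percent sequence identity thresholds recited in these clauses can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it counts identity as matching columns divided by all aligned columns; this is only one of several conventions in use and is not stated in the disclosure. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gap character '-').

    Assumption: identity = matching columns / aligned columns, where columns
    that are gaps in both sequences are ignored. Other denominators (for
    example, the length of the shorter sequence) are also in common use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Invented example: one mismatch in ten aligned positions.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, identity >= 70.0)
```

A candidate sequence would then be compared against a reference such as SEQ ID NO: 102 by aligning the two with a separate alignment tool and passing the aligned strings to the function.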
Clause 111. The transgene of any one of clauses 105-110, wherein the one or more intronic sequences of (i) are or are derived from intron 6 and/or intron 7 of SMN1.
Clause 112. The transgene of any one of clauses 105-111, wherein the one or more intronic sequences of (i) comprise(s) a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 103 and/or SEQ ID NO: 104.
Clause 113. The transgene of any one of clauses 105-112, wherein the one or more intronic sequences of (i) comprise(s) a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 103 and/or SEQ ID NO: 104.
Clause 114. The transgene of any one of clauses 105-113, wherein the second gene is a gene selected from the group consisting of: CAMK2B, PKP2, LGMN, NRAP, VPS39, KSR1, PDLIM3, BIN1, ARFGAP2, KIF13A, and/or PICALM.
Clause 115. The transgene of any one of clauses 105-114, wherein the second gene is bridging integrator 1 (BIN1).
Clause 116. The transgene of any one of clauses 105-115, wherein the alternatively-spliced exon comprises exon 11 of BIN1.
Clause 117. The transgene of any one of clauses 105-116, wherein the alternatively-spliced exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 37 or SEQ ID NO: 38.
Clause 118. The transgene of any one of clauses 105-117, wherein the alternatively-spliced exon comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 37 or SEQ ID NO: 38.
Clause 119. The transgene of any one of clauses 105-118, wherein the flanking intronic sequences of (ii) are or are derived from intron 10 and/or intron 11 of BIN1.
Clause 120. The transgene of any one of clauses 105-119, wherein the flanking intronic sequences of (ii) each comprise a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 15 or SEQ ID NO: 16.
Clause 121. The transgene of any one of clauses 105-120, wherein the flanking intronic sequences of (ii) each comprise a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 15 or SEQ ID NO: 16.
Clause 122. The transgene of any one of clauses 105-121, wherein the alternatively-spliced exon cassette comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
Clause 123. The transgene of any one of clauses 105-122, wherein the alternatively-spliced exon cassette comprises a polynucleotide having a nucleic acid sequence as set forth in any one of SEQ ID NOs: 107-778.
Clause 124. The transgene of any one of clauses 105-123, wherein the third gene is myotubularin 1 (MTM1) or calpain 3 (CAPN3).
Clause 125. The transgene of any one of clauses 105-124, wherein the coding region of interest comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882.
Clause 126. The transgene of any one of clauses 105-125, wherein the coding region of interest comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 1881 or SEQ ID NO: 1882.
Clause 127. The transgene of any one of clauses 105-126, wherein, if the wild-type alternatively-spliced exon does not comprise an ATG start codon, the alternatively-spliced exon comprises 1-3 nucleic acid substitutions, relative to the wild-type alternatively-spliced exon, to form the ATG start codon within the alternatively-spliced exon.
Clause 128. The transgene of clause 127, wherein the ATG start codon is formed in the alternatively-spliced exon by 1 nucleic acid substitution.
Clause 129. The transgene of clause 127, wherein the ATG start codon is formed in the alternatively-spliced exon by 2 nucleic acid substitutions.
Clause 130. The transgene of clause 127, wherein the ATG start codon is formed in the alternatively-spliced exon by 3 nucleic acid substitutions.
Clause 131. The transgene of any one of clauses 105-130, wherein the alternatively-spliced exon is retained in the spliced transcript.
Clause 132. The transgene of any one of clauses 105-131, wherein all native start codons located 5’ to the ATG start codon located within the alternatively-spliced exon are disrupted or deleted.
Clause 133. The transgene of any one of clauses 105-132, wherein the alternatively-spliced exon cassette is located 5’, relative to the coding region of interest.
Clause 134. The transgene of any one of clauses 105-133, wherein the constitutive exon is located 5’, relative to the alternatively-spliced exon cassette.
Clause 135. The transgene of any one of clauses 105-134, wherein the one or more intronic sequences of (i) flank the alternatively-spliced exon cassette.
Clause 136. The transgene of any one of clauses 105-135, wherein the alternatively-spliced exon comprises a heterologous, in-frame stop codon.
Clause 137. The transgene of clause 136, wherein the heterologous, in-frame stop codon is at least 50 nucleotides upstream of the next 5’ splice junction.
Clause 138. The transgene of clause 136, wherein the heterologous, in-frame stop codon elicits nonsense-mediated decay.
Clause 139. The transgene of any one of clauses 105-138, wherein the alternatively-spliced exon is retained in the spliced transcript in distinct tissues.
Clause 140. The transgene of clause 139, wherein the alternatively-spliced exon is retained in the spliced transcript in skeletal muscle and/or wherein the alternatively-spliced exon is not retained in the spliced transcript in heart and/or liver tissue.
Clause 141. The transgene of any one of clauses 105-140, wherein the flanking intronic sequences of (ii)(b) are or are derived from native flanking introns of the alternatively-spliced exon.
Clause 142. The transgene of any one of clauses 105-141, wherein the flanking intronic sequences of (ii)(b) each comprise at least one modification, relative to a naturally occurring intronic sequence.
Clause 143. The transgene of clause 142, wherein the modification is a substitution or deletion of one or more nucleic acids.
Clause 144. The transgene of any one of clauses 105-143, wherein the ATG start codon is located at the 3’ end of the alternatively-spliced exon.
Clause 145. The transgene of clause 144, wherein, if the wild-type alternatively-spliced exon does not comprise an ATG start codon at its 3’ end, the first 10 nucleotides of the flanking intronic sequence which is immediately 3’ to the alternatively-spliced exon comprise 1-5 nucleotide substitutions, relative to the wild-type flanking intronic sequence which is immediately 3’ to the wild-type alternatively-spliced exon.
Clause 146. The transgene of any one of clauses 105-145, wherein the one or more intronic sequences of (i) each comprise at least one modification, relative to a naturally occurring intronic sequence.
Clause 147. The transgene of clause 146, wherein the modification is a substitution or deletion of one or more nucleic acids.
Clause 148. The transgene of any one of clauses 105-147, wherein the coding region of interest comprises at least one modification, relative to a naturally occurring coding region of the third gene.
Clause 149. The transgene of clause 148, wherein the modification is a substitution or deletion of one or more nucleic acids.
Clause 150. The transgene of clause 148, wherein the coding region of interest comprises a deletion or disruption of a native start codon.
Clause 151. The transgene of clause 148, wherein the coding region of interest comprises at least one heterologous stop codon.
Clause 152. The transgene of clause 151, wherein the at least one heterologous stop codon is at least 50 nucleotides upstream of the next 5’ splice junction.
Clause 153. The transgene of clause 151, wherein the at least one heterologous stop codon elicits nonsense-mediated decay.
Clause 154. The transgene of any one of clauses 105-153, further comprising a 3’ untranslated region (UTR).
Clause 155. The transgene of clause 154, wherein the 3’ UTR comprises a polyadenylation (pA) site and a cleavage site.
Clause 156. The transgene of clause 155, wherein the polyadenylation site is an SV40 pA site.
Clause 157. The transgene of any one of clauses 105-156, further comprising a promoter, wherein the promoter is located 5’, relative to all of (i), (ii), and (iii).
Clause 158. The transgene of clause 157, wherein the promoter is a tissue-specific promoter.
Clause 159. The transgene of clause 158, wherein the tissue-specific promoter is an MHCK7 promoter.
Clause 160. The transgene of any one of clauses 105-159, wherein the alternatively-spliced exon cassette comprises a nucleic acid sequence which is 450 to 650 nucleotides in length.
Clause 161. A recombinant viral genome comprising the transgene of any one of clauses 105-160.
Clause 162. The recombinant viral genome of clause 161, wherein the recombinant viral genome is a genome from a recombinant adeno-associated virus (rAAV).
Clause 163. The recombinant viral genome of clause 162, wherein the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
Clause 164. The recombinant viral genome of clause 163, wherein the AAV ITR sequences are AAV2 ITR sequences.
Clause 165. The recombinant viral genome of any one of clauses 161-164, wherein the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106.
Clause 166. The recombinant viral genome of any one of clauses 161-165, wherein the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 105 or SEQ ID NO: 106.
Clause 167. An rAAV particle comprising a recombinant viral genome according to any one of clauses 161-166.
Clause 168. The rAAV particle of clause 167, wherein the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
Clause 169. The rAAV particle of clause 167 or clause 168, further comprising at least one helper plasmid.
Clause 170. The rAAV particle of clause 169, wherein the helper plasmid comprises a rep gene and a cap gene.
Clause 171. The rAAV particle of clause 170, wherein the rep gene encodes Rep78, Rep68, Rep52, or Rep40, and/or wherein the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
Clause 172. The rAAV particle of clause 169, wherein the rAAV particle comprises two helper plasmids.
Clause 173. The rAAV particle of clause 172, wherein the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
Clause 174. A recombinant viral genome comprising a transgene, wherein the transgene comprises:
(i) a constitutive exon and one or more intronic sequences;
(ii) an alternative exon cassette comprising:
(a) an alternatively-spliced exon;
(b) at least a portion of the intron immediately upstream of the alternatively-spliced exon; and
(c) at least a portion of the intron immediately downstream of the alternatively-spliced exon, wherein, if the wild-type alternatively-spliced exon does not comprise an ATG start codon at its 3’ end:
(1) the 3’ end of the alternatively-spliced exon comprises 1-3 nucleic acid substitutions relative to the wild-type alternatively-spliced exon to form an ATG start codon, and
(2) the first 10 nucleotides of the intron immediately downstream of the alternatively-spliced exon comprise 1-5 nucleic acid substitutions relative to the wild-type intron immediately downstream of the wild-type alternatively-spliced exon; and
(iii) a coding region of interest.
Clause 175. The recombinant viral genome of clause 174, wherein the 1-5 nucleic acid substitutions of (2) increase splice site strength.
Clause 176. The recombinant viral genome of clause 174 or clause 175, wherein any wild-type start codons within the alternatively-spliced exon located upstream of the ATG start codon at the 3’ end of the alternatively-spliced exon are disrupted or deleted.
Clause 177. The recombinant viral genome of any one of clauses 174-176, further comprising a tissue-specific promoter upstream of the alternative exon cassette.
Clause 178. The recombinant viral genome of any one of clauses 174-177, wherein the coding region of interest is or is derived from a naturally occurring coding region of MTM1 or CAPN3.
Clause 179. The recombinant viral genome of any one of clauses 174-178, wherein the tissue-specific promoter is an MHCK7 promoter.
Clause 180. The recombinant viral genome of any one of clauses 174-179, wherein the alternative exon is exon 11 of the BIN1 gene.
Clause 181. The recombinant viral genome of any one of clauses 174-180, wherein the constitutive exon is exon 6 of the SMN1 gene.
Clause 182. The recombinant viral genome of any one of clauses 174-181, wherein the alternative exon cassette promotes skeletal muscle expression of the coding region of interest and reduces cardiac muscle expression of the coding region of interest.
Clause 183. The recombinant viral genome of any one of clauses 174-182, wherein the alternative exon cassette is approximately 600 nucleotides in length.
Clause 184. A method of treating a disease or condition in a subject comprising administering a recombinant viral genome according to any one of clauses 163-166 or 174-183, or an rAAV particle according to any one of clauses 167-173, to the subject.
Clause 185. The method of clause 184, wherein the subject is a mammal.
Clause 186. The method of clause 185, wherein the mammal is a human.
Clause 187. The method of any one of clauses 184-186, wherein the recombinant viral genome or rAAV particle is administered to the subject at least one time.
Clause 188. The method of clause 187, wherein the viral genome or rAAV particle is administered to the subject 2, 3, 4, 5, 6, 7, 8, 9, or 10 times.
Clause 189. The method of any one of clauses 184-188, wherein the viral genome or rAAV particle is administered to the subject parenterally, subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs.
Clause 190. The method of any one of clauses 184-189, wherein the viral genome or viral particle is administered to the subject by intravenous injection, intramuscular injection, intrathecal injection, or intravitreal injection.
Clause 191. The method of any one of clauses 184-190, wherein the disease or condition is a disease or condition selected from the group consisting of Dentatorubral-pallido-luysian atrophy (DRPLA), myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), Fragile X syndrome of mental retardation (FMR1), Fragile X tremor ataxia syndrome (FXTAS), FRAXE mental retardation (FMR2), Friedreich’s ataxia (FRDA), Huntington disease (HD), Huntington disease-like 2 (HDL2), Oculopharyngeal muscular dystrophy (OPMD), Myoclonic epilepsy type I, Alzheimer’s disease, ALS/FTD, spinocerebellar ataxia type 1 (SCA1), spinocerebellar ataxia type 2 (SCA2), spinocerebellar ataxia type 3 (SCA3), spinocerebellar ataxia type 6 (SCA6), spinocerebellar ataxia type 7 (SCA7), spinocerebellar ataxia type 8 (SCA8), spinocerebellar ataxia type 10 (SCA10), spinocerebellar ataxia type 12 (SCA12), spinocerebellar ataxia type 17 (SCA17), Syndromic / non-syndromic X-linked mental retardation, Emery-Dreifuss muscular dystrophy type 2, familial partial lipodystrophy, limb girdle muscular dystrophy type 1B, dilated cardiomyopathy, familial partial lipodystrophy, Charcot-Marie-Tooth disorder type 2B1, mandibuloacral dysplasia, childhood progeria syndrome (Hutchinson-Gilford syndrome), Werner syndrome, Dilated cardiomyopathy (DCM), Hypertrophic cardiomyopathy (HCM), Restrictive cardiomyopathy (RCM), Left Ventricular Non-compaction (LVNC), Arrhythmogenic Right Ventricular Dysplasia (ARVD), takotsubo cardiomyopathy, Duchenne muscular dystrophy, Becker muscular dystrophy, Limb-girdle muscular dystrophy, Facioscapulohumeral muscular dystrophy, Congenital muscular dystrophy, Oculopharyngeal muscular dystrophy, Distal muscular dystrophy, Emery-Dreifuss muscular dystrophy, dementia, Parkinson's disease (PD), a PD-related disorder, Prion disease, a motor neuron disease (MND), Progressive bulbar palsy (PBP), Progressive muscular atrophy (PMA), Primary lateral sclerosis (PLS), Spinal muscular atrophy (SMA), a bladder cancer, a breast cancer, a colorectal cancer, a kidney cancer, a lung cancer, a lymphoma, a melanoma, an oral cancer, an ovarian cancer, an oropharyngeal cancer, a pancreatic cancer, a prostate cancer, a thyroid cancer, a uterine cancer, Down syndrome, Prader-Willi Syndrome (PWS), Bloom Syndrome, Cockayne Syndrome Type I -216400, Cockayne Syndrome Type III, Cockayne Syndrome Type I, Hutchinson-Gilford Progeria Syndrome, Mandibuloacral Dysplasia with Type A Lipodystrophy, Progeria, Adult Onset Progeroid Syndrome, Neonatal Rothmund-Thomson Syndrome, Seip Syndrome, Werner Syndrome, Replication Focus-Forming Activity 1, myotubular myopathy, Danon Disease, and/or centronuclear myopathy.
Clause 192. The transgene of any one of clauses 105-160, wherein the ATG start codon is in the same reading frame as the coding region of interest.
Clause 193. The transgene of any one of clauses 105-160, wherein the ATG start codon is within up to 5, 10, 20, or 30 nucleotides upstream of the 3’ end of the alternatively-spliced exon.
Clause 194. The transgene of any one of clauses 105-160, wherein the ATG start codon is within up to 5, 10, 20, or 30 nucleotides upstream of the 3’ end of the alternatively-spliced exon and is in the same reading frame as the coding region of interest.
These and other aspects of the application are illustrated by the following non-limiting examples.
EXAMPLES
Virally-mediated gene therapies that seek to deliver a protein cargo commonly package a coding region of interest along with a 5’ untranslated region, a 3’ untranslated region, a promoter that drives the gene of interest, and, sometimes, a constitutive intron to enhance nuclear export and RNA stability. However, almost all multi-exonic genes in the human genome (> 95%) are alternatively spliced such that multiple isoforms are generated from a single gene locus. These isoforms may exhibit distinct functions or expression patterns in different cellular conditions. Therefore, they comprise an important aspect of gene regulation and allow multiple species to be generated from a single locus.
There are many descriptions of tissue-specific exons in the literature; these types of data have been derived from microarray or RNAseq analyses of human tissues, or other conditions in which a perturbation is made and the transcriptome is profiled. The inclusion level of an exon is commonly described by “percent spliced in” (psi), which is the percentage of mRNAs transcribed from a locus that are spliced to contain an alternatively-spliced exon of interest. For example, an exon that has a psi of 10% in a given tissue is included in the mature mRNA 10% of the time. Some examples of tissue-specific or tissue-biased exons include TPM1 exon 2 (<5% psi in heart but >95% psi in colon), or SLC25A3 exon 3 (>90% in heart but <5% in brain). Exons with a strong shift in psi between tissues are sometimes referred to as “switch-like” exons. Switch-like exons tend to exhibit greater phylogenetic conservation in their proximal introns, as compared to constitutively spliced exons or alternatively-spliced exons that do not exhibit switch-like behavior.
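As an illustration of the psi measure described above, the following is a minimal sketch in Python. It assumes psi is estimated from simple counts of reads supporting exon inclusion versus exon skipping, without any length normalization. The function names and the 80-point threshold used to call an exon “switch-like” are hypothetical choices made for this illustration and are not taken from the application.

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """Estimate psi (0-100) as the share of informative reads that include the exon.

    Simplifying assumption: raw read counts are used directly.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no informative reads for this exon")
    return 100.0 * inclusion_reads / total


def is_switch_like(psi_tissue_a: float, psi_tissue_b: float,
                   min_shift: float = 80.0) -> bool:
    """Flag an exon whose psi differs between two tissues by at least min_shift.

    The default threshold is an illustrative assumption only.
    """
    return abs(psi_tissue_a - psi_tissue_b) >= min_shift


if __name__ == "__main__":
    # An exon supported by 10 inclusion reads and 90 skipping reads.
    print(percent_spliced_in(10, 90))
    # An exon at 5% psi in one tissue and 95% psi in another.
    print(is_switch_like(5.0, 95.0))
```

Under these assumptions, an exon such as TPM1 exon 2 (<5% psi in heart, >95% psi in colon) would be flagged as switch-like, whereas an exon at 40% and 60% psi would not.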
To date, tissue-specific alternative splicing regulation has not been used to control virally-mediated gene therapies, and there has been no straightforward method for doing so. Described here are specific sequences that may confer tissue-specific regulation for virally-mediated gene therapies (e.g., AAV; lentivirus). In some embodiments, the virus is an adeno-associated virus (AAV). In embodiments where the virus is an AAV, the orientation of the cargo is invariant. This is because the AAV ITRs are symmetric. In some embodiments, the virus is a lentivirus. In embodiments where the virus is a lentivirus, a cargo with spliced introns must be placed on the minus strand. This is because lentivirus packaging proceeds through an RNA intermediate, and the introns must not be lost. Examples 1-6 describe an AAV-mediated gene therapy; however, it should be understood that either an AAV or a lentivirus may be utilized according to the methods described in the Examples.
A. Example 1: Regulation of AAV cargo using skipped exon trio.
Alternatively-spliced exons and their flanking introns can be incorporated into AAV cargoes by at least two distinct methods to confer similar tissue-specific behavior. Both approaches utilize a skipped exon “trio” where there are two flanking constitutive exons and the middle exon is alternative.
In the first approach, the exon trio is placed at the start of the AAV cargo and an ATG, or part of an ATG, translation start codon is introduced at the end of the middle (alternative) exon. The downstream (constitutive) exon is omitted, and the transgene cargo of interest, sans ATG, is inserted in its place, such that inclusion of the alternatively-spliced exon results in joining of the ATG from the alternatively-spliced exon with the rest of the transgene of interest upon splicing. ATGs that lie upstream of the intended start codon are mutated or removed. This results in translation of the transgene only in settings that include the alternatively-spliced exon.
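The logic of the first approach can be sketched with a toy model in Python. The sequences below are short hypothetical strings invented for illustration; they are not any SEQ ID NO of the application. The sketch assumes a full ATG at the 3’ end of the alternative exon, no other ATG in the upstream sequence, and a simple first-ATG scanning model of translation initiation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical toy sequences (assumptions for illustration only).
UPSTREAM_EXON = "GCCGCCACC"        # constitutive exon, contains no ATG
ALT_EXON = "CTCGAGGCCACCATG"       # alternative exon ending in the engineered ATG
CARGO_SANS_ATG = "GGTTCTAAATAA"    # cargo coding region without its own start codon


def spliced_mrna(include_alt_exon: bool) -> str:
    """Join exons as splicing would, with or without the alternative exon."""
    middle = ALT_EXON if include_alt_exon else ""
    return UPSTREAM_EXON + middle + CARGO_SANS_ATG


def first_orf(mrna: str) -> str:
    """Return the ORF from the first ATG to its in-frame stop, or '' if none."""
    start = mrna.find("ATG")
    if start == -1:
        return ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return ""


if __name__ == "__main__":
    print("exon included:", first_orf(spliced_mrna(True)))
    print("exon skipped: ", first_orf(spliced_mrna(False)))
```

In this toy model, an open reading frame spanning the cargo is found only when the alternative exon is included, because the only start codon is supplied by that exon.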
In the second approach, the alternatively-spliced exon and flanking introns are placed within the coding region of the AAV cargo. A stop codon is introduced within the alternatively-spliced exon such that it follows nonsense-mediated decay (NMD) rules, and thus elicits NMD when included. This results in productive translation of the transgene only in settings that exclude the alternatively-spliced exon. If the exon is too short to elicit NMD, another constitutive intron can be placed downstream in the transgene such that NMD rules (e.g., the stop codon should be > 50 nucleotides from the next splice junction) are satisfied.
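The distance rule mentioned above can be checked with a short Python sketch. It is a simplified reading of the rule, stated here as an assumption: a stop codon is treated as NMD-eliciting when at least one exon-exon junction lies more than 50 nucleotides downstream of it in the spliced transcript. The function name and coordinate convention are hypothetical.

```python
from typing import Sequence


def elicits_nmd(stop_codon_end: int,
                junction_positions: Sequence[int],
                min_distance: int = 50) -> bool:
    """Return True if a downstream junction is more than min_distance nt past the stop.

    stop_codon_end: index of the first nucleotide after the stop codon,
        in spliced-mRNA coordinates.
    junction_positions: indices of exon-exon junctions in the same coordinates.
    This is a simplified model (an assumption), not a full NMD predictor.
    """
    return any(j - stop_codon_end > min_distance for j in junction_positions)


if __name__ == "__main__":
    # Short exon: the only downstream junction is 20 nt past the stop codon.
    print(elicits_nmd(100, [120]))
    # Same exon after adding a further downstream intron (junction at 200).
    print(elicits_nmd(100, [120, 200]))
```

The second call mirrors the remedy described in the text: when the alternative exon is too short, an additional downstream intron supplies a junction far enough from the stop codon.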
These two approaches may be applied not only to tissue-specific exons, but also exons that respond to different cellular states or conditions. For example, it may be desirable to confer regulatory behavior that occurs in:
(1) distinct tissues or cell types;
(2) activated T-cells, or states of inflammation;
(3) cells in which the transgene is highly expressed; and/or
(4) cells that exhibit severe disease.
The general approach described herein is advantageous over protein-based regulatory strategies because no additional protein components are necessary to confer regulation; all regulation occurs using endogenous machinery, and no neo-antigens are generated that could be immunogenic. All of the regulation occurs at the RNA level.
In some embodiments, the virus is an adeno-associated virus (AAV). In embodiments where the virus is an AAV, the orientation of the cargo is invariant. This is because the AAV ITRs are symmetric. In some embodiments, the virus is a lentivirus. In embodiments where the virus is a lentivirus, a cargo with spliced introns must be placed on the minus strand. This is because lentivirus packaging proceeds through an RNA intermediate, and the introns must not be lost.
B. Example 2: Regulated expression of AAV cargo in muscle versus heart tissue.
Commonly used methods to regulate tissue-specific expression include tissue-specific promoters and microRNAs. However, these methods are not quite specific enough to provide the level of control needed for certain therapeutic interventions. In contrast, there are exons that show close to 0% psi in heart but > 90% psi in skeletal muscle. A regulatory cassette is generated using alternatively-spliced exons that allows an AAV transgene cargo to be expressed in skeletal muscle, but not in the heart. The exons shown in Table 1 will be tested to evaluate differential expression in skeletal versus heart tissue. These exons are good candidates for this type of tissue-specific behavior because they show robust switch-like behavior between heart and muscle. Some of the exons shown in Table 1 are conserved between mouse and human, and, correspondingly, the switch-like behavior is conserved across species. In some embodiments, the intronic sequences that flank the exons shown in Table 1 are also included as part of the regulatory cassette.
These exons were chosen because of their switch-like behavior between heart and muscle, and because they are all < 250 nucleotides in length, with reasonably conserved intronic sequences that flank the exons. Additionally, these exons are all amenable to being cloned out of their endogenous context and placed into a minigene to act as regulatory cassettes to control AAV cargo expression. It is expected that incorporation of these exons into an AAV-delivered transgene will enable production of a protein cargo in the skeletal muscle and will result in decreased production of that cargo in the heart.
Table 1: Candidate exons compiled from heart and skeletal muscle RNAseq data.
[Table 1 is presented as images in the original publication (imgf000217_0001 to imgf000221_0001).]
C. Example 3: Splicing events that exhibit regulation during T-cell activation.
A regulatory cassette (e.g., an alternatively-spliced exon cassette) is designed that exhibits dynamic behavior during T-cell activation. In some embodiments, such a cassette controls regulators of T-cell biology in the context of lentiviral-based cargoes (e.g., CAR-T approaches). For example, upon T-cell activation, a cargo produced using a regulatory cassette as described herein modulates the outcome of that T-cell. Exons from genes that have been previously shown to exhibit splicing changes upon T-cell activation, as published in the literature and shown in Table 2, will be tested.
In some embodiments, the intronic sequences flanking the exons shown in Table 2, along with the exons, will be introduced into a lentivirus splicing reporter and tested in resting and activated T-cells to assess activity. Sequence cassettes that exhibit behavior that is similar to their endogenous counterparts will be further developed to control heterologous cargoes. Exons from these genes were selected because they have been observed to change in splicing behavior following T-cell activation. It is expected that, when taken out of their endogenous context, and placed within an AAV-delivered transgene, some of the exons will recapitulate behavior in activated T-cells.
Table 2: Exons from genes previously shown to exhibit splicing changes upon T-cell activation.
[Table 2 is presented as images in the original publication (imgf000222_0001 to imgf000224_0001).]
D. Example 4: Broad discovery of tissue-specific AAV cassettes.
A broad screen was performed to identify tissue-specific exon cassettes that exhibit similar behavior when placed within the context of an AAV cargo. These exons were identified using RNAseq data, and exons that are < 200 nucleotides long and exhibit high conservation across multiple species were chosen. These alternatively-spliced exons and their proximal introns are packaged into a heterologous context such that their inclusion level can be assessed by RT-PCR or deep sequencing. Nucleotide barcodes are included in the 3’ untranslated region such that the identity of each exon cassette can be determined by deep sequencing the barcode. The exon cassettes are packaged as a pool into an AAV library and administered to mice. Tissues or cell types of interest are harvested, and RNA originating from the AAV transgenes is prepared for deep sequencing such that psi values can be associated with each barcode in each tissue. Exon cassettes that exhibit tissue-specific behaviors of interest are identified using this procedure.
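The readout step of such a pooled screen, associating a psi value with each barcode in each tissue, can be sketched in Python as follows. The input format (one record per sequenced read, giving tissue, barcode, and whether the read supports exon inclusion) and the barcode names are assumptions made for this illustration; an actual pipeline would first derive these records from the sequencing data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, str, bool]  # (tissue, barcode, exon_included)


def psi_by_barcode(reads: Iterable[Read]) -> Dict[Tuple[str, str], float]:
    """Return psi (0-100) keyed by (barcode, tissue) from per-read calls."""
    counts = defaultdict(lambda: [0, 0])  # [inclusion count, exclusion count]
    for tissue, barcode, included in reads:
        counts[(barcode, tissue)][0 if included else 1] += 1
    return {key: 100.0 * inc / (inc + exc)
            for key, (inc, exc) in counts.items()}


if __name__ == "__main__":
    # Hypothetical reads for one barcoded cassette in two tissues.
    reads = ([("muscle", "BC1", True)] * 9
             + [("muscle", "BC1", False)]
             + [("heart", "BC1", False)] * 4)
    for (barcode, tissue), psi in sorted(psi_by_barcode(reads).items()):
        print(barcode, tissue, psi)
```

Cassettes whose barcodes show a large psi difference between tissues of interest would then be candidates for follow-up.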
Examples of datasets used to identify tissue-specific exons can be found in Wang, et al. (2008), Alternative isoform regulation in human tissue transcriptomes, Nature 456(7221): 470-76; Li, et al. (2017), A Comprehensive Mouse Transcriptomic BodyMap across 17 Tissues by RNA-seq, Sci. Rep. 7(1): 4200; and the GEO dataset entitled “[E-MTAB-513] Illumina Human Body Map 2.0 Project” (Series GSE30611).
E. Example 5: Research operating procedure.
A general research operating procedure for how to develop gene therapies that take advantage of alternative regulation is also provided. This approach can be generalized to facilitate the identification of particular sequences that confer regulatory behavior that is desired. In some embodiments, it is desirable to prevent over-dosing or over-expression in a given tissue.
The procedure is as follows:
(1) The cargo of interest is expressed using AAV in the tissue or cell types of interest.
(2) Transcriptome profiling is performed to identify exons that are sensitive to transgene over-expression.
(3) Using the exons identified in (2), alternatively-spliced exon cassettes that allow for control of the transgene are designed. The design may in some embodiments use the two methods described in Example 1 (e.g., place the ATG within the alternatively-spliced exon, or make the alternatively-spliced exon an NMD substrate). In some embodiments, modifications are made to ensure that the cassette responds appropriately to the transgene. In some embodiments, this work is done in vitro. In some embodiments, this work is done in vivo.
(4) A library of mutagenized splice sites or intronic elements is made that uses the alternatively-spliced exon cassette identified in (3) as the starting point. Barcodes are incorporated such that mutations can be linked to distinct barcodes. An AAV library is generated and administered in vivo in all the settings of interest (e.g., transgene overexpression or wild-type animals). Psi values of all variants are read out by deep sequencing and “winners” are chosen.
(5) The winners are individually tested in vivo.
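The selection of “winners” in step (4) can be sketched in Python under a stated assumption: variants are ranked by the difference in psi between two settings of interest. The procedure above does not specify a selection criterion, so the ranking rule, the function name, and the example barcodes and psi values below are all hypothetical.

```python
from typing import Dict, List


def pick_winners(psi_table: Dict[str, Dict[str, float]],
                 high_setting: str,
                 low_setting: str,
                 top_n: int = 3) -> List[str]:
    """Rank barcoded variants by psi(high_setting) - psi(low_setting).

    psi_table maps barcode -> {setting name: psi}. Variants lacking a psi
    value in either setting are skipped. The ranking criterion is an
    illustrative assumption.
    """
    scored = [(values[high_setting] - values[low_setting], barcode)
              for barcode, values in psi_table.items()
              if high_setting in values and low_setting in values]
    scored.sort(reverse=True)
    return [barcode for _, barcode in scored[:top_n]]


if __name__ == "__main__":
    # Hypothetical psi values in wild-type versus transgene-overexpressing animals.
    table = {
        "A": {"wild_type": 90.0, "overexpression": 10.0},
        "B": {"wild_type": 60.0, "overexpression": 50.0},
        "C": {"wild_type": 95.0, "overexpression": 5.0},
    }
    print(pick_winners(table, "wild_type", "overexpression", top_n=2))
```

Other criteria, such as an absolute psi ceiling in the setting to be de-targeted, could be substituted without changing the overall structure.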
F. Example 6: Engineering tissue-specific alternative splicing to regulate gene therapy cargoes.
A major challenge in the gene therapy field is to develop strategies yielding precise cargo expression - in levels, location, and timing. Because functional transduction of many tissues and cell types by viral vectors remains relatively inefficient, existing cargo sequences often incorporate strong promoters and minimal 5’ and 3’ UTR elements that enhance RNA stability and translation efficiency, aiming to maximize gene expression levels. However, over-expression of some cargoes in certain cell types and tissues may lead to toxicity, thus narrowing or eliminating the therapeutic windows available to treat disease. Solutions to achieve cell type-specific expression include use of tissue-specific promoters, incorporation of regulatory elements within the mRNA sequence (e.g., microRNA binding sites), and packaging of cargoes into capsid variants exhibiting cell type-specific tropisms. These approaches, however, provide limited control, and fail to incorporate certain basic mechanisms of gene regulation ubiquitously employed by the naturally-occurring genome. One of these mechanisms is alternative splicing, which has been relatively unexplored as a mechanism by which to regulate gene therapy cargo expression.
Alternative splicing occurs in ~95% of all multi-exonic human genes, with a major portion of regulated exons showing a tissue or cell type-specific bias (1). The most studied form of alternative splicing is the “skipped exon” or “cassette exon”, in which an alternative exon can be included or excluded between a pair of constitutive exons. The present inventors have identified a subset of “switch-like” cassette exons that show differences in inclusion level between tissues; these exons tend to preserve reading frame more frequently than other cassette exons and display increased phylogenetic conservation in the ~200 intronic nucleotides both upstream and downstream of these exons.
Regulation of alternative splicing is controlled by core spliceosomal machinery, along with RNA binding proteins (RBPs); many RBPs themselves show tissue-specific expression profiles (2). Mechanistic studies of alternative splicing regulation are often performed by cloning the cassette exon sequence (e.g., upstream intron, cassette exon, and downstream intron) into a heterologous context in which the flanking constitutive exons are taken from a separate gene (3). For example, beta globin exons 1-3 (4) and SMN1 exons 6-8 (5,6) are commonly employed exon/intron contexts into which cassette exon sequences have been incorporated for further study. In addition, the behavior of alternatively spliced exons can be recapitulated in heterologous contexts and has even been re-purposed to control fluorescent reporter expression (7). Similar concepts have been used to regulate AAV-mediated gene expression in vivo using alternative splicing, wherein expression of a target gene is controlled via exposure to, for example, an aptamer ligand, such as a small molecule (8,9). However, no attempts have been made to use exons displaying cell- or tissue-specific, or endogenous activity-dependent, splicing patterns to regulate gene therapy cargoes.
The current gene therapy landscape is focused on a multitude of disease indications, but several broad areas could benefit from improved cell or tissue type-specific regulation. Firstly, observed toxicides of AAV-delivered therapies in dorsal root ganglia suggest that minimization of heterologous cargo expression in this tissue could be beneficial, even if a major portion of the toxicity is capsid-mediated. Secondly, a great number of gene therapies are being developed for neuromuscular or cardiac indications; however, some cargoes that are therapeutic in one tissue may be toxic when over-expressed in the other, and there are limited approaches available to fully de- target either tissue.
Described herein is a general approach to re-purpose, engineer, and optimize alternative splicing cassettes to de-target specific tissues and cell types. Alternative splicing cassettes were engineered to control protein cargo expression in the context of AAV. These cassettes were designed such that incorporation of the AUG translation initiation codon within the cassette exon would lead to cargo production upon inclusion (FIG. 9), and/or such that incorporation of a premature stop codon within the cassette exon would lead to nonsense-mediated decay of the cargo mRNA upon inclusion. Screens were performed across hundreds of candidates in vivo, and proof of concept is provided herein for how to further optimize sequences that confer switch-like behavior. Individual sequences of interest were tested and both splicing patterns and total protein output were assessed as gold standards for the extent of de-targeting.
The approach described herein, Tissue-specific Alternative splicing to Restrict Globally-Expressed Therapeutic (TARGET), is broadly applicable to any set of tissues or cell types and can be applied to any cargo that satisfies viral packaging limit restrictions in any virus that supports packaging of splicing-competent transgenes. Some viruses that undergo splicing during packaging (e.g., lentivirus) would require encoding of the transgene on the minus strand of the viral genome to avoid removal of introns during the packaging process.
Results
Identification of alternative exon candidates and selection of transgene context
RNAseq datasets were analyzed to identify candidate exons that display extreme “switch-like” behavior between human heart (10) and skeletal muscle (SRA project SRP082676). These candidates were further filtered to retain those that were also conserved in mouse and that displayed similar percent spliced in (psi) values in mouse heart (low psi) and skeletal muscle (high psi). A set of 11 cassette exons was selected, and ~500 nucleotides of total sequence (including the cassette exon and immediately adjacent flanking introns) were cloned into the SMN1 exon 6/intron 7 context, which has been previously used to study alternative splicing regulation (11) (FIG. 10). The MTM1 coding sequence, which expresses the myotubularin protein, a protein that is missing in boys affected by X-linked myotubular myopathy (12) (XLMTM), was chosen as the therapeutic cargo. Although MTM1 expression in skeletal muscle is therapeutic, questions have been raised about whether over-expression in heart can lead to toxicity (13), providing motivation to identify cassettes that may preferentially de-target heart but preserve skeletal muscle expression.
Mutations to facilitate translation initiation at the end of the alternative exon and avoid spurious downstream translation
For each of the alternative exon candidates, the final nucleotides of the exon were altered to be either “ATG”, “AT”, or “A” (depending on which nucleotides naturally occurred), such that initiation of translation could be achieved when the exon was included. Additionally, any upstream ATGs within the alternative exon were removed by substitution or deletion, to avoid translation initiation at an earlier location. In the case of exon skipping, downstream ATGs within the MTM1 coding sequence might lead to translation of unwanted protein fragments; thus, stop codons were introduced within 15 nucleotides of each of these ATGs, such that translation would terminate within just a few (<5) amino acids (FIG. 11). These ATGs and stop codons all resided in a reading frame distinct from the normal MTM1 reading frame, and thus mutations required to generate these stop codons could preserve the amino acid composition of MTM1. For other cargoes in which internal methionines are present, new out-of-frame short peptide sequences could be introduced upstream of these methionines such that translation of these short, benign peptides is favored over translation of an N-terminally truncated cargo (reinitiation of translation following a stop codon typically does not occur unless there are additional regulatory elements such as internal ribosomal entry sites).
Mutations to preserve splice site strength and considerations of the Kozak sequence
Because changes to the end of the alternative exon sequence can affect the strength of the alternative exon’s 5’ splice site, both the original and altered 5’ splice sites of the alternative exon were scored using MaxEntScan (14), and compensatory mutations were made to the intronic bases of the alternative exon’s 5’ splice site to offset any potential weakening of the splice site signal (FIG. 12; Table 3). The bases of the alternative exon that lie upstream of the ATG initiation sequence were also analyzed for translation initiation potential (15), and almost all sequences in this set showed reasonably strong scores. Additional mutations within the alternative exon could be made to increase similarity to the Kozak consensus sequence.
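The exon edits described in the two preceding subsections can be illustrated with a minimal Python sketch. It is not the procedure used to build the constructs: the function names are hypothetical, the choice of replacing the G of each internal ATG with C is an arbitrary illustrative substitution (the text only states that upstream ATGs were removed "by substitution or deletion"), and the reading of “ATG”, “AT”, or “A” as a start codon completed across the splice junction is an assumption. Splice site and Kozak scoring (MaxEntScan and the translation initiation scores) are not reproduced here.

```python
def internal_atg_positions(exon):
    """Return 0-based start positions of ATG triplets that lie upstream of the
    last three nucleotides of the exon (a terminal ATG is not reported)."""
    exon = exon.upper()
    last_start = max(len(exon) - 3, 0)
    return [i for i in range(last_start) if exon[i:i + 3] == "ATG"]


def remove_internal_atgs(exon):
    """Illustrative substitution only: change the G of each internal ATG to C.
    Introducing a C cannot create a new ATG, so one pass is sufficient."""
    seq = list(exon.upper())
    for i in internal_atg_positions(exon):
        seq[i + 2] = "C"
    return "".join(seq)


def start_codon_at_junction(exon, downstream_exon):
    """True if the exon ends in ATG, AT, or A and the downstream exon supplies
    the remaining nucleotides of the start codon after splicing."""
    exon, downstream_exon = exon.upper(), downstream_exon.upper()
    for k in (3, 2, 1):  # k = nucleotides of ATG contributed by the exon
        if exon.endswith("ATG"[:k]) and downstream_exon.startswith("ATG"[k:]):
            return True
    return False


# Toy sequences, not taken from Table 3.
toy_exon = "CATGGCATGAAATG"
cleaned = remove_internal_atgs(toy_exon)
```

In this toy case the two internal ATGs are disrupted while the terminal ATG is left intact; any real design would additionally need to re-score the 5’ splice site after such edits.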
Barcoding method to uniquely identify each alternative exon within the pool of candidates
A unique nucleotide “barcode” sequence was introduced within the MTM1 coding sequence such that it preserved the amino acid composition of MTM1, but also uniquely identified the upstream alternative exon cassette (FIG. 13). This barcode was necessary so that the frequency of alternative exon inclusion could be properly computed: the alternative exon identity is evident when the exon is included, but the barcode is required for identification when it is skipped. The number of deep sequencing reads that cross the splice site junctions (read 1 of each read pair) thus can be associated with the deep sequencing reads that capture each barcode (read 2 of each read pair), facilitating calculation of percent spliced in (psi, Ψ) for each candidate. This is similar in principle to other published approaches (6,16).
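The association of junction reads with barcode reads can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used here: it assumes read 1 has already been classified as an inclusion or skipping junction and read 2 has already been reduced to its barcode, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def psi_per_barcode(read_pairs):
    """Compute percent spliced in (as a fraction) for each barcode.

    read_pairs: iterable of (junction_call, barcode) tuples, where
    junction_call is "included" or "skipped" (from read 1) and barcode
    is the sequence recovered from read 2. Other calls are ignored.
    """
    counts = defaultdict(lambda: {"included": 0, "skipped": 0})
    for call, barcode in read_pairs:
        if call in ("included", "skipped"):
            counts[barcode][call] += 1
    return {
        bc: c["included"] / (c["included"] + c["skipped"])
        for bc, c in counts.items()
    }


# Toy read pairs with made-up barcodes.
toy_pairs = (
    [("included", "AAC")] * 3
    + [("skipped", "AAC")]
    + [("skipped", "GGT")] * 2
    + [("ambiguous", "GGT")]
)
toy_psi = psi_per_barcode(toy_pairs)
```

Because the barcode sits in the cargo coding sequence downstream of the cassette, skipped reads can still be assigned to a candidate, which is what makes the denominator of psi computable.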
Viral packaging, delivery in vivo, library preparation, and sequencing
All 11 alternative exon candidates (see Table 3 for exon coordinates, psi values, translational initiation scores, and sequence alterations) were packaged into AAV9 as a pool and administered to mice systemically (retro-orbital injection, 4 C57/BL6 mice and 2 FVB mice at 6 weeks of age, 2e13 vg/kg) and intramuscularly (4 C57/BL6 mice at 6 weeks of age, tibialis anterior, 2e11 vg total into one leg). Mice were sacrificed after 4 weeks; the heart and liver were harvested from the systemically injected animals and the tibialis anterior (TA) was harvested from the intramuscularly injected animals. Reverse transcription and polymerase chain reaction were performed using primers targeting the upstream SMN1 exon 6 and also a region in MTM1 3' of the barcode. Illumina adapters with unique indexes to identify each sample were incorporated into the final amplicon libraries, which were then sequenced.
Table 3: Alternative exon candidates.
Figure imgf000231_0001
Analysis of deep sequencing reads
Psi values were computed by associating junction reads to barcodes and computing the frequency of inclusion versus exclusion of each exon (FIGs. 14A-14C). Exons from BIN1, CAMK2B, KIF13A, LGMN, and PICALM showed higher inclusion levels in skeletal muscle (TA) and heart (H), and the BIN1 exon candidate showed the largest dynamic range, with ~15% inclusion in heart and ~60% in TA. In addition, the BIN1 exon also showed ~0% inclusion in liver (L). These results provide proof of concept for the overall screening strategy and identify alternative exon cassettes that would be predicted to minimize translation of MTM1 in heart as compared to skeletal muscle.
Supplemental data: reproducibility of results as a function of time following dosing
To assess whether the time following intramuscular administration might influence the psi value assessment, the same library was administered into 7 additional mice intramuscularly (2e11 vg total into one tibialis anterior (TA) of each mouse). The TAs were harvested 1, 2, 3, or 4 weeks following dosing. Sequencing libraries were generated and the psi values were correlated for each exon candidate across all samples. The results were strongly concordant, regardless of which time point was analyzed (FIGs. 15A-15B).
Screens to identify sequences that further de-target heart but maximize expression in skeletal muscle
Based upon the initial hits of the first screen, described above, alternative exon cassette sequences were identified which might further enhance the switch-like behavior in heart versus skeletal muscle. A higher throughput approach was taken to simultaneously screen many sequence variants of candidate alternative exon cassettes. Core splice site sequences as well as intronic/exonic sequences play important roles in splicing decisions, by modulating the ability of specific trans-factors to bind a pre-mRNA. The core splicing signals, which include the 3’ splice site, 5’ splice site, and branch point, can all influence the frequency with which an alternatively spliced exon is chosen. These core splicing signals are recognized by the U1, U5, and U2 snRNPs, among other components; but they may also be bound by other RNA binding proteins (RBPs), which play roles in modulating how well the basal splicing machinery can recognize the core signals. Furthermore, RBPs can bind to intronic or exonic sequence in the vicinity of these core splicing signals to affect overall splicing decisions. The abundance of certain RBPs in certain contexts can therefore influence splicing patterns in those contexts. To aid efforts to further optimize sequences that display switch-like alternative splicing in heart versus skeletal muscle, the expression level of RNA binding proteins in these 2 tissues was analyzed (FIGs. 16A-16B). RNA expression levels were obtained from GTEx (17), and RBPs were defined from RBPDB (18). The ratio of expression in heart versus skeletal muscle was computed and used to identify RBPs showing the strongest differential expression between these 2 tissues; these RBPs would be predicted to be trans-factors that might be responsible for influencing splicing decisions of highly heart- versus skeletal muscle-specific exons.
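The heart versus skeletal muscle ranking of RBPs can be sketched as below. This is only an illustration of the ratio-and-rank idea: the function name, the use of a log2 ratio, and the pseudocount are assumptions not stated in the text, and the gene labels and expression values are placeholders rather than GTEx data.

```python
import math


def rank_rbps_by_ratio(heart_expr, muscle_expr, pseudocount=1.0):
    """Rank RBPs by log2(heart / skeletal muscle) expression ratio.

    heart_expr, muscle_expr: dicts mapping gene name to an expression value
    (e.g. TPM). The pseudocount (an assumption) avoids division by zero.
    Returns (gene, log2_ratio) pairs, most heart-enriched first.
    """
    shared = heart_expr.keys() & muscle_expr.keys()
    ratios = {
        gene: math.log2(
            (heart_expr[gene] + pseudocount) / (muscle_expr[gene] + pseudocount)
        )
        for gene in shared
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Placeholder values chosen for easy arithmetic; not real measurements.
toy_heart = {"RBP_A": 31.0, "RBP_B": 3.0, "RBP_C": 7.0}
toy_muscle = {"RBP_A": 3.0, "RBP_B": 31.0, "RBP_C": 7.0}
toy_ranking = rank_rbps_by_ratio(toy_heart, toy_muscle)
```

Reading the list from the top gives the most heart-enriched factors, and reading it from the bottom gives the most skeletal muscle-enriched factors.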
The high throughput screening approach described herein was first applied to BIN1 exon 11 because it showed the largest dynamic range in psi between heart and skeletal muscle (see FIGs. 14A-14C). BIN1 exon 11 has been previously studied and demonstrated to be responsive to RNA binding proteins such as the Muscleblind-like proteins (19) and RBFOX proteins (20); consistent with this, RBFOX1 and MBNL1 are the 5th- and 11th-most enriched RBPs, respectively, in skeletal muscle relative to heart. The upstream intron of BIN1 exon 11 is enriched for CAC motifs (10 instances versus an expectation of 3.8); pairs of CAC motifs separated by a variable spacer are known to bind RBPMS2 (21). RBPMS2 represses exon inclusion when binding to upstream introns (22) and is the 2nd-most enriched RBP in heart as compared to skeletal muscle. Notably, the psi values of BIN1 exon 11 in human and rhesus macaque heart are all close to 0%, whereas dog, which unlike the other organisms contains only 1 instead of 2 CAC motifs in the 3’ splice site of BIN1 exon 11, shows a psi value of ~50% for BIN1 exon 11 (23) (FIGs. 17 and 18). Thus, RBPMS2 might be a critical factor that represses BIN1 exon 11 in heart.
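Motif enrichment of the kind quoted above (observed CAC count versus an expected count) can be sketched as follows. The background model used to obtain the expectation of 3.8 is not stated in the text; the uniform, independent-base model below is an assumption made purely for illustration, and the function names and test sequence are hypothetical.

```python
def count_motif(seq, motif="CAC"):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def expected_count_uniform(length, motif_length=3):
    """Expected motif count if each base is drawn uniformly and independently
    (an assumed null model): number of windows divided by 4**motif_length."""
    windows = max(length - motif_length + 1, 0)
    return windows / 4 ** motif_length


# Toy sequence, not the BIN1 upstream intron.
toy_intron = "CACACGGCAC"
observed = count_motif(toy_intron)
expected = expected_count_uniform(len(toy_intron))
```

A composition-aware null (for example, shuffling the intron or using its own base frequencies) would generally give a different expectation than the uniform model shown here.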
Given all of the above information, the 3’ splice site, 5’ splice site (FIG. 19), and downstream intron (FIG. 20) of BIN1 exon 11 were systematically altered to explore different splice site strengths, different configurations of CAC motifs within the 3’ splice site, and different frequencies of MBNL and RBFOX binding sites within the downstream intron. In total, AAV plasmid libraries were generated that contained 7 possible 3’ splice sites, 6 possible 5’ splice sites, and 16 possible downstream intronic sequences. The splice sites varied in strength, and the intronic sequences varied in the number of predicted MBNL and RBFOX binding sites. A total of 672 sequence variants were possible; each variant was linked to unique 10 nucleotide-long barcodes placed within the downstream coding sequence of MTM1. Each variant could be linked to several unique barcodes, such that multiple barcodes could serve as “replicates” for each sequence variant. Deep sequencing of PCR products amplified from plasmid libraries (FIG. 21) showed the presence of 663/672 variants, with an average of ~8 barcodes per variant (FIG. 22).
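The size of the combinatorial library follows directly from the three design dimensions (7 x 6 x 16 = 672). A minimal Python sketch of the enumeration is shown below; the identifier scheme (consecutive integers starting at 1) and field names are hypothetical and need not match the variant IDs used in Table 4.

```python
from itertools import product


def enumerate_variants(n_3ss=7, n_5ss=6, n_intron=16):
    """List every combination of 3' splice site, 5' splice site, and
    downstream intron design. IDs are illustrative consecutive integers."""
    combos = product(
        range(1, n_3ss + 1), range(1, n_5ss + 1), range(1, n_intron + 1)
    )
    return [
        {"variant_id": i, "ss3_id": a, "ss5_id": b, "intron_id": c}
        for i, (a, b, c) in enumerate(combos, start=1)
    ]


variants = enumerate_variants()
n_variants = len(variants)            # 7 * 6 * 16
fraction_recovered = 663 / n_variants  # variants detected in the plasmid library
```

With 663 of the 672 designs detected, roughly 98.7% of the intended variants were represented in the plasmid library.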
Viruses were generated using the eMyoAAV capsid (24) and administered to mice at a titer of 2.5e13 vg/kg. Heart, tibialis anterior, and triceps muscles were collected from mice sacrificed 3 weeks following administration. Sequencing libraries were prepared by RT-PCR and sequenced by Illumina sequencing. Psi values were computed for each barcode, and a psi value for each variant was obtained by averaging the psi across every barcode for that variant. The psi value for each variant is shown for 2 heart samples in a scatter plot (FIG. 23A), and similarly, for 2 gastrocnemius samples (FIG. 23B), or a heart sample versus a gastrocnemius sample (FIG. 23C). The mean psi was computed for each variant across replicate tissues from multiple animals (n=4 animals for each tissue). These mean psi values for each variant were also plotted (FIGs. 24A-24B) and listed (Table 4). Psi values for each variant were also plotted as a function of 3’ splice site strength or 5’ splice site strength in heart and gastrocnemius (FIGs. 25A-25D), and clear dependencies of psi on splice site strength were observed. This relationship is particularly strong between 5’ splice site strength and psi, supporting the idea that this screening approach can accurately quantitate relative psi values and identify sequence variants that exhibit specific splicing patterns.
The same BIN1 exon 11 variants were also tested with a different cargo, CAPN3. A separate AAV library was generated in which all 672 BIN1 variants (Table 4) were cloned upstream of the CAPN3 coding sequence, analogously to how they were cloned upstream of the MTM1 coding sequence. Similarly, a 10 nucleotide barcode was embedded within the CAPN3 coding sequence to identify each splice variant. The mean psi values across heart, gastrocnemius, and tibialis anterior tissues from 4 animals were plotted as scatters (FIGs. 26A-26B), showing that some variants show lower inclusion in heart than in skeletal muscles. The overall behavior of each variant is strongly correlated across MTM1 and CAPN3 cargoes, but the baseline inclusion level of the BIN1 cassette is lower when linked to the CAPN3 cargo; this trend is observable when plotting the scatter of psi variants for each cargo in heart and gastrocnemius (FIGs. 27A-27B).
Table 4. Table of BIN1 exon 11 variants screened and associated psi values.
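The two averaging steps described above (across barcodes within a variant, then across animals within a tissue) can be sketched as follows. The function names, the dictionary layout, and the choice to report only variants observed in every animal are assumptions made for illustration; the toy values are not data from Table 4.

```python
from collections import defaultdict
from statistics import mean


def variant_psi(barcode_psi, barcode_to_variant):
    """Average per-barcode psi values over all barcodes linked to a variant.
    Barcodes absent from the barcode-to-variant map are ignored."""
    grouped = defaultdict(list)
    for barcode, psi in barcode_psi.items():
        variant = barcode_to_variant.get(barcode)
        if variant is not None:
            grouped[variant].append(psi)
    return {variant: mean(values) for variant, values in grouped.items()}


def mean_across_animals(per_animal_psi):
    """Average variant-level psi across animals for one tissue, keeping only
    variants observed in every animal (an illustrative choice)."""
    shared = set.intersection(*(set(d) for d in per_animal_psi))
    return {v: mean(d[v] for d in per_animal_psi) for v in shared}


# Toy inputs with made-up barcodes and psi values.
toy_barcode_psi = {"AAAC": 0.2, "GGTT": 0.4, "CCCA": 0.9, "TTTT": 0.5}
toy_map = {"AAAC": 1, "GGTT": 1, "CCCA": 2}
per_variant = variant_psi(toy_barcode_psi, toy_map)
tissue_mean = mean_across_animals([{1: 0.2, 2: 0.8}, {1: 0.4, 2: 0.6}])
```

Treating barcodes as internal replicates in this way means a single aberrant barcode has limited influence on the psi reported for its variant.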
Figure imgf000236_0001
Figure imgf000237_0001
Figure imgf000238_0001
Figure imgf000239_0001
Figure imgf000240_0001
Figure imgf000241_0001
Figure imgf000242_0001
Figure imgf000243_0001
Figure imgf000244_0001
Figure imgf000245_0001
Figure imgf000246_0001
Figure imgf000247_0001
Figure imgf000248_0001
Figure imgf000249_0001
Figure imgf000250_0001
Figure imgf000251_0001
Figure imgf000252_0001
Figure imgf000253_0001
**The columns for Table 4 are further described as follows:
- ID (Variant ID): an identification number for each specific BIN1 alternative exon cassette variant.
- 3’ splice site ID: an identification number for each 3’ splice site, as indicated in FIG. 19.
- 5’ splice site ID: an identification number for each 5’ splice site, as indicated in FIG. 19.
- Intron insertions: the locations of specific intronic modifications within each variant, as listed in FIG. 20.
- MTM1_Heart: psi of the variant in heart when linked to the MTM1 cargo.
- MTM1_Gastroc: psi of the variant in gastrocnemius when linked to the MTM1 cargo.
- MTM1_Tibialis: psi of the variant in tibialis when linked to the MTM1 cargo.
- CAPN3_Heart: psi of the variant in heart when linked to the CAPN3 cargo.
- CAPN3_Gastroc: psi of the variant in gastrocnemius when linked to the CAPN3 cargo.
- CAPN3_Tibialis: psi of the variant in tibialis when linked to the CAPN3 cargo.
- SEQ ID NO: the sequence identifier associated with the intron-exon-intron sequence of the particular cassette variant.
Application of this high throughput screening approach to identify alternative exon cassettes with regulated splicing patterns in additional tissues
The ability to limit or augment gene expression in a variety of tissues would be useful for gene therapies, and some notable tissues include the liver, different brain regions, dorsal root ganglia (DRG), skeletal muscle, cardiac muscle, and smooth muscle. GTEx data, as well as a human DRG-specific dataset (SRA runs SRR8533960-SRR8533986), were mined to identify 110 alternative exons that show differential inclusion in these tissues (Table 5), and 96 exon cassettes were selected to test for splicing behavior within these tissues. A procedure similar to that outlined above was followed: alternative exons that are <200 nucleotides in length were selected, all ATGs within the alternative exon body were removed, and the end of each alternative exon was modified to terminate in ATG. The 5’ splice sites of the new exons were scored, and new variants for each alternative exon cassette were designed that were 1 bit weaker than, similar to, and 1 bit stronger than the endogenous 5’ splice site in the absence of adjustments to generate a new ATG. Approximately 500 nucleotides of total sequence were included from each alternative exon cassette, including the alternative exon itself and immediately flanking intronic regions, and were cloned into the SMN1 exon 6/intron 7 context (as above). EGFP was used as the downstream cargo (rather than MTM1). A similar 10 nucleotide barcode was incorporated into the EGFP coding sequence to allow for identification of each alternative exon cassette. Two versions of the library were generated: one driven by an MHCK7 promoter to bias expression towards cardiac, smooth, and skeletal muscles, and the other driven by a CBh promoter to drive ubiquitous expression. The MHCK7 promoter-driven construct will be packaged by the eMyoAAV capsid to bias delivery to muscle, whereas the CBh promoter-driven construct will be packaged by the PHP.eB capsid (25) to bias delivery to the nervous system, including DRG.
Table 5, Part 1 (Columns 1-18): Table of exons screened to characterize behavior across multiple tissues.
Figure imgf000256_0001
Figure imgf000257_0001
Figure imgf000258_0001
Figure imgf000259_0001
Figure imgf000260_0001
Figure imgf000261_0001
Table 5, Part 2 (Columns 1, 2, and 19-38): Table of exons screened to characterize behavior across multiple tissues.
Figure imgf000261_0002
Figure imgf000262_0001
Figure imgf000263_0001
Figure imgf000264_0001
Figure imgf000265_0001
Figure imgf000266_0001
Figure imgf000267_0001
Figure imgf000268_0001
**The columns for Table 5 are further described as follows:
- Coordinates: chromosome and splice site coordinates of the alternatively-spliced exon (from hg38). The 4 coordinates indicate the upstream constitutive 5’ splice site, the 3’ splice site of the alternative exon, the 5’ splice site of the alternative exon, and the downstream constitutive 3’ splice site, but the values are all in ascending order regardless of transcribed strand.
- Gene: gene name for the gene that contains the screened exon.
- Exon length: length of the exon in number of total nucleotides.
- Upstream intron sequence by SEQ ID NO: sequence of selected upstream intronic sequence.
- Exon sequence by SEQ ID NO: native sequence of the screened alternative exon.
- Native 5’ splice site by SEQ ID NO: native 5’ splice site of the alternative exon.
- Native 5’ splice site score: score of the native 5’ splice site of the alternative exon.
- Exon sequence (with internal ATGs removed and ATG at the end) by SEQ ID NO: native exon sequence with all internal ATGs mutated, and with an ATG at the end of the alternative exon.
- Compensated 5' splice site sequence by SEQ ID NO: a 5’ splice site that has been mutated to match the native splice site strength.
- Compensated 5' splice site sequence score: score of the compensated 5’ splice site.
- Downstream intron sequence by SEQ ID NO: sequence of selected downstream intronic sequence.
- Downstream intron sequence (with compensated 5' splice site): sequence of selected downstream intronic sequence with the compensated 5’ splice site.
- Kozak sequence by SEQ ID NO: sequence that surrounds the ATG start codon. The first 2 bases following the ATG are GT in the context of a GFP coding sequence.
- Kozak sequence score: a score for the efficiency of the Kozak sequence.
- 5' splice site sequence (~1 bit stronger) by SEQ ID NO: a 5’ splice site selected to be approximately 1 bit stronger than the native 5’ splice site.
- 5' splice site score (~1 bit stronger): the score of this 5’ splice site.
- 5' splice site sequence (~1 bit weaker) by SEQ ID NO: a 5’ splice site selected to be approximately 1 bit weaker than the native 5’ splice site.
- 5' splice site score (~1 bit weaker): the score of this 5’ splice site.
Subsequent columns denote psi values of each alternative exon in various tissues of interest.
Application of this high throughput screening approach to identify alternative exon cassettes that can be regulated by T-cell activation
The ability to increase or decrease exon inclusion in response to T-cell activation provides utility for various therapeutic purposes, such as CAR-T therapy or other immunotherapies. A major challenge in the context of CAR-T for solid tumors is T-cell exhaustion, a state in which the engineered T-cells no longer exhibit sufficient potency to eliminate tumor cells expressing the neoantigen due to co-expression of multiple inhibitory receptors. It has been previously shown that transcription factors such as T-bet can repress the expression of these inhibitory receptors and can instead sustain the activity of T-cells during chronic infection (26) and also enhance antitumor activity and limit T-cell exhaustion in CAR-T cells (27). However, constitutive over-expression of T-bet may also lead to undesired or autoimmune-like responses (28,29), and thus the addition of T-bet must be regulated in a context-dependent manner. Accordingly, an exon whose inclusion is induced by T-cell activation might be engineered to control translation of T-bet or other cargoes that can modulate the state of the T-cell, thereby preventing or limiting T-cell exhaustion.
Publicly available transcriptome datasets in which T-cells were transcriptionally profiled before and after activation (30) were mined, and a set of 98 alternative exon cassettes was chosen (Table 6) to test for splicing behavior within the context of a lentivirus that can integrate into the T-cell genome. Two intronic regions, along with splice site-proximal exon fragments, were selected and fused together to form a new exon cassette. These alternative exon cassettes will be packaged into a lentivirus capsid and will be used to transduce naive T-cells (31); T-cells will then be activated, RNA will be harvested, and deep sequencing libraries will be prepared and sequenced to identify alternative exon cassettes that show changes in splicing patterns upon activation.
Table 6: Table of T-cell activation-responsive exons.
Figure imgf000271_0001
Figure imgf000272_0001
**The columns for Table 6 are further described as follows:
Gene name: name of the gene.
Strand: transcribed strand for this gene.
Coordinate 1: the coordinates of the upstream intron and 5’ end of the exon, including the 3’ splice site (hg19 coordinates).
Sequence 1 (by SEQ ID NO): the sequence corresponding to coordinate 1.
Coordinate 2: the coordinates of the 3’ end of the exon, including the 5’ splice site, and downstream intron (hg19 coordinates).
Sequence 2 (by SEQ ID NO): the sequence corresponding to coordinate 2.
References of Example 6
1. Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-476 (2008).
2. Gerstberger, S., Hafner, M., Ascano, M. & Tuschl, T. Evolutionary Conservation and Expression of Human RNA-Binding Proteins and Their Role in Human Genetic Disease. Adv. Exp. Med. Biol. 825, 1-55 (2014).
3. Cooper, T. A. Use of minigene systems to dissect alternative splicing elements. Methods 37, 331-340 (2005).
4. Coulter, L. R., Landree, M. A. & Cooper, T. A. Identification of a new class of exonic splicing enhancers by in vivo selection. Mol. Cell. Biol. 17, 2143-2150 (1997).
5. Lorson, C. L., Hahnen, E., Androphy, E. J. & Wirth, B. A single nucleotide in the SMN gene regulates splicing and is responsible for spinal muscular atrophy. Proc. Natl. Acad. Sci. 96, 6307-6311 (1999).
6. Wong, M. S., Kinney, J. B. & Krainer, A. R. Quantitative Activity Profile and Context Dependence of All Human 5’ Splice Sites. Mol. Cell 71, 1012-1026.e3 (2018).
7. Orengo, J. P., Bundman, D. & Cooper, T. A. A bichromatic fluorescent reporter for cell-based screens of alternative splicing. Nucleic Acids Res. 34, e148 (2006).
8. Boyne, A. R., et al. International Patent App. No. PCT/US2016/016234, entitled “Regulation of gene expression by aptamer-mediated modulation of alternative splicing”, filed February 2, 2016 and published as International Pub. No. WO2016126747A1.
9. Monteys, A. M. et al. Regulated control of gene therapies by drug-induced splicing. Nature 596, 291-295 (2021).
10. Freyermuth, F. et al. Splicing misregulation of SCN5A contributes to cardiac conduction delay and heart arrhythmia in myotonic dystrophy. Nat. Commun. 7, 11067 (2016).
11. Cheung, R. et al. A Multiplexed Assay for Exon Recognition Reveals that an Unappreciated Fraction of Rare Genetic Variants Cause Large-Effect Splicing Disruptions. Mol. Cell 73, 183-194.e8 (2019).
12. Lawlor, M. W. & Dowling, J. J. X-linked myotubular myopathy. Neuromuscul. Disord. 31, 1004-1012 (2021).
13. Childers, M. K. et al. Gene Therapy Prolongs Survival and Restores Function in Murine and Canine Models of Myotubular Myopathy. Sci. Transl. Med. 6, 220ra10 (2014).
14. Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377-394 (2004).
15. Noderer, W. L. et al. Quantitative analysis of mammalian translation initiation sites by FACS-seq. Mol. Syst. Biol. 10, 748 (2014).
16. Adamson, S. I., Zhan, L. & Graveley, B. R. Vex-seq: high-throughput identification of the impact of genetic variation on pre-mRNA splicing efficiency. Genome Biol. 19, 71 (2018).
17. Carithers, L. J. et al. A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project. Biopreservation Biobanking 13, 311-319 (2015).
18. Cook, K. B., Kazan, H., Zuberi, K., Morris, Q. & Hughes, T. R. RBPDB: a database of RNA-binding specificities. Nucleic Acids Res. 39, D301-D308 (2011).
19. Fugier, C. et al. Misregulated alternative splicing of BIN1 is associated with T tubule alterations and muscle weakness in myotonic dystrophy. Nat. Med. 17, 720-725 (2011).
20. Singh, R. K., Kolonin, A. M., Fiorotto, M. L. & Cooper, T. A. Rbfox-Splicing Factors Maintain Skeletal Muscle Mass by Regulating Calpain3 and Proteostasis. Cell Rep. 24, 197-208 (2018).
21. Farazi, T. A. et al. Identification of the RNA recognition element of the RBPMS family of RNA-binding proteins and their transcriptome-wide mRNA targets. RNA 20, 1090-1102 (2014).
22. Nakagaki-Silva, E. E. et al. Identification of RBPMS as a mammalian smooth muscle master splicing regulator via proximity of its gene with super-enhancers. eLife 8, e46327 (2019).
23. Naqvi, S. et al. Conservation, acquisition, and functional impact of sex-biased gene expression in mammals. Science (2019) doi:10.1126/science.aaw7317.
24. Tabebordbar, M. et al. Directed evolution of a family of AAV capsid variants enabling potent muscle-directed gene delivery across species. Cell 184, 4919-4938.e22 (2021).
25. Chan, K. Y. et al. Engineered AAVs for efficient noninvasive gene delivery to the central and peripheral nervous systems. Nat. Neurosci. 20, 1172-1179 (2017).
26. Kao, C. et al. Transcription factor T-bet represses expression of the inhibitory receptor PD-1 and sustains virus-specific CD8+ T cell responses during chronic infection. Nat. Immunol. 12, 663-671 (2011).
27. Gacerez, A. T. & Sentman, C. L. T-bet promotes potent antitumor activity of CD4+ CAR T cells. Cancer Gene Ther. 25, 117-128 (2018).
28. Austin, J. W. et al. Overexpression of T-bet in HIV infection is associated with accumulation of B cells outside germinal centers and poor affinity maturation. Sci. Transl. Med. (2019) doi:10.1126/scitranslmed.aax0904.
29. Shimohata, H. et al. Overexpression of T-bet in T cells accelerates autoimmune glomerulonephritis in mice with a dominant Th1 background. J. Nephrol. 22, 123-129 (2009).
30. Martinez, N. M. et al. Alternative splicing networks regulated by signaling in human T cells. RNA 18(5), 1029-1040 (2012).
31. Bernadin, O. et al. Baboon envelope LVs efficiently transduced human adult, fetal, and progenitor T cells and corrected SCID-X1 T-cell deficiency. Blood Adv. 3, 461-475 (2019).
G. Example 7: Ligand-Responsive Splicing Switches for Control of miRNA Expression
In mammalian cells, alternative splicing greatly expands the diversity of RNA sequences that can be incorporated into transcript isoforms. These sequences include both coding and untranslated regions (UTRs). Consequently, different RNA isoforms from the same gene can exhibit distinct stability, localization, or translation behaviors in different cellular contexts and conditions (1). From a synthetic biology standpoint, alternative splicing can thus be used as a regulatory tool to control gene expression for various applications (2, 3). One attractive application is to use ligand-inducible alternative splicing to regulate AAV-delivered gene expression, which can confer greater control of therapeutic cargoes and also potentially avoid toxicities from constitutive over-expression of therapeutic cargoes (4, 5). Previous aptazyme-based approaches (6) in which a riboswitch was fused to a ribozyme to regulate RNA stability in mammalian cells yielded a 15-fold dynamic range; however, this approach lacks modularity and has leaky, non-zero basal expression. The recent Xon system (7) uses drug-responsive alternative splicing patterns to control AAV-mediated gene expression, but potentially affects many other cryptic splice sites (8), and is restricted to a single specific molecule.
Described here are a high-throughput approach (9) to identify synthetic alternative splicing switches in mammalian cells and several sequence designs that allow for ligand-inducible regulation of gene expression or knockdown. The approach uses rational design, coupled to deep sequencing, to characterize behavior of hundreds to thousands of synthetic intron/exon cassettes. Several riboswitch designs that facilitate small molecule-mediated regulation of alternative splicing were developed, and multiple sequence variants were characterized. Importantly, unlike switches that promote exon inclusion (7) (this can result in incorporation of unwanted RNA sequences within the final cargo), this design promotes exon skipping upon drug induction. Here, switches that can dynamically regulate protein isoforms, protein expression levels, and production of RNA interference triggers were designed. This approach is termed SPlicing by Ligand Induction for Controllable Expression based on Riboswitch (SPLICER). The designs are compact in size and promoter-independent, making them useful regulatory tools that can be incorporated into gene expression cassettes for basic and translational applications.
Results
Exon Cassette Selection and Sequence Design
An alternative exon/intron cassette was designed in which a tetracycline aptamer was placed just downstream of the 5’ splice site of MBNL1 exon 5. This particular exon has been used extensively in the literature to study regulation of alternative splicing (10) and has relatively weak splice sites. Minimal sequence from intron 4 and intron 5 was incorporated around the exon 5 sequence, while most of the binding sites for the MBNL protein that reside in intron 4 were not incorporated. The tetracycline aptamer was selected as the riboswitch for three reasons. First, the tetracycline aptamer has very high affinity for its ligand (its dissociation constant is in the sub-nanomolar range) (11). Second, tetracycline riboswitches have been demonstrated to be functional in vivo (6, 12, 13). Third, tetracycline is non-toxic across a broad range of concentrations and is cell permeable. The tetracycline aptamer was placed in the intron downstream of the exon 5 5’ splice site (see FIG. 28), and the communication stem of the riboswitch was designed to base pair with nucleotides -3 to +6 at the 5’ splice site. When bound to tetracycline, the riboswitch stabilizes the communication stem to block the 5’ splice site, leading to increased exon skipping. To investigate how various splice site sequences influence the behavior of the alternative exon, 39 distinct 3’ splice site sequences and 20 distinct 5’ splice site sequences were tested in combination (780 total combinations possible). To study the behavior of each variant in parallel, a 9-nucleotide barcode that served as a unique identifier for each variant was incorporated within the downstream constitutive exon. A pooled library of ~9K distinct plasmids representing all variants was generated, yielding about 12 distinct barcodes per sequence variant.
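The combinatorial layout of the library can be sketched in Python as follows. This is only an illustration: the placeholder splice site names, the random barcode assignment, and the fixed seed are assumptions, since the text states only the number of 3’ and 5’ splice sites, the barcode length, and the approximate number of barcodes per variant.

```python
import itertools
import random

N_3SS, N_5SS = 39, 20          # splice site counts given in the text
BARCODE_LEN = 9                # barcode length given in the text
BARCODES_PER_VARIANT = 12      # approximate figure given in the text

# Hypothetical placeholder names; they stand in for the actual splice site sequences.
ss3 = [f"3ss_{i}" for i in range(1, N_3SS + 1)]
ss5 = [f"5ss_{i}" for i in range(1, N_5SS + 1)]
variants = list(itertools.product(ss3, ss5))


def random_barcodes(n, length, rng):
    """Draw n distinct random barcodes (an assumed assignment scheme)."""
    seen = set()
    while len(seen) < n:
        seen.add("".join(rng.choice("ACGT") for _ in range(length)))
    return sorted(seen)


rng = random.Random(0)
pool = random_barcodes(len(variants) * BARCODES_PER_VARIANT, BARCODE_LEN, rng)

# Codebook: barcode -> (3' splice site, 5' splice site)
codebook = {bc: variants[i % len(variants)] for i, bc in enumerate(pool)}

print(len(variants))       # 780 splice site combinations
print(len(codebook))       # 9360 barcoded plasmids, i.e. ~9K
print(4 ** BARCODE_LEN)    # 262144 possible 9-nt barcodes
```

A 9-nucleotide barcode space of 262,144 sequences is far larger than the ~9K plasmids in the pool, which is consistent with each barcode identifying a single variant.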
Massively Parallel Assay to Assess Splicing Patterns of Library Variants
The pooled plasmid library was transfected into HEK293T cells, and each well was treated with vehicle or 100 µM tetracycline 6 hours after transfection (3 biological replicates) (see FIG. 29). Cellular RNA was extracted 24 hours after drug treatment. Reverse transcription and PCR were performed to amplify across the splice junctions and the downstream barcode. Illumina adapters were added onto the amplicons via PCR. The “codebook” sequencing library was generated using primers that amplify sequence from the upstream 3’ splice site all the way through the downstream barcodes, using the plasmid library as a template. All libraries were subjected to paired-end Illumina sequencing. For the codebook libraries, the first read of each pair was used to read the specific 3’ and 5’ splice sites within each variant, and the second read was used to identify the specific barcode associated with those splice sites. For the cDNA libraries, the first read of each pair was used to determine splicing patterns, and the second read of each pair was used to identify the specific variant; percent spliced in (psi) values were calculated for each barcode to determine splicing patterns in the presence and absence of tetracycline for each variant.
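The per-barcode psi calculation can be illustrated with a minimal Python sketch. The read counts are hypothetical, chosen so that the resulting values match the psi values reported for one variant later in this Example, and the sign convention for delta psi (drug minus vehicle) is an assumption, since the text does not state one.

```python
def psi(inclusion_reads, skipping_reads):
    """Percent spliced in: inclusion reads as a percentage of all informative reads."""
    total = inclusion_reads + skipping_reads
    if total == 0:
        return None  # barcode has no coverage
    return 100.0 * inclusion_reads / total


def delta_psi(vehicle_counts, drug_counts):
    """psi with drug minus psi with vehicle (sign convention assumed here)."""
    psi_vehicle = psi(*vehicle_counts)
    psi_drug = psi(*drug_counts)
    if psi_vehicle is None or psi_drug is None:
        return None
    return psi_drug - psi_vehicle


# Hypothetical (inclusion, skipping) read counts for one barcode.
vehicle = (13_695, 1_305)
drug = (2_205, 12_795)

print(psi(*vehicle))                        # 91.3
print(psi(*drug))                           # 14.7
print(round(delta_psi(vehicle, drug), 1))   # -76.6
```

Under this convention, a negative delta psi corresponds to ligand-induced exon skipping.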
Identification of Tetracycline-Responsive Splicing Switches
From deep sequencing results, ~9K reliable barcodes were identified, yielding an average of 12 barcodes for each variant. Each barcode yielded ~15K reads, thus totaling ~180K reads per variant. In general, almost all variants showed decreased psi values in the presence of tetracycline, but some variants showed more dramatic regulation than others (see FIG. 30). Variants were grouped either by 3’ splice site or 5’ splice site identity and sorted by mean psi values in the absence of tetracycline (see FIGs. 31A-31B). Delta psi values were plotted in a heatmap, where each row/column combination denotes a specific 5’ and 3’ splice site combination. The 5’ and 3’ splice sites were sorted according to their mean psi in the absence of tetracycline (see FIG. 31C). Variants with high or low baseline psi values were generally less sensitive to drug treatment. Variants exhibiting large delta psi were selected for further characterization, and their splice-switching behavior was confirmed by testing them individually. Results for one particular variant (3’ splice site number 10, 5’ splice site number 13) showed that this variant had a large delta psi (psi of 91.3% without tetracycline and 14.7% with tetracycline, a difference of 76.6 percentage points) and also preserved the amino acid composition of the end of exon 5 (see FIG. 32).
Regulation of Protein Expression by Riboswitch-Induced Alternative Splicing and Nonsense-Mediated Decay
To regulate protein expression, a poison cassette exon was combined with a riboswitch, which allowed incorporation of a premature stop codon in the absence of drug and skipping of this exon in the presence of drug (see FIG. 33). The CRISPR-CasPhi-2 (14) coding sequence was placed downstream of the alternative splicing cassette as the gene of interest (POI). To further improve delta psi, a design was generated in which the riboswitch comprises the entire exon and the communication stem is immediately adjacent to or overlapping with the 3’ and 5’ splice sites (see FIG. 34). Thus, the ligand induced base pairing of the communication stem, blocking both the 5’ and 3’ splice sites. The alternative exon contained an in-frame premature stop codon such that skipping of the exon allowed protein translation to proceed. A total of 14 constructs with different splice sites and communication stem lengths were tested, and splicing assays for each of these constructs showed proof of concept for this design strategy (see FIG. 35). A dose response for one of these constructs showed an EC50 of ~10 µM (see FIG. 36), which is within the concentration range achievable in vivo (15, 16).
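To illustrate what an EC50 of about 10 implies across a dose range, the following Python sketch evaluates a simple Hill model. The Hill form and the Hill coefficient of 1 are assumptions made for illustration; the text does not state how the dose response was fitted. Doses are in the same concentration units as the EC50.

```python
def hill_response(dose, ec50=10.0, hill_coeff=1.0):
    """Fraction of the maximal response at a given dose under a Hill model.

    ec50 uses the same concentration units as dose; hill_coeff = 1 is an
    assumed value, not one reported in the text.
    """
    if dose < 0:
        raise ValueError("dose must be non-negative")
    if dose == 0:
        return 0.0
    return dose ** hill_coeff / (ec50 ** hill_coeff + dose ** hill_coeff)


for dose in (0, 1, 10, 25, 50, 100):
    print(dose, round(hill_response(dose), 3))
```

Under these assumptions, a dose ten times the EC50 gives roughly 91% of the maximal response, and a dose equal to the EC50 gives 50% by definition.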
Constructs with low baseline skipping (high psi) in the absence of drug (AltEx1, AltEx2, AltEx9) were further studied in a heterologous context. Nano-luciferase was fused downstream of the cassette exon, and splicing patterns and protein expression were characterized in the absence and presence of drug (see FIGs. 37A-37B). Cassettes AltEx1, AltEx2, and AltEx9 yielded increased skipping in the presence of drug, with fold-changes of effectively infinite (division by zero, because no skipping was detected without drug), 44x, and 10x, respectively, at the RNA level. However, at the protein level, as measured relative to a firefly luciferase control, these cassettes yielded fold-changes of 9x, 22x, and 13x, respectively. Thus, the fold-changes at the protein level were generally weaker than those observed at the RNA level.
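The fold-change arithmetic, including the division-by-zero case, can be sketched in Python as follows. All input values are hypothetical, and the function names are illustrative rather than taken from any analysis pipeline described here.

```python
import math


def fold_change(induced, baseline):
    """Ratio induced/baseline; infinite when the baseline is exactly zero."""
    if baseline == 0:
        return math.inf if induced > 0 else math.nan
    return induced / baseline


def normalized_signal(nanoluc, firefly):
    """Nano-luciferase reporter signal relative to the firefly luciferase control."""
    return nanoluc / firefly


# Hypothetical skipped-isoform fractions (RNA level): no skipping detected without drug.
print(fold_change(0.40, 0.0))            # inf

# Hypothetical raw luminescence readings (protein level).
off = normalized_signal(nanoluc=1_000, firefly=50_000)
on = normalized_signal(nanoluc=22_000, firefly=50_000)
print(round(fold_change(on, off), 1))    # 22.0
```

An infinite RNA-level fold-change therefore reflects an undetectable baseline rather than an unbounded effect, which is one reason the protein-level ratios are the more informative comparison.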
Potentially Improved (~320-fold) Regulation of Protein Expression by Riboswitch-Induced Alternative Splicing and Reconstitution of a Translation Initiation Codon
Upon closer examination, it was revealed that a control cDNA construct encoding the inclusion isoform still yielded ~4.5% of the expression level of the exclusion isoform, although the inclusion isoform was expected to yield no expression at all. Previous studies suggest that ribosomes can potentially re-initiate following a stop codon if the distance to the translation initiation site is short (17). Therefore, the design was altered such that the alternative exon skipping event reconstituted the Kozak sequence and AUG required for efficient translation initiation (see FIG. 38). Inclusion of the alternative exon disrupted the start codon, and skipping of the alternative exon yielded a fully functional Kozak sequence and AUG initiation codon. These alterations were predicted to eliminate most of the unintended protein expression of the inclusion isoform. As a preliminary test, the dynamic range of protein expression achievable from an inclusion isoform and an exclusion isoform was assayed. The constructs yielded a 320-fold difference in protein expression, which set the upper bound of potential leaky expression at approximately 0.3% (see FIG. 38).
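The relationship between leaky expression and achievable dynamic range can be made explicit with a short Python sketch. It assumes that expression of the exclusion isoform is normalized to 1 and that leaky expression from the inclusion isoform is the only factor limiting the fold-change; the function names are illustrative.

```python
def max_fold_given_leak(leak_fraction):
    """Largest achievable fold-change if the off state leaks this fraction of the on state."""
    return 1.0 / leak_fraction


def leak_given_fold(fold):
    """Off-state expression, as a fraction of the on state, implied by a fold-change."""
    return 1.0 / fold


print(f"{max_fold_given_leak(0.045):.1f}")    # 22.2
print(f"{100 * leak_given_fold(320):.4f}")    # 0.3125
```

A 4.5% leak caps the dynamic range at about 22-fold, which is in line with the 9x to 22x protein-level fold-changes of the stop-codon design, whereas a 320-fold difference corresponds to a leak of about 0.31%.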
Riboswitch-Induced Conditional RNA Interference
The ability to control RNAi triggers in an inducible manner would be highly advantageous (18, 19). However, there are limited designs with which this can be achieved. Most approaches depend on inducible promoters that cannot be applied in therapeutic applications (20, 21). Others require extensive engineering of the RNA sequences that can only be achieved in vitro (22) or lack modularity (23, 24). Here, the fact that an apical loop of ~3-23 nt is an essential structural feature of a functional pri-miRNA that facilitates microRNA processing and maturation (25) was exploited. The apical loop of human miR-16_2 was split into 2 separate exons separated by an alternative splicing cassette (see FIG. 39). A luciferase RNAi trigger was substituted for the miR-16_2 trigger, and the AltEx9 cassette was inserted between the miR-16 exons. In the absence of drug, the 60-nt AltEx9 exon was included, disrupting the apical loop of the primary microRNA sequence and thus preventing microRNA processing. In the presence of drug, AltEx9 was skipped, and the primary microRNA was processed into a mature microRNA targeting luciferase. In preliminary assays of luciferase signal (see FIG. 40), AltEx9 insertion inactivated RNAi processing and yielded high reporter signal. However, upon drug treatment, there was an 86% decrease in reporter enzyme activity. These results demonstrated proof of concept of dynamic RNAi regulation in response to a ligand. In some embodiments, the present disclosure relates to constructs encoding transgenes comprising any of the following polynucleotide sequences:
GCTGTTAGTGTCACACCAATTCGGGACACAAAATGGCTAACACTGGAAGTATGTAGAGAGTTCCAGAGGG GGACTTGCTCACGGCCAGACACGGAATGTAAATTTGCACATCCTTCGAAAAGCTGCCAAGTTGAAAATGG ACGAGTAATCGCCTGCTTTGATTCATTGAAAGGCCGTTGCTCCAGGGAGAACTGCAAATATCTTCATCCA CCCCCACATTTAAAAACGCAGTTGGAGATAAATGGACGCAATAACTTGATTCAGCAGAAGAACATGGCCA TGTTGGCCCAGCAAATGCAACTAGCCAATGCCATGATGCCTGGTGCCCCATTACAACCCGTGCCAATGTT TTCAGTTGCACCAAGCTTAGCCACCAATGCATCAGCAGCCGCCTTTAATCCCTATCTGGGACCTGTTTCT CCAAGCCTGGTCCCGGCAGAGATCTTGCCGACTGCACCAATGTTGGTTACAGGGAATCCGGGTGTCCCTG TACCTGCAGCTGCTGCAGCTGCTGCACAGAAATTAATGCGAACAGACAGACTTGAGGTATGTCGAGAGTA CCAACGTGGCAATTGCAACCGAGGAGAAAATGATTGTCGGTTTGCTCATCCTGCTGACAGCACAATGATT GACACC AATGAC AACACAGT CACT GT GT GT AT GGAT T ACATC AAAGGGAGATGCT CT CGGG AAAAGT GCA AATACTTTCATCCCCCTGCACATTTGCAAGCCAAGATCAAGGCTGCCCAATACCAGGTCAACCAGGCTGC AGCTGCACAGGCTGCAGCCACCGCAGCTGCCATGgtgagtagagatatcagctctctccttgttagcagt cagaaaagcaaagtgagcaactatatctgactacaagctattcatttagtaacctttttaaaaaaattgc t g a a g a t a t g 111 g 11 c a g g t a t c c c a g a c c a c gaga g a g a t c 1111 c t g t g 111 a t g g a t a c 11 g a g c a aaaatacagaaggcagactctctcctcctctcttcctttcactcttttttttttctgttagagtatcttg tttgtaattaactacaaagaggagttatcctcccaataacaactcagtagtgcctttattgtgcatgctt agtcttgttattcgttgtatatggcattccctaggtcgactaccgaagcttcatgcactttcatt cattt tagACTCAGTCGGCTGTCAAATCACTGAAGCGACCCCTCGAGGCAACCTTTGACCTGgtacgttaaaaca t a c c a g a c g g a a a c g t c t g g a g a g g t g a a g aatacgacca c c t a a c g t a c c a g t g a c c 111 c a c c 1111 a g c 11 g g c a t g t a g c 111 a 11 g t a t g c 11 g c 11 g c t c a t g c 11 c c t a a c a a 1111 a g c c 11 c g a c t g a 111 ttcttttttctttttctctttttactggtatttgttttttatactcattcactaaacagGGAATTCCTCA AGCTGTACTTCCCCCATTACCAAAGAGGCCTGCTCTTGAAAAAACCAACGGTGCCACCGCAGTCTTTAAC ACTGGTATTTTCCAATACCAACAaGCTCTAGCCAACATGCAGTTACAACAGCATACAGCATTTCTCCCAC CAGTTCCCATGGTGCACGGTGCTACGCCAGCCACTGTGTCCGCAGCAACAACATCTGCCACAAGTGTTCC CTTCGCTGCAACAGCCACAGCCAACCAGATACCCATAATATCTGCCGAACATCTGACTAGCCACAAGTAT GTTACCCAGATGTAG ( SEQ ID NO : 2080 )
Table 7: Annotated Sequences Corresponding to SEQ ID NO: 2080
Figure imgf000281_0001
Figure imgf000282_0001
AltEx1:
TACCCATACGACGTACCAGATTACGCTATGgtgagtagagatatcagctctctccttgttagcagtcaga aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga a c g a t a t g 111 g 11 c g g t a t c c c g a c c a c g g a c g g a t c 1111 c t g t g 111 a t g g a t a c 11 g g c a a a a a t a c g a a c g g c g a c t c t c t c c t c c t c t c 11 c c 111 c a c t c 1111111111 c 1111 g a c g t a t c 11 g 111 g t a a ttaactacaaatgggacgttatcctcccaataacaactcgtacgtgcctttattgtgcatgcttgtcttg ttattcgttgtatatgctgatttatctttaatctttatctctatttcagACTTACCTGTAAAACATACCA TCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTACAGgtaagttggatcctttcaccttttagct t g g c a t g t a g c 111 a 11 g t a t g c 11 g c 1 t g c t c a t g c 11 c c t a a c a a t 111 a g c c t tcgactga t 1111 c ttttttctttttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCAAGAA GCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCAGGGT GAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCGGCG
(SEQ ID NO: 2091)
Table 8: Annotated Sequences Corresponding to SEQ ID NO: 2091
Figure imgf000283_0001
T ACC C AT ACG AC GT AC C AGAT T AC GC T AT G g t g a g t a g a g a t a t c a g c t c t c t c c 11 g 11 a g c ag t c: a g a aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga a c g a t a t g 111 g 11 c g g t a t c c c g a c c a c g g a c g g a t c 1111 c t g t g 111 a t g g a t a c 11 g g c a a a a a t a cgaacggcgactctctcctcctctcttcctttcactcttttttttttcttttgacgtatcttgtttgtaa ttaa ctacaaatgggacgttatcctcccaataacaactcgtacgtgcctttattgtgcatgcttgtcttg ttattcgttgtatatgctgatttatctt taatctt tatctctatttcagACGTACCAGTAAAACATACCA TCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTACTGgtacgttggatcctttcaccttttagct tggcatgtagctttattgtatgcttgcttgctcatgcttcctaacaattttagccttcgactgatttttc ttt tttcttt ttctctttt tcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCAAGAA
GCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCAGGGT
GAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCGGCG (SEQ ID NO: 2099)
Table 9: Annotated Sequences Corresponding to SEQ ID NO: 2099
Figure imgf000284_0001
TACO CAT AC G AC GT AC C AGAT TAG GC T AT G g t g a g t a g a g a t a t c a g c t c t c t c c 11 g 11 a g c a g t c a g a aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga acgatatgtt tg ttcggtatcccgaccacggacggatctttt ctgtgtt tatggatacttggcaaaaata cgaacggcgactctctcctcctctcttcctttcactc 1111111111 c t T 11 g a c g t a t c 11 g t 11 g t a a ttaactacaaatgggacgt tatcctcccaataacaactcgtacgtgcct ttattgtgcatgcttg tcttg ttattcgttgtatatgctgatttatctt taatctt tatctctatttcagACTCACATCTAAAACATACCA TCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTAGATgtgagttggatcctttcaccttttagct t g g c a t g tag c 111 a 11 g t a t g c 11 g c 11 g c t c a t g c 11 c c t a a c a a 1111 a g c c 11 c g a c t g a 11111 c ttttttctttttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCAAGAA
GCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCAGGGT
GAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCGGCG (SEQ ID NO: 2102)
Table 10: Annotated Sequences Corresponding to SEQ ID NO: 2102
Figure imgf000285_0001
AltEx4 : TACCCATACGACGTACCAGATTACGCTATGgtgagtagagatatcagctctctccttgttagcagtcaga aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga acgatatgtttgttcggtatcccgaccacggacggatcttttctgtgtttatggatacttggcaaaaata c g a a c g g c g a c t c t c t c c t c c t c t c 11 c c 111 ca c t c 1111111111 c t T 11 g a c g t a t c 11 g 111 g t a a 11 a a c t a caaat g g g a c g 11 a t c c t c c c a a t a a c a a c t c g t a c g t g c c 111 a 11 g t g c a t g c 11 g t c 11 g ttattcgttgtatatgctgatttatctttaatctttatctctatttcag T AGT AC CAGT AAAACAT ACC A TCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTACTGgtactatggatcctttcaccttttagct tggcatgtagctttattgtatgcttgcttgctcatgcttcctaacaattt tagcctt cgactgatt tttc ttttttctttttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCAAGAA GCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCAGGGT
GAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCGGCG (SEQ ID NO: 2105)
Table 11: Annotated Sequences Corresponding to SEQ ID NO: 2105
Figure imgf000286_0001
AltEx5:
T ACC C AT ACGAC GT AC C AGAT T AC GC TA1]? G g t g a g t a g a g a t a t c a g c t c t c t c c 11 g 11 a g c a g t c a g a aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga acgatatgtt tg ttcggtatcccgaccacggacggatctttt ctgtgtt tatggatacttggcaaaaata cgaacggcgactctctcctcctctcttcctttcactcttttttttttctTttgacgtatcttgtttgtaa ttaactacaaatgggacgttatcctcccaataacaactcgtacgtgcctttattgtgcatgcttgtcttg ttattcgttgtatatgttcatgcactttcattcattttagACTTACCTGTAAAACATACCATCCGTAACC GGATGGAGAGGTGAAGAATACGACCACCTACAGgtaagttggatcctttcaccttttagcttggcatgta gctttattgtatgcttgcttgctcatgcttcctaacaattttagccttcgactgatttttcttttttctt tttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCAAGAAGCACTTTCC GGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCAGGGTGAA.GAAGCG GTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCGGCG ( SEQ ID NO : 2108 ;
Table 12: Annotated Sequences Corresponding to SEQ ID NO: 2108
Figure imgf000287_0001
Figure imgf000288_0002
AltEx6:
T ACC CAT AC GAG GT AC C AGAT T AC GC TAT G g t g a g t a g a g a t a t c a g c t c t c t c c 11 g 11 a g c a g t c a g a aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga acgatatgtttgttcggtatcccgaccacggacggatcttttctgtgtttatggatact tggcaaaaata cgaacggcgactctctcctcctctcttcctttcactctttttt ttttctTttgacgtatcttgtttgtaa ttaactacaaatgggacgttatcctcccaataacaactcgtacgtgcctttattgtgcatgcttgtcttg ttattcgttgtatatgttcatgcactttcattcattttagACGTACCAGTAAAACATACCATCCGTAACC GGATGGAGAGGTGAAGAATACGACCACCTACTGgtacgttggatcctttcaccttttagcttggcatgta gctttattgtatgcttgcttgctcatgcttcctaacaattttagccttcgactgatttttcttttttctt tttctctttt tcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCAAGAAGCACTTTCC GGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCAGGGTGAAGAAGCG GTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCGGCG ( SEQ ID NO : 2109 )
Table 13: Annotated Sequences Corresponding to SEQ ID NO: 2109
Figure imgf000288_0001
Figure imgf000289_0001
TACCCATACGACGTACCAGATTACGCTATGgtgagtagagatatcagctctctccttgttagcagtcaga aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga a c g a t a t g 111 g 11 c g g t a t c c cgac c a c g g a c g g a t c 1111 c t g t g 111 a t g g a t a c 11 g g c a a a a a t a cgaacggcgactctctcctcctctcttcctttcactcttttttttttctTttgacgtatcttgtttgtaa ttaactacaaatgggacgttatcctcccaataacaactcgtacgtgcctttattgtgcatgcttgtcttg tt at tcgttgtatatgtt catgcactt tcattcatt ttagACTCACATCTAAAACATACCATCCGTAACC GG AT GG AG AGGT GAAG AAT AC GAG C ACC T AGAT g t g a g 11 g g a t c c 111 c a c c 1111 a g c: 11 g g c a t g t a g c 111 a 11 g t a t g c 11 g c 11 g c t catgcttcctaaca a 1111 a g c c 11 c g a c t g a 11111 c 111111 c 11 tttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCAAGAAGCACTTTCC GGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCAGGGTGAAGAAGCG GTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCGGCG ( SEQ ID NO : 2110 )
Table 14: Annotated Sequences Corresponding to SEQ ID NO: 2110
Figure imgf000289_0002
Figure imgf000290_0001
Figure imgf000290_0003
NO: 2111)
Table 15: Annotated Sequences Corresponding to SEQ ID NO: 2111
Figure imgf000290_0002
Figure imgf000291_0001
AltEx9:
TACCCATACGACGTACCAGATTACGCTATGgtgagtagagatatcagctctctccttgttagcagtcaga a a a t g c a a a c g t g g c a a c t a t a t c t g a c t a c a a t g c t a 11 c a 111 g t a a c c 11111 a a a a a a a 11 g c t g a a c g a t a t g 111 g 11 c g g t a t c c c g a c c a c g g a c g g a t c 1111 c t g t g 111 a t g g a t a c 11 g g c a aa a a t a cgaacggcgactctctcctcctctcttcctttcactcttttttttttctTttgacgtatcttgtttgtaa ttaactacaaatgggacgttatcctcccaataacaactcgtacgtgcctt tattgtgcatgcttgtcttg ttattcgttgtatatggaattcctgatttatctttaatctttatctctttttcagTACCTGTAAAACATA CCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTACAGgtactgctggatcctttcacctttt a g c 11 g g c a t g t a g c 111 a 11 g t a t g c 11 g c 11 g c t c a t g c 11 c c t a a c a a 1111 a g c c 11 c g a c t g a 11 tttcttttttctttttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCA AGAAGCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCA GGGTGAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCG GCG ( SEQ ID NO : 2112 )
Table 16: Annotated Sequences Corresponding to SEQ ID NO: 2112
Figure imgf000291_0002
Figure imgf000292_0001
AltEx10:
T ACCC AT ACGAC GT AC C AGAT T AC GC T AT G g t g a g t a g a g a t a t c a g c t c t c t c c 11 g 11 a g c ag t c: a g a aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga acgatatgtttgttcggtatcccgaccacggacggatcttttctgtgtttatggatacttggcaaaaata cgaacggcgactctctcctcctctcttcctttcactcttttttttttctTttgacgtatcttgtttgtaa ttaactacaaatgggacgt tatcctcccaataacaactcgtacgtgcct ttattgtgcatgcttg tcttg ttattcgttgtatatggaattcctgatt tatctttaatctttatctctt tt tcagTACCTGTAAAACATA CCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTACAGgtactggtggatcctttcacctttt a g c 11 g g c a t g t a g c 111 a 11 g t a t g c 11 g c 11 g c t c a t g c 11 c c t aaca a 1111 a g c c 11 c g a c t g a 11 tttcttttttctttttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCA AGAAGCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCA GGGTGAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCG GCG ( SEQ ID NO : 2116 )
Table 17: Annotated Sequences Corresponding to SEQ ID NO: 2116
Figure imgf000292_0002
Figure imgf000293_0001
TACCCATACGACGTACCAGATTACGCTATGgtgagtagagatatcagctctctccttgttagcagtcaga a a a t g c a a a c g t g g c a a c t a t a t c t g a c t a c a a t g c t a 11 c a 111 g t a a c c 11111 a a a a a a a 11 g c t g a acgatatgtttgttcggtatcccgaccacggacggatcttttctgtgtttatggatacttggcaaaaata cgaacggcgactctctcctcctctcttcctttcactcttttttttttctTttgacgtatcttgtttgtaa ttaactacaaatgggacgttatcctcccaataacaactcgtacgtgcctt tattgtgcatgcttgtcttg 11 a 11 c g 11 g t a t a t g g a a 11 c c t g a 111 a t c 111 a a t c 111 a t c t c 11111 c a g T AC C T GT AAAAC AT A CCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTACAGgtactgatggatcctttcacctttt a g c 11 g g c a t g t a g c 111 a 11 g t a t g c 11 g c 11 g c t c a t g c 11 c c t a a c a a 1111 a g c c 11 c g a c t g a 11 tttcttttttctttttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCA AGAAGCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCA GGGTGAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCG GOG ( SEQ ID NO : 2118 )
Table 18: Annotated Sequences Corresponding to SEQ ID NO: 2118
Figure imgf000294_0001
Alt Ex 1
T ACC C AT A CG AC GT AC C AG AT T AC GC TA']? G g t g a g t a g a g a t a t c a g c t c t c t c c 11 g 11 a g c a g t c a g a aaatgcaaa.cgtggca.actatatctgactaca.atgctattcatttgta.acctttttaaaaaaattgctga acgatatgtttgttcggtatcccgaccacggacggatcttttctgtgtttatggata cttggcaaaaata cgaacggcgactctctcctcctctct tcctttcactcttttt tttttctTt tgacgtatcttgtt tg taa ttaactacaaatgggacgt tatcctcccaataacaactcgtacgtgcct ttattgtgcatgcttgtcttg 11 a 11 c g 11 g t a t a t g g a a 11 c c t g a 111 a t c 111 a a t c 111 a t c t c 111111 a g T AC C T GT AAAAC AT A CCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTACAGgtactgatggatcctttcacctttt a g c 11 g g c a t g t a g c 111 a 11 g t a t g c 11 g c 11 g c t c a t g c 11 c c t a a c a a 1111 a g c c 11 c g a c t g a 11 tttcttttttctttttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCA AGAAGCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCA GGGTGAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCG
GCG (SEQ ID NO: 2120)
Table 19: Annotated Sequences Corresponding to SEQ ID NO: 2120
Figure imgf000295_0001
T ACC C AT ACG AC GT AC C AGAT T AC GC T AT G g t g a g t a g a g a t a t c a g c t c t c t c c 11 g 11 a g c ag t c: a g a aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga a c g a t a t g 111 g 11 c g g t a t c c c g a c c a c g g a c g g a t c 1111 c t g t g 111 a t g g a t a c 11 g g c a a a a a t a cgaacggcgactctctcctcctctcttcctttcactcttttttttttctTttgacgtatcttgtttgtaa ttaactacaaatgggacgttatcctcccaataacaactcgtacgtgcctttattgtgcatgcttgtcttg ttattcgttgtatatggaattcctgatt tatctttaatctttatctctt tt ttagTACCTGTAAAACATA CCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTACAGgtactgctggatcctttcacct tt t agcttggcatgtagctttattgtatgct tgcttgctcatgct tcctaacaattttagccttcgactgatt ttt ctttttt ct ttttct cttttt cagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCA AGAAGCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAAAATCTTGGCAGCCCA GGGTGAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCG GCG ( SEQ I D NO : 2123 )
Table 20: Annotated Sequences Corresponding to SEQ ID NO: 2123
Figure imgf000296_0001
Alt Ex 14 : TACO CAT AC GAG GT AC C AGAT TAG GC T AT G g t g a g t a g a g a t a t c a g c t c t c t c c 11 g 11 a g c a g t c a g a aaatgcaaacgtggcaactatatctgactacaatgctattcatttgtaacctttttaaaaaaattgctga acgatatgtt tg ttcggtatcccgaccacggacggatctttt ctgtgtt tatggatacttggcaaaaata c g a a c g g c g a c t c t c t c c t c c t c t c 11 c c 111 c a c t c 1111111111 c t T 11 g a c g t a t c 11 g 111 g t a a ttaactacaaatgggacgt tatcctcccaataacaactcgtacgtgcct ttattgtgcatgcttg tcttg ttattcgttgtatatggaattcctgatt tatctttaatctttatctctt tt ttagTACCTGTAAAACATA CCATCCGTAACCGGATGGAGAGGTGAAGAATACGACCACCTACAGgtactgttggatcctttcacctttt a g c 11 g g c a t g t a g c 111 a 11 g t a t g c 11 g c 11 g c t c a t g c 11 c c t aaca a 1111 a g c c 11 c g a c t g a 11 tttcttttttctttttctctttttcagGGGCCCAAGCCAGCCGTGGAGTCTGAGTTTTCTAAGGTACTCA AGAAGCACTTTCCGGGCGAGCGATTTAGGTCTAGCTACATGAAGCGGGGTGGTAASATCTTGGCAGCCCA GGGTGAAGAAGCGGTCGTCGCGTATCTGCAAGGCAAGTCCGAGGAGGAACCCCCGAATTTTCAGCCGCCG GCG ( SEQ ID NO : 2128 ) Table 21: Annotated Sequences Corresponding to SEQ ID NO: 2128
Figure imgf000297_0001
Design 3 Sequence (+) Drug Control Sequence -> Exclusion of exon (lower case intron spliced out to produce a continuous start codon):
Figure imgf000298_0002
NO: 2131)
Figure imgf000298_0003
Table 22: Annotated Sequences Corresponding to SEQ ID NO: 2131 and 2132
Figure imgf000298_0001
Note: The letters that are bolded comprise the Kozak sequence and start codon.
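The way exon skipping reconstitutes a start codon can be illustrated with a toy Python sketch that follows the case convention used in these listings (upper case for exonic sequence, lower case for intronic sequence). The toy pre-mRNA and the Kozak string below are assumptions made for illustration only and are not sequences from this disclosure.

```python
import re

KOZAK_WITH_START = "GCCACCATGG"  # assumed consensus Kozak plus start codon, for illustration


def exons(pre_mrna):
    """Return the upper-case (exonic) runs of a case-annotated sequence."""
    return re.findall(r"[A-Z]+", pre_mrna)


def isoforms(pre_mrna):
    """Build the inclusion and skipping isoforms of a three-exon cassette."""
    first, alt, last = exons(pre_mrna)
    return {"inclusion": first + alt + last, "skipping": first + last}


# Toy pre-mRNA: upstream exon, intron, alternative exon, intron, downstream exon.
toy = "GCCACCAT" + "gtaagtttttcag" + "TACCTGTAAAAC" + "gtaagtttttcag" + "GGTGAGC"

for name, seq in isoforms(toy).items():
    print(name, seq, KOZAK_WITH_START in seq)
```

In this toy case the start codon is split across the junction between the flanking exons, so only the skipping isoform contains an intact Kozak sequence and ATG.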
Design 4 Sequence
RN Ai A 11 E x 9 : ATAAACATACATGCGCAACTGACATACTTGTTCCACTCACCGTGTTGCTACAGCTATAAGTAGgtaagta gagatatcagctctctccttgttagcagtcagaaaatgcaaacgtggcaactatatctgactacaatgct attcatttgtaacctttttaaaaaaattgctgaacgatatgt ttgttcggtatcccgaccacggacggat ctt ttctgtg tt tatggatacttggcaaaaatacgaacggcgactct ct cctcctctcttcct tt cactc 1111111111 c 1111 g a c g t a t c 11 g 111 g t a a 11 aac t acaaat gggacg 11 at cc t cccaa t aacaa c tcgtacgtgcct ttattgtgcatgcttg tcttgttattcgttgtatatggaattcctgatttatctt taa tctttatctctttttcagTACCTGTAAAACATACCATCCGTAACCGGATGGAGAGGTGAAGAATACGACC ACCTACAGgtactgctggatcctttcaccttttagcttggcatgtagctttattgtatgcttgcttgctc atgcttcctaacaattttagccttcgactgatttttcttttttctttttctctttttcagTGAAATATAT ATTAAACATA.TAGCTGATGCAACACGGATAGTGTGACAGGGA.TACAGCAAC ( SEQ ID NO : 2138 )
Table 23: Annotated Sequences Corresponding to SEQ ID NO: 2138
Figure imgf000299_0001
Figure imgf000300_0003
Figure imgf000300_0001
Table 24: 3’ Splice Site Sequences
Figure imgf000300_0002
Figure imgf000301_0001
Table 25: 5’ Splice Site Sequences
Figure imgf000301_0002
References of Example 7
1. Baralle, F. E. & Giudice, J. Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 18, 437-451 (2017).
2. Chang, A. L., Wolf, J. J. & Smolke, C. D. Synthetic RNA switches as a tool for temporal and spatial control over gene expression. Curr. Opin. Biotechnol. 23, 679-688 (2012).
3. Mathur, M. et al. Programmable mutually exclusive alternative splicing for generating RNA and protein diversity. doi: 10.1038/s41467-019-10403-w.
4. Van Alstyne, M. et al. Gain of toxic function by long-term AAV9-mediated SMN overexpression in the sensorimotor circuit. Nat. Neurosci. (2021) doi: 10.1038/s41593-021-00827-3.
5. Garcia-Sanz, J. A. et al. Externally-Controlled Systems for Immunotherapy: From Bench to Bedside. Front. Immunol. 11, (2020).
6. Strobel, B. et al. A Small-Molecule-Responsive Riboswitch Enables Conditional Induction of Viral Vector-Mediated Gene Expression in Mice. ACS Synth. Biol. 9, 1292-1305 (2020).
7. Monteys, A. M. et al. Regulated control of gene therapies by drug-induced splicing. Nature 596, 291-295 (2021).
8. Keller, C. G. et al. An orally available, brain penetrant, small molecule lowers huntingtin levels by enhancing pseudoexon inclusion. Nat. Commun. 13, 1-11 (2022).
9. Wong, M. S., Kinney, J. B. & Krainer, A. R. Quantitative Activity Profile and Context Dependence of All Human 5' Splice Sites. Mol. Cell 71, 1012-1026 (2018).
10. Wagner, S. D. et al. Dose-Dependent Regulation of Alternative Splicing by MBNL Proteins Reveals Biomarkers for Myotonic Dystrophy. PLoS Genet. 12, (2016).
11. Xiao, H., Edwards, T. E. & Ferré-D’Amaré, A. R. Structural basis for specific, high-affinity tetracycline binding by an in vitro evolved aptamer and artificial riboswitch. Chem. Biol. 15, 1125 (2008).
12. Wurmthaler, L. A., Sack, M., Gense, K., Hartig, J. S. & Gamerdinger, M. A tetracycline-dependent ribozyme switch allows conditional induction of gene expression in Caenorhabditis elegans. Nat. Commun. 10, 1-8 (2019).
13. Tickner, Z. J. & Farzan, M. Riboswitches for Controlled Expression of Therapeutic Transgenes Delivered by Adeno-Associated Viral Vectors. Pharmaceuticals (Basel). 14, (2021).
14. Pausch, P. et al. CRISPR-CasΦ from huge phages is a hypercompact genome editor. Science 369, 333-337 (2020).
15. Regenthal, R., Krueger, M., Koeppel, C. & Preiss, R. DRUG LEVELS: THERAPEUTIC AND TOXIC SERUM/ PLASMA CONCENTRATIONS OF COMMON DRUGS. Journal of Clinical Monitoring and Computing vol. 15 (1999).
16. Schulz, M., Iwersen-Bergmann, S., Andresen, H. & Schmoldt, A. Therapeutic and toxic blood concentrations of nearly 1,000 drugs and other xenobiotics. Crit. Care 16, 1-4 (2012).
17. Kalstrup, T. & Blunck, R. Reinitiation at non-canonical start codons leads to leak expression when incorporating unnatural amino acids. Sci. Rep. 5, 1-9 (2015).
18. Wiznerowicz, M., Szulc, J. & Trono, D. Tuning silence: conditional systems for RNA interference. Nat. Methods 3, 682-688 (2006).
19. Dickins, R. A. et al. Tissue-specific and reversible RNA interference in transgenic mice. Nat. Genet. 39, 914 (2007).
20. Lee, S. K. & Kumar, P. Conditional RNAi: Towards a silent gene therapy. Adv. Drug Deliv. Rev. 61, 650-664 (2009).
21. Premsrirut, P. K. et al. A Rapid and Scalable System for Studying Gene Function in Mice Using Conditional RNA Interference. Cell 145, 145-158 (2011).
22. Hochrein, L. M., Schwarzkopf, M., Shahgholi, M., Yin, P. & Pierce, N. A. Conditional Dicer Substrate Formation via Shape and Sequence Transduction with Small Conditional RNAs. J. Am. Chem. Soc. 135, 17322-17330 (2013).
23. Kumar, D., An, C. Il & Yokobayashi, Y. Conditional RNA interference mediated by allosteric ribozyme. J. Am. Chem. Soc. 131, 13906-13907 (2009).
24. Kumar, D., Kim, S. H. & Yokobayashi, Y. Combinatorially inducible RNA interference triggered by chemically modified oligonucleotides. J. Am. Chem. Soc. 133, 2783-2788 (2011).
25. Adams, L. Non-coding RNA: Pri-miRNA processing: structure is key. Nat. Publ. Gr. (2017) doi: 10.1101/gr.208900.116.
H. Example 8: rAAVs for Transgene Expression
Recombinant adeno-associated viruses (rAAVs) for enhancing the expression of a transgene comprising an inducibly-spliced exon cassette in a subject, such as a human, are produced. In each rAAV, the transgene comprising an inducibly-spliced exon cassette is flanked by wild-type AAV2 ITR sequences, although in principle alternate ITR sequences could be used to control targeting to different tissues. Additionally, rAAVs are designed with a 3’ UTR, such as a bovine growth hormone (bGH) polyA signal sequence or an SV40 polyadenylation sequence, for efficient transcription termination in a eukaryotic host. For expression of the transgene, each rAAV further encodes a promoter sequence that is 5’ to the start of the transgene. rAAVs may be produced, for example, having either a chicken β-actin (CBA) promoter with a CMV enhancer (CMVe), a CBA intron, and a CBA exon, collectively referred to as a CAG promoter, or promoters such as MHCK7 or synapsin. These rAAVs can be used to increase expression of the transgene in cells of a subject, particularly in a subject known to have a mutation in a gene implicated in a disease or disorder.
I. Example 9. In vivo Expression Studies
rAAVs comprising a transgene are made as described above and administered intravenously to mice (n=3-6/group) at 1 x 10^13 or 5 x 10^13 vector genomes per kg subject (vg/kg). Two to four weeks after rAAV dosing, mice are either left untreated or treated with ligand at varying doses. Then, target tissues from the mice are harvested, and nucleic acids are extracted and analyzed for transgene expression and splicing patterns using RT-PCR and/or deep sequencing analysis. Additionally, expression of the target to which the miRNA encoded by the transgene binds may be assessed in animal tissues via methods known in the art such as ELISA, western blot, mass spectrometry, qPCR, or RNA-seq.
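As a worked illustration of the weight-based dosing above, the following minimal sketch converts a dose in vg/kg into total vector genomes per animal. It reads the stated doses as 1 x 10^13 and 5 x 10^13 vg/kg. The 25 g body weight and the function name are hypothetical values chosen only for illustration and are not taken from this disclosure.

```python
def vector_genomes_for_animal(dose_vg_per_kg: float, body_weight_g: float) -> float:
    """Return the total vector genomes (vg) for a weight-based dose."""
    # Convert grams to kilograms, then scale the per-kg dose.
    return dose_vg_per_kg * (body_weight_g / 1000.0)


# Doses as stated in the Example; the 25 g mouse weight is an assumption.
for dose in (1e13, 5e13):
    total = vector_genomes_for_animal(dose, body_weight_g=25.0)
    print(f"{dose:.0e} vg/kg -> {total:.2e} vg for a 25 g mouse")
```

Under these assumptions, a 25 g animal would receive on the order of 10^11 to 10^12 vector genomes.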
J. Example 10. Ligand-Regulation of Protein and microRNA Production
This Example describes ligand-mediated regulation of splicing cassettes that can control production of proteins and/or microRNAs. Exemplary nucleic acids (e.g., ones comprising cassettes) are described which are regulated by tetracycline, risdiplam, and branaplam. High-throughput screening analyses are also described, which provide a platform that can be used to identify cassettes that respond differently in the presence and absence of a ligand of interest.
Tetracycline-Responsive Nucleic Acids
Nucleic acids were designed to regulate S. aureus Cas9 using tetracycline. An alternatively spliced exon cassette comprising a tetracycline aptamer was used to interrupt the canonical AUG start codon for S. aureus Cas9. Cassettes were designed such that: exon inclusion, in the absence of tetracycline, omitted the start codon, thereby yielding lack of Cas9 expression; and exon inclusion, in the presence of tetracycline, reconstituted the start codon, thereby yielding Cas9 expression (FIG. 41).
A screen was performed in HEK293T cells to test different combinations of the alternative exon 3’ splice site, 5’ splice site, and the stem length of the tetracycline aptamer. A total of 2760 sequence variants were tested in these analyses to identify sequence variants that exhibited strong responsiveness to tetracycline. The variants were evaluated by aptamer stem length and splice site strength at varying doses (25 µM, 50 µM, and 100 µM) of tetracycline. Sequence variants that yielded the largest delta PSI values tended to have longer stem lengths and slightly weaker splice sites (FIGs. 42A-42C). The sequences corresponding to FIGs. 41A-42C are disclosed in Table 26.
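The delta PSI metric used to rank variants can be sketched as follows. This is a minimal illustration that assumes the conventional definition of PSI (percent spliced in) as the share of reads supporting exon inclusion, and delta PSI as the treated value minus the untreated value. The read counts and function names are hypothetical and do not come from the screen described above.

```python
def psi(inclusion_reads: int, skipping_reads: int) -> float:
    """Percent spliced in: share of reads that include the alternative exon."""
    total = inclusion_reads + skipping_reads
    if total == 0:
        raise ValueError("no reads support either isoform")
    return 100.0 * inclusion_reads / total


def delta_psi(untreated: tuple, treated: tuple) -> float:
    """PSI(treated) - PSI(untreated); each argument is (inclusion, skipping)."""
    return psi(*treated) - psi(*untreated)


# Hypothetical counts for one variant, without and with ligand.
change = delta_psi(untreated=(900, 100), treated=(300, 700))
print(f"delta PSI = {change:.1f}")
```

With this sign convention, a negative delta PSI indicates that the ligand shifts splicing toward exon skipping, and a positive value indicates a shift toward inclusion.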
Further nucleic acids were designed which leveraged tetracycline aptamer-regulated splicing to control microRNA biogenesis via exon skipping. A tetracycline-responsive exon cassette was placed between two portions of a primary microRNA sequence such that: exon inclusion, in the absence of tetracycline, led to suboptimal recognition of the microRNA precursor by Dicer and thus lower production of a mature microRNA; and exon skipping, in the presence of tetracycline, led to proper recognition of the microRNA precursor by Dicer and thus higher production of the mature microRNA. The nucleic acids were introduced into a Drosha knockout HEK293 cell line. Absence of Drosha is required to accurately quantitate splicing patterns, since the primary microRNA is expected to be cleaved out of the pre-mRNA by Drosha. Cells were either left untreated or treated with 100 µM tetracycline to induce exon skipping. RT-PCR and Northern blot analyses were used to measure microRNA abundance. The results indicated that the encoded microRNA showed low abundance relative to U6 snRNA in the absence of tetracycline, and high relative abundance in the presence of tetracycline (FIGs. 49A-49C). The sequences corresponding to FIGs. 49A-49C are disclosed in Table 27.
Risdiplam-Responsive Nucleic Acids
Further nucleic acid designs were generated in order to regulate erythropoietin (EPO) expression via a risdiplam-responsive intron. Nucleic acids were designed such that risdiplam enhanced recognition of suboptimal 5’ splice sites. Known risdiplam-responsive sequences were incorporated into an exon-intron cassette such that: intron retention, in the absence of risdiplam, precluded expression of EPO; and intron removal, in the presence of risdiplam, reconstituted the AUG start codon for EPO (FIG. 43).
A library of sequences with selected randomized positions of the responsive sequence was tested for sensitivity to risdiplam. The percent intron removal for 30,455 variants was analyzed in a high-throughput screen in HEK293T cells, wherein cells were treated with varying concentrations of risdiplam (250 nM, 500 nM, and 1000 nM). Sequences that yielded enhanced intron removal and those that yielded enhanced retention were identified (FIG. 44). The sequences corresponding to FIGs. 43-44 are disclosed in Table 28.
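Library members with randomized positions can be checked against a degenerate consensus by translating IUPAC nucleotide codes into a regular expression. The minimal sketch below tests the sequence recited as SEQ ID NO: 2207 against the consensus sequences recited as SEQ ID NO: 2183 and SEQ ID NO: 2185 in the claims, with extraction whitespace removed. The function and variable names are hypothetical, and this is an illustration only, not the analysis pipeline used for the screen.

```python
import re

# Standard IUPAC nucleotide codes that appear in the recited consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "W": "[AT]", "D": "[AGT]", "H": "[ACT]", "N": "[ACGT]",
}


def matches_consensus(sequence: str, consensus: str) -> bool:
    """Return True if the whole sequence fits the degenerate consensus."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    return re.fullmatch(pattern, sequence.upper()) is not None


# Consensus sequences and candidate as recited in the claims.
CONSENSUS_2183 = "N" * 10 + "AGGAAG" + "N" * 10 + "AWGAGTAAGW"
CONSENSUS_2185 = "YWWKWWWMKYAGGAAGYTAKTRWGTTAWGAGTAAGW"
SEQ_2207 = "CATTATAATCAGGAAGTTAGTGTGTTAAGAGTAAGT"

print(matches_consensus(SEQ_2207, CONSENSUS_2183))
print(matches_consensus(SEQ_2207, CONSENSUS_2185))
```

Using a full match rather than a search ensures that both the length and every position of the candidate are checked against the consensus.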
Seven sequence variants were selected, cloned, and tested in further analyses using real-time polymerase chain reaction (RT-PCR). Cells comprising one of the seven sequence variants were either left untreated or treated with 1 µM risdiplam. The abundance of intron-retained products and intron-spliced products as a result of risdiplam treatment was quantified (FIGs. 45A-45B). The sequences corresponding to FIGs. 45A-45B are disclosed in Table 29.
Based on the screen performed in HEK293T cells, sequences corresponding to variants 3 and 7 were then used to engineer nucleic acids which regulate expression of GABRG2 isoforms in response to risdiplam. Variants 3 and 7 were incorporated into an alternatively spliced nucleic acid that allows for production of either the exon 9-containing (long) isoform of GABRG2 or the exon 9-skipped (short) isoform. Nucleic acids were designed such that addition of risdiplam led to inclusion of the alternative exon and production of GABRG2L. The nucleic acids were introduced into Neuro2A cells, which were then either left untreated or treated with 1 µM risdiplam. RT-PCR was performed using primers that target sequences flanking the alternatively spliced exon to detect expression of short and long isoforms of GABRG2 (FIGs. 46A-46C). The sequences corresponding to FIGs. 46A-46C are disclosed in Table 30.
Further nucleic acids were designed to comprise a risdiplam-responsive sequence from POMT2 exon 11b for regulation of CSNK1D isoform expression. Risdiplam-responsive sequences were incorporated into an alternatively spliced gene that allows for production of either the exon 9-containing (long) isoform of CSNK1D or the exon 9-skipped (short) isoform such that addition of risdiplam leads to inclusion of the alternative exon and production of the long isoform of CSNK1D. The nucleic acid was introduced into HEK293T and Neuro2A cells, which were then either left untreated or treated with 1 µM risdiplam. RT-PCR was performed using primers that target sequences flanking the alternatively spliced exon to detect expression of CSNK1D isoforms. Further modifications to the risdiplam-responsive nucleic acid were made by introducing an A:C mutation at the +10 position in the intron downstream of POMT2 E11B. After introducing the nucleic acid comprising said A:C mutation, western blot was performed against protein tags incorporated into respective ends of the alternatively spliced sequences to confirm the expression of different CSNK1D isoforms (FIGs. 47A-47D). The sequences corresponding to FIGs. 47A-47D are disclosed in Table 31.
The risdiplam-responsive sequence from POMT2 exon 11b was then re-purposed to regulate CasMini expression. The nucleic acid comprised a risdiplam-responsive splicing cassette that regulates translation of the N-terminal portion of CasMini. Exon 11b and flanking introns from POMT2 were modified to contain a start codon in frame with the downstream CasMini such that inclusion of this exon led to production of an N-terminal portion of CasMini fused to nanoluciferase. This cassette was tested for responsiveness to varying concentrations of risdiplam in HEK293T cells. Several additional variants were cloned and assayed for nanoluciferase signal in the absence or presence of 1 µM risdiplam in Neuro2A cells (FIGs. 48A-48C). The sequences corresponding to FIGs. 48A-48C are disclosed in Table 32.
Branaplam-Responsive Nucleic Acids
Nucleic acids were designed leveraging branaplam-regulated splicing to control microRNA biogenesis via exon inclusion. Since branaplam can enhance exon inclusion via recognition of certain sequences near the 5’ splice site of the alternatively spliced exon, the primary microRNA sequence was split across the 2nd intron of a cassette exon event derived from SF3B3 such that inclusion of the cassette exon facilitated formation of the full microRNA base stem, which can enhance Drosha recognition and processing. YZ230 was used as a control that encodes the sequence expected with exon inclusion. YZ231 was used as a control that encodes the sequence expected with exon skipping. YZ232 was used as a variant in which the exon cassette is present and can respond to branaplam. Nucleic acids were introduced into cells and Northern blot was used to test responsiveness to branaplam. An increase in microRNA production was observed in cells comprising YZ232 and treated with 1 µM branaplam.
Further nucleic acids were then designed to test the ability of microRNAs encoded by branaplam-responsive cassettes to regulate luciferase expression. YZ95 is a potent primary microRNA scaffold that is effectively recognized by Drosha and can downregulate a GFP reporter transcript comprising a target site. YZ293 was produced by mutating bases in the stem of YZ95 which are recognized by Drosha. YZ301 was produced by re-constituting the complete microRNA stem. In each case, luciferase transcript is targeted by the microRNA. Further modifications were made to reduce leaky microRNA production due to basal recognition of an incomplete microRNA stem. In control conditions (CTRL), abundant GFP was visible in HEK293T cells. YZ95, a potent primary microRNA scaffold that is effectively recognized by Drosha, downregulated a GFP reporter transcript containing a target site. YZ293, comprising several bases in the microRNA stem that were mutated to minimize recognition by Drosha, showed de-targeting of the GFP transcript. This configuration mimicked the nucleic acid described in FIG. 50 in which the branaplam-responsive exon was skipped. Despite this weakening, reconstitution of the complete stem, mimicking the construct from FIG. 50 in which the branaplam-responsive exon was included, resulted in effective targeting of the GFP transcript (FIGs. 50A-50C and 51A-51B). The sequences corresponding to FIGs. 50A-50C and 51A-51B are disclosed in Table 33.
Table 26. YZ250 Tetracycline-Responsive saCas9 Library
Figure imgf000308_0001
Figure imgf000309_0001
Figure imgf000310_0001
Table 27. Non-Limiting Examples of Sequences for Tetracycline-Responsive Interfering RNAs
Figure imgf000310_0002
Table 28. YZ237 Risdiplam-Responsive Library for Mouse Erythropoietin Expression
Figure imgf000310_0003
Table 29. Non-limiting Examples of Risdiplam-Responsive Sequences
Figure imgf000311_0001
Table 30. Non-limiting Examples of Risdiplam-Responsive GABRG2 Nucleic Acids
Figure imgf000311_0002
Figure imgf000312_0001
Figure imgf000313_0002
Table 31. Non-limiting Examples of Risdiplam-Responsive CSNK1D Nucleic Acids
Figure imgf000313_0001
Figure imgf000314_0001
Figure imgf000315_0001
Table 32. Non-limiting Examples of Risdiplam-Responsive CasMini Nucleic Acids
Figure imgf000315_0002
Figure imgf000316_0002
Figure imgf000316_0001
Figure imgf000317_0001
Figure imgf000318_0001
Table 34. Non-limiting Examples of Sequences Capable of Binding a Ligand
Figure imgf000318_0002
Figure imgf000319_0001
Figure imgf000320_0001
Figure imgf000321_0001
OTHER EMBODIMENTS
All of the features disclosed in this specification may be combined in any combination.
Each feature disclosed in this specification may be replaced by an alternative feature serving the same, equivalent, or similar purpose. Thus, unless expressly stated otherwise, each feature disclosed is only an example of a generic series of equivalent or similar features. From the above description, one skilled in the art can easily ascertain the essential characteristics of the present disclosure, and without departing from the spirit and scope thereof, can make various changes and modifications of the present disclosure to adapt it to various usages and conditions. Thus, other embodiments are also within the claims.
EQUIVALENTS
While several inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.
All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms.
All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document.
The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”
The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.
As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited. In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03. It should be appreciated that embodiments described in this document using an open-ended transitional phrase (e.g., “comprising”) are also contemplated, in alternative embodiments, as “consisting of” and “consisting essentially of” the feature described by the open-ended transitional phrase. For example, if the disclosure describes “a composition comprising A and B”, the disclosure also contemplates the alternative embodiments “a composition consisting of A and B” and “a composition consisting essentially of A and B”.

Claims

CLAIMS
What is claimed is:
1. A polynucleotide comprising a sequence encoding a ligand-responsive sequence, wherein the polynucleotide is capable of being alternatively spliced in the presence of a ligand to produce a first RNA or a second RNA.
2. The polynucleotide of claim 1, wherein the polynucleotide comprises an alternative exon flanked by at least two introns, wherein the alternative exon is operably linked to the ligand-responsive sequence.
3. The polynucleotide of claim 2, wherein the first RNA comprises the alternative exon, wherein the second RNA does not comprise the alternative exon.
4. The polynucleotide of any one of claims 1-3, wherein the first RNA encodes a long isoform of an RNA of interest and/or the second RNA encodes a short isoform of the RNA of interest.
5. The polynucleotide of any one of claims 1-3, wherein the first RNA encodes an RNA of interest.
6. The polynucleotide of claim 5, wherein the first RNA is not operably linked to a premature stop codon.
7. The polynucleotide of claim 5, wherein the first RNA is operably linked to a start codon.
8. The polynucleotide of any one of claims 1-3, wherein the second RNA encodes an RNA of interest.
9. The polynucleotide of claim 8, wherein the second RNA is not operably linked to a premature stop codon.
10. The polynucleotide of claim 9, wherein the second RNA is operably linked to a start codon.
11. The polynucleotide of claim 9, wherein the RNA of interest is an interfering RNA.
12. The polynucleotide of claim 5 or 8, wherein the RNA of interest is a microRNA.
13. The polynucleotide of claim 12, wherein the second RNA encodes the microRNA.
14. The polynucleotide of any one of claims 5-10, wherein the RNA of interest encodes a protein.
15. The polynucleotide of any one of claims 5-10, wherein the RNA of interest encodes a CRISPR/Cas nuclease or a guide RNA (gRNA).
16. The polynucleotide of any one of claims 5-15, wherein the RNA of interest encodes a therapeutic RNA and/or a therapeutic protein.
17. The polynucleotide of any one of claims 1-16, wherein the ligand-responsive sequence is a risdiplam-responsive sequence or a branaplam-responsive sequence.
18. The polynucleotide of claim 17, wherein the alternative exon comprises a first portion of the risdiplam-responsive sequence and an intron downstream of the alternative exon comprises a second portion of the risdiplam-responsive sequence.
19. The polynucleotide of claim 17 or 18, wherein the first portion of the risdiplam-responsive sequence comprises a WGA sequence and the second portion of the risdiplam-responsive sequence comprises a GTAAGW sequence.
20. The polynucleotide of any one of claims 17-19, wherein the alternative exon further comprises a AGGAAG sequence which is 5’ to the WGA sequence.
21. The polynucleotide of any one of claims 17-20, wherein the alternative exon further comprises an upstream sequence which is 5’ to the AGGAAG sequence.
22. The polynucleotide of claim 21, wherein the upstream sequence comprises at least 10 nucleotides.
23. The polynucleotide of any one of claims 20-22, wherein the alternative exon further comprises a downstream sequence which is 3’ to the AGGAAG sequence and 5’ to the WGA sequence.
24. The polynucleotide of claim 23, wherein the downstream sequence comprises at least 6 nucleotides.
25. The polynucleotide of any one of claims 17-24, wherein the risdiplam-responsive sequence comprises NNNNNNNNNNAGGAAGNNNNNNNNNNAWGAGTAAGW (SEQ ID NO: 2183), wherein N is any nucleotide and W is A or T.
26. The polynucleotide of any one of claims 17-24, wherein the risdiplam-responsive sequence comprises YWWKWWWMKYAGGAAGYTAKTWGTTAWGAGTAAGW (SEQ ID NO: 2184) or YWWKWWWMKYAGGAAGYTAKTRWGTTAWGAGTAAGW (SEQ ID NO: 2185), wherein Y is C or T, K is G or T, W is A or T, M is A or C, and R is A or G.
27. The polynucleotide of claim 25 or 26, wherein the upstream sequence comprises ATAATTTTTT (SEQ ID NO: 2191), CACTTTTATT (SEQ ID NO: 2192), CATTATAATC (SEQ ID NO: 2193), CCATAAGTTT (SEQ ID NO: 2194), TACTATTTAT (SEQ ID NO: 2195), TCATATCTAT (SEQ ID NO: 2196), or TTAGTATCGT (SEQ ID NO: 2197) and/or the downstream sequence comprises GTTACGCTTT (SEQ ID NO: 2198), TTGTGTTGTT (SEQ ID NO: 2199), TTAGTGTGTT (SEQ ID NO: 2200), TGATGTATAT (SEQ ID NO: 2201), TTTATCTATC (SEQ ID NO: 2202), TTTTTTACAG (SEQ ID NO: 2203), or CTATTAGTTA (SEQ ID NO: 2204).
28. The polynucleotide of claim 26 or 27, wherein the risdiplam-responsive sequence comprises
CATTATAATCAGGAAGTTAGTGTGTTAAGAGTAAGT (SEQ ID NO: 2207) or TTAGTATCGTAGGAAGCTATTAGTTAATGGTAAGT (SEQ ID NO: 2208).
29. The polynucleotide of claim 17 or 18, wherein the risdiplam-responsive sequence comprises ATRTCCACTYAAAAAAATCTGGCGATGGGAGCAGAAWGAGTAAGW (SEQ ID NO: 2186), wherein R is A or G, Y is C or T, and W is A or T.
30. The polynucleotide of claim 29, wherein the risdiplam-responsive sequence comprises ATGTCCACTTAAAAAAATCTGGCGATGGGAGCAGAAAGAGTAAGT (SEQ ID NO: 2209), ATGTCCACTCAAAAAAATCTGGCGATGGGAGCAGAAAGAGTAAGT (SEQ ID NO: 2210), or ATATCCACTTAAAAAAATCTGGCGATGGGAGCAGAAAGAGTAAGT (SEQ ID NO: 2211).
31. The polynucleotide of claim 17, wherein the branaplam-responsive sequence comprises ATTTAACATTTTTGAGTCAATCCAAGTAATGCAGGAGGTTCATGATTGTGTAGA (SEQ ID NO: 2187).
32. The polynucleotide of any one of claims 1-4 or 9-16, wherein the ligand-responsive sequence is a tetracycline-responsive sequence.
33. The polynucleotide of claim 32, wherein the tetracycline-responsive sequence is located in a tetracycline-responsive aptamer comprising the sequence
TAAAACATACCWDMCGKAAMCGKHWGGAGAGGTGAAGAATACGACCACCTA (SEQ ID NO: 2188), wherein W is A or T, wherein D is A, G, or T, wherein M is A or C, wherein K is G or T, and wherein H is A, C, or T.
34. The polynucleotide of claim 32 or 33, wherein the polynucleotide comprises, from 5’ to 3’, an upstream 3' splice site, a first stem region, a 5' splice site reverse complementary sequence, the tetracycline-responsive sequence, a 5' splice site, a sequence comprising GT, the second stem region, and a downstream 3’ splice site.
35. The polynucleotide of claim 34, wherein the upstream 3’ splice site is at least 20 nucleotides long and the two nucleotides at the 3’ end are AG.
36. The polynucleotide of claim 35, wherein the 18 nucleotides 5’ of the AG nucleotides in the upstream 3' splice site comprises TCCTCATTTCCTCTCCTT (SEQ ID NO: 2213), TTTCCAACTTATTTCCCT (SEQ ID NO: 2214), CTTACTTTGTATTCCCAT (SEQ ID NO: 2215), AATCTTTATCTCTATTTC (SEQ ID NO: 2216), TGCCCTATCTTACCTTAT (SEQ ID NO: 2217), TGCACTTTCATTCATTTT (SEQ ID NO: 2218), CCACCTTTTTTTATTTTC (SEQ ID NO: 2219), or CCCCCATTTGTCTTCCCC (SEQ ID NO: 2220).
37. The polynucleotide of any one of claims 34-36, wherein the downstream 3’ splice site is at least 20 nucleotides long.
38. The polynucleotide of any one of claims 34-37, wherein the downstream 3’ splice site comprises TTTCTTTTTCTCTTTTTCAG (SEQ ID NO: 2237), TTTCTTATTCTCCCTTTCAG (SEQ ID NO: 2238), or TTTCTTCTTCTACCTTTCAG (SEQ ID NO: 2239).
39. The polynucleotide of any one of claims 34-38, wherein the first stem region and the second stem region are at least 2 nucleotides long.
40. The polynucleotide of any one of claims 34-39, wherein the first stem region and second stem region are selected from: CA and AC, CC and AC, AC and AC, AC and CC, or AC and CT.
41. The polynucleotide of any one of claims 34-40, wherein the 5’ reverse complementary sequence and the 5’ splice site are at least 7 nucleotides long.
42. The polynucleotide of any one of claims 34-41, wherein the upstream 5' splice site comprises the CAGGTAA, AACGTAA, CAGGTAC, CCGGTAC, ATCGTAA, GCGGTAC, GAGGTAC, ACGGTAG, CAAGTAA, GAGGTGA, CGCGTAA, GTCGTAA, GAGGTAT, AAGGTAT, TTCGTAA, CCGGTGC, GAGGTAG, CTCGTAA, CTGGTAC, AACGTGA, GCGGTAT, CCGGTAG, or CACGTGA and the 5’ splice site reverse complementary sequence comprises the reverse complement thereof.
43. The polynucleotide of any one of claims 1-42, wherein the polynucleotide is a transgene.
44. The polynucleotide of claim 43, wherein the transgene comprises the polynucleotide of any one of claims 32-42.
45. A polynucleotide comprising a transgene, wherein the transgene comprises: at least one alternative exon, at least two introns flanking the alternative exon, and a ligand-responsive aptamer; wherein the presence of the ligand results in splicing out the at least one alternative exon, the at least two introns flanking the alternative exon, and the ligand-responsive aptamer from the transgene.
46. The transgene of claim 44 or 45, wherein the at least one alternative exon and the at least two introns are from the same gene.
47. The transgene of claim 44 or 45, wherein the at least one alternative exon and the at least two introns are from different genes.
48. The transgene of any one of claims 44 to 47, wherein the transgene further comprises two exons flanking the at least one alternative exon, the at least two introns flanking the alternative exon, and the ligand-responsive aptamer comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
49. The transgene of any one of claims 44 to 48, wherein the transgene further comprises two exons flanking the at least one alternative exon, the at least two introns flanking the alternative exon, and the ligand-responsive aptamer comprising a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
50. The transgene of any one of claims 44 to 49, wherein the alternative exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
51. The transgene of any one of claims 44 to 49, wherein the alternative exon comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, 2137, 2236, or 2247-2256.
52. The transgene of any one of claims 44 to 51, wherein at least one of the introns comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
53. The transgene of any one of claims 44 to 51, wherein at least one of the introns comprise a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
54. The transgene of claims 48 or 49, wherein at least one of the exons comprise a polynucleotide having a nucleic acid sequence from a microRNA (miRNA) gene, optionally wherein the miRNA gene is a miRNA- 16_2 gene.
55. The transgene of any one of claims 44 to 54, wherein the ligand-responsive aptamer comprises a polynucleotide comprising a nucleic acid sequence that is 20-60 nucleotides in length.
56. The transgene of any one of claims 44 to 55, wherein the ligand-responsive aptamer comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2086, 2095, 2112, or 2187-2189.
57. The transgene of any one of claims 44 to 55, wherein the ligand-responsive aptamer comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2086, 2095, 2112, or 2187-2189.
58. The transgene of any one of claims 44 to 57, wherein the ligand-responsive aptamer binds to tetracycline.
59. The transgene of any one of claims 44 to 57, wherein the ligand-responsive aptamer is located in the intron downstream of the alternative exon.
60. The transgene of claim 44 or 45, wherein the ligand-responsive aptamer is located in the intron upstream of the alternative exon.
61. The transgene of claim 44 or 45, wherein the ligand-responsive aptamer is located in the alternative exon.
62. The transgene of claim 44 or 45, wherein the ligand-responsive aptamer is located in the intron downstream of the alternative exon.
63. The transgene of any one of claims 44 to 62, wherein the transgene comprises a 3' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239 and a 5' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: Tables 7, 25, 26, or 34.
64. The transgene of claim 44 or 45, wherein the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
65. The transgene of claim 44 or 45, wherein the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
66. A vector comprising the polynucleotide of any one of claims 1-45 or the transgene of any one of claims 46-65.
67. The vector of claim 66, wherein the vector is a plasmid.
68. A cell comprising the vector of claim 66 or claim 67.
69. The cell of claim 68, wherein the cell is a mammalian cell.
70. The mammalian cell of claim 69, wherein the cell is a human cell or cell from a human subject.
71. A recombinant viral genome comprising the polynucleotide of any one of claims 1 to 70.
72. The recombinant viral genome of claim 71, wherein the recombinant viral genome is a genome from a recombinant adeno-associated virus (rAAV).
73. The recombinant viral genome of claim 72, wherein the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
74. The recombinant viral genome of claim 73, wherein the AAV ITR sequences are AAV2 ITR sequences.
75. The recombinant viral genome of any one of claims 71 to 74, wherein the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
76. The recombinant viral genome of any one of claims 71 to 74, wherein the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
77. An rAAV particle comprising the recombinant viral genome according to any one of claims 71 to 74.
78. The rAAV particle of claim 77, wherein the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
79. The rAAV particle of claim 77 or 78, further comprising at least one helper plasmid.
80. The rAAV particle of any one of claims 77 to 79, wherein the helper plasmid comprises a rep gene and a cap gene.
81. The rAAV particle of claim 80, wherein the rep gene encodes Rep78, Rep68, Rep52, or Rep40, and/or wherein the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
82. The rAAV particle of any one of claims 77 to 81, wherein the rAAV particle comprises two helper plasmids.
83. The rAAV particle of claim 82, wherein the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
84. A method of treating a disease or condition in a subject comprising administering a recombinant viral genome according to any one of claims 71-76 or an rAAV particle according to any one of claims 77-83, to the subject.
85. The method of claim 84, wherein the subject is a mammal.
86. The method of claim 85, wherein the mammal is a human.
87. The method of any one of claims 84 to 86, wherein the recombinant viral genome or rAAV particle is administered to the subject at least one time.
88. The method of claim 87, wherein the viral genome or rAAV particle is administered to the subject 2, 3, 4, 5, 6, 7, 8, 9, or 10 times.
89. The method of any one of claims 84-88, wherein the viral genome or rAAV particle is administered to the subject parenterally, subcutaneously, intraocularly, intravitreally, subretinally, intravenously (IV), intracerebro-ventricularly, intramuscularly, intrathecally (IT), intracisternally, intraperitoneally, enterally, via inhalation, topically, or by direct injection to one or more cells, tissues, or organs.
90. A method of regulating the expression of a polynucleotide in a subject comprising administering to a subject the polynucleotide of any one of claims 1-44 and the ligand which binds the ligand-responsive sequence.
91. A method of regulating the expression of a transgene in a subject comprising administering to a subject
(i) a polynucleotide comprising a transgene comprising:
(a) at least one alternative exon,
(b) at least two introns flanking the alternative exon, and
(c) a ligand-responsive aptamer; and
(ii) a ligand, wherein the presence of the ligand results in splicing out (a)-(c) from the transgene.
92. The method of claim 91, wherein the transgene further comprises two exons flanking (a)-(c) comprising a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
93. The method of claim 91, wherein the transgene further comprises two exons flanking (a)-(c) comprising a polynucleotide having the nucleic acid sequence set forth in SEQ ID NO: 2081, 2089, 2092, 2097, 2135, 2142, or 2143.
94. The method of claim 91, wherein the transgene comprises a polynucleotide having a nucleic acid sequence from a microRNA (miRNA) gene, optionally wherein the miRNA gene is a miRNA-16_2 gene.
95. The method of claim 94, wherein the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in SEQ ID NO: 2281.
96. The method of claim 94, wherein the transgene comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NO: 2281.
97. The method of any one of claims 91 to 96, wherein the at least one alternative exon comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, or 2137, 2236, or 2247-2256.
98. The method of any one of claims 91 to 96, wherein the at least one alternative exon comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2084, 2094, 2100, 2103, 2106, 2114, or 2137, 2236, or 2247-2256.
99. The method of any one of claims 91 to 98, wherein at least one of the introns comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
100. The method of any one of claims 91 to 99, wherein at least one of the introns comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2082, 2088, 2093, 2096, 2101, 2104, 2107, 2113, 2115, 2117, 2118, 2121, 2127, 2129, 2130, or 2141.
101. The method of any one of claims 91 to 100, wherein the ligand-responsive aptamer comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2086, 2095, 2112, or 2187-2189.
102. The method of any one of claims 91 to 101, wherein the ligand-responsive aptamer comprises a polynucleotide having a nucleic acid sequence as set forth in SEQ ID NOs: 2086, 2095, 2112, or 2187-2189.
103. The method of any one of claims 91 to 102, wherein the transgene comprises a 3' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239 and a 5' splice site comprising a polynucleotide comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 2083, 2144-2182, 2213-2220, or 2237-2239.
104. The method of any one of claims 91 to 103, wherein the ligand is tetracycline.
105. The method of claim 91, wherein the transgene comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
106. The method of claim 91, wherein the transgene comprises a polynucleotide having a nucleic acid sequence set forth in SEQ ID NOs: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
107. The method of any one of claims 91-106, wherein the transgene is provided in a recombinant viral genome.
108. The method of claim 107, wherein the recombinant viral genome is a genome from a recombinant adeno-associated virus (rAAV).
109. The method of claim 108, wherein the transgene is flanked by AAV inverted terminal repeat (ITR) sequences.
110. The method of claim 109, wherein the AAV ITR sequences are AAV2 ITR sequences.
111. The method of any one of claims 107 to 110, wherein the recombinant viral genome comprises a polynucleotide having at least 70%, at least 75%, at least 80%, at least 90%, at least 92%, at least 95%, at least 98%, or at least 99% sequence identity, relative to a nucleic acid sequence as set forth in either SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
112. The method of any one of claims 107 to 110, wherein the recombinant viral genome comprises a polynucleotide having a nucleic acid sequence as set forth in either SEQ ID NO: 2080, 2091, 2099, 2102, 2105, 2108, 2109, 2110, 2111, 2112, 2116, 2118, 2120, 2123, 2128, 2131, 2132, 2138, or 2183-2260.
113. The method of any one of claims 107 to 112, wherein the recombinant viral genome is provided in an rAAV particle.
114. The method of claim 113, wherein the rAAV particle comprises AAV serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, or AAV derivative or pseudotype AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45.
115. The method of claim 113 or 114, wherein the rAAV particle further comprises at least one helper plasmid.
116. The method of claim 115, wherein the helper plasmid comprises a rep gene and a cap gene.
117. The method of claim 116, wherein the rep gene encodes Rep78, Rep68, Rep52, or Rep40, and/or wherein the cap gene encodes a VP1, VP2, and/or VP3 region of the viral capsid protein.
118. The method of claim 113 or claim 114, wherein the rAAV particle comprises two helper plasmids.
119. The method of claim 118, wherein the first helper plasmid comprises a rep gene and a cap gene and the second helper plasmid comprises an E1a gene, an E1b gene, an E4 gene, an E2a gene, and a VA gene.
120. The method of any one of claims 91 to 119, wherein administration of the ligand to the subject results in a fold increase in the RNA level of the exclusion isoform of about 300-400-fold.
121. The method of any one of claims 91 to 120, wherein administration of the ligand to the subject results in a fold increase in the protein level of the exclusion isoform of about 5-25-fold.
122. The transgene of any one of claims 44 to 65, the recombinant viral genome of any one of claims 71 to 76, the rAAV particle of any one of claims 77-83, or the method of any one of claims 84 to 121, wherein splicing out the alternative exon, the at least two introns, and the aptamer results in the production of a functional start codon in the transgene.
123. The transgene of any one of claims 44 to 65, the recombinant viral genome of any one of claims 71 to 76, the rAAV particle of any one of claims 77-83, or the method of any one of claims 84 to 121, wherein splicing out the alternative exon, the at least two introns, and the aptamer results in the removal of a premature stop codon from the transgene.
PCT/US2023/072823 2022-08-24 2023-08-24 Small molecule-inducible gene expression switches WO2024044689A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263373451P 2022-08-24 2022-08-24
US63/373,451 2022-08-24

Publications (2)

Publication Number Publication Date
WO2024044689A2 (en) 2024-02-29
WO2024044689A3 WO2024044689A3 (en) 2024-04-18

Family

ID=90014092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/072823 WO2024044689A2 (en) 2022-08-24 2023-08-24 Small molecule-inducible gene expression switches

Country Status (1)

Country Link
WO (1) WO2024044689A2 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060200878A1 (en) * 2004-12-21 2006-09-07 Linda Lutfiyya Recombinant DNA constructs and methods for controlling gene expression
EA202092665A3 (en) * 2015-02-02 2021-06-30 МЕИРЭДжТиЭкс ЮКей II ЛИМИТЕД REGULATION OF GENE EXPRESSION THROUGH APTAMER-MEDIATED MODULATION OF ALTERNATIVE SPLICING

Also Published As

Publication number Publication date
WO2024044689A3 (en) 2024-04-18

Similar Documents

Publication Publication Date Title
EP2414524B1 (en) Gene transfer vectors comprising genetic insulator elements and methods to identify genetic insulator elements
Goyenvalle et al. Engineering multiple U7snRNA constructs to induce single and multiexon-skipping for Duchenne muscular dystrophy
US20220096606A1 (en) Compositions and Methods for Treatment of Duchenne Muscular Dystrophy
WO2011113889A1 (en) Modified u7 snrnas for treatment of neuromuscular diseases
US20220411821A1 (en) Gene therapy vectors
US20230323391A1 (en) Transgene expression system
JP2023540429A (en) Methods and compositions for treating epilepsy
EP4399307A2 (en) Hbb-modulating compositions and methods
KR20230137399A (en) Functional nucleic acid molecules and methods
WO2024086586A2 (en) Improved gene editing systems utilizing trans recruiting components
WO2024044689A2 (en) Small molecule-inducible gene expression switches
US20240141384A1 (en) Methods and compositions to confer regulation to gene therapy cargoes by heterologous use of alternative splicing cassettes
CN115806984A (en) Circular RNA, vector and application of vector
JP2023542130A (en) Compositions and methods for treating amyotrophic lateral sclerosis (ALS) with AAV-MIR-SOD1
IL311219A (en) Methods and compositions for modulating a genome
Xu et al. High-throughput quantification of in vivo adeno-associated virus transduction with barcoded non-coding RNAs
AU2020229886A1 (en) Compositions and methods for treating oculopharyngeal muscular dystrophy (OPMD)
US20240067957A1 (en) Autocatalytic base editing for rna-responsive translational control
WO2024175903A2 (en) Shrna for the treatment of disease
KR20240099164A (en) PAH-modulating compositions and methods
KR20240099167A (en) Mobilization of gene editing system components into trans
CN118434855A (en) PAH modulating compositions and methods
TW202417466A (en) Aav capsid variants and uses thereof
CN118556123A (en) HBB modulating compositions and methods
CA3175419A1 (en) Diagnostic methods using mir-485-3p expression

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23858303

Country of ref document: EP

Kind code of ref document: A2