WO2024050547A2 - Compact bidirectional promoters for gene expression - Google Patents

Compact bidirectional promoters for gene expression

Info

Publication number
WO2024050547A2
Authority
WO
WIPO (PCT)
Prior art keywords
variant
promoter
functional fragment
cell
coding sequence
Prior art date
Application number
PCT/US2023/073367
Other languages
French (fr)
Other versions
WO2024050547A3 (en
Inventor
Vinod JASKULA-RANGA
Todd HARTMAN
Original Assignee
Hunterian Medicine Llc
Priority date
Filing date
Publication date
Application filed by Hunterian Medicine Llc
Publication of WO2024050547A2
Publication of WO2024050547A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N15/86 Viral vectors
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K48/00 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
    • A61K48/005 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy characterised by an aspect of the 'active' part of the composition delivered, i.e. the nucleic acid delivered
    • C12N2750/00 MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssDNA viruses
    • C12N2750/00011 Details
    • C12N2750/14011 Parvoviridae
    • C12N2750/14111 Dependovirus, e.g. adenoassociated viruses
    • C12N2750/14141 Use of virus, viral particle or viral elements as a vector
    • C12N2750/14143 Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
    • C12N2830/00 Vector systems having a special element relevant for transcription
    • C12N2830/20 Vector systems having a special element relevant for transcription transcription of more than one cistron
    • C12N2830/205 Vector systems having a special element relevant for transcription transcription of more than one cistron bidirectional

Abstract

The invention relates generally to compact bidirectional promoters and their use in expressing genes, e.g., for treating disease.

Description

COMPACT BIDIRECTIONAL PROMOTERS FOR GENE EXPRESSION
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of and priority to U.S. Provisional Application No. 63/403,571, filed September 2, 2022, the entire disclosure of which is hereby incorporated by reference in its entirety for all purposes.
FIELD OF THE INVENTION
[0002] The invention relates generally to compact bidirectional promoters and their use in expressing genes, e.g., for treating disease.
BACKGROUND
[0003] Adeno-associated viruses (AAV) provide a safe means of therapeutic gene delivery; however, a significant technical obstacle limits an AAV vector’s utility: its small payload capacity. The large size of certain genes, in addition to a promoter, terminator, and two inverted terminal repeats (ITRs), presents a significant barrier to AAV packaging. In particular, due to the large size of current promoters, there is less space in vectors for regulatory elements that can improve safety, thereby making manufacturing less efficient. Initially, efforts were aimed at fitting the expression cassette within a single AAV by eliminating the promoter entirely. More recent attempts at overcoming the limited payload capacity of AAVs have focused on a combination of small synthetic promoters and/or a truncated payload gene. There exists an outstanding need for compositions and methods for packaging larger genes in vectors, such as AAV, which are suitable for gene delivery.
[0004] In addition to the above, viral promoters with ubiquitous expression (e.g., CMV, CBA, and CAG) have been the standard for decades. The reliance on novel capsid technologies has failed to address the necessity of tissue-specificity as a feature in successful gene therapy. Further, existing promoters strongly overexpress proteins, leading to cell stress, toxicity, immunogenicity, and silencing, while existing enhancers are known to increase the risk of oncogenicity. Therefore, there exists an additional outstanding need for compositions that may provide a spectrum of gene expression.
SUMMARY OF THE INVENTION
[0005] The invention is based, at least in part, upon the surprising discovery that compact bidirectional promoters can effectively drive expression of one or more genes (e.g., by RNA polymerase II) useful in, for example, gene therapy applications. Adeno-associated viruses (AAV) are a promising delivery vehicle for nucleic acids for gene therapy, but the small size of AAV is a barrier to delivery of genes, such as those having coding sequences above about 4000 bp, and vector components. Here, the disclosure provides a solution to this problem using a compact bidirectional promoter to deliver sufficient and sustained expression of genes, e.g., by RNA polymerase II, via AAV. In some embodiments, the bidirectional promoter is capable of promoting transcription (e.g., by RNA polymerase II) of two coding sequences positioned on opposite sides of the promoter. Accordingly, the compact bidirectional promoters of the invention provide at least four notable advantages over the prior art, including 1) providing space for regulatory elements that can improve safety of a vector, as well as 2) increased tissue-specificity and 3) tunable expression profiles to overcome issues of lack of tissue- and expression-sensitivity. Further, 4) the compact bidirectional promoters of the invention are derived from mammalian promoters, enabling increased durability as compared to viral promoters that have a propensity to be silenced. As yet another advantage, the nucleic acid molecules of the invention provide the notable advantages of lower oncogenicity, for example, due to omission of enhancers, as well as lower immunogenicity, as provided by adjusting tissue- and expression-specificity such that antigen-presenting cells are reduced compared to expression driven by canonical nucleic acid molecules and promoters, respectively.
[0006] Accordingly, in one aspect, the disclosure relates to a nucleic acid including a compact bidirectional promoter, or a functional fragment or variant thereof, operably linked to at least one heterologous coding sequence, wherein the compact bidirectional promoter is less than about 1000 bp, and wherein the bidirectional promoter is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter.
[0007] In another aspect, the disclosure relates to an expression construct including the nucleic acid of the foregoing aspect.
[0008] In another aspect, the disclosure relates to a vector including the expression construct of the foregoing aspect, optionally wherein the vector is a plasmid, a DNA vector, an RNA vector, a virion, or a viral vector.
[0009] In some embodiments of the foregoing aspect, the vector is a viral vector. In some embodiments, the viral vector is an AAV, lentivirus, adenovirus, simian virus 40, vaccinia virus, measles virus, herpes virus, or poxvirus. In some embodiments, the viral vector is an AAV vector. For example, in some embodiments, the AAV is a single-stranded AAV (ssAAV) vector. In some embodiments, the AAV is a self-complementary AAV (scAAV) vector.
[0010] In another aspect, the disclosure relates to a method of expressing a heterologous coding sequence in a cell, the method including transfecting the cell with the expression construct or the vector of any one of the foregoing aspects.
[0011] In another aspect, the disclosure relates to a method of treating a disease in a subject in need thereof, the method including administering to the subject the vector of any one of the foregoing aspects.
[0012] In another aspect, the disclosure relates to a method of expressing at least one heterologous coding sequence in a target cell, the method including introducing into a subject a nucleic acid including a compact bidirectional promoter, or a functional fragment or variant thereof, operably linked to at least one heterologous coding sequence, wherein the compact bidirectional promoter is less than about 1000 bp, and wherein the bidirectional promoter is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter in the cell.
[0013] In another aspect, the disclosure relates to a method of expressing two heterologous coding sequences in different target cells, the method including introducing into a subject a nucleic acid including a compact bidirectional promoter, or a functional fragment or variant thereof, operably linked to the two heterologous coding sequences positioned on opposite sides of the compact bidirectional promoter in the cell, wherein the compact bidirectional promoter is less than about 1000 bp, and wherein the compact bidirectional promoter promotes transcription of one of the coding sequences in a first target cell and promotes transcription of the other coding sequence in a second target cell.
[0014] In some embodiments of any of the foregoing aspects, the compact bidirectional promoter, or the functional fragment or the variant thereof, expresses the at least one heterologous coding sequence in a target cell.
[0015] In some embodiments of any of the foregoing aspects, the compact bidirectional promoter, or the functional fragment or the variant thereof, is capable of expressing each of the two heterologous coding sequences in a partially overlapping set of target cells.
[0016] In some embodiments of any of the foregoing aspects, the at least one coding sequence is codon optimized. In some embodiments, the codon optimized coding sequence comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 819-836, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
[0017] In another aspect, the disclosure relates to a method of administering an scAAV vector including a therapeutic coding sequence at a reduced dose for treating a disease treatable by the therapeutic coding sequence, the method including administering to a subject an scAAV including a compact bidirectional promoter, or a functional fragment or variant thereof, operably linked to the therapeutic coding sequence, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is less than about 1000 bp and is heterologous to the therapeutic coding sequence, wherein the scAAV vector is administered at a reduced dose as compared to the therapeutically effective dose for an ssAAV vector including the therapeutic coding sequence.
[0018] In some embodiments of the foregoing aspect, the reduced dose is between about 10-fold and about 600-fold lower than the therapeutically effective dose for an ssAAV vector. For example, in some embodiments, the reduced dose is about 10-fold lower than the therapeutically effective dose for an ssAAV vector.
[0019] In some embodiments of any of the foregoing aspects, the bidirectional promoter, or the functional fragment or the variant thereof, is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter.
[0020] In some embodiments of any of the foregoing aspects, the compact bidirectional promoter includes a nucleic acid sequence selected from any one of SEQ ID NOs: 1-800, or a nucleic acid sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
[0021] In some embodiments of any of the foregoing aspects, the compact bidirectional promoter, or the functional fragment or the variant thereof, includes at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or 100% sequence identity to a naturally occurring mammalian promoter.
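The sequence identity thresholds recited above (e.g., at least about 80% to at least about 99.5%) can be illustrated with a minimal sketch. The disclosure does not specify an alignment algorithm or an identity convention here, so the following assumes the two sequences have already been aligned to equal length by some external tool, with "-" marking gaps, and it counts identity over all aligned columns. The function names percent_identity and meets_threshold are hypothetical and are used for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption (not from the disclosure): gaps are written as '-', and a
    column counts as a match only when both residues are identical non-gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold_pct: float = 80.0) -> bool:
    """True when the alignment reaches the given identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_pct


if __name__ == "__main__":
    # One mismatch in ten aligned columns.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(meets_threshold("ACGTACGTAC", "ACGTACGTAA", 95.0))
```

Other identity conventions (for example, dividing by the length of the shorter sequence) give different values, so the denominator chosen here should be treated as one option among several.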
[0022] In some embodiments of any of the foregoing aspects, the compact bidirectional promoter, or the functional fragment or the variant thereof, expresses the therapeutic coding sequence in a target cell.
[0023] In some embodiments of any of the foregoing aspects, the therapeutic coding sequence encodes A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMR1, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, SLC6A1, or a functional fragment or variant thereof.
[0024] In some embodiments of any of the foregoing aspects, the therapeutic coding sequence is codon optimized. In some embodiments, the codon optimized coding sequence comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 819-836, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
[0025] In some embodiments of a foregoing aspect, the therapeutic coding sequence is less than about 750 amino acids. For example, in some embodiments, the therapeutic coding sequence is from about 350 amino acids to about 750 amino acids.
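As a worked illustration of the size range above, an amino acid count converts to a coding sequence length at three nucleotides per codon. The sketch below performs that conversion and subtracts the result from a vector capacity. The capacity and the sizes of the other cassette elements are assumptions supplied by the caller, not values taken from this disclosure, and the function names are hypothetical.

```python
def cds_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Coding sequence length in bp: three nucleotides per codon.

    The optional extra codon accounts for the stop codon.
    """
    if n_amino_acids < 0:
        raise ValueError("amino acid count must be non-negative")
    return 3 * (n_amino_acids + (1 if include_stop else 0))


def remaining_capacity_bp(
    n_amino_acids: int,
    promoter_bp: int,
    other_elements_bp: int,
    capacity_bp: int,
) -> int:
    """Space left in the vector after the CDS, promoter, and other elements.

    capacity_bp and other_elements_bp (e.g., ITRs, terminator) are
    caller-supplied assumptions; a negative result means the cassette
    would not fit under those assumptions.
    """
    return capacity_bp - cds_length_bp(n_amino_acids) - promoter_bp - other_elements_bp


if __name__ == "__main__":
    # The 350 and 750 amino acid bounds come from the paragraph above.
    print(cds_length_bp(350), cds_length_bp(750))
    # Hypothetical budget: 4700 bp capacity, 300 bp promoter, 500 bp other elements.
    print(remaining_capacity_bp(750, promoter_bp=300, other_elements_bp=500, capacity_bp=4700))
```

Under these placeholder numbers, a smaller promoter directly increases the space left for regulatory elements, which is the trade-off the disclosure describes.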
[0026] In another aspect, the disclosure relates to a method including: obtaining a genome file including information about the location of transcription start sites on the plus and minus strands of a chromosome; and identifying regions between a transcription start site on the minus strand of the chromosome and a transcription start site on the plus strand of the chromosome, thereby identifying one or more bidirectional promoters.
[0027] In some embodiments of the foregoing aspect, the genome file includes annotations categorized by chromosome, wherein the annotations include indices, wherein the indices include genes, pseudogenes, and coding regions for protein-coding genes, wherein each coding region includes a transcription start site.
[0028] In some embodiments of the foregoing aspect, the one or more bidirectional promoters are identified by obtaining a non-transitory computer readable medium including instructions that, when executed by a processor, cause the processor to identify the regions between the transcription start site on the minus strand of a chromosome and the transcription start site on the plus strand of the chromosome.
[0029] In some embodiments of the foregoing aspect, the genome file including annotations includes mammalian annotations. For example, in some embodiments, the mammalian annotations include human annotations or mouse annotations.
[0030] In some embodiments of the foregoing aspect, the genome file including annotations is GRCh38_latest_genomic.gff or GRCm39_vM27.gff3. For example, in some embodiments, the genome file is GRCm39_vM27.gff3.
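The identification method of the foregoing paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions, not the instructions actually used in the disclosure: it assumes a GFF3-style file with nine tab-separated columns, takes the transcription start site of a feature to be its start coordinate on the plus strand and its end coordinate on the minus strand, and considers only TSSs that are adjacent after sorting by position. The names Tss, parse_gene_tss, and find_bidirectional_regions, and the default cutoff of 1000 bp (chosen to mirror the "less than about 1000 bp" language of the disclosure), are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Tss(NamedTuple):
    chrom: str
    pos: int      # 1-based coordinate of the transcription start site
    strand: str   # "+" or "-"
    name: str


def parse_gene_tss(gff_lines: Iterable[str], feature_type: str = "gene") -> list[Tss]:
    """Extract one TSS per feature of the given type from GFF3-style lines."""
    sites = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        strand = cols[6]
        if strand not in ("+", "-"):
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name") or attrs.get("ID", "")
        # Assumption: TSS is the 5' end of the annotated feature.
        sites.append(Tss(cols[0], start if strand == "+" else end, strand, name))
    return sites


def find_bidirectional_regions(sites: Iterable[Tss], max_len: int = 1000):
    """Return (chrom, start, end, minus_gene, plus_gene) for divergent TSS pairs.

    A pair is divergent when a minus-strand TSS lies immediately upstream
    (lower coordinate) of a plus-strand TSS; the region is the sequence
    strictly between the two TSSs.
    """
    by_chrom = defaultdict(list)
    for site in sites:
        by_chrom[site.chrom].append(site)
    regions = []
    for chrom, chrom_sites in by_chrom.items():
        ordered = sorted(chrom_sites, key=lambda s: s.pos)
        for left, right in zip(ordered, ordered[1:]):
            if left.strand == "-" and right.strand == "+":
                length = right.pos - left.pos - 1
                if 0 < length < max_len:
                    regions.append((chrom, left.pos + 1, right.pos - 1, left.name, right.name))
    return regions


if __name__ == "__main__":
    example = [
        "##gff-version 3",
        "chr1\tsrc\tgene\t100\t500\t.\t-\t.\tID=g1;Name=A",
        "chr1\tsrc\tgene\t800\t2000\t.\t+\t.\tID=g2;Name=B",
    ]
    print(find_bidirectional_regions(parse_gene_tss(example)))
```

The returned coordinates could then be used to extract the intervening sequence from the corresponding genome assembly. Restricting the search to adjacent TSSs keeps the sketch short; a fuller implementation would also need to handle multiple transcripts per gene and overlapping features.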
[0031] In some embodiments of any of the foregoing aspects, the one or more bidirectional promoters are less than about 1000 bp. For example, in some embodiments, the one or more bidirectional promoters are between about 30 bp and about 800 bp. In some embodiments, the one or more bidirectional promoters are between about 30 bp and about 600 bp. In some embodiments, the one or more bidirectional promoters are between about 30 bp and about 400 bp. In some embodiments, the one or more bidirectional promoters are between about 30 bp and about 200 bp.
[0032] In some embodiments of a foregoing aspect, the method further includes linking the one or more bidirectional promoters to at least one heterologous coding sequence.
[0033] In some embodiments of a foregoing aspect, the method further includes linking the one or more bidirectional promoters to two heterologous coding sequences. In some embodiments, the one or more bidirectional promoters are capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter.
[0034] In some embodiments of any of the foregoing aspects, the compact promoter is operably linked to a 5' UTR.
[0035] In some embodiments of any of the foregoing aspects, the compact bidirectional promoter, or the functional fragment or the variant thereof, is operably linked to a Kozak consensus sequence.
[0036] In some embodiments of a foregoing aspect, the method further includes linking each of the one or more bidirectional promoters to only one heterologous coding sequence. For example, in some embodiments, the method further includes linking each of the one or more bidirectional promoters to two heterologous coding sequences positioned on opposite sides of the promoter.
[0037] In some embodiments of any of the foregoing aspects, the two heterologous coding sequences include the same coding sequence. In some embodiments, the two heterologous coding sequences include different coding sequences.
[0038] In some embodiments of any of the foregoing aspects, the one or more bidirectional promoters are capable of expressing the at least one heterologous coding sequence in a target cell. For example, in some embodiments, the target cell is a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell.
[0039] In some embodiments of any of the foregoing aspects, the one or more bidirectional promoters are capable of expressing each of the two heterologous coding sequences: (a) in the same target cell or cells, (b) in different target cells, or (c) in a partially overlapping set of target cells.
[0040] In some embodiments of any of the foregoing aspects, the compact bidirectional promoter expresses a luciferase reporter at a higher level than does a herpes simplex virus (HSV) thymidine kinase (TK) promoter.
[0041] In some embodiments of a foregoing aspect, the at least one coding sequence encodes CFTR, ATP7B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMR1, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, or SLC6A1.
[0042] These and other aspects and features of the invention are described in the following detailed description and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] The invention can be more completely understood with reference to the following drawings.
[0044] FIG. 1 is a schematic showing an exemplary nucleic acid that includes a bidirectional promoter (“Promoter Region;” e.g., a compact bidirectional promoter) operably linked to two heterologous coding regions, each of which is transcribed by RNA polymerase II.
[0045] FIG. 2 is a graph showing the number of genes in the human genome as a function of their length in base pairs (bp). The subset of genes that can be packaged into an ssAAV using the compact bidirectional promoters identified herein is highlighted in grey.
[0046] FIG. 3 is a graph showing the number of genes in the human genome as a function of their length in bp. The subset of genes that can be packaged into an scAAV using the compact bidirectional promoters identified herein is highlighted in grey.
[0047] FIGs. 4A-4B are a set of graphs depicting the unique tissue expression profiles of two genes, COX15 and CUTC, that flank a bidirectional promoter identified in Example 1. The tissue expression data is plotted as a function of normalized protein-coding transcripts per million (nTPM; y-axis) and was obtained using the Human Protein Atlas (HPA) and the Genotype-Tissue Expression (GTEx) databases, with expression data from HPA shown in FIG. 4A and consensus expression data from HPA and GTEx shown in FIG. 4B.
[0048] FIGs. 5A-5H are a set of radar plots depicting the unique liver-, hepatocyte-, neuronal-, kidney tubular-, skeletal muscle-, cerebral cortex-, retina-, and rod photoreceptor-specific expression profiles of the compact bidirectional promoters of the disclosure (e.g., a promoter having less than 300 bp). Each radar plot reflects a single promoter, with specific tissues indicated at the vertices. This provides a y-axis for each tissue (with zero at the center) and with increasing promoter activity radiating from the center, such that the value of the number indicates nTPM levels from the GTEx transcriptomics dataset.
[0049] FIGs. 6A-6D are a set of radar plots, as described in FIGs. 5A-5H, depicting cell subtype expression profiles in the lung for four exemplary compact bidirectional promoters of the disclosure.
[0050] FIG. 7 is a schematic outline of a method of the disclosure used to identify a bidirectional promoter (e.g., a compact bidirectional promoter). In brief, the schematic depicts, from top to bottom, the steps of (a) obtaining a genome file (experimental data set) including database-derived annotations categorized by chromosome, wherein the annotations are indexed by, for example, genes, pseudogenes, and coding regions for protein-coding genes, wherein each coding region includes a transcription start site; and (b) obtaining a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: determine transcription start sites and orientations, identify divergent transcription and the genomic coordinates thereof, and extract the sequence between the divergent transcription, thereby identifying one or more bidirectional promoters.
[0051] FIG. 8 is a graph depicting the expression profiles of the thymidine kinase (TK; “p322”), human H1 (control pol II/pol III promoter; “p096”), human MORN5 (“p387”), human RPL9 (“p389”), human NDUFB9 (“p390”), human RPS28 (“p391”), and human SLIRP (“p392”) promoters in HeLa cells using a luciferase reporter assay. Data was obtained from n > 3 technical replicates and n > 3 biological replicates, with error bars indicating mean ± SEM, where SEM = SD/√N.
[0052] FIG. 9 is a graph depicting the expression profiles of the TK (“p322”), human H1 (control pol II/pol III promoter; “p096”), human MORN5 (“p387”), human RPL9 (“p389”), human NDUFB9 (“p390”), human RPS28 (“p391”), and human SLIRP (“p392”) promoters in A549 cells using a luciferase reporter assay. Data was obtained from n > 3 technical replicates and n > 3 biological replicates, with error bars indicating mean ± SEM, where SEM = SD/√N.
[0053] FIG. 10 is a graph depicting the expression profiles of the TK (“p322”), human H1 (control pol II/pol III promoter; “p096”), human MORN5 (“p387”), human RPL9 (“p389”), human NDUFB9 (“p390”), human RPS28 (“p391”), and human SLIRP (“p392”) promoters in CFBE cells using a luciferase reporter assay. Data was obtained from n > 3 technical replicates and n > 3 biological replicates, with error bars indicating mean ± SEM, where SEM = SD/√N.
[0054] FIGs. 11A-11B are a set of graphs depicting the unique tissue expression profiles of two genes, MORN5 and NDUFA8, that flank the bidirectional promoter MORN5 described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis) and was obtained using the HPA and GTEx databases, with expression data from HPA shown in FIG. 11A and consensus expression data from HPA and GTEx shown in FIG. 11B.
[0055] FIGs. 12A-12M are a set of graphs depicting the unique central nervous system tissue expression profiles of two genes, MORN5 and NDUFA8, that flank the bidirectional promoter MORN5 described in Example 2. The tissue expression data was obtained using the HPA and GTEx databases, with expression data from the cerebral cortex shown in FIG. 12A, olfactory bulb shown in FIG. 12B, hippocampal formation shown in FIG. 12C, amygdala shown in FIG. 12D, basal ganglia shown in FIG. 12E, thalamus shown in FIG. 12F, hypothalamus shown in FIG. 12G, cerebellum shown in FIG. 12H, midbrain shown in FIG. 12I, pons shown in FIG. 12J, medulla oblongata shown in FIG. 12K, spinal cord shown in FIG. 12L, and white matter shown in FIG. 12M.
[0056] FIG. 13 is a set of graphs depicting the unique single cell RNA expression profiles of two genes, MORN5 and NDUFA8, that flank the bidirectional promoter MORN5 described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis).
[0057] FIG. 14 is a graph depicting the unique blood cell RNA expression profiles of two genes, MORN5 and NDUFA8, that flank the bidirectional promoter MORN5 described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis).
[0058] FIGs. 15A-15B are a set of graphs depicting the unique tissue expression profiles of two genes, NDUFB9 and TATDN1, that flank the bidirectional promoter NDUFB9 described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis) and was obtained using the HPA and GTEx databases, with expression data from HPA shown in FIG. 15A and consensus expression data from HPA and GTEx shown in FIG. 15B.
[0059] FIGs. 16A-16M are a set of graphs depicting the unique central nervous system tissue expression profiles of two genes, NDUFB9 and TATDN1, that flank the bidirectional promoter NDUFB9 described in Example 2. The tissue expression data was obtained using the HPA and GTEx databases, with expression data from the cerebral cortex shown in FIG. 16A, olfactory bulb shown in FIG. 16B, hippocampal formation shown in FIG. 16C, amygdala shown in FIG. 16D, basal ganglia shown in FIG. 16E, thalamus shown in FIG. 16F, hypothalamus shown in FIG. 16G, cerebellum shown in FIG. 16H, midbrain shown in FIG. 16I, pons shown in FIG. 16J, medulla oblongata shown in FIG. 16K, spinal cord shown in FIG. 16L, and white matter shown in FIG. 16M.
[0060] FIG. 17 is a set of graphs depicting the unique single cell RNA expression profiles of two genes, NDUFB9 and TATDN1, that flank the bidirectional promoter NDUFB9 described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis).
[0061] FIG. 18 is a graph depicting the unique blood cell RNA expression profiles of two genes, NDUFB9 and TATDN1, that flank the bidirectional promoter NDUFB9 described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis).
[0062] FIGs. 19A-19B are a set of graphs depicting the unique tissue expression profiles of two genes, NDUFA7 and RPS28, that flank the bidirectional promoter RPS28 described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis) and was obtained using the HPA and GTEx databases, with expression data from HPA shown in FIG. 19A and consensus expression data from HPA and GTEx shown in FIG. 19B.
[0063] FIGs. 20A-20M are a set of graphs depicting the unique central nervous system tissue expression profiles of two genes, NDUFA7 and RPS28, that flank the bidirectional promoter RPS28 described in Example 2. The tissue expression data was obtained using the HPA and GTEx databases, with expression data from the cerebral cortex shown in FIG. 20A, olfactory bulb shown in FIG. 20B, hippocampal formation shown in FIG. 20C, amygdala shown in FIG. 20D, basal ganglia shown in FIG. 20E, thalamus shown in FIG. 20F, hypothalamus shown in FIG. 20G, cerebellum shown in FIG. 20H, midbrain shown in FIG. 20I, pons shown in FIG. 20J, medulla oblongata shown in FIG. 20K, spinal cord shown in FIG. 20L, and white matter shown in FIG. 20M.
[0064] FIG. 21 is a set of graphs depicting the unique single cell RNA expression profiles of two genes, NDUFA7 and RPS28, that flank the bidirectional promoter RPS28 described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis).
[0065] FIG. 22 is a graph depicting the unique blood cell RNA expression profiles of two genes, NDUFA7 and RPS28, that flank the bidirectional promoter RPS28 described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis).
[0066] FIGs. 23A-23B are a set of graphs depicting the unique tissue expression profiles of two genes, ALKBH1 and SLIRP, that flank the bidirectional promoter SLIRP described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis) and was obtained using the HPA and GTEx databases, with expression data from HPA shown in FIG. 23A and consensus expression data from HPA and GTEx shown in FIG. 23B.
[0067] FIGs. 24A-24M are a set of graphs depicting the unique central nervous system tissue expression profiles of two genes, ALKBH1 and SLIRP, that flank the bidirectional promoter SLIRP described in Example 2. The tissue expression data was obtained using the HPA and GTEx databases, with expression data from the cerebral cortex shown in FIG. 24A, olfactory bulb shown in FIG. 24B, hippocampal formation shown in FIG. 24C, amygdala shown in FIG. 24D, basal ganglia shown in FIG. 24E, thalamus shown in FIG. 24F, hypothalamus shown in FIG. 24G, cerebellum shown in FIG. 24H, midbrain shown in FIG. 24I, pons shown in FIG. 24J, medulla oblongata shown in FIG. 24K, spinal cord shown in FIG. 24L, and white matter shown in FIG. 24M.
[0068] FIG. 25 is a set of graphs depicting the unique single cell RNA expression profiles of two genes, ALKBH1 and SLIRP, that flank the bidirectional promoter SLIRP described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis).
[0069] FIG. 26 is a graph depicting the unique blood cell RNA expression profiles of two genes, ALKBH1 and SLIRP, that flank the bidirectional promoter SLIRP described in Example 2. The tissue expression data is plotted as a function of nTPM (y-axis).
DETAILED DESCRIPTION
[0070] Various features and aspects of the invention are discussed in more detail below.
[0071] In particular, the disclosure provides nucleic acids, expression constructs, and vectors including a compact bidirectional promoter and a gene, wherein the compact bidirectional promoter is small enough to allow for the inclusion of a heterologous coding sequence in a vector, such as an AAV vector, having a size limit that makes expression of genes difficult using conventional promoters. The disclosure herein also provides methods of identifying and using the same. Unless otherwise defined herein, scientific and technical terms used in this application shall have the meanings that are commonly understood by those of ordinary skill in the art.
[0072] Generally, nomenclature used in connection with, and techniques of, pharmacology, cell and tissue culture, molecular biology, cell and cancer biology, neurobiology, neurochemistry, virology, immunology, microbiology, genetics, and protein and nucleic acid chemistry, described herein, are those well-known and commonly used in the art. In case of conflict, the present specification, including definitions, will control.
[0073] The practice of the present disclosure will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are within the skill of the art. Such techniques are explained fully in the literature, such as, Molecular Cloning: A Laboratory Manual, second edition (Sambrook et al., 1989) Cold Spring Harbor Press; Oligonucleotide Synthesis (M.J. Gait, ed., 1984); Methods in Molecular Biology, Humana Press; Cell Biology: A Laboratory Notebook (J.E. Celis, ed., 1998) Academic Press; Animal Cell Culture (R.I. Freshney, ed., 1987); Introduction to Cell and Tissue Culture (J.P. Mather and P.E. Roberts, 1998) Plenum Press; Cell and Tissue Culture: Laboratory Procedures (A. Doyle, J.B. Griffiths, and D.G. Newell, eds., 1993-1998) J. Wiley and Sons; Methods in Enzymology (Academic Press, Inc.); Gene Transfer Vectors for Mammalian Cells (J.M. Miller and M.P. Calos, eds., 1987); Current Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987); PCR: The Polymerase Chain Reaction, (Mullis et al., eds., 1994); Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3rd. ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (2001); Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, NY (2002); Harlow and Lane Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1998); Coligan et al., Short Protocols in Protein Science, John Wiley & Sons, NY (2003); Short Protocols in Molecular Biology (Wiley and Sons, 1999).
[0074] Enzymatic reactions and purification techniques are performed according to manufacturer’s specifications, as commonly accomplished in the art or as described herein. The nomenclatures used in connection with, and the laboratory procedures and techniques of, analytical chemistry, biochemistry, immunology, molecular biology, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art. Standard techniques are used for chemical syntheses, and chemical analyses.
[0075] Throughout this specification and embodiments, the word “comprise,” or variations such as “comprises” or “comprising,” will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
[0076] It is understood that wherever embodiments are described herein with the language “comprising,” otherwise analogous embodiments described in terms of “consisting of” and/or “consisting essentially of” are also provided.
[0077] The term “including” is used to mean “including but not limited to.” “Including” and “including but not limited to” are used interchangeably.
[0078] Any example(s) following the term “e.g.” or “for example” is not meant to be exhaustive or limiting.
[0079] Unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
[0080] Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the disclosure are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in their respective testing measurements. Moreover, all ranges disclosed herein are to be understood to encompass any and all subranges subsumed therein. For example, a stated range of “1 to 10” should be considered to include any and all subranges between (and inclusive of) the minimum value of 1 and the maximum value of 10; that is, all subranges beginning with a minimum value of 1 or more, e.g., 1 to 6.1, and ending with a maximum value of 10 or less, e.g., 5.5 to 10.
[0081] Where aspects or embodiments of the disclosure are described in terms of a Markush group or other grouping of alternatives, the present disclosure encompasses not only the entire group listed as a whole, but each member of the group individually and all possible subgroups of the main group, but also the main group absent one or more of the group members. The present disclosure also envisages the explicit exclusion of one or more of any of the group members in an embodiment of the disclosure.
[0082] Exemplary methods and materials are described herein, although methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure. The materials, methods, and examples are illustrative only and not intended to be limiting.
I. Definitions
[0083] The following terms, unless otherwise indicated, shall be understood to have the following meanings:
[0084] The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element. Reference to “about” a value or parameter herein includes (and describes) embodiments that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X.” Numeric ranges are inclusive of the numbers defining the range. Where the use of the term “about” is before a quantitative value, the present invention also includes the specific quantitative value itself, unless specifically stated otherwise. As used herein, the term “about” refers to a ± 10% variation from the nominal value unless otherwise indicated or inferred.
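By way of illustration only, the ± 10% reading of “about” can be expressed as a simple interval check. The following Python sketch is not part of the disclosure; the function names `about_range` and `is_about` are hypothetical, and treating the interval end points as inclusive is an assumption based on the statement that numeric ranges are inclusive of the numbers defining the range.

```python
from typing import Tuple


def about_range(nominal: float, tolerance: float = 0.10) -> Tuple[float, float]:
    """Return the (low, high) bounds implied by "about nominal".

    The default tolerance of 0.10 mirrors the +/- 10% variation stated in the
    definition; other tolerances apply where "otherwise indicated or inferred".
    """
    delta = abs(nominal) * tolerance
    return (nominal - delta, nominal + delta)


def is_about(value: float, nominal: float, tolerance: float = 0.10) -> bool:
    """True if value lies within the "about" interval (end points inclusive, an assumption)."""
    low, high = about_range(nominal, tolerance)
    return low <= value <= high
```

Under these assumptions, `is_about(95, 100)` holds because 95 lies between 90 and 110, whereas `is_about(89, 100)` does not.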
[0085] As used herein, the term “adeno-associated virus” (AAV) refers to a vector derived from an adeno-associated virus serotype, including without limitation, AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV.rh8, AAV.rh10, AAV.rh20, AAV.rh39, AAV.Rh74, AAV.RHM4-1, AAV.hu37, AAV.Anc80, AAV.Anc80L65, AAV.7m8, AAV.PHP.B, AAV.PHP.EB, AAV2.5, AAV2tYF, AAV3B, AAV.LK03, AAV.HSC1, AAV.HSC2, AAV.HSC3, AAV.HSC4, AAV.HSC5, AAV.HSC6, AAV.HSC7, AAV.HSC8, AAV.HSC9, AAV.HSC10, AAV.HSC11, AAV.HSC12, AAV.HSC13, AAV.HSC14, AAV.HSC15, AAV-TT, AAV-DJ8, and AAV.HSC16. AAV vectors can have one or more of the AAV wild-type genes deleted in whole or part, e.g., the Rep and/or Cap genes, but retain functional flanking inverted terminal repeat (ITR) sequences. Functional ITR sequences promote the rescue, replication, and packaging of the AAV virion. Thus, an AAV vector is defined herein to include at least those sequences required in cis for replication and packaging (e.g., functional ITRs) of the virus. ITRs do not need to be the wildtype polynucleotide sequences and may be altered, e.g., by the insertion, deletion, or substitution of nucleotides, so long as the sequences provide for functional rescue, replication, and packaging. AAV expression vectors are constructed using known techniques to at least provide as operatively linked components in the direction of transcription, control elements including a transcriptional initiation region, the DNA of interest (e.g., a polynucleotide encoding a nucleic acid molecule of the disclosure) and a transcriptional termination region. The terms “adeno-associated virus inverted terminal repeats” and “AAV ITRs” refer to art-recognized regions flanking each end of the AAV genome which function together in cis as origins of DNA replication and as packaging signals for the virus.
AAV ITRs, together with the AAV Rep coding region, provide for the efficient excision and integration of a polynucleotide sequence interposed between two flanking ITRs into a mammalian genome. The polynucleotide sequences of AAV ITR regions are known. As used herein, an “AAV ITR” does not necessarily include the wild-type polynucleotide sequence, which may be altered, e.g., by the insertion, deletion, or substitution of nucleotides. Additionally, the AAV ITR may be derived from any of several AAV serotypes, including without limitation AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV.rh8, AAV.rh10, AAV.rh20, AAV.rh39, AAV.Rh74, AAV.RHM4-1, AAV.hu37, AAV.Anc80, AAV.Anc80L65, AAV.7m8, AAV.PHP.B, AAV.PHP.EB, AAV2.5, AAV2tYF, AAV3B, AAV.LK03, AAV.HSC1, AAV.HSC2, AAV.HSC3, AAV.HSC4, AAV.HSC5, AAV.HSC6, AAV.HSC7, AAV.HSC8, AAV.HSC9, AAV.HSC10, AAV.HSC11, AAV.HSC12, AAV.HSC13, AAV.HSC14, AAV.HSC15, AAV-TT, AAV-DJ8, and AAV.HSC16, among others. Furthermore, 5' and 3' ITRs which flank a selected polynucleotide sequence in an AAV vector need not be identical or derived from the same AAV serotype or isolate, so long as they function as intended, e.g., to allow for excision and rescue of the sequence of interest from a host cell genome or vector, and to allow integration of the heterologous sequence into the recipient cell genome when AAV Rep gene products are present in the cell. Additionally, AAV ITRs may be derived from any of several AAV serotypes, including without limitation, AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV.rh8, AAV.rh10, AAV.rh20, AAV.rh39, AAV.Rh74, AAV.RHM4-1, AAV.hu37, AAV.Anc80, AAV.Anc80L65, AAV.7m8, AAV.PHP.B, AAV.PHP.EB, AAV2.5, AAV2tYF, AAV3B, AAV.LK03, AAV.HSC1, AAV.HSC2, AAV.HSC3, AAV.HSC4, AAV.HSC5, AAV.HSC6, AAV.HSC7, AAV.HSC8, AAV.HSC9, AAV.HSC10, AAV.HSC11, and AAV.HSC12.
[0086] An “AAV inverted terminal repeat (ITR)” sequence, a term well-understood in the art, is an approximately 145-nucleotide sequence that is present at both termini of the native single-stranded AAV genome. The outermost 125 nucleotides of the ITR can be present in either of two alternative orientations, leading to heterogeneity between different AAV genomes and between the two ends of a single AAV genome. The outermost 125 nucleotides also contain several shorter regions of self-complementarity (designated A, A', B, B', C, C', and D regions), allowing intrastrand base-pairing to occur within this portion of the ITR.
[0087] “Administering” or “administration” of a substance, a compound, or an agent to a subject can be carried out using one of a variety of methods known to those skilled in the art. In some embodiments, administration may be local. In other embodiments, administration may be systemic. Administering can also be performed, for example, once, a plurality of times, and/or over one or more extended periods. In some aspects, the administration includes both direct administration, including self-administration, and indirect administration, including the act of prescribing a drug. For example, as used herein, a physician who instructs a subject to self-administer a drug, or to have the drug administered by another and/or who provides a subject with a prescription for a drug is administering the drug to the subject.
[0088] It should be understood that the expression of “at least one of” includes individually each of the recited objects after the expression and the various combinations of two or more of the recited objects unless otherwise understood from the context and use. The expression “and/or” in connection with three or more recited objects should be understood to have the same meaning unless otherwise understood from the context.
[0089] As used herein, a “coding sequence” is a portion of a nucleic acid that contains codons that can be translated into amino acids. Although a “stop codon” (TAG, TGA, and TAA) is not translated into an amino acid, it may be considered to be part of a coding region, if present, but any flanking sequences, for example, promoters, ribosome binding sites, transcriptional terminators, introns, 5' and 3' untranslated regions, and the like, are not part of the coding region.
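As a non-limiting illustration of the coding sequence definition above, the following Python sketch splits a DNA coding sequence into codons and checks whether it ends in one of the recited stop codons (TAG, TGA, or TAA). The function names `split_codons` and `has_terminal_stop` are hypothetical, and the sketch assumes that the input is already in frame and contains no flanking sequences.

```python
from typing import List

# Stop codons recited in the definition of "coding sequence" (DNA alphabet).
STOP_CODONS = {"TAG", "TGA", "TAA"}


def split_codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into consecutive three-nucleotide codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def has_terminal_stop(cds: str) -> bool:
    """True if the final codon of the coding sequence is a stop codon."""
    triplets = split_codons(cds)
    return bool(triplets) and triplets[-1] in STOP_CODONS
```

For example, `split_codons("ATGGCCTAA")` yields the codons ATG, GCC, and TAA, and `has_terminal_stop` reports that the terminal stop codon is present.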
[0090] As used herein, “codon optimization” refers to the process of modifying a nucleic acid sequence in accordance with the principle that the frequency of occurrence of synonymous codons (e.g., codons that code for the same amino acid) in coding DNA is biased in different species. Such codon degeneracy allows an identical polypeptide to be encoded by a variety of nucleotide sequences. Sequences modified in this way are referred to herein as “codon optimized.” This process may be performed on any of the sequences described in this specification to enhance expression or stability. Codon optimization may be performed in a manner, such as that described in, e.g., U.S. Patent Nos. 7,561,972; 7,561,973; and 7,888,112, the entire contents of each of which is incorporated herein by reference. The sequence surrounding the translational start site can be converted to a consensus Kozak sequence according to known methods. See, e.g., Kozak et al. (Nucleic Acids Res 15(20): 8125-8148, 1987), the entire contents of which is hereby incorporated by reference. In some embodiments, codon optimization includes the incorporation of multiple stop codons.
[0091] Throughout this specification and embodiments, the word “include,” or variations such as “includes” or “including,” will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
It is understood that wherever embodiments are described herein with the language “including,” otherwise analogous embodiments described in terms of “consisting of” and/or “consisting essentially of” are also provided.
[0092] The term “consensus sequence,” as used herein in the context of nucleic acid sequences, refers to a calculated sequence representing the most frequent nucleotide residues found at each position in a plurality of similar sequences. Typically, a consensus sequence is determined by sequence alignment in which similar sequences are compared to each other and similar sequence motifs are calculated.
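The calculation described in the definition of “consensus sequence” can be sketched, for illustration only, as a column-wise majority vote over sequences that have already been aligned to equal length. The Python function name `consensus` is hypothetical, and breaking ties in favor of the alphabetically first residue is an assumption of this sketch rather than a rule taken from the disclosure.

```python
from collections import Counter
from typing import Sequence


def consensus(aligned: Sequence[str]) -> str:
    """Return the most frequent residue at each position of pre-aligned sequences."""
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        top = max(counts.values())
        # Tie-break (assumption): choose the alphabetically first residue.
        result.append(min(base for base, n in counts.items() if n == top))
    return "".join(result)
```

For the aligned inputs ACGT, ACGA, and ACTT, this sketch returns ACGT, because G and T are the majority residues at the third and fourth positions, respectively.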
[0093] A “deletion” may include the deletion of subject amino acids, deletion of small groups of amino acids such as 2, 3, 4, or 5 amino acids, or deletion of larger amino acid regions, such as the deletion of specific amino acid domains or other features.
[0094] Any example(s) following the terms “e.g.” or “for example” are not meant to be exhaustive or limiting.
[0095] As used herein, the term “functional fragment” refers to a fragment of (a) a promoter or (b) a gene or coding sequence (e.g., an mRNA) that encodes a protein that retains, for example, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of at least one activity of the corresponding full-length, naturally occurring promoter or protein. The term “fragment of,” or “fragment thereof,” as used herein, refers to a segment (e.g., a segment of at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or at least about 99.9%) of the full length gene(s) or nucleic acid molecule(s) of interest.
[0096] A “helper virus” for AAV refers to a virus that allows AAV (which is a defective parvovirus) to be replicated and packaged by a host cell. A number of such helper viruses are known in the art.
[0097] As used herein, the term “heterologous” refers to regions that are not normally associated with a particular nucleic acid in nature. For example, a “coding region heterologous to a promoter” is a coding region that is not normally associated with the promoter in nature.
[0098] As used herein, a “host cell” includes an individual cell or cell culture that can be or has been a recipient for vector(s) for incorporation of polynucleotide inserts. The term host cell may refer to the packaging cell line in which a recombinant AAV (rAAV) is produced from a plasmid. In the alternative, the term “host cell” may refer to a target cell in which expression of a transgene is desired.
[0099] The use of the terms “include,” “includes,” “including,” “have,” “has,” “having,” “contain,” “contains,” or “containing,” including grammatical equivalents thereof, should be understood generally as open-ended and non-limiting, for example, not excluding additional unrecited elements or steps, unless otherwise specifically stated or understood from the context.
[0100] An “insertion” may include the insertion of subject amino acids, insertion of small groups of amino acids such as 2, 3, 4, or 5 amino acids, or insertion of larger amino acid regions, such as the insertion of specific amino acid domains or other features.
[0101] An “inverted terminal repeat” or “ITR” sequence is a term well understood in the art and refers to relatively short sequences found at the termini of viral genomes which are in opposite orientation.
[0102] As used herein, “isolated molecule” (where the molecule is, for example, a polypeptide, a polynucleotide, or fragment thereof) is a molecule that by virtue of its origin or source of derivation (1) is not associated with one or more naturally-associated components that accompany it in its native state, (2) is substantially free of one or more other molecules from the same species, (3) is expressed by a cell from a different species, or (4) does not occur in nature.
[0103] “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression.
[0104] The terms “patient,” “subject,” and “individual” are used interchangeably herein and refer to either a human or a non-human animal. These terms include mammals, such as humans, non-human primates, laboratory animals, livestock animals (including bovines, porcines, camels, etc.), companion animals (e.g., canines, felines, other domesticated animals, etc.) and rodents (e.g., mice and rats). In some embodiments, the subject is a human that is at least 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, or 95 years of age.
[0105] “Percent (%) sequence identity” or “percent (%) identical to” with respect to a reference polypeptide (or nucleotide) sequence is defined as the percentage of amino acid residues (or nucleic acids) in a candidate sequence that are identical with the amino acid residues (or nucleic acids) in the reference polypeptide (nucleotide) sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
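For illustration only, percent sequence identity can be computed from two sequences that have already been aligned (for instance, by one of the programs named above) with gaps written as “-”. The Python function name `percent_identity` is hypothetical. Dividing by the number of alignment columns is an assumption of this sketch; other denominators, such as the length of the candidate or the reference sequence, are also in common use and give different values when gaps are present.

```python
def percent_identity(candidate_aln: str, reference_aln: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Identical residues are counted column by column; a column containing a gap
    never counts as a match. The denominator is the number of alignment
    columns, which is an assumption of this sketch.
    """
    if len(candidate_aln) != len(reference_aln):
        raise ValueError("aligned sequences must have the same length")
    if not candidate_aln:
        raise ValueError("alignment must not be empty")

    matches = sum(
        1
        for a, b in zip(candidate_aln.upper(), reference_aln.upper())
        if a == b and a != gap
    )
    return 100.0 * matches / len(candidate_aln)
```

Under these assumptions, the aligned pair ACGT and ACGA shares three of four columns and is 75% identical, while ACGT-A aligned against ACCTTA shares four of six columns.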
[0106] As known in the art, “polynucleotide,” or “nucleic acid,” are used interchangeably herein and refer to chains of nucleotides of any length, and include DNA and RNA. The nucleotides can be deoxyribonucleotides, ribonucleotides, modified nucleotides or bases, and/or their analogs, or any substrate that can be incorporated into a chain by DNA or RNA polymerase. A polynucleotide may include modified nucleotides, such as methylated nucleotides and their analogs. If present, modification to the nucleotide structure may be imparted before or after assembly of the chain. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component. Other types of modifications include, for example, “caps,” substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoamidates, carbamates) and with charged linkages (e.g., phosphorothioates, phosphorodithioates), those containing pendant moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine), those with intercalators (e.g., acridine, psoralen), those containing chelators (e.g., metals, radioactive metals, boron, oxidative metals), those containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids), as well as unmodified forms of the polynucleotide(s). Further, any of the hydroxyl groups ordinarily present in the sugars may be replaced, for example, by phosphonate groups, phosphate groups, protected by standard protecting groups, or activated to prepare additional linkages to additional nucleotides, or may be conjugated to solid supports. The 5' and 3' terminal OH can be phosphorylated or substituted with amines or organic capping group moieties of from 1 to 20 carbon atoms.
Other hydroxyls may also be derivatized to standard protecting groups. Polynucleotides can also contain analogous forms of ribose or deoxyribose sugars that are generally known in the art, including, for example, 2’-O-methyl-, 2’-O-allyl, 2’-fluoro- or 2’-azido-ribose, carbocyclic sugar analogs, alpha- or beta-anomeric sugars, epimeric sugars such as arabinose, xyloses or lyxoses, pyranose sugars, furanose sugars, sedoheptuloses, acyclic analogs, and abasic nucleoside analogs, such as methyl riboside. One or more phosphodiester linkages may be replaced by alternative linking groups. These alternative linking groups include, but are not limited to, embodiments wherein phosphate is replaced by P(O)S (“thioate”), P(S)S (“dithioate”), (O)NR2 (“amidate”), P(O)R, P(O)OR’, CO or CH2 (“formacetal”), in which each R or R' is independently H or substituted or unsubstituted alkyl (1-20 C) optionally containing an ether (-O-) linkage, aryl, alkenyl, cycloalkyl, cycloalkenyl or aralkyl. Not all linkages in a polynucleotide need be identical. The preceding description applies to all polynucleotides referred to herein, including RNA and DNA.
[0107] IUPAC nucleotide code is used throughout. IUPAC nucleotide code is provided in Table 1, below.
Table 1. IUPAC nucleotide code
[Table 1 is provided as an image in the original publication.]
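For illustration only, the standard IUPAC nucleotide ambiguity codes can be represented as a lookup from each one-letter code to the set of bases it denotes. The mapping in the following Python sketch is taken from the general IUPAC nomenclature rather than transcribed from Table 1, and the names `IUPAC_DNA` and `matches_iupac` are hypothetical.

```python
# Standard IUPAC one-letter nucleotide codes (DNA alphabet) and the bases each denotes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # purine
    "Y": "CT",   # pyrimidine
    "S": "CG",   # strong pairing
    "W": "AT",   # weak pairing
    "K": "GT",   # keto
    "M": "AC",   # amino
    "B": "CGT",  # not A
    "D": "AGT",  # not C
    "H": "ACT",  # not G
    "V": "ACG",  # not T
    "N": "ACGT", # any base
}


def matches_iupac(pattern: str, sequence: str) -> bool:
    """True if a concrete DNA sequence is consistent with an IUPAC-coded pattern."""
    pattern, sequence = pattern.upper(), sequence.upper()
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC_DNA[code] for code, base in zip(pattern, sequence))
```

For example, the pattern RYN is satisfied by ACG (A is a purine, C is a pyrimidine, and N accepts any base) but not by CCG, whose first base is not a purine.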
[0108] The terms “polypeptide,” “oligopeptide,” “peptide,” and “protein” are used interchangeably herein to refer to chains of amino acids of any length. The chain may be linear or branched, it may include modified amino acids and/or may be interrupted by non-amino acids. The terms also encompass an amino acid chain that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component. Also included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (e.g., unnatural amino acids), as well as other modifications known in the art. It is understood that the polypeptides can occur as single chains or associated chains.
[0109] As used herein, the term “promoter” refers to a recognition site on DNA that is bound by an RNA polymerase. The polymerase drives transcription of a transgene. Exemplary promoters suitable for use with the compositions and methods described herein are described herein. Additionally, the term “promoter” may refer to a synthetic promoter, such as a regulatory DNA sequence that does not occur naturally in a biological system. Synthetic promoters contain parts of naturally occurring promoters combined with polynucleotide sequences that do not occur in nature and can be optimized to express recombinant DNA.
[0110] A “recombinant adeno-associated virus (rAAV virus)” or “rAAV viral particle” refers to a viral particle composed of at least one AAV capsid protein and an encapsidated rAAV vector genome.
[0111] A “recombinant AAV vector (rAAV vector)” refers to a polynucleotide vector based on an AAV including one or more heterologous sequences (i.e., nucleic acid sequence not of AAV origin) that are flanked by at least one AAV ITR. Such rAAV vectors can be replicated and packaged into infectious viral particles when present in a host cell that has been infected with a suitable helper virus (or that is expressing suitable helper functions) and that is expressing AAV Rep and Cap gene products (i.e., AAV Rep and Cap proteins). When a rAAV vector is incorporated into a larger polynucleotide (e.g., in a chromosome or in another vector such as a plasmid used for cloning or transfection), then the rAAV vector may be referred to as a “provector” which can be “rescued” by replication and encapsidation in the presence of AAV packaging functions and suitable helper functions. An rAAV vector can be in any of a number of forms, including, but not limited to, plasmids, linear artificial chromosomes, complexed with lipids, encapsulated within liposomes, and encapsidated in a viral particle, e.g., an AAV particle. An rAAV vector can be packaged into an AAV virus capsid to generate a “recombinant adeno-associated viral particle (rAAV particle)”.
[0112] The term “regulatory element” or “regulatory sequence” is intended to include promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory sequences are described, for example, in Goeddel (1990) Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego Calif. Regulatory sequences include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g., liver or pancreas), or particular cell types (e.g., lymphocytes). Regulatory sequences may also direct expression in a temporal-dependent manner, such as in a cell cycle-dependent or developmental stage-dependent manner, which may or may not also be tissue- or cell type-specific. Also encompassed by the term “regulatory element” are enhancer elements, such as WPRE; CMV enhancers; the R-U5' segment in LTR of HTLV-I (Takebe et al. (1988) MOL. CELL. BIOL. 8:466-472); SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (O'Hare et al. (1981) PROC. NATL. ACAD. SCI. USA. 78(3): 1527-31). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression desired, etc.
[0113] As used herein, “residue” refers to a position in a protein and its associated amino acid identity.
[0114] A “substitution” includes replacing a wild-type amino acid with another (e.g., a non-wild-type amino acid). In some embodiments, the another (e.g., non-wild-type) or inserted amino acid is Ala (A), His (H), Lys (K), Phe (F), Met (M), Thr (T), Gln (Q), Asp (D), or Glu (E). In some embodiments, the another (e.g., non-wild-type) or inserted amino acid is A. In some embodiments, the another (e.g., non-wild-type) amino acid is Arg (R), Asn (N), Cys (C), Gly (G), Ile (I), Leu (L), Pro (P), Ser (S), Trp (W), Tyr (Y), or Val (V). Conventional or naturally occurring amino acids are divided into the following basic groups based on common side-chain properties: (1) non-polar: Norleucine, Met, Ala, Val, Leu, and Ile; (2) polar without charge: Cys, Ser, Thr, Asn, and Gln; (3) acidic (negatively charged): Asp and Glu; (4) basic (positively charged): Lys and Arg; (5) residues that influence chain orientation: Gly and Pro; and (6) aromatic: Trp, Tyr, Phe and His. Conventional amino acids include L or D stereochemistry. In some embodiments, the another (e.g., non-wild-type) amino acid is a member of a different group (e.g., an aromatic amino acid is substituted for a non-polar amino acid). Substantial modifications in the biological properties of the polypeptide are accomplished by selecting substitutions that differ significantly in their effect on maintaining (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a β-sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
Naturally occurring residues are divided into groups based on common sidechain properties: (1) non-polar: Norleucine, Met, Ala, Val, Leu, and Ile; (2) polar without charge: Cys, Ser, Thr, Asn, and Gln; (3) acidic (negatively charged): Asp and Glu; (4) basic (positively charged): Lys and Arg; (5) residues that influence chain orientation: Gly and Pro; and (6) aromatic: Trp, Tyr, Phe, and His. In some embodiments, the another (e.g., non-wild-type) amino acid is a member of a different group (e.g., a hydrophobic amino acid for a hydrophilic amino acid, a charged amino acid for a neutral amino acid, or an acidic amino acid for a basic amino acid). In some embodiments, the another (e.g., non-wild-type) amino acid is a member of the same group (e.g., another basic amino acid, another acidic amino acid, another neutral amino acid, another charged amino acid, another hydrophilic amino acid, another hydrophobic amino acid, another polar amino acid, another aromatic amino acid, or another aliphatic amino acid). In some embodiments, the another (e.g., non-wild-type) amino acid is an unconventional amino acid. Unconventional amino acids are non-naturally occurring amino acids. Examples of an unconventional amino acid include, but are not limited to, aminoadipic acid, beta-alanine, beta-aminopropionic acid, aminobutyric acid, piperidinic acid, aminocaproic acid, aminoheptanoic acid, aminoisobutyric acid, aminopimelic acid, citrulline, diaminobutyric acid, desmosine, diaminopimelic acid, diaminopropionic acid, N-ethylglycine, N-ethylasparagine, hydroxylysine, allo-hydroxylysine, hydroxyproline, isodesmosine, allo-isoleucine, N-methylglycine, sarcosine, N-methylisoleucine, N-methylvaline, norvaline, norleucine, ornithine, 4-hydroxyproline, γ-carboxyglutamate, ε-N,N,N-trimethyllysine, ε-N-acetyllysine, O-phosphoserine, N-acetylserine, N-formylmethionine, 3-methylhistidine, 5-hydroxylysine, σ-N-methylarginine, and other similar amino acids and imino acids (e.g., 4-hydroxyproline).
[0115] The term “transgene” refers to a polynucleotide that is introduced into a cell and is capable of being transcribed into RNA and optionally, translated and/or expressed under appropriate conditions. In aspects, it confers a desired property to a cell into which it was introduced, or otherwise leads to a desired therapeutic or diagnostic outcome.
[0116] “Treating” a condition or subject refers to taking steps to obtain beneficial or desired results, including clinical results. With respect to a disease or condition, treatment refers to the reduction or amelioration of the progression, severity, and/or duration of one or more symptoms of the disease, or the amelioration of one or more symptoms resulting from the administration of one or more therapies (including, but not limited to, the administration of one or more prophylactic or therapeutic agents).
[0117] As used herein, the term “variant” refers to a variant of (a) a promoter or (b) a gene or coding sequence (e.g., an mRNA) that encodes a protein that retains, for example, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of at least one activity of the corresponding full-length, naturally occurring promoter or protein. For example, a variant can include a splice variant or a gene including a mutation such as an insertion, deletion, or substitution.
[0118] As used herein, the term “vector” includes a nucleic acid vector, e.g., a DNA vector, such as a plasmid, an RNA vector, or another suitable replicon (e.g., viral vector). A variety of vectors have been developed for the delivery of polynucleotides encoding exogenous polynucleotides or proteins into a prokaryotic or eukaryotic cell. Examples of such expression vectors are disclosed in, e.g., WO 1994/011026; incorporated herein by reference as it pertains to vectors suitable for the expression of a nucleic acid molecule of interest. Expression vectors suitable for use with the compositions and methods described herein contain a polynucleotide sequence as well as, e.g., additional sequence elements used for the expression of heterologous nucleic acid materials (e.g., a nucleic acid molecule) in a mammalian cell. Certain vectors that can be used for the expression of the nucleic acid molecules described herein include plasmids that contain regulatory sequences, such as promoter and enhancer regions, which direct gene transcription. In some embodiments, the compact bidirectional promoters do not contain an enhancer. Other useful vectors for expression of nucleic acid molecule agents disclosed herein contain polynucleotide sequences that enhance the rate of translation of these polynucleotides or improve the stability or nuclear export of the RNA that results from gene transcription. These sequence elements include, e.g., 5' and 3' untranslated regions, an internal ribosomal entry site (IRES), and polyadenylation signal (poly A) in order to direct efficient transcription of the gene carried on the expression vector. The expression vectors suitable for use with the compositions and methods described herein may also contain a polynucleotide encoding a marker for selection of cells that contain such a vector. Examples of a suitable marker are genes that encode resistance to antibiotics, such as ampicillin, chloramphenicol, kanamycin, nourseothricin, or zeocin.
[0119] In some embodiments, a vector comprises one or more pol II promoters. Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) (e.g., Boshart et al. (1985) Cell 41:521-530), the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter.
[0120] A vector can be introduced into host cells to thereby produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., clustered regularly interspaced short palindromic repeats (CRISPR) transcripts, proteins, enzymes, mutant forms thereof, fusion proteins thereof, etc.). Advantageous vectors include lentiviruses and AAVs, and types of such vectors can also be selected for targeting particular types of cells.
[0121] The term “vector genome (vg)” as used herein may refer to one or more polynucleotides comprising a set of the polynucleotide sequences of a vector, e.g., a viral vector. A vector genome may be encapsidated in a viral particle. Depending on the particular viral vector, a vector genome may comprise single-stranded DNA, double-stranded DNA, single-stranded RNA, or double-stranded RNA. A vector genome may include endogenous sequences associated with a particular viral vector and/or any heterologous sequences inserted into a particular viral vector through recombinant techniques. For example, a recombinant AAV vector genome may include at least one ITR sequence flanking a promoter, a stuffer, a sequence of interest, and a polyadenylation sequence. A complete vector genome may include a complete set of the polynucleotide sequences of a vector. In some embodiments, the nucleic acid titer of a viral vector may be measured in terms of vg/mL. Methods suitable for measuring this titer are known in the art (e.g., quantitative PCR).
[0122] As used herein the term “wild-type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene, or characteristic as it occurs in nature as distinguished from mutant or variant forms.
[0123] Each embodiment described herein may be used individually or in combination with any other embodiment described herein.
II. Compact Bidirectional Promoters
[0124] The present disclosure provides, among other things, compact bidirectional promoters that can effectively drive expression of genes useful in, for example, gene therapy applications such as those involving AAV. In some embodiments, the bidirectional promoter is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter.
[0125] In some embodiments, the compact bidirectional promoter is operably linked to at least one (e.g., two) heterologous coding sequence. For example, in some embodiments, the compact bidirectional promoter is operably linked to two heterologous coding sequences.
[0126] In some embodiments, the compact bidirectional promoter promotes transcription of a heterologous coding sequence by an RNA polymerase II (“pol II”). For example, in some embodiments, the compact bidirectional promoter promotes transcription of a first heterologous coding sequence in one direction (e.g., on one strand of a DNA molecule), and a second heterologous coding sequence in another direction (e.g., on the opposite strand of the DNA molecule), as shown in FIG. 1. In some embodiments, the heterologous promoter does not promote transcription by an RNA polymerase III (“pol III”) (i.e., the promoter is not a pol III promoter).
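The divergent arrangement described in paragraph [0126] can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and are not taken from the disclosure. The sketch only shows that, on a single written strand, the first coding sequence is stored upstream of the promoter as its reverse complement so that it is read from the opposite strand.

```python
# Illustrative sketch only: names and toy sequences are assumptions,
# not sequences from the disclosure.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_bidirectional_cassette(cds_minus: str, promoter: str, cds_plus: str) -> str:
    """Lay out <-CDS1 | promoter | CDS2-> on the top strand.

    cds_minus is supplied 5'->3' in its own sense orientation and is
    stored reverse-complemented, so it is transcribed from the opposite
    strand, away from the promoter.
    """
    return reverse_complement(cds_minus) + promoter + cds_plus


cassette = assemble_bidirectional_cassette("ATGAAATAA", "GGGCGGGGC", "ATGCCCTGA")
print(cassette)
```

Reading the bottom strand of the printed cassette from right to left recovers the first coding sequence in its sense orientation, which is what "on the opposite strand" means here.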
[0127] In some embodiments, the compact bidirectional promoter is less than about 1000 base pairs (bp) (e.g., less than about 800 bp, less than about 600 bp, less than about 400 bp, or less than about 200 bp). For example, in some embodiments, the promoter is less than about 800 bp. In some embodiments, the promoter is less than about 600 bp. In some embodiments, the promoter is less than about 400 bp. In some embodiments, the promoter is less than about 200 bp.
[0128] In some embodiments, the compact bidirectional promoter is between about 30 bp and about 800 bp (e.g., between about 31 bp and about 750 bp, between about 32 bp and about 700 bp, between about 33 bp and about 600 bp, between about 34 bp and about 500 bp, between about 35 bp and about 400 bp, between about 36 bp and about 300 bp, between about 37 bp and about 250 bp, between about 40 bp and about 200 bp, or between about 50 bp and about 100 bp). For example, in some embodiments, the promoter is between about 31 bp and about 750 bp. In some embodiments, the promoter is between about 32 bp and about 700 bp. In some embodiments, the promoter is between about 33 bp and about 600 bp. In some embodiments, the promoter is between about 34 bp and about 500 bp. In some embodiments, the promoter is between about 35 bp and about 400 bp. In some embodiments, the promoter is between about 36 bp and about 300 bp. In some embodiments, the promoter is between about 37 bp and about 250 bp. In some embodiments, the promoter is between about 40 bp and about 200 bp. In some embodiments, the promoter is between about 50 bp and about 100 bp.
[0129] In some embodiments, the compact bidirectional promoter is smaller than a CMV promoter.
[0130] In some embodiments, the compact bidirectional promoter is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter, as shown in FIG. 1.
[0131] In some embodiments, the promoter includes a nucleotide sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. For example, in some embodiments, the promoter includes a nucleotide sequence having at least 86% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 87% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 88% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 89% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 91% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 92% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 93% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 94% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. 
In some embodiments, the promoter includes a nucleotide sequence having at least 95% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 96% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 97% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 98% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 99% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 99.5% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having 100% sequence identity to any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes the nucleotide sequence of any one of SEQ ID NOs: 1-800 or a functional fragment or variant thereof.
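As a rough illustration of the identity thresholds recited in paragraph [0131], the following Python sketch computes percent identity for two sequences that have already been aligned. It assumes one common convention (matching columns divided by all alignment columns, with gap columns counted as mismatches). This passage does not say which alignment tool or convention applies, so the function names and the convention are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pairwise alignment ('-' marks a gap).

    Assumed convention: identical columns / total alignment columns.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 85.0) -> bool:
    """True if the aligned pair reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: one mismatch in ten aligned positions.
print(percent_identity("ACGTACGTAC", "ACGTACGAAC"))
print(meets_identity_threshold("ACGTACGTAC", "ACGTACGAAC", 95.0))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the convention should be fixed before comparing a candidate against a threshold.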
[0132] In some embodiments, the promoter includes a nucleotide sequence derived from an origin species, such as Homo sapiens or Mus musculus. For example, in some embodiments, the promoter includes a nucleotide sequence derived from a Homo sapiens promoter, such as a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. Alternatively, for example, in some embodiments, the promoter includes a nucleotide sequence derived from a Mus musculus promoter, such as a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof.
[0133] In some embodiments, the promoter includes a nucleotide sequence having at least 85% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. For example, in some embodiments, the promoter includes a nucleotide sequence having at least 86% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 87% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 88% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 89% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 91% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 92% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 93% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 94% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 95% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. 
In some embodiments, the promoter includes a nucleotide sequence having at least 96% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 97% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 98% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 99% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 99.5% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having 100% sequence identity to any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof. In some embodiments, the promoter includes the nucleotide sequence of any one of SEQ ID NOs: 1-400 or a functional fragment or variant thereof.
[0134] In some embodiments, the promoter includes a nucleotide sequence having at least 85% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. For example, in some embodiments, the promoter includes a nucleotide sequence having at least 86% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 87% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 88% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 89% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 91% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 92% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 93% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 94% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. 
In some embodiments, the promoter includes a nucleotide sequence having at least 95% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 96% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 97% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 98% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 99% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having at least 99.5% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes a nucleotide sequence having 100% sequence identity to any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof. In some embodiments, the promoter includes the nucleotide sequence of any one of SEQ ID NOs: 401-800 or a functional fragment or variant thereof.
[0135] In some embodiments, a functional fragment includes a truncation of from about 10 to about 70 (e.g., about 20, 30, 40, 50, or 60) bp at the 5' end, at the 3' end, or at each of the 5' and 3' ends of any one of SEQ ID NOs: 1-800 or a variant thereof (e.g., a variant having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-800). For example, in some embodiments, a functional fragment includes a truncation of about 20 bp at the 5' end, at the 3' end, or at each of the 5' and 3' ends of any one of SEQ ID NOs: 1-800 or a variant thereof. In some embodiments, a functional fragment includes a truncation of about 30 bp at the 5' end, at the 3' end, or at each of the 5' and 3' ends of any one of SEQ ID NOs: 1-800 or a variant thereof. In some embodiments, a functional fragment includes a truncation of about 40 bp at the 5' end, at the 3' end, or at each of the 5' and 3' ends of any one of SEQ ID NOs: 1-800 or a variant thereof. In some embodiments, a functional fragment includes a truncation of about 50 bp at the 5' end, at the 3' end, or at each of the 5' and 3' ends of any one of SEQ ID NOs: 1-800 or a variant thereof. In some embodiments, a functional fragment includes a truncation of about 60 bp at the 5' end, at the 3' end, or at each of the 5' and 3' ends of any one of SEQ ID NOs: 1-800 or a variant thereof. In some embodiments, a functional fragment includes a truncation of about 70 bp at the 5' end, at the 3' end, or at each of the 5' and 3' ends of any one of SEQ ID NOs: 1-800 or a variant thereof.
[0136] In some embodiments, the compact bidirectional promoter includes at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or 100% sequence identity to a naturally occurring mammalian promoter. For example, in some embodiments, the compact bidirectional promoter includes at least about 96% sequence identity to a naturally occurring mammalian promoter. In some embodiments, the compact bidirectional promoter includes at least about 97% sequence identity to a naturally occurring mammalian promoter. In some embodiments, the compact bidirectional promoter includes at least about 98% sequence identity to a naturally occurring mammalian promoter. In some embodiments, the compact bidirectional promoter includes at least about 99% sequence identity to a naturally occurring mammalian promoter. In some embodiments, the compact bidirectional promoter includes at least about 99.5% sequence identity to a naturally occurring mammalian promoter. In some embodiments, the compact bidirectional promoter includes 100% sequence identity to a naturally occurring mammalian promoter.
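The end truncations described in paragraph [0135] can be sketched in Python as simple slicing. The function name and the toy sequence are hypothetical. The sketch only generates candidate fragments; whether a given fragment remains functional would have to be established experimentally, for example with a reporter assay.

```python
def truncate_ends(seq: str, five_prime: int = 0, three_prime: int = 0) -> str:
    """Remove `five_prime` bp from the 5' end and `three_prime` bp from the 3' end.

    The sequence is assumed to be written 5'->3'. At least one base
    must remain after truncation.
    """
    if five_prime < 0 or three_prime < 0:
        raise ValueError("truncation lengths must be non-negative")
    if five_prime + three_prime >= len(seq):
        raise ValueError("truncation would remove the whole sequence")
    return seq[five_prime:len(seq) - three_prime]


# Toy 100 bp sequence (an assumption, not a sequence from the disclosure).
toy_promoter = "A" * 20 + "C" * 60 + "G" * 20

# Candidate fragments for the truncation sizes named in the text,
# applied at the 5' end, the 3' end, or both ends.
candidates = {}
for n in (20, 30, 40):
    candidates[("5'", n)] = truncate_ends(toy_promoter, five_prime=n)
    candidates[("3'", n)] = truncate_ends(toy_promoter, three_prime=n)
    candidates[("both", n)] = truncate_ends(toy_promoter, n, n)

for key, fragment in candidates.items():
    print(key, len(fragment))
```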
[0137] For example, in some embodiments, the compact bidirectional promoter includes at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or 100% sequence identity to a naturally occurring human promoter. In some embodiments, the compact bidirectional promoter includes at least about 96% sequence identity to a naturally occurring human promoter. In some embodiments, the compact bidirectional promoter includes at least about 97% sequence identity to a naturally occurring human promoter. In some embodiments, the compact bidirectional promoter includes at least about 98% sequence identity to a naturally occurring human promoter. In some embodiments, the compact bidirectional promoter includes at least about 99% sequence identity to a naturally occurring human promoter. In some embodiments, the compact bidirectional promoter includes at least about 99.5% sequence identity to a naturally occurring human promoter. In some embodiments, the compact bidirectional promoter includes 100% sequence identity to a naturally occurring human promoter.
[0138] In some embodiments, the compact bidirectional promoter or a functional fragment or variant thereof (e.g., a variant having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-800) has higher activity than standard promoters (e.g., higher activity than a herpes simplex virus (HSV) thymidine kinase (TK) promoter). For example, in some embodiments, the compact bidirectional promoter or a functional fragment or variant thereof (e.g., a variant having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-800) is capable of expressing a luciferase reporter at a higher level than an HSV TK promoter. The expression level of a compact bidirectional promoter can be determined, for example, by expressing a reporter molecule in a cell, e.g., a human embryonic kidney (HEK) cell line or an N2A cell line.
[0139] In some embodiments, the compact bidirectional promoter or a functional fragment or variant thereof (e.g., a variant having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-800) is capable of promoting expression of a gene in a tissue or a subset of tissues as identified in the Human Protein Atlas (HPA), FANTOM, or Genotype-Tissue Expression (GTEx) databases, for example, as shown in FIGs. 11A-25, and/or as shown in Appendix A of U.S. Provisional Application No. 63/403,571, the entire disclosure of which is hereby incorporated by reference in its entirety for all purposes. For example, the compact bidirectional promoter of SEQ ID NO: 17 (flanking genes ALKBH1 and SLIRP) can express a heterologous coding sequence at a low level in adipose tissue, adrenal glands, amygdala, basal ganglia, breast, cerebellum, cerebral cortex, cervix, uterine tissue, colon, endometrium, esophagus, fallopian tube, heart muscle, hippocampal formation, etc., as identified in the HPA, FANTOM, or GTEx databases, for example, as shown in FIGs. 11A-25, and/or as shown in Appendix A of U.S. Provisional Application No. 63/403,571, the entire disclosure of which is hereby incorporated by reference in its entirety for all purposes, if positioned on the side of the promoter where the ALKBH1 gene naturally occurs. In addition, the compact bidirectional promoter of SEQ ID NO: 17 can express a heterologous coding sequence at varying levels in adipose tissue, adrenal glands, amygdala, basal ganglia, breast, cerebellum, cerebral cortex, cervix, uterine tissue, colon, endometrium, esophagus, fallopian tube, heart muscle, hippocampal formation, etc., as identified in the HPA, FANTOM, or GTEx databases, for example, as shown in FIGs. 11A-25, and/or as shown in Appendix A of U.S. Provisional Application No. 63/403,571, the entire disclosure of which is hereby incorporated by reference in its entirety for all purposes, if positioned on the side of the promoter where the SLIRP gene naturally occurs. Expression data for promoters of SEQ ID NOs: 1-800 are identified in the HPA, FANTOM, or GTEx databases, for example, as shown in FIGs. 11A-25, and/or as shown in Appendix A of U.S. Provisional Application No. 63/403,571, the entire disclosure of which is hereby incorporated by reference in its entirety for all purposes, identified by flanking gene names (Table 2). Expression data are shown for the human promoter (a promoter selected from SEQ ID NOs: 1-400), except where indicated as mouse expression data, which refers to the corresponding promoter in SEQ ID NOs: 401-800. Accordingly, the present disclosure includes a method of expressing one or two heterologous coding sequences using a compact bidirectional promoter or functional fragment or variant thereof, as disclosed herein, wherein the bidirectional promoter or functional fragment or variant thereof promotes expression of the one or two heterologous coding sequences in the tissues identified in the HPA, FANTOM, or GTEx databases, for example, as shown in FIGs. 11A-25, and/or as shown in Appendix A of U.S. Provisional Application No. 63/403,571, the entire disclosure of which is hereby incorporated by reference in its entirety for all purposes.
Table 2. Promoters of SEQ ID NOs: 1-800 identified by flanking gene names
[Table 2 is provided as images in the original publication (imgf000033_0001 through imgf000044_0001).]
[0140] In some embodiments, the compact bidirectional promoter is operably linked to a 5' untranslated region (UTR). For example, in some embodiments, the 5' UTR includes at least a portion of a beta-globin 5' UTR sequence. For example, in some embodiments, the 5' UTR includes the nucleotide sequence 5'-GCCGCCRCC-3', or a 6 bp, 7 bp, or 8 bp fragment thereof. In some embodiments, the 6 bp fragment is 5'-GCCACC-3'.
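The 6 bp, 7 bp, and 8 bp fragments mentioned in paragraph [0140] can be enumerated with a short Python sketch. It relies on the IUPAC convention that R denotes a purine (A or G) and assumes that "fragment" means a contiguous subsequence. That reading, and all names below, are assumptions made for illustration.

```python
from itertools import product

# Minimal IUPAC table: only the codes needed for this motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}


def expand_degenerate(seq: str) -> list[str]:
    """Expand a degenerate sequence into all concrete sequences it encodes."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


def contiguous_fragments(seq: str, lengths=(6, 7, 8)) -> set[str]:
    """All contiguous subsequences of the requested lengths."""
    return {
        seq[i:i + n]
        for n in lengths
        for i in range(len(seq) - n + 1)
    }


motif = "GCCGCCRCC"
concrete = expand_degenerate(motif)
fragments = sorted({f for s in concrete for f in contiguous_fragments(s)})

print(concrete)
print("GCCACC" in fragments)
```

Under these assumptions, the 6 bp fragment named in the text corresponds to the last six positions of the motif with R read as A.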
[0141] In some embodiments, the compact promoter is operably linked to a Kozak consensus sequence.
[0142] In some embodiments, the compact bidirectional promoter includes a TATA mutation. For example, in some embodiments, the TATA mutation is a TATAA → TCGAA mutation.
[0143] In some embodiments, the compact bidirectional promoter is coupled with a viral intron (e.g., an SV40i intron, a MVM intron, a Mv2 intron, an HNRNPH1 intron, a chimeric intron, or a synthetic intron). For example, in some embodiments, the compact bidirectional promoter is coupled with an SV40i intron. In some embodiments, the compact bidirectional promoter is coupled with a MVM intron. In some embodiments, the compact bidirectional promoter is coupled with a Mv2 intron. In some embodiments, the compact bidirectional promoter is coupled with an HNRNPH1 intron. In some embodiments, the compact bidirectional promoter is coupled with a chimeric intron. In some embodiments, the compact bidirectional promoter is coupled with a synthetic intron.
[0144] In some embodiments, the compact bidirectional promoter does not include a viral promoter or a synthetic promoter. For example, in some embodiments, the compact bidirectional promoter does not include a viral promoter. In some embodiments, the compact bidirectional promoter does not include a synthetic promoter.
[0145] In some embodiments, the functional fragment of a compact bidirectional promoter described herein includes a transcription factor binding site. Transcription factor binding sites can be identified, for example, by consensus or by using a differential distance matrix or multidimensional scaling (De Bleser P. et al. (2007) Genome Biol 8(5):R83). In some embodiments, a functional fragment of a compact bidirectional promoter described herein includes a transcription factor binding site selected from Staf, DSE, PSE, c-REL, GATA-1, GATA-2, and CREB. For example, in some embodiments, a functional fragment of a compact bidirectional promoter described herein includes a Staf transcription factor binding site. In some embodiments, a functional fragment of a compact bidirectional promoter described herein includes a DSE transcription factor binding site. In some embodiments, a functional fragment of a compact bidirectional promoter described herein includes a PSE transcription factor binding site. In some embodiments, a functional fragment of a compact bidirectional promoter described herein includes a c-REL transcription factor binding site. In some embodiments, a functional fragment of a compact bidirectional promoter described herein includes a GATA-1 transcription factor binding site. In some embodiments, a functional fragment of a compact bidirectional promoter described herein includes a GATA-2 transcription factor binding site. In some embodiments, a functional fragment of a compact bidirectional promoter described herein includes a CREB transcription factor binding site.
[0146] In some embodiments, a functional fragment of a compact bidirectional promoter described herein can include a B recognition element (BRE) or TATA box. For example, in some embodiments, a functional fragment of a compact bidirectional promoter described herein can include a BRE. In some embodiments, a functional fragment of a compact bidirectional promoter described herein can include a TATA box.
[0147] In some embodiments, a nucleic acid including a compact bidirectional promoter described herein further includes a terminator sequence. In some embodiments, the terminator sequence includes one of the exemplary, non-limiting terminator sequences in Table 3, below.
Table 3. Exemplary terminator sequences
[Table 3 appears as images in the original publication (imgf000046_0001 and imgf000047_0001); the terminator sequences are not reproduced here.]
III. Methods of Identifying Bidirectional Promoters
[0148] The present disclosure also provides, among other things, methods of identifying bidirectional promoters of the disclosure. For example, in some embodiments, a bidirectional promoter (e.g., a compact bidirectional promoter) of the disclosure is identified as such by a method including identifying regions between a transcription start site on the minus strand and a transcription start site on the plus strand. For example, the disclosure provides a method including: (a) obtaining a genome file including annotations categorized by chromosome, wherein the annotations include indices, wherein the indices include genes, pseudogenes, and coding regions for protein-coding genes, wherein each coding region includes a transcription start site; and (b) obtaining a non-transitory computer readable medium including instructions that, when executed by a processor, cause the processor to: identify regions between a transcription start site on the minus strand and a transcription start site on the plus strand.
[0149] In some embodiments, the instructions, when executed by a processor, further cause the processor to: save annotations and/or sort indices by chromosome.
[0150] For example, the methods of the disclosure include developing a script (e.g., a Python script) to identify bidirectional promoters (e.g., compact bidirectional promoters) from genomic annotation files, including, for example, mammalian (e.g., human) annotations. In some embodiments, the script can be applied to genome-wide transcription data files. In an exemplary method, an input data file is obtained (e.g., GRCh38_latest_genomic.gff or GRCm39_vM27.gff3). The file can, for example, be categorized by chromosome, with each line pertaining to a region of interest in the genome, such as a gene, a pseudogene, or a coding region of a protein-coding gene. The script can, for example, iterate through every line in the file and store the type of annotation. The genes can be, for example, sorted by index on a per-chromosome basis, and/or the script may identify regions between a transcript on the minus strand and a transcript on the plus strand, thereby defining the intervening region as a bidirectional promoter (e.g., a compact bidirectional promoter). In some embodiments, the transcripts are filtered for those that are oriented in opposite directions (divergent transcription). Promoter boundaries can be, for example, further refined using the coding sequence (CDS) start for protein-coding genes.
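By way of non-limiting illustration, the exemplary method above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the script of the disclosure: the function names `parse_genes` and `divergent_regions`, the use of only `gene` features, the comparison of adjacent genes only, and the 1000 bp default for `max_len` are all choices made for illustration. Refinement of boundaries using the CDS start is not shown.

```python
from collections import defaultdict


def parse_genes(gff_lines):
    """Group 'gene' features by chromosome as (start, end, strand, name)."""
    genes = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene" or cols[6] not in ("+", "-"):
            continue  # keep only well-formed, stranded gene records
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name", attrs.get("ID", ""))
        genes[cols[0]].append((int(cols[3]), int(cols[4]), cols[6], name))
    return genes


def divergent_regions(genes, max_len=1000):
    """Return intergenic regions flanked by divergently oriented genes.

    A region qualifies when a minus-strand gene lies immediately to the
    left of a plus-strand gene, so that both transcription start sites
    face the intervening sequence. Coordinates are 1-based and inclusive,
    as in GFF. Only adjacent genes (after sorting by start) are compared,
    which is a simplification that ignores overlapping or nested genes.
    """
    regions = []
    for chrom, items in genes.items():
        items = sorted(items)  # sort by start coordinate per chromosome
        for left, right in zip(items, items[1:]):
            if left[2] == "-" and right[2] == "+":
                start, end = left[1] + 1, right[0] - 1
                length = end - start + 1
                if 0 < length <= max_len:
                    regions.append((chrom, start, end, left[3], right[3]))
    return regions
```

In use, the lines of an annotation file would be passed to `parse_genes` and the result to `divergent_regions`; each returned tuple names the chromosome, the boundaries of the candidate promoter, and the two flanking genes.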
[0151] In some embodiments, the annotations include mammalian annotations, such as, for example, human or mouse annotations. For example, in some embodiments, the annotations include human annotations (e.g., the genome file including annotations is GRCh38_latest_genomic.gff). In some embodiments, the annotations include mouse annotations (e.g., the genome file including annotations is GRCm39_vM27.gff3).
[0152] In some embodiments, the genome file including annotations is GRCh38_latest_genomic.gff or GRCm39_vM27.gff3. For example, in some embodiments, the genome file including annotations is GRCh38_latest_genomic.gff. In some embodiments, the genome file including annotations is GRCm39_vM27.gff3.
[0153] In some embodiments, the genome file includes experimentally-derived annotations. For example, in some embodiments, the genome file includes annotations derived from serial analysis of gene expression (SAGE). In some embodiments, the genome file includes annotations derived from RNA sequencing (RNAseq). In some embodiments, the genome file includes annotations derived from H3K4me1 chromatin immunoprecipitation (ChIP) sequencing (ChIP-seq). In some embodiments, the genome file includes annotations derived from H3K4me3 ChIP-seq. In some embodiments, the genome file includes annotations derived from RNA polymerase II ChIP-seq. In some embodiments, the genome file includes annotations derived from Cap Analysis of Gene Expression (CAGE).
[0154] In some embodiments, a bidirectional promoter (e.g., a compact bidirectional promoter) identified by a method of the disclosure is operably linked to at least one (e.g., two) heterologous coding sequence. For example, in some embodiments, a bidirectional promoter (e.g., a compact bidirectional promoter) identified by a method of the disclosure is operably linked to two heterologous coding sequences.
[0155] In some embodiments, a bidirectional promoter (e.g., a compact bidirectional promoter) identified by a method of the disclosure is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter.
[0156] In some embodiments, the compact bidirectional promoter promotes transcription of a heterologous coding sequence by an RNA polymerase II (“pol II”). For example, in some embodiments, the compact bidirectional promoter promotes transcription of a first heterologous coding sequence in one direction (e.g., on one strand of a DNA molecule), and a second heterologous coding sequence in another direction (e.g., on the opposite strand of the DNA molecule), as shown in FIG. 1. In some embodiments, the heterologous promoter does not promote transcription by an RNA polymerase III (“pol III”) (i.e., the promoter is not a pol III promoter).
[0157] In some embodiments, a bidirectional promoter (e.g., a compact bidirectional promoter) identified by a method of the disclosure is less than about 1000 bp (e.g., less than about 800 bp, less than about 600 bp, less than about 400 bp, or less than about 200 bp). For example, in some embodiments, the promoter is less than about 800 bp. In some embodiments, the promoter is less than about 600 bp. In some embodiments, the promoter is less than about 400 bp. In some embodiments, the promoter is less than about 200 bp.
[0158] In some embodiments, a bidirectional promoter (e.g., a compact bidirectional promoter) identified by a method of the disclosure is between about 30 bp and about 800 bp (e.g., between about 31 bp and about 750 bp, about 32 bp and about 700 bp, about 33 bp and about 600 bp, about 34 bp and about 500 bp, about 35 bp and about 400 bp, about 36 bp and about 300 bp, about 37 bp and about 250 bp, about 40 bp and about 200 bp, or about 50 bp and about 100 bp). For example, in some embodiments, the promoter is between about 31 bp and about 750 bp. In some embodiments, the promoter is between about 32 bp and about 700 bp. In some embodiments, the promoter is between about 33 bp and about 600 bp. In some embodiments, the promoter is between about 34 bp and about 500 bp. In some embodiments, the promoter is between about 35 bp and about 400 bp. In some embodiments, the promoter is between about 36 bp and about 300 bp. In some embodiments, the promoter is between about 37 bp and about 250 bp. In some embodiments, the promoter is between about 40 bp and about 200 bp. In some embodiments, the promoter is between about 50 bp and about 100 bp.
[0159] In some embodiments, a bidirectional promoter (e.g., a compact bidirectional promoter) identified by a method of the disclosure is smaller than a CMV promoter.
[0160] In some embodiments, the bidirectional promoter (e.g., a compact bidirectional promoter) has higher activity than standard promoters (e.g., higher activity than an HSV TK promoter). For example, in some embodiments, the bidirectional promoter (e.g., a compact bidirectional promoter) is capable of expressing a luciferase reporter at a higher level than an HSV TK promoter. The expression level of a bidirectional promoter (e.g., a compact bidirectional promoter) can be determined, for example, by expressing a reporter molecule in a cell, e.g., a HEK cell line or an N2A cell line.
IV. Coding Sequences
[0161] In some embodiments, a compact bidirectional promoter of the disclosure is operably linked to at least one (e.g., two) heterologous coding sequence. For example, in some embodiments, the compact bidirectional promoter is operably linked to only one heterologous coding sequence.
[0162] In some embodiments, the bidirectional promoter is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter.
[0163] In some embodiments, the compact bidirectional promoter of the disclosure is operably linked to two heterologous coding sequences. In some embodiments, the two heterologous coding sequences include the same coding sequence. Alternatively, for example, in some embodiments, the two heterologous coding sequences include different coding sequences.
[0164] In some embodiments, the compact bidirectional promoter is capable of expressing the at least one (e.g., two) heterologous coding sequence in a target cell (e.g., a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell). For example, in some embodiments, the compact bidirectional promoter is capable of expressing each of the two heterologous coding sequences: (a) in the same target cell or cells, (b) in different target cells, or (c) in a partially overlapping set of target cells. In some embodiments, the compact bidirectional promoter is capable of expressing each of the two heterologous coding sequences in the same target cell or cells. In some embodiments, the compact bidirectional promoter is capable of expressing each of the two heterologous coding sequences in different target cells. In some embodiments, the compact bidirectional promoter is capable of expressing each of the two heterologous coding sequences in a partially overlapping set of cells.
[0165] In some embodiments, a coding sequence encodes one or more genes selected from the non-limiting list of: CFTR, ATP2B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMR1, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, and SLC6A1, or a functional fragment or variant thereof. For example, in some embodiments, a coding sequence encodes CFTR or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes ATP2B or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes ATP7A or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes AGL or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes CPS1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes A1AT or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes ALPL or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes ARSA or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes BBS1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes BEST1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes CAH or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes CFH or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes CFI or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes CHM or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes CLN2 or a functional fragment or variant thereof.
In some embodiments, a coding sequence encodes CLN7 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes CNGA3 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes CYP46A1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes F9 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes FKRP or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes FMR1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes FMRP or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes FOXG1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes GAD or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes GALC or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes GALGT2 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes GBA1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes GBE1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes GLB1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes GRN or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes HEXA or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes HTRA1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes IDS or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes IDUA or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes LAMP2 or a functional fragment or variant thereof.
In some embodiments, a coding sequence encodes LCA5 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes MECP2 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes MFN2 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes MMUT or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes MTM1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes NAGLU or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes ND4 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes PAH or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes PIGA or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes PRKN or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes RPE65 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes SERPING1 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes SGSH or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes SLC13A5 or a functional fragment or variant thereof. In some embodiments, a coding sequence encodes SLC6A1 or a functional fragment or variant thereof.
[0166] In some embodiments, a coding sequence encodes one or more genes selected from the non-limiting list of: F8, F9, PIGA, SGSH, G6PC, NAGLU, CLN3, GBA, IDS, GAA, OTC, GLA, CAH, IDUA, LAMP2, CLN1, ATP7B, A1AT, GALT, EMNA, ENPP1, CLN2, CLN5, CLN7/MFSD8, AGU, MMUT, NPC2, ABCB11, ABCB4, ASS1, SMN1, AADC, MTM1, GBA1, GRN, GAD, GALGT2, SGCB, GDNF, ASPA, GLB1, GALC, SGCA, DYSF, HEXA, GAN, FXN, ARSA, MECP2, IGHMBP2, UBE3A, CDKL5, PGRN, FKRP, CYP46A1, OPMD, Cav1, neuropeptide Y/Y2, SCN1A, SHANK3, APOE2(R158C), FMR1, UPF1, CMT4J, MFN2, PRKN, CAPN3, NTF3, ANO5, SGCG, EMD, SURF1, GBE1, FMRP, RPE65, RPGR, CHM, ND4, CNGB3, PDE6b, CFI, CNGA3, GUCY2D, RLBP1, CD59, OPN1LW, CFH, MYO7A, RS1, ABCA4, ND1, BEST1, RHO, LCA5, RDH12, NMNAT1, SERPING1, AQP1, PPP1R1A, IL-1Ra, CFTR, OTOF, CLRN1, GJB2, ALPL, TMC1, STRC, ATOH1, and MYBPC3, or a functional fragment or variant thereof.
[0167] For example, in some embodiments, a coding sequence encodes F8. In some embodiments, a coding sequence encodes F9. In some embodiments, a coding sequence encodes PIGA. In some embodiments, a coding sequence encodes SGSH. In some embodiments, a coding sequence encodes G6PC. In some embodiments, a coding sequence encodes NAGLU. In some embodiments, a coding sequence encodes CLN3. In some embodiments, a coding sequence encodes GBA. In some embodiments, a coding sequence encodes IDS. In some embodiments, a coding sequence encodes GAA. In some embodiments, a coding sequence encodes OTC. In some embodiments, a coding sequence encodes GLA. In some embodiments, a coding sequence encodes CAH. In some embodiments, a coding sequence encodes IDUA. In some embodiments, a coding sequence encodes LAMP2. In some embodiments, a coding sequence encodes CLN1. In some embodiments, a coding sequence encodes ATP7B. In some embodiments, a coding sequence encodes A1AT. In some embodiments, a coding sequence encodes GALT. In some embodiments, a coding sequence encodes EMNA. In some embodiments, a coding sequence encodes ENPP1. In some embodiments, a coding sequence encodes CLN2. In some embodiments, a coding sequence encodes CLN5. In some embodiments, a coding sequence encodes CLN7/MFSD8. In some embodiments, a coding sequence encodes AGU. In some embodiments, a coding sequence encodes MMUT. In some embodiments, a coding sequence encodes NPC2. In some embodiments, a coding sequence encodes ABCB11. In some embodiments, a coding sequence encodes ABCB4. In some embodiments, a coding sequence encodes ASS1. In some embodiments, a coding sequence encodes SMN1. In some embodiments, a coding sequence encodes AADC. In some embodiments, a coding sequence encodes MTM1. In some embodiments, a coding sequence encodes GBA1. In some embodiments, a coding sequence encodes GRN. In some embodiments, a coding sequence encodes GAD. In some embodiments, a coding sequence encodes GALGT2.
In some embodiments, a coding sequence encodes SGCB. In some embodiments, a coding sequence encodes GDNF. In some embodiments, a coding sequence encodes ASPA. In some embodiments, a coding sequence encodes GLB1. In some embodiments, a coding sequence encodes GALC. In some embodiments, a coding sequence encodes SGCA. In some embodiments, a coding sequence encodes DYSF. In some embodiments, a coding sequence encodes HEXA. In some embodiments, a coding sequence encodes GAN. In some embodiments, a coding sequence encodes FXN. In some embodiments, a coding sequence encodes ARSA. In some embodiments, a coding sequence encodes MECP2. In some embodiments, a coding sequence encodes IGHMBP2. In some embodiments, a coding sequence encodes UBE3A. In some embodiments, a coding sequence encodes CDKL5. In some embodiments, a coding sequence encodes PGRN. In some embodiments, a coding sequence encodes FKRP. In some embodiments, a coding sequence encodes CYP46A1. In some embodiments, a coding sequence encodes OPMD. In some embodiments, a coding sequence encodes Cav1. In some embodiments, a coding sequence encodes neuropeptide Y/Y2. In some embodiments, a coding sequence encodes SCN1A. In some embodiments, a coding sequence encodes SHANK3. In some embodiments, a coding sequence encodes APOE2(R158C). In some embodiments, a coding sequence encodes FMR1. In some embodiments, a coding sequence encodes UPF1. In some embodiments, a coding sequence encodes CMT4J. In some embodiments, a coding sequence encodes MFN2. In some embodiments, a coding sequence encodes PRKN. In some embodiments, a coding sequence encodes CAPN3. In some embodiments, a coding sequence encodes NTF3. In some embodiments, a coding sequence encodes ANO5. In some embodiments, a coding sequence encodes SGCG. In some embodiments, a coding sequence encodes EMD. In some embodiments, a coding sequence encodes SURF1. In some embodiments, a coding sequence encodes GBE1. In some embodiments, a coding sequence encodes FMRP.
In some embodiments, a coding sequence encodes RPE65. In some embodiments, a coding sequence encodes RPGR. In some embodiments, a coding sequence encodes CHM. In some embodiments, a coding sequence encodes ND4. In some embodiments, a coding sequence encodes CNGB3. In some embodiments, a coding sequence encodes PDE6b. In some embodiments, a coding sequence encodes CFI. In some embodiments, a coding sequence encodes CNGA3. In some embodiments, a coding sequence encodes GUCY2D. In some embodiments, a coding sequence encodes RLBP1. In some embodiments, a coding sequence encodes CD59. In some embodiments, a coding sequence encodes OPN1LW. In some embodiments, a coding sequence encodes CFH. In some embodiments, a coding sequence encodes MYO7A. In some embodiments, a coding sequence encodes RS1. In some embodiments, a coding sequence encodes ABCA4. In some embodiments, a coding sequence encodes ND1. In some embodiments, a coding sequence encodes BEST1. In some embodiments, a coding sequence encodes RHO. In some embodiments, a coding sequence encodes LCA5. In some embodiments, a coding sequence encodes RDH12. In some embodiments, a coding sequence encodes NMNAT1. In some embodiments, a coding sequence encodes SERPING1. In some embodiments, a coding sequence encodes AQP1. In some embodiments, a coding sequence encodes PPP1R1A. In some embodiments, a coding sequence encodes IL-1Ra. In some embodiments, a coding sequence encodes CFTR. In some embodiments, a coding sequence encodes OTOF. In some embodiments, a coding sequence encodes CLRN1. In some embodiments, a coding sequence encodes GJB2. In some embodiments, a coding sequence encodes ALPL. In some embodiments, a coding sequence encodes TMC1. In some embodiments, a coding sequence encodes STRC. In some embodiments, a coding sequence encodes ATOH1. In some embodiments, a coding sequence encodes MYBPC3.
[0168] In some embodiments, the therapeutic coding sequence is less than about 750 (e.g., less than about 700, less than about 600, less than about 500, or less than about 400) amino acids. For example, in some embodiments, the therapeutic coding sequence is less than about 700 amino acids. In some embodiments, the therapeutic coding sequence is less than about 600 amino acids. In some embodiments, the therapeutic coding sequence is less than about 500 amino acids. In some embodiments, the therapeutic coding sequence is less than about 400 amino acids.
[0169] In some embodiments, the therapeutic coding sequence is from about 350 amino acids to about 750 amino acids (e.g., from about 400 amino acids to about 700 amino acids or from about 500 amino acids to about 600 amino acids). For example, in some embodiments, the therapeutic coding sequence is from about 400 amino acids to about 700 amino acids. In some embodiments, the therapeutic coding sequence is from about 500 amino acids to about 600 amino acids.
[0170] For example, in some embodiments, any such coding sequence may be provided in an expression construct and the construct itself may be provided as a transgene in a vector, such as the exemplary vectors of the disclosure (e.g., rAAV). The transgene is a nucleic acid sequence, heterologous to the vector sequences flanking the transgene, which encodes a polypeptide, protein, or other product of interest. The nucleic acid coding sequence may be operatively linked to regulatory components in a manner which permits transgene transcription, translation, and/or expression in a target cell. The heterologous nucleic acid sequence (e.g., transgene) can be derived from any organism. In some embodiments, the transgene is derived from a mammal, such as a human.
[0171] In some embodiments, the expression construct includes, in addition to a compact bidirectional promoter and a coding sequence, a second coding sequence positioned on the opposite side of the promoter that encodes an RNA molecule or a protein. For example, in some embodiments, the second coding sequence encodes a molecule (e.g., an RNA molecule or a second protein) smaller than a molecule encoded by the first coding sequence. In some embodiments, the second coding sequence encodes a molecule (e.g., an RNA molecule or a second protein) larger than a molecule encoded by the first coding sequence. In some embodiments, the second coding sequence encodes a molecule (e.g., an RNA molecule or a second protein) having a substantially equal size to a molecule encoded by the first coding sequence.
[0172] In some embodiments, the coding sequence is expressed in a target cell. In some embodiments, the target cell is a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell. For example, in some embodiments, the target cell is a lung cell. In some embodiments, the target cell is a pancreatic cell. In some embodiments, the target cell is a kidney cell. In some embodiments, the target cell is a muscle cell. In some embodiments, the target cell is a liver cell. In some embodiments, the target cell is a retinal cell. In some embodiments, the target cell is a neuron. In some embodiments, the target cell is a glial cell. In some embodiments, the target cell is an endothelial cell. In some embodiments, the target cell is an epithelial cell.
A. Codon Optimization
[0173] The coding sequences described herein can be codon optimized variants of a nucleic acid sequence of a gene or RNA equivalent thereof encoding a protein of interest so as to achieve, for instance, enhanced expression of the protein in a particular cell type (e.g., a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell). For example, genes and RNA equivalents thereof can be optimized for tissue-specific expression of an encoded protein. Optimized genes and RNA equivalents thereof can be synthesized by methods known in the art, such as chemical synthesis techniques, and may be amplified, for instance, using polymerase chain reaction (PCR)-based amplification methods or by transfection of the gene into a cell, such as a bacterial cell or mammalian cell capable of replicating exogenous nucleic acids.
(i) Increasing quantity of high-frequency codons
[0174] For example, one of skill in the art can design variants of the target gene that contain greater quantities of high-frequency codons within the target organism of interest. For instance, after enhancing the protein-encoding gene sequence by incorporating codon substitutions that minimize the sequence identity of the coding strand of the target gene relative to the coding strands of genes expressed at high levels within the target cell, one of skill in the art can subsequently modify the designed coding sequence so as to increase the quantity of codons that frequently occur in endogenous genes within the target organism (e.g., a mammal, such as a human). For example, codons that have increased GC content tend to be employed more frequently in protein-coding genes.
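By way of non-limiting illustration, the substitution of high-frequency codons can be sketched as below. This is a simplified sketch under stated assumptions and not the optimization procedure of the disclosure: every codon is replaced by the most frequent synonymous codon found in a caller-supplied usage table, the function names are hypothetical, and the standard genetic code is assumed. The usage table is an input that the practitioner would take from a measured source for the target organism.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(cds):
    """Translate an in-frame coding sequence using the standard code."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def most_frequent_codons(usage):
    """Map each amino acid to its highest-frequency codon in `usage`."""
    best = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def use_high_frequency_codons(cds, usage):
    """Replace each codon with the most frequent synonymous codon.

    Codons whose amino acid is absent from `usage` are left unchanged,
    so the encoded protein sequence is always preserved.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = most_frequent_codons(usage)
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(best.get(CODON_TABLE[c], c) for c in codons)
```

With a toy usage table in which CTG, GCC, and AAG are the highest-frequency codons for leucine, alanine, and lysine, the sequence ATGTTAGCTAAATAA is rewritten as ATGCTGGCCAAGTAA while `translate` returns the same protein for both. Always choosing the single most frequent codon is the simplest possible policy and can raise GC content or create repeats, which is one reason the CpG and homopolymer adjustments described in this section may be applied afterwards.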
(ii) Reducing CpG content and homopolymer content
[0175] Alternatively, or in addition to the above, one of skill in the art can manipulate the protein-encoding gene sequence of a target gene by incorporating codon substitutions that diminish the CpG content and/or homopolymer content of the gene. For instance, one can begin with a wild-type gene sequence and introduce substitutions (e.g., single-nucleotide substitutions) that reduce the CpG content and/or homopolymer content of the gene while preserving the identity of the encoded protein sequence. One can then, for example, obtain a gene sequence that minimally resembles the genes encoded in a cell type of interest. Alternatively, one can begin with a sequence that has been codon optimized and subsequently manipulate it by introducing mutations (e.g., single-nucleotide substitutions) that reduce the CpG content and/or homopolymer content of the gene. Once designed, the final codon optimized gene can be prepared, for instance, by solid phase nucleic acid procedures known in the art. Additionally, the prepared gene can be amplified, for instance, using PCR-based techniques described herein or known in the art, and/or by transformation of cells with a plasmid containing the designed gene.
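By way of non-limiting illustration, reduction of CpG content by synonymous substitution can be sketched as a greedy, codon-by-codon pass. This is a sketch under stated assumptions and not the procedure of the disclosure: the function names are hypothetical, the standard genetic code is assumed, whole-codon synonymous swaps are used rather than strictly single-nucleotide changes, and homopolymer content is not addressed.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(cds):
    """Translate an in-frame coding sequence using the standard code."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def count_cpg(seq):
    """Count CG dinucleotides on the given strand."""
    return seq.upper().count("CG")


def reduce_cpg(cds):
    """Greedily swap synonymous codons to lower the CpG count.

    Each codon is scored together with the preceding base (already fixed)
    and the first base of the following original codon, so that CpG
    dinucleotides spanning codon boundaries are also considered. The
    original codon is kept whenever no synonym scores strictly better.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out = []
    for i, codon in enumerate(codons):
        prev_base = out[-1][-1] if out else ""
        next_base = codons[i + 1][0] if i + 1 < len(codons) else ""
        synonyms = sorted(
            c for c, aa in CODON_TABLE.items()
            if aa == CODON_TABLE[codon] and c != codon
        )
        # The original codon is listed first so that ties keep it unchanged.
        candidates = [codon] + synonyms
        out.append(min(candidates,
                       key=lambda c: count_cpg(prev_base + c + next_base)))
    return "".join(out)
```

As a usage example, the input ATGGCGCGTTAA contains two CpG dinucleotides, and the sequence returned by `reduce_cpg` contains none while `translate` gives the same protein for both. A greedy pass of this kind ignores codon frequency, so in practice it would be balanced against the high-frequency codon preference described above.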
(iii) Exemplary codon optimized coding sequences
[0176] In some embodiments, the one or more (e.g., two) coding sequences of the disclosure encode one or more codon optimized genes selected from the non-limiting list of: CFTR, ATP2B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMR1, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, and SLC6A1. For example, in some embodiments, a coding sequence encodes a codon optimized variant of CFTR. In some embodiments, a coding sequence encodes a codon optimized variant of ATP2B. In some embodiments, a coding sequence encodes a codon optimized variant of ATP7A. In some embodiments, a coding sequence encodes a codon optimized variant of AGL. In some embodiments, a coding sequence encodes a codon optimized variant of CPS1. In some embodiments, a coding sequence encodes a codon optimized variant of A1AT. In some embodiments, a coding sequence encodes a codon optimized variant of ALPL. In some embodiments, a coding sequence encodes a codon optimized variant of ARSA. In some embodiments, a coding sequence encodes a codon optimized variant of BBS1. In some embodiments, a coding sequence encodes a codon optimized variant of BEST1. In some embodiments, a coding sequence encodes a codon optimized variant of CAH. In some embodiments, a coding sequence encodes a codon optimized variant of CFH. In some embodiments, a coding sequence encodes a codon optimized variant of CFI. In some embodiments, a coding sequence encodes a codon optimized variant of CHM. In some embodiments, a coding sequence encodes a codon optimized variant of CLN2. In some embodiments, a coding sequence encodes a codon optimized variant of CLN7. In some embodiments, a coding sequence encodes a codon optimized variant of CNGA3. In some embodiments, a coding sequence encodes a codon optimized variant of CYP46A1.
In some embodiments, a coding sequence encodes a codon optimized variant of F9. In some embodiments, a coding sequence encodes a codon optimized variant of FKRP. In some embodiments, a coding sequence encodes a codon optimized variant of FMR1. In some embodiments, a coding sequence encodes a codon optimized variant of FMRP. In some embodiments, a coding sequence encodes a codon optimized variant of FOXG1. In some embodiments, a coding sequence encodes a codon optimized variant of GAD. In some embodiments, a coding sequence encodes a codon optimized variant of GALC. In some embodiments, a coding sequence encodes a codon optimized variant of GALGT2. In some embodiments, a coding sequence encodes a codon optimized variant of GBA1. In some embodiments, a coding sequence encodes a codon optimized variant of GBE1. In some embodiments, a coding sequence encodes a codon optimized variant of GLB1. In some embodiments, a coding sequence encodes a codon optimized variant of GRN. In some embodiments, a coding sequence encodes a codon optimized variant of HEXA. In some embodiments, a coding sequence encodes a codon optimized variant of HTRA1. In some embodiments, a coding sequence encodes a codon optimized variant of IDS. In some embodiments, a coding sequence encodes a codon optimized variant of IDUA. In some embodiments, a coding sequence encodes a codon optimized variant of LAMP2. In some embodiments, a coding sequence encodes a codon optimized variant of LCA5. In some embodiments, a coding sequence encodes a codon optimized variant of MECP2. In some embodiments, a coding sequence encodes a codon optimized variant of MFN2. In some embodiments, a coding sequence encodes a codon optimized variant of MMUT. In some embodiments, a coding sequence encodes a codon optimized variant of MTM1. In some embodiments, a coding sequence encodes a codon optimized variant of NAGLU. In some embodiments, a coding sequence encodes a codon optimized variant of ND4.
In some embodiments, a coding sequence encodes a codon optimized variant of PAH. In some embodiments, a coding sequence encodes a codon optimized variant of PIGA. In some embodiments, a coding sequence encodes a codon optimized variant of PRKN. In some embodiments, a coding sequence encodes a codon optimized variant of RPE65. In some embodiments, a coding sequence encodes a codon optimized variant of SERPING1. In some embodiments, a coding sequence encodes a codon optimized variant of SGSH. In some embodiments, a coding sequence encodes a codon optimized variant of SLC13A5. In some embodiments, a coding sequence encodes a codon optimized variant of SLC6A1.
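By way of illustration only, the notion of a codon optimized variant can be sketched as a synonymous recoding: codons are exchanged for other codons that encode the same amino acid, so the encoded protein is unchanged. The sketch below is not the optimization procedure of the disclosure. The function names, the example sequence, and the PREFERRED_CODON table are hypothetical placeholders, and a practical procedure would typically weigh codon usage of the target cell, GC content, and other sequence features.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preference table (an assumption for illustration, not data
# from the disclosure): one preferred codon for a few amino acids.
PREFERRED_CODON = {
    "L": "CTG", "S": "AGC", "R": "AGA", "A": "GCC",
    "G": "GGC", "P": "CCC", "T": "ACC", "V": "GTG",
}
# Every preferred codon must encode the amino acid it is listed under.
assert all(CODON_TABLE[codon] == aa for aa, codon in PREFERRED_CODON.items())


def translate(cds: str) -> str:
    """Translate a coding sequence (DNA, length divisible by 3) to protein."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode_synonymous(cds: str) -> str:
    """Swap each codon for the preferred synonymous codon, where one is listed."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        # Codons without a listed preference (including stop codons) are kept.
        recoded.append(PREFERRED_CODON.get(amino_acid, codon))
    return "".join(recoded)


if __name__ == "__main__":
    original = "ATGTTAGCATCACGTTAA"  # hypothetical toy sequence
    variant = recode_synonymous(original)
    print(original, variant, translate(original) == translate(variant))
```

The check that the translation is unchanged is the property of interest: the nucleotide sequence differs while the encoded polypeptide does not.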
[0177] In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 85% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 91% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 92% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 93% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 94% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 95% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 96% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 97% sequence identity to any one of SEQ ID NOs: 819-826. 
In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 98% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having at least 99% sequence identity to any one of SEQ ID NOs: 819-826. In some embodiments, a coding sequence encoding a codon optimized variant of CFTR has a nucleotide sequence having 100% sequence identity to any one of SEQ ID NOs: 819-826.
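As a non-limiting illustration, the percent sequence identity thresholds recited above can be evaluated computationally once two nucleotide sequences have been aligned. The sketch below assumes the alignment has already been produced by a separate alignment tool and that identity is counted as matching columns divided by total alignment columns; that convention, the function names, and the toy sequences are assumptions for illustration and are not taken from the disclosure, which may define sequence identity differently.

```python
# Identity thresholds recited in the text (percent).
IDENTITY_THRESHOLDS = (85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: matches / alignment columns, where a column with a
    gap in either sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns that are gaps in both sequences; they carry no information.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity: float):
    """Return the highest recited threshold satisfied, or None if below 85%."""
    met = [t for t in IDENTITY_THRESHOLDS if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    reference = "ATGCAGAGGTCGCCTCTGGA"  # hypothetical 20-nt toy sequences
    candidate = "ATGCAGAGGTCGCCTCTGGT"
    identity = percent_identity(reference, candidate)
    print(identity, highest_threshold_met(identity))
```

Other conventions for the denominator exist (for example, the length of the shorter sequence or of the reference sequence), and the value obtained depends on the convention and on the alignment parameters used.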
[0178] In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 85% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 91% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 92% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 93% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 94% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 95% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 96% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 97% sequence identity to any one of SEQ ID NOs: 827-836.
In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 98% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having at least 99% sequence identity to any one of SEQ ID NOs: 827-836. In some embodiments, a coding sequence encoding a codon optimized variant of MTM1 has a nucleotide sequence having 100% sequence identity to any one of SEQ ID NOs: 827-836.
[0179] In some embodiments, the one or more (e.g., two) coding sequences of the disclosure encodes one or more codon optimized genes selected from the non-limiting list of: F8, F9, PIGA, SGSH, G6PC, NAGLU, CLN3, GBA, IDS, GAA, OTC, GLA, CAH, IDUA, LAMP2, CLN1, ATP7B, A1AT, GALT, LMNA, ENPP1, CLN2, CLN5, CLN7/MFSD8, AGU, MMUT, NPC2, ABCB11, ABCB4, ASS1, SMN1, AADC, MTM1, GBA1, GRN, GAD, GALGT2, SGCB, GDNF, ASPA, GLB1, GALC, SGCA, DYSF, HEXA, GAN, FXN, ARSA, MECP2, IGHMBP2, UBE3A, CDKL5, PGRN, FKRP, CYP46A1, OPMD, Cav1, neuropeptide Y/Y2, SCN1A, SHANK3, APOE2(R158C), FMR1, UPF1, CMT4J, MFN2, PRKN, CAPN3, NTF3, ANO5, SGCG, EMD, SURF1, GBE1, FMRP, RPE65, RPGR, CHM, ND4, CNGB3, PDE6b, CFI, CNGA3, GUCY2D, RLBP1, CD59, OPN1LW, CFH, MYO7A, RS1, ABCA4, ND1, BEST1, RHO, LCA5, RDH12, NMNAT1, SERPING1, AQP1, PPP1R1A, IL-1Ra, CFTR, OTOF, CLRN1, GJB2, ALPL, TMC1, STRC, ATOH1, and MYBPC3, or a functional fragment of a variant thereof.
[0180] For example, in some embodiments, the one or more (e.g., two) coding sequences of the disclosure encodes codon optimized F8 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized F9 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized PIGA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized SGSH or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized G6PC or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized NAGLU or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CLN3 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GBA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized IDS or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GAA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized OTC or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GLA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CAH or a functional fragment of a variant thereof. 
In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized IDUA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized LAMP2 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CLN1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ATP7B or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized A1AT or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GALT or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized LMNA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ENPP1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CLN2 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CLN5 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CLN7/MFSD8 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized AGU or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized MMUT or a functional fragment of a variant thereof.
In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized NPC2 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ABCB11 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ABCB4 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ASS1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized SMN1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized AADC or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized MTM1 or a functional fragment of a variant thereof (e.g., a nucleotide sequence having at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 827-836). In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GBA1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GRN or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GAD or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GALGT2 or a functional fragment of a variant thereof.
In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized SGCB or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GDNF or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ASPA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GLB1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GALC or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized SGCA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized DYSF or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized HEXA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GAN or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized FXN or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ARSA or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized MECP2 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized IGHMBP2 or a functional fragment of a variant thereof. 
In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized UBE3A or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CDKL5 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized PGRN or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized FKRP or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CYP46A1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized OPMD or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized Cav1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized neuropeptide Y/Y2 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized SCN1A or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized SHANK3 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized APOE2(R158C) or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized FMR1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized UPF1 or a functional fragment of a variant thereof.
In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CMT4J or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized MFN2 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized PRKN or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CAPN3 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized NTF3 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ANO5 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized SGCG or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized EMD or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized SURF1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GBE1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized FMRP or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized RPE65 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized RPGR or a functional fragment of a variant thereof.
In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CHM or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ND4 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CNGB3 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized PDE6b or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CFI or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CNGA3 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GUCY2D or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized RLBP1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CD59 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized OPN1LW or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CFH or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized MYO7A or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized RS1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ABCA4 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ND1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized BEST1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized RHO or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized LCA5 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized RDH12 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized NMNAT1 or a functional fragment of a variant thereof.
In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized SERPING1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized AQP1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized PPP1R1A or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized IL-1Ra or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CFTR or a functional fragment of a variant thereof (e.g., a nucleotide sequence having at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 819-826). In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized OTOF or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized CLRN1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized GJB2 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ALPL or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized TMC1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized STRC or a functional fragment of a variant thereof.
In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized ATOH1 or a functional fragment of a variant thereof. In some embodiments, the one or more coding sequences of the disclosure encodes codon optimized MYBPC3.
V. Methods for the Delivery of Exogenous Nucleic Acids to Target Cells
[0181] A compact bidirectional promoter provided herein can be selected to express the selected coding sequence in a desired target cell. For example, the disclosure herein provides a method of expressing a heterologous coding sequence in a cell (e.g., a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell), the method including transfecting the cell with any of the described expression constructs, such as with the methods described herein.
[0182] The disclosure also provides a method of expressing at least one heterologous coding sequence in a target cell, the method including introducing into a subject a nucleic acid (e.g., such as with the methods described in this section) including a compact bidirectional promoter operably linked to at least one heterologous coding sequence, wherein the compact bidirectional promoter is less than about 1000 bp (e.g., less than about 800 bp, less than about 600 bp, less than about 400 bp, or less than about 200 bp), and wherein the bidirectional promoter is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter in the cell.
[0183] In yet another embodiment, the disclosure provides a method of expressing two heterologous coding sequences in different target cells (e.g., a combination of two cell types selected from a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, and an epithelial cell), the method including introducing into a subject a nucleic acid (e.g., such as with the methods described in this section) including a compact bidirectional promoter operably linked to the two heterologous coding sequences positioned on opposite sides of the compact bidirectional promoter in the cell, wherein the compact bidirectional promoter is less than about 1000 bp (e.g., less than about 800 bp, less than about 600 bp, less than about 400 bp, or less than about 200 bp), and wherein the compact bidirectional promoter promotes transcription of one of the coding sequences in a first target cell (e.g., a kidney cell) and promotes transcription of the other coding sequence in a second target cell (e.g., a muscle cell).
[0184] In some embodiments, the promoter comprises a COX15 bidirectional promoter (SEQ ID NO. 80) that expresses one coding sequence in one or more of the tissues shown for gene “COX15” in FIG. 4A-FIG. 4B and the other coding sequence in one or more of the tissues shown for gene “CUTC” in FIG. 4A-FIG. 4B.
[0185] In some embodiments, the promoter comprises a MORN5 bidirectional promoter (SEQ ID NO. 221) that expresses one coding sequence in one or more of the tissues shown for gene “MORN5” in FIGs 11A-FIG. 14 and the other coding sequence in one or more of the tissues shown for gene “NDUFA8” in FIGs 11A-FIG. 14.
[0186] In some embodiments, the promoter comprises an NDUFB9 bidirectional promoter (SEQ ID NO. 339) that expresses one coding sequence in one or more of the tissues shown for gene “NDUFB9” in FIGs 15A-FIG. 18 and the other coding sequence in one or more of the tissues shown for gene “TATDN1” in FIGs 15A-FIG. 18.
[0187] In some embodiments, the promoter comprises an NDUFA7 bidirectional promoter (SEQ ID NO. 220) that expresses one coding sequence in one or more of the tissues shown for gene “NDUFA7” in FIGs 19A-FIG. 22 and the other coding sequence in one or more of the tissues shown for gene “RPS28” in FIGs 19A-FIG. 22.
[0188] In some embodiments, the promoter comprises an ALKBH1 bidirectional promoter (SEQ ID NO. 17) that expresses one coding sequence in one or more of the tissues shown for gene “ALKBH1” in FIGs 23A-FIG. 26 and the other coding sequence in one or more of the tissues shown for gene “SLIRP” in FIGs 23A-FIG. 26. In some embodiments, a coding sequence is expressed in a target cell. In some embodiments, the target cell is a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell. For example, in some embodiments, the target cell is a lung cell. In some embodiments, the target cell is a pancreatic cell. In some embodiments, the target cell is a kidney cell. In some embodiments, the target cell is a muscle cell. In some embodiments, the target cell is a liver cell. In some embodiments, the target cell is a retinal cell. In some embodiments, the target cell is a neuron. In some embodiments, the target cell is a glial cell. In some embodiments, the target cell is an endothelial cell. In some embodiments, the target cell is an epithelial cell. In some embodiments, the target cell is any cell in FIG. 4A-FIG. 6D and FIG. 11A-FIG. 26.
[0189] Techniques that can be used to introduce a nucleic acid molecule into a mammalian cell are well known in the art. For example, electroporation can be used to permeabilize mammalian cells (e.g., human target cells) by the application of an electrostatic potential to the cell of interest. Mammalian cells, such as human cells, subjected to an external electric field in this manner are subsequently predisposed to the uptake of exogenous nucleic acids. Electroporation of mammalian cells is described in detail, e.g., in Chu et al., NUCLEIC ACIDS RESEARCH 15: 1311 (1987), the disclosure of which is incorporated herein by reference. A similar technique, Nucleofection™, utilizes an applied electric field in order to stimulate the uptake of exogenous polynucleotides into the nucleus of a eukaryotic cell.
[0190] Nucleofection™ and protocols useful for performing this technique are described in detail, e.g., in Distler et al., EXPERIMENTAL DERMATOLOGY 14:315 (2005), as well as in US 2010/0317114, the disclosures of each of which are incorporated herein by reference.
[0191] An additional technique useful for the transfection of target cells is squeeze-poration. This technique induces the rapid mechanical deformation of cells in order to stimulate the uptake of exogenous DNA through membranous pores that form in response to the applied stress. This technology is advantageous in that a vector is not required for delivery of nucleic acids into a cell, such as a human target cell. Squeeze-poration is described in detail, e.g., in Sharei et al., JoVE 81:e50980 (2013), the disclosure of which is incorporated herein by reference.
[0192] Lipofection represents another technique useful for transfection of target cells. This method involves the loading of nucleic acids into a liposome, which often presents cationic functional groups, such as quaternary or protonated amines, towards the liposome exterior. This promotes electrostatic interactions between the liposome and a cell due to the anionic nature of the cell membrane, which ultimately leads to uptake of the exogenous nucleic acids, for example, by direct fusion of the liposome with the cell membrane or by endocytosis of the complex. Lipofection is described in detail, for example, in U.S. Patent No. 7,442,386, the disclosure of which is incorporated herein by reference.
[0193] Similar techniques that exploit ionic interactions with the cell membrane to provoke the uptake of foreign nucleic acids include contacting a cell with a cationic polymer-nucleic acid complex. Exemplary cationic molecules that associate with polynucleotides so as to impart a positive charge favorable for interaction with the cell membrane are activated dendrimers (described, e.g., in Dennig, TOPICS IN CURRENT CHEMISTRY 228:227 (2003), the disclosure of which is incorporated herein by reference), polyethylenimine, and diethylaminoethyl (DEAE)-dextran, the use of which as a transfection agent is described in detail, for example, in Gulick et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY 40: 1 :9.2:9.2.1 (1997), the disclosure of which is incorporated herein by reference. Magnetic beads are another tool that can be used to transfect target cells in a mild and efficient manner, as this methodology utilizes an applied magnetic field in order to direct the uptake of nucleic acids. This technology is described in detail, for example, in US 2010/0227406, the disclosure of which is incorporated herein by reference.
[0194] Another useful tool for inducing the uptake of exogenous nucleic acids by target cells is laserfection, also called optical transfection, a technique that involves exposing a cell to electromagnetic radiation of a particular wavelength in order to gently permeabilize the cells and allow polynucleotides to penetrate the cell membrane. The bioactivity of this technique is similar to, and in some cases has been found superior to, that of electroporation.
[0195] Impalefection is another technique that can be used to deliver genetic material to target cells. It relies on the use of nanomaterials, such as carbon nanofibers, carbon nanotubes, and nanowires.
[0196] Needle-like nanostructures are synthesized perpendicular to the surface of a substrate. DNA containing the gene intended for intracellular delivery is attached to the nanostructure surface. A chip with arrays of these needles is then pressed against cells or tissue. Cells that are impaled by nanostructures can express the delivered gene(s). An example of this technique is described in Shalek et al., PNAS 107:1870 (2010), the disclosure of which is incorporated herein by reference.
[0197] Magnetofection can also be used to deliver nucleic acids to target cells. The magnetofection principle is to associate nucleic acids with cationic magnetic nanoparticles. The magnetic nanoparticles are made of iron oxide, which is fully biodegradable, and are coated with specific cationic proprietary molecules that vary with the application. Their association with the gene vectors (DNA, viral vector) is achieved by salt-induced colloidal aggregation and electrostatic interaction. The magnetic particles are then concentrated on the target cells by the influence of an external magnetic field generated by magnets. This technique is described in detail in Scherer et al., GENE THERAPY 9:102 (2002), the disclosure of which is incorporated herein by reference.
[0198] Another useful tool for inducing the uptake of exogenous nucleic acids by target cells is sonoporation, a technique that involves the use of sound (typically ultrasonic frequencies) to modify the permeability of the cell plasma membrane, thereby permeabilizing the cells and allowing polynucleotides to penetrate the cell membrane. This technique is described in detail, e.g., in Rhodes et al., METHODS IN CELL BIOLOGY 82:309 (2007), the disclosure of which is incorporated herein by reference.
[0199] Microvesicles represent another potential vehicle that can be used to modify the genome of a target cell according to the methods described herein. For example, microvesicles that have been induced by the co-overexpression of the glycoprotein VSV-G with, e.g., a genome-modifying protein, such as a nuclease, can be used to efficiently deliver proteins into a cell that subsequently catalyze the site-specific cleavage of an endogenous polynucleotide sequence so as to prepare the genome of the cell for the covalent incorporation of a polynucleotide of interest, such as a gene or regulatory sequence. The use of such vesicles, also referred to as Gesicles, for the genetic modification of eukaryotic cells is described in detail, e.g., in Quinn et al., Genetic Modification of Target Cells by Direct Delivery of Active Protein [abstract]. In: Methylation changes in early embryonic genes in cancer [abstract], in: Proceedings of the 18th Annual Meeting of the American Society of Gene and Cell Therapy; 2015 May 13, Abstract No. 122.
VI. Nucleic Acid Vectors
[0200] Effective intracellular concentrations of a coding sequence (e.g., a gene) disclosed herein can be achieved via the stable expression of a vector encoding a coding sequence (e.g., by integration into the nuclear or mitochondrial genome of a mammalian cell). In order to introduce such a gene into a mammalian cell, the gene can be incorporated into a vector.
[0201] Vectors can be introduced into a cell by a variety of methods, including transformation, transfection, direct uptake, projectile bombardment, and by encapsulation of the vector in a liposome. Examples of suitable methods of transfecting or transforming cells are calcium phosphate precipitation, electroporation, microinjection, infection, lipofection, and direct uptake. Such methods are described in more detail, for example, in Green et al., Molecular Cloning: A Laboratory Manual, Fourth Edition (Cold Spring Harbor University Press, New York (2014)); and Ausubel et al., Current Protocols in Molecular Biology (John Wiley & Sons, New York (2015)), the disclosures of each of which are incorporated herein by reference.
[0202] The genes disclosed herein can also be introduced into a mammalian cell by targeting a vector containing a polynucleotide encoding such a gene to cell membrane phospholipids. For example, vectors can be targeted to the phospholipids on the extracellular surface of the cell membrane by linking the vector molecule to a VSV-G protein, a viral protein with affinity for all cell membrane phospholipids. Such a construct can be produced using conventional and routine methods of the art. In addition to achieving high rates of transcription and translation, stable expression of an exogenous polynucleotide in a mammalian cell can be achieved by integration of the polynucleotide containing the gene into the nuclear genome of the mammalian cell. A variety of vectors for the delivery and integration of polynucleotides encoding exogenous proteins into the nuclear DNA of a mammalian cell have been developed. Examples of expression vectors are disclosed in, e.g., WO 1994/011026 and are incorporated herein by reference. Expression vectors for use in the compositions and methods described herein contain a polynucleotide sequence that encodes a gene as well as, e.g., additional sequence elements used for the expression of these genes and/or the integration of these polynucleotide sequences into the genome of a mammalian cell. Certain vectors that can be used include plasmids that contain regulatory sequences, such as promoter and enhancer regions, which direct gene transcription. Other useful vectors contain polynucleotide sequences that enhance the rate of translation of these genes or improve the stability or nuclear export of the mRNA that results from gene transcription. These sequence elements include, e.g., 5' and 3' UTR regions, an IRES, and polyA in order to direct efficient transcription of the gene carried on the expression vector.
The expression vectors suitable for use with the compositions and methods described herein may also contain a polynucleotide encoding a marker for selection of cells that contain such a vector. Examples of a suitable marker are genes that encode resistance to antibiotics, such as ampicillin, chloramphenicol, kanamycin, and nourseothricin.
[0203] In some embodiments, any of the vectors disclosed herein are capable of inducing at least 20%, at least 50%, at least 100%, at least 150%, at least 200%, at least 250%, at least 300%, at least 400%, at least 500%, at least 700%, at least 900%, at least 1000%, at least 1100%, at least 1500%, or at least 2000% higher expression of CFTR, ATP2B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMR1, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, or SLC6A1 or a functional fragment or variant thereof in a target cell, as compared to the endogenous expression of CFTR, ATP2B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMR1, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, or SLC6A1, respectively, in the target cell.
[0204] In some embodiments, expression of any of the vectors disclosed herein in a target cell results in at least 20%, at least 50%, at least 100%, at least 150%, at least 200%, at least 250%, at least 300%, at least 400%, at least 500%, at least 700%, at least 900%, at least 1000%, at least 1100%, at least 1500%, or at least 2000% higher activity levels of CFTR, ATP2B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMR1, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, or SLC6A1 or a functional fragment or variant thereof in the target cell as compared to endogenous activity levels of CFTR, ATP2B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMR1, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, or SLC6A1, respectively, in the target cell.
[0205] In some embodiments, any of the vectors disclosed herein are capable of inducing at least 20%, at least 50%, at least 100%, at least 150%, at least 200%, at least 250%, at least 300%, at least 400%, at least 500%, at least 700%, at least 900%, at least 1000%, at least 1100%, at least 1500%, or at least 2000% higher expression of F8, F9, PIGA, SGSH, G6PC, NAGLU, CLN3, GBA, IDS, GAA, OTC, GLA, CAH, IDUA, LAMP2, CLN1, ATP7B, A1AT, GALT, EMNA, ENPP1, CLN2, CLN5, CLN7/MFSD8, AGU, MMUT, NPC2, ABCB11, ABCB4, ASS1, SMN1, AADC, MTM1, GBA1, GRN, GAD, GALGT2, SGCB, GDNF, ASPA, GLB1, GALC, SGCA, DYSF, HEXA, GAN, FXN, ARSA, MECP2, IGHMBP2, UBE3A, CDKL5, PGRN, FKRP, CYP46A1, OPMD, Cav1, neuropeptide Y/Y2, SCN1A, SHANK3, APOE2(R158C), FMR1, UPF1, CMT4J, MFN2, PRKN, CAPN3, NTF3, ANO5, SGCG, EMD, SURF1, GBE1, FMRP, RPE65, RPGR, CHM, ND4, CNGB3, PDE6b, CFI, CNGA3, GUCY2D, RLBP1, CD59, OPN1LW, CFH, MYO7A, RS1, ABCA4, ND1, BEST1, RHO, LCA5, RDH12, NMNAT1, SERPING1, AQP1, PPP1R1A, IL-1Ra, CFTR, OTOF, CLRN1, GJB2, ALPL, TMC1, STRC, ATOH1, or MYBPC3 or a functional fragment or variant thereof in a target cell, as compared to the endogenous expression of F8, F9, PIGA, SGSH, G6PC, NAGLU, CLN3, GBA, IDS, GAA, OTC, GLA, CAH, IDUA, LAMP2, CLN1, ATP7B, A1AT, GALT, EMNA, ENPP1, CLN2, CLN5, CLN7/MFSD8, AGU, MMUT, NPC2, ABCB11, ABCB4, ASS1, SMN1, AADC, MTM1, GBA1, GRN, GAD, GALGT2, SGCB, GDNF, ASPA, GLB1, GALC, SGCA, DYSF, HEXA, GAN, FXN, ARSA, MECP2, IGHMBP2, UBE3A, CDKL5, PGRN, FKRP, CYP46A1, OPMD, Cav1, neuropeptide Y/Y2, SCN1A, SHANK3, APOE2(R158C), FMR1, UPF1, CMT4J, MFN2, PRKN, CAPN3, NTF3, ANO5, SGCG, EMD, SURF1, GBE1, FMRP, RPE65, RPGR, CHM, ND4, CNGB3, PDE6b, CFI, CNGA3, GUCY2D, RLBP1, CD59, OPN1LW, CFH, MYO7A, RS1, ABCA4, ND1, BEST1, RHO, LCA5, RDH12, NMNAT1, SERPING1, AQP1, PPP1R1A, IL-1Ra, CFTR, OTOF, CLRN1, GJB2, ALPL, TMC1, STRC, ATOH1, or MYBPC3, respectively, in the target cell.
[0206] In some embodiments, expression of any of the vectors disclosed herein in a target cell results in at least 20%, at least 50%, at least 100%, at least 150%, at least 200%, at least 250%, at least 300%, at least 400%, at least 500%, at least 700%, at least 900%, at least 1000%, at least 1100%, at least 1500%, or at least 2000% higher activity levels of F8, F9, PIGA, SGSH, G6PC, NAGLU, CLN3, GBA, IDS, GAA, OTC, GLA, CAH, IDUA, LAMP2, CLN1, ATP7B, A1AT, GALT, EMNA, ENPP1, CLN2, CLN5, CLN7/MFSD8, AGU, MMUT, NPC2, ABCB11, ABCB4, ASS1, SMN1, AADC, MTM1, GBA1, GRN, GAD, GALGT2, SGCB, GDNF, ASPA, GLB1, GALC, SGCA, DYSF, HEXA, GAN, FXN, ARSA, MECP2, IGHMBP2, UBE3A, CDKL5, PGRN, FKRP, CYP46A1, OPMD, Cav1, neuropeptide Y/Y2, SCN1A, SHANK3, APOE2(R158C), FMR1, UPF1, CMT4J, MFN2, PRKN, CAPN3, NTF3, ANO5, SGCG, EMD, SURF1, GBE1, FMRP, RPE65, RPGR, CHM, ND4, CNGB3, PDE6b, CFI, CNGA3, GUCY2D, RLBP1, CD59, OPN1LW, CFH, MYO7A, RS1, ABCA4, ND1, BEST1, RHO, LCA5, RDH12, NMNAT1, SERPING1, AQP1, PPP1R1A, IL-1Ra, CFTR, OTOF, CLRN1, GJB2, ALPL, TMC1, STRC, ATOH1, or MYBPC3 or a functional fragment or variant thereof in the target cell as compared to endogenous activity levels of F8, F9, PIGA, SGSH, G6PC, NAGLU, CLN3, GBA, IDS, GAA, OTC, GLA, CAH, IDUA, LAMP2, CLN1, ATP7B, A1AT, GALT, EMNA, ENPP1, CLN2, CLN5, CLN7/MFSD8, AGU, MMUT, NPC2, ABCB11, ABCB4, ASS1, SMN1, AADC, MTM1, GBA1, GRN, GAD, GALGT2, SGCB, GDNF, ASPA, GLB1, GALC, SGCA, DYSF, HEXA, GAN, FXN, ARSA, MECP2, IGHMBP2, UBE3A, CDKL5, PGRN, FKRP, CYP46A1, OPMD, Cav1, neuropeptide Y/Y2, SCN1A, SHANK3, APOE2(R158C), FMR1, UPF1, CMT4J, MFN2, PRKN, CAPN3, NTF3, ANO5, SGCG, EMD, SURF1, GBE1, FMRP, RPE65, RPGR, CHM, ND4, CNGB3, PDE6b, CFI, CNGA3, GUCY2D, RLBP1, CD59, OPN1LW, CFH, MYO7A, RS1, ABCA4, ND1, BEST1, RHO, LCA5, RDH12, NMNAT1, SERPING1, AQP1, PPP1R1A, IL-1Ra, CFTR, OTOF, CLRN1, GJB2, ALPL, TMC1, STRC, ATOH1, or MYBPC3, respectively, in the target cell.
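By way of illustration only, the "percent higher" comparisons recited in the preceding paragraphs can be read as a relative increase over the endogenous level. The following is a minimal sketch under that reading; the function names and the example values are hypothetical and are not taken from the disclosure, and no particular assay or unit is implied.

```python
# Illustrative sketch: relative increase of a measured level over the
# endogenous level, compared against the recited percentage thresholds.
# Function names and inputs are hypothetical.

THRESHOLDS = (20, 50, 100, 150, 200, 250, 300, 400, 500,
              700, 900, 1000, 1100, 1500, 2000)


def percent_increase(vector_level: float, endogenous_level: float) -> float:
    """Return how much higher vector_level is than endogenous_level, in percent."""
    if endogenous_level <= 0:
        raise ValueError("endogenous level must be positive")
    return (vector_level - endogenous_level) / endogenous_level * 100.0


def thresholds_met(vector_level: float, endogenous_level: float) -> list:
    """List every recited 'at least N% higher' threshold that is satisfied."""
    increase = percent_increase(vector_level, endogenous_level)
    return [t for t in THRESHOLDS if increase >= t]
```

Under this reading, a level three times the endogenous level corresponds to a 200% increase, so the thresholds up to and including "at least 200%" would be satisfied.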
VII. Viral Vectors
[0207] Viral genomes provide a rich source of vectors that can be used for the efficient delivery of exogenous polynucleotides into a mammalian cell. Viral genomes are particularly useful vectors for gene delivery as the polynucleotides contained within such genomes are typically incorporated into the nuclear genome of a mammalian cell by generalized or specialized transduction. These processes occur as part of the natural viral replication cycle, and do not require added proteins or reagents in order to induce gene integration. Examples of viral vectors are a parvovirus (e.g., AAV), retrovirus (e.g., Retroviridae family viral vector), adenovirus (e.g., Ad5, Ad26, Ad34, Ad35, and Ad48), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g., measles and Sendai), positive strand RNA viruses, such as picornavirus and alphavirus, and double stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, modified vaccinia Ankara (MVA), fowlpox, and canarypox). Other viruses include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, human papilloma virus, human foamy virus, and hepatitis virus, for example. Examples of retroviruses are avian leukosis-sarcoma, avian C-type viruses, mammalian C-type, B-type viruses, D-type viruses, oncoretroviruses, HTLV-BLV group, lentivirus, alpharetrovirus, gammaretrovirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, Virology, Third Edition (Lippincott-Raven, Philadelphia, (1996))).
Other examples are murine leukemia viruses, murine sarcoma viruses, murine mammary tumor virus, bovine leukemia virus, feline leukemia virus, feline sarcoma virus, avian leukemia virus, human T-cell leukemia virus, baboon endogenous virus, Gibbon ape leukemia virus, Mason-Pfizer monkey virus, simian immunodeficiency virus, simian sarcoma virus, Rous sarcoma virus, and lentiviruses. Other examples of vectors are described, for example, in McVey et al., (U.S. Patent No. 5,801,030), the teachings of which are incorporated herein by reference.
A. Regulatory Sequences
[0208] A nucleic acid of the disclosure may be operably linked to a regulatory sequence. For example, in some embodiments, regulatory sequences are operably linked to a transgene including a heterologous coding sequence or a functional fragment or variant thereof. The regulatory sequences may include conventional control elements which permit the coding sequence’s transcription, translation, and/or expression in a cell transfected with the vector or infected with the virus produced by the disclosure.
[0209] The regulatory sequences useful in the constructs of the present disclosure may include an intron, such as an intron located between the compact bidirectional promoter and the coding sequence. In some embodiments, the intron sequence is derived from SV-40 and is a 100 bp mini-intron splice donor/splice acceptor referred to as SD-SA.
[0210] In some embodiments, a vector of the disclosure may include a woodchuck hepatitis virus post-transcriptional element. (See, e.g., L. Wang and I. Verma, 1999 PROC. NATL. ACAD. SCI., USA, 96:3906-3910).
[0211] In some embodiments, a vector of the disclosure may include a polyA signal, such as a polyA signal derived from many suitable species, including, without limitation SV-40, human, and bovine.
[0212] Another regulatory component of the rAAV useful in the method of the disclosure is an IRES. An IRES sequence, or other suitable systems, may be used to produce more than one polypeptide from a single gene transcript (for example, to produce more polypeptides). An IRES may be used to produce a protein that contains more than one polypeptide chain or to express two different proteins from or within the same cell. In some embodiments, the IRES is located 3' to the transgene in the rAAV vector.
[0213] Other regulatory sequences useful in the vectors of the disclosure include enhancer sequences. Enhancer sequences useful in the disclosure include the IRBP enhancer, immediate early cytomegalovirus enhancer, an enhancer derived from an immunoglobulin gene, an enhancer derived from the SV40 enhancer, or an enhancer identified in a cis-acting element in a mouse proximal promoter.
[0214] Selection of these and other common vector and regulatory elements is well known in the art and many such sequences are available (see, e.g., Sambrook et al., and references cited therein at, for example, pages 3.18-3.26 and 16.17-16.27 and Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1989).
[0215] A vector herein may also contain a reporter sequence for co-expression, such as but not limited to lacZ, GFP, CFP, YFP, RFP, mCherry, and tdTomato. In some embodiments, the rAAV vector may include a selectable marker.
B. AAV Vectors
[0216] Genes described herein can be incorporated into rAAV vectors in order to facilitate their introduction into a cell, such as a target cell. rAAV vectors useful in conjunction with the compositions and methods described herein include recombinant nucleic acid constructs that contain (1) a gene and (2) nucleic acids that facilitate expression of the heterologous genes. The viral nucleic acids may include those sequences of AAV that are required in cis for replication and packaging (e.g., functional ITRs) of the DNA into a virion. Such rAAV vectors may also contain marker or reporter genes.
[0217] Useful rAAV vectors include those that have one or more of the naturally occurring AAV genes deleted in whole or in part but that retain functional flanking ITR sequences. The AAV ITRs may be of any serotype suitable for a particular application. Methods for using rAAV vectors are described, for example, in Tai et al., J. BIOMED. SCI. 7:279-291 (2000), and Monahan and Samulski, GENE DELIVERY 7:24-30 (2000), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
[0218] In some embodiments, the AAV includes two ITRs.
[0219] The genes described herein can be incorporated into a rAAV virion in order to facilitate introduction of the nucleic acid or vector into a cell. The capsid proteins of AAV compose the exterior, non-nucleic acid portion of the virion and are encoded by the AAV Cap gene. The Cap gene encodes three viral coat proteins, VP1, VP2 and VP3, which are required for virion assembly. The construction of rAAV virions has been described, for example, in US Patent Nos. 5,173,414; 5,139,941; 5,863,541; 5,869,305; 6,057,152; and 6,376,237; as well as in Rabinowitz et al., J. VIROL. 76:791-801 (2002) and Bowles et al., J. VIROL. 77:423-432 (2003), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
[0220] In some embodiments, the recombinant AAV vector, including rep sequences, cap sequences, and helper functions required for producing the rAAV of the disclosure may be delivered to the packaging host cell using any appropriate genetic element (e.g., vector). In some embodiments, a single nucleic acid encoding all three capsid proteins (e.g., VP1, VP2 and VP3) is delivered into the packaging host cell in a single vector. In some embodiments, nucleic acids encoding the capsid proteins are delivered into the packaging host cell by two vectors: a first vector including a first nucleic acid encoding two capsid proteins (e.g., VP1 and VP2) and a second vector including a second nucleic acid encoding a single capsid protein (e.g., VP3). In some embodiments, three vectors, each including a nucleic acid encoding a different capsid protein, are delivered to the packaging host cell. The selected genetic element may be delivered by any suitable method, including those described herein. The methods used to construct any embodiment of this disclosure are known to those with skill in nucleic acid manipulation and include genetic engineering, recombinant engineering, and synthetic techniques. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. Similarly, methods of generating rAAV virions are well known and the selection of a suitable method is not a limitation on the present disclosure. See, e.g., K. Fisher et al., J. VIROL., 70:520-532 (1993) and U.S. Pat. No. 5,478,745. These publications are incorporated by reference herein.
[0221] rAAV virions useful in conjunction with the compositions and methods described herein include those derived from a variety of AAV serotypes including AAV 1, 2, 3, 4, 5, 6, 7, 8, and 9. Construction and use of AAV vectors and AAV proteins of different serotypes are described, for example, in Chao et al., MOL. THER. 2:619-623 (2000); Davidson et al., PROC. NATL. ACAD. SCI. USA 97:3428-3432 (2000); Xiao et al., J. VIROL. 72:2224-2232 (1998); Halbert et al., J. VIROL. 74:1524-1532 (2000); Halbert et al., J. VIROL. 75:6615-6624 (2001); and Auricchio et al., HUM. MOLEC. GENET. 10:3075-3081 (2001), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
[0222] Also useful in conjunction with the compositions and methods described herein are pseudotyped rAAV vectors. Pseudotyped vectors include AAV vectors of a given serotype pseudotyped with a capsid gene derived from a serotype other than the given serotype (e.g., AAV1, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, or AAV9, among others). For example, a representative pseudotyped vector is an AAV2 vector encoding a therapeutic protein pseudotyped with a capsid gene derived from AAV serotype 8 or AAV serotype 9. Techniques involving the construction and use of pseudotyped rAAV virions are known in the art and are described, for example, in Duan et al., J. VIROL. 75:7662-7671 (2001); Halbert et al., J. VIROL. 74:1524-1532 (2000); Zolotukhin et al., METHODS, 28:158-167 (2002); and Auricchio et al., HUM. MOLEC. GENET., 10:3075-3081 (2001).
[0223] AAV virions that have mutations within the virion capsid may be used to infect particular cell types more effectively than non-mutated capsid virions. For example, suitable AAV mutants may have ligand insertion mutations for the facilitation of targeting AAV to specific cell types. The construction and characterization of AAV capsid mutants including insertion mutants, alanine screening mutants, and epitope tag mutants is described in Wu et al., J. VIROL. 74:8635-45 (2000).
[0224] As used herein, artificial AAV capsids may be used. Such an artificial capsid may be generated by any suitable technique using a selected AAV sequence (e.g., a fragment of a VP1 capsid protein) in combination with heterologous sequences which may be obtained from a different selected AAV serotype, non-contiguous portions of the same AAV serotype, from a non-AAV viral source, or from a non-viral source. An artificial AAV serotype may be, without limitation, a pseudotyped AAV, a chimeric AAV capsid, a recombinant AAV capsid, or a “humanized” AAV capsid.
[0225] Other rAAV virions that can be used in methods of the invention include those capsid hybrids that are generated by molecular breeding of viruses as well as by exon shuffling. See, e.g., Soong et al., NAT. GENET., 25:436-439 (2000); and Kolman and Stemmer, Nat. Biotechnol. 19:423-428 (2001).
[0226] In some embodiments, the capsid is modified to improve therapy. The capsid may be modified using conventional molecular biology techniques. For example, in some embodiments, the capsid is modified for minimized immunogenicity, better stability and particle lifetime, efficient degradation, and/or accurate delivery of the heterologous coding sequence or a functional fragment or variant thereof to the nucleus. In some embodiments, the modification or mutation is an amino acid deletion, insertion, substitution, or any combination thereof in a capsid protein. A modified polypeptide may include 1, 2, 3, 4, 5, up to 10, or more amino acid substitutions and/or deletions and/or insertions. In some embodiments, one or more amino acid substitutions are introduced into one or more of VP1, VP2, and VP3. In one aspect, a modified capsid protein includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 conservative or nonconservative substitutions relative to the wild-type polypeptide.
[0227] In another aspect, the modified capsid polypeptide of the disclosure includes modified sequences, wherein such modifications can include both conservative and non-conservative substitutions, deletions, and/or additions, and typically include peptides that share at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 87%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to the corresponding wild-type capsid protein.
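As a purely illustrative aid, percent sequence identity between a modified capsid sequence and a wild-type sequence can be computed from a pairwise alignment. The sketch below assumes the two sequences have already been aligned (gaps written as "-") and counts identity over all aligned columns; this denominator is one common convention and is an assumption here, as the disclosure does not specify one. The function names and example strings are hypothetical.

```python
# Illustrative sketch: percent identity over a pre-computed pairwise alignment.
# The alignment itself (e.g., from a standard alignment tool) is assumed as input.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the identity is at least the given threshold (e.g., 60 or 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned ten-residue sequences that differ at a single position share 90% identity under this convention and would satisfy an "at least 90%" threshold but not an "at least 91%" threshold.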
[0228] In some embodiments, the vector includes a “stuffer” or “filler” sequence to bring the total size of the nucleic acid sequence between the two ITRs to between 2 and 5 kb. For example, in some embodiments, any of the vectors disclosed herein may include a spacer, e.g., a DNA sequence interposed between the promoter and the Rep gene ATG start site. In some embodiments, the spacer may be a random sequence of nucleotides, or alternatively, it may encode a gene product, such as a marker gene. In some embodiments, the spacer may contain genes which typically incorporate start/stop and poly A sites. In some embodiments, the spacer may be a non-coding DNA sequence from a prokaryote or eukaryote, a repetitive non-coding sequence, a coding sequence without transcriptional controls or a coding sequence with transcriptional controls. In some embodiments, the spacer is a phage ladder sequence or a yeast ladder sequence. In some embodiments, the spacer is of a size sufficient to reduce expression of the Rep78 and Rep68 gene products, leaving the Rep52, Rep40 and Cap gene products expressed at normal levels. In some embodiments, the length of the spacer may therefore range from about 10 bp to about 10.0 kbp, such as in the range of about 100 bp to about 8.0 kbp. In some embodiments, the spacer is less than 2 kbp in length.
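For illustration, the amount of stuffer sequence needed to bring the ITR-to-ITR insert into the 2 to 5 kb window mentioned above can be estimated with simple arithmetic. The sketch below is hypothetical: it treats the window bounds as exactly 2000 bp and 5000 bp, measures the cassette without the ITRs, and uses invented names.

```python
# Illustrative sketch: range of stuffer lengths that places the sequence
# between the two ITRs within an assumed 2000-5000 bp window.
from typing import Optional, Tuple

MIN_INSERT_BP = 2000  # lower bound of the window, taken as exactly 2 kb
MAX_INSERT_BP = 5000  # upper bound of the window, taken as exactly 5 kb


def stuffer_range(cassette_bp: int) -> Optional[Tuple[int, int]]:
    """Return (minimum, maximum) stuffer length in bp, or None if the
    cassette alone already exceeds the upper bound."""
    if cassette_bp < 0:
        raise ValueError("cassette length must be non-negative")
    if cassette_bp > MAX_INSERT_BP:
        return None
    minimum = max(0, MIN_INSERT_BP - cassette_bp)
    maximum = MAX_INSERT_BP - cassette_bp
    return (minimum, maximum)
```

Under these assumptions, a 1500 bp cassette would call for between 500 bp and 3500 bp of stuffer, whereas a 3000 bp cassette needs none but could accommodate up to 2000 bp.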
[0229] The rAAV vector may also contain additional sequences, for example, from an adenovirus, which assist in effecting a desired function for the vector. Such sequences include, for example, those which assist in packaging the rAAV vector in adenovirus-associated virus particles.
[0230] rAAV vectors useful in the methods of the disclosure are further described in PCT publication No. WO 2015/168666 and PCT publication no. WO 2014/011210, the contents of which are incorporated by reference herein.
[0231] In some embodiments, the rAAV particle is a single stranded AAV (ssAAV). Accordingly, in some embodiments, the compact bidirectional promoters described herein allow for the use of ssAAV vectors with genes previously thought to be too large to fit into an ssAAV (FIG. 2). Alternatively, for example, in some embodiments, the rAAV particle is a self-complementary AAV (sc-AAV) (see e.g., US 2012/0141422 which is incorporated herein by reference). Self-complementary vectors package an inverted repeat genome that can fold into dsDNA without the requirement for DNA synthesis or base-pairing between multiple vector genomes. Because scAAV have no need to convert the single-stranded DNA (ssDNA) genome into double-stranded DNA (dsDNA) prior to expression, they are more efficient vectors. However, the trade-off for this efficiency is the loss of half the coding capacity of the vector. scAAV are useful for small protein-coding genes (e.g., up to about 1.7 kb) and any currently available RNA-based therapy.
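The capacity trade-off described above can be illustrated with a small calculation. The sketch below assumes a nominal ssAAV packaging limit of 4700 bp, which is a commonly cited approximation and is not a value stated in this disclosure, and models the scAAV limit as half of that figure. All names are hypothetical.

```python
# Illustrative sketch: which genome formats could hold a cassette of a given
# size, assuming scAAV has half the coding capacity of ssAAV.
from typing import List

SS_CAPACITY_BP = 4700                 # ASSUMPTION: nominal ssAAV limit, not from the disclosure
SC_CAPACITY_BP = SS_CAPACITY_BP // 2  # half the capacity, per the trade-off described above


def compatible_formats(cassette_bp: int) -> List[str]:
    """Return the genome formats whose assumed capacity fits the cassette."""
    formats = []
    if cassette_bp <= SS_CAPACITY_BP:
        formats.append("ssAAV")
    if cassette_bp <= SC_CAPACITY_BP:
        formats.append("scAAV")
    return formats
```

With these assumed limits, a 1.7 kb cassette fits either format, a 4 kb cassette fits only ssAAV, and a 6 kb cassette fits neither, which is the situation that a more compact promoter is intended to relieve.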
(i) Production of rAAV vectors
[0232] Numerous methods are known in the art for production of rAAV vectors, including transfection, stable cell line production, and infectious hybrid virus production systems which include adenovirus-AAV hybrids, herpesvirus-AAV hybrids (Conway, JE et al., (1997). VIROLOGY 71(11):8780-8789) and baculovirus-AAV hybrids. rAAV production cultures for the production of rAAV virus particles all require: 1) suitable host cells, including, for example, human-derived cell lines such as HeLa, A549, or 293 cells, or insect-derived cell lines such as SF-9, in the case of baculovirus production systems; 2) suitable helper virus function, provided by wild-type or mutant adenovirus (such as temperature sensitive adenovirus), herpes virus, baculovirus, or a plasmid construct providing helper functions; 3) AAV Rep and Cap genes and gene products; 4) a transgene (such as a transgene including a heterologous coding sequence (e.g., CFTR, ATP2B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMR1, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, and SLC6A1, or a functional fragment or variant thereof)) flanked by at least one AAV ITR sequence; and 5) suitable media and media components to support rAAV production. Suitable media known in the art may be used for the production of rAAV vectors. These media include, without limitation, media produced by Hyclone Laboratories and JRH including Modified Eagle Medium (MEM), Dulbecco’s Modified Eagle Medium (DMEM), custom formulations such as those described in U.S. Patent No. 6,566,118, and Sf-900 II SFM media as described in U.S. Patent No. 6,723,551, each of which is incorporated herein by reference in its entirety, particularly with respect to custom media formulations for use in production of recombinant AAV vectors.
[0233] The rAAV particles can be produced using methods known in the art. See, e.g., U.S. Pat. Nos. 6,566,118; 6,989,264; and 6,995,006. In practicing the disclosure, host cells for producing rAAV particles include mammalian cells, insect cells, plant cells, microorganisms, and yeast. Host cells can also be packaging cells in which the AAV Rep and Cap genes are stably maintained in the host cell or producer cells in which the AAV vector genome is stably maintained. Exemplary packaging and producer cells are derived from 293, A549, or HeLa cells. AAV vectors are purified and formulated using standard techniques known in the art.
[0234] Recombinant AAV particles are generated by transfecting producer cells with a plasmid (cis-plasmid) containing a rAAV genome including a transgene flanked by the 145 nucleotide-long AAV ITRs and a separate construct expressing the AAV Rep and Cap genes in trans. In addition, adenovirus helper factors such as E1A, E1B, E2A, E4ORF6, and VA RNAs may be provided by either adenovirus infection or by transfecting a third plasmid providing adenovirus helper genes into the producer cells. Producer cells may be HEK293 cells. Packaging cell lines suitable for producing AAV vectors may be readily generated using readily available techniques (see e.g., U.S. Pat. No. 5,872,005). The helper factors provided will vary depending on the producer cells used and whether the producer cells already carry some of these helper factors.
[0235] In some embodiments, rAAV particles may be produced by a triple transfection method, such as the exemplary triple transfection method provided infra. Briefly, a plasmid containing a Rep gene and a Cap gene, along with a helper adenoviral plasmid, may be transfected (e.g., using the calcium phosphate method) into a cell line (e.g., HEK-293 cells), and virus may be collected and optionally purified.
[0236] In some embodiments, rAAV particles may be produced by a producer cell line method, such as the exemplary producer cell line method provided infra (see also Martin et al., (2013) HUMAN GENE THERAPY METHODS 24:253-269). Briefly, a cell line (e.g., a HeLa cell line) may be stably transfected with a plasmid containing a Rep gene, a Cap gene, and a promoter-transgene sequence. Cell lines may be screened to select a lead clone for rAAV production, which may then be expanded to a production bioreactor and infected with an adenovirus (e.g., a wild-type adenovirus) as helper to initiate rAAV production. Virus may subsequently be harvested, adenovirus may be inactivated (e.g., by heat) and/or removed, and the rAAV particles may be purified.
[0237] In some aspects, a method is provided for producing any rAAV particle as disclosed herein including: (a) culturing a host cell under a condition that rAAV particles are produced, wherein the host cell includes (i) one or more AAV packaging genes, wherein each said AAV packaging gene encodes an AAV replication and/or encapsidation protein; (ii) a rAAV provector including a nucleic acid encoding a therapeutic polypeptide and/or nucleic acid as described herein flanked by at least one AAV ITR; and (iii) an AAV helper function; and (b) recovering the rAAV particles produced by the host cell. In some embodiments, said at least one AAV ITR is selected from the group consisting of AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAVrh8, AAVrh8R, AAV9, AAV10, AAVrh10, AAV11, AAV12, AAV2R471A, AAV DJ, a goat AAV, bovine AAV, or mouse AAV or the like. In some embodiments, the encapsidation protein is an AAV2 encapsidation protein.
[0238] Suitable rAAV production culture media of the present disclosure may be supplemented with serum or serum-derived recombinant proteins at a level of 0.5%-20% (v/v or w/v). Alternatively, as is known in the art, rAAV vectors may be produced in serum-free conditions which may also be referred to as media with no animal-derived products. One of ordinary skill in the art may appreciate that commercial or custom media designed to support production of rAAV vectors may also be supplemented with one or more cell culture components known in the art, including without limitation glucose, vitamins, amino acids, and/or growth factors, in order to increase the titer of rAAV in production cultures.
[0239] rAAV production cultures can be grown under a variety of conditions (over a wide temperature range, for varying lengths of time, and the like) suitable to the particular host cell being utilized. As is known in the art, rAAV production cultures include attachment-dependent cultures which can be cultured in suitable attachment-dependent vessels such as, for example, roller bottles, hollow fiber filters, microcarriers, and packed-bed or fluidized-bed bioreactors. rAAV vector production cultures may also include suspension-adapted host cells such as HeLa, 293, and SF-9 cells which can be cultured in a variety of ways including, for example, spinner flasks, stirred tank bioreactors, and disposable systems such as the Wave bag system.
[0240] rAAV vector particles of the disclosure may be harvested from rAAV production cultures by lysis of the host cells of the production culture or by harvest of the spent media from the production culture, provided the cells are cultured under conditions known in the art to cause release of rAAV particles into the media from intact cells, as described more fully in U.S. Patent No. 6,566,118. Suitable methods of lysing cells are also known in the art and include, for example, multiple freeze/thaw cycles, sonication, microfluidization, and treatment with chemicals, such as detergents and/or proteases.
[0241] In a further embodiment, the rAAV particles are purified. The term “purified” as used herein includes a preparation of rAAV particles devoid of at least some of the other components that may also be present where the rAAV particles naturally occur or are initially prepared from. Thus, for example, isolated rAAV particles may be prepared using a purification technique to enrich them from a source mixture, such as a culture lysate or production culture supernatant. Enrichment can be measured in a variety of ways, such as, for example, by the proportion of DNase-resistant particles (DRPs) or genome copies (gc) present in a solution, or by infectivity, or it can be measured in relation to a second, potentially interfering substance present in the source mixture, such as contaminants, including production culture contaminants or in-process contaminants, including helper virus, media components, and the like.
[0242] In some embodiments, the rAAV production culture harvest is clarified to remove host cell debris. In some embodiments, the production culture harvest is clarified by filtration through a series of depth filters including, for example, a grade D0HC Millipore Millistak+ HC Pod Filter, a grade A1HC Millipore Millistak+ HC Pod Filter, and a 0.2 μm Filter Opticap XL 10 Millipore Express SHC Hydrophilic Membrane filter. Clarification can also be achieved by a variety of other standard techniques known in the art, such as centrifugation or filtration through any cellulose acetate filter of 0.2 μm or greater pore size known in the art.
[0243] In some embodiments, the rAAV production culture harvest is further treated with Benzonase® to digest any high molecular weight DNA present in the production culture. In some embodiments, the Benzonase® digestion is performed under standard conditions known in the art including, for example, a final concentration of 1-2.5 units/mL of Benzonase® at a temperature ranging from ambient to 37 °C for a period of 30 minutes to several hours.
[0244] rAAV particles may be isolated or purified using one or more of the following purification steps: equilibrium centrifugation; flow-through anionic exchange filtration; tangential flow filtration (TFF) for concentrating the rAAV particles; rAAV capture by apatite chromatography; heat inactivation of helper virus; rAAV capture by hydrophobic interaction chromatography; buffer exchange by size exclusion chromatography (SEC); nanofiltration; and rAAV capture by anionic exchange chromatography, cationic exchange chromatography, or affinity chromatography. These steps may be used alone, in various combinations, or in different orders. In some embodiments, the method includes all the steps in the order as described below. Methods to purify rAAV particles are found, for example, in Xiao et al., (1998) Journal of Virology 72:2224-2232; U.S. Patent Numbers 6,989,264 and 8,137,948; and WO 2010/148143.
[0245] Cells may also be transfected with a vector (e.g., helper vector) which provides helper functions to the AAV. The vector providing helper functions may provide adenovirus functions, including, e.g., E1a, E1b, E2a, and E4ORF6. The sequences of adenovirus genes providing these functions may be obtained from any known adenovirus serotype, such as serotypes 2, 3, 4, 7, 12, and 40, and further including any of the presently identified human types known in the art. Thus, in some embodiments, the methods involve transfecting the cell with a vector expressing one or more genes necessary for AAV replication, AAV gene transcription, and/or AAV packaging.
[0246] In some embodiments, such a stable host cell will contain the required component(s) under the control of an inducible promoter. Alternatively, the required component(s) may be under the control of a constitutive promoter. In still another alternative, a selected stable host cell may contain selected component(s) under the control of a constitutive promoter and other selected component(s) under the control of one or more inducible promoters. For example, a stable host cell may be generated which is derived from 293 cells (which contain E1 helper functions under the control of a constitutive promoter), but which contains the Rep and/or Cap proteins under the control of inducible promoters. Still other stable host cells may be generated by one of skill in the art.
[0247] The minigene, Rep sequences, Cap sequences, and helper functions required for producing the rAAV of the disclosure may be delivered to the packaging host cell in the form of any genetic element which transfers the sequences. The selected genetic element may be delivered by any suitable method known in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, NY.
VIII. Pharmaceutical Compositions
[0248] The present disclosure provides, among other things, pharmaceutical compositions including a nucleic acid including a compact bidirectional promoter, or a functional fragment or variant thereof, as described herein and a heterologous coding sequence, or a functional fragment or variant thereof, and a pharmaceutically acceptable carrier. The pharmaceutical compositions may be suitable for any mode of administration described herein.
[0249] In some embodiments, the pharmaceutical composition including a nucleic acid described herein and a pharmaceutically acceptable carrier is suitable for administration to a human subject. Such carriers are well known in the art (see, e.g., Remington’s Pharmaceutical Sciences, 15th Edition, pp. 1035-1038 and 1570-1580). Such pharmaceutically acceptable carriers can be sterile liquids, such as water and oil, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, and the like. Saline solutions and aqueous dextrose, polyethylene glycol (PEG) and glycerol solutions can also be employed as liquid carriers, particularly for injectable solutions. The pharmaceutical composition may further include additional ingredients, for example preservatives, buffers, tonicity agents, antioxidants and stabilizers, nonionic wetting or clarifying agents, viscosity-increasing agents, and the like. The pharmaceutical compositions described herein can be packaged in single unit dosages or in multi-dosage forms. The compositions are generally formulated as sterile and substantially isotonic solutions.
[0250] For example, in some embodiments, the pharmaceutical compositions of the disclosure include a pharmaceutically acceptable carrier. For example, in some embodiments, the pharmaceutical compositions of the disclosure include PBS. In some embodiments, the pharmaceutical compositions of the disclosure include pluronic. In some embodiments, the pharmaceutical compositions of the disclosure include PBS, NaCl, and pluronic. In some embodiments, the vectors are administered by intravitreal injection in a solution of PBS, with additional NaCl and pluronic.
[0251] In one embodiment, the nucleic acid including the desired compact bidirectional promoter, or a functional fragment or variant thereof, as described herein and the desired heterologous coding sequence or a functional fragment or variant thereof for use in target cells, as detailed above, is formulated into a pharmaceutical composition intended for oral, inhalation, intranasal, intratracheal, intravenous, intramuscular, subcutaneous, intradermal, or other parenteral routes of administration. Such formulation involves the use of a pharmaceutically and/or physiologically acceptable vehicle or carrier, such as buffered saline or other buffers, e.g., HEPES, to maintain pH at appropriate physiological levels, and, optionally, other medicinal agents, pharmaceutical agents, stabilizing agents, buffers, carriers, adjuvants, or diluents. For injection, the carrier will typically be a liquid. Exemplary physiologically acceptable carriers include sterile, pyrogen-free water and sterile, pyrogen-free phosphate buffered saline. A variety of such known carriers are provided in U.S. Patent No. 7,629,322, incorporated herein by reference. In some embodiments, the carrier is an isotonic sodium chloride solution. In some embodiments, the carrier is balanced salt solution. In some embodiments, the carrier includes Tween. If the virus is to be stored long-term, it may be frozen in the presence of glycerol or Tween20. In some embodiments, the pharmaceutically acceptable carrier includes a surfactant, such as perfluorooctane (Perfluoron liquid). Routes of administration may be combined, if desired.
[0252] Pharmaceutical compositions useful in the methods of the disclosure are further described in PCT publication No. WO 2015/168666 and PCT publication No. WO 2014/011210, the contents of which are incorporated by reference herein.
IX. Methods of Treatment or Prophylaxis
[0253] Provided herein are various methods of preventing, treating, arresting progression of, or ameliorating diseases and disorders. Generally, the methods include administering to a subject, e.g., a mammalian subject, in need thereof, an effective amount of a composition including a vector described above (e.g., an rAAV), carrying a heterologous coding sequence or a functional fragment or variant thereof under the control of a compact bidirectional promoter and, optionally, regulatory sequences which express the product of the gene in target cells of a subject, and a pharmaceutically acceptable carrier. Any of the vectors, such as AAV (e.g., ssAAV or scAAV), described herein are useful in the methods described below.
[0254] The disclosure also provides a method of treating a subject having a disease, including the step of administering to the subject a vector of the disclosure.
[0255] In some embodiments, the disclosure provides a method of treating a subject having a disease as described herein, comprising the step of administering to the subject a vector of the disclosure. In some embodiments, the vector is administered at a dose between 2.5 x 10^10 vg/kg and 1.4 x 10^11 vg/kg. In some embodiments, the vectors are administered at a dose between 1.0 x 10^11 vg/kg and 1.5 x 10^13 vg/kg. In some embodiments, the vectors are administered at a dose between 1.0 x 10^11 vg/kg and 1.5 x 10^12 vg/kg. In some embodiments, the vectors are administered at a dose of about 1.4 x 10^12. In some embodiments, the vectors are administered at a dose of 1.4 x 10^12 vg/kg.
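As a worked illustration of the weight-normalized doses recited above, the following minimal Python sketch converts a dose in vg/kg into total vector genomes and checks whether a dose falls inside one of the recited ranges. The function names and the 20 kg body weight are hypothetical and are not taken from the disclosure; only the dose values come from the paragraph above.

```python
import math


def total_vector_genomes(dose_vg_per_kg: float, body_weight_kg: float) -> float:
    """Total vector genomes (vg) for a weight-normalized dose."""
    if dose_vg_per_kg <= 0 or body_weight_kg <= 0:
        raise ValueError("dose and body weight must be positive")
    return dose_vg_per_kg * body_weight_kg


def dose_in_range(dose_vg_per_kg: float, low: float, high: float) -> bool:
    """True if the dose lies within the inclusive range [low, high]."""
    return low <= dose_vg_per_kg <= high


# Dose recited above: 1.4 x 10^12 vg/kg. Body weight of 20 kg is hypothetical.
total = total_vector_genomes(1.4e12, 20.0)
print(f"total vector genomes: {total:.2e}")

# 1.4e12 vg/kg lies inside the 1.0e11 to 1.5e12 vg/kg range,
# but outside the 2.5e10 to 1.4e11 vg/kg range.
print(dose_in_range(1.4e12, 1.0e11, 1.5e12))
print(dose_in_range(1.4e12, 2.5e10, 1.4e11))
```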
[0256] In some embodiments, the pharmaceutical compositions of the disclosure comprise a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical compositions of the disclosure comprise PBS. In some embodiments, the pharmaceutical compositions of the disclosure comprise pluronic. In some embodiments, the pharmaceutical compositions of the disclosure comprise PBS, NaCl, and pluronic.
[0257] In some embodiments, any of the treatment and/or prophylactic methods disclosed herein are applied to a subject. In some embodiments, the subject is a mammal. In some embodiments, the subject is a human.
[0258] In some embodiments, the human is a newborn, an infant, child, pre-adolescent, adolescent, or adult.
A. Reduced Dosing Methods Using scAAV
[0259] It has additionally been discovered that the compact bidirectional promoters described herein allow for the use of scAAV vectors with genes previously thought to be too large to fit into an scAAV (see FIG. 3). scAAV vectors are about half the size of wild-type vectors and can package a double-stranded, hairpin-like genome that is self-complementary. (See, e.g., Wang et al. (2003) GENE THERAPY 10:2105-2111.) Because the genome is self-complementary, the vector is able to circumvent the single-stranded to double-stranded conversion that takes place for transcriptional activation to occur. This conversion is time-consuming, and causes a delay in expression of a transgene (e.g., a therapeutic coding sequence) which is disadvantageous for applications that require immediate activity. Use of scAAV vectors can reduce the amount of vector (i.e., the dosing) needed, thereby reducing toxicity which can be caused by large doses of AAV.
[0260] Accordingly, the disclosure also provides a method of administering an scAAV vector including a therapeutic coding sequence at a reduced dose for treating a disease treatable by the therapeutic coding sequence. For example, the method may include administering to a subject a scAAV including a compact bidirectional promoter operably linked to the therapeutic coding sequence, wherein the compact bidirectional promoter is less than about 1000 bp (e.g., less than about 800 bp, less than about 600 bp, less than about 400 bp, or less than about 200 bp) and is heterologous to the therapeutic coding sequence, wherein the scAAV vector is administered at a reduced dose as compared to the therapeutically effective dose for an ssAAV vector including the therapeutic coding sequence.
[0261] In some embodiments, the therapeutic coding sequence encodes a protein that is from about 450 amino acids to about 750 amino acids in size. For example, the therapeutic coding sequence can encode a protein from about 450 to about 550 amino acids, about 450 to about 650 amino acids, about 550 to about 650 amino acids, about 550 to about 750 amino acids, or about 650 to about 750 amino acids in size. In some embodiments, the therapeutic coding sequence comprises F8, F9, PIGA, SGSH, G6PC, NAGLU, CLN3, GBA, IDS, GAA, OTC, GLA, CAH, IDUA, LAMP2, CLN1, ATP7B, A1AT, GALT, LMNA, ENPP1, CLN2, CLN5, CLN7/MFSD8, AGU, MMUT, NPC2, ABCB11, ABCB4, ASS1, SMN1, AADC, MTM1, GBA1, GRN, GAD, GALGT2, SGCB, GDNF, ASPA, GLB1, GALC, SGCA, DYSF, HEXA, GAN, FXN, ARSA, MECP2, IGHMBP2, UBE3A, CDKL5, PGRN, FKRP, CYP46A1, OPMD, Cav1, neuropeptide Y/Y2, SCN1A, SHANK3, APOE2(R158C), FMR1, UPF1, CMT4J, MFN2, PRKN, CAPN3, NTF3, ANO5, SGCG, EMD, SURF1, GBE1, FMRP, RPE65, RPGR, CHM, ND4, CNGB3, PDE6b, CFI, CNGA3, GUCY2D, RLBP1, CD59, OPN1LW, CFH, MYO7A, RS1, ABCA4, ND1, BEST1, RHO, LCA5, RDH12, NMNAT1, SERPING1, AQP1, PPP1R1A, IL-1Ra, CFTR, OTOF, CLRN1, GJB2, ALPL, TMC1, STRC, ATOH1, or MYBPC3.
[0262] For example, in some embodiments, the reduced dose is between about 10-fold and about 600-fold (e.g., about 11-fold and about 550-fold, about 12-fold and about 500-fold, about 13-fold and about 400-fold, about 14-fold and about 300-fold, about 15-fold and about 200-fold, about 20-fold and about 100-fold, or about 50-fold) lower than the therapeutically effective dose for an ssAAV vector. In some embodiments, the reduced dose is between about 11-fold and about 550-fold lower than the therapeutically effective dose for an ssAAV vector.
[0263] In some embodiments, the reduced dose is between about 12-fold and about 500-fold lower than the therapeutically effective dose for an ssAAV vector. In some embodiments, the reduced dose is between about 13-fold and about 400-fold lower than the therapeutically effective dose for an ssAAV vector. In some embodiments, the reduced dose is between about 14-fold and about 300-fold lower than the therapeutically effective dose for an ssAAV vector. In some embodiments, the reduced dose is between about 15-fold and about 200-fold lower than the therapeutically effective dose for an ssAAV vector. In some embodiments, the reduced dose is between about 20-fold and about 100-fold lower than the therapeutically effective dose for an ssAAV vector. In some embodiments, the reduced dose is about 50-fold lower than the therapeutically effective dose for an ssAAV vector.
[0264] In some embodiments, the reduced dose is about 10-fold lower than the therapeutically effective dose for an ssAAV vector.
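The fold reductions recited above can be illustrated with a short calculation. The following Python sketch, offered only as an illustration under stated assumptions, converts a fold-reduction window into the corresponding scAAV dose window for a given ssAAV reference dose. The reference dose of 6 x 10^13 vg/kg and the function name are hypothetical; only the 10-fold and 600-fold bounds come from the paragraphs above.

```python
import math


def reduced_dose_window(ssaav_dose: float, min_fold: float, max_fold: float):
    """Return (lowest, highest) reduced dose for a fold-reduction window.

    A dose that is N-fold lower than the reference equals reference / N,
    so the largest fold reduction gives the lowest dose.
    """
    if ssaav_dose <= 0 or min_fold <= 0 or max_fold < min_fold:
        raise ValueError("expected positive dose and 0 < min_fold <= max_fold")
    return ssaav_dose / max_fold, ssaav_dose / min_fold


# Hypothetical ssAAV reference dose (vg/kg); not a value from the disclosure.
reference = 6.0e13
low, high = reduced_dose_window(reference, min_fold=10, max_fold=600)
print(f"reduced dose window: {low:.1e} to {high:.1e} vg/kg")
```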
X. Kits
[0265] Any of the vectors disclosed herein may be assembled into a pharmaceutical or diagnostic or research kit to facilitate their use in therapeutic, diagnostic or research applications. A kit may include one or more containers housing any of the vectors disclosed herein and instructions for use.
[0266] The kit may be designed to facilitate use of the methods described herein by researchers and can take many forms. Each of the compositions of the kit, where applicable, may be provided in liquid form (e.g., in solution), or in solid form (e.g., a dry powder). In certain cases, some of the compositions may be constitutable or otherwise processable (e.g., to an active form), for example, by the addition of a suitable solvent or other species (e.g., water or a cell culture medium), which may or may not be provided with the kit. As used herein, “instructions” can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape or DVD), internet, and/or web-based communications. The written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which instructions can also reflect approval by the agency of manufacture, use, or sale for animal administration.
EXAMPLES
[0267] The following Examples are merely illustrative and are not intended to limit the scope or content of the invention in any way.
Example 1. Identification of Bidirectional Promoters
[0268] This Example describes the identification of compact bidirectional promoters (see exemplary promoter in FIG. 1) from genomic databases.
[0269] A custom Python script was developed to identify bidirectional promoters from genomic annotation files (as outlined in FIG. 7). The steps below specify human annotations, but the script was also used to identify bidirectional promoters from other genome annotations and can similarly be applied to genome-wide transcription data files. First, the input data file was obtained: GRCh38_latest_genomic.gff, an annotated file of the GRCh38 genome, was used as the human input file, and GRCm39_vM27.gff3 was used for the mouse genome. The file was organized by chromosome, with each line pertaining to a region of interest in the genome, examples of which include genes, pseudogenes, and coding regions for protein-coding genes. The custom script iterated through every line in the file and stored the type of annotation. Once the relevant information had been stored from the input file, the genes were sorted by index on a per-chromosome basis. After sorting, the custom script identified regions in between transcription on the minus strand and transcription on the plus strand, defining the intervening region as a bidirectional promoter. Promoter boundaries can be further refined using the coding sequence (CDS) start for protein coding genes that are capable of expressing the at least one heterologous coding sequence in a target cell.
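The steps described above can be sketched in Python as follows. This is a minimal illustration and not the custom script of the disclosure: it assumes a GFF3-style tab-separated input, considers only rows whose feature type is "gene", and treats a minus-strand gene immediately followed by a plus-strand gene on the same chromosome as a divergent pair. The function names, the `max_len` parameter, and the sample records are hypothetical.

```python
from collections import defaultdict


def parse_genes(gff_lines):
    """Collect (start, end, strand, name) tuples per chromosome from 'gene' rows."""
    genes = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # keep only gene-level annotations in this sketch
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name") or attrs.get("ID", "")
        genes[cols[0]].append((int(cols[3]), int(cols[4]), cols[6], name))
    return genes


def find_bidirectional_promoters(genes, max_len=1000):
    """Return regions between a minus-strand gene and the next plus-strand gene."""
    hits = []
    for chrom, records in genes.items():
        records = sorted(records)  # sort by start coordinate per chromosome
        for left, right in zip(records, records[1:]):
            if left[2] == "-" and right[2] == "+":
                start, end = left[1] + 1, right[0] - 1
                length = end - start + 1
                if 0 < length <= max_len:  # overlapping genes give length <= 0
                    hits.append({"chrom": chrom, "start": start, "end": end,
                                 "length": length,
                                 "minus_gene": left[3], "plus_gene": right[3]})
    return hits


# Hypothetical annotation records (coordinates are 1-based, inclusive).
sample = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t500\t.\t-\t.\tID=gene1;Name=GENE_A",
    "chr1\tdemo\tgene\t701\t1500\t.\t+\t.\tID=gene2;Name=GENE_B",
    "chr1\tdemo\tgene\t5000\t6000\t.\t+\t.\tID=gene3;Name=GENE_C",
    "chr1\tdemo\tCDS\t750\t1400\t.\t+\t0\tID=cds1",
]
for hit in find_bidirectional_promoters(parse_genes(sample)):
    print(hit)
```

In this sketch only adjacent genes after sorting are compared, so nested or overlapping annotations would need additional handling, and refinement of boundaries by CDS start is not implemented.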
[0270] Using the approach, more than 9000 bidirectional promoters were identified, 332 of which had a length of no more than 1000 bp. 291 of these promoters were no more than 800 bp. 234 of these promoters were no more than 600 bp. 137 of these promoters were no more than 400 bp. 34 of these promoters were no more than 200 bp.
[0271] Compact bidirectional promoters identified using the method of Example 1 are provided at SEQ ID NOs: 1-800.
Example 2. Tissue expression of compact bidirectional promoters
[0272] Tissue expression for an exemplary compact bidirectional promoter identified in Example 1 was determined using expression databases for each protein coding gene flanking the bidirectional promoter. Specifically, tissue expression data was obtained using the Human Protein Atlas (HPA) and the Genotype-Tissue Expression (GTEx) databases. As shown in FIG. 4A and FIG. 4B, a compact bidirectional promoter flanked by COX15 and CUTC drives expression of CUTC in skin and tongue. Another exemplary bidirectional promoter is flanked by DYNLT2 and ERMARD, which drives expression of DYNLT2 in the testes. Yet another exemplary bidirectional promoter is flanked by BHMT2 and DMGDH, which shows the same tissue specificity in both orientations, which includes expression in both the kidney and liver.
[0273] Tissue expression for a number of compact bidirectional promoters identified in Example 1 was determined as above, and the exemplary expression profiles are provided in FIGs. 5A-H and FIGs. 11A-26. For example, FIGs. 5A-H provide a set of graphs depicting the unique liver-, hepatocyte-, neuronal-, kidney tubular-, skeletal muscle-, cerebral cortex-, retina-, and rod photoreceptor-specific expression profiles of compact bidirectional promoters of less than 300 bp identified in Example 1.
[0274] FIGs. 6A-6D are a set of graphs depicting cell sub-type expression profiles in the lung for four exemplary compact bidirectional promoters of the disclosure.
Example 3. Transgene expression driven by compact bidirectional promoters
[0275] This Example describes the characterization of a library of compact bidirectional promoters for their capacity to drive gene expression using luciferase reporters (e.g., Firefly luciferase and NANOLUC®) in cell lines. A normalized luciferase expression was quantified for compact bidirectional promoters of the disclosure and a benchmark against a control thymidine kinase (TK) promoter was determined.
[0276] Promoter expression activity was assessed using a luciferase reporter assay. Characterization of the luciferase assay was performed, for example, by co-transfecting cells with a plasmid encoding Firefly luciferase and with a plasmid encoding NANOLUC® reporters. The luciferase reporters were under transcriptional control of standard promoters (e.g., TK). A standard curve of the normalized luciferase signal (Firefly signal/NANOLUC® signal) was generated using a transfection ratio, such as the following exemplary transfection ratios: 90 ng Firefly:10 ng NANOLUC®, 99 ng Firefly:1 ng NANOLUC®, and 100 ng Firefly:0.1 ng NANOLUC®. Establishing such a ratiometric luciferase reporter assay allowed the determination of promoter expression activity without cross-signal interference.
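The ratiometric normalization described above (Firefly signal/NANOLUC® signal), benchmarked against a TK control, can be sketched as a short calculation. The following Python example is illustrative only: it assumes the Firefly reporter is driven by the promoter under test and NANOLUC® serves as the co-transfected normalizer, and all luminescence values and function names are hypothetical rather than data from the disclosure.

```python
def normalized_signal(firefly: float, nanoluc: float) -> float:
    """Firefly luminescence divided by NANOLUC luminescence for one well."""
    if nanoluc <= 0:
        raise ValueError("NANOLUC signal must be positive to normalize")
    return firefly / nanoluc


def mean_normalized(wells) -> float:
    """Mean normalized signal over replicate wells of (firefly, nanoluc) pairs."""
    ratios = [normalized_signal(f, n) for f, n in wells]
    return sum(ratios) / len(ratios)


def fold_over_control(sample_wells, control_wells) -> float:
    """Normalized activity of a candidate promoter relative to the control."""
    return mean_normalized(sample_wells) / mean_normalized(control_wells)


# Hypothetical replicate readings (relative light units), not measured data.
candidate_wells = [(8000.0, 1000.0), (12000.0, 1000.0)]
tk_wells = [(4000.0, 1000.0), (6000.0, 1000.0)]

print(fold_over_control(candidate_wells, tk_wells))  # values above 1 exceed TK
```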
[0277] Compact bidirectional promoters of the disclosure (e.g., any one or more of the promoters having the nucleic acid sequence of SEQ ID NOs: 1-800), including human MORN5 (“p387;” e.g., SEQ ID NOs. 221 and/or 621), human RPL9 (“p389;” e.g., SEQ ID NOs. 300 and/or 700), human NDUFB9 (“p390;” e.g., SEQ ID NOs. 339 and/or 739), human RPS28 (“p391;” e.g., SEQ ID NOs. 220 and/or 620), and human SLIRP (“p392;” e.g., SEQ ID NOs. 17 and/or 417), were evaluated for reporter expression in HeLa (FIG. 8), A549 (FIG. 9), and CFBE (FIG. 10) cell lines. Activity of the compact promoters, along with activity of a control H1 promoter (“p096”) and the standard TK promoter (“p322”), was plotted (FIGs. 8-10), showing that the strongest promoters exceed TK-controlled expression activity.
Example 4: In vivo Promoter Expression
[0278] This Example describes assessment of promoter activity and payload expression in vivo in mice. To demonstrate promoter activity, in vivo luminescence driven by the candidate promoter is examined. For example, a promoter-Luciferase reporter construct that is flanked by ITR sequences can be constructed, packaged into an AAV (e.g., scAAV), and delivered via intranasal administration to mice. Exemplary scAAV comprising a compact bidirectional promoter for testing include SEQ ID NOs. 812-818, having the CUTC promoter (SEQ ID NO. 80 and SEQ ID NO. 480 (e.g., SEQ ID NO. 812)), NDUFA7 promoter (SEQ ID NO. 220 and SEQ ID NO. 620 (e.g., SEQ ID NO. 813)), and NDUFB9 promoter (SEQ ID NO. 339 and SEQ ID NO. 739 (e.g., SEQ ID NOs. 814-818)), respectively, and each comprising a MTM1 heterologous coding sequence. A time course of in vivo luciferase imaging can provide a direct readout of promoter activity and transgene expression in specific tissues of the mice.
Cloning and AAV6 Virus Production
[0279] A luciferase-AAV reporter construct (e.g., luciferase-scAAV reporter constructs) including a compact bidirectional promoter of the disclosure is generated using a plasmid transfection method, as known in the art. At 8 weeks of age, a group of mice will receive, for example, a single 50 μl intranasal instillation of either 2 x 10^14 vg/kg AAV or sterile PBS.
In vivo Luciferase Activity
[0280] Mice are monitored for 32 weeks post-transfection to comprehensively assess peak luciferase expression and vector durability. In such an experiment, for example, mice can be injected intraperitoneally with 75 mg/kg D-luciferin in 100 μL of PBS and placed in a chamber of an imaging system under isoflurane anesthesia. 10 minutes post-injection, luminescent images can be acquired (Xenogen IVIS). In vivo luciferase expression enables following the kinetics of expression onset along with quantification of promoter activity without having to sacrifice the mice. A control vector driving luciferase expression from a control promoter (e.g., PGK1) can be used to compare tissue distribution and expression level. Tissue distribution can be examined over time to confirm that expression is not silenced as compared with the control promoter. As yet another demonstration of in vivo expression of a payload by a compact bidirectional promoter of the disclosure, relevant tissue samples (e.g., lungs, testes, and brain) from the mice may be collected and RT-qPCR or a Western Blot may be performed to validate gene and protein expression, respectively. For example, the kidney and liver of mice may be collected, and it may be determined that the genes BHMT2 and DMGDH, and their respectively encoded proteins, show elevated levels of expression following transfection with an AAV encoding a compact bidirectional promoter of the disclosure, as compared to control mice. Such experiments can be used to confirm in vivo payload expression.
INCORPORATION BY REFERENCE
[0281] The entire disclosure of each of the patent and scientific documents referred to herein is incorporated by reference for all purposes.
EQUIVALENTS
[0282] The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein. Scope of the invention is thus indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.

Claims

WHAT IS CLAIMED IS:
1. A nucleic acid comprising a compact bidirectional promoter, or a functional fragment or variant thereof, operably linked to at least one heterologous coding sequence, wherein the compact bidirectional promoter is less than about 1000 bp, and wherein the bidirectional promoter is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter.
2. The nucleic acid of claim 1, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 800 bp.
3. The nucleic acid of claim 1, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 600 bp.
4. The nucleic acid of claim 1, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 400 bp.
5. The nucleic acid of claim 1, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 200 bp.
6. The nucleic acid of claim 1, wherein the compact bidirectional promoter comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 1-800, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
7. The nucleic acid of any one of claims 1-6, wherein the compact promoter, or the functional fragment or the variant thereof, is operably linked to a 5' untranslated region (UTR).
8. The nucleic acid of any one of claims 1-7, wherein the compact promoter, or a functional fragment or variant thereof, is operably linked to a Kozak consensus sequence.
9. The nucleic acid of any one of claims 1-8, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, comprises at least 95%, at least 98%, at least 99%, at least 99.5% or 100% sequence identity to a naturally occurring mammalian promoter.
10. The nucleic acid of any one of claims 1-9, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is operably linked to only one heterologous coding sequence.
11. The nucleic acid of any one of claims 1-9, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is operably linked to two heterologous coding sequences positioned on opposite sides of the promoter.
12. The nucleic acid of claim 11, wherein the two heterologous coding sequences comprise the same coding sequence.
13. The nucleic acid of claim 11, wherein the two heterologous coding sequences comprise different coding sequences.
14. The nucleic acid of any one of claims 1-13, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is capable of expressing the at least one heterologous coding sequence in a target cell.
15. The nucleic acid of claim 14, wherein the target cell is a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell.
16. The nucleic acid of any one of claims 11-13, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is capable of expressing each of the two heterologous coding sequences:
(a) in the same target cell or cells,
(b) in different target cells, or
(c) in a partially overlapping set of target cells.
17. The nucleic acid of any one of claims 1-16, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is capable of expressing a luciferase reporter at a higher level than is a HSV thymidine kinase (TK) promoter.
18. The nucleic acid of any one of claims 1-17, wherein the at least one coding sequence encodes cystic fibrosis transmembrane conductance regulator (CFTR), ATP7B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMRI, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, SLC6A1, or a functional fragment or variant thereof.
19. The nucleic acid of claim 18, wherein the at least one coding sequence is codon optimized, optionally wherein the codon optimized coding sequence comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 819-836, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
20. An expression construct comprising the nucleic acid of any one of claims 1-19.
21. A vector comprising the expression construct of claim 20, optionally wherein the vector is a plasmid, a DNA vector, an RNA vector, a virion, or a viral vector.
22. The vector of claim 21, wherein the vector is a viral vector.
23. The viral vector of claim 22, wherein the viral vector is an adeno-associated virus (AAV), lentivirus, adenovirus, simian virus 40, vaccinia virus, measles virus, herpes virus, or poxvirus.
24. The vector of claim 23, wherein the viral vector is an AAV vector.
25. The vector of claim 24, wherein the AAV is a single-stranded AAV (ssAAV) vector.
26. The vector of claim 25, wherein the AAV is a self-complementary AAV (scAAV) vector.
27. A method of expressing a heterologous coding sequence in a cell, the method comprising transfecting the cell with the expression construct of claim 20 or the vector of any one of claims 21-26.
28. A method of treating a disease in a subject in need thereof, the method comprising administering to the subject the vector of any one of claims 21-26.
29. A method of expressing at least one heterologous coding sequence in a target cell, the method comprising introducing into a subject a nucleic acid comprising a compact bidirectional promoter, or a functional fragment or variant thereof, operably linked to at least one heterologous coding sequence, wherein the compact bidirectional promoter is less than about 1000 bp, and wherein the bidirectional promoter is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter in the cell.
30. The method of claim 29, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 800 bp.
31. The method of claim 29, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 600 bp.
32. The method of claim 29, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 400 bp.
33. The method of claim 29, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 200 bp.
34. The method of claim 29, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is operably linked to a 5' UTR.
35. The method of claim 29, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is operably linked to a Kozak consensus sequence.
36. The method of claim 29, wherein the compact bidirectional promoter comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 1-800, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
37. The method of any one of claims 29-36, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, comprises at least 95%, at least 98%, at least 99%, at least 99.5% or 100% sequence identity to a naturally occurring mammalian promoter.
38. The method of any one of claims 29-37, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is operably linked to only one heterologous coding sequence.
39. The method of any one of claims 29-37, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is operably linked to two heterologous coding sequences positioned on opposite sides of the promoter.
40. The method of claim 39, wherein the two heterologous coding sequences comprise the same coding sequence.
41. The method of claim 39, wherein the two heterologous coding sequences comprise different coding sequences.
42. The method of any one of claims 29-41, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, expresses the at least one heterologous coding sequence in a target cell.
43. The method of claim 42, wherein the target cell is a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell.
44. The method of any one of claims 39-43, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is capable of expressing each of the two heterologous coding sequences:
(a) in the same target cell or cells,
(b) in different target cells, or
(c) in a partially overlapping set of target cells.
45. The method of any one of claims 29-44, wherein the compact bidirectional promoter expresses a luciferase reporter at a higher level than is a HSV TK promoter.
46. The method of any one of claims 29-45, wherein the at least one coding sequence encodes CFTR, ATP7B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMRI, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, SLC6A1, or a functional fragment or variant thereof.
47. The method of any one of claims 29-46, wherein the at least one coding sequence is codon optimized, optionally wherein the codon optimized coding sequence comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 819-836, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
48. A method of expressing two heterologous coding sequences in different target cells, the method comprising introducing into a subject a nucleic acid comprising a compact bidirectional promoter, or a functional fragment or variant thereof, operably linked to the two heterologous coding sequences positioned on opposite sides of the compact bidirectional promoter in the cell, wherein the compact bidirectional promoter is less than about 1000 bp, and wherein the compact bidirectional promoter promotes transcription of one of the coding sequences in a first target cell and promotes transcription of the other coding sequence in a second target cell.
49. The method of claim 48, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 800 bp.
50. The method of claim 48, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 600 bp.
51. The method of claim 48, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 400 bp.
52. The method of claim 48, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 200 bp.
53. The method of claim 48, wherein the compact promoter, or the functional fragment or the variant thereof, is operably linked to a 5' UTR.
54. The method of claim 48, wherein the compact promoter, or the functional fragment or the variant thereof, is operably linked to a Kozak consensus sequence.
55. The method of any one of claims 48-54, wherein the compact bidirectional promoter comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 1-800, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
56. The method of any one of claims 48-55, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, comprises at least 95%, at least 98%, at least 99%, at least 99.5% or 100% sequence identity to a naturally occurring mammalian promoter.
57. The method of any one of claims 48-56, wherein the two heterologous coding sequences comprise the same coding sequence.
58. The method of any one of claims 48-57, wherein the two heterologous coding sequences comprise different coding sequences.
59. The method of any one of claims 48-58, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, expresses the at least one heterologous coding sequence in a target cell.
60. The method of claim 59, wherein the target cell is a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell.
61. The method of any one of claims 48-60, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is capable of expressing each of the two heterologous coding sequences in a partially overlapping set of target cells.
62. The method of any one of claims 48-61, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, expresses a luciferase reporter at a higher level than is a HSV TK promoter.
63. The method of any one of claims 48-62, wherein the at least one coding sequence encodes CFTR, ATP7B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMRI, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, SLC6A1, or a functional fragment or variant thereof.
64. The method of any one of claims 48-63, wherein the at least one coding sequence is codon optimized, optionally wherein the codon optimized coding sequence comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 819-836, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
65. A method of administering an scAAV vector comprising a therapeutic coding sequence at a reduced dose for treating a disease treatable by the therapeutic coding sequence, the method comprising, administering to a subject a scAAV comprising a compact bidirectional promoter, or a functional fragment or variant thereof, operably linked to the therapeutic coding sequence, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is less than about 1000 bp and is heterologous to the therapeutic coding sequence, wherein the scAAV vector is administered at a reduced dose as compared to the therapeutically effective dose for an ssAAV vector comprising the therapeutic coding sequence.
66. The method of claim 65, wherein the reduced dose is between about 10-fold and about 600-fold lower than the therapeutically effective dose for an ssAAV vector.
67. The method of claim 65, wherein the reduced dose is about 10-fold lower than the therapeutically effective dose for an ssAAV vector.
68. The method of claim 65, wherein the bidirectional promoter, or the functional fragment or the variant thereof, is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter.
69. The method of any one of claims 65-68, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 800 bp.
70. The method of any one of claims 65-68, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 600 bp.
71. The method of any one of claims 65-68, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 400 bp.
72. The method of any one of claims 65-68, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, is between about 30 bp and about 200 bp.
73. The method of any one of claims 65-68, wherein the compact bidirectional promoter comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 1-800, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
74. The method of any one of claims 65-73, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, comprises at least 95%, at least 98%, at least 99%, at least 99.5% or 100% sequence identity to a naturally occurring mammalian promoter.
75. The method of any one of claims 65-74, wherein the compact bidirectional promoter, or the functional fragment or the variant thereof, expresses the therapeutic coding sequence in a target cell.
76. The method of claim 75, wherein the target cell is a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell.
77. The method of any one of claims 65-76, wherein the compact bidirectional promoter expresses a luciferase reporter at a higher level than is a HSV TK promoter.
78. The method of any one of claims 65-77, wherein the therapeutic coding sequence encodes A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMRI, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, SLC6A1, or a functional fragment or variant thereof.
79. The method of any one of claims 65-78, wherein the therapeutic coding sequence is codon optimized, optionally wherein the codon optimized coding sequence comprises a nucleic acid sequence selected from any one of SEQ ID NOs: 819-836, or a nucleic acid sequence having at least about 90%, at least about 95%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity thereto.
80. The method of any one of claims 65-79, wherein the therapeutic coding sequence is less than about 750 amino acids.
81. The method of any one of claims 65-80, wherein the therapeutic coding sequence is from about 350 amino acids to about 750 amino acids.
82. A method comprising: obtaining a genome file comprising information about the location of transcription start sites on the plus and minus strands of a chromosome; and identifying regions between a transcription start site on the minus strand of the chromosome and a transcription start site on the plus strand of the chromosome, thereby identifying one or more bidirectional promoters.
83. The method of claim 82, wherein the genome file comprises annotations categorized by chromosome, wherein the annotations comprise indices, wherein the indices comprise genes, pseudogenes, and coding regions for protein-coding genes, wherein each coding region comprises a transcription start site.
84. The method of claim 82 or 83, wherein the one or more bidirectional promoters are identified by obtaining a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to identify the regions between the transcription start site on the minus strand of a chromosome and the transcription start site on the plus strand of the chromosome.
85. The method of any one of claims 82-84, wherein the genome file comprising annotations comprises mammalian annotations.
86. The method of claim 83, wherein the mammalian annotations comprise human annotations or mouse annotations.
87. The method of claim 83, wherein the genome file comprising annotations is GRCh38_latest_genomic.gff or GRCm39_vM27.gff3.
88. The method of claim 87, wherein the genome file is GRCm39_vM27.gff3.
89. The method of claim 82, wherein the one or more bidirectional promoters are less than about 1000 bp.
90. The method of claim 89, wherein the one or more bidirectional promoters are between about 30 bp and about 800 bp.
91. The method of claim 89, wherein the one or more bidirectional promoters are between about 30 bp and about 600 bp.
92. The method of claim 89, wherein the one or more bidirectional promoters are between about 30 bp and about 400 bp.
93. The method of claim 89, wherein the one or more bidirectional promoters are between about 30 bp and about 200 bp.
94. The method of any one of claims 90-93, further comprising linking the one or more bidirectional promoters to at least one heterologous coding sequence.
95. The method of any one of claims 90-93, further comprising linking the one or more bidirectional promoters to two heterologous coding sequences.
96. The method of any one of claims 85-95, wherein the one or more bidirectional promoters is capable of promoting transcription of two coding sequences positioned on opposite sides of the promoter.
97. The method of claim 89, wherein the compact promoter is operably linked to a 5' UTR.
98. The method of claim 89, further comprising linking each of the one or more bidirectional promoters to only one heterologous coding sequence.
99. The method of claim 89, further comprising linking each of the one or more bidirectional promoters to two heterologous coding sequences positioned on opposite sides of the promoter.
100. The method of claim 99, wherein the two heterologous coding sequences comprise the same coding sequence.
101. The method of claim 99, wherein the two heterologous coding sequences comprise different coding sequences.
102. The method of claim 94, wherein the one or more bidirectional promoters are capable of expressing the at least one heterologous coding sequence in a target cell.
103. The method of claim 102, wherein the target cell is a lung cell, a pancreatic cell, a kidney cell, a muscle cell, a liver cell, a retinal cell, a neuron, a glial cell, an endothelial cell, or an epithelial cell.
104. The method of claim 94, wherein the one or more bidirectional promoters are capable of expressing each of the two heterologous coding sequences:
(a) in the same target cell or cells,
(b) in different target cells, or
(c) in a partially overlapping set of target cells.
105. The method of claim 89, wherein the one or more bidirectional promoters are capable of expressing a luciferase reporter at a higher level than is a HSV TK promoter.
106. The method of claim 94, wherein the at least one coding sequence encodes CFTR, ATP7B, ATP7A, AGL, CPS1, A1AT, ALPL, ARSA, BBS1, BEST1, CAH, CFH, CFI, CHM, CLN2, CLN7, CNGA3, CYP46A1, F9, FKRP, FMRI, FMRP, FOXG1, GAD, GALC, GALGT2, GBA1, GBE1, GLB1, GRN, HEXA, HTRA1, IDS, IDUA, LAMP2, LCA5, MECP2, MFN2, MMUT, MTM1, NAGLU, ND4, PAH, PIGA, PRKN, RPE65, SERPING1, SGSH, SLC13A5, or SLC6A1.
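
By way of non-limiting illustration only, the identification step recited in the genome-file method claims (finding regions between a transcription start site on the minus strand and a transcription start site on the plus strand, then keeping regions under a size limit) might be sketched in Python as follows. This is not the applicant's implementation. It assumes a nine-column, tab-separated GFF3-style input, approximates each transcription start site as the feature start for a plus-strand gene and the feature end for a minus-strand gene, and considers only adjacent start sites. The function names `tss_by_chromosome` and `divergent_regions`, the `min_len` and `max_len` parameters, and the toy records are hypothetical and are not taken from any real annotation file.

```python
from collections import defaultdict


def tss_by_chromosome(gff_lines, feature_type="gene"):
    """Collect (tss_position, strand, feature_id) tuples per chromosome.

    Assumption: the TSS is the feature start on '+' and the feature end on '-'.
    """
    sites = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        if strand not in ("+", "-"):
            continue  # unstranded features carry no TSS orientation
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        tss = start if strand == "+" else end
        sites[seqid].append((tss, strand, attrs.get("ID", "?")))
    return sites


def divergent_regions(sites, min_len=30, max_len=1000):
    """Return regions lying between a minus-strand TSS and the next plus-strand TSS.

    Each result is (seqid, region_start, region_end, length, minus_id, plus_id),
    using 1-based inclusive coordinates that exclude the two TSS positions.
    """
    regions = []
    for seqid, tss_list in sites.items():
        ordered = sorted(tss_list)
        for (pos_a, strand_a, id_a), (pos_b, strand_b, id_b) in zip(ordered, ordered[1:]):
            # Head-to-head (divergent) pair: minus-strand TSS upstream of plus-strand TSS.
            if strand_a == "-" and strand_b == "+":
                length = pos_b - pos_a - 1
                if min_len <= length < max_len:
                    regions.append((seqid, pos_a + 1, pos_b - 1, length, id_a, id_b))
    return regions


# Toy records (hypothetical, for illustration only).
example_lines = [
    "##gff-version 3",
    "\t".join(["chr1", "toy", "gene", "100", "500", ".", "-", ".", "ID=geneA"]),
    "\t".join(["chr1", "toy", "gene", "700", "1500", ".", "+", ".", "ID=geneB"]),
    "\t".join(["chr1", "toy", "gene", "3000", "4000", ".", "+", ".", "ID=geneC"]),
    "\t".join(["chr1", "toy", "gene", "9000", "9500", ".", "-", ".", "ID=geneD"]),
    "\t".join(["chr2", "toy", "gene", "100", "200", ".", "-", ".", "ID=geneE"]),
    "\t".join(["chr2", "toy", "gene", "5000", "6000", ".", "+", ".", "ID=geneF"]),
]

regions = divergent_regions(tss_by_chromosome(example_lines))
print(regions)
```

In this toy input, only the geneA/geneB pair on chr1 is both divergent and shorter than the default size limit; the geneE/geneF pair on chr2 is divergent but too far apart. Lowering `max_len` to 800, 600, 400, or 200 would correspond to the narrower size ranges recited in the dependent claims.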
PCT/US2023/073367 2022-09-02 2023-09-01 Compact bidirectional promoters for gene expression WO2024050547A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263403571P 2022-09-02 2022-09-02
US63/403,571 2022-09-02

Publications (2)

Publication Number Publication Date
WO2024050547A2 true WO2024050547A2 (en) 2024-03-07
WO2024050547A3 WO2024050547A3 (en) 2024-05-16

Family

ID=90098786

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/073367 WO2024050547A2 (en) 2022-09-02 2023-09-01 Compact bidirectional promoters for gene expression

Country Status (1)

Country Link
WO (1) WO2024050547A2 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23861621

Country of ref document: EP

Kind code of ref document: A2